SerializerManager¶
SerializerManager is used to select the Serializer for shuffle blocks.
Creating Instance¶
SerializerManager takes the following to be created:
- Default Serializer
- SparkConf
- (optional) Encryption Key (
Option[Array[Byte]])
SerializerManager is created when:
SparkEnvutility is used to create a SparkEnv (for the driver and executors)
Kryo-Compatible Types¶
Kryo-Compatible Types are the following primitive types, Arrays of the primitive types and Strings:
BooleanByteCharDoubleFloatIntLongNullShort
Default Serializer¶
SerializerManager is given a Serializer when created (based on spark.serializer configuration property).
The Serializer is used when SerializerManager is requested for a Serializer.
Tip
Enable DEBUG logging level of SparkEnv to be told about the selected Serializer.
Using serializer: [serializer]
Accessing SerializerManager¶
SerializerManager is available using SparkEnv on the driver and executors.
import org.apache.spark.SparkEnv
SparkEnv.get.serializerManager
KryoSerializer¶
SerializerManager creates a KryoSerializer when created.
KryoSerializer is used as the serializer when the types of a given key and value are Kryo-compatible.
Selecting Serializer¶
getSerializer(
ct: ClassTag[_],
autoPick: Boolean): Serializer
getSerializer(
keyClassTag: ClassTag[_],
valueClassTag: ClassTag[_]): Serializer
getSerializer returns the KryoSerializer when the given ClassTags are Kryo-compatible and the autoPick flag is true. Otherwise, getSerializer returns the default Serializer.
autoPick flag is true for all BlockIds but Spark Streaming's StreamBlockIds.
getSerializer (with autoPick flag) is used when:
SerializerManageris requested to dataSerializeStream, dataSerializeWithExplicitClassTag and dataDeserializeStreamSerializedValuesHolder(of MemoryStore) is requested for aSerializationStream
getSerializer (with key and value ClassTags only) is used when:
ShuffledRDDis requested for dependencies
dataSerializeStream¶
dataSerializeStream[T: ClassTag](
blockId: BlockId,
outputStream: OutputStream,
values: Iterator[T]): Unit
dataSerializeStream...FIXME
dataSerializeStream is used when:
BlockManageris requested to doPutIterator and dropFromMemory
dataSerializeWithExplicitClassTag¶
dataSerializeWithExplicitClassTag(
blockId: BlockId,
values: Iterator[_],
classTag: ClassTag[_]): ChunkedByteBuffer
dataSerializeWithExplicitClassTag...FIXME
dataSerializeWithExplicitClassTag is used when:
BlockManageris requested to doGetLocalBytesSerializerManageris requested to dataSerialize
dataDeserializeStream¶
dataDeserializeStream[T](
blockId: BlockId,
inputStream: InputStream)
(classTag: ClassTag[T]): Iterator[T]
dataDeserializeStream...FIXME
dataDeserializeStream is used when:
BlockStoreUpdateris requested to saveDeserializedValuesToMemoryStoreBlockManageris requested to getLocalValues and getRemoteValuesMemoryStoreis requested to putIteratorAsBytes (whenPartiallySerializedBlockis requested for aPartiallyUnrolledIterator)