ShuffleManager¶
ShuffleManager
is an abstraction of shuffle managers that manage shuffle data.
ShuffleManager
is specified using spark.shuffle.manager configuration property.
ShuffleManager
is used to create a BlockManager.
Contract¶
Getting ShuffleReader for ShuffleHandle¶
getReader[K, C](
handle: ShuffleHandle,
startPartition: Int,
endPartition: Int,
context: TaskContext,
metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C]
ShuffleReader to read shuffle data (for the given ShuffleHandle)
Used when the following RDD
s are requested to compute a partition:
CoGroupedRDD
is requested to compute a partitionShuffledRDD
is requested to compute a partitionSubtractedRDD
is requested to compute a partitionShuffledRowRDD
(Spark SQL) is requested tocompute
a partition
getReaderForRange¶
getReaderForRange[K, C](
handle: ShuffleHandle,
startMapIndex: Int,
endMapIndex: Int,
startPartition: Int,
endPartition: Int,
context: TaskContext,
metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C]
ShuffleReader for a range of reduce partitions to read from map output in the ShuffleHandle
Used when ShuffledRowRDD
(Spark SQL) is requested to compute a partition
Getting ShuffleWriter for ShuffleHandle¶
getWriter[K, V](
handle: ShuffleHandle,
mapId: Long,
context: TaskContext,
metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V]
ShuffleWriter to write shuffle data in the ShuffleHandle
Used when ShuffleWriteProcessor
is requested to write a partition
Registering Shuffle of ShuffleDependency (and Getting ShuffleHandle)¶
registerShuffle[K, V, C](
shuffleId: Int,
dependency: ShuffleDependency[K, V, C]): ShuffleHandle
Registers a shuffle (by the given shuffleId
and ShuffleDependency) and gives a ShuffleHandle
Used when ShuffleDependency
is created (and registers with the shuffle system)
ShuffleBlockResolver¶
shuffleBlockResolver: ShuffleBlockResolver
ShuffleBlockResolver of the shuffle system
Used when:
SortShuffleManager
is requested for a ShuffleWriter for a ShuffleHandle, to unregister a shuffle and stopBlockManager
is requested to getLocalBlockData and getHostLocalShuffleData
Stopping ShuffleManager¶
stop(): Unit
Stops the shuffle system
Used when SparkEnv
is requested to stop
Unregistering Shuffle¶
unregisterShuffle(
shuffleId: Int): Boolean
Unregisters a given shuffle
Used when BlockManagerSlaveEndpoint
is requested to handle a RemoveShuffle message
Implementations¶
Accessing ShuffleManager using SparkEnv¶
ShuffleManager
is available on the driver and executors using SparkEnv.shuffleManager.
val shuffleManager = SparkEnv.get.shuffleManager