ShuffleStatus¶
ShuffleStatus
is used by MapOutputTrackerMaster to keep track of the shuffle map outputs of a ShuffleMapStage.
Creating Instance¶
ShuffleStatus
takes the following to be created:
- Number of Partitions (of the RDD of a ShuffleDependency)
ShuffleStatus
is created when:
MapOutputTrackerMaster
is requested to register a shuffle (whenDAGScheduler
is requested to createShuffleMapStage)
Registering Shuffle Map Output¶
addMapOutput(
mapIndex: Int,
status: MapStatus): Unit
addMapOutput
...FIXME
addMapOutput
is used when:
MapOutputTrackerMaster
is requested to registerMapOutput
Deregistering Shuffle Map Output¶
removeMapOutput(
mapIndex: Int,
bmAddress: BlockManagerId): Unit
removeMapOutput
...FIXME
removeMapOutput
is used when:
MapOutputTrackerMaster
is requested to unregisterMapOutput
Missing Partitions¶
findMissingPartitions(): Seq[Int]
findMissingPartitions
...FIXME
findMissingPartitions
is used when:
MapOutputTrackerMaster
is requested to findMissingPartitions
Serializing Shuffle Map Output Statuses¶
serializedMapStatus(
broadcastManager: BroadcastManager,
isLocal: Boolean,
minBroadcastSize: Int,
conf: SparkConf): Array[Byte]
serializedMapStatus
...FIXME
serializedMapStatus
is used when:
MessageLoop
(of the MapOutputTrackerMaster) is requested to send map output locations for shuffle
Logging¶
Enable ALL
logging level for org.apache.spark.ShuffleStatus
logger to see what happens inside.
Add the following line to conf/log4j.properties
:
log4j.logger.org.apache.spark.ShuffleStatus=ALL
Refer to Logging.