ShuffleStatus¶
ShuffleStatus is used by MapOutputTrackerMaster to keep track of the shuffle map outputs of a ShuffleMapStage.
Creating Instance¶
ShuffleStatus takes the following to be created:
- Number of Partitions (of the RDD of a ShuffleDependency)
ShuffleStatus is created when:
MapOutputTrackerMasteris requested to register a shuffle (whenDAGScheduleris requested to createShuffleMapStage)
Registering Shuffle Map Output¶
addMapOutput(
mapIndex: Int,
status: MapStatus): Unit
addMapOutput...FIXME
addMapOutput is used when:
MapOutputTrackerMasteris requested to registerMapOutput
Deregistering Shuffle Map Output¶
removeMapOutput(
mapIndex: Int,
bmAddress: BlockManagerId): Unit
removeMapOutput...FIXME
removeMapOutput is used when:
MapOutputTrackerMasteris requested to unregisterMapOutput
Missing Partitions¶
findMissingPartitions(): Seq[Int]
findMissingPartitions...FIXME
findMissingPartitions is used when:
MapOutputTrackerMasteris requested to findMissingPartitions
Serializing Shuffle Map Output Statuses¶
serializedMapStatus(
broadcastManager: BroadcastManager,
isLocal: Boolean,
minBroadcastSize: Int,
conf: SparkConf): Array[Byte]
serializedMapStatus...FIXME
serializedMapStatus is used when:
MessageLoop(of the MapOutputTrackerMaster) is requested to send map output locations for shuffle
Logging¶
Enable ALL logging level for org.apache.spark.ShuffleStatus logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.ShuffleStatus=ALL
Refer to Logging.