BlockId¶
BlockId is an abstraction of data block identifiers based on an unique name.
Contract¶
Name¶
name: String
A globally unique identifier of this Block
Used when:
BlockManageris requested to putBlockDataAsStream and readDiskBlockFromSameHostExecutorUpdateBlockInfois requested to writeExternalDiskBlockManageris requested to getFile and containsBlockDiskStoreis requested to getBytes, remove, moveFileToBlock, contains
Implementations¶
Sealed Abstract Class
BlockId is a Scala sealed abstract class which means that all of the implementations are in the same compilation unit (a single file).
BroadcastBlockId¶
BlockId for broadcast variable blocks:
broadcastIdidentifier- Optional
fieldname (default:empty)
Uses broadcast_ prefix for the name
Used when:
TorrentBroadcastis created, requested to store a broadcast and the blocks in a local BlockManager, and read blocksBlockManageris requested to remove all the blocks of a broadcast variableSerializerManageris requested to shouldCompressAppStatusListeneris requested to onBlockUpdated
RDDBlockId¶
BlockId for RDD partitions:
rddIdidentifiersplitIndexidentifier
Uses rdd_ prefix for the name
Used when:
StorageStatusis requested to register the status of a data block, get the status of a data block, updateStorageInfoLocalRDDCheckpointDatais requested to doCheckpointRDDis requested to getOrComputeDAGScheduleris requested for the BlockManagers (executors) for cached RDD partitionsBlockManagerMasterEndpointis requested to removeRddAppStatusListeneris requested to updateRDDBlock (when onBlockUpdated for anRDDBlockId)
Compressed when spark.rdd.compress configuration property is enabled
ShuffleBlockBatchId¶
ShuffleBlockId¶
BlockId for shuffle blocks:
shuffleIdidentifiermapIdidentifierreduceIdidentifier
Uses shuffle_ prefix for the name
Used when:
ShuffleBlockFetcherIteratoris requested to throwFetchFailedExceptionMapOutputTrackerutility is requested to convertMapStatusesNettyBlockRpcServeris requested to handle a FetchShuffleBlocks messageExternalSorteris requested to writePartitionedMapOutputShuffleBlockFetcherIteratoris requested to mergeContinuousShuffleBlockIdsIfNeededIndexShuffleBlockResolveris requested to getBlockData
Compressed when spark.shuffle.compress configuration property is enabled
ShuffleDataBlockId¶
ShuffleIndexBlockId¶
StreamBlockId¶
BlockId for ...FIXME:
streamIduniqueId
Uses the following name:
input-[streamId]-[uniqueId]
Used in Spark Streaming
TaskResultBlockId¶
TempLocalBlockId¶
TempShuffleBlockId¶
TestBlockId¶
Creating BlockId by Name¶
apply(
name: String): BlockId
apply creates one of the available BlockIds by the given name (that uses a prefix to differentiate between different BlockIds).
apply is used when:
NettyBlockRpcServeris requested to handle OpenBlocks, UploadBlock messages and receiveStreamUpdateBlockInfois requested to deserialize (readExternal)DiskBlockManageris requested for all the blocks (from files stored on disk)ShuffleBlockFetcherIteratoris requested to sendRequestJsonProtocolutility is used to accumValueFromJson, taskMetricsFromJson and blockUpdatedInfoFromJson