DiskStore¶
DiskStore
manages data blocks on disk for BlockManager.
Creating Instance¶
DiskStore
takes the following to be created:
- SparkConf
- DiskBlockManager
-
SecurityManager
DiskStore
is created for BlockManager.
Block Sizes¶
blockSizes: ConcurrentHashMap[BlockId, Long]
DiskStore
uses ConcurrentHashMap
(Java) as a registry of blocks and the data size (on disk).
A new entry is added when put and moveFileToBlock.
An entry is removed when remove.
putBytes¶
putBytes(
blockId: BlockId,
bytes: ChunkedByteBuffer): Unit
putBytes
put the block and writes the buffer out (to the given channel).
putBytes
is used when:
ByteBufferBlockStoreUpdater
is requested to saveToDiskStoreBlockManager
is requested to dropFromMemory
getBytes¶
getBytes(
blockId: BlockId): BlockData
getBytes(
f: File,
blockSize: Long): BlockData
getBytes
requests the DiskBlockManager for the block file and the size.
getBytes
requests the SecurityManager for getIOEncryptionKey
and returns a EncryptedBlockData
if available or a DiskBlockData
otherwise.
getBytes
is used when:
TempFileBasedBlockStoreUpdater
is requested to blockDataBlockManager
is requested to getLocalValues, doGetLocalBytes
getSize¶
getSize(
blockId: BlockId): Long
getSize
looks up the block in the blockSizes registry.
getSize
is used when:
BlockManager
is requested to getStatus, getCurrentBlockStatus, doPutIteratorDiskStore
is requested for the block bytes
moveFileToBlock¶
moveFileToBlock(
sourceFile: File,
blockSize: Long,
targetBlockId: BlockId): Unit
moveFileToBlock
...FIXME
moveFileToBlock
is used when:
TempFileBasedBlockStoreUpdater
is requested to saveToDiskStore
Checking if Block File Exists¶
contains(
blockId: BlockId): Boolean
contains
requests the DiskBlockManager for the block file and checks whether the file actually exists or not.
contains
is used when:
BlockManager
is requested to getStatus, getCurrentBlockStatus, getLocalValues, doGetLocalBytes, dropFromMemoryDiskStore
is requested to put
Persisting Block to Disk¶
put(
blockId: BlockId)(
writeFunc: WritableByteChannel => Unit): Unit
put
prints out the following DEBUG message to the logs:
Attempting to put block [blockId]
put
requests the DiskBlockManager for the block file for the input block.
put
opens the block file for writing (wrapped into a CountingWritableChannel
to count the bytes written). put
executes the given writeFunc
function (with the WritableByteChannel
of the block file) and saves the bytes written (to the blockSizes registry).
In the end, put
prints out the following DEBUG message to the logs:
Block [fileName] stored as [size] file on disk in [time] ms
In case of any exception, put
deletes the block file.
put
throws an IllegalStateException
when the block is already stored:
Block [blockId] is already present in the disk store
put
is used when:
BlockManager
is requested to doPutIterator and dropFromMemoryDiskStore
is requested to putBytes
Removing Block¶
remove(
blockId: BlockId): Boolean
remove
...FIXME
remove
is used when:
BlockManager
is requested to removeBlockInternalDiskStore
is requested to put (and anIOException
is thrown)
Logging¶
Enable ALL
logging level for org.apache.spark.storage.DiskStore
logger to see what happens inside.
Add the following line to conf/log4j.properties
:
log4j.logger.org.apache.spark.storage.DiskStore=ALL
Refer to Logging.