DiskBlockObjectWriter¶
DiskBlockObjectWriter is a custom java.io.OutputStream that BlockManager offers for writing data blocks to disk.
DiskBlockObjectWriter is used when:
- BypassMergeSortShuffleWriter is requested for partition writers
- UnsafeSorterSpillWriter is requested for a partition writer
- ShuffleExternalSorter is requested to writeSortedFile
- ExternalSorter is requested to spillMemoryIteratorToDisk
Creating Instance¶
DiskBlockObjectWriter takes the following to be created:
- Java File
- SerializerManager
- SerializerInstance
- Buffer size
- syncWrites flag (based on spark.shuffle.sync configuration property)
- ShuffleWriteMetricsReporter
- BlockId (default: null)
DiskBlockObjectWriter is created when:
- BlockManager is requested for a DiskBlockObjectWriter
SerializationStream¶
DiskBlockObjectWriter manages a SerializationStream for writing a key-value record:
- Opens it when requested to open
- Closes it when requested to commitAndGet
- Dereferences it (nulls it) when closeResources
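The lifecycle above can be sketched with plain JDK streams. This is a minimal illustration, not Spark code: StreamLifecycleSketch is a hypothetical class, java.io.ObjectOutputStream stands in for SerializationStream, an in-memory buffer stands in for the file, and commitAndGet returns a byte count instead of a FileSegment.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for DiskBlockObjectWriter (assumption, not Spark code).
class StreamLifecycleSketch {
  // In-memory buffer standing in for the file on disk.
  private val sink = new ByteArrayOutputStream()
  // Stand-in for the managed SerializationStream; null until opened.
  private var objOut: ObjectOutputStream = null

  // Creates the stream on first use.
  def open(): Unit = {
    if (objOut == null) objOut = new ObjectOutputStream(sink)
  }

  // Closes the stream (which flushes it) and reports the bytes written so far.
  // The Int result is a simplification of FileSegment.
  def commitAndGet(): Int = {
    if (objOut != null) objOut.close()
    sink.size()
  }

  // Drops the reference so the stream can be garbage collected.
  def closeResources(): Unit = {
    objOut = null
  }

  def hasStream: Boolean = objOut != null
}
```

In this sketch the reference survives commitAndGet and is only released by closeResources, mirroring the three steps listed above.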
States¶
DiskBlockObjectWriter can be in one of the following states (that match the state of the underlying output streams):
- Initialized
- Open
- Closed
Writing Out Record¶
write(
key: Any,
value: Any): Unit
write opens the underlying stream unless it is already open.
write requests the SerializationStream to write the key and then the value.
In the end, write updates the write metrics.
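The three steps of write can be sketched as follows. This is a simplified model under stated assumptions: RecordWriterSketch and RecordCounter are hypothetical names, java.io.ObjectOutputStream stands in for SerializationStream, and the metrics update is reduced to a record counter (the real ShuffleWriteMetricsReporter is not modelled here).

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical counter standing in for ShuffleWriteMetricsReporter (assumption).
class RecordCounter {
  var recordsWritten: Long = 0L
}

// Hypothetical stand-in for DiskBlockObjectWriter (assumption, not Spark code).
class RecordWriterSketch(metrics: RecordCounter) {
  private val sink = new ByteArrayOutputStream() // stands in for the file
  private var objOut: ObjectOutputStream = null  // stands in for SerializationStream
  private var streamOpen = false

  private def open(): Unit = {
    objOut = new ObjectOutputStream(sink)
    streamOpen = true
  }

  def isOpen: Boolean = streamOpen

  def write(key: Any, value: Any): Unit = {
    // 1. Open the underlying stream lazily, on the first record only.
    if (!streamOpen) open()
    // 2. Write the key, then the value (boxing primitives to objects).
    objOut.writeObject(key.asInstanceOf[AnyRef])
    objOut.writeObject(value.asInstanceOf[AnyRef])
    // 3. Update the write metrics.
    metrics.recordsWritten += 1
  }
}
```

Opening lazily means a writer that never receives a record never creates a stream in this sketch.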
write is used when:
- BypassMergeSortShuffleWriter is requested to write records of a partition
- ExternalAppendOnlyMap is requested to spillMemoryIteratorToDisk
- ExternalSorter is requested to write all records into a partitioned file
- SpillableIterator is requested to spill
- WritablePartitionedPairCollection is requested for a destructiveSortedWritablePartitionedIterator
commitAndGet¶
commitAndGet(): FileSegment
commitAndGet...FIXME
commitAndGet is used when...FIXME
Committing Writes and Closing Resources¶
close(): Unit
close...FIXME
close is used when...FIXME
revertPartialWritesAndClose¶
revertPartialWritesAndClose(): File
revertPartialWritesAndClose...FIXME
revertPartialWritesAndClose is used when...FIXME
Writing Bytes (From Byte Array Starting From Offset)¶
write(
kvBytes: Array[Byte],
offs: Int,
len: Int): Unit
write...FIXME
write is used when...FIXME
Opening DiskBlockObjectWriter¶
open(): DiskBlockObjectWriter
open opens the DiskBlockObjectWriter, i.e. initializes and resets the bs and objOut internal output streams.
Internally, open makes sure that DiskBlockObjectWriter is not closed (hasBeenClosed flag is disabled). If it has been closed, open throws an IllegalStateException:
Writer already closed. Cannot be reopened.
Unless DiskBlockObjectWriter has already been initialized (initialized flag is enabled), open initializes it (and turns initialized flag on).
Regardless of whether DiskBlockObjectWriter was already initialized or not, open requests the SerializerManager to wrap the mcs output stream for encryption and compression (for blockId) and sets the result as bs.
open requests the SerializerInstance to create a SerializationStream over the bs output stream and sets it as objOut.
Note
open uses the SerializerInstance that was used to create the DiskBlockObjectWriter.
In the end, open turns streamOpen flag on.
open is used when DiskBlockObjectWriter writes out a record or bytes from a specified byte array and the stream is not open yet.
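The flag handling in open can be sketched with JDK streams. This is a rough model, not Spark's implementation: OpenSketch is a hypothetical class, BufferedOutputStream is only a placeholder for the stream that SerializerManager would wrap for encryption and compression, ObjectOutputStream stands in for the stream created by the SerializerInstance, and close is simplified.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, ObjectOutputStream, OutputStream}

// Hypothetical stand-in for DiskBlockObjectWriter (assumption, not Spark code).
class OpenSketch {
  private val sink = new ByteArrayOutputStream() // stands in for the file
  private var mcs: OutputStream = null           // base stream, created once
  private var bs: OutputStream = null            // wrapped stream, re-created on every open
  private var objOut: ObjectOutputStream = null  // stands in for SerializationStream
  private var initialized = false
  private var streamOpen = false
  private var hasBeenClosed = false

  def open(): this.type = {
    // A closed writer cannot be reopened.
    if (hasBeenClosed) {
      throw new IllegalStateException("Writer already closed. Cannot be reopened.")
    }
    // One-time initialization of the base stream.
    if (!initialized) {
      mcs = new BufferedOutputStream(sink)
      initialized = true
    }
    // Always re-wrap the base stream (placeholder for encryption and compression).
    bs = new BufferedOutputStream(mcs)
    // Create the object stream on top of the wrapped stream.
    objOut = new ObjectOutputStream(bs)
    streamOpen = true
    this
  }

  // Simplified close: releases the streams and marks the writer as closed.
  def close(): Unit = {
    if (objOut != null) objOut.close()
    objOut = null
    bs = null
    mcs = null
    streamOpen = false
    hasBeenClosed = true
  }

  def isOpen: Boolean = streamOpen
}
```

Calling open on a fresh OpenSketch turns the streamOpen flag on, while calling it after close raises the IllegalStateException with the message shown above.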