DiskBlockObjectWriter¶
DiskBlockObjectWriter
is a custom java.io.OutputStream that BlockManager offers for writing data blocks to disk.
DiskBlockObjectWriter
is used when:
-
BypassMergeSortShuffleWriter
is requested for partition writers -
UnsafeSorterSpillWriter
is requested for a partition writer -
ShuffleExternalSorter
is requested to writeSortedFile -
ExternalSorter
is requested to spillMemoryIteratorToDisk
Creating Instance¶
DiskBlockObjectWriter
takes the following to be created:
- Java File
- SerializerManager
- SerializerInstance
- Buffer size
-
syncWrites
flag (based on spark.shuffle.sync configuration property) - ShuffleWriteMetricsReporter
- BlockId (default:
null
)
DiskBlockObjectWriter
is created when:
BlockManager
is requested for one
SerializationStream¶
DiskBlockObjectWriter
manages a SerializationStream for writing a key-value record:
-
Opens it when requested to open
-
Closes it when requested to commitAndGet
-
Dereferences it (
null
s it) when closeResources
States¶
DiskBlockObjectWriter
can be in one of the following states (that match the state of the underlying output streams):
- Initialized
- Open
- Closed
Writing Out Record¶
write(
key: Any,
value: Any): Unit
write
opens the underlying stream unless open already.
write
requests the SerializationStream to write the key and then the value.
In the end, write
updates the write metrics.
write
is used when:
-
BypassMergeSortShuffleWriter
is requested to write records of a partition -
ExternalAppendOnlyMap
is requested to spillMemoryIteratorToDisk -
ExternalSorter
is requested to write all records into a partitioned fileSpillableIterator
is requested tospill
-
WritablePartitionedPairCollection
is requested for adestructiveSortedWritablePartitionedIterator
commitAndGet¶
commitAndGet(): FileSegment
commitAndGet
...FIXME
commitAndGet
is used when...FIXME
Committing Writes and Closing Resources¶
close(): Unit
close
...FIXME
close
is used when...FIXME
revertPartialWritesAndClose¶
revertPartialWritesAndClose(): File
revertPartialWritesAndClose
...FIXME
revertPartialWritesAndClose
is used when...FIXME
Writing Bytes (From Byte Array Starting From Offset)¶
write(
kvBytes: Array[Byte],
offs: Int,
len: Int): Unit
write
...FIXME
write
is used when...FIXME
Opening DiskBlockObjectWriter¶
open(): DiskBlockObjectWriter
open
opens the DiskBlockObjectWriter
, i.e. initializes and re-sets bs and objOut internal output streams.
Internally, open
makes sure that DiskBlockObjectWriter
is not closed (hasBeenClosed flag is disabled). If it was, open
throws a IllegalStateException
:
Writer already closed. Cannot be reopened.
Unless DiskBlockObjectWriter
has already been initialized (initialized flag is enabled), open
initializes it (and turns initialized flag on).
Regardless of whether DiskBlockObjectWriter
was already initialized or not, open
requests SerializerManager
to wrap mcs
output stream for encryption and compression (for blockId) and sets it as bs.
open
requests the SerializerInstance to serialize bs
output stream and sets it as objOut.
Note
open
uses the SerializerInstance that was used to create the DiskBlockObjectWriter
.
In the end, open
turns streamOpen flag on.
open
is used when DiskBlockObjectWriter
writes out a record or bytes from a specified byte array and the stream is not open yet.