Shuffle System¶
Shuffle System is a core service of Apache Spark that is responsible for shuffle block management.
The core abstraction is ShuffleManager with the default and only known implementation being SortShuffleManager.
spark.shuffle.manager configuration property allows for a custom ShuffleManager.
Shuffle System uses shuffle handles, readers and writers.
Resources¶
- Improving Apache Spark Downscaling by Christopher Crosbie (Google) Ben Sidhom (Google)
- Spark shuffle introduction by Raymond Liu (aka colorant)