Partitioner¶
Partitioner
is an abstraction of partitioners that define how the elements in a key-value pair RDD are partitioned by key.
Partitioner
maps keys to partition IDs (from 0 to numPartitions exclusive).
Partitioner
ensures that records with the same key are in the same partition.
Partitioner
is a Java Serializable
.
Contract¶
Partition for Key¶
getPartition(
key: Any): Int
Partition ID for the given key
Number of Partitions¶
numPartitions: Int
Implementations¶
- CoalescedPartitioner (Spark SQL)
- HashPartitioner
- RangePartitioner