SparkEnv

SparkEnv is a handle to the Spark Execution Environment with the core services of Apache Spark that interact with each other to establish a distributed computing platform for a Spark application.

There are two separate SparkEnvs: one for the driver and one for every executor.

Core Services

| Property | Service |
| --- | --- |
| blockManager | BlockManager |
| broadcastManager | BroadcastManager |
| closureSerializer | Serializer |
| conf | SparkConf |
| mapOutputTracker | MapOutputTracker |
| memoryManager | MemoryManager |
| metricsSystem | MetricsSystem |
| outputCommitCoordinator | OutputCommitCoordinator |
| rpcEnv | RpcEnv |
| securityManager | SecurityManager |
| serializer | Serializer |
| serializerManager | SerializerManager |
| shuffleManager | ShuffleManager |
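
These services are available through the active SparkEnv. A minimal spark-shell sketch (assuming a running Spark application, so the driver-side SparkEnv already exists):

import org.apache.spark.SparkEnv

// The SparkEnv of the current Spark application (the driver's in spark-shell)
val env = SparkEnv.get

// Peek at a few of the core services listed above
env.executorId    // "driver" on the driver
env.conf          // SparkConf
env.serializer    // Serializer
env.memoryManager // MemoryManager
env.blockManager  // BlockManager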

Creating Instance

SparkEnv takes an executor ID together with the core services listed above to be created.

SparkEnv is created using the create utility.

Driver's Temporary Directory

driverTmpDir: Option[String]

SparkEnv defines the driverTmpDir internal registry for the driver to be used as the root directory of files added using SparkContext.addFile.

driverTmpDir is undefined initially and is defined for the driver only when the SparkEnv utility is used to create a "base" SparkEnv.

Demo

import org.apache.spark.SparkEnv

// driverTmpDir is a private[spark] field, so use a small helper defined in the
// org.apache.spark package to read it (paste the block below using :pa -raw).
// :pa -raw
// BEGIN
package org.apache.spark
object BypassPrivateSpark {
  def driverTmpDir(sparkEnv: SparkEnv) = {
    sparkEnv.driverTmpDir
  }
}
// END
val driverTmpDir = org.apache.spark.BypassPrivateSpark.driverTmpDir(SparkEnv.get).get

The above is equivalent to the following snippet.

import org.apache.spark.SparkFiles
SparkFiles.getRootDirectory

Creating SparkEnv for Driver

createDriverEnv(
  conf: SparkConf,
  isLocal: Boolean,
  listenerBus: LiveListenerBus,
  numCores: Int,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

createDriverEnv creates a SparkEnv execution environment for the driver.

Figure: Spark Environment for driver

createDriverEnv accepts a SparkConf, whether Spark runs in local mode or not, a LiveListenerBus, the number of cores to use for execution in local mode or 0 otherwise, and an optional OutputCommitCoordinator (default: none).

createDriverEnv ensures that the spark.driver.host and spark.driver.port settings are defined.

It then passes the call straight on to the create utility (with the driver executor ID and the input parameters).

createDriverEnv is used when SparkContext is created.
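
A quick way to observe this from spark-shell (a sketch, assuming SparkContext sc is already up, so createDriverEnv has already run):

// Both settings are defined once the driver's SparkEnv has been created
sc.getConf.get("spark.driver.host")
sc.getConf.get("spark.driver.port")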

Creating SparkEnv for Executor

createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv
createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  bindAddress: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv

createExecutorEnv creates the Spark execution environment for an executor.

Figure: Spark Environment for executor

createExecutorEnv simply creates a "base" SparkEnv (passing in all the input parameters) and sets it as the current SparkEnv.

NOTE: The number of cores numCores is configured using the --cores command-line option of CoarseGrainedExecutorBackend and is specific to a cluster manager.

createExecutorEnv is used when CoarseGrainedExecutorBackend utility is requested to run.
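
A sketch of looking at executor-side SparkEnvs from user code (assuming a running SparkContext sc; on a cluster the distinct executor IDs show up, while in local mode all tasks run within the driver's SparkEnv):

import org.apache.spark.SparkEnv

// SparkEnv.get inside a task returns the SparkEnv of the executor running the task
val executorIds = sc
  .parallelize(1 to 4, numSlices = 4)
  .mapPartitions { _ => Iterator(SparkEnv.get.executorId) }
  .distinct()
  .collect()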

Creating "Base" SparkEnv

create(
  conf: SparkConf,
  executorId: String,
  bindAddress: String,
  advertiseAddress: String,
  port: Option[Int],
  isLocal: Boolean,
  numUsableCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  listenerBus: LiveListenerBus = null,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

create creates the "base" SparkEnv (that is common across the driver and executors).

create creates an RpcEnv named sparkDriver on the driver and sparkExecutor on executors.

create creates a Serializer (based on the spark.serializer configuration property). create prints out the following DEBUG message to the logs:

Using serializer: [serializer]
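
The serializer is purely configuration-driven. A minimal sketch of overriding the default JavaSerializer with KryoSerializer (both are built-in implementations):

import org.apache.spark.SparkConf

// create instantiates the Serializer class named by spark.serializer
val conf = new SparkConf()
  .setAppName("spark-env-serializer-demo")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")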

create creates a SerializerManager.

create creates a JavaSerializer as the closure serializer.

create creates a BroadcastManager.

create creates a MapOutputTrackerMaster (on the driver) or a MapOutputTrackerWorker (on executors). create registers or looks up a MapOutputTrackerMasterEndpoint under the name of MapOutputTracker. create prints out the following INFO message to the logs (on the driver only):

Registering MapOutputTracker

create creates a ShuffleManager (based on the spark.shuffle.manager configuration property).

create creates a UnifiedMemoryManager.

With the spark.shuffle.service.enabled configuration property enabled, create creates an ExternalBlockStoreClient.
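
Both the ShuffleManager and the external shuffle service client above are configuration-driven. A sketch of setting the properties explicitly (sort is the default shuffle manager; spark.shuffle.service.enabled defaults to false):

import org.apache.spark.SparkConf

// spark.shuffle.manager selects the ShuffleManager implementation
// spark.shuffle.service.enabled makes create set up an ExternalBlockStoreClient
val conf = new SparkConf()
  .set("spark.shuffle.manager", "sort")
  .set("spark.shuffle.service.enabled", "true")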

create creates a BlockManagerMaster.

create creates a NettyBlockTransferService.

Figure: Creating BlockManager for the Driver

Figure: Creating BlockManager for Executor

create creates a BlockManager.

create creates a MetricsSystem.

create creates an OutputCommitCoordinator and registers or looks up an OutputCommitCoordinatorEndpoint under the name of OutputCommitCoordinator.

create creates a SparkEnv (with all the services "stitched" together).

Logging

Enable ALL logging level for org.apache.spark.SparkEnv logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.SparkEnv=ALL

Refer to Logging.
