
Utils Utility

Local Directories for Storing Files

getConfiguredLocalDirs(
  conf: SparkConf): Array[String]

getConfiguredLocalDirs returns the local directories where Spark can write files.

Internally, getConfiguredLocalDirs uses the given SparkConf to check whether External Shuffle Service is enabled (based on the spark.shuffle.service.enabled configuration property).

getConfiguredLocalDirs checks whether Spark runs on YARN and, if so, returns the local directories controlled by the LOCAL_DIRS environment variable.

In non-YARN mode (or for the driver in yarn-client mode), getConfiguredLocalDirs checks the following environment variables (in that order) and returns the value of the first one that is set:

  1. SPARK_EXECUTOR_DIRS environment variable
  2. SPARK_LOCAL_DIRS environment variable
  3. MESOS_DIRECTORY environment variable (only when External Shuffle Service is not used)

In the end, when none of the above environment variables is set, getConfiguredLocalDirs falls back to the following properties (in that order):

  1. spark.local.dir configuration property
  2. java.io.tmpdir System property
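The fallback chain above can be sketched as follows. This is a simplified, hypothetical model (SparkConf and the process environment are stood in for by plain Maps), not the actual Spark code:

```scala
// Hedged sketch of getConfiguredLocalDirs' fallback chain (non-YARN mode).
// `conf` stands in for SparkConf and `env` for sys.env.
def resolveLocalDirs(
    conf: Map[String, String],
    env: Map[String, String],
    shuffleServiceEnabled: Boolean): Array[String] = {
  // Environment variables win, checked in order; MESOS_DIRECTORY is only
  // considered when External Shuffle Service is not used.
  val fromEnv =
    env.get("SPARK_EXECUTOR_DIRS")
      .orElse(env.get("SPARK_LOCAL_DIRS"))
      .orElse(if (!shuffleServiceEnabled) env.get("MESOS_DIRECTORY") else None)
  // Fall back to spark.local.dir, then the java.io.tmpdir System property.
  fromEnv
    .orElse(conf.get("spark.local.dir"))
    .getOrElse(System.getProperty("java.io.tmpdir"))
    .split(",").map(_.trim).filter(_.nonEmpty)
}
```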

getConfiguredLocalDirs is used when:

Local URI Scheme

Utils defines a local URI scheme for files that are locally available on worker nodes in the cluster.

The local: URI scheme is used when:

  • Utils is requested to isLocalUri
  • Client (Spark on YARN) is used

isLocalUri

isLocalUri(
  uri: String): Boolean

isLocalUri is true when the given uri is a local: URI, i.e. starts with the local: scheme.
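In other words, the check boils down to a simple prefix test; a minimal sketch equivalent to the description above:

```scala
// Minimal sketch: a URI is "local" when it uses the local: scheme.
def isLocalUri(uri: String): Boolean = uri.startsWith("local:")
```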

isLocalUri is used when:

  • FIXME

getCurrentUserName

getCurrentUserName(): String

getCurrentUserName computes the name of the user who started the SparkContext.md[SparkContext] instance.

NOTE: It is later available as SparkContext.md#sparkUser[SparkContext.sparkUser].

Internally, it reads the SparkContext.md#SPARK_USER[SPARK_USER] environment variable and, if not set, falls back to Hadoop Security API's UserGroupInformation.getCurrentUser().getShortUserName().

NOTE: It is another place where Spark relies on Hadoop API for its operation.
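The lookup can be sketched as follows; to keep the example self-contained, the Hadoop UserGroupInformation fallback is stubbed (hypothetically) with the JVM's user.name System property:

```scala
// Hedged sketch of getCurrentUserName: SPARK_USER wins; the real code falls
// back to Hadoop's UserGroupInformation.getCurrentUser().getShortUserName(),
// stubbed here with the JVM's user.name property. `env` stands in for sys.env.
def currentUserName(env: Map[String, String]): String =
  env.getOrElse("SPARK_USER", System.getProperty("user.name"))
```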

localHostName

localHostName(): String

localHostName computes the local host name.

It first checks the SPARK_LOCAL_HOSTNAME environment variable. If it is not defined, it uses SPARK_LOCAL_IP to resolve the name (using InetAddress.getByName). If that is not defined either, it calls InetAddress.getLocalHost for the name.

NOTE: Utils.localHostName is executed while SparkContext.md#creating-instance[SparkContext is created] and also to compute the default value of spark-driver.md#spark_driver_host[spark.driver.host Spark property].
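The three-step lookup above can be sketched as below (with the process environment modeled as a Map; a simplified model, not the actual Spark code):

```scala
import java.net.InetAddress

// Hedged sketch of localHostName's three-step lookup; `env` stands in for
// the process environment.
def localHostName(env: Map[String, String]): String =
  env.get("SPARK_LOCAL_HOSTNAME")                          // 1. explicit host name
    .orElse(env.get("SPARK_LOCAL_IP")
      .map(ip => InetAddress.getByName(ip).getHostName))   // 2. resolve the IP
    .getOrElse(InetAddress.getLocalHost.getHostName)       // 3. JVM's local host
```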

getUserJars

getUserJars(
  conf: SparkConf): Seq[String]

getUserJars returns the non-empty entries of the spark.jars configuration property.
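A minimal sketch of that filtering, with SparkConf modeled as a plain Map:

```scala
// Hedged sketch: split the comma-separated spark.jars value and keep only
// the non-empty entries.
def getUserJars(conf: Map[String, String]): Seq[String] =
  conf.get("spark.jars").toSeq
    .flatMap(_.split(","))
    .filter(_.nonEmpty)
```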

getUserJars is used when:

extractHostPortFromSparkUrl

extractHostPortFromSparkUrl(
  sparkUrl: String): (String, Int)

extractHostPortFromSparkUrl creates a Java URI from the input sparkUrl and takes the host and port parts.

extractHostPortFromSparkUrl asserts that the input sparkUrl uses the spark scheme.

extractHostPortFromSparkUrl throws a SparkException for unparseable spark URLs:

Invalid master URL: [sparkUrl]
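Parsing and validation can be sketched with java.net.URI (a simplified model; the real method throws SparkException and performs additional validity checks):

```scala
import java.net.URI

// Hedged sketch: parse a spark://host:port URL and extract host and port.
// The real Utils throws SparkException; IllegalArgumentException stands in.
def extractHostPortFromSparkUrl(sparkUrl: String): (String, Int) = {
  val uri = new URI(sparkUrl)
  if (uri.getScheme != "spark" || uri.getHost == null || uri.getPort < 0)
    throw new IllegalArgumentException(s"Invalid master URL: $sparkUrl")
  (uri.getHost, uri.getPort)
}
```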

extractHostPortFromSparkUrl is used when:

isDynamicAllocationEnabled

isDynamicAllocationEnabled(
  conf: SparkConf): Boolean

isDynamicAllocationEnabled is true when all of the following hold:

  1. spark.dynamicAllocation.enabled configuration property is true
  2. spark.master is non-local
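The two conditions can be sketched as below (SparkConf modeled as a Map; treating any master that starts with local as local mode is an assumption of this sketch):

```scala
// Hedged sketch: dynamic allocation requires the feature flag and a
// non-local master.
def isDynamicAllocationEnabled(conf: Map[String, String]): Boolean = {
  val enabled = conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean
  val master  = conf.getOrElse("spark.master", "")
  enabled && !master.startsWith("local")
}
```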

isDynamicAllocationEnabled is used when:

checkAndGetK8sMasterUrl

checkAndGetK8sMasterUrl(
  rawMasterURL: String): String

checkAndGetK8sMasterUrl...FIXME

checkAndGetK8sMasterUrl is used when:

getLocalDir

getLocalDir(
  conf: SparkConf): String

getLocalDir...FIXME

getLocalDir is used when:

  • Utils is requested to <>

  • SparkEnv is core:SparkEnv.md#create[created] (on the driver)

  • spark-shell.md[spark-shell] is launched

  • Spark on YARN's Client is requested to spark-yarn-client.md#prepareLocalResources[prepareLocalResources] and spark-yarn-client.md#createConfArchive[create spark_conf.zip archive with configuration files and Spark configuration]

  • PySpark's PythonBroadcast is requested to readObject

  • PySpark's EvalPythonExec is requested to doExecute

Fetching File

fetchFile(
  url: String,
  targetDir: File,
  conf: SparkConf,
  securityMgr: SecurityManager,
  hadoopConf: Configuration,
  timestamp: Long,
  useCache: Boolean): File

fetchFile...FIXME

fetchFile is used when:

  • SparkContext is requested to SparkContext.md#addFile[addFile]

  • Executor is requested to executor:Executor.md#updateDependencies[updateDependencies]

  • Spark Standalone's DriverRunner is requested to downloadUserJar

getOrCreateLocalRootDirs

getOrCreateLocalRootDirs(
  conf: SparkConf): Array[String]

getOrCreateLocalRootDirs...FIXME

getOrCreateLocalRootDirs is used when:

  • Utils is requested to <>

  • Worker is requested to spark-standalone-worker.md#receive[handle a LaunchExecutor message]

getOrCreateLocalRootDirsImpl

getOrCreateLocalRootDirsImpl(
  conf: SparkConf): Array[String]

getOrCreateLocalRootDirsImpl...FIXME

getOrCreateLocalRootDirsImpl is used when Utils is requested to getOrCreateLocalRootDirs
