Utils Utility¶
Local Directories for Storing Files¶
getConfiguredLocalDirs(
conf: SparkConf): Array[String]
getConfiguredLocalDirs returns the local directories where Spark can write files.

Internally, getConfiguredLocalDirs uses the given SparkConf to find out whether External Shuffle Service is enabled (based on the spark.shuffle.service.enabled configuration property).

getConfiguredLocalDirs checks if Spark runs on YARN and, if so, returns the LOCAL_DIRS-controlled local directories.
In non-YARN mode (or for the driver in yarn-client mode), getConfiguredLocalDirs checks the following environment variables (in this order) and returns the value of the first one that is set:

- SPARK_EXECUTOR_DIRS
- SPARK_LOCAL_DIRS
- MESOS_DIRECTORY (only when External Shuffle Service is not used)

In the end, when none of the above environment variables is set, getConfiguredLocalDirs uses the following properties (in this order; see the sketch after this list):

- spark.local.dir configuration property
- java.io.tmpdir System property
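The precedence can be sketched as follows. This is a simplified illustration only, not Spark's actual implementation: the configuredLocalDirs helper and the Map-based conf are stand-ins, and the Mesos-specific condition is omitted.

```scala
// Sketch of the lookup order: environment variables first, then
// spark.local.dir, then java.io.tmpdir. (Hypothetical helper; Spark's real
// method takes a SparkConf and only consults MESOS_DIRECTORY when External
// Shuffle Service is off.)
def configuredLocalDirs(conf: Map[String, String]): Array[String] = {
  val fromEnv = Seq("SPARK_EXECUTOR_DIRS", "SPARK_LOCAL_DIRS", "MESOS_DIRECTORY")
    .flatMap(name => Option(System.getenv(name)))
    .headOption
  fromEnv
    .orElse(conf.get("spark.local.dir"))             // spark.local.dir next
    .getOrElse(System.getProperty("java.io.tmpdir")) // finally java.io.tmpdir
    .split(",")                                      // entries are comma-separated
}
```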
getConfiguredLocalDirs is used when:

- DiskBlockManager is requested to createLocalDirs
- Utils utility is used to get a local directory and getOrCreateLocalRootDirsImpl
Local URI Scheme¶
Utils defines a local URI scheme for files that are locally available on worker nodes in the cluster.

The local URI scheme is used when:

- Utils is used to isLocalUri
- Client (Spark on YARN) is used
isLocalUri¶
isLocalUri(
uri: String): Boolean
isLocalUri is true when the given uri is a local: URI (i.e., it starts with the local: scheme).
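A minimal sketch of the check, assuming a plain prefix comparison, which is what the description above implies:

```scala
// Sketch of the assumed prefix check.
def isLocalUri(uri: String): Boolean = uri.startsWith("local:")

isLocalUri("local:/opt/libs/extra.jar")  // true
isLocalUri("hdfs:///libs/extra.jar")     // false
```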
isLocalUri is used when:
- FIXME
getCurrentUserName¶
getCurrentUserName(): String
getCurrentUserName computes the name of the user who has started the SparkContext instance.

NOTE: It is later available as SparkContext.sparkUser.

Internally, getCurrentUserName reads the SPARK_USER environment variable and, if not set, falls back on Hadoop Security API's UserGroupInformation.getCurrentUser().getShortUserName().
NOTE: It is another place where Spark relies on Hadoop API for its operation.
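A sketch of the lookup described above (the Hadoop UserGroupInformation API is real; the helper name is made up):

```scala
import org.apache.hadoop.security.UserGroupInformation

// SPARK_USER wins; otherwise fall back on Hadoop's notion of the current user.
def currentUserName(): String =
  Option(System.getenv("SPARK_USER"))
    .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName)
```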
localHostName¶
localHostName(): String
localHostName computes the local host name.

localHostName first checks the SPARK_LOCAL_HOSTNAME environment variable for the value. If it is not defined, localHostName uses SPARK_LOCAL_IP to find the name (using InetAddress.getByName). If that is not defined either, it calls InetAddress.getLocalHost for the name.

NOTE: Utils.localHostName is executed while SparkContext is created and also to compute the default value of the spark.driver.host configuration property.
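The resolution order can be sketched as follows (a simplified illustration; Spark's actual implementation may differ in details such as caching the result):

```scala
import java.net.InetAddress

// Sketch of the lookup order: SPARK_LOCAL_HOSTNAME, then SPARK_LOCAL_IP
// (resolved via InetAddress.getByName), then InetAddress.getLocalHost.
def localHostName(): String =
  Option(System.getenv("SPARK_LOCAL_HOSTNAME"))
    .orElse(
      Option(System.getenv("SPARK_LOCAL_IP"))
        .map(ip => InetAddress.getByName(ip).getHostName))
    .getOrElse(InetAddress.getLocalHost.getHostName)
```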
getUserJars¶
getUserJars(
conf: SparkConf): Seq[String]
getUserJars returns the non-empty entries of the spark.jars configuration property.
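A sketch of the filtering, assuming spark.jars holds a comma-separated list (as is common for Spark properties):

```scala
import org.apache.spark.SparkConf

// Split spark.jars on commas and keep only the non-empty entries.
def userJars(conf: SparkConf): Seq[String] =
  conf.get("spark.jars", "").split(",").filter(_.nonEmpty).toSeq
```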
getUserJars is used when:

- SparkContext is created
extractHostPortFromSparkUrl¶
extractHostPortFromSparkUrl(
sparkUrl: String): (String, Int)
extractHostPortFromSparkUrl creates a Java URI from the input sparkUrl and takes the host and port parts.

extractHostPortFromSparkUrl asserts that the input sparkUrl uses the spark scheme.

extractHostPortFromSparkUrl throws a SparkException for unparseable Spark URLs:

Invalid master URL: [sparkUrl]
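A simplified sketch of the parsing (the actual method performs additional validation, e.g. of the URI's path and query parts; the helper name is made up):

```scala
import java.net.URI
import org.apache.spark.SparkException

// Parse with java.net.URI, require the spark scheme, and take host and port.
def extractHostPort(sparkUrl: String): (String, Int) = {
  val uri = new URI(sparkUrl)
  if (uri.getScheme != "spark" || uri.getHost == null || uri.getPort < 0) {
    throw new SparkException(s"Invalid master URL: $sparkUrl")
  }
  (uri.getHost, uri.getPort)
}

extractHostPort("spark://localhost:7077")  // ("localhost", 7077)
```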
extractHostPortFromSparkUrl is used when:

- StandaloneSubmitRequestServlet is requested to buildDriverDescription
- RpcAddress is requested to extract an RpcAddress from a Spark master URL
isDynamicAllocationEnabled¶
isDynamicAllocationEnabled(
conf: SparkConf): Boolean
isDynamicAllocationEnabled is true when both of the following hold (see the sketch after this list):

- spark.dynamicAllocation.enabled configuration property is true
- spark.master is non-local
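A sketch of the two conditions (Spark's actual implementation also consults an internal testing flag, omitted here):

```scala
import org.apache.spark.SparkConf

// Dynamic allocation must be enabled and the master must not be local.
def isDynamicAllocationEnabled(conf: SparkConf): Boolean =
  conf.getBoolean("spark.dynamicAllocation.enabled", false) &&
    !conf.get("spark.master", "").startsWith("local")
```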
isDynamicAllocationEnabled is used when:

- SparkContext is created (to start an ExecutorAllocationManager)
- DAGScheduler is requested to checkBarrierStageWithDynamicAllocation
- SchedulerBackendUtils is requested to getInitialTargetExecutorNumber
- StandaloneSchedulerBackend (Spark Standalone) is requested to start
- ExecutorPodsAllocator (Spark on Kubernetes) is requested to onNewSnapshots
- ApplicationMaster (Spark on YARN) is created
checkAndGetK8sMasterUrl¶
checkAndGetK8sMasterUrl(
rawMasterURL: String): String
checkAndGetK8sMasterUrl...FIXME
checkAndGetK8sMasterUrl is used when:

- SparkSubmit is requested to prepareSubmitEnvironment (for the Kubernetes cluster manager)
getLocalDir¶
getLocalDir(
conf: SparkConf): String
getLocalDir...FIXME
getLocalDir is used when:

- Utils is requested to <>
- SparkEnv is created (on the driver)
- spark-shell is launched
- Spark on YARN's Client is requested to prepareLocalResources and create the spark_conf.zip archive with configuration files and Spark configuration
- PySpark's PythonBroadcast is requested to readObject
- PySpark's EvalPythonExec is requested to doExecute
Fetching File¶
fetchFile(
url: String,
targetDir: File,
conf: SparkConf,
securityMgr: SecurityManager,
hadoopConf: Configuration,
timestamp: Long,
useCache: Boolean): File
fetchFile...FIXME
fetchFile is used when:

- SparkContext is requested to addFile
- Executor is requested to updateDependencies
- Spark Standalone's DriverRunner is requested to downloadUserJar
getOrCreateLocalRootDirs¶
getOrCreateLocalRootDirs(
conf: SparkConf): Array[String]
getOrCreateLocalRootDirs...FIXME
getOrCreateLocalRootDirs is used when:

- Utils is requested to <>
- Worker is requested to handle a LaunchExecutor message
getOrCreateLocalRootDirsImpl¶
getOrCreateLocalRootDirsImpl(
conf: SparkConf): Array[String]
getOrCreateLocalRootDirsImpl...FIXME
getOrCreateLocalRootDirsImpl is used when Utils is requested to getOrCreateLocalRootDirs.