Utils Utility¶
Local Directories for Storing Files¶
```scala
getConfiguredLocalDirs(
  conf: SparkConf): Array[String]
```
getConfiguredLocalDirs returns the local directories where Spark can write files.

Internally, getConfiguredLocalDirs uses the given SparkConf to determine whether the External Shuffle Service is enabled (based on the spark.shuffle.service.enabled configuration property).
getConfiguredLocalDirs checks if Spark runs on YARN and if so, returns LOCAL_DIRS-controlled local directories.
In non-YARN mode (or for the driver in yarn-client mode), getConfiguredLocalDirs checks the following environment variables (in this order) and returns the value of the first one that is set:

- SPARK_EXECUTOR_DIRS environment variable
- SPARK_LOCAL_DIRS environment variable
- MESOS_DIRECTORY environment variable (only when External Shuffle Service is not used)
In the end, when none of the above environment variables is defined, getConfiguredLocalDirs uses the following properties (in this order):

- spark.local.dir configuration property
- java.io.tmpdir System property
getConfiguredLocalDirs is used when:
- DiskBlockManager is requested to createLocalDirs
- Utils utility is used to get a local directory and getOrCreateLocalRootDirsImpl
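The lookup order described above can be sketched in plain Scala. This is a hedged approximation: the `Map` parameters and the `LocalDirsSketch` name are illustrative stand-ins for the actual environment and SparkConf lookups, and the YARN branch is omitted.

```scala
// Sketch of getConfiguredLocalDirs' fallback chain (non-YARN mode).
// env and conf stand in for environment variables and SparkConf.
object LocalDirsSketch {
  def configuredLocalDirs(
      env: Map[String, String],
      conf: Map[String, String],
      shuffleServiceEnabled: Boolean): Array[String] = {
    // 1. SPARK_EXECUTOR_DIRS, then SPARK_LOCAL_DIRS
    val fromEnv = Seq("SPARK_EXECUTOR_DIRS", "SPARK_LOCAL_DIRS")
      .flatMap(env.get)
      .headOption
      // 2. MESOS_DIRECTORY, only when External Shuffle Service is off
      .orElse(if (!shuffleServiceEnabled) env.get("MESOS_DIRECTORY") else None)
    fromEnv
      // 3. spark.local.dir configuration property
      .orElse(conf.get("spark.local.dir"))
      // 4. java.io.tmpdir System property
      .getOrElse(System.getProperty("java.io.tmpdir"))
      .split(",")
      .map(_.trim)
  }
}
```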
Local URI Scheme¶
Utils defines a local URI scheme for files that are locally available on worker nodes in the cluster.
The local URI scheme is used when:

- Utils is used to isLocalUri
- Client (Spark on YARN) is used
isLocalUri¶
```scala
isLocalUri(
  uri: String): Boolean
```
isLocalUri is true when the given uri is a local: URI (i.e. it starts with the local: scheme).
isLocalUri is used when:
- FIXME
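The check itself amounts to a scheme-prefix test; a minimal sketch (the object name is illustrative):

```scala
// Sketch of the local: scheme check described above.
object LocalUriSketch {
  val LocalScheme = "local"
  def isLocalUri(uri: String): Boolean = uri.startsWith(s"$LocalScheme:")
}
```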
getCurrentUserName¶
```scala
getCurrentUserName(): String
```
getCurrentUserName computes the name of the user that started the SparkContext.md[SparkContext] instance.
NOTE: It is later available as SparkContext.md#sparkUser[SparkContext.sparkUser].
Internally, it reads the SparkContext.md#SPARK_USER[SPARK_USER] environment variable and, if not set, falls back to Hadoop Security API's UserGroupInformation.getCurrentUser().getShortUserName().
NOTE: It is another place where Spark relies on Hadoop API for its operation.
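A dependency-free sketch of the fallback. Note that System.getProperty("user.name") stands in here for Hadoop's UserGroupInformation.getCurrentUser().getShortUserName(), which would require the Hadoop libraries; the env parameter stands in for reading the real environment.

```scala
// Sketch: prefer SPARK_USER, otherwise fall back to the JVM user name
// (a stand-in for Hadoop's UserGroupInformation in the real code).
object UserNameSketch {
  def currentUserName(env: Map[String, String] = sys.env): String =
    env.getOrElse("SPARK_USER", System.getProperty("user.name"))
}
```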
localHostName¶
```scala
localHostName(): String
```
localHostName computes the local host name.
It starts by checking the SPARK_LOCAL_HOSTNAME environment variable for the value. If it is not defined, it uses SPARK_LOCAL_IP to find the name (using InetAddress.getByName). If that is not defined either, it calls InetAddress.getLocalHost for the name.
NOTE: Utils.localHostName is executed while SparkContext.md#creating-instance[SparkContext is created] and also to compute the default value of spark-driver.md#spark_driver_host[spark.driver.host Spark property].
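The resolution order above can be sketched as follows (the env parameter is an illustrative stand-in for reading real environment variables):

```scala
import java.net.InetAddress

// Sketch of localHostName's resolution order:
// SPARK_LOCAL_HOSTNAME, then SPARK_LOCAL_IP, then InetAddress.getLocalHost.
object HostNameSketch {
  def localHostName(env: Map[String, String] = sys.env): String =
    env.get("SPARK_LOCAL_HOSTNAME")
      .orElse(env.get("SPARK_LOCAL_IP").map(ip => InetAddress.getByName(ip).getHostName))
      .getOrElse(InetAddress.getLocalHost.getHostName)
}
```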
getUserJars¶
```scala
getUserJars(
  conf: SparkConf): Seq[String]
```
getUserJars returns the non-empty entries of the spark.jars configuration property.
getUserJars is used when:
- SparkContext is created
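A sketch of the filtering, with an Option[String] standing in for the SparkConf lookup of spark.jars:

```scala
// Sketch: split the comma-separated spark.jars value and drop empty entries.
object UserJarsSketch {
  def userJars(sparkJars: Option[String]): Seq[String] =
    sparkJars.map(_.split(",").toSeq).getOrElse(Seq.empty).filter(_.nonEmpty)
}
```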
extractHostPortFromSparkUrl¶
```scala
extractHostPortFromSparkUrl(
  sparkUrl: String): (String, Int)
```
extractHostPortFromSparkUrl creates a Java URI with the input sparkUrl and takes the host and port parts.
extractHostPortFromSparkUrl asserts that the input sparkUrl uses the spark scheme.
extractHostPortFromSparkUrl throws a SparkException for unparseable spark URLs:
```text
Invalid master URL: [sparkUrl]
```
extractHostPortFromSparkUrl is used when:
- StandaloneSubmitRequestServlet is requested to buildDriverDescription
- RpcAddress is requested to extract an RpcAddress from a Spark master URL
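A sketch of the parsing and validation, with IllegalArgumentException standing in for SparkException to stay dependency-free (the real method also rejects URLs with extra path, query, or fragment parts):

```scala
import java.net.URI

// Sketch: parse a spark:// master URL into (host, port).
object SparkUrlSketch {
  def extractHostPort(sparkUrl: String): (String, Int) = {
    val uri = new URI(sparkUrl)
    val host = uri.getHost
    val port = uri.getPort
    // Require the spark scheme and a parseable host and port.
    if (uri.getScheme != "spark" || host == null || port < 0)
      throw new IllegalArgumentException(s"Invalid master URL: $sparkUrl")
    (host, port)
  }
}
```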
isDynamicAllocationEnabled¶
```scala
isDynamicAllocationEnabled(
  conf: SparkConf): Boolean
```
isDynamicAllocationEnabled is true when the following all hold:

- spark.dynamicAllocation.enabled configuration property is true
- spark.master is non-local
isDynamicAllocationEnabled is used when:
- SparkContext is created (to start an ExecutorAllocationManager)
- DAGScheduler is requested to checkBarrierStageWithDynamicAllocation
- SchedulerBackendUtils is requested to getInitialTargetExecutorNumber
- StandaloneSchedulerBackend (Spark Standalone) is requested to start
- ExecutorPodsAllocator (Spark on Kubernetes) is requested to onNewSnapshots
- ApplicationMaster (Spark on YARN) is created
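The two conditions can be sketched with plain parameters standing in for the SparkConf lookups:

```scala
// Sketch: dynamic allocation requires the feature flag and a non-local master.
object DynAllocSketch {
  def isDynamicAllocationEnabled(
      dynamicAllocationEnabled: Boolean,
      master: String): Boolean =
    dynamicAllocationEnabled && !master.startsWith("local")
}
```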
checkAndGetK8sMasterUrl¶
```scala
checkAndGetK8sMasterUrl(
  rawMasterURL: String): String
```
checkAndGetK8sMasterUrl...FIXME
checkAndGetK8sMasterUrl is used when:
SparkSubmitis requested to prepareSubmitEnvironment (for Kubernetes cluster manager)
getLocalDir¶
```scala
getLocalDir(
  conf: SparkConf): String
```
getLocalDir...FIXME
getLocalDir is used when:
- Utils is requested to <>
- SparkEnv is core:SparkEnv.md#create[created] (on the driver)
- spark-shell.md[spark-shell] is launched
- Spark on YARN's Client is requested to spark-yarn-client.md#prepareLocalResources[prepareLocalResources] and spark-yarn-client.md#createConfArchive[create ++spark_conf.zip++ archive with configuration files and Spark configuration]
- PySpark's PythonBroadcast is requested to readObject
- PySpark's EvalPythonExec is requested to doExecute
Fetching File¶
```scala
fetchFile(
  url: String,
  targetDir: File,
  conf: SparkConf,
  securityMgr: SecurityManager,
  hadoopConf: Configuration,
  timestamp: Long,
  useCache: Boolean): File
```
fetchFile...FIXME
fetchFile is used when:
- SparkContext is requested to SparkContext.md#addFile[addFile]
- Executor is requested to executor:Executor.md#updateDependencies[updateDependencies]
- Spark Standalone's DriverRunner is requested to downloadUserJar
getOrCreateLocalRootDirs¶
```scala
getOrCreateLocalRootDirs(
  conf: SparkConf): Array[String]
```
getOrCreateLocalRootDirs...FIXME
getOrCreateLocalRootDirs is used when:
- Utils is requested to <>
- Worker is requested to spark-standalone-worker.md#receive[handle a LaunchExecutor message]
getOrCreateLocalRootDirsImpl¶
```scala
getOrCreateLocalRootDirsImpl(
  conf: SparkConf): Array[String]
```
getOrCreateLocalRootDirsImpl...FIXME
getOrCreateLocalRootDirsImpl is used when Utils is requested to getOrCreateLocalRootDirs