SparkSubmit¶
SparkSubmit is the entry point of the spark-submit shell script.
Special Primary Resource Names¶
SparkSubmit uses the following special primary resource names to represent Spark shells rather than application jars:

- spark-shell
- pyspark-shell
- sparkr-shell
pyspark-shell¶
SparkSubmit uses pyspark-shell when:

- SparkSubmit is requested to prepareSubmitEnvironment (for .py scripts or pyspark), isShell and isPython
isShell¶
isShell(
res: String): Boolean
isShell is true when the given res primary resource represents a Spark shell.
isShell is used when:

- SparkSubmit is requested to prepareSubmitEnvironment and isUserJar
- SparkSubmitArguments is requested to handleUnknown (and determine a primary application resource)
Actions¶
SparkSubmit executes actions (based on the action argument).
Killing Submission¶
kill(
args: SparkSubmitArguments): Unit
kill ...FIXME
Displaying Version¶
printVersion(): Unit
printVersion ...FIXME
Submission Status¶
requestStatus(
args: SparkSubmitArguments): Unit
requestStatus ...FIXME
Submission¶
submit(
args: SparkSubmitArguments,
uninitLog: Boolean): Unit
submit ...FIXME
Running Main Class¶
runMain(
args: SparkSubmitArguments,
uninitLog: Boolean): Unit
runMain prepares the submit environment (prepareSubmitEnvironment) with the given SparkSubmitArguments, which gives a 4-element tuple of childArgs, childClasspath, sparkConf and childMainClass.
With verbose enabled, runMain prints out the following INFO messages to the logs:
Main class:
[childMainClass]
Arguments:
[childArgs]
Spark config:
[sparkConf_redacted]
Classpath elements:
[childClasspath]
runMain creates and sets a context classloader (based on the spark.driver.userClassPathFirst configuration property) and adds the jars (from childClasspath).
runMain loads the main class (childMainClass).
runMain creates a SparkApplication (if the main class is a subtype of SparkApplication) or a JavaMainApplication (with the main class).
In the end, runMain requests the SparkApplication to start (with the childArgs and sparkConf).
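The final dispatch can be modeled with simplified stand-in types (SparkApplication and JavaMainApplication are real Spark types, but the definitions below are illustrative only, with the configuration reduced to a plain Map):

```scala
// Simplified stand-in for Spark's SparkApplication interface.
trait SparkApplication {
  def start(args: Array[String], conf: Map[String, String]): Unit
}

// Wraps a plain main class behind the SparkApplication interface
// (a simplified take on Spark's JavaMainApplication).
class JavaMainApplication(klass: Class[_]) extends SparkApplication {
  override def start(args: Array[String], conf: Map[String, String]): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, args)
  }
}

// Mirrors runMain's choice: instantiate the main class directly when it
// already is a SparkApplication, otherwise wrap it.
def makeApplication(klass: Class[_]): SparkApplication =
  if (classOf[SparkApplication].isAssignableFrom(klass))
    klass.getDeclaredConstructor().newInstance().asInstanceOf[SparkApplication]
  else
    new JavaMainApplication(klass)
```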
Cluster Managers¶
SparkSubmit has built-in support for some cluster managers (that are selected based on the master argument).
| Nickname | Master URL |
|---|---|
| KUBERNETES | k8s:// prefix |
| LOCAL | local prefix |
| MESOS | mesos prefix |
| STANDALONE | spark prefix |
| YARN | yarn |
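The mapping in the table can be sketched with simple prefix matching (a simplified stand-in, not Spark's actual source):

```scala
// Maps a master argument to a cluster manager nickname, following the
// table above; the error branch is illustrative.
def clusterManager(master: String): String = master match {
  case m if m.startsWith("k8s://") => "KUBERNETES"
  case m if m.startsWith("local")  => "LOCAL"
  case m if m.startsWith("mesos")  => "MESOS"
  case m if m.startsWith("spark")  => "STANDALONE"
  case "yarn"                      => "YARN"
  case other => sys.error(s"Unknown master URL: $other")
}
```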
Launching Standalone Application¶
main(
args: Array[String]): Unit
main ...FIXME
doSubmit¶
doSubmit(
args: Array[String]): Unit
doSubmit ...FIXME

doSubmit is used when:

- InProcessSparkSubmit standalone application is started
- SparkSubmit standalone application is started
prepareSubmitEnvironment¶
prepareSubmitEnvironment(
args: SparkSubmitArguments,
conf: Option[HadoopConfiguration] = None): (Seq[String], Seq[String], SparkConf, String)
prepareSubmitEnvironment creates a 4-element tuple made up of the following:

- childArgs for arguments
- childClasspath for Classpath elements
- sysProps for Spark properties
- childMainClass
Tip
Use the --verbose command-line option to have the elements of the tuple printed out to the standard output.
prepareSubmitEnvironment ...FIXME
For isPython in CLIENT deploy mode, prepareSubmitEnvironment sets the following based on the primaryResource:

- For pyspark-shell, the mainClass is org.apache.spark.api.python.PythonGatewayServer
- Otherwise, the mainClass is org.apache.spark.deploy.PythonRunner with the main python file, extra python files and the childArgs
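The client-mode dispatch above can be sketched as follows (the class names come from the text; the selection logic is a simplification):

```scala
// Simplified sketch of the client-mode PySpark mainClass selection.
def pythonMainClass(primaryResource: String): String =
  if (primaryResource == "pyspark-shell")
    "org.apache.spark.api.python.PythonGatewayServer" // the PySpark shell
  else
    "org.apache.spark.deploy.PythonRunner"            // a .py application
```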
prepareSubmitEnvironment ...FIXME
prepareSubmitEnvironment determines the cluster manager based on the master argument.
For KUBERNETES, prepareSubmitEnvironment checkAndGetK8sMasterUrl.
prepareSubmitEnvironment ...FIXME

prepareSubmitEnvironment is used when...FIXME
childMainClass¶
childMainClass is the fourth (and last) element in the result tuple of prepareSubmitEnvironment.
// (childArgs, childClasspath, sparkConf, childMainClass)
(Seq[String], Seq[String], SparkConf, String)
childMainClass can be as follows:
| Deploy Mode | Master URL | childMainClass |
|---|---|---|
| client | any | mainClass |
| cluster | KUBERNETES | KubernetesClientApplication |
| cluster | MESOS | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | ClientApp |
| cluster | YARN | YarnClusterApplication |
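The table can be read as a dispatch function (an illustrative sketch only; the fully-qualified class names are assumptions based on the short names above):

```scala
// Illustrative childMainClass selection following the table above.
def childMainClass(
    deployMode: String,
    clusterManager: String,
    mainClass: String,
    useRest: Boolean = true): String =
  (deployMode, clusterManager) match {
    case ("client", _) => mainClass
    case ("cluster", "KUBERNETES") =>
      "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
    // REST submission API for standalone and Mesos cluster deployments
    case ("cluster", "MESOS") | ("cluster", "STANDALONE") if useRest =>
      "org.apache.spark.deploy.rest.RestSubmissionClientApp"
    case ("cluster", "STANDALONE") =>
      "org.apache.spark.deploy.ClientApp"
    case ("cluster", "YARN") =>
      "org.apache.spark.deploy.yarn.YarnClusterApplication"
    case other => sys.error(s"Unsupported combination: $other")
  }
```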
isKubernetesClient¶
prepareSubmitEnvironment uses the isKubernetesClient flag to indicate that the cluster manager is Kubernetes in client deploy mode.
isKubernetesClusterModeDriver¶
prepareSubmitEnvironment uses the isKubernetesClusterModeDriver flag to indicate that:

- isKubernetesClient
- spark.kubernetes.submitInDriver configuration property is enabled (Spark on Kubernetes)
renameResourcesToLocalFS¶
renameResourcesToLocalFS(
resources: String,
localResources: String): String
renameResourcesToLocalFS ...FIXME

renameResourcesToLocalFS is used for isKubernetesClusterModeDriver mode.
downloadResource¶
downloadResource(
resource: String): String
downloadResource ...FIXME
Checking Whether Resource is Internal¶
isInternal(
res: String): Boolean
isInternal is true when the given res is spark-internal.
isInternal is used when:

- SparkSubmit is requested to isUserJar
- SparkSubmitArguments is requested to handleUnknown
isUserJar¶
isUserJar(
res: String): Boolean
isUserJar is true when the given res is none of the following:

- isShell
- isPython
- isInternal
- isR
isUserJar is used when:
- FIXME
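Taken together, the resource predicates of this page can be sketched as follows (simplified stand-ins, not Spark's actual source; the real methods live on the SparkSubmit object, and the isR check is an assumption mirroring isPython for R):

```scala
// Illustrative re-implementations of SparkSubmit's resource predicates.
def isShell(res: String): Boolean =
  Set("spark-shell", "pyspark-shell", "sparkr-shell").contains(res)

def isPython(res: String): Boolean =
  res.endsWith(".py") || res == "pyspark-shell"

// Assumption: isR mirrors isPython for R scripts and the SparkR shell.
def isR(res: String): Boolean =
  res.endsWith(".R") || res == "sparkr-shell"

def isInternal(res: String): Boolean =
  res == "spark-internal"

// A user jar is whatever is not a shell, a Python/R application,
// or the internal marker.
def isUserJar(res: String): Boolean =
  !isShell(res) && !isPython(res) && !isInternal(res) && !isR(res)
```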
isPython Utility¶
isPython(
res: String): Boolean
isPython is true when the given res primary resource represents a PySpark application:

- a .py script
- pyspark-shell
isPython is used when:

- SparkSubmit is requested to isUserJar
- SparkSubmitArguments is requested to handleUnknown (and set the isPython internal flag)