# SparkSubmitCommandBuilder

`SparkSubmitCommandBuilder` is an `AbstractCommandBuilder`.

`SparkSubmitCommandBuilder` is used to build a command that `spark-submit` and `SparkLauncher` use to launch a Spark application.

`SparkSubmitCommandBuilder` uses the first argument to distinguish the shells:

* `pyspark-shell-main`
* `sparkr-shell-main`
* `run-example`
!!! note "FIXME"
    Describe `run-example`
`SparkSubmitCommandBuilder` parses command-line arguments using `OptionParser` (a [SparkSubmitOptionParser](spark-submit-SparkSubmitOptionParser.md)). `OptionParser` comes with the following methods:

* `handle` to handle the known options (see the table below). It sets up the `master`, `deployMode`, `propertiesFile`, `conf`, `mainClass` and `sparkArgs` internal properties.
* `handleUnknown` to handle unrecognized options that usually lead to an `Unrecognized option` error message.
* `handleExtraArgs` to handle extra arguments that are considered a Spark application's arguments.
!!! note
    For `spark-shell` it assumes that the application arguments are after `spark-submit`'s arguments.
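To see how the three callbacks cooperate, here is a minimal, hypothetical parse loop in the spirit of `SparkSubmitOptionParser.parse` (simplified: it assumes every `--option` takes a value; not Spark's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A minimal, simplified sketch of how a SparkSubmitOptionParser-style
// parse loop drives handle, handleUnknown and handleExtraArgs.
public class ParseLoopSketch {
  final List<String> sparkArgs = new ArrayList<>();
  final List<String> appArgs = new ArrayList<>();
  String appResource;

  // Known options (e.g. --master) end up here with their value.
  boolean handle(String opt, String value) {
    sparkArgs.add(opt);
    sparkArgs.add(value);
    return true; // keep parsing
  }

  // The first unrecognized, non-option argument is the app resource;
  // returning false stops option parsing.
  boolean handleUnknown(String opt) {
    appResource = opt;
    return false;
  }

  // Everything after the app resource is the application's arguments.
  void handleExtraArgs(List<String> extra) {
    appArgs.addAll(extra);
  }

  void parse(List<String> args) {
    int idx = 0;
    for (; idx < args.size(); idx++) {
      String arg = args.get(idx);
      boolean more;
      if (arg.startsWith("--") && idx + 1 < args.size()) {
        // Simplification: every --option is assumed to take a value.
        more = handle(arg, args.get(++idx));
      } else {
        more = handleUnknown(arg);
      }
      if (!more) {
        idx++; // skip past the argument that stopped option parsing
        break;
      }
    }
    handleExtraArgs(args.subList(Math.min(idx, args.size()), args.size()));
  }

  public static void main(String[] args) {
    ParseLoopSketch parser = new ParseLoopSketch();
    parser.parse(Arrays.asList("--master", "local[*]", "app.jar", "arg1", "arg2"));
    System.out.println(parser.sparkArgs);   // [--master, local[*]]
    System.out.println(parser.appResource); // app.jar
    System.out.println(parser.appArgs);     // [arg1, arg2]
  }
}
```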
## pyspark-shell-main App Resource

`SparkSubmitCommandBuilder` uses `pyspark-shell-main` as the name of the app resource to identify the PySpark shell.

`pyspark-shell-main` is used when:

* `SparkSubmitCommandBuilder` is created and requested to [buildCommand](#buildcommand)
## buildCommand

```java
List<String> buildCommand(
  Map<String, String> env)
```

`buildCommand` is part of the `AbstractCommandBuilder` abstraction.

`buildCommand` branches off based on the `appResource` (as sketched below):

* [buildPySparkShellCommand](#buildpysparkshellcommand) for `PYSPARK_SHELL`
* `buildSparkRCommand` for SparkR
* [buildSparkSubmitCommand](#buildsparksubmitcommand) for anything else
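A minimal sketch of the dispatch (a fragment, not the exact implementation; `PYSPARK_SHELL` and `SPARKR_SHELL` are the builder's constants for `pyspark-shell-main` and `sparkr-shell-main`):

```java
// A simplified sketch of buildCommand's dispatch on the app resource.
List<String> buildCommand(Map<String, String> env) {
  if (PYSPARK_SHELL.equals(appResource)) {       // "pyspark-shell-main"
    return buildPySparkShellCommand(env);
  } else if (SPARKR_SHELL.equals(appResource)) { // "sparkr-shell-main"
    return buildSparkRCommand(env);
  } else {
    return buildSparkSubmitCommand(env);
  }
}
```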
## buildPySparkShellCommand

```java
List<String> buildPySparkShellCommand(
  Map<String, String> env)
```

`buildPySparkShellCommand`...FIXME
## buildSparkSubmitCommand

```java
List<String> buildSparkSubmitCommand(
  Map<String, String> env)
```

`buildSparkSubmitCommand` starts by building the so-called [effective config](#geteffectiveconfig-internal-method). When in client mode, `buildSparkSubmitCommand` adds `spark.driver.extraClassPath` to the resulting Spark command.

`buildSparkSubmitCommand` builds the first part of the Java command passing in the extra classpath (only for `client` deploy mode).
!!! note "FIXME"
    Add the `isThriftServer` case.
`buildSparkSubmitCommand` appends the `SPARK_SUBMIT_OPTS` and `SPARK_JAVA_OPTS` environment variables.

(only for `client` deploy mode) ...

!!! note "FIXME"
    Elaborate on the `client` deploy mode case.

!!! note "FIXME"
    Elaborate on the `addPermGenSizeOpt` case.
`buildSparkSubmitCommand` appends `org.apache.spark.deploy.SparkSubmit` and the command-line arguments (using [buildSparkSubmitArgs](#buildsparksubmitargs)).
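Put together, the assembly looks roughly like the following (a condensed, hypothetical and runnable sketch with hardcoded configuration; not Spark's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A condensed sketch of the command assembly buildSparkSubmitCommand performs.
public class SparkSubmitCommandSketch {
  public static void main(String[] args) {
    // Stand-in for the effective config (see getEffectiveConfig below).
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.submit.deployMode", "client");
    conf.put("spark.driver.memory", "2g");
    conf.put("spark.driver.extraClassPath", "/extra/jars/*");

    boolean clientMode = "client".equals(conf.get("spark.submit.deployMode"));

    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    if (clientMode) {
      // In client mode the driver runs in the launched JVM, so the driver's
      // extra classpath and memory settings apply to the command itself.
      cmd.add("-cp");
      cmd.add(conf.get("spark.driver.extraClassPath"));
      cmd.add("-Xmx" + conf.get("spark.driver.memory"));
    }
    // SPARK_SUBMIT_OPTS (and SPARK_JAVA_OPTS) contribute extra JVM options.
    String submitOpts = System.getenv("SPARK_SUBMIT_OPTS");
    if (submitOpts != null && !submitOpts.isEmpty()) {
      cmd.addAll(Arrays.asList(submitOpts.split("\\s+")));
    }
    cmd.add("org.apache.spark.deploy.SparkSubmit");
    // buildSparkSubmitArgs would contribute the spark-submit arguments here.
    cmd.addAll(Arrays.asList("--master", "local[*]", "app.jar"));

    System.out.println(String.join(" ", cmd));
  }
}
```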
## buildSparkSubmitArgs

```java
List<String> buildSparkSubmitArgs()
```

`buildSparkSubmitArgs` builds a list of command-line arguments for `spark-submit`.

`buildSparkSubmitArgs` uses a [SparkSubmitOptionParser](spark-submit-SparkSubmitOptionParser.md) to add the command-line arguments that `spark-submit` recognizes (when it is executed later on and uses the very same `SparkSubmitOptionParser` parser to parse command-line arguments). A sketch of the rendering follows.
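A minimal, hypothetical sketch of that rendering (hardcoded builder state; `MASTER` and friends stand in for the `SparkSubmitOptionParser` attributes; not Spark's actual code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each builder property is rendered as the spark-submit option that
// SparkSubmitOptionParser defines, so the same parser can read it back later.
public class BuildArgsSketch {
  // Stand-ins for the SparkSubmitOptionParser attributes.
  static final String MASTER = "--master";
  static final String DEPLOY_MODE = "--deploy-mode";
  static final String CONF = "--conf";
  static final String CLASS = "--class";

  public static void main(String[] args) {
    // Hypothetical builder state, as set by OptionParser's handle callback.
    String master = "local[*]";
    String deployMode = "client";
    String mainClass = "com.example.MyApp";
    String appResource = "my-app.jar";
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.executor.memory", "4g");
    List<String> appArgs = List.of("arg1", "arg2");

    List<String> submitArgs = new ArrayList<>();
    if (master != null) { submitArgs.add(MASTER); submitArgs.add(master); }
    if (deployMode != null) { submitArgs.add(DEPLOY_MODE); submitArgs.add(deployMode); }
    for (Map.Entry<String, String> e : conf.entrySet()) {
      submitArgs.add(CONF);
      submitArgs.add(e.getKey() + "=" + e.getValue());
    }
    if (mainClass != null) { submitArgs.add(CLASS); submitArgs.add(mainClass); }
    submitArgs.add(appResource); // the app resource precedes the app arguments
    submitArgs.addAll(appArgs);

    System.out.println(submitArgs);
  }
}
```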
`buildSparkSubmitArgs` is used when:

* `InProcessLauncher` is requested to `startApplication`
* `SparkLauncher` is requested to `createBuilder`
* `SparkSubmitCommandBuilder` is requested to [buildSparkSubmitCommand](#buildsparksubmitcommand) and `constructEnvVarArgs`
## SparkSubmitCommandBuilder Properties and SparkSubmitOptionParser Attributes

| SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute |
|---|---|
| `verbose` | `VERBOSE` |
| `master` | `MASTER [master]` |
| `deployMode` | `DEPLOY_MODE [deployMode]` |
| `appName` | `NAME [appName]` |
| `conf` | `CONF [key=value]*` |
| `propertiesFile` | `PROPERTIES_FILE [propertiesFile]` |
| `jars` | `JARS [comma-separated jars]` |
| `files` | `FILES [comma-separated files]` |
| `pyFiles` | `PY_FILES [comma-separated pyFiles]` |
| `mainClass` | `CLASS [mainClass]` |
| `sparkArgs` | `sparkArgs` (passed straight through) |
| `appResource` | `appResource` (passed straight through) |
| `appArgs` | `appArgs` (passed straight through) |
## getEffectiveConfig Internal Method

```java
Map<String, String> getEffectiveConfig()
```

`getEffectiveConfig` builds `effectiveConfig` that is `conf` with the Spark properties file loaded (using the [loadPropertiesFile](spark-AbstractCommandBuilder.md#loadPropertiesFile) internal method), skipping keys that have already been loaded (that happened when the command-line options were parsed in the [handle](#optionparsers-handle-callback) callback). `loadPropertiesFile` reads the file with the UTF-8 charset and trims white spaces around values.

!!! note
    Command-line options (e.g. `--driver-class-path`) have higher precedence than their corresponding Spark settings in a Spark properties file (e.g. `spark.driver.extraClassPath`). You can therefore control the final settings by overriding Spark settings on command line using the command-line options.
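A minimal sketch of that merge rule (the properties file path is hypothetical; not Spark's actual code):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Properties-file settings fill in only the keys that the command-line
// options have not set already.
public class EffectiveConfigSketch {
  public static void main(String[] args) throws IOException {
    // conf holds what the command-line options set, e.g. --driver-class-path.
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.driver.extraClassPath", "/from/command/line");

    // Load the Spark properties file with the UTF-8 charset.
    Properties fileProps = new Properties();
    try (Reader in = new InputStreamReader(
        new FileInputStream("conf/spark-defaults.conf"), StandardCharsets.UTF_8)) {
      fileProps.load(in);
    }

    // Keys already set on the command line win over the properties file;
    // values from the file are trimmed.
    Map<String, String> effective = new HashMap<>(conf);
    for (String key : fileProps.stringPropertyNames()) {
      effective.putIfAbsent(key, fileProps.getProperty(key).trim());
    }
    System.out.println(effective);
  }
}
```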
## isClientMode Internal Method

```java
private boolean isClientMode(
  Map<String, String> userProps)
```

`isClientMode` checks `master` first (from the command-line options) and then the `spark.master` Spark property. Same with `deployMode` and `spark.submit.deployMode`.
!!! note "FIXME"
    Review `master` and `deployMode`. How are they set?
`isClientMode` responds positive when there is no explicit master or the `client` deploy mode is set explicitly.
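A minimal sketch of those precedence rules (simplified; the real method has a few more cases):

```java
import java.util.Map;

// The command-line master/deployMode win over the spark.master and
// spark.submit.deployMode user properties; a missing deploy mode is
// treated as client mode.
public class ClientModeSketch {
  static boolean isClientMode(String master, String deployMode,
      Map<String, String> userProps) {
    String effectiveMaster =
        (master != null) ? master : userProps.get("spark.master");
    String effectiveDeployMode =
        (deployMode != null) ? deployMode : userProps.get("spark.submit.deployMode");
    return effectiveMaster == null
        || "client".equals(effectiveDeployMode)
        || effectiveDeployMode == null;
  }

  public static void main(String[] args) {
    System.out.println(isClientMode(null, null, Map.of()));        // true (no master)
    System.out.println(isClientMode("yarn", "cluster", Map.of())); // false
    System.out.println(isClientMode("yarn", null,
        Map.of("spark.submit.deployMode", "cluster")));            // false
  }
}
```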
## OptionParser

`OptionParser` is a custom [SparkSubmitOptionParser](spark-submit-SparkSubmitOptionParser.md) that `SparkSubmitCommandBuilder` uses to parse command-line arguments. It defines all the [SparkSubmitOptionParser callbacks](spark-submit-SparkSubmitOptionParser.md#callbacks), i.e. [handle](#optionparsers-handle-callback), [handleUnknown](#optionparsers-handleunknown-method) and [handleExtraArgs](#optionparsers-handleextraargs-method).
### OptionParser's handle Callback

```java
boolean handle(String opt, String value)
```

`OptionParser` comes with a custom `handle` callback (from the [SparkSubmitOptionParser callbacks](spark-submit-SparkSubmitOptionParser.md#callbacks)).
| Command-Line Option | Property / Behaviour |
|---|---|
| `--master` | `master` |
| `--deploy-mode` | `deployMode` |
| `--properties-file` | `propertiesFile` |
| `--driver-memory` | Sets `spark.driver.memory` (in `conf`) |
| `--driver-java-options` | Sets `spark.driver.extraJavaOptions` (in `conf`) |
| `--driver-library-path` | Sets `spark.driver.extraLibraryPath` (in `conf`) |
| `--driver-class-path` | Sets `spark.driver.extraClassPath` (in `conf`) |
| `--conf` | Expects a `key=value` pair that it puts in `conf` |
| `--class` | Sets `mainClass`. It may also set `allowsMixedArguments` and `appResource` if the execution is for one of the special classes, i.e. [spark-shell](spark-shell.md), `SparkSQLCLIDriver`, or [HiveThriftServer2](spark-sql-thrift-server.md) |
| `--kill` or `--status` | Disables `isAppResourceReq` and adds itself with the value to `sparkArgs` |
| `--help` or `--usage-error` | Disables `isAppResourceReq` and adds itself to `sparkArgs` |
| `--version` | Disables `isAppResourceReq` and adds itself to `sparkArgs` |
| anything else | Adds an element to `sparkArgs` |
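Condensed into code, the callback is essentially a switch over the options from the table (an illustrative sketch with option constants inlined as string literals and some rows omitted; not the exact implementation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Known options either set a builder property directly or land in conf
// under the corresponding spark.* key.
class HandleSketch {
  String master, deployMode, propertiesFile, mainClass;
  boolean isAppResourceReq = true;
  final Map<String, String> conf = new HashMap<>();
  final List<String> sparkArgs = new ArrayList<>();

  boolean handle(String opt, String value) {
    switch (opt) {
      case "--master"              -> master = value;
      case "--deploy-mode"         -> deployMode = value;
      case "--properties-file"     -> propertiesFile = value;
      case "--driver-memory"       -> conf.put("spark.driver.memory", value);
      case "--driver-java-options" -> conf.put("spark.driver.extraJavaOptions", value);
      case "--driver-library-path" -> conf.put("spark.driver.extraLibraryPath", value);
      case "--driver-class-path"   -> conf.put("spark.driver.extraClassPath", value);
      case "--conf" -> {
        String[] kv = value.split("=", 2);
        conf.put(kv[0], kv[1]);
      }
      case "--class" -> mainClass = value;
      case "--kill", "--status", "--help", "--usage-error", "--version" -> {
        isAppResourceReq = false;
        sparkArgs.add(opt);
        if (value != null) sparkArgs.add(value);
      }
      default -> {
        sparkArgs.add(opt);
        if (value != null) sparkArgs.add(value);
      }
    }
    return true; // keep parsing
  }
}
```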
### OptionParser's handleUnknown Method

```java
boolean handleUnknown(String opt)
```

If `allowsMixedArguments` is enabled, `handleUnknown` simply adds the input `opt` to `appArgs` and allows for further [parsing of the argument list](spark-submit-SparkSubmitOptionParser.md#parse).
!!! note "FIXME"
    Where's `allowsMixedArguments` enabled?
If `isExample` is enabled, `handleUnknown` sets `mainClass` to be `org.apache.spark.examples.[opt]` (unless the input `opt` already has the package prefix) and stops further [parsing of the argument list](spark-submit-SparkSubmitOptionParser.md#parse).
!!! note "FIXME"
    Where's `isExample` enabled?
Otherwise, `handleUnknown` sets `appResource` and stops further [parsing of the argument list](spark-submit-SparkSubmitOptionParser.md#parse).
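The three cases condensed into a sketch (simplified; the real callback also validates the argument before using it):

```java
import java.util.ArrayList;
import java.util.List;

// An illustrative version of handleUnknown's three cases.
class HandleUnknownSketch {
  static final String EXAMPLE_CLASS_PREFIX = "org.apache.spark.examples.";

  boolean allowsMixedArguments;
  boolean isExample;
  String mainClass;
  String appResource;
  final List<String> appArgs = new ArrayList<>();

  boolean handleUnknown(String opt) {
    if (allowsMixedArguments) {
      // Shells mix spark-submit options with application options, so an
      // unrecognized option is an application argument; keep parsing.
      appArgs.add(opt);
      return true;
    }
    if (isExample) {
      // run-example: resolve a bare example name to its full class name
      // and stop option parsing.
      mainClass = opt.startsWith(EXAMPLE_CLASS_PREFIX)
          ? opt
          : EXAMPLE_CLASS_PREFIX + opt;
      return false;
    }
    // Otherwise the first unrecognized argument is the app resource; the
    // remaining arguments go to handleExtraArgs.
    appResource = opt;
    return false;
  }
}
```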
### OptionParser's handleExtraArgs Method

```java
void handleExtraArgs(List<String> extra)
```

`handleExtraArgs` adds all the `extra` arguments to `appArgs`.