SparkSubmitArguments¶
SparkSubmitArguments is a custom SparkSubmitArgumentsParser that handles the command-line arguments of the spark-submit script. The actions use these arguments for execution (possibly with an explicit env environment).
SparkSubmitArguments is created when the spark-submit script is launched, with only args passed in, and is later used for printing the arguments in verbose mode.
Command-Line Options¶
--files¶
- Configuration Property: spark.files
- Configuration Property (Spark on YARN): spark.yarn.dist.files
Printed out to standard output for the --verbose option.
When SparkSubmit is requested to prepareSubmitEnvironment, the files are:
Creating Instance¶
SparkSubmitArguments takes the following to be created:
- Arguments (Seq[String])
- Environment Variables (default: sys.env)
SparkSubmitArguments is created when:
- SparkSubmit is requested to parseArguments and launched as a command-line application
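The constructor shape described above can be illustrated with a minimal sketch. ArgumentsSketch is a hypothetical stand-in, not the Spark class itself; it only shows how an environment parameter that defaults to sys.env lets a caller pass just the arguments, or supply an explicit environment (for example in a test).

```scala
// Hypothetical stand-in for illustration only; not the real SparkSubmitArguments.
class ArgumentsSketch(
    val args: Seq[String],
    // Falls back to the environment of the current JVM process when omitted.
    val env: Map[String, String] = sys.env)
```

With this shape, `new ArgumentsSketch(Seq("--verbose"))` uses the process environment, while `new ArgumentsSketch(Seq("--verbose"), Map("SPARK_HOME" -> "/opt/spark"))` sees only the explicit map (the path is an example value).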
Loading Spark Properties¶
loadEnvironmentArguments(): Unit
loadEnvironmentArguments loads the Spark properties for the current execution of spark-submit.
loadEnvironmentArguments reads command-line options first, followed by Spark properties and the system's environment variables.
Note
Spark config properties start with the spark. prefix and can be set using the --conf [key=value] command-line option.
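The lookup order described above (command-line option, then Spark property, then environment variable) can be sketched as a chain of fallbacks. This is a simplified illustration under stated assumptions, not the actual method body: LoadEnvironmentSketch and resolve are hypothetical names, and the keys passed in are chosen by the caller.

```scala
// Hypothetical helper for illustration; not part of Spark.
object LoadEnvironmentSketch {
  // Returns the first defined value, checked in precedence order:
  // 1. the value given on the command line (if any)
  // 2. the Spark property stored under propertyKey
  // 3. the environment variable named envKey
  def resolve(
      cliValue: Option[String],
      sparkProperties: Map[String, String],
      propertyKey: String,
      env: Map[String, String],
      envKey: String): Option[String] =
    cliValue
      .orElse(sparkProperties.get(propertyKey))
      .orElse(env.get(envKey))
}
```

For example, with no command-line value, a `spark.master` property set to `yarn`, and a `MASTER` environment variable set to `local[*]`, the sketch picks the property because it is consulted before the environment.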
Handling Options¶
handle(
opt: String,
value: String): Boolean
handle parses the input opt argument and returns true, or throws an IllegalArgumentException when it finds an unknown opt.
handle sets the internal properties in the table Command-Line Options, Spark Properties and Environment Variables.
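The contract described above (set an internal property for a known option, return true, throw IllegalArgumentException for an unknown one) can be sketched with a pattern match. HandleSketch is a hypothetical class covering only three options; its fields and error messages are assumptions for illustration and do not reproduce the full option table.

```scala
import scala.collection.mutable

// Hypothetical, reduced stand-in for illustration; not the real option handler.
class HandleSketch {
  // Fields standing in for the internal properties that handle sets.
  var files: Option[String] = None
  var verbose: Boolean = false
  val sparkProperties: mutable.Map[String, String] = mutable.Map.empty

  def handle(opt: String, value: String): Boolean = {
    opt match {
      case "--files"   => files = Some(value)
      case "--verbose" => verbose = true // a switch: the value is ignored here
      case "--conf" =>
        // Split "key=value" on the first '=' only, so values may contain '='.
        value.split("=", 2) match {
          case Array(k, v) => sparkProperties(k) = v
          case _ =>
            throw new IllegalArgumentException(s"Missing '=' in --conf value: $value")
        }
      case _ =>
        throw new IllegalArgumentException(s"Unexpected argument '$opt'.")
    }
    true
  }
}
```

Calling `handle("--conf", "spark.files=c.txt")` on an instance stores the pair in sparkProperties, while an unrecognized option such as `--bogus` raises the exception instead of returning.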
mergeDefaultSparkProperties¶
mergeDefaultSparkProperties(): Unit
mergeDefaultSparkProperties merges Spark properties from the default Spark properties file (i.e. spark-defaults.conf) with those specified through the --conf command-line option.
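The merge can be pictured as combining two maps. The sketch below assumes that a property given through --conf is kept when the same key also appears in spark-defaults.conf; the text above does not spell out the precedence, so treat that as an assumption. MergeDefaultsSketch and merge are hypothetical names.

```scala
// Hypothetical helper for illustration; not part of Spark.
object MergeDefaultsSketch {
  // defaults: properties read from the defaults file
  // fromConf: properties given through --conf
  // Assumption: on a key clash the --conf value is kept.
  def merge(
      defaults: Map[String, String],
      fromConf: Map[String, String]): Map[String, String] =
    defaults ++ fromConf // with ++, entries of the right-hand map win
}
```

Keys that appear only in the defaults file pass through unchanged, so the defaults act as fallbacks rather than overrides.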
isPython Flag¶
isPython: Boolean = false
isPython indicates whether the application resource is a PySpark application (a Python script or pyspark shell).
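A check of this kind can be sketched as a test on the name of the application resource. The rule below (a .py suffix, or a special resource name for the shell) is an assumption for illustration; IsPythonSketch and the PysparkShell constant are hypothetical names, not taken from the text above.

```scala
// Hypothetical helper for illustration; not part of Spark.
object IsPythonSketch {
  // Assumed marker name for the pyspark shell resource.
  val PysparkShell = "pyspark-shell"

  // True when the resource looks like a Python script or the pyspark shell.
  def isPython(primaryResource: String): Boolean =
    primaryResource != null &&
      (primaryResource.endsWith(".py") || primaryResource == PysparkShell)
}
```

Under these assumptions `app.py` and `pyspark-shell` count as PySpark resources, while `app.jar` or a missing (null) resource does not.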