Spark Tips and Tricks
= Spark Tips and Tricks
== [[SPARK_PRINT_LAUNCH_COMMAND]] Print Launch Command of Spark Scripts
SPARK_PRINT_LAUNCH_COMMAND
environment variable controls whether the Spark launch command is printed out to the standard error output, i.e. System.err
, or not.
Spark Command: [here comes the command]
========================================
All the Spark shell scripts use org.apache.spark.launcher.Main
class internally that checks SPARK_PRINT_LAUNCH_COMMAND
and when set (to any value) will print out the entire command line to launch it.
$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell
Spark Command: /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java -cp /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Dscala.usejavacp=true -Xms1g -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://localhost:7077 --class org.apache.spark.repl.Main --name Spark shell spark-shell
========================================
== Show Spark version in Spark shell
In spark-shell, use sc.version
or org.apache.spark.SPARK_VERSION
to know the Spark version:
scala> sc.version
res0: String = 1.6.0-SNAPSHOT
scala> org.apache.spark.SPARK_VERSION
res1: String = 1.6.0-SNAPSHOT
== Resolving local host name
When you face networking issues when Spark can't resolve your local hostname or IP address, use the preferred SPARK_LOCAL_HOSTNAME
environment variable as the custom host name or SPARK_LOCAL_IP
as the custom IP that is going to be later resolved to a hostname.
Spark checks them out before using http://docs.oracle.com/javase/8/docs/api/java/net/InetAddress.html#getLocalHost--[java.net.InetAddress.getLocalHost()] (consult https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L759[org.apache.spark.util.Utils.findLocalInetAddress()] method).
You may see the following WARN messages in the logs when Spark finished the resolving process:
WARN Your hostname, [hostname] resolves to a loopback address: [host-address]; using...
WARN Set SPARK_LOCAL_IP if you need to bind to another address
== [[spark-standalone-windows]] Starting standalone Master and workers on Windows 7
Windows 7 users can use spark-class.md[spark-class] to start spark-standalone.md[Spark Standalone] as there are no launch scripts for the Windows platform.
$ ./bin/spark-class org.apache.spark.deploy.master.Master -h localhost
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077