
I am trying to debug a Spark application on a cluster using a master and several worker nodes. I have been successful at setting up the master node and worker nodes using the Spark standalone cluster manager. I downloaded the Spark folder with binaries and used the following commands to set up the worker and master nodes. These commands are executed from the Spark directory.

Command for launching the master:

./sbin/start-master.sh

Command for launching a worker node:

./bin/spark-class org.apache.spark.deploy.worker.Worker master-URL

Command for submitting the application:

./bin/spark-submit --class Application --master URL ~/app.jar

Now, I would like to understand the flow of control through the Spark source code on the worker nodes when I submit my application (I just want to use one of the given examples that use reduce()). I am assuming I should set up Spark in Eclipse. The Eclipse setup link on the Apache Spark website seems to be broken. I would appreciate some guidance on setting up Spark and Eclipse to enable stepping through the Spark source code on the worker nodes.

Thanks!


4 Answers


It's important to distinguish between debugging the driver program and debugging one of the executors. They require different options to be passed to spark-submit.

For debugging the driver you can add the following to your spark-submit command. Then set your remote debugger to connect to the node you launched your driver program on.

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

In this example port 5005 was specified, but you may need to customize that if something is already running on that port.
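
For example, a full submit command for the setup in the question might look like this (the class name and jar path are taken from the question; the master URL is a placeholder):

./bin/spark-submit --class Application --master spark://master-host:7077 --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 ~/app.jar

With suspend=y the driver JVM waits until the debugger attaches on port 5005 before it starts running.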

Connecting to an executor is similar, add the following options to your spark-submit command.

--num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.yourcomputer.org:5005,suspend=n"

Replace the address with your local computer's address. (It's a good idea to test that you can access it from your Spark cluster.)

In this case, start your debugger in listening mode, then start your spark program and wait for the executor to attach to your debugger. It's important to set the number of executors to 1 or multiple executors will all try to connect to your debugger, likely causing problems.
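
For example, with your IDE listening on port 5005, a full command might look like this (a sketch only; the hostname is a placeholder for your own machine, and the class name and jar path are from the question):

./bin/spark-submit --master yarn --deploy-mode client --num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=mylaptop.example.com:5005,suspend=n" --class Application ~/app.jar

Because server=n, the executor JVM dials out to your listening debugger rather than waiting for an incoming connection.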

These examples are for running with the Spark master set to yarn-client, although they may also work when running under Mesos. If you're running in yarn-cluster mode, you may have to set the driver to attach to your debugger rather than attaching your debugger to the driver, since you won't necessarily know in advance which node the driver will be executing on.


You could run the Spark application in local mode if you just need to debug the logic of your transformations. This can be run in your IDE and you'll be able to debug it like any other application:

val conf = new SparkConf().setMaster("local").setAppName("myApp")

You're of course not distributing the problem with this setup. Distributing the problem is as easy as changing the master to point to your cluster.
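
For example, here is a minimal self-contained Scala sketch (the object name and the numbers are made up) that can be launched and stepped through directly in the IDE, using reduce() as the action:

import org.apache.spark.{SparkConf, SparkContext}

object LocalDebugApp {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs the driver and the executors inside this one JVM,
    // so IDE breakpoints inside the lambdas below are hit directly.
    val conf = new SparkConf().setMaster("local[*]").setAppName("myApp")
    val sc = new SparkContext(conf)

    val sumOfSquares = sc.parallelize(1 to 10)
      .map(x => x * x)   // set a breakpoint here to step through the transformation
      .reduce(_ + _)     // the action triggers execution

    println(s"sum of squares = $sumOfSquares")
    sc.stop()
  }
}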


When you run a Spark application on YARN, there is an option like this:

YARN_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5455 $YARN_OPTS"

You can add it to yarn-env.sh and remote debugging will be available via port 5455.

If you use Spark in standalone mode, I believe this can help:

export SPARK_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
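
For example (a sketch; note that newer Spark versions deprecate SPARK_JAVA_OPTS in favor of the extraJavaOptions settings shown in the other answers), you would export the variable before starting the daemon you want to debug:

export SPARK_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
./sbin/start-master.sh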

I followed the same steps to set up a Spark standalone cluster. I was able to debug the driver, master, worker and executor JVMs.

The master and the worker node are configured on a server-class machine with 12 CPU cores. The source code for Spark 2.2.0 has been cloned from the Spark Git repo.

STEPS:

1] Command to launch the Master JVM:

root@ubuntu:~/spark-2.2.0-bin-hadoop2.7/bin# ./spark-class -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8787 org.apache.spark.deploy.master.Master

The shell script spark-class is used to launch the master manually. The first argument is a JVM option that launches the master in debug mode. The JVM is suspended and waits for the IDE to make a remote connection.

[Screenshots: the IDE configuration for remote debugging]
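
A typical Eclipse remote-debug configuration for this setup (assumed, since the screenshots are not reproduced here) would be:

Run > Debug Configurations... > Remote Java Application
    Connection Type: Standard (Socket Attach)
    Host: 10.71.220.34    (the machine where the master JVM was launched)
    Port: 8787            (matches address=8787 above)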

2] Command to launch the Worker JVM:

root@ubuntu:~/spark-2.2.0-bin-hadoop2.7/bin# ./spark-class -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8788 org.apache.spark.deploy.worker.Worker spark://10.71.220.34:7077

As with the master, the first argument launches the JVM in debug mode; the last argument specifies the address of the Spark master. The debug port for the worker is 8788.

As part of the launch, the worker registers with the master.

[Screenshot: the worker registering with the master]

3] A basic Java app with a main method is compiled and packaged as an uber/fat jar. This is explained in the book “Learning Spark”. Basically, an uber jar contains all transitive dependencies.

It is created by running mvn package in the following directory:

root@ubuntu:/home/customer/Documents/Texts/Spark/learning-spark-master# mvn package

The above generates a jar under the ./target folder.
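
For reference, such an uber jar is typically produced by binding a shade/assembly plugin to the package phase; a representative maven-shade-plugin snippet is shown below (illustrative only, the learning-spark project's actual pom may differ):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- bundles the application classes plus all transitive dependencies into one jar -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>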

[Screenshot: the Java application that is submitted to the Spark cluster]

4] Command to submit the application to the standalone cluster:

root@ubuntu:/home/customer/Documents/Texts/Spark/learning-spark-master# /home/customer/spark-2.2.0-bin-hadoop2.7/bin/spark-submit \
  --master spark://10.71.220.34:7077 \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8790" \
  --conf "spark.executor.extraClassPath=/home/customer/Documents/Texts/Spark/learning-spark-master/target/java-0.0.2.jar" \
  --class com.oreilly.learningsparkexamples.java.BasicMapToDouble \
  --name "MapToDouble" \
  ./target/java-0.0.2.jar \
  spark://10.71.220.34:7077

The final argument, spark://10.71.220.34:7077, is passed to the Java program com.oreilly.learningsparkexamples.java.BasicMapToDouble.

• The above command is run from the client node, which hosts the application with the main method in it. However, the transformations are executed on the remote executor JVMs.

• The --conf parameters are important. They are used to configure the executor JVMs, which are launched at runtime by the worker JVMs.

  • The first --conf parameter specifies that the executor JVM should be launched in debug mode and suspended right away, listening on port 8790.

  • The second --conf parameter specifies that the executor classpath should contain the application-specific jar that is submitted to the executor. On a distributed setup these jars need to be moved to the executor JVM's machine.

  • The last argument is used by the client app to connect to the Spark master.

To understand how the client application connects to the Spark cluster, we need to debug the client app and step through it. For that, we need to configure it to run in debug mode.

To debug the client, we need to edit the spark-submit script as follows:

Modified contents of spark-submit:

exec "${SPARK_HOME}"/bin/spark-class -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8789 org.apache.spark.deploy.SparkSubmit "$@"

5] After the client registers, the worker starts an executor at runtime on a separate thread.

[Screenshot: the class ExecutorRunner.scala, which launches the executor]

6] We now connect to the forked executor JVM using the IDE. The executor JVM runs the transformation functions in our submitted application.

 JavaDoubleRDD result = rdd.mapToDouble(   // the transformation function/lambda
      new DoubleFunction<Integer>() {
        public double call(Integer x) {
          double y = (double) x;
          return y * y;
        }
      });

7] The transformation function runs only when the action “collect” is invoked.

8] [Screenshot: the executor view when the mapToDouble function is invoked in parallel on multiple elements of the list.] The executor JVM executes the function in 12 threads because there are 12 cores. Since the number of cores was not set on the command line, the worker JVM by default sets the option --cores=12.

9] [Screenshot: the client-submitted code (mapToDouble()) running in the remote forked executor JVM.]

10] After all the tasks have been executed, the executor JVM exits. After the client app exits, the worker node is unblocked and waits for the next submission.

References

https://spark.apache.org/docs/latest/configuration.html

I have created a blog post that describes the steps for debugging these subsystems. Hopefully this helps others.

Blog that outlines the steps:

https://sandeepmspark.blogspot.com/2018/01/spark-standalone-cluster-internals.html

    Welcome to SO! Although your answer may be 100% correct, it might also become 100% useless if that link is moved, changed, merged into another one or the main site just disappears. Therefore, please edit your answer, and copy the relevant steps from the link into your answer, thereby guaranteeing your answer for 100% of the lifetime of this site! ;-) You can always leave the link in at the bottom of your answer as a source for your material...
    – Murmel
    Commented Jan 20, 2018 at 18:30
