I have a really simple workflow.

<workflow-app name="testSparkjob" xmlns="uri:oozie:workflow:0.5">
  <start to="testJob"/>

  <action name="testJob">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <name>Spark Example</name>
      <spark-opts>--executor-memory 1G --num-executors 3 --executor-cores 1</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="killAction"/>
  </action>

  <kill name="killAction">
    <message>"Killed job due to error"</message>
  </kill>

  <end name="end"/>
</workflow-app>

The Spark script does pretty much nothing:

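import sys

# sys.argv[0] is the script name, so two parameters mean len(sys.argv) == 3.
if len(sys.argv) < 3:
    print('You must pass 2 parameters')
    # Just for testing; later this will be replaced by sys.exit(1).
    ext = 'testArgA'
    int = 'testArgB'  # note: this name shadows the built-in int()
else:
    print('arguments accepted')
    ext = sys.argv[1]
    int = sys.argv[2]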

The script is located on HDFS in the same folder as workflow.xml.

When I run the workflow, I get the following error:

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

I thought it was a permission issue, so I set both the HDFS folder and my local folder to chmod 777. I am using Spark 1.6. When I run the script through spark-submit, everything is fine (even much more complicated scripts which read from and write to HDFS or Hive).

EDIT: I tried this

<action name="forceLoadFromLocal2hdfs">
  <shell xmlns="uri:oozie:shell-action:0.3">
    <!-- single -->
    <!-- py script -->
    <!-- local file to be moved-->
    <!-- hdfs destination folder, be aware of, script is deleting existing folder! -->
  </shell>
  <ok to="end"/>
  <error to="killAction"/>
</action>

The workflow SUCCEEDED, but the file was not copied to HDFS. There were no errors. The script does work by itself, though. More here.
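
One possible explanation, offered as an assumption rather than a diagnosis: an Oozie shell action runs on whichever cluster node the launcher lands on, so a "local" file may not exist there, and a script that ignores the copy command's exit status will still let the action succeed. The sketch below shows a copy script that fails loudly instead. The function names and the `run` parameter are hypothetical and exist only for illustration; `hdfs dfs -put -f` is the standard overwrite-on-copy command.

```python
import os
import socket
import subprocess
import sys


def copy_to_hdfs(local_path, hdfs_dir, run=subprocess.call):
    """Copy local_path into hdfs_dir and return a process-style exit code."""
    # Log the host: under Oozie this may not be the machine you submitted from.
    print("running on host: " + socket.gethostname())
    if not os.path.exists(local_path):
        print("local file not found: " + local_path)
        return 1
    # -f overwrites an existing destination file instead of failing.
    return run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir])


def main(argv):
    """Entry point: argv is expected to be [script, local_path, hdfs_dir]."""
    if len(argv) != 3:
        print("usage: <script> <local_path> <hdfs_dir>")
        return 2
    return copy_to_hdfs(argv[1], argv[2])
```

Ending the script with `sys.exit(main(sys.argv))` propagates a non-zero status to Oozie, which should then follow the `<error to="killAction"/>` transition rather than reporting success.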

3 Answers


Unfortunately, the Oozie Spark action supports only Java artifacts, so you have to specify the main class (which is what that error message is trying, poorly, to explain). So you have two options:

  1. rewrite your code in Java/Scala
  2. use a custom action or a script like this (I did not test it)
  • Running the shell script with the Python script as an argument was my initial idea, but it was a no-go. I hope it will work now :) Thanks a lot for confirming my thoughts. Commented Jul 25, 2017 at 16:23
  • Anyway, it is weird, because in the documentation I found: "The jar element indicates a comma separated list of jars or python files." Commented Jul 25, 2017 at 16:46

1: Try with getopt

In your property


Code .py

import getopt
import sys

try:
    # Short options -f/-i and long options --fuente/--id each take a value.
    parametros, args = getopt.getopt(sys.argv[1:], "f:i:", ["fuente=", "id="])
    if len(parametros) < 2:
        print("Incomplete arguments")
except getopt.GetoptError:
    print("Error in the arguments")
    sys.exit(1)  # parametros is undefined here, so stop instead of continuing

for opt, arg in parametros:
    if opt in ("-f", "--fuente"):
        nom_fuente = str(arg).strip()
    elif opt in ("-i", "--id"):
        id_proceso = str(arg).strip()
    else:
        print("Parameter '" + opt + "' not recognized")

In your workflow

                    --queue ${queueName}
                    --num-executors 40
                    --executor-cores 2
                    --executor-memory 8g

                <arg>-f ${fuente}</arg>
                <arg>-i ${wf:id()}</arg>
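
The `.strip()` calls in the script matter because of how these `<arg>` elements are written. Assuming Oozie hands each `<arg>` element to the script as a single argument (so `-f ${fuente}` arrives as one string, flag and value together), `getopt` keeps the space after the flag as part of the value. A minimal sketch of that behaviour; the sample values `sales` and `0000001-oozie-W` and the helper name `parse_oozie_args` are made up for illustration:

```python
import getopt


def parse_oozie_args(argv):
    """Parse -f/--fuente and -i/--id, trimming whitespace from the values."""
    opts, _ = getopt.getopt(argv, "f:i:", ["fuente=", "id="])
    values = {}
    for opt, arg in opts:
        if opt in ("-f", "--fuente"):
            values["fuente"] = arg.strip()
        elif opt in ("-i", "--id"):
            values["id"] = arg.strip()
    return values


# Each element mimics one <arg> element: flag and value in a single string.
raw = ["-f sales", "-i 0000001-oozie-W"]

# Without strip(), the values keep their leading space:
print(getopt.getopt(raw, "f:i:", ["fuente=", "id="])[0])
# → [('-f', ' sales'), ('-i', ' 0000001-oozie-W')]

print(parse_oozie_args(raw))
```

Putting the flag and its value in two separate `<arg>` elements would avoid the leading space altogether, but the single-element form works as long as the values are stripped.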

And the output, voilà:

Contexto:<pyspark.context.SparkContext object at 0x7efd80424090>

You can use the Spark action to run a Python script, but you must pass the path to the Python API for Spark as an argument. Also, the first line of your file must be:

#!/usr/bin/env python
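
As a rough illustration of why that path matters, here is a minimal script skeleton that checks whether the Python API for Spark is importable before creating a context, and prints `sys.path` if it is not. This is only a sketch: the helper name `module_importable` is hypothetical, and the app name is simply reused from the workflow in the question.

```python
#!/usr/bin/env python
import importlib
import sys


def module_importable(name):
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def main():
    if not module_importable("pyspark"):
        # Show where Python looked, to help spot a missing Spark Python path.
        sys.stderr.write("pyspark is not importable; sys.path is:\n")
        for entry in sys.path:
            sys.stderr.write("  " + entry + "\n")
        return 1
    from pyspark import SparkContext
    sc = SparkContext(appName="Spark Example")
    print("Context: %s" % sc)
    sc.stop()
    return 0
```

Calling `sys.exit(main())` at the end of the file makes a missing Python API show up as a failed action with a readable message in the launcher log.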

