Can I run PySpark as a shell job in Oozie?

Question

I have a Python script which I'm able to run through spark-submit. I need to use it in Oozie.

<!-- move files from local disk to HDFS -->
<action name="forceLoadFromLocal2hdfs">
  <shell xmlns="uri:oozie:shell-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
    </configuration>
    <exec>driver-script.sh</exec>
    <!-- single -->
    <argument>s</argument>
    <!-- py script -->
    <argument>load_local_2_hdfs.py</argument>
    <!-- local file to be moved -->
    <argument>localPathFile</argument>
    <!-- HDFS destination folder; be aware that the script deletes an existing folder! -->
    <argument>hdfFolder</argument>
    <file>${workflowRoot}driver-script.sh#driver-script.sh</file>
    <file>${workflowRoot}load_local_2_hdfs.py#load_local_2_hdfs.py</file>
  </shell>
  <ok to="end"/>
  <error to="killAction"/>
</action>

Run on its own through driver-script.sh, the script works fine. Through Oozie, even though the workflow status is SUCCEEDED, the file is not copied to HDFS. I was not able to find any error logs, or any logs related to the PySpark job.

I have another question about Spark logs being suppressed by Oozie here

Rob · Accepted Answer · 2017-08-05 11:06:27Z

Add set -x at the beginning of your script; it will show you which line the script is at. You can see that trace in stderr.
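
As an illustration of that suggestion, here is a minimal sketch of what a driver script with tracing could look like. The real driver-script.sh is not shown in the question, so everything here is an assumption: the `run_job` function, the overridable `SPARK_SUBMIT` variable (handy for a dry run with `echo`), and the idea that the first argument (`s`) is only a mode flag that is dropped before calling spark-submit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real driver-script.sh is not shown in the question.

# Assumption: allow the launcher to be overridden (e.g. SPARK_SUBMIT=echo) for a dry run.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

run_job() {
  # Print every command to stderr before it is executed.
  set -x
  # Drop the mode flag ("s"); how the real script uses it is not shown.
  shift
  # Remaining arguments: the Python script, the local file, the HDFS folder.
  "$SPARK_SUBMIT" "$@"
  local status=$?
  set +x
  # Hand the exit code of spark-submit back to the caller.
  return "$status"
}

# Run only when arguments were passed, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  run_job "$@"
  exit $?
fi
```

Returning the exit code of spark-submit matters here: if the driver script always exits with 0, the shell action will be reported as successful even when the copy failed. With tracing enabled, the `+`-prefixed lines end up in the stderr of the YARN container that ran the shell action, which can typically be fetched with `yarn logs -applicationId <application id>`.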

Can you elaborate on what you mean by "the file is not copied"? That would help me help you better.

Hello, I found the logs under YARN. The file was not copied from local disk to HDFS, which is the job of the script :)
– la_femme_it
Commented Aug 7, 2017 at 7:16
