
I have a Python script which I'm able to run through spark-submit. I need to use it in Oozie.

<!-- move files from local disk to hdfs -->
<action name="forceLoadFromLocal2hdfs">
<shell xmlns="uri:oozie:shell-action:0.3">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <property>
      <name>mapred.job.queue.name</name>
      <value>${queueName}</value>
    </property>
  </configuration>
  <exec>driver-script.sh</exec>
<!-- single -->
  <argument>s</argument>
<!-- py script -->
  <argument>load_local_2_hdfs.py</argument>
<!-- local file to be moved-->
  <argument>localPathFile</argument>
<!-- hdfs destination folder; be aware, the script deletes the existing folder! -->
  <argument>hdfFolder</argument>
  <file>${workflowRoot}driver-script.sh#driver-script.sh</file>
  <file>${workflowRoot}load_local_2_hdfs.py#load_local_2_hdfs.py</file>
</shell>
<ok to="end"/>
<error to="killAction"/> 
</action>

The script runs fine by itself through driver-script.sh. Through Oozie, even though the status of the workflow is SUCCEEDED, the file is not copied to HDFS. I was not able to find any error logs or any logs related to the PySpark job.
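For context, driver-script.sh itself is not shown here; a hypothetical wrapper consistent with the four arguments above might look like this (every name in it is an assumption, not the real script):

    #!/bin/bash
    # hypothetical reconstruction of driver-script.sh
    MODE="$1"           # "s" (single), per the workflow comment
    PY_SCRIPT="$2"      # load_local_2_hdfs.py, shipped via the <file> tag
    LOCAL_PATH="$3"     # local file to be moved
    HDFS_FOLDER="$4"    # HDFS destination folder

    spark-submit "$PY_SCRIPT" "$LOCAL_PATH" "$HDFS_FOLDER"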

I have another question here about logs from Spark being suppressed by Oozie.

1 Answer


Add set -x at the beginning of your script; that will show you which line the script is executing. You can see the trace in stderr.
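For example, assuming driver-script.sh is a plain bash wrapper (its actual contents were not posted), the trace flag goes right after the shebang:

    #!/bin/bash
    # -x makes the shell echo every command, with its arguments expanded,
    # to stderr before running it, so the trace lands in the action's
    # stderr log under YARN
    set -x

    spark-submit "$@"    # placeholder for the real body of the script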

Can you elaborate on what you mean by "the file is not copied"? That would help me help you better.

  • Hello, I found the logs under YARN. The file was not copied from local to HDFS, which is the job of the script :) Commented Aug 7, 2017 at 7:16
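For anyone else digging for those logs: the stdout/stderr of an Oozie shell action end up in the YARN container logs of its launcher job, which can be fetched with something like the following (the application ID is a placeholder; it is shown in the Oozie web console or by oozie job -info):

    # fetch all container logs for the launcher's YARN application
    yarn logs -applicationId <application_id>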
