
Is it possible to run Spark jobs, e.g. Spark SQL jobs, via Oozie?

In the past we have used Oozie with Hadoop. Since we are now using Spark SQL on top of YARN, we are looking for a way to use Oozie to schedule jobs.

Thanks.

2 Answers


Yup, it's possible. The procedure is the same as usual: you provide Oozie with a directory containing coordinator.xml, workflow.xml, and a lib directory holding your jar files.
But remember that Oozie starts the job with a java -cp command, not with spark-submit, so if you have to run it through Oozie, here is a trick.
Run your jar with spark-submit in the background, then look for that process in the process list. It will be running under a java -cp command, but with some additional jars that were added by spark-submit. Add those jars to the CLASS_PATH, and that's it. Now you can run your Spark applications through Oozie.
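The layout sketched below is an assumption based on the standard Oozie application structure; file names other than coordinator.xml, workflow.xml, and lib/ are illustrative:

```
myapp/                  <- application directory, uploaded to HDFS
├── coordinator.xml     <- schedule definition
├── workflow.xml        <- workflow (action) definition
└── lib/
    └── App.jar         <- your application jar(s)
```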

1.  nohup spark-submit --class package.to.MainClass /path/to/App.jar &
2.  ps aux | grep '/path/to/App.jar'
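As a rough sketch of step 2, the classpath can be pulled out of the ps output with sed. The ps line below is a made-up example of what a spark-submit-launched JVM might look like (the actual paths on your system will differ), so adapt the pattern to your real output:

```shell
# Hypothetical 'ps' output line for a JVM launched by spark-submit (illustrative only).
ps_line='java -cp /opt/spark/conf:/opt/spark/jars/*:/etc/hadoop/conf org.apache.spark.deploy.SparkSubmit --class package.to.MainClass /path/to/App.jar'

# Extract the value passed to -cp; these are the extra jars to add to CLASS_PATH.
classpath=$(printf '%s\n' "$ps_line" | sed -n 's/.*-cp \([^ ]*\).*/\1/p')
echo "$classpath"
```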

EDIT: You can also use the latest Oozie, which includes a Spark action.
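A minimal sketch of a workflow.xml using that Spark action, assuming an Oozie version with spark-action support; the master, names, and paths are placeholders, not values from the answer:

```xml
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>MySparkApp</name>
            <class>package.to.MainClass</class>
            <jar>${nameNode}/path/to/App.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```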

  • Could you please give an example of how you triggered this from the Oozie workflow? What I tried was using the <exec> command in Oozie, and then in the shell script itself I ran nohup spark-submit --class ... /path/to/app.jar &, but that didn't seem to work. It seems Oozie did nothing but quit, so no Spark job was submitted. What I am trying to do is let Oozie submit the Spark job and then quit (marking the job as success and completed), because it consumes quite a lot of resources otherwise (2 cores and 2 GB of RAM at a minimum; I can't find a way to make it go lower). Thanks a lot!
    – RHE
    Commented Aug 15, 2016 at 16:41
  • 1
    I didn't get what you are actually trying to do; can you please elaborate?
    – Zia Kiyani
    Commented Aug 15, 2016 at 17:09
    Hi Zia, thanks for the reply. When you run a Spark job using Oozie, let's say the Spark job takes 20 minutes to finish, usually the Oozie job will finish after the Spark job finishes, in other words after 20 minutes. What I would like to do is finish the Oozie process early (i.e. by running the Spark job in the background using nohup or disown) immediately after spark-submit is run. You probably don't want to do this for a normal Spark job, but for Spark Streaming it kind of makes sense, because Spark Streaming jobs run 24/7 non-stop. Maybe I shouldn't use Oozie for Spark Streaming...
    – RHE
    Commented Aug 15, 2016 at 22:10
  • Oh, I get it now. But for Spark Streaming, why do you want to use Oozie? Spark Streaming runs continuously, whereas Oozie is used where we have to schedule jobs at intervals. Anyway, if you still want this, the best option is to run the command from your code. But for that, you have to run the <exec> command in a daemon thread, so that your command can keep running after the program terminates.
    – Zia Kiyani
    Commented Aug 16, 2016 at 9:05
  • 1
    Good. Yes, this is the perfect approach: schedule batch jobs through Oozie and run streaming jobs with spark-submit. I use the same technique.
    – Zia Kiyani
    Commented Aug 18, 2016 at 9:24

To run Spark SQL via Oozie you need to use the Oozie Spark action. You can locate the Oozie examples archive on your distribution; on Cloudera it is usually found at the path below:

]$ locate oozie.gz
/usr/share/doc/oozie-4.1.0+cdh5.7.0+267/oozie-examples.tar.gz

Spark SQL needs the hive-site.xml file for execution, which you need to provide in workflow.xml:

<spark-opts>--files /hive-site.xml</spark-opts>
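In context, that option sits inside the Spark action element of workflow.xml. The sketch below is a hedged example; everything except the <spark-opts> line is a placeholder, not something stated in the answer:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkSqlApp</name>
    <class>package.to.MainClass</class>
    <jar>${nameNode}/path/to/App.jar</jar>
    <!-- Ship hive-site.xml to the executors so Spark SQL can reach the metastore. -->
    <spark-opts>--files /hive-site.xml</spark-opts>
</spark>
```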
