
We are running an Oozie workflow that has a Shell action and a Spark action, i.e. a shell script and a Spark job that run in sequence.
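
For reference, a minimal sketch of what the workflow looks like (the names, paths, and class here are placeholders, not our real ones):

```xml
<workflow-app name="regional-report" xmlns="uri:oozie:workflow:0.5">
    <start to="prepare-data"/>

    <!-- Shell action runs first (~50 secs) -->
    <action name="prepare-data">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>prepare.sh</exec>
            <file>${appPath}/prepare.sh#prepare.sh</file>
        </shell>
        <ok to="spark-job"/>
        <error to="fail"/>
    </action>

    <!-- Spark action runs after the shell action succeeds (~2 mins) -->
    <action name="spark-job">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>regional-report-spark</name>
            <class>com.example.RegionalReport</class>
            <jar>${appPath}/lib/regional-report.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```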

Running a single workflow:

  • Total: 3 mins
  • Shell action: 50 secs
  • Spark job: 2 mins
  • The rest of the time goes to Oozie initialization and YARN container allocation, which is absolutely fine.

Use case: We are supposed to run 700 instances of the same workflow at once (split by region, zone, and area, which is a business requirement).


When running the 700 instances of the same workflow, we notice a delay in the completion of the 700 workflows even though we have scaled the cluster linearly. We expect the 700 workflows to complete in 3 mins, or at least within 5 mins, but this is not the case. There is a ~5 min delay to launch all 700 workflows, which is also fine; by that logic everything should complete within 10 mins, but it does not.

What exactly happens is that when the 700 workflows are submitted, it takes around 5-6 mins to launch all of them from Oozie (we are OK with this). The overall time to complete the 700 workflows is around 30 mins, which means a workflow kickstarted at 7:00 would complete at 7:30. Yet the time taken by the actions themselves remains the same: the shell action still takes 50 secs and the Spark job 2-3 mins. We are noticing a delay in starting the shell action and the Spark job even though Oozie has already moved the workflow into the PREP state.

What we checked so far:

  • Initially we thought it had to do with Oozie and worked on its configuration.
  • Later we suspected YARN and tuned some of its configuration.
  • We also created separate queues, running the shell and launcher jobs in one queue and the Spark jobs in another (see the sketch after this list).
  • We have gone through the YARN and Oozie logs too.
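
The queue split from the third point looks roughly like this (assuming the CapacityScheduler; the queue names and capacity percentages are illustrative, not our exact values):

```xml
<!-- capacity-scheduler.xml: two child queues under root (illustrative split) -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>launcher,spark</value>
</property>
<property>
  <!-- queue for Oozie launcher AMs and shell actions -->
  <name>yarn.scheduler.capacity.root.launcher.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- queue for the Spark jobs themselves -->
  <name>yarn.scheduler.capacity.root.spark.capacity</name>
  <value>70</value>
</property>
<property>
  <!-- let either queue borrow idle capacity from the other -->
  <name>yarn.scheduler.capacity.root.launcher.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.spark.maximum-capacity</name>
  <value>100</value>
</property>
```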

Can someone shed some light on this?

  • You need to do memory management in your Spark applications. Do you know the following? If yes, please answer: (a) cluster size and configuration, (b) how much memory is allocated to Spark, (c) number of cores and executors used, along with driver and executor memory. Commented Sep 4, 2021 at 17:56
  • If you are running Spark in client mode, the Spark driver will consume memory on your master node, which will restrict Oozie from creating more launchers. Try running the Spark job in cluster mode. Read more on Spark client vs cluster mode.
    – SnigJi
    Commented Sep 5, 2021 at 20:56
Thanks for the response. We are using an auto-scaling cluster with core nodes and task nodes. We also tried fixed nodes. The cluster scaling is not showing any anomalies. Almost 100 GB of memory is allocated to Spark on a given node. I'm submitting the application with 4 cores, executor memory of 12g, and 4 executors, and I'm submitting the Spark application in 'cluster' mode.
    – ravi
    Commented Sep 5, 2021 at 21:59
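
With the settings from the last comment, the relevant part of the Spark action would look roughly like the sketch below (the driver-memory value is an assumption; the other values mirror the comment):

```xml
<!-- Inside the workflow's spark action: cluster mode plus the
     executor settings from the comment thread (4 executors x 4 cores,
     12g executor memory). Driver memory is an assumed value. -->
<master>yarn</master>
<mode>cluster</mode>
<spark-opts>--num-executors 4 --executor-cores 4 --executor-memory 12g --driver-memory 4g</spark-opts>
```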
