
We are running a Spark Streaming job in standalone cluster mode with deploy mode set to client. The streaming job periodically polls messages from a Kafka topic, and the logs generated on the driver node are flushed to a txt file.

After running continuously for 2-3 days, the job crashes with a "Too many open files" error. Upon investigating, we found that Spark is creating too many threads on the driver node, and each thread opens a file descriptor to flush the logs into the txt file, which eventually breaches the fd limit set by the operating system. At one point we could see ~1500 threads spawned.
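
For anyone trying to reproduce or confirm this symptom, here is a minimal sketch of how the driver's thread and file-descriptor usage could be sampled from within the job. `DriverResourceMonitor` is a hypothetical helper (not from the original post); it relies only on standard JDK MXBeans, so it should work on any Unix-like driver host:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical diagnostic helper: logs the driver JVM's live thread count
// and open file descriptor usage so growth over time can be tracked.
object DriverResourceMonitor {
  def logUsage(): Unit = {
    val threads = ManagementFactory.getThreadMXBean.getThreadCount
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        // On Unix-like systems the OS MXBean exposes fd counts directly.
        println(s"live threads=$threads, " +
          s"open fds=${os.getOpenFileDescriptorCount}/${os.getMaxFileDescriptorCount}")
      case _ =>
        println(s"live threads=$threads (fd counts unavailable on this OS)")
    }
  }
}
```

Calling `DriverResourceMonitor.logUsage()` at the start of each micro-batch would show whether thread and fd counts grow monotonically between batches rather than being released.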

Is there any way or configuration we can set to limit the number of threads that the Spark framework creates on the driver node?

We are using Spark version 2.3.1.

Comments:
  • Streaming or structured streaming?
    – Ged
    Commented Sep 18, 2021 at 16:59
  • It is streaming.
    Commented Sep 18, 2021 at 16:59

