We are running a Spark Streaming job on a standalone cluster with deploy mode set to client. The job periodically polls messages from a Kafka topic, and the logs generated on the driver node are flushed to a text file.
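
For context, the job is set up roughly like the sketch below (the broker address, topic, group id, and batch interval are placeholders, and the real processing logic inside `foreachRDD` is omitted):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object StreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-poller")
    val ssc  = new StreamingContext(conf, Seconds(30)) // batch interval is a placeholder

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-host:9092",         // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-consumer-group",       // placeholder
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from Kafka, polled once per batch interval
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    // Per-batch processing; driver-side logs end up in the txt file via our log configuration
    stream.foreachRDD { rdd =>
      println(s"batch size: ${rdd.count()}") // actual business logic omitted
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```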
After running continuously for 2-3 days, the job crashes with a "Too many open files" error. On investigating, we found that Spark creates a large number of threads on the driver node, and each thread opens a file descriptor to flush logs to the text file, which eventually breaches the fd limit set by the operating system. At one point we saw roughly 1,500 threads spawned.
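
In case it helps anyone reproduce, a snippet like the one below (standard JVM management beans; the fd counts require a HotSpot/OpenJDK JVM on a Unix-like OS) can be run periodically on the driver to watch both counts:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Live thread count of the driver JVM
val threads = ManagementFactory.getThreadMXBean.getThreadCount

// Open vs. maximum file descriptors of the driver process
ManagementFactory.getOperatingSystemMXBean match {
  case os: UnixOperatingSystemMXBean =>
    println(s"threads=$threads openFds=${os.getOpenFileDescriptorCount} maxFds=${os.getMaxFileDescriptorCount}")
  case _ =>
    println(s"threads=$threads (fd counts not available on this JVM/OS)")
}
```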
Is there any setting or configuration we can use to limit the number of threads that the Spark framework creates on the driver node?
We are using Spark version 2.3.1.