
I use Spark Structured Streaming (pyspark) to read data from a Kafka topic. It works well, but when I open the executors' stderr, the whole log page is filled with WARN messages from Kafka saying that KafkaDataConsumer is not running in UninterruptibleThread and may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894. How can I disable this warning, or maybe fix the consumer?
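The reading side is roughly like this (a minimal sketch, not my real job; the broker address and topic name are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-reader").getOrCreate()

# Subscribe to one topic; "broker1:9092" and "some_topic" are placeholders
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "some_topic")
      .load())

# The Kafka source delivers key/value as binary, so cast before using it
query = (df.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("console")
         .start())

query.awaitTermination()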

Spark: 3.1.1 with org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2

I tried several options, but as far as I know the Kafka consumer doesn't know that it runs inside a Spark application, so it seems useless to call sparkContext.setLogLevel. My most recent attempt was something like this:

# grab the log4j classes through the driver's JVM gateway (py4j)
logger = spark._jvm.org.apache.log4j
logger.LogManager.getLogger("org.apache.kafka").setLevel(logger.Level.ERROR)

But it doesn't work :(

P.S. Yes, I know it is just a warning and a warning is not an error, but a single executor generates nearly 2k lines of these warnings per second, so you can't find the useful output. You either scroll for a really long time or wait for the log file to load. It's kind of frustrating.

1 Answer

Solved this by adding two lines at the end of the default log4j.properties file:

log4j.logger.org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer=ERROR
log4j.additivity.org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer=false

Then just ship this file with the application and set this Spark config option: --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///some_path/log4j.properties"
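For reference, a spark-submit invocation along these lines should work; this is just a sketch, not my exact command. Using --files ships the properties file into each executor's working directory, so the -D option can then point at the bare file name; my_app.py and the paths are placeholders:

# sketch: adjust paths, package version, and entry point to your setup
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  --files /some_path/log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  my_app.py

If you keep the absolute file:///some_path/... form instead, the properties file has to exist at that path on every worker node.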
