
All Questions

0 votes
0 answers
66 views

How do the Application Master, Spark Driver and Spark Context work together?

After reading about this topic across multiple articles and answers, some ideas are still hazy. My understanding of the Spark execution process is this: when a Spark job is submitted, an application ...
theGreedyLearner
0 votes
0 answers
26 views

Efficiency metric in Apache Spark Monitoring API

I got the following from an API call for monitoring Apache Spark: "diagnostic_details": { "data": { "stages":[...], "executors":{"...
Keyser · 11
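
The payload above appears to come from a vendor wrapper, but Spark's own monitoring REST API exposes the same kind of per-stage and per-executor metrics. A minimal sketch, assuming the driver UI is reachable on its default local port and that the first listed application is the one of interest:

    import requests

    # Spark's built-in monitoring REST API (the endpoints behind the web UI).
    # The host/port and the choice of the first application are assumptions.
    base = "http://localhost:4040/api/v1"
    app_id = requests.get(f"{base}/applications").json()[0]["id"]
    stages = requests.get(f"{base}/applications/{app_id}/stages").json()
    executors = requests.get(f"{base}/applications/{app_id}/executors").json()
    print(len(stages), "stages,", len(executors), "executors reported")
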
0 votes
1 answer
117 views

How to calculate Spark driver and executor memory on a local machine?

I am a beginner with Spark; in some executions a java.lang.OutOfMemoryError: Java heap space is raised: java.lang.OutOfMemoryError: Java heap space at java.base/java.nio.HeapByteBuffer. ...
Tadeo · 13
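
A minimal sketch of the usual fix in local mode, where the driver and executors share a single JVM, so spark.driver.memory is the heap that matters; the 4g value is only illustrative and must be set before the JVM starts:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-memory-sketch")
        .config("spark.driver.memory", "4g")   # heap for the single local JVM
        .getOrCreate()
    )
    print(spark.sparkContext.getConf().get("spark.driver.memory"))
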
-1 votes
1 answer
687 views

How to get spark.executor.memory size?

I am using Spark 2.3.2.3.1.0.0-78. I tried to use spark_session.sparkContext._conf.get('spark.executor.memory') but I only received None. How can I get spark.executor.memory's value?
Sonnh · 101
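
One hedged workaround: the key only shows up in the conf when it was set explicitly, so read it with a fallback to Spark's documented 1g default:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # None means the property was never set, so executors fall back to 1g.
    executor_memory = spark.sparkContext.getConf().get("spark.executor.memory", "1g")
    print(executor_memory)
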
0 votes
1 answer
2k views

How to get number of executors from SparkSession in pyspark?

spark = SparkSession.builder.getOrCreate() spark.sparkContext.getConf().get('spark.executor.instances') # Result: None spark.conf.get('spark.executor.instances') # Result: java.util....
alryosha · 743
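
A commonly used workaround (not a stable public API, since it goes through the internal _jsc py4j handle) is to count the entries in getExecutorMemoryStatus, which reports one entry per executor plus one for the driver:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Scala map with one entry per executor and one for the driver.
    status = spark.sparkContext._jsc.sc().getExecutorMemoryStatus()
    num_executors = max(status.size() - 1, 0)
    print(num_executors)
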
0 votes
1 answer
207 views

Why is executor input for a Spark application running on Amazon EMR showing more than the actual file size it's processing?

I am running an Amazon EMR cluster with 20 Spark applications; the cluster configuration is 1 master node and 2 worker nodes, each a c5.24xlarge instance. Giving 3 executors and one driver to each ...
sriparth
0 votes
1 answer
778 views

Spark SparkFiles.get() returns driver path instead of worker path

I am piping the partitions of an RDD through an external executable. I use sparkContext.addFiles() so that the executable will be available to the workers. When I attempt to run the code I am ...
mikelus · 1,039
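
A minimal sketch of the pattern, using a hypothetical executable name and path: SparkFiles.get() returns a driver-side path when evaluated on the driver, so resolve it inside the task, where it points at the worker-local copy:

    import subprocess
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    sc.addFile("/path/on/driver/mytool")        # hypothetical executable, shipped to every worker

    def pipe_partition(rows):
        tool = SparkFiles.get("mytool")         # resolved on the worker, not the driver
        for row in rows:
            yield subprocess.run([tool, str(row)], capture_output=True, text=True).stdout

    print(sc.parallelize(["a", "b"]).mapPartitions(pipe_partition).collect())
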
2 votes
0 answers
257 views

Spark: I see more executors than the cluster's available cores

I'm working with Spark and YARN on an Azure HDInsight cluster, and I have some trouble understanding the relations between the workers' resources, executors and containers. My cluster has 10 ...
andream · 43
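
For reference, a hedged sketch of the settings that actually drive how many executor containers YARN grants; the numbers are illustrative, and with dynamic allocation enabled the instance count is ignored:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.executor.instances", "10")   # containers requested from YARN
        .config("spark.executor.cores", "4")        # cores per container
        .config("spark.executor.memory", "6g")
        .getOrCreate()
    )
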
0 votes
1 answer
106 views

Spark UI Executors tab shows active = 1

Why do I only see the driver in the Executors tab of the Spark UI and not the executors as well?
Suraj Tripathi
0 votes
1 answer
181 views

When would PySpark executor libraries be different from the driver's?

I was following this guide (apologies for the Medium post) and it showed how you could separately package up your Python env and libraries for your Spark executors and your driver. When would it apply ...
Brendan · 2,075
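
A sketch of when that separation shows up in practice, with hypothetical paths: the executors can run a Python environment shipped to the cluster (e.g. unpacked from --archives as ./environment) while the driver keeps using whatever interpreter launched the script; PYSPARK_PYTHON has to be set before the SparkContext starts:

    import os

    # Hypothetical executor-side environment, e.g. a conda env shipped with --archives.
    os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
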
1 vote
0 answers
406 views

Does Spark clear memory before going to the next loop iteration?

I am parsing files partition by partition using a for loop in PySpark. I have 7 partitions, each about 300 GB in size, which is the reason I am using a for loop. But when it comes to about the 4 or ...
thentangler · 1,246
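
Spark does not drop cached data between iterations on its own, so one common pattern is to unpersist explicitly before moving to the next partition; a minimal sketch assuming a hypothetical per-partition path layout:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    for part in range(7):
        df = spark.read.parquet(f"/data/partition={part}").cache()   # hypothetical layout
        df.count()                            # stand-in for the real per-partition work
        df.unpersist(blocking=True)           # free executor storage memory before the next iteration
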
14 votes
2 answers
36k views

Spark Memory Overhead

Spark memory overhead has been asked about multiple times on SO, and I went through most of those questions. However, after going through multiple blogs, I got confused. Below are the questions I have: whether ...
data_addict
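
For orientation, the arithmetic that usually causes the confusion, with illustrative numbers: on YARN each executor container is requested at roughly spark.executor.memory plus spark.executor.memoryOverhead, and the overhead defaults to max(384 MB, 10% of executor memory) when not set explicitly.

    executor_memory_mb = 8 * 1024                             # spark.executor.memory = 8g (illustrative)
    overhead_mb = max(384, int(0.10 * executor_memory_mb))    # default spark.executor.memoryOverhead
    container_mb = executor_memory_mb + overhead_mb
    print(container_mb)                                       # ~9011 MB requested per executor
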
1 vote
0 answers
264 views

YARN RM not releasing the resources

I'm running Spark with YARN as the Resource Manager (RM). I'm submitting the application with max attempts 2, i.e. spark.yarn.maxAppAttempts=2. One of the applications is processing around 3 TB of data, ...
data_addict
1 vote
0 answers
580 views

After I increased spark.executor.memory for PySpark, it crashed where it used to pass. How can I get through this?

To cut it short, I used to run a piece of PySpark code in the pyspark shell with default settings (driver 1g, executor 1g). The code crashed somewhere because of some unknown memory leak after several ...
Torlek · 11
0 votes
2 answers
133 views

Non-uniform distribution of tasks and data on PySpark executors

I am running an application on PySpark. Below is a snapshot of the distribution of executors for this application. It looks non-uniformly distributed. Can someone have a look and tell where is ...
Rakesh Kumar · 4,420
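
A hedged sketch of one common mitigation: skew across executors often traces back to how the data is partitioned, so repartitioning by a well-distributed key spreads tasks more evenly; the "id" column here comes from spark.range and is otherwise an assumption.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(1_000_000)               # toy data; "id" is its only column
    df = df.repartition(200, "id")            # spread rows evenly across 200 partitions
    print(df.rdd.getNumPartitions())
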