22 questions
0 votes · 0 answers · 45 views
Config params are not propagated when using Spark Connect
I am trying to get Spark Connect working on Amazon EMR (Spark v3.5.1). I started the Connect server on the EMR primary node, making sure the JARs required for S3 auth are present on the classpath:
/usr/...
0 votes · 0 answers · 503 views
RDD is not implemented error on pyspark.sql.connect.dataframe.Dataframe
I have a DataFrame on Databricks on which I would like to use the RDD API. The type of the DataFrame is pyspark.sql.connect.dataframe.DataFrame after reading from the catalog. I found out that this ...
0 votes · 0 answers · 48 views
PySpark SQL with Spark Connect 3.5.2
I'm trying to get PySpark to work with Spark Connect; I'm not sure if this is supported.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
print("Connected to Spark ...
0 votes · 0 answers · 206 views
Troubleshooting Apache Spark Connect Server with Docker Compose
Here's a Docker Compose setup for a distributed Apache Spark environment using Bitnami's Spark image. It includes:
spark-master: Runs the Spark master node on ports 8080 (web UI) and 7077.
spark-...
0 votes · 0 answers · 51 views
Running a Snowflake query using Spark Connect on a standalone cluster
I have configured a spark standalone cluster as follows:
# start spark master
$SPARK_HOME/sbin/start-master.sh
# start 2 spark workers
SPARK_WORKER_INSTANCES=2 $SPARK_HOME/sbin/start-worker.sh spark:/...
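The excerpt stops before the Spark Connect server itself is launched. A sketch of that step, assuming Spark 3.5.x and a master running on the same host (the hostname, port, and package version here are assumptions, not taken from the question):

```shell
# start the Spark Connect server and attach it to the standalone master;
# the --packages coordinate pulls in the Spark Connect server plugin
$SPARK_HOME/sbin/start-connect-server.sh \
  --master spark://localhost:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.5.1
```

Clients can then reach the server on the default gRPC port 15002 (`sc://localhost:15002`).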
0 votes · 1 answer · 85 views
Spark Connect questions
I am currently using Spark Connect inside a Docker container to send Python tasks through Airflow.
I frequently encounter unreleased memory, which forces me to restart the Spark Connect server every night.
My ...
1 vote · 0 answers · 307 views
Spark Connect set checkpoint directory
I have a setup to run jobs/code on Databricks clusters from a local IDE during development using spark connect. For the most part spark connect is essentially the same as an older spark session if ...
0 votes · 0 answers · 85 views
I'm trying to integrate Spark Connect (spark-connect) into my Spark jobs. It throws a circular import error when importing SparkSession inside Celery
I'm trying to integrate Spark Connect (spark-connect) into my Spark jobs. To run the jobs in the background I use Celery combined with eventlet with a concurrency of 5. This works fine for Spark Cluster ...
1 vote · 0 answers · 69 views
spark-connect with standalone spark cluster error
I'm trying to read a stream from Kafka using PySpark.
The Stack I'm working with:
Kubernetes.
Standalone Spark cluster with 2 workers.
spark-connect connected to the cluster and has the dependencies ...
6 votes · 0 answers · 1k views
SparkSession in Databricks Runtime 14.3
I have a Databricks workspace in GCP and I am using the cluster with the Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the ...
0 votes · 1 answer · 66 views
Java Spark Bigtable connector to write dataset to Bigtable table
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(...
0 votes · 1 answer · 1k views
How to use Spark Connect with pyspark on Python 3.12?
I'm trying to use Spark Connect to create a Spark session on a remote Spark cluster with pyspark in Python 3.12:
ingress_ep = "..."
access_token = "..."
conn_string = f"sc://{...
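The truncated excerpt is building a Spark Connect connection string by hand. As a reference for that format, here is a small helper that assembles one; the function name and parameters are mine, but the `sc://host:port/;param=value;...` shape (with `;`-separated options such as `token` and `use_ssl` after the path) follows the Spark Connect client connection-string convention:

```python
from typing import Optional


def spark_connect_uri(host: str, port: int = 443,
                      token: Optional[str] = None,
                      use_ssl: bool = False) -> str:
    """Build a Spark Connect connection string.

    Options after the trailing slash are separated by semicolons,
    e.g. sc://host:443/;use_ssl=true;token=abc
    """
    uri = f"sc://{host}:{port}/"
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    if token:
        params.append(f"token={token}")
    if params:
        uri += ";" + ";".join(params)
    return uri


# plain local server, default Spark Connect port
print(spark_connect_uri("localhost", 15002))       # sc://localhost:15002/
# TLS endpoint with a bearer token
print(spark_connect_uri("example.com", 443, token="abc", use_ssl=True))
```

The resulting string is what you would pass to `SparkSession.builder.remote(...)`.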
1 vote · 0 answers · 146 views
How to create a multi-node Spark Connect server cluster?
I'm using Docker Compose; this is my spark-connect service:
spark-connect:
hostname: spark-connect
container_name: spark-connect
image: bitnami/spark:latest
command: ["./sbin/...
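For reference, a minimal service along these lines might look like the following sketch. The image tag, package version, and master hostname are assumptions, not taken from the question; `SPARK_NO_DAEMONIZE` keeps the launcher script in the foreground so the container does not exit immediately:

```yaml
spark-connect:
  hostname: spark-connect
  container_name: spark-connect
  image: bitnami/spark:3.5.1          # pin a tag rather than :latest
  environment:
    SPARK_NO_DAEMONIZE: "1"           # run the server in the foreground
  command:
    - "./sbin/start-connect-server.sh"
    - "--master"
    - "spark://spark-master:7077"
    - "--packages"
    - "org.apache.spark:spark-connect_2.12:3.5.1"
  ports:
    - "15002:15002"                   # default Spark Connect gRPC port
  depends_on:
    - spark-master
```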
0 votes · 0 answers · 214 views
How do I create spark session without spark connect?
I am running Spark in Databricks, and I am not able to use reduce. I am getting the following error.
How can I create a spark session without using spark connect?
I tried setting config while ...
0 votes · 0 answers · 81 views
Spark error while using count() or pandas API len(df)
I have a problem. I'm using a Spark cluster via Spark Connect Server in Airflow. Everything runs in Docker containers.
I have no problem preprocessing my data, or even showing the DataFrame in ...
1 vote · 1 answer · 388 views
SparkException on collect() in Spark Connect with PySpark
I'm developing an API that makes requests to a Spark cluster (Spark 3.5), but I'm encountering a SparkException error when trying to collect results from a DataFrame. I'm relatively new to Spark and I'...
0 votes · 0 answers · 799 views
Is there any solution for SparkConnectGrpcException?
I want to connect two VMs remotely and execute my PySpark program using Spark resources:
VM1: standalone Spark
VM2: Jupyter Notebook with PySpark code
I have used "Spark Connect" ...
1 vote · 1 answer · 589 views
Spark connect client failing with java.lang.NoClassDefFoundError
Java 1.8, sbt 1.9, Scala 2.12
I have a very simple repo with the following dependency in build.sbt
libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % &...
0 votes · 1 answer · 1k views
Spark connect fails when using udf
I'm having problems using UDFs from Spark Connect.
spark-connect basically works fine (I'm using a Jupyter Notebook); however, it fails when using UDFs.
It seems that it is trying to use ...
2 votes · 1 answer · 2k views
Running Spark-Connect Server on kubernetes in cluster mode/high availability mode
I am trying to figure out how to effectively use the new Spark Connect feature of Spark version >= 3.4.0. Specifically, I want to set up a Kubernetes Spark cluster where various applications (...
2 votes · 0 answers · 165 views
The proper way to run Spark Connect in Anaconda - error '$HOME' is not recognized as an internal or external command, operable program or batch file
I am trying to follow this lesson: https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
Method 1: from Anaconda on Windows
by download the JP notebook to my Downloads folder,...
2 votes · 1 answer · 1k views
Using Spark Connect with Scala
I would like to use the new Spark Connect feature within a Scala program.
I started the Connect server and I am able to connect to it from PySpark, and also when submitting a Python script, e.g., with ...