0 votes
0 answers
45 views

Config params are not propagated when using Spark Connect

I am trying to get Spark Connect working on Amazon EMR (Spark v3.5.1). I started the Connect server on the EMR primary node, making sure the JARs required for S3 auth are present in the classpath: /usr/...
Ninad • 71
0 votes
0 answers
503 views

"RDD is not implemented" error on pyspark.sql.connect.dataframe.DataFrame

I have a DataFrame on Databricks on which I would like to use the RDD API. The type of the DataFrame is pyspark.sql.connect.dataframe.DataFrame after reading from the catalog. I found out that this ...
imawful • 111
0 votes
0 answers
48 views

PySpark SQL with Spark Connect 3.5.2

Trying to get PySpark to work with Spark Connect, not sure if this is supported. spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate() print("Connected to Spark ...
Parag Mehta
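For context, the `sc://` string this question passes to `.remote()` follows the Spark Connect connection-string format (`sc://host:port/;param=value;...`). A minimal sketch of building one; the helper name `connect_url` and its defaults are mine, not from the question:

```python
def connect_url(host: str, port: int = 15002, token: str = "") -> str:
    """Build a Spark Connect connection string, e.g. sc://localhost:15002.

    Optional parameters go after '/;' and are ';'-separated, per the
    Spark Connect client connection-string format.
    """
    url = f"sc://{host}:{port}"
    if token:
        url += f"/;token={token}"
    return url


# With pyspark >= 3.4 installed (pip install "pyspark[connect]"), the URL
# is what the builder expects (not executed here):
#   spark = SparkSession.builder.remote(connect_url("localhost")).getOrCreate()
print(connect_url("localhost"))            # sc://localhost:15002
print(connect_url("myhost", 443, "abc"))   # sc://myhost:443/;token=abc
```

Note that the Connect client is lazy: if nothing is listening on the port, the failure typically surfaces only when the first operation is executed, not at `getOrCreate()`.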
0 votes
0 answers
206 views

Troubleshooting Apache Spark Connect Server with Docker Compose

Here's a Docker Compose setup for a distributed Apache Spark environment using Bitnami's Spark image. It includes: spark-master: Runs the Spark master node on ports 8080 (web UI) and 7077. spark-...
Parbat • 9
0 votes
0 answers
51 views

Running a Snowflake query using Spark Connect on a standalone cluster

I have configured a spark standalone cluster as follows: # start spark master $SPARK_HOME/sbin/start-master.sh # start 2 spark workers SPARK_WORKER_INSTANCES=2 $SPARK_HOME/sbin/start-worker.sh spark:/...
Pbd • 1,299
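For reference, a hedged sketch of how such a setup is usually completed; this is a command fragment only, and the master URL and all version numbers below (including the Snowflake connector coordinates) are assumptions, not taken from the question:

```shell
# Start the standalone master and two workers (as in the question).
$SPARK_HOME/sbin/start-master.sh
SPARK_WORKER_INSTANCES=2 $SPARK_HOME/sbin/start-worker.sh spark://localhost:7077

# Start a Spark Connect server attached to that master. --packages puts the
# Connect plugin and the (assumed) Snowflake connector on the server's
# classpath -- with Connect, dependencies live server-side, not client-side.
$SPARK_HOME/sbin/start-connect-server.sh \
  --master spark://localhost:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.5.1,net.snowflake:spark-snowflake_2.12:2.15.0-spark_3.4
```

Snowflake reads then typically go through `spark.read.format("snowflake")` on the client, while the connector JARs resolve on the server.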
0 votes
1 answer
85 views

Spark Connect questions

I am currently using Spark Connect inside a Docker container to send Python tasks through Airflow. I frequently encounter unreleased memory, which forces me to restart the Spark Connect server every night. My ...
gtnchtb
1 vote
0 answers
307 views

Spark Connect set checkpoint directory

I have a setup to run jobs/code on Databricks clusters from a local IDE during development using Spark Connect. For the most part Spark Connect is essentially the same as an older Spark session if ...
Tarique • 711
0 votes
0 answers
85 views

Circular import error when importing SparkSession inside Celery after integrating Spark Connect (spark-connect) into my Spark jobs

I'm trying to integrate Spark Connect (spark-connect) into my Spark jobs. For running the jobs in the background I use Celery combined with eventlet at a concurrency of 5. This works fine for a Spark cluster ...
DevD • 35
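One common way to break this class of circular import (a sketch, assuming the cycle is triggered while Celery imports the tasks module; the remote URL is a placeholder) is to defer the pyspark import into the task body:

```python
import sys


def run_spark_task():
    # Deferred import: pyspark is only pulled in when the task actually runs,
    # not while Celery is importing the tasks module -- which is what breaks
    # the circular-import chain at worker startup.
    from pyspark.sql import SparkSession
    return SparkSession.builder.remote("sc://localhost:15002").getOrCreate()


# Merely defining the task does not import pyspark:
print("pyspark" in sys.modules)  # False
```

Calling `run_spark_task()` then imports pyspark exactly once, inside the worker process, after Celery's own import machinery has finished.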
1 vote
0 answers
69 views

Error using spark-connect with a standalone Spark cluster

I'm trying to read a stream from Kafka using PySpark. The stack I'm working with: Kubernetes; a standalone Spark cluster with 2 workers; spark-connect connected to the cluster, which has the dependencies ...
waseemoo1
6 votes
0 answers
1k views

SparkSession in Databricks Runtime 14.3

I have a Databricks workspace in GCP and I am using the cluster with the Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the ...
Tarique • 711
0 votes
1 answer
66 views

Java Spark Bigtable connector to write dataset to Bigtable table

Error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(...
Sagar Sitap
0 votes
1 answer
1k views

How to use Spark Connect with pyspark on Python 3.12?

I'm trying to use Spark Connect to create a Spark session on a remote Spark cluster with pyspark in Python 3.12: ingress_ep = "..." access_token = "..." conn_string = f"sc://{...
Kai Roesner
1 vote
0 answers
146 views

How to create a multi-node Spark Connect server cluster?

I'm using Docker Compose; this is my spark-connect service: spark-connect: hostname: spark-connect container_name: spark-connect image: bitnami/spark:latest command: ["./sbin/...
Ayham Zinedine
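For context, a minimal Bitnami-based layout for this kind of setup might look as follows. A sketch only: service names, the image tag, and versions are assumptions, and the usual catch is that there is a single Connect server endpoint; scaling happens by adding workers behind the master, not by adding Connect servers:

```yaml
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"    # master web UI
      - "7077:7077"    # cluster manager
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    deploy:
      replicas: 2      # scale out by adding workers, not Connect servers
  spark-connect:
    image: bitnami/spark:3.5
    command: >
      ./sbin/start-connect-server.sh
      --master spark://spark-master:7077
      --packages org.apache.spark:spark-connect_2.12:3.5.1
    ports:
      - "15002:15002"  # gRPC endpoint clients reach via sc://host:15002
```

One container-specific gotcha: `start-connect-server.sh` backgrounds the server process, so the container may exit immediately unless PID 1 is kept alive (e.g. by running the server in the foreground or tailing its log).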
0 votes
0 answers
214 views

How do I create a Spark session without Spark Connect?

I am running Spark in Databricks, and I am not able to use reduce. I am getting the following error. How can I create a Spark session without using Spark Connect? I tried setting config while ...
Martin Storm
0 votes
0 answers
81 views

Spark error while using count() or pandas API len(df)

I have a problem. I'm using a Spark cluster via Spark Connect Server in Airflow. Everything is run through docker containers. I have no problem preprocessing my data, or even showing the DataFrame in ...
gtnchtb
1 vote
1 answer
388 views

SparkException on collect() in Spark Connect with PySpark

I'm developing an API that makes requests to a Spark cluster (Spark 3.5), but I'm encountering a SparkException error when trying to collect results from a DataFrame. I'm relatively new to Spark and I'...
Ayeris • 34
0 votes
0 answers
799 views

Is there any solution for SparkConnectGrpcException?

I want to connect two VMs remotely and execute my PySpark program using Spark resources. VM1: standalone Spark. VM2: Jupyter Notebook with PySpark code. I have used "Spark Connect" ...
saravanan kumar
1 vote
1 answer
589 views

Spark Connect client failing with java.lang.NoClassDefFoundError

Java 1.8, sbt 1.9, Scala 2.12. I have a very simple repo with the following dependency in build.sbt: libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "...
ffff • 3,030
0 votes
1 answer
1k views

Spark Connect fails when using a UDF

I'm having problems using UDFs from Spark Connect. spark-connect basically works fine (I'm using Jupyter Notebook); however, it fails when using UDFs. It seems that it is trying to use ...
rick123
2 votes
1 answer
2k views

Running Spark Connect Server on Kubernetes in cluster mode/high-availability mode

I am trying to figure out how to effectively use the new Spark Connect feature of Spark version >= 3.4.0. Specifically, I want to set up a Kubernetes Spark cluster where various applications (...
scienceseba
2 votes
0 answers
165 views

The proper way to run Spark Connect in Anaconda - error '$HOME' is not recognized as an internal or external command, operable program or batch file

I am trying to follow this lesson: https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html Method 1: from Anaconda on Windows, by downloading the Jupyter notebook to my Downloads folder, ...
Tom Tom • 352
2 votes
1 answer
1k views

Using Spark Connect with Scala

I would like to use the new Spark Connect feature within a Scala program. I started the Connect server and I am able to connect to it from PySpark, and also when submitting a Python script, e.g., with ...
hage • 6,133