22 questions
0 votes · 0 answers · 45 views
Config params are not propagated when using Spark Connect
I am trying to get Spark Connect working on Amazon EMR (Spark v3.5.1). I started the Connect server on the EMR primary node, making sure the JARs required for S3 auth are present on the classpath:
/usr/...
0 votes · 0 answers · 503 views
RDD is not implemented error on pyspark.sql.connect.dataframe.Dataframe
I have a DataFrame on Databricks on which I would like to use the RDD API. The type of the DataFrame is pyspark.sql.connect.dataframe.DataFrame after reading from the catalog. I found out that this ...
0 votes · 0 answers · 48 views
PySpark SQL with Spark Connect 3.5.2
I'm trying to get PySpark to work with Spark Connect; I'm not sure if this is supported.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
print("Connected to Spark ...
0 votes · 0 answers · 206 views
Troubleshooting Apache Spark Connect Server with Docker Compose
Here's a Docker Compose setup for a distributed Apache Spark environment using Bitnami's Spark image. It includes:
spark-master: Runs the Spark master node on ports 8080 (web UI) and 7077.
spark-...
0 votes · 0 answers · 51 views
Running a Snowflake query using Spark Connect on a standalone cluster
I have configured a spark standalone cluster as follows:
# start spark master
$SPARK_HOME/sbin/start-master.sh
# start 2 spark workers
SPARK_WORKER_INSTANCES=2 $SPARK_HOME/sbin/start-worker.sh spark:/...
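The excerpt stops before the Spark Connect server itself is launched. A sketch of that step, assuming Spark 3.5.x and a master running on the same host (the hostname, port, and package version here are assumptions, not taken from the question):

```shell
# start the Spark Connect server and attach it to the standalone master;
# the --packages coordinate pulls in the Spark Connect server plugin
$SPARK_HOME/sbin/start-connect-server.sh \
  --master spark://localhost:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.5.1
```

Clients can then reach the server on the default gRPC port 15002 (`sc://localhost:15002`).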
0 votes · 1 answer · 85 views
Spark Connect questions
I am currently using Spark Connect inside a Docker container to send Python tasks through Airflow.
I frequently encounter unreleased memory, which forces me to restart the Spark Connect server every night.
My ...
1 vote · 0 answers · 307 views
Spark Connect set checkpoint directory
I have a setup to run jobs/code on Databricks clusters from a local IDE during development using spark connect. For the most part spark connect is essentially the same as an older spark session if ...
0 votes · 0 answers · 85 views
I'm trying to integrate Spark Connect (spark-connect) into my Spark jobs. It throws a circular import error when importing SparkSession inside Celery
I'm trying to integrate Spark Connect (spark-connect) into my Spark jobs. To run the jobs in the background I use Celery combined with eventlet with a concurrency of 5. This works fine for Spark Cluster ...
1 vote · 0 answers · 69 views
spark-connect with standalone spark cluster error
I'm trying to read a stream from Kafka using PySpark.
The Stack I'm working with:
Kubernetes.
Standalone Spark cluster with 2 workers.
spark-connect connected to the cluster and has the dependencies ...
6 votes · 0 answers · 1k views
SparkSession in Databricks Runtime 14.3
I have a Databricks workspace in GCP and I am using the cluster with the Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the ...
0 votes · 1 answer · 66 views
Java Spark Bigtable connector to write dataset to Bigtable table
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(...
0 votes · 1 answer · 1k views
How to use Spark Connect with pyspark on Python 3.12?
I'm trying to use Spark Connect to create a Spark session on a remote Spark cluster with pyspark in Python 3.12:
ingress_ep = "..."
access_token = "..."
conn_string = f"sc://{...
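The truncated excerpt is building a Spark Connect connection string by hand. As a reference for that format, here is a small helper that assembles one; the function name and parameters are mine, but the `sc://host:port/;param=value;...` shape (with `;`-separated options such as `token` and `use_ssl` after the path) follows the Spark Connect client connection-string convention:

```python
from typing import Optional


def spark_connect_uri(host: str, port: int = 443,
                      token: Optional[str] = None,
                      use_ssl: bool = False) -> str:
    """Build a Spark Connect connection string.

    Options after the trailing slash are separated by semicolons,
    e.g. sc://host:443/;use_ssl=true;token=abc
    """
    uri = f"sc://{host}:{port}/"
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    if token:
        params.append(f"token={token}")
    if params:
        uri += ";" + ";".join(params)
    return uri


# plain local server, default Spark Connect port
print(spark_connect_uri("localhost", 15002))       # sc://localhost:15002/
# TLS endpoint with a bearer token
print(spark_connect_uri("example.com", 443, token="abc", use_ssl=True))
```

The resulting string is what you would pass to `SparkSession.builder.remote(...)`.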
1 vote · 0 answers · 146 views
How to create a multi-node Spark Connect server cluster?
I'm using Docker Compose; this is my spark-connect service:
spark-connect:
hostname: spark-connect
container_name: spark-connect
image: bitnami/spark:latest
command: ["./sbin/...
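For reference, a minimal service along these lines might look like the following sketch. The image tag, package version, and master hostname are assumptions, not taken from the question; `SPARK_NO_DAEMONIZE` keeps the launcher script in the foreground so the container does not exit immediately:

```yaml
spark-connect:
  hostname: spark-connect
  container_name: spark-connect
  image: bitnami/spark:3.5.1          # pin a tag rather than :latest
  environment:
    SPARK_NO_DAEMONIZE: "1"           # run the server in the foreground
  command:
    - "./sbin/start-connect-server.sh"
    - "--master"
    - "spark://spark-master:7077"
    - "--packages"
    - "org.apache.spark:spark-connect_2.12:3.5.1"
  ports:
    - "15002:15002"                   # default Spark Connect gRPC port
  depends_on:
    - spark-master
```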
0 votes · 0 answers · 214 views
How do I create spark session without spark connect?
I am running Spark in Databricks, and I am not able to use reduce. I am getting the following error.
How can I create a spark session without using spark connect?
I tried setting config while ...
0 votes · 0 answers · 81 views
Spark error while using count() or pandas API len(df)
I have a problem. I'm using a Spark cluster via Spark Connect Server in Airflow. Everything runs in Docker containers.
I have no problem preprocessing my data, or even showing the DataFrame in ...
1 vote · 1 answer · 388 views
SparkException on collect() in Spark Connect with PySpark
I'm developing an API that makes requests to a Spark cluster (Spark 3.5), but I'm encountering a SparkException error when trying to collect results from a DataFrame. I'm relatively new to Spark and I'...
0 votes · 0 answers · 799 views
Is there any solution for SparkConnectGrpcException?
I want to connect two VMs remotely and execute my PySpark program using Spark resources:
VM1: standalone Spark
VM2: Jupyter Notebook with PySpark code
I have used "Spark Connect" ...
1 vote · 1 answer · 589 views
Spark connect client failing with java.lang.NoClassDefFoundError
Java 1.8, sbt 1.9, Scala 2.12
I have a very simple repo with the following dependency in build.sbt
libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % &...
0 votes · 1 answer · 1k views
Spark connect fails when using udf
I'm having problems using UDFs from Spark Connect.
spark-connect basically works fine (I'm using a Jupyter Notebook); however, it fails when using UDFs.
It seems that it is trying to use ...
2 votes · 1 answer · 2k views
Running Spark-Connect Server on kubernetes in cluster mode/high availability mode
I am trying to figure out how to effectively use the new Spark Connect feature of Spark version >= 3.4.0. Specifically, I want to set up a Kubernetes Spark cluster where various applications (...
2 votes · 0 answers · 165 views
The proper way to run Spark Connect in Anaconda - error '$HOME' is not recognized as an internal or external command, operable program or batch file
I am trying to follow this lesson: https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
Method 1: from Anaconda on Windows
by download the JP notebook to my Downloads folder,...
2 votes · 1 answer · 1k views
Using Spark Connect with Scala
I would like to use the new Spark Connect feature within a Scala program.
I started the Connect server and I am able to connect to it from PySpark, and also when submitting a Python script, e.g., with ...