Spark Connect Explained



What is Spark Connect?




• A thin client API that gives Apache Spark a client-server architecture
• Allows remote connectivity to Spark clusters
• Enables using Spark from anywhere
• Similar to JDBC, which lets clients communicate with a remote database

How does it work?

www.scholarnest.in
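Figure: Spark Connect architecture
• Client: the Spark application. Spark code creates a Spark session and sends a logical plan.
• Server/Cluster: the Spark Connect driver/server parses, analyses, and optimises the plan into an execution plan; the executors execute it.
• Transport: gRPC/protobuf and gRPC/Arrow between the client and the server.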
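
As a rough illustration of this flow, here is a minimal sketch in Python. It assumes PySpark 3.4 or later installed with its connect extras and a Spark Connect server that you can reach; the helper names spark_connect_url and run_remote_query are hypothetical, and 15002 is used as the port because it is the server's default.

```python
def spark_connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect URL of the form sc://host:port."""
    return f"sc://{host}:{port}"


def run_remote_query(url: str) -> list:
    """Run a tiny query against a Spark Connect server and return the ids."""
    # Imported here so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    # The client only opens a connection; no local Spark driver is started.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # These calls only build a logical plan on the client side.
        df = spark.range(5).filter("id % 2 = 0")
        # collect() sends the plan to the server and pulls the results back.
        return [row.id for row in df.collect()]
    finally:
        spark.stop()
```

With a server running on your own machine, run_remote_query(spark_connect_url("localhost")) would exercise the full round trip.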

Why do we need it?



• Allows a Spark Connect client to be built in any language
• Currently supported languages are Python, Java, Scala, Go, and Rust
• Allows remote development and testing
• Allows remote debugging
• Allows the cluster to be upgraded easily for performance improvements and security fixes

How to use Spark Connect with Databricks?

Prerequisites
• Databricks workspace enabled with Unity Catalog
• Databricks cluster runtime 13.3 LTS or higher
• Databricks cluster running in shared access mode


1. Install the Databricks CLI


Install the Databricks CLI from the command line on your Windows machine


2. Verify the Databricks CLI


Restart the command line window and check that the databricks command reports its version
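
The same check can also be scripted from Python. This is a minimal sketch, assuming the CLI executable is named databricks and prints its version when called with the -v flag; the function name databricks_cli_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def databricks_cli_version(executable: str = "databricks") -> Optional[str]:
    """Return the CLI's version output, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Typical cause: the terminal was not restarted after installation.
        return None
    result = subprocess.run(
        [path, "-v"], capture_output=True, text=True, check=False
    )
    # Use whichever stream the tool wrote to.
    return (result.stdout or result.stderr).strip()
```

A None result usually means the installation directory is not yet on PATH for the current terminal session.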


3. Set up an authentication profile


Use the Databricks CLI to initiate OAuth token management locally by running its auth login command against your workspace


4. Log in to your workspace


The previous command should open a browser window where you log in to your Databricks workspace
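
After a successful login, the profile name is what you will need later when creating the Spark session. The sketch below lists the profile names the CLI has stored. It assumes the CLI keeps its profiles in an INI-style file at ~/.databrickscfg (the usual default location); the function name list_profiles is hypothetical.

```python
import configparser
from pathlib import Path


def list_profiles(config_path: Path = Path.home() / ".databrickscfg") -> list[str]:
    """Return the profile names found in a Databricks CLI config file."""
    # Move configparser's special default section out of the way so that a
    # profile literally named DEFAULT is reported like any other section.
    parser = configparser.ConfigParser(default_section="__unused__")
    parser.read(config_path)  # a missing file is silently ignored
    return parser.sections()
```

An empty list suggests that the login did not complete or that the file lives somewhere other than the assumed default path.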


5. Create a new Spark project in your PyCharm IDE

Use Python 3.10 for Databricks runtime 14.x


6. Install the Databricks Connect package

Install databricks-connect, matching its version with your cluster runtime version
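
To make the version check concrete, here is a small sketch that compares the installed client against a runtime version string. It assumes that "matching" means the same major and minor numbers, which is an interpretation rather than something the slide spells out; the function names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '14.3.1' or '13.3 LTS'."""
    numeric = version.split()[0]  # drop a trailing label such as 'LTS'
    major, minor = numeric.split(".")[:2]
    return int(major), int(minor)


def matches_runtime(client_version: str, runtime_version: str) -> bool:
    """True when client and runtime share the same major.minor version."""
    return major_minor(client_version) == major_minor(runtime_version)


def installed_client_version() -> Optional[str]:
    """Return the installed databricks-connect version, or None if absent."""
    try:
        return metadata.version("databricks-connect")
    except metadata.PackageNotFoundError:
        return None
```

For example, matches_runtime("14.3.1", "14.3") holds under this assumption, while a 13.3 client against a 14.3 runtime does not.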


7. Create a Spark session and start running your application code


Create the session with your authentication profile name, then run your code to see your results
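
A minimal sketch of this last step, assuming the databricks-connect package is installed and that the profile name you pass is the one created during login. The function names create_session and count_even_ids are hypothetical, and the query is only a placeholder for your application code.

```python
def create_session(profile_name: str):
    """Create a Spark session on Databricks using a named auth profile."""
    # Imported here so this module can be loaded without databricks-connect.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.profile(profile_name).getOrCreate()


def count_even_ids(spark, n: int = 10) -> int:
    """Placeholder application code: count the even ids in range(n)."""
    # The DataFrame calls build a plan locally; count() runs it on the cluster.
    return spark.range(n).filter("id % 2 = 0").count()


# Example usage (requires a reachable workspace and a valid profile):
#   spark = create_session("<your-profile-name>")
#   print(count_even_ids(spark))
```

Keeping the application code in a function that takes the session as an argument makes it possible to run the same logic against a local session or a test double.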
