
Going through the AWS Glue docs I can't see any mention of how to connect to a Postgres RDS instance from a Glue job of the "Python shell" type. I've set up an RDS connection in AWS Glue and verified that I can connect to my RDS instance. Also, when creating the Python job I can see my connection and I've added it to the script.

How do I use the connection which I've added to the Glue job to run some raw SQL?

Thanks in advance,

  • Did you have any luck with it?
    – gibbz00
    Commented Jun 12, 2019 at 14:06

1 Answer


There are two possible ways to access data from RDS in a Glue ETL (Spark) job:

1st Option:

  • Create a Glue connection on top of the RDS instance
  • Create a Glue crawler on top of the Glue connection created in the first step
  • Run the crawler to populate the Glue Catalog with a database and tables pointing to the RDS tables.
  • Create a dynamic frame in the Glue ETL job using the newly created database and table in the Glue Catalog.

Code Sample :

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the RDS table through the catalog database/table created by the crawler
DyF = glueContext.create_dynamic_frame.from_catalog(database="{{database}}", table_name="{{table_name}}")
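
If the goal is to run some SQL over the data once it is loaded, one possibility (a minimal sketch, assuming the DynamicFrame DyF from the sample above and an illustrative view name) is to convert it to a DataFrame and register a temporary view. Note that this runs Spark SQL over the loaded data, not SQL inside Postgres:

# Convert the DynamicFrame to a Spark DataFrame so it can be queried with SQL
df = DyF.toDF()

# Register a temporary view; "rds_table" is just an illustrative name
df.createOrReplaceTempView("rds_table")

# Run Spark SQL against the loaded data (executed by Spark, not by Postgres)
result = glueContext.spark_session.sql("SELECT COUNT(*) FROM rds_table")
result.show()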

2nd Option:

Create a DataFrame using Spark's JDBC data source:

# JDBC connection details for the RDS instance
url = "jdbc:postgresql://<rds_host_name>/<database_name>"
properties = {
    "user": "<username>",
    "password": "<password>"
}

# Read the table directly from Postgres over JDBC
df = spark.read.jdbc(url=url, table="<schema.table>", properties=properties)
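
If only a subset of the table is needed, Spark's JDBC reader also accepts a subquery in place of a table name, which pushes the query down to Postgres. A minimal sketch (the columns and filter below are only illustrations):

# Wrap any valid Postgres SELECT in parentheses and give it an alias;
# the filtering then happens on the database side before rows reach Spark
query = "(SELECT id, name FROM <schema.table> WHERE created_at >= '2019-01-01') AS subq"
df_subset = spark.read.jdbc(url=url, table=query, properties=properties)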

Note:

  • You will need to pass the Postgres JDBC jar when creating the DataFrame with Spark (see the sketch after this list for one way to do it).
  • I have tried the first method on Glue ETL and the second method on a Python shell (dev endpoint).
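
One way to make that driver jar visible to the Spark session, sketched here under the assumption that the jar has already been downloaded to a local path on the endpoint (the path and version below are only examples), is to set spark.jars before the context is created:

from pyspark import SparkConf, SparkContext
from awsglue.context import GlueContext

# Example only: point Spark at a locally downloaded Postgres JDBC driver jar.
# The path/version must match what was actually downloaded to the endpoint.
conf = SparkConf().set("spark.jars", "/home/glue/postgresql-42.2.8.jar")

# spark.jars is only picked up if it is set before the SparkContext exists
glueContext = GlueContext(SparkContext.getOrCreate(conf=conf))
spark = glueContext.spark_session

Passing the jar on the spark-submit command line, as mentioned in the comments below, achieves the same thing.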
  • I want to be able to execute raw SQL queries, such as CREATE .... In the above case that's not possible... from my understanding. :/
    – mcm
    Commented Jul 19, 2019 at 21:01
  • @Harsh "You will need to pass postgres jdbc jar for creating the database using spark sql." - how would I do this?
    – t_warsop
    Commented Oct 14, 2019 at 9:09
  • @t_warsop : You will need to ssh to the endpoint, download the Postgres jar, and pass it with your spark-submit command. I couldn't figure out a better way for dev endpoints. Commented Oct 16, 2019 at 5:39
  • @mcm : You can use Spark's SQLContext to execute the CREATE command: sqlContext.sql(query). Commented Oct 16, 2019 at 5:40
