
Going through the AWS Glue docs I can't see any mention of how to connect to a Postgres RDS instance from a Glue job of the "Python shell" type. I've set up an RDS connection in AWS Glue and verified that I can connect to my RDS instance. Also, when creating the Python job I can see my connection and I've added it to the script.

How do I use the connection which I've added to the Glue job to run some raw SQL?

Thanks in advance,

  • Did you have any luck with it?
    – gibbz00
    Commented Jun 12, 2019 at 14:06

1 Answer


There are two possible ways to access data from RDS in a Glue ETL (Spark) job:

1st Option:

  • Create a Glue connection on top of the RDS instance
  • Create a Glue crawler on top of the Glue connection created in the first step
  • Run the crawler to populate the Glue Catalog with a database and tables pointing to the RDS tables.
  • Create a dynamic frame in the Glue ETL job using the newly created database and table in the Glue Catalog.

Code Sample :

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the RDS table through the catalog database/table created by the crawler
DyF = glueContext.create_dynamic_frame.from_catalog(database="{{database}}", table_name="{{table_name}}")
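
If the goal is to run some SQL over the data once it is loaded, one possibility (a minimal sketch, assuming the DynamicFrame DyF from the sample above and an illustrative view name) is to convert it to a DataFrame and register a temporary view. Note that this runs Spark SQL over the loaded data, not SQL inside Postgres:

# Convert the DynamicFrame to a Spark DataFrame so it can be queried with SQL
df = DyF.toDF()

# Register a temporary view; "rds_table" is just an illustrative name
df.createOrReplaceTempView("rds_table")

# Run Spark SQL against the loaded data (executed by Spark, not by Postgres)
result = glueContext.spark_session.sql("SELECT COUNT(*) FROM rds_table")
result.show()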

2nd Option:

Create a DataFrame using Spark's JDBC data source:

# JDBC connection details for the RDS instance
url = "jdbc:postgresql://<rds_host_name>/<database_name>"
properties = {
    "user": "<username>",
    "password": "<password>"
}

# Read the table directly from Postgres over JDBC
df = spark.read.jdbc(url=url, table="<schema.table>", properties=properties)
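
If only a subset of the table is needed, Spark's JDBC reader also accepts a subquery in place of a table name, which pushes the query down to Postgres. A minimal sketch (the columns and filter below are only illustrations):

# Wrap any valid Postgres SELECT in parentheses and give it an alias;
# the filtering then happens on the database side before rows reach Spark
query = "(SELECT id, name FROM <schema.table> WHERE created_at >= '2019-01-01') AS subq"
df_subset = spark.read.jdbc(url=url, table=query, properties=properties)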

Note:

  • You will need to pass the Postgres JDBC jar when creating the DataFrame with Spark (see the sketch after this list for one way to do it).
  • I have tried the first method on Glue ETL and the second method on a Python shell (dev endpoint).
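
One way to make that driver jar visible to the Spark session, sketched here under the assumption that the jar has already been downloaded to a local path on the endpoint (the path and version below are only examples), is to set spark.jars before the context is created:

from pyspark import SparkConf, SparkContext
from awsglue.context import GlueContext

# Example only: point Spark at a locally downloaded Postgres JDBC driver jar.
# The path/version must match what was actually downloaded to the endpoint.
conf = SparkConf().set("spark.jars", "/home/glue/postgresql-42.2.8.jar")

# spark.jars is only picked up if it is set before the SparkContext exists
glueContext = GlueContext(SparkContext.getOrCreate(conf=conf))
spark = glueContext.spark_session

Passing the jar on the spark-submit command line, as mentioned in the comments below, achieves the same thing.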
  • I want to be able to execute raw SQL queries, such as CREATE .... In the above case that's not possible... from my understanding. :/
    – mcm
    Commented Jul 19, 2019 at 21:01
  • @Harsh "You will need to pass postgres jdbc jar for creating the database using spark sql." - how would I do this?
    – t_warsop
    Commented Oct 14, 2019 at 9:09
  • @t_warsop : You will need to ssh to the endpoint, download the Postgres jar, and pass it with your spark-submit command. I couldn't figure out a better way for dev endpoints. Commented Oct 16, 2019 at 5:39
  • @mcm : You can use Spark's SQLContext to execute the CREATE command: sqlContext.sql(query). Commented Oct 16, 2019 at 5:40
