
I have several CSV files (about 50 GB in total) in an S3 bucket on AWS. I am trying to read these files in a Jupyter Notebook (Python 3 kernel) using the following code:

import boto3
import pandas as pd

session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3', config=boto3.session.Config(signature_version='XXXX'))
response = s3client.get_object(Bucket='myBucket', Key='myKey')
response = s3client.get_object(Bucket='myBucket', Key='myKey')

names = ['id','origin','name']
dataset = pd.read_csv(response['Body'], names=names)
dataset.head() 

But I face the following error when I run the code:

ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>

I came across this bug report about pandas and the boto3 streaming object not being compatible yet.

My question is: how else can I import these CSV files from my S3 bucket into my Jupyter Notebook, which also runs in the cloud?

2 Answers


You can also use s3fs, which allows pandas to read directly from S3:

import pandas as pd
import s3fs  # must be installed; pandas uses it under the hood for s3:// paths

# csv file
df = pd.read_csv('s3://{bucket_name}/{path_to_file}')

# parquet file
df = pd.read_parquet('s3://{bucket_name}/{path_to_file}')
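If the notebook does not pick up your AWS credentials automatically (for example from an attached instance role), recent pandas versions (1.2+) also accept a storage_options argument that is forwarded to s3fs. A minimal sketch, with placeholder bucket, path, and credential values:

import pandas as pd

# '{bucket_name}', '{path_to_file}' and the credential strings are placeholders
df = pd.read_csv(
    's3://{bucket_name}/{path_to_file}',
    storage_options={'key': 'YOUR_ACCESS_KEY_ID', 'secret': 'YOUR_SECRET_ACCESS_KEY'},
)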

And then if you have multiple files in a bucket, you can iterate through them like so:

import boto3
import pandas as pd

bucket_name = '{bucket_name}'  # replace with your bucket name
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(name=bucket_name)
for file in bucket.objects.all():
    # do what you want with the files
    # for example:
    if 'filter' in file.key:
        print(file.key)
        new_df = pd.read_csv('s3://{}/{}'.format(bucket_name, file.key))
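Since the question is about loading several CSV files, a natural follow-up is to stack the matching objects into a single DataFrame. A minimal sketch, assuming a hypothetical bucket name and that every matching key is a CSV small enough to concatenate in memory:

import boto3
import pandas as pd

bucket_name = 'myBucket'  # hypothetical bucket name
bucket = boto3.resource('s3').Bucket(name=bucket_name)

# read every CSV object in the bucket and concatenate the results
frames = [
    pd.read_csv('s3://{}/{}'.format(bucket_name, obj.key))
    for obj in bucket.objects.all()
    if obj.key.endswith('.csv')
]
dataset = pd.concat(frames, ignore_index=True)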

I am posting the fix to my problem, in case somebody needs it. I replaced the read_csv line with the following and the problem was solved:

import io  # needed for BytesIO
dataset = pd.read_csv(io.BytesIO(response['Body'].read()), encoding='utf8')
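For completeness, here is how that fix can look in the context of the original snippet. Because the files are large, reading in chunks keeps the DataFrame construction incremental, although the object body itself is still read fully into memory; the column names, placeholder region, and chunk size below are just illustrative:

import io
import boto3
import pandas as pd

# s3client set up as in the question; 'XXXX' is a placeholder region
session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3')

response = s3client.get_object(Bucket='myBucket', Key='myKey')
buffer = io.BytesIO(response['Body'].read())  # whole object body ends up in memory

names = ['id', 'origin', 'name']
# chunked read builds the DataFrame piece by piece; chunk size is illustrative
chunks = pd.read_csv(buffer, names=names, encoding='utf8', chunksize=1_000_000)
dataset = pd.concat(chunks, ignore_index=True)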
