I have several CSV files (50 GB) in an S3 bucket on Amazon Web Services. I am trying to read these files in a Jupyter Notebook (with a Python 3 kernel) using the following code:
import boto3
from boto3 import session
import pandas as pd
session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3', config = boto3.session.Config(signature_version='XXXX'))
response = s3client.get_object(Bucket='myBucket', Key='myKey')
names = ['id','origin','name']
dataset = pd.read_csv(response['Body'], names=names)
dataset.head()
But I face the following error when I run the code:
ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>
I came across this bug report about pandas and boto3 objects not being compatible yet.
My question is: how else can I import these CSV files from my S3 bucket into my Jupyter Notebook, which runs in the cloud?
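For reference, one workaround I am considering is to read the streamed bytes into an in-memory buffer and hand that buffer to pandas instead of the `StreamingBody` itself. The sketch below is only that, a sketch: the helper names `read_csv_from_body` and `fetch_csv` are made up for illustration, `boto3.client('s3')` assumes the region and credentials come from the environment, and the whole object has to fit in memory, so it would only work one file at a time and only if each individual file is small enough.

```python
import io

import pandas as pd


def read_csv_from_body(body, names):
    """Parse CSV data from any object with a .read() method that returns bytes.

    Hypothetical helper: it copies the full payload into memory first,
    so it is not suitable for objects larger than available RAM.
    """
    buffer = io.BytesIO(body.read())  # seekable file-like object that pandas accepts
    return pd.read_csv(buffer, names=names)


def fetch_csv(bucket, key, names):
    """Download one S3 object and parse it as CSV (hypothetical helper)."""
    import boto3  # imported here so read_csv_from_body can be tried without AWS access

    s3client = boto3.client('s3')  # assumes region/credentials are configured in the environment
    response = s3client.get_object(Bucket=bucket, Key=key)
    return read_csv_from_body(response['Body'], names)
```

With the bucket and key from my code above, the call would be `dataset = fetch_csv('myBucket', 'myKey', ['id', 'origin', 'name'])`. I am not sure this is the right approach for data of this size, which is why I am asking.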