
I can easily get the bucket name from S3, but when I read the CSV file from S3 it gives an error every time.

import boto3
import pandas as pd

s3 = boto3.client('s3',
                  aws_access_key_id='yyyyyyyy',
                  aws_secret_access_key='xxxxxxxxxxx')
# Call S3 to list current buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Output:
s3-bucket-data

Then, to read the CSV file, I tried:

import pandas as pd
import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('s3-bucket-data')

fileName = "data.csv"

content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))

I'm getting this error:

boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request

How can I read the CSV from S3?


4 Answers


You can use the s3fs package. s3fs also supports AWS profiles in credentials files.

Here is an example (you don't have to chunk it, but I just had this example handy):

import os
import pandas as pd
import s3fs
import gzip

chunksize = 999999
usecols = ["Col1", "Col2"]

filename = 'some_csv_file.csv.gz'
s3_bucket_name = 'some_bucket_name'

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
s3f = s3fs.S3FileSystem(
    anon=False,
    key=AWS_KEY,
    secret=AWS_SECRET)

# or if you have a profile defined in credentials file:
#aws_shared_credentials_file = 'path/to/aws/credentials/file/'
#os.environ['AWS_SHARED_CREDENTIALS_FILE'] = aws_shared_credentials_file
#s3f = s3fs.S3FileSystem(
#    anon=False,
#    profile_name=s3_profile)

filepath = os.path.join(s3_bucket_name, filename)
with s3f.open(filepath, 'rb') as f:
    gz = gzip.GzipFile(fileobj=f)  # decompress the gzip stream

    chunks = pd.read_csv(gz,
                         usecols=usecols,
                         chunksize=chunksize,
                         iterator=True,
                         )

    # stack the chunks back together row-wise into a single DataFrame
    df = pd.concat(chunks, ignore_index=True)
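Note that once s3fs is installed, recent pandas versions can also read the object straight from an s3:// URL, so you don't have to open the file yourself. A minimal sketch, reusing the placeholder bucket and file names from above:

import pandas as pd

# pandas hands s3:// URLs off to s3fs when it is installed; credentials are
# picked up from the environment or ~/.aws/credentials
df = pd.read_csv('s3://some_bucket_name/some_csv_file.csv.gz',
                 compression='gzip', usecols=["Col1", "Col2"])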

boto is one thing I love when it comes to handling data on S3 with Python.

Install boto using pip install boto:

import boto
from boto.s3.key import Key

keyId ="your_aws_key_id"
sKeyId="your_aws_secret_key_id"
srcFileName="abc.txt" # filename on S3
destFileName="s3_abc.txt" # output file name
bucketName="mybucket001" # S3 bucket name 

conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)

#Get the Key object of the given key, in the bucket
k = Key(bucket,srcFileName)

#Get the contents of the key into a file 
k.get_contents_to_filename(destFileName)
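Since the goal here is a DataFrame, once the object has been downloaded you can read the local file with pandas (assuming the downloaded file is CSV-formatted):

import pandas as pd

# destFileName is the local path written by get_contents_to_filename above
df = pd.read_csv(destFileName)
print(df.head())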

I experienced this issue in a few AWS regions. I created a bucket in "us-east-1" and the following code worked fine:

import boto
from boto.s3.key import Key
import StringIO
import pandas as pd
keyId ="xxxxxxxxxxxxxxxxxx"
sKeyId="yyyyyyyyyyyyyyyyyy"
srcFileName="zzzzz.csv"
bucketName="elasticbeanstalk-us-east-1-aaaaaaaaaaaa"

conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)
k = Key(bucket,srcFileName)
content = k.get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))

Try creating a new bucket in us-east-1 and see if it works.
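The likely root cause is that regions launched after early 2014 accept only Signature Version 4 requests, while old boto connections default to the older signature scheme, hence the 400 Bad Request. Rather than recreating the bucket, you can also try connecting to the bucket's own region; a sketch, where the region name is a placeholder (for SigV4-only regions you may additionally need use-sigv4 = True under [s3] in your boto config):

import boto.s3

# connect_to_region binds the connection to that region's endpoint,
# which avoids the signature/region mismatch behind the 400 error
conn = boto.s3.connect_to_region(
    'eu-central-1',  # placeholder: use the region your bucket lives in
    aws_access_key_id=keyId,
    aws_secret_access_key=sKeyId,
)
bucket = conn.get_bucket(bucketName)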


Try the following:

import boto3
import pandas as pd
import io

session = boto3.session.Session(region_name='XXXX')
s3client = session.client(
    's3',
    config=boto3.session.Config(signature_version='XXXX'))
response = s3client.get_object(Bucket='myBucket', Key='myKey')

dataset = pd.read_csv(io.BytesIO(response['Body'].read()), encoding='utf8')
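The two placeholders are usually what fixes the 400 error: set region_name to the bucket's region and signature_version to 's3v4' for newer regions. An illustrative sketch using the bucket and file names from the question; note that response['Body'] is file-like, so pandas can also read it directly:

session = boto3.session.Session(region_name='us-east-1')
s3client = session.client(
    's3',
    config=boto3.session.Config(signature_version='s3v4'))
response = s3client.get_object(Bucket='s3-bucket-data', Key='data.csv')

# response['Body'] is a streaming, file-like object, so pandas can
# consume it without buffering the whole payload into memory first
dataset = pd.read_csv(response['Body'])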
  • The response I get is encrypted. How do I decrypt and read the data?
    – Hyder Tom
    Commented Sep 2, 2018 at 16:15
  • Could you illustrate what you mean by encrypted (e.g., share the response you are getting)? Is the file you are trying to read from S3 encrypted?
    – sepideh
    Commented Sep 3, 2018 at 15:39
