All Questions

0 votes
1 answer

GCS Connector on EMR failing with java.lang.ClassNotFoundException

I have created an EMR cluster following the instructions provided here for creating a connection to GCS, and I have been running the hadoop distcp command, but it keeps failing with the following error: 2023-07-...
Abhinav Rai
-1 votes
2 answers

Copying data quickly from one S3 bucket to another S3 bucket in a different account, using only the access key ID and secret access key of each

I have an access_key and access_id for each of the two AWS buckets, which belong to different accounts. I have to copy data from one location to the other; is there a way to do it faster? I have tried map-reduce-based ...
lifeisshubh
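One approach worth trying, assuming the cluster uses the S3A connector on Hadoop 2.8 or later (the release line that, to my knowledge, added per-bucket configuration), is to give each bucket its own credentials through fs.s3a.bucket.<bucket>.access.key and fs.s3a.bucket.<bucket>.secret.key. A single distcp job can then read with one account's keys and write with the other's. The sketch below only assembles the command line; the bucket names, paths, and key values are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def distcp_cross_account_args(
    src: str,
    dst: str,
    creds: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Build a hadoop distcp argv that sets S3A credentials per bucket.

    creds maps a bucket name to an (access_key, secret_key) pair.
    """
    args = ["hadoop", "distcp"]
    for bucket, (access_key, secret_key) in sorted(creds.items()):
        # Generic -D options must come before distcp's own arguments.
        args += ["-D", f"fs.s3a.bucket.{bucket}.access.key={access_key}"]
        args += ["-D", f"fs.s3a.bucket.{bucket}.secret.key={secret_key}"]
    args += [src, dst]
    return args


if __name__ == "__main__":
    # Hypothetical buckets and placeholder keys; never hard-code real secrets.
    argv = distcp_cross_account_args(
        "s3a://source-bucket/data/",
        "s3a://dest-bucket/data/",
        {
            "source-bucket": ("SRC_ACCESS_KEY", "SRC_SECRET_KEY"),
            "dest-bucket": ("DST_ACCESS_KEY", "DST_SECRET_KEY"),
        },
    )
    print(" ".join(argv))
```

Keys passed with -D are visible in process listings and shell history. Storing them with hadoop credential create in a credential provider file and pointing hadoop.security.credential.provider.path at it keeps them off the command line.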
1 vote
0 answers

"Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4" when copying data from HDFS to S3

I am trying to use distcp to copy data from HDFS to S3, but I got an error: Caused by: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: ...
michelle · 197
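The trace names org.apache.hadoop.fs.s3.S3Exception and jets3t, i.e. the old s3:// connector. A likely way around it, assuming a Hadoop build that ships S3A, is to copy to an s3a:// URI instead: S3A talks to S3 through the AWS SDK rather than jets3t and can request SSE-KMS itself. The property names below are the ones used, to the best of my knowledge, from Hadoop 2.9 through 3.2; newer 3.x releases prefer fs.s3a.encryption.algorithm and fs.s3a.encryption.key. The HDFS path, bucket, and key ARN are hypothetical.

```python
from typing import List


def distcp_to_s3a_kms_args(src: str, dst: str, kms_key_arn: str) -> List[str]:
    """Build a hadoop distcp argv that writes to S3A with SSE-KMS.

    Uses the older fs.s3a.server-side-encryption.* property names.
    """
    if not dst.startswith("s3a://"):
        # The error came from the jets3t-based s3:// connector, so insist on S3A.
        raise ValueError(f"expected an s3a:// destination, got {dst!r}")
    return [
        "hadoop", "distcp",
        # Generic -D options go before distcp's own arguments.
        "-D", "fs.s3a.server-side-encryption-algorithm=SSE-KMS",
        "-D", f"fs.s3a.server-side-encryption.key={kms_key_arn}",
        src, dst,
    ]


if __name__ == "__main__":
    # Hypothetical paths and key ARN, for illustration only.
    print(" ".join(distcp_to_s3a_kms_args(
        "hdfs:///data/events",
        "s3a://example-bucket/events",
        "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    )))
```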
1 vote
2 answers

Is it possible to specify the number of mappers/reducers when using s3-dist-cp?

I'm trying to copy data from an EMR cluster to S3 using s3-distcp. Can I set the number of reducers higher than the default to speed up the process?
Kshitij Kohli
1 vote
0 answers

Distcp to S3 works, but I can't list the files

Basically, I'm using the distcp command to put some data, quite small files, into S3. I am sending the data to a bucket, and I put the files inside a folder of the bucket. It works fine and I can ...
DDDDEEEEXXXX
1 vote
1 answer

AWS file upload

I want to upload a few files into an AWS bucket from Hadoop. I have the AWS access key, secret key and S3 import path. I am not able to access it through the AWS CLI. I set the keys in the AWS credentials file....
akr · 43
0 votes
1 answer

Oozie distcp S3-to-S3 copy: invalid arguments

I have a distcp action as follows <action name="ExecuteDataCopyS3ToHDFS"> <distcp xmlns="uri:oozie:distcp-action:0.2"> <arg></arg> ...
Himateja Madala
1 vote
0 answers

distcp: how to avoid flattening a directory that contains only one file when copying from HDFS to S3

Currently my HDFS structure is: /data/xxx/xxx/2014, /data/xxx/xxx/2015 and /data/xxx/xxx/2016, with two files under 2015, two under 2016 and only one file in 2014. I use this command to copy them separately: ...
viva · 13
1 vote
1 answer

AWS instance distcp to s3 - Access keys

If I have an EC2 instance created with a role, what is the best-practice way to get access keys to run distcp from HDFS to S3? I don't want to be sending access keys to the instance using our ...
0 votes
1 answer

Can you use s3distcp with gzipped input?

I'm trying to use s3distcp to copy a lot of small gzipped files which unfortunately don't have a .gz extension. s3distcp has an outputCodec argument that can be used to compress the output, but ...
maxymoo · 36.4k
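Hadoop tools usually choose a decompression codec from the file-name suffix (that is how CompressionCodecFactory works), so gzipped files without a .gz name may be treated as plain bytes. Assuming s3distcp behaves the same way, one workaround is to detect gzip content by its two-byte magic number 1f 8b and add the suffix before copying. The sketch works on local files; renaming objects that are already in S3 would take a copy and delete rather than a local rename.

```python
import gzip
import tempfile
from pathlib import Path
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with path.open("rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def add_gz_suffix(directory: Path) -> List[Path]:
    """Rename gzipped files that lack a .gz suffix; return the new paths."""
    renamed = []
    # sorted() materializes the listing first, so renaming inside the loop is safe.
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.suffix != ".gz" and is_gzip(path):
            target = path.with_name(path.name + ".gz")
            path.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    # Hypothetical file names in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "part-0001").write_bytes(gzip.compress(b"hello\n"))
        (d / "notes.txt").write_text("plain text\n")
        print([p.name for p in add_gz_suffix(d)])  # ['part-0001.gz']
```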
2 votes
0 answers

Downloading large volumes from S3 to a local machine with s3distcp?

Currently distcp is slow, taking up to 4:16 minutes to copy one hour's worth of logs, while a custom function I wrote takes only 16 seconds. Given that Amazon provides s3distcp examples ...
ylun · 2,534
0 votes
2 answers

Multiple source files for s3distcp

Is there a way to copy a list of files, rather than a complete folder, from S3 to HDFS using s3distcp? This is for cases where srcPattern cannot work. I have multiple files in an S3 folder, all having different ...
its me · 127
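Plain Hadoop distcp (as opposed to s3distcp) has a -f <urilist_uri> option that reads the source paths from a file, one per line, which suits files whose names share no pattern. A minimal sketch, assuming the list is written locally and then uploaded somewhere the job can read it (typically HDFS); the upload step is left out, and all bucket names and paths are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_uri_list(uris: Iterable[str], list_path: Path) -> Path:
    """Write one source URI per line, the layout distcp -f expects."""
    list_path.write_text("".join(f"{uri}\n" for uri in uris))
    return list_path


def distcp_from_list_args(list_uri: str, dst: str) -> List[str]:
    """Build a hadoop distcp argv that copies every source named in list_uri."""
    return ["hadoop", "distcp", "-f", list_uri, dst]


if __name__ == "__main__":
    # Hypothetical object keys with no common naming pattern.
    sources = [
        "s3a://example-bucket/folder/report-a.csv",
        "s3a://example-bucket/folder/export_2016.json",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        listing = write_uri_list(sources, Path(tmp) / "srcs.txt")
        print(listing.read_text(), end="")
    # After uploading the list, e.g. to the hypothetical hdfs:///tmp/srcs.txt:
    print(" ".join(distcp_from_list_args("hdfs:///tmp/srcs.txt", "hdfs:///data/in")))
```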
0 votes
1 answer

How to copy from subdirectories using s3DistCp

I'm trying to use s3DistCp to copy from s3://my-bucket/dir1/, s3://my-bucket/dir2 and s3://my-bucket/dir3. All three directories have some files in them. I wanted to do something like: hadoop jar s3distcp.jar --...
yunt · 1
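s3distcp's --srcPattern takes a regular expression, so the three directories could be selected in one run with an alternation. This sketch assumes --src points at the bucket root and, as the .*-prefixed examples in the AWS documentation suggest, that the pattern has to match the whole path. It builds such a pattern and checks it with Python's re.fullmatch; Java's regex syntax agrees with Python's for this simple alternation. The destination path is hypothetical.

```python
import re
from typing import Iterable, List


def src_pattern_for_dirs(dirs: Iterable[str]) -> str:
    """Regex matching any path that sits under one of the given directories."""
    alternatives = "|".join(re.escape(d) for d in dirs)
    return f".*/({alternatives})/.*"


def s3distcp_args(src: str, dest: str, pattern: str) -> List[str]:
    """Options for s3-dist-cp; with `hadoop jar s3distcp.jar` they follow the jar name."""
    return ["s3-dist-cp", "--src", src, "--dest", dest, "--srcPattern", pattern]


if __name__ == "__main__":
    pattern = src_pattern_for_dirs(["dir1", "dir2", "dir3"])
    print(pattern)  # .*/(dir1|dir2|dir3)/.*
    for path in ["s3://my-bucket/dir1/a.log", "s3://my-bucket/dir4/b.log"]:
        print(path, bool(re.fullmatch(pattern, path)))
    # Hypothetical HDFS destination.
    print(s3distcp_args("s3://my-bucket/", "hdfs:///data/copied", pattern))
```

The leading .* also accepts a dir1 nested deeper in the bucket; put the bucket prefix in front of the group if that matters. On the shell, quote the pattern so the | and parentheses are passed through unchanged.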
3 votes
0 answers

distcp s3 instance profile temporary credentials

I'm using distcp on my Hadoop cluster in AWS. We are now switching over to IAM roles for the cluster nodes. One solution I was going to try was to add my own implementation of org.apache.hadoop.fs....
Elan H · 33
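A minimal sketch for the instance-role case, assuming the copy targets s3a:// URIs on a Hadoop release built against AWS SDK for Java v1. The SDK class com.amazonaws.auth.InstanceProfileCredentialsProvider fetches the role's temporary credentials from the EC2 instance metadata service and refreshes them before they expire, so no keys have to be sent to the instance and no custom provider needs to be written. In the releases I know of, S3A's default credential chain already includes an instance-profile provider; naming it in fs.s3a.aws.credentials.provider, as below, restricts S3A to that single source. The paths are hypothetical.

```python
from typing import List

# AWS SDK for Java v1 class; reads and refreshes the temporary credentials
# that EC2 exposes for the instance's IAM role.
INSTANCE_PROFILE_PROVIDER = "com.amazonaws.auth.InstanceProfileCredentialsProvider"


def distcp_with_instance_role_args(src: str, dst: str) -> List[str]:
    """Build a hadoop distcp argv that relies on the instance role, not keys."""
    return [
        "hadoop", "distcp",
        "-D", f"fs.s3a.aws.credentials.provider={INSTANCE_PROFILE_PROVIDER}",
        src, dst,
    ]


if __name__ == "__main__":
    # Hypothetical source and destination.
    print(" ".join(distcp_with_instance_role_args(
        "hdfs:///data/logs", "s3a://example-bucket/logs")))
```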