All Questions

0 votes
1 answer
181 views

GCS Connector on EMR failing with java.lang.ClassNotFoundException

I created an EMR cluster following the instructions provided here for connecting to GCS, and I am running the hadoop distcp command. It keeps failing with the following error: 2023-07-...
Abhinav Rai's user avatar
-1 votes
2 answers
2k views

Copying data quickly from one S3 bucket to another S3 bucket in a different account, using only the access_id and secret_access_key credentials of both

I have the access_key and access_id for two AWS buckets that belong to different accounts. I have to copy data from one location to the other. Is there a way to do it faster? I have tried map-reduced-based ...
lifeisshubh's user avatar
1 vote
0 answers
701 views

"Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4" when copying data from HDFS to S3

I am trying to use distcp to copy data from HDFS to S3, but I got an error: Caused by: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: ...
michelle's user avatar
  • 197
1 vote
2 answers
1k views

Is it possible to specify the number of mappers-reducers while using s3-dist-cp?

I'm trying to copy data from an EMR cluster to S3 using s3-distcp. Can I set the number of reducers to a value greater than the default to speed up the process?
Kshitij Kohli's user avatar
1 vote
0 answers
49 views

Distcp to S3 OK but can't list

Basically I'm using the distcp command to put some data in S3, pretty small files. I am sending the data into a bucket, and I put the files inside a folder of the bucket. It works fine and I can ...
DDDDEEEEXXXX's user avatar
1 vote
1 answer
990 views

AWS file upload

I want to upload a few files into an AWS bucket from Hadoop. I have the AWS ACCESS KEY, SECRET KEY and S3 IMPORT PATH. I am not able to access it through the AWS CLI command. I set the keys in the aws credential file....
akr's user avatar
  • 43
0 votes
1 answer
449 views

oozie distcp s3 to s3 copy invalid arguments org.jets3t.service.impl.rest.HttpException

I have a distcp action as follows <action name="ExecuteDataCopyS3ToHDFS"> <distcp xmlns="uri:oozie:distcp-action:0.2"> <arg>-Dmapred.job.queue.name=dev</arg> ...
Himateja Madala's user avatar
1 vote
0 answers
213 views

distcp: How to avoid flattening a directory that contains only one file when copying from HDFS to S3

Currently my HDFS structure is: /data/xxx/xxx/2014 /data/xxx/xxx/2015 /data/xxx/xxx/2016, with two files under 2015, two under 2016, and only one file in 2014. I use this command to copy them separately: ...
viva's user avatar
  • 13
1 vote
1 answer
701 views

AWS instance distcp to s3 - Access keys

If I have an EC2 instance created with a role, what is the best practice way to get access keys to do a distcp from hdfs to s3? I don't want to be sending access keys to the instance using our ...
0 votes
1 answer
644 views

Can you use s3distcp with gzipped input?

I'm trying to use s3distcp to copy a lot of small gzipped files which unfortunately don't end in a gz extension. s3distcp has an outputCodec argument that can be used to zip the output, but ...
maxymoo's user avatar
  • 36.4k
2 votes
0 answers
462 views

Download large volumes from S3 to Local Machine? - s3distcp

Currently using distcp is slow, taking up to 4:16 minutes to copy 1 hour's worth of logs, while a custom function written by me only takes 16 seconds. Given that Amazon provides s3distcp examples ...
ylun's user avatar
  • 2,534
0 votes
2 answers
5k views

Multiple source files for s3distcp

Is there a way to copy a list of files from S3 to HDFS, instead of a complete folder, using s3distcp? This is for cases where srcPattern cannot work. I have multiple files in an S3 folder all having different ...
its me's user avatar
  • 127
0 votes
1 answer
1k views

How to copy from subdirectories using s3DistCp

I'm trying to use s3DistCp to copy from s3://my-bucket/dir1/, s3://my-bucket/dir2, and s3://my-bucket/dir3. All three directories have some files in them. I wanted to do something like: hadoop jar s3distcp.jar --...
yunt's user avatar
  • 1
3 votes
0 answers
313 views

distcp s3 instance profile temporary credentials

I'm using distcp on my Hadoop cluster in AWS. Now we are switching over to use IAM roles for the cluster nodes. A solution I was going to try was to add my own implementation of org.apache.hadoop.fs....
Elan H's user avatar
  • 33