All Questions
Tagged with distcp amazon-web-services
14 questions
0 votes, 1 answer, 181 views
GCS Connector on EMR failing with java.lang.ClassNotFoundException
I created an EMR cluster following the instructions provided here for connecting to GCS, and I keep running the hadoop distcp command.
It keeps failing with the following error:
2023-07-...
-1 votes, 2 answers, 2k views
Quickly copying data from one S3 bucket to another in a different account, using only the access_id and secret_access_key credentials of both
I have the access_key and access_id for two AWS buckets that belong to different accounts. I have to copy data from one location to the other; is there a way to do it faster?
I have tried MapReduce-based ...
1 vote, 0 answers, 701 views
"Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4" when copying data from HDFS to S3
I am trying to use distcp to copy data from HDFS to S3, but I got an error:
Caused by: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: ...
1 vote, 2 answers, 1k views
Is it possible to specify the number of mappers and reducers when using s3-dist-cp?
I'm trying to copy data from an EMR cluster to S3 using s3-distcp. Can I set the number of reducers to a value greater than the default to speed up the process?
1 vote, 0 answers, 49 views
Distcp to S3 OK but can't list
Basically I'm using the distcp command to put some data in S3, pretty small files.
I am sending the data into a bucket, and I put the files inside a folder in the bucket.
It works fine and I can ...
1 vote, 1 answer, 990 views
AWS file upload
I want to upload a few files into an AWS bucket from Hadoop. I have the
AWS ACCESS KEY, SECRET KEY and S3 IMPORT PATH.
I am not able to access it through the AWS CLI command.
I set the keys in the AWS credentials file....
0 votes, 1 answer, 449 views
Oozie distcp S3-to-S3 copy fails with invalid arguments: org.jets3t.service.impl.rest.HttpException
I have a distcp action as follows:
<action name="ExecuteDataCopyS3ToHDFS">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<arg>-Dmapred.job.queue.name=dev</arg>
...
1 vote, 0 answers, 213 views
distcp: How to avoid flattening a directory that contains only one file when copying from HDFS to S3
Currently my HDFS structure is:
/data/xxx/xxx/2014
/data/xxx/xxx/2015
/data/xxx/xxx/2016
There are two files under 2015, two under 2016, and only one file in 2014.
I use this command to copy them separately:
...
1 vote, 1 answer, 701 views
AWS instance distcp to s3 - Access keys
If I have an EC2 instance created with a role, what is the best-practice way to get access keys to do a distcp from HDFS to S3?
I don't want to be sending access keys to the instance using our ...
0 votes, 1 answer, 644 views
Can you use s3distcp with gzipped input?
I'm trying to use s3distcp to copy a lot of small gzipped files which unfortunately don't end in a gz extension. s3distcp has an outputCodec argument that can be used to zip the output, but ...
2 votes, 0 answers, 462 views
Download large volumes from S3 to Local Machine? - s3distcp
Currently using distcp is slow, taking up to 4:16 minutes to copy 1 hour's worth of logs, while a custom function I wrote takes only 16 seconds. Given that Amazon provides s3distcp examples ...
0 votes, 2 answers, 5k views
Multiple source files for s3distcp
Is there a way to copy a list of files from S3 to HDFS, instead of a complete folder, using s3distcp? This is for cases where srcPattern cannot work.
I have multiple files in an S3 folder all having different ...
0 votes, 1 answer, 1k views
How to copy from subdirectories using s3DistCp
I'm trying to use s3DistCp to copy from s3://my-bucket/dir1/, s3://my-bucket/dir2, s3://my-bucket/dir3.
All three directories have some files in them. I wanted to do something like:
hadoop jar s3distcp.jar --...
3 votes, 0 answers, 313 views
distcp s3 instance profile temporary credentials
I'm using distcp on my Hadoop cluster in AWS. Now we are switching over to use IAM roles for the cluster nodes. A solution I was going to try was to add my own implementation of org.apache.hadoop.fs....