All Questions
9 questions
1
vote
1
answer
285
views
-Dmapred.job.name does not work with s3-dist-cp command
I'd like to copy some files from EMR HDFS to an S3 bucket using s3-dist-cp. I've tried this command from the EMR master node:
s3-dist-cp -Dmapred.job.name=my_copy_job --src hdfs:///user/hadoop/abc s3://...
4
votes
1
answer
2k
views
Hadoop Distcp - increasing distcp.dynamic.max.chunks.tolerable config and tuning distcp
I am trying to move data between two Hadoop clusters using distcp. There is a lot of data to move, with a large number of small files. To make it faster, I tried using -strategy dynamic, which ...
0
votes
0
answers
45
views
How can I transfer data after processing to another cluster using MapReduce?
I am new to Hadoop. I want to write a single MR job which does some processing of the data and moves the result to another cluster. I am aware I can simply change the destination within the driver ...
0
votes
1
answer
488
views
MapReduce job in Java for DistCp
I am trying to copy data from one cluster to another on a daily basis. I searched a lot, but everybody suggests calling the main function of DistCp with args. I was writing Java code for the same. But its ...
0
votes
1
answer
4k
views
Number of mappers while doing distcp
How can I set the number of mappers for a distcp job? I know that we can set the maximum number of mappers with hadoop distcp -m. But is it possible to set the number instead of the maximum number of ...
0
votes
1
answer
2k
views
Not able to copy data from one HDFS location to another using distcp
I am trying to copy data from one HDFS location to another.
I am able to achieve this using the "distcp" command:
hadoop distcp hdfs://mySrcip:8020/copyDev/* hdfs://myDestip:8020/copyTest
But I want ...
0
votes
1
answer
608
views
How do I determine if a call to distcp2 was successful?
The best advice I could find online is that you should either compare the files after transfer or make a second run with -update, and the second is considered unreliable.
Is there a way of ...
0
votes
2
answers
1k
views
hadoop distcp not working, MR job in accepted state
I am trying to copy data from a CDH4 cluster to a CDH5 cluster. When I submit the distcp job from CDH5, the MR job goes to the accepted state and stays there (I have tried it multiple times, it stayed there for more ...
5
votes
1
answer
1k
views
Hadoop DistCp handle same file name by renaming
Is there any way to run DistCp, but with an option to rename on file name collisions? Maybe it's easiest to explain with an example.
Let's say I'm copying hdfs:///foo to hdfs:///bar, and foo ...