How to sort by key and value in mapreduce?

Question

I have a text file:

And I would like to get this file as an output of my MapReduce job:

It means it has to be sorted by 1st, 2nd and 3rd columns.

I use this command:

#!/bin/bash

IN_DIR="/user/cloudera/temp"
OUT_DIR="/user/cloudera/temp_out"
NUM_REDUCERS=1

hdfs dfs -rmr ${OUT_DIR} > /dev/null

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-D mapred.jab.name="Parsing mista pages job 1 (parsing)" \
-D stream.num.map.output.key.fields=3 \
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
-D mapreduce.partition.keycomparator.options='-k1,1n -k2,2n -k3,3n' \
-D mapreduce.job.reduces=${NUM_REDUCERS} \
-mapper 'cat' \
-reducer 'cat' \
-input ${IN_DIR} \
-output ${OUT_DIR}

hdfs dfs -cat ${OUT_DIR}/* | head -100

And get exactly what I want. BUT. When I do NUM_REDUCERS=2 I get this output:

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/temp_out/part-00000 | head -100
9   1   1   
10  9   45  
10  12  30  
10  15  55  
12  7   18  
12  10  1   
14  5   5

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/temp_out/part-00001 | head -100
9   0   1   
9   2   1   
10  1   15  
10  9   40  
12  9   0

Why partitioner splits my data with same keys (for example '9') to different reducers?

How can I force partitioner to split Mapper output by the key and sort it by value. For example, if I have 4 reducers the reducers input should be:

HbnKing · Accepted Answer · 2018-01-29 11:23:43Z

0

you can overwrite the default Partioner to put each key into diferent reduce .Set the same Nums of reduce . let each reduce to deal with only one key .

for example()

groupMap.put("9", 0);
groupMap.put("10", 1);
groupMap.put("12", 2);
groupMap.put("14", 3);

Add -partitioner argument to use your own partition in your job. I think it might works for you

answered Jan 29, 2018 at 11:23

HbnKing

1,8821 gold badge15 silver badges25 bronze badges

Add a comment |

Collectives™ on Stack Overflow

How to sort by key and value in mapreduce?

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged
sorting
hadoop
mapreduce
hadoop-streaming
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged sortinghadoopmapreducehadoop-streaming or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
sorting
hadoop
mapreduce
hadoop-streaming
or ask your own question.