Newest 'hadoop+mapreduce+sorting' Questions

1 vote

0 answers

46 views

Sort compressed data in hadoop

So here is the situation: I want to see if applying sort (from the MapReduce examples) to a compressed file is more efficient than applying it to the original file. To run this experiment, I first launch ...

wilcoln

73

asked Dec 23, 2019 at 11:42

0 votes

0 answers

180 views

How sorting works on Partitioned skewed data set in Hive

I have a dataset with the hierarchy below in Hive; the size of the dataset is in TBs. -Country -Year -In_stock -Zone -trans_dt I need to sort trans_dt in ascending order within Zone (one of ...

nilesh1212

1,655

asked Aug 14, 2019 at 14:36

0 votes

1 answer

3k views

How to sort values (with their corresponding key) in mapReduce Hadoop framework?

I am trying to sort the input data I have using Hadoop mapReduce. The problem is that I am only able to sort the key-value pairs by key, while I am trying to sort them by value. Each value's key was ...

Sara

11

asked Apr 3, 2019 at 11:31

0 votes

1 answer

520 views

How to sort a custom writable type in Hadoop

I have a custom type which contains fields of Hadoop native types (e.g. Text and IntWritable) and need to use it as a key and sort as I want during the shuffle/sort phase. There are similar questions ...

Serob_b

1,059

asked Mar 13, 2019 at 1:27

0 votes

1 answer

1k views

How to find top 10 elements in MapReduce

I am trying to write a Python MapReduce job on some datasets I have to find certain statistics. This is an example of the input data and the form it comes in: exchange, stock_symbol, date, ...

faboys

57

asked Nov 1, 2018 at 18:34
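
One common way to approach a top-N question like the one above is to let each map task keep only its local top N and have a single reducer merge those partial lists. The following is a minimal in-memory sketch of that idea, not a Hadoop Streaming job; the `(symbol, value)` record shape and the helper names `local_top_k` and `global_top_k` are assumptions made for illustration, since the excerpt's input format is truncated.

```python
import heapq

def local_top_k(records, k):
    # What each map task would keep: only its k largest records by value.
    return heapq.nlargest(k, records, key=lambda r: r[1])

def global_top_k(partials, k):
    # What a single reducer would do: merge the partial lists and cut to k again.
    merged = [record for part in partials for record in part]
    return heapq.nlargest(k, merged, key=lambda r: r[1])

# Two hypothetical input splits of (symbol, value) pairs.
split_a = [("AAA", 5), ("BBB", 9), ("CCC", 1)]
split_b = [("DDD", 7), ("EEE", 3), ("FFF", 8)]

top2 = global_top_k([local_top_k(split_a, 2), local_top_k(split_b, 2)], 2)
print(top2)
```

Because every split forwards at most k records, the reducer sees at most k times the number of splits, regardless of input size.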

1 vote

0 answers

1k views

How to make Group By and sort in Python for mapreducer in hadoop

I have a dataset with 100k rows and 17 columns. I would like to know how to group by and sort in a Hadoop MapReduce job using Python. Here is my mapper.py: #!/usr/bin/python import sys for line in sys.stdin:...

LonelyToh

15

asked Sep 12, 2018 at 12:32

1 vote

1 answer

2k views

How to sort by key and value in mapreduce?

I have a text file: 10 1 15 10 12 30 10 9 45 10 8 40 10 15 55 12 9 0 12 7 18 12 10 1 9 1 1 9 2 1 9 0 1 14 5 5 And I would like to get this file as an output of my ...

Ildar Gabdrakhmanov

185

asked Jan 25, 2018 at 8:02

0 votes

1 answer

7k views

MapReduce sort by value in descending order

I'm trying to write in pseudo code a MapReduce task that returns the items sorted in descending order. For example: for the wordcount task, instead of getting: apple 1 banana 3 mango 2 I want the ...

Shani Gamrian

345

asked Jun 21, 2017 at 15:06
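
A frequently suggested pattern for this kind of question is a second pass whose mapper swaps key and value, so that the framework's key sort orders records by count, combined with a descending comparator. The sketch below only simulates that flow in plain Python under those assumptions; `swap_mapper`, `shuffle_sort_descending`, and `identity_reducer` are hypothetical names, not Hadoop APIs.

```python
def swap_mapper(records):
    # Emit (count, word) so that the sort key is the count.
    for word, count in records:
        yield count, word

def shuffle_sort_descending(pairs):
    # Stand-in for the framework's key sort with a descending comparator.
    return sorted(pairs, key=lambda kv: kv[0], reverse=True)

def identity_reducer(sorted_pairs):
    # Swap back to (word, count) for the final output.
    return [(word, count) for count, word in sorted_pairs]

# Word-count output taken from the example in the question.
word_counts = [("apple", 1), ("banana", 3), ("mango", 2)]
result = identity_reducer(shuffle_sort_descending(swap_mapper(word_counts)))
print(result)
```

With more than one reducer, each output file would be sorted only within itself, so a total order would additionally need a single reducer or a range partitioner.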

3 votes

1 answer

475 views

Do we really need sorting in the MapReduce framework?

I am completely new to MapReduce and just can't get my mind around the need to sort the mapper output according to the keys in each partition. Eventually all we want is that a reducer is fed a ...

hesk

327

asked Jun 3, 2017 at 13:59

1 vote

0 answers

202 views

Multiple column sorting hadoop streaming (EMR)

I'm trying to sort differently on each column of the mapper output. My output looks like this: xx yy 2 4 xx yy 1 5 xx yy 5 39 xx yy 8 3 So the first 2 columns are text and the last 2 columns are ...

refaelos

8,055

asked Dec 21, 2016 at 16:41

1 vote

2 answers

3k views

Searching between dates in Hbase

I have an HBase table with rowKeys as such (delimiter = '#') 0CE5C485#1481400000#A#B#C#T 00C6F485#1481600000#F#J#C#G 065ED485#1481500000#T#X#C#G ... ... The first part is actually the hex of the ...

Huga

571

asked Dec 10, 2016 at 21:14

3 votes

0 answers

1k views

Hadoop Mapreduce Multiple Reducer Sorting

I am using Hadoop Mapreduce to sort a large document and using the KeyFieldBasedPartitioner to partition different inputs to different reducers. The idea I have to solve this problem is to have the ...

user4599213

asked Oct 9, 2016 at 21:04

0 votes

1 answer

604 views

Hadoop Map Reduce - how to separate grouping from sorting?

Just getting started writing Hadoop MR jobs. Hopefully we'll be switching over to Spark soon, but we're stuck doing MR for now. I'd like to group records by a hash of their value. But I'd like to ...

medloh

969

asked Jun 20, 2016 at 23:03

1 vote

2 answers

1k views

how to sort the output of a map side program in mapreduce?

My question is about how I can sort the output of a mapper in a MapReduce program (PS: there are no reducers (0)). I use just the map side to filter two inputs and I want the result (output mappers) ...

Zoro4246

73

asked May 17, 2016 at 15:34

0 votes

1 answer

1k views

MapReduce Sort By Python Tuples Numerically

I'm working with Python tuples and have a text file that looks like (1,value1) (2,value2) (3,value3) ... (100,value100) How can I configure my MapReduce job to sort by the first key in the tuple as ...

Jack

538

asked May 1, 2016 at 21:57

0 votes

1 answer

44 views

How to dedupe a file and maintain original sort order in Hive?

My data is already sorted by descending last_column and descending third_column. I want to de-duplicate the data set based on last_column while maintaining the original sort order. So for each ...

Utsav Chatterjee

181

asked Mar 10, 2016 at 21:38

0 votes

1 answer

3k views

How to sort a column in data set in descending order using Java Hadoop map reduce?

My data file is: Utsav Chatterjee Dangerous Soccer Coldplay 4 Rodney Purtle Awesome Football Maroon5 3 Michael Gross Amazing Basketball Iron Maiden 6 Emmanuel Ezeigwe Cool ...

Utsav Chatterjee

181

asked Mar 5, 2016 at 20:48

3 votes

1 answer

2k views

How to sort comma separated keys in Reducer output?

I am running an RFM Analysis program using MapReduce. The OutputKeyClass is Text.class and I am emitting comma separated R (Recency), F (Frequency), M (Monetary) as the key from Reducer where R=...

Punit Naik

515

asked Feb 17, 2016 at 19:40

-1 votes

1 answer

469 views

Sort data Hadoop Mapreduce

I have the following algorithm that sorts data in alphabetical order: public void setup(Context context) throws IOException, InterruptedException { conf = context.getConfiguration(); ...

BigBosss

5

asked Dec 16, 2015 at 6:58

0 votes

2 answers

130 views

Sort Mapreduce dataset

I'm trying to run the following project to sort a dataset. But when I execute the command: Hadoop jar xx.jar /inputdir /output dir, I get the following error on the terminal: org.apache.hadoop.mapred....

BigBosss

5

asked Dec 5, 2015 at 16:12

1 vote

0 answers

74 views

Decide number of reducer in sort by statement in hive?

Do we have control over what data we can send to the reducer when doing a sort by? E.g., if you have data with 10 states (and data under each state) and you set the reducer to 6 and then you do a ...

Nikhil vyas

37

asked Nov 22, 2015 at 10:47

0 votes

1 answer

324 views

In a large MapReduce job with "X" mappers and "Y" reducers, how many distinct copy operations will there be in the sort/shuffle phase

As I understand it, there will be X + Y copy operations. Correct me if I'm wrong. Thanks.

Emmanuel Ramos

3

asked Oct 29, 2015 at 15:29
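
As a hedged aside on the count asked about above: in the usual description of the shuffle, each reducer fetches its own partition from every mapper's output, which points to X × Y copies rather than X + Y. The sketch below simply enumerates those mapper-to-reducer fetches; `count_copy_operations` is a hypothetical helper, not a Hadoop API, and it assumes every mapper produces a partition for every reducer.

```python
def count_copy_operations(num_mappers, num_reducers):
    # One fetch per (mapper, reducer) pair: reducer r pulls partition r
    # from the output of mapper m.
    fetches = [(m, r) for m in range(num_mappers) for r in range(num_reducers)]
    return len(fetches)

print(count_copy_operations(3, 2))
```

For 3 mappers and 2 reducers this enumerates 6 fetches, whereas X + Y would give 5.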

3 votes

1 answer

923 views

Hadoop - Properly sort by key and group by reducer

I have some data coming out of the reducer that looks like this: 9,2 3 5,7 2 2,3 0 1,5 3 6,3 0 4,2 2 7,1 1 And I would like to sort them according to the number on the second ...

Robin Dupont

339

asked Oct 25, 2015 at 4:23

-1 votes

1 answer

159 views

Join and sort Dataset Hadoop

I'm working on a project on Hadoop using MapReduce (I have 2 datasets, KDD and DARPAA) and I'm looking for an algorithm which can group and sort those datasets in one file. The two datasets have this format: @...

BigBosss

5

asked Oct 21, 2015 at 13:05

1 vote

2 answers

269 views

Why mapreduce secondary sorting is not on composite key's compareTo()?

To perform secondary sort, we have to create a composite key which extends WritableComparable interface and implements compareTo(). In the "Hadoop: The Definitive Guide" book and almost all the blogs ...

K246

1,107

asked Oct 7, 2015 at 14:36

0 votes

0 answers

58 views

What sorting algorithm does mapreduce use in Hadoop? Can I change it? [duplicate]

At first I parsed my .jar file containing the program of WordCount along with input and output destination in the command line. After the completion of my job, I saw the content of my output file: it ...

Udit Solanki

531

asked Jul 17, 2015 at 6:41

1 vote

1 answer

169 views

Custom SortComparator not working in MapReduce wordcount program

I am trying to understand how MapReduce Sorts the Map output keys and what is the sort algorithm which it uses. I have a text file like this a b e f c b how it performs the sorting with these keys. ...

user4498972

asked Jun 4, 2015 at 10:13

3 votes

3 answers

2k views

NullPointerException in MapReduce Sorting Program

I know that SortComparator is used to sort the map output by their keys. I have written a custom SortComparator to understand the MapReduce framework better. This is my WordCount class with custom ...

user4498972

asked Jun 2, 2015 at 5:23

3 votes

3 answers

2k views

MapReduce output key in ascending order

I have written a MapReduce code for which both keys and values are integers. I am using a single Reducer. The output is like this: Key Value 1 78 128 12 174 26 2 44 2957 123 975 ...

MChirukuri

610

asked May 29, 2015 at 11:23

0 votes

4 answers

2k views

Sorted Hadoop WordCount Java

I am running the WordCount program of Hadoop in Java and my first job (getting all the words and their count) works fine. However, I come across a problem when I'm doing the second job, which should sort ...

Melanie Journe

1,369

asked May 1, 2015 at 15:18

1 vote

1 answer

1k views

What is the hadoop sort comparator class for?

I'm implementing the Hadoop sort comparator class for sorting my key. I know that it is used to compare every key. But I don't know how it works in detail. Is it true that it is used to compare? ...

Kenny Basuki

735

asked Apr 21, 2015 at 10:53

0 votes

1 answer

783 views

Sort in mapreduce

I am learning hadoop mapreducing. I am trying to sort (by value) using mapreduce. Below is my code for the mapper: static String splitChar = "\t"; static int colIndexone = 0; static int colIndextwo = ...

Leo

5,225

asked Feb 25, 2015 at 7:11

0 votes

1 answer

367 views

hadoop partitioner not working

public class Partitioner_2 implements Partitioner<Text,Text>{ @Override public int getPartition(Text key, Text value, int numPartitions) { int ...

Nikhil

545

asked Oct 26, 2014 at 20:29

0 votes

1 answer

148 views

Hadoop sort phase taking hours

I started using Hadoop a week ago. After successfully running the examples, I wrote a MapReduce job to find the most used word using the WordCount example. I'm trying to run this job with 500 MB of data. ...

Ludovic S

195

asked Oct 26, 2014 at 16:22

1 vote

0 answers

48 views

What's the fastest approach to merging a small number of large, already sorted lists in Hadoop?

I've got a small Hadoop (CDH5.1.0, MRv2/YARN) cluster (5x nodes 4CPU, 16GB RAM, 600GB disk) which contains a small number (~30) of ~15GB SequenceFiles. The SequenceFiles contain pairs of ...

growse

3,732

asked Oct 12, 2014 at 11:13

1 vote

1 answer

803 views

Is data inside mapreduce partitions sorted, if yes, how does it happen?

Is data inside mapreduce partitions sorted, if yes, how? AFAIK, it is grouped on the basis of the key. If it internally sorts, wouldn't it be an overhead to sort all the data inside all the partitions?...

MohitS

21

asked Sep 25, 2014 at 11:36

1 vote

0 answers

612 views

Hadoop MapReduce secondary sort: Reducer not getting called

I am trying to do a secondary sort on 4 values in my output. I referred to this tutorial. I have a 4 node cluster running Hadoop 2.2.0. I use Idea IntelliJ IDE for debugging locally. Following are ...

anixg33k

25

asked Aug 26, 2014 at 6:26

0 votes

1 answer

236 views

Error during benchmarking Sort in Hadoop2 - Partitions do not match

I am trying to benchmark Hadoop2 MapReduce framework. It is NOT TeraSort. But testmapredsort. step-1 Create random data: hadoop jar hadoop/ randomwriter -Dtest.randomwrite.bytes_per_map=100 -Dtest....

eagertoLearn

10.1k

asked Aug 18, 2014 at 18:25

1 vote

3 answers

2k views

TotalOrderPartitioner ignores partition file location

I was trying to do a simple sort example with TotalOrderPartitioner. The input is a sequence file with IntWritable as key and NullWritable as value. I want to sort based on key. The output of is a ...

Majid Azimi

5,745

asked Jul 31, 2014 at 5:51

6 votes

1 answer

3k views

In-depth understanding of internal working of map phase in a Map reduce job in hadoop?

I am reading Hadoop: The Definitive Guide, 3rd edition by Tom White. It is an excellent resource for understanding the internals of Hadoop, especially Map-Reduce which I am interested in. From the ...

brain storm

31.2k

asked Jul 23, 2014 at 18:13

2 votes

3 answers

3k views

top-k in mapreduce when k elements do not fit in memory

What would be an efficient MapReduce algorithm to find the top-k elements from a dataset, when k is too big to fit k elements in memory? I am talking about a dataset of millions of elements and k ...

vefthym

7,462

asked Jul 11, 2014 at 7:11

0 votes

1 answer

2k views

Python Hadoop streaming, secondary sorting issues

Hadoop newbie here. I have some user-events logs like this, with userid and timestamp both randomly ordered: userid timestamp serviceId aaa 2012-01-01 13:12:23 4 aaa 2012-01-01 12:...

xiaolong

3,637

asked Jun 26, 2014 at 17:34

1 vote

2 answers

622 views

Sorting Algorithm on hadoop framework

I have read a number of links on the internet. Here are a few links: link1, link2. But I am not able to understand what exactly they are doing. Can you please explain this algorithm in a simpler way? And, yes next ...

devsda

4,212

asked Jun 12, 2014 at 6:21

0 votes

1 answer

711 views

Hadoop: Secondary sort does not work

I have implemented an algorithm in Hadoop 1.2.1, where reducer code relies on the secondary sorting. However, when I run the algorithm one reducer receives sorted tuples, but the other does not. I've ...

Krle

70

asked Mar 21, 2014 at 15:41

0 votes

1 answer

4k views

Hadoop WordCount sorted by word occurrences

I need to run WordCount which will give me all the words and their occurrences, but sorted by the occurrences and not alphabetically. I understand that I need to create two jobs for this and run one ...

Pini Cheyni

5,419

asked Mar 10, 2014 at 11:06

145 votes

8 answers

125k views

What is the purpose of shuffling and sorting phase in the reducer in Map Reduce Programming?

In Map Reduce programming the reduce phase has shuffling, sorting and reduce as its sub-parts. Sorting is a costly affair. What is the purpose of shuffling and sorting phase in the reducer in Map ...

user1112259

asked Mar 3, 2014 at 8:10

1 vote

1 answer

168 views

Hadoop sorting issue (Alternate title: 1175 is not less than 119!)

I'm new to Hadoop and done with a typical "count the IP addresses in a log" exercise. Now I'm trying to sort the output by running a second MapReduce job immediately after the first. Almost everything ...

sjohnson

11

asked Feb 10, 2014 at 5:53
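
The title above is the classic symptom of comparing numbers as text: character by character, "1175" sorts before "119" because '7' is less than '9'. A minimal illustration follows, assuming (the excerpt does not confirm this) that the second job's keys are emitted as strings rather than as a numeric type.

```python
# Counts as they would appear if the key type were textual.
counts = ["1175", "119", "23"]

as_text = sorted(counts)              # lexicographic, character by character
as_numbers = sorted(counts, key=int)  # numeric comparison

print(as_text)
print(as_numbers)
```

In a Java job the analogous change would typically be a numeric key type or a comparator that parses the number; which one applies here depends on code that the excerpt does not show.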

2 votes

1 answer

2k views

What is the point of using a Partitioner for Secondary Sorting in MapReduce?

If you need to have the values sorted for a given key when passed to the reduce phase, such as for a moving average, or to mimic the LAG/LEAD Analytic functions in SQL, you need to implement a ...

Matthew Moisen

18.2k

asked Jan 23, 2014 at 23:13
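
One way to see the partitioner's role in secondary sort is to simulate the three steps in memory: partition on the natural key only, sort each partition on the composite (natural key, value), then group on the natural key. This is a rough sketch under those assumptions; the `partition` function is a deterministic stand-in for a hash partitioner, and the sample records are invented for illustration.

```python
from itertools import groupby

def partition(natural_key, num_reducers):
    # Stand-in for a hash partitioner that looks at the natural key only,
    # so all composite keys sharing it land on the same reducer.
    return sum(natural_key.encode()) % num_reducers

records = [("b", 3), ("a", 2), ("b", 1), ("a", 1), ("c", 5)]
num_reducers = 2

buckets = {r: [] for r in range(num_reducers)}
for key, value in records:
    buckets[partition(key, num_reducers)].append((key, value))

reducer_input = {}
for r, pairs in buckets.items():
    pairs.sort()  # composite sort: natural key first, then value
    reducer_input[r] = [
        (key, [value for _, value in group])  # grouping on the natural key
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ]

print(reducer_input)
```

If the partitioner hashed the whole composite key instead, ("b", 1) and ("b", 3) could be routed to different reducers, and no single reducer would see all values for "b" in order.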

1 vote

1 answer

5k views

Map-Reduce/Hadoop sort by integer value (using MRJob)

This is an MRJob implementation of a simple Map-Reduce sorting functionality. In beta.py: from mrjob.job import MRJob class Beta(MRJob): def mapper(self, _, line): """ """ ...

p0lAris

4,820

asked Nov 23, 2013 at 0:15

0 votes

1 answer

2k views

How to control the sort order of mapper result in mapreduce before being sent to reducer

Taking a slight variation of the word count example to explain what I am trying to do. I have 3 mappers each producing a complete word count result on 3 large input files. Let us say the output is: ...

user1967879

3

asked Oct 28, 2013 at 7:37
