0

I'm having a hard time understanding the flow of the nextKeyValue() method explained at the link below:

http://analyticspro.org/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/

especially the for loop in nextKeyValue().

Any help would be appreciated.

Thanks in advance

3 Answers

1

nextKeyValue() is the core method that sets the key/value pair for each map() call. In the code from your link, the part before the for loop sets the key to pos, which is the current start offset (key.set(pos)), and clears the previously set value (value.clear()). The corresponding code:

public boolean nextKeyValue() throws IOException, InterruptedException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    value.clear();
    final Text endline = new Text("\n");
    int newSize = 0;

Next comes the for loop. I have added a comment for each step.

        // This record reader combines NLINESTOPROCESS (3) lines into one value,
        // so the loop runs once for each of those lines.
        for (int i = 0; i < NLINESTOPROCESS; i++) {
            Text v = new Text();
            // pos is the current offset and end is the end offset of this split.
            // The condition keeps the reader from running into the next split.
            while (pos < end) {
                // LineReader.readLine() reads up to the next newline (the default delimiter for
                // TextInputFormat), stores the line in v and returns the number of bytes consumed.
                // maxLineLength (Integer.MAX_VALUE by default) caps how much of a line is stored.
                newSize = in.readLine(v, maxLineLength, Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
                // Append the line to value; append is needed because 3 lines are combined.
                value.append(v.getBytes(), 0, v.getLength());
                // Append a newline after each line read.
                value.append(endline.getBytes(), 0, endline.getLength());
                if (newSize == 0) {
                    break; // Nothing left to read.
                }
                pos += newSize;
                // Fewer than maxLineLength bytes consumed means a complete line was read, so leave
                // the while loop. Otherwise the line was too long and the loop reads again.
                if (newSize < maxLineLength) {
                    break;
                }
            }
        }
        if (newSize == 0) {
            // Nothing was read: reset key and value and return false so that map() is not called again.
            key = null;
            value = null;
            return false;
        } else {
            // A record is ready: return true so that it is passed to map().
            return true;
        }
    }
}
0

Each mapper calls nextKeyValue() to iterate over all the records in its split.

The class NLinesRecordReader defines each record as 3 lines:

private final int NLINESTOPROCESS = 3;

The main role of the loop in nextKeyValue() is to collect 3 lines for each record. That record is then passed as the input value to the map() method.
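To make the grouping concrete, here is a minimal sketch that uses only the Java standard library instead of Hadoop. The class name NLinesSketch is made up for illustration, a plain String stands in for the input split, and offsets are counted in characters rather than bytes, so treat it as an approximation of the idea and not as the code from the link.

```java
// Hypothetical, Hadoop-free illustration of "N lines per record".
class NLinesSketch {
    static final int NLINESTOPROCESS = 3;

    private final String data;   // stands in for the input split
    private int pos = 0;         // offset of the next unread character
    private long key;            // offset at which the current record starts
    private String value;        // up to NLINESTOPROCESS lines joined together

    NLinesSketch(String data) {
        this.data = data;
    }

    // Returns true if a new record was prepared, false when the data is exhausted.
    boolean nextKeyValue() {
        if (pos >= data.length()) {
            return false;
        }
        key = pos;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NLINESTOPROCESS && pos < data.length(); i++) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl == -1) ? data.length() : nl;
            // Copy the line and terminate it with a newline, as the linked code does.
            sb.append(data, pos, lineEnd).append('\n');
            pos = (nl == -1) ? data.length() : nl + 1;
        }
        value = sb.toString();
        return true;
    }

    long getCurrentKey() {
        return key;
    }

    String getCurrentValue() {
        return value;
    }
}
```

With five input lines, the first call produces a record holding lines 1 to 3, the second call a record holding lines 4 and 5, and the third call returns false.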

0

Whenever new data is required, two things happen. The first question asked of the reader is:

DO YOU HAVE ANY DATA ???

If the reader replies yes then the caller can get the data from the getCurrentValue method.

The nextKeyValue method does exactly this task: it answers the question DO YOU HAVE ANY DATA LEFT TO GIVE ME?
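The question-and-answer protocol described above can be sketched without Hadoop as follows. The names SimpleRecordReader, ListReader and drive are invented for this illustration; in Hadoop the corresponding loop lives in Mapper.run(), which keeps calling nextKeyValue() on the context and hands the current key and value to map() for as long as it returns true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the record reader and its caller.
class ReaderLoopSketch {
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();   // "Do you have any data left to give me?"
        K getCurrentKey();        // valid only after nextKeyValue() returned true
        V getCurrentValue();      // valid only after nextKeyValue() returned true
    }

    // A toy reader over an in-memory list; the key is the record index.
    static class ListReader implements SimpleRecordReader<Integer, String> {
        private final List<String> records;
        private int index = -1;

        ListReader(List<String> records) {
            this.records = records;
        }

        public boolean nextKeyValue() {
            if (index + 1 >= records.size()) {
                return false;     // no data left
            }
            index++;              // advance to the next record
            return true;
        }

        public Integer getCurrentKey() {
            return index;
        }

        public String getCurrentValue() {
            return records.get(index);
        }
    }

    // Plays the role of the caller: ask first, then fetch the current pair.
    static <K, V> List<String> drive(SimpleRecordReader<K, V> reader) {
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return seen;
    }
}
```

The point of the split is that nextKeyValue() both advances the reader and reports whether anything is available, while the getters only expose what the last successful call prepared.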

I cannot access the link because of firewall issues, but a sample implementation that I used is:

HashMap<Integer, Invoice> allData = new HashMap<Integer, Invoice>();

@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (key == null) {
        this.key = new LongWritable();
    }
    this.key.set(startPos);

    if (value == null) {
        this.value = new Invoice();
    }
    if (startPos >= endPos) {
        // No data left: reset key and value and tell the caller to stop.
        key = null;
        value = null;
        return false;
    } else {
        // Data available: expose the record stored under startPos, then advance.
        this.value = allData.get(startPos);
        startPos = startPos + 1;
        return true;
    }
}

Here Invoice is just a POJO. In the initialize method I simply parse the whole document and store the records in a HashMap. The nextKeyValue method checks whether the next key exists: if it does, it makes the corresponding value available and returns true; otherwise it returns false to signal that there is no more data.
