nextKeyValue() is the core method: it sets the key/value pair for each map call. In the code from your link, the part before the for loop just sets the key to pos, which is nothing but the start offset of the record (key.set(pos)), and clears the previously set value (value.clear()). The corresponding code:
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    value.clear();
    final Text endline = new Text("\n");
    int newSize = 0;
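To make "pos is the start offset" concrete: the key is a byte offset, not a character index, because pos is only ever advanced by the byte count that readLine returns (pos += newSize). The sketch below is a hypothetical, Hadoop-free helper (LineOffsets and lineStartOffsets are illustrative names, not Hadoop APIs). It assumes one line per record and a split that starts at byte 0, and computes the offsets that key.set(pos) would record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hadoop.
public class LineOffsets {
    // Byte offset at which each '\n'-terminated line starts, i.e. the value
    // pos would hold when key.set(pos) runs at the start of that record
    // (assuming one line per record and a split starting at byte 0).
    static List<Long> lineStartOffsets(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < bytes.length) {
            offsets.add((long) pos);          // key for the record starting here
            while (pos < bytes.length && bytes[pos] != '\n') {
                pos++;                        // skip the line's content bytes
            }
            if (pos < bytes.length) {
                pos++;                        // consume the '\n' as well
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The accented e (U+00E9) takes two bytes in UTF-8, so "world"
        // starts at byte 7 even though it is at character index 6.
        System.out.println(lineStartOffsets("h\u00e9llo\nworld\n")); // prints [0, 7]
    }
}
```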
The rest of nextKeyValue(), from the for loop onwards, is below. I have added a comment for each line.
    // This is an N-line reader: it reads NLINESTOPROCESS lines (3 here) and sets them together as one value,
    // so the for loop runs until that many lines have been read.
    for (int i = 0; i < NLINESTOPROCESS; i++) {
        Text v = new Text();
        // Keeps the reader from running into the next split: pos starts at the split's start offset
        // and end is the split's end offset.
        while (pos < end) {
            // LineReader.readLine reads up to the next newline (the default delimiter for TextInputFormat) and stores
            // the line in v, keeping at most maxLineLength bytes (which would be Integer.MAX_VALUE, so the whole line fits).
            // It returns the number of bytes consumed, including the newline.
            newSize = in.readLine(v, maxLineLength, Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
            // Appends the line in v to value; append is needed because value collects 3 lines.
            value.append(v.getBytes(), 0, v.getLength());
            // Appends a newline after each line read.
            value.append(endline.getBytes(), 0, endline.getLength());
            if (newSize == 0) {
                break; // Nothing left to read, so leave the loop.
            }
            pos += newSize;
            // Not a flaw: as in Hadoop's LineRecordReader, newSize < maxLineLength means a complete line was read,
            // so we leave the while loop and move on to the next of the N lines. A line of maxLineLength bytes or more
            // is treated as too long and the while loop reads again (here the truncated line is already in value).
            if (newSize < maxLineLength) {
                break;
            }
        }
    }
    if (newSize == 0) {
        // Nothing was read: set key and value to null and return false, so no more map calls are made.
        key = null;
        value = null;
        return false;
    } else {
        // Otherwise return true, so map is called with this key/value pair.
        return true;
    }
}
} // closes the enclosing RecordReader class
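To watch the whole loop run without a cluster, here is a minimal, Hadoop-free simulation of the nextKeyValue() logic above. It is a sketch under stated assumptions: the class NLineSimulator and its readLine helper are hypothetical; a StringBuilder stands in for Text and a long for LongWritable; the whole input is treated as a single split; and readLine only mimics the documented behaviour of LineReader.readLine (keep at most maxLineLength bytes of the line, return the bytes consumed including the newline).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, Hadoop-free simulation of the nextKeyValue() loop.
// StringBuilder stands in for Text and long for LongWritable.
public class NLineSimulator {
    static final int NLINESTOPROCESS = 3;

    private final byte[] data;
    private final long end;          // end offset of the (single) split
    private final int maxLineLength;
    private long pos = 0;            // current byte offset, as in the reader
    long key;
    final StringBuilder value = new StringBuilder();

    NLineSimulator(String text, int maxLineLength) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
        this.end = data.length;
        this.maxLineLength = maxLineLength;
    }

    // Simplified stand-in for LineReader.readLine: replaces out with at most
    // maxLineLength bytes of the line starting at pos and returns the number
    // of bytes consumed, including the '\n' (0 when nothing is left).
    private int readLine(StringBuilder out) {
        out.setLength(0);
        int start = (int) pos;
        int i = start;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        int lineLen = i - start;
        int kept = Math.min(lineLen, maxLineLength);
        out.append(new String(data, start, kept, StandardCharsets.UTF_8));
        return lineLen + (i < data.length ? 1 : 0);
    }

    boolean nextKeyValue() {
        key = pos;                   // key.set(pos)
        value.setLength(0);          // value.clear()
        int newSize = 0;
        for (int i = 0; i < NLINESTOPROCESS; i++) {
            StringBuilder v = new StringBuilder();
            while (pos < end) {
                newSize = readLine(v);
                value.append(v).append('\n');
                if (newSize == 0) {
                    break;
                }
                pos += newSize;
                if (newSize < maxLineLength) {
                    break;           // complete line: move on to the next of the N lines
                }
                // otherwise the line was too long: read again
            }
        }
        return newSize != 0;         // false once nothing more could be read
    }

    public static void main(String[] args) {
        NLineSimulator r = new NLineSimulator("a\nbb\nccc\ndddd\neeeee\n", Integer.MAX_VALUE);
        while (r.nextKeyValue()) {
            System.out.println(r.key + " -> " + r.value.toString().replace("\n", "|"));
        }
        // prints "0 -> a|bb|ccc|" then "9 -> dddd|eeeee|"
    }
}
```

With Integer.MAX_VALUE as the limit, the first call yields key 0 with the first three lines, the second yields key 9 with the remaining two, and the third returns false because newSize stays 0 once pos reaches end. With a small limit such as 3, an over-long line appears truncated in the value and the same iteration of the for loop goes on to read one more line, which is exactly what the newSize < maxLineLength check implies.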