The role of getPos() in the RecordReader has changed over time.
In the old mapred API, the RecordReader interface declares getPos(), and implementations used it to report how many bytes had been read:
    /**
     * Returns the current position in the input.
     *
     * @return the current position in the input.
     * @throws IOException
     */
    long getPos() throws IOException;
In the newer mapreduce API, this information is not provided by the RecordReader class; instead, it is part of the FSInputStream implementations, for example:
    class LocalFSFileInputStream extends FSInputStream implements HasFileDescriptor {
        private FileInputStream fis;
        private long position;

        public LocalFSFileInputStream(Path f) throws IOException {
            this.fis = new TrackingFileInputStream(pathToFile(f));
        }

        @Override
        public void seek(long pos) throws IOException {
            fis.getChannel().position(pos);
            this.position = pos;
        }

        @Override
        public long getPos() throws IOException {
            return this.position;
        }

        // ... (rest of the class omitted in this excerpt)
    }
Thus, in the new mapreduce API, the RecordReader abstraction no longer requires a getPos(). Newer RecordReader implementations that still need the position can be written to use the underlying FSInputStream objects directly, since those do provide a getPos().
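As a rough illustration of that last point, here is a minimal sketch in plain Java with no Hadoop dependency. The names PositionTrackingStream, LineReaderWithPos and GetPosDemo are made up for this example: the first stands in for an FSInputStream that tracks its own position, and the second for a reader that has no getPos() in its contract and simply forwards to the stream. It is not how any particular Hadoop RecordReader is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an FSInputStream: it counts the bytes consumed so far.
class PositionTrackingStream extends InputStream {
    private final InputStream in;
    private long position;

    PositionTrackingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            position++; // one more byte consumed
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    // The position lives on the stream, not on the reader.
    long getPos() {
        return position;
    }
}

// Hypothetical reader: it keeps a reference to the stream and forwards getPos() to it.
class LineReaderWithPos {
    private final PositionTrackingStream stream;
    private String currentLine;

    LineReaderWithPos(PositionTrackingStream stream) {
        this.stream = stream;
    }

    // Reads up to and including the next '\n'; returns false at end of input.
    boolean nextLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawByte = false;
        int b;
        while ((b = stream.read()) >= 0) {
            sawByte = true;
            if (b == '\n') {
                break;
            }
            line.write(b);
        }
        if (!sawByte) {
            return false;
        }
        currentLine = new String(line.toByteArray(), StandardCharsets.UTF_8);
        return true;
    }

    String getCurrentLine() {
        return currentLine;
    }

    long getPos() {
        return stream.getPos();
    }
}

class GetPosDemo {
    // Returns the stream position observed after each line has been read.
    static long[] positionsAfterEachLine(String text) {
        LineReaderWithPos reader = new LineReaderWithPos(new PositionTrackingStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))));
        List<Long> positions = new ArrayList<>();
        try {
            while (reader.nextLine()) {
                positions.add(reader.getPos());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long[] result = new long[positions.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = positions.get(i);
        }
        return result;
    }
}
```

For the input "ab\ncde\n", GetPosDemo.positionsAfterEachLine reports positions 3 and 7, i.e. the number of bytes consumed after the first and second line. In a real Hadoop job the same idea would apply to whatever stream the reader opened on its input split, assuming the reader keeps a reference to that stream.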