
I have an input file for which I need to write a custom RecordReader. The problem is that a single logical record may be spread across different input splits, so a later mapper may receive data that should have been consumed by the first mapper.

For example:
A B C D
$ E F

The '$' at the beginning of a line signifies that the line is a continuation of the previous line.

Suppose the second split starts at the '$'. My first mapper then won't know that its first line has a continuation. Please also note that there is a very good chance that my data has no second line at all, so I can't tell whether a line is continued until I read the line after it.

Please help me find a solution to this problem.
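
To make the requirement concrete, here is a minimal sketch of the ownership rule the description implies: a record belongs to the split that contains the start of its first (non-'$') line, a reader skips any leading '$' lines because an earlier split owns them, and a reader may read past its own end to collect trailing '$' lines. It is a plain-Java simulation over a byte array, not a Hadoop RecordReader. The class name `ContinuationSplitReader`, the method names, and the choice to join continuation text with a single space are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: simulates per-split reading
// of records whose continuation lines start with '$'.
final class ContinuationSplitReader {

    // Offset of the line that follows the line containing pos.
    static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') pos++;
        return Math.min(pos + 1, data.length);
    }

    // Text from pos up to (not including) the next newline.
    static String lineAt(byte[] data, int pos) {
        int e = pos;
        while (e < data.length && data[e] != '\n') e++;
        return new String(data, pos, e - pos, StandardCharsets.UTF_8);
    }

    static boolean isContinuation(byte[] data, int pos) {
        return pos < data.length && data[pos] == '$';
    }

    // Records owned by the split [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Starting mid-line: the partial line belongs to the previous split.
        if (pos > 0 && data[pos - 1] != '\n') pos = nextLineStart(data, pos);
        // Leading '$' lines continue a record that began in an earlier split.
        while (isContinuation(data, pos)) pos = nextLineStart(data, pos);
        while (pos < end && pos < data.length) {
            StringBuilder record = new StringBuilder(lineAt(data, pos));
            pos = nextLineStart(data, pos);
            // Keep reading '$' lines even when they lie beyond 'end'.
            while (isContinuation(data, pos)) {
                // Assumption: drop the '$' and join with one space.
                record.append(' ').append(lineAt(data, pos).substring(1).trim());
                pos = nextLineStart(data, pos);
            }
            records.add(record.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "A B C D\n$ E F\nG H\n".getBytes(StandardCharsets.UTF_8);
        // The second split begins exactly at the '$' line.
        System.out.println(readSplit(data, 0, 8));
        System.out.println(readSplit(data, 8, data.length));
    }
}
```

In a real job the same rule would presumably live in a custom RecordReader, with the look-back at `data[pos - 1]` replaced by whatever boundary handling the underlying line reader provides. The sketch also silently drops a '$' line that appears as the very first line of the file, which may or may not be what is wanted.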

  • Is it possible to preprocess the data? I think that's the easiest way to achieve what you're asking. – tsiki Commented Dec 4, 2013 at 13:05
  • I don't think so. To preprocess the data we would need to parse it, and because the data is huge, that is really difficult to do. Commented Dec 5, 2013 at 6:42
