33 questions
-1 votes · 1 answer · 236 views
Image-to-image DataSetIterator using DL4J
I would like to use DeepLearning4j to build and train a U-Net. To do this I need a dataset iterator that feeds the network an image as input and another image as output.
I am new to DL4j ...
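One possible shape for such an iterator, assuming input and target images are paired by index; the class name, the 256x256 grayscale size, and the pairing scheme are illustrative, not DL4J requirements:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.datavec.image.loader.NativeImageLoader;
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class ImagePairIterator {
    // Builds a DataSetIterator whose labels are images, as a U-Net needs.
    public static DataSetIterator build(File[] inputs, File[] targets,
                                        int batchSize) throws Exception {
        NativeImageLoader loader = new NativeImageLoader(256, 256, 1); // h, w, channels
        List<DataSet> pairs = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            INDArray in  = loader.asMatrix(inputs[i]);   // shape [1, c, h, w]
            INDArray out = loader.asMatrix(targets[i]);  // the "label" is an image too
            pairs.add(new DataSet(in, out));
        }
        return new ListDataSetIterator<>(pairs, batchSize);
    }
}

For training you would typically also scale pixel values into [0, 1], e.g. with ImagePreProcessingScaler.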
1 vote · 1 answer · 221 views
How does the RecordReader send data to the mapper in Hadoop?
I'm new to Hadoop, and I'm currently learning MapReduce design patterns from Donald Miner & Adam Shook's book MapReduce Design Patterns, which covers the Cartesian Product pattern. My question ...
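Conceptually the mapper never has data "sent" to it; the framework pulls records from the RecordReader and invokes map() once per record. A sketch of that call order (not the actual MapTask source, which adds more bookkeeping):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordReader;

public class ReaderDrivenLoop {
    // The framework creates the reader via InputFormat.createRecordReader(),
    // calls initialize(split, context), then loops like this:
    static void drive(RecordReader<LongWritable, Text> reader)
            throws IOException, InterruptedException {
        while (reader.nextKeyValue()) {                 // advance to the next record
            LongWritable key = reader.getCurrentKey();  // e.g. byte offset of the line
            Text value = reader.getCurrentValue();      // e.g. the line itself
            // mapper.map(key, value, context);         // one map() call per record
        }
        reader.close();
    }
}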
1 vote · 0 answers · 206 views
Custom record reader for the PST file format in Spark Scala
I am working with PST files. I have written custom record readers for MapReduce programs with various input formats, but this time it is going to be Spark.
I am not getting any clue or ...
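One way is to reuse a Hadoop-style custom InputFormat unchanged through Spark's newAPIHadoopFile. A sketch via the Java API (PstInputFormat and its Text/BytesWritable key-value types are hypothetical placeholders for whatever the MapReduce version defined):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PstReadJob {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("pst-read"));
        Configuration conf = new Configuration();
        // Spark drives the custom InputFormat/RecordReader exactly as MapReduce would.
        JavaPairRDD<Text, BytesWritable> records = sc.newAPIHadoopFile(
                "hdfs:///data/mail/*.pst",
                PstInputFormat.class, Text.class, BytesWritable.class, conf);
        System.out.println("records: " + records.count());
        sc.stop();
    }
}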
1 vote · 1 answer · 680 views
How to read a simple CSV file with DataVec
I want to read a simple CSV file with just a list of numbers using DataVec, for use within Deeplearning4j.
I've tried numerous examples but keep getting errors.
For example, when I execute this:
...
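For reference, a minimal sketch of the usual DataVec reading path (file name and batch size are illustrative; the CSVRecordReader constructor signature varies slightly between DataVec versions):

import java.io.File;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class CsvRead {
    public static void main(String[] args) throws Exception {
        RecordReader rr = new CSVRecordReader(0, ',');   // skip 0 header lines, comma-delimited
        rr.initialize(new FileSplit(new File("numbers.csv")));
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, 10); // batches of 10
        while (iter.hasNext()) {
            System.out.println(iter.next().getFeatures()); // all columns become features
        }
    }
}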
0 votes · 1 answer · 30 views
MapReduce basics
I have a 300 MB text file and a block size of 128 MB.
So 3 blocks of 128 + 128 + 44 MB would be created.
Correct me if I'm wrong: for MapReduce, the default input split size is the same as the block size, i.e. 128 MB, which can ...
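For reference, Hadoop 2's FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)), so with default min/max settings it equals the block size. A worked sketch of the arithmetic:

public class SplitMath {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 128 MB
        long minSize = 1L;                    // default lower bound
        long maxSize = Long.MAX_VALUE;        // default upper bound
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        System.out.println(splitSize);        // 134217728, i.e. 128 MB
        // A 300 MB file therefore yields splits of 128, 128 and 44 MB;
        // the smaller final split still gets its own map task.
    }
}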
2 votes · 1 answer · 716 views
How to create splits from a sequence file in Hadoop?
In Hadoop, I have a sequence file 3 GB in size. I want to process it in parallel, so I am going to create 8 map tasks and hence 8 FileSplits.
The FileSplit class has constructors that require the:
...
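A sketch of carving one file into eight byte-range FileSplits (host hints are left empty here, which sacrifices data locality). For a sequence file, the reader aligns each split to the next sync marker, so the byte arithmetic does not need to land exactly on record boundaries:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ManualSplits {
    static FileSplit[] carve(Path file, long fileLen, int n) {
        FileSplit[] splits = new FileSplit[n];
        long chunk = fileLen / n;
        for (int i = 0; i < n; i++) {
            long start = i * chunk;
            long len = (i == n - 1) ? fileLen - start : chunk; // last split takes the remainder
            splits[i] = new FileSplit(file, start, len, new String[0]);
        }
        return splits;
    }
}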
2 votes · 1 answer · 3k views
Java code to open a password-protected zip file that opens only with 7zX and Keka on macOS
I have a password-protected zip file that opens only with 7zX and Keka on a Mac.
I have to write Java code to decompress the password-protected zip file and then do some operations on it.
I have ...
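java.util.zip has no password support, so a third-party library is the usual route. A sketch with zip4j 2.x (file names are illustrative); if the archive is actually 7z rather than zip, Apache Commons Compress's SevenZFile would be the direction instead:

import net.lingala.zip4j.ZipFile;

public class UnzipProtected {
    public static void main(String[] args) throws Exception {
        // zip4j handles both classic ZipCrypto and AES-encrypted zip entries.
        ZipFile zip = new ZipFile("archive.zip", "secret".toCharArray());
        zip.extractAll("/tmp/extracted");
    }
}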
0 votes · 1 answer · 135 views
Hadoop 2: Empty result when using custom InputFormat
I want to use my own FileInputFormat with a custom RecordReader to read CSV data into <Long, String> pairs.
Therefore I created the class MyTextInputFormat:
import java.io.IOException;
...
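For comparison, a minimal skeleton of such a format in the new (mapreduce) API; MyCsvRecordReader stands in for the custom reader. A common cause of empty results is forgetting to register the format, in which case the default TextInputFormat silently runs instead:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new MyCsvRecordReader();  // hypothetical custom reader
    }
}

// Driver side; without this line the custom format is never used:
// job.setInputFormatClass(MyTextInputFormat.class);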
1 vote · 3 answers · 884 views
How to make Hadoop MR read only files, not folders, in the input path
As per our requirement, the output of one job will be the input of another job.
Using the MultipleOutputs concept, we create a new folder in the output path and write those records into that folder. ...
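Two common fixes, sketched below with illustrative paths: let FileInputFormat recurse into subdirectories (available in Hadoop 2), or glob straight to the part files so directories are never listed as inputs:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputSetup {
    static void configure(Job job) throws Exception {
        // Option 1: recurse into subdirectories of the input path.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("/jobs/stage1/output"));

        // Option 2: glob directly to the data files.
        // FileInputFormat.addInputPath(job, new Path("/jobs/stage1/output/*/part-*"));
    }
}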
0 votes · 3 answers · 2k views
How do I convert EBCDIC to text using Hadoop MapReduce?
I need to parse an EBCDIC input file. Using Java, I am able to read it like below:
InputStreamReader rdr = new InputStreamReader(new FileInputStream("/Users/rr/Documents/workspace/...
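In plain Java the conversion is mostly a charset question; a sketch assuming the common US EBCDIC code page Cp037 (Cp500 and Cp1047 are frequent alternatives). Mainframe files with packed-decimal COMP fields need a copybook-aware parser instead, and fixed-length EBCDIC records may contain no newlines for readLine to find:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class EbcdicToText {
    public static void main(String[] args) throws Exception {
        try (BufferedReader rdr = new BufferedReader(new InputStreamReader(
                new FileInputStream(args[0]), "Cp037"))) {  // EBCDIC decode happens here
            String line;
            while ((line = rdr.readLine()) != null) {
                System.out.println(line);                   // already plain text
            }
        }
    }
}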
1 vote · 0 answers · 19 views
PDF preserve-layout to text in Hadoop MapReduce
I need to convert a PDF, preserving its layout, to a text file in MapReduce. I am using PDFBox to convert a normal PDF file to a text file, but it does not work when the layout must be preserved.
Can anyone help in solving ...
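With stock PDFBox 2.x the closest built-in knob is setSortByPosition; for stricter column/layout preservation, the third-party PDFLayoutTextStripper project (a PDFTextStripper subclass) is the usual suggestion. A minimal local sketch before wiring anything into a RecordReader:

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfLayoutText {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition(true);  // emit text in positional order
            System.out.print(stripper.getText(doc));
        }
    }
}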
0 votes · 1 answer · 61 views
Concept of RecordReaders
We know that prior to the mapper phase the files are split and the RecordReader starts working to emit input to the mapper. My question is whether the reducer uses a RecordReader class to read the data ...
1 vote · 1 answer · 799 views
Hadoop MapReduce with compressed/encrypted files (large files)
I have an HDFS cluster that stores large CSV files in a compressed/encrypted form, as selected by the end user.
For compression and encryption, I have created a wrapper input stream that feeds data to HDFS in ...
0 votes · 2 answers · 791 views
Passing arguments to a record reader in Hadoop MapReduce
This is my code for using various arguments:
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache....
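The usual mechanism is to put arguments into the job Configuration in the driver and read them back in the reader's initialize(); the property name myjob.delimiter below is illustrative:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public abstract class ConfiguredReader extends RecordReader<LongWritable, Text> {
    protected String delimiter;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Driver side: conf.set("myjob.delimiter", ";") before job submission.
        // The same Configuration travels to every task with the context:
        Configuration conf = context.getConfiguration();
        delimiter = conf.get("myjob.delimiter", ",");  // "," as a fallback default
    }
}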
0 votes · 3 answers · 3k views
Hadoop custom record reader implementation
I'm having a hard time understanding the flow of the nextKeyValue() method explained at the link below:
http://analyticspro.org/2012/08/01/wordcount-with-custom-record-reader-of-...
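For orientation, a stripped-down reader showing where nextKeyValue() fits in the contract; unlike the real LineRecordReader, this sketch skips the handling of a partial first line at a split boundary:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class SimpleLineReader extends RecordReader<LongWritable, Text> {
    private LineReader in;
    private long start, pos, end;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FSDataInputStream stream = file.getFileSystem(conf).open(file);
        stream.seek(start);                    // jump to this split's byte range
        in = new LineReader(stream, conf);
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (pos >= end) return false;          // past our split: no more records
        key.set(pos);                          // key = byte offset of the line
        int consumed = in.readLine(value);     // fills 'value', returns bytes read
        if (consumed == 0) return false;       // end of file: nothing produced
        pos += consumed;
        return true;                           // a key/value pair is ready for map()
    }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public Text getCurrentValue()       { return value; }
    @Override public float getProgress() {
        return end == start ? 1f : Math.min(1f, (pos - start) / (float) (end - start));
    }
    @Override public void close() throws IOException { if (in != null) in.close(); }
}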
1 vote · 1 answer · 99 views
How does the Hadoop RecordReader identify records?
When processing a text file, how does Hadoop identify records?
Is it based on newline characters or full stops?
If I have a text file with a list of 5000 words, all on a single line, separated by spaces; no new ...
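Relevant here: TextInputFormat splits records on newlines only by default; the delimiter is configurable, so a single line of space-separated words can be turned into one record per word. A sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpaceDelimited {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Records now end at a space instead of '\n', so 5000 words on one
        // line become 5000 records.
        conf.set("textinputformat.record.delimiter", " ");
        Job job = Job.getInstance(conf, "space-delimited");
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}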
4 votes · 2 answers · 2k views
Is a Hadoop MapReduce RecordReader implementation necessary?
From the Apache doc on the Hadoop MapReduce InputFormat Interface:
"[L]ogical splits based on input-size is insufficient for many
applications since record boundaries are to be respected. In such
...
1 vote · 0 answers · 774 views
How to process a multiline CSV input file with Hadoop MapReduce?
I have a CSV input data file containing several records. Each record is made up of any number of lines (1 line, 2 lines, 5 lines, or more). One thing is for sure: each record has 24 fields ...
0 votes · 1 answer · 361 views
How does the mapper's run() method process the last record?
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    ...
0 votes · 2 answers · 1k views
Reading a record broken into two lines because of \n in MapReduce
I am trying to write a custom reader that serves the purpose of reading a record (spanning two lines) with a defined number of fields.
For example:
1,2,3,4 (the trailing "," may or may not be there)
,5,6,7,8
My ...
0 votes · 1 answer · 2k views
Jackson JsonParser: restart parsing in broken JSON
I am using Jackson to process JSON that comes in chunks in Hadoop. That is, they are big files that are cut up into blocks (128 MB in my problem, but it doesn't really matter).
For efficiency ...
2 votes · 1 answer · 932 views
MapReduce CombineFileInputFormat: java.lang.reflect.InvocationTargetException when two jobs access the same data
The Hadoop MapReduce CombineFileInputFormat works great for reading a lot of small files; however, I have been noticing that the job sometimes fails with the following exception:
...
1 vote · 1 answer · 1k views
Hadoop + Jackson parsing: ObjectMapper reads object and then breaks
I am implementing a JSON RecordReader in Hadoop with Jackson.
For now I am testing locally with JUnit + MRUnit.
The JSON files contain one object each which, after some headers, has a field whose ...
1 vote · 1 answer · 3k views
mapreduce.TextInputFormat hadoop
I am a Hadoop beginner. I came across this custom RecordReader program, which reads 3 lines at a time and outputs the number of times a 3-line input was given to the mapper.
I am able to understand ...
0 votes · 0 answers · 907 views
Hadoop record reader only reads the first line, then the input stream seems to be closed
I'm trying to implement a Hadoop job that counts how often an object (Click) appears in a dataset.
For that I wrote a custom file input format. The record reader seems to read only the first line of ...
2 votes · 0 answers · 2k views
Premature EOF from inputStream in Hadoop
I want to read big files in Hadoop, block by block (not line by line), where each block is nearly 5 MB in size. For that I have written a custom record reader. But it gives me the error Premature EOF ...
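A frequent cause of premature-EOF bugs is assuming a single read() fills the buffer; InputStream.read() may legitimately return fewer bytes than requested. A minimal sketch of a loop that reads a fixed-size chunk robustly (class and method names are illustrative; 5 MB mirrors the question):

import java.io.IOException;
import java.io.InputStream;

public class BlockChunker {
    static final int CHUNK = 5 * 1024 * 1024;  // ~5 MB per record

    // Fills 'buf' completely unless end-of-stream arrives first.
    static int readChunk(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break;                  // true end of stream
            off += n;                          // partial read: keep going
        }
        return off;                            // last chunk may be shorter
    }
}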
0 votes · 1 answer · 773 views
Hadoop MapReduce testing - custom record reader
I have written a custom record reader and am looking for sample test code to test it using MRUnit or any other testing framework. It works fine functionally, but I would like ...
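A record reader can also be unit-tested without MRUnit by constructing the split and a task context directly. A sketch assuming JUnit 4 and Hadoop 2; MyRecordReader, the fixture path, and the expected count are placeholders:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class MyRecordReaderTest {
    @Test
    public void readsAllRecords() throws Exception {
        File input = new File("src/test/resources/sample.dat");  // test fixture
        FileSplit split = new FileSplit(
                new Path(input.toURI()), 0, input.length(), new String[0]);
        TaskAttemptContext ctx =
                new TaskAttemptContextImpl(new Configuration(), new TaskAttemptID());
        MyRecordReader reader = new MyRecordReader();  // the reader under test
        reader.initialize(split, ctx);
        int count = 0;
        while (reader.nextKeyValue()) count++;         // drain the split
        reader.close();
        assertEquals(42, count);  // expected record count for the fixture
    }
}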
1 vote · 1 answer · 758 views
Hadoop - Multiple Files from Record Reader to Map Function
I have implemented a custom CombineFileInputFormat in order to create splits for map tasks composed of groups of files. I created a solution that passes each file of the split through the record reader ...
0 votes · 2 answers · 763 views
Custom RecordReader initialize not called
I've recently started messing with Hadoop and just created my own InputFormat to handle PDFs.
For some reason my custom RecordReader class doesn't have its initialize method called. (I checked it ...
1 vote · 0 answers · 239 views
Custom record reader for custom binary format
In Hadoop v2 I need to create a RecordReader and/or an InputFormat based on some large binary formats stored in HDFS. The files are basically concatenated records with the following structure:
4-byte ...
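Assuming the truncated description means a 4-byte length prefix followed by that many payload bytes (an assumption, since the excerpt is cut off), the core read loop might look like the sketch below. In a real RecordReader this logic would live in nextKeyValue(), and isSplitable() would typically return false so records never straddle split boundaries:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class LengthPrefixedRecords {
    // Returns the next record's payload, or null at a clean end of file.
    static byte[] nextRecord(DataInputStream in) throws IOException {
        int len;
        try {
            len = in.readInt();        // 4-byte big-endian length prefix
        } catch (EOFException e) {
            return null;               // EOF between records: done
        }
        byte[] payload = new byte[len];
        in.readFully(payload);         // throws if the record is truncated
        return payload;
    }
}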
1 vote · 0 answers · 82 views
Splits in MapReduce jobs
I have an input file for which I need to customize the RecordReader. The problem here is that the data may get distributed across different input splits, and a different mapper may get data which ...
4 votes · 4 answers · 2k views
How to delete input files after a successful MapReduce job
We have a system that receives archives in a specified directory, and on a regular basis it launches a MapReduce job that opens the archives and processes the files within them. To avoid re-processing ...
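A common pattern is to delete the inputs from the driver only after waitForCompletion() reports success; the paths and job wiring here are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DriverWithCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "process-archives");
        Path input = new Path(args[0]);
        // ... set mapper/reducer, input/output paths ...
        boolean ok = job.waitForCompletion(true);
        if (ok) {
            // Only remove inputs once the job has succeeded; true = recursive.
            FileSystem.get(conf).delete(input, true);
        }
        System.exit(ok ? 0 : 1);
    }
}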
0 votes · 1 answer · 258 views
Hadoop RawLocalFileSystem and getPos
I've found that getPos in the RawLocalFileSystem's input stream can throw a NullPointerException if its underlying stream is closed.
I discovered this while playing with a custom record reader.
...