-1 votes
1 answer
236 views

Image to image DataSetIterator using dl4j

I would like to use DeepLearning4j to build and train a U-Net network. To do this I need a dataset iterator that feeds the network one image as input and another image as output. I am new to DL4j ...
Mateo Lopez
1 vote
1 answer
221 views

How does RecordReader send data to mapper in Hadoop

I'm new to Hadoop and currently I'm learning MapReduce design patterns from Donald Miner & Adam Shook's MapReduce Design Patterns book. In this book there is the Cartesian Product pattern. My question ...
Irvan • 175
1 vote
0 answers
206 views

Custom record reader for PST file format in Spark Scala

I am working on PST files. I have written custom record readers for MapReduce programs for different input formats, but this time it is going to be Spark. I am not getting any clue or ...
BARATH • 372
1 vote
1 answer
680 views

How to read a simple CSV file with Datavec

I want to read a simple CSV file with just a list of numbers using DataVec, for use within Deeplearning4j. I've tried numerous examples but keep getting errors, e.g. when I execute this: ...
Thorin • 51
0 votes
1 answer
30 views

MapReduce basics

I have a 300 MB text file with a block size of 128 MB, so a total of 3 blocks of 128 + 128 + 44 MB would be created. Correct me if I'm wrong: for MapReduce, the default input split is the same as the block size, that is 128 MB, which can ...
Boron • 1
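The block/split arithmetic in the question above can be checked with plain Java. A minimal sketch; the helper names are illustrative, not Hadoop API:

```java
// Sketch of HDFS block / input-split arithmetic for the question's
// numbers (300 MB file, 128 MB block size). Names are illustrative.
public class SplitMath {
    // Number of blocks (and default splits), rounding up the partial tail.
    public static long numSplits(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size of the final, possibly partial, block.
    public static long lastSplitSize(long fileSize, long blockSize) {
        long rem = fileSize % blockSize;
        return rem == 0 ? blockSize : rem;
    }
}
```

Note that FileInputFormat only folds a trailing remainder into the previous split when it is small relative to the split size (the SPLIT_SLOP factor, 10%), so here the 44 MB tail still becomes its own split.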
2 votes
1 answer
716 views

How to create splits from a sequence file in Hadoop?

In Hadoop, I have a sequence file of 3 GB size. I want to process it in parallel, therefore I am going to create 8 map tasks and hence 8 FileSplits. The FileSplit class has constructors that require the: ...
Mosab Shaheen
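The start/length pairs for 8 roughly even FileSplits can be computed with plain arithmetic. A sketch only: the real FileSplit constructor also takes the Path and host locations, and for a sequence file the reader will additionally seek to the next sync marker after each split's start:

```java
// Computes (start, length) pairs for cutting a file into n pieces,
// the numbers one would then pass to FileSplit. Plain-JDK arithmetic.
public class EvenSplits {
    public static long[][] splits(long fileSize, int n) {
        long base = fileSize / n;
        long[][] out = new long[n][2];
        long start = 0;
        for (int i = 0; i < n; i++) {
            // Last split absorbs any remainder from integer division.
            long len = (i == n - 1) ? fileSize - start : base;
            out[i][0] = start;
            out[i][1] = len;
            start += len;
        }
        return out;
    }
}
```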
2 votes
1 answer
3k views

Java code to open a password-protected zip file which opens only with 7zX and Keka on macOS

I have a password-protected zip file which opens only with 7zX and Keka on a Mac. I have to write code in Java to decompress the password-protected zip file and then do some operation on it. I have ...
mahan07 • 907
0 votes
1 answer
135 views

Hadoop 2: Empty result when using custom InputFormat

I want to use my own FileInputFormat with a custom RecordReader to read CSV data into <Long><String> pairs. Therefore I created the class MyTextInputFormat: import java.io.IOException; ...
D. Müller • 3,426
1 vote
3 answers
884 views

How to make Hadoop MR read only files instead of folders in the input path

As per our requirement, the output of one job will be the input of another job. Using the MultipleOutputs concept, we are creating a new folder in the output path and writing those records into the folder. ...
Abhinay • 635
0 votes
3 answers
2k views

How do I convert EBCDIC to TEXT using Hadoop Mapreduce

I need to parse an EBCDIC input file format. Using Java, I am able to read it like below: InputStreamReader rdr = new InputStreamReader(new FileInputStream("/Users/rr/Documents/workspace/...
Barath • 107
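For the EBCDIC question above, the byte-to-text conversion itself needs no Hadoop code: the JDK ships EBCDIC charsets (Cp1047 is shown here as an assumption; IBM037 is another common variant, and which one is right depends on the source mainframe):

```java
import java.nio.charset.Charset;

// Decodes EBCDIC bytes to a Java String using a JDK-provided charset.
// "Cp1047" is an assumption; substitute the variant your source uses.
public class EbcdicDecode {
    public static String toText(byte[] ebcdic) {
        return new String(ebcdic, Charset.forName("Cp1047"));
    }
}
```

Inside a MapReduce job, the same decode would simply be applied to each record's byte payload before emitting it as Text.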
1 vote
0 answers
19 views

PDF Preserve Layout to Text Hadoop MapReduce

I need to convert a PDFPreserveLayout to a text file in MapReduce. I am using PDFBox to convert a normal PDF file to a text file, but it is not working for PDFPreserveLayout. Can anyone help in solving ...
Barath • 107
0 votes
1 answer
61 views

Concept of RecordReaders

We know that prior to the Mapper phase the files are split and the RecordReader starts working to emit an input to the Mapper. My question is whether the Reducer uses a RecordReader class to read the data ...
Aniruddha Sinha
1 vote
1 answer
799 views

Hadoop Mapreduce with compressed/encrypted files (file of large size)

I have an HDFS cluster which stores large CSV files in a compressed/encrypted form, as selected by the end user. For compression/encryption I have created a wrapper input stream which feeds data to HDFS in ...
Punit • 11
0 votes
2 answers
791 views

Passing arguments to RecordReader in MapReduce Hadoop

This is my code for using various args: import java.io.File; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FSDataInputStream; import org.apache....
Barath • 107
0 votes
3 answers
3k views

Hadoop custom record reader implementation

I'm having a hard time understanding the flow of what happens in the nextKeyValue() method explained at the link below: http://analyticspro.org/2012/08/01/wordcount-with-custom-record-reader-of-...
bigdata123
1 vote
1 answer
99 views

How does the Hadoop RecordReader identify records

When processing a text file, how does Hadoop identify records? Is it based on newline characters or full stops? If I have a text file with a list of 5000 words, all on a single line, separated by spaces, with no new ...
Kaushik Lele • 6,639
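TextInputFormat's default LineRecordReader delimits records on newline characters, not full stops, so a 5000-word single-line file is exactly one record. A plain-JDK illustration of that behaviour:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Counts records the way LineRecordReader would: one record per line,
// regardless of how many space-separated words the line contains.
public class LineRecords {
    public static int countRecords(String data) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader(data));
        int n = 0;
        while (r.readLine() != null) n++;
        return n;
    }
}
```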
4 votes
2 answers
2k views

Hadoop MapReduce RecordReader Implementation Necessary?

From the Apache doc on the Hadoop MapReduce InputFormat Interface: "[L]ogical splits based on input-size is insufficient for many applications since record boundaries are to be respected. In such ...
AST • 211
1 vote
0 answers
774 views

How to process Multiline CSV Input File for Map Reduce Hadoop?

I have a CSV input data file which contains several records. Each record is made up of any number of lines (1 line, 2 lines, 5 lines, or any). One thing for sure is that each record has 24 fields ...
Gaurav Gandhi
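One way to approach the multiline-CSV question is to accumulate physical lines until the expected 24 fields have been collected, then emit one logical record. A plain-JDK sketch of that assembly logic, assuming line breaks fall between fields; the field count is a parameter:

```java
import java.util.ArrayList;
import java.util.List;

// Groups physical lines into logical records of a fixed field count.
// Assumes records break between fields, never inside a field.
public class MultilineCsv {
    public static List<String> records(List<String> lines, int fieldsPerRecord) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int fields = 0;
        for (String line : lines) {
            if (cur.length() > 0) cur.append(',');   // rejoin across the line break
            cur.append(line);
            fields += line.split(",", -1).length;     // fields contributed by this line
            if (fields >= fieldsPerRecord) {          // record complete: emit and reset
                out.add(cur.toString());
                cur.setLength(0);
                fields = 0;
            }
        }
        return out;
    }
}
```

In a Hadoop job the same loop would live inside a custom RecordReader's nextKeyValue(), emitting one assembled record per call.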
0 votes
1 answer
361 views

How does mapper run() method process the last record?

public void run(Context context) throws IOException, InterruptedException { setup(context); while (context.nextKeyValue()) { map(context.getCurrentKey(), context.getCurrentValue(), context); } ...
Vignesh I • 2,221
0 votes
2 answers
1k views

Reading a record broken down into two lines because of \n in MapReduce

I am trying to write a custom reader which serves the purpose of reading a record (residing on two lines) with a defined number of fields. For e.g. 1,2,3,4 ("," can be there or not) ,5,6,7,8. My ...
Aviral Kumar
0 votes
1 answer
2k views

Jackson JsonParser: restart parsing in broken JSON

I am using Jackson to process JSON that comes in chunks in Hadoop. That means, they are big files that are cut up in blocks (in my problem it's 128M but it doesn't really matter). For efficiency ...
xmar • 1,809
2 votes
1 answer
932 views

MapReduce CombineFileInputFormat java.lang.reflect.InvocationTargetException while two jobs access the same data

The Hadoop MapReduce CombineFileInputFormat works great when it comes to reading a lot of small files; however, I have been noticing that sometimes the job fails with the following exception, ...
Harshit Mathur
1 vote
1 answer
1k views

Hadoop + Jackson parsing: ObjectMapper reads Object and then breaks

I am implementing a JSON RecordReader in Hadoop with Jackson. For now I am testing locally with JUnit + MRUnit. The JSON files contain one object each which, after some headers, has a field whose ...
xmar • 1,809
1 vote
1 answer
3k views

mapreduce.TextInputFormat hadoop

I am a Hadoop beginner. I came across this custom RecordReader program which reads 3 lines at a time and outputs the number of times a 3-line input was given to the mapper. I am able to understand ...
knk • 79
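Stripped of the Hadoop plumbing, the grouping logic of such an N-lines-per-record reader is just the following (plain-JDK sketch; names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Concatenates every n consecutive lines into one record, the way the
// question's 3-line RecordReader hands grouped values to the mapper.
public class NLineGroups {
    public static List<String> group(String text, int n) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader(text));
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int count = 0;
        String line;
        while ((line = r.readLine()) != null) {
            if (count > 0) cur.append(' ');
            cur.append(line);
            if (++count == n) {        // group of n complete
                out.add(cur.toString());
                cur.setLength(0);
                count = 0;
            }
        }
        if (count > 0) out.add(cur.toString()); // trailing partial group
        return out;
    }
}
```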
0 votes
0 answers
907 views

Hadoop Record Reader only reads first line then input stream seems to be closed

I'm trying to implement a Hadoop job that counts how often an object (Click) appears in a dataset. Therefore I wrote a custom file input format. The record reader seems to read only the first line of ...
user2450954
2 votes
0 answers
2k views

Premature EOF from inputStream in Hadoop

I want to read big files in Hadoop block by block (not line by line), where each block is nearly 5 MB in size. For that I have written a custom RecordReader. But it gives me the error Premature EOF ...
user2758378
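A Premature EOF typically comes from demanding a full buffer (readFully-style) when the final block is shorter than the block size. A read loop that tolerates a short last block, sketched in plain JDK form with the block size as a parameter rather than 5 MB:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Reads a stream in fixed-size blocks, returning the size of each block
// actually read; the last one may be short instead of raising an EOF error.
public class BlockReads {
    public static List<Integer> blockSizes(InputStream in, int blockSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] buf = new byte[blockSize];
        int n;
        while ((n = readAtMost(in, buf)) > 0) sizes.add(n);
        return sizes;
    }

    // Fill buf as far as possible; return bytes read (0 at clean EOF).
    private static int readAtMost(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int r = in.read(buf, off, buf.length - off);
            if (r < 0) break;   // EOF: hand back the partial block, don't throw
            off += r;
        }
        return off;
    }
}
```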
0 votes
1 answer
773 views

Hadoop MapReduce testing - custom record reader

I have written a custom record reader and am looking for sample test code to test my custom reader using MRUnit or any other testing framework. It's working fine as per the functionality, but I would like ...
user3404493
1 vote
1 answer
758 views

Hadoop - Multiple Files from Record Reader to Map Function

I have implemented a custom CombineFileInputFormat in order to create splits for a Map task composed of a group of files. I created a solution which passes each file of the split through the record reader ...
jomazz • 23
0 votes
2 answers
763 views

Custom RecordReader initialize not called

I've recently started messing with Hadoop and just created my own InputFormat to handle PDFs. For some reason my custom RecordReader class doesn't have its initialize method called. (I checked it ...
zim • 11
1 vote
0 answers
239 views

Custom record reader for custom binary format

In Hadoop v2 I need to create a RecordReader and/or an InputFormat based on some large binary formats stored in HDFS. The files are basically concatenated records with the following structure: 4-byte ...
Ken Williams • 23.9k
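The core loop of a reader for such "4-byte length, then payload" records can be sketched with the JDK alone (big-endian lengths assumed; a real Hadoop RecordReader would wrap one iteration of this loop in nextKeyValue()):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Reads concatenated records of the form: 4-byte big-endian length,
// followed by that many payload bytes.
public class LengthPrefixedReader {
    public static List<byte[]> readAll(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int len;
            try {
                len = in.readInt();       // 4-byte length prefix
            } catch (EOFException eof) {
                break;                    // clean end of stream
            }
            byte[] payload = new byte[len];
            in.readFully(payload);        // a short read here means a truncated record
            records.add(payload);
        }
        return records;
    }
}
```

In the split case, the reader would also need to start each split at a record boundary, e.g. by having splits begin at known offsets or by scanning forward for a sync pattern.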
1 vote
0 answers
82 views

Splits in MapReduce jobs

I have an input file for which I need to customize the RecordReader. But the problem here is that the data may get distributed across different input splits, and a different mapper may get the data which ...
user3065762
4 votes
4 answers
2k views

How to delete input files after successful mapreduce

We have a system that receives archives on a specified directory and on a regular basis it launches a mapreduce job that opens the archives and processes the files within them. To avoid re-processing ...
Brian • 372
0 votes
1 answer
258 views

Hadoop RawLocalFileSystem and getPos

I've found that the getPos in the RawLocalFileSystem's input stream can throw a null pointer exception if its underlying stream is closed. I discovered this when playing with a custom record reader. ...
jayunit100 • 17.6k