33 questions
-1 votes · 1 answer · 236 views
Image-to-image DataSetIterator using DL4J
I would like to use DeepLearning4j to build and train a U-Net. To do this I need a dataset iterator that feeds the network an image as input and another image as output.
I am new to DL4j ...
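One possible shape for such an iterator, assuming input and target images are paired by index; the class name, the 256x256 grayscale size, and the pairing scheme are illustrative, not DL4J requirements:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.datavec.image.loader.NativeImageLoader;
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class ImagePairIterator {
    // Builds a DataSetIterator whose labels are images, as a U-Net needs.
    public static DataSetIterator build(File[] inputs, File[] targets,
                                        int batchSize) throws Exception {
        NativeImageLoader loader = new NativeImageLoader(256, 256, 1); // h, w, channels
        List<DataSet> pairs = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            INDArray in  = loader.asMatrix(inputs[i]);   // shape [1, c, h, w]
            INDArray out = loader.asMatrix(targets[i]);  // the "label" is an image too
            pairs.add(new DataSet(in, out));
        }
        return new ListDataSetIterator<>(pairs, batchSize);
    }
}

For training you would typically also scale pixel values into [0, 1], e.g. with ImagePreProcessingScaler.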
1 vote · 1 answer · 221 views
How does the RecordReader send data to the mapper in Hadoop?
I'm new to Hadoop, and I'm currently learning MapReduce design patterns from Donald Miner & Adam Shook's book MapReduce Design Patterns, which covers the Cartesian Product pattern. My question ...
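Conceptually the mapper never has data "sent" to it; the framework pulls records from the RecordReader and invokes map() once per record. A sketch of that call order (not the actual MapTask source, which adds more bookkeeping):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordReader;

public class ReaderDrivenLoop {
    // The framework creates the reader via InputFormat.createRecordReader(),
    // calls initialize(split, context), then loops like this:
    static void drive(RecordReader<LongWritable, Text> reader)
            throws IOException, InterruptedException {
        while (reader.nextKeyValue()) {                 // advance to the next record
            LongWritable key = reader.getCurrentKey();  // e.g. byte offset of the line
            Text value = reader.getCurrentValue();      // e.g. the line itself
            // mapper.map(key, value, context);         // one map() call per record
        }
        reader.close();
    }
}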
1 vote · 0 answers · 206 views
Custom record reader for the PST file format in Spark Scala
I am working with PST files. I have written custom record readers for MapReduce programs with various input formats, but this time it is going to be Spark.
I am not getting any clue or ...
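One way is to reuse a Hadoop-style custom InputFormat unchanged through Spark's newAPIHadoopFile. A sketch via the Java API (PstInputFormat and its Text/BytesWritable key-value types are hypothetical placeholders for whatever the MapReduce version defined):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PstReadJob {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("pst-read"));
        Configuration conf = new Configuration();
        // Spark drives the custom InputFormat/RecordReader exactly as MapReduce would.
        JavaPairRDD<Text, BytesWritable> records = sc.newAPIHadoopFile(
                "hdfs:///data/mail/*.pst",
                PstInputFormat.class, Text.class, BytesWritable.class, conf);
        System.out.println("records: " + records.count());
        sc.stop();
    }
}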
1 vote · 1 answer · 680 views
How to read a simple CSV file with DataVec
I want to read a simple CSV file with just a list of numbers using DataVec, for use within Deeplearning4j.
I've tried numerous examples but keep getting errors.
For example, when I execute this:
...
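For reference, a minimal sketch of the usual DataVec reading path (file name and batch size are illustrative; the CSVRecordReader constructor signature varies slightly between DataVec versions):

import java.io.File;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class CsvRead {
    public static void main(String[] args) throws Exception {
        RecordReader rr = new CSVRecordReader(0, ',');   // skip 0 header lines, comma-delimited
        rr.initialize(new FileSplit(new File("numbers.csv")));
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, 10); // batches of 10
        while (iter.hasNext()) {
            System.out.println(iter.next().getFeatures()); // all columns become features
        }
    }
}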
0 votes · 1 answer · 30 views
MapReduce basics
I have a 300 MB text file and a block size of 128 MB.
So 3 blocks of 128 + 128 + 44 MB would be created.
Correct me if I'm wrong: for MapReduce, the default input split size is the same as the block size, i.e. 128 MB, which can ...
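For reference, Hadoop 2's FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)), so with default min/max settings it equals the block size. A worked sketch of the arithmetic:

public class SplitMath {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 128 MB
        long minSize = 1L;                    // default lower bound
        long maxSize = Long.MAX_VALUE;        // default upper bound
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        System.out.println(splitSize);        // 134217728, i.e. 128 MB
        // A 300 MB file therefore yields splits of 128, 128 and 44 MB;
        // the smaller final split still gets its own map task.
    }
}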
2 votes · 1 answer · 716 views
How to create splits from a sequence file in Hadoop?
In Hadoop, I have a sequence file 3 GB in size. I want to process it in parallel, so I am going to create 8 map tasks and hence 8 FileSplits.
The FileSplit class has constructors that require the:
...
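A sketch of carving one file into eight byte-range FileSplits (host hints are left empty here, which sacrifices data locality). For a sequence file, the reader aligns each split to the next sync marker, so the byte arithmetic does not need to land exactly on record boundaries:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ManualSplits {
    static FileSplit[] carve(Path file, long fileLen, int n) {
        FileSplit[] splits = new FileSplit[n];
        long chunk = fileLen / n;
        for (int i = 0; i < n; i++) {
            long start = i * chunk;
            long len = (i == n - 1) ? fileLen - start : chunk; // last split takes the remainder
            splits[i] = new FileSplit(file, start, len, new String[0]);
        }
        return splits;
    }
}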
2 votes · 1 answer · 3k views
Java code to open a password-protected zip file that opens only with 7zX and Keka on macOS
I have a password-protected zip file that opens only with 7zX and Keka on a Mac.
I have to write Java code to decompress the password-protected zip file and then do some operations on it.
I have ...
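java.util.zip has no password support, so a third-party library is the usual route. A sketch with zip4j 2.x (file names are illustrative); if the archive is actually 7z rather than zip, Apache Commons Compress's SevenZFile would be the direction instead:

import net.lingala.zip4j.ZipFile;

public class UnzipProtected {
    public static void main(String[] args) throws Exception {
        // zip4j handles both classic ZipCrypto and AES-encrypted zip entries.
        ZipFile zip = new ZipFile("archive.zip", "secret".toCharArray());
        zip.extractAll("/tmp/extracted");
    }
}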
0 votes · 1 answer · 135 views
Hadoop 2: Empty result when using custom InputFormat
I want to use my own FileInputFormat with a custom RecordReader to read CSV data into <Long, String> pairs.
Therefore I created the class MyTextInputFormat:
import java.io.IOException;
...
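For comparison, a minimal skeleton of such a format in the new (mapreduce) API; MyCsvRecordReader stands in for the custom reader. A common cause of empty results is forgetting to register the format, in which case the default TextInputFormat silently runs instead:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new MyCsvRecordReader();  // hypothetical custom reader
    }
}

// Driver side; without this line the custom format is never used:
// job.setInputFormatClass(MyTextInputFormat.class);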
1 vote · 3 answers · 884 views
How to make Hadoop MR read only files, not folders, in the input path
As per our requirement, the output of one job will be the input of another job.
Using the MultipleOutputs concept, we create a new folder in the output path and write those records into that folder. ...
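Two common fixes, sketched below with illustrative paths: let FileInputFormat recurse into subdirectories (available in Hadoop 2), or glob straight to the part files so directories are never listed as inputs:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputSetup {
    static void configure(Job job) throws Exception {
        // Option 1: recurse into subdirectories of the input path.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("/jobs/stage1/output"));

        // Option 2: glob directly to the data files.
        // FileInputFormat.addInputPath(job, new Path("/jobs/stage1/output/*/part-*"));
    }
}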
0 votes · 3 answers · 2k views
How do I convert EBCDIC to text using Hadoop MapReduce?
I need to parse an EBCDIC input file. Using Java, I am able to read it like below:
InputStreamReader rdr = new InputStreamReader(new FileInputStream("/Users/rr/Documents/workspace/...
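In plain Java the conversion is mostly a charset question; a sketch assuming the common US EBCDIC code page Cp037 (Cp500 and Cp1047 are frequent alternatives). Mainframe files with packed-decimal COMP fields need a copybook-aware parser instead, and fixed-length EBCDIC records may contain no newlines for readLine to find:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class EbcdicToText {
    public static void main(String[] args) throws Exception {
        try (BufferedReader rdr = new BufferedReader(new InputStreamReader(
                new FileInputStream(args[0]), "Cp037"))) {  // EBCDIC decode happens here
            String line;
            while ((line = rdr.readLine()) != null) {
                System.out.println(line);                   // already plain text
            }
        }
    }
}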
1 vote · 0 answers · 19 views
PDF preserve-layout to text in Hadoop MapReduce
I need to convert a PDF, preserving its layout, to a text file in MapReduce. I am using PDFBox to convert a normal PDF file to a text file, but it does not work when the layout must be preserved.
Can anyone help in solving ...
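With stock PDFBox 2.x the closest built-in knob is setSortByPosition; for stricter column/layout preservation, the third-party PDFLayoutTextStripper project (a PDFTextStripper subclass) is the usual suggestion. A minimal local sketch before wiring anything into a RecordReader:

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfLayoutText {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition(true);  // emit text in positional order
            System.out.print(stripper.getText(doc));
        }
    }
}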
0 votes · 1 answer · 61 views
Concept of RecordReaders
We know that prior to the mapper phase the files are split and the RecordReader starts working to emit input to the mapper. My question is whether the reducer uses a RecordReader class to read the data ...
1 vote · 1 answer · 799 views
Hadoop MapReduce with compressed/encrypted files (large files)
I have an HDFS cluster that stores large CSV files in a compressed/encrypted form, as selected by the end user.
For compression and encryption, I have created a wrapper input stream that feeds data to HDFS in ...
0 votes · 2 answers · 791 views
Passing arguments to a record reader in Hadoop MapReduce
This is my code for using various arguments:
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache....
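The usual mechanism is to put arguments into the job Configuration in the driver and read them back in the reader's initialize(); the property name myjob.delimiter below is illustrative:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public abstract class ConfiguredReader extends RecordReader<LongWritable, Text> {
    protected String delimiter;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Driver side: conf.set("myjob.delimiter", ";") before job submission.
        // The same Configuration travels to every task with the context:
        Configuration conf = context.getConfiguration();
        delimiter = conf.get("myjob.delimiter", ",");  // "," as a fallback default
    }
}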
0 votes · 3 answers · 3k views
Hadoop custom record reader implementation
I'm having a hard time understanding the flow of the nextKeyValue() method explained at the link below:
http://analyticspro.org/2012/08/01/wordcount-with-custom-record-reader-of-...
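For orientation, a stripped-down reader showing where nextKeyValue() fits in the contract; unlike the real LineRecordReader, this sketch skips the handling of a partial first line at a split boundary:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class SimpleLineReader extends RecordReader<LongWritable, Text> {
    private LineReader in;
    private long start, pos, end;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FSDataInputStream stream = file.getFileSystem(conf).open(file);
        stream.seek(start);                    // jump to this split's byte range
        in = new LineReader(stream, conf);
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (pos >= end) return false;          // past our split: no more records
        key.set(pos);                          // key = byte offset of the line
        int consumed = in.readLine(value);     // fills 'value', returns bytes read
        if (consumed == 0) return false;       // end of file: nothing produced
        pos += consumed;
        return true;                           // a key/value pair is ready for map()
    }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public Text getCurrentValue()       { return value; }
    @Override public float getProgress() {
        return end == start ? 1f : Math.min(1f, (pos - start) / (float) (end - start));
    }
    @Override public void close() throws IOException { if (in != null) in.close(); }
}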
1 vote · 1 answer · 99 views
How does the Hadoop RecordReader identify records?
When processing a text file, how does Hadoop identify records?
Is it based on newline characters or full stops?
If I have a text file with a list of 5000 words, all on a single line, separated by spaces; no new ...
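Relevant here: TextInputFormat splits records on newlines only by default; the delimiter is configurable, so a single line of space-separated words can be turned into one record per word. A sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpaceDelimited {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Records now end at a space instead of '\n', so 5000 words on one
        // line become 5000 records.
        conf.set("textinputformat.record.delimiter", " ");
        Job job = Job.getInstance(conf, "space-delimited");
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}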
4 votes · 2 answers · 2k views
Is a Hadoop MapReduce RecordReader implementation necessary?
From the Apache doc on the Hadoop MapReduce InputFormat Interface:
"[L]ogical splits based on input-size is insufficient for many
applications since record boundaries are to be respected. In such
...
1 vote · 0 answers · 774 views
How to process a multiline CSV input file with Hadoop MapReduce?
I have a CSV input data file containing several records. Each record is made up of any number of lines (1 line, 2 lines, 5 lines, or more). One thing is for sure: each record has 24 fields ...
0 votes · 1 answer · 361 views
How does the mapper's run() method process the last record?
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    ...
0 votes · 2 answers · 1k views
Reading a record broken into two lines because of \n in MapReduce
I am trying to write a custom reader that serves the purpose of reading a record (spanning two lines) with a defined number of fields.
For example:
1,2,3,4 (the trailing "," may or may not be there)
,5,6,7,8
My ...
0 votes · 1 answer · 2k views
Jackson JsonParser: restart parsing in broken JSON
I am using Jackson to process JSON that comes in chunks in Hadoop. That is, they are big files that are cut up into blocks (128 MB in my problem, but it doesn't really matter).
For efficiency ...
2 votes · 1 answer · 932 views
MapReduce CombineFileInputFormat: java.lang.reflect.InvocationTargetException when two jobs access the same data
The Hadoop MapReduce CombineFileInputFormat works great for reading a lot of small files; however, I have been noticing that the job sometimes fails with the following exception:
...
1 vote · 1 answer · 1k views
Hadoop + Jackson parsing: ObjectMapper reads object and then breaks
I am implementing a JSON RecordReader in Hadoop with Jackson.
For now I am testing locally with JUnit + MRUnit.
The JSON files contain one object each which, after some headers, has a field whose ...
1 vote · 1 answer · 3k views
mapreduce.TextInputFormat hadoop
I am a Hadoop beginner. I came across this custom RecordReader program, which reads 3 lines at a time and outputs the number of times a 3-line input was given to the mapper.
I am able to understand ...
0 votes · 0 answers · 907 views
Hadoop record reader only reads the first line, then the input stream seems to be closed
I'm trying to implement a Hadoop job that counts how often an object (Click) appears in a dataset.
For that I wrote a custom file input format. The record reader seems to read only the first line of ...
2 votes · 0 answers · 2k views
Premature EOF from inputStream in Hadoop
I want to read big files in Hadoop, block by block (not line by line), where each block is nearly 5 MB in size. For that I have written a custom record reader. But it gives me the error Premature EOF ...
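A frequent cause of premature-EOF bugs is assuming a single read() fills the buffer; InputStream.read() may legitimately return fewer bytes than requested. A minimal sketch of a loop that reads a fixed-size chunk robustly (class and method names are illustrative; 5 MB mirrors the question):

import java.io.IOException;
import java.io.InputStream;

public class BlockChunker {
    static final int CHUNK = 5 * 1024 * 1024;  // ~5 MB per record

    // Fills 'buf' completely unless end-of-stream arrives first.
    static int readChunk(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break;                  // true end of stream
            off += n;                          // partial read: keep going
        }
        return off;                            // last chunk may be shorter
    }
}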
0 votes · 1 answer · 773 views
Hadoop MapReduce testing - custom record reader
I have written a custom record reader and am looking for sample test code to test it using MRUnit or any other testing framework. It works fine functionally, but I would like ...
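A record reader can also be unit-tested without MRUnit by constructing the split and a task context directly. A sketch assuming JUnit 4 and Hadoop 2; MyRecordReader, the fixture path, and the expected count are placeholders:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class MyRecordReaderTest {
    @Test
    public void readsAllRecords() throws Exception {
        File input = new File("src/test/resources/sample.dat");  // test fixture
        FileSplit split = new FileSplit(
                new Path(input.toURI()), 0, input.length(), new String[0]);
        TaskAttemptContext ctx =
                new TaskAttemptContextImpl(new Configuration(), new TaskAttemptID());
        MyRecordReader reader = new MyRecordReader();  // the reader under test
        reader.initialize(split, ctx);
        int count = 0;
        while (reader.nextKeyValue()) count++;         // drain the split
        reader.close();
        assertEquals(42, count);  // expected record count for the fixture
    }
}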
1 vote · 1 answer · 758 views
Hadoop - Multiple Files from Record Reader to Map Function
I have implemented a custom CombineFileInputFormat in order to create splits for map tasks composed of groups of files. I created a solution that passes each file of the split through the record reader ...
0 votes · 2 answers · 763 views
Custom RecordReader initialize not called
I've recently started messing with Hadoop and just created my own InputFormat to handle PDFs.
For some reason my custom RecordReader class doesn't have its initialize method called. (I checked it ...
1 vote · 0 answers · 239 views
Custom record reader for custom binary format
In Hadoop v2 I need to create a RecordReader and/or an InputFormat based on some large binary formats stored in HDFS. The files are basically concatenated records with the following structure:
4-byte ...
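Assuming the truncated description means a 4-byte length prefix followed by that many payload bytes (an assumption, since the excerpt is cut off), the core read loop might look like the sketch below. In a real RecordReader this logic would live in nextKeyValue(), and isSplitable() would typically return false so records never straddle split boundaries:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class LengthPrefixedRecords {
    // Returns the next record's payload, or null at a clean end of file.
    static byte[] nextRecord(DataInputStream in) throws IOException {
        int len;
        try {
            len = in.readInt();        // 4-byte big-endian length prefix
        } catch (EOFException e) {
            return null;               // EOF between records: done
        }
        byte[] payload = new byte[len];
        in.readFully(payload);         // throws if the record is truncated
        return payload;
    }
}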
1 vote · 0 answers · 82 views
Splits in MapReduce jobs
I have an input file for which I need to customize the RecordReader. The problem here is that the data may get distributed across different input splits, and a different mapper may get data which ...
4 votes · 4 answers · 2k views
How to delete input files after a successful MapReduce job
We have a system that receives archives in a specified directory, and on a regular basis it launches a MapReduce job that opens the archives and processes the files within them. To avoid re-processing ...
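A common pattern is to delete the inputs from the driver only after waitForCompletion() reports success; the paths and job wiring here are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DriverWithCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "process-archives");
        Path input = new Path(args[0]);
        // ... set mapper/reducer, input/output paths ...
        boolean ok = job.waitForCompletion(true);
        if (ok) {
            // Only remove inputs once the job has succeeded; true = recursive.
            FileSystem.get(conf).delete(input, true);
        }
        System.exit(ok ? 0 : 1);
    }
}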
0 votes · 1 answer · 258 views
Hadoop RawLocalFileSystem and getPos
I've found that getPos in the RawLocalFileSystem's input stream can throw a NullPointerException if its underlying stream is closed.
I discovered this while playing with a custom record reader.
...