Hadoop Interview Questions
The Hadoop MapReduce framework is used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.
Combiners are used to increase the efficiency of a MapReduce program. A combiner reduces the amount of data that needs to be transferred across to the reducers. If the operation performed is commutative and associative, you can use your reducer code as the combiner. The execution of the combiner is not guaranteed in Hadoop.
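As a minimal sketch (assuming a word-count style job whose sum reducer is both commutative and associative; TokenizerMapper and IntSumReducer are illustrative class names), the reducer class can simply be registered as the combiner in the driver:

// Driver fragment: reuse the sum reducer as a map-side combiner.
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);  // may run zero, one or several times; results must not depend on it
job.setReducerClass(IntSumReducer.class);   // the same class produces the final output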
13) Explain what is the difference between an Input Split and an HDFS Block?
The logical division of data is known as an Input Split, while the physical division of data is known as an HDFS Block.
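For example, assuming the 128 MB default block size of Hadoop 2.x, a 1 GB file is stored as 8 HDFS blocks; by default each block also becomes one input split (and hence one mapper), but the split size can be tuned independently of the block size.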
15) Mention what are the main configuration parameters that the user needs to specify to run a MapReduce job?
The user of the MapReduce framework needs to specify:
The job's input location in the distributed file system
The job's output location in the distributed file system
The input format
The output format
The class containing the map function
The class containing the reduce function
The JAR file containing the mapper, reducer and driver classes
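A minimal driver sketch showing where each of these parameters is supplied (assuming the Hadoop 2.x MapReduce API; WordCountDriver, TokenizerMapper, IntSumReducer and the path arguments are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");

    job.setJarByClass(WordCountDriver.class);            // JAR containing mapper, reducer and driver
    job.setMapperClass(TokenizerMapper.class);           // class containing the map function (assumed defined elsewhere)
    job.setReducerClass(IntSumReducer.class);            // class containing the reduce function (assumed defined elsewhere)

    job.setInputFormatClass(TextInputFormat.class);      // input format
    job.setOutputFormatClass(TextOutputFormat.class);    // output format
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));    // job input location in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // job output location in HDFS

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}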
RDBMS vs Hadoop
RDBMS is used for OLTP processing, whereas Hadoop is currently used for analytical and big data processing.
In RDBMS, the database cluster uses the same data files stored in shared storage, whereas in Hadoop the data can be stored independently on each processing node.
In RDBMS, you need to preprocess data before storing it, whereas in Hadoop you do not need to preprocess data before storing it.
The core components of Hadoop are:
HDFS
MapReduce
24) What is NameNode in Hadoop?
The NameNode in Hadoop is where Hadoop stores all the file location information for HDFS. It is the master node, on which the JobTracker runs, and it holds the metadata.
The data components used by Hadoop are:
Pig
Hive
26) Mention what is the data storage component used by Hadoop?
The data storage component used by Hadoop is HBase.
27) Mention what are the most common input formats defined in Hadoop?
The most common input formats defined in Hadoop are:
TextInputFormat
KeyValueInputFormat
SequenceFileInputFormat
28) In Hadoop, what is an InputSplit?
Hadoop splits input files into chunks, known as InputSplits, and assigns each split to a mapper for processing.
29) For a Hadoop job, how will you write a custom partitioner?
To write a custom partitioner for a Hadoop job, you follow this path:
Create a new class that extends the Partitioner class
Override the getPartition method
In the driver, register the custom partitioner on the job with the setPartitionerClass method (a sketch follows below)
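A minimal sketch of such a partitioner, assuming a Text key and an IntWritable value; the FirstLetterPartitioner name and its routing rule are purely illustrative:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: routes each record by the first character of its key.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String k = key.toString();
    char first = k.isEmpty() ? 'a' : Character.toLowerCase(k.charAt(0));
    return first % numPartitions;  // char values are non-negative, so this is a valid partition index
  }
}

In the driver it would be registered with job.setPartitionerClass(FirstLetterPartitioner.class).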
The main configuration files in Hadoop are:
core-site.xml
mapred-site.xml
hdfs-site.xml
36) Explain how you can check whether the NameNode is working, besides using the jps command?
Besides using the jps command, to check whether the NameNode is working you can also use:
/etc/init.d/hadoop-0.20-namenode status
In Hadoop, a reducer collects the output generated by the mapper, processes it, and
creates a final output of its own.
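An illustrative sketch of such a reducer, assuming a word-count style job where the mapper emits Text words with IntWritable counts:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sum reducer: for each key, adds up the counts emitted by the mappers.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    result.set(sum);
    context.write(key, result);  // the reducer's final output record
  }
}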
42) Mention what daemons run on a master node and slave nodes?
The NameNode and JobTracker daemons run on the master node, while the DataNode and TaskTracker daemons run on the slave nodes.
The storage node is the machine or computer where the file system resides to store the data being processed.
The compute node is the computer or machine where the actual business logic is executed.
45) Mention what is the use of Context Object?
The Context Object enables the mapper to interact with the rest of the Hadoop
system. It includes configuration data for the job, as well as interfaces which allow it
to emit output.
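An illustrative mapper sketch showing both uses of the Context, reading job configuration and emitting output; the class name, the wordcount.lowercase property and the tokenizing logic are assumptions, not part of the original answer:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: the Context carries job configuration in and key/value pairs out.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private boolean lowerCase;

  @Override
  protected void setup(Context context) {
    // Read a hypothetical job setting through the Context's configuration.
    lowerCase = context.getConfiguration().getBoolean("wordcount.lowercase", true);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = lowerCase ? value.toString().toLowerCase() : value.toString();
    StringTokenizer tokens = new StringTokenizer(line);
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);  // emit output through the Context
    }
  }
}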
49) Explain how is data partitioned before it is sent to the reducer if no custom
partitioner is defined in Hadoop?
If no custom partitioner is defined in Hadoop, then a default partitioner computes a
hash value for the key and assigns the partition based on the result.
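The default partitioner is HashPartitioner; its getPartition logic is essentially the following (K and V are the job's key and value types):

// Mask off the sign bit of the key's hash, then take it modulo the number of reducers.
public int getPartition(K key, V value, int numReduceTasks) {
  return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}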
50) Explain what happens when Hadoop spawned 50 tasks for a job and one of the tasks failed?
Hadoop will restart the task on some other TaskTracker; if the restarted task fails more than the defined limit, the job will be killed.
51) Mention what is the best way to copy files between HDFS clusters?
The best way to copy files between HDFS clusters is by using multiple nodes and the
distcp command, so the workload is shared.
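For example, a copy between two clusters might look like hadoop distcp hdfs://namenode1:8020/data hdfs://namenode2:8020/backup, where the NameNode addresses and paths are placeholders; distcp runs as a MapReduce job, so the copy work is spread across the nodes of the cluster.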
53) Mention how Hadoop is different from other data processing tools?
In Hadoop, you can increase or decrease the number of mappers without worrying
about the volume of data to be processed.
55) Mention what is the Hadoop MapReduce API contract for a key and value class?
For a key and value class, there are two Hadoop MapReduce API contracts:
The value class must implement the org.apache.hadoop.io.Writable interface
The key class must implement the org.apache.hadoop.io.WritableComparable interface
The text input format treats each line of the input file as a record. The key is the byte offset of the line within the file and the value is the whole line of text. The mapper will therefore receive the value as a Text parameter and the key as a LongWritable parameter.
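A minimal sketch of a custom key class honouring this contract; the YearMonthKey name and its fields are hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key: serializable via write/readFields, sortable via compareTo.
public class YearMonthKey implements WritableComparable<YearMonthKey> {
  private int year;
  private int month;

  public YearMonthKey() { }  // Hadoop needs a no-argument constructor for deserialization

  public YearMonthKey(int year, int month) {
    this.year = year;
    this.month = month;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(year);
    out.writeInt(month);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    year = in.readInt();
    month = in.readInt();
  }

  @Override
  public int compareTo(YearMonthKey other) {
    int cmp = Integer.compare(year, other.year);
    return cmp != 0 ? cmp : Integer.compare(month, other.month);
  }

  @Override
  public int hashCode() {
    return 31 * year + month;  // keeps the default HashPartitioner behaviour sensible
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof YearMonthKey)) return false;
    YearMonthKey k = (YearMonthKey) o;
    return year == k.year && month == k.month;
  }
}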