004 - Hadoop Daemons (HDFS Only)


Hadoop Daemons

There are 5 different daemons in MRv1. Together they are the heart and soul of Hadoop: each daemon
has its own role, and Hadoop needs all of them to work properly.
The 5 daemons are:
1) Namenode
2) Datanode
3) Secondary Namenode
4) Job Tracker
5) Task Tracker

Namenode:
Namenode is the master node that manages the filesystem namespace. It holds the metadata of all
files and directories, and it knows which blocks make up each file and which datanodes store
them. The namespace metadata is persisted in two files: the namespace image and the edit log.
Namenode is the first point of contact for any process that wants to access data. It directs
the request to the datanodes that actually hold the data. It is also a single point of failure.
The namespace image is a snapshot of all the metadata stored by the Namenode. The edit log is the
file that records recent changes, which are merged into the namespace image periodically by the
secondary namenode.
Namenode is responsible for maintaining the replication factor, because it knows where every
block and each of its replicas is stored.
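
The replication bookkeeping described above can be sketched as a toy model. This is not Hadoop code: the function name under_replicated and the shape of the block reports are assumptions made for illustration. It only shows the idea that the Namenode can count replicas from what the datanodes report and spot blocks that fall below the target.

```python
from collections import defaultdict

REPLICATION_FACTOR = 3  # default replication factor in HDFS


def under_replicated(block_reports, replication=REPLICATION_FACTOR):
    """Return {block_id: missing_copies} for blocks below the target.

    block_reports is a hypothetical structure: {datanode: [block ids it holds]}.
    """
    holders = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block_id in blocks:
            holders[block_id].add(datanode)
    return {
        block_id: replication - len(nodes)
        for block_id, nodes in holders.items()
        if len(nodes) < replication
    }


# Three datanodes report their blocks; blk_2 has only two replicas.
reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_1", "blk_2"],
    "dn3": ["blk_1"],
}
missing = under_replicated(reports)
print(missing)  # {'blk_2': 1}
```

In this model the Namenode would then ask some datanode to copy blk_2 once more; that step is left out of the sketch.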
Datanode:
Datanode is the slave node for Namenode. It stores the blocks that contain the actual data and
retrieves them when instructed by the Namenode. Datanodes report their status and their block
information back to the Namenode.
Secondary Namenode:
Despite its name, this node is not a standby Namenode. The major misconception is that when
the Namenode fails this node comes into play and takes care of everything. In fact, the Secondary
Namenode does not take over.
The job of this node is to periodically merge the edit log into the namespace image. Because it
keeps a copy of the merged image, that copy can be loaded when the Namenode has failed and is
restarted. The namespace images are stored on the local filesystems of both the Namenode and the
Secondary Namenode, not on HDFS.
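
The merge of the edit log into the namespace image can be pictured with a small toy model. The data layout here (a dict for the image, tuples for edits) and the names apply_edit and checkpoint are assumptions for illustration only; the real on-disk formats are different.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to an in-memory namespace (toy format)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]  # list of block ids for the new file
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError("unknown operation: " + op)


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the new image and an empty edit log.
    """
    merged = dict(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


fsimage = {"/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("delete", "/a.txt"),
]
new_image, new_log = checkpoint(fsimage, edit_log)
print(new_image)  # {'/b.txt': ['blk_2', 'blk_3']}
print(new_log)    # []
```

The old image is left untouched in this sketch, which mirrors the idea that a merged copy exists alongside the original until it replaces it.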

Goutam Tadi [email protected]

The three daemons above are called HDFS daemons, because they
perform all HDFS-related operations.
File Read:

1 - The client calls open() on a FileSystem object, which is an instance of
DistributedFileSystem. This object calls the namenode.
2 - DistributedFileSystem gets the locations of the first few blocks. For each block, the
namenode sends the addresses of the datanodes that hold a copy, sorted by proximity to the client.
3 - DistributedFileSystem returns an FSDataInputStream to the client to read the data from. It
wraps a DFSInputStream, which manages the datanode and namenode I/O. The client calls read()
on the stream to read data.
4 - DFSInputStream calls read() repeatedly on the datanode, using the address received
from the namenode, and streams the data.
5 - When the read of the first block is complete, DFSInputStream reads the next block,
which may be on another datanode, and keeps streaming so that the client sees one continuous
stream. As needed, DFSInputStream also contacts the namenode for the datanode addresses of
the remaining blocks.
6 - When the read is finished, the client calls close() to end the read operation.
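
The read steps above can be sketched as a toy model. None of this is the Hadoop API: ToyNamenode, read_file, the numeric distances and the batch size are all assumptions for illustration. The sketch shows locations being fetched a few blocks at a time, sorted by proximity, and each block being read from the closest datanode.

```python
class ToyNamenode:
    """Hypothetical stand-in for the namenode's block location lookup."""

    def __init__(self, block_map):
        # block_map: ordered list of (block_id, {datanode: distance to client})
        self.block_map = block_map

    def get_block_locations(self, start, count):
        batch = self.block_map[start:start + count]
        # For each block, return the datanodes sorted closest first.
        return [(bid, sorted(nodes, key=nodes.get)) for bid, nodes in batch]


def read_file(namenode, datanodes, batch_size=2):
    """Read every block in order, each from its closest datanode."""
    chunks = []
    index = 0
    while True:
        locations = namenode.get_block_locations(index, batch_size)
        if not locations:
            break
        for block_id, nodes in locations:
            closest = nodes[0]
            chunks.append(datanodes[closest][block_id])
        index += len(locations)
    return b"".join(chunks)


namenode = ToyNamenode([
    ("blk_1", {"dn1": 2, "dn2": 0}),
    ("blk_2", {"dn2": 4, "dn3": 2}),
    ("blk_3", {"dn1": 0, "dn3": 4}),
])
datanodes = {
    "dn1": {"blk_1": b"hell", "blk_3": b"fs"},
    "dn2": {"blk_1": b"hell", "blk_2": b"o hd"},
    "dn3": {"blk_2": b"o hd", "blk_3": b"fs"},
}
content = read_file(namenode, datanodes)
print(content)  # b'hello hdfs'
```

The toy fetches the next batch of locations only after finishing the current one; it does not model the overlap between streaming and location lookups.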

File Write:

1 - The client creates a file by calling create() on DistributedFileSystem.
2 - DistributedFileSystem calls the namenode to create a file with no blocks associated.
The namenode checks the permissions and that the file does not already exist in the filesystem,
and then confirms the new file. The client is given an FSDataOutputStream object, which
initializes a DFSOutputStream.
3 - DFSOutputStream splits the data into packets and writes them to an internal data queue.
DataStreamer consumes the queue and obtains blocks from the namenode, along with the
datanodes that will store them. These datanodes form a pipeline.
4 - DataStreamer streams packets to the first datanode. This datanode saves each packet and
forwards it to the second; the second stores it and forwards it to the third, to maintain the
default replication factor of 3.
5 - DFSOutputStream also maintains an ack queue of the sent packets that are waiting for
acknowledgement. A packet is removed from the ack queue when its acknowledgement is
received.
6 - When the client has finished writing, it calls close(). This waits until all the packets in the
ack queue are removed and then waits for the signal from the namenode that the write is complete.
The namenode will already know the locations of the blocks and datanodes from DataStreamer.
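
The data queue, ack queue and pipeline from the steps above can be sketched as a toy model. This is not Hadoop code: split_into_packets, write_through_pipeline and the 4-byte packet size are assumptions for illustration, and each datanode is just a Python list.

```python
from collections import deque


def split_into_packets(data, packet_size):
    """Cut the byte string into fixed-size packets (the last may be shorter)."""
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]


def write_through_pipeline(data, pipeline, packet_size=4):
    """Push packets through every datanode; return the count of unacked packets."""
    data_queue = deque(split_into_packets(data, packet_size))
    ack_queue = deque()
    while data_queue:
        packet = data_queue.popleft()
        ack_queue.append(packet)  # sent, waiting for acknowledgement
        for datanode_store in pipeline:
            # Each datanode stores the packet, then passes it to the next one.
            datanode_store.append(packet)
        # In this synchronous toy the acknowledgement arrives immediately.
        ack_queue.popleft()
    return len(ack_queue)


pipeline = [[], [], []]  # three datanodes, matching a replication factor of 3
pending = write_through_pipeline(b"hello hdfs", pipeline)
print(pending)      # 0
print(pipeline[2])  # [b'hell', b'o hd', b'fs']
```

A return value of 0 corresponds to the condition close() waits for: nothing left in the ack queue. Real acknowledgements arrive asynchronously, so the real ack queue can hold many packets at once.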

