004 - Hadoop Daemons (HDFS Only)
There are 5 different daemons in MRv1 (Hadoop 1). Together they are the heart and soul of Hadoop,
and each one plays its own distinct role in keeping the cluster running.
The 5 daemons are:
1) Namenode
2) Datanode
3) Secondary Namenode
4) Job Tracker
5) Task Tracker
Namenode:
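Namenode is a master node that manages the filesystem namespace. It holds the metadata for every
file and directory, including which blocks make up each file and which Datanodes store them. The
namespace metadata is persisted on the Namenode's local disk in two files: the namespace image and
the edit log. Block locations are not persisted; the Namenode rebuilds them from the Datanodes'
block reports.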
Namenode is the first point of contact for any client that wants to access data. It looks up the
requested file and returns the Datanodes that hold its blocks; the client then reads the data from
those Datanodes directly. In Hadoop 1 the Namenode is also a single point of failure. The namespace
image is a snapshot of all the metadata stored by the Namenode, and the edit log records the changes
made since that snapshot. The Secondary Namenode periodically merges the edit log into the namespace
image.
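To make the lookup role concrete, here is a toy Python model, assuming a file's metadata is just an ordered list of block IDs and each block maps to the set of Datanodes holding a replica. `ToyNamenode`, `add_file` and `get_block_locations` are hypothetical names for illustration, not Hadoop's API. It shows the Namenode answering a request with block locations rather than serving the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamenode:
    """Hypothetical in-memory model of Namenode metadata (not Hadoop's real API)."""

    # file path -> ordered list of block IDs that make up the file
    files: dict = field(default_factory=dict)
    # block ID -> set of Datanode names holding a replica of that block
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks):
        """Register a file; `blocks` maps each block ID (in file order) to its Datanodes."""
        self.files[path] = list(blocks)
        for block_id, datanodes in blocks.items():
            self.block_locations[block_id] = set(datanodes)

    def get_block_locations(self, path):
        """Return [(block_id, sorted Datanodes), ...]; the client then reads from those Datanodes."""
        if path not in self.files:
            raise FileNotFoundError(path)
        return [(b, sorted(self.block_locations[b])) for b in self.files[path]]


nn = ToyNamenode()
nn.add_file("/logs/app.log", {"blk_1": ["dn3", "dn1", "dn2"], "blk_2": ["dn4", "dn2", "dn3"]})
print(nn.get_block_locations("/logs/app.log"))
# [('blk_1', ['dn1', 'dn2', 'dn3']), ('blk_2', ['dn2', 'dn3', 'dn4'])]
```

Only metadata passes through the Namenode in this model; the block contents never do, which mirrors why clients fetch data from Datanodes directly.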
Namenode is responsible for maintaining the replication factor because it knows where every block
and each of its replicas is stored.
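A minimal sketch of that responsibility, assuming the Namenode knows each block's holders and which Datanodes are currently alive. The function name `plan_re_replication` and the alphabetical choice of targets are hypothetical simplifications; real HDFS uses rack-aware placement. The default factor of 3 matches HDFS's default `dfs.replication` value.

```python
def plan_re_replication(block_locations, live_datanodes, replication_factor=3):
    """Return {block_id: [target Datanodes]} for blocks below the replication factor.

    Hypothetical simplification: targets are picked alphabetically from live
    Datanodes that lack the block; real HDFS uses rack-aware placement.
    """
    plan = {}
    for block_id, holders in block_locations.items():
        live_holders = {dn for dn in holders if dn in live_datanodes}
        if not live_holders:
            continue  # no live replica left to copy from; the block is missing
        shortfall = replication_factor - len(live_holders)
        if shortfall > 0:
            candidates = sorted(set(live_datanodes) - live_holders)
            plan[block_id] = candidates[:shortfall]
    return plan


locations = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]}
live = {"dn1", "dn2", "dn4", "dn5"}  # dn3 has stopped responding
print(plan_re_replication(locations, live))
# {'blk_1': ['dn4'], 'blk_2': ['dn1']}
```

When dn3 disappears, each of its blocks drops to two live replicas, so the sketch schedules one new copy per block on a live Datanode that does not already hold it.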
Datanode:
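Datanode is the slave (worker) node for the Namenode. It is the node that actually stores the blocks
containing the data, and it stores or retrieves blocks when told to by clients or the Namenode.
Datanodes periodically report back to the Namenode with their status (heartbeats) and the list of
blocks they hold (block reports).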
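The following sketch shows how heartbeats and block reports could feed the Namenode's view of the cluster. `ToyHeartbeatTracker` and its methods are hypothetical, and the 30-second timeout is an arbitrary illustrative value; in real HDFS, Datanodes heartbeat every 3 seconds by default and the dead-node timeout is configured separately.

```python
class ToyHeartbeatTracker:
    """Hypothetical sketch of how a Namenode could track Datanode heartbeats and block reports."""

    def __init__(self, dead_after_seconds):
        self.dead_after = dead_after_seconds
        self.last_heartbeat = {}   # Datanode -> time of its most recent heartbeat
        self.block_locations = {}  # block ID -> set of Datanodes that reported it

    def heartbeat(self, datanode, now):
        """Record that `datanode` is alive at time `now` (seconds)."""
        self.last_heartbeat[datanode] = now

    def block_report(self, datanode, block_ids):
        """Replace everything previously known about `datanode`'s blocks with this report."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def live_datanodes(self, now):
        """Datanodes whose last heartbeat is no older than the timeout."""
        return {dn for dn, t in self.last_heartbeat.items() if now - t <= self.dead_after}


tracker = ToyHeartbeatTracker(dead_after_seconds=30)
tracker.heartbeat("dn1", now=0)
tracker.heartbeat("dn2", now=0)
tracker.block_report("dn1", ["blk_1", "blk_2"])
tracker.block_report("dn2", ["blk_1"])
tracker.heartbeat("dn1", now=40)                 # dn2 stays silent
print(tracker.live_datanodes(now=45))            # {'dn1'}
print(sorted(tracker.block_locations["blk_1"]))  # ['dn1', 'dn2']
```

Heartbeats and block reports are kept separate here on purpose: liveness comes from heartbeats, while block locations come only from reports, which is also why the Namenode can rebuild block locations without persisting them.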
Secondary Namenode:
Despite its name, the Secondary Namenode is not a backup or standby Namenode. A major misconception
is that when the Namenode fails, this node comes into play and takes care of everything. In fact, the
Secondary Namenode does not take over.
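Its actual job is checkpointing: it periodically merges the edit log into the namespace image, so that
the edit log does not grow without bound and the Namenode does not have to replay a huge log on
restart. It keeps a copy of the merged namespace image, which can help recover the metadata if the
Namenode fails, although that copy lags behind the Namenode's latest state. These namespace images are
stored on the local filesystems of both the Namenode and the Secondary Namenode, not on HDFS.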
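The periodic merge of the edit log into the namespace image (a checkpoint) can be sketched in a few lines, assuming the image is a dict of path to metadata and each edit-log record is a hypothetical `(op, path, arg)` tuple. In Hadoop the on-disk files are called `fsimage` and `edits`; their real formats, and the copying of files between the two machines, are omitted here.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one hypothetical edit-log record, a tuple (op, path, arg), to `namespace`."""
    op, path, arg = edit
    if op == "create":
        namespace[path] = arg                  # arg: the new file's metadata
    elif op == "delete":
        namespace.pop(path, None)              # arg unused
    elif op == "rename":
        namespace[arg] = namespace.pop(path)   # arg: the destination path
    else:
        raise ValueError(f"unknown edit op: {op!r}")


def checkpoint(fsimage, edit_log):
    """Merge `edit_log` into a copy of `fsimage`; return (new image, fresh empty log)."""
    new_image = copy.deepcopy(fsimage)  # the old image is left untouched
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


fsimage = {"/a.txt": {"replication": 3}}
edit_log = [
    ("create", "/b.txt", {"replication": 2}),
    ("rename", "/a.txt", "/c.txt"),
    ("delete", "/b.txt", None),
]
new_image, new_log = checkpoint(fsimage, edit_log)
print(new_image)  # {'/c.txt': {'replication': 3}}
```

Working on a copy means the previous image stays valid until the new one is complete, and the empty log returned afterwards is what keeps restart times short.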
The three daemons above are called HDFS daemons because they perform all HDFS-related
operations.
File Read:
File Write: