UNIT 2-tt1
ARCHITECTURE
Explain the core components of Hadoop. (5 Marks).
The following are the core components of the Hadoop architecture:
1. Hadoop Distributed File System (HDFS)
One of the most critical components of Hadoop architecture is the Hadoop Distributed File System
(HDFS). HDFS is the primary storage system used by Hadoop applications. It’s designed to scale to
petabytes of data and runs on commodity hardware. What sets HDFS apart is its ability to maintain
large data sets across multiple nodes in a distributed computing environment.
HDFS operates on the principle of storing large files across multiple machines. It achieves high
throughput by splitting each file into fixed-size blocks (128 MB by default), which are stored and
replicated on different nodes in the cluster. This design makes HDFS an ideal choice for applications
with large data sets.
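As a rough illustration (not an exam requirement), the sketch below uses the Hadoop FileSystem client API to write a small file and read back its block size and replication factor; the cluster URI and file path are assumed placeholders, and the actual values come from the cluster's configuration.

// Minimal sketch: write a file to HDFS and inspect its block size and replication.
// The fs.defaultFS value and the path are illustrative placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in practice this is read from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits larger files into blocks and
        // distributes (and replicates) them across DataNodes transparently.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS");
        }

        // Read back basic metadata, including the block size and replication in use.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());

        fs.close();
    }
}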
2. Yet Another Resource Negotiator (YARN)
Yet Another Resource Negotiator (YARN) is responsible for managing resources in the cluster and
scheduling tasks for users. It is a key element in Hadoop architecture as it allows multiple data
processing engines such as interactive processing, graph processing, and batch processing to handle
data stored in HDFS.
YARN separates the functionalities of resource management and job scheduling into separate daemons:
a global ResourceManager tracks cluster resources, a NodeManager on each worker node manages its
containers, and a per-application ApplicationMaster negotiates resources and monitors its own tasks.
This design makes the Hadoop architecture more scalable and flexible, accommodating a broader range
of processing approaches and applications.
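The sketch below is a minimal illustration of this ResourceManager/NodeManager split: it uses the YarnClient API to ask the ResourceManager for the worker nodes it currently manages. The ResourceManager hostname is an assumed placeholder; normally it is taken from yarn-site.xml.

// Minimal sketch: list the running NodeManagers known to the ResourceManager.
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnNodesExample {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        // Assumed ResourceManager host; normally configured in yarn-site.xml.
        conf.set("yarn.resourcemanager.hostname", "resourcemanager");

        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();

        // The ResourceManager tracks cluster-wide resources; each worker node
        // runs a NodeManager that reports its capacity and running containers.
        List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId() + " -> "
                    + node.getCapability() + ", containers: "
                    + node.getNumContainers());
        }

        yarn.stop();
    }
}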
3. MapReduce Programming Model
MapReduce is a programming model integral to Hadoop architecture. It is designed to process large
volumes of data in parallel by dividing the work into a set of independent tasks. The MapReduce model
simplifies the processing of vast data sets, making it an indispensable part of Hadoop.
MapReduce is characterized by two primary tasks: Map and Reduce. The Map task takes a set of data
and converts it into another set of data, where individual elements are broken down into key-value
pairs (tuples). The Reduce task then takes the output from the Map as its input and combines those
tuples into a smaller set of tuples.
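The canonical word-count pattern illustrates these two tasks. The sketch below shows a Mapper that emits (word, 1) pairs and a Reducer that sums the counts per word; the driver (job submission) code is omitted for brevity.

// Minimal word-count sketch: Mapper emits (word, 1), Reducer sums the counts.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Break each input line into tokens and emit a (word, 1) pair per token.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Combine all counts for the same word into a single total.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}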
4. Hadoop Common
Hadoop Common, often referred to as the ‘glue’ that holds the Hadoop architecture together, contains
the libraries and utilities needed by the other Hadoop modules. It provides the necessary Java libraries
and scripts required to start Hadoop. This component plays a crucial role in ensuring that hardware
failures are handled by the Hadoop framework itself, offering a high degree of resilience and reliability.
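As a small illustration, the sketch below uses the Configuration class from Hadoop Common, the shared library that HDFS, YARN, and MapReduce all build on. The property values it prints depend entirely on the local installation; the defaults shown are only fallbacks.

// Minimal sketch: read shared Hadoop configuration via hadoop-common.
import org.apache.hadoop.conf.Configuration;

public class CommonConfigExample {
    public static void main(String[] args) {
        // Loads core-default.xml and core-site.xml from the classpath.
        Configuration conf = new Configuration();

        // "fs.defaultFS" tells every Hadoop module which file system to use;
        // the printed values depend on the cluster's own configuration files.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "file:///"));
        System.out.println("replication  = " + conf.getInt("dfs.replication", 3));
    }
}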