Lecture 4 Introduction to Hadoop
Evolution of Hadoop
Industrial Applications of Hadoop
HDFS Component
Hadoop Architecture
Discussion
Session 1. History and Introduction to Hadoop
9/14/2024 5
Inventor of Hadoop
The Background:
•Lucene and Nutch: Doug Cutting originally worked on Lucene, a text search
library, and later on Nutch, a web crawler. Nutch was designed to index and
search the web, but operating at web scale required a more robust,
distributed system for data processing and storage.
Development of Hadoop:
•Initial Implementation (2004-2005): Doug Cutting, along with his
collaborator, began developing an open-source distributed file system and
MapReduce framework, initially as part of the Nutch project.
They aimed to create a scalable, reliable, and fault-tolerant system for
processing and storing large datasets.
•Naming Hadoop: The project was named "Hadoop" after Doug
Cutting's son's toy elephant, reflecting the idea of something large and
capable of handling big tasks.
Keyword ‘Hadoop’ Search on Google
1.1.1 Introduction to Hadoop
Hadoop Usage
Industrial Applications of Hadoop
Electronic Health Records (EHR): Healthcare providers
use Hadoop to manage and analyze massive volumes of
patient data, helping in personalized treatment plans and
improving patient outcomes.
https://youtu.be/S4RL6prqtGQ
Two Major Concerns for Doug Cutting's Team While Developing Hadoop
2.1.1 Introduction to HDFS
A file in HDFS is split into large blocks (64 MB or 128 MB by default,
depending on the Hadoop version), and each block is independently
replicated across multiple DataNodes.
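The block arithmetic above can be illustrated with a small sketch. This is not HDFS code; the function names are hypothetical, and the replication factor of 3 is an assumption (it is the usual HDFS default, but the slide does not state it).

```python
MB = 1024 * 1024


def split_into_blocks(file_size_bytes, block_size_bytes=128 * MB):
    """Return the sizes (in bytes) of the blocks a file would be split into.

    Every block is full-sized except possibly the last one, which only
    holds the remaining bytes.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage_bytes(file_size_bytes, replication=3):
    """Total bytes stored across the cluster (replication=3 is an assumption)."""
    return file_size_bytes * replication


# A 300 MB file with 128 MB blocks: two full blocks plus one 44 MB block.
blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])       # block sizes in MB
print(raw_storage_bytes(300 * MB) // MB)     # raw storage in MB
```

Note that the last block occupies only the space it needs, so a 300 MB file does not consume three full 128 MB blocks per replica.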
HDFS Components:
HDFS has two major components: NameNode and DataNode.
i. NameNode
It is also known as the Master node.
The NameNode does not store the actual data. It stores metadata: the
number of blocks, their locations, and the rack and DataNode on which
each block is stored, along with other details. This metadata describes
the files and directories of the file system.
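The kind of metadata the NameNode keeps can be pictured with a toy data structure. This is only a sketch for intuition: the class and field names are hypothetical and do not correspond to the real NameNode implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    """One block of a file and where its replicas live."""
    block_id: int
    replicas: list  # list of (rack, datanode) pairs, names are illustrative


@dataclass
class NameNodeMetadata:
    """Toy stand-in for the NameNode: it holds metadata only, never file data."""
    files: dict = field(default_factory=dict)  # file path -> list of BlockInfo

    def add_file(self, path, blocks):
        self.files[path] = list(blocks)

    def block_count(self, path):
        return len(self.files[path])

    def locate(self, path):
        """Return (block_id, replicas) for each block, in file order."""
        return [(b.block_id, b.replicas) for b in self.files[path]]


meta = NameNodeMetadata()
meta.add_file(
    "/logs/day1.txt",
    [
        BlockInfo(1, [("rack1", "dn1"), ("rack2", "dn4")]),
        BlockInfo(2, [("rack1", "dn2"), ("rack2", "dn5")]),
    ],
)
print(meta.block_count("/logs/day1.txt"))
print(meta.locate("/logs/day1.txt"))
```

A client that wants to read a file would first ask for this location list and then fetch the block contents from the listed DataNodes, which is why the NameNode itself never needs to hold the data.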
2.1.2 HDFS Architecture (Read-Write Operations)
• This figure demonstrates how a file stored on DataNodes is read in the HDFS architecture.
2.1.2 HDFS Architecture (contd..)
• This figure demonstrates how a file stored on DataNodes is read in the HDFS architecture.
2.1.6 HDFS Basic Commands
Command          Description
hdfs version     Prints the HDFS (Hadoop) version
Thanks Note
tungal/presentations/ad2012