Unit 1 Hadoop Architecture
INTRODUCTION TO HADOOP
UNIT I
BE COMP SEM VII
Contents
• Distributed System
• DFS
• Hadoop
• Why is it needed?
• Issues
• Mutate / lease
Hadoop
What is Hadoop?
It is a framework for running applications that generate and process
huge amounts of data on large clusters of commodity hardware
Apache Software Foundation Project
Open source
Spark: with Spark, you can load data from HDFS or HBase into the
memory of the cluster nodes for faster processing
Conclusion
• Why commodity hardware?
because it is cheaper
Hadoop is designed to tolerate the hardware faults that come with it
• Why HDFS?
network bandwidth vs seek latency: HDFS favors high-throughput
streaming reads of large blocks, so transfer time dominates seek time
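• Why the MapReduce programming model?
simple model for parallel programming
scales to large data sets
moves computation to the data instead of data to the computation
a single cluster both stores the data and runs the computation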
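To make the seek latency vs. transfer trade-off behind HDFS concrete, here is a small calculation of the fraction of read time spent seeking for one small block and for HDFS-sized blocks (64 MB and 128 MB, the default HDFS block sizes in Hadoop 1 and Hadoop 2+ respectively). The 10 ms seek time and 100 MB/s transfer rate are assumed illustrative figures, not measurements, and `seek_overhead` is a hypothetical helper.

```python
# Assumed, illustrative disk figures (not measurements):
SEEK_TIME_S = 0.010          # 10 ms average seek
TRANSFER_RATE_BPS = 100e6    # 100 MB/s sequential transfer rate


def seek_overhead(block_size_bytes: float) -> float:
    """Fraction of the total time to read one block that is spent seeking."""
    transfer_time = block_size_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time)


for label, size in [("4 KB", 4e3), ("64 MB", 64e6), ("128 MB", 128e6)]:
    print(f"{label:>7}: {seek_overhead(size):.1%} of read time spent seeking")
```

Under these assumptions, seeking dominates a 4 KB read, while it is only about 1% of a 64 MB or 128 MB read; this is why HDFS uses large blocks and optimizes for streaming throughput.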
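The MapReduce model can be illustrated with the classic word count. This is a single-machine Python sketch of the model's three stages (map, shuffle, reduce), not Hadoop's Java API: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would each run a map task on the input split stored locally.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Splits are mapped independently, so the map tasks can run in parallel
    # (threads here are a stand-in for separate cluster nodes).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(splits))
```

Because each mapper only needs its own split, in a real cluster the map code is shipped to the nodes holding the data, which is the "moving computation to data" point above.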