Storage: BDA Assignment-1 - Diagram Processing
BDA Assignment-1
What do you understand by HDFS? Explain its components with a neat diagram.
The Hadoop Distributed File System (HDFS) was designed for big data processing.
It is a core part of Hadoop and is used for data storage.
It is designed to run on commodity hardware.
HDFS Components
[Figure: HDFS components, showing the NameNode, the Secondary NameNode (checkpoint or backup node), and the DataNodes]
Explain the concept of HDFS block replication using a suitable example.
The design of HDFS is based on two types of nodes: a NameNode and multiple DataNodes.
For example:
[Figure: data blocks 1, 2, and 3 of a file, each replicated on several DataNodes]
When HDFS writes a file, it is replicated across the cluster.
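The replication idea can be illustrated with a toy placement routine. This is only a sketch: the function name `place_blocks` and the node names are hypothetical, and the round-robin rule used here is a simplification, not the placement policy HDFS actually applies.

```python
def place_blocks(num_blocks, datanodes, replication):
    """Toy round-robin placement: each block is stored on `replication`
    consecutive DataNodes. This is NOT the real HDFS placement policy."""
    if replication > len(datanodes):
        raise ValueError("replication factor cannot exceed the number of DataNodes")
    placement = {}
    for block in range(1, num_blocks + 1):
        start = (block - 1) % len(datanodes)
        placement[block] = [
            datanodes[(start + k) % len(datanodes)] for k in range(replication)
        ]
    return placement


# A file split into 3 blocks, stored on a 4-node cluster with 3 replicas each.
layout = place_blocks(3, ["dn1", "dn2", "dn3", "dn4"], replication=3)
for block, nodes in layout.items():
    print("block", block, "->", nodes)
```

Every block ends up on three different nodes, so losing any single DataNode still leaves two copies of each block.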
For Hadoop clusters containing more than eight DataNodes, the replication value is usually set to 3.
In a Hadoop cluster of eight or fewer DataNodes but more than one DataNode, a replication factor of 2 is adequate.
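The rule of thumb above can be written as a small helper. The function name `suggested_replication` is hypothetical, and the value returned for a single-DataNode cluster is an assumption (a lone node can hold only one copy), not something stated in these notes.

```python
def suggested_replication(num_datanodes):
    """Return the replication factor suggested by the rule of thumb above."""
    if num_datanodes < 1:
        raise ValueError("a cluster needs at least one DataNode")
    if num_datanodes == 1:
        return 1  # assumption: one node can hold only one replica
    if num_datanodes <= 8:
        return 2  # eight or fewer DataNodes, but more than one
    return 3      # more than eight DataNodes


for n in (1, 4, 8, 9, 50):
    print(n, "DataNodes ->", suggested_replication(n))
```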
The amount of replication is based on the value of dfs.replication in the hdfs-site.xml file.
This default value can be overruled with the hdfs dfs -setrep command.
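As a rough illustration of where the setting lives, the sketch below reads dfs.replication from an hdfs-site.xml style document. The sample XML and the helper name `read_replication` are made up for this example; the fallback of 3 is the stock HDFS default.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content used only for this example.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
"""


def read_replication(xml_text, default=3):
    """Return dfs.replication from the given XML text, or `default` if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.replication":
            return int(prop.findtext("value"))
    return default


print(read_replication(SAMPLE_HDFS_SITE))      # value taken from the sample
print(read_replication("<configuration/>"))    # falls back to the default
```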
Briefly explain HDFS NameNode Federation, NFS Gateway, Snapshots, checkpoints & backups.
HDFS NameNode Federation
Older versions of HDFS provided a single namespace for the entire cluster, managed by a single NameNode, so the resources of that NameNode limited the size of the namespace.
Federation addresses this limitation by adding support for multiple NameNodes/namespaces to the HDFS file system.
The key benefits are as follows:
- Namespace scalability
- Better performance
- System isolation
mount point
HDFS Snapshots
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
}

public static void main(String[] args) throws Exception {
Steps
1. Make a local wordcount_classes directory:
   mkdir wordcount_classes
2. Compile the WordCount.java program using the `hadoop classpath`:
   javac -cp `hadoop classpath` -d wordcount_classes WordCount.java
3. Create a Java archive for distribution.
... following three steps:
1. Generating the input data via the teragen program:
   yarn jar $HADOOP_EXAMPLES/hadoop-mapreduce-examples.jar teragen 500000000 /user/hdfs/TeraGen-50
... performance.
For example, the following command will instruct terasort to use four reducer tasks.
Reducer Program
current_word = None
...
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word
...
    print('%s\t%s' % (current_word, current_count))
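The fragment above leaves out the loop around the `else` branch. A minimal, self-contained sketch of how such a streaming reducer might look is given below. The function name `reduce_counts` and the sample input are illustrative, and the code assumes its input is already sorted by word, with each line in the form `word<TAB>count`.

```python
def reduce_counts(lines):
    """Sum the counts of consecutive identical words.

    Assumes `lines` is sorted by word and each line is 'word<TAB>count'.
    """
    current_word = None
    current_count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.split("\t", 1)
        try:
            count = int(count)
        except ValueError:
            continue  # skip lines whose count is not a number
        if word == current_word:
            current_count += count
        else:
            if current_word:
                yield current_word, current_count
            current_count = count
            current_word = word
    # Emit the last word once the input is exhausted.
    if current_word:
        yield current_word, current_count


# In a streaming job the input would be sys.stdin; a list is used here instead.
sample = ["hadoop\t1", "hadoop\t1", "hdfs\t1", "pig\t1", "pig\t1", "pig\t1"]
for word, total in reduce_counts(sample):
    print('%s\t%s' % (word, total))
```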
General HDFS commands
hdfs version
- reports the installed Hadoop version (a 2.6.0 build in this example)
hdfs dfs
- lists all commands in HDFS
HDFS user commands:
List files in HDFS:
hdfs dfs -ls /
- lists files in the root HDFS directory
hdfs dfs -ls  or  hdfs dfs -ls /user/hdfs
- lists the files in the user's home directory
Copy files to HDFS:
hdfs dfs -put test stuff
Copy files from HDFS:
hdfs dfs -get stuff/test test-local
Delete a file:
- permanently deletes the file
8. Explain the Hadoop Ecosystem using a neat sketch.
1. Apache Pig
Apache Pig is a high-level language that enables programmers to write complex MapReduce transformations using a simple scripting language.
Pig Latin (the actual language) defines a set of transformations on a data set, such as aggregate, join, and sort.
2. Apache Hive
[Figure: Hive architecture, showing the HiveQL process engine, Meta Store, execution engine, and MapReduce]
Hive provides users who are already familiar with SQL the capability to query the data on Hadoop clusters.
3. Apache Flume
Apache Flume is an independent agent designed to collect, transport, and store data into HDFS.
Often data transport involves a number of Flume agents that may traverse a series of machines and locations.
Flume is often used for log files, social media-generated data, email messages, and just about any continuous data source.
[Figure: Flume agent, showing a web server feeding a source, a channel, and a sink]
The channel can be thought of as a buffer that manages input and output flow rates.
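The buffering role of the channel can be pictured with a small queue model. This is a toy sketch, not Flume's API: the `Channel` class, its `put` and `take` methods, and the event names are all hypothetical, and a real agent would normally retry rather than drop events when the channel is full.

```python
from collections import deque


class Channel:
    """Toy bounded buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        """Called by the source; returns False when the buffer is full."""
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take(self, max_events):
        """Called by the sink; removes and returns up to `max_events` events."""
        batch = []
        while self.events and len(batch) < max_events:
            batch.append(self.events.popleft())
        return batch


channel = Channel(capacity=3)
incoming = ["log-%d" % i for i in range(5)]   # the source produces a burst of 5 events
dropped = [e for e in incoming if not channel.put(e)]

delivered = []
while True:
    batch = channel.take(max_events=2)        # the sink drains 2 events at a time
    if not batch:
        break
    delivered.extend(batch)

print("delivered:", delivered)
print("dropped:", dropped)
```

The source and the sink run at different rates here, and the channel absorbs the difference up to its capacity.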
4. Apache Sqoop
Sqoop is a tool designed to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system into HDFS, transform the data in Hadoop, and then export the data back into an RDBMS.
Apache Sqoop import method:
The data import is done in two steps.
In the first step, Sqoop examines the database to gather the necessary metadata for the data to be imported.
[Figure: Sqoop import, showing the metadata-gathering step and map tasks moving data from the RDBMS into the Hadoop cluster]
Apache Sqoop export method:
The export step again uses a map-only Hadoop job to write the data to the database.
Sqoop divides the input dataset into splits, then uses individual map tasks to push the splits to the database. Again, this process ...
[Figure: Sqoop export, showing map tasks pushing data from the Hadoop cluster into the RDBMS]
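The split-and-push pattern of the export step can be sketched in plain Python, with an in-memory SQLite database standing in for the RDBMS. None of this is Sqoop's own code: `make_splits`, `export_split`, the table name, and the sample rows are hypothetical, and the "map tasks" run one after another here rather than in parallel.

```python
import sqlite3


def make_splits(rows, num_splits):
    """Divide the input dataset into `num_splits` roughly equal splits."""
    return [rows[i::num_splits] for i in range(num_splits)]


def export_split(conn, split):
    """Stand-in for one map task: push a single split to the database."""
    conn.executemany("INSERT INTO words (word, total) VALUES (?, ?)", split)
    conn.commit()


rows = [("hadoop", 3), ("hdfs", 2), ("pig", 1), ("hive", 4), ("sqoop", 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, total INTEGER)")

splits = make_splits(rows, 3)
for split in splits:
    export_split(conn, split)

exported = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
print("splits:", len(splits), "rows exported:", exported)
```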
5. HBase
HBase is a distributed column-oriented database built on top of the Hadoop File System. It is an open-source project.
6. Apache Oozie
Oozie is a workflow director system designed to run and manage multiple related Apache Hadoop jobs.
[Figure: Oozie workflow (workflow.xml), showing start, OK, and kill transitions]