Big Data Lecture
M. Fanilo Andrianasolo
Data Analytics Tech Lead & Product Manager
Volume
Variety
Problem
(Sometimes) less
network hardware
Scaling is hard
Big Data ecosystem
Apache Hadoop
More than 30 open source projects for managing and analyzing Big Data
…
Hadoop distributions
Hadoop distributions vs Cloud providers
Hadoop ecosystem use cases
Log analysis
Security
Orchestration
A data platform canvas
Security
Orchestration
Acquisition
Import
Hadoop
RDBMS
FS
Export
Transport
Avro schema (.avsc):
{
  "type" : "record",
  "namespace" : "test",
  "name" : "Employee",
  "fields" : [
    { "name" : "Name" , "type" : "string" },
    { "name" : "Age" , "type" : "int" }
  ]
}
Generated class usage (.java):
Employee e1 = new Employee();
e1.setName("omar");
e1.setAge(21);
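To make the schema-to-class step concrete, here is a minimal sketch of the Java side. It is a hand-written stand-in, not the output of the Avro code generator: a real generated class also embeds the schema and the serialization logic. The class name EmployeeSketch and the toJson helper are hypothetical and exist only for this illustration.

```java
// Hand-written stand-in for a schema-generated record class.
// Assumption: the real generated class would also carry the schema
// and binary serializers, which are omitted here.
public class EmployeeSketch {
    private String name; // schema field "Name", type string
    private int age;     // schema field "Age", type int

    public void setName(String name) { this.name = name; }
    public void setAge(int age) { this.age = age; }
    public String getName() { return name; }
    public int getAge() { return age; }

    // Hypothetical helper: renders the record as JSON using the schema's field names.
    public String toJson() {
        return "{\"Name\": \"" + name + "\", \"Age\": " + age + "}";
    }

    public static void main(String[] args) {
        EmployeeSketch e1 = new EmployeeSketch();
        e1.setName("omar");
        e1.setAge(21);
        System.out.println(e1.toJson());
    }
}
```

The point of the pairing is that the schema file is the single source of truth: field names and types in the class follow from it rather than being maintained by hand.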
Hadoop Distributed File System
NameNode
1960:Fanilo c13e 1 3
2001:Fanilo c13e 1
1990:Omar d45 1
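The NameNode's role can be sketched as a lookup table from file blocks to the DataNodes that hold them. The toy model below assumes the HDFS defaults of 128 MB blocks and a replication factor of 3, and it places replicas round-robin, which is a simplification: real HDFS placement is rack-aware. The class name, method names and DataNode names (dn1 to dn4) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of NameNode metadata: which DataNodes hold each block of a file.
public class NameNodeSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default HDFS block size (128 MB)
    static final int REPLICATION = 3;                  // default HDFS replication factor

    // Number of blocks needed to store a file of the given size (ceiling division).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Assigns each block to REPLICATION distinct DataNodes, round-robin.
    // Assumption: round-robin replaces the real rack-aware placement policy.
    static Map<Long, List<String>> placeBlocks(long fileSizeBytes, List<String> dataNodes) {
        if (dataNodes.size() < REPLICATION) {
            throw new IllegalArgumentException("need at least " + REPLICATION + " DataNodes");
        }
        Map<Long, List<String>> blockMap = new LinkedHashMap<>();
        long blocks = blockCount(fileSizeBytes);
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            blockMap.put(b, replicas);
        }
        return blockMap;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        long size = 300L * 1024 * 1024; // a 300 MB file needs 3 blocks
        System.out.println(placeBlocks(size, nodes));
    }
}
```

Losing one DataNode in this model still leaves two copies of every block, which is why the NameNode only needs to track metadata and never stores file contents itself.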
YARN – Yet Another Resource Negotiator
music_sales.csv
1, « Let it go », 4.99€, 5
2, « Snow », 7.99€, 1
3, « Lion King », 0.99€, 1
4, « SISE », 1.99€, 2
5, « Lyon is great », 2.99€, 3
HiveQL → MapReduce
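A HiveQL aggregate over this file can be thought of as a map phase followed by a reduce phase. The sketch below runs both phases in plain Java, in memory, without Hadoop. It assumes the columns are id, title, unit price in euros and quantity sold (the slide does not name them), and it uses a cleaned copy of the rows with the quotes and currency sign removed. It corresponds roughly to grouping by title and summing price times quantity; class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of map and reduce over the music sales rows.
public class MusicSalesMapReduce {
    // Rows from music_sales.csv, cleaned of quotes and the euro sign.
    // Assumed columns: id, title, unit price in euros, quantity sold.
    static final List<String> ROWS = List.of(
        "1,Let it go,4.99,5",
        "2,Snow,7.99,1",
        "3,Lion King,0.99,1",
        "4,SISE,1.99,2",
        "5,Lyon is great,2.99,3"
    );

    // Map phase: emit one (title, revenue in cents) pair per input row.
    static Map.Entry<String, Integer> map(String row) {
        String[] cols = row.split(",");
        // BigDecimal avoids floating-point rounding when converting euros to cents.
        int cents = new BigDecimal(cols[2].trim()).movePointRight(2).intValueExact();
        int quantity = Integer.parseInt(cols[3].trim());
        return Map.entry(cols[1].trim(), cents * quantity);
    }

    // Reduce phase: sum the values of all pairs that share a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs map on every row, then reduce on the collected pairs.
    static Map<String, Integer> run(List<String> rows) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String row : rows) {
            pairs.add(map(row));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(run(ROWS)); // revenue per title, in cents
    }
}
```

On a cluster the map calls would run in parallel next to the data blocks and the pairs would be shuffled by key before reduction; here everything happens in one process to keep the two phases visible.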
Metastore
Batch processing
music_sales.csv
Visualizing
Security
Kerberos
Orchestration
Overview
Architecture design
Multiple architectures
Lambda architecture
Kappa architecture
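The difference between the two architectures can be sketched with a simple event counter. In the Lambda version a query merges a batch view, computed over historical data, with a speed view over recent events. In the Kappa version there is a single code path and the view is rebuilt by replaying the whole event log. This is a minimal in-memory sketch under those assumptions; the class, method and event names are hypothetical, and real systems would use a batch engine and a stream processor instead of lists.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory contrast between Lambda and Kappa query paths.
public class ArchitectureSketch {
    // Counts events per key; stands in for any view computation.
    static Map<String, Integer> countByKey(List<String> events) {
        Map<String, Integer> view = new HashMap<>();
        for (String event : events) {
            view.merge(event, 1, Integer::sum);
        }
        return view;
    }

    // Lambda: answer = batch view (historical data) + speed view (recent data).
    static int lambdaQuery(String key, List<String> batchEvents, List<String> recentEvents) {
        Map<String, Integer> batchView = countByKey(batchEvents);
        Map<String, Integer> speedView = countByKey(recentEvents);
        return batchView.getOrDefault(key, 0) + speedView.getOrDefault(key, 0);
    }

    // Kappa: a single code path; the view comes from replaying the full log.
    static int kappaQuery(String key, List<String> log) {
        return countByKey(log).getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        List<String> history = List.of("play", "buy", "play");
        List<String> recent = List.of("play");
        List<String> fullLog = List.of("play", "buy", "play", "play");
        System.out.println(lambdaQuery("play", history, recent));
        System.out.println(kappaQuery("play", fullLog));
    }
}
```

Both paths give the same answer for the same events; the trade-off is that Lambda maintains two pipelines that must agree, while Kappa relies on a log that can be replayed in full.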
CONCLUSION
THANKS
@andfanilo