Learn Well Technocraft: Hadoop/Big Data Syllabus
11th year of Accomplishments
08411002339/07709292162
[email protected]
www.dw-learnwell.com
203, Supreme Center, ITI Road, Above Pizza Hut,
Near Parihar Chowk, Aundh, Pune
ACHIEVEMENTS FROM TRAINING
CANDIDATE WILL BE ABLE TO SHOW 2-3 YEARS EXPERIENCE AFTER TRAINING.
REAL TIME SCENARIOS, CASE STUDIES, PROJECTS INCLUDED.
REAL TIME DATA PROVIDED FOR PRACTICE.
SOFTWARE WILL BE INSTALLED ON THE CANDIDATE'S MACHINE.
INDIVIDUAL 1 ON 1 DISCUSSIONS FOR RESUME MODIFICATIONS.
LEARN FROM INDUSTRY EXPERTS.
GLOBAL CERTIFICATION PREPARATION.
APPEAR FOR GLOBAL CERTIFICATION AT LEARN WELL TECHNOCRAFT ITSELF.
GET DISCOUNTED CERTIFICATION VOUCHERS.
AUTHORIZED GLOBAL CERTIFICATION CENTER FOR PEARSON, PSI, KRYTERION.
RECOMMENDED
Courses best suited with Hadoop
Spark/Scala
AWS
Tableau
ETL Tools
Data Analytics - Python/R programming
Hadoop/Big Data and Spark Syllabus
This course has three prerequisites: Linux, Java and SQL.
Linux: All commands relevant to Hadoop will be taught; there is no need to learn Linux separately.
Java: Core Java knowledge is required. Trainees can join a separate Java batch.
SQL: Basic SQL queries and joins are required.
The syllabus includes Spark and Scala, which will require extra sessions.
YARN
Detailed Hadoop Ecosystem – Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper
Introduction
Hive
SQL Basics
Hive Basics
Internal & External Tables
Partitioning
Buckets
DDL, DML
Joins, Index and Views
3 Projects – transferring data from one table to multiple tables; converting data into structured data and performing analytics; using alter/rename/drop commands in Hive
Pig
Pig Basics
Pig Latin Language
Statement Execution Steps
Data Types
Loading data files
Writing queries – SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE, etc.
Multi Query Execution
Debugging in Pig
Pig UDF
3 Projects – WordCount using Pig Latin, Batting Data Analytics, Production Example
HBase
Overview
HBase vs HDFS
Data Model
Key Value
Common Commands in HBase
HBase Basics
Region Server
Flume
Flume Basics
Features
Architecture
Agent Architecture
Example: ingesting files into HDFS in real time
Flume Use Cases
HCatalog
Objective
Supported Projects and Formats
Sqoop
Motivation
Sqoop Features
Architecture
Hive Import
Sqoop Import
Sqoop Export
ZooKeeper
Fault Tolerance
ZooKeeper Service
Oozie
Overview
Features
Sample Workflow
Action Nodes
Decision Nodes
Workflow Design
Workflow Scheduler
Example of MapReduce task
Hadoop 2.X
Classic MapReduce Architecture
Challenges with Hadoop 1
YARN
Daemons
Architecture
Resource Manager
Node Manager
Application Master
Hadoop 1.X vs. Hadoop 2.X
Introduction to Apache Spark
Spark Details
DAG
Scala
MLlib
GraphX
Module 0 - Scala
What is Scala?
Why Scala for Spark?
Scala in other frameworks
Introduction to Scala REPL
Basic Scala operations
Variable Types in Scala
Control Structures in Scala
Foreach loop
Functions
Procedures
Collections in Scala - Array, ArrayBuffer, Map, Tuples, Lists, and more
Module 1 - Spark Core
Introduction
Introduction to big data
Challenges with big data
Batch vs. real-time big data analytics
Batch Analytics - Hadoop Ecosystem Overview
Real-time Analytics
What is Spark?
Spark Ecosystem
Modes of Spark
Spark installation demo
Overview of Spark on a cluster
Spark Standalone cluster
Spark Web UI
Some configurations.
Components of Spark Unified stack
Spark Streaming
MLlib
Core
Spark SQL
RDDs
Transformations in RDD
Actions in RDD
Loading data in RDD
Saving data through RDD
Key-Value Pair RDD
MapReduce and Pair RDD Operations
Scala and Python shell
Word count example
Shared Variables with examples
Submitting jobs in cluster
Hands-on examples
Module 2 - Spark SQL
Overview
Hive and Spark SQL architecture
SQLContext in Spark SQL
DataFrames API
Understanding the concept of a DataFrame
Loading data into a DataFrame
Operations on DataFrames
Interaction with Hive
Reading various data formats
Hands-on examples
Overview of streaming
Spark Streaming Architecture
First Spark Streaming Program
Transformations in Spark Streaming
Checkpointing
Parallelism level
Introduction to queuing systems, e.g. Kafka
Hands-on examples
Supervised Learning
Classification - logistic regression, decision trees, random forests, naive Bayes
Regression - linear least squares, Lasso, ridge regression, decision trees
Unsupervised Learning
Hands-on examples
Projects
Web log analytics and report generation on real web log data
Twitter sentiment analytics using actual Twitter data
Also Available
Internships - Paid/Free
Internship certifications on successful completion
Final year college projects on latest skills
Special Project batches
College Seminars