Learn Well Technocraft: Hadoop/Big Data Syllabus

LEARN WELL
TECHNOCRAFT
HADOOP/BIG DATA
SYLLABUS
11th year of Accomplishments
AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE.
08411002339/07709292162
[email protected]
www.dw-learnwell.com
203, Supreme Center, ITI Road, Above Pizza Hut,
Near Parihar Chowk, Aundh, Pune
ACHIEVEMENTS FROM TRAINING
CANDIDATES WILL BE ABLE TO SHOW 2-3 YEARS OF EXPERIENCE AFTER TRAINING.
REAL-TIME SCENARIOS, CASE STUDIES AND PROJECTS INCLUDED.
REAL-TIME DATA PROVIDED FOR PRACTICE.
SOFTWARE WILL BE INSTALLED ON THE CANDIDATE'S MACHINE.
INDIVIDUAL 1-ON-1 DISCUSSIONS FOR RESUME MODIFICATIONS.
LEARN FROM INDUSTRY EXPERTS.
GLOBAL CERTIFICATION PREPARATION.
APPEAR FOR GLOBAL CERTIFICATION AT LEARN WELL TECHNOCRAFT ITSELF.
GET DISCOUNTED CERTIFICATION VOUCHERS.
AUTHORIZED GLOBAL CERTIFICATION CENTER FOR PEARSON, PSI, KRYTERION.
RECOMMENDED
Courses Best Suited with Hadoop
Spark/Scala
AWS
Tableau
ETL Tools
Data Analytics - Python/R programming
Hadoop/Big Data and Spark Syllabus
This course has three prerequisites: Linux, Java and SQL.
Linux: All commands relevant to Hadoop will be taught, so there is no need to learn Linux separately.
Java: Core Java knowledge is required. Trainees can join a separate Java batch.
SQL: Basic SQL queries and joins are required.
The syllabus includes Spark and Scala, which require extra sessions.
Hadoop and Big Data Syllabus

Hadoop Course Objectives/Highlights
What is Big Data
The core technologies of Hadoop
How the Hadoop Distributed File System (HDFS) and MapReduce work
What other projects exist in the Hadoop ecosystem
How to develop MapReduce jobs
Algorithms for common MapReduce tasks
How to create large workflows using multiple MapReduce jobs
Best practices for debugging Hadoop jobs
Advanced features of the Hadoop API
YARN
Detailed Hadoop Ecosystem: Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, HCatalog, HBase and YARN
Introduction to Apache Spark
Hadoop on Amazon Web Services
Introduction:
Motivation for Hadoop
Big Data Characteristics and Challenges with Traditional Systems
Hadoop’s History
Core Hadoop Concepts
Hadoop Clusters, Installation and Configuration
Linux and Hadoop Basic Commands
Linux Commands
HDFS Commands
Hands-On for All Commands
Hadoop Basic Concepts
What is Hadoop?
What features the Hadoop Distributed File System (HDFS) provides
Architecture
Features, Goals and Advantages of HDFS
NameNode
DataNodes
Secondary NameNode
The concepts behind MapReduce
How MapReduce Works
Data Types
Input & Output Formats
How a Hadoop cluster operates
Cluster sizing
Capacity planning
Replication
Blocks
Heartbeat Mechanism
Data Organization
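The block, replication and capacity planning topics above reduce to simple arithmetic. The Python sketch below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function names are illustrative and not part of any Hadoop API.

```python
MB = 1024 * 1024

def hdfs_block_count(file_size: int, block_size: int = 128 * MB) -> int:
    """Blocks a file occupies in HDFS; the final block may be smaller than block_size."""
    return (file_size + block_size - 1) // block_size  # ceiling division

def block_replicas(file_size: int, block_size: int = 128 * MB, replication: int = 3) -> int:
    """Total block replicas stored across the cluster for this file."""
    return hdfs_block_count(file_size, block_size) * replication

def raw_disk_usage(file_size: int, replication: int = 3) -> int:
    """Raw cluster disk consumed: a partial last block uses only its actual size."""
    return file_size * replication

if __name__ == "__main__":
    size = 500 * MB
    print(hdfs_block_count(size))            # 4 (three full 128 MB blocks + one 116 MB block)
    print(block_replicas(size))              # 12
    print(raw_disk_usage(size) // MB, "MB")  # 1500 MB
```

Because a partial last block only occupies its real size on disk, raw usage is simply file size times replication, while the NameNode still has to track every block and where its replicas live.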
VM Installation
Providing Hadoop VM and configuring it

Learning Eclipse and creating MapReduce JAR
Writing a MapReduce Program
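A MapReduce program boils down to a map phase, a shuffle/sort phase performed by the framework, and a reduce phase. The sketch below simulates the classic WordCount in plain Python to show the data flow; it is not the Hadoop Java API (in a real job the map and reduce logic would live in `Mapper` and `Reducer` subclasses packaged into a JAR), and the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # reducers receive keys in sorted order

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))

if __name__ == "__main__":
    docs = ["Hadoop stores data", "Spark and Hadoop process data"]
    print(word_count(docs))
```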

The Hadoop Ecosystem

Introduction
Hive
SQL Basics
Hive Basics
Internal & External Tables
Partitioning
Buckets
DDL, DML
Joins, Index and Views
3 Projects: transferring data from one table to multiple tables, converting data into structured data and performing analytics, and using alter/rename/drop commands in Hive
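Partitioning and bucketing change how a Hive table is laid out on HDFS: each partition value gets its own `column=value` sub-directory, and within it rows are spread over a fixed number of bucket files by hashing the bucketing column. The sketch below models that layout for a hypothetical `sales` table declared `PARTITIONED BY (country)` and `CLUSTERED BY (customer_id) INTO 4 BUCKETS`; the table path is assumed, and CRC32 stands in for Hive's hash function, which differs between Hive versions.

```python
import zlib
from collections import defaultdict

def partition_path(table_dir, column, value):
    """Hive keeps each partition in its own sub-directory named <column>=<value>."""
    return f"{table_dir}/{column}={value}"

def bucket_for(key, num_buckets):
    """Bucket number = hash(bucketing column) mod number of buckets.
    CRC32 is a stand-in here; Hive's actual hash function depends on the Hive version."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def table_layout(rows, table_dir="/user/hive/warehouse/sales", num_buckets=4):
    """Group rows by (partition directory, bucket) for the hypothetical table
    PARTITIONED BY (country) CLUSTERED BY (customer_id) INTO num_buckets BUCKETS."""
    layout = defaultdict(list)
    for row in rows:
        directory = partition_path(table_dir, "country", row["country"])
        bucket = bucket_for(row["customer_id"], num_buckets)
        layout[(directory, bucket)].append(row)
    return dict(layout)

if __name__ == "__main__":
    sales = [
        {"country": "IN", "customer_id": "c1", "amount": 10},
        {"country": "US", "customer_id": "c2", "amount": 5},
        {"country": "IN", "customer_id": "c1", "amount": 7},
    ]
    for (directory, bucket), rows in sorted(table_layout(sales).items()):
        print(directory, "bucket", bucket, "->", len(rows), "row(s)")
```

Because rows with the same `customer_id` always land in the same bucket, joins and sampling on that column can work bucket by bucket instead of scanning the whole partition.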
Pig
Pig Basics
Pig Latin Language
Statement Execution Steps
Data Types
Loading data files
Writing queries: SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE, etc.
Multi Query Execution
Debugging in Pig
Pig UDF
3 Projects: WordCount using Pig Latin, Batting Data Analytics, Production Example
HBase
Overview
HBase vs HDFS
Data Model
Key Value
Common Commands in HBase
HBase Basics
Region Server
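HBase's data model is a sorted map: row key, then a `family:qualifier` column, then timestamped versions of each cell. The toy class below is an in-memory illustration of that model, not the HBase client API; the `put`/`get`/`scan` method names only echo the HBase shell commands of the same name, and the version limit mimics a column family's `VERSIONS` setting.

```python
import time
from collections import defaultdict

class ToyHBaseTable:
    """In-memory illustration of HBase's data model (not the HBase client API):
    row key -> 'family:qualifier' column -> {timestamp: value}."""

    def __init__(self, column_families, max_versions=3):
        self.families = set(column_families)  # families are fixed when the table is created
        self.max_versions = max_versions      # mimics a column family's VERSIONS setting
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # milliseconds since the epoch
        versions = self.rows[row_key][column]
        versions[ts] = value
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]  # drop the oldest versions beyond the limit

    def get(self, row_key, column):
        """Newest value stored in the cell, or None if the cell is empty."""
        versions = self.rows.get(row_key, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self):
        """Yield (row key, newest value per column) in sorted row-key order."""
        for row_key in sorted(self.rows):
            yield row_key, {col: v[max(v)] for col, v in self.rows[row_key].items()}

if __name__ == "__main__":
    table = ToyHBaseTable(["info"], max_versions=2)
    table.put("user1", "info:city", "Pune", ts=1)
    table.put("user1", "info:city", "Mumbai", ts=2)
    table.put("user1", "info:city", "Delhi", ts=3)  # the ts=1 version is now dropped
    table.put("user0", "info:name", "Asha", ts=1)
    print(table.get("user1", "info:city"))  # Delhi
    print(list(table.scan()))               # user0 comes first: rows sort by row key
```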
Flume
Flume Basics
Features
Architecture
Agent Architecture
Example where we ingest files in real time into HDFS
Flume Use Cases
HCatalog
Objective
Supported Projects and Formats
Sqoop
Motivation
Sqoop Features
Architecture
Hive Import
Sqoop Import
Sqoop Export
ZooKeeper
Fault Tolerance
ZooKeeper Service
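ZooKeeper's fault tolerance comes from majority quorums: an ensemble keeps serving requests as long as more than half of its servers are up. The arithmetic below (function names are illustrative) shows why odd ensemble sizes are usually recommended: a 4-server ensemble tolerates no more failures than a 3-server one.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)

def is_available(ensemble_size: int, servers_up: int) -> bool:
    """The service keeps working only while a quorum of servers is up."""
    return servers_up >= quorum_size(ensemble_size)

if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```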
Oozie
Overview
Features
Sample Workflow
Action Nodes
Decision Nodes
Workflow Design
Workflow Scheduler
Example of MapReduce task
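An Oozie workflow is a directed acyclic graph of action nodes, with control nodes such as fork, join and decision steering execution. Oozie itself defines workflows in XML; the Python sketch below only illustrates the ordering logic, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical workflow. The action names and the decision rule are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it waits for.
WORKFLOW = {
    "sqoop-import": set(),
    "clean-mr": {"sqoop-import"},               # fork: two MapReduce jobs
    "dedupe-mr": {"sqoop-import"},              # both run after the import
    "aggregate-mr": {"clean-mr", "dedupe-mr"},  # join: waits for both branches
}

def run_order(workflow):
    """An execution order in which every action comes after its dependencies."""
    return list(TopologicalSorter(workflow).static_order())

def decide(row_count):
    """Decision node: choose the next action from a runtime value (made-up rule)."""
    return "publish-report" if row_count > 0 else "send-empty-alert"

if __name__ == "__main__":
    print(run_order(WORKFLOW))
    print(decide(row_count=42))
```

`static_order()` raises `graphlib.CycleError` if the dependencies contain a cycle, which mirrors the requirement that a workflow be acyclic. The two forked jobs may come out in either order, since nothing forces one before the other.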
Hadoop 2.x
Classic MapReduce Architecture
Challenges with Hadoop 1
YARN
Daemons
Architecture
Resource Manager
Node Manager
Application Master
Hadoop 1.x vs. Hadoop 2.x
Introduction to Apache Spark
Spark Details
DAG
Scala
MLLib
GraphX
Hadoop on Amazon Web Services

Introduction to AWS cloud infrastructure
Amazon SaaS, PaaS and IaaS
Creating EC2 instances for processing
Creating S3 buckets
Deploying data to the cloud
Choosing the size of our instances
Configuring an EMR cluster
Creating a virtual cluster on Amazon Web Services
Spark and Scala

*Candidates can opt for a separate Spark and Scala course.
Module 0 - Scala
What is Scala?
Why Scala for Spark?
Scala in other frameworks
Introduction to Scala REPL
Basic Scala operations
Variable Types in Scala
Control Structures in Scala
Foreach loop
Functions
Procedures
Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more
Module 1 - Spark Core
Introduction
Introduction to big data
Challenges with big data
Batch Vs. Real Time big data analytics
Batch Analytics - Hadoop Ecosystem Overview
Real-time Analytics
What is Spark?
Spark Ecosystem
Modes of Spark
Spark installation demo
Overview of Spark on a cluster
Spark Standalone cluster
Spark Web UI
Common Spark configurations
Components of Spark Unified stack
Spark Streaming
MLlib
Core
Spark SQL
RDD - The core concept of Spark
RDDs
Transformations in RDD
Actions in RDD
Loading data into RDDs
Saving data from RDDs
Key-Value Pair RDDs
MapReduce and Pair RDD Operations
Scala and Python shell
Word count example
Shared Variables with examples
Submitting jobs in cluster
Hands on examples
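The key idea behind RDD transformations and actions is lazy evaluation: transformations such as `map`, `filter`, `flatMap` and `reduceByKey` only extend the RDD's lineage, and nothing runs until an action such as `collect` or `count`. The toy class below imitates that behaviour on one machine so it can be run without a cluster; the method names follow the PySpark RDD API, but it is a sketch, not PySpark, and it leaves out partitioning, shuffles and fault tolerance.

```python
class ToyRDD:
    """Single-machine toy imitating Spark's lazy RDD model (not PySpark).
    Transformations only append a step to the lineage; actions execute it."""

    def __init__(self, source, lineage=()):
        self._source = source      # base data, e.g. a list of lines
        self._lineage = lineage    # tuple of (step_name, step_function) pairs

    def _then(self, name, step):
        # Every transformation returns a NEW ToyRDD; the parent is never modified.
        return ToyRDD(self._source, self._lineage + ((name, step),))

    # ---- transformations (lazy) ----
    def map(self, f):
        return self._then("map", lambda it: (f(x) for x in it))

    def filter(self, pred):
        return self._then("filter", lambda it: (x for x in it if pred(x)))

    def flatMap(self, f):
        return self._then("flatMap", lambda it: (y for x in it for y in f(x)))

    def reduceByKey(self, f):
        def step(it):
            merged = {}
            for key, value in it:
                merged[key] = f(merged[key], value) if key in merged else value
            return iter(merged.items())
        return self._then("reduceByKey", step)

    # ---- actions (trigger computation of the whole lineage) ----
    def _compute(self):
        data = iter(self._source)
        for _, step in self._lineage:
            data = step(data)
        return data

    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def lineage(self):
        return [name for name, _ in self._lineage]


if __name__ == "__main__":
    logs = ToyRDD(["INFO start", "ERROR disk", "ERROR net", "INFO stop"])
    errors = logs.filter(lambda line: line.startswith("ERROR"))  # nothing runs yet
    print(errors.lineage())   # ['filter']
    print(errors.count())     # 2
    counts = (ToyRDD(["a b", "b c"]).flatMap(str.split)
              .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b))
    print(sorted(counts.collect()))  # [('a', 1), ('b', 2), ('c', 1)]
```

In PySpark the word count would start from `sc.parallelize(...)` and use the same chain of `flatMap`, `map` and `reduceByKey` calls.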
Module 2 - Spark SQL
Overview
Hive and Spark SQL architecture
SQLContext in Spark SQL
DataFrames API
Understanding the concept of a DataFrame
Loading data into a DataFrame
Operations on DataFrames
Interaction with Hive
Reading various data formats
Hands on Examples
Module 3 - Spark Streaming
Overview of streaming
Spark Streaming Architecture
First Spark Streaming Program
Transformations in Spark Streaming
Checkpointing
Parallelism level
Introduction to queuing systems, e.g. Kafka
Hands on examples
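Spark Streaming (the DStream API) processes a live stream as a series of small batches, one per batch interval, each handled like an RDD. The sketch below models that micro-batching in plain Python on hypothetical timestamped log events, with a running total carried across batches in the spirit of `updateStateByKey` (which, in real Spark Streaming, requires checkpointing to be enabled). Function names and events are illustrative assumptions.

```python
from collections import Counter, defaultdict

def micro_batches(events, batch_interval):
    """Split (timestamp_seconds, text) events into consecutive batches of
    batch_interval seconds, keyed by each batch's start time."""
    batches = defaultdict(list)
    for ts, text in events:
        start = (ts // batch_interval) * batch_interval
        batches[start].append(text)
    return sorted(batches.items())  # intervals with no events are simply absent here

def process_stream(events, batch_interval=2):
    """Per-batch word counts plus a running total carried across batches."""
    running = Counter()
    for start, texts in micro_batches(events, batch_interval):
        batch_counts = Counter(word for text in texts for word in text.split())
        running.update(batch_counts)
        yield start, dict(batch_counts), dict(running)

if __name__ == "__main__":
    events = [(0.5, "error disk"), (1.2, "error net"), (2.1, "warn disk"), (5.0, "error disk")]
    for start, batch, total in process_stream(events, batch_interval=2):
        print(f"batch @ {start}s: {batch}  running: {total}")
```

Unlike this sketch, Spark still produces a batch for an interval with no data; it is simply empty.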
Module 4 - Spark MLlib
Supervised Learning
Classification - logistic regression, decision trees, random forests, naive Bayes
Regression - linear least squares, Lasso, ridge regression, decision trees
Unsupervised Learning
Clustering - K-means, Gaussian Mixture
Dimensionality Reduction - PCA
Hands-on examples and projects: Web Log Analytics and report generation on real web log data
Twitter sentiment analytics using actual Twitter data
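MLlib ships its own distributed `KMeans`; the sketch below is a plain single-machine version of Lloyd's algorithm, the iteration K-means is built on, with caller-supplied starting centroids instead of MLlib's default k-means|| initialization. The function names and sample points are illustrative.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, initial_centroids, max_iter=100, tol=1e-9):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move every centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        updated = []
        for i, c in enumerate(centroids):                 # update step
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(c)  # an empty cluster keeps its previous centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:  # converged: centroids stopped moving
            break
    return centroids, [nearest(p, centroids) for p in points]

if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    centroids, labels = kmeans(pts, initial_centroids=[(0, 0), (10, 10)])
    print(centroids, labels)
```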
Also Available
Internships - Paid / Free
Internship certifications on successful completion
Final year College Projects on Latest Skills
Special Project batches
College Seminars