Learn Well Technocraft: Hadoop/Big Data Syllabus


LEARN WELL TECHNOCRAFT

HADOOP/BIG DATA SYLLABUS
11th year of Accomplishments

AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE.

08411002339/07709292162
[email protected]
www.dw-learnwell.com
203, Supreme Center, ITI Road, Above Pizza Hut,
Near Parihar Chowk, Aundh, Pune
ACHIEVEMENTS FROM TRAINING
CANDIDATE WILL BE ABLE TO SHOW 2-3 YEARS EXPERIENCE AFTER TRAINING.
REAL TIME SCENARIOS, CASE STUDIES, PROJECTS INCLUDED.
REAL TIME DATA PROVIDED FOR PRACTICE.
SOFTWARE WILL BE INSTALLED ON CANDIDATE'S MACHINE.
INDIVIDUAL 1 ON 1 DISCUSSIONS FOR RESUME MODIFICATIONS.
LEARN FROM INDUSTRY EXPERTS.
GLOBAL CERTIFICATION PREPARATION.
APPEAR FOR GLOBAL CERTIFICATION AT LEARN WELL TECHNOCRAFT ITSELF.
GET DISCOUNTED CERTIFICATION VOUCHERS.
AUTHORIZED GLOBAL CERTIFICATION CENTER FOR PEARSON, PSI, KRYTERION.

RECOMMENDED
Courses Best Suited with Hadoop

Spark/Scala
AWS
Tableau
ETL Tools
Data Analytics - Python/R programming

Hadoop/Big Data and Spark Syllabus
This course has three prerequisites: Linux, Java and SQL.

Linux: All commands relevant to Hadoop will be taught. No need to learn them separately.
Java: Core Java knowledge is required. Trainees can join a separate Java batch.
SQL: Basic SQL queries and joins are required.

The syllabus includes Spark and Scala, which will require extra sessions.

Hadoop and Big Data Syllabus

Hadoop Course Objectives/Highlights

What is Big Data
The core technologies of Hadoop
How Hadoop Distributed File System (HDFS) and MapReduce work
What other projects exist in the Hadoop ecosystem
How to develop MapReduce jobs
Algorithms for common MapReduce tasks
How to create large workflows using multiple MapReduce jobs
Best practices for debugging Hadoop jobs
Advanced features of the Hadoop API
YARN
Detailed Hadoop Ecosystem – Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, HCatalog, HBase and YARN
Introduction to Apache Spark
Hadoop on Amazon Web Services

Introduction:
Motivation for Hadoop
Big Data Characteristics, Challenges with traditional systems
Hadoop’s History
Core Hadoop Concepts
Hadoop Clusters, Installation and Configuration

Linux and Hadoop Basic Commands


Linux Commands
HDFS Commands
Hands-On for All Commands
Hadoop Basic Concepts
What is Hadoop?
What features the Hadoop Distributed File System (HDFS) provides
Architecture
Features, Goals and Advantages of HDFS
Name Nodes
Data Nodes
Secondary Name Node
The concepts behind MapReduce
How MapReduce works
Data Types
Input & Output Formats
How a Hadoop cluster operates
Cluster sizing
Capacity planning
Replication
Blocks
Heartbeat Mechanism
Data Organization
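
As a rough illustration of the MapReduce concepts listed above, the sketch below imitates the map, shuffle and reduce phases of a word count in plain Python on a single machine. It is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are assumptions made for this example, and a real job would be written against the Hadoop API and run across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS blocks.
    sample = ["big data big clusters", "data nodes store data blocks"]
    print(word_count(sample))
```

In Hadoop itself the mapper and reducer run as separate tasks on different nodes, and the shuffle step moves data over the network between them. The single-process version here only shows the data flow.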
VM Installation

Providing Hadoop VM and configuring it


Learning Eclipse and creating MapReduce JAR

Writing a Map Reduce Program



The Hadoop Ecosystem


Introduction

Hive
SQL Basics
Hive Basics
Internal & External Tables
Partitioning
Buckets
DDL, DML
Joins, Index and Views
3 Projects – transferring data from one table to multiple tables, converting data into structured data and performing analytics, and alter/rename/drop commands in Hive
Pig
Pig Basics
PigLatin Language
Statement Execution Steps
Data Types
Loading data files
Writing queries – SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE etc.
Multi Query Execution
Debugging in Pig
Pig UDF
3 Projects – WordCount using PigLatin, Batting Data Analytics, Production Example

HBase

Overview
HBase vs HDFS
Data Model
Key Value
Common Commands in HBase
HBase Basics
Region Server

Flume
Flume Basics
Features
Architecture
Agent Architecture
Example where we ingest files in real-time into HDFS
Flume Use Cases
HCatalog
Objective
Supported Projects and Formats

Sqoop

Motivation
Sqoop Features
Architecture
Hive Import
Sqoop Import
Sqoop Export

ZooKeeper

Fault Tolerance
ZooKeeper Service

Oozie
Overview
Features
Sample Workflow
Action Nodes
Decision Nodes
Workflow Design
Workflow Scheduler
Example of MapReduce task

Hadoop 2.X
Classic MapReduce Architecture
Challenges with Hadoop 1
YARN
Daemons
Architecture
Resource Manager
Node Manager
Application Master
Hadoop 1.x vs Hadoop 2.x
Introduction to Apache Spark
Spark Details
DAG
Scala
MLlib
GraphX

Hadoop on Amazon Web Services


Introduction to AWS cloud infrastructure.
Amazon SaaS, PaaS and IaaS.
Creating EC2 instance for processing.
Creating S3 buckets
Deploying data onto the cloud.
Choosing size of our instance.
Configuration of EMR instance
Creating a virtual cluster on Amazon Web Services

Spark And Scala


*Candidates can opt for separate Spark and Scala course.

Module 0 - Scala

What is Scala?
Why Scala for Spark?
Scala in other frameworks
Introduction to Scala REPL
Basic Scala operations
Variable Types in Scala
Control Structures in Scala
Foreach loop
Functions
Procedures
Collections in Scala - Array, ArrayBuffer, Map, Tuples, Lists, and more.
Module 1 - Spark Core

Introduction
Introduction to big data
Challenges with big data
Batch Vs. Real Time big data analytics
Batch Analytics - Hadoop Ecosystem Overview
Real-time Analytics
What is Spark?
Spark Ecosystem
Modes of Spark
Spark installation demo
Overview of Spark on a cluster
Spark Standalone cluster
Spark Web UI
Some configurations.
Components of Spark Unified stack
Spark Streaming
MLlib
Core
Spark SQL

RDD - The core concept of Spark

RDDs
Transformations in RDD
Actions in RDD
Loading data in RDD
Saving data through RDD
Key-Value Pair RDD
MapReduce and Pair RDD Operations
Scala and Python shell
Word count example
Shared Variables with examples
Submitting jobs in cluster
Hands on examples
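
To make the difference between RDD transformations and actions concrete, here is a toy model in plain Python. ToyRDD is a hypothetical class written only for this illustration and is not Spark's API. Its method names loosely mirror the RDD operations covered in this module: transformations (map, filter, flat_map, reduce_by_key) only record what to do, and nothing runs until an action (collect, count) is called.

```python
from typing import Any, Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: a lazily evaluated pipeline over a list."""

    def __init__(self, source: Callable[[], Iterator[Any]]):
        # Zero-argument function that produces a fresh iterator on demand.
        self._source = source

    @classmethod
    def from_list(cls, items: Iterable[Any]) -> "ToyRDD":
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: each returns a new ToyRDD and does no work yet.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    def flat_map(self, fn: Callable[[Any], Iterable[Any]]) -> "ToyRDD":
        return ToyRDD(lambda: (y for x in self._source() for y in fn(x)))

    def reduce_by_key(self, fn: Callable[[Any, Any], Any]) -> "ToyRDD":
        def run() -> Iterator[Any]:
            totals: dict = {}
            for key, value in self._source():
                totals[key] = fn(totals[key], value) if key in totals else value
            return iter(totals.items())
        return ToyRDD(run)

    # Actions: these trigger evaluation of the whole pipeline.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


if __name__ == "__main__":
    # Word count expressed as a chain of transformations plus one action.
    lines = ToyRDD.from_list(["spark makes rdds", "rdds are lazy"])
    counts = (lines.flat_map(str.split)
                   .map(lambda word: (word, 1))
                   .reduce_by_key(lambda a, b: a + b))
    print(counts.collect())
```

Real Spark adds partitioning, distribution across executors and fault recovery on top of this idea. The toy version runs in one process and recomputes the pipeline on every action.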
Module 2 - Spark SQL
Overview
Hive and Spark SQL architecture
SQLContext in Spark SQL
DataFrames API
Understanding the concept of a DataFrame
Loading data in a DataFrame
Operations on DataFrames
Interaction with Hive
Reading various data formats
Hands on Examples

Module 3 - Spark Streaming

Overview of streaming
Spark Streaming Architecture
First Spark Streaming Program
Transformations in Spark Streaming
Checkpointing
Parallelism level
Introduction to queuing systems, e.g. Kafka
Hands on examples

Module 4 - Spark MLlib

Supervised Learning
Classification - logistic regression, decision trees, random forests, naive Bayes
Regression - linear least squares, Lasso, ridge regression, decision trees

Unsupervised Learning

Clustering - K-means, Gaussian Mixture
Dimensionality reduction - PCA

Hands on examples
Projects: Web Log Analytics and report generation on real web log data
Twitter Sentiment Analytics using actual Twitter data
Also Available
Internships - Paid / Free
Internship certifications on successful completion
Final year College Projects on Latest Skills
Special Project batches
College Seminars

