Bigdata Hadoop Spark - Python


• Best Big Data & Analytics Training Institute of India for the year 2017 by Higher Education Review
• Education Excellence award 2017 by Skill India and Indus Foundation
• Best Data Science Training institute 2015 by SiliconIndia Magazine
Big Data Hadoop Workshop
Overview
Apache Hadoop, the open source data management platform that helps organizations store, access
and analyze massive volumes of structured and unstructured data, is a hot topic in technology.
Hadoop has been deployed by global companies such as eBay, Facebook and Yahoo Inc.
Each Hadoop component serves a different need: MapReduce has its roots in functional programming;
Hive makes Hadoop accessible to users who already know SQL; Pig is a scripting language for large datasets.
This Big Data workshop is designed to build your understanding of the Hadoop platform and
to give you practice with both batch processing and streaming data using Hadoop and Spark.
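
To illustrate the point that MapReduce has its roots in functional programming, here is a minimal sketch of a word count written as map, shuffle and reduce steps in plain Python. It does not use Hadoop or the MR API; the function names (mapper, shuffle, reducer, word_count) and the sample input are assumptions chosen for illustration only.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: fold the values of one key into a single total."""
    return key, reduce(add, values)


def word_count(lines):
    # Each line is mapped independently, which is what allows the work to be spread out.
    mapped = [pair for line in lines for pair in mapper(line)]
    # Each key is reduced independently once its values are grouped.
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data moves fast"]))
```

In a real Hadoop job the framework performs the shuffle and runs mappers and reducers on different machines; the sketch only shows the shape of the computation.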

Who can attend?


Software Developer / Analyst / Architect
Project Manager / Database Administrator

Successful and Satisfied Candidates

“I am really thankful to the faculty as he actually helped me a lot to understand the big data concepts. It was a really nice experience for me doing projects and learning Hadoop during the course. Moreover, getting into the live project helped me in getting the job I was looking for. So I am very thankful to Data Brio Academy and would recommend DBA to others.” - Dibakar Mandal, Hadoop Developer, Infosys

“Data Brio Academy is good for learners. I opted for big data course and I will say that it was my best decision. As everything is to be done practically whatever concepts are taught they make us do practically. Faculties are experienced and know well how to teach and make you learn things.” - Chaitanya Deoghare, Data Analyst, IBM

More testimonials at http://www.databrio.com/testimonial


Hadoop Workshop – Objectives

Participants’ take away

• Understanding of Big Data and the Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce: using the MR API and writing algorithms
• Importing and exporting data using Sqoop; Hive, HBase and Pig for analysis
• Core and object-oriented Python programming
• Apache Spark: programming with RDDs, loading and saving data
• PySpark application
• Big Data project

Pre-requisites

• Object Oriented Programming - Java / C++
• User-level knowledge of Linux / UNIX; understanding of databases
Big Data Hadoop With Spark
Workshop Duration: 60 hours

HADOOP AND ITS ECOSYSTEM

MODULE 1
Introduction to Big Data
• What Is Big Data?
• Types & elements of big data
• Business Applications
• Technologies Used
• Distributed & Parallel computing
• Virtualization & its importance to big data

MODULE 2
Hadoop Ecosystem
• Introduction to Hadoop Ecosystem
• Cloudera Sandbox Setup

MODULE 3
HDFS
• Introduction to HDFS
• Namenode/datanode
• Jobtracker/tasktracker
• Data Replication
• File Read/Write
• YARN Framework
• Hadoop 2.x vs Hadoop 1.x
• Lab 1: HDFS Commands

MODULE 4
MapReduce
• MapReduce Essentials
• MR Daemons
• MR Framework
• MR API
• Mapper/Reducer class
• Writing Mapper/Reducer
• Combiners & Partitioners
• Testing & Debugging
• Lab 2: MapReduce hands on

MODULE 5
PIG
• Introduction to Pig
• Pig Latin
• Structure
• Functions
• Expression
• Relational operation Schema

MODULE 6
HIVE
• Introduction to Hive
• Hive architecture
• Loading/Querying data into a table
MODULE 7
HBase
• Introduction to HBase
• HBase data model
• HBase Vs RDBMS

MODULE 8
Sqoop, Oozie & Flume
• Introduction to Sqoop, Oozie, Flume
• Sqoop Commands
• Oozie architecture & workflow
• Flume architecture & components
• Lab 3: Sqoop
• Lab 4: Sqoop Advanced
• Lab 5: Oozie
• Lab 6: Flume

PYSPARK

MODULE 9
Python - Fundamentals
• Language Fundamentals
• Operators
• Input and Output Statements
• Flow Control
• Strings
• List, Tuple, Set, Dictionary
• Functions
• Modules
• Packages
• Hands-on lab

MODULE 10
Python – Error Handling
• Exception Handling
• File Handling
• OOPs
• Regular Expressions
• Multi Threading

MODULE 11
Introduction to Apache Spark
• What is Apache Spark
• A Unified Stack
• Who uses Spark and for what
• A brief history of Spark
• What is good and bad in MapReduce
• Spark Versions and Releases
MODULE 12
Cloudera Quick Start VM Installation
• Include Hadoop
• Include Apache Spark
• Include Hive
• Include Sqoop
• Include Hue
• Hands-on lab

MODULE 13
Programming with RDDs
• RDD basics
• Creating RDDs
• RDD operations
• Passing Function to Spark
• Common transformations and actions
• Persistence (Caching)
• Hands-on lab

MODULE 14
Playing with Pair RDDs
• Core concepts of PairRDD
• Creation of PairRDD
• Aggregation in PairRDD
• Aggregation functions understanding in depth
• reduceByKey()
• foldByKey()
• combineByKey()
• groupByKey()
• aggregateByKey()
• Hands-on lab

MODULE 15
Loading and Saving Data
• File Formats
• Text File
• JSON
• Comma-Separated Values and Tab Separated Values
• Sequence Files
• Object Files
• Hadoop input and output Formats
• Hands-on lab

MODULE 16
Build and Monitor Apache Spark Applications
• Cluster Managers
• Deploying Application with Spark Submit
• A pyspark application
• Spark SQL
• Hands-on lab

Projects
Real time project on Big Data
• Customer 360 degree view - Big Data Analytics for retail data
• Building automatic framework which will fetch data from different sources and ingest into Hadoop
• Use cases - Analysis of Aadhaar data, health data analysis, server log analysis using Spark, Olympic data, JSON data analysis etc.
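
As a preview of the Pair RDD aggregation functions listed in Module 14, the sketch below emulates in plain Python what reduceByKey() and combineByKey() compute over key-value pairs split across partitions. It is not Spark code and needs no cluster; the helper names combine_by_key and reduce_by_key, the list-of-lists stand-in for partitions, and the sample data are assumptions made for illustration.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Aggregate values per key inside each partition, then merge across partitions."""
    partials = []
    for partition in partitions:
        local = {}
        for key, value in partition:
            if key in local:
                # Key already seen in this partition: fold the value in.
                local[key] = merge_value(local[key], value)
            else:
                # First value for this key in this partition.
                local[key] = create_combiner(value)
        partials.append(local)

    merged = {}
    for local in partials:
        for key, combiner in local.items():
            if key in merged:
                # Combine partial results that came from different partitions.
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


def reduce_by_key(partitions, func):
    """Special case: one function both folds values and merges partial results."""
    return combine_by_key(partitions, lambda value: value, func, func)


# Two hypothetical partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 4)], [("a", 3), ("b", 6), ("a", 5)]]

sums = reduce_by_key(partitions, lambda x, y: x + y)

# Per-key average: carry (sum, count) as the combiner, which a single
# reduce function over plain values cannot express directly.
sum_count = combine_by_key(
    partitions,
    lambda value: (value, 1),
    lambda acc, value: (acc[0] + value, acc[1] + 1),
    lambda left, right: (left[0] + right[0], left[1] + right[1]),
)
averages = {key: total / count for key, (total, count) in sum_count.items()}
```

The per-partition step is why these functions tend to move less data than groupByKey(): values are combined locally before partial results are merged.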
WHY DATA BRIO ACADEMY

Learn directly from Expert and Renowned Faculties

• Faculties with more than 10 years of corporate experience in companies like Fidelity,
Accenture, Nielsen, Genpact, Dell, Infosys, IBM, Cognizant etc.
• Only institute with transparent faculty profiles with academic laurels from BITS, IIM, ISI,
Carnegie Mellon University, Michigan State University, Purdue University, Presidency
College, etc.
• Get the business perspective - develop the mindset to 'see data' the right way instead of
learning just the tools & theory.

Certification from WEBEL, Govt. of West Bengal

• Course curriculum complies with global standards and NASSCOM's QP for data science

Get trained on Commercial Distributions of Hadoop


• Learn commercial distributions of Hadoop and Spark such as Cloudera, Hortonworks and MapR,
and become globally certified professionals

World class training methodology


• Examples & case studies in multiple verticals
• Project and exam for complete learning

Assured internship

• Internship for fresher/student participants in the parent company, Business Brio. Business
Brio is an award-winning big data analytics company and a member of NASSCOM and CII
(Confederation of Indian Industry)

100% Placement Assistance


• Dedicated job placement cell, Resume workshop, Interview guidance and mock interviews
• Placement through industry tie-up/HR network/employee referrals
• Alumni in top analytics and technology companies in India and abroad
Authorised Training Partner of WEBEL, Govt. of West Bengal

CONTACT US

Data Brio Academy,
PS Srijan Corporate Park,
Tower 1, 14th Floor,
GP Block, Sector 5,
Salt Lake, Kolkata - 700091

[email protected]
+91 9903376367
+91 33 4008 4159
www.databrio.com
