PG Program in Data Science
Note: This curriculum is subject to change based on inputs from IIITB and Industry
Introduction
Understanding the Excel Interface
Slicing and Dicing Data - Sort and Filter
Report Making I: Basic Formatting
Introduction
Delimited Files
Discovering Shortcuts
Introduction to Formulae
Data Analysis in Excel: Taught by one of the most renowned data scientists in the country (S. Anand, CEO, Gramener), this module takes you from a beginner-level Excel user to an almost professional user.
Data Analysis in Excel - I
Complex Functions
Cell Referencing and Text Functions
Logical Formulae
Anand's Anecdotes
Creating and Formatting Charts
Types of Charts
Anecdotes - II
Introduction
Creating a Pivot Table
DATA SCIENCE TOOLKIT
Introduction
Data Formats and Tableau Interface
Introduction
Bar Charts
Visualisation using Tableau: Learn an important and widely used tool for data analysts, Tableau.
Visualising and Analysing Data in Tableau
Scatter Plots and Pie Charts
Tree Maps
Dual Axes Charts
Introduction
Histograms
Box Plots
Visualising and Analysing Data with Tableau - II
Area Maps
Calculations in Tableau
Dashboard and Stories
Introduction
Analytics Problem Solving: This module covers concepts of the CRISP-DM framework for business problem solving.
The CRISP-DM Framework - Business and Data Understanding
Define the Business Problem - Business Understanding
Owning an IPL Team - Business Understanding
Understanding Raw Data
Preparing Data for Analysis
Introduction
Understanding UpGrad Coding Console
Understanding Primary Actions
Understanding Statuses & Important Pointers
Introduction
Getting Started - Installation
Introduction to Jupyter Notebook
The Basics
Data Structures in Python
Introduction to Python: Sharpen your data analysis skills with Python, the language of choice for simplicity, readability and quick deployment.
Lists
Tuples
Dictionaries
Sets
Introduction
If-Elif-Else
Loops
Control Structures and Functions
Comprehensions
Functions
Map, Filter, and Reduce
Introduction
NumPy Basics
Creating NumPy Arrays
Structure and Content of Arrays
Introduction to NumPy
Subset, Slice, Index and Iterate through Arrays
Multidimensional Arrays
Computation Times in NumPy and Standard Python Lists
Introduction
Basic Operations
Operations on NumPy Arrays
Operations on Arrays
Introduction
Reading Delimited and Relational Databases
Reading Data From Websites
Getting and Cleaning Data
Getting Data From APIs
Reading Data From PDF Files
Cleaning Datasets
Introduction to Module
Basics of Probability
Joint Probability and Conditional Probability
Bayes' Theorem
Assessments I
Inferential Statistics - Practice Session
Standardized Normal Distribution and Z-Score
Assessments II
STATISTICS AND EXPLORATORY DATA ANALYSIS
Introduction
Understanding Hypothesis Testing
Null and Alternate Hypotheses
Concepts of Hypothesis Testing - I
Making a Decision
Critical Value Method
Critical Value Method - Examples
Introduction
p-value Method
Concepts of Hypothesis Testing - II
p-value Method - Examples
Types of Errors
Introduction
Z-test
F-Test
General Guidelines
Introduction to EDA
Introduction
Public and Private Data
Data Sourcing
Private Data
Public Data
Public Data Exercise
Introduction
Fixing Rows and Columns
Missing Values
Data Cleaning
Standardising Values
Invalid Values
Filtering Data
Introduction
Data Description
Unordered Categorical Variables - Univariate Analysis
Univariate Analysis
Ordered Categorical Variables - Univariate Analysis
Introduction
Bivariate Analysis on Continuous Variables
Bivariate Analysis
Business Problems Involving Correlation
Practice Questions
Bivariate Analysis on Categorical Variables
Introduction
What are Derived Metrics?
Types of Derived Metrics: Type Driven Metrics
Derived Metrics
Types of Derived Metrics: Business Driven Metrics
Practice Questions
Types of Derived Metrics: Data Driven Metrics
Course Overview
Introduction: Data Visualisation
Introduction
Data Visualisation Toolkit
Introduction
Uber Supply-Demand Gap: Solve the Uber supply-demand gap problem.
Evaluation Rubric
Submission
Course Wrap - EDA and Statistics
Additional Resources: Here, you will find all the additional content for the course as and when it is added to this module.
Basics of Probability
Pre-Reads
Optional Questions
Discrete Probability Distributions
Pre-Reads
Optional Questions
Power Law
Exploratory Data Analysis
Recommended Additional Content
Introduction
Introduction to Machine Learning
Regression Line
Simple Linear Regression
Best Fit Line
Strength of Simple Linear Regression
Simple Linear Regression in Python
Coding Practice - Simple Linear Regression
Introduction
Multiple Linear Regression
Modelling in Python - I
Modelling in Python - II
Housing Case Study
Derived Variables
Linear Regression: Regression helps us determine the strength of the relationship between one dependent variable and a series of other changing variables.
Multiple Linear Regression
VIF - Variance Inflation Factor
Housing Case Study Predictions
Variable Selection Using RFE
Assumptions of Linear Regression
Feature Selection
Coding Practice - Building a Multiple Linear Regression Model
Introduction
Linear Regression: Revision
Prediction vs Projection
Media Company Case Study
Introduction
Making Predictions
Model Building - Coding Exercise
Model Evaluation
Introduction
Commonly Faced Challenges in Implementation of Logistic Regression
MACHINE LEARNING I
Logistic Regression: Industry Applications - Part II
Model Evaluation (A Second Look)
Model Validation and Importance of Stability
Tracking of Model Performance Over Time
Introduction
Understanding Clustering
Introduction to Clustering
Practical Example of Clustering - Customer Segmentation
Introduction
Steps of the Algorithm
K Means Algorithm
K Means as Coordinate Descent
K Means Clustering
K Means++ Algorithm
Visualising the K Means Algorithm
Practical Consideration in K Means Algorithm
Cluster Tendency
Introduction
Data Preparation
Introduction
K-Mode Clustering
K-Mode in Python
Other Forms of Clustering
K-Prototype in Python
DB Scan Clustering
Practice Question
Gaussian Mixture Model
Introduction
The Why And What of PCA
Building Blocks of PCA
Illustration - Finding Principal Components
Unsupervised Learning: Principal Component Analysis: This module will cover the concepts of PCA, which is an unsupervised machine learning technique mainly used in dimensionality reduction. It will also cover practical applications of PCA in Python.
Principal Component Analysis
Comprehension - Calculating the Principal Components
Singular Value Decomposition
SVD Example - Image Compression
Practice Questions
Introduction
PCA: Python Implementation
PCA in Python
Practical Considerations and Alternatives
Optional Assignment (MNIST Dataset)
Comprehension: PCA, SVD and Eigenvectors
HR Analytics Case Study: Use your skills to predict which employee is going to leave the company in the near future.
Problem Statement
Evaluation Rubric
Submission
Introduction
Introduction to SVM
Concept of a Hyperplane in 2D
SVM - Maximal Margin Classifier
Practice Questions
Concept of a Hyperplane in 3D
Maximal Margin Classifier
Introduction
The Soft Margin Classifier
The Slack Variable
SVM - Soft Margin Classifier
Comprehension-1: Notion of Slack Variables
Support Vector Machine (Optional): Learn the fundamentals of SVMs and use them to detect spam emails, recognise alphabets and more!
Cost of Misclassification
SVM R-Lab
Introduction
Introduction to Kernels
Mapping Nonlinear Data to Linear Data
Feature Transformation
Kernels
The Kernel Trick
R Lab - Kernels
Shiny App - Types of Kernels
Choosing a Kernel Function
Letter Recognition Using SVM
Introduction
Introduction to Decision Trees
Interpreting a Decision Tree
Introduction to Decision Trees
Comprehension - Decision Tree Classification in Python
Regression with Decision Trees
Introduction
Concept of Homogeneity
Introduction
Tree Model: Tree models represent the way we make decisions. Learn how decisions are made in this powerful classification algorithm.
Advantages and Disadvantages
Tree Truncation
Tree Pruning
Truncation and Pruning
Building Decision Trees in Python
Choosing Tree Hyperparameters in Python
Coding Practice Questions
Comprehension - Hyperparameters
Introduction
Ensembles
Comprehension - Ensembles
Creating a Random Forest
Random Forests
Comprehension - OOB (Out-of-Bag) Error
Comprehension - Time Taken to Build a Random Forest
Random Forests Lab
Coding Practice Questions
Introduction
Introduction to Boosting
Understanding Stationarity
Understanding White Noise
ACF & PACF Plots
Working with Stationary Time Series
AR & MA Modelling
MACHINE LEARNING II
Introduction
Introduction to Model Selection
Model and Learning Algorithm
Principles of Model Selection
Simplicity, Complexity and Overfitting
Bias-Variance Tradeoff
Comprehension - Bias Variance Tradeoff
Model Selection: You are preparing for a competitive exam. Should you learn some tricks for it or focus on the fundamentals? Model Selection has the answer.
Regularization
Introduction
Regularization and Hyperparameters
Model Evaluation and Cross Validation
Model Evaluation
Model Evaluation: Python Demonstration-I
Model Evaluation: Python Demonstration-II
Cross-Validation: Motivation
Cross-Validation: Python Demonstration
Cross-Validation: Hyperparameter Tuning
Introduction
Understanding the Business Problem
Comprehension - Logistic Regression
Comparing Different Machine Learning Models - I
Model Selection - Practical Considerations: Given a business problem, how do you choose the best algorithm? Learn a few practical tips for doing this here.
Model Selection - Best Practices
Comparing Different Machine Learning Models - II
Pros and Cons of Different Machine Learning Models
End-to-End Modelling - I
CART and CHAID Trees
Choosing between Trees and Random Forests - I
Choosing between Trees and Random Forests - II
End-to-End Modelling - II
Introduction
Generalized Regression
Generalized Regression Framework-1
Generalized Linear Regression
Generalized Regression Framework-2
Systems of Linear Equations
Generalized Regression Framework-3
Generalized Regression in Python
Introduction
Regularized Regression
Ridge and Lasso Regression - I
Advanced Regression: This course takes a more advanced look at linear regression models.
Ridge and Lasso Regression - II
Ridge and Lasso Regression in Python
Model Selection Criteria-I
Introduction
Defining Data Warehouse
Structure of Data Warehouse
Database Design
OLAP vs. OLTP
Star Schema
How to Use a Star Schema - A Demonstration
Data Warehouse Schema - Industry Example
Introduction
Adding and Deleting Columns
Changing Column Name and Data Type
Creating a Table from an Existing Table
Updating Table
Changing Constraints (Primary key)
Changing Constraints (Foreign key)
String Manipulation
Date Manipulation
Advanced SQL: Learn the advanced concepts of SQL and gain mastery over this programming language.
Introduction
Introduction to Windowing Functions
Window Functions
Frames
Named Windows
Window Functions' Restrictions
Introduction
User Defined Functions and Stored Procedures
Introduction to User Defined Functions
User Defined Functions (Application)
Introduction to Stored Procedures
Stored Procedures (Application)
Introduction
Optimisation in Select Clause
Optimisation in Where Clause
Query Optimisation
Optimisation in Group by and Order by
Optimisation in Joins
Optimisation in Window Function
Assignment: Apply the basics of investing and your knowledge of data science to determine when to buy and sell a stock.
SQL Assignment - Stock Market Analysis
Problem Introduction
Data Set
Grading Criteria
Submission
Course Introduction
Introduction to Big Data: Understand the big data ecosystem and the various types of job roles in the industry.
Fundamentals of Big Data
Understanding Big Data
Identifying Big Data
Conventional Data Processing Systems and Big Data
Introduction
History of Hadoop
Distributed Computing
Hadoop Terminologies
Master and Slave
Big Data Storage in Hadoop
BIG DATA & SQL
Introduction
Data Ingestion with Apache Sqoop
Advantages and Industry Use Cases of Sqoop
How Sqoop Performs Import
Comprehension: How Sqoop Import Works
Creating an RDS
Introduction to Apache Sqoop
Migrating Databases to the RDS
Running Sqoop in AWS
Adding a MySQL Connector
Sqoop Commands: Listing Databases and Tables
Sqoop Commands: Import and Import-All-Tables
Sqoop Commands: Job and Eval
Introduction
Introduction to Apache Hive
Key Features of Apache Hive
Use Cases of Apache Hive
The Hive Metastore
Big Data Ingestion and Processing: In big data ingestion and processing, learn to use various tools for getting and processing data.
Introduction to Apache Hive
Hive Data Models
Creating Tables in Hive
Understanding and Analysing the Data Stored in Hive Tables
Solution - Movies Graded Questions
Introduction
Partitions
Hive Data Models - Partitions and Buckets
Creating and Querying Partitioned Tables
Buckets
Comprehension: Data Models (Graded Assessment)
Introduction
File Formats in Apache Hive
ORC and Compression Algorithms
Introduction
EDA and UDFs in Hive
Advanced Data Analysis in Hive
Advanced Data Analysis using Hive
Basic Text Analysis using Hive
Handling Complex Data Types using Hive
Introduction
Overview of Spark
Spark vs MapReduce
Concepts and Fundamentals of Spark
Resilient Distributed Datasets (RDDs)
In-memory Processing
RDD Operations
Programming & Debugging in PySpark
Introduction: Setting Up
Schema-on-Read vs Schema-on-Write
Big Data Processing using Apache Spark: Learn Apache Spark, the newest big data framework with unprecedented performance and ease of use.
Comparing Spark With Hive
Analysis with Spark - I: Reading & Summarising Data
Analysis with Spark - II: Plotting Data
Analysis with Spark - III: Filtering & Grouping
Analysis with Spark - IV: Model-building
Working with Spark
Practice Analysis: Airlines Data
MLlib - I: An Overview
MLlib - II: Preparation for Model Building
MLlib - III: Building ML models
PySpark: An Alternative Library to PySpark
Solution to PySpark Practice Questions
Hive LLAP
Understanding Text
Text Encoding
TF-IDF Representation
Canonicalisation
Phonetic Hashing
Edit Distance
Advanced Lexical Processing
Spell Corrector - I
Spell Corrector - II
Parsing
Parts-of-Speech
Stochastic Parsing
Introduction to Syntactic Processing
The Viterbi Heuristic
Markov Chain and HMM
Explanation Problem
Constituency Grammars
Top-Down Parsing
Syntactic Processing: Learn algorithms to parse the grammar of sentences (HMMs, CFGs, PCFGs) and build a smart flight-booking NLU system using techniques such as NER.
Parsing
Bottom-Up Parsing
Probabilistic CFG
Chomsky Normal Form
Dependency Parsing
Information Extraction
ELECTIVE - NATURAL LANGUAGE PROCESSING
POS Tagging
Rule-Based Models
Schema
Introduction to Semantic Processing
Semantic Associations
Databases - WordNet and ConceptNet
Occurrence Matrix
Co-occurrence Matrix
Word Vectors
Word Embeddings
Skipgram Model
Distributional Semantics
Comprehension - Word2Vec
Word2vec in Python - I
Defining a Topic
Probabilistic Model
LDA in Python - I
LDA in Python - II
Python code - II
Chatbot Deployment
ML and AI in Business
Problem Statement
NLP Course Project - Building a Chatbot
Evaluation Rubric
Final Submission
Introduction to Perceptron
Perceptrons - Training
Activation Functions
Feedforward Algorithm
Batch in Backpropagation
Training in Batches
Regularization
Dropouts
Modifications to Neural Networks
Batch Normalization
Introduction to Keras
Loss Function I
Loss Function II
Gradient Descent I
Gradient Descent II
Initializations
Problem Statement
Neural Networks - Assignment: Implement multiclass classification on the MNIST dataset using a raw neural network model in NumPy.
Introduction to Neural Networks - Assignment
Evaluation Rubric
Final Submission
Applications of CNNs
ELECTIVE - DEEP LEARNING AND NEURAL NETWORKS
Introduction to CNNs
Video Analysis
Introduction to Convolutional Neural Networks
Understanding Convolutions - I
Understanding Convolutions - II
Important Formulas
Weights of a CNN
Feature Maps
Pooling
GoogleNet
Residual Net
Loss Function
Style Transfer and Object Detection
Style Transfer Notebook
Object Detection - I
Object Detection - II
Architecture of an RNN
Training RNNs
Types of RNNs - II
Bidirectional RNNs
Generating C Code - I
Generating C Code - II
RNNs in Python
Problem Statement
Evaluation Rubric
Final Submission
Introduction
Understanding the Healthcare Market
Understanding the Healthcare Domain: With all the necessary DA knowledge, it is time to get into the domain details. Learn about the healthcare landscape in the US.
Introduction to the Healthcare Space
Stakeholders of the Primary Healthcare Ecosystem: Process
Stakeholders of the Primary Healthcare Ecosystem: Drivers and Metrics
Stakeholders of the Secondary Healthcare Ecosystem: Process
Stakeholders of the Secondary Healthcare Ecosystem: Drivers and Metrics
Other Stakeholders of the Healthcare Ecosystem
Introduction
Analytics Related to Patient-Physician Interactions
Clinical Decision Support Systems
Analytics Related to Patient-Hospital Interactions
Provider Analytics: In this module, you will explore the different analytics opportunities that exist in the healthcare provider space.
Management of Patient Traffic - I
Management of Patient Traffic - II (Comprehension)
Management of Patient Traffic - III
Hospital Performance Analysis - I
Hospital Performance Analysis - II (Comprehension)
Hospital Performance Analysis - III
Hospital Compare
Introduction
Payers in the US
Types of Health Insurance
Types of Insurance Plans
Benefits
Getting Familiar with the US Payer Market
Analytics Opportunities in Benefits
Coordination of Benefits
ELECTIVE - HEALTHCARE
Provider Management - I
Provider Management - II
Pay for Performance (P4P)
Analytics Opportunities in Provider Management
Payer Analytics: In this module, you will explore the different analytics opportunities that exist in the healthcare payer space.
Introduction
Life Cycle of a Health Insurance Claim
Healthcare Coding
Claims Adjudication
Analytics Opportunities in Claims Management
Analytics to Detect Fraudulent Claims
Claims and Care Management
Care Management
Care Management Framework
Risk Stratification
Evaluating a Care Management Program
Accountable Care Organisations (ACOs)
Analytics Opportunities in Care Management
Introduction
Pharmaceutical Market Overview
Drug Development Life Cycle
Areas of Analytics in Pharma
Drug Development and Sales Analytics
Pharmaceutical-Selling Process
Field Activity
Analytics in Sales
Analytics in the Pharmaceutical Industries: Learn how pharmaceutical companies harness the power of data analytics.
Sales Data
Customer Segmentation
Introduction
Structure of a Marketing Organisation
Multichannel Marketing (MCM) Management
Marketing Analytics
Patient Journey Analytics
Analytics Opportunities in Commercial Operations
Market Forecasting
Course Wrap for Healthcare: Get a brief overview of how all that you have studied in the healthcare domain finds application in the real world.
Course Wrap
Healthcare Course Wrap by Prof. RC
Interview Tips by Rohit
Problem Statement
Capstone Project: Decipher the CMS hospital star rating system using supervised and unsupervised models.
Capstone - Healthcare
Mid Submission
Final Submission
Introduction
Introduction to E-commerce: Get acquainted with the various applications of data analytics in the e-commerce business.
Business of E-commerce
Inventory Management
Marketing in E-commerce
Data Analytics in E-commerce
Improving User Experience
Fraud Detection
Shipment Delivery
Customer Feedback
Introduction
Understanding Recommendation Systems
Content Based Filtering
Recommendation Systems: Learn about the algorithms that power the recommendation engines of e-commerce sites.
User Based Collaborative Filtering
Item Based Collaborative Filtering
Issues in Recommendation Systems
Recommender System in Python
Introduction
Understanding Price Markup & Markdown
Introduction
What is Market Mix Modelling (MMM)?
Introduction
Market Mix Modelling: Learn how to optimise your marketing spends in order to maximise the ROI.
Modelling the Advertising Effects - Part I
Modelling the Advertising Effects - Part II (Optional)
Modelling the Advertising Effects - Part III
Introduction
A/B Testing (Optional): Understand the concept behind A/B testing and learn how to execute an A/B test in Optimizely.
Understanding A/B Testing
Steps in A/B Testing
Setting up an A/B Test in Optimizely
Introduction
Banking Products - Deposits & Lending
Introduction to Banking and Financial Services: Learn how banks make money through various banking products and also understand the customer lifecycle.
How Banks Make Money
Profitability of Credit Cards
P&L of Banks and Financial Institutions
Introduction
Customer Lifecycle
Customer Lifecycle - Acquisition Analytics
Customer Lifecycle - Engagement Analytics
Customer Lifecycle - Risk Analytics
Introduction
Introduction
Engagement Analytics Framework
Cross-Selling Strategies
Engagement Strategies
Types of Cross-Selling
ELECTIVE - BFSI
Cross-Selling Opportunities
Engagement Analytics: Now that you have learnt how to acquire customers, learn how to engage them and prevent their attrition.
Customer Lifetime Value (CLV)
Cross-Selling Lab
Introduction
Cross-Selling - Business Objectives
Cross-Selling Analysis
Introduction
Types of Attrition
Retention and Loyalty Management
Attrition - Credit Card
Interpreting a Credit Card Attrition Model
Introduction
Regulatory Risk Analytics (Optional)
Regulatory Risk Analytics - A Brief Introduction
Introduction to Kaggle
Kernels
Competitions
Mini Capstone: Solve a problem based on one of the competitions held on Kaggle or on an industry dataset as a final test of what you have learned so far.
Problem Statement
Evaluation Rubric
Final Submission
Solution