Syllabus_Principle of Data Science


Principle of Data Science & Engineering

Course Code AIM 3005

Course Name Principle of Data Science & Engineering

Credits 5

Pre-Requisites Basics of data, data structure, data distribution

L-T-P-C 3-1-2-5

Course Objectives
1. To understand and manipulate structured data.
2. To understand the techniques for describing, presenting, and associating sets of data.
3. To learn the measurement approaches for summarizing data sets.
4. To become familiar with the counting principle and measures of uncertainty from data.
5. To learn data distribution techniques and the measure of best-fit data distribution.
6. To explore hypothesis testing for any data distribution.
7. To learn about the features acquired from data, along with a technique for selecting the best ones for the target.

CO-PO Mapping:

COs Define COs PO8 PO9 PO10 PO12 PSO1
1 2 3 4 5 6 7 1 2

CO1  Understand different categories of data sets and associated statistics.  2 2 2 2 - - - - - - 2 2 2
CO2  Understand and apply the basic concepts of various probability distributions on different data sets.  3 2 2 2 - - - - - - 2 2 2
CO3  Analyze and visualize data using various Python libraries.  3 3 3 3 - - - - - - 2 3 3
CO4  Explain various concepts associated with feature engineering and feature selection.  3 2 2 2 - - - - - - 2 2 2

Course Syllabus:

S. No. Contents Hours

01 Module-I 10
Statistics for Data Science Part - I
Introduction, Types of Data – Understanding, Data Classification, Scales of
Measurement, Categorical Data - Data Description, Numerical Data - Data
Description, Association – Introduction, Association between variables of Different
Types, Combinatorics - Basics of Permutation & Combination, Probability Basics -
Sample Space, Outcomes, Events, Probability Evaluation, Conditional Probability
and Bayes' Theorem.

02 Module-II 10
Statistics for Data Science Part – II
Random Variables – Introduction, Discrete Random Variables, Continuous Random
Variables, Distributions - Discrete Random Variable – Binomial, Discrete Random
Variable – Poisson, Continuous Random Variable – Uniform, Continuous Random
Variable – Exponential, Expectation, Variance and Covariance, Parameter
Estimation - Maximum Likelihood Estimation (Introduction), Hypothesis Testing - I
– Introduction, Types of Testing, Z-test, P-value, Basics of Regression Analysis.

03 Module-III 8
Introduction to Data Analysis & Visualization
Fundamentals of Python Libraries – NumPy, Pandas, SciPy (Self Study). Statistical
Analysis in 2D - Introduction to Matplotlib, Data Representation Formats.

04 Module-IV 8
Introduction to Data Engineering
Introduction to Feature Engineering & Feature Selection, Feature Engineering -
Handling Missing Values & Outliers, Feature Encoding and Selection Approaches,
Feature Selection - Univariate Selection Approach
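
As a rough illustration of the "Handling Missing Values & Outliers" topic in Module-IV, the sketch below fills missing entries with the median and flags outliers with the 1.5 x IQR rule. Both techniques are choices made here for illustration; the syllabus does not name a specific method. The `ages` list, the use of `None` as the missing-value marker, and the function names are all hypothetical. Only the Python standard library is used.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical feature column with two missing entries and one extreme value.
ages = [23, 25, None, 27, 24, 26, None, 95]
filled = impute_median(ages)
outliers = iqr_outliers(filled)
```

In a lab setting the same steps would more likely be written with Pandas; the plain-Python version only keeps the logic visible.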

Sr. No Content
01 Programming problems on Probability basics (Without libraries) - 1 Assignment

02 Distribution problem statements (Matplotlib) - 1 Assignment.

03 Information Extraction from Data Sets through designing appropriate programming
modules (NumPy, Pandas and Matplotlib) - 1 Assignment.

04 Hypothesis testing problems (Use of NumPy, Pandas, SciPy etc.) - 1 Assignment.

05 Problems on Basic Feature Engineering & Selection Mechanisms (Use of NumPy, Pandas,
SciPy etc.) - 1 Assignment.

06 Common Problem on data entities (Using all required Python Libraries) - Taking all together.
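
A problem of the kind listed under item 01 (probability basics without libraries) might look like the following sketch. It enumerates the sample space of two dice, computes a conditional probability by counting, and checks the result against Bayes' theorem. The dice example and all function names are assumptions chosen for illustration, not taken from the course material.

```python
# Sample space: all ordered outcomes of rolling two fair dice.
sample_space = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def probability(event, space):
    """P(event) = favourable outcomes / total outcomes (equally likely)."""
    favourable = [o for o in space if event(o)]
    return len(favourable) / len(space)

def conditional_probability(event, given, space):
    """P(event | given): restrict the sample space to outcomes where 'given' holds."""
    reduced = [o for o in space if given(o)]
    return probability(event, reduced)

def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8

def first_is_even(outcome):
    return outcome[0] % 2 == 0

p_a = probability(sum_is_8, sample_space)
p_b = probability(first_is_even, sample_space)
p_a_given_b = conditional_probability(sum_is_8, first_is_even, sample_space)
p_b_given_a = conditional_probability(first_is_even, sum_is_8, sample_space)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b
```

Counting directly and applying Bayes' theorem should give the same value for P(sum is 8 | first die is even), which makes the exercise self-checking.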

List of Assignments:
Sr. No Content
01 Given any data set, identification of queries which can be prepared for information
extraction - Real life data sets will be provided in the form of Google Sheets - 2 Assignments

02 Permutation and Combination - 2 Assignments, Probability basics - 2 Assignments

03 Random Variables - 2 Assignments, Distributions - 2 Assignments

04 Expectation, Variance and Covariance - 1 Assignment

05 Hypothesis Testing - I - 1 Assignment
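
For the hypothesis testing assignment, a one-sample Z-test with a two-sided p-value can be sketched as below. This assumes the population standard deviation is known, which is the usual Z-test setting. The sample values, `mu0`, and `sigma` are made up for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: population mean == mu0.

    Assumes the population standard deviation sigma is known.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a statistic at least as extreme as |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up measurements; H0 says the mean is 50, sigma is assumed to be 3.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 56]
z, p_value = one_sample_z_test(sample, mu0=50, sigma=3)
reject_h0 = p_value < 0.05  # decision at the 5 % significance level
```

Here the sample mean is 52 with n = 9, so z = 2.0 and the p-value is just under 0.05.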

● Course Outcomes: On completion of the course, the student should be able to:
1. Understand different categories of data sets and associated statistics.
2. Understand and apply the basic concepts of various probability distributions on different data sets.
3. Analyze and visualize data using various Python libraries.
4. Explain various concepts associated with feature engineering and feature selection.

● Text Books:
1. Practical Statistics for Data Scientists - Peter Bruce & Andrew Bruce, O'Reilly.
2. Practical Data Science with Jupyter - Prateek Gupta.
3. A Modern Introduction to Probability and Statistics - F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä,
L.E. Meester.

● Links:

● Evaluation Scheme:
Theory & Tutorial Class:

Midterm Exam: 20 %
Assignments: 05 %
Attendance: 05 %
Theory Class & Tutorial Performance: 10 %
End term exam: 60 %

Practical session:

Lab performance: 20 %
Regular Lab attendance: 05 %
Oral: 10 %
Submission: 05 %
End term Practical exam: 60 %
