Syllabus_Principle of Data Science
Credits: 5
L-T-P-C (Lecture-Tutorial-Practical-Credits): 3-1-2-5
Course Objectives:
1. To understand and manipulate structured data.
2. To understand techniques for describing, presenting, and associating sets of data.
3. To learn measurement approaches for summarizing data sets.
4. To become familiar with the counting principle and with measures of uncertainty derived from data.
5. To learn data distribution techniques and measures for identifying the best-fitting distribution for a data set.
6. To explore hypothesis testing on data drawn from any distribution.
7. To learn about the features acquired from data, along with techniques for selecting the best ones for the target objective.
CO-PO Mapping:
COs  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1   2   2   2   2   2   -   -   -   -    -    -    2    2    2
CO2   3   2   2   2   2   -   -   -   -    -    -    2    2    2
CO3   3   3   3   3   3   -   -   -   -    -    -    2    3    3
CO4   3   2   2   2   2   -   -   -   -    -    -    2    2    2

COs defined:
CO1: Understand different categories of data sets and associated statistics.
CO2: Understand and apply the basic concepts of various probability distributions on different data sets.
CO3: Analyze and visualize data using various Python libraries.
CO4: Explain various concepts associated with feature engineering and feature selection.
Course Syllabus:
01 Module-I (10 hours)
Statistics for Data Science Part I
Introduction; Types of Data: Understanding, Data Classification, Scales of Measurement; Categorical Data: Data Description; Numerical Data: Data Description; Association: Introduction, Association between Variables of Different Types; Combinatorics: Basics of Permutation and Combination; Probability Basics: Sample Space, Outcomes, Events, Probability Evaluation, Conditional Probability and Bayes' Theorem.
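As an illustration of the Module I combinatorics and probability topics, the sketch below computes permutations and combinations from their factorial definitions and applies Bayes' theorem to a diagnostic-test example. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen for illustration, not course data, and the function names are illustrative.

```python
import math

def n_perm(n: int, r: int) -> int:
    """Permutations: ordered arrangements of r items from n, nPr = n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

def n_comb(n: int, r: int) -> int:
    """Combinations: unordered selections of r items from n, nCr = n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

print(n_perm(5, 2))  # 20
print(n_comb(5, 2))  # 10
# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate
print(round(bayes_posterior(0.01, 0.95, 0.05), 4))  # 0.161
```

Because a positive result is fairly likely even without the condition, the posterior here is only about 16%, which is the usual motivation for conditioning carefully with Bayes' theorem.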
02 Module-II (10 hours)
Statistics for Data Science Part II
Random Variables: Introduction, Discrete Random Variables, Continuous Random Variables; Distributions: Discrete Random Variable (Binomial, Poisson), Continuous Random Variable (Uniform, Exponential); Expectation, Variance and Covariance; Parameter Estimation: Maximum Likelihood Estimation (Introduction); Hypothesis Testing I: Introduction, Types of Testing, Z-test, P-value; Basics of Regression Analysis.
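A few Module II formulas can be made concrete with the standard library alone. The sketch below gives the Binomial and Poisson probability mass functions, expectation and variance computed directly from a pmf, the maximum likelihood estimate of an Exponential rate (the reciprocal of the sample mean), and a two-sided one-sample Z-test with known population standard deviation. Function names and the sample values are hypothetical, for illustration only.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def expectation_and_variance(pmf, support):
    """E[X] = sum k p(k); Var[X] = sum (k - E[X])^2 p(k) over a discrete support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var

def exponential_mle(sample: list[float]) -> float:
    """MLE of the rate of an Exponential distribution: n / sum(x) = 1 / sample mean."""
    return len(sample) / sum(sample)

def z_test(sample: list[float], mu0: float, sigma: float) -> tuple[float, float]:
    """Two-sided one-sample Z-test of H0: mu = mu0, with known population sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

print(binomial_pmf(2, 4, 0.5))  # 0.375
print(expectation_and_variance(lambda k: binomial_pmf(k, 10, 0.3), range(11)))  # about (3.0, 2.1)
print(z_test([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0, sigma=0.2))
```

The expectation and variance of Binomial(10, 0.3) computed from the pmf agree with the closed forms np = 3 and np(1 - p) = 2.1, which is a quick way to check either one.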
03 Module-III (8 hours)
Introduction to Data Analysis & Visualization
Fundamentals of Python Libraries: NumPy, Pandas, SciPy (Self Study); Statistical Analysis in 2D: Introduction to Matplotlib, Data Representation Formats.
04 Module-IV (8 hours)
Introduction to Data Engineering
Introduction to Feature Engineering & Feature Selection; Feature Engineering: Handling Missing Values & Outliers, Feature Encoding and Selection Approaches; Feature Selection: Univariate Selection Approach.
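The Module IV steps can be sketched in plain Python: mean imputation for missing values, outlier flagging with Tukey's IQR fences, one-hot encoding of a categorical column, and a univariate selection step that scores each feature on its own. The syllabus does not specify which univariate score is used; this sketch assumes absolute Pearson correlation with the target as one possible choice (chi-squared or ANOVA F scores are common alternatives). All function names and data are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

def one_hot(categories):
    """One-hot encode category labels; columns follow sorted label order."""
    labels = sorted(set(categories))
    return labels, [[1 if c == lab else 0 for lab in labels] for c in categories]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_k_best(features, target, k):
    """Univariate selection (assumed score: |Pearson r| of each feature with the target)."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(impute_mean([1.0, None, 3.0]))          # [1.0, 2.0, 3.0]
print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # only the last value is flagged
print(one_hot(["red", "blue", "red"]))
print(select_k_best({"x1": [1, 2, 3, 4], "x2": [4, 1, 3, 2]}, [2, 4, 6, 8], k=1))  # ['x1']
```

Scoring each feature in isolation is what makes the approach "univariate": it is cheap and easy to interpret, but it cannot detect features that are only useful in combination.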
Laboratory:
Sr. No Content
01 Programming problems on probability basics (without libraries): 1 assignment.
05 Problems on basic feature engineering and selection mechanisms (using NumPy, Pandas, SciPy, etc.): 1 assignment.
06 A common problem on data entities, taking all topics together (using all required Python libraries).
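Lab 01 asks for probability problems solved without libraries. One possible shape for such a problem, assuming two fair six-sided dice as the example experiment, is to enumerate the sample space explicitly and evaluate events as predicates over its outcomes. The function names are illustrative only.

```python
# Sample space for rolling two fair six-sided dice, built without any imports.
sample_space = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def probability(event):
    """Classical probability: favourable outcomes / total equally likely outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return len(favourable) / len(sample_space)

def conditional_probability(event, given):
    """P(A | B) = P(A and B) / P(B)."""
    return probability(lambda o: event(o) and given(o)) / probability(given)

p_sum7 = probability(lambda o: o[0] + o[1] == 7)  # 6/36 = 1/6
p_sum8_given_first_even = conditional_probability(
    lambda o: o[0] + o[1] == 8,   # event A: the dice sum to 8
    lambda o: o[0] % 2 == 0,      # condition B: the first die is even
)
print(p_sum7, p_sum8_given_first_even)
```

Conditioning shrinks the sample space to the 18 outcomes with an even first die, of which 3 sum to 8, so the conditional probability is 3/18 = 1/6.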
List of Assignments:
Sr. No Content
01 Given a data set, identify the queries that can be formulated for information extraction. Real-life data sets will be provided as Google Sheets: 2 assignments.
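For the query-identification assignment, a sheet downloaded as CSV can be queried with the standard `csv` module. The sketch below assumes a hypothetical sheet with `region`, `product`, and `units` columns (the actual course data sets are not described here) and answers three typical information-extraction queries: an aggregate per group, an overall maximum, and a filter.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of a Google Sheet; column names and values are illustrative.
SHEET_CSV = """region,product,units
North,A,120
South,A,80
North,B,45
South,B,150
North,A,60
"""

rows = list(csv.DictReader(io.StringIO(SHEET_CSV)))

# Query 1: total units sold per region
units_by_region = defaultdict(int)
for row in rows:
    units_by_region[row["region"]] += int(row["units"])

# Query 2: which product sold the most units overall?
units_by_product = defaultdict(int)
for row in rows:
    units_by_product[row["product"]] += int(row["units"])
top_product = max(units_by_product, key=units_by_product.get)

# Query 3: records with more than 100 units
large_orders = [row for row in rows if int(row["units"]) > 100]

print(dict(units_by_region), top_product, len(large_orders))
```

In practice the same queries would more often be written with Pandas (`groupby`, boolean filtering), which the course introduces in Module III; the plain-Python form makes each step explicit.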
● Course Outcomes: On completion of the course, the student should be able to:
1. Understand different categories of data sets and associated statistics.
2. Understand and apply the basic concepts of various probability distributions on different data sets.
3. Analyze and visualize data using various Python libraries.
4. Explain various concepts associated with feature engineering and feature selection.
● Text Books:
1. Practical Statistics for Data Scientists - Peter Bruce & Andrew Bruce, O'Reilly.
2. Practical Data Science with Jupyter - Prateek Gupta.
3. A Modern Introduction to Probability and Statistics - F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester.
● Links:
1. https://www3.cs.stonybrook.edu/~anshul/courses/cse544_f18/
2. http://www.mnnit.ac.in/images/newstories/2020/csed/courses/CurriculumSyllabus_BTECH_CS.pdf
● Evaluation Scheme:
Component                              Theory & Tutorial Class   Practical Session
Midterm Exam                           20%                       20%
Assignments                            5%                        5%
Attendance                             5%                        10%
Theory Class & Tutorial Performance    10%                       5%
End-term Exam                          60%                       60%
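The evaluation scheme is a weighted sum, so a component-wise calculation shows how a final score follows from it. This sketch assumes each component is first marked out of 100 and reuses the "Class & Tutorial Performance" component name for the practical session as well; the component keys and the student's marks are hypothetical. The scheme does not state how the theory and practical parts are combined into one course grade, so each part is scored separately.

```python
# Weights from the evaluation scheme (percent of the final score for each part).
THEORY_WEIGHTS = {
    "midterm": 20, "assignments": 5, "attendance": 5,
    "class_performance": 10, "end_term": 60,
}
PRACTICAL_WEIGHTS = {
    "midterm": 20, "assignments": 5, "attendance": 10,
    "class_performance": 5, "end_term": 60,
}

def weighted_total(scores, weights):
    """Combine per-component marks (each out of 100) into a final score out of 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    return sum(scores[component] * w / 100 for component, w in weights.items())

# Hypothetical student: every component marked out of 100
student = {"midterm": 70, "assignments": 90, "attendance": 100,
           "class_performance": 80, "end_term": 65}
print(weighted_total(student, THEORY_WEIGHTS))     # 70.5
print(weighted_total(student, PRACTICAL_WEIGHTS))  # 71.5
```

For the theory part this is 0.20(70) + 0.05(90) + 0.05(100) + 0.10(80) + 0.60(65) = 14 + 4.5 + 5 + 8 + 39 = 70.5; the practical part differs only in the attendance and performance weights.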