Lecture-1 Descriptive Statistics
Statistics for Data Science: Course Objectives
The course aims to:
1. Equip students with the skills to summarize and interpret data using descriptive statistics and visualization techniques.
2. Develop a foundational understanding of probability and its applications in data science.
3. Enable students to perform hypothesis testing and construct confidence intervals for statistical inference.
4. Teach students how to build and assess linear and logistic regression models for predictive analysis.
5. Provide hands-on experience with statistical software for data manipulation, analysis, and visualization.
COURSE OUTCOMES
On completion of this course, students shall be able to:
CO1: Summarize and describe the main features of a dataset using measures such as mean, median, mode, variance, and standard deviation, as well as graphical representations like histograms, box plots, and scatter plots.
CO2: Understand probability theory, including concepts such as random variables, probability distributions, and the law of large numbers, in order to model and reason about uncertainty in data.
CO3: Perform statistical inference, including hypothesis testing, confidence interval estimation, and p-value computation, to draw valid conclusions from sample data about larger populations.
CO5: Use statistical software tools to perform data analysis, including data cleaning, transformation, visualization, and implementation of various statistical methods.
SUGGESTED READINGS
TEXT BOOKS:
• T1. Hastie, Trevor, et al. The Elements of Statistical Learning. 2nd ed. New York: Springer, 2009. ISBN: 978-0387848570
• T2. Montgomery, Douglas C., and George C. Runger. Applied Statistics and Probability for Engineers. John Wiley & Sons, 2010.
• T3. Evans, Michael J., and Jeffrey S. Rosenthal. Probability and Statistics: The Science of Uncertainty. 2nd ed.
REFERENCE BOOKS:
• R1. Bruce, Peter, et al. Practical Statistics for Data Scientists: 50 Essential Concepts. 2nd ed. O'Reilly Media, 2020. ISBN: 978-1492072942
• R2. James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R. 2nd ed. Springer, 2021. ISBN: 978-1071614174
• R3. Downey, Allen B. Think Stats: Exploratory Data Analysis in Python. 2nd ed. O'Reilly Media, 2014. ISBN: 978-1491907337
Table of Contents
Introduction to Statistics
Measures of Central Tendency
Mean, Median, Mode
Examples
Solved Questions
3. Descriptive Statistics
Stem | Leaf
   3 | 6
   4 |
   5 | 37
   6 | 235899
   7 | 011346778999
   8 | 00111233568889
   9 | 02238
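A stem-and-leaf display like this one can be rebuilt from raw values with a few lines of Python. The sketch below is only illustrative: it assumes the stems are tens digits and the leaves are ones digits (the slide does not say what the data measure), and the name `stem_and_leaf` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem) and ones digit (leaf)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {
        s: "".join(str(d) for d in leaves.get(s, []))
        for s in range(min(leaves), max(leaves) + 1)
    }

# Values read back from the display, ASSUMING stem = tens and leaf = ones.
values = [
    36, 53, 57, 62, 63, 65, 68, 69, 69,
    70, 71, 71, 73, 74, 76, 77, 77, 78, 79, 79, 79,
    80, 80, 81, 81, 81, 82, 83, 83, 85, 86, 88, 88, 88, 89,
    90, 92, 92, 93, 98,
]

plot = stem_and_leaf(values)
for stem, leaf in plot.items():
    print(f"{stem} | {leaf}")
```

Keeping the empty stem 4 in the output preserves the gap between the lowest value and the rest of the sample, which is one of the things the display is meant to show.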
2. Numerical descriptions
Mean:
$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum y_i}{n}$$
Example: Annual per capita carbon dioxide emissions (metric tons) for the n = 8 largest nations in population size
Ordered sample: 0.3, 0.7, 1.2, 1.4, 1.8, 2.3, 9.9, 20.1
Median = (1.4 + 1.8)/2 = 1.6
Mean = $\bar{y}$ = (0.3 + 0.7 + ... + 20.1)/8 = 37.7/8 ≈ 4.7
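The mean and median for the carbon dioxide example can be checked with a short Python sketch. It uses only the eight values listed in the ordered sample. The variable names are illustrative.

```python
import statistics

# Ordered sample from the example (metric tons per capita).
emissions = [0.3, 0.7, 1.2, 1.4, 1.8, 2.3, 9.9, 20.1]

mean = sum(emissions) / len(emissions)   # sum of observations divided by n
median = statistics.median(emissions)    # average of the two middle values when n is even

print(f"mean   = {mean:.1f}")
print(f"median = {median:.1f}")
```

The two largest observations pull the mean well above the median, which is why the median is often preferred as a summary for highly skewed data.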
Properties of mean and median
The deviation of observation i from the mean is $y_i - \bar{y}$.
The variance of the n observations is
$$s^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$$
The standard deviation s is the square root of the variance,
$$s = \sqrt{s^2}$$
Example: Political ideology
• For those in the student sample who attend religious
services at least once a week (n = 9 of the 60),
• y = 2, 3, 7, 5, 6, 7, 5, 6, 4
$$\bar{y} = 5.0$$
$$s^2 = \frac{(2 - 5)^2 + (3 - 5)^2 + \cdots + (4 - 5)^2}{9 - 1} = \frac{24}{8} = 3.0$$
$$s = \sqrt{3.0} = 1.7$$
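The arithmetic in the political ideology example can be reproduced step by step. The sketch below follows the sample variance formula with the n - 1 divisor and then cross-checks the result against Python's standard `statistics` module. The variable names are illustrative.

```python
import math
import statistics

# Political ideology scores for the n = 9 students in the example.
y = [2, 3, 7, 5, 6, 7, 5, 6, 4]

n = len(y)
y_bar = sum(y) / n                                   # sample mean
sum_sq_dev = sum((yi - y_bar) ** 2 for yi in y)      # sum of squared deviations
variance = sum_sq_dev / (n - 1)                      # sample variance s^2
sd = math.sqrt(variance)                             # standard deviation s

print(f"mean = {y_bar:.1f}, s^2 = {variance:.1f}, s = {sd:.1f}")

# Cross-check: statistics.variance also divides by n - 1.
library_variance = statistics.variance(y)
```

Dividing by n - 1 rather than n is what distinguishes the sample variance used here from the population variance (`statistics.pvariance`).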
pth percentile: p percent of observations below it, (100 - p)% above it.
p = 50: median
p = 25: lower quartile (LQ)
p = 75: upper quartile (UQ)
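As a rough illustration, the quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8 or later). The data below are the carbon dioxide values from the earlier example. The choice of `method="inclusive"` is an assumption made for this sketch: software packages use different interpolation rules, so quartiles for small samples can differ slightly from hand calculations.

```python
import statistics

# Carbon dioxide emissions from the earlier example (metric tons per capita).
emissions = [0.3, 0.7, 1.2, 1.4, 1.8, 2.3, 9.9, 20.1]

# n=4 splits the data into quarters and returns the three cut points:
# lower quartile (p = 25), median (p = 50), upper quartile (p = 75).
# method="inclusive" is an ASSUMED convention; other rules give other values.
lq, median, uq = statistics.quantiles(emissions, n=4, method="inclusive")

print(f"LQ = {lq:.3f}, median = {median:.3f}, UQ = {uq:.3f}")
```

Whatever convention is used, the 50th percentile agrees with the median, and roughly a quarter of the observations fall below LQ and a quarter above UQ.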
Data available at
http://www.stat.ufl.edu/~aa/social/data.html
Example: Survey in Alachua County, Florida, on predictors of mental health
(data for n = 40 on p. 327 of text and at
www.stat.ufl.edu/~aa/social/data.html)
[Figure: plot with axis label "% income spent on lottery"]
Regression analysis gives a line predicting y using x
Example:
y = mental impairment, x = life events
e.g., at x = 0, predicted y =
at x = 100, predicted y =
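A least-squares line of this kind can be sketched in a few lines of Python. The function name `fit_line` is hypothetical, and the x and y values below are made-up toy numbers chosen only to show the mechanics. They are NOT the Alachua County survey data, so the fitted coefficients say nothing about the mental health example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return intercept, slope

# HYPOTHETICAL toy data, not the survey values from the example.
x = [0, 20, 40, 60, 80, 100]
y = [22, 24, 27, 27, 30, 32]

intercept, slope = fit_line(x, y)

# Predicted y at the two x values discussed in the example.
pred_at_0 = intercept + slope * 0
pred_at_100 = intercept + slope * 100
print(f"predicted y = {intercept:.2f} + {slope:.3f} x")
print(f"at x = 0: {pred_at_0:.2f}, at x = 100: {pred_at_100:.2f}")
```

The prediction at x = 0 is simply the intercept, and each additional unit of x changes the prediction by the slope.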
References
Books:
• Hastie, Trevor, et al. The Elements of Statistical Learning. 2nd ed. New York: Springer, 2009. ISBN: 978-0387848570
• Bruce, Peter, et al. Practical Statistics for Data Scientists: 50 Essential Concepts. 2nd ed. O'Reilly Media, 2020. ISBN: 978-1492072942
Research Papers:
• Carmichael, Iain, and J. S. Marron. "Data science vs. statistics: two cultures?." Japanese Journal of
Statistics and Data Science 1.1 (2018): 117-138.
• Hardin, Johanna, et al. "Data science in statistics curricula: Preparing students to “think with data”." The
American Statistician 69.4 (2015): 343-353.
Websites:
• https://365datascience.com/resources-center/course-notes/statistics/
• https://www.geeksforgeeks.org/7-basic-statistics-concepts-for-data-science/
Videos:
• https://www.youtube.com/playlist?list=PLZ2ps__7DhBYrMs3zybOqr1DzMFCX49xG