MTPDF1 - Introduction To Statistics
Introduction to Statistics
MPS Department | FEU Institute of Technology
Engineering Data Analysis
Recalling Basic Concepts
Subtopic 1
OBJECTIVES
Descriptive statistics
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Summarize data
• e.g., Sample mean $\bar{x} = \frac{\sum x_i}{n}$
Inferential statistics consists of generalizing from samples to populations, performing estimations and
hypothesis tests, determining relationships among variables, and making predictions.
• Estimation
• e.g., Estimate the population mean weight using
the sample mean weight
• Hypothesis testing
• e.g., Test the claim that the population mean weight is 70 kg
Source: http://eceresearchmethods.tripod.com/sitebuildercontent/sitebuilderfiles/methres.pdf
Inference is the process of drawing conclusions or making decisions about a population based on sample results.
• In a recent study, volunteers who had less than 6 hours of sleep were four times
more likely to answer incorrectly on a science test than were participants who
had at least 8 hours of sleep. Decide which part is the descriptive statistic and
what conclusion might be drawn using inferential statistics.
Source: https://www.researchgate.net/profile/Ahmad_Al_Musawi/publication/323358065_Chapter_Two_Understanding_Data_Arabic/links/5a8fe72745851535bcd41d4b/Chapter-Two-Understanding-Data-Arabic.pdf
In a recent survey, 250 college students at Union College were asked if they go to the library regularly. 35 of the students said yes. Identify the population and the sample.
Population: responses of all students at Union College
Sample: responses of the 250 students in the survey
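The "35 of 250" result is a descriptive statistic of the sample. A minimal Python sketch, assuming only the counts stated in the example (`sample_proportion` is a hypothetical helper name, not from the course):

```python
def sample_proportion(successes: int, n: int) -> float:
    """Fraction of the n sampled responses that were 'yes'."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


# Survey example: 35 of the 250 sampled Union College students said "yes"
p_hat = sample_proportion(35, 250)
print(f"Sample proportion: {p_hat:.2f}")  # Sample proportion: 0.14
```

Reporting 0.14 describes the 250 sampled students; using it as an estimate for all Union College students would be an inferential step.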
Data consists of information coming from observations, counts,
measurements, or responses. Most data can be put into the following
categories:
• Qualitative - data are observations that each fall into one of several categories (hair color, ethnic group, and other attributes of the population)
• Quantitative - data are observations that are measured on a
numerical scale (distance traveled to college, number of children in a
family, etc.)
Statistical data are usually obtained by counting or measuring items.
Primary data are collected specifically for the desired analysis.
Secondary data have already been compiled and are available for statistical analysis.
Data
Levels of measurement, from lowest to highest: Nominal, Ordinal, Interval, Ratio
Nominal
Data at the nominal level of measurement are qualitative only. They are categorized using names, labels, or qualities, and no mathematical computations can be made at this level.
Examples: colors in the US flag; names of students in your class; textbooks you are using this semester
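Ordinal
Data at the ordinal level of measurement are qualitative or quantitative. Data can be arranged in order, or ranked, but differences between data entries are not meaningful.
Examples: class standings (freshman, sophomore, junior, senior); numbers on the back of each player’s shirt; top 50 songs played on the radio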
Interval
Data at the interval level of measurement are quantitative. Data can be arranged in order, and the differences between data entries can be calculated. A zero entry simply represents a position on a scale; the entry is not an inherent zero.
Ratio
Similar to the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be expressed as a multiple of another.
[Figure: example images for the nominal, ordinal, interval, and ratio levels of measurement]
Level of measurement | Put data in categories | Arrange data in order | Subtract data values | Determine if one data value is a multiple of another
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes
Source: Elementary Statistics by Bluman
References
• Elementary Statistics by Bluman
• https://psihomedeor.ro/blog/psihoterapie-consiliere-psihologica/page/8/
• https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
• http://eceresearchmethods.tripod.com/sitebuildercontent/sitebuilderfiles/methres.pdf
• https://www.researchgate.net/profile/Ahmad_Al_Musawi/publication/323358065_Chapter_Two_Understanding_Data_Arabic/links/5a8fe72745851535bcd41d4b/Chapter-Two-Understanding-Data-Arabic.pdf
• https://quizizz.com/
• https://lh3.googleusercontent.com/PKumoPGYGNf7Sm9JIDGKpRyWYsFTpWJcmPU051kKVqJiJa2NZMgelCgMWvluEqAvf4q80eE=s85
• https://www.shutterstock.com/search/fahrenheit+thermometer
• https://www.gs1-us.info/product-number/
• https://quizlet.com/283507662/statistics-polit-and-beck-2018-chapter-14-flash-cards/
• https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
Engineering Data Analysis
Descriptive Statistics
Data Presentation
Measures of Central Tendency
Measures of Dispersion
Measures of Position
Descriptive statistics consists of the collection, organization,
summarization, and presentation of data.
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Summarize data
• e.g., Sample mean $\bar{x} = \frac{\sum x_i}{n}$
Source: https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
Descriptive Statistics
• Describes the important characteristics of a set of data.
• Organize, present, and summarize data:
1. Graphically
2. Numerically
“Shape, Center, and Spread”
• Center: A representative or average value that indicates where the
middle of the data set is located.
2. Grouped Frequency Distribution: for data sets with many different values, which are grouped together into classes.
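Ungrouped vs. grouped data
Raw data (number of peas per pod):
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5
Frequency distribution (peas per pod: frequency): 3: 5, 4: 9, 5: 18, 6: 12, 7: 3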
Frequency Histogram
• A bar graph that represents the frequency distribution.
• The horizontal scale is quantitative and measures the data values.
• The vertical scale measures the frequencies of the classes.
• Consecutive bars must touch.
[Figure: histogram with frequency on the vertical axis and data values on the horizontal axis]
Ex. Peas per Pod
Peas per pod | Freq, f
1 | 1
2 | 2
3 | 5
4 | 9
5 | 18
6 | 12
7 | 3
[Figure: histogram "Number of Peas in a Pod", frequency f (0 to 20) on the vertical axis, number of peas (1 to 7) on the horizontal axis]
Relative Frequency Distribution
• Shows the portion or percentage of the data that falls in a particular
class.
$\text{relative frequency} = \frac{\text{class frequency}}{\text{sample size}} = \frac{f}{n}$
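Applying f/n to the peas-per-pod frequency table gives each class's share of the data. A minimal sketch, assuming the table values from the "Ex. Peas per Pod" slide (the names `freq` and `rel_freq` are illustrative):

```python
# Peas-per-pod frequency table: number of peas -> class frequency f
freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

n = sum(freq.values())  # sample size n (50 pods)

# relative frequency = f / n for every class
rel_freq = {value: f / n for value, f in freq.items()}

for value, rf in rel_freq.items():
    print(f"{value} peas: f = {freq[value]:2d}, relative frequency = {rf:.2f} ({rf:.0%})")
# e.g. the line for 5 peas reads: 5 peas: f = 18, relative frequency = 0.36 (36%)
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.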
• Upper class limits: are the largest numbers that can actually belong to the different classes.
• Class boundaries: the value halfway between the upper class limit (UCL) of one class and the lower class limit (LCL) of the next class.
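Class boundaries can be computed directly from the class limits. A minimal sketch, assuming integer class limits with the same gap between each UCL and the next LCL; the classes reuse the travel-to-work table from the grouped-data example later in this section, and `class_boundaries` is a hypothetical helper:

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs into (lower boundary, upper boundary) pairs.

    Assumes the classes are listed in increasing order and that every gap
    between an upper class limit and the next lower class limit is the same.
    """
    if len(limits) < 2:
        raise ValueError("need at least two classes to measure the gap")
    gap = limits[1][0] - limits[0][1]  # e.g. LCL 11 - UCL 10 = 1
    half = gap / 2                     # the boundary sits halfway across the gap
    return [(lower - half, upper + half) for lower, upper in limits]


# Travel-to-work classes (minutes)
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
print(class_boundaries(classes))
# [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5), (30.5, 40.5), (40.5, 50.5)]
```

The lower boundaries 10.5 and 20.5 are the values used later in the grouped median and quartile formulas.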
Skewed
• Data is skewed if it is not symmetric and if it extends more to one
side than the other.
Uniform
• Data is uniform if it is equally distributed (on a histogram, all the
bars are the same height or approximately the same height).
[Figure: symmetric and uniform distributions]
Source: https://mikerogerstrg.wordpress.com/2015/01/23/outliers-escaping-average-and-becoming-great/
• A value that represents a typical, or central, entry of a data set.
• Most common measures of central tendency:
• Mean
• Median
• Mode
The sum of all the data entries divided by the number of entries.
• Population mean: $\mu = \frac{\sum x}{N}$
• Sample mean: $\bar{x} = \frac{\sum x}{n}$
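Both means use the same arithmetic; only whether the data are a population (N) or a sample (n) changes. A minimal sketch, assuming the ten values from the quartile exercise later in this section (`mean` is a hypothetical helper; Python's `statistics.mean` is used as a cross-check):

```python
import statistics


def mean(values):
    """Sum of all data entries divided by the number of entries.

    The same computation gives mu (divide by N) or x-bar (divide by n);
    only the interpretation of the data set differs.
    """
    values = list(values)
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


data = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
print(mean(data))             # 23.1
print(statistics.mean(data))  # standard-library cross-check: 23.1
```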
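The weighted mean is a type of mean that is calculated by multiplying the weight (or probability) associated with each event or outcome by its quantitative outcome, summing all the products, and dividing by the sum of the weights: $\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$. When the weights are probabilities that sum to 1, this division leaves the sum of the products unchanged.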
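A weighted mean can be sketched as Σ(x·w)/Σw. The example below assumes the peas-per-pod table, using each class frequency as the weight of its value; `weighted_mean` is a hypothetical helper name:

```python
def weighted_mean(values, weights):
    """Sum of value * weight products divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Peas per pod (x) weighted by their frequencies (w)
peas = [1, 2, 3, 4, 5, 6, 7]
freqs = [1, 2, 5, 9, 18, 12, 3]
print(weighted_mean(peas, freqs))  # 4.78  (239 / 50)

# Probabilities as weights: they sum to 1, so the division changes nothing
print(weighted_mean([10, 20], [0.25, 0.75]))  # 17.5
```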
• The value that lies in the middle of the data when the data set is arranged in order from lowest to highest.
• Measures the center of an ordered data set by dividing it into two equal parts.
• A sample median is often denoted $\tilde{x}$.
If the data set has an:
• odd number of entries: median is the middle data entry:
2 5 6 11 13 (median = 6)
• even number of entries: median is the mean of the two middle data entries:
2 5 6 7 11 13
$\tilde{x} = \frac{6 + 7}{2} = 6.5$
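The odd/even rule translates directly into code. A minimal sketch (`median` is a hypothetical helper; Python's `statistics.median` follows the same rule):

```python
def median(values):
    """Middle entry of the ordered data, or the mean of the two middle entries."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # odd n: the middle entry
    return (data[mid - 1] + data[mid]) / 2     # even n: mean of the two middle entries


print(median([2, 5, 6, 11, 13]))      # 6
print(median([2, 5, 6, 7, 11, 13]))   # 6.5
```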
• The data entry that occurs with the greatest frequency.
• If no entry is repeated the data set has no mode.
• If two entries occur with the same greatest frequency, each entry is a
mode (bimodal).
a) 5.40 1.10 0.42 0.73 0.48 1.10 Mode is 1.10
b) 27 27 27 55 55 55 88 88 99 Bimodal - 27 & 55
c) 1 2 3 6 7 8 9 10 No Mode
https://slideplayer.com/slide/10513276/
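The three mode examples can be checked with a frequency count. A minimal sketch (`modes` is a hypothetical helper); Python's `statistics.multimode` exists but returns every value when none repeats, so this version adds the "no repeated entry means no mode" rule used above:

```python
from collections import Counter


def modes(values):
    """Return every entry with the greatest frequency, or [] if no entry repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so the data set has no mode
    return [value for value, c in counts.items() if c == top]


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))   # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # [27, 55]  (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # []  (no mode)
```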
All three measures describe an “average”. Choose the one that best
represents a “typical” value in the set.
• Mean:
• The most familiar average.
• A reliable measure because it takes into account every entry of a data set.
• May be greatly affected by outliers or skew.
• Median:
• A common average.
• Not as affected by skew or outliers.
• Mode: May be used if there is an overwhelming repeat.
• The shape of your data and the existence of any outliers may help you choose the best average.
Source: http://chandra-silitonga.blogspot.com/2017/09/contoh-soal-menghitung-mean-median-dan.html
• Quartiles are used to divide the distribution into four parts
or subgroups
• Deciles are used to divide the distribution into ten parts or
subgroups
• Percentiles are used to divide the distribution into one hundred parts or subgroups
$Q_k = \left[\frac{k(n + 1)}{4}\right]^{\text{th}}$ item (observation)
• Compute quartiles for the data given: 25, 18, 30, 8, 15,
5, 10, 35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
• $Q_1 = \left[\frac{1(10 + 1)}{4}\right]^{\text{th}} = 2.75^{\text{th}}$ item
• $Q_1 = \text{2nd item} + 0.75(\text{3rd item} - \text{2nd item})$
• $Q_1 = 8 + 0.75(10 - 8) = 8 + 0.75(2) = 8 + 1.5 = 9.5$
$D_k = \left[\frac{k(n + 1)}{10}\right]^{\text{th}}$ item (observation)
• Compute 𝑫𝟑 for the data given: 25, 18, 30, 8, 15, 5, 10,
35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
$D_3 = \frac{3(10 + 1)}{10} = \cdots$
$P_k = \left[\frac{k(n + 1)}{100}\right]^{\text{th}}$ item (observation)
• Compute 𝑷𝟕𝟓 for the data given: 25, 18, 30, 8, 15, 5, 10,
35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
$P_{75} = \frac{75(n + 1)}{100} = \frac{75(10 + 1)}{100} = \cdots$
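The quartile, decile, and percentile formulas share one rule: find the k(n + 1)/parts position in the ordered data and interpolate between neighboring items, as in the $Q_1$ worked example. A minimal sketch of that rule (`positional_value` is a hypothetical helper; clamping out-of-range positions to the first or last item is an assumption the slides do not cover). Spreadsheet functions and statistics libraries often use other position conventions, so their results can differ slightly.

```python
from fractions import Fraction


def positional_value(values, k, parts):
    """Value at position k(n + 1)/parts of the ordered data, with linear interpolation.

    parts = 4 gives quartiles Q_k, parts = 10 deciles D_k, parts = 100 percentiles P_k.
    Positions before the 1st item or past the n-th item are clamped (an assumption).
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    position = Fraction(k * (n + 1), parts)  # exact position, e.g. 11/4 = 2.75
    whole = int(position)                    # item number just below the position
    fraction = position - whole              # how far toward the next item
    if whole < 1:
        return float(data[0])
    if whole >= n:
        return float(data[-1])
    lower = data[whole - 1]                  # whole-th item (items are numbered from 1)
    upper = data[whole]                      # (whole + 1)-th item
    return float(lower + fraction * (upper - lower))


data = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
print(positional_value(data, 1, 4))      # Q1:  9.5
print(positional_value(data, 3, 10))     # D3:  11.5
print(positional_value(data, 75, 100))   # P75: 36.25
```

Using `Fraction` keeps positions such as 3.3 exact, so the interpolation does not pick up floating-point noise.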
Grouped Data
The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of
values, or distribution.
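$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$
where $L_m$ is the lower class boundary of the median class, $F$ is the cumulative frequency of the classes before the median class, $f_m$ is the frequency of the median class, and $i$ is the class width. For the travel-to-work data (n = 50):
$\text{Median} = 20.5 + \left(\frac{25 - 22}{12}\right)(10) = 23$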
Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23
minutes to travel to work.
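The grouped-median formula locates the class containing the n/2-th observation and interpolates inside it. A minimal sketch, assuming the travel-to-work table from the interquartile-range example (lower boundaries 0.5, 10.5, ..., class width 10); `grouped_median` is a hypothetical helper:

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L_m + ((n/2 - F) / f_m) * i for a grouped frequency distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= half:  # this is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


lower_boundaries = [0.5, 10.5, 20.5, 30.5, 40.5]
freqs = [8, 14, 12, 9, 7]
print(grouped_median(lower_boundaries, freqs, 10))  # 23.0
```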
A quartile is one of three points that divide a data set into four equal groups, each containing one fourth of the data.
Using the same method of calculation as for the median, the formulas for $Q_1$ and $Q_3$ are:
$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i \qquad Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$
Example: Based on the grouped data below, find the Interquartile Range
Time to travel to work (minutes) | Frequency
1 – 10 | 8
11 – 20 | 14
21 – 30 | 12
31 – 40 | 9
41 – 50 | 7
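Solution:
1st Step: Construct the cumulative frequency distribution. The cumulative frequencies of the five classes are 8, 22, 34, 43, and 50, so n = 50.
Class $Q_1$: $\frac{n}{4} = \frac{50}{4} = 12.5$, so the $Q_1$ class is the 2nd class (11 – 20).
Therefore, $Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right)(10) = 13.7143$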
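Class $Q_3$: $\frac{3n}{4} = \frac{3(50)}{4} = 37.5$, so the $Q_3$ class is the 4th class (31 – 40).
Therefore, $Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right)(10) = 34.3889$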
Interquartile Range
IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
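The Q1 and Q3 calculations follow the same pattern as the grouped median, with n/4 or 3n/4 as the target position. A minimal sketch for the travel-to-work table (`grouped_quartile` is a hypothetical helper; k = 2 reproduces the grouped median):

```python
def grouped_quartile(lower_boundaries, freqs, width, k):
    """Q_k = L + ((k*n/4 - F) / f) * i for a grouped frequency distribution (k = 1, 2, 3)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    target = k * n / 4
    cumulative = 0  # F: cumulative frequency of the classes before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= target:  # this is the Q_k class
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("quartile class not found")


lower = [0.5, 10.5, 20.5, 30.5, 40.5]
freq = [8, 14, 12, 9, 7]
q1 = grouped_quartile(lower, freq, 10, 1)
q3 = grouped_quartile(lower, freq, 10, 3)
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {q3 - q1:.4f}")
# Q1 = 13.7143, Q3 = 34.3889, IQR = 20.6746
```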
Mode of grouped data: $\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
i is the class width
Δ1 is the difference between the frequency of the modal class and the frequency of the class before the modal class
Δ2 is the difference between the frequency of the modal class and the frequency of the class after the modal class
Lmo is the lower boundary of the modal class
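Based on the travel-to-work grouped data above, find the mode.
Solution:
Based on the table, the modal class is 11 – 20 (frequency 14), so
$L_{mo} = 10.5$, $\Delta_1 = 14 - 8 = 6$, $\Delta_2 = 14 - 12 = 2$, and $i = 10$
$\text{Mode} = 10.5 + \left(\frac{6}{6 + 2}\right)(10) = 10.5 + 7.5 = 18$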
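The grouped-mode formula can be sketched as follows for the travel-to-work table. `grouped_mode` is a hypothetical helper; treating a missing neighbor class (modal class first or last) as frequency 0 is an assumption, and ties for the largest frequency pick the first class.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L_mo + (d1 / (d1 + d2)) * i for a grouped frequency distribution."""
    j = max(range(len(freqs)), key=lambda idx: freqs[idx])  # index of the modal class
    f_before = freqs[j - 1] if j > 0 else 0                 # assumption: no class before -> 0
    f_after = freqs[j + 1] if j + 1 < len(freqs) else 0     # assumption: no class after -> 0
    d1 = freqs[j] - f_before
    d2 = freqs[j] - f_after
    if d1 + d2 == 0:
        raise ValueError("mode formula undefined: d1 + d2 is zero")
    return lower_boundaries[j] + d1 / (d1 + d2) * width


lower = [0.5, 10.5, 20.5, 30.5, 40.5]
freq = [8, 14, 12, 9, 7]
print(grouped_mode(lower, freq, 10))  # 18.0
```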
Another important characteristic of quantitative data is how much the
data varies, or is spread out.
The four most common methods of measuring spread are:
1. Range
2. Mean Absolute Deviation
3. Quartile Deviation
4. Standard Deviation and Variance
• The difference between the maximum and minimum data entries in the
set.
• The data must be quantitative.
Range = (Max. data entry) – (Min. data entry)
The wait time to see a bank teller is studied at 2 banks.
• Note: The range is easy to compute, but only uses 2 values. Do the
following 2 sets vary the same?
• Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
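Sets A and B have the same range, yet Set B is mostly one repeated value. A minimal sketch comparing the range with the mean absolute deviation (the second measure listed above, taken here as the average of |x − x̄|); `data_range` and `mean_absolute_deviation` are hypothetical helper names:

```python
def data_range(values):
    """Maximum data entry minus minimum data entry."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Average distance of the entries from their mean: sum(|x - x_bar|) / n."""
    x_bar = sum(values) / len(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]

for name, values in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: range = {data_range(values)}, "
          f"mean absolute deviation = {mean_absolute_deviation(values):.2f}")
# Set A: range = 9, mean absolute deviation = 2.50
# Set B: range = 9, mean absolute deviation = 1.62
```

Because the range looks only at the two extreme values, it cannot tell these sets apart; a measure that uses every entry can.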
Measures the typical amount the data deviate from the mean.
Sample variance, $s^2$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation, $s$: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
1. Find the mean of the sample data set: $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of the squared deviations: $\sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
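The six steps map one-to-one onto a short function. A minimal sketch, assuming the first bank's wait times from the table that follows (`sample_std_steps` is a hypothetical helper name):

```python
import math


def sample_std_steps(values):
    """Follow the six steps: mean, deviations, squares, sum, divide by n - 1, square root."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data entries")
    x_bar = sum(values) / n                    # step 1: sample mean
    deviations = [x - x_bar for x in values]   # step 2: x - x_bar for each entry
    squares = [d * d for d in deviations]      # step 3: square each deviation
    ss = sum(squares)                          # step 4: sum of squared deviations
    variance = ss / (n - 1)                    # step 5: sample variance s^2
    s = math.sqrt(variance)                    # step 6: sample standard deviation s
    return x_bar, ss, variance, s


x_bar, ss, s2, s = sample_std_steps([5.2, 6.2, 7.5, 8.4, 9.2])
print(f"x-bar = {x_bar:.1f}, sum of squares = {ss:.2f}, s^2 = {s2:.2f}, s = {s:.2f} min")
# x-bar = 7.3, sum of squares = 10.48, s^2 = 2.62, s = 1.62 min
```

Rounding only in the final print matches the tip to avoid rounding until the end; two decimals is one more than the data.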
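Wait times at the first bank (in min): $\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min
Wait time, x | Deviation: x – x̄ | Squares: (x – x̄)²
5.2 | 5.2 – 7.3 = –2.1 | (–2.1)² = 4.41
6.2 | 6.2 – 7.3 = | ( )² =
7.5 | 7.5 – 7.3 = | ( )² =
8.4 | 8.4 – 7.3 = | ( )² =
9.2 | 9.2 – 7.3 = | ( )² =
Σx = 36.5 | Σ(x – x̄) = | Σ(x – x̄)² =
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$    $s = \sqrt{s^2} =$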
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
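Wait times at the second bank (in min): $\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min
Wait time, x | Deviation: x – x̄ | Squares: (x – x̄)²
6.6 | |
6.8 | |
7.5 | |
7.7 | |
7.9 | |
Σx = 36.5 | Σ(x – x̄) = | Σ(x – x̄)² =
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$    $s = \sqrt{s^2} =$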
Measure | Sample statistic | Population parameter
Mean | x̄ | µ
Standard deviation | s | σ
Variance | s² | σ²
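Note: Unlike x̄ and µ, which are computed the same way, the formulas for s and σ are not mathematically the same:
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$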
https://www.robinsonschools.com/unit2/images/users/dforbes/Stats/Stats_Notes_2.4.pdf
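Python's standard `statistics` module implements both versions: `stdev` divides by n − 1 and `pstdev` divides by N. A minimal sketch on the second bank's wait times:

```python
import math
import statistics

second_bank = [6.6, 6.8, 7.5, 7.7, 7.9]  # wait times in minutes

s = statistics.stdev(second_bank)       # sample: sum of squared deviations / (n - 1)
sigma = statistics.pstdev(second_bank)  # population: sum of squared deviations / N

print(f"s = {s:.3f} min, sigma = {sigma:.3f} min")  # s = 0.570 min, sigma = 0.510 min

# Only the divisor differs, so s^2 * (n - 1) and sigma^2 * N are the same sum of squares
n = len(second_bank)
print(math.isclose(s**2 * (n - 1), sigma**2 * n))  # True
```

Dividing by the smaller n − 1 makes s slightly larger than σ for the same data.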
The gas mileage of 2 cars is sampled over various conditions:
Use a calculator to find the mean and standard deviation for each to
justify your choice.
How does “s” show how much the data varies?
Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
Alternatively, if the range is known, you can use the range rule of thumb to estimate the standard deviation:
$s \approx \frac{\text{Range}}{4}$
• A sample of women’s heights has a mean of 64 inches and a standard
deviation of 2.5 inches. Using the range rule, “most” women fall
within what heights?
• What would be an “unusual” height?
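The range rule of thumb treats the range as spanning about four standard deviations, so it also works in reverse: "usual" values lie within about two standard deviations of the mean. A minimal sketch for the heights example (`estimate_s_from_range` and `usual_limits` are hypothetical helper names):

```python
def estimate_s_from_range(data_range):
    """Range rule of thumb: s is roughly range / 4."""
    return data_range / 4


def usual_limits(mean, s):
    """Range rule of thumb in reverse: usual values fall within mean +/- 2s."""
    return mean - 2 * s, mean + 2 * s


low, high = usual_limits(64, 2.5)  # women's heights: mean 64 in, s = 2.5 in
print(f"Most heights fall between {low} and {high} inches")  # 59.0 and 69.0
print(estimate_s_from_range(high - low))  # 2.5, consistent with range / 4
```

Heights below about 59 inches or above about 69 inches would count as unusual under this rule.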
The sample of Exam Scores used in the class handout had a mean of
73.6. Which of the following is most likely the standard deviation of
the sample?