MTPDF1 - Introduction To Statistics

Engineering Data Analysis
Introduction to
Statistics
MPS Department | FEU Institute of Technology
Recalling Basic
Concepts
Subtopic 1
OBJECTIVES
• Recall basic statistical concepts and sampling techniques

Subtopic 1
Recalling Basic Concepts
• Basic Statistical Terminologies
• Types of Data
• Levels of Measurement
https://psihomedeor.ro/blog/psihoterapie-consiliere-psihologica/page/8/
1. Data are everywhere
2. Statistical techniques are used to make many decisions that affect
our lives
3. No matter what your career, you will make professional decisions
that involve data. An understanding of statistical methods will help
you make these decisions effectively.
• Statistics – the science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions
• Statistical analysis – used to manipulate, summarize, and investigate
data, so that useful decision-making information results.
The study of statistics has two major branches: descriptive statistics
and inferential statistics.
• Descriptive statistics – Involves the organization, summarization, and
display of data.
• Inferential statistics – Involves using a sample to draw conclusions
about a population.
Population – The entire set of individuals or objects of interest, or the
measurements obtained from all individuals or objects of interest
Sample – A portion, or part, of the population of interest
Descriptive statistics consists of the collection, organization, summarization,
and presentation of data.
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Summarize data
• e.g., Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
Inferential statistics consists of generalizing from samples to populations, performing estimations and
hypothesis tests, determining relationships among variables, and making predictions.
• Estimation
• e.g., Estimate the population mean weight using the sample mean weight
• Hypothesis testing
• e.g., Test the claim that the population mean weight is 70 kg
http://eceresearchmethods.tripod.com/sitebuildercontent/sitebuilderfiles/methres.pdf
Inference is the process of drawing conclusions or making decisions about a population based on sample
results
• In a recent study, volunteers who had less than 6 hours of sleep were four times
more likely to answer incorrectly on a science test than were participants who
had at least 8 hours of sleep. Decide which part is the descriptive statistic and
what conclusion might be drawn using inferential statistics.
The statement “four times more likely to answer incorrectly” is a descriptive
statistic. An inference drawn from the sample is that all individuals sleeping less
than 6 hours are more likely to answer science questions incorrectly than
individuals who sleep at least 8 hours.
Determine in what area of statistics does each of the following statements belong:
1. The average points per game, percent of free throws made, average number of rebounds per game,
and average number of fouls per game as well as several other measures for players in the NBA are
computed.
Ans. Descriptive
2. Ten percent of the boxes of cereal sampled by a quality technician are found to be under the labeled
weight. Based on this finding, the filling machine is adjusted to increase the amount of fill.
Ans. Inferential
3. A student determines the average weekly amount spent for food in the past 3 months
Ans. Descriptive
4. Based on a study of 500 single parent households by a social researcher, a magazine reports that
25% of all single parent households are headed by a high school dropout.
Ans. Inferential
5. A researcher claims that a new drug will reduce the number of heart attacks in men over 70 years
old.
Ans. Inferential
A population consists of all subjects (human or otherwise) that are
being studied.
A sample is a group of subjects selected from a population.
https://www.researchgate.net/profile/Ahmad_Al_Musawi/publication/323358065_Chapter_Two_Understanding_Data_Arabic/links/5a8fe72745851535bcd41d4b/Chapter-Two-Understanding-Data-Arabic.pdf
In a recent survey, 250 college students at Union College were asked if
they go to the library regularly. 35 of the students said yes. Identify the
population and the sample.
Population: responses of all students at Union College
Sample: responses of the 250 students in the survey
Data consists of information coming from observations, counts,
measurements, or responses. Most data can be put into the following
categories:
• Qualitative - data are measurements that each fall into one of several
categories (hair color, ethnic groups, and other attributes of the
population)
• Quantitative - data are observations that are measured on a
numerical scale (distance traveled to college, number of children in a
family, etc.)
Statistical data are usually obtained by counting or measuring items.
• Primary data are collected specifically for the analysis desired
• Secondary data have already been compiled and are available for statistical
analysis
• A variable is an item of interest that can take on many different
numerical values. Variables whose values are determined by chance
are called random variables.
• A constant has a fixed numerical value.
Data sets can consist of two types of data: qualitative data and quantitative data.
• Qualitative data: consists of attributes, labels, or nonnumerical entries.
• Quantitative data: consists of numerical measurements or counts.
Qualitative data are generally described by words or letters. They are not as widely
used as quantitative data because many numerical techniques do not apply to
the qualitative data. For example, it does not make sense to find an average hair
color or blood type.
Qualitative data can be separated into two subgroups:
• dichotomic, if it takes the form of a word with two options (gender: male or
female)
• polynomic, if it takes the form of a word with more than two options (education:
primary school, secondary school, and university)
Quantitative data are always numbers and are the result of counting or measuring
attributes of a population.
Quantitative data can be separated into two subgroups:
• discrete, if it is the result of counting (the number of students of a given ethnic
group in a class, the number of books on a shelf, ...)
• continuous, if it is the result of measuring (distance traveled, weight of luggage,
…)
Determine whether the data is discrete or continuous.
1. Distance when you throw a baseball

Ans. Continuous
2. Number of pages of a newly published book
Ans. Discrete
3. Amount of possible yields (in grams) from a certain chemical reaction
carried out in the laboratory
Ans. Continuous
4. Sum of points in tossing a pair of dice
Ans. Discrete
• Nominal – consist of categories in each of which the number of respective observations is
recorded. The categories are in no logical order and have no particular relationship. The
categories are said to be mutually exclusive since an individual, object, or measurement can
be included in only one of them.
• Ordinal – contain more information. Consists of distinct categories in which order is implied.
Values in one category are larger or smaller than values in other categories (e.g. rating-
excellent, good, fair, poor)
• Interval – is a set of numerical measurements in which the distance between numbers is of a
known, constant size.
• Ratio – consists of numerical measurements where the distance between numbers is of a
known, constant size and, in addition, there is a nonarbitrary zero point.
The level of measurement determines which statistical calculations are meaningful. The
four levels of measurement are: nominal, ordinal, interval, and ratio.
Levels of measurement, from lowest to highest: Nominal, Ordinal, Interval, Ratio
Data at the nominal level of measurement are qualitative only.
Nominal level of measurement: categorized using names, labels, or qualities. No
mathematical computations can be made at this level.
Examples: colors in the US flag; names of students in your class; textbooks you are using this
semester
Data at the ordinal level of measurement are qualitative or quantitative.
Ordinal level of measurement: arranged in order, but differences between data
entries are not meaningful.
Examples: class standings (freshman, sophomore, junior, senior); numbers on the back of each
player’s shirt; top 50 songs played on the radio
Data at the interval level of measurement are quantitative. A zero entry simply represents
a position on a scale; the entry is not an inherent zero.
Interval level of measurement: arranged in order, and the differences between data entries can be
calculated.
Examples: temperatures; years on a timeline; Atlanta Braves World Series victories
Data at the ratio level of measurement are similar to the interval level, but a zero entry
(absolute zero) is meaningful.
Ratio level of measurement: a ratio of two data values can be formed, so one data value can be
expressed as a multiple of another.
Examples: ages; grade point averages; weights

[Figure: example images for the nominal, ordinal, interval, and ratio levels of measurement]
Level of       Put data in   Arrange data   Subtract      Determine if one data value
measurement    categories    in order       data values   is a multiple of another
Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
Source: Elementary Statistics by Bluman
• Elementary Statistics by Bluman
• https://psihomedeor.ro/blog/psihoterapie-consiliere-psihologica/page/8/
• https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
• http://eceresearchmethods.tripod.com/sitebuildercontent/sitebuilderfiles/methres.pdf
• https://www.researchgate.net/profile/Ahmad_Al_Musawi/publication/323358065_Chapter_Two_Understanding_Data_Arabic/links/5a8fe72745851535bcd41d4b/Chapter-Two-Understanding-Data-Arabic.pdf
• https://quizizz.com/
• https://lh3.googleusercontent.com/PKumoPGYGNf7Sm9JIDGKpRyWYsFTpWJcmPU051kKVqJiJa2NZMgelCgMWvluEqAvf4q80eE=s85
• https://www.shutterstock.com/search/fahrenheit+thermometer
• https://www.gs1-us.info/product-number/
• https://quizlet.com/283507662/statistics-polit-and-beck-2018-chapter-14-flash-cards/
• https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
Descriptive Statistics

Subtopic 2
OBJECTIVES
• Summarize and present data in different forms

Subtopic 2
• Data Presentation
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Position
Descriptive statistics consists of the collection, organization,
summarization, and presentation of data.
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Summarize data
• e.g., Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
• Describes the important characteristics of a set of data.
• Organize, present, and summarize data:
1. Graphically
2. Numerically
“Shape, Center, and Spread”
• Center: A representative or average value that indicates where the
middle of the data set is located.
• Variation: A measure of the amount that the values vary among
themselves.
• Distribution: The nature or shape of the distribution of data (such as
bell-shaped, uniform, or skewed).
Frequency Distributions
and Their Graphs
A frequency distribution is a table that organizes data values into classes or intervals along with
the number of values that fall in each class (frequency, f ).
1. Ungrouped Frequency Distribution – for data sets with few different values.
Each value is in its own class.
2. Grouped Frequency Distribution: for data sets with many different values,
which are grouped together in the classes.
Ungrouped                       Grouped
Courses Taken   Frequency, f    Age of Voters   Frequency, f
1               25              18-30           202
2               38              31-42           508
3               217             43-54           620
4               1462            55-66           413
5               932             67-78           158
6               15              79-90           32
Number of Peas in a Pea Pod
Sample Size: 50
5 5 4 6 4
3 7 6 3 5
6 5 4 5 5
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5
Peas per pod   Freq, f
1              1
2              2
3              5
4              9
5              18
6              12
7              3
Frequency Histogram
• A bar graph that represents the frequency distribution.
• The horizontal scale is quantitative and measures the data values.
• The vertical scale measures the frequencies of the classes.
• Consecutive bars must touch.
Ex. Peas per Pod
Peas per pod   Freq, f
1              1
2              2
3              5
4              9
5              18
6              12
7              3
[Figure: frequency histogram "Number of Peas in a Pod"; horizontal axis: Number of Peas (1 to 7); vertical axis: Frequency, f (0 to 20)]
Relative Frequency Distribution
• Shows the portion or percentage of the data that falls in a particular
class.
$\text{relative frequency} = \frac{\text{class frequency}}{\text{sample size}} = \frac{f}{n}$
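As a minimal sketch of the two tables above, the following Python tallies the pea-pod sample into frequencies and relative frequencies. The function name `frequency_table` is a hypothetical helper chosen for illustration, not something from the slides.

```python
from collections import Counter

# The 50 pea-pod counts from the sample above.
peas = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

def frequency_table(data):
    """Map each distinct value to (frequency f, relative frequency f/n)."""
    counts = Counter(data)
    n = len(data)
    return {value: (counts[value], counts[value] / n) for value in sorted(counts)}

table = frequency_table(peas)
for value, (f, rel) in table.items():
    print(f"{value} peas: f = {f}, relative frequency = {rel:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no data value was dropped.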
Relative Frequency Histogram

• Has the same shape and the same horizontal scale as the
corresponding frequency histogram.
• The vertical scale measures the relative frequencies, not frequencies.
Grouped Frequency Distribution
• For data sets with many different values.
• Groups data into 5-20 classes of equal width.
Exam Scores   Freq, f
30-39         1
40-49         0
50-59         4
60-69         9
70-79         13
80-89         10
90-99         3
• Lower class limits: are the smallest numbers that can actually belong
to different classes
• Upper class limits: are the largest numbers that can actually belong
to different classes
• Class width: is the difference between two consecutive lower class
limits (or upper class limits)
• Class midpoints: the value halfway between LCL and UCL:
$\frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$
• Class boundaries: the value halfway between an UCL and the next LCL:
$\frac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2}$
1. Determine the range of the data:
• Range = highest data value – lowest data value
• May round up to the next convenient number
2. Decide on the number of classes.
• Usually between 5 and 20; otherwise, it may be difficult to detect any
patterns.
3. Find the class width:
$\text{class width} = \frac{\text{range}}{\text{number of classes}}$
• Round up to the next convenient number.
4. Find the class limits.
• Choose the first LCL: use the minimum data entry or something smaller that is
convenient.
• Find the remaining LCLs: add the class width to the lower limit of the preceding
class.
• Find the UCLs: Remember that classes must cover all data values and cannot
overlap.
5. Find the frequencies for each class. (You may add a tally column first
and make a tally mark for each data value in the class).
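The five steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are whole numbers, the first lower class limit is the minimum entry, and the `scores` list is hypothetical sample data made up for the example (the slides do not give raw exam scores).

```python
import math

def grouped_frequency(data, num_classes):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    lowest, highest = min(data), max(data)
    # Steps 1-3: range divided by the number of classes, rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    # If the classes would stop just short of the maximum, widen them by one.
    if lowest + width * num_classes - 1 < highest:
        width += 1
    table = []
    for k in range(num_classes):
        lcl = lowest + k * width   # Step 4: add the width to the preceding LCL
        ucl = lcl + width - 1      # classes cover all values and do not overlap
        f = sum(lcl <= x <= ucl for x in data)  # Step 5: count entries
        table.append((lcl, ucl, f))
    return table

# Hypothetical scores, for illustration only.
scores = [35, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 84, 88, 91, 97]
table = grouped_frequency(scores, 7)
for lcl, ucl, f in table:
    print(f"{lcl}-{ucl}: {f}")
```

In practice the first lower limit and the width are often nudged to "convenient" numbers such as 30 and 10; the sketch skips that judgment call.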
Symmetric
• Data is symmetric if the left half of its histogram is roughly a mirror
image of its right half.
Skewed
• Data is skewed if it is not symmetric and if it extends more to one
side than the other.
Uniform
• Data is uniform if it is equally distributed (on a histogram, all the
bars are the same height or approximately the same height).
[Figure: histograms illustrating symmetric, uniform, skewed left, and skewed right distributions]
Source: Elementary Statistics by Bluman

Outliers are unusual data values as compared to the rest of the set. They may be
distinguished by gaps in a histogram.
https://mikerogerstrg.wordpress.com/2015/01/23/outliers-escaping-average-and-becoming-great/
• A value that represents a typical, or central, entry of a data set.
• Most common measures of central tendency:
• Mean
• Median
• Mode
The sum of all the data entries divided by the number of entries.
• Population mean: $\mu = \frac{\sum x}{N}$
• Sample mean: $\bar{x} = \frac{\sum x}{n}$
The weighted mean is a type of mean that is calculated by multiplying
the weight (or probability) associated with a particular event or
outcome with its associated quantitative outcome and then summing
all the products together.
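Written as a formula, under the assumption that each entry $x_i$ carries a weight $w_i$ (these symbols are introduced here for illustration), the weighted mean described above is:

```latex
\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}
\qquad\text{and, when the weights are probabilities with } \sum w_i = 1,\qquad
\bar{x}_w = \sum w_i x_i
```

The second form matches the description of multiplying each outcome by its weight and summing the products; the division by $\sum w_i$ is only needed when the weights do not already add up to 1.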
• The value that lies in the middle of the data when the data set is
arranged in order from lowest to highest.
• Measures the center of an ordered data set by dividing it into two equal
parts.
• A sample median is often denoted $\tilde{x}$.
• If the data set has an
• odd number of entries: median is the middle data entry:
2 5 6 11 13
median is the exact middle value: $\tilde{x} = 6$
• even number of entries: median is the mean of the two middle data entries:
2 5 6 7 11 13
median is the mean of the two middle numbers: $\tilde{x} = \frac{6 + 7}{2} = 6.5$
• The data entry that occurs with the greatest frequency.
• If no entry is repeated the data set has no mode.
• If two entries occur with the same greatest frequency, each entry is a
mode (bimodal).
a) 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10
b) 27 27 27 55 55 55 88 88 99 → Bimodal - 27 & 55
c) 1 2 3 6 7 8 9 10 → No Mode
https://slideplayer.com/slide/10513276/
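The median and mode rules can be checked with a short Python sketch. `statistics.median` is from the standard library; `modes` is a hypothetical helper written here so that a data set with no repeated entry reports no mode, following the convention stated above.

```python
from collections import Counter
from statistics import median

def modes(data):
    """Return every entry tied for the greatest frequency, or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)

print(median([2, 5, 6, 11, 13]))      # odd number of entries: the middle entry
print(median([2, 5, 6, 7, 11, 13]))   # even number: mean of the two middle entries
print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))
```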
All three measures describe an “average”. Choose the one that best
represents a “typical” value in the set.
• Mean:
• The most familiar average.
• A reliable measure because it takes into account every entry of a data set.
• May be greatly affected by outliers or skew.
• Median:
• A common average.
• Not as affected by skew or outliers.
• Mode: May be used if there is an overwhelming repeat.
• The shape of your data and the existence of any outliers may help
you choose the best average:
http://chandra-silitonga.blogspot.com/2017/09/contoh-soal-menghitung-mean-median-dan.html
• Quartiles are used to divide the distribution into four parts
or subgroups
• Deciles are used to divide the distribution into ten parts or
subgroups
• Percentiles are used to divide the distribution into one
hundred parts or subgroups
Quartiles: $Q_k = \frac{k(n+1)}{4}$th item (observation)
• Compute quartiles for the data given: 25, 18, 30, 8, 15,
5, 10, 35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
• $Q_1 = \frac{1(10+1)}{4}\text{th} = 2.75\text{th item}$
• $Q_1 = \text{2nd item} + 0.75\,(\text{3rd item} - \text{2nd item})$
• $Q_1 = 8 + 0.75(10 - 8) = 8 + 0.75(2) = 8 + 1.5 = 9.5$
Deciles: $D_k = \frac{k(n+1)}{10}$th observation
• Compute $D_3$ for the data given: 25, 18, 30, 8, 15, 5, 10,
35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
• $D_3 = \frac{3(10+1)}{10} = \cdots$
Percentiles: $P_k = \frac{k(n+1)}{100}$th observation
• Compute $P_{75}$ for the data given: 25, 18, 30, 8, 15, 5, 10,
35, 40, 45
• Arrange the data: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
• $P_{75} = \frac{75(n+1)}{100} = \frac{75(10+1)}{100} = \cdots$
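The position formulas for quartiles, deciles, and percentiles share one pattern, so a single function can cover all three. The sketch below assumes the k(n + 1)/parts rule with linear interpolation between neighbouring items, as in the Q1 example; the name `position_quantile` and the clamping at the ends of the data are assumptions made for illustration.

```python
def position_quantile(data, k, parts):
    """k-th quantile when the data is split into `parts` (4, 10, or 100)."""
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / parts   # 1-based position, possibly fractional
    whole = int(position)            # e.g. 2 for the 2.75th item
    fraction = position - whole      # e.g. 0.75
    if whole < 1:                    # assumption: clamp below the first item
        return ordered[0]
    if whole >= n:                   # assumption: clamp above the last item
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

data = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
print(position_quantile(data, 1, 4))  # 9.5, matching the worked Q1 example
```

The same call with `parts=10` or `parts=100` gives deciles and percentiles. Statistical software often uses slightly different interpolation rules, so results from other tools may not match this method exactly.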
Grouped Data
The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of
values, or distribution.
Example: The following table gives the frequency distribution of the number
of orders received each day during the past 50 days at the office of a mail-order
company. Calculate the mean.
Number of orders   f
10 – 12            4
13 – 15            12
16 – 18            20
19 – 21            14
                   n = 50
Solution:
x is the midpoint of the class, found by adding the class limits and dividing by 2.
Number of orders   f        x    fx
10 – 12            4        11   44
13 – 15            12       14   168
16 – 18            20       17   340
19 – 21            14       20   280
                   n = 50        Σfx = 832
$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$
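The grouped-mean calculation can be reproduced with a few lines of Python. This is a sketch using the mail-order table above; `grouped_mean` is a hypothetical helper name.

```python
# (lower limit, upper limit, frequency) for each class in the mail-order example.
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

def grouped_mean(classes):
    """Mean of grouped data: sum of (midpoint * frequency) divided by n."""
    n = sum(f for _, _, f in classes)
    total = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total / n

print(grouped_mean(orders))  # 832 / 50
```

Because every entry in a class is replaced by the class midpoint, the result approximates the mean of the raw data rather than reproducing it exactly.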
A median is described as the numerical value separating the higher half of a sample, a population, or a
probability distribution from the lower half.
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide the class that contains the median. The median class is the first class whose
cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:
$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$
Where:
n = the total frequency
F = the cumulative frequency before the median class
i = the class width
$L_m$ = the lower boundary of the median class
$f_m$ = the frequency of the median class
Example: Based on the grouped data below, find the median:
Time to travel to work Frequency
1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7
Solution:
1st Step: Construct the cumulative frequency distribution
Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50
$\frac{n}{2} = \frac{50}{2} = 25$, so the median class is the 3rd class.
So, $F = 22$, $f_m = 12$, $L_m = 20.5$ and $i = 10$.
Therefore,
$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i = 20.5 + \left(\frac{25 - 22}{12}\right) 10 = 23$
Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23
minutes to travel to work.
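The three steps for the grouped median translate directly into code. The sketch below assumes integer class limits with a gap of 1 between classes, so the lower boundary is the lower limit minus 0.5; `grouped_median` is a hypothetical helper name.

```python
# (lower limit, upper limit, frequency) for the travel-time example.
travel = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

def grouped_median(classes):
    """Median of grouped data: L_m + ((n/2 - F) / f_m) * i."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:      # first class reaching n/2
            boundary = lower - 0.5       # L_m (assumes a gap of 1 between classes)
            width = upper - lower + 1    # i
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f

print(grouped_median(travel))  # 23.0
```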
A quartile is one of three points that divide a data set into four equal groups,
each representing a fourth of the distributed sampled population.
Using the same method of calculation as in the Median, we can get Q1 and
Q3 equation as follows:
$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i \qquad Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$
Example: Based on the grouped data below, find the Interquartile Range
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
1st Step: Construct the cumulative frequency distribution
Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50
2nd Step: Determine the Q1 and Q3
Class Q1: $\frac{n}{4} = \frac{50}{4} = 12.5$, so class Q1 is the 2nd class.
Therefore,
$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right) 10 = 13.7143$
Class Q3: $\frac{3n}{4} = \frac{3(50)}{4} = 37.5$, so class Q3 is the 4th class.
Therefore,
$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right) 10 = 34.3889$
Interquartile Range
IQR = Q3 – Q1
Calculate the IQR
IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
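Since the quartile formulas differ from the median formula only in the target position (n/4 or 3n/4 instead of n/2), one function parameterised by that fraction covers both. This is a sketch under the same assumption as the worked example (integer class limits, so boundary = lower limit minus 0.5); `grouped_quantile` is a hypothetical name.

```python
# (lower limit, upper limit, frequency) for the travel-time example.
travel = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

def grouped_quantile(classes, fraction):
    """Grouped-data quantile: L + ((fraction * n - F) / f) * i."""
    n = sum(f for _, _, f in classes)
    target = fraction * n            # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                   # F: cumulative frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= target:
            boundary = lower - 0.5   # assumes a gap of 1 between classes
            width = upper - lower + 1
            return boundary + (target - cumulative) / f * width
        cumulative += f

q1 = grouped_quantile(travel, 0.25)
q3 = grouped_quantile(travel, 0.75)
iqr = q3 - q1
print(round(q1, 4), round(q3, 4), round(iqr, 4))
```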
• Mode is the value that has the highest frequency in a data set.
• For grouped data, class mode (or, modal class) is the class with the highest
frequency.
• To find mode for grouped data, use the following formula:
$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
i is the class width
$\Delta_1$ is the difference between the frequency of class
mode and the frequency of the class before the class
mode
$\Delta_2$ is the difference between the frequency of class mode
and the frequency of the class after the class mode
$L_{mo}$ is the lower boundary of class mode
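Based on the grouped data below, find the mode
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
Based on the table, the class mode is the 2nd class (highest frequency, 14), so
$L_{mo} = 10.5$, $\Delta_1 = (14 - 8) = 6$, $\Delta_2 = (14 - 12) = 2$
and
$i = 10$
$\text{Mode} = 10.5 + \left(\frac{6}{6 + 2}\right) 10 = 10.5 + 7.5 = 18$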
Another important characteristic of quantitative data is how much the
data varies, or is spread out.
The 4 most common methods of measuring spread are:
1. Range
2. Mean Absolute Deviation
3. Quartile Deviation
4. Standard Deviation and Variance
• The difference between the maximum and minimum data entries in the
set.
• The data must be quantitative.
Range = (Max. data entry) – (Min. data entry)
The wait time to see a bank teller is studied at 2 banks.
Bank A has multiple lines, one for each teller.

Bank B has a single wait line for 1st available teller.
5 wait times (in minutes) are sampled from each bank:

Bank A: 5.2 6.2 7.5 8.4 9.2
Bank B: 6.6 6.8 7.5 7.7 7.9
Find the mean, median, and range for each bank.
• Bank A: Range = ?
• Bank B: Range = ?
• Note: The range is easy to compute, but only uses 2 values. Do the
following 2 sets vary the same?
• Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
Measures the typical amount data deviates from the mean.
Sample Variance, $s^2$:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation, $s$:
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
1. Find the mean of the sample data set: $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Bank A:
$\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min
Wait time, x (in min)   Deviation: x – x̄      Squares: (x – x̄)²
5.2                     5.2 – 7.3 = –2.1      (–2.1)² = 4.41
6.2                     6.2 – 7.3 =           (    )² =
7.5                     7.5 – 7.3 =           (    )² =
8.4                     8.4 – 7.3 =           (    )² =
9.2                     9.2 – 7.3 =           (    )² =
Σx = 36.5                                     Σ(x – x̄)² =
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$
$s = \sqrt{s^2} =$
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
Bank B:
$\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min
Wait time, x (in min)   Deviation: x – x̄      Squares: (x – x̄)²
6.6
6.8
7.5
7.7
7.9
Σx = 36.5                                     Σ(x – x̄)² =
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$
$s = \sqrt{s^2} =$
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
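To check a hand-filled table, the six steps can be sketched in Python using the bank wait times given earlier. The function names are hypothetical; the standard library's `statistics.variance` and `statistics.stdev` compute the same sample quantities and can serve as a cross-check.

```python
import math

def sample_variance(data):
    """Steps 1-5: mean, deviations, squares, sum, divide by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

def sample_std_dev(data):
    """Step 6: square root of the sample variance."""
    return math.sqrt(sample_variance(data))

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]  # wait times in minutes
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

for name, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    # Round only at the end, to one more decimal than the data.
    print(f"{name}: s^2 = {sample_variance(times):.2f} min^2, "
          f"s = {sample_std_dev(times):.2f} min")
```

Both banks have the same mean wait time, so the standard deviation is what distinguishes them: the smaller value indicates more consistent waits.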
                     Sample Statistics   Population Parameters
Mean                 x̄                   µ
Standard Deviation   s                   σ
Variance             s²                  σ²
Note: Unlike x̄ and µ, the formulas for s and σ are not mathematically
the same:
Sample Standard Deviation:
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population Standard Deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• Standard deviation is a measure of the typical amount an entry
deviates from the mean.
• The more the entries are spread out, the greater the standard
deviation.
https://www.robinsonschools.com/unit2/images/users/dforbes/Stats/Stats_Notes_2.4.pdf
The gas mileage of 2 cars is sampled over various conditions:
Car A: 21.1 21.2 20.8 19.8 23.8 (mpg)

Car B: 25.2 19.1 18.0 24.4 20.3 (mpg)
Which car do you think gets “better” mpg?
Use a calculator to find the mean and standard deviation for each to
justify your choice.
How does “s” show how much the data varies?
Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
Alternatively, if the range is known, you can use the range rule to estimate the
standard deviation:
$s \approx \frac{\text{Range}}{4}$
• A sample of women’s heights has a mean of 64 inches and a standard
deviation of 2.5 inches. Using the range rule, “most” women fall
within what heights?
• What would be an “unusual” height?
The sample of Exam Scores used in the class handout had a mean of
73.6. Which of the following is most likely the standard deviation of
the sample?
s = 3.6 s = 12.8 s = 74.5
Use the range rule to help justify your choice.

For data with any distribution, the proportion (or fraction) of any set of
data lying within K standard deviations of the mean is always at least
$1 - \frac{1}{K^2}$, where K is any positive number greater than 1.
• For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard
deviations of the mean
• For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard
deviations of the mean
Source: Elementary Statistics by Bluman
A sample of salaries at an elementary school has a mean of $32,000
and a standard deviation of $3000.
Use Chebyshev’s Theorem to describe how the salaries are spread out.
Would a salary of $28,000 be “unusual?”
Would a salary of $45,000 be “unusual”?
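A small Python sketch can help with this kind of question. It evaluates the Chebyshev bound 1 − 1/K² and expresses a salary as a number of standard deviations from the mean; both function names are hypothetical, and deciding what counts as "unusual" is left to the reader.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

def deviations_from_mean(value, mean, std_dev):
    """How many standard deviations a value lies from the mean (signed)."""
    return (value - mean) / std_dev

mean, std_dev = 32_000, 3_000  # salary example above
for k in (2, 3):
    low, high = mean - k * std_dev, mean + k * std_dev
    print(f"At least {chebyshev_bound(k):.0%} of salaries lie between ${low:,} and ${high:,}")

for salary in (28_000, 45_000):
    print(f"${salary:,} is {deviations_from_mean(salary, mean, std_dev):+.2f} standard deviations from the mean")
```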
For data sets having a symmetric/bell shaped distribution:
• About 68% of all values fall within 1 standard deviation of the mean
• About 95% of all values fall within 2 standard deviations of the mean
• About 99.7% of all values fall within 3 standard deviations of the
mean
https://www.semanticscholar.org/paper/An-investigation-into-undergraduate-student%27s-in-%3A-Onyancha/de966fe2c3e5d73aea742cf831f0bc9996f753c6/figure/39
A sample of IQs has a symmetric distribution with a mean of 100 and a
standard deviation of 15.
1. Sketch the distribution.

2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
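As a starting point for questions like these, the intervals covered by the Empirical Rule can be tabulated in Python. This is a sketch only: the `EMPIRICAL_RULE` table and `empirical_interval` name are introduced here for illustration, and the percentages are the approximate ones quoted above.

```python
# Approximate share of values within k standard deviations (Empirical Rule).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def empirical_interval(mean, std_dev, k):
    """Interval mean ± k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)

mean, std_dev = 100, 15  # IQ example above
for k, share in EMPIRICAL_RULE.items():
    low, high = empirical_interval(mean, std_dev, k)
    print(f"About {share:.1%} of IQs fall between {low} and {high}")
```

Because the distribution is symmetric, the share between the mean and one end of an interval is half of that interval's share, and the share beyond one end is half of what lies outside it.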
• Elementary Statistics by Bluman
• https://docplayer.es/50880990-Preparacion-de-propuestas-en-horizonte-puerto-real-18-de-junio-de-2015.html
• https://mikerogerstrg.wordpress.com/2015/01/23/outliers-escaping-average-and-becoming-great/
• https://slideplayer.com/slide/10513276/
• http://chandra-silitonga.blogspot.com/2017/09/contoh-soal-menghitung-mean-median-dan.html
• https://www.robinsonschools.com/unit2/images/users/dforbes/Stats/Stats_Notes_2.4.pdf
• https://www.semanticscholar.org/paper/An-investigation-into-undergraduate-student%27s-in-%3A-Onyancha/de966fe2c3e5d73aea742cf831f0bc9996f753c6/figure/39
