CHAPTER 3
DESCRIBING DATA
Content
Mean
Median
Mode
Weighted Mean
Range
Coefficient of Variation
Mean (Arithmetic Average)
[Figure: two dot plots on a 0-10 axis]
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-
Hall, Inc.
Mean
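Sample mean
n = Sample Size
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Population mean
N = Population Size
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]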
Ex: Suppose that in thirty shots at a target,
a marksman makes the following scores:
5 2 2 3 4 4 3 2 0 3 0 3 2 1 5 1 3 1 5 5 2 4 0 0 4
5 4 4 5 5
Calculate the mean.
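The calculation asked for above can be checked with a few lines of Python. This is only a sketch of the sample-mean formula applied to the thirty scores; the variable names are illustrative.

```python
# The thirty scores from the target-shooting example.
scores = [5, 2, 2, 3, 4, 4, 3, 2, 0, 3,
          0, 3, 2, 1, 5, 1, 3, 1, 5, 5,
          2, 4, 0, 0, 4, 5, 4, 4, 5, 5]

n = len(scores)          # number of observations
total = sum(scores)      # sum of all x_i
sample_mean = total / n  # x-bar = (sum of x_i) / n

print(n, total, sample_mean)  # 30 87 2.9
```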
Weighted Mean
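Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:
\[ \bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days} \]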
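The weighted mean for the repair projects can be reproduced as follows. This is a minimal sketch; the helper name `weighted_mean` is an illustrative choice, not something from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

days = [5, 6, 7, 8]        # days to complete (x_i)
frequency = [4, 12, 8, 2]  # number of projects (w_i)

mean_days = weighted_mean(days, frequency)
print(round(mean_days, 2))  # 6.31
```

The same helper works when the weights are percentages that sum to 1, as in a rating system.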
Weighted mean – Case 2
Decisions: Weighted means can help with decisions
where some things are more important than others.
Ex: Sam wants to buy a new camera and decides on the
following rating system:
Image Quality 50%
Battery Life 30%
Zoom Range 20%
The Sony camera gets 8 (out of 10) for Image Quality, 6 for Battery
Life and 7 for Zoom Range.
The Canon camera gets 9 for Image Quality, 4 for Battery Life and
6 for Zoom Range.
Which camera is best?
Sony: 0.5 x 8 + 0.3 x 6 + 0.2 x 7 = 7.2
Canon: 0.5 x 9 + 0.3 x 4 + 0.2 x 6 = 6.9
-> Sam decides to buy the Sony.
[Figure: two dot plots on a 0-10 axis; Median = 3 in both]
How to find the mode?
Put the numbers in order.
Count how many times each number appears; the value with the
highest frequency is the mode.
Find the mode:
1, 3, 3, 3, 4, 4, 6, 6, 6, 9
3 appears three times, as does 6
-> there are two modes: 3 and 6
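The counting procedure above can be sketched with Python's standard library. `statistics.multimode` returns every value tied for the highest frequency, which suits data with more than one mode.

```python
from statistics import multimode

data = [1, 3, 3, 3, 4, 4, 6, 6, 6, 9]

# multimode lists all values that share the highest frequency.
modes = multimode(data)
print(modes)  # [3, 6]
```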
Example:
[Figure: dot plot on a 0-14 axis]
Range = largest value - smallest value = 14 - 1 = 13
Disadvantage of range
The range can sometimes be misleading when there
are extremely high or low values.
[Figure: two differently distributed data sets on a 7-12 axis; Range = 12 - 7 = 5 for both]
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
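A short sketch of the two data sets above shows how a single extreme value changes the range. The function name `value_range` is illustrative.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# Eleven 1s, eight 2s, four 3s, then the last two values.
common = [1] * 11 + [2] * 8 + [3] * 4
without_outlier = common + [4, 5]
with_outlier = common + [4, 120]

print(value_range(without_outlier))  # 4
print(value_range(with_outlier))     # 119
```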
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000
Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
How to find the quartiles?
* First put the list of numbers in order
* Then cut the list into four equal parts
-> The quartiles are at the "cuts"
Example
5, 8, 4, 4, 6, 3, 8
In order: 3, 4, 4, 5, 6, 8, 8, so Q1 = 4, Q2 (the median) = 5 and Q3 = 8.
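The "cut the list" procedure can be sketched as below. One assumption is made explicit: when the list has an odd number of values, the middle value is left out of both halves. Other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by cutting the sorted list at the median.

    Assumption: for an odd-length list the middle value belongs to
    neither half. Needs at least two values.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the cut
    upper = ordered[-half:]   # values above the cut
    return median(lower), median(ordered), median(upper)

print(quartiles([5, 8, 4, 4, 6, 3, 8]))  # (4, 5, 8)
```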
Here are the top eleven 50 m goat racing times in
seconds for 2007 and 2008.
Work out the mean and range.

2007    2008
12.1    12.3
14.0    13.7
15.3    15.5
15.4    15.5
15.4    15.6
15.6    15.9
15.7    16.0
15.7    16.1
16.1    16.1
16.7    17.1
17.0    22.9

         2007    2008
Mean     15.4    16.1
Range    4.9     10.6

Which year was better and why?
Why might this comparison be unfair?
The interquartile range is a better measure of spread
when the data contains an outlier.
Note:
interquartile range =
upper quartile - lower quartile
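To see why the interquartile range handles the outlier better, the goat racing times can be compared as follows. This is a sketch that assumes quartiles are taken as the medians of the lower and upper halves, with the middle value left out when the count is odd.

```python
from statistics import median

def iqr(data):
    """Interquartile range = upper quartile - lower quartile."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

times_2007 = [12.1, 14.0, 15.3, 15.4, 15.4, 15.6, 15.7, 15.7, 16.1, 16.7, 17.0]
times_2008 = [12.3, 13.7, 15.5, 15.5, 15.6, 15.9, 16.0, 16.1, 16.1, 17.1, 22.9]

for year, times in (("2007", times_2007), ("2008", times_2008)):
    spread = max(times) - min(times)
    print(year, "range:", round(spread, 1), "IQR:", round(iqr(times), 1))
# 2007 range: 4.9 IQR: 0.8
# 2008 range: 10.6 IQR: 0.6
```

The 22.9 s time more than doubles the 2008 range, while the interquartile ranges of the two years stay close.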
Box and Whisker Diagrams.
Box plots are useful for comparing two or more sets of data, such as
the heights of boys and girls in a class.
[Figure: box plots of the heights of boys and girls]
S1: Sort the data
S2: Calculate the quartiles
S3: Draw the box from Q1 to Q3
S4: Draw a vertical line through the box at the median
S5: Compute the upper and lower limits
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile = 5½
Median = 8
Upper Quartile = 9
[Figure: box plot on a 4-12 axis]
Drawing a Box Plot.
3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10
[Figure: box plot on a 3-15 axis]
Drawing a Box Plot.
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186
Lower Quartile = 158
Median = 171
Upper Quartile = 180
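The numbers needed to draw this box plot can be computed as sketched below. Two assumptions are not stated in the slides: quartiles are the medians of the lower and upper halves (middle value left out for an odd count), and the upper and lower limits follow the common 1.5 x IQR convention.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

values = [137, 148, 155, 158, 165, 166, 166, 171,
          171, 173, 175, 180, 184, 186, 186]

low, q1, q2, q3, high = five_number_summary(values)

# Assumed convention: limits sit 1.5 x IQR beyond the box.
box_width = q3 - q1
lower_limit = q1 - 1.5 * box_width
upper_limit = q3 + 1.5 * box_width

print(low, q1, q2, q3, high)     # 137 158 171 180 186
print(lower_limit, upper_limit)  # 125.0 213.0
```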
[Figure: box plots comparing the heights of boys and girls]
1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
PERCENTILES
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
• Where σ² stands for the population variance
• μ is the population mean
• N is the total number of values in the population
• x_i is the value of the i-th observation
• Σ represents a summation
These are the numbers of newspapers sold at the local shop over the
last 20 days:
22, 20, 18, 23, 20, 25, 22, 20, 18, 20, 19, 19, 20, 22, 21, 20, 21,
23, 25, 29.
Standard Deviation
Standard deviation is a measure of how spread out
numbers are.
Its symbol is σ (the Greek letter sigma).
It is the square root of the variance.
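As an illustration, the variance and standard deviation of the newspaper sales can be computed directly from the definitions. This sketch assumes the 20 days are treated as a complete population (dividing by N), and cross-checks the result against the standard library.

```python
from statistics import pstdev, pvariance

sold = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20,
        19, 19, 20, 22, 21, 20, 21, 23, 25, 29]

N = len(sold)
mu = sum(sold) / N                                # population mean
variance = sum((x - mu) ** 2 for x in sold) / N   # sigma squared
sigma = variance ** 0.5                           # square root of the variance

print(round(mu, 2), round(variance, 4), round(sigma, 2))  # 21.35 6.8275 2.61

# The library functions should agree with the hand-written formulas.
assert abs(variance - pvariance(sold)) < 1e-9
assert abs(sigma - pstdev(sold)) < 1e-9
```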
Population Variance
In practice population variance cannot be
computed directly because the entire population is
not ordinarily observed.
An analogous measure of variability may be
determined with sample data.
This is referred to as the sample variance.
MEASURES OF VARIABILITY
SAMPLE VARIANCE
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
• Where s² stands for the sample variance
• x̄ is the sample mean
• n is the total number of values in the sample
• x_i is the value of the i-th observation
• Σ represents a summation
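The only change from the population formula is the divisor n - 1. The sketch below applies the sample formula to the 20 newspaper sales figures, under the assumption that they are now regarded as a sample of days rather than a full population.

```python
from statistics import variance

sold = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20,
        19, 19, 20, 22, 21, 20, 21, 23, 25, 29]

n = len(sold)
x_bar = sum(sold) / n                                  # sample mean
s2 = sum((x - x_bar) ** 2 for x in sold) / (n - 1)     # divide by n - 1
s = s2 ** 0.5                                          # sample standard deviation

print(round(s2, 4), round(s, 2))  # 7.1868 2.68

# statistics.variance uses the same n - 1 divisor.
assert abs(s2 - variance(sold)) < 1e-9
```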
MEASURES OF VARIABILITY
POPULATION/SAMPLE STANDARD DEVIATION
Population coefficient of variation:
\[ CV = \frac{\sigma}{\mu} \cdot 100\% \]
Sample coefficient of variation:
\[ CV = \frac{s}{\bar{x}} \cdot 100\% \]
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data
measured in different units
Population:
\[ CV = \frac{\sigma}{\mu} \cdot 100\% \]
Sample:
\[ CV = \frac{s}{\bar{x}} \cdot 100\% \]
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]
Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]
Both stocks have the same standard deviation, but stock B is less
variable relative to its price.
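The comparison of the two stocks can be reproduced with a small helper. This is a sketch only; the function name is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the mean must be non-zero")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

print(cv_a, cv_b)  # 10.0 5.0
```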
Data can be "distributed" (spread out) in different
ways.
Skewed and Symmetric Data
Data in a population or sample can be either
symmetric or skewed (the shape of the data), depending
on how the data are distributed around the center.
Use for
In probability theory,
the normal (or Gaussian) distribution is a very
common continuous probability distribution.
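μ ± 1σ contains about 68% of the values in
the population or the sample
[Figure: bell curve with 68% of the area between μ - 1σ and μ + 1σ]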
The Empirical Rule
μ ± 2σ contains about 95% of the values in
the population or the sample
μ ± 3σ contains about 99.7% of the values
in the population or the sample
[Figure: bell curves with 95% of the area within μ ± 2σ and 99.7% within μ ± 3σ]
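The percentages in the empirical rule can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes the data are exactly normal.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of values within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```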
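Standardized Data Values
Standardized data values are referred to as z-scores.
For a population:
\[ z = \frac{x - \mu}{\sigma} \]
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score
For a sample:
\[ z = \frac{x - \bar{x}}{s} \]
where:
x = original data value
x̄ = sample mean
s = sample standard deviation
z = standard score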
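A z-score is a one-line calculation. In the sketch below the inputs (mean 10, standard deviation 1.5, values 8.5 and 11.5) are borrowed from the Waterville rainfall question later in this chapter; the function name is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

print(z_score(8.5, 10, 1.5))   # -1.0
print(z_score(11.5, 10, 1.5))  # 1.0
```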
The mean is 22.5 and the standard deviation is 6.75, and these are the
standard scores:
-0.45, -1.21, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91
Therefore, 2 students will fail (the ones who scored 15 and 14 on the test).
In more detail
Here is the standard normal distribution with
percentages for every half of a standard deviation
and cumulative percentages.
Your score in a recent test was 0.5 standard
deviations above the average. How many people
scored lower than you did? (P(z < 0.5))
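Assuming the test scores are normally distributed, the share of people who scored lower is the cumulative probability at z = 0.5. A minimal sketch with the standard library:

```python
from statistics import NormalDist

# P(Z < 0.5) for the standard normal distribution.
share_below = NormalDist().cdf(0.5)
print(f"{share_below:.1%}")  # 69.1%
```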
Question 1
95% of students at school weigh between 62 kg and
90 kg.
Assuming this data is normally distributed, what are
the mean and standard deviation?
Question 2
Start at the row for 0.4 and read along until 0.45:
there is the value 0.1736.
And 0.1736 is 17.36%.
So 17.36% of the population are between 0 and 0.45
standard deviations from the mean.
Because the curve is symmetrical, the same table
can be used for values going in either direction, so
negative 0.45 (-0.45) also has an area of 0.1736.
Example
Find the percentage of the population with z between -1 and +2.
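Table lookups like these can be cross-checked numerically. The sketch below computes areas under the standard normal curve with `statistics.NormalDist`. The helper `area_between` is an illustrative name, and results may differ from a printed table in the last digit because table entries are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(a, b):
    """P(a < Z < b) for the standard normal distribution."""
    return std_normal.cdf(b) - std_normal.cdf(a)

print(round(area_between(0, 0.45), 4))   # 0.1736
print(round(area_between(-0.45, 0), 4))  # 0.1736 (symmetry)
print(round(area_between(-1, 2), 4))     # 0.8186
```

So roughly 81.9% of the population lies between z = -1 and z = +2.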
Use the standard Normal Distribution table to find
P(0<Z
P (Z
P ( -1.65 <Z
P (0.85 < Z
P (Z>1.75)
P (Z -0.69)
P (-1.27 < Z
P (Z >-2.64)
P (Z 0.96)
Find Z when you know the percentage.
P(Z to +∞) = 50.8%
P(-∞ to Z) = 30.85%
P(-2 to Z) = 11.29%
P(Z to 3) = 0.3%
Question 9
The mean July daily rainfall in Waterville is 10 mm
and the standard deviation is 1.5 mm.
Assume that this data is normally distributed.
On how many days in July would you expect the daily
rainfall to be less than 8.5 mm?
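One way to approach this question is sketched below. It relies on the stated normality assumption and on July having 31 days; the variable names are illustrative.

```python
from statistics import NormalDist

rainfall = NormalDist(mu=10, sigma=1.5)  # daily rainfall in mm
p_below = rainfall.cdf(8.5)              # P(X < 8.5), i.e. P(Z < -1)

days_in_july = 31
expected_days = days_in_july * p_below

print(round(p_below, 4), round(expected_days, 1))  # 0.1587 4.9
```

That is about 5 days in the month.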
In the music video "Em gái mưa" by Hương Tràm, the
mean rainfall measured in each rain scene is
20 mm and the standard deviation is 3 mm.
Assume that the rainfall is normally distributed.
The video has 40 rain scenes in total.
How many rain scenes have rainfall
below 16 mm?
The mean age in a local population
is 43 and the standard deviation
is 14. The locality has 5,000 people.
How many people are aged between 22 and 57?
Assume that this is a normal distribution.
A population has a mean of 20 and a standard
deviation of 3. The population contains 2,000 values.
Assume that this is a normal
distribution. Questions:
a. How many values lie between 14 and 17? (1 point)
b. In this survey, values from 17 to x
account for 68.28%. Find x. (1 point)
c. In this survey, 131 people (an unrounded
figure) have a value greater than or equal to x.
Find x.
The mean July daily rainfall in Waterville is 10 mm
and the standard deviation is 1.5 mm.
Assume that this data is normally distributed.
On how many days in July would you expect the daily
rainfall to be between 8.5 mm and 11.5 mm?
Thank you