Discrete Distributions: Business Statistics, 4E, by Ken Black. © 2003 John Wiley & Sons
Statistics
Discrete Distributions
BUSINESS STATISTICS, 4E, BY KEN BLACK. © 2003 JOHN WILEY & SONS. 3-1
Learning Objectives
1. Distinguish between measures of central tendency,
measures of variability, measures of shape, and
measures of association.
2. Understand the meanings of mean, median, mode,
quartile, percentile, and range.
3. Compute mean, median, mode, percentile, quartile,
range, variance, standard deviation, and mean
absolute deviation on ungrouped data.
4. Differentiate between sample and population variance
and standard deviation.
Learning Objectives --
Continued
5. Understand the meaning of standard deviation as it is
applied by using the empirical rule and Chebyshev’s
theorem.
6. Compute the mean, median, standard deviation, and
variance on grouped data.
7. Understand box and whisker plots, skewness, and
kurtosis.
8. Compute a coefficient of correlation and interpret it.
Measures of Central
Tendency: Ungrouped Data
Measures of central tendency yield information
about “particular places or locations in a group of
numbers.”
Common Measures of Location
◦ Mode
◦ Median
◦ Mean
◦ Percentiles
◦ Quartiles
Mode
The most frequently occurring value in a data set
Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)
Mode -- Example
35 41 44 45
37 41 44 46
37 43 44 46
40 43 44 46
40 43 45 48
The mode is 44.
There are more 44s than any other value.
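As a quick check of the example above, the following Python sketch counts how often each value occurs and reports the most frequent one. The variable names (`values`, `counts`, `modes`) are illustrative and not part of the original slides; ties are kept, so a data set with several modes would list all of them.

```python
from collections import Counter

# Data from the mode example above.
values = [35, 41, 44, 45,
          37, 41, 44, 46,
          37, 43, 44, 46,
          40, 43, 44, 46,
          40, 43, 45, 48]

counts = Counter(values)          # value -> number of occurrences
top = max(counts.values())        # highest frequency in the data set
# Keep every value that reaches the highest frequency, in case of ties.
modes = sorted(v for v, c in counts.items() if c == top)

print(modes, top)
```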
Median
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values
Median: Computational
Procedure
First Procedure
◦ Arrange the observations in an ordered array.
◦ If there is an odd number of terms, the median is the
middle term of the ordered array.
◦ If there is an even number of terms, the median is
the average of the middle two terms.
Second Procedure
◦ The median’s position in an ordered array is given by
(n+1)/2.
Median: Example
with an Odd Number of
Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
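The two median examples can be worked through with a short Python sketch. It follows the first procedure from the slides (middle term for an odd count, average of the two middle terms for an even count); the function name `median` and the list names are assumptions made for illustration.

```python
def median(values):
    """Median via the ordered-array procedure described above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of terms: the middle term, at position (n + 1) / 2.
        return ordered[mid]
    # Even number of terms: average of the two middle terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21]

print(median(odd_array))   # 17 terms: the 9th term
print(median(even_array))  # 16 terms: average of the 8th and 9th terms
```

With 17 terms the median position is (17 + 1)/2 = 9, which is the value 15; with 16 terms the position is 8.5, so the median is the average of 14 and 15.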
Arithmetic Mean
Commonly called ‘the mean’; it is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including extreme
values
Computed by summing all values in the data set and
dividing the sum by the number of values in the data set
Population Mean
$$ \mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6 $$
Sample Mean
$$ \bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167 $$
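The population and sample means use the same arithmetic; only the notation differs. A minimal Python sketch, with the helper name `mean` and the list names chosen here for illustration:

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

population = [24, 13, 19, 26, 11]      # population mean example
sample = [57, 86, 42, 38, 90, 66]      # sample mean example

mu = mean(population)     # population mean
x_bar = mean(sample)      # sample mean

print(mu, round(x_bar, 3))
```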
Percentiles
Measures of central tendency that divide a group of data
into 100 parts
At least n% of the data lie below the nth percentile, and at
most (100 - n)% of the data lie above the nth percentile
Example: 90th percentile indicates that at least 90% of the
data lie below it, and at most 10% of the data lie above it
The median and the 50th percentile have the same value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Percentiles: Computational Procedure
Organize the data into an ascending ordered array.
Calculate the percentile location:
$$ i = \frac{P}{100}(n) $$
Determine the percentile’s location and its value.
If i is a whole number, the percentile is the average of the
values at the i and (i+1) positions.
If i is not a whole number, the percentile is at the (i+1)
position in the ordered array.
Percentiles: Example
Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
Location of 30th percentile:
$$ i = \frac{30}{100}(8) = 2.4 $$
The location index, i, is not a whole number; i+1 = 2.4+1=3.4; the
whole number portion is 3; the 30th percentile is at the 3rd
location of the array; the 30th percentile is 13.
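The computational procedure can be sketched in Python as follows. This is one possible reading of the location-index rule from the slides, not a standard library routine; the function name `percentile` and its argument names are assumptions, and the sketch is limited to percentiles strictly between 0 and 100.

```python
def percentile(values, p):
    """Percentile by the location-index rule i = (P / 100) * n."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # whole = integer part of i; remainder == 0 means i is a whole number.
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)
    if remainder == 0:
        # i is whole: average the values at positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at the whole-number part of i + 1
    # (1-based), which is index `whole` in a 0-based list.
    return ordered[whole]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(raw, 30)
print(p30)
```

Library routines such as `statistics.quantiles` use interpolation conventions that can give slightly different values for the same data.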
Quartiles
Measures of central tendency that divide a group of
data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
Q1 is equal to the 25th percentile
Q2 is located at 50th percentile and equals the median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the
data set
Quartiles: Example
Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
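Q1: $$ i = \frac{25}{100}(8) = 2, \qquad Q_1 = \frac{109 + 114}{2} = 111.5 $$
Q2: $$ i = \frac{50}{100}(8) = 4, \qquad Q_2 = \frac{116 + 121}{2} = 118.5 $$
Q3: $$ i = \frac{75}{100}(8) = 6, \qquad Q_3 = \frac{122 + 125}{2} = 123.5 $$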
Variability
[Figure: panels contrasting no variability in cash flow with variability; the mean is marked.]
Measures of Variability:
Ungrouped Data
Measures of variability describe the spread or the
dispersion of a set of data.
Common Measures of Variability
◦ Range
◦ Interquartile Range
◦ Mean Absolute Deviation
◦ Variance
◦ Standard Deviation
◦ Z scores
◦ Coefficient of Variation
Range
The difference between the largest and
the smallest values in a set of data
Simple to compute
Ignores all data points except
the two extremes
Example:
Range
= Largest - Smallest
= 48 - 35 = 13
Interquartile Range
Range of values between the first and third quartiles
Range of the “middle half”
Less influenced by extremes
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean:
$$ \mu = \frac{\sum X}{N} = \frac{65}{5} = 13 $$
Deviations from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 showing each deviation from the mean.]
Mean Absolute Deviation
Average of the absolute deviations from the mean
  X    X - μ    |X - μ|
  5     -8        +8
  9     -4        +4
 16     +3        +3
 17     +4        +4
 18     +5        +5
         0        24
$$ M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8 $$
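The mean absolute deviation table can be reproduced with a few lines of Python. The variable names are illustrative only.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                     # population mean
deviations = [x - mu for x in data]            # these always sum to zero
abs_deviations = [abs(d) for d in deviations]  # drop the signs
mad = sum(abs_deviations) / len(data)          # mean absolute deviation

print(mu, sum(deviations), sum(abs_deviations), mad)
```

The signed deviations sum to zero, which is why the absolute values (or, for the variance, the squares) are averaged instead.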
Population Variance
Average of the squared deviations from the arithmetic mean
  X    X - μ    (X - μ)^2
  5     -8         64
  9     -4         16
 16     +3          9
 17     +4         16
 18     +5         25
         0        130
$$ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0 $$
Population Standard
Deviation
Square root of the variance
  X    X - μ    (X - μ)^2
  5     -8         64
  9     -4         16
 16     +3          9
 17     +4         16
 18     +5         25
         0        130
$$ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0 $$
$$ \sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1 $$
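The population variance and standard deviation for this data set can be checked with the sketch below. It computes the sum of squared deviations by hand so that each step matches the table; the names are assumptions made for illustration.

```python
import math

data = [5, 9, 16, 17, 18]
N = len(data)

mu = sum(data) / N                                   # population mean
sum_sq_dev = sum((x - mu) ** 2 for x in data)        # sum of squared deviations
variance = sum_sq_dev / N                            # divide by N for a population
std_dev = math.sqrt(variance)                        # population standard deviation

print(sum_sq_dev, variance, round(std_dev, 1))
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population calculations.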
Sample Variance
Sum of the squared deviations from the sample mean, divided by n - 1
    X      X - \bar{X}    (X - \bar{X})^2
  2,398       625            390,625
  1,844        71              5,041
  1,539      -234             54,756
  1,311      -462            213,444
  7,092         0            663,866
$$ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67 $$
Sample Standard Deviation
Square root of the sample variance
    X      X - \bar{X}    (X - \bar{X})^2
  2,398       625            390,625
  1,844        71              5,041
  1,539      -234             54,756
  1,311      -462            213,444
  7,092         0            663,866
$$ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67 $$
$$ S = \sqrt{S^2} = \sqrt{221,288.67} = 470.41 $$
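For sample data the divisor is n - 1 rather than N. The sketch below computes the sample variance by hand and compares it with `statistics.variance` and `statistics.stdev` from the Python standard library, both of which use the n - 1 divisor. Variable names are illustrative.

```python
import statistics

sample = [2398, 1844, 1539, 1311]
n = len(sample)

x_bar = sum(sample) / n                               # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)    # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                      # sample variance
s = s_squared ** 0.5                                  # sample standard deviation

# The library functions should agree with the manual calculation.
lib_variance = statistics.variance(sample)
lib_stdev = statistics.stdev(sample)

print(x_bar, sum_sq_dev, round(s_squared, 2), round(s, 2))
```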
Uses of Standard Deviation
Indicator of financial risk
Quality Control
◦ construction of quality control charts
◦ process capability studies
Comparing populations
◦ household incomes in two cities
◦ employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
                      Annualized Rate of Return
Financial Security    Mean (μ)    Standard Deviation (σ)
A                     15%         3%
B                     15%         7%
Empirical Rule
Data are normally distributed (or approximately normal)
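μ ± 1σ contains approximately 68% of the values
μ ± 2σ contains approximately 95% of the values
μ ± 3σ contains approximately 99.7% of the values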
Chebyshev’s Theorem
Applies to all distributions
$$ P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} $$
for k > 1
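To see how conservative Chebyshev's bound is, the sketch below evaluates 1 - 1/k^2 for a few values of k. The function name `chebyshev_bound` is an assumption made for illustration.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 4))
```

For k = 2 the bound guarantees at least 75% of the values for any distribution, compared with roughly 95% under the empirical rule when the data are approximately normal.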
Coefficient of Variation
$$ C.V. = \frac{\sigma}{\mu}(100) $$
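$$ \mu_1 = 29, \quad \sigma_1 = 4.6 \qquad \mu_2 = 84, \quad \sigma_2 = 10 $$
$$ C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86 $$
$$ C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90 $$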
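The coefficient of variation comparison can be reproduced with the short sketch below; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation expressed as a percentage of the mean."""
    return sigma / mu * 100

cv1 = coefficient_of_variation(4.6, 29)   # first data set
cv2 = coefficient_of_variation(10, 84)    # second data set

print(round(cv1, 2), round(cv2, 2))
```

Although the second data set has the larger standard deviation (10 versus 4.6), its variability relative to its mean is smaller.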
Measures of Central Tendency
and Variability: Grouped Data
Measures of Central Tendency
◦ Mean
◦ Median
◦ Mode
Measures of Variability
◦ Variance
◦ Standard Deviation
Mean of Grouped Data
Weighted average of class midpoints
Class frequencies are the weights
$$ \mu_{grouped} = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i} $$
Calculation of Grouped Mean
Class Interval   Frequency (f)   Class Midpoint (M)    fM
20-under 30            6                25             150
30-under 40           18                35             630
40-under 50           11                45             495
50-under 60           11                55             605
60-under 70            3                65             195
70-under 80            1                75              75
Total                 50                              2150
$$ \mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0 $$
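The grouped mean is a weighted average of the class midpoints, which the following sketch makes explicit. The tuple layout `(lower, upper, frequency)` and the variable names are assumptions chosen for this illustration.

```python
# (lower limit, upper limit, frequency) for each class interval.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

total_f = sum(f for _, _, f in classes)                        # sum of f
total_fM = sum(f * (lo + hi) / 2 for lo, hi, f in classes)     # sum of f * M
grouped_mean = total_fM / total_f

print(total_f, total_fM, grouped_mean)
```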
Median of Grouped Data
$$ Median = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) $$
Where:
L = the lower limit of the median class
cf_p = cumulative frequency of the class preceding the median class
f_med = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data --
Example
Class Interval   Frequency   Cumulative Frequency
20-under 30           6               6
30-under 40          18              24
40-under 50          11              35
50-under 60          11              46
60-under 70           3              49
70-under 80           1              50
N = 50
$$ M_d = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909 $$
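The grouped median formula can be sketched in Python as below. The median class is taken to be the first class whose cumulative frequency reaches N/2; the function name `grouped_median` and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration.

```python
def grouped_median(classes):
    """Median of grouped data: L + ((N/2 - cf_p) / f_med) * W."""
    total = sum(freq for _, _, freq in classes)   # N
    half = total / 2
    cumulative = 0                                # cf_p as we walk the classes
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            width = upper - lower                 # W
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("no classes given")

classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

md = grouped_median(classes)
print(round(md, 3))
```

Here N/2 = 25 falls in the 40-under 50 class, so L = 40, cf_p = 24, f_med = 11, and W = 10.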
Mode of Grouped Data
Midpoint of the modal class
Modal class has the greatest frequency
Variance and Standard Deviation
of Grouped Data
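Population:
$$ \sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2} $$
Sample:
$$ S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2} $$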
Population Variance and
Standard Deviation of Grouped
Data
$$ \sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144 $$
$$ \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12 $$
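The totals on this slide can be recomputed from the grouped frequency table used for the grouped mean. The sketch below is a minimal illustration with assumed variable names; it treats the data as a population, so the divisor is N.

```python
import math

# (lower limit, upper limit, frequency) for each class interval.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

N = sum(f for _, _, f in classes)
mu = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / N     # grouped mean

# Sum of f * (M - mu)^2 over all classes, where M is the class midpoint.
sum_f_sq_dev = sum(f * ((lo + hi) / 2 - mu) ** 2 for lo, hi, f in classes)

variance = sum_f_sq_dev / N
std_dev = math.sqrt(variance)

print(mu, sum_f_sq_dev, variance, std_dev)
```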
Measures of Shape
Skewness
◦ Absence of symmetry
◦ Extreme values in one side of a distribution
Kurtosis
◦ Peakedness of a distribution
◦ Leptokurtic: high and thin
◦ Mesokurtic: normal shape
◦ Platykurtic: flat and spread out
Box and Whisker Plots
◦ Graphic display of a distribution
◦ Reveals skewness
Coefficient of Skewness
Summary measure for skewness
$$ S = \frac{3(\mu - M_d)}{\sigma} $$
If S < 0, the distribution is negatively skewed (skewed to the left).
If S = 0, the distribution is symmetric (not skewed).
If S > 0, the distribution is positively skewed (skewed to the right).
Coefficient of Skewness
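$$ \mu_1 = 23, \quad M_{d1} = 26, \quad \sigma_1 = 12.3, \qquad S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73 $$
$$ \mu_2 = 26, \quad M_{d2} = 26, \quad \sigma_2 = 12.3, \qquad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0 $$
$$ \mu_3 = 29, \quad M_{d3} = 26, \quad \sigma_3 = 12.3, \qquad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73 $$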
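The three cases can be evaluated with a small Python sketch of the coefficient S = 3(mean - median) / standard deviation. The function name `skewness_coefficient` and the `describe` helper are assumptions introduced for illustration.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

def describe(s):
    """Interpret the sign of the coefficient as on the slide."""
    if s < 0:
        return "negatively skewed"
    if s > 0:
        return "positively skewed"
    return "symmetric"

s1 = skewness_coefficient(23, 26, 12.3)
s2 = skewness_coefficient(26, 26, 12.3)
s3 = skewness_coefficient(29, 26, 12.3)

for s in (s1, s2, s3):
    print(round(s, 2), describe(s))
```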
[Figure: leptokurtic, mesokurtic, and platykurtic distributions.]
Box and Whisker Plot
Five specific values are used:
◦ Median, Q2
◦ First quartile, Q1
◦ Third quartile, Q3
◦ Minimum value in the data set
◦ Maximum value in the data set
Inner Fences
◦ IQR = Q3 - Q1
◦ Lower inner fence = Q1 - 1.5 IQR
◦ Upper inner fence = Q3 + 1.5 IQR
Outer Fences
◦ Lower outer fence = Q1 - 3.0 IQR
◦ Upper outer fence = Q3 + 3.0 IQR
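The fence formulas can be illustrated with the quartile example data, taking Q1 = 111.5 and Q3 = 123.5 from that example. The labels "mild" and "extreme" for values beyond the inner and outer fences follow a common convention and are an assumption here, as are the variable names.

```python
ordered = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q3 = 111.5, 123.5          # quartiles taken from the quartiles example

iqr = q3 - q1                                   # interquartile range
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)        # (lower, upper) inner fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)        # (lower, upper) outer fences

# Values between the inner and outer fences (assumed label: mild outliers).
mild = [x for x in ordered
        if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
# Values beyond the outer fences (assumed label: extreme outliers).
extreme = [x for x in ordered if x < outer[0] or x > outer[1]]

print(iqr, inner, outer, mild, extreme)
```

Every value in this data set lies inside the inner fences, so both lists come out empty.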
Box and Whisker Plot
[Figure: box and whisker plot marking the minimum, Q1, Q2, Q3, and maximum.]
Skewness: Box and Whisker
Plots, and Coefficient of
Skewness
[Figure: box and whisker plots for S < 0, S = 0, and S > 0.]
Pearson Product-Moment
Correlation Coefficient
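$$ r = \frac{SSXY}{\sqrt{(SSX)(SSY)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$
$$ -1 \le r \le 1 $$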
Three Degrees of Correlation
Computation of r for
the Economics Example (Part 1)
Day          Interest (X)   Futures Index (Y)     X^2        Y^2          XY
1                7.43             221            55.205     48,841     1,642.03
2                7.48             222            55.950     49,284     1,660.56
3                8.00             226            64.000     51,076     1,808.00
4                7.75             225            60.063     50,625     1,743.75
5                7.60             224            57.760     50,176     1,702.40
6                7.63             223            58.217     49,729     1,701.49
7                7.68             223            58.982     49,729     1,712.64
8                7.67             226            58.829     51,076     1,733.42
9                7.59             226            57.608     51,076     1,715.34
10               8.07             235            65.125     55,225     1,896.45
11               8.03             233            64.481     54,289     1,870.99
12               8.00             241            64.000     58,081     1,928.00
Summations      92.93           2,725           720.220    619,207    21,115.07
Computation of r
for the Economics Example (Part 2)
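$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21,115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619,207 - \frac{(2725)^2}{12}\right]}} = .815 $$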
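The computation of r can be repeated directly from the twelve daily observations. The sketch below uses the computational (sums-based) form of the formula; the function name `pearson_r` and the list names are assumptions made for illustration.

```python
import math

# Daily observations from the economics example.
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63,
            7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223,
           223, 226, 226, 235, 233, 241]

def pearson_r(x, y):
    """Pearson product-moment correlation via the sums-based formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    ss_xy = sum_xy - sum_x * sum_y / n      # SSXY
    ss_x = sum_x2 - sum_x ** 2 / n          # SSX
    ss_y = sum_y2 - sum_y ** 2 / n          # SSY
    return ss_xy / math.sqrt(ss_x * ss_y)

r = pearson_r(interest, futures)
print(round(r, 3))
```

A value of about .815 indicates a fairly strong positive linear relationship between the interest rate and the futures index over these twelve days.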
Scatter Plot and Correlation Matrix
for the Economics Example
[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20).]