SBE12 CH 03 B

Slides by John Loucks, St. Edward’s University
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide
1
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Five-Number Summaries and Box Plots
• Measures of Association Between Two Variables
• Data Dashboards: Adding Numerical Measures to Improve Effectiveness
Slide
2
Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers
Slide
3
Distribution Shape: Skewness
• An important measure of the shape of a distribution is called skewness.
• The formula for the skewness of sample data is
$$ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 $$
• Skewness can be easily computed using statistical software.
Slide
4
• Symmetric (not skewed)
  • Skewness is zero.
  • Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; Skewness = 0]
Slide
5
• Moderately Skewed Left
  • Skewness is negative.
  • Mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; Skewness = -.31]
Slide
6
• Moderately Skewed Right
  • Skewness is positive.
  • Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; Skewness = .31]
Slide
7
• Highly Skewed Right
  • Skewness is positive (often above 1.0).
  • Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution; Skewness = 1.25]
Slide
8
• Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide
9
[Figure: relative frequency histogram of the apartment rents; Skewness = .92]
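As a rough check on the skewness reported for the apartment rents, the sample formula n/((n-1)(n-2)) Σ((x_i - x̄)/s)^3 can be evaluated directly. The sketch below is only illustrative: the function name `sample_skewness` is an assumption, and statistical software may use a slightly different definition of skewness.

```python
from math import sqrt

# Apartment rents from the example (n = 70), in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Positive result: the rents are skewed to the right.
print(round(sample_skewness(rents), 2))
```

A symmetric data set such as 1, 2, 3, 4, 5 gives a skewness of essentially zero with the same function.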
Slide
10
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$$ z_i = \frac{x_i - \bar{x}}{s} $$
Excel’s STANDARDIZE function can be used to compute the z-score.
Slide
11
z-Scores
• An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.
Slide
12
z-Scores

Example: Apartment Rents
z-Score of Smallest Value (425)
$$ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 $$
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
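The standardized values can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module; the helper name `z_scores` is an assumption, and `stdev` is the sample standard deviation (dividing by n - 1), which matches s = 54.74 for these data.

```python
from statistics import mean, stdev

# Apartment rents from the example (n = 70), in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_scores(data):
    """Return (x_i - mean) / s for every value, using the sample std. dev."""
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]


z = z_scores(rents)
# z-scores of the smallest (425) and largest (615) rents.
print(round(z[0], 2), round(z[-1], 2))  # -1.2 2.27
```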
Slide
13
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
Slide
14
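At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.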
Slide
15

Example: Apartment Rents
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$.
(Actually, 86% of the rent values are between 409 and 573.)
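The Chebyshev bound and the interval used in the rent example are simple enough to sketch in code. The function names below are assumptions made for illustration; the mean 490.80 and standard deviation 54.74 are the values quoted in the example.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def chebyshev_interval(mean, s, z):
    """Endpoints mean - z*s and mean + z*s."""
    return mean - z * s, mean + z * s


# Guaranteed minimum fractions for several values of z (z need not be an integer).
for z in (1.5, 2, 3, 4):
    print(z, round(chebyshev_lower_bound(z), 2))

# Interval for the apartment rents with z = 1.5.
low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(round(low), round(high))  # 409 573
```

The bound is a guaranteed minimum for any data set, which is why the observed share of rents in the interval can be well above it.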
Slide
16
Empirical Rule
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which is covered in Chapter 6.
Slide
17
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
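The empirical-rule percentages come from the normal distribution, so they can be checked against the standard normal probability P(|Z| < k). The sketch below uses the identity P(|Z| < k) = erf(k/√2); the function name `fraction_within` is an assumption. The exact values differ from the slide's 68.26%, 95.44% and 99.72% in the second decimal place, presumably because the slide figures were built from rounded table entries.

```python
from math import erf, sqrt


def fraction_within(k):
    """P(|Z| < k) for a standard normal random variable Z."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    # Percentage of a normal population within k standard deviations of the mean.
    print(k, round(100 * fraction_within(k), 2))
```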
Slide
18
Empirical Rule
[Figure: normal curve over x showing 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
Slide
19
Detecting Outliers
• An outlier is an unusually small or unusually large value in a data set.
• A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
• It might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the data set
  • a correctly recorded data value that belongs in the data set
Slide
20
Detecting Outliers

Example: Apartment Rents
• The most extreme z-scores are -1.20 and 2.27.
• Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
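To show what the |z| > 3 criterion looks like when it does flag something, here is a small sketch on made-up data. The sample below is hypothetical (it is not from the slides): twenty ordinary rents plus one extreme value. The helper name `z_score_outliers` and its `threshold` parameter are likewise assumptions.

```python
from statistics import mean, stdev


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - x_bar) / s) > threshold]


# Hypothetical rents: 450, 455, ..., 545 plus one extreme value of 1500.
sample = list(range(450, 550, 5)) + [1500]
print(z_score_outliers(sample))  # [1500]
```

A flagged value is only a candidate: as the previous slide notes, it may be a recording error, a value that does not belong in the data set, or a legitimate observation.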
Slide
21
Five-Number Summaries
and Box Plots
Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.
Two tools that accomplish this are five-number summaries and box plots.
Slide
22
Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value
Slide
23
Five-Number Summary

Example: Apartment Rents
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
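The five-number summary for the rents can be reproduced with a short script. The slides do not state which percentile rule is used, so the location rule below is an assumption: compute i = (p/100)n, round up if i is not an integer, and otherwise average the values in positions i and i + 1. It reproduces the quartiles shown above; other software (including Python's `statistics.quantiles`) may return slightly different quartiles.

```python
from math import ceil

# Apartment rents from the example (n = 70), in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile using the assumed location rule i = (p/100) * n."""
    n = len(sorted_data)
    i = p * n / 100
    if i == int(i):
        # Integer index: average the values in positions i and i + 1 (1-based).
        i = int(i)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    # Non-integer index: round up to the next position (1-based).
    return sorted_data[ceil(i) - 1]


def five_number_summary(data):
    ordered = sorted(data)
    return (ordered[0], percentile(ordered, 25), percentile(ordered, 50),
            percentile(ordered, 75), ordered[-1])


print(five_number_summary(rents))
```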
Slide
24
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.
Slide
25
Box Plot

Example: Apartment Rents
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Slide
26
Box Plot
• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol * .
Slide
27
Box Plot

Example: Apartment Rents
• The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80.
• The lower limit is located 1.5(IQR) below Q1.
  Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
  Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
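The limit calculation can be written as a small helper. This is a sketch only: the function name `box_plot_limits` and the parameter `k` are assumptions, with k = 1.5 matching the multiplier used on the slide.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(445, 525)
print(lower, upper)  # 325.0 645.0

# The smallest (425) and largest (615) rents both fall inside the limits,
# so neither is an outlier by this rule.
print(lower <= 425 and 615 <= upper)  # True
```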
Slide
28
Box Plot

Example: Apartment Rents
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot on an axis from 400 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]
Slide
29
Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to summarize the data for one variable at a time.
Often a manager or decision maker is interested in the relationship between two variables.
Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
Slide
30
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
Slide
31
Covariance
The covariance is computed as follows:
$$ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} $$
$$ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} $$
Slide
32
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Slide
33
The correlation coefficient is computed as follows:
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} $$
$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} $$
Slide
34
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
The closer the correlation is to zero, the weaker the relationship.
Slide
35
Covariance and Correlation Coefficient
• Example: Golfing Study
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69
Slide
36
Example: Golfing Study

            x        y      $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
            277.6    69      10.65             -1.0              -10.65
            259.5    71      -7.45              1.0               -7.45
            269.1    70       2.15              0                  0
            267.0    70       0.05              0                  0
            255.6    71     -11.35              1.0              -11.35
            272.9    69       5.95             -1.0               -5.95
Average     267.0    70.0                       Total            -35.40
Std. Dev.   8.2192   .8944
Slide
37

• Sample Covariance
$$ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 $$
• Sample Correlation Coefficient
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631 $$
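The golfing calculation can be reproduced from the six data pairs. The sketch below implements the sample formulas directly rather than relying on a library; the function names `sample_covariance` and `sample_correlation` are assumptions made for illustration.

```python
from math import sqrt

# Golfing study: average driving distance (yds.) and average 18-hole score.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]


def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


print(round(sample_covariance(distance, score), 2))   # -7.08
print(round(sample_correlation(distance, score), 4))  # -0.9631
```

The negative sign indicates that, in this sample, longer average drives go with lower (better) scores.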
Slide
38
Data Dashboards: Adding Numerical Measures to Improve Effectiveness
• Data dashboards are not limited to graphical displays.
• The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.
• Dashboards are often interactive.
• Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.
Slide
39
End of Chapter 3, Part B
Slide
40
