SBE12 CH 03 B


Slides by John Loucks, St. Edward’s University

© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
▪ Measures of Distribution Shape, Relative Location, and Detecting Outliers
▪ Five-Number Summaries and Box Plots
▪ Measures of Association Between Two Variables
▪ Data Dashboards: Adding Numerical Measures to Improve Effectiveness

Measures of Distribution Shape, Relative Location, and Detecting Outliers
▪ Distribution Shape
▪ z-Scores
▪ Chebyshev’s Theorem
▪ Empirical Rule
▪ Detecting Outliers

Distribution Shape: Skewness
▪ An important measure of the shape of a distribution is called skewness.
▪ The formula for the skewness of sample data is

\[ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 \]

▪ Skewness can be easily computed using statistical software.

Distribution Shape: Skewness
▪ Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]

Distribution Shape: Skewness
▪ Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram, Skewness = -.31]

Distribution Shape: Skewness
▪ Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = .31]

Distribution Shape: Skewness
▪ Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = 1.25]

Distribution Shape: Skewness
▪ Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
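
As a rough illustration, the sample skewness formula from the earlier slide can be applied to these 70 rents. The function name `sample_skewness` is only an illustrative choice; the data are copied from the listing above, and the result can be compared with the value reported on the next slide.

```python
import math

# Monthly rents for the 70 sampled apartments, copied from the slide.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Skewness = n / ((n - 1)(n - 2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


skewness = sample_skewness(rents)
print(f"Skewness of rents: {skewness:.2f}")
```

A positive result indicates a distribution skewed to the right, which matches the long upper tail in the rent data.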

Distribution Shape: Skewness
▪ Example: Apartment Rents
[Figure: relative frequency histogram of the apartment rents, Skewness = .92]

z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

\[ z_i = \frac{x_i - \bar{x}}{s} \]

Excel’s STANDARDIZE function can be used to compute the z-score.

z-Scores

▪ An observation’s z-score is a measure of the relative location of the observation in a data set.
▪ A data value less than the sample mean will have a z-score less than zero.
▪ A data value greater than the sample mean will have a z-score greater than zero.
▪ A data value equal to the sample mean will have a z-score of zero.

z-Scores

▪ Example: Apartment Rents

z-Score of Smallest Value (425)

\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
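
The standardized values in the table can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module; the variable names are illustrative, and `statistics.stdev` is used because it computes the sample standard deviation (divisor n - 1).

```python
import statistics

# Monthly rents for the 70 sampled apartments, copied from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)   # sample mean
s = statistics.stdev(rents)      # sample standard deviation

# z_i = (x_i - x_bar) / s for every observation
z_scores = [(x - x_bar) / s for x in rents]

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"smallest z = {min(z_scores):.2f}, largest z = {max(z_scores):.2f}")
```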

Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.

Chebyshev’s theorem requires $z > 1$, but $z$ need not be an integer.

Chebyshev’s Theorem

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.
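
The three percentages above follow directly from the bound $1 - 1/z^2$. A small sketch, with the hypothetical helper name `chebyshev_lower_bound`, shows the arithmetic:

```python
def chebyshev_lower_bound(z):
    """Minimum share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    # Print the bound as a percentage with one decimal place.
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%}")
```

The slide values of 89% and 94% are these bounds rounded to the nearest whole percent.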

Chebyshev’s Theorem

▪ Example: Apartment Rents

Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$.

(Actually, 86% of the rent values are between 409 and 573.)
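
The comparison between the guaranteed share and the actual share can be checked against the data. The sketch below takes the mean and standard deviation as given on the slide and counts the rents inside the interval; variable names are illustrative.

```python
# Monthly rents for the 70 sampled apartments, copied from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar, s, z = 490.80, 54.74, 1.5   # values stated on the slide

lower = x_bar - z * s              # about 409
upper = x_bar + z * s              # about 573
guaranteed = 1 - 1 / z ** 2        # Chebyshev lower bound

# Count the rents that actually fall inside the interval.
inside = sum(lower <= x <= upper for x in rents)
actual = inside / len(rents)

print(f"interval: {lower:.0f} to {upper:.0f}")
print(f"guaranteed: {guaranteed:.0%}, actual: {actual:.0%}")
```

Chebyshev's bound is conservative because it must hold for any data set, so the actual share is usually well above it.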

Empirical Rule

When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.

The empirical rule is based on the normal distribution, which is covered in Chapter 6.

Empirical Rule

For data having a bell-shaped distribution:

Approximately 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

Approximately 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

Approximately 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
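
These percentages can be checked numerically. For a standard normal variable, the share of values within k standard deviations of the mean equals erf(k / sqrt(2)), which Python exposes as `math.erf`. The helper name `share_within` is an illustrative choice. A direct calculation gives 68.27%, 95.45%, and 99.73%; the slide's figures differ in the second decimal, presumably because they were taken from a rounded normal table.

```python
import math


def share_within(k):
    """P(|Z| < k) for a standard normal random variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {share_within(k):.2%}")
```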

Empirical Rule

[Figure: bell-shaped curve over x, centered at μ, showing 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]

Detecting Outliers

▪ An outlier is an unusually small or unusually large value in a data set.
▪ A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
▪ It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set

Detecting Outliers

▪ Example: Apartment Rents

• The most extreme z-scores are -1.20 and 2.27.
• Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
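
The |z| > 3 screen can be written as a short filter. In this sketch the function name `z_score_outliers` and its `threshold` parameter are illustrative assumptions; the rule itself is the one stated on the slide.

```python
import statistics

# Monthly rents for the 70 sampled apartments, copied from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / s) > threshold]


outliers = z_score_outliers(rents)
print("outliers:", outliers)
```

A flagged value is only a candidate for review; as noted earlier, it may still be a correctly recorded value that belongs in the data set.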

Five-Number Summaries and Box Plots

Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries and box plots.

Five-Number Summary

1 Smallest Value

2 First Quartile

3 Median

4 Third Quartile

5 Largest Value

Five-Number Summary

▪ Example: Apartment Rents

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
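
The five values above can be reproduced in code, but the quartiles depend on which percentile rule is used. The sketch below assumes the index rule i = (p/100)n, rounding up when i is not an integer and averaging positions i and i + 1 when it is. That rule is an assumption here, since these slides do not state it, and library defaults such as interpolation-based percentiles can return slightly different quartiles. The function name `percentile` is illustrative.

```python
import math

# Monthly rents for the 70 sampled apartments, copied from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) using the assumed index rule."""
    n = len(sorted_data)
    i = p / 100 * n
    if i != int(i):
        # Not an integer: round up and take the value at that 1-based position.
        return sorted_data[math.ceil(i) - 1]
    # Integer: average the values at 1-based positions i and i + 1.
    i = int(i)
    return (sorted_data[i - 1] + sorted_data[i]) / 2


data = sorted(rents)
five_number_summary = (
    data[0],
    percentile(data, 25),
    percentile(data, 50),
    percentile(data, 75),
    data[-1],
)
print(five_number_summary)
```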

Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Box Plot

▪ Example: Apartment Rents

• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Box Plot

▪ Limits are located (not drawn) using the interquartile range (IQR).
▪ Data outside these limits are considered outliers.
▪ The location of each outlier is shown with the symbol *.

Box Plot

▪ Example: Apartment Rents

• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
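
The limit calculation can be sketched in a few lines. The quartiles are taken as given on the slide (Q1 = 445, Q3 = 525); the variable names are illustrative.

```python
# Monthly rents for the 70 sampled apartments, copied from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q3 = 445, 525                   # quartiles stated on the slide
iqr = q3 - q1                       # interquartile range

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are treated as outliers.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

print(f"IQR = {iqr}, limits = ({lower_limit}, {upper_limit})")
print("outliers:", outliers)
```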

Box Plot

▪ Example: Apartment Rents

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot on an axis from 400 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]
Measures of Association Between Two Variables

Thus far we have examined numerical methods used to summarize the data for one variable at a time.

Often a manager or decision maker is interested in the relationship between two variables.

Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.

Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Covariance

The covariance is computed as follows:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} \]

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} \]
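
The only computational difference between the two formulas is the divisor. The sketch below uses a single hypothetical helper, `covariance`, with a `sample` flag, and a small made-up data set chosen purely so the arithmetic is easy to follow; none of these names or numbers come from the slides.

```python
def covariance(x, y, sample=True):
    """Covariance of paired data; divisor is n - 1 for samples, N for populations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cross_products = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return cross_products / (n - 1 if sample else n)


# Hypothetical illustrative data (not from the slides).
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

print("sample covariance:    ", covariance(x, y, sample=True))
print("population covariance:", covariance(x, y, sample=False))
```

Both results are positive here because y increases with x, which is the sign convention described on the previous slide.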

Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Correlation Coefficient

The correlation coefficient is computed as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \]

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} \]

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

The closer the correlation is to zero, the weaker the relationship.

Covariance and Correlation Coefficient

▪ Example: Golfing Study

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69

Covariance and Correlation Coefficient

▪ Example: Golfing Study

            $x$       $y$     $(x_i - \bar{x})$   $(y_i - \bar{y})$   $(x_i - \bar{x})(y_i - \bar{y})$
            277.6     69       10.65              -1.0                -10.65
            259.5     71       -7.45               1.0                 -7.45
            269.1     70        2.15               0                    0
            267.0     70        0.05               0                    0
            255.6     71      -11.35               1.0                -11.35
            272.9     69        5.95              -1.0                 -5.95
Average     267.0     70.0                                   Total    -35.40
Std. Dev.   8.2192    .8944

Covariance and Correlation Coefficient

▪ Example: Golfing Study

• Sample Covariance

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 \]

• Sample Correlation Coefficient

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631 \]
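
The golf calculation can be verified from the raw data. This is a minimal sketch with illustrative variable names; it applies the sample formulas directly rather than relying on a library routine.

```python
import math

# Golfing study data from the slides.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                       # average 18-hole score

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(score) / n

# Sample covariance: sum of cross-products divided by n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in distance) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in score) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r_xy = {r_xy:.4f}")
```

The strongly negative correlation says that, in this sample, longer average drives go with lower scores; it does not by itself show that one causes the other.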

Data Dashboards: Adding Numerical Measures to Improve Effectiveness
▪ Data dashboards are not limited to graphical displays.
▪ The addition of numerical measures, such as the mean and standard deviation of key performance indicators (KPIs), to a data dashboard is often critical.
▪ Dashboards are often interactive.
▪ Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.

End of Chapter 3, Part B
