Measures of Dispersion


The Range
In the preceding section we introduced three types of average values for a data
set: the mean, the median, and the mode. Some characteristics of a set of data may not be
evident from an examination of averages. For instance, consider a soft-drink dispensing
machine that should dispense 8 oz of your selection into a cup. Table 4.5 shows data for
two of these machines.

Table 4.5 Soda Dispensed (ounces)

Machine 1    Machine 2
9.52         8.01
6.41         7.99
10.07        7.95
5.85         8.03
8.15         8.02
x̄ = 8.0      x̄ = 8.0

The mean data value for each machine is 8 oz. However, look at the variation
in data values for machine 1. The quantity of soda dispensed is very inconsistent: in
some cases soda overflows the cup, and in other cases too little soda is dispensed. The
machine obviously needs adjustment. Machine 2, on the other hand, is working just
fine. The quantity dispensed is very consistent, with little variation.

This example shows that average values do not reflect the spread or
dispersion of data. To measure the spread or dispersion of data, we must introduce
statistical values known as the range and the standard deviation.

Range

The range of a set of data values is the difference between the greatest data value
and the least data value.
Example: Find a Range
Find the range of the number of ounces dispensed by machine 1 in Table 4.5.

Solution:

The greatest number of ounces dispensed is 10.07 and the least is 5.85. The
range of the number of ounces is 10.07 - 5.85 = 4.22 oz.
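
The same calculation can be sketched in a few lines of Python. This is only an illustration: the helper name data_range is made up for this sketch, and the lists simply copy the values from Table 4.5.

```python
# Values copied from Table 4.5 (ounces dispensed).
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]


def data_range(values):
    """Return the greatest data value minus the least data value."""
    return max(values) - min(values)


# Rounded to hundredths to hide floating-point noise.
print(round(data_range(machine_1), 2))  # 4.22
print(round(data_range(machine_2), 2))  # 0.08
```

The much smaller range for machine 2 matches the observation that it dispenses soda more consistently.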
The Standard Deviation
The range of a set of data is easy to compute, but it can be deceiving. The range
is a measure that depends only on the two most extreme values, and as such it is very
sensitive. A measure of dispersion that is less sensitive to extreme values is the standard
deviation. The standard deviation of a set of numerical data makes use of the amount by
which each individual data value deviates from the mean. These deviations, represented by
( x - x̄ ), are positive when the data value x is greater than the mean x̄ and negative
when x is less than the mean x̄. The sum of all the deviations ( x - x̄ ) is 0 for all sets of
data. This is shown in Table 4.6 for the machine 2 data of Table 4.5.
Table 4.6 Machine 2: Deviations from the Mean

x       x - x̄
8.01    8.01 - 8 = 0.01
7.99    7.99 - 8 = -0.01
7.95    7.95 - 8 = -0.05
8.03    8.03 - 8 = 0.03
8.02    8.02 - 8 = 0.02

Sum of deviations = 0
Because the sum of all the deviations of the data values from the mean is always
0, we cannot use the sum of the deviations as a measure of dispersion for a set of data.
Instead, the standard deviation uses the sum of the squares of the deviations.
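
The claim that the deviations always sum to 0 follows directly from the definition of the mean. A short derivation, written here for n values with mean x̄ (the same argument applies to a population with mean μ):

$$\sum (x - \bar{x}) = \sum x - n\bar{x} = n\bar{x} - n\bar{x} = 0, \qquad \text{since } \bar{x} = \frac{\sum x}{n}$$

Squaring each deviation before summing removes the signs, so positive and negative deviations can no longer cancel.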

Standard Deviations for Populations and Samples

If x1, x2, x3, … , xn is a population of n numbers with a mean of μ, then the
standard deviation of the population is:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

If x1, x2, x3, … , xn is a sample of n numbers with a mean of x̄, then the standard
deviation of the sample is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

You may question why a denominator of n - 1 is used instead of n when we
compute a sample standard deviation. The reason is that a sample standard deviation is
often used to estimate the population standard deviation, and it can be shown
mathematically that the use of n - 1 tends to yield a better estimate.
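
One way to see the effect of n - 1 is to enumerate every possible sample from a very small population. The sketch below is an illustration under stated assumptions, not a proof: the population [2, 4, 6] is hypothetical, samples have size 2 and are drawn with replacement, and the comparison is made on the squared quantity (the sum of squared deviations divided by n - 1 or by n), where the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only for illustration (not from the text).
population = [Fraction(2), Fraction(4), Fraction(6)]


def mean(values):
    return sum(values) / len(values)


def sum_of_squared_deviations(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


# Population value: divide the sum of squared deviations by n.
population_value = sum_of_squared_deviations(population) / len(population)

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average the sample estimate over all samples, dividing by n - 1 and by n.
avg_with_n_minus_1 = mean(
    [sum_of_squared_deviations(s) / (len(s) - 1) for s in samples]
)
avg_with_n = mean([sum_of_squared_deviations(s) / len(s) for s in samples])

print(population_value)    # 8/3
print(avg_with_n_minus_1)  # 8/3
print(avg_with_n)          # 4/3
```

In this small case, dividing by n - 1 gives estimates that average out to the population value, while dividing by n gives estimates that are too small on average.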
Most statistical applications involve a sample rather than a population, which is
the complete set of data values. Sample standard deviations are designated by the lowercase
letter s. In those cases in which we do work with a population, we designate the standard
deviation of the population by σ, which is the lowercase Greek letter sigma. We can use the
following procedure to calculate the standard deviation of n numbers.

Procedure for Computing a Standard Deviation

1. Determine the mean of the n numbers.


2. For each number, calculate the deviation (difference) between the number and the
mean of the numbers.
3. Calculate the square of each deviation and find the sum of these squared deviations.
4. If the data is a population, then divide the sum by n. If the data is a sample, then
divide the sum by n-1.
5. Find the square root of the quotient in step 4.
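
The five steps above can be sketched as a small Python function. The function name standard_deviation and the is_sample flag are names invented for this sketch, and treating the Table 4.5 readings as samples is an assumption made only for the usage example.

```python
import math


def standard_deviation(values, is_sample=True):
    """Follow the five-step procedure; is_sample selects n - 1 or n."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: the mean
    deviations = [x - mean for x in values]       # Step 2: deviations
    sum_squares = sum(d ** 2 for d in deviations)  # Step 3: sum of squares
    divisor = n - 1 if is_sample else n           # Step 4: choose divisor
    quotient = sum_squares / divisor
    return math.sqrt(quotient)                    # Step 5: square root


# Values copied from Table 4.5, treated here as samples (an assumption).
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

s1 = standard_deviation(machine_1)
s2 = standard_deviation(machine_2)
print(round(s1, 2), round(s2, 2))  # 1.86 0.03
```

Both machines have a mean of 8 oz, but the standard deviation for machine 1 is far larger, which quantifies the inconsistency seen in the table.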
Example: Find the Standard Deviation
The following numbers are obtained by sampling a population.

2, 4, 7, 12, 15

Find the standard deviation of the sample.

Solution:

Step 1: The mean of the numbers is

$$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8$$
Step 2: For each number, calculate the deviation between the number and the mean.

x x - x̄

2 2 - 8 = -6

4 4 - 8 = -4

7 7 - 8 = -1

12 12 - 8 = 4

15 15 - 8 = 7
Step 3: Calculate the square of each deviation in step 2, and find the sum of these squared
deviations.

x     x - x̄          ( x - x̄ )²
2     2 - 8 = -6      ( -6 )² = 36
4     4 - 8 = -4      ( -4 )² = 16
7     7 - 8 = -1      ( -1 )² = 1
12    12 - 8 = 4      4² = 16
15    15 - 8 = 7      7² = 49

Sum of the squared deviations = 118


Step 4: Because we have a sample of n = 5 values, divide the sum 118 by n - 1, which is 4.

$$\frac{118}{4} = 29.5$$

Step 5: The standard deviation of the sample is s = √29.5. To the nearest hundredth, the
standard deviation is s ≈ 5.43.
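
As a cross-check, the worked example can be reproduced step by step in Python. This is a sketch for verification only. The variable names are arbitrary, and statistics.stdev from the Python standard library (which divides by n - 1) is used as an independent check.

```python
import math
import statistics

sample = [2, 4, 7, 12, 15]

mean = sum(sample) / len(sample)             # Step 1
deviations = [x - mean for x in sample]      # Step 2
squared = [d ** 2 for d in deviations]       # Step 3
sum_squared = sum(squared)
quotient = sum_squared / (len(sample) - 1)   # Step 4: sample, so n - 1
s = math.sqrt(quotient)                      # Step 5

print(mean)         # 8.0
print(deviations)   # [-6.0, -4.0, -1.0, 4.0, 7.0]
print(sum_squared)  # 118.0
print(quotient)     # 29.5
print(round(s, 2))  # 5.43

# The library routine for a sample standard deviation should agree.
print(math.isclose(s, statistics.stdev(sample)))  # True
```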
In the next example we use standard deviation to determine which company
produces batteries that are most consistent with regard to their life expectancy.

Example: Use Standard Deviations


A consumer group has tested a sample of 8 size-D batteries from each of 3
companies. The results of the tests are shown in the following table. According to the tests,
which company produces batteries for which the values representing hours of constant
use have the smallest standard deviation?

Company         Hours of constant use per battery
EverSoBright    6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3
Dependable      6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2
Beacon          6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5


Solution:

The mean for each sample of batteries is 7 h.

The batteries from EverSoBright have a standard deviation of:

$$s_1 = \sqrt{\frac{(6.2 - 7)^2 + (6.4 - 7)^2 + \cdots + (9.3 - 7)^2}{7}} = \sqrt{\frac{12.34}{7}} \approx 1.328 \text{ h}$$

The batteries from Dependable have a standard deviation of:

$$s_2 = \sqrt{\frac{(6.8 - 7)^2 + (6.2 - 7)^2 + \cdots + (8.2 - 7)^2}{7}} = \sqrt{\frac{3.62}{7}} \approx 0.719 \text{ h}$$

The batteries from Beacon have a standard deviation of:


$$s_3 = \sqrt{\frac{(6.1 - 7)^2 + (6.6 - 7)^2 + \cdots + (8.5 - 7)^2}{7}} = \sqrt{\frac{5.38}{7}} \approx 0.877 \text{ h}$$

The batteries from Dependable have the smallest standard deviation. According
to these results, the Dependable company produces the most consistent batteries with
regard to life expectancy under constant use.
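
The comparison can also be run in Python. This is a minimal sketch that copies the data from the table above. The dictionary and variable names are arbitrary, and statistics.stdev from the standard library is used because it divides by n - 1, as the sample formula requires.

```python
import statistics

# Hours of constant use per battery, copied from the table in this example.
hours = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable": [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon": [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}

# Sample standard deviation (divisor n - 1) for each company.
sample_sd = {company: statistics.stdev(data) for company, data in hours.items()}

for company, sd in sample_sd.items():
    print(f"{company}: {sd:.3f} h")

# The company with the smallest standard deviation is the most consistent.
most_consistent = min(sample_sd, key=sample_sd.get)
print(most_consistent)  # Dependable
```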
The Variance
A statistic known as the variance is also used as a measure of dispersion. The
variance for a given set of data is the square of the standard deviation of the data. The
term variance refers to a statistical measurement of the spread between numbers in a data
set. More specifically, variance measures how far each number in the set is from the mean,
and thus from every other number in the set.

Notations for Standard Deviation and Variance:

σ is the standard deviation of the population

σ² is the variance of a population

s is the standard deviation of a sample

s² is the variance of a sample


Example: Find the Variance
Find the variance for the sample given in example 2.

Solution:

In Example 2, we found s = √29.5. The variance is the square of the standard
deviation. Thus the variance is s² = ( √29.5 )² = 29.5.

Although the variance of a set of data is an important measure of dispersion, it
has a disadvantage that is not shared by the standard deviation: the variance does not have
the same unit of measure as the original data. For instance, if a set of data consists of times
measured in hours, then the variance of the data will be measured in square hours. The
standard deviation of this data set is the square root of the variance, and as such it is
measured in hours, which is a more intuitive unit of measure.
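
The relationship between the two measures can be checked with a short Python sketch. It reuses the sample 2, 4, 7, 12, 15 from the standard deviation example and relies on statistics.variance from the standard library, which computes a sample variance (divisor n - 1). The variable names are arbitrary.

```python
import math
import statistics

sample = [2, 4, 7, 12, 15]  # sample from the standard deviation example

variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = math.sqrt(variance)           # back in the original unit of measure

print(variance)           # 29.5
print(round(std_dev, 2))  # 5.43
```

If the data were times in hours, the first value would be in square hours and the second in hours.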
A Geometric View of Variance and Standard Deviation

The following geometric explanation of the variance and standard deviation of a
set of data is designed to provide you with a deeper understanding of these important
concepts.

Consider the data x1, x2, …, xn, which are arranged in ascending order. The
average, or mean, of this data is

$$\mu = \frac{\sum x_i}{n}$$

and the variance is:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$$

In the last formula, each term ( xi - μ )² can be pictured as the area of a square
whose sides are of length | xi - μ |, the distance between the ith data value and the mean.
We will refer to these squares as tiles, denoting by Ti the area of the tile associated with
the data value xi. Thus

$$\sigma^2 = \frac{\sum T_i}{n}$$

which means that the variance may be thought of as the area of the average-sized
tile and the standard deviation as the length of a side of this average-sized tile. By
drawing the tiles associated with the data set, you can visually estimate an average-sized
tile, and thus you can roughly approximate the variance and standard deviation.
Figure: A typical data set, with its associated tiles and average-sized tile.
These geometric representations of variance and standard deviation enable us to
visualize how these values are used as measures of the dispersion of a set of data. If all of
the data are bunched up near the mean, it is clear that the average-sized tile will be small
and, consequently, so will its side length, which represents the standard deviation. But if
even a small portion of the data lies far from the mean, the average-sized tile may be
rather large, and thus its side length will also be large.
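
The tile picture translates directly into a short computation. The sketch below is illustrative only: it treats the numbers 2, 4, 7, 12, 15 as a complete population (an assumption made for this example), and the names tiles, average_tile, and side are arbitrary.

```python
import math

# Treated here as a whole population, so the divisor is n (an assumption).
data = [2, 4, 7, 12, 15]

mu = sum(data) / len(data)
tiles = [(x - mu) ** 2 for x in data]   # T_i: area of the tile for each x_i
average_tile = sum(tiles) / len(tiles)  # variance: area of the average tile
side = math.sqrt(average_tile)          # standard deviation: side of that tile

print(tiles)           # [36.0, 16.0, 1.0, 16.0, 49.0]
print(average_tile)    # 23.6
print(round(side, 2))  # 4.86
```

The two values farthest from the mean (2 and 15) contribute 85 of the 118 square units of total tile area, which shows how a few distant values enlarge the average-sized tile.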
