Measurement of Variability
Look at the two data sets in Table 2.1 "Two Data Sets" and the graphical representation of
each, called a dot plot, in Figure 2.10 "Dot Plots of Data Sets".
The Range
The first measure of variability that we discuss is the simplest.
Definition
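The range of a data set is the number $R$ defined by the formula
$$R = x_{\max} - x_{\min}$$
where $x_{\max}$ is the largest measurement in the data set and $x_{\min}$ is the smallest.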
Example 10
Find the range of each data set in Table 2.1 "Two Data Sets".
Solution:
For Data Set I the maximum is 43 and the minimum is 38, so the range is R=43−38=5.
For Data Set II the maximum is 47 and the minimum is 33, so the range is R=47−33=14.
The range is a measure of variability because it indicates the size of the interval over which
the data points are distributed. A smaller range indicates less variability (less dispersion)
among the data, whereas a larger range indicates the opposite.
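The range depends only on the largest and smallest observations, so it takes one line to compute. Below is a minimal Python sketch, applied to the ten values of Data Set II as they are listed in Example 11. The helper name `data_range` is our own, chosen to avoid shadowing Python's built-in `range`. It is not defined in the text.

```python
def data_range(values):
    """Return R = x_max - x_min for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Data Set II, as listed in the table of Example 11
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]
print(data_range(data_set_2))  # 14, matching Example 10
```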
Definition
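The sample variance of a set of $n$ sample data is the number $s^2$ defined by the formula
$$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}$$
which by algebra is equivalent to the formula
$$s^2 = \frac{\Sigma x^2 - \frac{1}{n}(\Sigma x)^2}{n - 1}$$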
The sample standard deviation of a set of $n$ sample data is the square root of the sample variance, hence is the number $s$ given by the formulas
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\Sigma x^2 - \frac{1}{n}(\Sigma x)^2}{n - 1}}$$
Although the first formula in each case looks less complicated than the second, the second is easier to use in hand computations because it does not require computing the deviation of every observation from the mean. It is called a shortcut formula.
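The two formulas for $s^2$ are algebraically equivalent, and this is easy to confirm numerically. Below is a minimal Python sketch. It assumes the data arrive as a plain list of numbers. The function names `variance_defining` and `variance_shortcut` are hypothetical, and the test data are the ten values of Data Set II from Example 11.

```python
import math


def variance_defining(xs):
    """Sample variance via the defining formula: sum of (x - mean)^2, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Sample variance via the shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]
v1 = variance_defining(data_set_2)  # should equal 224/9
v2 = variance_shortcut(data_set_2)  # should also equal 224/9
s = math.sqrt(v1)                   # sample standard deviation: square root of the sample variance
print(v1, v2, s)
```

On a computer the advantage of the shortcut formula can reverse. When the values are large and close together, it subtracts two large, nearly equal numbers and can lose accuracy in floating-point arithmetic. For that reason the defining formula, or a library routine such as Python's `statistics.variance`, is usually the safer choice in software.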
Example 11
Find the sample variance and the sample standard deviation of Data Set II in Table 2.1 "Two
Data Sets".
Solution:
To use the defining formula (the first formula) in the definition, we first compute for each observation $x$ its deviation $x - \bar{x}$ from the sample mean. Since the mean of the data is $\bar{x} = 40$, we obtain the ten numbers displayed in the second line of the following table.

x        46  37  40  33  42  36  40  47  34  45
x − x̄     6  −3   0  −7   2  −4   0   7  −6   5
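Then
$$\Sigma (x - \bar{x})^2 = 6^2 + (-3)^2 + 0^2 + (-7)^2 + 2^2 + (-4)^2 + 0^2 + 7^2 + (-6)^2 + 5^2 = 224$$
so
$$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1} = \frac{224}{9} = 24.\overline{8}$$
and
$$s = \sqrt{24.\overline{8}} \approx 4.99$$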
The student is encouraged to compute the ten deviations for Data Set I and verify that their squares add up to 20, so that the sample variance and standard deviation of Data Set I are the much smaller numbers $s^2 = 20/9 = 2.\overline{2}$ and $s = \sqrt{20/9} \approx 1.49$.
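The steps of Example 11 can be traced in a few lines of Python: the mean, the deviations, the sum of squared deviations, division by $n - 1$, and the square root. This is an illustrative sketch only, and the variable names are our own.

```python
import math

data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]
n = len(data_set_2)
mean = sum(data_set_2) / n                    # 40.0

# Second line of the table: deviation of each observation from the mean
deviations = [x - mean for x in data_set_2]   # [6.0, -3.0, 0.0, -7.0, 2.0, -4.0, 0.0, 7.0, -6.0, 5.0]
# Deviations from the mean always sum to zero (here exactly, since all values are whole numbers)

ss = sum(d ** 2 for d in deviations)          # 224.0, the sum of squared deviations
s2 = ss / (n - 1)                             # 224/9 = 24.888...
s = math.sqrt(s2)                             # about 4.99

print(f"mean = {mean}, sum of squares = {ss}, s^2 = {s2:.4f}, s = {s:.2f}")
# prints: mean = 40.0, sum of squares = 224.0, s^2 = 24.8889, s = 4.99
```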
Example 12
Find the sample variance and the sample standard deviation of the ten GPAs in Note 2.12
"Example 3" in Section 2.2 "Measures of Central Location".
1.90 3.00 2.53 3.71 2.12 1.76 2.71 1.39 4.00 3.33
Solution:
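Since
$$\Sigma x = 1.90 + 3.00 + 2.53 + 3.71 + 2.12 + 1.76 + 2.71 + 1.39 + 4.00 + 3.33 = 26.45$$
and
$$\Sigma x^2 = 1.90^2 + 3.00^2 + 2.53^2 + 3.71^2 + 2.12^2 + 1.76^2 + 2.71^2 + 1.39^2 + 4.00^2 + 3.33^2 = 76.7321$$
the shortcut formula gives
$$s^2 = \frac{\Sigma x^2 - \frac{1}{n}(\Sigma x)^2}{n - 1} = \frac{76.7321 - \frac{1}{10}(26.45)^2}{10 - 1} = \frac{6.77185}{9} = 0.75242\overline{7}$$
and
$$s = \sqrt{0.75242\overline{7}} \approx 0.867$$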
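Python's standard `statistics` module provides `variance` and `stdev`. Both use the sample ($n - 1$) denominator, so they can check the hand computation in Example 12. The minimal sketch below first applies the shortcut formula directly, as in the solution, and then compares the result with the library.

```python
import math
import statistics

gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]
n = len(gpas)

sum_x = sum(gpas)                   # about 26.45
sum_x2 = sum(x * x for x in gpas)   # about 76.7321
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # shortcut formula
s = math.sqrt(s2)

print(round(s2, 6), round(s, 3))    # 0.752428 0.867

# Cross-check against the standard library's sample (n - 1) versions
print(round(statistics.variance(gpas), 6), round(statistics.stdev(gpas), 3))  # 0.752428 0.867
```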
The sample variance has different units from the data. For example, if the units in the data set
were inches, the new units would be inches squared, or square inches. It is thus primarily of
theoretical importance and will not be considered further in this text, except in passing.
If the data set comprises the whole population, then the population standard deviation, denoted $\sigma$ (the lower case Greek letter sigma), and its square, the population variance $\sigma^2$, are defined as follows.
Definition
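The population variance and population standard deviation of a set of $N$ population data are the numbers $\sigma^2$ and $\sigma$ defined by the formulas
$$\sigma^2 = \frac{\Sigma (x - \mu)^2}{N} \qquad \sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}}$$
where $\mu$ is the population mean.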
Note that the denominator in the fraction is the full number of observations, not that number
reduced by one, as is the case with the sample standard deviation. Since most data sets are
samples, we will always work with the sample standard deviation and variance.
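The only difference between the sample and population versions is the denominator. Python's `statistics` module keeps the two apart: `pvariance` and `pstdev` divide by $N$, while `variance` and `stdev` divide by $n - 1$. The small sketch below treats the ten values of Data Set II from Example 11 first as a whole population and then as a sample.

```python
import statistics

data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

# Regarded as a whole population: divide the sum of squared deviations by N
sigma2 = statistics.pvariance(data_set_2)   # 224/10 = 22.4
sigma = statistics.pstdev(data_set_2)       # about 4.73

# Regarded as a sample: divide by n - 1
s2 = statistics.variance(data_set_2)        # 224/9, about 24.89
s = statistics.stdev(data_set_2)            # about 4.99

print(sigma2, sigma)
print(s2, s)
```

Dividing by $n - 1$ instead of $n$ enlarges the result, so the sample versions come out slightly larger for the same ten numbers.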
Finally, in many real-life situations the most important statistical issues have to do with comparing the means and standard deviations of two data sets. Figure 2.11 "Difference between Two Data Sets" illustrates how a difference in one or both of the sample mean and the sample standard deviation is reflected in the appearance of the data set, as shown by the curves derived from the relative frequency histograms built using the data.
The range, the standard deviation, and the variance each give a quantitative answer to the
question “How variable are the data?”
Exercises
Basic
1. Find the range, the variance, and the standard deviation for the following sample.
1 2 3 4
2. Find the range, the variance, and the standard deviation for the following sample.
2 −3 6 0 3 1
3. Find the range, the variance, and the standard deviation for the following sample.
2 1 2 7
4. Find the range, the variance, and the standard deviation for the following sample.
−1 0 1 4 1 1
5. Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
x   1   2   7
f   1   2   1
6. Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
x   −1   0   1   4
f    1   1   3   1
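In Exercises 5 and 6 each value $x$ in the table occurs $f$ times, so $n = \Sigma f$, $\Sigma x$ becomes $\Sigma fx$, and $\Sigma x^2$ becomes $\Sigma fx^2$. You can check answers with a sketch like the one below, which applies the shortcut formula to a table stored as a Python dictionary mapping each value to its frequency. The helper name and the dictionary representation are our own assumptions. Here the sketch runs on the table of Exercise 5.

```python
import math


def stats_from_frequency_table(table):
    """Return (range, sample variance, sample standard deviation) for {value: frequency} data.

    Each value x counts f times: n = sum of f, sum x = sum of f*x,
    and sum x^2 = sum of f*x^2, which feed the shortcut formula.
    """
    n = sum(table.values())
    if n < 2:
        raise ValueError("need at least two observations")
    present = [x for x, f in table.items() if f > 0]  # values that actually occur
    sum_x = sum(f * x for x, f in table.items())
    sum_x2 = sum(f * x * x for x, f in table.items())
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return max(present) - min(present), s2, math.sqrt(s2)


r, s2, s = stats_from_frequency_table({1: 1, 2: 2, 7: 1})  # table of Exercise 5
print(r, round(s2, 2), round(s, 2))  # 6 7.33 2.71
```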
Applications
7. Find the range, the variance, and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
132 139 162 147 133 160 145 150 148 153
8. Find the range, the variance, and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
142 139 152 147 138 155 145 150 148 153
Additional Exercises
9.
x   26  27  28  29  30  31  32
f    3   4  16  12   6   2   1
1. … and Σx² = 35,926.
2. Use the information in part (a) to compute the sample mean and the sample standard deviation.
x   1    2    3   4   5
f   384  208  98  56  28

x   6   7  8  9  10
f   12  8  2  3  1
A random sample of 49 invoices for repairs at an automotive body shop is taken. The
data are arrayed in the stem and leaf diagram shown. (Stems are thousands of dollars, leaves
are hundreds, so that for example the largest observation is 3,800.)
332211005050505460605068160518170638271644827482749479889
, Σx² = 244,830,000.
Create a sample data set of size n = 3 for which the range is 0 and the sample mean is 2.
Create a sample data set of size n = 3 for which the sample variance is 0 and the sample
mean is 1.
… has mean x̄ = 0 and standard deviation s = 1. Create a sample data set of size n = 3 for which x̄ = 0
5−2614−3014325
18.
1. Compute the sample standard deviation of Data Set I.
2. Form a new data set, Data Set II, by adding 3 to each number in Data Set I.
Calculate the sample standard deviation of Data Set II.
3. Form a new data set, Data Set III, by subtracting 6 from each number in Data
Set I. Calculate the sample standard deviation of Data Set III.
4. Comparing the answers to parts (a), (b), and (c), can you guess the pattern?
State the general principle that you expect to be true.
19. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
1. Compute the range and sample standard deviation of the 1,000 SAT scores.
2. Compute the range and sample standard deviation of the 1,000 GPAs.
20. Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
1. Regard the data as arising from a census of all students at a high school, in
which the SAT score of every student was measured. Compute the population
range and population standard deviation σ.
2. Regard the first 25 observations as a random sample drawn from this
population. Compute the sample range and sample standard deviation s and
compare them to the population range and σ.
3. Regard the next 25 observations as a random sample drawn from this
population. Compute the sample range and sample standard deviation s and
compare them to the population range and σ.
21. Large Data Set 1 lists the GPAs of 1,000 students.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
1. Regard the data as arising from a census of all freshmen at a small college at
the end of their first academic year of college study, in which the GPA of
every such person was measured. Compute the population range and
population standard deviation σ.
2. Regard the first 25 observations as a random sample drawn from this
population. Compute the sample range and sample standard deviation s and
compare them to the population range and σ.
3. Regard the next 25 observations as a random sample drawn from this
population. Compute the sample range and sample standard deviation s and
compare them to the population range and σ.
22. Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice
with thymic leukemia from onset to death.
http://www.gone.2012books.lardbucket.org/sites/all/files/data7.xls
http://www.gone.2012books.lardbucket.org/sites/all/files/data7A.xls
http://www.gone.2012books.lardbucket.org/sites/all/files/data7B.xls
1. Compute the range and sample standard deviation of survival time for all
mice, without regard to gender.
2. Compute the range and sample standard deviation of survival time for the 65
male mice (separately recorded in Large Data Set 7A).
3. Compute the range and sample standard deviation of survival time for the 75
female mice (separately recorded in Large Data Set 7B). Do you see a
difference in the results for male and female mice? Does it appear to be
significant?
Answers
1. R = 3, s² = 1.7, s = 1.3.
3. R = 6, s² = 7.3̄, s = 2.7.
5. R = 6, s² = 7.3, s = 2.7.
9. x̄ = 28.55, s = 1.3.
1. x̄ = 2063, x̃ = 2000, mode = 2000.
2. R = 3400.
3. s = 869.
{1,1,1}
17.
19.
1. R = 1350 and s = 212.5455
2. R = 4.00 and s = 0.7407
21.
1. R = 4.00 and σ = 0.740375
2. R = 3.04 and s = 0.808045
3. R = 2.49 and s = 0.657843