Chapter 4 Answers
b) The distribution of gas prices is skewed to the right, centered around $2.10 per gallon, with
most stations charging between $2.05 and $2.13. The lowest and highest prices were $2.03
and $2.27.
c) There is a gap in the distribution of gasoline prices. There were no stations that charged
between $2.19 and $2.25.
12. The Great One.
a) Wayne Gretzky: games played per season
   8 | 000000122
   7 | 8899
   7 | 0344
   6 |
   6 | 4
   5 |
   5 |
   4 | 58
   4 |
   Key: 7 | 8 = 78 games
b) The distribution of the number of games played by Wayne Gretzky is skewed to the left.
c) Typically, Wayne Gretzky played about 80 games per season. The number of games
played is tightly clustered in the upper 70s and low 80s.
d) Two seasons are low outliers, when Gretzky played fewer than 50 games. He may have
been injured during those seasons. Regardless of any possible reasons, these seasons were
unusual compared to Gretzky’s other seasons.
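As a rough check on the display in part a), the split-stem layout can be rebuilt in a few lines of Python. This is only a sketch: the values in `games` are read off the stemplot (assuming 6 | 4 is 64 and 4 | 58 is 45 and 48), and the helper name `split_stemplot` is made up for this illustration.

```python
from collections import defaultdict

# Games per season, read off the stemplot in part a) (an assumed reading of the display).
games = [80, 80, 80, 80, 80, 80, 81, 82, 82,
         78, 78, 79, 79, 70, 73, 74, 74, 64, 45, 48]

def split_stemplot(values):
    """Return stemplot rows, highest first, with each tens stem split into
    a high half (leaves 5-9) and a low half (leaves 0-4)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[(stem, leaf >= 5)].append(str(leaf))
    rows = []
    for stem in range(max(values) // 10, min(values) // 10 - 1, -1):
        for high in (True, False):
            rows.append((stem, "".join(leaves.get((stem, high), []))))
    # Drop empty rows above the largest value; interior gaps stay, since they are informative.
    while rows and rows[0][1] == "":
        rows.pop(0)
    return [f"{stem} | {leaf_str}".rstrip() for stem, leaf_str in rows]

rows = split_stemplot(games)
print("\n".join(rows))
```

The empty rows between the 60s and the 40s are kept on purpose: they are what makes the two short seasons stand out as outliers.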
30 Part I Exploring and Understanding Data
[Figure: histogram; horizontal axis: Home runs per year]
Chapter 4 Displaying Quantitative Data 31
17. Horsepower.
The distribution of horsepower of cars reviewed by Consumer Reports is nearly uniform. The lowest
horsepower was 65 and the highest was 155. The center of the distribution was around 105 horsepower.
[Stemplot: Consumer Reports Horsepower. KEY: 11 | 5 = 115 horsepower]
18. Population growth.
The distribution of population growth in NE/MW states is unimodal, symmetric and tightly
clustered around 5% growth. The distribution of population growth in S/W states is much more
spread out, with most states having population growth between 5% and 30%. A typical state
had about 15% growth. There were two outliers, Arizona and Nevada, with 40% and 66% growth,
respectively. Generally, the growth rates in the S/W states were higher than those in the
NE/MW states.
19. Hurricanes.
[Figure: dotplot; axis labels not recoverable]
b) The distribution of hospital stays for men is skewed to the right, with many men having
very short stays of about 1 or 2 days. The distribution tapers off to a maximum stay of
approximately 25 days. The distribution of hospital stays for women is skewed to the
right, with a mode at approximately 5 days, and tapering off to a maximum stay of
approximately 22 days. Typically, hospital stays for women are longer than those for men.
c) The peak in the distribution of women’s hospital stays can be explained by childbirth. This
time in the hospital increases the length of a typical stay for women, and not for men.
24. Deaths.
According to the National Vital Statistics report in 1999, there were several key differences
between the distributions of age at death for Black Americans and White Americans. The
distribution of age at death for Black Americans was skewed left, with a center at
approximately 65 to 75 years of age. There was a cluster of ages at death corresponding to
the very young. The distribution of age at death for White Americans was also skewed to
the left, although to a greater extent than the distribution of age at death for Black
Americans. White Americans had a center at approximately 75 to 85 years old at death,
roughly 10 years higher than Black Americans. Additionally, the cluster of ages at death
corresponding to the very young was much smaller for White Americans than for Black Americans,
probably indicating a higher infant mortality rate for Black Americans.
25. Final grades.
The bars are much too wide for the histogram to be of much use. The distribution of grades is
skewed to the left, but not much more information can be gathered.
26. Cities.
a) The distribution of cost of living in 25 international cities is unimodal and skewed to the
right. The distribution is centered around $100, and spread out, with values ranging from
$60 to $180.
b) The most expensive city included here, Tokyo, does not appear to be an outlier. It seems to
be the end of a long tail.
27. Final grades revisited.
a) This display has a bar width that is much too narrow. As it is, the histogram is only
slightly more useful than a list of scores. It does little to summarize the distribution of final
exam scores.
b) The distribution of test scores is skewed to the left, with center at approximately 170 points.
There are several low outliers below 100 points, but other than that, the distribution of
scores is fairly tightly clustered.
28. Cities revisited.
a) The distribution of cost of living in 25 international cities now appears bimodal, with many
cities costing just under $100 and another smaller cluster around $140.
b) Either answer is OK. You have the right to decide that Tokyo is an outlier, or that it is not
sufficiently extreme. Certainly, this histogram makes Tokyo appear more extreme than the
histogram in Exercise 26.
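Exercises 25 through 28 all turn on the choice of bar width. The sketch below counts the same data into bins of three different widths to show how the summary changes. The scores are hypothetical, invented only for this illustration, and `bin_counts` is a made-up helper, not anything from the text.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values in left-closed bins [k*width, (k+1)*width), keyed by left edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical exam scores, invented for illustration only.
scores = [62, 95, 131, 148, 152, 155, 158, 161, 163, 164,
          166, 168, 169, 171, 172, 174, 175, 177, 181, 186]

for width in (100, 20, 1):
    counts = bin_counts(scores, width)
    print(f"width {width:>3}: {len(counts)} non-empty bins -> {counts}")
```

With a width of 100 nearly everything lands in one bar, and with a width of 1 every score gets its own bar, which is little better than a list. An intermediate width is what reveals the shape.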
killed or injured per 1000 officers, for eleven federal agencies with officers authorized to carry
firearms and make arrests.
[Figure: histograms of Assaults (per 1000) and Killed – Injured (per 1000); vertical axis: Frequency]
b) The distribution of assault rates for these federal agencies is roughly symmetric with two
high outliers. The center of the
distribution is between 5 and 10 assaults per 1000 officers and, with the exception of two
agencies, BATF and National Park Service, with over 30 assaults per 1000 officers, the
distribution is tightly clustered. The distribution of killed and injured rates for these
eleven law enforcement agencies is very tightly clustered with almost all of the agencies
reporting rates of between 0 and 5 officers per 1000 killed and injured. Customs Service,
with a rate of 5.1 officers per 1000, is essentially part of this group, as well. There is one
high outlier, the National Park Service, with a rate of 15 officers per 1000 killed and injured.
c) The National Park Service is a high outlier for assaults and killed-injured, with rates of 38.7
officers per 1000 assaulted and 15 officers per 1000 killed or injured. The BATF is a high
outlier for assaults, with 31.1 officers per 1000 assaulted.
34. Cholesterol.
The distribution of cholesterol levels for smokers
is unimodal and skewed slightly to the right, with
35. MPG.
a) A back-to-back stemplot of these data is shown at the right. A plot with tens and units
digits for stems and tenths for leaves would have been quite long, but still usable. A plot
with tens as stems and rounded units as leaves would have been too compact. This plot has
tens as the stems, but the stems are split 5 ways. The uppermost 2 stem displays 29 and 28, the
next 2 stem displays 27 and 26, and so on. The key indicates the rounding used, as well as the
accuracy of the original data. In this case, the mileages were given to the nearest tenth.
[Back-to-back stemplot: US Cars | Other Cars. KEY: 2 | 5 = 24.5 – 25.4 mpg]
b) In general, the Other cars got better gas mileage than the US Cars, although both
distributions were highly variable. The distribution of US cars was bimodal and skewed to
the right, with many cars getting mileages in the high teens and low twenties, and another
group of cars whose mileages were in the high twenties and low thirties. Two high outliers
had mileages of 34 miles per gallon. The distribution of Other cars, in contrast, was
bimodal and skewed to the left. Most cars had mileages in the high twenties and thirties,
with a small group of cars whose mileages were in the low twenties. Two low outliers had
mileages of 16 and 17 miles per gallon.
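The rounding and five-way stem split described in part a) can be sketched as a small function. The function name `split5_position` and the sample mileages in the usage lines are hypothetical. The rounding rule (halves round up) is inferred from the key, 2 | 5 = 24.5 – 25.4 mpg.

```python
import math

def split5_position(mpg):
    """Place a mileage on a stemplot whose tens stems are split five ways.

    Returns (stem, row, leaf). Row 0 holds leaves 0-1, row 1 holds 2-3,
    and so on up to row 4, which holds leaves 8-9.
    """
    # Round half up, as the key implies. Python's built-in round() rounds
    # halves to the nearest even integer, so round(24.5) would give 24.
    rounded = math.floor(mpg + 0.5)
    stem, leaf = divmod(rounded, 10)
    return stem, leaf // 2, leaf

# Hypothetical mileages, for illustration only.
for mpg in (24.5, 25.4, 27.9, 28.6):
    print(mpg, split5_position(mpg))
```

Both 27.9 and 28.6 land in the top row of the 2 stem (leaves 8 and 9), matching the description that the uppermost 2 stem displays 29 and 28.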
36. Baseball.
a) The back-to-back stemplot shown at the right has split stems to show the distribution in a bit
more detail than a stemplot with single stems.
[Back-to-back stemplot: American | National. KEY: 10 | 3 = 10.3 runs per game]
b) The distribution of number of runs per game in stadiums in the American League is unimodal
and slightly skewed to the right, clustered predominantly in the interval of 9 to 10 runs per
game. In the National League, the number of runs scored per game is distributed symmetrically
and is possibly bimodal, with clusters in the high 9s and low 10s and also in the low 8s. There
are two high outliers of 11.6 and 14 runs per game. The number of runs scored per game is
generally higher and more consistent in the American League.
c) The 14 runs per game scored at Coors Field is an outlier in the National League data for the
first half of the 2001 season. There appear to be more runs per game scored there than in
other Major League Stadiums.
d) From the timeplot, it is apparent that the nuclear plant construction costs generally
increased over time.
38. Drunk driving.
a) The stemplot (near right) shows the distribution of drunk driving deaths.
b) The timeplot (far right) shows the change in drunk driving deaths over time.
[Figures: stemplot and timeplot of Drunk Driving Deaths. KEY: 22 | 4 = 22,400 deaths]
c) The distribution of the number of drunk driving deaths is bimodal, with a cluster between
22 and 25 thousand deaths and another cluster between 16 and 17 thousand deaths. The
timeplot shows that this corresponds to a rapid decrease in the drunk driving deaths in the
early nineties. Previously, the number of deaths was high, then decreased dramatically.
39. Assets.
a) The distribution of assets of 79 companies chosen from the Forbes list of the nation’s top
corporations is skewed so heavily to the right that the vast majority of companies have
assets represented in the first bar of the histogram, 0 to 10 billion dollars. This makes
meaningful discussion of center and spread impossible.
b) Re-expressing these data by, for example, logs or square roots might help make the
distribution more nearly symmetric. Then a meaningful discussion of center might be
possible.
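One way to see the effect of re-expression is to compare the gap between mean and median before and after taking logs. The sketch below uses hypothetical asset values (in billions of dollars) invented for illustration. They are not the Forbes data, and base-10 logs are an arbitrary choice here.

```python
import math
from statistics import mean, median

# Hypothetical assets in billions of dollars, invented for illustration only.
assets = [0.5, 1, 2, 4, 8, 16, 32, 64, 128]
log_assets = [math.log10(a) for a in assets]

# A long right tail pulls the mean well above the median.
raw_gap = mean(assets) - median(assets)
# After taking logs, mean and median nearly coincide, as in a symmetric distribution.
log_gap = mean(log_assets) - median(log_assets)

print(f"raw scale: mean - median = {raw_gap:.2f}")
print(f"log scale: |mean - median| = {abs(log_gap):.2f}")
```

These made-up values double at each step, so their logs are evenly spaced. Real data would not become perfectly symmetric, only more nearly so.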
42. Rainmakers.
a) Since one acre-foot is about 320,000 gallons, these
numbers are more manageable than gallons.
b) The distribution of rainfall from 26 clouds seeded with
d) Since the re-expressed scale is measured in log(acre-feet), we need to raise 10 to the power
of the number on our scale to convert back to acre-feet. For example, if a cloud in the new
scale has a log(rainfall) of 2.3, we convert back to rainfall as follows:
log(rainfall) = 2.3
rainfall = 10^{2.3}
rainfall ≈ 199.5 acre-feet
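The same back-transformation can be checked numerically. A minimal sketch, using the base-10 logs that the answer specifies (the variable names are illustrative):

```python
import math

log_rainfall = 2.3               # value on the re-expressed (log10) scale
rainfall = 10 ** log_rainfall    # back-transform to acre-feet

print(round(rainfall, 1))        # 199.5
# Taking the log again recovers the value on the re-expressed scale.
print(round(math.log10(rainfall), 1))  # 2.3
```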