Chapter 4 Answers


Chapter 4 Displaying Quantitative Data 27

Chapter 4 – Displaying Quantitative Data

1. Statistics in print. Answers will vary.


2. Not a histogram. Answers will vary.
3. Thinking about shape.
a) The distribution of the number of speeding tickets each student in the senior class of a
college has ever had is likely to be unimodal and skewed to the right. Most students will
have very few speeding tickets (maybe 0 or 1), but a small percentage of students will
likely have comparatively many (3 or more?) tickets.
b) The distribution of players’ scores at the U.S. Open Golf Tournament would most likely be
unimodal and slightly skewed to the right. The best golf players in the game will likely
have around the same average score, but some golfers might be off their game and score 15
strokes above the mean. (Remember that high scores are undesirable in the game of golf!)
c) The weights of female babies in a particular hospital over the course of a year will likely
have a distribution that is unimodal and symmetric. Most newborns have about the same
weight, with some babies weighing more and less than this average. There may be slight
skew to the left, since there seems to be a greater likelihood of premature birth (and low
birth weight) than post-term birth (and high birth weight).
d) The distribution of the length of the average hair on the heads of students in a large class
would likely be bimodal and skewed to the right. The average hair length of the males
would be at one mode, and the average hair length of the females would be at the other
mode, since women typically have longer hair than men. The distribution would be
skewed to the right, since it is not possible to have hair length less than zero, but it is
possible to have a variety of lengths of longer hair.
4. More shapes.
a) The distribution of the ages of people at a Little League game would likely be bimodal and
skewed to the right. The average age of the players would be at one mode and the average
age of the spectators (probably mostly parents) would be at the other mode. The
distribution would be skewed to the right, since it is possible to have a greater variety of
ages among the older people, while there is a natural left endpoint to the distribution at
zero years of age.
b) The distribution of the number of siblings of people in your class is likely to be unimodal
and skewed to the right. Most people would have 0, 1, or 2 siblings, with some people
having more siblings.
c) The distribution of pulse rate of college-age males would likely be unimodal and
symmetric. Most males’ pulse rates would be around the average pulse rate for college-age
males, with some males having lower and higher pulse rates.
d) The distribution of the number of times each face of a die shows in 100 tosses would likely
be uniform, with around 16 or 17 occurrences of each face (assuming the die had six sides).
28 Part I Exploring and Understanding Data

5. Heart attack stays.


The distribution of the length of hospital stays of female heart attack patients is skewed to
the right, with stays ranging from 1 day to 36 days. The distribution is centered around 8
days, with the majority of the hospital stays lasting between 1 and 15 days. There are
relatively few hospital stays longer than 27 days. Many patients have a stay of only one
day, possibly because the patient died.
6. Emails.
The distribution of the number of emails received from each student by a professor in a
large introductory statistics class during an entire term is skewed to the right, with the
number of emails ranging from 1 to 21 emails. The distribution is centered at about 2
emails, with many students only sending 1 email. There is one outlier in the distribution, a
student who sent 21 emails. The next highest number of emails sent was only 8.
7. Sugar in cereals.
a) The distribution of the sugar content of breakfast cereals is bimodal, with a cluster of
cereals with sugar content around 10% sugar and another cluster of cereals around 48%
sugar. The lower cluster shows a bit of skew to the right. Most cereals in the lower cluster
have between 0% and 10% sugar. The upper cluster is symmetric, with center around 45%
sugar.
b) There are two different types of breakfast cereals, those for children and those for adults.
The children’s cereals are likely to have higher sugar contents, to make them taste better (to
kids, anyway!). Adult cereals often advertise low sugar content.
8. Singers.
a) The distribution of the heights of singers in the chorus is bimodal, with a mode at around
65 inches and another mode around 71 inches. No chorus member has height below 60
inches or above 76 inches.
b) The two modes probably represent the mean heights of the male and female members of
the chorus.
9. Vineyards.
a) There is information displayed about 36 vineyards and it appears that 28 of the vineyards
are smaller than 60 acres. That’s around 78% of the vineyards. (75% would be a good
estimate!)
b) The distribution of the size of 36 Finger Lakes vineyards is skewed to the right. Most
vineyards are smaller than 75 acres, with a few larger ones, from 90 to 160 acres. One
vineyard was larger than all the rest, over 240 acres. The mode of the distribution is
between 0 and 30 acres.

10. Run times.


The distribution of runtimes is skewed to the right. The shortest runtime was around 28.5
minutes and the longest runtime was around 35.5 minutes. A typical run time was
between 30 and 31 minutes, and the majority of runtimes were between 29 and 32 minutes.
It is easier to run slightly slower than usual and end up with a longer runtime than it is to
run slightly faster than usual and end up with a shorter runtime. This could account for
the skew to the right seen in the distribution.
11. Gasoline.
a) Gasoline Prices in Ithaca
2.2 | 67
2.2 |
2.1 | 79
2.1 | 023
2.0 | 578889
2.0 | 344
Key: 2.1 | 9 = $2.189/gal

b) The distribution of gas prices is skewed to the right, centered around $2.10 per gallon, with
most stations charging between $2.05 and $2.13. The lowest and highest prices were $2.03
and $2.27.
c) There is a gap in the distribution of gasoline prices. There were no stations that charged
between $2.19 and $2.25.
12. The Great One.
a) Wayne Gretzky – Games played per season
8 | 000000122
7 | 8899
7 | 0344
6 |
6 | 4
5 |
5 |
4 | 58
4 |
Key: 7 | 8 = 78 games

b) The distribution of the number of games played by Wayne Gretzky is skewed to the left.
c) Typically, Wayne Gretzky played about 80 games per season. The number of games
played is tightly clustered in the upper 70s and low 80s.
d) Two seasons are low outliers, when Gretzky played fewer than 50 games. He may have
been injured during those seasons. Regardless of any possible reasons, these seasons were
unusual compared to Gretzky’s other seasons.
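A split-stem display like the one in part a) can be sketched in Python. This is only an illustrative sketch, not the textbook's method: the function name `split_stemplot` is hypothetical, and the games-per-season values are read back from the stems and leaves of the display in part a).

```python
from collections import defaultdict

# Games per season, read back from the stem-and-leaf display in part a).
games = [80, 80, 80, 80, 80, 80, 81, 82, 82,
         78, 78, 79, 79, 70, 73, 74, 74, 64, 45, 48]

def split_stemplot(values):
    """Return the lines of a split-stem display, highest stem first.

    Each tens stem appears twice: the upper row holds leaves 5-9,
    the lower row holds leaves 0-4.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf >= 5)].append(str(leaf))
    top = max(values) // 10
    bottom = min(values) // 10
    lines = []
    for stem in range(top, bottom - 1, -1):
        for upper in (True, False):
            lines.append(f"{stem} | {''.join(rows[(stem, upper)])}")
    return lines

for line in split_stemplot(games):
    print(line)
```

The sketch always prints both halves of every stem, so it shows one extra empty upper row for stem 8 that the printed display omits. The empty rows between 6 | 4 and 4 | 58 are what make the two low seasons stand out.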

13. Home runs.


The distribution of the number of homeruns hit by Mark McGwire during the 1986 – 2000
seasons is skewed to the right, with a typical number of homeruns per season in the 30s.
With the exception of 3 seasons in which McGwire hit fewer than 10 homeruns, his total
number of homeruns per season was between 22 and the maximum of 70.
14. Bird species.
a) The results of the 1999 Laboratory of Ornithology Christmas Bird Count are displayed
in the stem and leaf display at the right. This display uses split stems, to give
the display a bit more definition. The lower stem contains leaves with digits
0,1,2,3,4 and the upper stem contains leaves with digits 5,6,7,8,9.
[Stem-and-leaf display: Christmas Bird Count Totals 1999. Key: 18 | 6 = 186 species spotted.]
b) The distribution of the number of birds spotted by participants in the 1999
Laboratory of Ornithology Christmas Bird Count is skewed right, with a center at
around 160 birds. There are several high outliers, with two participants spotting
206 birds and another spotting 228. With the exception of these outliers, most
participants saw between 152 and 186 birds.

15. Home runs, again.


a) This is not a histogram. The horizontal axis should show the number of home runs per year,
split into bins of a convenient width. The vertical axis should show the frequency; that is,
the number of years in which McGwire hit a number of home runs within the interval of each
bin. The display shown is a bar chart/time plot hybrid that simply displays the data table
visually. It is of no use in describing the shape, center, spread, or unusual features of the
distribution of home runs hit per year by McGwire.
b) [Histogram: Mark McGwire’s Home Runs. Horizontal axis: home runs per year (0 to 80);
vertical axis: frequency.]
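The binning described in part a) can be sketched as follows. This is a minimal illustration under stated assumptions: `bin_counts` is a hypothetical helper, the bin width of 10 is just one convenient choice, and the yearly totals are invented for the example rather than taken from the exercise's data table.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each bin [left, left + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Invented yearly totals, for illustration only (not the exercise's data).
yearly_totals = [4, 8, 12, 21, 25, 27, 31, 33, 36, 38, 41, 44, 47, 55, 62]

# Each row is one bin; the bar length is the number of years in that bin.
for left, freq in bin_counts(yearly_totals, 10).items():
    print(f"{left:2d}-{left + 9:2d} | {'#' * freq}")
```

Unlike the bar chart criticized in part a), each bar here stands for a range of values and its length is a count of years, so shape, center, and spread can be read from the display. Bins with no values are simply omitted by this sketch.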

16. Return of the birds.


a) This is not a histogram. The horizontal axis should split the number of counts from each
site into bins. The vertical axis should show the number of sites in each bin. The given
graph is nothing more than a bar chart, showing the bird count from each site as its own
bar. It is of absolutely no use for describing the shape, center, spread, or unusual features
of the distribution of bird counts.
b) [Histogram: Christmas Bird Count. Horizontal axis: number of species sighted (150 to 210);
vertical axis: number of sites.]

17. Horsepower.
The distribution of horsepower of cars reviewed by Consumer Reports is nearly uniform.
The lowest horsepower was 65 and the highest was 155. The center of the distribution
was around 105 horsepower.
[Stem-and-leaf display: Consumer Reports Horsepower. Key: 11 | 5 = 115 horsepower.]
18. Population growth.
The distribution of population growth in NE/MW states is unimodal, symmetric and tightly
clustered around 5% growth. The distribution of population growth in S/W states is much
more spread out, with most states having population growth between 5% and 30%. A typical
state had about 15% growth. There were two outliers, Arizona and Nevada, with 40% and 66%
growth, respectively. Generally, the growth rates in the S/W states were higher and more
variable than the rates in the NE/MW states.
19. Hurricanes.
a) A dotplot of the number of hurricanes each year from 1944 through 2000 is displayed.
Each dot represents a year in which there were that many hurricanes.
0 | OOOOO
1 | OOOOOOOOOOOOOO
2 | OOOOOOOOOOOOOOOOO
3 | OOOOOOOOOOOO
4 | OO
5 | OOOO
6 | OO
7 | O
b) The distribution of the number of hurricanes per year is unimodal and skewed to the
right, with center around 2 hurricanes per year. The number of hurricanes per year
ranges from 0 to 7. There are no outliers. There may be a second mode at 5 hurricanes
per year, but since there were only 4 years in which 5 hurricanes occurred, it is
unlikely that this is anything other than natural variability.

20. Hurricanes, again.


The distribution of the number of hurricanes per year before 1970 is unimodal and skewed
to the right. The center of the distribution is about 2 to 3 hurricanes per year. The
number of hurricanes per year ranges from 0 to 7. After 1970, the distribution of the
number of hurricanes per year is also unimodal and skewed right, with center around 1 or
2 hurricanes per year. The number of hurricanes per year ranges from 0 to 6. There may be
a difference in the number of hurricanes per year before and after 1970. Before 1970,
there may have been a slightly greater number of hurricanes in a typical year.

Number of Hurricanes per Year
1944 – 1969     1970 – 2000
       OO | 0 | OOO
      OOO | 1 | OOOOOOOOOOO
OOOOOOOOO | 2 | OOOOOOOO
   OOOOOO | 3 | OOOOOO
       OO | 4 |
       OO | 5 | OO
        O | 6 | O
        O | 7 |
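The comparison of centers can be checked numerically. The sketch below is an illustration, not part of the original solution: the frequency tables are dot counts read from the back-to-back dotplot in this exercise, and `expand` is a hypothetical helper.

```python
from statistics import mean, median

# Dot counts read from the back-to-back dotplot in this exercise:
# {hurricanes in a year: number of years with that many}.
before_1970 = {0: 2, 1: 3, 2: 9, 3: 6, 4: 2, 5: 2, 6: 1, 7: 1}
from_1970 = {0: 3, 1: 11, 2: 8, 3: 6, 4: 0, 5: 2, 6: 1, 7: 0}

def expand(freq):
    """Turn a frequency table back into the list of yearly counts."""
    return [value for value, n in freq.items() for _ in range(n)]

for label, freq in [("1944-1969", before_1970), ("1970-2000", from_1970)]:
    years = expand(freq)
    print(label, "years:", len(years),
          "median:", median(years), "mean:", round(mean(years), 2))
```

Both periods have a median of 2 hurricanes per year, while the means are about 2.69 for the earlier period and about 1.97 for the later one. That agrees with the cautious wording above: any difference between the periods is slight.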
21. Acid rain.
The distribution of the pH readings of water samples in Allegheny County, Penn. is
bimodal. A roughly uniform cluster is centered around a pH of 4.4. This cluster ranges
from pH of 4.1 to 4.9. Another smaller, tightly packed cluster is centered around a pH
of 5.6. Two readings in the middle seem to belong to neither cluster.
[Figure: Acidity of Water Samples in Allegheny County, Penn.]
22. Marijuana.
The distribution of the percentage of 9th graders in 20 Western European countries who
have tried marijuana is unimodal and skewed to the right. Greece, at 2%, has the lowest
percentage of 9th graders who have tried marijuana. Scotland has the highest percentage,
at 53%. A typical country might have a percentage of between 10% and 15%.
[Histogram: Percent of 9th Graders in Western European Countries Who Have Tried
Marijuana. Vertical axis: Frequency.]

23. Hospital stays.


a) The histograms of male and female hospital stay durations would be easier to compare if
they were constructed with the same scale, perhaps from 0 to 20 days.

b) The distribution of hospital stays for men is skewed to the right, with many men having
very short stays of about 1 or 2 days. The distribution tapers off to a maximum stay of
approximately 25 days. The distribution of hospital stays for women is skewed to the
right, with a mode at approximately 5 days, and tapering off to a maximum stay of
approximately 22 days. Typically, hospital stays for women are longer than those for men.
c) The peak in the distribution of women’s hospital stays can be explained by childbirth. This
time in the hospital increases the length of a typical stay for women, and not for men.
24. Deaths.
According to the National Vital Statistics report in 1999, there were several key differences
between the distributions of age at death for Black Americans and White Americans. The
distribution of age at death for Black Americans was skewed left, with a center at
approximately 65 to 75 years of age. There was a cluster of ages at death corresponding to
the very young. The distribution of age at death for White Americans was also skewed to
the left, although to a greater extent than the distribution of age at death for Black
Americans. White Americans had a center at approximately 75 to 85 years old at death,
roughly 10 years higher than Black Americans. Additionally, the cluster of ages at death
corresponding to the very young was much smaller for White Americans than for Blacks,
probably indicating a higher infant mortality rate for Black Americans.
25. Final grades.
The width of the bars is much too wide to be of much use. The distribution of grades is
skewed to the left, but not much more information can be gathered.
26. Cities.
a) The distribution of cost of living in 25 international cities is unimodal and skewed to the
right. The distribution is centered around $100, and spread out, with values ranging from
$60 to $180.
b) The most expensive city included here, Tokyo, does not appear to be an outlier. It seems to
be the end of a long tail.
27. Final grades revisited.
a) This display has a bar width that is much too narrow. As it is, the histogram is only
slightly more useful than a list of scores. It does little to summarize the distribution of final
exam scores.
b) The distribution of test scores is skewed to the left, with center at approximately 170 points.
There are several low outliers below 100 points, but other than that, the distribution of
scores is fairly tightly clustered.
28. Cities revisited.
a) The distribution of cost of living in 25 international cities now appears bimodal, with many
cities costing just under $100 and another smaller cluster around $140.
b) Either answer is OK. You have the right to decide that Tokyo is an outlier, or that it is not
sufficiently extreme. Certainly, this histogram makes Tokyo appear more extreme than the
histogram in Exercise 26.

29. Zip codes.


Even though zipcodes are numbers, they are not quantitative in nature. Zipcodes are
categories. A histogram is not an appropriate display for categorical data. The histogram
the Holes R Us staff member displayed doesn’t take into account that some 5-digit
numbers do not correspond to zipcodes or that zipcodes falling into the same classes may
not even represent similar cities or towns. The employee could design a better display by
constructing a bar chart that groups together zipcodes representing areas with similar
demographics and geographic locations.
30. CEO data revisited.
a) First of all, it must be noted that industry codes are categorical, so the use of a histogram as
a display is inappropriate. Strangely enough, the investment analyst made even more
mistakes! This display is really just a poorly constructed bar chart (of sorts). There are
gaps in the display because all of the industry codes are integers and the widths of the bars
are all less than 1. With 5 bars between 0 and 3.75, we can find the width to be 0.75. So, for
example, there appears to be a gap between 2.25 and 3, simply because there are no
integers between 2.25 and 3 (remember, the upper boundary is not inclusive). The gaps
aren’t really there, but a poor choice of scale makes them appear.
b) This question doesn’t really make any sense. “Unimodal” is a vocabulary word specific to
describing distributions of quantitative data. As mentioned before, the industry codes are
categorical.
c) A histogram can never be used to summarize categorical data. The analyst would be better
off displaying the data in a bar chart, with relative heights of the bars representing the
number of CEOs involved in each industry, or a pie chart displaying the percentage of
CEOs involved in each industry.
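The gap described in part a) can be reproduced with a short sketch. It assumes only what the solution states (5 bars of width 0.75 starting at 0, with the upper boundary of each bar excluded); the function name is hypothetical, and the integers 0 to 3 are simply the whole numbers in that range, not a claim about which industry codes actually occur.

```python
def integer_codes_per_bin(n_bins, width):
    """List the integer codes that land in each bin [left, right)."""
    result = []
    for k in range(n_bins):
        left, right = k * width, (k + 1) * width
        # Only integers up to the right edge can possibly fall in this bin.
        codes = [c for c in range(int(right) + 1) if left <= c < right]
        result.append(((left, right), codes))
    return result

for (left, right), codes in integer_codes_per_bin(5, 0.75):
    print(f"[{left:.2f}, {right:.2f}): {codes or 'no integer codes'}")
```

The bin from 2.25 to 3.00 contains no integer, so its bar is always empty regardless of the data. The gap is produced by the choice of bin width, not by the distribution of codes.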
31. Productivity study.
The productivity graph is useless without a horizontal scale indicating the time period over
which the productivity increased. Also, we don’t know the units in which productivity is
measured.
32. Productivity revisited.
The display of productivity and wages has no scale on either axis and no units are
indicated. For the vertical axis, it is unlikely that wages and productivity can be measured
meaningfully in the same units, so the two are almost certainly incomparable. Also, we
don’t know the time scale (on the horizontal axis) over which productivity and wages were
measured. In fact, given the problems on the vertical axis, it is not even apparent that the
horizontal axis has comparable time periods for wages and productivity.

33. Law enforcement.


a) The histograms at the right show the number of assaults per 1000 officers and number of
killed or injured per 1000 officers, for eleven federal agencies with officers
authorized to carry firearms and make arrests.
[Histograms: Assaults (per 1000) and Killed – Injured (per 1000). Vertical axes: Frequency.]
b) The distribution of assault rates for these federal agencies is roughly symmetric with
two high outliers. The center of the distribution is between 5 and 10 assaults per 1000
officers and, with the exception of two agencies, BATF and National Park Service, with
over 30 assaults per 1000 officers, the distribution is tightly clustered. The
distribution of killed and injured rates for these eleven law enforcement agencies is
very tightly clustered with almost all of the agencies reporting rates of between 0 and
5 officers per 1000 killed and injured. Customs Service, with a rate of 5.1 officers per
1000, is essentially part of this group, as well. There is one high outlier, the
National Park Service, with a rate of 15 officers per 1000 killed and injured.
c) The National Park Service is a high outlier for assaults and killed-injured, with rates of 38.7
officers per 1000 assaulted and 15 officers per 1000 killed or injured. The BATF is a high
outlier for assaults, with 31.1 officers per 1000 assaulted.
34. Cholesterol.
The distribution of cholesterol levels for smokers is unimodal and skewed slightly to the
right, with a mode around 210. Cholesterol levels vary from approximately 140 to 350,
but are generally clustered between 200 and 300. There is one low cholesterol level and
one high cholesterol level, but these don’t seem to depart from the overall pattern.
[Histogram of cholesterol levels for smokers. Vertical axis: Frequency.]
The distribution of cholesterol levels for non-smokers is unimodal and roughly symmetric,
with a center around 240. Cholesterol levels vary from approximately 120 to 340, and
seem spread out. There is one low cholesterol level, but not unusually low.
[Histogram of cholesterol levels for non-smokers. Vertical axis: Frequency.]
In general, the cholesterol levels of smokers seem to be slightly lower than the
cholesterol levels of non-smokers. Additionally, the cholesterol levels of smokers
appear more consistent than cholesterol levels of non-smokers.

35. MPG.
a) A back-to-back stemplot of these data is shown at the right. A plot with tens and units
digits for stems and tenths for leaves would have been quite long, but still usable. A
plot with tens as stems and rounded units as leaves would have been too compact. This
plot has tens as the stems, but the stems are split 5 ways. The uppermost 2 stem
displays 29 and 28, the next 2 stem displays 27 and 26, and so on. The key indicates the
rounding used, as well as the accuracy of the original data. In this case, the mileages
were given to the nearest tenth.
[Back-to-back stemplot: US Cars | Other Cars. Key: 2 | 5 = 24.5 – 25.4 mpg.]
b) In general, the Other cars got better gas mileage than the US Cars, although both
distributions were highly variable. The distribution of US cars was bimodal and skewed to
the right, with many cars getting mileages in the high teens and low twenties, and another
group of cars whose mileages were in the high twenties and low thirties. Two high outliers
had mileages of 34 miles per gallon. The distribution of Other cars, in contrast, was
bimodal and skewed to the left. Most cars had mileages in the high twenties and thirties,
with a small group of cars whose mileages were in the low twenties. Two low outliers had
mileages of 16 and 17 miles per gallon.
36. Baseball.
a) The back-to-back stemplot shown at the right has split stems to show the distribution
in a bit more detail than a stemplot with single stems.
[Back-to-back stemplot: American | National. Key: 10 | 3 = 10.3 runs per game.]
b) The distribution of number of runs per game in stadiums in the American League is
unimodal and slightly skewed to the right, clustered predominantly in the interval of 9
to 10 runs per game. In the National League, the number of runs scored per game is
distributed symmetrically and is possibly bimodal, with clusters in the high 9s and low
10s and also in the low 8s. There are two high outliers of 11.6 and 14 runs per game.
The number of runs scored per game is generally higher and more consistent in the
American League.
c) The 14 runs per game scored at Coors Field is an outlier in the National League data for the
first half of the 2001 season. There appear to be more runs per game scored there than in
other Major League Stadiums.

37. Nuclear power.


a) The stemplot at the right shows the distribution of construction costs, measured in
$1000 per mW, of 12 nuclear power generators.
Construction Cost ($1000/mW)
8 | 0148
7 | 9
6 | 023
5 | 6
4 |
3 | 25
2 | 8
Key: 7 | 9 = $79000 per mW
b) The distribution of nuclear power plant construction costs is skewed left, and has
several modes, with gaps in between. Several plants had costs near $80,000 per mW,
another few had costs near $60,000 per mW and several had costs near $30,000 per mW.
c) The timeplot at the right shows the change in nuclear plant construction costs over
time.
[Timeplot: Nuclear Plant Construction Costs.]

d) From the timeplot, it is apparent that the nuclear plant construction costs generally
increased over time.
38. Drunk driving.
a) The stemplot (near right) shows the distribution of drunk driving deaths.
b) The timeplot (far right) shows the change in drunk driving deaths over time.
[Stemplot and timeplot: Drunk Driving Deaths. Stemplot key: 22|4 = 22,400 deaths.]

c) The distribution of the number of drunk driving deaths is bimodal, with a cluster between
22 and 25 thousand deaths and another cluster between 16 and 17 thousand deaths. The
timeplot shows that this corresponds to a rapid decrease in the drunk driving deaths in the
early nineties. Previously, the number of deaths was high, then decreased dramatically.

39. Assets.
a) The distribution of assets of 79 companies chosen from the Forbes list of the nation’s top
corporations is skewed so heavily to the right that the vast majority of companies have
assets represented in the first bar of the histogram, 0 to 10 billion dollars. This makes
meaningful discussion of center and spread impossible.
b) Re-expressing these data by, for example, logs or square roots might help make the
distribution more nearly symmetric. Then a meaningful discussion of center might be
possible.

40. Music Library.


a) The distribution of the number of songs students had in their digital libraries is extremely
skewed to the right. That makes it difficult to determine a center. The typical number of
songs in a library is probably in the first bar of the histogram.
b) Re-expressing these data by, for example, logs or square roots might help make the
distribution more nearly symmetric. Then a meaningful discussion of center might be
possible.

41. Assets again.


a) The distribution of logarithm of assets is preferable, because it is roughly unimodal and
symmetric. The distribution of the square root of assets is still skewed right, with outliers.
b) If sqrt(Assets) = 50, then the company’s assets are approximately 50^2 = 2500 million dollars.
c) If log(Assets) = 3, then the company’s assets are approximately 10^3 = 1000 million dollars.
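The two back-transformations in parts b) and c) can be written as a short sketch. The function names are hypothetical, and the log is assumed to be base 10, as the answer in part c) implies.

```python
import math

def undo_sqrt(value):
    """Invert a square-root re-expression by squaring."""
    return value ** 2

def undo_log10(value):
    """Invert a base-10 log re-expression by raising 10 to the value."""
    return 10 ** value

print(undo_sqrt(50))   # assets, in millions of dollars
print(undo_log10(3))   # assets, in millions of dollars

# Round trip: re-expressing and then undoing recovers the original value.
assert math.isclose(undo_sqrt(math.sqrt(2500)), 2500)
assert math.isclose(undo_log10(math.log10(1000)), 1000)
```

In both cases the result is in the original units of the data (millions of dollars), not in the units of the re-expressed scale.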

42. Rainmakers.
a) Since one acre-foot is about 320,000 gallons, these
numbers are more manageable than gallons.
b) The distribution of rainfall from 26 clouds seeded with silver iodide is skewed heavily
to the right, with the vast majority of clouds producing less than 500 acre-feet of
rain. Several clouds produced more, with a maximum of 2745 acre-feet.
[Histogram of rainfall. Vertical axis: Frequency.]

c) The distribution of log (base 10) of rainfall is much more symmetric than the
distribution of rainfall. We can see that the center of the distribution is around 2 to
2.5 log(acre-feet); that is, between 10^2 = 100 and 10^2.5 (about 316) acre-feet.
[Histogram: Log(Rainfall). Vertical axis: Frequency.]
d) Since the re-expressed scale is measured in log(acre-feet), we need to raise 10 to the
power of the number on our scale to convert back to acre-feet. For example, if a cloud
in the new scale has a log(rainfall) of 2.3, we convert back to rainfall as follows:

log(rainfall) = 2.3
rainfall = 10^2.3
rainfall = 199.5

The cloud produced 199.5 acre-feet of rain.
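As a quick check of the numbers in this exercise, the sketch below converts the largest rainfall to gallons and repeats the back-transformation from part d). The helper names are hypothetical, and the conversion factor is the approximate figure quoted in part a).

```python
GALLONS_PER_ACRE_FOOT = 320_000  # approximate figure quoted in part a)

def to_gallons(acre_feet):
    """Convert acre-feet to (approximate) gallons."""
    return acre_feet * GALLONS_PER_ACRE_FOOT

def from_log10(log_rainfall):
    """Convert a base-10 log(rainfall) value back to acre-feet."""
    return 10 ** log_rainfall

print(f"{to_gallons(2745):,} gallons")          # the maximum from part b)
print(round(from_log10(2.3), 1), "acre-feet")   # the example from part d)
```

The maximum rainfall comes to roughly 878 million gallons, which illustrates why acre-feet are the more manageable unit.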
