In order to draw general conclusions, such as the one above, information must
be gathered, organized, and displayed clearly.
When information is gathered from all people in a population, the activity
is called a census. For example, every five years, Statistics Canada takes a
census of the population. The first census of the millennium was on May 15, 2001.
The results from that census were released beginning in the spring of 2002.
A poll (or opinion survey) is a method of collecting data from a sample
of a population by asking people to give their answers to a set of questions.
Once collected, the data are then organized in a meaningful way so that valid
conclusions can be made.
population: refers to the entire group about which data are being collected
data: information providing the basis of a discussion from which conclusions
may be drawn; data often take the form of numbers that can be displayed
graphically or in a table
Example 1 Organizing Data: Frequency Tables
The members of a Grade 12 class were asked on what day of the week they were
born. The results were as follows:
Monday, Tuesday, Wednesday, Thursday, Monday, Friday, Friday, Tuesday,
Thursday, Wednesday, Saturday, Friday, Tuesday, Wednesday, Saturday,
Monday, Wednesday, Wednesday, Thursday, Thursday, Tuesday,
Wednesday, Tuesday, Thursday, Tuesday, Thursday, Saturday, Tuesday,
Sunday, Monday
(a) Organize the data in a frequency table.
(b) How many students responded to the question?
(c) What percent of the students were born on weekends?
sample: part of a population that is selected to gain information about
the whole population
frequency: the number of times an event occurs or the number of items in a
given category
frequency table: a table listing a variable together with the frequency of
each value
Day        Tally       Frequency
Monday     ||||        4
Tuesday    ||||| ||    7
Wednesday  ||||| |     6
Thursday   ||||| |     6
Friday     |||         3
Saturday   |||         3
Sunday     |           1
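The tally in part (a) can also be produced by counting the responses programmatically. The following is a minimal Python sketch, not part of the original solution; the variable names are chosen here for illustration. It rebuilds the frequency table from the listed responses and then computes the values asked for in parts (b) and (c), treating Saturday and Sunday as the weekend.

```python
from collections import Counter

# Responses from Example 1, in the order they were recorded.
responses = (
    "Monday, Tuesday, Wednesday, Thursday, Monday, Friday, Friday, Tuesday, "
    "Thursday, Wednesday, Saturday, Friday, Tuesday, Wednesday, Saturday, "
    "Monday, Wednesday, Wednesday, Thursday, Thursday, Tuesday, "
    "Wednesday, Tuesday, Thursday, Tuesday, Thursday, Saturday, Tuesday, "
    "Sunday, Monday"
).split(", ")

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# (a) Frequency of each day.
frequency = Counter(responses)

# (b) Number of students who responded.
total = sum(frequency.values())

# (c) Percent born on a weekend (Saturday or Sunday).
weekend_percent = 100 * (frequency["Saturday"] + frequency["Sunday"]) / total

for day in days:
    print(f"{day:<10} {'|' * frequency[day]:<8} {frequency[day]:>2}")
print(f"Total responses: {total}")
print(f"Born on a weekend: {weekend_percent:.1f}%")
```

Counting this way gives 30 responses, with 4 of the 30 students (about 13.3%) born on a weekend.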
(a) The set of numbers is rewritten, with each of the hundreds and tens digits
becoming a stem and the units digits becoming the leaves. The leaves are
entered in numerical order to produce a stem-and-leaf plot.
Stem  Leaf
12    022
13    066
14    39
15    23566677
16    034447
17    12244566
18    11112344466666678
19    023
The hundreds and tens digits are called the stems. The units digits are called
the leaves. The branch with stem 14 represents the numbers 143 and 149.
(b) A reasonable class interval for these data is a spread of 10 units. Given
that the smallest value is 120 and the largest value is 193, the intervals to
best display these data are 120–129, 130–139, and so on.
Class Interval  Frequency
120–129         3
130–139         3
140–149         2
150–159         8
160–169         6
170–179         8
180–189         17
190–199         3
(c) In the stem-and-leaf plot, individual items were listed. In the frequency
table in part (b), items were grouped into class intervals.
(d) Using individual items would create a table with the data so spread out
that it would become difficult to view any trends.
class interval: a category or division used for grouping a set of data
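The grouping in part (b) can be checked with a short Python sketch. It is an illustration only: the values are read back from the stem-and-leaf plot in part (a), and the variable names and the 10-unit width are set here to match the solution.

```python
# Stem-and-leaf plot from part (a): stem = hundreds and tens digits,
# leaves = units digits.
stem_and_leaf = {
    12: "022",
    13: "066",
    14: "39",
    15: "23566677",
    16: "034447",
    17: "12244566",
    18: "11112344466666678",
    19: "023",
}

# Rebuild the individual values, e.g. stem 14 with leaf 3 gives 143.
values = [stem * 10 + int(leaf)
          for stem, leaves in stem_and_leaf.items()
          for leaf in leaves]

width = 10                               # class interval width from part (b)
start = (min(values) // width) * width   # lower bound of the first interval

# Count how many values fall in each interval.
counts = {}
for value in values:
    lower = start + ((value - start) // width) * width
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    print(f"{lower}-{lower + width - 1}: {counts[lower]}")
```

Changing `width` shows how the choice of class interval affects the table: a smaller width spreads the data out, while a larger width hides detail.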
Tables are used to organize data; however, graphs are used to display data in a
more meaningful way. A bar graph consists of parallel bars of equal widths with
lengths proportional to the frequency of the variables they represent. A bar graph
is used to represent nominal data, such as days of the week. Typically, bar graphs
are used for discrete data. Look at the example on the top of the next page.
Give thoughtful consideration to the type of graph that will best display
your data.
[Figure: bar graph; horizontal axis: Day of the Week; vertical axis: Number of
Students]
[Figure: graph of Number of Students against Height (cm), from 120 to 200. A
break in the axis shows that all other values are assumed to be included.]
Think about TV Viewing Habits
[Figure: Television Viewing Habits of Canadians]
Calculate the angle for each sector, as shown in the following table.
Walk 15 135°
Bus 9 81°
Car 6 54°
Total 40 360°
Use the sector angles and a protractor to construct the circle graph.
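Each sector angle is the category's share of the total, multiplied by 360°. The Python sketch below is an illustration only; it uses the three modes of transportation listed in the table together with the table's total of 40, and the names `frequencies` and `angles` are chosen here.

```python
# Frequencies for the modes listed in the table; the total of 40 is taken
# from the table's Total row.
total = 40
frequencies = {"Walk": 15, "Bus": 9, "Car": 6}

# Sector angle = (frequency / total) x 360 degrees.
angles = {mode: count * 360 / total for mode, count in frequencies.items()}

for mode, angle in angles.items():
    percent = 100 * frequencies[mode] / total
    print(f"{mode}: {percent:g}% of the circle, {angle:g} degrees")
```

Because only the listed modes are included, the printed angles do not add up to 360°; the remainder of the circle belongs to the rest of the 40 responses.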
[Figure: circle graph, Mode of Transportation, with a legend]
To find the median, the heights are listed in ascending order (smallest to
largest). The middle value (or the average of the middle two values if there is an
even number of items in the data set) is the median. The median for this data set
is 173.
The medians of the upper half and the lower half of the data are calculated to
find the upper and lower limits of the box. In this data set, these values are 184
and 156, respectively.
To construct the box-and-whisker plot, the data are plotted on a number line,
and the three calculated values are indicated. A box is drawn around the central
half of the data, and then lines are drawn extending to the smallest and largest
values of the distribution to create the whiskers.
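The values quoted above can be checked with a short Python sketch. It assumes that the data set is the 50 heights shown in the stem-and-leaf plot earlier in this section, an assumption supported by the fact that the results agree with the values in the text. The variable names are chosen here for illustration.

```python
from statistics import median

# Heights read from the stem-and-leaf plot (assumed to be the same data set).
stem_and_leaf = {
    12: "022",
    13: "066",
    14: "39",
    15: "23566677",
    16: "034447",
    17: "12244566",
    18: "11112344466666678",
    19: "023",
}
heights = sorted(stem * 10 + int(leaf)
                 for stem, leaves in stem_and_leaf.items()
                 for leaf in leaves)

half = len(heights) // 2
overall_median = median(heights)        # middle of the whole data set
lower_limit = median(heights[:half])    # median of the lower half
upper_limit = median(heights[-half:])   # median of the upper half

print("Smallest value:", heights[0])
print("Lower limit of box:", lower_limit)
print("Median:", overall_median)
print("Upper limit of box:", upper_limit)
print("Largest value:", heights[-1])
```

These five numbers are all that is needed to draw the box and the whiskers on the number line.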
[Figure: box-and-whisker plot labelled with the smallest value in the
distribution, the overall median, and the largest value in the distribution]
[Figure: bar graph; horizontal axis: Number of Correctly Named Beatles, from 0
to 4]
(a) The vertical scale is missing. If seven students knew the names of
three of the Beatles, what would the scale be?
(b) How can the scale be altered, yet still display the same meaning?
(c) If these data represent the response of 1000 students at a local high
school, how many would be able to name all four Beatles?
3. Some Grade 12 students were asked to estimate the number of hours of
television they watch each day. These are their responses:
1, 1, 0.5, 1, 1, 0, 3, 2, 1.5, 0.5, 1, 1, 2, 2, 2, 5, 1, 0.5, 0.5, 2, 1, 0.5, 1,
0, 0.5, 3
(a) Construct a tally and frequency table.
(b) Use the information to produce a graph.
(c) How many hours of television per day do you watch?
(d) Investigate the average daily television watching time for Canadians.
Compare this information with that from the class. Draw
conclusion(s) from your comparison. Give reasons for your answer.
[Figure: bar graph with a scale from 40 to 240 and the categories French
Language, English Fiction, English Non-Fiction, and Spanish Language]
(a) Based on the graph, estimate what percent of the money should be
spent on each book type.
(b) If the library received a donation of $125 000, estimate how much
money should be spent on each book type.
6. Thirty people were asked to state their favourite sport. The responses are
listed below.
Tennis 6
Football 8
Swimming 10
Badminton 3
Volleyball 3
Construct a circle graph to display this information.
17. Application
(a) Choose a topic that interests you and survey your classmates to find
their responses.
(b) Organize your data in a frequency table.
(c) Create the most appropriate graph to display your data.
(d) Use the table and your graph to draw a conclusion about your class.
18. Thinking, Inquiry, Problem Solving
(a) Use the Internet to find data that show a trend over time.
(b) Use the most appropriate graph to display your data.
(c) Use the graph to make a prediction. Give reasons for your answer.
19. Communication Explain the difference between histograms and bar
graphs, and provide an example of a set of data that is best suited for each
of these types of graphs.
Chapter Problem
Trends in Canada’s Population
Use the data given in the chapter problem on page 2 to answer these questions.
CP1. Create three different types of graphs that can be used to compare
the structure of Canada’s population in 1996.
CP2. Of the three graphs you created, which graph best displays the
similarities and differences in the structure of the population?
Technolink: The data shown in the split-bar graph and the table above were
taken from the full data set on the textbook CD.
[Figure: split-bar graph comparing responses by gender; categories: I hate
school, I don't like school very much, I like school a bit, I like school quite
a bit, I like school very much]
Think about The Graph and the
The analysis used the first and fifth bars to arrive at the conclusion that
females like school more than males do. Do the other bars also support this
conclusion? Why or why not?
Solution
From the split-bar graph it is clear that
• about three times as many males as females chose “I hate school”
• more females than males chose “I like school very much”
Technolink: For assistance performing this analysis in Fathom™, see Appendix D
starting on page 415.
[Figure: bar graph; vertical axis: Frequency; categories: I hate school, I
don't like school very much, I like school a bit, I like school quite a bit, I
like school very much]
[Figures: Students Who Do Well and Students Who Do Very Poorly. Each graph
shows Frequency against the five responses to How Do You Feel About School?]
[Figures: two further graphs of Frequency against the five responses to How Do
You Feel About School?]
For more data that imply causal relationships, see pages 388, 391, and 396 of
Appendix A.
sample: part of a population selected so as to gain information about the
whole population
causal relationship: where one variable directly affects another. Proving
a causal relationship is the result of an in-depth study.
2. Students were asked whether they are able to wiggle their ears as well as
whether they know the words to the national anthem. Are students who
know the words to the national anthem more likely to be able to wiggle
their ears? Explain.
[Figure: Wiggle Your Ears and Knowledge of National Anthem Lyrics; horizontal
axis: Wiggle Ears (No, Yes)]
4. Students were asked in which direction they prefer the toilet paper to come
off the roll: don’t care (DC), over the top of the roll (O), or from under the
bottom of the roll (U). The responses are broken down by gender. Do males
or females care less about how the paper comes off the roll? Explain.
[Figure: graph with horizontal axis Words Known (No, Yes)]
[Figure: graph with horizontal axis Corrective Lenses Worn (No, Yes)]
7. In addition to being asked if they wear corrective lenses, students were
asked if they had ever worn braces. The graph shows the responses. Do more
students who have worn braces wear corrective lenses than students who have
not? Explain.
[Figure: Corrective Lenses and Braces; horizontal axis: Corrective Lenses Worn
(No, Yes)]
[Figure: graph with horizontal axis Spin Direction (Counterclockwise,
Clockwise)]
Write a brief article for the school newspaper summarizing the results of
these two questions.
10. You have been asked to conduct a survey to analyze the shopping habits of
the people in your community.
(a) Compose 5 to 10 questions that you would ask in the survey.
(b) List the steps that you would follow to choose a sample of the
(c) How would you display your results? Give reasons for your answer.
11. The teacher sponsoring this year’s ski trip has said that 55% of the student
body must be in favour of the trip or it will be cancelled. The student
council distributed a questionnaire to a sample of the student body and the
results are summarized below.
Number of Students Responded
I definitely will go on the ski trip. 18
I probably will go on the ski trip. 25
I may go on the ski trip. 11
I probably will not go on the ski trip. 4
I definitely will not go on the ski trip. 27
(a) Is there enough interest to hold this year’s ski trip? Explain.
(b) To improve the appearance of the results, explain how the student
council should present the findings of the questionnaire.
[Figure: graph with the response categories I don't like it very much, I like
it a bit, I like it a lot, I don't know, I will complete middle school, I will
complete high school, I will complete a college program, and I will complete a
university program]
(a) Do the data allow you to support or refute the conclusion? Give
reasons for your answer.
(b) Describe how you would organize the data to help you decide whether
to accept or reject the hypothesis.
(c) Is there enough data for you to be confident in supporting or refuting
the hypothesis? Explain.
13. Application The following graphs show Canadian population data from
1948 to 1997.
[Figures: Live Births in Canada, 1948–1997, and Infant Deaths During Birth in
Canada, 1948–1997; horizontal axis: Year, from 1945 to 1995]
Source: Data have been extracted from Fathom Dynamic Statistics™, Key Curriculum Press.
(a) State a conclusion based on the graphs. Give reasons for your answer.
(b) What issue would require further exploration based on the conclusion
you have drawn? Give reasons for your answer.
[Figures: two graphs of Temperature (ºC), from –20 to –70, against Month]
Source: Data have been extracted from Fathom Dynamic Statistics™, Key Curriculum Press.
(a) State a conclusion based on the graphs. Give reasons for your answer.
(b) What issue would require further exploration based on the conclusion
you have drawn? Give reasons for your answer.
(c) What is the relationship between the sample size and the degree of
confidence you may have in a conclusion you have drawn based on
the sample? Explain.
15. On June 21, 2001, COMPAS Inc. issued a report outlining the results of a
survey of Canadians in which participants were asked to respond to 10
questions taken from the citizenship test. Some of the results are shown in
the tables that follow.
(a) State a conclusion based on the information in Table 1. Give reasons
for your answer.
(b) How would you display the data in Table 2 (on page 26) more
(c) State a conclusion based on the information in Table 2. Give reasons
for your answer.
Table 1: Citizenship Test Report Card
Number of Questions Correctly Answered  0   1   2   3   4   5   6   7   8   9   10
Percent of Respondents                  14  14  15  12  15  13  8   4   3   2   0
Table 2
                                                           High                         Post-Grad/
Question                                                   School  College  University  Law/Medicine
What important trade or commerce did the Hudson’s Bay
  Company control during the early settlement of           60      69       83          90
Which group of people played a major role in physically
  building the Canadian Pacific Railway across the West?   33      56       55          54
Who are the Métis? From whom are they descended?           34      51       54          56
Parliament created a new territory in Canada’s North.
  What is the name of the new territory?                   29      52       58          62
What does one call a law before it is passed?              23      51       49          71
What does one call the Queen’s representative in the
  provinces and territories?                               12      28       35          50
Which four provinces first formed the Confederation?       12      16       33          38
Which province is the only bilingual province?             11      18       31          23
When did the British North America Act come into effect?   9       20       23          37
How many electoral districts are there in Canada?          0       4        5           4
Source: COMPAS Inc.
16. Thinking, Inquiry, Problem Solving Television networks bid for the right
to televise the Olympic games. Using Fathom™ and the data in the
Olympics–Cost file on the textbook CD, consider the issue of the cost of
hosting the Olympics. Write a report that includes
(a) the part of the issue you have chosen to investigate;
(b) your hypothesis based on the data;
(c) an analysis of how the data supports your hypothesis; and
(d) any modifications you would make to your original hypothesis.
17. With a small group of students, brainstorm some issues that are of interest.
(a) Choose two of the issues from your list and state a hypothesis for
(b) Use the Internet, Fathom™, Statistics Canada, and so on, to find data
related to your hypothesis.
(c) Does the data support or refute your hypothesis?
(d) What other issues arise from the data?
Below is a graph of the age classes from the data provided in the table
on page 2.
[Figure: Structure of Canada's Population; vertical axis: Percent of the
Overall Population (%); horizontal axis: years from 1850 to 2000; legend:
Series 1 to Series 5]
The data in Section 1.2 helped you determine if the issues raised were supported
or not supported by the data. Data like these represent the moment in time when
they were collected. Once you have identified a pattern at one moment in time,
you might find it useful to look at the data over a longer period. Looking at data
collected over a longer period of time may show trends and allow you to make
predictions about future events. One effective way to predict these events is to
create a visual display of the data in the form of a scatter plot.
Typically, the independent variable is on the horizontal axis and the dependent
variable is on the vertical axis. Each piece of data is then plotted as an individual
point. A legend is used to differentiate points from one year to the next.
[Figure: scatter plot with a legend for Year 1, Year 2, and Year 3; horizontal
axis: Age of Driver (years), from 16 to 28]
The above table shows a comparison of tuition fees based on assigning $100 as
the value of the fees in 1992. The other fees are determined relative to the fees
in 1992 using the consumer price index (CPI).
A. Make a scatter plot of the data shown in the table. Describe any patterns that
you observe in the graph that you did not notice in the table.
B. In what year(s) does the relative cost increase the most? The least?
C. Which of the following words would you use to describe the trend of the
data: steady growth, steady decline, irregular growth, or irregular decline?
Justify your choice.
D. Predict the cost of tuition when you enter college or university relative to the
cost in 1992.
E. Draw a line of best fit. Was it easier to use the table or the scatter plot to
make your prediction? Give reasons for your answer.
trend: a pattern of average behaviour that occurs over time
line of best fit: a straight-line graph that best represents a set of data
Discussion Questions
1. Explain how you arrived at your prediction in Step D. Compare your
prediction with those of your classmates.
2. Copy the following confidence scale into your notebook.
1 2 3 4 5 6 7 8 9 10
Not Confident Completely Confident
Place a dot on the scale to show your confidence in your prediction. Give
reasons for your answer.
3. Do you think you can afford to go to college or university? Why or why not?
[Figures: scatter plots with vertical axes Stride (cm), Number of Absences, and
Number of Years to Live; one plot has the horizontal axis Age (years) and a
Gender legend]
Think about The Scatter Plots
• Which of the scatter plots indicate the strongest trend? Give reasons for
your answer.
• Which of the scatter plots do not indicate a trend? Give reasons for your
answer.
• If a line of best fit were drawn on each of the scatter plots that show a
trend, describe the slope of each line.
[Figure: scatter plot; horizontal axis: Height (cm); vertical axis: Foot Length
(cm)]
There appears to be no relationship between absences and height; the dots
are scattered randomly. The strongest relationship appears to be in life
expectancy. A very clear trend is evident from the data.
The scatter plot showing a strong trend and having a line of best fit with a
positive slope is said to have a strong positive correlation. The scatter plot
showing a trend that is not strong and that has a line of best fit with a positive
slope is said to have a weak positive correlation. The scatter plot showing no
trend is said to have no correlation.
Negative correlation occurs in scatter plots where a line of best fit has a
negative slope.
correlation: the apparent relation between two variables
Think about Negative Correlation
Write a definition for a strong negative correlation and a weak negative
correlation.
Example 2 The Median–Median Line
The environment club is interested in the relationship between the number of
canned drinks sold in the cafeteria and the number of cans that are recycled. The
data they collected are listed below.
Number of Canned Drinks Sold  15  18  23  25  28  30  30  36
Number of Cans Recycled       6   2   4   1   6   8   4   10
median–median line: a linear model used to fit a line to a data set. The line
is fit only to key points calculated using medians.
[Figure: scatter plot; horizontal axis: Number of Canned Drinks Sold; vertical
axis: Number of Cans Recycled]
(b) The trend has a weak positive correlation because the data points are fairly
spread out; yet, they suggest a line with positive slope.
(c) To draw the median–median line, the data are broken up into three vertical
sections. As much as possible, each section contains an equal number of
data points.
[Figure: the scatter plot divided into three vertical sections; horizontal
axis: Number of Canned Drinks Sold; vertical axis: Number of Cans Recycled]
Find the median of the x-coordinates (x-median) and the median of the
y-coordinates (y-median) in each section. In the first section, the x-median of
15, 18, and 23 is 18, and the y-median of 6, 2, and 4 is 4. Plot the median
point (18, 4). The median point is indicated on the graph that follows by a ∆.
Repeat the process for the other two sections. Place the edge of a ruler along
the line joining the first and third median points. If the second point is not on
this line, slide the ruler about a third of the way toward the second point.
Ensure that the slope of the line has not changed. Draw the median–median
line along the edge of the ruler.
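The ruler procedure described above can be mirrored numerically. The Python sketch below is one way to do it and is not the textbook's own method: the function name `median_median_line` and the rule for splitting a data set that does not divide evenly by three are assumptions made here. Sliding the ruler one third of the way toward the middle median point, without changing the slope, is the same as averaging the intercepts of the three lines of that slope through the three median points.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (slope, intercept, median_points) for the median-median line."""
    points = sorted(zip(xs, ys))  # order the points by x-value
    base, extra = divmod(len(points), 3)
    # Assumed convention for uneven splits: one extra point goes to the middle
    # section; two extra points go to the two outer sections.
    if extra == 0:
        sizes = [base, base, base]
    elif extra == 1:
        sizes = [base, base + 1, base]
    else:
        sizes = [base + 1, base, base + 1]

    median_points = []
    start = 0
    for size in sizes:
        section = points[start:start + size]
        x_median = median([x for x, _ in section])
        y_median = median([y for _, y in section])
        median_points.append((x_median, y_median))
        start += size

    (x1, y1), (x2, y2), (x3, y3) = median_points
    # Slope of the line joining the first and third median points.
    slope = (y3 - y1) / (x3 - x1)
    # Averaging the three intercepts moves the line one third of the way
    # toward the middle median point while keeping the slope unchanged.
    intercept = ((y1 + y2 + y3) - slope * (x1 + x2 + x3)) / 3
    return slope, intercept, median_points


drinks_sold = [15, 18, 23, 25, 28, 30, 30, 36]
cans_recycled = [6, 2, 4, 1, 6, 8, 4, 10]

slope, intercept, median_points = median_median_line(drinks_sold, cans_recycled)
print("Median points:", median_points)
print(f"y = {slope:.3f}x + ({intercept:.3f})")
```

For the environment club's data, the first median point is (18, 4), as found in the text, and the sections contain three, two, and three points.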
Technolink: More data for trend analysis are available in Appendix A starting
on page 388.
[Figure: scatter plot with the median–median line; horizontal axis: Number of
Canned Drinks Sold; vertical axis: Number of Cans Recycled]
(d) They would expect to find nine cans in the recycling box.
Number of Photos 44 30 24 15
Total Cost ($) 18.00 16.00 13.00 10.00
(c) The number of available seats and the average speed of a variety of
planes are listed below (the data are taken from Fathom™).
(a) [Figure: Age and Sleep Habits of College Students; horizontal axis: Age
(years); vertical axis: Sleep (h)]
(b) [Figure: Age and Minutes of Homework of College Students; horizontal axis:
Age (years); vertical axis: (min/day)]
(c) [Figure: Length of Foot and Forearm of Grade 12 Students; horizontal axis:
Length of Foot (cm); vertical axis: Length of Forearm (cm)]
(d) [Figure: Temperature Readings; horizontal axis: Oral (ºF); vertical axis:
Tympanic (ºF)]
Speed of Car (km/h)  25  35  45  55  60  70  80  90  100  110
Distance (m)         10  15  21  27  33  42  54  61  78   103
Game 1 2 3 4 5 6 7 8
Attendance 125 111 122 105 100 93 85 72
Predict the attendance for Game 9. Give reasons for your answer.
If you were to toss the coin 30 times, how many tails would you expect?
Give reasons for your answer.
8. The winning women’s Olympic long-jump distance is shown in the table below.
Year Distance (m) Year Distance (m)
1948 5.69 1976 6.72
1952 6.24 1980 7.06
1956 6.35 1984 6.96
1960 6.37 1988 7.40
1964 6.76 1992 7.14
1968 6.82 1996 7.12
1972 6.78 2000 6.99
Source: British Broadcasting Corporation (BBC)
If the Olympics had been held in 1944, what might the winning distance
have been?
9. A local movie theatre monitors attendance during the first 10 weeks of a
movie’s showing. The results of one movie are listed below.
Week 1 2 3 4 5 6 7 8 9 10
Attendance 2250 2100 1950 1678 1430 1200 987 731 675 587
If fewer than 200 people attend a movie, the theatre loses money. How many
more weeks will the movie run?
Time (min) 3 7 12 18 25 35
Energy Used (in calories) 28 70 119 170 241 320
11. Knowledge and Understanding Make a scatter plot of the data and
construct the median–median line.
12. Communication Describe a trend in the data in terms of correlation.
13. Application Determine the equation of the median–median line that you
14. Thinking, Inquiry, Problem Solving If a person burned 1000 calories while
in-line skating, determine the length of time that she or he skated. How
confident are you that your prediction is valid? Give reasons for your answer.
Chapter Problem
Trends in Canada’s Population
Use the data given in the chapter problem on page 2 to answer the
following questions.
CP7. Create a single scatter plot that illustrates the trends in each age
class since 1951.
CP8. For each age class, describe the trends that you see. Where are
these trends most visible: in the table or in the graph? Explain.
CP9. For each age class, draw the median–median line on your graph.
CP10. Describe the type of correlation that exists within each age class.
CP11. Provide some possible reasons for the trends that you see.
Visual displays of data taken from a table help not only the identification of
trends, but also the drawing of valid conclusions from the information. By
examining a scatter plot, for example, you can see whether the relationship
between two variables is strong or weak, positive or negative.
In the past, health researchers used treadmills and bicycle ergometers to
measure the exercise capacity of patients with cardiac and respiratory illnesses.
As this equipment was not always available to everyone, investigators began to
use a simpler test more related to day-to-day activity. The simpler test had
patients cover as much ground as possible in a specified amount of time by
walking in an enclosed corridor. The strength of the relationship between the two
tests is important because if the results are strongly related, then one test can be
substituted for the other. The strength of the relationship could show how well
laboratory tests can predict a patient’s ability to undertake physically demanding
activities associated with daily living.
You are familiar with scatter plots and finding a line of best fit. Often, a set
of data is best represented by a curve of best fit. In this section, you will
investigate mathematical tools that will allow you to evaluate the strength of
any conclusions drawn from a data set. These tools will give you more confidence
in analyzing data and describing trends, which are important mathematical skills.
regression: the process of fitting a line or curve to a set of data
Example 1 Finding the Best Model to Fit the Data
Sanjev is planning to enter college or university in 2005. He creates a scatter plot
to help him predict the relative costs of tuition fees and must decide which
model of regression (linear or quadratic) best fits the data.
Think about
Sanjev noticed that the turning point for tuition increases was around 1990.
What other observations could be added to the list?
Solution
The scatter plot with the line of best fit is shown below.
[Figure: scatter plot titled Canadian Tuition CPI, with the line of best fit]
Procedure: Using Fathom™
A. Enter the data shown below into a new case table, labelling the attributes x and y.
x  1  3  5  7  9
y  2  4  4  8  3
B. Drop a new graph into the workspace, and drag x to the horizontal axis and
y to the vertical axis to create a scatter plot.
C. Drag the points so that the value of $r^2$ is as large as possible. Record the
greatest value of r.
D. Adjust the points so that the value of $r^2$ is as small as possible. Record the
smallest value of r.
E. Adjust the points so that $r^2$ is 0. Do this in as many ways as possible.
F. Drag the points so that they are on top of each other.
Technolink: For more information about creating scatter plots in Fathom™, see
Appendix D.6 on page 421.
Discussion Questions
1. What is the largest value of $r^2$?
2. Describe the relationship between the points when $r^2$ is a maximum.
3. What is the smallest value of $r^2$?
4. Describe the relationship between the points when $r^2$ is a minimum.
5. What happens when you try to drag the points on top of each other? Zoom in
on the points to help explain the results.
Day          0  10  8  13  9   11  14
Height (cm)  1  12  7  14  10  11  13
See Appendix C starting on page 401 for instructions on entering data into
lists, creating a scatter plot, and graphing a line of best fit. If your
calculator does not display $r$ and $r^2$, refer to page 402 of Appendix C and
activate Diagnostic On.
(a) Enter the data into L1 and L2 using Stat Editor.
To construct a line of best fit, you must choose the type of line/curve you
think will best fit the data, where the data are located, and where you would
like the equation of the line to be written. The slope and y-intercept of the
line are given, as well as two measures of correlation: $r$, the coefficient of
correlation, and $r^2$, the coefficient of determination. Plot the points and
the line of best fit.
(b) You can be confident in the conclusion that the relationship is linear
because the $r$ and $r^2$ values are very close to 1. Also, the line of best
fit appears to fit the data closely.
(c) Using the CALC feature on a TI-83 Plus calculator, select 1:value and enter
20 for the x-value. The y-value returned is 19.9. The plant would be
approximately 19.9 cm in height, provided that it continued to grow at the rate
demonstrated by the data.
coefficient of correlation: a number from $-1$ to $1$ that gives the strength
and direction of the relationship between two variables
coefficient of determination: a number from 0 to 1 that gives the relative
strength of the relationship between two variables. (If $r^2 = 0.44$, this
means that 44% of the variation of the dependent variable is due to variation
in the independent variable.)
CORRELATION COEFFICIENT
The correlation coefficient, $r$, is an indicator of both the strength and
direction of a linear relationship. A value of $r = 0$ indicates no
correlation, while $r = \pm 1$ indicates perfect positive or negative
correlation.
The coefficient of determination, $r^2$, does not give the direction of
correlation, but does make the scale constant. A value of $r^2 = 0.4$ indicates
that 40% of the variation in y is due to the variation in x.
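Calculators and Fathom™ report r directly. To see where the number comes from, the Python sketch below computes it with the standard Pearson formula, which is not stated in this section and is supplied here as background. The function name `correlation` is chosen for illustration, and the data are the five starting points from the Fathom™ procedure earlier in this section.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Starting data from the Fathom procedure.
xs = [1, 3, 5, 7, 9]
ys = [2, 4, 4, 8, 3]

r = correlation(xs, ys)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")

# Points that lie exactly on a line give r = 1 or r = -1.
print(correlation(xs, [2 * x + 1 for x in xs]))
print(correlation(xs, [10 - x for x in xs]))
```

For the starting data, r is about 0.42 and r² is about 0.17, which indicates a weak positive correlation before any points are dragged.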
Price (¢/L)        54.5  55.0  55.9  56.3  58.4  59.2  60.2  62.3
Amount Sold (L/h)  186   178   172   150   127   112   102   83
(a) Is the relationship linear? Give reasons for your answer.
(b) How many litres of gasoline would be sold if the price were $0.57/litre?
Technolink: For instructions on creating a collection, a case table, a scatter
plot, and the least-squares line using Fathom™, refer to pages 416, 417, and
421 of Appendix D.
(a) Using Fathom™, start a new worksheet and create a collection with two
attributes: price and litres. Enter the data from the table. Drag a new graph
onto the worksheet and drag the attribute names onto the appropriate axes.
Select the graph and choose Least-Squares Line under the Graph menu.
Based on the $r^2$ value, the data are linear. The line shown on the graph
appears accurate.
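The same analysis can be sketched outside Fathom™. The Python code below is an illustration under stated assumptions, not the software's own procedure: it fits a least-squares line to the gasoline data using the standard formulas for slope and intercept, and the function name `least_squares` is chosen here. A price of $0.57/litre is entered as 57 ¢/L to match the units in the table.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


price = [54.5, 55.0, 55.9, 56.3, 58.4, 59.2, 60.2, 62.3]   # cents per litre
amount_sold = [186, 178, 172, 150, 127, 112, 102, 83]      # litres per hour

slope, intercept, r = least_squares(price, amount_sold)
predicted = slope * 57 + intercept   # estimate at 57 cents per litre

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"Estimated amount sold at 57 cents/L: {predicted:.0f} L/h")
```

Under these assumptions, r² is close to 1 and the estimate at 57 ¢/L is about 149 L/h. The negative slope reflects that less gasoline is sold as the price rises.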
(a) Open a new spreadsheet. Enter the titles and data in columns A, B, and C.
Although the r-value indicates the strength and direction of a linear relationship,
a lower r-value does not necessarily mean that the linear model should be
rejected. Another method of analyzing data is also useful. This involves
analyzing the distance the data points are from the line of best fit.
The vertical distance between a data point and the line of best fit is called
the residual value (or residual). It may be calculated for a single point
$(x_1, y_1)$ by subtracting the calculated value from the actual value
$$R_1 = y_1 - [a(x_1) + b]$$
where a and b are the slope and intercept of the line of best fit, respectively.
residual value (residual): the vertical distance between a data point and the
line of best fit
The residuals should be graphed. If the model is a good fit, the residuals
should be fairly small, and there should be no noticeable pattern. Large
residuals or a noticeable pattern are indicators that another model may be more
appropriate. If only a few pieces of data cause large residuals, you may wish to
disregard them.
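The residual formula can be applied to every point in a data set at once. The Python sketch below is an illustration only: it reuses the five starting points from the Fathom™ procedure earlier in this section and fits the line of best fit with the standard least-squares formulas, which are an assumption here since the text relies on technology for this step.

```python
# Starting data from the Fathom procedure.
xs = [1, 3, 5, 7, 9]
ys = [2, 4, 4, 8, 3]

# Least-squares line of best fit y = a*x + b (standard formulas).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

# Residual for each point: actual value minus calculated value.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

print(f"Line of best fit: y = {a:.1f}x + {b:.1f}")
for x, y, residual in zip(xs, ys, residuals):
    print(f"x = {x}, y = {y}, residual = {residual:+.1f}")
```

Here the point (7, 8) has a residual of 3.2, much larger than the others, which is the kind of value that would stand out on a residual plot.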
[Figures: scatter plots (b) and (c), and residual plots (i) and (ii) with
Residual on the vertical axis and x on the horizontal axis]
(a) Describe any trends in the data. Give reasons for your answer.
(b) Create a graph to display the data.
13. The population, births, deaths, and infant deaths for Canada for the years
1948 to 1972 are listed in the table below.
Year  Population  Births   Deaths   Infant Deaths
1948  13 167 000  359 860  122 974  15 965
1949  13 475 000  367 092  124 567  15 935
1950  13 737 000  372 009  124 220  15 441
1951  14 050 000  381 092  125 823  14 673
1952  14 496 000  403 559  126 385  15 408
1953  14 886 000  417 884  127 791  14 859
1954  15 330 000  436 198  124 855  13 934
1955  15 736 000  442 937  128 476  13 884
1956  16 123 000  450 739  131 961  14 399
1957  16 677 000  469 093  136 579  14 517
1958  17 120 000  470 118  135 201  14 178
1959  17 522 000  479 275  139 913  13 595
1960  17 909 000  478 551  139 693  13 077
1961  18 238 000  475 700  140 985  12 940
1962  18 614 000  469 693  143 699  12 941
1963  18 964 000  465 767  147 367  12 270
1964  19 325 000  452 915  145 850  11 169
1965  19 678 000  418 595  148 939  9 862
1966  20 048 000  387 710  149 863  8 960
1967  20 412 000  370 894  150 283  8 151
1968  20 729 000  364 310  153 196  7 583
1969  21 028 000  369 647  154 477  7 149
1970  21 297 100  371 988  155 961  7 001
1971  22 962 082  362 187  157 272  6 356
1972  22 219 560  347 319  162 413  5 938
Source: Data have been extracted from Fathom Dynamic Statistics™, Key Curriculum Press.
(a) Find a model that best fits the data given. Use it to predict the
population, live births, deaths, and infant deaths for 1973.
(c) Using the data from 1974 to 1997, find the equation(s) you would use
to predict the live births in the future, provided the trend continues.
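
One possible way to start part (a) is sketched below in Python for the population column only. It assumes a straight-line model fitted by least squares, which is an assumption made for this example. The exercise leaves the choice of model open, and the function name `fit_line` is illustrative. The population values are copied from the table above.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Population of Canada, 1948 to 1972, as listed in the table.
years = list(range(1948, 1973))
population = [
    13_167_000, 13_475_000, 13_737_000, 14_050_000, 14_496_000,
    14_886_000, 15_330_000, 15_736_000, 16_123_000, 16_677_000,
    17_120_000, 17_522_000, 17_909_000, 18_238_000, 18_614_000,
    18_964_000, 19_325_000, 19_678_000, 20_048_000, 20_412_000,
    20_729_000, 21_028_000, 21_297_100, 22_962_082, 22_219_560,
]

a, b = fit_line(years, population)
predicted_1973 = a * 1973 + b

print(f"Average yearly change (slope): {a:,.0f} people per year")
print(f"Predicted population for 1973: {predicted_1973:,.0f}")
```

The same steps can be repeated for the other columns, but a straight line is not guaranteed to suit them. In the table, births rise until 1959 and then fall, so graphing the residuals for each fit is a sensible check before trusting a prediction.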
C 14. Thinking, Inquiry, Problem Solving
(a) Find some data containing two or three sets that show trends relating
to the general direction of growth or decline.
(b) Describe the rate of the trend and state whether or not the trend shows
a change that is steady or erratic. Give reasons for your answer.
(c) Provide a model that could be used for predictions.
(d) Explain why your model is accurate.
Chapter Problem
Trends in Canada’s Population
The media are major users of data. In addressing issues and presenting points of
view, the media rely on information based on data. One of the main purposes of
the media, as producers of mass communication, is to inform the general public
about world events in as objective a manner as possible. Ideally, the information
is accepted as being accurate; however, the media may sometimes provide
misleading or false impressions to sway the public or to increase ratings or
An important reason to study statistics is to understand how information is
represented or misrepresented. The ability to correctly interpret tables/charts,
diagrams, and graphs presented in the media is an invaluable skill.
(a) The value of the car in graph A seems to be decreasing but at a much slower
rate than the value of the car in graph B.
(b) The change in the value of the car is actually the same. However, your
impression likely changed when you looked at the scale provided for the two graphs.
(c) Car dealers and bankers might use graph A to convince people to buy a car.
The gradual decrease makes it seem as if the car holds its value longer.
Environmentalists might prefer graph B because they would want to
encourage people to use public transportation. Showing consumers the rapid
decrease in the value of a car might discourage them from buying a new car.
Insurance companies might also prefer graph B because it would help to
justify paying lower replacement costs if a claim were made against the
York Region a great place to do business: Survey
York Region is ranked the second best place in the Greater Toronto Area
(GTA) to start or expand a business, a study revealed this week.
The study was conducted by the Canadian Federation of Independent
Business, a national lobby group, in the city of Toronto and across the GTA
in March and April of 2001. About 650 people in Toronto and the GTA
took part.
Respondents were asked to give opinions on a number of issues,
including taxation and local administration, which affect their operations.
While Mississauga ranked first in terms of satisfaction with local govern-
ment, York Region was second, with 10% of respondents saying they were
very satisfied and another 65% saying they were somewhat satisfied with the
local government’s management and handling of the economy.
Source: The Liberal, Sunday, September 16, 2001.
(a) What is the purpose of the article? Who might be interested in the
information and why?
(b) The article mentions that 650 people in Toronto and the GTA took part in the
survey. What proportion of people in the population is represented by the
sample?
(c) Estimate the number of respondents who might have come from Toronto.
From Mississauga. From York Region. Justify your answer.

Think about The Sample Size: How large must the sample be to be representative of the population?
(a) The purpose is to get the message out that York Region is a great place to do
business. It is meant to encourage businesses to establish their operations in
the area. Current or prospective business owners might be interested in this
information. If current businesses in the area are happy, then new establish-
ments might also be happy. Members of the York Region business community
would also be interested—it gives them “bragging rights” at national meetings.
(b) If the population of Toronto and the GTA is approximately 4 500 000 people
and 650 people were surveyed, the percent of the population that answered
the survey is $\frac{650}{4\,500\,000} \times 100 \approx 0.014\%$. This means that relatively
few people were surveyed.
(c) Assuming the populations of Toronto, Mississauga, and York Region are
approximately 2 500 000, 550 000, and 450 000, respectively, and that the
respondents for the survey were chosen based on regional proportional
representation of the total population, the number of respondents from each
region is calculated as follows:
Toronto: $\frac{2\,500\,000}{4\,500\,000} \times 650 \approx 361$
Mississauga: $\frac{550\,000}{4\,500\,000} \times 650 \approx 79$
York Region: $\frac{450\,000}{4\,500\,000} \times 650 = 65$
(d) The number of respondents who were
(i) very satisfied: $65 \times \frac{10}{100} = 6.5$. This means that approximately
seven people were very satisfied.
(ii) somewhat satisfied: $65 \times \frac{65}{100} = 42.25$. This means that
approximately 42 people were somewhat satisfied.
(e) The results of the survey are very suspect because so few people have
actually responded to the survey. If even a few of the respondents were to change
their minds and decide that they are dissatisfied, the title of the article could
be “York Region: An awful place to do business!”
(f) A sample size of between 800 and 1000 is appropriate to ensure strong
representation in York Region.
(g) If all of the people surveyed came from one geographical area, their
responses may be representative of what is happening in their area but may
not reflect what is happening in other areas. If the respondents are all from a
particular age group, they may have a different view of what is happening.
If the geographical region is overrepresented in the sample, bias will result.

For more on appropriate sample size, go to aboutpubop4.htm.
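
The arithmetic in parts (b) to (d) of the solution can be checked with a short Python sketch. It uses only the figures quoted in the solution, including the approximate populations and the assumption of proportional regional representation. The variable names are chosen for this example.

```python
# Figures quoted in the worked solution (approximate populations).
total_population = 4_500_000
sample_size = 650

region_populations = {
    "Toronto": 2_500_000,
    "Mississauga": 550_000,
    "York Region": 450_000,
}

# (b) Percent of the population that answered the survey.
percent_surveyed = sample_size / total_population * 100

# (c) Respondents per region, assuming proportional representation.
respondents = {
    region: population / total_population * sample_size
    for region, population in region_populations.items()
}

# (d) York Region respondents who were very or somewhat satisfied.
york = respondents["York Region"]
very_satisfied = york * 10 / 100
somewhat_satisfied = york * 65 / 100

print(f"Percent surveyed: {percent_surveyed:.3f}%")
for region, count in respondents.items():
    print(f"{region}: about {count:.0f} respondents")
print(f"Very satisfied in York Region: {very_satisfied:.2f}")
print(f"Somewhat satisfied in York Region: {somewhat_satisfied:.2f}")
```

Seeing the counts as raw numbers makes the point of part (e) concrete: the headline for York Region rests on the opinions of roughly 65 people.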
1.5 Exercises
A 1. Knowledge and Understanding The two graphs below show the profits of
the Crazy Car Company.
(a) How are the graphs similar? How are they different?
(b) How much has the profit increased on each graph?
(c) What false impressions are conveyed by the two graphs?
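[Figure: two graphs of Profit ($1000s) versus Month for the Crazy Car Company. The vertical axis of the first graph runs from 35 to 43; the vertical axis of the second runs from 10 to 50.]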
[Figure: two graphics. Labels recovered: 1500 square feet and 3000 square feet; years 1975, 1980, 1990, 2000; years 1998, 1999, 2000, 2001 (1st half).]
Source: ROB magazine, October 2001
Source: Phillips, Hager & North Investment Management Ltd.
QUESTION: Regarding strengthening Canada’s military and our ability to contribute to
the Western military alliance, how essential is this for protecting Canada’s economic
and business interests in Canada-U.S. relations?

QUESTION: Given the new military and security situation, do you predict federal
and provincial governments will be spending a lot more than people predicted before
Sept. 11, somewhat more, about the same, somewhat less, or a lot less?

QUESTION: About 3400 employers have decided to give employees who belong to the
Reserve two weeks off for military training, sometimes with pay and sometimes without.
Should Canadian business make this kind of commitment to Canada’s security preparedness?

QUESTION: Will upcoming spending levels by federal and provincial governments
be very positive for the economy, somewhat positive, neutral, somewhat negative,
or very negative?

[Chart labels recovered: Possibly 15; Neutral 31]
(a) Application Have the data been misrepresented to bias the reader?
Give reasons for your answer.
(b) Application If you answered yes to part (a), then modify the graph to
display the data accurately.
(c) Communication Explain why your graph is more appropriate.
For each type of media, the information gives comparative data within
that category.
(a) Create graphs to represent the data properly for each media type.
(b) Write a statement in the form of a conclusion for each media type.
(c) Use a technique for misrepresenting the data to support each conclu-
sion in part (b).
[Figure: Mobile Phones per 1000 Population (millions); labels 1994 and 1998; horizontal axis showing years 93 to 99.]
15. Thinking, Inquiry, Problem Solving Why would a media source willingly
distort information and misrepresent data in articles and reports? Research
to find out when, where, and why this happens.
16. Communication Suppose that in a recent magazine article, the graphic in
the margin was used to show how the use of cell phones changed between
1994 and 1998. Explain why this picture is misleading.
[Graphic labels recovered: 71, 420]
Chapter Problem
Trends in Canada’s Population