STA215 STA220 Practice Test
by temperature. He measured the number of chirps per second for different crickets
at different temperatures (in °F). A portion of the (modified) data and some R output
is shown below.
Regression Analysis: chirp_rate versus temperature

Predictor     Coef      SE Coef   T       P
(Intercept)   -0.435    2.323     -0.19   0.853
temperature   0.21443   0.02910   7.37    0.000

[Scatterplot: chirp_rate versus temperature]

chirp_rate   temperature
20.0         88.6
16.0         71.6
19.8         93.3
18.4         84.3
17.1         80.6
a) Two items have been replaced with letters. Fill in what they should be. [4]
A -
B -
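The letter placeholders did not survive in this copy of the output, but items of this kind are usually recovered from the identity T = Coef / SE Coef. A minimal sketch, assuming the coefficient table above is complete, that reproduces the printed T values:

```python
# Each T value in the coefficient table is the estimate divided by its standard error.
coefficients = {
    "(Intercept)": (-0.435, 2.323),     # (Coef, SE Coef)
    "temperature": (0.21443, 0.02910),
}

t_values = {name: coef / se for name, (coef, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name}: T = {t:.2f}")
```

The same identity can be rearranged (Coef = T × SE Coef, or SE Coef = Coef / T) when a different cell is missing.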
b) What does the estimated slope of 0.214 tell us about the relationship between
chirp rate and temperature? Be specific; a qualitative answer will not suffice
here.
[2]
[Residual plots: normal probability plot (Percent versus Standardized Residual); Standardized Residual versus Fitted Value; histogram of Standardized Residual (Frequency); Standardized Residual versus Observation Order]
c) Do you see any problems with these residuals? If so, list the problems in order of
importance. If not, state why you came to this conclusion.
[3]
d) Predict the chirp rate for a cricket at 78°F. Are there any problems with this
prediction? Explain if so, ignoring the prediction if you like.
[2]
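A point prediction is just the fitted line evaluated at the new temperature. A minimal sketch using the coefficients printed above; it covers only the arithmetic, since whether the prediction is trustworthy also depends on the residual diagnostics in part c). Note that 78°F lies inside the range of temperatures in the data portion shown (71.6 to 93.3°F).

```python
# Fitted line from the output above: chirp_rate = b0 + b1 * temperature
b0 = -0.435     # intercept (Coef)
b1 = 0.21443    # slope (Coef), chirps per second per °F

def predict_chirp_rate(temperature_f):
    """Point prediction of chirps per second at a temperature in °F."""
    return b0 + b1 * temperature_f

prediction = predict_chirp_rate(78)
print(f"Predicted chirp rate at 78°F: {prediction:.2f}")
```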
e) Predict the temperature for a chirp rate of 21. Are there any problems with this
prediction? Explain if so, ignoring the prediction if you like.
[3]
2) A research team wants to determine which brand of cat food results in the least
amount of cat hair on furniture. The team runs an experiment with 300 cats,
separated into long-haired and short-haired groups, each cat with its own living space.
Each cat is given one of three brands of cat food (cheap dry food, fancy dry food, or
wet food) as well as either purified water or tap water. The team measures the
amount of hair on the floor of each living space.
Identify all the key design elements, such as: [10]
a) the factors, levels, and treatments
c) response variable(s)
d) use of blinding
f) We could use side-by-side boxplots to compare the amount of hair left on the
floor for each type of cat food.
True or False?
g) If we notice a significant difference between the mean amounts of hair found for different
brands, we can assume a cause-and-effect relationship.
True or False?
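Treatments are the combinations of factor levels that a cat can receive. A minimal sketch that enumerates them for the two assigned factors described in question 2; hair length is listed separately because it is a characteristic of the cat rather than something assigned (one reasonable reading is that it acts as a blocking variable).

```python
from itertools import product

# Assigned factors and their levels, as described in the question
food = ["cheap dry food", "fancy dry food", "wet food"]
water = ["purified water", "tap water"]
# Not assigned by the researchers: cats are grouped by this characteristic
hair_length = ["long-haired", "short-haired"]

treatments = list(product(food, water))   # every food/water combination
for i, (f, w) in enumerate(treatments, start=1):
    print(f"Treatment {i}: {f} + {w}")
print(f"{len(treatments)} treatments in total")
```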
3) Regression, again
Below is some output from a regression analysis performed on a dataset containing
the age and systolic blood pressure measurement for 30 patients. These patients
were a random sample from all of the patients at a medical clinic in Toronto.
The regression equation is
blood_pressure = 98.7 + 0.971 age

Predictor   Coef     SE Coef   T      P
Constant    98.71    10.00     9.87   0.000
age         0.9709   0.2102    4.62   0.000

S = 17.3137   R-Sq = 43.2%

[Scatterplot: blood_pressure versus age]

Unusual Observations
Obs   age    blood_pressure   Fit      SE Fit   Residual   St Resid
2     47.0   220.00           144.35   3.19     XXX        XXX
a) Something was minimized by this regression procedure. What is it, in very simple
words, and what is the actual (minimal) numerical value of this quantity for the
regression here?
[3]
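Least squares chooses the line that minimizes the sum of squared residuals (SSE). In simple linear regression S = sqrt(SSE / (n − 2)), so the minimized value can be recovered from the printed S and the sample size of 30 patients. A minimal sketch of that arithmetic:

```python
n = 30            # patients in the sample
s = 17.3137       # S from the regression output

# S is the square root of SSE / (n - 2) in simple linear regression,
# so the minimized sum of squared residuals is:
sse = s**2 * (n - 2)
print(f"Sum of squared residuals: {sse:.1f}")
```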
True or False
True or False
True or False
d) The value under Residual has been replaced with XXX. What is this value?
[2]
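A residual is the observed response minus the fitted value. A minimal sketch using the values printed for observation 2 in the Unusual Observations table:

```python
observed = 220.00   # blood_pressure for observation 2
fitted = 144.35     # Fit for observation 2

residual = observed - fitted   # residual = observed - fitted
print(f"Residual: {residual:.2f}")
```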
[3]
[1]
f) Does the intercept have any meaning in this analysis? If so, what is the
interpretation of the intercept? If not, state why it is meaningless.
[2]
[2]
g) A researcher wants to use this model to predict the blood pressure of all Toronto
residents between the ages of 18 and 70. Assuming we deal with the outlier (by
removing it, for example), is this an appropriate use of regression? Explain why
or why not using terminology from class.
[3]
Below is a portion (the first 7 children) of a data set consisting of observations on a number of variables
of interest for 78 seventh-grade students, followed by some analyses and plots (residuals are
calculated from the regression preceding the plots). Higher self-concept scores (based on a
standard test) indicate a more positive self-concept. Two different regression models are fitted below
using the data. We are interested in what factors influence the GPA of these students.
Portion of the data set:
OBS   GPA     IQ    Gender   Self-concept
1     7.940   111   M        67
2     8.292   107   M        43
3     4.643   100   M        52
4     7.470   107   M        66
5     8.882   114   F        58
6     7.585   115   M        51
7     7.650   111   M        71
.
.
.
78    ..      ..    ..       ..
Correlations (Pearson)
           GPA     IQ
IQ         0.677
Self-con   0.612   0.382
Regression Analysis 1
The regression equation is
GPA = -6.60 + 0.127 IQ

Predictor   Coef      T       P
Constant    -6.602    -2.57   0.014
IQ          0.12729   5.60    0.000

S = 1.545   R-Sq = 45.9%   R-Sq(adj) = 44.4%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       1    74.980    74.980   31.39   0.000
Residual Error   37   88.377    2.389
Total            38   163.357

Unusual Observations
Obs   IQ    GPA     Fit     StDev Fit   Residual   St Resid
8     97    2.412   5.745   0.430       -3.333     -2.25R
22    109   1.760   7.273   0.260       -5.513     -3.62R
R denotes an observation with a large standardized residual
Regression Analysis 2
The regression equation is
GPA = 1.52 + 0.106 Self-con

Predictor   Coef      T      P
Constant    1.519     1.13   0.266
Self-con    0.10638   4.70   0.000

S = 1.663   R-Sq = 37.4%   R-Sq(adj) = 35.7%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       1    61.087    61.087   22.10   0.000
Residual Error   37   102.270   2.764
Total            38   163.357

Unusual Observations
Obs   Self-con   GPA     Fit     StDev Fit   Residual   St Resid
8     51.0       2.412   6.944   0.313       -4.532     -2.78R
22    20.0       1.760   3.647   0.906       -1.887     -1.35 X
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
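The summary lines of both regressions follow from their ANOVA tables: R-Sq = SS Regression / SS Total, R-Sq(adj) adjusts for degrees of freedom, S is the square root of the residual mean square, and F is the ratio of the two mean squares. A minimal sketch, assuming one predictor per model as shown, that recomputes these figures from the SS and DF columns:

```python
import math

def summarize(ss_reg, ss_res, df_res):
    """Recompute summary statistics from a simple-regression ANOVA table."""
    ss_total = ss_reg + ss_res
    df_total = df_res + 1                  # one predictor, so regression DF = 1
    r_sq = ss_reg / ss_total
    r_sq_adj = 1 - (ss_res / df_res) / (ss_total / df_total)
    s = math.sqrt(ss_res / df_res)         # S = sqrt(MS for residual error)
    f = ss_reg / (ss_res / df_res)         # F = MS regression / MS residual
    return r_sq, r_sq_adj, s, f

for label, ss_reg, ss_res in [("GPA on IQ", 74.980, 88.377),
                              ("GPA on Self-con", 61.087, 102.270)]:
    r_sq, r_sq_adj, s, f = summarize(ss_reg, ss_res, df_res=37)
    print(f"{label}: R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, "
          f"S = {s:.3f}, F = {f:.2f}")
```

Small differences in the last digit can arise because the printed MS values are rounded.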
1) Answer the following questions, based on the previous three pages of output.
a) Look at the first regression above (GPA on IQ). Carefully describe the plot and relationship to
someone who will not be able to actually view the plot. [3]
b) If looking only at the numerical output (without taking the graphs into account), which variable is
the better predictor? Why? [2]
c) Some but not all of the variation in GPAs is explained by a linear relationship with self-concept.
Many other variables are involved. How much of the variation in GPAs is not accounted for after
taking into account self-concept scores? [2]
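The share of variation left unexplained by a regression is 1 − R-Sq, which equals SS Residual Error / SS Total. A minimal sketch computing it both ways from Regression Analysis 2:

```python
r_sq = 0.374                      # R-Sq for GPA on self-concept
unexplained = 1 - r_sq            # share of GPA variation not explained by the line

# Equivalent route through the ANOVA table: SS Residual Error / SS Total
unexplained_from_ss = 102.270 / 163.357
print(f"{unexplained:.1%} vs {unexplained_from_ss:.1%}")
```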
d) A student's IQ is 155. Give a prediction of this student's GPA. Is your prediction reliable (why or
why not)? [3]
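The prediction itself is the fitted line from Regression Analysis 1 evaluated at IQ = 155. A minimal sketch of the arithmetic; for judging reliability, note that the IQs visible in this output run only from 97 to 115, so 155 lies far outside them.

```python
b0, b1 = -6.602, 0.12729   # Regression Analysis 1: GPA on IQ
iq = 155

gpa_hat = b0 + b1 * iq     # point prediction from the fitted line
print(f"Predicted GPA at IQ {iq}: {gpa_hat:.2f}")
```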
e) For the second regression (GPA on self-concept), examine the scatterplot of (standardized)
residuals versus predictor. What exactly do you learn from this plot? [2]
f) Attempting to make sense of the results, Joe came to the conclusion that high GPA is the result of a
high IQ, and in addition high GPA is also a consequence of high self-concept, although
self-concept is not as strong a cause as IQ. Do you agree? Defend your argument. (<20 words)
[1]
ii) 8th student in the list somewhat unusual (do not use very technical terms like residual or
deviation). [1]
h) Describe the (frequency) distribution of the residuals resulting from the regression of GPA on self-concept. [2]
2) A hobbyist / researcher rediscovered the research of optometrist Dr. William H. Bates (1860-1931),
who demonstrated that nearsightedness and farsightedness can be cured with proper training and care.
He believed that glasses are in fact the main reason why people's eyesight deteriorates over time, and
recorded hundreds of patients whom he helped to regain 20/20 (standard) vision from myopia
(nearsightedness).
Due to vehement opposition from glasses manufacturers and optometrists who refused to accept his
view in his time, only a small number of people today know about Dr. Bates' ambitious work that
challenged the orthodox belief about eyesight. Unfortunately, the theory of statistical experimental
design, which could have supported Dr. Bates' research, had not yet been developed 100 years ago.
Although this eyesight research sounded very convincing, the researcher could not find any
experiment testing Bates' claims, and decided to carry out an experiment on his own to confirm the truth.
The researcher found 80 volunteers from U. of T. He decided to determine the effects of two
different types of eye exercises (plus lens method and shifting method, both described as highly
effective in the literature), the length of exercise time (30 min/day or 90 min/day; the longer the
better), and taking Vitamin A (vitamin A pill or placebo pill) on vision improvement
(measured on a numeric 1-10 scale) after 3 months.
The subjects were randomly assigned to the possible treatments, so that 10 subjects were allocated
to each treatment. All the subjects were extremely eager to participate, although they did not know
anything about the effects of the different exercise types, lengths of exercise, or supplements. In
addition, the researcher was very careful in measuring the vision improvement himself after 90
days of training.
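The allocation described above (every subject randomly assigned, 10 per treatment) can be sketched as a shuffle followed by dealing subjects out in groups of 10. A minimal sketch; the subject IDs 1-80 are hypothetical, and the fixed seed is only there to make the illustration reproducible.

```python
import random
from itertools import product

exercise = ["plus lens", "shifting"]
duration = ["30 min/day", "90 min/day"]
supplement = ["vitamin A pill", "placebo pill"]

treatments = list(product(exercise, duration, supplement))   # 2 x 2 x 2 = 8

subjects = list(range(1, 81))    # hypothetical subject IDs 1..80
rng = random.Random(2024)        # fixed seed only so the sketch is reproducible
rng.shuffle(subjects)

# Deal the shuffled subjects out in groups of 10, one group per treatment
assignment = {t: subjects[i * 10:(i + 1) * 10] for i, t in enumerate(treatments)}

for t, group in assignment.items():
    print(t, sorted(group))
```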
a) Identify explicitly:
i) the experimental units [1]
[3]
[1]
b) Is this a randomized block design or a completely randomized design? (circle one) [1]
c) This experiment could be severely criticized by the scientific community for two reasons. Identify
the two "main" reasons relevant to experimental design. (< 40 words in total) [4]
(1)
(2)
3) Think about the sampling or experimental approach that you would use in each of the following
situations. Explain briefly how you would proceed, mentioning any critical procedural details, e.g. what
instructions you might give to your research assistant (you do not need to explain to him how to use a
random number table; just be sure to tell him what/where to randomize). If there is a technical term
that describes the sample/experimental design, mention it as part of your explanation. (< 30 words
each)
a) Child welfare service areas (CWSAs) are spread across Canada. We want to collect data on
investigated cases of child abuse, in order to estimate types of abuse, percentage of cases
substantiated, etc. Each investigated case occupies a file at one of these CWSAs. We'd like to
sample about 1% of total cases. [2]
b) We have a list of students enrolled in STA220 (the population of interest). We want to sample 20
students to estimate some population characteristics; these characteristics may be associated with
gender. [2]
c) You want to select 10 test papers, randomly, from a pile of 80 sitting in front of you, and have to do
it as fast as possible. [2]
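One fast option is systematic sampling: with 80 papers and a sample of 10, take every 8th paper after a random start between 1 and 8. A minimal sketch, assuming positions are counted from the top of the pile:

```python
import random

N, n = 80, 10
k = N // n                              # sampling interval: every 8th paper
start = random.randint(1, k)            # random start between 1 and 8 (inclusive)
chosen = list(range(start, N + 1, k))   # positions in the pile, counted from the top
print(f"Start at paper {start}: {chosen}")
```

Systematic sampling is only as good as the assumption that the pile's order is unrelated to what is being measured.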
d) We want to compare three diets for their effect on weight gain (over the next 4 months), in young
rats. We have 18 young rats of about the same age, though differing in weight. [2]
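One reasonable way to handle the weight differences is a randomized block design: form six blocks of three rats with similar weights and randomly assign the three diets within each block. A minimal sketch, assuming the rats have already been numbered 1-18 from lightest to heaviest; the diet labels A, B, C are placeholders.

```python
import random

diets = ["A", "B", "C"]            # placeholder labels for the three diets
# Assumption: rats are numbered 1..18 in order of weight, lightest first
rats_by_weight = list(range(1, 19))

assignment = {}
for start in range(0, len(rats_by_weight), len(diets)):
    block = rats_by_weight[start:start + len(diets)]      # 3 rats of similar weight
    for rat, diet in zip(block, random.sample(diets, k=len(diets))):
        assignment[rat] = diet     # each diet appears exactly once per block

for rat in rats_by_weight:
    print(rat, assignment[rat])
```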
e) You want to compare the taste of french fries, where some will be cooked from potatoes stored at
room temperature, and others will be cooked from potatoes stored at a colder temperature. Ten people are
available for your study. Each taster will have to give a rating on a 0-10 scale for flavour. [2]
b) Look back at the computer output for question 4. In assessing the relation between GPA and self-concept,
we used a scatterplot, plotting GPA vs. self-concept, as shown in the output there. How could you
improve the information in this scatterplot, if interested in assessing this relationship as accurately
as possible, in this particular study? [2]
c) For the following scatterplot, with fitted regression line shown, draw a rough picture of the
histogram of the residuals, with 3 - 5 bins. [2]
[Scatterplot of y (0 to 6) versus x, with fitted regression line]