Newbold Book Solutions
Chapter 13:
Multiple Regression
13.6
13.7
13.8
13.9
a. b1 = .653: All else equal, a one unit increase in the average number
of meals eaten per week will result in an estimated .653 pounds gained
during freshman year.
b2 = -1.345: All else equal, a one unit increase in the average number of
hours of exercise per week will result in an estimated 1.345 pound
weight loss.
b3 = .613: All else equal, a one unit increase in the average number of
beers consumed per week will result in an estimated .613 pound weight
gain.
b. The intercept term b0 of 7.35 is the estimated weight gain during the
freshman year given that average meals eaten, hours of exercise, and beers
consumed per week are all zero. This extrapolates beyond the observed data
and is not a useful interpretation.
13.10 Compute the slope coefficients for the model yi = b0 + b1x1i + b2x2i, using
b1 = s_y(r_x1y - r_x1x2 r_x2y) / [s_x1(1 - r_x1x2^2)] and
b2 = s_y(r_x2y - r_x1x2 r_x1y) / [s_x2(1 - r_x1x2^2)]:
a. b1 = 400(.60 - (.50)(.70)) / [200(1 - .50^2)] = 2.000,
b2 = 400(.70 - (.50)(.60)) / [200(1 - .50^2)] = 3.200
b. b1 = 400(-.60 - (-.50)(.70)) / [200(1 - (-.50)^2)] = -.667,
b2 = 400(.70 - (-.50)(-.60)) / [200(1 - (-.50)^2)] = 1.067
c. b1 = 400(.40 - (.80)(.45)) / [200(1 - (.80)^2)] = .083,
b2 = 400(.45 - (.80)(.40)) / [200(1 - (.80)^2)] = .271
d. b1 = 400(.60 - (.60)(.50)) / [200(1 - (.60)^2)] = .9375,
b2 = 400(.50 - (.60)(.60)) / [200(1 - (.60)^2)] = -.4375
13.11 a. When r_x1x2 = 0, the formula reduces to
b1 = s_y r_x1y / s_x1
which is the equivalent of the bivariate slope coefficient (see box on bottom
of page 380).
b. When the correlation between X1 and X2 equals 1, the term (1 - r_x1x2^2) in the
denominator goes to 0 and the slope coefficient is undefined.
13.12 a. Electricity sales as a function of number of customers and price
Regression Analysis: salesmw2 versus priclec2, numcust2
The regression equation is
salesmw2 = - 647363 + 19895 priclec2 + 2.35 numcust2
Predictor      Coef  SE Coef      T      P
Constant    -647363   291734  -2.22  0.030
priclec2      19895    22515   0.88  0.380
numcust2     2.3530   0.2233  10.54  0.000

S = 66399    R-Sq = 79.2%    R-Sq(adj) = 78.5%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02480E+12  5.12400E+11  116.22  0.000
Residual Error  61  2.68939E+11   4408828732
Total           63  1.29374E+12
All else equal, for every one unit increase in the price of electricity, we estimate
that sales will increase by 19895 mwh. Note that this estimated coefficient is not
significantly different from zero (p-value = .380).
All else equal, for every additional residential customer who uses electricity in the
heating of their home, we estimate that sales will increase by 2.353 mwh.
b. Electricity sales as a function of number of customers
Regression Analysis: salesmw2 versus numcust2
The regression equation is
salesmw2 = - 410202 + 2.20 numcust2
Predictor      Coef  SE Coef      T      P
Constant    -410202   114132  -3.59  0.001
numcust2     2.2027   0.1445  15.25  0.000

S = 66282    R-Sq = 78.9%    R-Sq(adj) = 78.6%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  1.02136E+12  1.02136E+12  232.48  0.000
Residual Error  62  2.72381E+11   4393240914
Total           63  1.29374E+12
c. Electricity sales as a function of price and degree days (the coefficient
table for this regression was not preserved):

R-Sq = 42.2%    R-Sq(adj) = 40.3%

Analysis of Variance
Source          DF           SS           MS      F      P
Regression       2  5.45875E+11  2.72938E+11  22.26  0.000
Residual Error  61  7.47863E+11  12260053296
Total           63  1.29374E+12
All else equal, an increase in the price of electricity will reduce electricity sales by
165,275 mwh.
All else equal, an increase in the degree days (departure from normal weather) by
one unit will increase electricity sales by 56.06 mwh.
Note that the coefficient on the price variable is now negative, as expected, and it
is significantly different from zero (p-value = .000)
d.
Regression Analysis: salesmw2 versus Yd872, degrday2
The regression equation is
salesmw2 = 293949 + 326 Yd872 + 58.4 degrday2
Predictor    Coef  SE Coef      T      P
Constant   293949    67939   4.33  0.000
Yd872      325.85    21.30  15.29  0.000
degrday2    58.36    35.79   1.63  0.108

S = 66187    R-Sq = 79.3%    R-Sq(adj) = 78.7%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02652E+12  5.13259E+11  117.16  0.000
Residual Error  61  2.67221E+11   4380674677
Total           63  1.29374E+12
All else equal, an increase in personal disposable income by one unit will increase
electricity sales by 325.85 mwh.
All else equal, an increase in degree days by one unit will increase electricity sales
by 58.36 mwh.
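A minimal Python sketch of fitting a two-predictor model like these follows, using simulated data in place of the electricity data set (which is not reproduced here); the variable names and generating numbers are hypothetical:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 64
price = rng.normal(6.0, 0.5, n)             # stand-in for priclec2
customers = rng.normal(400_000, 50_000, n)  # stand-in for numcust2
sales = -650_000 + 20_000 * price + 2.35 * customers + rng.normal(0, 66_000, n)

X = sm.add_constant(np.column_stack([price, customers]))
fit = sm.OLS(sales, X).fit()
print(fit.params)     # b0, b1, b2
print(fit.pvalues)    # two-sided p-values, as in the P column above
print(fit.rsquared, fit.rsquared_adj)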
13.13 a. mpg as a function of horsepower and weight
Regression Analysis: milpgal versus horspwr, weight
The regression equation is
milpgal = 55.8 - 0.105 horspwr - 0.00661 weight
150 cases used 5 cases contain missing values
Predictor        Coef    SE Coef      T      P
Constant       55.769      1.448  38.51  0.000
horspwr      -0.10489    0.02233  -4.70  0.000
weight     -0.0066143  0.0009015  -7.34  0.000

S = 3.901    R-Sq = 72.3%    R-Sq(adj) = 72.0%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        2  5850.0  2925.0  192.23  0.000
Residual Error  147  2236.8    15.2
Total           149  8086.8
All else equal, a one unit increase in the horsepower of the engine will reduce fuel
mileage by .10489 mpg. All else equal, an increase in the weight of the car by 100
pounds will reduce fuel mileage by .66143 mpg.
b. Add number of cylinders
Regression Analysis: milpgal versus horspwr, weight, cylinder
The regression equation is
milpgal = 55.9 - 0.117 horspwr - 0.00758 weight + 0.726 cylinder
150 cases used 5 cases contain missing values

Predictor       Coef   SE Coef      T      P
Constant      55.925     1.443  38.77  0.000
horspwr     -0.11744   0.02344  -5.01  0.000
weight     -0.007576  0.001066  -7.10  0.000
cylinder      0.7260    0.4362   1.66  0.098

S = 3.878    R-Sq = 72.9%    R-Sq(adj) = 72.3%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        3  5891.6  1963.9  130.62  0.000
Residual Error  146  2195.1    15.0
Total           149  8086.8
All else equal, one additional cylinder in the engine of the auto will increase fuel
mileage by .726 mpg. Note that this is not significant at the .05 level (p-value = .098).
Horsepower and weight still have the expected negative signs.
c. mpg as a function of weight, number of cylinders
Regression Analysis: milpgal versus weight, cylinder
The regression equation is
milpgal = 55.9 - 0.0104 weight + 0.121 cylinder
154 cases used 1 cases contain missing values
Predictor        Coef    SE Coef       T      P
Constant       55.914      1.525   36.65  0.000
weight     -0.0103680  0.0009779  -10.60  0.000
cylinder       0.1207     0.4311    0.28  0.780

S = 4.151    R-Sq = 68.8%    R-Sq(adj) = 68.3%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        2  5725.0  2862.5  166.13  0.000
Residual Error  151  2601.8    17.2
Total           153  8326.8
All else equal, an increase in the weight of the car by 100 pounds will reduce fuel
mileage by 1.0368 mpg. All else equal, an increase in the number of cylinders in
the engine will increase mpg by .1207 mpg.
The explanatory power of the models has stayed relatively the same with a slight
drop in explanatory power for the latest regression model.
Note that the coefficient on weight has stayed negative and significant (p-values
of .000) for all of the regression models; although the value of the coefficient has
changed. The number of cylinders is not significantly different from zero in either
of the models where it was used as an independent variable. There is likely some
correlation between cylinders and the weight of the car as well as between
cylinders and the horsepower of the car.
d. mpg as a function of horsepower, weight, price
Regression Analysis: milpgal versus horspwr, weight, price
The regression equation is
milpgal = 54.4 - 0.0938 horspwr - 0.00735 weight +0.000137 price
150 cases used 5 cases contain missing values
Predictor         Coef     SE Coef      T      P
Constant        54.369       1.454  37.40  0.000
horspwr       -0.09381     0.02177  -4.31  0.000
weight      -0.0073518   0.0008950  -8.21  0.000
price       0.00013721  0.00003950   3.47  0.001

S = 3.762    R-Sq = 74.5%    R-Sq(adj) = 73.9%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        3  6020.7  2006.9  141.82  0.000
Residual Error  146  2066.0    14.2
Total           149  8086.8
All else equal, an increase by one unit in the horsepower of the auto will reduce
fuel mileage by .09381 mpg. All else equal, an increase by 100 pounds in the
weight of the auto will reduce fuel mileage by .73518 mpg and an increase in the
price of the auto by one dollar will increase fuel mileage by .00013721 mpg.
e. Horsepower and weight remain significant negative independent variables
throughout, whereas the number of cylinders has been insignificant. The sizes of the
coefficients change as the combination of independent variables changes. This is
likely due to strong correlation that may exist between the independent variables.
13.14 a. Horsepower as a function of weight and engine displacement. Only
fragments of this regression's output were preserved: both predictors have
VIF = 6.0, and the overall F test has a p-value of 0.000.
All else equal, a 100 pound increase in the weight of the car is associated with a
1.54 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.57 increase in the horsepower of the auto.
b. Horsepower as a function of weight, displacement, number of cylinders
Regression Analysis: horspwr versus weight, displace, cylinder
The regression equation is
horspwr = 16.7 + 0.0163 weight + 0.105 displace + 2.57 cylinder
151 cases used 4 cases contain missing values
Predictor      Coef   SE Coef     T      P   VIF
Constant     16.703     9.449  1.77  0.079
weight     0.016261  0.004592  3.54  0.001   6.2
displace    0.10527   0.05859  1.80  0.074  14.8
cylinder      2.574     2.258  1.14  0.256   7.8

S = 13.63    R-Sq = 69.5%    R-Sq(adj) = 68.9%

Analysis of Variance
Source           DF     SS     MS       F      P
Regression        3  62170  20723  111.55  0.000
Residual Error  147  27310    186
Total           150  89480
All else equal, a 100 pound increase in the weight of the car is associated with a
1.63 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.05 increase in the horsepower of the auto.
All else equal, one additional cylinder in the engine is associated with a 2.57
increase in the horsepower of the auto.
Note that adding the number of cylinders as an independent variable has not added
to the explanatory power of the model; R square has increased only marginally.
Engine displacement is no longer significant at the .05 level (p-value of .074), and
the estimated regression slope coefficient on the number of cylinders is not
significantly different from zero. This is due to the strong correlation that exists
between cubic inches of engine displacement and the number of cylinders.
c. Horsepower as a function of weight, displacement and fuel mileage
Regression Analysis: horspwr versus weight, displace, milpgal
The regression equation is
horspwr = 93.6 + 0.00203 weight + 0.165 displace - 1.24 milpgal
150 cases used 5 cases contain missing values
Predictor      Coef   SE Coef      T      P  VIF
Constant      93.57     15.33   6.11  0.000
weight     0.002031  0.004879   0.42  0.678  8.3
displace    0.16475   0.03475   4.74  0.000  6.1
milpgal     -1.2392    0.2474  -5.01  0.000  3.1

S = 12.55    R-Sq = 74.2%    R-Sq(adj) = 73.6%

Analysis of Variance
Source           DF     SS     MS       F      P
Regression        3  66042  22014  139.77  0.000
Residual Error  146  22994    157
Total           149  89036
All else equal, a 100 pound increase in the weight of the car is associated with a
.203 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.6475 increase in the horsepower of the auto.
All else equal, an increase in the fuel mileage of the vehicle by 1 mile per gallon is
associated with a reduction in horsepower of 1.2392.
Note that the negative coefficient on fuel mileage indicates the trade-off that is
expected between horsepower and fuel mileage. The displacement variable is
significantly positive, as expected, however, the weight variable is no longer
significant. Again, one would expect high correlation among the independent
variables.
d. Horsepower as a function of weight, displacement, mpg and price
Regression Analysis: horspwr versus weight, displace, milpgal, price
The regression equation is
horspwr = 98.1 - 0.00032 weight + 0.175 displace - 1.32 milpgal
+0.000138 price
150 cases used 5 cases contain missing values
Predictor       Coef    SE Coef      T      P   VIF
Constant       98.14      16.05   6.11  0.000
weight     -0.000324   0.005462  -0.06  0.953  10.3
displace     0.17533    0.03647   4.81  0.000   6.8
milpgal      -1.3194     0.2613  -5.05  0.000   3.5
price      0.0001379  0.0001438   0.96  0.339   1.3

S = 12.55    R-Sq = 74.3%    R-Sq(adj) = 73.6%

Analysis of Variance
Source           DF     SS     MS       F      P
Regression        4  66187  16547  105.00  0.000
Residual Error  145  22849    158
Total           149  89036
e. Explanatory power has marginally increased from the first model to the last.
The estimated coefficient on price is not significantly different from zero.
Displacement and fuel mileage have the expected signs. The coefficient on
weight has the wrong sign; however, it is not significantly different from zero
(p-value of .953).
13.15
13.16
13.17
13.18
13.19 a. R^2 = SSR/SST = 3.549/3.881 = .9145; therefore, 91.45% of the variability
in work-hours of design effort can be explained by the variation in the plane's top
speed, weight, and percentage of parts in common with other models.
b. SSE = 3.881 - 3.549 = .332
c. Adjusted R^2 = 1 - [.332/(27 - 4)] / [3.881/26] = .9033
d. R = sqrt(.9145) = .9563. This is the sample correlation between observed and
predicted values of the design effort.
13.20 a. R^2 = 88.2/162.1 = .5441; therefore, 54.41% of the variability in milk
consumption can be explained by the variations in weekly income and family size.
b. Adjusted R^2 = 1 - [73.9/(30 - 3)] / [162.1/29] = .5103
c. R = sqrt(.5441) = .7376. This is the sample correlation between observed and
predicted values of milk consumption.
13.21 a. R^2 = 79.2/(79.2 + 45.9) = .6331; therefore, 63.31% of the variability in
weight gain can be explained by the variations in the average number of meals eaten,
number of hours exercised, and number of beers consumed weekly.
b. Adjusted R^2 = 1 - [45.9/(25 - 4)] / [125.1/24] = .5807
c. R = sqrt(.6331) = .7957. This is the sample correlation between observed and
predicted values of weight gained.
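A minimal Python sketch of these goodness-of-fit computations, using the work-hours example's sums of squares (K = 3 predictors is inferred from the 27 - 4 error degrees of freedom):

SSR, SST, n, K = 3.549, 3.881, 27, 3
SSE = SST - SSR                                      # .332
R2 = SSR / SST                                       # .9145
R2_adj = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))   # .9033
R = R2 ** 0.5                                        # .9563
print(R2, SSE, R2_adj, R)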
13.22
a.
Regression Analysis: Y profit versus X2 offices
The regression equation is
Y profit = 1.55 -0.000120 X2 offices
Predictor         Coef     SE Coef      T      P
Constant        1.5460      0.1048  14.75  0.000
X2 offi    -0.00012033  0.00001434  -8.39  0.000

S = 0.07049    R-Sq = 75.4%    R-Sq(adj) = 74.3%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.34973  0.34973  70.38  0.000
Residual Error  23  0.11429  0.00497
Total           24  0.46402
b.
Regression Analysis: X1 revenue versus X2 offices
The regression equation is
X1 revenue = - 0.078 + 0.000543 X2 offices

Predictor        Coef     SE Coef      T      P
Constant      -0.0781      0.2975  -0.26  0.795
X2 offi    0.00054280  0.00004070  13.34  0.000

S = 0.2000    R-Sq = 88.5%    R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  7.1166  7.1166  177.84  0.000
Residual Error  23  0.9204  0.0400
Total           24  8.0370
c.
Regression Analysis: Y profit versus X1 revenue
The regression equation is
Y profit = 1.33 - 0.169 X1 revenue
Predictor      Coef  SE Coef      T      P
Constant     1.3262   0.1386   9.57  0.000
X1 reven   -0.16913  0.03559  -4.75  0.000

S = 0.1009    R-Sq = 49.5%    R-Sq(adj) = 47.4%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.22990  0.22990  22.59  0.000
Residual Error  23  0.23412  0.01018
Total           24  0.46402
d.
Regression Analysis: X2 offices versus X1 revenue
The regression equation is
X2 offices = 957 + 1631 X1 revenue
Predictor    Coef  SE Coef      T      P
Constant    956.9    476.5   2.01  0.057
X1 reven   1631.3    122.3  13.34  0.000

S = 346.8    R-Sq = 88.5%    R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF        SS        MS       F      P
Regression       1  21388013  21388013  177.84  0.000
Residual Error  23   2766147    120267
Total           24  24154159
13.23
Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients, bj +/- t(n-K-1, alpha/2) s_bj:
95% CI for x1: 4.8 +/- 2.086(2.1) = .4194 up to 9.1806
95% CI for x2: 6.9 +/- 2.086(3.7) = -.8182 up to 14.6182
95% CI for x3: -7.2 +/- 2.086(2.8) = -13.0408 up to -1.3592
b. Test the hypotheses H0: βj = 0, H1: βj > 0
For x1: t = 4.8/2.1 = 2.286; t20,.05/.01 = 1.725, 2.528
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 6.9/3.7 = 1.865; t20,.05/.01 = 1.725, 2.528
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -7.2/2.8 = -2.571; t20,.05/.01 = 1.725, 2.528
Therefore, do not reject H0 at either level
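A minimal Python sketch of the interval and test arithmetic in 13.23, assuming b = 4.8, a standard error of 2.1, and n - K - 1 = 20 error degrees of freedom:

from scipy import stats

b, se, df = 4.8, 2.1, 20
t_crit = stats.t.ppf(0.975, df)          # 2.086 for a two-sided 95% CI
print(b - t_crit * se, b + t_crit * se)  # .4194 up to 9.1806

t_stat = b / se                          # 2.286
p_one_sided = 1 - stats.t.cdf(t_stat, df)
print(t_stat, p_one_sided)               # compare to alpha = .05 and .01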
13.24
Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients, bj +/- t(n-K-1, alpha/2) s_bj:
95% CI for x1: 6.8 +/- 2.042(3.1) = .4698 up to 13.1302
95% CI for x2: 6.9 +/- 2.042(3.7) = -.6554 up to 14.4554
95% CI for x3: -7.2 +/- 2.042(3.2) = -13.7344 up to -.6656
b. Test the hypotheses H0: βj = 0, H1: βj > 0
For x1: t = 6.8/3.1 = 2.194; t30,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 6.9/3.7 = 1.865; t30,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -7.2/3.2 = -2.25; t30,.05/.01 = 1.697, 2.457
Therefore, do not reject H0 at either level
13.25
Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients, bj +/- t(n-K-1, alpha/2) s_bj:
95% CI for x1: 34.8 +/- 2.000(12.1) = 10.60 up to 59.00
95% CI for x2: 56.9 +/- 2.000(23.7) = 9.50 up to 104.30
95% CI for x3: -57.2 +/- 2.000(32.8) = -122.80 up to 8.40
b. Test the hypotheses H0: βj = 0, H1: βj > 0
For x1: t = 34.8/12.1 = 2.876; t60,.05/.01 = 1.671, 2.390
Therefore, reject H0 at both the 5% and the 1% levels
For x2: t = 56.9/23.7 = 2.401; t60,.05/.01 = 1.671, 2.390
Therefore, reject H0 at both the 5% and the 1% levels
For x3: t = -57.2/32.8 = -1.744; t60,.05/.01 = 1.671, 2.390
Therefore, do not reject H0 at either level
13.26
Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients, bj +/- t(n-K-1, alpha/2) s_bj:
95% CI for x1: 17.8 +/- 2.042(7.1) = 3.3018 up to 32.2982
95% CI for x2: 26.9 +/- 2.042(13.7) = -1.0754 up to 54.8754
95% CI for x3: -9.2 +/- 2.042(3.8) = -16.9596 up to -1.4404
b. Test the hypotheses H0: βj = 0, H1: βj > 0
For x1: t = 17.8/7.1 = 2.507; t35,.05/.01 = 1.697, 2.457
Therefore, reject H0 at both the 5% and the 1% levels
For x2: t = 26.9/13.7 = 1.964; t35,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -9.2/3.8 = -2.421; t35,.05/.01 = 1.697, 2.457
Therefore, do not reject H0 at either level
13.27
a. H0: β1 = 0; H1: β1 > 0
t = .052/.023 = 2.26; t27,.025/.01 = 2.052, 2.473
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. t27,.05/.025/.005 = 1.703, 2.052, 2.771
90% CI: 1.14 +/- 1.703(.35) = .5439 up to 1.7361
95% CI: 1.14 +/- 2.052(.35) = .4218 up to 1.8582
99% CI: 1.14 +/- 2.771(.35) = .1701 up to 2.1099
13.29
a. H0: β2 = 0; H1: β2 < 0
t = -1.345/.565 = -2.381; -t21,.025/.01 = -2.080, -2.518
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. H0: β3 = 0; H1: β3 > 0
t = .613/.243 = 2.523; t21,.01/.005 = 2.518, 2.831
Therefore, reject H0 at the 1% level but not at the .5% level
c. t21,.05/.025/.005 = 1.721, 2.080, 2.831
90% CI: .653 +/- 1.721(.189) = .3277 up to .9783
95% CI: .653 +/- 2.080(.189) = .2599 up to 1.0461
99% CI: .653 +/- 2.831(.189) = .1179 up to 1.1881
13.30 a. H0: β3 = 0, H1: β3 < 0
t = -.000191/.000446 = -.428; -t16,.10 = -1.337
Therefore, do not reject H0 at the 20% level
b. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n-K-1)/K] * R^2/(1-R^2) = (16/3)(.71/(1 - .71)) = 13.057; F3,16,.01 = 5.29
Therefore, reject H0 at the 1% level
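A minimal Python sketch of this overall F test from R-squared, with the exercise's R^2 = .71, K = 3 predictors, and 16 error degrees of freedom:

from scipy import stats

R2, K, df_err = 0.71, 3, 16
F = (df_err / K) * R2 / (1 - R2)         # 13.057
F_crit = stats.f.ppf(0.99, K, df_err)    # 5.29 at the 1% level
print(F, F_crit, F > F_crit)             # reject H0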
13.31 a. t85,.025/.005 = 2.000, 2.660
95% CI: 7.878 +/- 2.000(1.809) = 4.260 up to 11.496
99% CI: 7.878 +/- 2.660(1.809) = 3.0661 up to 12.6899
b. H0: β2 = 0; H1: β2 > 0; t = .003666/.001344 = 2.73; t85,.005 = 2.660
Therefore, reject H0 at the .5% level
13.32
a. All else being equal, an extra $1 in mean per capita personal income leads
to an expected extra $.04 of net revenue per capita from the lottery.
b. b2 = .8772, s_b2 = .3107, n = 29, t24,.025 = 2.064
95% CI: .8772 +/- 2.064(.3107) = .2359 up to 1.5185
c. H0: β3 = 0, H1: β3 < 0
t = -365.01/263.88 = -1.383; -t24,.10/.05 = -1.318, -1.711
Therefore, reject H0 at the 10% level but not at the 5% level
The completed analysis of variance tables for the regressions of Exercises
13.19-13.21 are:

Design effort (13.19):
Source      Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression           3.549                   3         1.183   81.955
Error                 .332                  23       .014435
Total                3.881                  26

Milk consumption (13.20):
Source      Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression            88.2                   2          44.1   16.113
Error                 73.9                  27        2.7370
Total                162.1                  29

Weight gain (13.21):
Source      Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression            79.2                   3          26.4   12.078
Error                 45.9                  21      2.185714
Total                125.1                  24
13.42 H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
Since R^2 = SSR/SST and 1 - R^2 = SSE/SST,
F = [SSR/K] / [SSE/(n-K-1)] = [(n-K-1)/K] * R^2/(1-R^2)
F = (24/4)(.51/(1 - .51)) = 6.2449; F4,24,.01 = 4.22. Therefore,
reject H0 at the 1% level
13.43 a. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
Since R^2 = SSR/SST and 1 - R^2 = SSE/SST,
F = [SSR/K] / [SSE/(n-K-1)] = [(n-K-1)/K] * R^2/(1-R^2)
F = (15/3)(.84/(1 - .84)) = 26.25; F3,15,.01 = 5.42. Therefore, reject
H0 at the 1% level
13.44 a. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n-K-1)/K] * R^2/(1-R^2), where the R^2 implied by the adjusted R^2 of
.96 is R^2 = [(n-K-1)(adjusted R^2) + K]/(n-1) (see 13.47b), so that
F = [16(.96) + 2] / [2(1 - .96)] = 217; F2,16,.01 = 6.23
Therefore, reject H0 at the 1% level
13.46 For testing the k1 additional variables, divide the numerator and denominator
of the F statistic by SST, where R^2* and SSE* come from the restricted model:
F = [(SSE* - SSE)/k1] / [SSE/(n-K-1)]
  = [(n-K-1)/k1] * [(1 - R^2*) - (1 - R^2)] / (1 - R^2)
  = [(n-K-1)/k1] * (R^2 - R^2*) / (1 - R^2)
13.47
a. Adjusted R^2 = 1 - [SSE/(n-K-1)] / [SST/(n-1)]
   = 1 - [(n-1)/(n-K-1)](1 - R^2) = [(n-1)R^2 - K] / (n-K-1)
b. Since adjusted R^2 = [(n-1)R^2 - K]/(n-K-1), solving for R^2 gives
   R^2 = [(n-K-1)(adjusted R^2) + K]/(n-1)
c. F = [SSR/K] / [SSE/(n-K-1)] = [(n-K-1)/K] * [SSR/SST] / [SSE/SST]
   = [(n-K-1)/K] * R^2/(1-R^2)
Substituting the expression from part b:
F = [(n-K-1)/K] * {[(n-K-1)(adjusted R^2) + K]/(n-1)} / {[n-1-(n-K-1)(adjusted R^2) - K]/(n-1)}
  = [(n-K-1)(adjusted R^2) + K] / [K(1 - adjusted R^2)]
since n - 1 - (n-K-1)(adjusted R^2) - K = (n-K-1)(1 - adjusted R^2).
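A minimal numeric check of the 13.47 identities in Python, with arbitrary assumed values of n, K, and R^2:

n, K, R2 = 30, 4, 0.60
R2_adj = ((n - 1) * R2 - K) / (n - K - 1)                      # part (a)
R2_back = ((n - K - 1) * R2_adj + K) / (n - 1)                 # part (b) inverts (a)
F_from_R2 = (n - K - 1) / K * R2 / (1 - R2)
F_from_adj = ((n - K - 1) * R2_adj + K) / (K * (1 - R2_adj))   # part (c)
print(R2_back, F_from_R2, F_from_adj)  # R2_back equals R2; the two F's agree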
13.58 There are many possible answers. Relationships that can be approximated by a
nonlinear quadratic model include many supply, production, and cost functions, such
as average cost versus the number of units produced.
13.59 To estimate the function with linear least squares, solve the constraint
β1 + β2 = 2 for β2. Since β2 = 2 - β1, plug into the equation and algebraically
manipulate:
Y = β0 + β1 X1 + (2 - β1) X2 + β3 X3
Y = β0 + β1 X1 + 2 X2 - β1 X2 + β3 X3
Y = β0 + β1 (X1 - X2) + 2 X2 + β3 X3
Y - 2 X2 = β0 + β1 (X1 - X2) + β3 X3
Conduct the variable transformations (Y - 2 X2 and X1 - X2) and estimate the model
using least squares; β2 is then recovered as 2 - b1.
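A minimal Python sketch of this transformation, with simulated data in which the true coefficients satisfy β1 + β2 = 2 (all names and numbers here are hypothetical):

import numpy as np

rng = np.random.default_rng(1)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 1.2 * x2 + 0.5 * x3 + rng.normal(0, 0.1, n)

# Transform: y - 2*x2 = b0 + b1*(x1 - x2) + b3*x3
y_star = y - 2 * x2
Z = np.column_stack([np.ones(n), x1 - x2, x3])
b0, b1, b3 = np.linalg.lstsq(Z, y_star, rcond=None)[0]
b2 = 2 - b1                    # recover the constrained coefficient
print(b0, b1, b2, b3)          # b1 + b2 = 2 by construction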
13.60
13.61
13.62
a. All else equal, a 1% increase in the price of beef will be associated with a
decrease of .529% in the tons of beef consumed annually in the U.S.
b. All else equal, a 1% increase in the price of pork will be associated with an
increase of .217% in the tons of beef consumed annually in the U.S.
c. H0: β4 = 0, H1: β4 < 0; t = -.416/.163 = -2.552; -t25,.01 = -2.485
Therefore, reject H0 at the 1% level
d. H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = [(n-K-1)/K] * R^2/(1-R^2) = (25/4)(.683/(1 - .683)) = 13.47; F4,25,.01 = 4.18
Therefore, reject H0 at the 1% level
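The elasticity interpretations in parts a and b come from a log-log (constant-elasticity) specification. A minimal Python sketch with simulated data (the actual beef consumption data are not reproduced here, and all names and numbers are hypothetical):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 30
price = rng.uniform(1, 10, n)
qty = 50 * price ** (-0.529) * np.exp(rng.normal(0, 0.05, n))

fit = sm.OLS(np.log(qty), sm.add_constant(np.log(price))).fit()
print(fit.params[1])  # near -.529: a 1% price rise -> about a .529% quantity drop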
13.64 Linear model:
Regression Plot: Salary = 20544.5 + 616.113 Experience
S = 3117.89    R-Sq = 78.0%    R-Sq(adj) = 77.9%

Quadratic model:
Regression Plot: Salary = 18683.8 + 910.807 Experience - 8.21382 Experience**2
S = 3027.17    R-Sq = 79.4%    R-Sq(adj) = 79.1%

Cubic model:
Regression Plot: Salary = 20881.1 + 344.484 Experience + 26.4323 Experience**2
- 0.582553 Experience**3
S = 2982.43    R-Sq = 80.2%    R-Sq(adj) = 79.8%

(Each plot shows Salary, 20000 to 50000, against Experience, 0 to 40, with the
fitted curve.)
All three of the models appear to fit the data well. The cubic model appears to fit
the data the best as the standard error of the estimate is lowest. In addition,
explanatory power is marginally higher for the cubic model than the other models.
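A minimal Python sketch of this comparison, fitting linear, quadratic, and cubic polynomials and comparing S and R-squared; the data are simulated, since the Salary/Experience series itself is not reproduced here:

import numpy as np

rng = np.random.default_rng(3)
exper = rng.uniform(0, 40, 100)
salary = 20000 + 900 * exper - 8 * exper**2 + rng.normal(0, 3000, 100)

for deg in (1, 2, 3):
    coefs = np.polyfit(exper, salary, deg)
    resid = salary - np.polyval(coefs, exper)
    sse = np.sum(resid**2)
    s = np.sqrt(sse / (len(exper) - deg - 1))  # standard error of the estimate
    r2 = 1 - sse / np.sum((salary - salary.mean())**2)
    print(deg, round(s, 1), round(r2, 3))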
13.66
Results for: GermanImports.xls
Regression Analysis: LogYt versus LogX1t, LogX2t
The regression equation is
LogYt = - 4.07 + 1.36 LogX1t + 0.101 LogX2t

Predictor     Coef  SE Coef       T      P  VIF
Constant   -4.0709   0.3100  -13.13  0.000
LogX1t     1.35935  0.03005   45.23  0.000  4.9
LogX2t     0.10094  0.05715    1.77  0.088  4.9

S = 0.04758    R-Sq = 99.7%    R-Sq(adj) = 99.7%

Analysis of Variance
Source          DF      SS      MS        F      P
Regression       2  21.345  10.673  4715.32  0.000
Residual Error  28   0.063   0.002
Total           30  21.409

Source  DF  Seq SS
LogX1t   1  21.338
LogX2t   1   0.007

13.67 What is the model constant when the dummy variable equals 1?
a. y = 7 + 8x1; b0 = 7
b. y = 12 + 6x1; b0 = 12
c. y = 7 + 12x1; b0 = 7
13.68 What is the model constant when the dummy variable equals 1?
a. y = 5.78 + 4.87x1
b. y = 1.15 + 9.51x1
c. y = 13.67 + 8.98x1
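A minimal Python sketch of the idea behind 13.67 and 13.68: with a 0/1 dummy x2, the effective intercept when x2 = 1 is b0 + b2. The numbers below are illustrative (chosen so the shifted constant is 7, as in 13.67a); the exercises' own b0 and b2 values are not shown here:

b0, b1, b2 = 4, 8, 3                  # hypothetical fitted coefficients
print("x2 = 0 constant:", b0)         # 4
print("x2 = 1 constant:", b0 + b2)    # 7, so y = 7 + 8*x1 when the dummy is on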
13.69 The interpretation of the dummy variable coefficient is that, for a given
difference between the spot price in the current year and the OPEC price in the
previous year, the difference between the OPEC price in the current year and the
OPEC price in the previous year is estimated to be $5.22 higher in 1974, during
the oil embargo, than in other years.
13.70
a. All else being equal, expected selling price is higher by $3,219 if condo has a
fireplace.
b. All else being equal, expected selling price is higher by $2,005 if condo has
brick siding.
c. 95% CI: 3219 +/- 1.96(947) = $1,362.88 up to $5,075.12
d. H0: β5 = 0, H1: β5 > 0; t = 2005/768 = 2.611; t809,.005 = 2.576
Therefore, reject H0 at the .5% level
13.71
a. All else being equal, the price-earnings ratio is higher by 1.23 for a regional
company than for a national company.
b. H0: β2 = 0, H1: β2 ≠ 0; t = 1.23/.496 = 2.48; t29,.01/.005 = 2.462, 2.756
Therefore, reject H0 at the 2% level but not at the 1% level
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n-K-1)/K] * R^2/(1-R^2) = (29/2)(.37/(1 - .37)) = 8.52; F2,29,.01 = 5.42
Therefore, reject H0 at the 1% level
13.72 35.6% of the variation in overall performance in law school can be explained by
the variation in undergraduate GPA, scores on the LSAT, and whether the student's
letters of recommendation are unusually strong. The overall model is significant,
since we can reject the null hypothesis that the model has no explanatory power in
favor of the alternative hypothesis that the model has significant explanatory
power. The individual regression coefficients that are significantly different from
zero include the scores on the LSAT and whether the student's letters of
recommendation were unusually strong. The coefficient on undergraduate GPA was
not found to be significant at the 5% level.
13.73
a. All else equal, the annual salary of the attorney general who can be removed is
$5,793 higher than if the attorney general cannot be removed.
b. All else equal, the annual salary of the attorney general of the state is $3,100
lower if the supreme court justices are elected on partisan ballots.
c. H0: β5 = 0, H1: β5 > 0; t = 5793/2897 = 1.9996; t43,.05/.025 = 1.68, 2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level
d. H0: β6 = 0, H1: β6 < 0; t = -3100/1761 = -1.76; -t43,.05/.025 = -1.68, -2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level
e. t43,.025 = 2.016
95% CI: 547 +/- 2.016(124.3) = 296.41 up to 797.59
13.74
a. All else equal, the average rating of a course is 6.21 units higher if a visiting
lecturer is brought in than if otherwise.
b. H0: β4 = 0, H1: β4 > 0; t = 6.21/3.59 = 1.73; t20,.05 = 1.725
Therefore, reject H0 at the 5% level
c. 56.9% of the variation in the average course rating can be explained by the
variation in the percentage of time spent in group discussions, the dollars spent
on preparing the course materials, the dollars spent on food and drinks, and
whether a guest lecturer is brought in.
H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = [(n-K-1)/K] * R^2/(1-R^2) = (20/4)(.569/(1 - .569)) = 6.6; F4,20,.01 = 4.43
Therefore, reject H0 at the 1% level
d. t20,.025 = 2.086
95% CI: .52 +/- 2.086(.21) = .0819 up to .9581
13.75 34.4% of the variation in a test on understanding college economics can be
explained by which course was taken, the student's GPA, the teacher who taught the
course, the gender of the student, the pre-test score, the number of credit hours
completed, and the age of the student. The regression model has significant
explanatory power:
H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 1, ..., 7)
F = [(n-K-1)/K] * R^2/(1-R^2) = (342/7)(.344/(1 - .344)) = 25.62, which far
exceeds any common critical value of F, so reject H0.
13.76
Results for: Student Performance.xls
Regression Analysis: Y versus X1, X2, X3, X4, X5
The regression equation is
Y = 2.00 + 0.0099 X1 + 0.0763 X2 - 0.137 X3 + 0.064 X4 + 0.138 X5
Predictor      Coef  SE Coef      T      P  VIF
Constant      1.997    1.273   1.57  0.132
X1          0.00990  0.01654   0.60  0.556  1.3
X2          0.07629  0.05654   1.35  0.192  1.2
X3         -0.13652  0.06922  -1.97  0.062  1.1
X4           0.0636   0.2606   0.24  0.810  1.4
X5          0.13794  0.07521   1.83  0.081  1.1

S = 0.5416    R-Sq = 26.5%    R-Sq(adj) = 9.0%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       5  2.2165  0.4433  1.51  0.229
Residual Error  21  6.1598  0.2933
Total           26  8.3763
The model is not significant (p-value of the F-test = .229). The model explains only
26.5% of the variation in GPA with the hours spent studying, hours spent preparing
for tests, hours spent in bars, whether or not students take notes or mark highlights
when reading texts, and the average number of credit hours taken per semester. The
only independent variables that are marginally significant (10% level but not the 5%
level) are the number of hours spent in bars and the average number of credit hours.
The other independent variables are not significant at common levels of alpha.
13.77
a. Begin the analysis with the correlation matrix to identify important independent
variables as well as correlations among the independent variables.

Correlations: Salary, Experience, yearsenior, Gender_1F (p-values beneath the
correlations; only part of the matrix was preserved)

yearseni-Experien:  0.674 (0.000)
Gender_1F:  -0.429 with Salary, -0.378 with Experience, -0.292 with yearsenior
            (all p-values 0.000)

The regression of Salary on Experience, yearsenior, and Gender_1F gives (the
coefficient table was not preserved):

R-Sq = 84.9%    R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        3  5559163505  1853054502  273.54  0.000
Residual Error  146   989063178     6774405
Total           149  6548226683

84.9% of the variation in annual salary (in dollars) can be explained by the variation
in the years of experience, the years of seniority, and the gender of the employee. All
of the variables are significant at the .01 level of significance (p-values of .000, .000,
and .006 respectively). The F-test of the significance of the overall model shows
that we reject H0, that all of the slope coefficients are jointly equal to zero, in favor
of H1, that at least one slope coefficient is not equal to zero; the F-test yielded a
p-value of .000.
b. H0: β3 = 0, H1: β3 < 0
t = -1443.2/519.8 = -2.78; -t146,.01 = -2.326
Therefore, reject H0 at the 1% level and conclude that the annual salaries for
females are statistically significantly lower than they are for males.
c. Add an interaction term and test for the significance of the slope coefficient on
the interaction term.
13.78 Two variables are included as predictor variables. What is the effect on the
estimated slope coefficients when these two variables have a correlation equal to:
a. .78. A large correlation between the independent variables will lead to a high
variance for the estimated slope coefficients and will tend to produce a small
Student's t statistic. Use the rule of thumb |r| > 2/sqrt(n) to determine whether
the correlation is large.
b. .08. Essentially no correlation exists between the independent variables, so
there is no effect on the estimated slope coefficients.
c. .94. A large correlation between the independent variables will lead to a high
variance for the estimated slope coefficients and will tend to produce a small
Student's t statistic.
d. .33. Use the rule of thumb |r| > 2/sqrt(n) to determine whether the correlation
is large.
13.79 n = 34 with four independent variables; the correlation between one
independent variable and the dependent variable is .23. Does this imply that this
independent variable will have a very small Student's t statistic?
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.
13.80 n = 47 with three independent variables. One of the independent variables has a
correlation of .95 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.
13.81 n = 49 with two independent variables. One of the independent variables has a
correlation of .56 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.
13.82 through 13.84: Reports can be written by following the extended Case Study
on the data file Cotton; see Section 13.9.
13.85
Regression Analysis: y_deathrate versus x1_totmiles, x2_avgspeed
The regression equation is
y_deathrate = - 2.97 - 0.00447 x1_totmiles + 0.219 x2_avgspeed
Predictor       Coef   SE Coef      T      P   VIF
Constant      -2.969     3.437  -0.86  0.416
x1_totmi   -0.004470  0.001549  -2.89  0.023  11.7
x2_avgsp     0.21879   0.08391   2.61  0.035  11.7

S = 0.1756    R-Sq = 55.1%    R-Sq(adj) = 42.3%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       2  0.26507  0.13254  4.30  0.061
Residual Error   7  0.21593  0.03085
Total            9  0.48100
55.1% of the variation in death rates can be explained by the variation in total
miles traveled and in average travel speed. The overall model is significant at the
10% but not the 5% level since the p-value of the F-test is .061.
All else equal, the average speed variable has the expected sign since as average
speed increases, the death rate also increases. The total miles traveled variable is
negative which indicates that the more miles traveled, the lower the death rate.
Both of the independent variables are significant at the 5% level (p-values of .023
and .035 respectively). There appears to be some correlation between the
independent variables.
A quadratic model in total miles (x1_totmiles and its square, x1_totsq) was also
fit; the equation and the coefficient column were not preserved:

Predictor      SE Coef      T      P    VIF
Constant         1.296  -5.04  0.001
x1_totmi      0.002835   9.45  0.000  285.5
x1_totsq    0.00000153  -9.68  0.000  285.5

S = 0.06499    R-Sq = 93.9%    R-Sq(adj) = 92.1%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       2  0.45143  0.22572  53.44  0.000
Residual Error   7  0.02957  0.00422
Total            9  0.48100

Source    DF   Seq SS
x1_totmi   1  0.05534
x1_totsq   1  0.39609
13.86
Regression Analysis: y_FemaleLFPR versus x1_income, x2_yrsedu, ...
The regression equation is
y_FemaleLFPR = 0.2 + 0.000406 x1_income + 4.84 x2_yrsedu - 1.55 x3_femaleun

Predictor       Coef    SE Coef      T      P  VIF
Constant        0.16      34.91   0.00  0.996
x1_incom   0.0004060  0.0001736   2.34  0.024  1.2
x2_yrsed       4.842      2.813   1.72  0.092  1.5
x3_femal     -1.5543     0.3399  -4.57  0.000  1.3

S = 3.048    R-Sq = 54.3%    R-Sq(adj) = 51.4%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  508.35  169.45  18.24  0.000
Residual Error  46  427.22    9.29
Total           49  935.57
13.87
Regression Analysis: y_money versus x1_pcincome, x2_ir
The regression equation is
y_money = - 1158 + 0.253 x1_pcincome - 19.6 x2_ir
Predictor      Coef  SE Coef      T      P  VIF
Constant    -1158.4    587.9  -1.97  0.080
x1_pcinc    0.25273  0.03453   7.32  0.000  1.3
x2_ir        -19.56    21.73  -0.90  0.391  1.3

S = 84.93    R-Sq = 89.8%    R-Sq(adj) = 87.5%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  570857  285429  39.57  0.000
Residual Error   9   64914    7213
Total           11  635771

Source    DF  Seq SS
x1_pcinc   1  565012
x2_ir      1    5845
13.88
Regression Analysis: y_manufgrowt versus x1_aggrowth, x2_exportgro, ...
The regression equation is
y_manufgrowth = 2.15 + 0.493 x1_aggrowth + 0.270 x2_exportgrowth
- 0.117 x3_inflation
Predictor      Coef  SE Coef      T      P  VIF
Constant     2.1505   0.9695   2.22  0.032
x1_aggro     0.4934   0.2020   2.44  0.019  1.0
x2_expor    0.26991  0.06494   4.16  0.000  1.0
x3_infla   -0.11709  0.05204  -2.25  0.030  1.0

S = 3.624    R-Sq = 39.3%    R-Sq(adj) = 35.1%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       3  373.98  124.66  9.49  0.000
Residual Error  44  577.97   13.14
Total           47  951.95

Source    DF  Seq SS
x1_aggro   1   80.47
x2_expor   1  227.02
x3_infla   1   66.50
13.89
The method of least squares regression yields estimators that are BLUE (Best
Linear Unbiased Estimators). This result holds when the assumptions regarding the
behavior of the error term are true. BLUE estimators are the most efficient (best)
estimators in the class of all linear unbiased estimators. The wide availability of
computing power implementing the method of least squares has dramatically
increased its use.
13.90 The analysis of variance table identifies how the total variability of the dependent
variable (SST) is split up between the portion of variability that is explained by the
regression model (SSR) and the part that is unexplained (SSE). The Coefficient of
Determination (R2) is derived as the ratio of SSR to SST. The analysis of variance
table also computes the F statistic for the test of the significance of the overall
regression, i.e., whether all of the slope coefficients are jointly equal to zero. The
associated p-value is also generally reported in this table.
13.91
a. False. If the regression model does not explain a large enough portion of the
variability of the dependent variable, then the error sum of squares can be larger
than the regression sum of squares.
b. False. The sum of several simple linear regressions will not equal a multiple
regression, since the assumption of all else equal is violated in the simple
linear regressions. The multiple regression holds all else equal in calculating the
partial effect that a change in one of the independent variables has on the
dependent variable.
c. True
d. False. While the regular coefficient of determination (R^2) cannot be negative,
the adjusted coefficient of determination can become negative. If the
independent variables added to a regression equation have very little
explanatory power, the loss of degrees of freedom may more than offset the
added explanatory power.
e. True
13.92 If one model contains more explanatory variables, then SST remains the same for
both models but SSR will be higher for the model with more explanatory variables.
Since SST = SSR1 + SSE1 which is equivalent to SSR2 + SSE2 and given that SSR2 >
SSR1, then SSE1 > SSE2. Hence, the coefficient of determination will be higher with
a greater number of explanatory variables and the coefficient of determination must
be interpreted in conjunction with whether or not the regression slope coefficients on
the explanatory variables are significantly different from zero.
13.93
13.94 Show that the residuals sum to zero:
Sum(e_i) = Sum(y_i - a - b1 x_1i - b2 x_2i)
Since a = ybar - b1 x1bar - b2 x2bar,
Sum(e_i) = Sum(y_i - ybar + b1 x1bar + b2 x2bar - b1 x_1i - b2 x_2i)
         = n ybar - n ybar + n b1 x1bar + n b2 x2bar - n b1 x1bar - n b2 x2bar
Sum(e_i) = 0
13.95
F = [(n-K-1)/K] * R^2/(1-R^2) = (62/7)(.766/(1 - .766)) = 28.99; F7,62,.01 = 2.79
Therefore, reject H0 at the 1% level
13.96
F = [(n-K-1)/K] * R^2/(1-R^2) = (27/2)(.637/(1 - .637)) = 23.69; F2,27,.01 = 5.49
Therefore, reject H0 at the 1% level
d. t27,.005 = 2.771; 99% CI: -1.8345 +/- 2.771(.6349) = -3.5938 up to -.0752
e. t = -1.78; -t27,.05/.025 = -1.703, -2.052
Therefore, reject H0 at the 5% level but not at the 2.5% level
13.97
a. All else equal, a 1% increase in course time spent in group discussion results
in an expected increase of .3817 in the average rating of the course. All else
equal, a dollar increase in money spent on the preparation of subject matter
materials results in an expected increase of .5172 in the average rating by
participants of the course. All else equal, a unit increase in expenditure on
non-course related materials results in an expected increase of .0753 in the
average rating of the course.
b. 57.9% of the variation in the average rating can be explained by the linear
relationship with the percentage of class time spent on discussion, money spent
on the preparation of subject matter materials, and money spent on non-class
related materials.
c. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n-K-1)/K] * R^2/(1-R^2) = (21/3)(.579/(1 - .579)) = 9.627; F3,21,.05 = 3.07
Therefore, reject H0 at the 5% level
d. t21,.05 = 1.721; 90% CI: .3817 +/- 1.721(.2018) = .0344 up to .729
e. t = 2.64; t21,.01/.005 = 2.518, 2.831
Therefore, reject H0 at the 1% level but not at the .5% level
13.98
Regression Analysis: y_rating versus x1_expgrade, x2_Numstudents
The regression equation is
y_rating = - 0.200 + 1.41 x1_expgrade - 0.0158 x2_Numstudents
Predictor       Coef   SE Coef      T      P  VIF
Constant     -0.2001    0.6968  -0.29  0.777
x1_expgr      1.4117    0.1780   7.93  0.000  1.5
x2_Numst   -0.015791  0.003783  -4.17  0.001  1.5

S = 0.1866    R-Sq = 91.5%    R-Sq(adj) = 90.5%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  6.3375  3.1687  90.99  0.000
Residual Error  17  0.5920  0.0348
Total           19  6.9295

13.99
F = [(n-K-1)/K] * R^2/(1-R^2) = 5.804; F4,55,.01 = 3.68
Therefore, reject H0 at the 1% level

13.100
a. All else equal, each extra point in the student's expected score leads to an
expected increase of .469 in the actual score.
b. t103,.025 = 1.98; 95% CI: 3.369 +/- 1.98(.456) = 2.4661 up to 4.2719
c. H0: β3 = 0, H1: β3 ≠ 0; t = 3.054/1.457 = 2.096; t103,.025 = 1.96
Therefore, reject H0 at the 5% level
d. 68.6% of the variation in the exam scores is explained by their linear
dependence on the student's expected score, hours per week spent working on
the course, and the student's grade point average.
e. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n-K-1)/K] * R^2/(1-R^2) = (103/3)(.686/(1 - .686)) = 75.0
Therefore, reject H0
f. R = sqrt(.686) = .82825
g. yhat = 2.178 + .469(80) + 3.369(8) + 3.054(3) = 75.812
13.101 a. t22,.005 = 2.819; 99% CI: .0974 +/- 2.819(.0215) = .0368 up to .1580
b. H0: β2 = 0, H1: β2 > 0; t = .374/.209 = 1.789; t22,.05/.025 = 1.717, 2.074
Therefore, reject H0 at the 5% level but not the 2.5% level
c. R^2 = [(n-K-1)(adjusted R^2) + K]/(n-1) = [22(.91) + 2]/24 = .9175
d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n-K-1)/K] * R^2/(1-R^2) = (22/2)(.9175/(1 - .9175)) = 122.3; F2,22,.01 = 5.72
Therefore, reject H0 at the 1% level
e. R = sqrt(.9175) = .9579
13.102 a. t2669,.05 = 1.645; 90% CI: 480.04 +/- 1.645(224.9) = 110.0795 up
to 850.0005
b. t2669,.005 = 2.576; 99% CI: 1350.3 +/- 2.576(212.3) = 803.4152 up
to 1897.1848
c. H0: β8 = 0, H1: β8 > 0; t = 891.67/180.87 = 4.9299
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
d. H0: β9 = 0, H1: β9 > 0; t = 722.95/110.98 = 6.5142
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
e. 52.39% of the variability in minutes played in the season can be explained by
the variability in all 9 variables.
f. R = sqrt(.5239) = .7238
13.103 a. H0: β1 = 0, H1: β1 > 0
t = .052/.019 = 2.737; t60,.005 = 2.66; therefore, reject H0 at the 1% level
b. H0: β2 = 0, H1: β2 ≠ 0; t = .005/.042 = .119
t60,.10 = 1.296; therefore, do not reject H0 at the 20% level
c. 17% of the variation in the growth rate in GDP can be explained by the
variations in real income per capita and the average tax rate, as a proportion of
GNP.
d. R = sqrt(.17) = .4123
13.104 A report can be written by following the Case Study and testing the
significance of the model; see Section 13.9.
13.105 a. Begin with the correlation matrix:

Correlations: EconGPA, SATverb, SATmath, HSPct (p-values beneath the
correlations; the SATmath-EconGPA entry was not preserved)

          EconGPA  SATverb
SATverb     0.427
            0.000
SATmath              0.353
                     0.003
HSPct       0.362    0.201
            0.000    0.121

          SATmath
HSPct       0.497
            0.000

The first regression, of EconGPA on SATverb, SATmath, and HSPct, was only
partially preserved; the surviving pieces are the VIFs (1.2, 1.5, 1.3), the overall
F-test p-value (0.000), and the sequential sums of squares:

Source   DF  Seq SS
SATverb   1  3.7516
SATmath   1  0.9809
HSPct     1  0.2846

The regression model indicates positive coefficients, as expected, for all three
independent variables. The greater the high school rank, and the higher the SAT
verbal and SAT math scores, the larger the Econ GPA. The high school rank
variable has the smallest t-statistic and is removed from the model:
Regression Analysis: EconGPA versus SATverb, SATmath
The regression equation is
EconGPA = 0.755 + 0.0230 SATverb + 0.0174 SATmath
67 cases used 45 cases contain missing values
Predictor      Coef   SE Coef     T      P  VIF
Constant     0.7547    0.4375  1.72  0.089
SATverb    0.022951  0.006832  3.36  0.001  1.1
SATmath    0.017387  0.006558  2.65  0.010  1.1

S = 0.4196    R-Sq = 30.5%    R-Sq(adj) = 28.3%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   4.9488  2.4744  14.05  0.000
Residual Error  64  11.2693  0.1761
Total           66  16.2181

Source   DF  Seq SS
SATverb   1  3.7109
SATmath   1  1.2379
Both SAT variables are now statistically significant at the .05 level and appear to
pick up separate influences on the dependent variable. The simple correlation
coefficient between SAT math and SAT verbal is relatively low at .353; thus,
multicollinearity will not be dominant in this regression model.
The final regression model, with conditional t-statistics in parentheses under the
coefficients, is:
Y = .755 + .023(SATverbal) + .0174(SATmath)
           (3.36)           (2.65)
S = .4196    R^2 = .305    n = 67
b. Start with the correlation matrix:

Correlations: EconGPA, Acteng, ACTmath, ACTss, ACTcomp, HSPct (p-values
beneath the correlations)

          EconGPA  Acteng  ACTmath  ACTss  ACTcomp
Acteng      0.387
            0.001
ACTmath     0.338   0.368
            0.003   0.001
ACTss       0.442   0.448    0.439
            0.000   0.000    0.000
ACTcomp     0.474   0.650    0.765  0.812
            0.000   0.000    0.000  0.000
HSPct       0.362   0.173    0.290  0.224    0.230
            0.000   0.150    0.014  0.060    0.053

The regression of EconGPA on all five variables was only partially preserved;
the overall F-test p-value is 0.000 and the sequential sums of squares are:

Source    DF  Seq SS
Acteng     1  3.5362
ACTmath    1  1.0529
ACTss      1  1.4379
ACTcomp    1  0.0001
HSPct      1  1.4983
The regression shows that only high school rank is significant at the .05 level. We
may suspect multicollinearity between the variables, particularly since there is a
total ACT score (ACT composite) as well as the components that make up the
ACT composite. Since conditional significance is dependent on which other
independent variables are included in the regression equation, drop one variable at
a time. ACTmath has the lowest t-statistic and is removed:
Regression Analysis: EconGPA versus Acteng, ACTss, ACTcomp,
HSPct
The regression equation is
EconGPA = - 0.195 + 0.0276 Acteng + 0.0224 ACTss + 0.0339 ACTcomp
+ 0.0127 HSPct
71 cases used 41 cases contain missing values
Predictor      Coef   SE Coef      T      P  VIF
Constant    -0.1946    0.6313  -0.31  0.759
Acteng      0.02756   0.02534   1.09  0.281  1.8
ACTss       0.02242   0.02255   0.99  0.324  3.0
ACTcomp     0.03391   0.04133   0.82  0.415  4.2
HSPct      0.012702  0.005009   2.54  0.014  1.1

S = 0.4996    R-Sq = 31.4%    R-Sq(adj) = 27.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   7.5239  1.8810  7.54  0.000
Residual Error  66  16.4706  0.2496
Total           70  23.9945

Source    DF  Seq SS
Acteng     1  3.5362
ACTss      1  2.1618
ACTcomp    1  0.2211
HSPct      1  1.6048
Again, high school rank is the only conditionally significant variable. ACTcomp
has the lowest t-statistic and is removed:
Regression Analysis: EconGPA versus Acteng, ACTss, HSPct
The regression equation is
EconGPA = 0.049 + 0.0390 Acteng + 0.0364 ACTss + 0.0129 HSPct
71 cases used 41 cases contain missing values
Predictor      Coef   SE Coef     T      P  VIF
Constant     0.0487    0.5560  0.09  0.930
Acteng      0.03897   0.02114  1.84  0.070  1.3
ACTss       0.03643   0.01470  2.48  0.016  1.3
HSPct      0.012896  0.004991  2.58  0.012  1.1

S = 0.4983    R-Sq = 30.7%    R-Sq(adj) = 27.6%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   7.3558  2.4519  9.87  0.000
Residual Error  67  16.6386  0.2483
Total           70  23.9945

Source   DF  Seq SS
Acteng    1  3.5362
ACTss     1  2.1618
HSPct     1  1.6579
56
Now ACTss and high school rank are conditionally significant. ACTenglish has a
t-statistic less than 2 and is removed:
The final regression, of EconGPA on ACTss and HSPct, was only partially
preserved (coefficient p-values 0.250, 0.001, and 0.009; VIFs 1.1 and 1.1):

S = 0.5070    R-Sq = 27.1%    R-Sq(adj) = 25.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   6.5123  3.2562  12.67  0.000
Residual Error  68  17.4821  0.2571
Total           70  23.9945

Source  DF  Seq SS
ACTss    1  4.6377
HSPct    1  1.8746
Both of the independent variables are statistically significant at the .05 level, and
hence the final regression model, with conditional t-statistics in parentheses under
the coefficients, is:
Y = .567 + .0479(ACTss) + .0137(HSPct)
           (3.53)        (2.70)
S = .5070    R^2 = .271    n = 71
c. The regression model with the SAT variables is the better predictor
because the standard error of the estimate is smaller than for the ACT model
(.4196 vs. .5070). The R2 measure cannot be directly compared due to the sample
size differences.
13.106
Correlations: Salary, age, Experien, yrs_asoc, yrs_full, Sex_1Fem, Market, C8
(p-values beneath the correlations)

          Salary     age  Experien  yrs_asoc  yrs_full  Sex_1Fem  Market
age        0.749
           0.000
Experien   0.883   0.877
           0.000   0.000
yrs_asoc   0.698   0.712     0.803
           0.000   0.000     0.000
yrs_full   0.777   0.583     0.674     0.312
           0.000   0.000     0.000     0.000
Sex_1Fem  -0.429  -0.234    -0.378    -0.367    -0.292
           0.000   0.004     0.000     0.000     0.000
Market     0.026  -0.134    -0.150    -0.113    -0.017     0.062
           0.750   0.103     0.067     0.169     0.833     0.453
C8        -0.029  -0.189    -0.117    -0.073    -0.043    -0.094  -0.107
           0.721   0.020     0.155     0.373     0.598     0.254   0.192
Regression Analysis: Salary versus age, Experien, ...
The regression equation is
Salary = 23725 - 40.3 age + 357 Experien + 263 yrs_asoc + 493 yrs_full
- 954 Sex_1Fem + 3427 Market + 1188 C8

Predictor    Coef  SE Coef      T      P   VIF
Constant    23725     1524  15.57  0.000
age        -40.29    44.98  -0.90  0.372   4.7
Experien   356.83    63.48   5.62  0.000  10.0
yrs_asoc   262.50    75.11   3.49  0.001   4.0
yrs_full   492.91    59.27   8.32  0.000   2.6
Sex_1Fem   -954.1    487.3  -1.96  0.052   1.3
Market     3427.2    754.1   4.54  0.000   1.1
C8         1188.4    597.5   1.99  0.049   1.1

S = 2332    R-Sq = 88.2%    R-Sq(adj) = 87.6%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        7  5776063882  825151983  151.74  0.000
Residual Error  142   772162801    5437766
Total           149  6548226683

Source    DF      Seq SS
age        1  3669210599
Experien   1  1459475287
yrs_asoc   1     1979334
yrs_full   1   500316356
Sex_1Fem   1    22707368
Market     1   100860164
Since age is insignificant and has the smallest t-statistic, it is removed from the
model. The conditional F test for age is:
F = (SSR_F - SSR_R) / s^2_{Y|X} = (5,776,064,000 - 5,771,700,736) / (2332)^2 = .80
which is well below any common critical value of F. Thus, age is removed from
the model. The remaining independent variables are all significant at the .05 level
of significance and hence become the final regression model. Residual analysis to
determine whether the assumption of linearity holds true follows:
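A minimal Python sketch of this conditional (partial) F test, using the sums of squares quoted above for dropping the age variable:

SSR_full, SSR_reduced = 5_776_064_000, 5_771_700_736
s2 = 2332 ** 2   # s^2 = squared standard error of the estimate, full model
q = 1            # number of restrictions (one dropped variable)
F = (SSR_full - SSR_reduced) / (q * s2)
print(F)         # about .80, well below any usual critical value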
(Residual plots: RESI1, ranging from -5000 to 10000, plotted against Experien,
yrs_asoc, yrs_full, Sex_1Fem, Market, and C8. A fragment of the reduced model's
output also survives here: VIFs of 6.7, 4.0, 2.6, 1.2, 1.1, and 1.1, and an overall
p-value of 0.000.)
The residual plot for Experience shows a relatively strong quadratic relationship
between Experience and Salary. Therefore, a new variable, taking into account the
quadratic relationship is generated and added to the model. None of the other
residual plots shows strong evidence of non-linearity.
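A minimal Python sketch of this residual check, plotting residuals against a predictor and then adding the squared term when the band is curved; the data are simulated and all numbers are hypothetical:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(5)
exper = rng.uniform(0, 40, 150)
salary = 19000 + 880 * exper - 16 * exper**2 + rng.normal(0, 1800, 150)

lin = sm.OLS(salary, sm.add_constant(exper)).fit()
plt.scatter(exper, lin.resid)              # a curved band signals nonlinearity
plt.xlabel("Experience"); plt.ylabel("Residual"); plt.show()

quad = sm.OLS(salary, sm.add_constant(np.column_stack([exper, exper**2]))).fit()
print(quad.params)                         # the squared term captures the curve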
Regression Analysis: Salary versus Experien, ExperSquared, ...
The regression equation is
Salary = 18915 + 875 Experien - 15.9 ExperSquared + 222 yrs_asoc + 612 yrs_full
- 650 Sex_1Fem + 3978 Market + 1042 C8

Predictor     Coef  SE Coef      T      P   VIF
Constant   18915.2    583.2  32.43  0.000
Experien    875.35    72.20  12.12  0.000  20.6
ExperSqu   -15.947    1.717  -9.29  0.000  16.2
yrs_asoc    221.58    59.40   3.73  0.000   4.0
yrs_full    612.10    48.63  12.59  0.000   2.8
Sex_1Fem    -650.1    379.6  -1.71  0.089   1.2
Market      3978.3    598.8   6.64  0.000   1.1
C8          1042.3    467.1   2.23  0.027   1.1

S = 1844    R-Sq = 92.6%    R-Sq(adj) = 92.3%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        7  6065189270  866455610  254.71  0.000
Residual Error  142   483037413    3401672
Total           149  6548226683

Source    DF      Seq SS
Experien   1  5109486518
ExperSqu   1    91663414
yrs_asoc   1    15948822
yrs_full   1   678958872
Sex_1Fem   1    12652358
Market     1   139540652
C8         1    16938635
The squared term for experience is statistically significant; however, Sex_1Fem
is no longer significant at the .05 level and hence is removed from the model:

Regression Analysis: Salary versus Experien, ExperSquared, ...
The regression equation is
Salary = 18538 + 888 Experien - 16.3 ExperSquared + 237 yrs_asoc + 624 yrs_full
+ 3982 Market + 1145 C8

Predictor     Coef  SE Coef      T      P   VIF
Constant   18537.8    543.6  34.10  0.000
Experien    887.85    72.32  12.28  0.000  20.4
ExperSqu   -16.275    1.718  -9.48  0.000  16.0
yrs_asoc    236.89    59.11   4.01  0.000   3.9
yrs_full    624.49    48.41  12.90  0.000   2.8
Market      3981.8    602.9   6.60  0.000   1.1
C8          1145.4    466.3   2.46  0.015   1.0

S = 1857    R-Sq = 92.5%    R-Sq(adj) = 92.2%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        6  6055213011  1009202168  292.72  0.000
Residual Error  143   493013673     3447648
Total           149  6548226683
This is the final model with all of the independent variables being conditionally
significant, including the quadratic transformation of Experience. This would
indicate that a non-linear relationship exists between experience and salary.
13.107
Correlations: hseval, Comper, Homper, Indper, sizehse, incom72 (p-values
beneath the correlations)

          hseval  Comper  Homper  Indper  sizehse
Comper    -0.335
           0.001
Homper     0.145  -0.499
           0.171   0.000
Indper    -0.086  -0.140  -0.564
           0.419   0.188   0.000
sizehse    0.542  -0.278   0.274  -0.245
           0.000   0.008   0.009   0.020
incom72    0.426  -0.198  -0.083   0.244   0.393
           0.000   0.062   0.438   0.020   0.000

The correlation matrix indicates that the size of the house, income, and percent
homeowners have a positive relationship with house value. There is a negative
relationship between the percent industrial and percent commercial and house
value.
The first regression, on all five independent variables, was not preserved apart
from its overall p-value of 0.000. All variables are conditionally significant with
the exception of Indper and Homper; since Homper has the smaller t-statistic, it is
removed:
Regression Analysis: hseval versus Comper, Indper, sizehse, incom72
The regression equation is
hseval = - 30.9 - 15.2 Comper - 5.73 Indper + 7.44 sizehse + 0.00418 incom72

Predictor      Coef   SE Coef      T      P  VIF
Constant     -30.88     11.07  -2.79  0.007
Comper      -15.211     7.126  -2.13  0.036  1.1
Indper       -5.735     6.194  -0.93  0.357  1.3
sizehse       7.439     2.154   3.45  0.001  1.5
incom72    0.004175  0.001569   2.66  0.009  1.4

S = 3.986    R-Sq = 38.2%    R-Sq(adj) = 35.3%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4   836.15  209.04  13.16  0.000
Residual Error  85  1350.48   15.89
Total           89  2186.63
Indper is still not conditionally significant and is removed; the final model
regresses hseval on Comper, sizehse, and incom72 (coefficient table not
preserved):

R-Sq = 37.6%    R-Sq(adj) = 35.4%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   822.53  274.18  17.29  0.000
Residual Error  86  1364.10   15.86
Total           89  2186.63
This becomes the final regression model. The selection of a community with the
objective of having larger house values would include communities where the
percent of commercial property is low, the median rooms per residence is high and
the per capita income is high.
13.108
a. Correlation matrix:

Correlations: deaths, vehwt, impcars, lghttrks, carage (p-values beneath the
correlations)

          deaths   vehwt  impcars  lghttrks
vehwt      0.244
           0.091
impcars   -0.284  -0.943
           0.048   0.000
lghttrks   0.726   0.157   -0.175
           0.000   0.282    0.228
carage    -0.422   0.123    0.011    -0.329
           0.003   0.400    0.943     0.021
Crash deaths are positively related to vehicle weight and percentage of light trucks
and negatively related to percent imported cars and car age. Light trucks will have
the strongest linear association of any independent variable followed by car age.
Multicollinearity is likely to exist due to the strong correlation between impcars
and vehicle weight.
b.
Regression Analysis: deaths versus vehwt, impcars, lghttrks, carage
The regression equation is
deaths = 2.60 + 0.000064 vehwt - 0.00121 impcars + 0.00833 lghttrks
- 0.0395 carage

Predictor       Coef    SE Coef      T      P   VIF
Constant       2.597      1.247   2.08  0.043
vehwt      0.0000643  0.0001908   0.34  0.738  10.9
impcars    -0.001213   0.005249  -0.23  0.818  10.6
lghttrks    0.008332   0.001397   5.96  0.000   1.2
carage      -0.03946    0.01916  -2.06  0.045   1.4

S = 0.05334    R-Sq = 59.5%    R-Sq(adj) = 55.8%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       4  0.183634  0.045909  16.14  0.000
Residual Error  44  0.125174  0.002845
Total           48  0.308809
Light trucks is a significant positive variable. Since impcars has the smallest
t-statistic, it is removed from the model (coefficient table not preserved):

R-Sq = 59.4%    R-Sq(adj) = 56.7%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  0.183482  0.061161  21.96  0.000
Residual Error  45  0.125326  0.002785
Total           48  0.308809

Removing vehwt leaves light trucks and car age (coefficient table not preserved;
VIFs 1.1 and 1.1):

R-Sq = 56.5%    R-Sq(adj) = 54.6%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       2  0.174458  0.087229  29.87  0.000
Residual Error  46  0.134351  0.002921
Total           48  0.308809
The model has light trucks and car age as the significant variables. Note that car
age is marginally significant (p-value of .052) and hence could also be dropped
from the model.
c. The regression modeling indicates that the percentage of light trucks is
conditionally significant in all of the models and hence is an important predictor
in the model. Car age and imported cars are marginally significant predictors
when only light trucks is included in the model.
13.109
a. Correlation matrix:

Correlations: deaths, Purbanpop, Ruspeed, Prsurf (p-values beneath the
correlations)

          deaths  Purbanpo  Ruspeed
Purbanpo  -0.594
           0.000
Ruspeed    0.305    -0.224
           0.033     0.121
Prsurf    -0.556     0.207   -0.232
           0.000     0.153    0.109
Descriptive Statistics: deaths, Purbanpop, Prsurf, Ruspeed

Variable    N    Mean  Median  TrMean   StDev  SE Mean
deaths     49  0.1746  0.1780  0.1675  0.0802   0.0115
Purbanpo   49  0.5890  0.6311  0.5992  0.2591   0.0370
Prsurf     49  0.7980  0.8630  0.8117  0.1928   0.0275
Ruspeed    49  58.186  58.400  58.222   1.683    0.240

Variable   Minimum  Maximum      Q1      Q3
deaths      0.0569   0.5505  0.1240  0.2050
Purbanpo    0.0000   0.9689  0.4085  0.8113
Prsurf      0.2721   1.0000  0.6563  0.9485
Ruspeed     53.500   62.200  57.050  59.150
The proportion of urban population and rural roads that are surfaced are negatively
related to crash deaths. Average rural speed is positively related, but the
relationship is not as strong as the proportion of urban population and surfaced
roads. The simple correlation coefficients among the independent variables are
relatively low and hence multicollinearity should not be dominant in this model.
Note the relatively narrow range for average rural speed. This would indicate that
there is not much variability in this independent variable.
b. Multiple regression
Regression Analysis: deaths versus Purbanpop, Prsurf, Ruspeed
The regression equation is
deaths = 0.141 - 0.149 Purbanpop - 0.181 Prsurf + 0.00457 Ruspeed
Predictor      Coef   SE Coef      T      P  VIF
Constant     0.1408    0.2998   0.47  0.641
Purbanpo   -0.14946   0.03192  -4.68  0.000  1.1
Prsurf     -0.18058   0.04299  -4.20  0.000  1.1
Ruspeed    0.004569  0.004942   0.92  0.360  1.1

S = 0.05510    R-Sq = 55.8%    R-Sq(adj) = 52.8%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  0.172207  0.057402  18.91  0.000
Residual Error  45  0.136602  0.003036
Total           48  0.308809
The model has conditionally significant variables for percent urban population and
percent surfaced roads. Since average rural speed is not conditionally significant,
it is dropped from the model:
Dropping Ruspeed gives (coefficient table not preserved; VIFs 1.0 and 1.0):

R-Sq = 54.9%    R-Sq(adj) = 53.0%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       2  0.169612  0.084806  28.03  0.000
Residual Error  46  0.139197  0.003026
Total           48  0.308809
This becomes the final model, since both variables are conditionally significant.
c. Conclude that the proportion of urban population and the percent of rural
roads that are surfaced are important independent variables in explaining crash
deaths. All else equal, the higher the proportion of urban population, the lower
the crash deaths; all else equal, increases in the proportion of rural roads that
are surfaced will result in lower crash deaths. The average rural speed is not
conditionally significant.
13.110 a. Correlation matrix and descriptive statistics
Correlations: hseval, sizehse, Taxhse, Comper, incom72, totexp
          hseval   sizehse   Taxhse   Comper   incom72
sizehse    0.542
           0.000
Taxhse     0.248     0.289
           0.019     0.006
Comper    -0.335    -0.278   -0.114
           0.001     0.008    0.285
incom72    0.426     0.393    0.261   -0.198
           0.000     0.000    0.013    0.062
totexp     0.261    -0.022    0.228    0.269    0.376
           0.013     0.834    0.030    0.010    0.000
Descriptive Statistics: hseval, sizehse, Taxhse, Comper, incom72, totexp

Variable      N      Mean    Median    TrMean     StDev   SE Mean
hseval       90    21.031    20.301    20.687     4.957     0.522
sizehse      90    5.4778    5.4000    5.4638    0.2407    0.0254
Taxhse       90    130.13    131.67    128.31     48.89      5.15
Comper       90   0.16211   0.15930   0.16206   0.06333   0.00668
incom72      90    3360.9    3283.0    3353.2     317.0      33.4
totexp       90   1488848   1089110   1295444   1265564    133402

Variable    Minimum   Maximum        Q1        Q3
hseval       13.300    35.976    17.665    24.046
sizehse      5.0000    6.2000    5.3000    5.6000
Taxhse        35.04    399.60     98.85    155.19
Comper      0.02805   0.28427   0.11388   0.20826
incom72      2739.0    4193.0    3114.3    3585.3
totexp       361290   7062330    808771   1570275
The range for applying the regression model (variable means +/- 2 standard deviations):

hseval:    21.03 +/- 2(4.957)     =  11.11 to 30.94
sizehse:   5.48 +/- 2(.24)        =  5.00 to 5.96
Taxhse:    130.13 +/- 2(48.89)    =  32.35 to 227.91
Comper:    .16 +/- 2(.063)        =  .034 to .286
incom72:   3361 +/- 2(317)        =  2727 to 3995
totexp:    1488848 +/- 2(1265564) =  not a good approximation (the lower limit is negative)
b. Regression models:
Regression Analysis: hseval versus sizehse, Taxhse, ...
The regression equation is
hseval = - 31.1 + 9.10 sizehse - 0.00058 Taxhse - 22.2 Comper + 0.00120 incom72 + 0.000001 totexp

Predictor          Coef      SE Coef      T    VIF
Constant         -31.07        10.09  -3.08
sizehse           9.105        1.927   4.72    1.3
Taxhse        -0.000584     0.008910  -0.07    1.2
Comper          -22.197        7.108  -3.12    1.3
incom72        0.001200     0.001566   0.77    1.5
totexp       0.00000125   0.00000038   3.28    1.5

S = 3.785   R-Sq = 45.0%   R-Sq(adj) = 41.7%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5   982.98  196.60  13.72  0.000
Residual Error  84  1203.65   14.33
Total           89  2186.63
Taxhse is not conditionally significant, nor is income. Dropping one variable at a time, eliminate Taxhse first, then eliminate income:
R-Sq = 44.6%   R-Sq(adj) = 42.6%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   974.55  324.85  23.05  0.000
Residual Error  86  1212.08   14.09
Total           89  2186.63
This is the final regression model. All of the independent variables are conditionally significant. Both the size of house and total government expenditures enhance the market value of homes, while the percent of commercial property tends to reduce market values of homes.
c. In the final regression model, the tax variable was not found to be conditionally significant, and hence it is difficult to support the developer's claim.
13.111
a. Correlation matrix
Correlations: retsal84, Unemp84, perinc84
           retsal84   Unemp84
Unemp84      -0.370
              0.008
perinc84      0.633    -0.232
              0.000     0.101
There is a positive association between per capita income and retail sales, and a negative association between unemployment and retail sales. Multicollinearity does not appear to be a problem since the correlation between the two independent variables is relatively low.
Descriptive Statistics: retsal84, perinc84, Unemp84

Variable      N    Mean   Median  TrMean  StDev  SE Mean
retsal84     51    5536     5336    5483    812      114
perinc84     51   12277    12314   12166   1851      259
Unemp84      51   7.335    7.000   7.196  2.216    0.310

Variable    Minimum  Maximum     Q1     Q3
retsal84       4250     8348   5059   6037
perinc84       8857    17148  10689  13218
R-Sq = 45.3%   R-Sq(adj) = 43.0%

Analysis of Variance
Source          DF        SS       MS      F      P
Regression       2  14931938  7465969  19.88  0.000
Residual Error  48  18029333   375611
Total           50  32961271

Both predictors have VIF = 1.1.
This is the final model since all of the independent variables are conditionally significant at the .05 level. The 95% confidence intervals for the regression slope coefficients, bi +/- t(48, .025) s(bi):

b1 +/- t s(b1): -86.25 +/- 2.011(40.2) = -86.25 +/- 80.84
b2 +/- t s(b2): computed in the same way from the perinc84 coefficient and its standard error.
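The interval arithmetic can be verified in a few lines of Python; the numbers come from the output above, and labeling b1 as the Unemp84 slope is an assumption consistent with part b's interpretation of the perinc84 slope.

from scipy import stats

b1, se_b1 = -86.25, 40.2           # first slope and its standard error (from the output)
n, k = 51, 2                       # 51 observations, 2 predictors
t = stats.t.ppf(0.975, n - k - 1)  # ~2.011 with 48 degrees of freedom
print(f"{b1} +/- {t * se_b1:.2f}")  # -86.25 +/- 80.84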
b. All else equal, the conditional effect of a $1,000 decrease in per capita income on retail sales would be to reduce retail sales by $254.
c. The population variable is not conditionally significant and adds little explanatory power; therefore, it will not improve the multiple regression model.
13.112 a.
Correlations: FRH, FBPR, FFED, FM2, GDPH, GH
        FRH     FBPR    FFED    FM2     GDPH
FBPR    0.510
        0.000
FFED    0.244   0.957
        0.001   0.000
FM2     0.854   0.291   0.077
        0.000   0.000   0.326
GDPH    0.934   0.580   0.287   0.987
        0.000   0.000   0.000   0.000
GH      0.907   0.592   0.285   0.977   0.973
        0.000   0.000   0.000   0.000   0.000
The correlation matrix shows that both interest rates have a significant positive linear association with residential investment. The money supply, GDP, and government expenditures also have a significant linear association with residential investment. Note the high correlation between the two interest rate variables, which, as expected, would create significant problems if both variables were included in the same regression model. Hence, the two interest rates are entered in two separate models.
Regression Analysis: FRH versus FBPR, FM2, GDPH, GH
The regression equation is
FRH = 70.0 - 3.79 FBPR - 0.0542 FM2 + 0.0932 GDPH - 0.165 GH
166 cases used 52 cases contain missing values
Predictor        Coef    SE Coef      T      P    VIF
Constant        70.00      24.87   2.82  0.005
FBPR          -3.7871     0.6276  -6.03  0.000    1.2
FM2         -0.054210   0.009210  -5.89  0.000   46.8
GDPH         0.093223   0.007389  12.62  0.000   58.1
GH           -0.16514    0.03747  -4.41  0.000   28.7

S = 23.42   R-Sq = 86.7%   R-Sq(adj) = 86.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       4  573700  143425  261.42  0.000
Residual Error 161   88331     549
Total          165  662030
This will be the final model with the prime rate as the interest rate variable since all of the independent variables are conditionally significant. Note the significant multicollinearity among FM2, GDPH, and GH (VIFs well above 10).
Regression Analysis: FRH versus FFED, FM2, GDPH, GH
The regression equation is
FRH = 55.0 - 2.76 FFED - 0.0558 FM2 + 0.0904 GDPH - 0.148 GH
166 cases used 52 cases contain missing values
Predictor        Coef    SE Coef      T      P    VIF
Constant        55.00      26.26   2.09  0.038
FFED          -2.7640     0.6548  -4.22  0.000    1.2
FM2          -0.05578    0.01007  -5.54  0.000   50.7
GDPH         0.090402   0.007862  11.50  0.000   59.6
GH           -0.14752    0.03922  -3.76  0.000   28.5

S = 24.61   R-Sq = 85.3%   R-Sq(adj) = 84.9%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       4  564511  141128  233.00  0.000
Residual Error 161   97519     606
Total          165  662030
The model with the federal funds rate as the interest rate variable is also the final
model with all of the independent variables conditionally significant. Again, high
correlation among the independent variables will be a problem with this regression
model.
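The VIFs reported in both models can be reproduced as sketched below, assuming the predictors are columns of a pandas DataFrame; this is illustrative code, not the exercise's own.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    # VIF for each predictor, computed with the intercept included
    # (the constant's own entry is skipped), as Minitab reports it.
    Xc = sm.add_constant(X.dropna())
    return pd.Series({col: variance_inflation_factor(Xc.values, i)
                      for i, col in enumerate(Xc.columns) if col != "const"}).round(1)

# vif_table(df[["FBPR", "FM2", "GDPH", "GH"]])   # expect roughly 1.2, 46.8, 58.1, 28.7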
b. 95% confidence intervals for the slope coefficient on the interest rate term.

Bank prime rate as the interest rate variable:
b1 +/- t s(b1): -3.7871 +/- 1.96(.6276) = -3.7871 +/- 1.23
13.113 a.
Correlations: Infmrt82, Phys82, Perinc84, Perhosp
           Infmrt82    Phys82  Perinc84
Phys82        0.434
              0.001
Perinc84      0.094     0.614
              0.511     0.000
Perhosp       0.411     0.285     0.267
              0.003     0.042     0.058
The correlation matrix shows a positive association between infant mortality and both Phys82 and Perhosp. These variables are the number of physicians per 100,000 population and the total per capita expenditures for hospitals. One would expect a negative association; therefore, examine the scatter diagram of infant mortality vs. Phys82:
[Scatterplot of Infmrt82 (vertical axis, approximately 10 to 20) against Phys82 (horizontal axis, approximately 100 to 550)]
The graph shows an obvious outlier which, upon further investigation, is the
District of Columbia. Due to the outlier status, this row is dropped from the
analysis and the correlation matrix is recalculated:
             Phys82  Perinc84
Perinc84      0.574
              0.000
Perhosp      -0.065     0.140
              0.654     0.331
The physicians per 100,000 population now has the correct sign; however, none of the independent variables has a statistically significant linear association with the dependent variable. Per capita expenditures for hospitals has an unexpected positive sign, but it is not conditionally significant. The multiple regression results are likely to yield low explanatory power with insignificant independent variables:
Regression Analysis: Infmrt82 versus Phys82, Perinc84, Perhosp
The regression equation is
Infmrt82 = 12.7 - 0.00017 Phys82 - 0.000206 Perinc84 + 6.30 Perhosp

Predictor          Coef     SE Coef      T      P    VIF
Constant         12.701       1.676   7.58  0.000
Phys82        -0.000167    0.006647  -0.03  0.980    1.5
Perinc84     -0.0002064   0.0001637  -1.26  0.214    1.6
Perhosp           6.297       3.958   1.59  0.118    1.1

S = 1.602   R-Sq = 8.9%   R-Sq(adj) = 3.0%

Analysis of Variance
Source          DF       SS     MS     F      P
Regression       3   11.546  3.849  1.50  0.227
Residual Error  46  118.029  2.566
Total           49  129.575
As expected, the model explains less than 9% of the variability in infant mortality. None of the independent variables is conditionally significant, and high correlation among the independent variables does not appear to be a significant problem. The standard error of the estimate is very large (1.602) relative to the size of the infant mortality rates, and hence the model would not be a good predictor. Sequentially dropping the independent variable with the lowest t-statistic confirms the conclusion that none of the independent variables is conditionally significant. The search is on for better independent variables.
b. The two variables to include are per capita spending on education (PerEduc)
and per capita spending on public welfare (PerPbwel). Since the conditional
significance of the independent variables is a function of other independent
variables in the model, we will include the original set of variables:
The model shows low explanatory power and only one conditionally significant independent variable (Perhosp). Sequentially dropping the independent variable with the lowest t-statistic yields a model with no conditionally significant independent variables. This problem illustrates that in some applications, the variables that have been identified as theoretically important predictors do not meet the statistical test.
13.114 a.
Correlations: Salary, age, yrs_asoc, yrs_full, Sex_1Fem, Market, C8
           Salary      age  yrs_asoc  yrs_full  Sex_1Fem   Market
age         0.749
            0.000
yrs_asoc    0.698    0.712
            0.000    0.000
yrs_full    0.777    0.583     0.312
            0.000    0.000     0.000
Sex_1Fem   -0.429   -0.234    -0.367    -0.292
            0.000    0.004     0.000     0.000
Market      0.026   -0.134    -0.113    -0.017     0.062
            0.750    0.103     0.169     0.833     0.453
C8         -0.029   -0.189    -0.073    -0.043    -0.094   -0.107
            0.721    0.020     0.373     0.598     0.254    0.192
The correlation matrix indicates several independent variables that should provide
good explanatory power in the regression model. We would expect that age, years
at Associate professor and years at full professor are likely to be conditionally
significant:
R-Sq = 85.6%   R-Sq(adj) = 85.0%

Analysis of Variance
Source          DF          SS         MS       F      P
Regression       6  5604244075  934040679  141.49  0.000
Residual Error 143   943982608    6601277
Total          149  6548226683
Dropping the variable with the smallest t-statistic (C8) and re-estimating:

R-Sq = 85.3%   R-Sq(adj) = 84.8%

Analysis of Variance
Source          DF          SS          MS       F      P
Regression       5  5585766862  1117153372  167.14  0.000
Residual Error 144   962459821     6683749
Total          149  6548226683
This is the final model. All of the independent variables are conditionally
significant and the model explains a sizeable portion of the variability in salary.
b. To test the hypothesis that the rate of change in female salaries as a function of
age is less than the rate of change in male salaries as a function of age, the
dummy variable Sex_1Fem is used to see if the slope coefficient for age (X1) is
different for males and females. The following model is used:
Y = β0 + (β1 + β6X4)X1 + β2X2 + β3X3 + β4X4 + β5X5
  = β0 + β1X1 + β6(X4X1) + β2X2 + β3X3 + β4X4 + β5X5
Create the variable X4X1 and then test for conditional significance in the regression model.
If it proves to be a significant predictor of salaries then there is strong evidence to
conclude that the rate of change in female salaries as a function of age is different than for
males:
Regression Analysis: Salary versus age, femage, ...
The regression equation is
Salary = 22082 + 85.1 age + 11.7 femage + 543 yrs_asoc + 701 yrs_full
- 1878 Sex_1Fem + 2673 Market
Predictor       Coef  SE Coef      T      P    VIF
Constant       22082     1877  11.77  0.000
age            85.07    48.36   1.76  0.081    4.4
femage         11.66    63.89   0.18  0.855   32.2
yrs_asoc      542.85    66.73   8.13  0.000    2.6
yrs_full      701.35    57.35  12.23  0.000    2.0
Sex_1Fem       -1878     2687  -0.70  0.486   31.5
Market        2672.8    825.1   3.24  0.001    1.0

S = 2594   R-Sq = 85.3%   R-Sq(adj) = 84.7%

Analysis of Variance
Source          DF          SS         MS       F      P
Regression       6  5585990999  930998500  138.36  0.000
Residual Error 143   962235684    6728921
Total          149  6548226683
The regression shows that the newly created variable of femage is not conditionally
significant. Thus, we cannot conclude that the rate of change in female salaries as a
function of age differs from that of male salaries.
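A sketch of how the interaction test in part b can be set up, assuming a hypothetical data file whose columns carry the names shown in the output above.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("salaries.csv")           # hypothetical file name
df["femage"] = df["Sex_1Fem"] * df["age"]  # slope-shift term: nonzero only for women

# The t-test on femage is the test of beta_6 in the model written above.
fit = smf.ols("Salary ~ age + femage + yrs_asoc + yrs_full + Sex_1Fem + Market",
              data=df).fit()
print(fit.summary())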
13.115
Regression Analysis: hseval versus sizehse, taxrate, incom72, Homper
The regression equation is
hseval = - 32.7 + 6.74 sizehse - 223 taxrate + 0.00464 incom72 + 11.2 Homper
Predictor       Coef   SE Coef      T      P    VIF
Constant     -32.694     8.972  -3.64  0.000
sizehse        6.740     1.880   3.58  0.001    1.4
taxrate      -222.96     45.39  -4.91  0.000    1.2
incom72     0.004642  0.001349   3.44  0.001    1.2
Homper        11.215     4.592   2.44  0.017    1.3

S = 3.610   R-Sq = 49.3%   R-Sq(adj) = 47.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4  1079.08  269.77  20.70  0.000
Residual Error  85  1107.55   13.03
Total           89  2186.63
All of the independent variables are conditionally significant. Now add the percent of
commercial property to the model to see if it is significant:
With a t-statistic of only -.27, we do not have strong enough evidence to reject H0 that the slope coefficient on percent commercial property is zero. The conditional F test confirms this:

F(Comper) = (SSR_F - SSR_R) / s^2(Y|X) = (1080.07 - 1079.08) / 13.17 = 0.08

which is far below the critical value of F(1, 84) at any common level of alpha.
Next, percent industrial property is added to the four-variable base model:

R-Sq = 50.2%   R-Sq(adj) = 47.2%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5  1096.77  219.35  16.91  0.000
Residual Error  84  1089.86   12.97
Total           89  2186.63
Likewise, the percent industrial property is not significantly different from zero. The conditional F test:

F(Indper) = (SSR_5 - SSR_4) / s^2(Y|X) = (1096.77 - 1079.08) / 12.97 = 1.36

which is lower than the critical value of F at common levels of alpha; therefore, do not reject H0 that the percent industrial property has no effect on house values.
Tax rate models:
Regression Analysis: taxrate versus taxbase, expercap, Homper
The regression equation is
taxrate = - 0.0174 - 0.000000 taxbase + 0.000162 expercap + 0.0424 Homper

Predictor           Coef      SE Coef      T      P    VIF
Constant       -0.017399     0.007852  -2.22  0.029
taxbase      -0.00000000   0.00000000  -0.80  0.426    1.2
expercap      0.00016204   0.00003160   5.13  0.000    1.1
Homper          0.042361     0.009378   4.52  0.000    1.2

S = 0.007692   R-Sq = 31.9%   R-Sq(adj) = 29.5%

Analysis of Variance
Source          DF          SS          MS      F      P
Regression       3  0.00237926  0.00079309  13.41  0.000
Residual Error  86  0.00508785  0.00005916
Total           89  0.00746711
After taxbase is dropped, both of the remaining independent variables are significant. This becomes the base model to which we now add percent commercial property and percent industrial property sequentially:
Regression Analysis: taxrate versus expercap, Homper, Comper
The regression equation is
taxrate = - 0.0413 +0.000157 expercap + 0.0643 Homper + 0.0596 Comper
Predictor           Coef      SE Coef      T      P    VIF
Constant       -0.041343     0.008455  -4.89  0.000
expercap      0.00015660   0.00002819   5.55  0.000    1.1
Homper          0.064320     0.009172   7.01  0.000    1.4
Comper           0.05960      0.01346   4.43  0.000    1.3

S = 0.006966   R-Sq = 44.1%   R-Sq(adj) = 42.2%

Analysis of Variance
Source          DF         SS         MS      F      P
Regression       3  0.0032936  0.0010979  22.62  0.000
Residual Error  86  0.0041735  0.0000485
Total           89  0.0074671
The conditional F test for Comper:

F(Comper) = (SSR_F - SSR_R) / s^2(Y|X) = (0.0032936 - 0.00234) / (0.006966)^2 = 19.62

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in the denominator, the critical value of F at the .05 level is 3.95. Hence we conclude that the percentage of commercial property has a statistically significant positive impact on the tax rate.
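The conditional F computation can be packaged as below; the arguments mirror the quantities in the formula above, and the Comper numbers are used as the example.

from scipy import stats

def partial_f(ssr_full, ssr_reduced, s2_full, df_denom, q=1):
    # F statistic for adding q variables, plus the .05 critical value.
    f = (ssr_full - ssr_reduced) / q / s2_full
    return f, stats.f.ppf(0.95, q, df_denom)

f, crit = partial_f(ssr_full=0.0032936, ssr_reduced=0.00234,
                    s2_full=0.006966 ** 2, df_denom=86)
print(f"F = {f:.2f}, F(.05; 1, 86) = {crit:.2f}")   # ~19.6 vs 3.95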
We now add industrial property to test the effect on tax rate:
Regression Analysis: taxrate versus expercap, Homper, Indper
The regression equation is
taxrate = - 0.0150 +0.000156 expercap + 0.0398 Homper - 0.0105 Indper
Predictor           Coef      SE Coef      T      P    VIF
Constant       -0.015038     0.009047  -1.66  0.100
expercap      0.00015586   0.00003120   5.00  0.000    1.1
Homper           0.03982      0.01071   3.72  0.000    1.6
Indper          -0.01052      0.01273  -0.83  0.411    1.5

S = 0.007690   R-Sq = 31.9%   R-Sq(adj) = 29.5%

Analysis of Variance
Source          DF          SS          MS      F      P
Regression       3  0.00238178  0.00079393  13.43  0.000
Residual Error  86  0.00508533  0.00005913
Total           89  0.00746711
The percent industrial property is insignificant with a t-statistic of only -.83. The F test confirms that the variable does not have a significant impact on the tax rate:

F(Indper) = (SSR_3 - SSR_2) / s^2(Y|X) = (0.002382 - 0.00234) / (0.007690)^2 = 0.71

which is well below the critical value of 3.95, so do not reject H0.
13.116
a. Correlation matrix:
Correlations: EconGPA, sex, Acteng, ACTmath, ACTss, ACTcomp, HSPct
          EconGPA      sex   Acteng  ACTmath    ACTss  ACTcomp
sex         0.187
            0.049
Acteng      0.387    0.270
            0.001    0.021
ACTmath     0.338   -0.170    0.368
            0.003    0.151    0.001
ACTss       0.442   -0.105    0.448    0.439
            0.000    0.375    0.000    0.000
ACTcomp     0.474   -0.084    0.650    0.765    0.812
            0.000    0.478    0.000    0.000    0.000
HSPct       0.362    0.216    0.173    0.290    0.224    0.230
            0.000    0.026    0.150    0.014    0.060    0.053
There is a positive relationship between EconGPA and all of the independent variables, as expected. Note the high correlation between the composite ACT score and its individual components, again as expected. Thus, high correlation among the independent variables is likely to be a serious concern in this regression model.
Regression Analysis: EconGPA versus sex, Acteng, ...
The regression equation is
EconGPA = - 0.050 + 0.261 sex + 0.0099 Acteng + 0.0064 ACTmath + 0.0270 ACTss + 0.0419 ACTcomp + 0.00898 HSPct
71 cases used 41 cases contain missing values
Predictor       Coef   SE Coef      T      P    VIF
Constant     -0.0504    0.6554  -0.08  0.939
sex           0.2611    0.1607   1.62  0.109    1.5
Acteng       0.00991   0.02986   0.33  0.741    2.5
ACTmath      0.00643   0.03041   0.21  0.833    4.3
ACTss        0.02696   0.02794   0.96  0.338    4.7
ACTcomp      0.04188   0.07200   0.58  0.563   12.8
HSPct       0.008978  0.005716   1.57  0.121    1.4

S = 0.4971   R-Sq = 34.1%   R-Sq(adj) = 27.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6   8.1778  1.3630  5.52  0.000
Residual Error  64  15.8166  0.2471
Total           70  23.9945
As expected, high correlation among the independent variables is affecting the results. Dropping the variable with the lowest t-statistic at each successive step removes, in order: 1) ACTmath, 2) Acteng, 3) ACTss, 4) HSPct. The two variables that remain, sex and ACTcomp, form the final model:
In the final model both predictors are conditionally significant: sex (p = 0.011) and ACTcomp (p = 0.000), each with VIF = 1.0; the constant is not significant (p = 0.538).

S = 0.4931   R-Sq = 29.4%   R-Sq(adj) = 27.3%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   7.0705  3.5352  14.54  0.000
Residual Error  70  17.0192  0.2431
Total           72  24.0897