Answer Key: Problem Set 4: Colgpa Hsgpa Act Skipped
1. Consider the following estimated equation, which can be used to study the effects of
skipping class on college GPA:
$$\widehat{colGPA} = \underset{(0.33)}{1.39} + \underset{(0.094)}{0.412}\,hsGPA + \underset{(0.011)}{0.015}\,ACT - \underset{(0.026)}{0.083}\,skipped$$
$$n = 64,\quad R^2 = 0.234$$
i. Using the standard normal approximation, find the 95% confidence interval for $\beta_{hsGPA}$.
(Ans)
.412 ± 1.96(.094), or about .228 to .596.
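As a quick arithmetic check, the interval can be computed directly. This is a minimal sketch; the variable names are illustrative, and 1.96 is the standard normal critical value used in the answer.

```python
# 95% confidence interval for beta_hsGPA using the standard normal approximation.
b_hsgpa = 0.412   # OLS estimate on hsGPA
se_hsgpa = 0.094  # its standard error
z_crit = 1.96     # two-sided 5% critical value from N(0, 1)

lower = b_hsgpa - z_crit * se_hsgpa
upper = b_hsgpa + z_crit * se_hsgpa
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # prints 95% CI: (0.228, 0.596)
```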
ii. Can you reject the hypothesis $H_0: \beta_{hsGPA} = 0.4$ against the two-sided alternative at the 5% level?
(Ans)
No, because the value 0.4 is well inside the 95% CI.
iii. Can you reject the hypothesis $H_0: \beta_{hsGPA} = 1$ against the two-sided alternative at the 5% level?
(Ans)
Yes, because 1 is well outside the 95% CI.
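Equivalently, parts ii and iii can be checked with t statistics: a null value lies inside the 95% CI exactly when |t| < 1.96. A minimal sketch (the helper name `t_stat` is illustrative, not from the course materials):

```python
b_hsgpa, se_hsgpa = 0.412, 0.094
z_crit = 1.96  # two-sided 5% critical value, standard normal approximation

def t_stat(null_value):
    """t statistic for H0: beta_hsGPA = null_value."""
    return (b_hsgpa - null_value) / se_hsgpa

for null_value in (0.4, 1.0):
    t = t_stat(null_value)
    decision = "reject" if abs(t) > z_crit else "fail to reject"
    print(f"H0: beta_hsGPA = {null_value}: t = {t:.2f}, {decision} at 5%")
```

This gives t ≈ 0.13 for the null value 0.4 and t ≈ −6.26 for the null value 1, matching the CI-based answers.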
2. Consider the multiple regression model with three independent variables, under the classical linear model assumptions MLR.1 through MLR.6:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
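i. Let $\hat\beta_1$ and $\hat\beta_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$. Find $\operatorname{Var}(\hat\beta_1 - 3\hat\beta_2)$ in terms of the variances of $\hat\beta_1$ and $\hat\beta_2$ and the covariance between them. What is the standard error of $\hat\beta_1 - 3\hat\beta_2$?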
(Ans)
ECON 482 / WH Hong Answer Key
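We use Property VAR.3 from Appendix B:
$$\operatorname{Var}(\hat\beta_1 - 3\hat\beta_2) = \operatorname{Var}(\hat\beta_1) + 9\operatorname{Var}(\hat\beta_2) - 6\operatorname{Cov}(\hat\beta_1, \hat\beta_2).$$
The standard error of $\hat\beta_1 - 3\hat\beta_2$ is the square root of the estimated version of this expression.
ii. Write the $t$ statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$.
(Ans)
$t = (\hat\beta_1 - 3\hat\beta_2 - 1)/\operatorname{se}(\hat\beta_1 - 3\hat\beta_2)$, so we need the standard error of $\hat\beta_1 - 3\hat\beta_2$.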
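The variance formula is the special case of Var(c1·β̂1 + c2·β̂2) with c1 = 1 and c2 = −3. The sketch below computes the standard error and the t statistic for H0: β1 − 3β2 = 1. The estimates, variances, and covariance are hypothetical numbers chosen for illustration, not results from any data set.

```python
import math

def var_linear_combo(var_b1, var_b2, cov_b12, c1=1.0, c2=-3.0):
    """Var(c1*b1 + c2*b2) = c1^2 Var(b1) + c2^2 Var(b2) + 2 c1 c2 Cov(b1, b2)."""
    return c1**2 * var_b1 + c2**2 * var_b2 + 2 * c1 * c2 * cov_b12

# Hypothetical inputs (illustration only; not from the problem set).
b1_hat, b2_hat = 0.9, 0.1
var_b1, var_b2, cov_b12 = 0.04, 0.01, 0.005

# With c1 = 1, c2 = -3 this is Var(b1) + 9 Var(b2) - 6 Cov(b1, b2).
v = var_linear_combo(var_b1, var_b2, cov_b12)
se = math.sqrt(v)

# t statistic for H0: beta1 - 3*beta2 = 1.
t = (b1_hat - 3 * b2_hat - 1) / se
print(f"Var = {v:.4f}, se = {se:.4f}, t = {t:.3f}")
```

In practice the covariance term is the hard part to obtain by hand, which is what motivates the reparameterization in part iii.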
iii. Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat\theta_1 = \hat\beta_1 - 3\hat\beta_2$. Write a regression equation involving $\beta_0$, $\theta_1$, $\beta_2$, and $\beta_3$ that allows you to directly obtain $\hat\theta_1$ and its standard error.
(Ans)
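Because $\theta_1 = \beta_1 - 3\beta_2$, we can write $\beta_1 = \theta_1 + 3\beta_2$. Plugging this into the population model gives
$$y = \beta_0 + (\theta_1 + 3\beta_2)x_1 + \beta_2 x_2 + \beta_3 x_3 + u = \beta_0 + \theta_1 x_1 + \beta_2(3x_1 + x_2) + \beta_3 x_3 + u.$$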
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3.
The coefficient and standard error on x1 are what we want.
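The reparameterization can be checked numerically. The sketch below simulates data (the true coefficients, sample size, and seed are arbitrary assumptions), fits both the original regression and the regression of y on x1, 3x1 + x2, and x3 with a small hand-written OLS routine in pure Python, and confirms that the coefficient and standard error on x1 in the second regression equal θ̂1 = β̂1 − 3β̂2 and its standard error from the first.

```python
import math
import random

def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(X, y):
    """OLS coefficients and the usual covariance matrix sigma^2 (X'X)^{-1}."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(r * b for r, b in zip(row, beta)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # k counts the intercept
    vcov = [[sigma2 * xtx_inv[a][b] for b in range(k)] for a in range(k)]
    return beta, vcov

# Simulated data; the true parameters below are arbitrary.
random.seed(482)
n_obs = 200
x1 = [random.gauss(0, 1) for _ in range(n_obs)]
x2 = [random.gauss(0, 1) for _ in range(n_obs)]
x3 = [random.gauss(0, 1) for _ in range(n_obs)]
y = [1 + 0.5 * a + 0.2 * b - 0.3 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]

# Original model: y on 1, x1, x2, x3.
beta, V = ols([[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)], y)
theta_hat = beta[1] - 3 * beta[2]
se_theta = math.sqrt(V[1][1] + 9 * V[2][2] - 6 * V[1][2])

# Reparameterized model: y on 1, x1, 3*x1 + x2, x3.
gamma, W = ols([[1.0, a, 3 * a + b, c] for a, b, c in zip(x1, x2, x3)], y)

print(f"theta_hat: {theta_hat:.6f} vs coefficient on x1: {gamma[1]:.6f}")
print(f"se: {se_theta:.6f} vs se on x1: {math.sqrt(W[1][1]):.6f}")
```

Both regressions span the same column space, so they have identical residuals; only the labeling of the coefficients changes.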
3. Regression analysis can be used to test whether the market efficiently uses information in
valuing stocks. For concreteness, let return be the total return from holding a firm's
stock over the four-year period from the end of 1990 to the end of 1994. The efficient market
hypothesis says that these returns should not be systematically related to information
known in 1990. If firm characteristics known at the beginning of the period help to
predict stock returns, then we could use this information in choosing stocks.
For 1990, let dkr be a firm's debt-to-capital ratio, let eps denote the earnings per
share, let netinc denote net income, and let salary denote total compensation for the
CEO.
i. Using the data in RETURN.DTA, the following equation was estimated:
$$\widehat{return} = \underset{(6.89)}{-14.37} + \underset{(0.201)}{0.321}\,dkr + \underset{(0.078)}{0.043}\,eps - \underset{(0.0047)}{0.0051}\,netinc + \underset{(0.0022)}{0.0035}\,salary$$
$$n = 142,\quad R^2 = 0.0395$$
Test whether the explanatory variables are jointly significant at the 5% significance
level. Is any explanatory variable individually significant?
(Ans)
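We need to compute the $F$ statistic for the overall significance of the regression with $n = 142$ and $k = 4$: $F = [.0395/(1 - .0395)](137/4) \approx 1.41$. The 5% critical value with 4 numerator df, using 120 for the denominator df, is 2.45, which is well above the value of $F$. Therefore, we fail to reject $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ at the 5% level. No explanatory variable is individually significant at the 5% level: the largest $t$ statistic in absolute value is on dkr, $.321/.201 \approx 1.60$, below the two-sided critical value of 1.96.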
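The R-squared form of the overall F statistic used above can be sketched as a small function; the name `overall_f` is illustrative. Applied to the part (i) numbers it reproduces F ≈ 1.41, and the same call with the part (ii) R-squared gives F ≈ 1.17.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero (R-squared form).

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = overall_f(0.0395, n=142, k=4)
crit_5pct = 2.45  # 5% critical value, 4 numerator df and 120 denominator df
print(f"F = {f_stat:.2f}; reject at 5%? {f_stat > crit_5pct}")
```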
ii. Now, re-estimate the model using the log form of netinc and salary:
$$\widehat{return} = \underset{(39.37)}{-36.30} + \underset{(0.203)}{0.327}\,dkr + \underset{(0.080)}{0.069}\,eps - \underset{(3.39)}{4.74}\,\log(netinc) + \underset{(6.31)}{7.24}\,\log(salary)$$
$$n = 142,\quad R^2 = 0.0330$$
Do any of your conclusions from part (i) change?
(Ans)
No. The $F$ statistic (with the same df) is now $[.0330/(1 - .0330)](137/4) \approx 1.17$, which is even lower than in part (i), and none of the $t$ statistics is significant at a reasonable level.
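The individual t statistics behind these conclusions are the ratios of the reported coefficients to their standard errors. A minimal sketch, with the (coefficient, standard error) pairs copied from the two estimated equations (intercepts omitted); the dictionary and function names are illustrative:

```python
z_crit = 1.96  # two-sided 5% critical value (large-sample approximation)

# (coefficient, standard error) pairs from the two estimated equations
levels = {"dkr": (0.321, 0.201), "eps": (0.043, 0.078),
          "netinc": (-0.0051, 0.0047), "salary": (0.0035, 0.0022)}
logs = {"dkr": (0.327, 0.203), "eps": (0.069, 0.080),
        "log(netinc)": (-4.74, 3.39), "log(salary)": (7.24, 6.31)}

def t_stats(table):
    """Return {variable: t statistic} for H0: coefficient = 0."""
    return {name: coef / se for name, (coef, se) in table.items()}

for label, table in (("levels", levels), ("logs", logs)):
    ts = t_stats(table)
    largest = max(ts, key=lambda v: abs(ts[v]))
    print(f"{label}: largest |t| is {largest} ({ts[largest]:.2f})")
```

In both specifications the largest |t| is on dkr (about 1.60 and 1.61), short of 1.96.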
iii. In this sample, some firms have zero debt and others have negative earnings. Should we try to use log(dkr) or log(eps) in the model to see if these improve the fit? Explain.
(Ans)
We probably should not use the logs, because the logarithm is not defined for zero or negative values: firms with dkr = 0 or eps ≤ 0 would drop out, so we would lose some firms in the regression.
iv. Overall, is the evidence for predictability of stock returns strong or weak?
(Ans)
It seems very weak. There are no significant t statistics at the 5% level (against a
two-sided alternative), and the F statistics are insignificant in both cases. Plus, less
than 4% of the variation in return is explained by the independent variables.
Computer Exercises
4. Use the data in MLB1.DTA for this exercise. In class, you have seen the following
estimation results:
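$$\begin{aligned}\widehat{\log(salary)} = {}& \underset{(0.29)}{11.19} + \underset{(0.0121)}{0.0689}\,years + \underset{(0.0026)}{0.0126}\,gamesyr \\ &+ \underset{(0.00110)}{0.00098}\,bavg + \underset{(0.0161)}{0.0144}\,hrunsyr + \underset{(0.0072)}{0.0108}\,rbisyr\end{aligned}$$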
i. Starting from the estimated equation above, drop the variable rbisyr and re-estimate the model. What happens to the statistical significance of hrunsyr? What about the size of the coefficient on hrunsyr?
(Ans)
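If we drop rbisyr, the estimated equation becomes
$$\begin{aligned}\widehat{\log(salary)} = {}& \underset{(0.27)}{11.02} + \underset{(.0121)}{.0677}\,years + \underset{(.0016)}{.0158}\,gamesyr \\ &+ \underset{(.0011)}{.0014}\,bavg + \underset{(.0072)}{.0359}\,hrunsyr\end{aligned}$$
$$n = 353,\quad R^2 = .625$$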
Now hrunsyr is very statistically significant (t statistic ≈ 4.99), and its coefficient
has increased by about two and one-half times.
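Both numbers in this answer follow from the reported estimates. A short check using hrunsyr's coefficient and standard error before and after dropping rbisyr (variable names are illustrative):

```python
before = (0.0144, 0.0161)  # hrunsyr: (coef, se) with rbisyr included
after = (0.0359, 0.0072)   # hrunsyr: (coef, se) with rbisyr dropped

t_before = before[0] / before[1]
t_after = after[0] / after[1]
ratio = after[0] / before[0]
print(f"t before = {t_before:.2f}, t after = {t_after:.2f}, coefficient ratio = {ratio:.2f}")
```

The t statistic moves from about 0.89 to about 4.99, and the coefficient grows by a factor of about 2.49.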
ii. Add the variables runsyr (runs per year), fldperc (fielding percentage), and
sbasesyr (stolen bases per year) to the model from part i. Which of these factors are
individually significant?
(Ans)
The equation with runsyr, fldperc, and sbasesyr added is
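$$\begin{aligned}\widehat{\log(salary)} = {}& \underset{(2.00)}{10.41} + \underset{(.0120)}{.0700}\,years + \underset{(.0027)}{.0079}\,gamesyr \\ &+ \underset{(.00110)}{.00053}\,bavg + \underset{(.0086)}{.0232}\,hrunsyr \\ &+ \underset{(.0051)}{.0174}\,runsyr + \underset{(.0020)}{.0010}\,fldperc - \underset{(.0052)}{.0064}\,sbasesyr\end{aligned}$$
$$n = 353,\quad R^2 = .639$$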
Of the three additional independent variables, only runsyr is statistically significant (t
statistic = .0174/.0051 ≈ 3.41). The estimate implies that one more run per year,
other factors fixed, increases predicted salary by about 1.74%, a substantial increase.
The stolen bases variable even has the “wrong” sign with a t statistic of about –1.23,
while fldperc has a t statistic of only .5. Most major league baseball players are
pretty good fielders; in fact, the smallest fldperc is 800 (which means .800). With
relatively little variation in fldperc, it is perhaps not surprising that its effect is hard to
estimate.
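The t statistics for the three added variables, and the approximate percentage effect of runsyr (100 times the coefficient, the usual approximation in a log-level model), can be reproduced from the reported estimates. A minimal sketch:

```python
added = {"runsyr": (0.0174, 0.0051),
         "fldperc": (0.0010, 0.0020),
         "sbasesyr": (-0.0064, 0.0052)}

for name, (coef, se) in added.items():
    t = coef / se
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at 5%, two-sided)")

# Log-level model: 100 * coefficient is the approximate % change in salary.
pct_runsyr = 100 * added["runsyr"][0]
print(f"One more run per year: about {pct_runsyr:.2f}% higher predicted salary")
```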
iii. In the model from part (ii), test the joint significance of bavg, fldperc, and sbasesyr. (DO NOT use the Stata command test. Follow the steps you learned in class and use the formula for the F statistic.)
(Ans)
From their t statistics, bavg, fldperc, and sbasesyr are individually insignificant.
The F statistic for their joint significance (with 3 and 345 df) is about .69 with p-
value ≈ .56. Therefore, these variables are jointly very insignificant.
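The steps from class can be written as a function of the unrestricted and restricted R-squareds (valid because both models have the same dependent variable). In this sketch the unrestricted R² = .639 and the degrees of freedom come from part (ii) (n = 353 and k = 7, so 345 df, with q = 3 restrictions). The restricted R² must come from re-estimating the model without bavg, fldperc, and sbasesyr; the value below is a hypothetical placeholder, not the actual result.

```python
def f_from_r2(r2_ur, r2_r, q, df_ur):
    """F statistic for q exclusion restrictions (R-squared form).

    F = [(R2_ur - R2_r) / q] / [(1 - R2_ur) / df_ur],
    where df_ur = n - k - 1 from the unrestricted model.
    """
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

n, k, q = 353, 7, 3          # unrestricted model from part (ii); 3 restrictions
df_ur = n - k - 1            # = 345
r2_ur = 0.639                # unrestricted R-squared, part (ii)
r2_r_hypothetical = 0.637    # HYPOTHETICAL restricted R-squared (placeholder)

f_stat = f_from_r2(r2_ur, r2_r_hypothetical, q, df_ur)
print(f"F({q}, {df_ur}) = {f_stat:.2f}")
```

The resulting F is then compared with the 5% critical value of the F distribution with 3 and 345 df.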
by OLS and report the results in the usual form. Test the null hypothesis that educ
is linearly related to abil against the alternative that the relationship is quadratic.
(Ans)
The estimated equation, with standard errors in parentheses below coefficient
estimates, is
The null hypothesis of a linear relationship between educ and abil is $H_0: \beta_4 = 0$, and the alternative is that $H_0$ does not hold. The $t$ statistic is about $.0506/.0083 \approx 6.1$, which is a very large value for a $t$ statistic. The p-value against the two-sided alternative is zero to more than four decimal places.
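With more than 1,000 degrees of freedom (part iii reports 1,223), the standard normal is an accurate approximation to the t distribution, so the two-sided p-value can be sketched with Python's `statistics.NormalDist`, recomputing t from the reported coefficient and standard error:

```python
from statistics import NormalDist

t = 0.0506 / 0.0083          # t statistic for H0: beta_4 = 0
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.2f}, two-sided p-value = {p_two_sided:.2e}")
```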
ii. Using the equation in part i, test $H_0: \beta_1 = \beta_2$ against a two-sided alternative. What is the p-value of the test?
(Ans)
We could rewrite the model by defining, say, $\theta_1 = \beta_1 - \beta_2$ and then substituting in $\beta_1 = \theta_1 + \beta_2$, just as we did with the example in Section 4.4. These days, it is easier to use a special command in statistical software. The estimated difference in the coefficients is about .081. I used the lincom command in Stata to get a $t$ statistic of about 1.94 and an associated two-sided p-value of about .053. So there is some evidence against the null hypothesis: it is rejected at the 10% level but, just barely, not at the 5% level.
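The reported p-value is easy to sanity-check. This sketch uses the standard normal approximation; Stata's lincom after regress uses the t distribution, which with this many degrees of freedom gives essentially the same answer. The implied standard error is simply the estimate divided by its t statistic.

```python
from statistics import NormalDist

theta_hat = 0.081   # estimated beta1 - beta2
t = 1.94            # t statistic reported by lincom
se_implied = theta_hat / t  # standard error implied by the two reported numbers
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"implied se = {se_implied:.4f}, two-sided p = {p_two_sided:.3f}")
print("reject at 10%:", p_two_sided < 0.10, "| reject at 5%:", p_two_sided < 0.05)
```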
iii. Add the two college tuition variables to the regression from part i and determine whether they are jointly statistically significant.
(Ans)
I used the test command in Stata to test the joint significance of the tuition variables. With 2 and 1,223 degrees of freedom, I get an F statistic of about .84 with an associated p-value of about .43. Thus, the tuition variables are jointly insignificant at any reasonable significance level.
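For an F statistic with 2 numerator degrees of freedom the p-value has a closed form, P(F > f) = (1 + 2f/ν)^(−ν/2) with ν the denominator df, so the reported p-value can be checked without statistical tables. A minimal sketch (the helper name is illustrative):

```python
def f2_pvalue(f, df_denom):
    """Upper-tail p-value of an F(2, df_denom) statistic (closed form for 2 numerator df)."""
    return (1 + 2 * f / df_denom) ** (-df_denom / 2)

p = f2_pvalue(0.84, 1223)
print(f"p-value = {p:.3f}")
```

This gives about .432, consistent with the p-value of about .43 from Stata.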
iv. What is the correlation between tuit17 and tuit18 ? Explain why using the average
of the tuition over the two years might be preferred to adding each separately. What
happens when you do use the average?
(Ans)
Not surprisingly, the correlation between tuit17 and tuit18 is very high, about .981:
there is very little change in tuition over a year that cannot be explained by a
common inflation factor. I generated the variable avgtuit = (tuit17 + tuit18)/2, and
then added it to the regression from part (i). The coefficient on avgtuit is about .016
with t = 1.29. This helps with statistical significance, but the two-sided p-value is still only about .20.
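Constructing the average and checking the correlation is straightforward. The sketch below uses made-up tuition values for five hypothetical observations (not the actual data) purely to show the mechanics; the Pearson correlation is written out by hand.

```python
import math

# HYPOTHETICAL tuition values (illustration only; not from the data set).
tuit17 = [10.2, 12.5, 8.9, 15.1, 11.0]
tuit18 = [10.6, 13.0, 9.3, 15.6, 11.5]

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Same construction as avgtuit = (tuit17 + tuit18)/2 in the answer.
avgtuit = [(a + b) / 2 for a, b in zip(tuit17, tuit18)]
print(f"corr(tuit17, tuit18) = {pearson(tuit17, tuit18):.4f}")
print("avgtuit:", avgtuit)
```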
v. Do the findings for the average tuition variable in part iv make sense when interpreted causally? What might be going on?
(Ans)
The positive coefficient on avgtuit does not make a lot of sense if we think that, all
other things fixed, higher tuition makes it less likely that people go to college. But
we are only controlling for parents’ levels of education and a measure of ability. It
could be that higher tuition indicates higher quality of the state colleges. Or, it could
be that tuition is higher in states with higher average incomes, and higher family
incomes lead to higher education. In any case, the statistical link is not very strong.