Chicago Booth BUS 41100

Practice Final Exam


Fall 2020
SOLUTIONS
Instructor: Panagiotis (Panos) Toulis
1 True or False

Circle T if you believe the corresponding statement is true, or circle F otherwise.

Cross-validation is primarily used to estimate the generalization error of a model. T F

Forward stepwise regression finds the model with the smallest AIC. T F

–Stepwise will find the smallest AIC in its search path, not overall.

Large models will usually have large bias and small variance. T F

–They usually have small bias and large variance.

LASSO estimates equal the estimates of OLS plus a penalty. T F

–This holds for the objectives, not the estimates; i.e., LASSO objective is OLS objective plus penalty.

Multicollinearity usually leads to narrower confidence intervals. T F

In logistic regression the regression function is a probability. T F

Logistic regression is the same as classification. T F

In classification the 0.5-threshold rule is optimal. T F

In Poisson regression the link function is the logarithm. T F

The residuals in an autoregressive model of Yt may be correlated with Yt−1 . T F

The log-log model encodes a linear relationship in terms of percentages. T F

Simpson’s paradox occurs when comparing treatments of similar efficacy. T F

–The efficacy of the treatments is irrelevant.

2 Short Answer & Multiple Choice

(a) Grading. 3 points for answering (ii). No partial credit given.


Solution. The slope is clearly negative, eliminating (i) and (v). The intercept is certainly
not 12, eliminating (iii). Examining the axis shows the slope is -2.5, giving (ii).
(b) Grading. 3 points for answering (i) and (iv). Partial credit given.
Solution. Choices (i) and (iv) follow directly from the definition of IV. (ii) is wrong because
IV has to be known, otherwise we cannot use it! (iii) is wrong because the expectation/variance
of IV do not affect how it is being used (assuming these moments are bounded).
(c) Grading. 3 points for answering (vi). Partial credit for other answers.
(d) Grading. 3 points for answering (iii). No partial credit given.
Solution. From the lecture slides: the odds are multiplied by e^bj = e^2.3 ≈ 10 for a one-unit
change in x1, i.e., the odds ratio is about 10.
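
As a quick numerical check (a Python sketch, not part of the exam; 2.3 is the coefficient value assumed from the output):

    import numpy as np

    # odds multiplier for a one-unit increase in x1, given a coefficient of 2.3
    print(np.exp(2.3))  # ~9.97, i.e. the odds are multiplied by roughly 10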

3 Linear Regression

(a) (i) Grading. 2 points for commenting on the positive/right skew of the residuals. No partial
credit given.
(ii) Grading. 3 points for answering that temperature squared should be added. Partial
credit of 1 point for suggesting seasonal dummies, or other well-reasoned answers.
Solution. Plot 3 shows an obvious pattern in the residuals, indicating nonlinearity.
First the estimates are too high, then too low, then too high again, implying a squared
term should be added to the model. Nothing else is as correct as this. Demand for
energy increases both for low and high temperatures. A seasonal categorical variable is
possible, but it would have to be that for “middle” predicted values, the predictions were
systematically too low, somehow.
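
For illustration, a minimal Python sketch of the suggested fix on synthetic data (the column names demand and temp, and the U-shaped relationship, are assumptions, not the actual exam data):

    import numpy as np, pandas as pd
    import statsmodels.formula.api as smf

    # synthetic stand-in for the energy data: demand is U-shaped in temperature
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"temp": rng.uniform(0, 100, 200)})
    df["demand"] = 2000 + 0.5 * (df["temp"] - 50) ** 2 + rng.normal(0, 50, 200)

    fit_linear = smf.ols("demand ~ temp", data=df).fit()
    fit_quad = smf.ols("demand ~ temp + I(temp**2)", data=df).fit()
    # the residuals of fit_linear show the high-low-high pattern; fit_quad removes it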
(b) (i) Grading. 1 point for answering Sunday is lowest, 1 point for answering Tuesday is
the highest, 1 point for correct numbers. 3 points total available. No other partial
credit.
Solution. The regression on the left has positive coefficients for all the other days, so
Sunday must be lowest. The largest coefficient is for Tuesday, so it is the highest. The
same conclusion follows from the right: Sunday has the most negative intercept, Tuesday
the largest positive one.
(ii) Grading. 1 point for saying that from the F test we learn that at least one day is
statistically significantly different from another. 2 points for saying the test results are
identical because it is the exact same regression, just coded differently. 3 points total
available. No other partial credit. In particular, no points given for saying the F tests
are the same because the R2 values (and the DOF) are the same.
Solution. From the F -test we learn that (statistically) at least one coefficient is different
from zero, and hence at least one day is different from another (the baseline day). The
two regressions are identical up to coding the day variable differently, and hence they produce
the same fit and the same test results. That is, the day coefficients explain the same amount
of the variation, no matter how they are coded.
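
To illustrate that the two codings describe the same regression, a small Python sketch on synthetic data (day labels and effect sizes are made up):

    import numpy as np, pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"day": rng.choice(["Sun", "Mon", "Tue", "Wed"], 200)})
    df["y"] = 100 + 20 * (df["day"] == "Tue") + rng.normal(0, 10, 200)

    m1 = smf.ols("y ~ C(day)", data=df).fit()      # baseline day plus offsets
    m2 = smf.ols("y ~ C(day) - 1", data=df).fit()  # one intercept per day
    print(np.allclose(m1.fittedvalues, m2.fittedvalues))  # True: same fit, different coding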
(c) Grading. 1 point for anything close to the right direction (like the right formulas, the right
idea, etc). Beyond that, give 1 point further for the correct fitted value and 1 point for
the correct interval. 3 points total available. No other partial credit. Subtract 1 point if
everything is correct except that they used 52 for temp instead of 20.
Solution. The tricky part here is that temp is measured as degrees above 32, and hence if it
is 52 degrees, our prediction should be based on temp = 20. Using this and the fact that it is
Wednesday, we find that

Ŷ = 2632 − 476 + (19 + 12) × 20 = 2776.

The key is to not forget the additional slope due to the interaction term. Recalling that
s_pred = sqrt(s^2 + s_fit^2) = sqrt(400^2 + 300^2) = 500 (the former from the output, the latter
is given), a 95% predictive interval is Ŷ ± 2 × 500.
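
A quick check of the arithmetic (all numbers taken from the solution above):

    import numpy as np

    y_hat = 2632 - 476 + (19 + 12) * 20          # Wednesday, temp coded as 52 - 32 = 20
    s_pred = np.sqrt(400**2 + 300**2)            # combine residual and fit standard errors
    print(y_hat, y_hat - 2 * s_pred, y_hat + 2 * s_pred)  # 2776, 1776.0, 3776.0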

4 Car quality

(a) Grading. -1 point if not correctly specifying any of the baseline levels.
Solution. For maint the baseline is high, for doors it is doors2, for trunk it is big, and
for safety it is high.
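
As an aside, the baseline is whichever level is dropped from the coefficient table (typically the first level in sorted order). A tiny Python/patsy sketch of how this could be inspected or changed, with hypothetical levels:

    import pandas as pd
    from patsy import dmatrix

    df = pd.DataFrame({"maint": ["high", "low", "med", "vhigh"]})
    print(dmatrix("C(maint)", df))                    # baseline = 'high' (first level)
    print(dmatrix("C(maint, Treatment('med'))", df))  # baseline changed to 'med'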
(b) Grading. -1 point for each variable where the interpretation did not include the log-odds or
odds ratios.
Solution. For maintlow: the log-odds ratio of buying the car over not buying it increases by
about 0.7 when we move from the high maintenance level to the low maintenance level. The
odds ratio is therefore about e^0.7 ≈ 2, i.e., the odds roughly double. This makes sense because the dealer should prefer cars
that have low maintenance cost as they will have larger resale value.
(c) Grading. -1 or -2 points for not being close to the answer. Partial credit if the answer is in
the right direction.
Solution. Large standard errors usually mean that there is not enough variation to estimate
the coefficient. Here most likely there is no variation in the response when safety is low. A
peek at the data reveals that when safety of the car is low the decision is always not to buy.
This is a point of concern because it may affect the estimates of the other variables. It would
be better to remove these data points because Y can be perfectly predicted by safety when the
latter is equal to low.
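
A small Python illustration of how the separation could be spotted (the counts below are made up, not the actual data):

    import pandas as pd

    cars = pd.DataFrame({
        "safety": ["low"] * 50 + ["med"] * 50 + ["high"] * 50,
        "buy":    [0] * 50 + [0] * 30 + [1] * 20 + [0] * 20 + [1] * 30,
    })
    print(pd.crosstab(cars["safety"], cars["buy"]))
    # the (low, 1) cell is zero: Y is perfectly predicted when safety == 'low',
    # which is what inflates the standard error of that coefficient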
(d) Grading. Partial credit only if shown calculations of sensitivity and specificity. No other
partial credit.
Solution. Sensitivity = 320/(320 + 198) = 62% and Specificity = 1026/(1026 + 184) = 85%.
These are ok numbers, but we may prefer a better sensitivity in practice. Here, it seems we
better predict when the buyer is not buying, which may be influenced by the fact that there
are many more 0’s than 1’s in the data. This is evidence that we can do better.
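
The calculation written out (counts taken from the confusion matrix implied above):

    tp, fn = 320, 198     # actual buyers: predicted 1 vs predicted 0
    tn, fp = 1026, 184    # actual non-buyers: predicted 0 vs predicted 1
    sensitivity = tp / (tp + fn)   # ~0.62
    specificity = tn / (tn + fp)   # ~0.85
    print(round(sensitivity, 2), round(specificity, 2))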
(e) Grading. This requires a clear answer. No partial credit.
Solution. It seems that safety is the only variable that can distinguish Y = 1. Indeed, by
tabulating the data we see that when safety is high, the cases where Y = 1 are more numerous
than Y = 0. This does not hold for any other variable. So in the absence of safety the model
always predicts Y = 0, since these are the majority of cases for any combination of values of
the other parameters.

5 Model Building

(a) (i) Grading. 1 point for correctly identifying that these are partial F tests and 1 point
further for correctly identifying that model 6 is best. 2 points total available. No other
partial credit.
Solution. Each row shows the partial F test of one model against the next most complicated
model. From each row we see that, according to this metric, each new model
represents a worthwhile addition. That is, Model 6 is the best according to this criterion.
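
A sketch of how such a sequence of nested-model comparisons can be produced in Python (toy data; the actual table comes from the exam output):

    import numpy as np, pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(2)
    df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
    df["y"] = 1 + 2 * df["x1"] + 3 * df["x2"] + rng.normal(size=200)

    m1 = smf.ols("y ~ x1", data=df).fit()
    m2 = smf.ols("y ~ x1 + x2", data=df).fit()
    print(anova_lm(m1, m2))   # second row: partial F test of m2 against the simpler m1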
(ii) Grading. 2 points for any type of correct answer at all, discussing multiple testing,
overfitting, not testing prediction of the models, etc. No other partial credit.
Solution. The problem is that it relies on hypothesis testing and R2, i.e., model fit. Because of
the latter, it is prone to overfitting: R2 is not a good goal in and of itself; prediction is the goal.
Because of the former, it is prone to the multiple testing problem and to the choice of a
specific search direction.
(b) (i) Grading. 1 point for just saying that it is stepwise selection based on AIC. Award a
further 2 points for a correct explanation of what happens at each step and when the
process ends. 3 points total available. No other partial credit.
Solution. This is stepwise selection based on AIC. The process begins from the null
model, which is empty. At each step one variable is added, and the variable that leads
to the greatest improvement in AIC is kept. This is repeated until no variables lead to
improvement. Main effects are tested first, then interactions.
(ii) Explicitly list the elements of this process which are controlled by the user as opposed
to what is controlled by the computer. Grading. 1 point for each of the four identified
below. Subtract 1/2 point for each incorrect thing mentioned. 4 points total available.
No other partial credit.
Solution. There are exactly four: 1) the starting place (here the empty model), 2) the
scope of the search (the full model), 3) the direction of the search (forward), and 4) the
criteria to evaluate each step (AIC). Nothing else is controlled by the user.
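
There is no built-in equivalent of this procedure in Python's statsmodels, but a rough, illustrative sketch of forward selection by AIC shows where the four user-controlled choices enter (start, scope, direction, criterion); the function and its names are hypothetical:

    import statsmodels.formula.api as smf

    def forward_step_aic(df, response, candidates):
        """Greedy forward selection by AIC.
        start = null model, scope = candidates, direction = forward, criterion = AIC."""
        selected = []
        current = smf.ols(f"{response} ~ 1", data=df).fit()
        while True:
            best_aic, best_var, best_fit = current.aic, None, None
            for var in candidates:
                if var in selected:
                    continue
                rhs = " + ".join(selected + [var])
                fit = smf.ols(f"{response} ~ {rhs}", data=df).fit()
                if fit.aic < best_aic:
                    best_aic, best_var, best_fit = fit.aic, var, fit
            if best_var is None:            # no remaining variable improves AIC: stop
                return selected, current
            selected.append(best_var)
            current = best_fit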
(c) (i) Grading. 2 points for correctly identifying the difference. Subtract 1/2 point for each
incorrect thing mentioned. 2 points total available. No other partial credit.
Solution. The difference is the starting point only: we force the stepwise process to
begin with 3 variables already in it, and the rest of the search proceeds as above.
(ii) Grading. 2 points for a fully correct answer: must identify that age and experience
are the same variable, or highly correlated. 1 point for anything kind of correct, such
as discussing multicollinearity, etc. 2 points total available. No other partial credit, in
particular, no credit given for saying that experience and/or age don’t actually matter
once the other variables are conditioned upon.
Solution. The reason is that age and experience are highly correlated, in fact nearly
perfectly: the correlation is 0.98. So they are very close to being the same exact variable,
and so close that AIC can’t tell them apart. Even without the numerical correlation, it is
conceptually clear that they would be correlated: no 30-year-old has 20 years of experience.
Since the only difference between the final selected models is that one includes age and the
other includes experience, there is no other possible answer. In particular, note that age
was added in the third step in part (b), so it is the most important variable (AIC-wise)
after sex and education, meaning that it would be added in step 1 in part (c) if it were
different enough from experience. Note that it is wrong to suggest that experience/age
don’t matter once the other variables are conditioned upon: if this were true it would
not be selected first. In particular, it’s not true that, once experience squared is added, the
main effect doesn’t matter, because in part (b) age was selected before experience squared.
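
A quick illustration of why the two variables are nearly indistinguishable (the numbers are made up, mimicking experience ≈ age minus years of schooling):

    import numpy as np

    rng = np.random.default_rng(3)
    age = rng.uniform(22, 65, 500)
    experience = age - 22 + rng.normal(0, 2, 500)
    print(np.corrcoef(age, experience)[0, 1])  # very close to 1, mirroring the 0.98 in the data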
(iii) Grading. 2 points for any well-reasoned answer, no matter if (b) or (c) is preferred,
or if it doesn’t matter, as long as the reasoning is sound. 2 points total available. No
other partial credit.
Solution. It could be either, or no preference. We prefer part (c) for the reason that
including experience squared without the main effect is a bad idea. Even though experience
and age are nearly perfectly correlated, and so the models are really the same,
interpretation of the model in part (b) is more difficult on the face of it. Overall, part
(c)’s model is more conceptually well-grounded. However, given that the models are the
same, it really doesn’t matter as long as you are careful with the interpretation. One
reason to prefer the model in (b) is that it is entirely based on AIC, and so if we really
had no idea what to use to predict wages, this might be a better way to go.
