IE 451 Fall 2023-2024 Homework 4 Solutions
1 Question 10
This question should be answered using the Carseats data set.
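library(ISLR2)     # assumed source of the Carseats data
library(magrittr)  # provides the %>% pipe
d1 <- Carseats
d1 %>% head() %>% pander::pander()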
Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban US
(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
Call:
lm(formula = Sales ~ Price + Urban + US, data = d1)
Residuals:
Min 1Q Median 3Q Max
-6.9206 -1.6220 -0.0564 1.5786 7.0581
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
Price -0.054459 0.005242 -10.389 < 2e-16 ***
UrbanYes -0.021916 0.271650 -0.081 0.936
USYes 1.200573 0.259042 4.635 4.86e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 of 19 19/02/2024, 15:06
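(b) Provide an interpretation of each coefficient in the model.
• Sales is measured in thousands of units and Price in dollars, so a $1 increase in price decreases
the number of car seats sold by 54.46 units on average, holding the other predictors fixed.
• A store in the US sells about 1,200 more car seats on average than a store outside the US,
holding the other predictors fixed.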
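The unit conversion behind these two figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes Sales is recorded in thousands of units and Price in dollars, as described in the ISLR documentation for Carseats; the variable names are illustrative only.

```r
# Coefficients copied from the regression output in part (a).
b_price <- -0.054459
b_us    <-  1.200573

# Assumption: Sales is in thousands of units, Price is in dollars.
units_per_thousand <- 1000

# Change in expected unit sales for a $1 price increase, other predictors fixed.
price_effect_units <- b_price * units_per_thousand

# Expected difference in unit sales between a US and a non-US store
# that share the same Price and Urban values.
us_effect_units <- b_us * units_per_thousand

print(round(c(price = price_effect_units, US = us_effect_units), 2))
```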
(c) Write out the model in equation form, being careful to handle the qualitative variables
properly.
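One way to write the fitted model, using the estimates from part (a) rounded to four decimals, is sketched below. The indicator names x_Urban and x_US are notation introduced here for illustration; they equal 1 when the store is urban or located in the US, respectively, and 0 otherwise.

```latex
\[
\widehat{\text{Sales}}
  = 13.0435 - 0.0545\,\text{Price} - 0.0219\,x_{\text{Urban}} + 1.2006\,x_{\text{US}}
\]
\[
\widehat{\text{Sales}} =
\begin{cases}
13.0435 - 0.0545\,\text{Price}, & \text{Urban = No, US = No} \\
13.0216 - 0.0545\,\text{Price}, & \text{Urban = Yes, US = No} \\
14.2440 - 0.0545\,\text{Price}, & \text{Urban = No, US = Yes} \\
14.2221 - 0.0545\,\text{Price}, & \text{Urban = Yes, US = Yes}
\end{cases}
\]
```

The four cases share the same Price slope; the qualitative variables only shift the intercept.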
(d) For which of the predictors can you reject the null hypothesis H0 : βj = 0?
For Price and US we reject the null hypotheses βPrice = 0 and βUS = 0 because their p-values
are very small (below 0.001). For Urban, however, we cannot reject the null hypothesis
βUrban = 0 because its p-value (0.936) is large.
(e) On the basis of your response to the previous question, fit a smaller model that only uses
the predictors for which there is evidence of association with the outcome.
Call:
lm(formula = Sales ~ Price + US, data = d1)
Residuals:
Min 1Q Median 3Q Max
-6.9269 -1.6286 -0.0574 1.5766 7.0515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
Price -0.05448 0.00523 -10.416 < 2e-16 ***
USYes 1.19964 0.25846 4.641 4.71e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(f) How well do the models in (a) and (e) fit the data?
The R² value is 0.2393 for the model in (a) and, to four decimal places, also 0.2393 for the model
in (e). Both models therefore explain only about 24% of the variation in Sales, and dropping Urban
costs essentially nothing in fit.
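Because the two R² values agree to four decimals, the adjusted R² is a convenient tie-breaker: it penalizes the extra predictor in model (a). The sketch below applies the standard adjusted R² formula to the rounded R² reported above. It assumes n = 400 observations (the size of the Carseats data set), and `adj_r2` is a helper defined here, not a function from the source.

```r
# Adjusted R^2 from R^2, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n  <- 400     # assumption: number of rows in Carseats
r2 <- 0.2393  # rounded R^2 reported for both models

adj_a <- adj_r2(r2, n, p = 3)  # model (a): Price, Urban, US
adj_e <- adj_r2(r2, n, p = 2)  # model (e): Price, US

print(round(c(model_a = adj_a, model_e = adj_e), 4))
```

Since the inputs are rounded, the last digit may differ slightly from what `summary()` prints, but the ordering (model (e) slightly ahead) does not depend on that rounding.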
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
              2.5 %        97.5 %
(Intercept)   11.79410192  14.26748359
Price         -0.06472849  -0.04422677
USYes          0.69306864   1.70621725
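The printed bounds agree closely with a normal-approximation interval, estimate ± 1.96 × standard error. That is an inference from the numbers, not something the source states. The sketch below rebuilds the interval from the rounded estimates and standard errors in the output of part (e) and, for comparison, the t-based interval that `confint()` would report for an `lm` fit, assuming 400 − 3 = 397 residual degrees of freedom.

```r
# Estimates and standard errors copied from the output of model (e).
est <- c("(Intercept)" = 13.03079, Price = -0.05448, USYes = 1.19964)
se  <- c("(Intercept)" = 0.63098,  Price = 0.00523,  USYes = 0.25846)

# Normal-approximation 95% interval.
z <- qnorm(0.975)
ci_normal <- cbind(lower = est - z * se, upper = est + z * se)

# t-based 95% interval; df = n - (number of coefficients), assuming n = 400.
t_crit <- qt(0.975, df = 400 - 3)
ci_t <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(ci_normal, 5))
print(round(ci_t, 5))
```

With 397 degrees of freedom the two intervals differ only in the third decimal, so the conclusions are the same either way: none of the three intervals contains zero.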
(h) Is there evidence of outliers or high leverage observations in the model from (e)?
• Below we mark
◦ the ±2 and ±3 bounds on standardized residuals with horizontal dotted lines in the Normal QQ
plot and the residuals vs leverage plot
◦ the high-leverage thresholds with vertical dotted lines in the residuals vs leverage plot
• A horizontal or vertical dotted line does not appear when its threshold lies outside the
plot limits.
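The same checks can be done numerically rather than by eye. The sketch below is one possible version: `flag_unusual` is a hypothetical helper, and the leverage cutoffs 2(p + 1)/n and 3(p + 1)/n are a common rule of thumb assumed here, since the source does not state which thresholds it draws. The demonstration uses the built-in `mtcars` data as a stand-in so that the code runs without extra packages; for this question one would pass `lm(Sales ~ Price + US, data = Carseats)` instead.

```r
# Flag potential outliers and high-leverage points for any lm fit.
flag_unusual <- function(fit) {
  n <- nobs(fit)
  k <- length(coef(fit))   # number of coefficients, i.e. p + 1
  r <- rstandard(fit)      # standardized residuals
  h <- hatvalues(fit)      # leverages
  data.frame(
    std_resid  = r,
    leverage   = h,
    outlier_2  = abs(r) > 2,
    outlier_3  = abs(r) > 3,
    high_lev_2 = h > 2 * k / n,  # assumed rule-of-thumb threshold
    high_lev_3 = h > 3 * k / n   # assumed stricter threshold
  )
}

# Stand-in model on a built-in data set (not the Carseats model).
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
flags <- flag_unusual(demo_fit)

# Observations that cross at least one of the looser thresholds.
print(flags[flags$outlier_2 | flags$high_lev_2, ])
```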
[Figure 1: diagnostic plots for the model in (e), panels (a) to (d)]
Inspect Figure 1:
2 Question 15
This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try
to predict per capita crime rate using the other variables in this data set. In other words, per capita
crime rate is the response, and the other variables are the predictors.
(a) For each predictor, fit a simple linear regression model to predict the response. Describe
your results. In which of the models is there a statistically significant association between the
predictor and the response? Create some plots to back up your assertions.
Call:
lm(formula = crim ~ zn, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.429 -4.222 -2.620 1.250 84.523
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
zn -0.07393 0.01609 -4.594 5.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ indus, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-11.972 -2.698 -0.736 0.712 81.813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06374 0.66723 -3.093 0.00209 **
indus 0.50978 0.05102 9.991 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ chas, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
chas1 -1.8928 1.5061 -1.257 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ nox, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.371 -2.738 -0.974 0.559 81.728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.720 1.699 -8.073 5.08e-15 ***
nox 31.249 2.999 10.419 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ rm, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.604 -3.952 -2.654 0.989 87.197
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.482 3.365 6.088 2.27e-09 ***
rm -2.684 0.532 -5.045 6.35e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.789 -4.257 -1.230 1.527 82.849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
age 0.10779 0.01274 8.463 2.85e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ dis, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.708 -4.134 -1.527 1.516 81.674
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4993 0.7304 13.006 <2e-16 ***
dis -1.5509 0.1683 -9.213 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ rad, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.164 -1.381 -0.141 0.660 76.433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
rad 0.61791 0.03433 17.998 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ tax, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.513 -2.738 -0.194 1.065 77.696
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
tax 0.029742 0.001847 16.10 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ ptratio, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-7.654 -3.985 -1.912 1.825 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
ptratio 1.1520 0.1694 6.801 2.94e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.925 -2.822 -0.664 1.079 82.862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Call:
lm(formula = crim ~ medv, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.071 -4.022 -2.343 1.298 80.957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.79654 0.93419 12.63 <2e-16 ***
medv -0.36316 0.03839 -9.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• Except for chas, every predictor is statistically significant in its simple linear
regression model.
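# Needs dplyr (select, %>%) and GGally (ggpairs).
# my_fn is a user-defined panel function whose definition is not shown here.
select(Boston, -chas) %>%
  ggpairs(progress = FALSE,
          upper = list(continuous = my_fn),
          lower = list(continuous = "cor"))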
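The twelve simple regressions in (a) can also be fitted in one pass and summarized in a single table. This is a sketch under stated assumptions: it uses `MASS::Boston` with the `black` column dropped and `chas` converted to a factor, which appears to match the data behind the outputs above (no `black` predictor, a `chas1` coefficient). The object names are illustrative.

```r
# Assumed stand-in for the data used in the source.
boston <- MASS::Boston
boston$black <- NULL
boston$chas <- factor(boston$chas)

predictors <- setdiff(names(boston), "crim")

# One simple linear regression of crim on each predictor.
simple_fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "crim"), data = boston)
})
names(simple_fits) <- predictors

# Slope estimate and its p-value (second row of each coefficient table).
slope_table <- t(sapply(simple_fits, function(fit) {
  summary(fit)$coefficients[2, c("Estimate", "Pr(>|t|)")]
}))
colnames(slope_table) <- c("slope", "p_value")

print(signif(slope_table, 4))

# Predictors whose slope is not significant at the 5% level.
print(rownames(slope_table)[slope_table[, "p_value"] >= 0.05])
```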
(b) Fit a multiple regression model to predict the response using all of the predictors.
Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0 ?
Call:
lm(formula = crim ~ ., data = Boston)
Residuals:
Min 1Q Median 3Q Max
-8.534 -2.248 -0.348 1.087 73.923
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.7783938 7.0818258 1.946 0.052271 .
zn 0.0457100 0.0187903 2.433 0.015344 *
indus -0.0583501 0.0836351 -0.698 0.485709
chas1 -0.8253776 1.1833963 -0.697 0.485841
nox -9.9575865 5.2898242 -1.882 0.060370 .
rm 0.6289107 0.6070924 1.036 0.300738
age -0.0008483 0.0179482 -0.047 0.962323
dis -1.0122467 0.2824676 -3.584 0.000373 ***
• For zn, dis, rad, and medv we can reject the null hypothesis βj = 0 at the 5% level.
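The list of significant predictors can be read off the coefficient table programmatically. The sketch below again assumes `MASS::Boston` without the `black` column and with `chas` as a factor as a stand-in for the data used in the source; `significant` is an illustrative name.

```r
# Assumed stand-in for the data used in the source.
boston <- MASS::Boston
boston$black <- NULL
boston$chas <- factor(boston$chas)

# Multiple regression of crim on all remaining predictors.
full_fit <- lm(crim ~ ., data = boston)
coef_table <- summary(full_fit)$coefficients

# Coefficients with p-value below 0.05, intercept excluded.
significant <- setdiff(
  rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05],
  "(Intercept)"
)
print(significant)
```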
(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the
univariate regression coefficients from (a) on the x-axis, and the multiple regression
coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the
plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its
coefficient estimate in the multiple linear regression model is shown on the y-axis.
• The results from the simple and the multiple regression models differ markedly. Every
predictor except chas is statistically significant in its simple regression model, but only
zn, dis, rad, and medv remain significant in the multiple regression model.
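A plot of the kind requested in (c) can be produced with base graphics. This is a minimal sketch, not the source's plotting code: it assumes `MASS::Boston` without `black` and with `chas` as a factor as a stand-in for the data, and the object names are illustrative.

```r
# Assumed stand-in for the data used in the source.
boston <- MASS::Boston
boston$black <- NULL
boston$chas <- factor(boston$chas)

predictors <- setdiff(names(boston), "crim")

# Slope from each simple regression of crim on one predictor.
simple_coef <- sapply(predictors, function(p) {
  coef(lm(reformulate(p, response = "crim"), data = boston))[2]
})
names(simple_coef) <- predictors

# Slopes from the multiple regression, in the same predictor order.
multiple_coef <- coef(lm(crim ~ ., data = boston))[-1]
names(multiple_coef) <- predictors

plot(simple_coef, multiple_coef,
     xlab = "Simple regression coefficient",
     ylab = "Multiple regression coefficient")
text(simple_coef, multiple_coef, labels = predictors, pos = 3, cex = 0.7)
abline(0, 1, lty = 2)  # points on this line would have equal coefficients
```

Points far from the dashed line are predictors whose estimated effect changes most once the other predictors are held fixed; nox, whose coefficient changes sign, is the most extreme case in the outputs above.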
(d) Is there evidence of non-linear association between any of the predictors and the
response? To answer this question, for each predictor X, fit a model of the form
Y = β0 + β1X + β2X² + β3X³ + ε.
Call:
lm(formula = crim ~ zn + I(zn^2) + I(zn^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.821 -4.614 -1.294 0.473 84.130
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.846e+00 4.330e-01 11.192 < 2e-16 ***
zn -3.322e-01 1.098e-01 -3.025 0.00261 **
I(zn^2) 6.483e-03 3.861e-03 1.679 0.09375 .
I(zn^3) -3.776e-05 3.139e-05 -1.203 0.22954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ indus + I(indus^2) + I(indus^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-8.278 -2.514 0.054 0.764 79.713
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6625683 1.5739833 2.327 0.0204 *
indus -1.9652129 0.4819901 -4.077 5.30e-05 ***
I(indus^2) 0.2519373 0.0393221 6.407 3.42e-10 ***
I(indus^3) -0.0069760 0.0009567 -7.292 1.20e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ nox + I(nox^2) + I(nox^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.110 -2.068 -0.255 0.739 78.302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 233.09 33.64 6.928 1.31e-11 ***
nox -1279.37 170.40 -7.508 2.76e-13 ***
I(nox^2) 2248.54 279.90 8.033 6.81e-15 ***
I(nox^3) -1245.70 149.28 -8.345 6.96e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ rm + I(rm^2) + I(rm^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-18.485 -3.468 -2.221 -0.015 87.219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 112.6246 64.5172 1.746 0.0815 .
rm -39.1501 31.3115 -1.250 0.2118
I(rm^2) 4.5509 5.0099 0.908 0.3641
I(rm^3) -0.1745 0.2637 -0.662 0.5086
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ age + I(age^2) + I(age^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.762 -2.673 -0.516 0.019 82.842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.549e+00 2.769e+00 -0.920 0.35780
age 2.737e-01 1.864e-01 1.468 0.14266
I(age^2) -7.230e-03 3.637e-03 -1.988 0.04738 *
I(age^3) 5.745e-05 2.109e-05 2.724 0.00668 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ dis + I(dis^2) + I(dis^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.757 -2.588 0.031 1.267 76.378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.0476 2.4459 12.285 < 2e-16 ***
dis -15.5543 1.7360 -8.960 < 2e-16 ***
I(dis^2) 2.4521 0.3464 7.078 4.94e-12 ***
I(dis^3) -0.1186 0.0204 -5.814 1.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ rad + I(rad^2) + I(rad^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.381 -0.412 -0.269 0.179 76.217
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545 2.050108 -0.295 0.768
rad 0.512736 1.043597 0.491 0.623
I(rad^2) -0.075177 0.148543 -0.506 0.613
I(rad^3) 0.003209 0.004564 0.703 0.482
Call:
lm(formula = crim ~ tax + I(tax^2) + I(tax^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.273 -1.389 0.046 0.536 76.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.918e+01 1.180e+01 1.626 0.105
tax -1.533e-01 9.568e-02 -1.602 0.110
I(tax^2) 3.608e-04 2.425e-04 1.488 0.137
I(tax^3) -2.204e-07 1.889e-07 -1.167 0.244
Call:
lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.833 -4.146 -1.655 1.408 82.697
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 477.18405 156.79498 3.043 0.00246 **
ptratio -82.36054 27.64394 -2.979 0.00303 **
I(ptratio^2) 4.63535 1.60832 2.882 0.00412 **
I(ptratio^3) -0.08476 0.03090 -2.743 0.00630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.234 -2.151 -0.486 0.066 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.2009656 2.0286452 0.592 0.5541
lstat -0.4490656 0.4648911 -0.966 0.3345
I(lstat^2) 0.0557794 0.0301156 1.852 0.0646 .
I(lstat^3) -0.0008574 0.0005652 -1.517 0.1299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-24.427 -1.976 -0.437 0.439 73.655
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.1655381 3.3563105 15.840 < 2e-16 ***
medv -5.0948305 0.4338321 -11.744 < 2e-16 ***
age: the linear term is not significant, but the age² and age³ terms are (p-values 0.047 and
0.007), which is evidence of a non-linear association.
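The per-predictor reading of the cubic fits can be collected into one table of p-values. The sketch below assumes `MASS::Boston` without `black` as a stand-in for the data used in the source. It skips chas because, for a 0/1 variable, the squared and cubed terms equal the variable itself, which is consistent with chas having no cubic output above. The object names are illustrative.

```r
# Assumed stand-in for the data used in the source.
boston <- MASS::Boston
boston$black <- NULL

# chas is binary, so its powers carry no extra information.
numeric_predictors <- setdiff(names(boston), c("crim", "chas"))

# p-values of the linear, quadratic, and cubic terms for each predictor.
cubic_pvalues <- t(sapply(numeric_predictors, function(p) {
  f <- as.formula(sprintf("crim ~ %s + I(%s^2) + I(%s^3)", p, p, p))
  summary(lm(f, data = boston))$coefficients[2:4, "Pr(>|t|)"]
}))
colnames(cubic_pvalues) <- c("linear", "quadratic", "cubic")

print(signif(cubic_pvalues, 3))
```

A small p-value in the quadratic or cubic column is the evidence of non-linearity that the question asks about.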