IE 451 Fall 2023-2024 Homework 4 Solutions



AUTHOR
Deniz Sahin

1 Question 10
This question should be answered using the Carseats data set.

d1 <- Carseats
d1 %>% head %>% pander()

Sales  CompPrice  Income  Advertising  Population  Price  ShelveLoc  Age  Education  Urban  US
9.5    138        73      11           276         120    Bad        42   17         Yes    Yes
11.22  111        48      16           260         83     Good       65   10         Yes    Yes
10.06  113        35      10           269         80     Medium     59   12         Yes    Yes
7.4    117        100     4            466         97     Medium     55   14         Yes    Yes
4.15   141        64      3            340         128    Bad        38   13         Yes    No
10.81  124        113     13           501         72     Bad        78   16         No     Yes

(a) Fit a multiple regression model to predict Sales using Price , Urban , and US.

lm1 <- lm(Sales~Price + Urban + US, d1)

(b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!

lm1 %>% summary

Call:
lm(formula = Sales ~ Price + Urban + US, data = d1)

Residuals:
Min 1Q Median 3Q Max
-6.9206 -1.6220 -0.0564 1.5786 7.0581

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
Price -0.054459 0.005242 -10.389 < 2e-16 ***
UrbanYes -0.021916 0.271650 -0.081 0.936
USYes 1.200573 0.259042 4.635 4.86e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 2.472 on 396 degrees of freedom


Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16

• Each $1 increase in price is associated with a decrease in sales of about 54.5 carseats on
average (Sales is recorded in thousands of units, so the coefficient −0.0545 corresponds to
54.5 carseats).

• A store's sales are not significantly affected by whether or not it is in an urban area.

• A store in the US sells on average about 1,200 more carseats than a store abroad.

(c) Write out the model in equation form, being careful to handle the qualitative variables
properly.

Salesᵢ = 13.04 − 0.05 Priceᵢ − 0.02 Urbanᵢ + 1.20 USᵢ + εᵢ, where Urbanᵢ = 1 if the store is in an urban area (0 otherwise) and USᵢ = 1 if the store is in the US (0 otherwise).

(d) For which of the predictors can you reject the null hypothesis H0 : βj = 0?

For Price and US we can reject the null hypotheses βPrice = 0 and βUS = 0 because their
p-values are very small. For Urban, however, we cannot reject the null hypothesis βUrban = 0
because its p-value (0.936) is high.
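The decision can be read straight off the p-value column of the coefficient table; a quick sketch (using the lm1 fit from above):

```r
# Fourth column of the coefficient matrix holds the p-values
pvals <- summary(lm1)$coefficients[, "Pr(>|t|)"]
pvals[pvals < 0.05]   # keeps the coefficients that are significant at the 5% level
```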

(e) On the basis of your response to the previous question, fit a smaller model that only uses
the predictors for which there is evidence of association with the outcome.

lm2 <- lm(Sales ~ Price + US, d1)


lm2 %>% summary

Call:
lm(formula = Sales ~ Price + US, data = d1)

Residuals:
Min 1Q Median 3Q Max
-6.9269 -1.6286 -0.0574 1.5766 7.0515

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
Price -0.05448 0.00523 -10.416 < 2e-16 ***
USYes 1.19964 0.25846 4.641 4.71e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.469 on 397 degrees of freedom


Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16

(f) How well do the models in (a) and (e) fit the data?

The R² value for the model in (a) is 0.2393, and the R² value for the model in (e) is again 0.2393.
Hence both models explain approximately 24% of the variation in Sales. (The adjusted R² improves
slightly in (e), from 0.2335 to 0.2354, since the uninformative Urban term was dropped.)


(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

confint(lm2, level=0.95) %>% pander()

              2.5 %     97.5 %
(Intercept)   11.79     14.27
Price        -0.06476   -0.0442
USYes         0.6915     1.708

Alternatively, we can compute the same intervals by hand from the formula estimate ± z₀.₀₂₅ · SE:

Est <- coef(lm2) #For the Parameter Estimations


sm_lm2<- summary(lm2)
SE <- sm_lm2$coefficients[,2] #For the Standard Errors of the Parameters
#Normal approximation: justified because the number of observations is large

cbind(Est-qnorm(0.025,lower.tail = FALSE)*SE,Est+qnorm(0.025,lower.tail = FALSE)*SE)

[,1] [,2]
(Intercept) 11.79410192 14.26748359
Price -0.06472849 -0.04422677
USYes 0.69306864 1.70621725
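Note that confint() actually uses the t distribution rather than the normal approximation; with 397 residual degrees of freedom the two are nearly identical. A sketch of the exact t-based computation, reusing Est and SE from above:

```r
# t-based 95% CI, matching confint(lm2)
tcrit <- qt(0.975, df.residual(lm2))
cbind(Est - tcrit * SE, Est + tcrit * SE)
```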

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

• Any point with leverage higher than
  ◦ 2(p + 1)/n, where p + 1 is the number of model coefficients and n is the number of cases,
  ◦ or 3(p + 1)/n when n is large,
  can be a high-leverage point.

• Below we mark
  ◦ the ±2 and ±3 bounds on standardized residuals with horizontal dotted lines in the normal
    Q-Q plot and the residuals-vs-leverage plot,
  ◦ the high-leverage thresholds with vertical dotted lines in the residuals-vs-leverage plot.

• Some of these horizontal or vertical dotted lines may not appear if they fall outside the
plot limits.

plot(lm2, which = 1, col="darkgray", lwd=2, cex=1.5, cex.axis=1.5, cex.lab = 1.5, cex.id = 1.5, cex.caption = 1.5)

plot(lm2, which = 2, id.n = 3, col="darkgray",
     lwd=2, cex=1.5, cex.axis=1.5, cex.lab = 1.5, cex.id = 1.5, cex.caption = 1.5)
abline(h=c(-3,-2,2,3), lty = "dotted")
plot(lm2, which = c(3,5), id.n=3, col="darkgray", lwd=2, cex=1.5, cex.axis=1.5, cex.lab = 1.5, cex.id = 1.5, cex.caption = 1.5)
abline(h = c(-3, -2, 2, 3),   # about 5% and 0.3% of cases are expected outside (-2,2) and (-3,3)
       v = c(2,3)*(1 - df.residual(lm2)/nobs(lm2)),   # leverage thresholds 2(p+1)/n and 3(p+1)/n
       lty = "dotted")


[Figure 1: Diagnostic plots: (a) residuals vs fitted, (b) normal Q-Q, (c) scale-location, (d) residuals vs leverage]
Inspect Figure 1:

a. the leveled smoother at zero suggests that the regression model is satisfactory.
b. the points are aligned along the straight line: the residuals seem to be normally distributed.
c. the roughly constant smoother suggests that the error variance is constant.
d. none of the points has a large Cook's distance (more than 1): no influential points exist.
There are six or more high-leverage points. No standardized residuals fall outside ±3, but 23
fall outside ±2, slightly more than the expected number of about 18. There are about 5
y-outliers. They do not seem to influence the regression coefficient estimates, but they can
inflate the residual standard error, which in turn widens the confidence and prediction
intervals.
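The visual impressions from panel (d) can also be checked numerically; a sketch (the counts quoted above come from output like this):

```r
p1  <- length(coef(lm2))        # number of coefficients, p + 1
h   <- hatvalues(lm2)           # leverages
rst <- rstandard(lm2)           # standardized residuals
sum(h > 3 * p1 / nobs(lm2))     # count of high-leverage points
sum(abs(rst) > 2)               # residuals outside +-2 (23 observed vs ~18 expected)
sum(abs(rst) > 3)               # residuals outside +-3 (none)
max(cooks.distance(lm2))        # largest Cook's distance (well below 1)
```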

2 Question 15
This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try
to predict per capita crime rate using the other variables in this data set. In other words, per capita
crime rate is the response, and the other variables are the predictors.

Boston <- mutate(Boston,chas=factor(chas))

(a) For each predictor, fit a simple linear regression model to predict the response. Describe
your results. In which of the models is there a statistically significant association between the
predictor and the response? Create some plots to back up your assertions.

lm_zn <- lm(crim~zn,Boston)


lm_zn %>% summary()

Call:
lm(formula = crim ~ zn, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-4.429 -4.222 -2.620 1.250 84.523

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
zn -0.07393 0.01609 -4.594 5.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.435 on 504 degrees of freedom


Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06

lm_indus <- lm(crim~indus,Boston)


lm_indus %>% summary()


Call:
lm(formula = crim ~ indus, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-11.972 -2.698 -0.736 0.712 81.813

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06374 0.66723 -3.093 0.00209 **
indus 0.50978 0.05102 9.991 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.866 on 504 degrees of freedom


Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16

lm_chas <- lm(crim~chas,Boston)


lm_chas %>% summary()

Call:
lm(formula = crim ~ chas, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
chas1 -1.8928 1.5061 -1.257 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom


Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094

lm_nox <- lm(crim~nox,Boston)


lm_nox %>% summary()

Call:
lm(formula = crim ~ nox, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-12.371 -2.738 -0.974 0.559 81.728


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.720 1.699 -8.073 5.08e-15 ***
nox 31.249 2.999 10.419 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 504 degrees of freedom


Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16

lm_rm <- lm(crim~rm,Boston)


lm_rm %>% summary()

Call:
lm(formula = crim ~ rm, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-6.604 -3.952 -2.654 0.989 87.197

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.482 3.365 6.088 2.27e-09 ***
rm -2.684 0.532 -5.045 6.35e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.401 on 504 degrees of freedom


Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07

lm_age <- lm(crim~age,Boston)


lm_age %>% summary()

Call:
lm(formula = crim ~ age, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-6.789 -4.257 -1.230 1.527 82.849

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
age 0.10779 0.01274 8.463 2.85e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.057 on 504 degrees of freedom



Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16

lm_dis <- lm(crim~dis,Boston)


lm_dis %>% summary()

Call:
lm(formula = crim ~ dis, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-6.708 -4.134 -1.527 1.516 81.674

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4993 0.7304 13.006 <2e-16 ***
dis -1.5509 0.1683 -9.213 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.965 on 504 degrees of freedom


Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16

lm_rad <- lm(crim~rad,Boston)


lm_rad %>% summary()

Call:
lm(formula = crim ~ rad, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-10.164 -1.381 -0.141 0.660 76.433

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
rad 0.61791 0.03433 17.998 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.718 on 504 degrees of freedom


Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16

lm_tax <- lm(crim~tax,Boston)


lm_tax %>% summary()

Call:
lm(formula = crim ~ tax, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-12.513 -2.738 -0.194 1.065 77.696

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
tax 0.029742 0.001847 16.10 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.997 on 504 degrees of freedom


Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16

lm_ptratio <- lm(crim~ptratio,Boston)


lm_ptratio %>% summary()

Call:
lm(formula = crim ~ ptratio, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-7.654 -3.985 -1.912 1.825 83.353

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
ptratio 1.1520 0.1694 6.801 2.94e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.24 on 504 degrees of freedom


Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11

lm_lstat <- lm(crim~lstat,Boston)


lm_lstat %>% summary()

Call:
lm(formula = crim ~ lstat, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-13.925 -2.822 -0.664 1.079 82.862

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
lstat 0.54880 0.04776 11.491 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.664 on 504 degrees of freedom


Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16

lm_medv <- lm(crim~medv,Boston)


lm_medv %>% summary()

Call:
lm(formula = crim ~ medv, data = Boston)

Residuals:
Min 1Q Median 3Q Max
-9.071 -4.022 -2.343 1.298 80.957

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.79654 0.93419 12.63 <2e-16 ***
medv -0.36316 0.03839 -9.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.934 on 504 degrees of freedom


Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16

• We can conclude that, except for chas, every variable appears statistically significant in its
own simple linear regression model.
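Instead of fitting each model by hand, the twelve simple regressions can be run in a loop and the slope p-values collected; a sketch:

```r
preds <- setdiff(names(Boston), "crim")
pvals <- sapply(preds, function(v) {
  fit <- lm(reformulate(v, response = "crim"), data = Boston)
  summary(fit)$coefficients[2, 4]   # p-value of the slope
})
sort(pvals)   # every predictor except chas has a very small p-value
```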

my_fn <- function(data, mapping, method="loess", ...){


p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(formula = y ~ x,method=method,se=F, ...)
p
}

select(Boston,-chas) %>%
ggpairs(progress = FALSE,
upper = list(continuous = my_fn),
lower = list(continuous = "cor"))


(b) Fit a multiple regression model to predict the response using all of the predictors.
Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0 ?

lm_full <- lm(crim~.,Boston)


lm_full %>% summary()

Call:
lm(formula = crim ~ ., data = Boston)

Residuals:
Min 1Q Median 3Q Max
-8.534 -2.248 -0.348 1.087 73.923

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.7783938 7.0818258 1.946 0.052271 .
zn 0.0457100 0.0187903 2.433 0.015344 *
indus -0.0583501 0.0836351 -0.698 0.485709
chas1 -0.8253776 1.1833963 -0.697 0.485841
nox -9.9575865 5.2898242 -1.882 0.060370 .
rm 0.6289107 0.6070924 1.036 0.300738
age -0.0008483 0.0179482 -0.047 0.962323
dis         -1.0122467  0.2824676  -3.584 0.000373 ***
rad          0.6124653  0.0875358   6.997 8.59e-12 ***
tax         -0.0037756  0.0051723  -0.730 0.465757
ptratio -0.3040728 0.1863598 -1.632 0.103393
lstat 0.1388006 0.0757213 1.833 0.067398 .
medv -0.2200564 0.0598240 -3.678 0.000261 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.46 on 493 degrees of freedom


Multiple R-squared: 0.4493, Adjusted R-squared: 0.4359
F-statistic: 33.52 on 12 and 493 DF, p-value: < 2.2e-16

• For zn , dis , rad and medv we can reject the null hypothesis that βj = 0.
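These can also be extracted programmatically from the fitted full model; a sketch:

```r
coef_tab <- summary(lm_full)$coefficients
rownames(coef_tab)[coef_tab[, "Pr(>|t|)"] < 0.05]   # zn, dis, rad, medv
```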

(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the
univariate regression coefficients from (a) on the x-axis, and the multiple regression
coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the
plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its
coefficient estimate in the multiple linear regression model is shown on the y-axis.

sep <- c(coef(lm_zn)[2], coef(lm_indus)[2], coef(lm_chas)[2], coef(lm_nox)[2],
         coef(lm_rm)[2], coef(lm_age)[2], coef(lm_dis)[2], coef(lm_rad)[2],
         coef(lm_tax)[2], coef(lm_ptratio)[2], coef(lm_lstat)[2], coef(lm_medv)[2])

full <- coef(lm_full)[2:13]

coefs <- as_tibble(cbind(sep, full))

coefs %>% ggplot(aes(x = sep, y = full)) +
  geom_point()


• Clearly, the results from the simple and multiple regression models are quite different: almost
every variable is statistically significant in its simple regression model, but most of the
variables are not in the multiple regression model.
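To see which predictor each point in the scatterplot corresponds to, the points can be labelled; a sketch (the predictor column is added here and is not part of the original coefs tibble):

```r
coefs$predictor <- names(full)
ggplot(coefs, aes(x = sep, y = full, label = predictor)) +
  geom_point() +
  geom_text(vjust = -0.7, size = 3)   # nox is the extreme point: 31.2 simple vs -9.96 multiple
```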

(d) Is there evidence of non-linear association between any of the predictors and the
response? To answer this question, for each predictor X, fit a model of the form

Y = β₀ + β₁X + β₂X² + β₃X³ + ε.

lm_zn <- lm(crim~zn + I(zn^2) + I(zn^3),Boston)


lm_zn %>% summary()

Call:
lm(formula = crim ~ zn + I(zn^2) + I(zn^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-4.821 -4.614 -1.294 0.473 84.130

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.846e+00 4.330e-01 11.192 < 2e-16 ***
zn -3.322e-01 1.098e-01 -3.025 0.00261 **
I(zn^2) 6.483e-03 3.861e-03 1.679 0.09375 .
I(zn^3) -3.776e-05 3.139e-05 -1.203 0.22954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.372 on 502 degrees of freedom


Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06

lm_indus <- lm(crim~indus + I(indus^2) + I(indus^3),Boston)


lm_indus %>% summary()

Call:
lm(formula = crim ~ indus + I(indus^2) + I(indus^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-8.278 -2.514 0.054 0.764 79.713

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6625683 1.5739833 2.327 0.0204 *
indus -1.9652129 0.4819901 -4.077 5.30e-05 ***
I(indus^2) 0.2519373 0.0393221 6.407 3.42e-10 ***
I(indus^3) -0.0069760 0.0009567 -7.292 1.20e-12 ***
---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.423 on 502 degrees of freedom


Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16

lm_nox <- lm(crim~nox + I(nox^2) + I(nox^3),Boston)


lm_nox %>% summary()

Call:
lm(formula = crim ~ nox + I(nox^2) + I(nox^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-9.110 -2.068 -0.255 0.739 78.302

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 233.09 33.64 6.928 1.31e-11 ***
nox -1279.37 170.40 -7.508 2.76e-13 ***
I(nox^2) 2248.54 279.90 8.033 6.81e-15 ***
I(nox^3) -1245.70 149.28 -8.345 6.96e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.234 on 502 degrees of freedom


Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16

lm_rm <- lm(crim~rm + I(rm^2) + I(rm^3),Boston)


lm_rm %>% summary()

Call:
lm(formula = crim ~ rm + I(rm^2) + I(rm^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-18.485 -3.468 -2.221 -0.015 87.219

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 112.6246 64.5172 1.746 0.0815 .
rm -39.1501 31.3115 -1.250 0.2118
I(rm^2) 4.5509 5.0099 0.908 0.3641
I(rm^3) -0.1745 0.2637 -0.662 0.5086
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.33 on 502 degrees of freedom


Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07


lm_age <- lm(crim~age + I(age^2) + I(age^3),Boston)


lm_age %>% summary()

Call:
lm(formula = crim ~ age + I(age^2) + I(age^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-9.762 -2.673 -0.516 0.019 82.842

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.549e+00 2.769e+00 -0.920 0.35780
age 2.737e-01 1.864e-01 1.468 0.14266
I(age^2) -7.230e-03 3.637e-03 -1.988 0.04738 *
I(age^3) 5.745e-05 2.109e-05 2.724 0.00668 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.84 on 502 degrees of freedom


Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16

lm_dis <- lm(crim~dis + I(dis^2) + I(dis^3),Boston)


lm_dis %>% summary()

Call:
lm(formula = crim ~ dis + I(dis^2) + I(dis^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-10.757 -2.588 0.031 1.267 76.378

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.0476 2.4459 12.285 < 2e-16 ***
dis -15.5543 1.7360 -8.960 < 2e-16 ***
I(dis^2) 2.4521 0.3464 7.078 4.94e-12 ***
I(dis^3) -0.1186 0.0204 -5.814 1.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.331 on 502 degrees of freedom


Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16

lm_rad <- lm(crim~rad + I(rad^2) + I(rad^3),Boston)


lm_rad %>% summary()


Call:
lm(formula = crim ~ rad + I(rad^2) + I(rad^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-10.381 -0.412 -0.269 0.179 76.217

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545 2.050108 -0.295 0.768
rad 0.512736 1.043597 0.491 0.623
I(rad^2) -0.075177 0.148543 -0.506 0.613
I(rad^3) 0.003209 0.004564 0.703 0.482

Residual standard error: 6.682 on 502 degrees of freedom


Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16

lm_tax <- lm(crim~tax + I(tax^2) + I(tax^3),Boston)


lm_tax %>% summary()

Call:
lm(formula = crim ~ tax + I(tax^2) + I(tax^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-13.273 -1.389 0.046 0.536 76.950

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.918e+01 1.180e+01 1.626 0.105
tax -1.533e-01 9.568e-02 -1.602 0.110
I(tax^2) 3.608e-04 2.425e-04 1.488 0.137
I(tax^3) -2.204e-07 1.889e-07 -1.167 0.244

Residual standard error: 6.854 on 502 degrees of freedom


Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16

lm_ptratio <- lm(crim~ptratio + I(ptratio^2) + I(ptratio^3),Boston)


lm_ptratio %>% summary()

Call:
lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-6.833 -4.146 -1.655 1.408 82.697


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 477.18405 156.79498 3.043 0.00246 **
ptratio -82.36054 27.64394 -2.979 0.00303 **
I(ptratio^2) 4.63535 1.60832 2.882 0.00412 **
I(ptratio^3) -0.08476 0.03090 -2.743 0.00630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.122 on 502 degrees of freedom


Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13

lm_lstat <- lm(crim~lstat + I(lstat^2) + I(lstat^3),Boston)


lm_lstat %>% summary()

Call:
lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-15.234 -2.151 -0.486 0.066 83.353

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.2009656 2.0286452 0.592 0.5541
lstat -0.4490656 0.4648911 -0.966 0.3345
I(lstat^2) 0.0557794 0.0301156 1.852 0.0646 .
I(lstat^3) -0.0008574 0.0005652 -1.517 0.1299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.629 on 502 degrees of freedom


Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16

lm_medv <- lm(crim~medv + I(medv^2) + I(medv^3),Boston)


lm_medv %>% summary()

Call:
lm(formula = crim ~ medv + I(medv^2) + I(medv^3), data = Boston)

Residuals:
Min 1Q Median 3Q Max
-24.427 -1.976 -0.437 0.439 73.655

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.1655381 3.3563105 15.840 < 2e-16 ***
medv        -5.0948305  0.4338321 -11.744  < 2e-16 ***
I(medv^2)    0.1554965  0.0171904   9.046  < 2e-16 ***
I(medv^3)   -0.0014901  0.0002038  -7.312 1.05e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.569 on 502 degrees of freedom


Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16

zn: zn is significant; zn² and zn³ are not.

indus: indus, indus², and indus³ are all significant.

nox: nox, nox², and nox³ are all significant.

rm: none of rm, rm², and rm³ is significant.

age: age is not significant, but age² and age³ are.

dis: dis, dis², and dis³ are all significant.

rad: none of rad, rad², and rad³ is individually significant.

tax: none of tax, tax², and tax³ is individually significant.

ptratio: ptratio, ptratio², and ptratio³ are all significant.

lstat: none of lstat, lstat², and lstat³ is significant at the 5% level.

medv: medv, medv², and medv³ are all significant.
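The individual t-tests can be supplemented with a joint F-test of the quadratic and cubic terms against the linear fit; a sketch for medv, reusing lm_medv from above:

```r
lm_medv_lin <- lm(crim ~ medv, Boston)
anova(lm_medv_lin, lm_medv)   # tests H0: the coefficients of medv^2 and medv^3 are both 0
```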
