3 Linear Regression


Some Important Questions


When we perform multiple linear regression, we usually are
interested in answering a few important questions:

1. Is at least one of the predictors X1, X2, . . . , Xp useful in predicting the response?
2. Do all the predictors help to explain Y, or is only a subset of the predictors useful?
3. How well does the model fit the data?
4. Given a set of predictor values, what response value should we predict, and how accurate is our prediction?


1. Is There a Relationship Between the Response and Predictors?

• H0: β1 = β2 = · · · = βp = 0
• Ha: at least one slope βj ≠ 0

• This hypothesis test is performed by computing the F-statistic in the ANOVA (ANalysis Of VAriance) table.
• The ANOVA table has many pieces of information. What we care about is the F-ratio and the corresponding p-value.

• How large does the F-statistic need to be before we can reject H0 and conclude that there is a relationship? It turns out that the answer depends on the values of n and p.
• For the advertising data, the p-value associated with the F-statistic is essentially zero, so we have extremely strong evidence that at least one of the media is associated with increased sales.
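
As a concrete illustration, here is a minimal sketch of computing the overall F-statistic from the residual and total sums of squares, F = [(TSS − RSS)/p] / [RSS/(n − p − 1)]. The data are simulated and all names are illustrative, not taken from the advertising example:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3                           # sample size and number of predictors
X = rng.normal(size=(n, p))
y = 5 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

# Fit by least squares with an intercept column
Xd = np.column_stack([np.ones(n), X])
beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares

# F = [(TSS - RSS) / p] / [RSS / (n - p - 1)]
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)   # upper-tail probability under H0
print(F, p_value)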


Why do we need the F-statistic?

• Given the individual p-values for each variable, why do we need to look at the overall F-statistic? It seems that if any one of the individual p-values is very small, then at least one predictor must be related to the response.
• However, this logic is flawed, especially when the number of predictors p is large.
• Even in the absence of any true association between the predictors and the response, we expect approximately 5% of the individual p-values to be small (below 0.05) purely by chance!
• The F-statistic does not suffer from this problem because it adjusts for the number of predictors.
• The approach of using an F-statistic to test for any association between the predictors and the response works when p is relatively small, and certainly small compared to n.
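
The 5%-by-chance phenomenon is easy to demonstrate. The following sketch (simulated data, arbitrary seed) regresses a response on each of 100 completely unrelated predictors, one at a time, and counts the small p-values:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 100                 # many predictors, none truly related to y

X = rng.normal(size=(n, p))
y = rng.normal(size=n)          # response generated independently of X

# Individual t-tests: simple regression of y on each predictor alone
p_values = []
for j in range(p):
    slope, intercept, r, p_val, se = stats.linregress(X[:, j], y)
    p_values.append(p_val)

# Roughly 5% of the individual p-values fall below 0.05 by chance alone
print(np.mean(np.array(p_values) < 0.05))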


2. Deciding on Important Variables

• It is more often the case that the response is only related to a subset of the predictors.
• The task of determining which predictors are associated with the response, in order to fit a single model involving only those predictors, is referred to as variable selection.

• Best subset selection: fit a separate least squares regression model for each possible combination of the p predictors (a sketch follows below).
• Unfortunately, this means a total of 2^p models that contain subsets of p variables.
• Only feasible when p is small.

• We need an automated and efficient approach to choose a smaller set of models to consider.
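
A sketch of best subset selection as just described; the function name and data layout are assumptions for illustration:

from itertools import combinations
import numpy as np

def best_subset(X, y):
    """Exhaustive best subset selection scored by RSS.
    For each subset size k, keeps the combination of predictors
    with the lowest residual sum of squares. Cost grows as 2^p."""
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if k not in best or rss < best[k][1]:
                best[k] = (subset, rss)
    return best   # {size: (best subset of that size, its RSS)}

Note that RSS always decreases as a subset grows, so subsets of different sizes must be compared with a criterion such as adjusted R², AIC/BIC, or cross-validated error rather than raw RSS.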


Variable Selection
Forward stepwise selection:
• Begin with the null model:
• model that contains an intercept but no predictors.

• Fit p simple linear regressions and add to the null model the
variable that results in the lowest RSS.

• Add to that model the variable that results in the lowest RSS for
the new two-variable model.

• Continue until some stopping rule is satisfied.
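
A minimal sketch of forward stepwise selection, assuming the predictors sit in a NumPy matrix X and using RSS as the per-step criterion as described above; a real stopping rule would use adjusted R², AIC/BIC, or cross-validation:

import numpy as np

def forward_stepwise(X, y, max_vars=None):
    """Greedy forward selection: start from the null model and at each
    step add the predictor whose inclusion gives the lowest RSS."""
    n, p = X.shape
    selected, remaining, path = [], list(range(p)), []
    while remaining and (max_vars is None or len(selected) < max_vars):
        best_j, best_rss = None, np.inf
        for j in remaining:
            Xs = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
        path.append((list(selected), best_rss))   # one model per step
    return path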


Variable Selection (2)


Backward stepwise selection:
• Start with all variables in the model.

• Remove the variable with the largest p-value
• that is, the least statistically significant variable.

• The new (p − 1)-variable model is fit, and the variable with the largest p-value is removed.

• The procedure continues until a stopping rule is reached
• e.g., stop when all remaining variables have a p-value below some threshold.
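
A sketch of this procedure with statsmodels, which conveniently exposes per-coefficient p-values; the 0.05 threshold is an illustrative choice:

import numpy as np
import statsmodels.api as sm

def backward_stepwise(X, y, threshold=0.05):
    """Start with all predictors; repeatedly refit and drop the predictor
    with the largest p-value until all remaining ones fall below threshold."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]        # skip the intercept's p-value
        worst = int(np.argmax(pvals))
        if pvals[worst] < threshold:
            break                      # every remaining variable is significant
        cols.pop(worst)                # remove the least significant variable
    return cols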


Variable Selection (3)


Hybrid approaches = mixed selection:
• Variables are added to the model sequentially, in analogy to forward selection.

• However, after adding each new variable, the method may also remove any variables that no longer provide an improvement in the model fit
• the p-values for variables can become larger as new predictors are added to the model (a p-value rises above a certain threshold).

• Such an approach attempts to more closely mimic best subset selection while retaining the computational advantages of forward and backward stepwise selection.
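
A sketch combining the two ideas; the add/drop thresholds are illustrative assumptions, not values from the slides:

import numpy as np
import statsmodels.api as sm

def mixed_selection(X, y, add_thresh=0.05, drop_thresh=0.10):
    """Forward steps that add the most significant new predictor, each
    followed by a backward check that drops any predictor whose p-value
    has risen above drop_thresh as the model grew."""
    cols, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Forward step: find the candidate with the smallest p-value
        best_j, best_p = None, 1.0
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, cols + [j]])).fit()
            if fit.pvalues[-1] < best_p:
                best_j, best_p = j, fit.pvalues[-1]
        if best_p >= add_thresh:
            break                      # nothing left worth adding
        cols.append(best_j)
        remaining.remove(best_j)
        # Backward check: drop the worst variable if it became insignificant
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]
        worst = int(np.argmax(pvals))
        if pvals[worst] > drop_thresh:
            remaining.append(cols.pop(worst))
    return cols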


3. Model Fit

• In MLR, R² = Cor(Y, Ŷ)²
• the square of the correlation between the response and the fitted values; in fact, one property of the fitted linear model is that it maximizes this correlation among all possible linear models.

• The model that uses only TV and radio to predict sales has an R² value of 0.89719; the model with only TV as a predictor had an R² of 0.61.
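
A quick numerical check of the identity R² = Cor(Y, Ŷ)² on simulated data (any OLS fit would do):

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 2))
y = 3 + X @ np.array([1.5, -2.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

# R^2 from sums of squares ...
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

# ... equals the squared correlation between y and the fitted values
r2_cor = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2, r2_cor)   # the two values agree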


Observations
• Adding radio to the model leads to a substantial improvement in R² (TV + radio gives better prediction: higher R² and smaller RSE).
• There is only a small increase in R² if we include newspaper advertising in the model that already contains TV and radio advertising.
• Previously observed: the coefficient for newspaper advertising is not significant (large p-value).
• This is additional evidence that newspaper can be dropped from the model.

• R² will always increase when more variables are added to the model, even if those variables are only weakly associated with the response.

• The inclusion of variables that do not provide a real improvement in the fit to the training samples will likely lead to poor results on independent test samples, due to overfitting.
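
A small simulation illustrating this point (the sizes, seed, and number of noise predictors are arbitrary): training R² rises as useless predictors are added, while test error typically worsens.

import numpy as np

rng = np.random.default_rng(3)
n_train, n_test = 60, 1000
X_tr = rng.normal(size=(n_train, 20))   # only the first 2 columns matter
X_te = rng.normal(size=(n_test, 20))
f = lambda X: 2 * X[:, 0] - X[:, 1]
y_tr = f(X_tr) + rng.normal(size=n_train)
y_te = f(X_te) + rng.normal(size=n_test)

for k in (2, 10, 20):                   # add more and more noise predictors
    A_tr = np.column_stack([np.ones(n_train), X_tr[:, :k]])
    A_te = np.column_stack([np.ones(n_test), X_te[:, :k]])
    beta, _, _, _ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    r2 = 1 - np.sum((y_tr - A_tr @ beta) ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)
    mse = np.mean((y_te - A_te @ beta) ** 2)
    print(k, round(r2, 3), round(mse, 3))  # training R^2 rises; test MSE typically worsens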


4. Predictions
• Once we have fit the multiple regression model, it is
straightforward to predict the response Y on the basis of a set
of values for the predictors.

• However, there are three sorts of uncertainty associated with this prediction:
1. The inaccuracy in the coefficient estimates is related to the reducible error.
2. Model bias (the assumption of linearity).
3. Irreducible error: even if we knew the true values of the coefficients, the response could not be predicted perfectly because of the random error ε in the model.
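
In practice these uncertainties are summarized by confidence intervals (for the average response, reflecting reducible error) and wider prediction intervals (which also include the irreducible error ε). A sketch with statsmodels on simulated data; all names are illustrative:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
x_new = sm.add_constant(np.array([[0.5, -0.3], [1.0, 2.0]]), has_constant='add')

pred = model.get_prediction(x_new)
# Confidence interval: uncertainty in the average response (reducible error)
print(pred.conf_int())
# Prediction interval: additionally includes the irreducible error
print(pred.conf_int(obs=True))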


Other Issues/Considerations

• Qualitative Predictors

• Interaction Terms

• Non-linear effects

• Multicollinearity

• Model Selection


Qualitative Predictors
• Investigate differences in credit card balance between males and
females, ignoring the other variables for the moment.

• How do you stick "men" and "women" (a qualitative predictor, i.e., a factor) into a regression equation?

• Code them as indicator variables (dummy variables).


• For example, create a dummy variable x that equals 1 if a person is female and 0 if male. Using this variable as a predictor in the regression equation results in the model

balance = β0 + β1·x + ε

• Now β0 can be interpreted as the average credit card balance among males, β0 + β1 as the average credit card balance among females, and β1 as the average difference in credit card balance between females and males.
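
A sketch of this coding with a statsmodels formula. The miniature data frame is made up for illustration, and the Treatment(reference='Male') argument makes males the baseline level so the coefficients match the interpretation above:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical miniature version of the Credit data set
df = pd.DataFrame({
    "balance": [331, 903, 580, 964, 412, 1100],
    "gender":  ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# C(...) turns the factor into a 0/1 dummy variable automatically
fit = smf.ols("balance ~ C(gender, Treatment(reference='Male'))", data=df).fit()
print(fit.params)
# Intercept             -> beta0, the average balance among males
# ...[T.Female] term    -> beta1, the female-minus-male average difference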


Alternative coding schemes


• Alternatively, instead of a 0/1 coding scheme, we could create a dummy variable that takes the value +1 for females and −1 for males.

• Now β0 can be interpreted as the overall average credit card balance (ignoring the gender effect), and β1 is the amount that females are above the average and males are below the average.
• What are the estimates of β0 and β1 under this coding?


It is important to note that the final predictions for the credit balances of males and females will be identical regardless of the coding scheme used. The only difference is in the way that the coefficients are interpreted.
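
A quick check of this fact with toy numbers: the two codings produce different coefficients but identical fitted values.

import numpy as np

balance = np.array([331.0, 903.0, 580.0, 964.0, 412.0, 1100.0])
is_female = np.array([0, 1, 0, 1, 0, 1])

# Coding 1: x = 1 for females, 0 for males
X1 = np.column_stack([np.ones(6), is_female])
b1, _, _, _ = np.linalg.lstsq(X1, balance, rcond=None)

# Coding 2: x = +1 for females, -1 for males
X2 = np.column_stack([np.ones(6), 2 * is_female - 1])
b2, _, _, _ = np.linalg.lstsq(X2, balance, rcond=None)

# Different coefficients, identical predictions
print(b1, b2)
print(np.allclose(X1 @ b1, X2 @ b2))   # True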


Extensions of the Linear Model


The standard linear regression model makes several highly restrictive assumptions that are often violated in practice. Two of the most important assumptions state that the relationship between the predictors and the response is additive and linear.

1. The additive assumption means that the effect of changes in a predictor Xj on the response Y is independent of the values of the other predictors.

2. The linear assumption states that the change in the response Y due to a one-unit change in Xj is constant, regardless of the value of Xj.


Interaction terms in advertising

Sales = β0 + β1×TV + β2×Radio + β3×(TV×Radio)

Grouping the terms that multiply TV makes the interaction explicit:

Sales = β0 + (β1 + β3×Radio)×TV + β2×Radio

• Spending $1 extra on TV increases average sales by 0.0191 + 0.0011×Radio.

Grouping instead around Radio:

Sales = β0 + (β2 + β3×TV)×Radio + β1×TV

• Spending $1 extra on Radio increases average sales by 0.0289 + 0.0011×TV.
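
A sketch of fitting this interaction model with a statsmodels formula. The file name and column capitalization are assumptions; in the published Advertising data set the column names may differ:

import pandas as pd
import statsmodels.formula.api as smf

ads = pd.read_csv("Advertising.csv")     # hypothetical path to the data

# TV * Radio expands to TV + Radio + TV:Radio (main effects + interaction)
fit = smf.ols("Sales ~ TV * Radio", data=ads).fit()
print(fit.summary())

# The slope of TV at a given Radio level is beta_TV + beta_interaction * Radio
b = fit.params
print(b["TV"] + b["TV:Radio"] * 10)      # effect of $1 of TV when Radio = 10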


Observations
• We can interpret β3 as the increase in the effectiveness of TV advertising for a one-unit increase in radio advertising (or vice versa).

• The p-value for the interaction term, TV×radio, is extremely low, indicating that there is strong evidence for Ha: β3 ≠ 0.
• In other words, it is clear that the true relationship is not additive.

• The R² for this model is 96.8%, compared to only 89.7% for the model that predicts sales using TV and radio without an interaction term.

• The hierarchical principle states that if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.


(Multi) Collinearity
• Situation in which two or more predictor variables are closely
related to one another.

• Collinearity can pose problems in the regression context
• since it can be difficult to separate out the individual effects of collinear variables on the response.

• Unfortunately, not all collinearity problems can be detected by inspection of the correlation matrix
• it is possible for collinearity to exist between three or more variables even if no pair of variables has a particularly high correlation
• this situation is called multicollinearity.
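
Multicollinearity is usually diagnosed with the variance inflation factor (VIF), obtained by regressing each predictor on all of the others: VIF_j = 1/(1 − R²_j). A sketch follows; the rule of thumb that a VIF above 5 or 10 signals a problem is conventional, not from these slides.

import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all the other columns (plus an intercept)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, _, _, _ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)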


Potential Fit Problems

There are a number of possible problems that one may encounter when fitting the linear regression model:
1. Non-linearity of the data
2. Dependence of the error terms
3. Non-constant variance of error terms
4. Outliers
5. High leverage points
6. Collinearity

See Section 3.3.3 for more details.
