Assumptions of Linear Regression: No or Little Multicollinearity


Assumptions of Linear Regression

No or Little
Multicollinearity

Amjad Iqbal
PhD Management Sciences
COMSATS University
Multicollinearity
 Linear regression assumes that there is no or little multicollinearity in the data.

 Multicollinearity occurs when the independent variables (in a multiple regression model) are highly correlated with each other.

 This assumption is rarely fully satisfied in practice.
Types of Multicollinearity
 Structural multicollinearity
 A mathematical artifact caused by creating new predictors from other predictors, such as creating the predictor x² from the predictor x.

 Data-based multicollinearity
 A result of a poorly designed experiment, reliance on
purely observational data, or the inability to manipulate the
system on which the data are collected.
Why is Multicollinearity a Potential Problem?
 The interpretation of a regression coefficient is that it represents the mean change in the dependent variable for each 1-unit change in an independent variable, holding all of the other independent variables constant.

 When independent variables are correlated, changes in one variable are associated with shifts in another variable. Hence, it becomes difficult to determine which variable is actually contributing to the prediction of the response variable.
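As a quick illustration, here is a minimal sketch with simulated, hypothetical data (the variable names and true coefficients are made up for the example): each fitted slope is read as the mean change in the dependent variable per 1-unit change in that predictor, holding the other one fixed.

```python
# Minimal sketch with simulated data: interpreting regression coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                    # hypothetical predictor, e.g. income
x2 = rng.normal(size=n)                    # hypothetical predictor, e.g. house size
y = 3.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # roughly [3.0, 2.0, 0.5]; each slope is the mean change in y
                    # per 1-unit change in that predictor, holding the other fixed
```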
Why is Multicollinearity a Potential Problem?
 In the presence of correlated predictors, the standard errors tend to increase. With large standard errors, the confidence intervals become wider, leading to less precise estimates of the slope parameters (i.e., a greater chance of failing to reject H₀).

 Multicollinearity thus weakens the statistical power of our regression model, i.e., we might not be able to trust the p-values to identify the independent variables that are statistically significant.
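The inflation of standard errors can be demonstrated with a small simulation (a sketch with made-up data; the correlation level 0.95 is an arbitrary choice): the same model is fitted twice, once with uncorrelated predictors and once with highly correlated ones, and the slope standard errors are compared.

```python
# Sketch: slope standard errors with uncorrelated vs. highly correlated predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200

def slope_standard_errors(rho):
    # Two predictors with correlation rho; same true model in both cases
    X = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    y = 1 + 2 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    return fit.bse[1:]          # standard errors of the two slopes

print("rho = 0.0 :", slope_standard_errors(0.0))
print("rho = 0.95:", slope_standard_errors(0.95))   # noticeably larger -> wider CIs
```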
Sources of Multicollinearity?
 The data collection method employed
 Sampling over a limited range of the values taken by the regressors in the population
 Constraints on the model or in the population
 e.g., Electricity consumption (Y) with Income (X1) and House Size (X2)
 Model specification
 Adding polynomial terms to a regression model, especially when the range of X is small
 Overdetermined model
 More explanatory variables than the number of observations, e.g., medical research with a small number of patients and a large number of variables
How to Detect Multicollinearity?
 Multicollinearity is a question of degree, not of kind; the practical question is how severe it is, not whether it exists.
How to Detect Multicollinearity?
 Correlation Matrix
 Pearson correlation coefficients between pairs of predictors should not exceed 0.8.
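A minimal sketch of this check in Python (assuming the predictors are held in a pandas DataFrame; the function name high_correlations is hypothetical):

```python
# Sketch: list predictor pairs whose |Pearson r| exceeds 0.8.
# Assumes df is a pandas DataFrame holding only the independent variables.
import pandas as pd

def high_correlations(df: pd.DataFrame, threshold: float = 0.8):
    corr = df.corr()                         # Pearson correlation matrix
    cols = corr.columns
    return [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if abs(corr.iloc[i, j]) > threshold
    ]
```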
How to Detect Multicollinearity?
 Tolerance
 Measures the influence of one independent variable on the other independent variables included in the model.
 T = 1 − R², where R² comes from regressing that independent variable on all the others.
 With T < 0.1 there might be multicollinearity in the data, and with T < 0.01 there certainly is.
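A sketch of computing tolerance via the auxiliary regressions described above (assuming the predictors are in a pandas DataFrame; the helper name tolerances is hypothetical):

```python
# Sketch: tolerance T = 1 - R², where R² comes from regressing each predictor
# on all the others. Assumes X is a pandas DataFrame of independent variables.
import pandas as pd
import statsmodels.api as sm

def tolerances(X: pd.DataFrame) -> pd.Series:
    out = {}
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        out[col] = 1.0 - r2    # T < 0.1: possible multicollinearity; T < 0.01: certain
    return pd.Series(out)
```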
How to Detect Multicollinearity?
 Variance Inflation Factors (VIF)
 VIF = 1/(1 − R²), i.e., the reciprocal of tolerance.
 A VIF value < 4 suggests no multicollinearity, whereas a value > 10 implies serious multicollinearity.
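statsmodels ships a built-in VIF helper; a minimal sketch of using it (again assuming a pandas DataFrame of predictors; the wrapper name vifs is hypothetical):

```python
# Sketch using statsmodels' variance_inflation_factor.
# Assumes X is a pandas DataFrame of the independent variables.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vifs(X: pd.DataFrame) -> pd.Series:
    Xc = sm.add_constant(X)      # VIF should be computed with a constant present
    return pd.Series(
        [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
        index=X.columns,         # column 0 (the constant) is skipped
    )
```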
When to Deal with Multicollinearity?
 The need to deal with/reduce multicollinearity depends upon its severity and the primary goal of your regression model.
 If you have moderate multicollinearity, you may not need to reduce it.
 If multicollinearity does not involve the particular independent variables you are interested in, you may not need to resolve it.
 Multicollinearity does not affect the predictions, the precision of the predictions, or the goodness-of-fit statistics. If your primary goal is to make predictions, and you don't need to understand the role of each independent variable, you don't need to reduce even severe multicollinearity.
How to Deal with Multicollinearity?
 In case of structural multicollinearity, center the independent variables (i.e., subtract the mean from each variable before creating the derived terms); see the sketch after this list.

 In case of data-based multicollinearity:
 Remove some of the highly correlated independent variables.
 Linearly combine the independent variables, for example by adding them together.
 Perform an analysis designed for highly correlated variables, such as principal components analysis or partial least squares regression.
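A minimal sketch of the centering remedy, with simulated data: squaring a predictor that takes only positive values over a narrow range produces a term almost perfectly correlated with the original, while centering before squaring removes most of that correlation.

```python
# Sketch with simulated data: centering before squaring removes the structural
# multicollinearity between x and x².
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=200)     # positive values over a narrow range

print(np.corrcoef(x, x**2)[0, 1])     # close to 1: x and x² nearly collinear

xc = x - x.mean()                     # center first, then create the polynomial term
print(np.corrcoef(xc, xc**2)[0, 1])   # near 0: structural collinearity removed
```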
Thanks
