Introduction to
Multiple Regression
Outline
1. Omitted variable bias
2. Causality and regression analysis
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator
Omitted Variable Bias
(SW Section 6.1)
The error u arises because of factors that influence Y but are not
included in the regression function; so, there are always omitted
variables.
Omitted variable bias, ctd.
The bias in the OLS estimator that occurs as a result of an
omitted factor is called omitted variable bias. For omitted
variable bias to occur, the omitted factor “Z” must be (1) a
determinant of Y and (2) correlated with the regressor X.

The math:

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \frac{\sigma_{Xu}}{\sigma_X^2}, \qquad \frac{\sigma_{Xu}}{\sigma_X^2} = \frac{\sigma_u}{\sigma_X}\cdot\frac{\sigma_{Xu}}{\sigma_X\,\sigma_u} = \frac{\sigma_u}{\sigma_X}\,\rho_{Xu},$$

where $\rho_{Xu}$ = corr(X, u). If assumption #1 is valid, then $\rho_{Xu} = 0$,
but if not we have….
The omitted variable bias formula:
$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$
If an omitted factor Z is both:
(1) a determinant of Y (that is, it is contained in u); and
(2) correlated with X,
then $\rho_{Xu} \neq 0$ and the OLS estimator $\hat\beta_1$ is biased (and is not
consistent).
The math makes precise the idea that districts with few ESL
students (1) do better on standardized tests and (2) have
smaller classes (bigger budgets), so ignoring the ESL factor
results in overstating the class size effect.
Is this actually going on in the CA data?
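Before turning to the California data, here is a minimal simulation sketch of the formula at work (not from the text; all names and numbers are illustrative): an omitted factor Z is a determinant of Y and is correlated with X, so the short-regression slope settles at $\beta_1 + \rho_{Xu}\,\sigma_u/\sigma_X$ rather than at $\beta_1$.

```python
import numpy as np

# Minimal simulation of omitted variable bias (illustrative, not from the text).
# True model: Y = b0 + b1*X + gamma*Z + e, where Z is omitted, so the error of the
# short regression is u = gamma*Z + e and corr(X, u) != 0.
rng = np.random.default_rng(0)
n = 100_000
b0, b1, gamma = 700.0, -1.0, -0.65

Z = rng.normal(size=n)                     # omitted factor (e.g., an ESL-share proxy)
X = 20 + 2 * Z + rng.normal(size=n)        # regressor correlated with Z
e = rng.normal(scale=10, size=n)
u = gamma * Z + e                          # error term of the short regression
Y = b0 + b1 * X + u

# Short regression of Y on X (constant + slope) via least squares
A = np.column_stack([np.ones(n), X])
slope = np.linalg.lstsq(A, Y, rcond=None)[0][1]

# Omitted variable bias formula: plim(beta1_hat) = b1 + rho_Xu * sigma_u / sigma_X
rho_Xu = np.corrcoef(X, u)[0, 1]
predicted = b1 + rho_Xu * u.std() / X.std()

print(f"OLS slope (short regression): {slope:.3f}")
print(f"b1 + rho_Xu*sigma_u/sigma_X : {predicted:.3f}")   # the two should be close
print(f"true b1                     : {b1:.3f}")
```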
Districts with fewer English Learners have higher test scores
Districts with lower percent EL (PctEL) have smaller classes
Among districts with comparable PctEL, the effect of class size is
small (recall overall “test score gap” = 7.4)
Digression on causality and
regression analysis
What do we want to estimate?
Ideal Randomized Controlled
Experiment
Ideal: subjects all follow the treatment protocol – perfect
compliance, no errors in reporting, etc.!
Randomized: subjects from the population of interest are
randomly assigned to a treatment or control group (so
there are no confounding factors)
Controlled: having a control group permits measuring the
differential effect of the treatment
Experiment: the treatment is assigned as part of the
experiment: the subjects have no choice, so there is no
“reverse causality” in which subjects choose the treatment
they think will work best.
Back to class size:
Conceive an ideal randomized controlled experiment for
measuring the effect on Test Score of reducing STR…
How does our observational data differ from this ideal?
The treatment is not randomly assigned
Consider PctEL – percent English learners – in the district.
It plausibly satisfies the two criteria for omitted variable
bias: Z = PctEL is:
1. a determinant of Y; and
2. correlated with the regressor X.
The “control” and “treatment” groups differ in a systematic
way – corr(STR, PctEL) ≠ 0
Randomized controlled experiments:
Randomization + control group means that any differences
between the treatment and control groups are random – not
systematically related to the treatment
We can eliminate the difference in PctEL between the large
(control) and small (treatment) groups by examining the
effect of class size among districts with the same PctEL.
If the only systematic difference between the large and
small class size groups is in PctEL, then we are back to the
randomized controlled experiment – within each PctEL
group.
This is one way to “control” for the effect of PctEL when
estimating the effect of STR.
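A minimal sketch of this “compare within groups” idea on simulated data (the variables pct_el, str_, and score and all numbers are illustrative stand-ins, not the California data): comparing districts only within narrow PctEL bins removes most of the bias in the naive slope.

```python
import numpy as np

# Sketch: "controlling" for a confounding factor by comparing only observations with
# similar values of it (stratification). Simulated data; names are illustrative.
rng = np.random.default_rng(7)
n = 50_000
pct_el = rng.uniform(0, 60, size=n)                       # confounding factor
str_   = 18 + 0.05 * pct_el + rng.normal(0, 1.5, size=n)  # class size correlated with pct_el
score  = 700 - 1.0 * str_ - 0.65 * pct_el + rng.normal(0, 10, size=n)

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    return np.polyfit(x, y, 1)[0]

print(f"naive slope of score on STR (no control): {slope(str_, score):.2f}")

# Within narrow pct_el bins, the confounder is (nearly) held constant
bins = np.digitize(pct_el, np.arange(0, 60, 5))
within = [slope(str_[bins == b], score[bins == b]) for b in np.unique(bins)]
print(f"average within-bin slope (controls for pct_el): {np.mean(within):.2f}")
# The naive slope overstates the (negative) class-size effect; the within-bin
# slope is close to the true coefficient of -1.0 used in the simulation.
```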
Return to omitted variable bias
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment
(STR) is randomly assigned: then PctEL is still a determinant
of TestScore, but PctEL is uncorrelated with STR. (But this is
unrealistic in practice.)
2. Adopt the “cross tabulation” approach, with finer gradations
of STR and PctEL – within each group, all classes have the
same PctEL, so we control for PctEL. (But soon we will run
out of data, and what about other determinants like family
income and parental education?)
3. Use a regression in which the omitted variable (PctEL) is no
longer omitted: include PctEL as an additional regressor in a
multiple regression.
The Population Multiple Regression
Model (SW Section 6.2)
Consider the case of two regressors:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1,\dots,n$$

The population regression line is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.
Change $X_1$ by $\Delta X_1$ while holding $X_2$ constant.
Difference: $\Delta Y = \beta_1 \Delta X_1$
So:
$\beta_1 = \dfrac{\Delta Y}{\Delta X_1}$, holding $X_2$ constant
$\beta_2 = \dfrac{\Delta Y}{\Delta X_2}$, holding $X_1$ constant
The OLS estimators $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ solve

$$\min_{b_0,\,b_1,\,b_2} \sum_{i=1}^{n} \big[\,Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})\,\big]^2$$
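As a sketch of what this minimization delivers, the code below computes the OLS coefficients for two regressors by least squares; the data and coefficient values are simulated for illustration (they are not the California data set).

```python
import numpy as np

# Sketch: OLS with two regressors by minimizing the sum of squared residuals.
# The data below are simulated for illustration only.
rng = np.random.default_rng(1)
n = 420
X1 = rng.normal(20, 2, size=n)            # e.g., a student-teacher-ratio-like regressor
X2 = rng.normal(15, 10, size=n)           # e.g., a percent-EL-like regressor
u = rng.normal(0, 10, size=n)
Y = 686.0 - 1.1 * X1 - 0.65 * X2 + u      # coefficients chosen for illustration

# Design matrix with a constant column; lstsq minimizes ||Y - X b||^2 over b
X = np.column_stack([np.ones(n), X1, X2])
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("OLS estimates (b0, b1, b2):", np.round(b_hat, 3))
```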
$\widehat{TestScore}$ = 698.9 – 2.28STR

$\widehat{TestScore}$ = 686.0 – 1.10STR – 0.65PctEL
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
$\widehat{TestScore}$ = 686.0 – 1.10STR – 0.65PctEL
Measures of Fit (SW Section 6.4)
The SER and the RMSE measure the spread of the regression residuals:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat u_i^2}\,, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat u_i^2}$$
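A small sketch of these two formulas on simulated data (illustrative values only, not the test score data):

```python
import numpy as np

# Sketch: SER (with degrees-of-freedom correction) and RMSE (without),
# computed from OLS residuals on simulated data.
rng = np.random.default_rng(2)
n, k = 420, 2                              # n observations, k regressors (plus a constant)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([686.0, -1.1, -0.65])      # illustrative coefficients
Y = X @ beta + rng.normal(scale=14.5, size=n)

b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
u_hat = Y - X @ b_hat                      # OLS residuals

SER  = np.sqrt(np.sum(u_hat**2) / (n - k - 1))   # divides by n - k - 1
RMSE = np.sqrt(np.sum(u_hat**2) / n)             # divides by n

print(f"SER  = {SER:.3f}")
print(f"RMSE = {RMSE:.3f}")                # nearly identical here because n is large
```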
$R^2$ and $\bar{R}^2$
The R2 is the fraction of the variance explained – same definition
as in regression with a single regressor:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS},$$

where $ESS = \sum_{i=1}^{n}(\hat Y_i - \overline{\hat Y}\,)^2$, $SSR = \sum_{i=1}^{n}\hat u_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar Y)^2$.
$R^2$ and $\bar{R}^2$, ctd.
The $\bar{R}^2$ (the “adjusted $R^2$”) corrects for the fact that the $R^2$ never
falls when you add a regressor, by “penalizing” you for including another
regressor – the $\bar{R}^2$ does not necessarily increase when you add another regressor.
Adjusted $R^2$: $\displaystyle \bar{R}^2 = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{TSS}$
Note that $\bar{R}^2 < R^2$; however, if n is large the two will be very
close.
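A short sketch of both formulas on simulated data (illustrative coefficient values, not the test score data):

```python
import numpy as np

# Sketch: R^2 and adjusted R^2 from the ESS / SSR / TSS decomposition (simulated data).
rng = np.random.default_rng(3)
n, k = 420, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([686.0, -1.1, -0.65]) + rng.normal(scale=14.5, size=n)

b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b_hat
u_hat = Y - Y_hat

ESS = np.sum((Y_hat - Y_hat.mean())**2)
SSR = np.sum(u_hat**2)
TSS = np.sum((Y - Y.mean())**2)

R2     = 1 - SSR / TSS                     # equals ESS/TSS when a constant is included
R2_adj = 1 - (n - 1) / (n - k - 1) * SSR / TSS

print(f"R^2 = {R2:.4f}, adjusted R^2 = {R2_adj:.4f}")   # very close because n is large
```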
Measures of fit, ctd.
Test score example:
(1) $\widehat{TestScore}$ = 698.9 – 2.28STR,
    $R^2$ = .05, SER = 18.6
(2) $\widehat{TestScore}$ = 686.0 – 1.10STR – 0.65PctEL,
    $R^2$ = .426, $\bar{R}^2$ = .424, SER = 14.5
What – precisely – does this tell you about the fit of regression
(2) compared with regression (1)?
Why are the $R^2$ and the $\bar{R}^2$ so close in (2)?
The Least Squares Assumptions for
Multiple Regression (SW Section 6.5)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i, \quad i = 1,\dots,n$$
Assumption #1: the conditional mean of
u given the included X’s is zero.
E(u|X1 = x1,…, Xk = xk) = 0
Assumption #4: There is no perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an
exact linear function of the other regressors.
The Sampling Distribution of the
OLS Estimator (SW Section 6.6)
Under the four Least Squares Assumptions,
The exact (finite-sample) distribution of $\hat\beta_1$ has mean $\beta_1$, and
var($\hat\beta_1$) is inversely proportional to n; so too for $\hat\beta_2$.
Other than its mean and variance, the exact (finite-n)
distribution of $\hat\beta_1$ is very complicated; but for large n…
$\hat\beta_1$ is consistent: $\hat\beta_1 \xrightarrow{\;p\;} \beta_1$ (law of large numbers)
$\dfrac{\hat\beta_1 - E(\hat\beta_1)}{\sqrt{\operatorname{var}(\hat\beta_1)}}$ is approximately distributed N(0,1) (CLT)
So too for $\hat\beta_2, \dots, \hat\beta_k$
Conceptually, there is nothing new here!
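A quick Monte Carlo sketch of these large-n properties on simulated data (the model and coefficients are illustrative): across repeated samples, $\hat\beta_1$ is centered on $\beta_1$ and its variance shrinks roughly in proportion to 1/n.

```python
import numpy as np

# Monte Carlo sketch: sampling distribution of the OLS estimator of beta1
# in a two-regressor model, on simulated data.
rng = np.random.default_rng(4)
beta = np.array([5.0, 2.0, -1.0])          # (beta0, beta1, beta2), chosen for illustration

def beta1_hats(n, reps=2000):
    """OLS estimates of beta1 across `reps` simulated samples of size n."""
    out = np.empty(reps)
    for r in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
        Y = X @ beta + rng.normal(size=n)
        out[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return out

for n in (50, 500):
    b1 = beta1_hats(n)
    print(f"n = {n:4d}: mean(beta1_hat) = {b1.mean():.3f}, var(beta1_hat) = {b1.var():.5f}")
# The mean stays at beta1 = 2 and the variance falls roughly in proportion to 1/n;
# a histogram of (b1 - b1.mean()) / b1.std() would look approximately standard normal.
```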
Multicollinearity, Perfect and
Imperfect (SW Section 6.7)
Some more examples of perfect multicollinearity
The example from earlier: you include STR twice.
Second example: regress TestScore on a constant, D, and B,
where: Di = 1 if STR ≤ 20, = 0 otherwise; Bi = 1 if STR >20,
= 0 otherwise, so Bi = 1 – Di and there is perfect
multicollinearity
Would there be perfect multicollinearity if the intercept
(constant) were somehow dropped (that is, omitted or
suppressed) in this regression?
This example is a special case of…
The dummy variable trap
Suppose you have a set of multiple binary (dummy)
variables, which are mutually exclusive and exhaustive – that is,
there are multiple categories and every observation falls in one
and only one category (Freshmen, Sophomores, Juniors, Seniors,
Other). If you include all these dummy variables and a constant,
you will have perfect multicollinearity – this is sometimes called
the dummy variable trap.
Why is there perfect multicollinearity here?
Solutions to the dummy variable trap:
1. Omit one of the groups (e.g. Senior), or
2. Omit the intercept
What are the implications of (1) or (2) for the interpretation of
the coefficients?
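A minimal sketch of the trap and the two fixes, using an illustrative four-category variable and simulated data: the full set of dummies plus a constant is rank deficient (the dummies sum to the constant column), while either fix restores full column rank.

```python
import numpy as np

# Sketch of the dummy variable trap: a constant plus a full set of exhaustive,
# mutually exclusive dummies is perfectly multicollinear (rank-deficient design matrix).
rng = np.random.default_rng(5)
n = 200
category = rng.integers(0, 4, size=n)      # 4 illustrative groups (e.g., class standing)

const   = np.ones(n)
dummies = np.column_stack([(category == g).astype(float) for g in range(4)])

X_trap = np.column_stack([const, dummies])          # constant + ALL four dummies
X_ok   = np.column_stack([const, dummies[:, 1:]])   # fix 1: omit one group
X_ok2  = dummies                                    # fix 2: omit the intercept

print("columns / rank, all dummies + constant:", X_trap.shape[1], np.linalg.matrix_rank(X_trap))
print("columns / rank, one dummy omitted     :", X_ok.shape[1], np.linalg.matrix_rank(X_ok))
print("columns / rank, intercept omitted     :", X_ok2.shape[1], np.linalg.matrix_rank(X_ok2))
# The dummies sum to the constant column, so X_trap has 5 columns but rank 4.
```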
Perfect multicollinearity, ctd.
Perfect multicollinearity usually reflects a mistake in the
definitions of the regressors, or an oddity in the data
If you have perfect multicollinearity, your statistical software
will let you know – either by crashing or giving an error
message or by “dropping” one of the variables arbitrarily
The solution to perfect multicollinearity is to modify your list
of regressors so that you no longer have perfect
multicollinearity.
Imperfect multicollinearity
Imperfect and perfect multicollinearity are quite different despite
the similarity of the names: imperfect multicollinearity occurs when two or
more of the regressors are highly (but not perfectly) correlated.
Imperfect multicollinearity, ctd.
Imperfect multicollinearity implies that one or more of the
regression coefficients will be imprecisely estimated.
Intuition: the coefficient on X1 is the effect of X1 holding X2
constant; but if X1 and X2 are highly correlated, there is very
little variation in X1 once X2 is held constant – so the data are
pretty much uninformative about what happens when X1
changes but X2 doesn’t, so the variance of the OLS estimator
of the coefficient on X1 will be large.
Imperfect multicollinearity (correctly) results in large
standard errors for one or more of the OLS coefficients.
The math? See SW, App. 6.2
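A small simulation sketch of this point (simulated data, illustrative correlation values): as corr(X1, X2) approaches 1, the sampling variance of the OLS coefficient on X1 grows sharply.

```python
import numpy as np

# Sketch: imperfect multicollinearity inflates the variance of the OLS coefficient on X1.
# Simulated data; the correlation values are chosen for illustration.
rng = np.random.default_rng(6)
beta = np.array([1.0, 2.0, -1.0])          # (beta0, beta1, beta2)

def var_of_b1(rho, n=200, reps=2000):
    """Monte Carlo variance of the OLS estimate of beta1 when corr(X1, X2) = rho."""
    out = np.empty(reps)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    for r in range(reps):
        X12 = rng.multivariate_normal(np.zeros(2), cov, size=n)
        X = np.column_stack([np.ones(n), X12])
        Y = X @ beta + rng.normal(size=n)
        out[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return out.var()

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"corr(X1, X2) = {rho:4.2f}: var(beta1_hat) = {var_of_b1(rho):.5f}")
# The variance rises sharply as the correlation approaches 1: the data contain
# little independent variation in X1 once X2 is held fixed.
```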