Handout 4 Multiple Regression
[Diagram: class size → test scores, with HH income influencing both — the omitted variable]
• Omitted variable bias does NOT go away with a larger sample: the OLS slope estimator is not consistent.
• The direction of the bias depends on the sign of the correlation between u and X.
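A small simulation can make this concrete (the setup and numbers below are my own illustration, not from the handout): an omitted variable w is positively correlated with x and raises y, so the slope from regressing y on x alone converges to the wrong value no matter how large n gets.

```python
import numpy as np

rng = np.random.default_rng(0)

def short_regression_slope(n):
    # Omitted variable w is positively correlated with x and raises y.
    x = rng.normal(size=n)
    w = 0.5 * x + rng.normal(size=n)            # cov(x, w) = 0.5 > 0
    y = 1.0 + 2.0 * x + 3.0 * w + rng.normal(size=n)
    # OLS slope of y on x alone: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

for n in (100, 10_000, 1_000_000):
    print(n, short_regression_slope(n))
```

Here the true effect of x is 2, but the short-regression slope converges to 2 + 3·cov(w, x)/var(x) = 3.5: the bias does not shrink with n.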
4.3. Addressing omitted variable bias using multiple regression (multivariable regression)
• Basically, add the omitted variable(s) to the regression, which then controls for their effect on Y.
• Regression model with two regressors (two explanatory variables): Yi = β0 + β1 X1i + β2 X2i + ui
• The interpretation of each slope is now the partial effect, holding the other variable(s) constant:
β1 = (ΔY/ΔX1) holding X2 constant = ∂Y/∂X1
• Can add many variables, though OLS requires more observations than variables. This is rarely a problem
in the era of big data: regressions can include thousands of explanatory variables (also called controls,
covariates, or independent variables).
• The multiple regression coefficients can be estimated using OLS: again, choose the intercept and slope
parameters (βs) to minimize the sum of squared deviations between Y and predicted Y.
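A minimal sketch of this estimation in NumPy (the data-generating values are my own illustration): build a design matrix with a column of ones for the intercept, then let least squares minimize the sum of squared residuals. Because x2 is included, the coefficient on x1 is now its partial effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)              # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])       # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # minimizes sum of squared residuals
print(beta)                                     # close to the true (1, 2, 3)
```

With x2 in the regression, the estimated slope on x1 recovers the true partial effect of 2, unlike the biased short regression of y on x1 alone.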
• If one of the explanatory variables is a dummy variable, interpret its estimated coefficient as the
average difference in the outcome between the 1 category and the 0 category, holding the other covariates
constant.
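A quick illustration of the dummy-variable interpretation (simulated data of my own, not from the handout): the coefficient on a 0/1 dummy d is the average outcome gap between the d = 1 and d = 0 groups at the same level of the control x.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000
x = rng.normal(size=n)
d = (rng.random(n) < 0.5).astype(float)         # dummy variable: 0 or 1
y = 0.5 + 1.5 * x + 2.0 * d + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[2])   # ~2.0: d = 1 group averages ~2 higher, holding x constant
```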
4.4. Least squares assumptions for multiple regression (first three are basically the same as before)
• Error term has conditional mean zero: E(ui | X1i, X2i, …, Xki) = 0
• Observations are i.i.d. (random sampling)
• Large outliers are unlikely
• No perfect multicollinearity
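A sketch of what perfect multicollinearity looks like in practice (my own example, the classic "dummy variable trap"): including an intercept plus dummies for both categories of a binary variable makes one column an exact linear combination of the others, so the design matrix is rank-deficient and OLS has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
d = (rng.random(n) < 0.5).astype(float)
# ones = d + (1 - d): an exact linear dependence among the columns
X = np.column_stack([np.ones(n), d, 1.0 - d])

print(np.linalg.matrix_rank(X))   # 2, not 3: one column is redundant
```

Dropping either dummy (or the intercept) restores full column rank and a unique OLS solution.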