03 Advance
Jakub Mućk
SGH Warsaw School of Economics
Jakub Mućk Advanced Applied Econometrics OLS estimator: verifying assumptions Least squares estimator 2 / 31
Multiple regression
y = β0 + β1 x1 + β2 x2 + . . . + βK xK + ε (1)
where
- y is the dependent (outcome) variable;
- x1, x2, ..., xK is the set of independent variables;
- ε is the error term.
The dependent variable is explained by components that vary with the
independent variables and by the error term.
β0 is the intercept.
β1, β2, ..., βK are the coefficients (slopes) on x1, x2, ..., xK.
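A minimal sketch of how the coefficients of the multiple regression (1) might be estimated by least squares. The data are simulated; the sample size, the "true" coefficient values and all variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500

# Hypothetical true parameters (beta0, beta1, beta2), chosen for illustration.
beta_true = np.array([1.0, 2.0, -0.5])

x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
eps = rng.normal(scale=0.5, size=N)      # error term with zero mean

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(N), x1, x2])
y = X @ beta_true + eps

# Least squares estimates minimise the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print("estimates:", beta_hat)
```

With an intercept in the model the residuals average to zero by construction, which is the sample counterpart of E(ε) = 0.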
Assumptions of the least squares estimators I
Assumption #1: true DGP (data generating process):
y = Xβ + ε. (2)
E (ε) = 0, (3)
E(Xε) = 0. (7)
Assumptions of the least squares estimators II
rank(X) = K + 1 ≤ N. (8)
ε ∼ N(0, σ²). (9)
Gauss-Markov Theorem
The least squares estimator
Estimating non-linear relationship
Examples
[Figure: four panels, each comparing a linear fit (orange line, y = β1 + β2 x) with a non-linear specification (red line):
- quadratic: y = β1 + β2 x²,
- linear-log: y = β1 + β2 ln x,
- log-linear: ln y = β1 + β2 x,
- log-log: ln y = β1 + β2 ln x.]
How to interpret coefficients
$\text{Marginal effect} = \frac{\partial E(y)}{\partial x}$ (14)
In other words, the marginal effect is the slope of the tangent to the curve
at a particular point.
Elasticity measures the percentage change in y in response to a one-percent
change in x:
$\text{Elasticity} = \frac{\partial E(y)}{\partial x} \cdot \frac{x}{y}$. (15)
Semi-elasticity measures the percentage change in y in response to a unit
change in x:
$\text{Semi-elasticity} = \frac{\partial E(y)}{\partial x} \cdot \frac{1}{y}$. (16)
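The three measures can be illustrated numerically. The sketch below approximates the derivative by a central difference and evaluates the measures for two functional forms; the coefficient values, the evaluation point and the helper names are assumptions made only for this example.

```python
import math

def marginal_effect(f, x, h=1e-6):
    """Central-difference approximation to dE(y)/dx at the point x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def elasticity(f, x):
    """Marginal effect scaled by x / y."""
    return marginal_effect(f, x) * x / f(x)

def semi_elasticity(f, x):
    """Marginal effect scaled by 1 / y."""
    return marginal_effect(f, x) / f(x)

# Hypothetical coefficients, for illustration only.
b1, b2 = 1.0, 0.5

def quadratic(x):
    return b1 + b2 * x ** 2                   # y = b1 + b2 * x^2

def log_log(x):
    return math.exp(b1 + b2 * math.log(x))    # ln y = b1 + b2 * ln x

x0 = 2.0
me_quad = marginal_effect(quadratic, x0)      # analytically 2 * b2 * x0
el_quad = elasticity(quadratic, x0)
se_quad = semi_elasticity(quadratic, x0)
el_loglog = elasticity(log_log, x0)           # analytically b2 at every x

print(me_quad, el_quad, se_quad, el_loglog)
```

In the quadratic model the marginal effect depends on the point of evaluation, whereas in the log-log model the elasticity is constant and equal to the slope coefficient.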
Some useful functions
Interaction variable
In this case:
$\text{Marginal effect of education} = \frac{\partial E(w)}{\partial educ} = \beta_1 + \beta_4\, exper$,
$\text{Marginal effect of experience} = \frac{\partial E(w)}{\partial exper} = \beta_2 + 2\beta_3\, exper + \beta_4\, educ$.
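A small sketch of how marginal effects with an interaction term vary with the other regressor. It assumes a wage equation containing education, experience, squared experience and an education-experience interaction; the coefficient values and the dictionary keys are hypothetical and chosen only for illustration.

```python
# Hypothetical coefficients of the assumed wage equation:
# w = const + educ*educ_i + exper*exper_i + exper2*exper_i^2 + inter*educ_i*exper_i
coef = {"const": 1.0, "educ": 0.8, "exper": 0.3, "exper2": -0.005, "inter": -0.01}

def expected_wage(educ, exper, b=coef):
    return (b["const"] + b["educ"] * educ + b["exper"] * exper
            + b["exper2"] * exper ** 2 + b["inter"] * educ * exper)

def me_education(exper, b=coef):
    """dE(w)/d educ: depends on experience through the interaction term."""
    return b["educ"] + b["inter"] * exper

def me_experience(educ, exper, b=coef):
    """dE(w)/d exper: depends on experience (quadratic term) and on education."""
    return b["exper"] + 2 * b["exper2"] * exper + b["inter"] * educ

educ0, exper0 = 12, 10
me_educ = me_education(exper0)
me_exper = me_experience(educ0, exper0)

# The wage equation is linear in educ, so a one-unit difference
# reproduces the marginal effect of education exactly.
unit_diff = expected_wage(educ0 + 1, exper0) - expected_wage(educ0, exper0)

print(me_educ, me_exper, unit_diff)
```

Because of the interaction term, neither marginal effect is a single number: each has to be reported at chosen values of the other variable.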
Model Specification
Omitted variables I
Omission of a relevant variable (defined as one whose coefficient is nonzero)
might lead to an estimator that is biased. This bias is known as omitted-
variable bias.
Let’s assume that the true DGP (data generating process) is:
y = β0 + β1 x1 + β2 x2 + ε. (18)
Consider the case when we do not have data on x2 .
Equivalently, we impose the restriction that β2 = 0. According to our true
DGP this restriction is invalid.
Then the expected value of the least squares estimator of β1 is:
$E(\hat{\beta}_1^{LS}) = \beta_1 + \beta_2 \frac{\operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_1)}$, (19)
and the omitted-variable bias is:
$\operatorname{bias}(\hat{\beta}_1^{LS}) = E(\hat{\beta}_1^{LS}) - \beta_1 = \beta_2 \frac{\operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_1)}$. (20)
The omitted-variable bias is larger (in absolute value) if:
- the true slope on the omitted variable, β2, is larger in absolute value,
- the omitted variable (x2) is more strongly correlated with the included variable (x1).
However, there is no bias when the omitted variable is uncorrelated with
the included explanatory variables.
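The size of the omitted-variable bias can be checked by simulation. The sketch below generates data from a DGP with two correlated regressors, estimates the regression with and without x2, and compares the observed bias with β2 · cov(x1, x2) / var(x1), that is, β2 times the slope from regressing the omitted variable on the included one. All parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Hypothetical true parameters of the DGP.
beta0, beta1, beta2 = 1.0, 2.0, 3.0

x1 = rng.normal(size=N)
x2 = 0.6 * x1 + rng.normal(size=N)        # x2 is correlated with x1
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=N)

# "Short" regression that omits x2.
X_short = np.column_stack([np.ones(N), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# "Long" regression that includes x2.
X_long = np.column_stack([np.ones(N), x1, x2])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

cov12 = np.cov(x1, x2)[0, 1]
implied_bias = beta2 * cov12 / np.var(x1, ddof=1)
observed_bias = b_short[1] - beta1

print("slope on x1, x2 omitted :", b_short[1])
print("slope on x1, x2 included:", b_long[1])
print("implied vs observed bias:", implied_bias, observed_bias)
```

Setting the 0.6 loading to zero in the line that generates x2 removes the correlation, and with it the bias in the short regression.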
Irrelevant variables I
RESET test I
RESET (REgression Specification Error Test) is designed to detect
omitted variables and incorrect functional form.
Consider the multiple linear regression:
y = β0 + β1 x1 + . . . + βk xk + ε. (21)
[Step #1]. Obtain the least squares estimates and calculate the fitted values ŷ.
[Step #2]. Augment the regression with powers of the fitted values:
Model 1 : y = β0 + β1 x1 + . . . + βk xk + γ1 ŷ² + ε.
Model 2 : y = β0 + β1 x1 + . . . + βk xk + γ1 ŷ² + γ2 ŷ³ + ε.
[Step #3]. Test the null hypothesis of no misspecification:
Model 1 : H0 : γ1 = 0,
Model 2 : H0 : γ1 = γ2 = 0.
The RESET test is a very general test of functional form.
However, if we reject the null, the test does not reveal the source of the
misspecification.
If the number of observations is large, one might replace the squared and cubed fitted
values of the outcome variable with squares and cubes of the explanatory variables.
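The RESET procedure can be sketched as an F test that compares the original regression with the regression augmented by powers of the fitted values. The function names, the simulated data and the choice of an F statistic (rather than any particular software routine) are assumptions of this example.

```python
import numpy as np

def ols_sse(X, y):
    """Return least squares coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f_stat(X, y, max_power=3):
    """F statistic for H0: coefficients on yhat^2, ..., yhat^max_power are zero.

    X must already contain the constant column.
    """
    n, k = X.shape
    beta, sse_restricted = ols_sse(X, y)
    y_hat = X @ beta
    powers = np.column_stack([y_hat ** p for p in range(2, max_power + 1)])
    _, sse_unrestricted = ols_sse(np.column_stack([X, powers]), y)
    j = powers.shape[1]                       # number of restrictions
    return ((sse_restricted - sse_unrestricted) / j) / (sse_unrestricted / (n - k - j))

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Correctly specified linear model versus a model that omits a quadratic term.
y_linear = 1.0 + 2.0 * x + rng.normal(size=n)
y_quadratic = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(size=n)

f_correct = reset_f_stat(X, y_linear)
f_misspecified = reset_f_stat(X, y_quadratic)
print(f_correct, f_misspecified)
```

The statistic is compared with a critical value of the F(J, N − k − J) distribution, where J is the number of added powers; with J = 2 and a large sample the 5% critical value is roughly 3.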
Collinearity
Collinearity I
When data are the result of an uncontrolled experiment, many of the economic
variables may move together in systematic ways.
This problem is labeled collinearity and the explanatory variables are said to be
collinear.
Example: multiple regression with two explanatory variables
y = β0 + β1 x1 + β2 x2 + ε. (23)
The variance of the least squares estimator for β2:
$\operatorname{var}(\hat{\beta}_2^{LS}) = \frac{\sigma^2}{(1 - r_{12}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2}$, (24)
where r12 is the correlation between x1 and x2.
Extreme case: if |r12| = 1 then x1 and x2 are perfectly collinear. In this
case the least squares estimator is not defined and we cannot obtain the least
squares estimates.
If r12² is large then:
- the standard errors are large =⇒ small (in modulus) t statistics. Typically, this
leads to the conclusion that parameter estimates are not significantly different
from zero,
- estimates may be very sensitive to the inclusion or exclusion of a few observations,
- estimates may be very sensitive to the exclusion of insignificant variables.
Identifying and mitigating collinearity
Detecting collinearity:
- pairwise correlation between explanatory variables,
- variance inflation factor (VIF), which is calculated for each explanatory
variable. The VIF is a function of R² from the auxiliary regression of the selected
explanatory variable on the remaining explanatory variables:
$VIF_i = \frac{1}{1 - R_i^2}$. (25)
Values above 10 suggest collinearity.
Dealing with collinearity:
- Obtaining more information.
- Using non-sample information, i.e., restrictions on parameters.
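The VIF can be computed directly from the auxiliary regressions described above. The sketch below is one possible implementation with NumPy; the function name `vif` and the simulated regressors are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X is an (N, K) array of explanatory variables WITHOUT the constant column;
    a constant is added to every auxiliary regression.
    """
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1
x3 = rng.normal(size=n)                # unrelated to x1 and x2

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this simulated design the first two regressors have VIFs far above the threshold of 10, while the third stays close to 1.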
Normality of the error term
The normality assumption on the error term is crucial for hypothesis testing. However,
the error term is a random variable and is therefore unobservable.
The normality of the error term can be assessed on the basis of the properties of the
residuals.
The assessment of this assumption is based on:
- the histogram of the residuals,
- the results of the Jarque-Bera test.
However, if the sample is sufficiently large then, according to a central limit
theorem, the distribution of the least squares estimator can be approximated by
a normal distribution.
The Jarque-Bera test
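A minimal sketch of the Jarque-Bera statistic, assuming the standard textbook form JB = N/6 · (S² + (K − 3)²/4), where S is the skewness and K the kurtosis of the residuals. The formula itself is not shown in this section, so the form used here and the function name are assumptions of the example.

```python
import numpy as np

def jarque_bera(resid):
    """Return (JB statistic, skewness, kurtosis) for a vector of residuals."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    e = e - e.mean()
    sigma2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / sigma2 ** 1.5
    kurt = np.mean(e ** 4) / sigma2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt

rng = np.random.default_rng(11)
jb_normal, _, _ = jarque_bera(rng.normal(size=5000))        # normal draws
jb_skewed, _, _ = jarque_bera(rng.exponential(size=5000))   # skewed, fat-tailed draws

# Deterministic check: skewness 0 and kurtosis 1 give JB = n / 6.
jb_small, s_small, k_small = jarque_bera([-1, 1, -1, 1, -1, 1])

print(jb_normal, jb_skewed, jb_small)
```

Under the null hypothesis of normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so values above about 5.99 lead to rejection at the 5% level.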
Goodness-of-fit
Goodness-of-fit I
The observed values (yi) of the dependent variable can be decomposed into the
fitted values (ŷi) and the residuals (êi):
$y_i = \hat{y}_i + \hat{e}_i$.
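Goodness-of-fit II
- SST is the total sum of squares and $SST = \sum_{i=1}^{N} (y_i - \bar{y})^2$,
- SSR is the sum of squares due to regression and $SSR = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$,
- SSE is the sum of squares due to error and $SSE = \sum_{i=1}^{N} \hat{e}_i^2$.
Coefficient of determination R² is the proportion of the variation in the
dependent variable that can be explained by the independent variables:
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, (31)
R² ∈ [0, 1].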
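The decomposition and the two equivalent expressions for R² in (31) can be verified on simulated data. The sketch below assumes a simple regression with an intercept; the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)   # hypothetical DGP

X = np.column_stack([np.ones(N), x])     # intercept plus one regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)    # sum of squares due to regression
sse = np.sum(e_hat ** 2)                 # sum of squared residuals

r2 = ssr / sst
r2_alt = 1.0 - sse / sst
corr_sq = np.corrcoef(y, y_hat)[0, 1] ** 2   # squared correlation of y and fitted values

print(r2, r2_alt, corr_sq)
```

The identity SST = SSR + SSE, and hence the equality of the two R² expressions, relies on the model containing an intercept.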
Correlation coefficient and R²
Adjusted R²
Information Criteria
$SIC = \ln\left(\frac{SSE}{N}\right) + \frac{K \ln(N)}{N}$. (37)
Using the above criteria, lower values of the AIC or SIC (also known as BIC)
signal a better fit to the data.
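A short sketch of how the two criteria might be used to compare models. The SIC uses the Schwarz penalty K · ln(N) / N; the AIC formula does not appear in this section, so the common form ln(SSE/N) + 2K/N is assumed here. K is taken to be the number of estimated coefficients, and the SSE values are made up for the example.

```python
import math

def aic(sse, n, k):
    """Akaike criterion in the form ln(SSE/N) + 2K/N (assumed form)."""
    return math.log(sse / n) + 2 * k / n

def sic(sse, n, k):
    """Schwarz criterion (SIC, also called BIC): ln(SSE/N) + K ln(N)/N."""
    return math.log(sse / n) + k * math.log(n) / n

n = 100
# Hypothetical results: model B adds two regressors for a small drop in SSE.
model_a = {"sse": 50.0, "k": 3}
model_b = {"sse": 49.5, "k": 5}

aic_a, aic_b = aic(model_a["sse"], n, model_a["k"]), aic(model_b["sse"], n, model_b["k"])
sic_a, sic_b = sic(model_a["sse"], n, model_a["k"]), sic(model_b["sse"], n, model_b["k"])

print("AIC:", aic_a, aic_b)
print("SIC:", sic_a, sic_b)
```

In this made-up comparison both criteria prefer the smaller model, because the reduction in SSE does not offset the penalty for the two extra coefficients. The SIC penalises additional coefficients more heavily than the AIC whenever ln(N) > 2.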