Regression Model Assumptions
Asad Dossani
Regression Model Assumptions

1. $E(u_t) = 0$
2. $\text{Var}(u_t) = \sigma^2 < \infty$
3. $\text{Cov}(u_i, u_j) = 0$ for $i \neq j$
4. $\text{Cov}(u_t, x_t) = 0$
5. $u_t \sim N(0, \sigma^2)$
Statistical Distributions of Diagnostic Tests

Diagnostic test statistics typically follow either a $\chi^2(m)$ distribution or an $F(m, T-k)$ distribution; the two coincide asymptotically:

$$F(m, T-k) \to \frac{\chi^2(m)}{m} \quad \text{as } T \to \infty$$
Assumption 1: $E(u_t) = 0$
Inclusion of a Constant Term
Assumption 2: $\text{Var}(u_t) = \sigma^2 < \infty$
Heteroskedasticity
White's (1980) Test for Heteroskedasticity

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$

$$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t$$
White's (1980) Test for Heteroskedasticity

Under the null hypothesis of homoskedasticity, $T$ times the $R^2$ from the auxiliary regression is distributed as $\chi^2$ with degrees of freedom equal to the number of regressors in the auxiliary regression, excluding the constant (here, 5).
White's (1980) Test for Heteroskedasticity

$$\text{test statistic} = TR^2 = (120)(0.05) = 6.0$$
$$\chi^2_{0.05}(5) = 11.07$$

Since the test statistic is less than the critical value, we do not reject the null hypothesis.
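A minimal NumPy sketch of this calculation, for the two-regressor setup above (the function and variable names are illustrative; statsmodels ships a packaged version as statsmodels.stats.diagnostic.het_white):

```python
import numpy as np

def white_test(y, x2, x3):
    """White's (1980) test: regress squared OLS residuals on the levels,
    squares, and cross-product of the regressors; TR^2 is asymptotically
    chi-squared with 5 degrees of freedom under homoskedasticity."""
    T = len(y)
    X = np.column_stack([np.ones(T), x2, x3])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # original OLS fit
    u2 = (y - X @ beta) ** 2                             # squared residuals
    Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])
    alpha = np.linalg.lstsq(Z, u2, rcond=None)[0]        # auxiliary regression
    e = u2 - Z @ alpha
    r2 = 1.0 - e.var() / u2.var()                        # auxiliary R^2
    return T * r2                                        # compare with chi2(5)
```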
Solutions to Heteroskedasticity
Generalized Least Squares

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
$$\text{Var}(u_t) = \sigma^2 z_t^2$$
Generalized Least Squares

$$\frac{y_t}{z_t} = \beta_1 \frac{1}{z_t} + \beta_2 \frac{x_{2t}}{z_t} + \beta_3 \frac{x_{3t}}{z_t} + \frac{u_t}{z_t}$$

$$\text{Var}\left(\frac{u_t}{z_t}\right) = \frac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$$
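A minimal sketch of this transformation, assuming $z_t$ is observed (names are illustrative):

```python
import numpy as np

def gls_known_form(y, x2, x3, z):
    """GLS when Var(u_t) = sigma^2 * z_t^2 with z_t observed: divide the
    dependent variable and every regressor (including the constant) by z_t,
    then apply OLS to the transformed, homoskedastic model."""
    Xs = np.column_stack([1.0 / z, x2 / z, x3 / z])  # transformed regressors
    ys = y / z                                       # transformed dependent variable
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]    # (beta1, beta2, beta3)
```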
Assumption 3: $\text{Cov}(u_i, u_j) = 0$ for $i \neq j$

If the errors are correlated with one another, they are said to be autocorrelated, or serially correlated. Since the population disturbances cannot be observed, tests for autocorrelation are conducted on the residuals $\hat{u}_t$.
Lagged Values and First Differences

The lagged value of a variable is the value that the variable took during the previous period. The value of $y_t$ lagged one period is $y_{t-1}$. The value of $y_t$ lagged $p$ periods is $y_{t-p}$. The first difference is:

$$\Delta y_t = y_t - y_{t-1}$$
Graphical Tests for Autocorrelation
Positive Autocorrelation

Negative Autocorrelation

No Autocorrelation
Durbin-Watson (1951) Test

$$u_t = \rho u_{t-1} + v_t, \quad v_t \sim N(0, \sigma_v^2)$$
$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$
Durbin-Watson (1951) Test

In practice, we don’t need to run this regression, since the test statistic can be calculated using quantities available after the original regression has been run.

$$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2} \approx 2(1 - \hat{\rho})$$
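A minimal sketch of the statistic (the function name is illustrative; statsmodels also provides statsmodels.stats.stattools.durbin_watson):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: the sum of squared first differences of the
    residuals divided by the sum of squared residuals; close to 2(1 - rho_hat)."""
    d = np.diff(resid)                          # u_t - u_{t-1} for t = 2, ..., T
    return np.sum(d**2) / np.sum(resid[1:]**2)
```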
Durbin-Watson (1951) Test

$$DW = \frac{\sum_{t=2}^{T} \hat{u}_t^2 + \sum_{t=2}^{T} \hat{u}_{t-1}^2 - 2\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2}$$

$$\approx \frac{2\sum_{t=2}^{T} \hat{u}_t^2 - 2\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2} \quad \text{as } T \to \infty$$

$$= 2\left(1 - \frac{\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2}\right) \approx 2(1 - \hat{\rho})$$
Durbin-Watson (1951) Test

$$-1 \leq \hat{\rho} \leq 1 \;\Rightarrow\; 0 \leq DW \leq 4$$

$\hat{\rho} = 0 \Rightarrow DW = 2$ (no autocorrelation)
$\hat{\rho} = 1 \Rightarrow DW = 0$ (perfect positive autocorrelation)
$\hat{\rho} = -1 \Rightarrow DW = 4$ (perfect negative autocorrelation)
Breusch-Godfrey Test for Autocorrelation

The Breusch-Godfrey test is a more general test for autocorrelation up to order $r$. The OLS residuals $\hat{u}_t$ are regressed on the original regressors plus $r$ lagged residuals $\hat{u}_{t-1}, \ldots, \hat{u}_{t-r}$, and $R^2$ is taken from this auxiliary regression.
Breusch-Godfrey Test for Autocorrelation

$$(T - r)R^2 \sim \chi^2(r)$$

If the test statistic exceeds the critical value, we reject the null hypothesis of no autocorrelation, and vice versa.
Breusch-Godfrey Test for Autocorrelation

$$\text{test statistic} = (T - r)R^2 = (120 - 3)(0.10) = 11.7$$
$$\chi^2_{0.05}(3) = 7.81$$

Since the test statistic is greater than the critical value, we reject the null hypothesis.
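A minimal NumPy sketch of the test (names illustrative; X is assumed to include a constant column; statsmodels offers statsmodels.stats.diagnostic.acorr_breusch_godfrey as a packaged alternative):

```python
import numpy as np

def breusch_godfrey(resid, X, r):
    """Breusch-Godfrey test: regress the OLS residuals on the original
    regressors plus r lagged residuals; (T - r) * R^2 is asymptotically
    chi-squared with r degrees of freedom under no autocorrelation."""
    T = len(resid)
    lags = np.column_stack([resid[r - l : T - l] for l in range(1, r + 1)])
    Z = np.column_stack([X[r:], lags])               # regressors + lagged residuals
    u = resid[r:]
    gamma = np.linalg.lstsq(Z, u, rcond=None)[0]
    e = u - Z @ gamma
    r2 = 1.0 - e.var() / u.var()                     # auxiliary R^2
    return (T - r) * r2                              # compare with chi2(r)
```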
Cochrane-Orcutt Procedure

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
$$u_t = \rho u_{t-1} + v_t$$
Cochrane-Orcutt Procedure

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
$$\rho y_{t-1} = \rho\beta_1 + \rho\beta_2 x_{2t-1} + \rho\beta_3 x_{3t-1} + \rho u_{t-1}$$
$$y_t - \rho y_{t-1} = \beta_1(1 - \rho) + \beta_2(x_{2t} - \rho x_{2t-1}) + \beta_3(x_{3t} - \rho x_{3t-1}) + (u_t - \rho u_{t-1})$$
Cochrane-Orcutt Procedure

$$\hat{u}_t = \rho \hat{u}_{t-1} + v_t$$

Using the estimated $\hat{\rho}$, we run the GLS regression. We can additionally obtain better estimates by going through the process multiple times. After running the first GLS regression, we again correct it for autocorrelation to obtain a new estimate of $\hat{\rho}$. This procedure is repeated until the change in $\hat{\rho}$ from one iteration to the next is sufficiently small.
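A minimal NumPy sketch of the iteration (names and tolerances are illustrative; X is assumed to contain a constant column):

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-6, max_iter=100):
    """Iterative Cochrane-Orcutt: estimate rho from an AR(1) regression of
    the residuals, quasi-difference the data, re-run OLS, and repeat until
    rho converges."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # initial OLS fit
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # slope in u_t = rho u_{t-1} + v_t
        ys = y[1:] - rho_new * y[:-1]                    # quasi-differenced data
        Xs = X[1:] - rho_new * X[:-1]
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho
```

Note that the constant column is quasi-differenced to $(1 - \hat{\rho})$, so the reported intercept estimates $\beta_1(1 - \hat{\rho})$; divide by $(1 - \hat{\rho})$ to recover $\beta_1$.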
Cochrane-Orcutt Procedure

The Cochrane-Orcutt procedure requires a specific assumption regarding the form of the autocorrelation. Suppose we move $\rho y_{t-1}$ to the right-hand side of the quasi-differenced regression equation:

$$y_t = \rho y_{t-1} + \beta_1(1 - \rho) + \beta_2(x_{2t} - \rho x_{2t-1}) + \beta_3(x_{3t} - \rho x_{3t-1}) + v_t$$

This is a restricted version of a dynamic model: the coefficients on the lagged regressors are constrained to equal $-\rho$ times the coefficients on the current ones.
Newey-West Variance Covariance Estimator

$$\text{Var}(\hat{\beta}) = \frac{1}{T}\sum_{t=1}^{T} u_t^2 x_t x_t' + \frac{1}{T}\sum_{l=1}^{L} w_l \sum_{t=l+1}^{T} u_t u_{t-l}\left(x_t x_{t-l}' + x_{t-l} x_t'\right)$$

$$w_l = 1 - \frac{l}{L+1}$$
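A sketch of how this is used in practice, on simulated data with AR(1) errors (all parameter values invented; statsmodels' HAC covariance option implements the Newey-West estimator, with maxlags playing the role of $L$):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                       # AR(1) errors induce autocorrelation
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # conventional standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse, hac.bse)                     # HAC standard errors are typically larger here
```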
Dynamic Models

$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + u_t$$
$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \gamma_1 y_{t-1} + u_t$$
Dynamic Models

$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \gamma_1 y_{t-1} + u_t$$

In the long run, setting $y_t = y_{t-1} = y$ and $x_t = x_{t-1} = x$ and dropping the disturbance:

$$y = \alpha + \beta_0 x + \beta_1 x + \gamma_1 y$$
$$(1 - \gamma_1)y = \alpha + (\beta_0 + \beta_1)x$$
$$y = \frac{\alpha}{1 - \gamma_1} + \frac{\beta_0 + \beta_1}{1 - \gamma_1} x$$
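As a hypothetical numerical illustration (the parameter values are invented): with $\alpha = 1$, $\beta_0 = 0.4$, $\beta_1 = 0.2$, and $\gamma_1 = 0.5$, the long-run solution is

$$y = \frac{1}{0.5} + \frac{0.4 + 0.2}{0.5}x = 2 + 1.2x,$$

so a permanent one-unit increase in $x$ raises $y$ by 1.2 in the long run, compared with $\beta_0 = 0.4$ on impact.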
Lagged Dependent Variables

$$y_t = \alpha + \beta_0 x_t + \gamma_1 y_{t-1} + u_t$$
$$u_t = \rho u_{t-1} + v_t$$
$$y_{t-1} = \alpha + \beta_0 x_{t-1} + \gamma_1 y_{t-2} + u_{t-1}$$
$$y_t = \alpha + \beta_0 x_t + \gamma_1 y_{t-1} + \rho u_{t-1} + v_t$$

Since $y_{t-1}$ depends on $u_{t-1}$, and the composite error $\rho u_{t-1} + v_t$ also contains $u_{t-1}$, the regressor $y_{t-1}$ is correlated with the error term when the disturbances are autocorrelated.
Assumption 4: The $x_t$ are Non-Stochastic

The OLS estimator is consistent and unbiased in the presence of stochastic regressors, provided the regressors are not correlated with the error term.

$$y = X\beta + u$$
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
$$E(\hat{\beta}) = \beta + E[(X'X)^{-1}X'u] = \beta + E[(X'X)^{-1}X']E(u) = \beta$$
Endogeneity
Assumption 5: $u_t \sim N(0, \sigma^2)$

Let $u$ denote the residuals and $\sigma^2$ denote their variance. Let $b_1$ and $b_2$ denote skewness and kurtosis, respectively. $T$ is the sample size. The Bera-Jarque test statistic $W$ is given by:

$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right], \quad b_1 = \frac{E(u^3)}{\sigma^3}, \quad b_2 = \frac{E(u^4)}{\sigma^4}$$

We can compute the test statistic using the OLS residuals $\hat{u}$. Under the null hypothesis of normality, $W \sim \chi^2(2)$. If the test statistic exceeds the critical value, we reject the null hypothesis of normally distributed errors.
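A minimal sketch of the statistic (the function name is illustrative; statsmodels provides a packaged version as statsmodels.stats.stattools.jarque_bera):

```python
import numpy as np

def bera_jarque(resid):
    """Bera-Jarque normality test: W = T * (b1^2/6 + (b2 - 3)^2/24), where
    b1 is the skewness and b2 the kurtosis of the residuals. Under the null
    of normal errors, W ~ chi2(2)."""
    T = len(resid)
    u = resid - resid.mean()
    sigma2 = np.mean(u**2)
    b1 = np.mean(u**3) / sigma2**1.5        # skewness
    b2 = np.mean(u**4) / sigma2**2          # kurtosis
    return T * (b1**2 / 6 + (b2 - 3)**2 / 24)
```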
Dealing with Non-Normality and Outliers
Outliers
Multicollinearity
Multicollinearity

The variance inflation factor (VIF) estimates the extent to which the variance of a parameter estimate increases because the explanatory variables are correlated. Suppose $R_i^2$ is the $R^2$ from a regression of explanatory variable $i$ on a constant plus all the other explanatory variables in the regression. The $VIF_i$ is given by:

$$VIF_i = \frac{1}{1 - R_i^2}$$
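A minimal NumPy sketch (names illustrative; X holds the explanatory variables without the constant; statsmodels also offers statsmodels.stats.outliers_influence.variance_inflation_factor):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column of X on a constant
    plus the remaining columns and compute VIF_i = 1 / (1 - R_i^2)."""
    T, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        Z = np.column_stack([np.ones(T), np.delete(X, i, axis=1)])
        g = np.linalg.lstsq(Z, xi, rcond=None)[0]
        e = xi - Z @ g
        r2 = 1.0 - e.var() / xi.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```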
Adopting the Wrong Functional Form

$$y_t = \beta_1 + \beta_2 x_t + \beta_3 x_t^2 + u_t$$
Quadratic Regression: $y_t = \beta_1 + \beta_2 x_t + \beta_3 x_t^2 + u_t$
Logarithmic Transformation

$$y_t = \beta_1 x_t^{\beta_2} u_t$$
$$\ln y_t = \ln \beta_1 + \beta_2 \ln x_t + \ln u_t$$
Logarithmic Transformation

Transforming the variables into logarithms changes the interpretation of the coefficients. Each of the four possibilities has the following interpretation:

$y_t = \beta_1 + \beta_2 x_t + u_t$: a one-unit change in $x_t$ changes $y_t$ by $\beta_2$ units.

$\ln y_t = \beta_1 + \beta_2 x_t + u_t$: a one-unit change in $x_t$ changes $y_t$ by approximately $100\beta_2$ percent.

$y_t = \beta_1 + \beta_2 \ln x_t + u_t$: a one-percent change in $x_t$ changes $y_t$ by approximately $\beta_2/100$ units.

$\ln y_t = \beta_1 + \beta_2 \ln x_t + u_t$: a one-percent change in $x_t$ changes $y_t$ by approximately $\beta_2$ percent, so $\beta_2$ is an elasticity.
Omission of an Important Variable

Suppose the true data generating process is given by the first equation, but we estimate the second equation, so that $x_{3t}$ is omitted.

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$
$$y_t = \beta_1 + \beta_2 x_{2t} + u_t$$

If the omitted $x_{3t}$ is correlated with $x_{2t}$, the estimate of $\beta_2$ is biased and inconsistent.
Inclusion of an Irrelevant Variable

Suppose the true data generating process is given by the first equation, but we estimate the second equation, so that $x_{3t}$ is included but irrelevant.

$$y_t = \beta_1 + \beta_2 x_{2t} + u_t$$
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$$

The coefficient estimates remain unbiased and consistent, but they are inefficient: the standard errors are larger than necessary.
Parameter Stability

$$y_t = \beta_1 + \beta_2 x_t + u_t$$
Chow Test

The restricted regression is the one over the whole sample, while the unrestricted regressions are the subsample regressions. Intuitively, the restricted regression imposes the restriction that the coefficients are constant over the full sample, while the two unrestricted regressions allow the coefficients to vary between the subsamples.
Chow Test

Let $RSS$ denote the residual sum of squares for the whole sample regression, and $RSS_1$ and $RSS_2$ the residual sums of squares of the two subsamples, respectively. Let $k$ denote the number of regressors in each regression (including the constant), and $T$ denote the sample size of the whole sample regression. The test statistic is:

$$\text{test statistic} = \frac{(RSS - (RSS_1 + RSS_2))/k}{(RSS_1 + RSS_2)/(T - 2k)}$$

Under the null hypothesis of parameter stability, the test statistic has an $F(k, T - 2k)$ distribution. If the test statistic is greater than the critical value, we reject the null hypothesis.
Chow Test

Suppose we want to test for parameter stability with a Chow test. The regression is $y_t = \beta_1 + \beta_2 x_{2t} + u_t$, so $k = 2$. The $RSS$ for the full sample regression is 120, the $RSS$ for the two subsamples are 45 and 55, respectively, and $T = 120$. Perform the test at the 5% level of significance.

$$\text{test statistic} = \frac{(120 - 100)/2}{100/(120 - 4)} = \frac{10}{0.862} \approx 11.6$$

The 5% critical value is $F_{0.05}(2, 116) \approx 3.07$. Since the test statistic is greater than the critical value, we reject the null hypothesis.
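A minimal sketch of the calculation, reproducing the worked example above (the function name is illustrative):

```python
import numpy as np

def chow_test(rss, rss1, rss2, T, k):
    """Chow test statistic: compares the pooled (restricted) RSS with the
    sum of the subsample (unrestricted) RSS; F(k, T - 2k) under the null
    of parameter stability."""
    return ((rss - (rss1 + rss2)) / k) / ((rss1 + rss2) / (T - 2 * k))

# Figures from the slide's example:
print(chow_test(120, 45, 55, T=120, k=2))  # approximately 11.6
```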
Measurement Error in Explanatory Variables

Suppose we don’t observe the true value of $x_t$, and instead it is measured with error. Let $\tilde{x}_t$ denote the observed noisy value of $x_t$ and $v_t$ denote an iid measurement error.

$$y_t = \beta_1 + \beta_2 x_t + u_t$$
$$\tilde{x}_t = x_t + v_t, \quad v_t \sim N(0, \sigma_v^2)$$
$$y_t = \beta_1 + \beta_2(\tilde{x}_t - v_t) + u_t = \beta_1 + \beta_2 \tilde{x}_t + (u_t - \beta_2 v_t)$$
Measurement Error in Explanatory Variables

$$\hat{\beta}_2 = \frac{\text{Cov}(y_t, \tilde{x}_t)}{\text{Var}(\tilde{x}_t)} = \frac{\text{Cov}(y_t, x_t + v_t)}{\text{Var}(x_t + v_t)} = \frac{\text{Cov}(y_t, x_t)}{\text{Var}(x_t) + \sigma_v^2}$$

Since $\text{Cov}(y_t, x_t) = \beta_2 \text{Var}(x_t)$, the estimate converges to $\beta_2 \, \text{Var}(x_t)/(\text{Var}(x_t) + \sigma_v^2)$, which is smaller in absolute value than $\beta_2$: measurement error biases the slope estimate toward zero (attenuation bias).
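A hypothetical Monte Carlo illustration of the attenuation result (all parameter values invented):

```python
import numpy as np

# With Var(x) = 1 and sigma_v = 1, the OLS slope on the mismeasured
# regressor should shrink by the factor Var(x) / (Var(x) + sigma_v^2) = 1/2.
rng = np.random.default_rng(0)
T, beta2, sigma_v = 100_000, 2.0, 1.0
x = rng.normal(size=T)                        # true regressor, Var(x) = 1
y = 1.0 + beta2 * x + rng.normal(size=T)      # true model
x_tilde = x + sigma_v * rng.normal(size=T)    # observed with measurement error
slope = np.cov(y, x_tilde)[0, 1] / np.var(x_tilde, ddof=1)
print(slope)  # close to beta2 / 2 = 1.0, not the true 2.0
```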