
Regression Model Assumptions

Asad Dossani

Fin 625: Quantitative Methods in Finance

1 / 68
Regression Model Assumptions

E(ut) = 0
Var(ut) = σ² < ∞
Cov(ui, uj) = 0 for i ≠ j
Cov(ut, xt) = 0
ut ∼ N(0, σ²)

2 / 68
Statistical Distributions of Diagnostic Tests

Lagrange Multiplier (LM) tests follow a χ² distribution with degrees of
freedom equal to the number of restrictions placed on the model, denoted
by m. The Wald test follows an F distribution with (m, T − k) degrees of
freedom. Asymptotically, the two are equivalent, though their results may
differ in small samples.

F(m, T − k) → χ²(m)/m as T → ∞
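As a quick numerical illustration (a sketch assuming SciPy is available; the
values of m and k are hypothetical), m times the F critical value approaches
the χ² critical value as T grows:

```python
# Illustrative check that m * F(m, T - k) critical values approach the
# chi-squared(m) critical value as T grows. m and k are hypothetical.
from scipy import stats

m, k = 5, 3
chi2_crit = stats.chi2.ppf(0.95, m)

for T in (30, 100, 1000, 100000):
    f_crit = stats.f.ppf(0.95, m, T - k)
    print(f"T = {T:>6}: m * F crit = {m * f_crit:.3f}   chi2 crit = {chi2_crit:.3f}")
```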

3 / 68
Assumption 1: E(ut ) = 0

If a constant term is included in the regression, this assumption is never
violated. If a constant is not included, the estimates of the slope
coefficients could be biased. In addition, R² is not a meaningful statistic,
since ȳ is not necessarily equal to the mean of the fitted values ŷ.

4 / 68
Inclusion of a Constant Term

5 / 68
Assumption 2: Var(ut ) = σ 2 < ∞

Homoskedasticity is the assumption that the variance of the error term is
constant. If the errors do not have a constant variance, they are said to
be heteroskedastic.

6 / 68
Heteroskedasticity

7 / 68
Heteroskedasticity

The OLS estimators in the presence of heteroskedasticity are still unbiased
and consistent, but are no longer BLUE (best linear unbiased estimators).
The standard errors of the OLS estimator are no longer correct. If the
variance of the errors is positively related to the square of an explanatory
variable, the OLS standard errors for the slope coefficient will be too low,
and vice versa.

8 / 68
White’s (1980) Test for Heteroskedasticity

We first estimate the regression model using OLS. Next, we collect the
residuals ût and regress the squared residuals on a constant, the original
explanatory variables, the squares of the explanatory variables, and their
cross products. This is called an auxiliary regression.

yt = β1 + β2 x2t + β3 x3t + ut
ût² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + vt

9 / 68
White’s (1980) Test for Heteroskedasticity

If one or more of the coefficients in the model is statistically significant,
the R² will be relatively high, and vice versa. Under the null hypothesis of
homoskedasticity, TR² ∼ χ²(m), where T is the sample size and m is the number
of regressors in the auxiliary regression, excluding the constant. If the
test statistic is greater than the critical value, we reject the null
hypothesis of homoskedasticity, and vice versa.

10 / 68
White’s (1980) Test for Heteroskedasticity

Suppose the auxiliary regression R² = 0.05 and T = 120. Perform the test at
the 5% level of significance.

test statistic = TR²
test statistic = (120)(0.05)
test statistic = 6.0
χ²_0.05(5) = 11.07

Since the test statistic is less than the critical value, we do not reject
the null hypothesis.
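A minimal Python sketch of the full procedure on simulated data (the data
generating process and variable names are illustrative, not part of the
worked example above):

```python
# Sketch of White's test: OLS residuals are squared and regressed on levels,
# squares, and cross products; T * R^2 is compared with a chi-squared critical
# value. The simulated heteroskedastic data are purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T = 120
x2, x3 = rng.normal(size=T), rng.normal(size=T)
u = rng.normal(size=T) * (1 + np.abs(x2))        # variance depends on x2
y = 1.0 + 0.5 * x2 - 0.3 * x3 + u

def ols_resid(y, X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

uhat = ols_resid(y, np.column_stack([np.ones(T), x2, x3]))

# Auxiliary regression of the squared residuals
Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])
vhat = ols_resid(uhat**2, Z)
r2_aux = 1 - vhat @ vhat / np.sum((uhat**2 - np.mean(uhat**2)) ** 2)

stat = T * r2_aux                                # T * R^2
crit = stats.chi2.ppf(0.95, Z.shape[1] - 1)      # m = 5, excluding the constant
print(f"TR^2 = {stat:.2f}, critical value = {crit:.2f}")
```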

11 / 68
Solutions to Heteroskedasticity

One solution is to estimate the regression using OLS and correct the
standard errors. White's (1980) heteroskedasticity-consistent (robust)
standard errors are given by:

Var(β̂) = (X′X)⁻¹ (X′ΣX) (X′X)⁻¹

Σ = diag(û1², û2², . . . , ûT²)
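A short sketch of the sandwich computed directly with NumPy (the data are
simulated; in statsmodels the same correction corresponds to the
cov_type="HC0" option):

```python
# White's heteroskedasticity-consistent covariance, computed directly:
# Var(beta_hat) = (X'X)^-1 (X' diag(uhat^2) X) (X'X)^-1.  Simulated data.
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
u = rng.normal(size=T) * (1 + x**2)          # error variance rises with x^2
y = 2.0 + 1.0 * x + u
X = np.column_stack([np.ones(T), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * uhat[:, None] ** 2)        # X' diag(uhat^2) X
cov_robust = XtX_inv @ meat @ XtX_inv        # the sandwich estimator
print("robust standard errors:", np.sqrt(np.diag(cov_robust)))
```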

12 / 68
Generalized Least Squares

Another solution is to transform the data so that the errors are no longer
heteroskedastic. This is known as generalized least squares (GLS). Suppose
the variance is related to a known variable zt:

yt = β1 + β2 x2t + β3 x3t + ut
Var(ut ) = σ 2 zt2

13 / 68
Generalized Least Squares

We can divide the regression equation by zt and estimate the transformed
regression by OLS. The errors of the transformed data are homoskedastic.
GLS is essentially OLS applied to transformed data.

yt/zt = β1 (1/zt) + β2 (x2t/zt) + β3 (x3t/zt) + ut/zt

Var(ut/zt) = σ² zt²/zt² = σ²

In practice, we may not know the functional form of heteroskedasticity. We
can then use a feasible version of GLS, where we estimate the functional
form of heteroskedasticity.
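A minimal sketch of the transformation when zt is known (simulated data;
every column, including the constant, is divided by zt before running OLS):

```python
# GLS by transformation: when Var(u_t) = sigma^2 * z_t^2 with z_t known,
# divide y_t and every regressor (including the constant) by z_t and run OLS.
import numpy as np

rng = np.random.default_rng(2)
T = 300
x2, x3 = rng.normal(size=T), rng.normal(size=T)
z = np.exp(rng.normal(size=T))               # known positive scale variable
u = z * rng.normal(size=T)                   # Var(u_t) proportional to z_t^2
y = 1.0 + 0.5 * x2 - 0.2 * x3 + u

X = np.column_stack([np.ones(T), x2, x3])
beta_gls = np.linalg.lstsq(X / z[:, None], y / z, rcond=None)[0]
print("GLS estimates:", beta_gls)
```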

14 / 68
Assumption 3: Cov(ui, uj) = 0 for i ≠ j

If the errors are correlated with one another, they are said to be
autocorrelated, or serially correlated. Since the population disturbances
cannot be observed, tests for autocorrelation are conducted on the
residuals ût.

15 / 68
Lagged Values and First Differences

The lagged value of a variable is the value that the variable took
during the previous period. The value of yt lagged one period is
yt−1 . The value of yt lagged p periods is yt−p .

The first difference of yt is given by ∆yt :

∆yt = yt − yt−1

16 / 68
Graphical Tests for Autocorrelation

To test for autocorrelation, we investigate whether there is any
relationship between ût and its previous values ût−1, ût−2, . . . . To start,
we can consider possible relationships between ût and ût−1. We can plot ût
against ût−1, and plot ût over time.

17 / 68
Positive Autocorrelation

18 / 68
Positive Autocorrelation

19 / 68
Negative Autocorrelation

20 / 68
Negative Autocorrelation

21 / 68
No Autocorrelation

22 / 68
No Autocorrelation

23 / 68
Durbin-Watson (1951) Test

The Durbin-Watson (DW) test is a test for first order autocorrelation, i.e.
the relationship between an error term and its previous value. We can
motivate the test using the following regression:

ut = ρ ut−1 + vt
vt ∼ N(0, σv²)
H0 : ρ = 0
H1 : ρ ≠ 0

24 / 68
Durbin-Watson (1951) Test

In practice, we don’t need to run the regression since the test statistic
can be calculated using quantities available after the regression has
been run.

DW = Σ_{t=2}^{T} (ût − ût−1)² / Σ_{t=2}^{T} ût²

DW ≈ 2(1 − ρ̂)
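A minimal sketch of the statistic computed from residuals (the AR(1)
residual series is simulated; statsmodels also ships a durbin_watson
helper for the same quantity):

```python
# Durbin-Watson statistic from residuals, compared with 2 * (1 - rho_hat).
# The AR(1) residual series below is simulated for illustration.
import numpy as np

def durbin_watson(uhat):
    return np.sum(np.diff(uhat) ** 2) / np.sum(uhat[1:] ** 2)

rng = np.random.default_rng(3)
u = np.zeros(500)
for t in range(1, 500):
    u[t] = 0.6 * u[t - 1] + rng.normal()     # positively autocorrelated

rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[1:] ** 2)
print(durbin_watson(u), 2 * (1 - rho_hat))   # both well below 2
```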

25 / 68
Durbin-Watson (1951) Test

DW = [ Σ_{t=2}^{T} ût² + Σ_{t=2}^{T} ût−1² − 2 Σ_{t=2}^{T} ût ût−1 ] / Σ_{t=2}^{T} ût²

   ≈ [ 2 Σ_{t=2}^{T} ût² − 2 Σ_{t=2}^{T} ût ût−1 ] / Σ_{t=2}^{T} ût²   as T → ∞

   ≈ 2 ( 1 − Σ_{t=2}^{T} ût ût−1 / Σ_{t=2}^{T} ût² )

   ≈ 2(1 − ρ̂)

26 / 68
Durbin-Watson (1951) Test

−1 ≤ ρ̂ ≤ 1 → 0 ≤ DW ≤ 4
ρ̂ = 0 → DW = 2 (no autocorrelation)
ρ̂ = 1 → DW = 0 (perfect positive autocorrelation)
ρ̂ = −1 → DW = 4 (perfect negative autocorrelation)

For the DW test to be valid, the regression must have a constant, no lags
of the dependent variable, and the regressors must be non-stochastic.

27 / 68
Durbin-Watson (1951) Test

The DW test does not follow a standard distribution. DW has two critical
values, dU (upper) and dL (lower). There is an intermediate region where
the null hypothesis can neither be rejected nor not rejected.

28 / 68
Breusch-Godfrey Test for Autocorrelation

The Breusch-Godfrey test is a joint test for autocorrelation that examines
the relationship between ût and several of its lagged values at the same
time.

ut = ρ1 ut−1 + ρ2 ut−2 + · · · + ρr ut−r + vt
vt ∼ N(0, σv²)
H0 : ρ1 = 0 and ρ2 = 0 and . . . and ρr = 0
H1 : ρ1 ≠ 0 or ρ2 ≠ 0 or . . . or ρr ≠ 0

29 / 68
Breusch-Godfrey Test for Autocorrelation

First, we estimate a linear regression using OLS and obtain the residuals
ût. Second, we regress ût on all regressors and r lagged values of the
residuals. This is the auxiliary regression. Suppose the regressors are a
constant, x2t, x3t, and x4t.

ût = γ1 + γ2 x2t + γ3 x3t + γ4 x4t + ρ1 ût−1 + ρ2 ût−2 + · · · + ρr ût−r + vt

30 / 68
Breusch-Godfrey Test for Autocorrelation

We obtain the R² from the auxiliary regression. Let T denote the number of
observations. The test statistic is given by:

(T − r)R² ∼ χ²(r)

If the test statistic exceeds the critical value, we reject the null
hypothesis of no autocorrelation, and vice versa.

31 / 68
Breusch-Godfrey Test for Autocorrelation

Suppose the auxiliary regression R² = 0.10, T = 120, and the number of lags
is r = 3. Perform the test at the 5% level of significance.

test statistic = (T − r)R²
test statistic = (120 − 3)(0.10)
test statistic = 11.7
χ²_0.05(3) = 7.81

Since the test statistic is greater than the critical value, we reject
the null hypothesis.
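A sketch of the auxiliary regression behind this statistic (uhat and X are
assumed to come from a prior OLS fit; setting the pre-sample lagged
residuals to zero is one common convention):

```python
# Breusch-Godfrey sketch: regress residuals on the original regressors plus
# r lagged residuals, then compare (T - r) * R^2 with a chi-squared(r) value.
# uhat and X are assumed to come from an earlier OLS fit; pre-sample lagged
# residuals are set to zero.
import numpy as np
from scipy import stats

def breusch_godfrey(uhat, X, r, level=0.95):
    T = len(uhat)
    lags = np.column_stack(
        [np.concatenate([np.zeros(l), uhat[:-l]]) for l in range(1, r + 1)]
    )
    Z = np.column_stack([X, lags])
    gamma = np.linalg.lstsq(Z, uhat, rcond=None)[0]
    resid = uhat - Z @ gamma
    r2 = 1 - resid @ resid / np.sum((uhat - uhat.mean()) ** 2)
    return (T - r) * r2, stats.chi2.ppf(level, r)

# Usage: stat, crit = breusch_godfrey(uhat, X, r=3)
```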

32 / 68
Cochrane-Orcutt Procedure

If the form of autocorrelation is known, one approach is to use a GLS
procedure, known as the Cochrane-Orcutt procedure. Suppose we regress yt
on a constant, x2t, and x3t. We assume a particular functional form for
the structure of the autocorrelation:

yt = β1 + β2 x2t + β3 x3t + ut
ut = ρut−1 + vt

33 / 68
Cochrane-Orcutt Procedure

Suppose we lag the regression equation, multiply it by ρ, and subtract it
from the original regression equation.

yt = β1 + β2 x2t + β3 x3t + ut
ρ yt−1 = ρ β1 + ρ β2 x2t−1 + ρ β3 x3t−1 + ρ ut−1
yt − ρ yt−1 = β1 (1 − ρ) + β2 (x2t − ρ x2t−1) + β3 (x3t − ρ x3t−1) + (ut − ρ ut−1)

34 / 68
Cochrane-Orcutt Procedure

yt − ρ yt−1 = β1 (1 − ρ) + β2 (x2t − ρ x2t−1) + β3 (x3t − ρ x3t−1) + (ut − ρ ut−1)

yt∗ = yt − ρ yt−1
β1∗ = β1 (1 − ρ)
x2t∗ = x2t − ρ x2t−1
x3t∗ = x3t − ρ x3t−1
vt = ut − ρ ut−1

yt∗ = β1∗ + β2 x2t∗ + β3 x3t∗ + vt

We can now estimate the transformed regression using OLS because vt is not
autocorrelated.
35 / 68
Cochrane-Orcutt Procedure

In practice, we do not know ρ, so we have to estimate it. This is called
feasible GLS. We first estimate the original regression using OLS and
obtain the fitted residuals ût. We then run the following regression:

ût = ρ ût−1 + vt

Using the estimated ρ̂, we run the GLS regression. We can additionally
obtain better estimates by going through the process multiple times. After
running the first GLS regression, we again correct it for autocorrelation
to obtain a new estimate of ρ̂. This procedure is repeated until the change
in ρ̂ from one iteration to the next is sufficiently small.
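A sketch of the iterative loop (X is assumed to contain a constant column;
the tolerance and iteration cap are arbitrary choices):

```python
# Iterative Cochrane-Orcutt (feasible GLS) sketch: estimate rho from the
# residuals, quasi-difference the data, re-estimate, and repeat until rho
# stabilizes. X is assumed to include a constant column.
import numpy as np

def cochrane_orcutt(y, X, tol=1e-6, max_iter=100):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(max_iter):
        uhat = y - X @ beta
        rho_new = np.sum(uhat[1:] * uhat[:-1]) / np.sum(uhat[:-1] ** 2)
        y_star = y[1:] - rho_new * y[:-1]        # quasi-differenced data
        X_star = X[1:] - rho_new * X[:-1]        # constant column becomes 1 - rho
        beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho
```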

36 / 68
Cochrane-Orcutt Procedure
The Cochrane-Orcutt procedure requires a specific assumption regarding the
form of autocorrelation. Suppose we move ρ yt−1 to the right hand side of
the regression equation.

yt = β1 (1 − ρ) + β2 (x2t − ρ x2t−1) + β3 (x3t − ρ x3t−1) + ρ yt−1 + vt
yt = β1 (1 − ρ) + β2 x2t − ρ β2 x2t−1 + β3 x3t − ρ β3 x3t−1 + ρ yt−1 + vt
yt = γ1 + γ2 x2t + γ3 x2t−1 + γ4 x3t + γ5 x3t−1 + γ6 yt−1 + vt

We could estimate an equation containing the same variables using OLS. The
Cochrane-Orcutt procedure is a restricted version of the OLS regression,
with γ2 γ6 = −γ3 and γ4 γ6 = −γ5.
37 / 68
Cochrane-Orcutt Procedure

These are known as common factor restrictions and should be tested before
the Cochrane-Orcutt or a similar procedure is applied. In practice, these
restrictions may be invalid, and a dynamic model should be used instead. A
dynamic model means we simply estimate the unrestricted model using OLS.

38 / 68
Newey-West Variance Covariance Estimator

An alternative approach to dealing with autocorrelation is to estimate the
model using OLS and correct the standard errors. The Newey-West variance
covariance estimator is consistent in the presence of both autocorrelation
and heteroskedasticity. It requires specification of a truncation lag
length L to determine the number of lagged residuals used to evaluate the
autocorrelation.

Var(β̂) = (1/T) Σ_{t=1}^{T} ut² xt xt′ + (1/T) Σ_{l=1}^{L} wl Σ_{t=l+1}^{T} ut ut−l (xt xt−l′ + xt−l xt′)

wl = 1 − l/(L + 1)
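In practice the correction is usually requested from the estimation library
rather than coded by hand; a sketch with statsmodels (simulated data, and
the truncation lag is an arbitrary choice):

```python
# Newey-West (HAC) standard errors via statsmodels; the data are simulated
# with AR(1) errors and the truncation lag L = 5 is an arbitrary choice.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 250
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()     # autocorrelated errors
y = 1.0 + 0.8 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HAC", cov_kwds={"maxlags": 5})
print(res.bse)                               # Newey-West standard errors
```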

39 / 68
Dynamic Models

Suppose the current value of yt depends on the current and previous values
of xt. This is known as a distributed lag model (first equation). If yt
also depends on previous values of yt, it is known as an autoregressive
distributed lag (ARDL) model (second equation).

yt = α + β0 xt + β1 xt−1 + ut
yt = α + β0 xt + β1 xt−1 + γ1 yt−1 + ut

40 / 68
Dynamic Models

Including lags in a regression can often eliminate autocorrelation. Lags
can capture the dynamic structure of the dependent variable. For example,
a change in the explanatory variable may not affect the dependent variable
immediately, but instead with a lag over several time periods. A general
form of an ARDL(p, q) model with p lags of xt and q lags of yt is as
follows:

yt = α + β0 xt + β1 xt−1 + · · · + βp xt−p + γ1 yt−1 + · · · + γq yt−q + ut

41 / 68
Dynamic Models

How do we interpret the coefficients from a dynamic model? Consider an
ARDL(1,1):

yt = α + β0 xt + β1 xt−1 + γ1 yt−1 + ut

In this model, β0 captures the immediate, or short run, effect of xt on yt.

42 / 68
Dynamic Models

We can also consider a long run equilibrium relationship between x and y.
This is equivalent to asking the effect of a permanent, or long run, change
in x on y. Suppose we set yt = yt−1 = y, xt = xt−1 = x, and ut = E(ut) = 0.

yt = α + β0 xt + β1 xt−1 + γ1 yt−1 + ut
y = α + β0 x + β1 x + γ1 y
(1 − γ1) y = α + (β0 + β1) x
y = α/(1 − γ1) + [(β0 + β1)/(1 − γ1)] x
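For example, with hypothetical estimates β0 = 0.4, β1 = 0.2 and γ1 = 0.5,
a permanent one-unit increase in x changes y by 1.2 in the long run:

```python
# Long run effect in an ARDL(1,1) with hypothetical coefficient estimates.
beta0, beta1, gamma1 = 0.4, 0.2, 0.5
print((beta0 + beta1) / (1 - gamma1))   # 1.2
```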

43 / 68
Lagged Dependent Variables

Adding lagged dependent variables violates the assumption that the
explanatory variables are non-stochastic. The OLS estimator is no longer
unbiased, but it is consistent, meaning that the bias disappears as the
sample size gets large.

If autocorrelation remains in a model with lagged dependent variables, OLS
is no longer consistent. This can occur if not enough lags are included in
the model. It can also occur if relevant variables are omitted from the
model, and these variables are themselves autocorrelated.

44 / 68
Lagged Dependent Variables

yt = α + β0 xt + γ1 yt−1 + ut
ut = ρut−1 + vt
yt−1 = α + β0 xt−1 + γ1 yt−2 + ut−1
yt = α + β0 xt + γ1 yt−1 + ρut−1 + vt

If ut is autocorrelated, and yt depends on yt−1, then because yt−1 is
correlated with ut−1, yt−1 is also correlated with ut.

Then E(X′u) = 0 is not satisfied, and OLS is not consistent.

45 / 68
Assumption 4: The xt are Non-Stochastic
The OLS estimator is consistent and unbiased in the presence of
stochastic regressors, provided the regressors are not correlated with
the error term.

y = Xβ + u
β̂ = (X′X)⁻¹ X′y
β̂ = (X′X)⁻¹ X′(Xβ + u)
β̂ = (X′X)⁻¹ X′X β + (X′X)⁻¹ X′u
β̂ = β + (X′X)⁻¹ X′u
E(β̂) = E(β) + E[(X′X)⁻¹ X′u]
E(β̂) = β + E[(X′X)⁻¹ X′] E(u)
E(β̂) = β

46 / 68
Endogeneity

If one or more of the explanatory variables is contemporaneously correlated
with the error term, OLS is not consistent. This is known as endogeneity.

Suppose xt and ut are positively correlated. When ut is high, yt is also
high. If xt is positively correlated with ut, xt is also high. OLS will
incorrectly attribute the high value of yt to a high value of xt, when in
reality yt is high because ut is high. This will lead to inconsistent and
biased parameter estimates and a fitted line that appears to capture the
data better than it does in reality.

47 / 68
Assumption 5: ut ∼ N(0, σ 2 )

The disturbances are assumed to be normally distributed. This is required
to conduct single or joint hypothesis tests about the parameters. We can
test the normality of the residuals using the Bera-Jarque test.

A normally distributed random variable is characterized by its first two
moments, the mean and the variance. The third and fourth standardized
moments are skewness and kurtosis, respectively. A normally distributed
random variable has a skewness of zero and a kurtosis of three, or an
excess kurtosis of zero.

48 / 68
Assumption 5: ut ∼ N(0, σ 2 )
Let u denote the residuals and σ² denote their variance. Let b1 and b2
denote skewness and kurtosis, respectively. T is the sample size. The
Bera-Jarque test statistic W is given by:

W = T [ b1²/6 + (b2 − 3)²/24 ]
b1 = E(u³)/σ³
b2 = E(u⁴)/σ⁴

We can compute the test statistic using the OLS residuals û. Under the
null hypothesis of normality, the statistic W ∼ χ²(2). If the test
statistic exceeds the critical value, we reject the null hypothesis of
normally distributed errors.
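A minimal sketch of the computation (the fat-tailed sample stands in for a
residual series; scipy.stats.jarque_bera offers a packaged version):

```python
# Bera-Jarque statistic from a residual series, compared with the 5%
# chi-squared(2) critical value. The fat-tailed sample is simulated.
import numpy as np
from scipy import stats

def bera_jarque(uhat):
    T = len(uhat)
    u = uhat - uhat.mean()
    sigma2 = np.mean(u ** 2)
    b1 = np.mean(u ** 3) / sigma2 ** 1.5     # skewness
    b2 = np.mean(u ** 4) / sigma2 ** 2       # kurtosis
    return T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)

rng = np.random.default_rng(5)
W = bera_jarque(rng.standard_t(df=3, size=500))   # fat-tailed "residuals"
print(W, stats.chi2.ppf(0.95, 2))                 # W far exceeds 5.99
```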
49 / 68
Dealing with Non-Normality and Outliers

For a sufficiently large sample, non-normality is inconsequential because
of the central limit theorem. Sometimes, a log transform of the data can
help to make the distribution of the residuals closer to a normal.

Occasionally, one or two extreme observations can cause a rejection of the
normality assumption. These are known as outliers, e.g. stock market
crashes or financial crises. Outliers can have a serious effect on OLS
coefficient estimates, particularly in small samples. We can remove
outliers or include dummy variables for those observations, if doing so is
theoretically justified.

50 / 68
Outliers

51 / 68
Multicollinearity

If there is no correlation between explanatory variables, they are called
orthogonal to one another. In this case, adding or removing a variable
from the regression equation has no effect on the coefficient estimates of
the other explanatory variables.

In practice, correlation between explanatory variables is nonzero. When
explanatory variables are highly correlated with each other, this is known
as multicollinearity.

52 / 68
Multicollinearity

Perfect multicollinearity occurs when there is an exact linear relationship
between two or more explanatory variables. In this case, the regression
cannot be estimated until we remove the perfectly collinear variables.

Near multicollinearity occurs when two or more variables are highly, but
not perfectly, correlated with each other. A high correlation means that
the correlation coefficient is close to one or close to minus one.

53 / 68
Multicollinearity

The variance inflation factor (VIF ) estimates the extent to which the
variance of a parameter estimate increases because the explanatory
variables are correlated. Suppose Ri2 is the R 2 from a regression of
explanatory variable i on a constant plus all the other explanatory
variables in the regression. The VIFi is given by:
VIFi = 1/(1 − Ri²)
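A sketch of the computation via the auxiliary regression (X is assumed to
hold the explanatory variables without the constant; statsmodels provides
variance_inflation_factor for the same idea):

```python
# VIF for regressor i: regress x_i on a constant plus the other explanatory
# variables, take R_i^2, and return 1 / (1 - R_i^2). X excludes the constant.
import numpy as np

def vif(X, i):
    T = X.shape[0]
    others = np.column_stack([np.ones(T), np.delete(X, i, axis=1)])
    coef = np.linalg.lstsq(others, X[:, i], rcond=None)[0]
    resid = X[:, i] - others @ coef
    r2 = 1 - resid @ resid / np.sum((X[:, i] - X[:, i].mean()) ** 2)
    return 1 / (1 - r2)
```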

54 / 68
Multicollinearity

In the presence of multicollinearity, R² will be high but individual
coefficients will have high standard errors and may be statistically
insignificant.

Intuitively, a regression coefficient is the impact of an explanatory
variable on the dependent variable, holding the other explanatory variables
constant. If two explanatory variables are highly correlated with each
other, it is difficult to precisely estimate the impact of one while
holding the other constant.

55 / 68
Multicollinearity

OLS estimates are still BLUE in the presence of multicollinearity, so one
option is to ignore it. Other methods of dealing with multicollinearity
include principal component analysis and ridge regression. We can also
drop one of the collinear variables if doing so can be theoretically
justified.

56 / 68
Adopting the Wrong Functional Form

If the relationship between xt and yt is not linear, one possibility is to
use a non-linear model. These often require complex estimation techniques.
An alternative is to write the model so that it is linear in the
parameters, and then estimate the model by OLS. Consider a quadratic
regression:

yt = β1 + β2 xt + β3 xt² + ut
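The model is non-linear in xt but linear in the parameters, so OLS applies
once xt² is added as an extra column; a sketch on simulated data:

```python
# Quadratic regression estimated by OLS: add x_t^2 as an extra column.
# The data generating process below is purely illustrative.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x, x ** 2])   # constant, x, x^2
print(np.linalg.lstsq(X, y, rcond=None)[0])         # estimates of beta1..beta3
```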

57 / 68
Quadratic Regression: yt = β1 + β2 xt + β3 xt² + ut

58 / 68
Logarithmic Transformation

Another approach is to transform the data into logarithms. Consider the
exponential growth model and its log transformation:

yt = β1 xt^β2 ut
ln yt = ln β1 + β2 ln xt + ln ut

59 / 68
Logarithmic Transformation
Transforming the variables into logarithms changes the interpreta-
tion of the coefficients. Each of the four possibilities has the follow-
ing interpretation:

yt = β1 + β2 xt + ut
ln yt = β1 + β2 xt + ut
yt = β1 + β2 ln xt + ut
ln yt = β1 + β2 ln xt + ut

1. A 1 unit increase in x causes a β2 unit increase in y.
2. A 1 unit increase in x causes a β2 × 100% increase in y.
3. A 1% increase in x causes a β2 /100 unit increase in y.
4. A 1% increase in x causes a β2 % increase in y.

60 / 68
Omission of an Important Variable

Suppose the true data generating process is given by the first equa-
tion, but we estimate the second equation, so that x3t is omitted.

yt = β1 + β2 x2t + β3 x3t + ut
yt = β1 + β2 x2t + ut

The estimated coefficient on x2t is biased and inconsistent, unless x2t and
x3t are uncorrelated.
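An illustrative simulation of the bias (the data generating process and the
correlation structure between the regressors are hypothetical):

```python
# Omitted variable bias: x3 is left out, and because x2 and x3 are correlated
# the estimated coefficient on x2 absorbs part of x3's effect (true value 0.5).
import numpy as np

rng = np.random.default_rng(7)
T = 5000
x3 = rng.normal(size=T)
x2 = 0.7 * x3 + rng.normal(size=T)            # x2 correlated with x3
y = 1.0 + 0.5 * x2 + 1.0 * x3 + rng.normal(size=T)

X_short = np.column_stack([np.ones(T), x2])   # x3 omitted
print(np.linalg.lstsq(X_short, y, rcond=None)[0][1])   # well above 0.5
```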

61 / 68
Inclusion of an Irrelevant Variable

Suppose the true data generating process is given by the first equa-
tion, but we estimate the second equation, so that x3t is included
but irrelevant.

yt = β1 + β2 x2t + ut
yt = β1 + β2 x2t + β3 x3t + ut

If x3t is irrelevant, the estimated coefficient on x3t will be close to
zero. The estimated coefficient on x2t is unbiased and consistent, but it
has a higher standard error. Thus, it is less efficient. The loss of
efficiency depends positively on the absolute value of the correlation
between x2t and x3t.

62 / 68
Parameter Stability

Suppose we estimate the following regression:

yt = β1 + β2 xt + ut

The regression implicitly assumes that the coefficients β1 and β2 are
constant over the sample period. We can test this assumption using a
parameter stability test called the Chow test.

63 / 68
Chow Test

Suppose we split the sample into two sub-periods. We first estimate the
regression over the whole sample, and then separately for each subsample.
Our objective is to test whether the coefficients are stable across the
subsamples, using an F-test.

The restricted regression is the one over the whole sample, while the
unrestricted regressions are the subsample regressions. Intuitively,
the restricted regression imposes the restriction that the coefficients
are constant over the full sample, while the two unrestricted regres-
sions allow the coefficients to vary between the subsamples.

64 / 68
Chow Test

Let RSS denote the residual sum of squares for the whole sample
regression, and RSS1 and RSS2 the residual sum of squares of each
subsample, respectively. Let k denote the number of regressors in
each regression (including the constant), and T denote the sample
size of the whole sample regression. Under the null hypothesis of
parameter stability, the test statistic has an F (k, T − 2k) distribu-
tion.

test statistic = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (T − 2k)/k ∼ F(k, T − 2k)

If the test statistic is greater than the critical value, we reject the
null hypothesis.

65 / 68
Chow Test
Suppose we want to test for parameter stability with a Chow test.
The regression is: yt = β1 + β2 x2t + ut . The RSS for the full sample
regression is 120, and the RSS for the two subsamples are 45 and
55, respectively. T = 120. Perform the test at the 5% level of
significance.

test statistic = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (T − 2k)/k
test statistic = [120 − (45 + 55)] / (45 + 55) × [120 − 2(2)]/2
test statistic = 11.6
F0.05(2, 116) = 3.07

Since the test statistic is greater than the critical value, we reject
the null hypothesis.
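The same arithmetic in a few lines (SciPy supplies the F critical value):

```python
# Chow test arithmetic for the numbers above, with the F critical value
# taken from scipy.
from scipy import stats

RSS, RSS1, RSS2, T, k = 120, 45, 55, 120, 2
stat = (RSS - (RSS1 + RSS2)) / (RSS1 + RSS2) * (T - 2 * k) / k
print(stat, stats.f.ppf(0.95, k, T - 2 * k))   # 11.6 vs about 3.07
```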

66 / 68
Measurement Error in Explanatory Variables
Suppose we don’t observe the true value of xt , and instead it is
measured with error. Let x̃t denote the observed noisy value of xt
and vt denote an iid measurement error.

yt = β1 + β2 xt + ut
x̃t = xt + vt
vt ∼ N(0, σv2 )
yt = β1 + β2 (x̃t − vt ) + ut
yt = β1 + β2 x̃t + (ut − β2 vt )

This leads to a correlation between the regressor and the composite


error term. The OLS estimate is biased and inconsistent.

67 / 68
Measurement Error in Explanatory Variables

Measurement error biases the coefficient β2 towards zero. The bias worsens
as σv² increases.

β̂2 = Cov(yt, x̃t) / Var(x̃t)
β̂2 = Cov(yt, xt + vt) / Var(xt + vt)
β̂2 = Cov(yt, xt) / (Var(xt) + σv²)
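An illustrative simulation of the attenuation (the true slope is 2; the
measurement-error standard deviations are arbitrary choices):

```python
# Attenuation bias: as the measurement-error variance grows, the OLS slope
# estimate on the noisy regressor shrinks towards zero (true slope is 2).
import numpy as np

rng = np.random.default_rng(8)
T = 20000
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)

for sigma_v in (0.0, 0.5, 1.0, 2.0):
    x_tilde = x + sigma_v * rng.normal(size=T)      # noisy measurement of x
    X = np.column_stack([np.ones(T), x_tilde])
    slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
    print(f"sigma_v = {sigma_v}: slope estimate = {slope:.2f}")
```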

68 / 68
