Module 4


MULTIPLE LINEAR REGRESSION

The two-variable model is often inadequate in practice. For instance, in our consumption–income
example, it was implicitly assumed that only income X is related to consumption Y. But, besides
income, several other variables are also likely to affect consumption expenditure. An obvious
example is the wealth of the consumer. As another example, the demand for a commodity is likely
to depend not only on its price but also on the prices of other competing or complementary goods,
income of the consumer, social status, etc. Therefore, we need to extend our simple two-variable
regression model to cover models involving more than two variables. Adding more variables leads
us to the discussion of multiple regression models, that is, models in which the dependent
variable, or regressand, Y depends on two or more explanatory variables, or regressors.

The simplest possible multiple regression model is a three-variable regression, with one dependent
variable and two explanatory variables.

The Three-Variable Model

Generalizing the two-variable population regression function (PRF), we may write the three-
variable PRF as

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + 𝑢𝑖

where Y is the dependent variable, 𝑋2 and 𝑋3 the explanatory variables (or regressors), 𝑢 the
stochastic disturbance term, and 𝑖 the ith observation.

In this equation, β1 is the intercept term; it gives the average value of Y when X2 and X3 are set
equal to zero. The coefficients β2 and β3 are called partial regression coefficients. The meaning
of partial regression coefficient is as follows: 𝛽2 measures the change in the mean value of Y,
E(Y), per unit change in 𝑋2, holding the value of 𝑋3 constant. Likewise, 𝛽3 measures the change
in the mean value of Y per unit change in 𝑋3, holding the value of 𝑋2 constant.

Assumptions

The multiple linear regression model continues to operate within the framework of the classical linear
regression model (CLRM), which assumes the following:
1. Linear regression model, or linear in the parameters.
2. Fixed X values or X values independent of the error term. Here, this means we require zero
covariance between ui and each X variable:
cov(ui, X2i) = cov(ui, X3i) = 0
3. Zero mean value of disturbance 𝑢𝑖 .
𝐸(𝑢𝑖 |𝑋2𝑖 , 𝑋3𝑖 ) = 0 𝑓𝑜𝑟 𝑒𝑎𝑐ℎ 𝑖
4. Homoscedasticity or constant variance of 𝑢𝑖 .
𝑣𝑎𝑟 (𝑢𝑖 ) = 𝜎 2
5. No autocorrelation, or serial correlation, between the disturbances.
𝑐𝑜𝑣 (𝑢𝑖 , 𝑢𝑗 ) = 0 𝑖 ≠ 𝑗
6. The number of observations n must be greater than the number of parameters to be
estimated, which is 3 in our current case.
7. There must be variation in the values of the X variables.
In addition we will also address two other requirements.
8. No exact collinearity between the X variables.
No exact linear relationship between 𝑋2 and 𝑋3
9. There is no specification bias.
The model is correctly specified.

Given the assumptions of the classical regression model, it follows that, on taking the conditional
expectation of Y on both sides of Eq. 𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + 𝑢𝑖 , we obtain

𝐸(𝑌𝑖 |𝑋2𝑖 , 𝑋3𝑖 ) = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖

In words, it gives the conditional mean or expected value of Y, conditional upon the given or fixed
values of 𝑋2 and 𝑋3. Therefore, as in the two-variable case, multiple regression analysis is
regression analysis conditional upon the fixed values of the regressors, and what we obtain is the
average or mean value of Y or the mean response of Y for the given values of the regressors.

OLS Estimation of the Partial Regression Coefficients

To find the OLS estimators, let us first write the sample regression function (SRF) corresponding
to the PRF as follows:
𝑌𝑖 = 𝛽̂1 + 𝛽̂2 𝑋2𝑖 + 𝛽̂3 𝑋3𝑖 + 𝑢̂𝑖

where 𝑢̂𝑖 is the residual term, the sample counterpart of the stochastic disturbance term 𝑢𝑖 .

The OLS procedure consists of choosing the values of the unknown parameters so that the residual
sum of squares (RSS), ∑ûi², is as small as possible. Symbolically,

min ∑ûi² = ∑(Yi − β̂1 − β̂2X2i − β̂3X3i)²

Differentiating this expression with respect to β̂1, β̂2, and β̂3 and setting the results equal to zero
yields the normal equations, whose solution gives the OLS estimator of the population intercept β1
and the OLS estimators of the population partial regression coefficients β2 and β3.
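
As a rough illustration of the estimation just described, the sketch below fits a three-variable regression by OLS using Python's statsmodels library. The data are synthetic and the coefficient values are purely illustrative assumptions, not taken from the text.

import numpy as np
import statsmodels.api as sm

# Synthetic, purely illustrative data for Y = b1 + b2*X2 + b3*X3 + u
rng = np.random.default_rng(42)
n = 100
X2 = rng.uniform(50, 150, n)          # first regressor (hypothetical, e.g. income)
X3 = rng.uniform(10, 60, n)           # second regressor (hypothetical, e.g. wealth)
u = rng.normal(0, 5, n)               # stochastic disturbance term
Y = 20 + 0.6 * X2 + 0.2 * X3 + u

X = sm.add_constant(np.column_stack([X2, X3]))   # adds the intercept column
fit = sm.OLS(Y, X).fit()

print(fit.params)     # estimates of beta1, beta2, beta3
print(fit.bse)        # standard errors of the OLS estimators
print(fit.rsquared)   # multiple coefficient of determination R^2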

Variances and Standard Errors of OLS Estimators

Having obtained the OLS estimators of the partial regression coefficients, we can derive the
variances and standard errors of these estimators. As in the two-variable case, we need the standard
errors for two main purposes: to establish confidence intervals and to test statistical hypotheses.
The relevant formulas parallel those of the two-variable case. In particular, an unbiased estimator
of σ² is given by

σ̂² = ∑ûi² / (n − 3)

Note the degrees of freedom are now (n − 3) because in computing ∑ûi² we must first
estimate β1, β2, and β3, which consume 3 df.

Properties of OLS Estimators

The properties of OLS estimators of the multiple regression model parallel those of the two-
variable model. Specifically:

1. The three-variable regression line (surface) passes through the means Ȳ, X̄2, and X̄3. This
property holds generally: in the k-variable linear regression model (a regressand and
[k − 1] regressors), the fitted regression surface passes through the means of the regressand and all the regressors.
2. The mean value of the estimated 𝑌𝑖 (= 𝑌̂𝑖 ) is equal to the mean value of the actual 𝑌𝑖 .
3. ∑𝑢̂𝑖 = 𝑢̅̂ = 0
4. The residuals 𝑢̂𝑖 are uncorrelated with 𝑋2𝑖 and 𝑋3𝑖 , that is, ∑𝑢̂𝑖 𝑋2𝑖 = ∑𝑢̂𝑖 𝑋3𝑖 = 0
5. The residuals 𝑢̂𝑖 are uncorrelated with 𝑌̂𝑖 ; that is, ∑𝑢̂𝑖 𝑌̂𝑖 = 0.
6. Given the assumptions of the classical linear regression model, one can prove that the OLS
estimators of the partial regression coefficients are not only linear and unbiased but also
have minimum variance in the class of all linear unbiased estimators. In short, they are
BLUE. Put differently, they satisfy the Gauss–Markov theorem.

The Multiple Coefficient of Determination R² and the Multiple Coefficient of Correlation R

In the two-variable case we saw that 𝑟 2 measures the goodness of fit of the regression equation;
that is, it gives the proportion or percentage of the total variation in the dependent variable Y
explained by the (single) explanatory variable X. This notion of r² can be easily extended to
regression models containing more than two variables. Thus, in the three-variable model, we
would like to know the proportion of the variation in Y explained by the variables 𝑋2 and 𝑋3
jointly. The quantity that gives this information is known as the multiple coefficient of
determination and is denoted by 𝑅 2 ; conceptually it is akin to 𝑟 2 .

Now, by definition,

R² = ESS/TSS = (β̂2 ∑yi x2i + β̂3 ∑yi x3i) / ∑yi²

where the lowercase letters denote deviations from the respective sample means (for example, yi = Yi − Ȳ).

𝑅 2 , like 𝑟 2 , lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the
variation in Y. On the other hand, if it is 0, the model does not explain any of the variations in Y.
Typically, however, 𝑅 2 lies between these extreme values. The fit of the model is said to be
“better’’ the closer 𝑅 2 is to 1.

Recall that in the two-variable case, we defined the quantity 𝑟 as the coefficient of correlation and
indicated that it measures the degree of (linear) association between two variables. The three-or-
more-variable analog of r is the coefficient of multiple correlation, denoted by R, and it is a
measure of the degree of association between Y and all the explanatory variables jointly. Although
𝑟 can be positive or negative, 𝑅 is always taken to be positive. In practice, however, 𝑅 is of little
importance. The more meaningful quantity is 𝑅 2 .

R² and the Adjusted R²

An important property of R² is that it is a non-decreasing function of the number of explanatory
variables or regressors present in the model; that is, as the number of regressors increases, R² almost
invariably increases and never decreases. Stated differently, an additional X variable will not
decrease R².
Now ∑yi² is independent of the number of X variables in the model because it is simply ∑(Yi − Ȳ)².
The RSS, ∑ûi², however, depends on the number of regressors present in the model. Intuitively, it
is clear that as the number of X variables increases, ∑ûi² is likely to decrease (at least it will not
increase); hence R² will increase. In view of this, in comparing two regression models with the
same dependent variable but a differing number of X variables, one should be very wary of choosing
the model with the highest R².

To compare two 𝑅 2 terms, one must take into account the number of X variables present in the
model. This can be done readily if we consider an alternative coefficient of determination, which
is as follows:

R̄² = 1 − [∑ûi²/(n − k)] / [∑yi²/(n − 1)]

where k = the number of parameters in the model including the intercept term (in the three-variable
regression, k = 3). The R² thus defined is known as the adjusted R², denoted by R̄². The term
adjusted means adjusted for the df associated with the sums of squares entering into the equation.

It is easy to see that 𝑅̅ 2 and 𝑅 2 are related because

R̄² = 1 − (1 − R²) (n − 1)/(n − k)

It is immediately apparent from this equation that for k > 1, R̄² < R², which implies that as the
number of X variables increases, the adjusted R² increases less than the unadjusted R²; and R̄² can
be negative, although R² is necessarily nonnegative. In case R̄² turns out to be negative in an
application, its value is taken as zero.
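
The relation between R̄² and R² can be checked numerically. The short sketch below, using synthetic illustrative data, recomputes R̄² from R², n, and k and compares it with the adjusted value reported by statsmodels.

import numpy as np
import statsmodels.api as sm

# Synthetic data: Y depends on X2 and X3 (values purely illustrative)
rng = np.random.default_rng(2)
n = 50
X2, X3 = rng.normal(size=(2, n))
Y = 1 + 2 * X2 + 3 * X3 + rng.normal(size=n)

fit = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()

# R_bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k), with k = number of parameters
k = len(fit.params)
r2_adj_manual = 1 - (1 - fit.rsquared) * (n - 1) / (n - k)
print(r2_adj_manual, fit.rsquared_adj)   # the two values agree
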
Polynomial Regression Models

We now consider a class of multiple regression models, the polynomial regression models that
have found extensive use in econometric research relating to cost and production functions.
Consider

𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝛽2 𝑋 2

which is called a quadratic function, or more generally, a second-degree polynomial in the variable
X—the highest power of X represents the degree of the polynomial (if 𝑋 3 were added to the
preceding function, it would be a third-degree polynomial, and so on).

The stochastic version of this may be written as

Yi = β0 + β1Xi + β2Xi² + ui

which is called a second-degree polynomial regression. The general kth-degree polynomial
regression may be written as

Yi = β0 + β1Xi + β2Xi² + ··· + βkXiᵏ + ui

Notice that in these types of polynomial regressions, there is only one explanatory variable on the
right-hand side but it appears with various powers, thus making them multiple regression models.

Since the second-degree polynomial or the 𝑘 𝑡ℎ degree polynomial is linear in the parameters, the
β’s can be estimated by the usual OLS methodology and present no new estimation problems.
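
Because a polynomial model is linear in the parameters, it can be fitted by ordinary OLS once the higher powers of X are added as extra columns. The sketch below illustrates this with synthetic data; the coefficient values and the cost–output interpretation are illustrative assumptions only.

import numpy as np
import statsmodels.api as sm

# Synthetic data for a quadratic relationship Y = b0 + b1*X + b2*X^2 + u
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, 80)                           # e.g., output (hypothetical)
Y = 5 + 2 * X + 0.5 * X**2 + rng.normal(0, 2, 80)    # e.g., total cost

# One explanatory variable, two powers: X and X^2 -> a multiple regression model
design = sm.add_constant(np.column_stack([X, X**2]))
quad_fit = sm.OLS(Y, design).fit()
print(quad_fit.params)    # estimates of b0, b1, b2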

MULTICOLLINEARITY
The term multicollinearity is due to Ragnar Frisch. It means the existence of a perfect or exact
linear relationship among some or all explanatory variables of a regression model.

One of the assumptions of the classical linear regression model is that the independent variables
are not correlated with each other. Violation of this assumption leads to the problem of
multicollinearity. If the explanatory variables are perfectly correlated, then there is perfect
multicollinearity, which leads to the breakdown of the least-squares method. But in practice, neither
this situation nor the complete absence of multicollinearity is often met.
Multicollinearity is essentially a sample phenomenon in the sense that even if the X variables are
not linearly related in the population, they may be so related in the particular sample at hand. When
we postulate the PRF, we believe that all the X variables included in the model have a separate or
independent influence on the dependent variable Y. But it may happen that in any given sample
that is used to test the PRF, the X variables are so highly collinear that we cannot isolate their
individual influence on Y. Thus, multicollinearity is a question of degree and not of kind or existence.

Sources of Multicollinearity

There are several sources of multicollinearity. As Montgomery and Peck note, multicollinearity
may be due to the following factors:

1. The data collection method employed. For example, sampling over a limited range of the values
taken by the regressors in the population.

2. Constraints on the model or in the population being sampled. For example, in the regression of
electricity consumption on income (𝑋2) and house size (𝑋3) there is a physical constraint in the
population in that families with higher incomes generally have larger homes than families with
lower incomes.

3. Model specification. For example, adding polynomial terms to a regression model, especially
when the range of the X variable is small.

4. An overdetermined model. This happens when the model has more explanatory variables than
the number of observations. This could happen in medical research where there may be a small
number of patients about whom information is collected on a large number of variables.

5. It can arise due to the inherent characteristic of many economic variables to move together over
time. This is because the economic magnitudes are influenced by the same factors.

6. It may arise due to the use of lagged values of certain explanatory variables in the model.

Consequences of Multicollinearity

Theoretical Consequences

1. In the case of near multicollinearity, the OLS estimators are unbiased. However,
unbiasedness is a multisample or repeated sampling property.
2. Collinearity does not destroy the property of minimum variance. In the class of all linear
unbiased estimators, the OLS estimators have minimum variance, i.e., they are efficient.
But this does not mean that the variance of an OLS estimator will necessarily be small in
any given sample.
3. Multicollinearity is essentially a sample phenomenon. Even if the X variables are not
linearly related in the population they may be so related in the particular sample at hand.

Practical Consequences

1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate
and their standard errors are infinite.
2. Although BLUE, the OLS estimators have large variances and covariances, making precise
estimation difficult.
3. Because of the above consequence, the confidence intervals tend to be much wider,
leading to the acceptance of the zero null hypothesis (i.e., that the true population coefficient is
zero) more readily.
4. Also, the t-ratio of one or more coefficients tends to be statistically insignificant.
5. Although the t-ratio is statistically insignificant, the 𝑅 2 value, which is the overall measure
of goodness of fit, can be very high.
6. The OLS estimators and their standard errors can be sensitive to small changes in the data.

Detection of Multicollinearity

Multicollinearity is a question of degree and not of kind. So the meaningful distinction is not
between the presence and the absence of much collinearity, but between its various degrees.

Since multicollinearity is essentially a sample phenomenon, arising out of the largely non-
experimental data collected in most social sciences, we do not have one unique method of detecting
or measuring its strength. What we have are some rules of thumb, which are given below:

1. High 𝑹𝟐 but few significant t-ratios: If 𝑅 2 is high, the F-test in most cases will reject the
hypothesis that the partial slope coefficients are simultaneously equal to zero but at the
same time the individual t-test will show that none or very few of the partial slope
coefficients are statistically different from zero. This signals that multicollinearity may be
present in the model.
2. High pair-wise correlations among regressors: If the pair-wise or zero-order correlation
coefficient between two regressors is high, then multicollinearity is a serious problem.
However, this does not provide an infallible guide to the presence of multicollinearity
because multicollinearity can exist even though the zero-order correlations are
comparatively low. Thus, high zero-order correlations are a sufficient but not necessary
condition for the existence of multicollinearity.
3. Examination of partial correlations: This is suggested by Farrar and Glauber. In the
regression of Y on X2, X3, and X4, a finding that R²1.234 is very high but r²12.34, r²13.24, and
r²14.23 are comparatively low may suggest that the variables X2, X3, and X4 are highly
intercorrelated and that at least one of these variables is superfluous.
4. Auxiliary regressions: Here we regress each Xi on the remaining X variables and compute
the corresponding R², which we designate as Ri². Each of these regressions is called an
auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the
relationship between F and R²,

Fi = [R²xi·x2x3…xk / (k − 2)] / [(1 − R²xi·x2x3…xk) / (n − k + 1)]

which follows the F distribution with (k − 2) and (n − k + 1) df, where n stands for the sample
size, k stands for the number of explanatory variables including the intercept term, and
R²xi·x2x3…xk is the coefficient of determination in the regression of variable Xi on the
remaining X variables.
If the computed F exceeds the critical Fi at the chosen level of significance, it is taken to
mean that the particular Xi is collinear with the other X's; if it does not exceed the critical Fi,
we say that it is not collinear with the other X's.
5. Klein's rule of thumb: According to this, multicollinearity may be a troublesome problem only
if the R² obtained from an auxiliary regression is greater than the overall R², which is obtained
from the regression of Y on all the regressors.
6. Eigenvalues and condition index: Packages such as EViews and Stata report eigenvalues and
the condition index, which can be used to diagnose multicollinearity. From the eigenvalues we
can derive the condition number k, defined as

k = Maximum eigenvalue / Minimum eigenvalue

and the condition index (CI), defined as

CI = √(Maximum eigenvalue / Minimum eigenvalue) = √k

Then we have this rule of thumb: if k is between 100 and 1000 there is moderate to strong
multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if
the CI (= √k) is between 10 and 30, there is moderate to strong multicollinearity, and if it
exceeds 30 there is severe multicollinearity.
7. Tolerance and variance inflation factor: The speed with which the variances and covariances
of the estimators increase can be seen with the variance inflation factor (VIF), which is defined
as

VIF = 1 / (1 − Ri²)

where Ri² is the coefficient of determination in the regression of regressor Xi on the
remaining regressors in the model.
As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if Ri² exceeds
0.90, that variable is said to be highly collinear.
Related to the VIF, one can also use the measure of tolerance (TOL) to detect
multicollinearity. Tolerance is the inverse of the VIF:

TOLi = 1/VIFi = (1 − Ri²)

TOLi = 1 if Xi is not correlated with the other regressors, whereas it is zero if it is perfectly
related to the other regressors. (A code sketch illustrating the VIF follows this list.)
8. Scatterplot: A scatterplot of the regressors against one another can be used to see how the
various variables in a regression model are related.
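
As a rough sketch of the VIF diagnostic in practice, the code below uses statsmodels' variance_inflation_factor, which internally runs the auxiliary regression of each regressor on the others. The data are synthetic, with X3 deliberately constructed to be nearly collinear with X2.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data in which X3 is almost a linear function of X2 (near collinearity)
rng = np.random.default_rng(1)
n = 100
X2 = rng.normal(100, 10, n)
X3 = 2 * X2 + rng.normal(0, 1, n)       # nearly collinear with X2
exog = sm.add_constant(np.column_stack([X2, X3]))

# VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing X_i on the other regressors
for i in range(1, exog.shape[1]):       # skip the constant column
    vif = variance_inflation_factor(exog, i)
    print(f"VIF for regressor {i}: {vif:.1f}   TOL: {1 / vif:.4f}")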

Remedial Measures

What can be done if multicollinearity is serious? We have two choices: (1) do nothing or (2) follow
some rules of thumb.

Do Nothing
Multicollinearity is essentially a data deficiency problem and sometimes we have no choice over
the data we have available for empirical analysis.

Rule-of-Thumb Procedures

One can try the following rules of thumb to address the problem of multicollinearity; their success
will depend on the severity of the collinearity problem.

1. A priori information: If we have a priori information about the relationship between the
parameters, we can use it to deal with multicollinearity. Suppose we consider the model
Yi = β1 + β2X2i + β3X3i + ui
where Yi = consumption, X2 = income, and X3 = wealth. As noted before, income and
wealth variables tend to be highly collinear. But suppose a priori we believe that β3 =
0.10β2; that is, the rate of change of consumption with respect to wealth is one-tenth the
corresponding rate with respect to income. We can then run the following regression:
Yi = β1 + β2X2i + 0.10β2X3i + ui
   = β1 + β2Xi + ui
where Xi = X2i + 0.1X3i. Once we obtain β̂2, we can estimate β̂3 from the postulated
relationship between β2 and β3.
2. Combining cross-sectional and time series data: This is also known as pooling the data.
In time series data, economic variables generally tend to be highly collinear, whereas in
cross-sectional data, observed at a single point in time, such collinearity is usually less
pronounced. Therefore, a combination of cross-sectional and time series data can be a remedy
for the multicollinearity problem. But this pooling of data may create problems of interpretation.
3. Dropping a variable and specification bias: When faced with severe multicollinearity, one
of the simplest things to do is to drop one of the collinear variables. But this dropping may
create the problem of specification bias or specification error, which arises from the
incorrect specification of the model used in the analysis.
4. Transformation of variables: The multicollinearity problem can sometimes be avoided by
transforming the variables, for example to first-difference form or to ratio form, or by using
lagged values. But these transformations may lead to other problems and also require a priori
information regarding the variables and parameters.
5. Additional or new data: Sometimes simply increasing the size of the sample by adding
new data may attenuate the collinearity problem.
6. Reducing collinearity in polynomial regressions: It has been found that if the explanatory
variables are expressed in deviation form, multicollinearity is substantially reduced.
7. Other methods of remedying multicollinearity: Multivariate statistical techniques such as
factor analysis and principal components or techniques such as ridge regression are often
employed to “solve” the problem of multicollinearity.

HETEROSCEDASTICITY

One of the important assumptions of the CLRM is that the variance of each disturbance term 𝑢𝑖 ,
conditional on the chosen values of the explanatory variables is some constant number, equal
to 𝜎 2 . This is the assumption of homoscedasticity.

Violation of this assumption, for example the conditional variance of Yi increasing as X increases,
is the problem of heteroscedasticity.

Reasons for Heteroscedasticity

1. Following the error-learning models: as people learn, their errors of behavior become
smaller over time or the number of errors becomes more consistent. In this case, σi² is
expected to decrease. As an example, consider the relationship between the number of typing
errors made in a given period and the hours put into typing practice.
2. As incomes grow, people have more discretionary income and hence more scope for
choice about the disposition of their income. Hence, 𝜎𝑖2 is likely to increase with
income.
3. As data-collecting techniques improve, 𝜎𝑖2 is likely to decrease.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying
observation, or outlier, is an observation that is much different (either very small or
very large) in relation to the observations in the sample.
5. Another reason is the misspecification of the regression model.
6. Another source of heteroscedasticity is skewness in the distribution of one or more
regressors included in the model.
7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can
also arise because of
i. incorrect data transformation (e.g., ratio or first difference transformations)
and
ii. incorrect functional form (e.g., linear versus log-linear models)

Consequences of Heteroscedasticity

i. The OLS estimators will remain linear and unbiased.
ii. The variances of the OLS coefficients will be incorrect: under heteroscedasticity, σi² is no
longer a constant; it changes with the values of X, so the usual OLS formulas for the
variances of the coefficients no longer apply.
iii. OLS estimation will be inefficient: the OLS estimators no longer have minimum variance in
the class of linear unbiased estimators, in either small or large samples.
iv. The confidence limits and the tests of significance will not be applicable.
v. If we proceed with our model under the false belief of “homogeneity” of variances, our
inferences and predictions about the population coefficients would be incorrect.

Detection of Heteroscedasticity

I. Informal Methods
1) Nature of the problem: Very often the nature of the problem under consideration suggests
whether heteroscedasticity is likely to be present or not. In cross-sectional data involving
heterogeneous units, heteroscedasticity may be the rule rather than the exception
2) Graphical method: Plotting the values of ûi² against Ŷi or any X variable in the model can
give a preliminary idea about the presence of heteroscedasticity.
If the plot shows no systematic pattern between the two variables, this suggests homoscedasticity;
definite patterns (for example, the spread of the squared residuals widening with the fitted values)
indicate the presence of heteroscedasticity.

II. Formal Methods


1) Park Test

Park formalizes the graphical method by suggesting that σi² is some function of the explanatory
variable Xi. The functional form he suggests is

σi² = σ² Xi^β e^(vi),   or, in log form,   ln σi² = ln σ² + β ln Xi + vi

where vi is the stochastic disturbance term.

Since σi² is generally not known, Park suggests using ûi² as a proxy and running the following
regression:

ln ûi² = ln σ² + β ln Xi + vi = α + β ln Xi + vi

If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in
the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity.
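
A minimal sketch of the Park test, assuming a simple two-variable model with synthetic heteroscedastic data: the squared OLS residuals stand in for σi², and ln ûi² is regressed on ln Xi.

import numpy as np
import statsmodels.api as sm

# Synthetic data whose error variance grows with X (heteroscedasticity built in)
rng = np.random.default_rng(7)
X = rng.uniform(1, 100, 120)
u = rng.normal(0, 0.5 * X)              # standard deviation proportional to X
Y = 10 + 0.8 * X + u

resid = sm.OLS(Y, sm.add_constant(X)).fit().resid

# Park test: regress ln(u_hat^2) on ln(X); a significant slope suggests heteroscedasticity
park = sm.OLS(np.log(resid**2), sm.add_constant(np.log(X))).fit()
print(park.params[1], park.pvalues[1])   # slope estimate and its p-value
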
2) Glejser Test

The Glejser test is similar in spirit to the Park test. After obtaining the residuals ûi from the OLS
regression, Glejser suggests regressing the absolute values of ûi on the X variable that is thought
to be closely associated with σi² (for example, |ûi| = β1 + β2Xi + vi or |ûi| = β1 + β2√Xi + vi).
The Glejser technique may be used for large samples, and in small samples strictly as a qualitative
device to learn something about heteroscedasticity.

3) Spearman’s Rank Correlation Test

Spearman’s rank correlation coefficient can be defined as

rs = 1 − 6 [∑di² / (n(n² − 1))]

where 𝑑𝑖 = difference in the ranks assigned to two different characteristics of the 𝑖 𝑡ℎ individual
or phenomenon and n = number of individuals or phenomena ranked.

In this test, we first regress Y on X and obtain the residuals 𝑢̂𝑖 . Ignoring the sign of 𝑢̂𝑖 , that is,
taking their absolute value |𝑢̂𝑖 |, rank both |𝑢̂𝑖 | and 𝑋𝑖 (or 𝑌̂𝑖 ) according to an ascending or
descending order and compute Spearman's rank correlation coefficient. Assuming that the
population rank correlation coefficient ρs is zero and n > 8, the significance of the sample rs can
be tested by the t test, using

t = rs √(n − 2) / √(1 − rs²)

with (n − 2) df. If the computed t value exceeds the critical t value, we may accept the
hypothesis of heteroscedasticity; otherwise, we may reject it.
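
A sketch of the rank-correlation approach on synthetic heteroscedastic data: the absolute residuals are rank-correlated with X, and the t statistic rs√(n − 2)/√(1 − rs²) is compared with the critical t value.

import numpy as np
import statsmodels.api as sm
from scipy.stats import spearmanr, t

rng = np.random.default_rng(3)
n = 60
X = rng.uniform(1, 50, n)
Y = 5 + 1.2 * X + rng.normal(0, 0.3 * X)      # error variance rises with X

resid = sm.OLS(Y, sm.add_constant(X)).fit().resid

# Spearman rank correlation between |residuals| and X
rs, _ = spearmanr(np.abs(resid), X)
t_stat = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)
t_crit = t.ppf(0.975, df=n - 2)
print(rs, t_stat, t_crit)   # |t_stat| > t_crit points to heteroscedasticity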

4) Goldfeld–Quandt Test

This popular method is applicable if one assumes that the heteroscedastic variance, σi², is
positively related to one of the explanatory variables in the regression model. Consider the
model:

Yi = β1 + β2Xi + ui

Suppose σi² is positively related to Xi as

σi² = σ² Xi²

where σ² is a constant.
Here, σi² is proportional to the square of the X variable.

If this relationship is appropriate, it would mean that σi² would be larger, the larger the value of Xi.
If that turns out to be the case, heteroscedasticity is most likely to be present in the model. To check
this, the observations are ordered according to the values of Xi, a number of central observations is
omitted, separate regressions are fitted to the first and the last groups of observations, and the ratio
of their residual sums of squares is compared with the critical F value; a significantly large ratio
indicates heteroscedasticity.
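
A sketch of the Goldfeld–Quandt procedure via statsmodels' het_goldfeldquandt, which sorts the observations, drops a central block, fits separate regressions to the two sub-samples, and forms the F ratio of their residual sums of squares. The data are synthetic.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(5)
n = 90
X = np.sort(rng.uniform(1, 100, n))
Y = 4 + 0.7 * X + rng.normal(0, 0.4 * X)        # sigma_i rises with X

exog = sm.add_constant(X)
# Sort by column 1 (X), drop ~20% of central observations, compare the two RSS values
f_stat, p_value, _ = het_goldfeldquandt(Y, exog, idx=1, drop=0.2)
print(f_stat, p_value)      # small p-value -> reject homoscedasticity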

5) Breusch–Pagan–Godfrey Test

Consider the k-variable linear regression model

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + ··· + 𝛽𝑘 𝑋𝑘𝑖 + 𝑢𝑖

Assume that the error variance σi² is described as

σi² = f(α1 + α2Z2i + ··· + αmZmi)

that is, σi² is some function of the non-stochastic Z variables (some or all of the X's can serve as
Z's). Now assume that

σi² = α1 + α2Z2i + ··· + αmZmi

that is, σi² is a linear function of the Z's. If α2 = α3 = ··· = αm = 0, then σi² = α1, which is a
constant. Therefore, to test whether σi² is homoscedastic, one can test the hypothesis that α2 =
α3 = ··· = αm = 0. This is the basic idea behind the Breusch–Pagan–Godfrey test.
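
A sketch of the Breusch–Pagan–Godfrey idea using statsmodels' het_breuschpagan, with the model's own regressors playing the role of the Z variables; the data are synthetic.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(11)
n = 100
X2 = rng.uniform(1, 50, n)
X3 = rng.uniform(1, 30, n)
Y = 3 + 0.5 * X2 + 0.8 * X3 + rng.normal(0, 0.3 * X2)   # variance driven by X2

exog = sm.add_constant(np.column_stack([X2, X3]))
fit = sm.OLS(Y, exog).fit()

# The Z variables here are taken to be the model's regressors (a common choice)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, exog)
print(lm_stat, lm_pvalue)     # small p-value -> evidence of heteroscedasticity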

6) White’s General Heteroscedasticity Test

The general test of heteroscedasticity proposed by White does not rely on the normality
assumption and is easy to implement. As an illustration, consider the three-variable regression
model

Yi = β1 + β2X2i + β3X3i + ui

The White test proceeds as follows:

Step 1. Estimate the regression by OLS and obtain the residuals ûi.

Step 2. We then run the following (auxiliary) regression:

ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi

that is, the squared residuals are regressed on the original regressors, their squared values, and
their cross product.

Step 3. Under the null hypothesis that there is no heteroscedasticity, it can be shown that the sample
size (n) times the R² obtained from the auxiliary regression asymptotically follows the chi-
square distribution with df equal to the number of regressors (excluding the constant term) in the
auxiliary regression. That is, asymptotically,

n · R² ~ χ²(df)

Step 4. If the chi-square value obtained exceeds the critical chi-square value at the chosen level
of significance, the conclusion is that there is heteroscedasticity. If it does not exceed the critical
chi-square value, there is no heteroscedasticity, which is to say that in the auxiliary regression
α2 = α3 = α4 = α5 = α6 = 0.
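
A sketch of White's test using statsmodels' het_white, which builds the auxiliary regression (levels, squares, and cross products) internally and reports n·R² together with its chi-square p-value; the data are synthetic.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(13)
n = 120
X2 = rng.uniform(1, 40, n)
X3 = rng.uniform(1, 20, n)
Y = 2 + 0.6 * X2 + 0.4 * X3 + rng.normal(0, 0.25 * X3)   # heteroscedastic errors

exog = sm.add_constant(np.column_stack([X2, X3]))
fit = sm.OLS(Y, exog).fit()

# het_white regresses u_hat^2 on the regressors, their squares, and cross products
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, exog)
print(lm_stat, lm_pvalue)    # lm_stat is n*R^2 from the auxiliary regression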

7) Koenker–Bassett (KB) test

The KB test is based on the squared residuals, 𝑢̂𝑖2 , but instead of being regressed on one or more
regressors, the squared residuals are regressed on the squared estimated values of the regressand.
Specifically, if the original model is:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + ··· + 𝛽𝑘 𝑋𝑘𝑖 + 𝑢𝑖

estimate this model, obtain 𝑢̂𝑖 from this model, and then estimate 𝑢̂𝑖2 = 𝛼1 + 𝛼2 (𝑌̂𝑖 )2 + 𝑣𝑖
where 𝑌̂𝑖 are the estimated values from the model. The null hypothesis is that 𝛼2 = 0. If this is not
rejected, then one could conclude that there is no heteroscedasticity. The null hypothesis can be
tested by the usual t-test or the F-test.
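
A minimal sketch of the Koenker–Bassett idea, built directly from two OLS regressions on synthetic data: the squared residuals are regressed on the squared fitted values and the t test on α2 is examined.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
n = 100
X = rng.uniform(1, 60, n)
Y = 8 + 0.9 * X + rng.normal(0, 0.3 * X)

fit = sm.OLS(Y, sm.add_constant(X)).fit()

# Regress u_hat^2 on the squared fitted values Y_hat^2 and test alpha2 = 0
kb = sm.OLS(fit.resid**2, sm.add_constant(fit.fittedvalues**2)).fit()
print(kb.params[1], kb.pvalues[1])   # significant alpha2 -> heteroscedasticity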

Remedial Measures

Heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS
estimators, but they are no longer efficient, not even asymptotically (i.e., large sample size).
There are two approaches to remediation: when 𝜎𝑖2 is known and when 𝜎𝑖2 is not known.

When 𝝈𝟐𝒊 is known: The Method of Weighted Least Squares

When σi² is known, the method of weighted least squares (WLS), a special case of generalized
least squares (GLS), can be applied, and the resulting WLS (GLS) estimators are BLUE. Each
observation is weighted in inverse proportion to its error variance, so that observations with larger
error variances receive less weight (see the sketch below).
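
A sketch of weighted least squares with statsmodels, under the illustrative assumption that σi² is proportional to Xi², so each observation receives weight 1/Xi²; the data are synthetic.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(19)
n = 100
X = rng.uniform(1, 50, n)
Y = 6 + 1.1 * X + rng.normal(0, 0.5 * X)     # sigma_i proportional to X

exog = sm.add_constant(X)
# If var(u_i) = sigma^2 * X_i^2 (assumed known pattern), the WLS weights are 1 / X_i^2
wls_fit = sm.WLS(Y, exog, weights=1.0 / X**2).fit()
ols_fit = sm.OLS(Y, exog).fit()
print(wls_fit.bse, ols_fit.bse)   # WLS standard errors are the valid ones here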

When 𝝈𝟐𝒊 is not known


Since the true σi² is rarely known, there are ways of obtaining consistent (in the statistical
sense) estimates of the variances and covariances of the OLS estimators even if there is
heteroscedasticity.

White's Heteroscedasticity-Consistent Variances and Standard Errors

White has shown that heteroscedasticity-consistent estimates of the variances and covariances of
the OLS estimators can be obtained, so that asymptotically valid (i.e., large-sample) statistical
inferences can be made about the true parameter values. Incidentally, White's
heteroscedasticity-corrected standard errors are also known as robust standard errors.

Apart from being a large-sample procedure, one drawback of the White procedure is that the
estimators thus obtained may not be as efficient as those obtained by methods that transform data
to reflect specific types of heteroscedasticity.
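
A sketch of White's heteroscedasticity-consistent (robust) standard errors in statsmodels, obtained by requesting a heteroscedasticity-consistent covariance estimator (here HC1; variants HC0–HC3 are available) when fitting by OLS; the data are synthetic.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(23)
n = 150
X = rng.uniform(1, 80, n)
Y = 3 + 0.7 * X + rng.normal(0, 0.2 * X)

exog = sm.add_constant(X)
ols_fit = sm.OLS(Y, exog).fit()                     # conventional OLS standard errors
robust_fit = sm.OLS(Y, exog).fit(cov_type="HC1")    # White's robust standard errors
print(ols_fit.bse)
print(robust_fit.bse)   # point estimates are identical; only the standard errors differ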

Plausible Assumptions about Heteroscedasticity Pattern

We now consider several assumptions about the pattern of heteroscedasticity

ASSUMPTION 1: The error variance is proportional to Xi²:

E(ui²) = σ² Xi²

In this case one may transform the original model by dividing it through by Xi:

Yi/Xi = β1(1/Xi) + β2 + ui/Xi

where the transformed disturbance ui/Xi has constant variance σ².
ASSUMPTION 2: The error variance is proportional to Xi (the square root transformation):

E(ui²) = σ² Xi

Then the original model can be transformed by dividing it through by √Xi:

Yi/√Xi = β1(1/√Xi) + β2√Xi + ui/√Xi

where the transformed disturbance ui/√Xi has constant variance σ².

Note an important feature of the transformed model: it has no intercept term. Therefore, one will have
to use the regression-through-the-origin model to estimate β1 and β2.

ASSUMPTION 3: The error variance is proportional to the square of the mean value of Y:

E(ui²) = σ² [E(Yi)]²

Therefore, if we transform the original equation by dividing it through by E(Yi),

Yi/E(Yi) = β1(1/E(Yi)) + β2 Xi/E(Yi) + ui/E(Yi)

where vi = ui/E(Yi), it can be seen that E(vi²) = σ²; that is, the disturbances vi are homoscedastic.
Since E(Yi) is unknown in practice, the fitted values Ŷi are used in its place.

ASSUMPTION 4: A log transformation such as

𝑙𝑛 𝑌𝑖 = 𝛽1 + 𝛽2 𝑙𝑛 𝑋𝑖 + 𝑢𝑖

very often reduces heteroscedasticity when compared with the regression 𝑌𝑖 = 𝛽1 + 𝛽2𝑋𝑖 + 𝑢𝑖 .

AUTOCORRELATION
The term autocorrelation may be defined as “correlation between members of a series of
observations ordered in time [as in time series data] or space [as in cross-sectional data].’’

In the regression context, the classical linear regression model assumes that such autocorrelation
does not exist in the disturbances 𝑢𝑖. Symbolically,

cov(ui, uj | Xi, Xj) = E(ui uj) = 0   i ≠ j

Put simply, the classical model assumes that the disturbance term relating to any observation is
not influenced by the disturbance term relating to any other observation.
For example, if we are dealing with quarterly time series data involving the regression of output
on labor and capital inputs and if, say, there is a labor strike affecting output in one quarter, there
is no reason to believe that this disruption will be carried over to the next quarter. That is, if output
is lower this quarter, there is no reason to expect it to be lower next quarter. Similarly, if we are
dealing with cross-sectional data involving the regression of family consumption expenditure on
family income, the effect of an increase in one family’s income on its consumption expenditure is
not expected to affect the consumption expenditure of another family.

However, if there is such a dependence, we have autocorrelation. Symbolically,

𝐸(𝑢𝑖 𝑢𝑗) ≠ 0 𝑖 ≠ 𝑗

In this situation, the disruption caused by a strike this quarter may very well affect output next
quarter, or the increases in the consumption expenditure of one family may very well prompt
another family to increase its consumption expenditure.

Some plausible patterns of autocorrelation in the disturbances include a cyclical pattern, an upward
or downward linear trend, and a combination of linear and quadratic trends; in each of these cases
there is a discernible pattern among the u's. Only when a plot of the disturbances shows no
systematic pattern is the non-autocorrelation assumption of the classical linear regression model
supported.
Reasons for Autocorrelation

1. Temporal Dependence: Many time series data exhibit autocorrelation because values at
one time point are often related to values at previous time points. This can happen due to
inherent patterns or trends in the data.

2. Seasonality: Autocorrelation can occur in seasonal data where patterns repeat at regular
intervals. For example, sales data might exhibit autocorrelation if sales tend to be higher
during certain times of the year.

3. Trends: Autocorrelation can also arise from trends in the data. If there is a consistent
increase or decrease in the values over time, the current value is likely to be correlated with
past values.

4. Incomplete Model: Autocorrelation can indicate that the current model used to describe
the data is incomplete. If there are underlying factors or variables influencing the data that
are not accounted for in the model, it may lead to autocorrelation in the residuals.

5. Measurement Errors: Measurement errors or noise in the data can also lead to
autocorrelation. If the errors at one time point are related to errors at previous time points,
it can result in autocorrelation.

6. Spurious Correlation: Sometimes autocorrelation can be purely coincidental, especially
in small datasets. Random fluctuations or anomalies in the data might create the appearance
of autocorrelation even when there is no underlying relationship.
7. Dynamic Processes: Autocorrelation can arise in dynamic systems where the current state
is influenced by previous states. This is common in fields like economics, where economic
indicators are often influenced by past events.

OLS Estimation In The Presence Of Autocorrelation

In the presence of autocorrelation (as under heteroscedasticity), the usual OLS estimators, although
linear, unbiased, and asymptotically (i.e., in large samples) normally distributed, are no longer
minimum variance among all linear unbiased estimators. In short, they are not efficient relative to
other linear and unbiased estimators. Put differently, they may not be the best linear unbiased
estimators (BLUE). As a result, the usual t, F, and χ² tests may not be valid.

To determine the precise consequences of autocorrelation, a more specific pattern for the
autocorrelation must be assumed. Typically, autocorrelation is assumed to follow a first-order
autoregressive, or AR(1), scheme:

ut = ρut−1 + εt,   −1 < ρ < 1

In general, an autoregressive process occurs any time the value of a variable in one period can be
modeled as a function of values of the same variable in previous periods. In the specific case of
autocorrelation, the random variable displaying this characteristic is the error term. The variance of
an AR(1) error depends on the relationship between the error in period t and the error in period
t − 1. In the presence of (positive) autocorrelation, the estimated variances and standard errors from
OLS are typically underestimated.

Detecting Autocorrelation

Graphical Method

Very often a visual examination of the û's gives us some clues about the likely presence of
autocorrelation in the u's. A visual examination of ût (or ût²) can provide useful information not
only about autocorrelation but also about heteroscedasticity, model inadequacy, and specification
bias.

The Runs Test

The runs test, also known as the Geary test, uses the sequence of positive and negative residuals
to test the hypothesis of no autocorrelation. A run is defined as an uninterrupted sequence of
residuals of the same sign. The hypothesis of no autocorrelation is not sustainable if the residuals
have too many or too few runs. The most common version of the test assumes that the number of
runs is (approximately) normally distributed. If the assumption of no autocorrelation is sustainable,
then with 95 percent confidence the number of runs should lie between μr ± 1.96σr, where μr is the
expected number of runs and σr is its standard deviation. If the number of observed runs falls below
this interval, it is evidence of positive autocorrelation; if it exceeds the upper bound of the
interval, it is evidence of negative autocorrelation.
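
A rough sketch of the runs test computed by hand on the signs of the OLS residuals, using the usual normal approximation for the number of runs. The data are synthetic, with AR(1) errors built in so that positive autocorrelation is present.

import numpy as np
import statsmodels.api as sm

# Synthetic time series with AR(1) errors (positive autocorrelation)
rng = np.random.default_rng(29)
n = 80
x = np.arange(n, dtype=float)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal(0, 1)
y = 2 + 0.5 * x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
signs = np.sign(resid)

# Count runs: a new run starts whenever the sign changes
runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
n1 = int(np.sum(signs > 0))
n2 = int(np.sum(signs < 0))
N = n1 + n2

mu_r = 2 * n1 * n2 / N + 1
sigma_r = np.sqrt(2 * n1 * n2 * (2 * n1 * n2 - N) / (N**2 * (N - 1)))
print(runs, mu_r - 1.96 * sigma_r, mu_r + 1.96 * sigma_r)
# Too few runs (below the interval) suggests positive autocorrelation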

Durbin–Watson D Test

The most celebrated test for detecting serial correlation is that developed by the statisticians Durbin
and Watson. It is popularly known as the Durbin–Watson d statistic, which is defined as

d = ∑(t = 2 to n) (ût − ût−1)² / ∑(t = 1 to n) ût²

which is simply the ratio of the sum of squared differences in successive residuals to the RSS.
Note that in the numerator of the d statistic the number of observations is n − 1 because one
observation is lost in taking successive differences.

The assumptions underlying the d statistic:

1. The regression model includes the intercept term. If it is not present, as in the case of the
regression through the origin, it is essential to rerun the regression including the intercept term to
obtain the RSS.

2. The explanatory variables, the X’s, are nonstochastic, or fixed in repeated sampling.

3. The disturbances ut are generated by the first-order autoregressive scheme ut = ρut−1 + εt.

4. The error term ut is assumed to be normally distributed.

5. The regression model does not include the lagged value(s) of the dependent variable as one of
the explanatory variables.

6. There are no missing observations in the data.


Interpreting the Durbin Watson Statistic:

The Durbin-Watson statistic will always assume a value between 0 and 4. A value of DW = 2
indicates that there is no autocorrelation. When the value is below 2, it indicates a positive
autocorrelation and a value higher than 2 indicates a negative serial correlation.

To test for positive autocorrelation at significance level α (alpha), the test statistic DW is compared
to lower and upper critical values:

If DW < Lower critical value: There is statistical evidence that the data is positively autocorrelated

If DW > Upper critical value: There is no statistical evidence that the data is positively correlated.

If DW is in between the lower and upper critical values: The test is inconclusive.

To test for negative autocorrelation at significance level α (alpha), the test statistic 4-DW is
compared to lower and upper critical values:

If 4-DW < Lower critical value: There is statistical evidence that the data is negatively
autocorrelated.

If 4-DW > Upper critical value: There is no statistical evidence that the data is negatively
correlated.

If 4-DW is in between the lower and upper critical values: The test is inconclusive.
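
A sketch of the d statistic computed with statsmodels' durbin_watson helper on the residuals of an OLS fit. The synthetic data below have AR(1) errors, so d should come out well below 2.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(31)
n = 100
x = np.arange(n, dtype=float)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal(0, 1)   # AR(1) disturbances
y = 1 + 0.3 * x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
d = durbin_watson(resid)
print(d)   # values well below 2 point to positive autocorrelation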

The Breusch-Godfrey Test

The Breusch–Godfrey test is a statistical test that is used to detect autocorrelation in the residuals
of a linear regression model. It can detect autocorrelation at higher-order lags and, unlike the
Durbin–Watson test, it remains valid when lagged values of the dependent variable appear among
the regressors.
The test starts with an initial regression from which we record the residuals for each period. The
residuals are then regressed on the original set of independent variables plus one or more
additional variables representing lagged residuals. For example, we may include lagged residuals
for periods t − 1 and t − 2, with coefficients ρ1 and ρ2, to test for serial correlation at lags 1 and 2,
respectively. We then test whether these coefficients are jointly significantly different from zero;
under the null hypothesis of no serial correlation, the resulting LM statistic follows a chi-square
distribution with degrees of freedom equal to the number of lagged residuals included. An F version
of the test is reported by most statistical software, so it can be checked against the critical value. If
we reject H0, we conclude that serial correlation is present up to the chosen lag.
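
A sketch of the Breusch–Godfrey test via statsmodels' acorr_breusch_godfrey, which runs the auxiliary regression of the residuals on the original regressors plus the chosen number of lagged residuals; synthetic AR(1)-error data again.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(37)
n = 120
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal(0, 1)   # serially correlated errors
y = 4 + 1.5 * x + u

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Test for serial correlation up to lag 2
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(fit, nlags=2)
print(lm_stat, lm_pvalue)   # small p-value -> serial correlation is present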

Remedial Measures

1. Try to find out if the autocorrelation is pure autocorrelation and not the result of mis-
specification of the model. If it is pure autocorrelation, one can use an appropriate
transformation of the original model so that in the transformed model we do not have the
problem of (pure) autocorrelation. As in the case of heteroscedasticity, we will have to use
some type of generalized least-square (GLS) method.
2. In large samples, we can use the Newey–West method to obtain standard errors of OLS
estimators that are corrected for autocorrelation. This method is an extension of White's
heteroscedasticity-consistent standard errors method discussed earlier (a code sketch follows
this list).
3. In some situations, we can continue to use the OLS method.
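
A sketch of Newey–West (HAC) standard errors in statsmodels, obtained by requesting a "HAC" covariance type with a chosen maximum lag; the coefficient estimates are unchanged, only the standard errors are corrected. The data and the lag choice are illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(41)
n = 150
x = rng.uniform(0, 20, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal(0, 1)   # autocorrelated disturbances
y = 2 + 0.9 * x + u

exog = sm.add_constant(x)
ols_fit = sm.OLS(y, exog).fit()
hac_fit = sm.OLS(y, exog).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols_fit.bse)
print(hac_fit.bse)   # Newey-West standard errors, robust to autocorrelation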

DYNAMIC MODELS
Regression models that take into account time lags are known as dynamic or lagged
regression models.

There are two types of lagged models: distributed-lag and autoregressive. In the former,
the current and lagged values of regressors are explanatory variables.

In the latter, the lagged value(s) of the regressand appears as an explanatory variable(s).

A purely distributed-lag model can be estimated by OLS, but in that case, there is the
problem of multicollinearity since successive lagged values of a regressor tend to be
correlated.
As a result, some shortcut methods have been devised. These include the Koyck, the adaptive
expectations, and partial adjustment mechanisms, the first being a purely algebraic approach
and the other two being based on economic principles.

Autoregressiveness poses estimation challenges; if the lagged regressand is correlated with
the error term, OLS estimators of such models are not only biased but also inconsistent.
Bias and inconsistency are the case with the Koyck and the adaptive expectations models; the
partial adjustment model is different in that it can be consistently estimated by OLS despite
the presence of the lagged regressand.

To estimate the Koyck and adaptive expectations models consistently, the most popular
method is the method of instrumental variables (IV). The instrumental variable is a proxy variable
for the lagged regressand, but with the property that it is uncorrelated with the error term.

Despite the estimation problems, which can be surmounted, the distributed and
autoregressive models have proved extremely useful in empirical economics because they
make the otherwise static economic theory a dynamic one by taking into account explicitly
the role of time. Such models help us to distinguish between the short- and the long-run
responses of the dependent variable to a unit change in the value of the explanatory
variable(s). Thus, for estimating short- and long-run price, income, substitution, and other
elasticities these models have proved to be highly useful.
