Econometrics Chapter Two
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables, with a view to
estimating and/or predicting the (population) mean or average value of the former in terms of
the known or fixed values of the latter.
A frequent objective in research is the specification of a functional relationship between two
variables such as
Y = f(X) ..........................................................(2.1)
where Y is called the dependent variable and X the independent (or explanatory) variable.
The purely mathematical model stated above is of limited interest to the econometrician, for it
assumes that there is an exact or deterministic relationship between Y and the explanatory
variable X. But relationships between economic variables are generally inexact. That is, an
economic model is simply a logical representation of theoretical knowledge (or a priori
knowledge). An econometric model, on the other hand, represents a set of behavioural
equations derived from an economic model. It differs from the economic model in that the
relationship between the variables is stochastic (i.e., non-exact). In other words, we cannot
expect a perfect explanation and hence we write

Y = f(X) + U
  = β0 + β1X + U ...........................................(2.2)

where U is a random variable called the disturbance or error term, β0 is the constant term and β1 is the slope
parameter. Equation (2.2) is called a regression equation of Y on X. Notice that since U is a random
variable, Y is also a random variable.
Basically, the existence of the disturbance term is justified in four main ways.
i) Omission of other variables: although income might be the major determinant of the
level of consumption, it is not the only determinant. Other variables such as the interest
rate or liquid asset holdings may have a systematic influence on consumption. Their
omission constitutes one type of specification error. The disturbance term is then often
viewed as representing the net influence of many small independent causes such as taste
changes, epidemics, and others.
ii) Measurement error: it may be the case that the variable being explained cannot be
measured accurately, either because of data collection difficulties or because it is
inherently unmeasurable and a proxy variable must be used instead. The disturbance
term can in these circumstances be thought of as representing this measurement error
in the variable(s).
iii) Randomness in human behaviour: humans are not machines that do exactly as instructed,
so there is an unpredictable element in their behaviour. For example, owing to unexplained
causes, an increase in income may not influence consumption. The disturbance term captures
such human behaviour that is left unexplained by the economic model.
iv) Imperfect specification of the model: for example, we may have linearized a non-linear
function. If so, the random term also reflects this wrong specification.
Generally speaking, regression analysis is concerned with the study of the dependence of one
dependent variable on one or more other variables called the explanatory (or independent)
variable(s). Moreover, the true relationship that connects the variables involved is split into
two parts: systematic (or explained) variation and random (or unexplained) variation. Using
(2.2) we can disaggregate the two components as follows:

Y = β0 + β1X + U ...........................................(2.3)

That is,

[variation in Y] = [systematic variation] + [random variation]

In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear models like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in two
different ways. These are,
a) Linearity in the variables
b) Linearity in parameters
a) Linearity in the variables implies that the conditional mean of Y is a linear function of X, so that the equation plots as a straight line.
Example: consider the regression function Y = β0 + β1X. The slope (or derivative) of this
equation is independent of X, so the model is linear in the variable. But if
Y = β0 + β1X², then the variable X is raised to the second power, so the model is non-linear in the
variable. This is because the slope or derivative is not independent of the value taken by X;
that is, dY/dX = 2β1X. Hence the above function is not linear in X, since the variable X
appears with a power of 2.
b) Linearity in the parameters: this implies that the parameters (i.e., the β's) are raised to their first
power only. In this interpretation Y = β0 + β1X² is a linear regression model, but Y = β0 + β1²X is
not; the latter is an example of a model that is non-linear in the parameters. Of the two
interpretations of linearity, linearity in the parameters is the one relevant for the development of
regression theory. Thus the term linear regression means a regression that is linear in the
parameters, the β's; it may or may not be linear in the explanatory variables.
The following discussion stresses that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s).
X    100   140   180   220   260
Y     65    75    95   105   115
      70    80   100   110   120
      75    85   105   115   125
      80    90   110   120   130
Table 2.1 Family expenditure and income
Note that from the above table we can construct conditional probabilities. For instance,
P(Y = 65 | X = 100) = 1/4, and likewise P(Y = 115 | X = 260) = 1/4. The following figure clearly shows that
consumption expenditure (Y) on average increases as income (X) increases; that is, the
conditional mean value of Y increases as X increases.
Note from the above table that the average consumption expenditure when income is 100 is
72.5 (= (65 + 70 + 75 + 80)/4). In other words, E(Y | X = 100) = 72.5. This conditional mean
increases as X increases.
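As a quick check of these conditional means, the following short Python sketch (added here for illustration only; the variable names are ours) reproduces E(Y | X) for each income level of Table 2.1:

# Table 2.1: income levels (X) and the four equally likely consumption values (Y)
table_2_1 = {
    100: [65, 70, 75, 80],
    140: [75, 80, 85, 90],
    180: [95, 100, 105, 110],
    220: [105, 110, 115, 120],
    260: [115, 120, 125, 130],
}
for x, y_values in table_2_1.items():
    # Each Y value is equally likely given X, so P(Y = y | X = x) = 1/4
    # and the conditional mean is the simple average of the column.
    cond_mean = sum(y_values) / len(y_values)
    print(x, cond_mean)   # 100 -> 72.5, 140 -> 82.5, ..., 260 -> 122.5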
[Figure: consumption expenditure plotted against income (X = 100, 140, 180, 220, 260), with the conditional means lying on the population regression line (PRF).]
Figure 2.1 Conditional distribution of expenditure for various levels of income
The line in Figure 2.1 is known as the population regression line or, more generally, the
population regression curve. Geometrically, a population regression curve is simply the locus
of the conditional means or expectations of the dependent variable for the fixed values of the
explanatory variable(s).
From the preceding discussion it is clear that each conditional mean E(Y | Xi) is a
function of Xi. Symbolically,

E(Y | Xi) = β0 + β1Xi

where β0 and β1 are unknown but fixed parameters known as the regression coefficients
(the intercept and slope coefficients respectively). The above equation is known as the linear
population regression function. But since an individual consumption expenditure does not
necessarily equal the average expenditure of all families with the same income, we incorporate
the error term. That is,

Yi = E(Y | Xi) + Ui
   = β0 + β1Xi + Ui .......................................................(2.4)

Note that in Table 2.1 we observe that for the same value of X (e.g., 100) we have different values
of Y (65, 70, 75 and 80). Thus the value of Y is also affected by other factors, which are
captured by the error term, U.
If we take the conditional expectation of (2.4) we obtain

E(Yi | Xi) = E(Y | Xi) + E(Ui | Xi) .................................(2.5)

Since

E(Yi | Xi) = E(Y | Xi),

it implies that

E(Ui | Xi) = 0 ...............................(2.6)
Thus, the assumption that the regression line passes through the conditional means of Y implies
that the conditional mean values of Ui (conditional upon the given X's) are zero.
The regression function based on a sample collected from the population is called sample
regression function (SRF).
[Figure: sample regression line (SRF) of consumption expenditure against income.]

Hence, analogous to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample regression function (the sample counterpart of the PRF stated earlier) may be written as:

Ŷi = β̂0 + β̂1Xi

where β̂0 = an estimator of β0, β̂1 = an estimator of β1, and Ûi = an estimator of Ui.
To sum up, because our analysis is based on a single sample from some population our primary
objective in regression analysis is to estimate the PRF given by
Yi = β0 + β1Xi + Ui
on the basis of the SRF
Yi = β̂0 + β̂1Xi + Ûi
We employ SRF because in most of the cases our analysis is based upon a single sample from
some population. But because of sampling fluctuations our estimate of the PRF based on the
SRF is at best an approximate one.
Note that, in such a comparison of the SRF with the PRF, Ŷi may overestimate the true E(Y | Xi) for
the Xi shown therein, while, by the same token, for any Xi to the left of the point A the SRF will
underestimate the true PRF. Such over- and under-estimation is inevitable because of sampling
fluctuations.
Note that there are several methods of constructing the SRF, but as far as regression analysis is
concerned, the method used most extensively is the method of Ordinary Least Squares (OLS).
In other words, how should the SRF be constructed so that β̂0 is as “close” as possible to the
true β0 and β̂1 is as “close” as possible to the true β1, even though we never know the true β0
and β1? We can develop procedures that tell us how to construct the SRF to mirror the PRF as
faithfully as possible, even though we never actually observe the PRF itself.
The method of ordinary least squares has some very attractive statistical properties that have
made it one of the most powerful and popular methods of regression analysis.
Thus β0 + β1X represents the systematic (explained) variation and Ui the random (unexplained)
variation. However, the PRF is not directly observable; hence we estimate it from the SRF. That is,
Yi = β̂0 + β̂1Xi + Ûi
   = Ŷi + Ûi

so that

Ûi = Yi – Ŷi = Yi – β̂0 – β̂1Xi

This shows that the Ûi (the residuals) are simply the differences between the actual and the
estimated Y values; that is, Ûi represents the difference between Yi and Ŷi (the SRF).
Now, given n pairs of observations on Y and X, we would like to determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we may adopt the following
criterion: choose the SRF in such a way that the sum of the residuals, ΣÛi = Σ(Yi – Ŷi), is as small as
possible. However, this sum can be very small (even zero) although the individual Ûi are widely
scattered about the SRF. We avoid this problem if we consider the sum of the squared residuals,
which is the least squares criterion. That is, minimize

ΣÛi² = Σ(Yi – Ŷi)² ..........................................(2.8)

Thus, the least squares method requires the sum of the squared residuals to be as small as
possible. Note that the sum of the residuals (ΣÛi) can be small (even zero) even though the Ûi
are widely spread about the SRF, but this is not possible under the least squares procedure, for
the larger the Ûi (in absolute value), the larger the ΣÛi².
In other words, the least squares method allows us to choose β̂0 and β̂1 as estimators of β0 and β1
respectively, so that

Σ(Yi – β̂0 – β̂1Xi)²

is a minimum.
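The reason for squaring can be seen in a small numerical sketch (hypothetical numbers, added purely for illustration): two very different candidate lines can both have residuals summing to zero, and only the squared-error criterion separates them.

# Toy data (hypothetical): Y happens to lie exactly on the line Y = 2X.
X = [1, 2, 3, 4]
Y = [2, 4, 6, 8]

def residuals(b0, b1):
    # Residuals Y - (b0 + b1*X) for a candidate line
    return [y - (b0 + b1 * x) for x, y in zip(X, Y)]

good_line = residuals(0, 2)   # the "true" line
bad_line = residuals(5, 0)    # a horizontal line through the mean of Y

print(sum(good_line), sum(bad_line))   # both residual sums are 0
print(sum(u ** 2 for u in good_line))  # 0  -> perfect fit
print(sum(u ** 2 for u in bad_line))   # 20 -> clearly worse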
If the deviation of the actual from the estimated values is at a minimum, then our estimation from the
collected sample provides a very good approximation of the true relationship between the
variables.
Note that to estimate the coefficients β0 and β1 we need observations on X, Y and U. Yet U is
never observed, unlike the explanatory variables, and therefore in order to estimate the
function Yi = β0 + β1Xi + Ui we have to make some reasonable (plausible) assumptions about the
shape of the distribution of each Ui (i.e., its mean, variance and covariance with other U's).
These assumptions are guesses about the true, but unobservable, values of Ui.
Note that the PRF, Yi = β0 + β1Xi + Ui, shows that Yi depends on both Xi and Ui. Therefore,
unless we are specific about how Xi and Ui are generated, there is no way we can
make any statistical inference about Yi, nor, as we shall see, about β0 and β1.
Thus the linear regression model is based on certain assumptions: some refer to the
distribution of the random variable Ui, some to the relationship between Ui and the explanatory
variables, and some to the relationship among the explanatory variables themselves. The
following are the assumptions underlying the method of least squares.
Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples; more technically, X is assumed to be non-stochastic. In
other words, our regression analysis is conditional regression analysis, that is, conditional on
the given values of the regressor(s) X.
E.g., recall that for a fixed X value of 100 we have Y values of 65, 70, 75 and 80. Hence
X is assumed to be non-stochastic.
Assumption 3: Ui is a random real variable. The value which U may assume in any one period
depends on chance. It may be positive, negative or zero. Each value has a certain probability of
being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. For each value of X, U may assume various
values, some greater than zero and some smaller than zero; but if we consider all the possible
values of U, for any given value of X, they have an average value equal to zero. Hence the mean
or expected value of the random disturbance term Ui is zero. Symbolically, we have:

E(Ui | Xi) = 0

Assumption 5: Homoscedasticity or equal variance of Ui. Given the value of X, the variance of
Ui is the same for all observations:

Var(Ui | Xi) = E[Ui – E(Ui | Xi)]² = E(Ui² | Xi) = σ² ..........................................(2.9)

Recall that the second equality holds because of assumption 4. Equation (2.9) states that the
variance of Ui for each Xi is some positive constant number equal to σ² (equal variance). This
means that the Y populations corresponding to the various X values have the same variance.
Consider the following figures.
[Figure: two panels, (a) and (b), showing the conditional distribution of Y around the regression line at X1, X2 and X3; in panel (a) the spread is the same at every X, in panel (b) it increases with X.]
Figure 2.5 Variance of the error term for each Xi
Note that in both panels the distribution of the error term is normal; that is, the values of U (for
each Xi) have a bell-shaped symmetrical distribution about their zero mean.
But in panel (a) there is equal variance of the error term (and hence of the Y values) at all values
of X, whereas in panel (b) there is unequal spread, or variance, of the error term. This latter
situation is known as heteroscedasticity, and it can be written as Var(Ui | Xi) = σi², where the
subscript i on σ² indicates that the variance of the Y population is no longer constant.
To understand the rationale behind the homoscedasticity assumption, refer to panel (b), where
the variance increases with X. There, the likelihood is that the Y observations coming from the
population with X = X1 would be closer to the PRF than those coming from the population
corresponding to X = X3. In short, not all Y values corresponding to the various X's would be
equally reliable, reliability being judged by how closely or distantly the Y values are distributed
around their means.
Stated differently, the homoscedasticity assumption says that all Y values corresponding to the
various X's are equally important, since they have the same variance. Thus assumption 5
implies that the conditional variances of Yi are also homoscedastic. That is,

Var(Yi | Xi) = σ²

Notice that, combining the above two assumptions with normality of the disturbances, the
random term Ui is distributed as Ui ~ N(0, σ²); that is, Ui has zero mean and constant variance σ².
Assumption 6: No autocorrelation between the disturbances. Given any two X values, Xi and
Xj (i ≠ j), the correlation between the corresponding disturbances Ui and Uj is zero. This implies
that the error term committed for the i-th observation is independent of the error term committed
for the j-th observation. This is also known as the assumption of no serial correlation.
Symbolically,

cov(Ui, Uj | Xi, Xj) = E{[Ui – E(Ui)] | Xi}{[Uj – E(Uj)] | Xj}
                     = E(Ui | Xi) E(Uj | Xj)
                     = 0

Note that if i = j we are back to assumption 5, because then E(Ui² | Xi) = σ².
No autocorrelation implies that, given Xi, the deviations of any two Y values from their mean
values do not exhibit a systematic pattern.
Assumption 7: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the error term is
independent of the explanatory variable(s). If the two are uncorrelated, X and U have separate
influences on Y; but if X and U are correlated, it is not possible to assess their individual
effects on Y. However, since we have assumed that the X values are fixed (non-random) in
repeated samples, there is no way for X to co-vary with the error term; thus assumption 7 is
not very crucial.
If there is more than one explanatory variable in the relationship, it is assumed that they are not
perfectly correlated with each other. Indeed, the regressors should not even be strongly
correlated; they should not be highly multicollinear.
Assumption 10: The number of observations, n must be greater than the number of parameters
to be estimated. Alternatively the number of observations must be greater than the number of
explanatory variables.
At this point one may ask how realistic these assumptions really are. Note that in any
scientific study we make certain assumptions because they facilitate the development of the
subject matter in gradual steps, not because they are necessarily realistic in the sense that they
replicate reality exactly. What we plan to do is first study the properties of the Classical Linear
Regression Model thoroughly, and then, in unit four, examine what happens if one or more of
the assumptions are not fulfilled.
Note that the OLS method demands that the deviations of the actual from the estimated Y-values
(i.e., Yi – Ŷi) should be as small as possible. The method provides us with unique estimates of
β0 and β1, which are obtained by means of differential calculus.
That is, the sum of squared residual deviations is to be minimized with respect to β̂0 and β̂1 (a
necessary condition for a minimum, or maximum, is that the first derivatives be set equal to
zero). Hence,

∂ΣÛi² / ∂β̂0 = 0 .....................................(2.10)

and

∂ΣÛi² / ∂β̂1 = 0 ..................................... (2.11)

Recall from (2.8) the formula for ΣÛi². The partial derivative of (2.8) with respect to β̂0 is

∂ΣÛi² / ∂β̂0 = –2Σ(Yi – β̂0 – β̂1Xi) = 0 .................... (2.12)
In the same way, the partial derivative of (2.8) with respect to β̂1 is

∂ΣÛi² / ∂β̂1 = –2ΣXi(Yi – β̂0 – β̂1Xi) = 0 ........................ (2.13)

Rearranging (2.12) and (2.13) gives the two normal equations

ΣYi = nβ̂0 + β̂1ΣXi .......................................... (2.14)
ΣXiYi = β̂0ΣXi + β̂1ΣXi² .......................................... (2.15)

Solving these simultaneously yields

β̂1 = (ΣXiYi – β̂0ΣXi) / ΣXi² .......................................... (2.16)

β̂0 = (ΣXi² ΣYi – ΣXi ΣXiYi) / (nΣXi² – (ΣXi)²) .......................................... (2.17)

β̂0 = ΣYi/n – β̂1ΣXi/n ....................................... (2.18)
    = Ȳ – β̂1X̄ ....................................... (2.19)

β̂1 = (nΣXiYi – ΣXi ΣYi) / (nΣXi² – (ΣXi)²) ......................................... (2.20)

Therefore, equations (2.17) or (2.19), together with (2.20), are the least squares estimates, since
they are obtained using the least squares criterion.
Note that (2.20) is expressed in terms of the original sample observations on X and Y. It can be
shown that the estimate β̂1 may also be obtained from the following formula, which is expressed
in deviations of the variables from their means:

β̂1 = Σxiyi / Σxi² ............................................. (2.21)

where xi = Xi – X̄ and yi = Yi – Ȳ. In other words,

β̂1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)² ............................................ (2.22)
Proof

β̂1 = Σxiyi / Σxi² = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²

Expanding the numerator and the denominator,

Σ(Xi – X̄)(Yi – Ȳ) = ΣXiYi – X̄ΣYi – ȲΣXi + nX̄Ȳ
Σ(Xi – X̄)² = ΣXi² – 2X̄ΣXi + nX̄²

Note that X̄ = ΣXi/n and Ȳ = ΣYi/n. Hence,

Σ(Xi – X̄)(Yi – Ȳ) = ΣXiYi – (ΣXi)(ΣYi)/n
Σ(Xi – X̄)² = ΣXi² – (ΣXi)²/n

so that, multiplying numerator and denominator by n,

β̂1 = (nΣXiYi – ΣXiΣYi) / (nΣXi² – (ΣXi)²)

which is (2.20).
Note that in some cases economic theory postulates relationships that have a zero constant
intercept, so that the regression line passes through the origin of the XY plane. For example, a
linear production function for a manufactured product should normally have a zero intercept,
since output is zero when the factor inputs are zero.
In this event we should estimate the function Y = β0 + β1X + U by imposing the restriction
β0 = 0. This is a restricted minimization problem: we minimize

ΣÛi² = Σ(Yi – β̂0 – β̂1Xi)²

subject to the restriction β̂0 = 0.
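Although the text does not carry this restricted problem further here, setting the derivative of Σ(Yi – β̂1Xi)² with respect to β̂1 equal to zero gives the familiar through-the-origin estimator (a sketch added for completeness, not part of the original derivation):

dΣ(Yi – β̂1Xi)² / dβ̂1 = –2ΣXi(Yi – β̂1Xi) = 0,  so that  β̂1 = ΣXiYi / ΣXi²

Note that, with the intercept suppressed, the raw sums ΣXiYi and ΣXi² are used rather than the deviation forms of (2.21).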
Note that estimation of elasticities is possible from an estimated regression line. Recall that the
SRF, Ŷi = β̂0 + β̂1Xi, is the equation of a line whose intercept is β̂0 and whose slope is β̂1. The
coefficient β̂1 is the derivative of Ŷ with respect to X (i.e., dY/dX).
This implies that, for a linear function, the coefficient β̂1 is a component of the elasticity, which
is defined by the formula

η = (dY/Y) / (dX/X) = (dY/dX)(X/Y) ............................................... (2.24)

Substituting β̂1 in place of dY/dX and evaluating at the sample means, we obtain an average
elasticity of the form

η̂ = β̂1 (X̄ / Ȳ) ............................................... (2.25)
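As a numerical illustration (added here, using the values obtained in the worked example below: β̂1 ≈ 0.51, X̄ = 170 and Ȳ = 111), the implied average income elasticity of consumption would be roughly 0.51 × 170/111 ≈ 0.78.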
In passing, note that the least squares estimators β̂0 and β̂1 are point estimators; that is, given the
sample, each estimator provides only a single (point) value of the relevant population parameter.
In conclusion, the regression line obtained using the least squares estimators has the following
properties:
i. It passes through the sample means of Y and X. Recall that β̂0 = Ȳ – β̂1X̄, which can be
rewritten as Ȳ = β̂0 + β̂1X̄.
ii. The mean value of the estimated Y (= Ŷ) is equal to the mean value of the actual Y; that is,
the mean of Ŷ equals Ȳ.
iii. The mean value of the residuals Ûi is equal to zero.
iv. The residuals Ûi are uncorrelated with the predicted Yi; i.e., ΣŶiÛi = 0.
Example: Consider the following table (Table 2.2), which is constructed using raw data on X
and Y, where the sample size is 10.

  Yi    Xi     YiXi      Xi²    xi = Xi–X̄   yi = Yi–Ȳ    xi²     xiyi
  70    80     5600     6400      -90         -41       8100     3690
  65   100     6500    10000      -70         -46       4900     3220
  90   120    10800    14400      -50         -21       2500     1050
  95   140    13300    19600      -30         -16        900      480
 110   160    17600    25600      -10          -1        100       10
 115   180    20700    32400       10           4        100       40
 120   200    24000    40000       30           9        900      270
 140   220    30800    48400       50          29       2500     1450
 155   240    37200    57600       70          44       4900     3080
 150   260    39000    67600       90          39       8100     3510
Sum  1110  1700   205500   322000        0           0      33000    16800
Mean  111   170      -        -          0           0        -        -

Note that columns 3 to 8 of the above table are constructed using the information given in
columns 1 and 2.
We can compute β̂0 for the tabulated figures by applying the formula given in (2.17); that is,

β̂0 = [(322,000)(1,110) – (1,700)(205,500)] / [10(322,000) – (1,700)²] = 8,070,000 / 330,000 ≈ 24.4

Similarly, we can compute β̂1 by using the formula given in (2.20) or (2.21). That is, using (2.20),

β̂1 = [10(205,500) – (1,700)(1,110)] / [10(322,000) – (1,700)²] = 168,000 / 330,000 ≈ 0.51

and, using (2.21),

β̂1 = 16,800 / 33,000 ≈ 0.51
Notice that once we compute β̂1, we can very easily calculate β̂0 using (2.19), β̂0 = Ȳ – β̂1X̄ ≈ 24.4,
so that the estimated regression line is

Ŷi = 24.4 + 0.51Xi ............................... (2.26)

Interpretation of (2.26) reveals that when family income increases by 1 Birr, estimated
consumption expenditure increases by β̂1 = 0.51 Birr, i.e., 51 cents. The value of β̂0 = 24.4
(the intercept) indicates the average level of consumption expenditure when family income is zero.
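The hand computation above can be verified with a short Python sketch (added for illustration; the variable names are ours, and the tiny difference in the intercept is only a matter of rounding):

# Raw data of Table 2.2 (consumption Y and income X)
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

n = len(X)
x_bar = sum(X) / n                                              # 170
y_bar = sum(Y) / n                                              # 111

# Deviation-form sums used in (2.21)
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))   # 16,800
sum_x2 = sum((x - x_bar) ** 2 for x in X)                       # 33,000

b1 = sum_xy / sum_x2                                            # about 0.509 (reported as 0.51)
b0 = y_bar - b1 * x_bar                                         # about 24.45 (reported as 24.4)
print(b0, b1)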
1. The following results have been obtained from a sample of 11 observations on the value of
sales (Y) of a firm and corresponding prices (X).
X̄ = 519.18,  Ȳ = 217.82
ΣXi² = 3,134,543,  ΣXiYi = 1,296,836
a) Estimate the regression line (function) and interpret the results.
b) Compute the price elasticity of sales using average values of X and Y.
2.3.2 Properties of Least Squares Estimators: The Gauss-Markov Theorem
Note that there are various econometric methods by which we may obtain estimates of the
parameters of economic relationships. To choose among these methods, we use their desirable
properties as criteria.
As noted in the previous discussion, given the assumptions of the classical linear regression
model, the least squares estimators possess some ideal or optimum properties. These properties
are contained in the well-known Gauss-Markov theorem.
To understand this theorem, we need to consider the best linear unbiasedness property of an
estimator. That is, an estimator, say the OLS estimator β̂i, is said to be a best linear unbiased
estimator (BLUE) of βi if the following hold:
i. Linear estimator: it is linear, that is, a linear function of a random variable such as the
dependent variable Y in the regression model. Thus, an estimator is linear if it is a linear
function of the sample observations, i.e., if it is determined by a linear combination of
the sample data. Given the sample observations Y1, Y2, …, Yn, a linear estimator has the form

K1Y1 + K2Y2 + … + KnYn ............................................. (2.27)

where the Ki's are some constants.
For example, the sample mean Ȳ is a linear estimator because

Ȳ = (1/n)ΣYi = (1/n)(Y1 + Y2 + … + Yn)
ii. Unbiased estimator: an estimator is said to be unbiased if its average or expected value,
E(β̂1), is equal to the true value, β1.
The bias of an estimator is defined as the difference between its expected value and the
true parameter; that is,

Bias = E(β̂) – β
so that β̂ is unbiased if E(β̂) = β.
Consider the probability densities of two estimators, β̃ and β̂, of the true parameter β. If the
density of β̂ is centred on β, then β̂ is unbiased. β̃, on the other hand, has a bias of size E(β̃) – β,
indicating the inequality between E(β̃) and β.
Note that the property of unbiasedness does not mean that β̂ = β; it says only that, if we could
undertake repeated sampling an infinite number of times, we would get the correct estimate “on
the average”.
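This idea of being correct "on the average" can be illustrated with a small Monte Carlo sketch (an added illustration with arbitrarily chosen true parameters, not part of the original text): with fixed X values and freshly drawn disturbances in each replication, the average of the OLS slope estimates settles near the true slope.

import random

random.seed(0)
beta0, beta1, sigma = 10.0, 0.5, 5.0         # "true" parameters (chosen arbitrarily)
X = list(range(1, 21))                       # X fixed in repeated sampling (assumption 2)
x_bar = sum(X) / len(X)

estimates = []
for _ in range(5000):                        # repeated samples
    U = [random.gauss(0, sigma) for _ in X]  # disturbances with zero mean, constant variance
    Yrep = [beta0 + beta1 * x + u for x, u in zip(X, U)]
    y_bar = sum(Yrep) / len(Yrep)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Yrep)) / sum((x - x_bar) ** 2 for x in X)
    estimates.append(b1)

print(sum(estimates) / len(estimates))       # close to the true value 0.5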
iii. Minimum variance estimator (or best estimator): an estimator is best when it has the
smallest variance compared with any other estimator obtained by other econometric
methods. Symbolically, β̂ is best if

E[β̂ – E(β̂)]² < E[β̃ – E(β̃)]²

or, more formally,

Var(β̂) < Var(β̃)
Sometimes one estimator is unbiased but has a large variance, while an alternative estimator is
biased but has a small variance. Choice between these two alternative estimators may be based
on the mean square error (MSE) criterion,

MSE(β̂) = E(β̂ – β)² ............................. (2.28)

This is equal to the variance of the estimator plus the square of its bias. That is,

MSE(β̂) = Var(β̂) + Bias²(β̂) .................................. (2.29)
Note that the trade-off between low bias and low variance is formalized by using as a criterion
the minimization of a weighted average of the bias and the variance, i.e., choosing the
estimator that minimizes this weighted average.
Notice that the property of minimum variance in itself is not important: an estimate may have a
very small variance and a large bias, so that we have a small variance around the “wrong” mean.
Similarly, the property of unbiasedness by itself is not particularly desirable unless it is coupled
with a small variance.
We can prove that the least squares estimators are BLUE provided that the random term U
satisfies some general assumptions, namely that the U has zero mean and constant variance.
This proposition, together with the set of conditions under which it is true, is known as Gauss-
Markov least squares theorem
[Figure: nested boxes showing all estimators, the subset of linear estimators, and, within it, the subset of linear unbiased estimators.]

The box above reveals that not all estimators are linear and, furthermore, that not all linear
estimators are unbiased: the unbiased linear estimators are a subset of the linear estimators. In
the group of linear unbiased estimators, the OLS estimator β̂ has the smallest variance. Hence,
the OLS estimators possess all three properties, i.e., they are BLUE.
The following discussion proves that β̂0 and β̂1 are unbiased estimators of β0 and β1
respectively. Moreover, it shows how to arrive at the variances and standard errors of the
estimates β̂0 and β̂1.
Starting from (2.21),

β̂1 = Σxiyi / Σxi² = Σxi(Yi – Ȳ) / Σxi²
    = ΣxiYi / Σxi² – ȲΣxi / Σxi²
    = ΣxiYi / Σxi²                     (since Σxi = 0)
    = ΣKiYi,   where Ki = xi / Σxi²

Note that ΣKi = 0 and ΣKiXi = 1. Substituting Yi = β0 + β1Xi + Ui gives

β̂1 = β0ΣKi + β1ΣKiXi + ΣKiUi = β1 + ΣKiUi .................... (2.30)

Taking expectations and using E(Ui) = 0,

E(β̂1) = β1 + ΣKiE(Ui) = β1

Therefore, β̂1 is an unbiased estimator of β1.
By the definition of variance, we can write

Var(β̂1) = E[β̂1 – E(β̂1)]²

Notice that, since E(β̂1) = β1, it follows that

Var(β̂1) = E(β̂1 – β1)²

By rearranging (2.30), the above result can be written as

Var(β̂1) = E(ΣKiUi)²
        = E(K1²U1² + K2²U2² + … + Kn²Un² + 2K1K2U1U2 + … + 2Kn-1KnUn-1Un)

Recall that Var(Ui) = σu² = E[Ui – E(Ui)]², which is equal to E(Ui²) because E(Ui) = 0.
Furthermore, E(UiUj) = 0 for i ≠ j. Thus it follows that

Var(β̂1) = σu²ΣKi²
        = σu² / Σxi² .................................................... (2.31)

since ΣKi² = Σxi² / (Σxi²)² = 1 / Σxi².
The variance (and standard error) of β̂0 can be obtained following the same line of reasoning.
Recall from (2.19) that β̂0 = Ȳ – β̂1X̄. Moreover, remember that from the PRF we can write
Ȳ = β0 + β1X̄ + Ū. Substituting this into β̂0 = Ȳ – β̂1X̄ gives

β̂0 – β0 = –X̄(β̂1 – β1) + Ū ................................ (2.32)

so that

Var(β̂0) = E(β̂0 – β0)²
        = E[–X̄(β̂1 – β1) + Ū]²
        = E[X̄²(β̂1 – β1)² + Ū² – 2X̄(β̂1 – β1)Ū] ................................ (2.33)
        = X̄²E(β̂1 – β1)² + E(Ū²) – 2X̄E[(β̂1 – β1)Ū]
Note that Ū = ΣUi/n, so that

E(Ū²) = (1/n²)E[(U1 + U2 + … + Un)²]
      = (1/n²)E(U1² + U2² + … + Un²)         (the cross terms vanish since E(UiUj) = 0)
      = (1/n²)(nσ²)
      = σ²/n

Moreover, the last term in (2.33) is zero, because E[(β̂1 – β1)Ū] = E[(ΣKiUi)(ΣUj/n)] = (σ²/n)ΣKi = 0.
Therefore, using this information we can adjust (2.32) to obtain

Var(β̂0) = X̄²σu² / Σxi² + σu²/n
        = σu²(1/n + X̄²/Σxi²)

Var(β̂0) = σu² ΣXi² / (nΣxi²) .................................................. (2.34)

s.e.(β̂0) = √[σu² ΣXi² / (nΣxi²)] ................................................. (2.35)
Moreover, the covariance between β̂0 and β̂1 describes how β̂0 and β̂1 are related:

Cov(β̂0, β̂1) = E[(β̂0 – β0)(β̂1 – β1)]

Using the expression for (β̂0 – β0) given in (2.32), we can rewrite this as

Cov(β̂0, β̂1) = E{[–X̄(β̂1 – β1) + Ū](β̂1 – β1)}
            = 0 – X̄E(β̂1 – β1)²

Note that E(β̂1 – β1)² is equal to Var(β̂1). Hence, using (2.31), we obtain

Cov(β̂0, β̂1) = –X̄ σu² / Σxi² ................................................. (2.36)
Note from (2.31) and (2.34) that the formulas for the variances of β̂0 and β̂1 involve the variance
of the random term U, σu². However, the true variance of Ui cannot be computed, since the
values of Ui are not observable. But we may obtain an unbiased estimate of σu² from the
expression

σ̂u² = ΣÛi² / (n – k) ................................................. (2.37)

where k (which is 2 in this case) stands for the number of parameters to be estimated, so that
n – k represents the degrees of freedom. Remember that

ΣÛi² = Σ(Yi – Ŷi)² = Σ(Yi – β̂0 – β̂1Xi)² .............. (2.38)

Therefore, in calculating the variances of β̂0 and β̂1 we make use of σ̂u² in place of σu², since the
latter is unknown.
Thus far we have been concerned with the problem of estimating the regression coefficients, their
standard errors, and some of their properties. The next stage is to establish criteria for judging
the goodness of the parameter estimates. In this connection we consider the goodness of fit
of the fitted regression line to a set of data; that is, we shall find out how “well” the sample
regression line fits the data. This is called the statistical criterion, or first-order test, for the
evaluation of the parameter estimates. The econometric criteria, or second-order tests, will be
examined in unit four. The two most commonly used tests under the statistical criteria are the
square of the correlation coefficient, r², and a test based on the standard errors of the estimates.
I. The Coefficient of Determination, r² (the goodness of fit)
It is clear that if all the observations were to lie on the regression line, we would obtain a “perfect
fit”, but this is rarely the case. Hence, knowledge of the dispersion of the observations
around the regression line is essential, because the closer the observations are to the line, the better
the goodness of fit, that is, the better the explanation of the variations of Y by the changes in the
explanatory variables. In general, the coefficient of determination r² is a summary measure
that tells how well the sample regression line fits the data. We will show that a measure of the
goodness of fit is the square of the correlation coefficient, r.
By fitting the line Ŷi = β̂0 + β̂1Xi we try to obtain the explanation of the variation of the
dependent variable Y produced by the changes in the explanatory variable X. However, the fact
that the observations deviate from the estimated line shows that the regression line explains
only a part of the total variation of the dependent variable. A part of the variation, defined as
Ûi = Yi – Ŷi, remains unexplained. Note the following:
a) We may compute the total variation of the dependent variable by comparing each value of
Y to the mean value Ȳ and adding all the resulting squared deviations:

[Total variation in Y] = Σyi² = Σ(Yi – Ȳ)² .............................. (2.39)

(We square the simple deviations because Σyi = Σ(Yi – Ȳ) = 0.)

b) In the same way, we define the deviations of the regressed (i.e., estimated from the line)
values Ŷi from the mean value, ŷi = Ŷi – Ȳ. This is the part of the total variation of Yi
which is explained by the regression line. Thus, the sum of the squares of these deviations
is the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi – Ȳ)² ............................... (2.40)

c) Recall that we have defined the error term as the difference Ûi = Yi – Ŷi. This is the part of
the variation of the dependent variable which is not explained by the regression line and is
attributed to the existence of the disturbance variable U. Thus the sum of the squared
residuals gives the total unexplained variation of the dependent variable Y around its mean:

[Unexplained variation] = ΣÛi² = Σ(Yi – Ŷi)² ............................... (2.41)

Adding the explained and unexplained parts,

Σ(Yi – Ȳ)² = Σ(Ŷi – Ȳ)² + Σ(Yi – Ŷi)² ............................................ (2.42)
This shows that the total variation in the observed Y values about their mean value can be
partitioned into two parts, one attributed to the regression line and the other to random
forces, because not all actual Y observations lie on the fitted line. In other words, the total sum of
squares (TSS) is equal to the explained sum of squares (ESS) plus the residual sum of squares (RSS).
Symbolically,
TSS = ESS + RSS ....................................................... (2.43)
Or, in deviation form,

Σyi² = Σŷi² + ΣÛi² ....................................................... (2.44)

Dividing both sides of (2.43) by TSS gives

1 = ESS/TSS + RSS/TSS ........................................... (2.45)

that is,

1 = Σ(Ŷi – Ȳ)² / Σ(Yi – Ȳ)² + ΣÛi² / Σ(Yi – Ȳ)² ........................................... (2.46)
We now define r² as the first of these ratios:

r² = Σ(Ŷi – Ȳ)² / Σ(Yi – Ȳ)² = Σŷi² / Σyi² ..................................................... (2.47)
Notice that (2.47) is nothing but ESS/TSS. Thus r² is the square of the correlation coefficient,
and it determines the proportion of the variation in Y which is explained by variations in X. For
this reason r² is also called the coefficient of determination: it measures the proportion of the
total variation in Y explained by the regression model.
Equivalently, using (2.46),

r² = 1 – ΣÛi² / Σ(Yi – Ȳ)² ................................................. (2.48)

The relationship between r² and the slope β̂1 means that r² may also be computed as

r² = β̂1 Σxy / Σy² ................................................... (2.49)

   = β̂1² Σx² / Σy² ................................................... (2.50)
Note that if we are working with cross-section data, an r² value of 0.5 may represent a good fit,
whereas for time-series data 0.5 may be too low. This means there is no hard-and-fast rule as to
how high r² should be; generally, however, the higher the value of r², the better the fit.
The adjusted coefficient of determination
One major problem with r² is that it is non-decreasing in the number of regressors; that is, it
increases (or at least does not fall) when additional variables are included in the model. For
example, for the model Yi = β0 + β1Xi let r² = 0.5; when we increase the number of variables to
Yi = β0 + β1X1 + β2X2, the r² will be at least 0.5. Hence, to arrive at a higher r², one may be
tempted to include irrelevant variables in the model: a completely nonsensical variable can be
included and r² will not fall. On top of this, note that including additional variables reduces the
degrees of freedom, and the lower the degrees of freedom, the less reliable the model. To
correct for this defect we adjust r² by taking into account the degrees of freedom, which clearly
decrease as new regressors are included in the function. The expression for the adjusted
coefficient of multiple determination is discussed in the next unit.
Example: Consider Table 2.2 and the results obtained from it. Calculate Var(β̂0), Var(β̂1) and r².
Solution: Notice that we can construct the values of the estimated Y and of the error term U.
Recall that we found Ŷi = 24.4 + 0.51Xi. Thus, for each Xi of Table 2.2, we can compute Ŷi;
once Ŷi is obtained, subtracting it from each Yi gives the estimated error term, ûi = Yi – Ŷi.

[Table: for each observation, yi², the fitted value Ŷi and the residual ûi = Yi – Ŷi; for the last observation, for example, yi² = 1521, Ŷi = 156.82 and ûi = –6.82. The column sums are used below.]
Var(β̂0) = σ̂u² ΣXi² / (nΣxi²)

Replacing σu² by σ̂u² = ΣÛi² / (n – k), where n = 10 and k = 2, we get σ̂u² = 42.16. Hence

Var(β̂0) = 42.16(322,000) / [10(33,000)] = 41.13

Var(β̂1) = σ̂u² / Σxi² = 42.16 / 33,000 = 0.0013
We can calculate r² by using (2.47), (2.48), (2.49) or (2.50). For this example we use (2.49)
and (2.50):

Using (2.49), r² = 0.51(16,800) / 8,890 = 0.96

Using (2.50), r² = (0.51)²(33,000) / 8,890 = 0.96
II. Testing the Significance of the Parameter Estimates
In addition to r², the reliability of the estimates (β̂0, β̂1) should be tested; that is, we must see
whether the estimates are statistically reliable. Since β̂0 and β̂1 are sample estimates of the
parameters β0 and β1, the significance of the parameter estimates must be examined. Note that,
given the assumption of a normally distributed error term, the distribution of the estimates β̂0
and β̂1 is also normal; that is, β̂i ~ N[E(β̂i), Var(β̂i)].
More formally,

β̂0 ~ N(β0, σu² ΣXi² / (nΣxi²))

and

β̂1 ~ N(β1, σu² / Σxi²)
n xi2
The standard test of significance is explained through the standard error test and the t-test.
a) The Standard-Error test of the Least Squares Estimates
The least squares estimates ̂ 0 and ˆ1 are obtained from a sample of observations of Y and X.
Since sampling errors are inevitable in all estimates, it is necessary to apply tests of
significance in order to measure the size of the error and determine the degree of confidence in
the validity of the estimates.
Among a number of tests in this regard, we will examine the standard error test. This test helps
us to decide whether the estimates β̂0 and β̂1 are significantly different from zero, i.e., whether
the sample from which they have been estimated might have come from a population whose
true parameters are zero (β0 = 0 and/or β1 = 0). Formally, we test the null hypothesis
H0: βi = 0 (i.e., X and Y have no relationship)
against the alternative hypothesis:
H1: βi ≠ 0 (i.e., Y and X have a relationship)
This is a two-tailed (or two sided) hypothesis. Very often such a two-sided alternative
hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about
the direction in which the alternative hypothesis should move from the null hypothesis.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
Sometimes we have a strong a priori or theoretical expectation (or expectations based on some
previous empirical work) that the alternative hypothesis is one sided or unidirectional rather
than two-sided, as just discussed.
For instance, in a consumption–income function C = β0 + β1Y one could postulate:
H0: β1 ≤ 0.3
H1: β1 > 0.3
That is, perhaps economic theory or prior empirical work suggests that the marginal propensity
to consume (β1) is greater than 0.3. [Note: students are strongly advised to refer to and grasp the
discussion in units 7 and 8 of the course Statistics for Economics.]
Recall that in order to test a hypothesis of the kind discussed above we need to make use of the Z-
and t-tests.
b) The Z-test of the least squares estimates
Recall from the Statistics for Economics course that the Z-test is applicable only if
a) the population variance is known, or
b) the population variance is unknown, and provided that the sample size is sufficiently
large (n > 30).
In econometric applications the population variance of Y is unknown. However, if we have a
large sample (n > 30) we may still use the standard normal distribution and perform the Z test.
If these conditions cannot be fulfilled, we apply the student’s t-test.
Recall that in the Statistics for Economics course we learned the formula which transforms the
value of a sample mean X̄ into t units, as shown below:

t = (X̄ – μ) / (Sx/√n), with n – 1 degrees of freedom,

where μ = value of the population mean, Sx² = sample estimate of the population variance,
and n = sample size.
Accordingly, the variable

t = (β̂i – βi) / √Var(β̂i) = (β̂i – βi) / s.e.(β̂i) ............................................... (2.51)

follows the t distribution, where βi is the hypothesized value of βi. If the computed t value, t*,
falls in the rejection region, we reject the null hypothesis, that is, we accept that the estimate β̂i is
statistically significant.

[Figure: the t distribution with the acceptance region between –tα/2 and tα/2 and rejection regions in the two tails.]

If, on the other hand, the computed t* falls inside the acceptance region (i.e., –tα/2 < t* < tα/2), we
accept H0, which implies that β̂i has an insignificant or marginal contribution to the model.
Recall that if the test is one-tailed, the rejection region is found on one side only; hence we
reject H0 if t* > tα or t* < –tα.
Note that the t-test can be performed in an approximate way by simple inspection. For (n – k) >
8, if the observed t* is greater than 2 (or smaller than –2), we reject the null hypothesis at the 5
percent level of significance. If, on the other hand, the observed t* is smaller than 2 (but greater
than –2), we accept the null hypothesis at the 5% level of significance.
Given (2.51), the sample value of t* will be greater than 2 if the relevant estimate (β̂0 or β̂1)
is at least twice its standard error. In other words, we reject the null hypothesis if

t* > 2, i.e., if β̂i > 2 s.e.(β̂i) ........................................................... (2.53)
Example: Suppose that from a sample of size n = 20 we estimate the following consumption
function:

Ĉ = 100 + 0.70Y
    (75.5) (0.21)

where the figures in brackets are the standard errors of the coefficients β̂0 = 100 and β̂1 = 0.70.
Are the estimates significant?

For β̂0:  t* = β̂0 / s.e.(β̂0) = 100 / 75.5 = 1.32

For β̂1:  t* = β̂1 / s.e.(β̂1) = 0.70 / 0.21 = 3.3
Note that for β̂0, since the calculated value (1.32) is less than the table value (2.10, the critical t
value for n – k = 18 degrees of freedom at the 5% level), we cannot reject H0: β0 = 0; thus the
estimate of β0 is insignificant. But for β̂1 the calculated value (3.3) is greater than the table value
(2.10), so we reject H0: β1 = 0, indicating that the estimated value of β1 is indeed significant for
the relationship between the two variables.
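A short Python sketch of the same calculation (added for illustration; it uses scipy only to look up the critical value, which could equally be read from a t table):

from scipy.stats import t

n, k = 20, 2
b0, se_b0 = 100, 75.5
b1, se_b1 = 0.70, 0.21

t0 = b0 / se_b0                      # about 1.32
t1 = b1 / se_b1                      # about 3.33
t_crit = t.ppf(0.975, df=n - k)      # about 2.10 (5% level, two-tailed, 18 d.f.)

print(abs(t0) > t_crit)              # False -> cannot reject H0: beta0 = 0
print(abs(t1) > t_crit)              # True  -> reject H0: beta1 = 0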
In conclusion, note that if a researcher obtains a high r² value and the estimates have low standard
errors, then the result is good. In practice, however, such an ideal situation is rare; rather, we
may have a low r² with low standard errors, or a high r² with high standard errors. There is no
general agreement among econometricians in this case, so the main issue is whether to aim for a
high r² or for low standard errors of the parameter estimates.
In general, r² is more important if the model is to be used for forecasting, whereas the standard
errors become more important when the purpose of the exercise is the explanation or analysis of
economic phenomena and the estimation of reliable values of the economic relationships.