Econometrics Chapter Two
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables, with a view to
estimating and/or predicting the (population) mean or average value of the former in terms of
the known or fixed values of the latter.
A frequent objective in research is the specification of a functional relationship between two
variables such as
Y = f(X) ..........................................................(2.1)
where Y is called the dependent variable and X the independent (or explanatory) variable.
The purely mathematical model stated above is of limited interest to the econometrician, for it
assumes that there is an exact or deterministic relationship between Y and the explanatory
variable X. But relationships between economic variables are generally inexact. That is, an
economic model is simply a logical representation of theoretical knowledge (or a priori
knowledge). An econometric model, on the other hand, represents a set of behavioural
equations derived from an economic model. It differs from the economic model in that the
relationship between the variables is stochastic (i.e., non-exact). In other words, we cannot
expect a perfect explanation and hence we write

Y = f(X) + U
  = β0 + β1X + U ...........................................(2.2)

where U is a random variable called the disturbance or error term, β0 is the constant term and β1 is the slope
parameter. Equation (2.2) is called a regression equation of Y on X. Notice that since U is a random
variable, Y is also a random variable.
Basically, the existence of the disturbance term is justified in four main ways.
i) Omission of other variables: although income might be the major determinant of the
level of consumption, it is not the only determinant. Other variables such as the interest
rate or liquid asset holdings may have a systematic influence on consumption. Their
omission constitutes one type of specification error. The disturbance term is then often
viewed as representing the net influence of many small independent causes such as taste
changes, epidemics, and others.
ii) Measurement error: it may be the case that the variable being explained cannot be
measured accurately, either because of data collection difficulties or because it is
inherently unmeasurable and a proxy variable must be used instead. The disturbance
term can in these circumstances be thought of as representing this measurement error
in the variable(s).
iii) Randomness in human behaviour: humans are not machines that do exactly as instructed,
so there is an unpredictable element in their behaviour. For example, owing to unexplained
causes, an increase in income may not influence consumption. The disturbance term captures
such human behaviour that is left unexplained by the economic model.
iv) Imperfect specification of the model: for example, we may have linearized a non-linear
function. If so, the random term also reflects this wrong specification.
Generally speaking, regression analysis is concerned with the study of the dependence of one
dependent variable on one or more other variables called the explanatory (or independent)
variable(s). Moreover, the true relationship that connects the variables involved is split into
two parts: systematic (or explained) variation and random (or unexplained) variation. Using
(2.2) we can disaggregate the two components as follows:

Y = β0 + β1X + U ...........................................(2.3)

That is,

[variation in Y] = [systematic variation] + [random variation]

In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear models like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in two
different ways. These are,
a) Linearity in the variables
b) Linearity in parameters
a) Linearity in the variables implies that the conditional mean of Y is a linear function of X, so that the equation plots as a straight line.
Example: consider the regression function Y = β0 + β1X. The slope (or derivative) of this
equation is independent of X, so the model is linear in the variable. But if
Y = β0 + β1X², then the variable X is raised to the second power, so the model is non-linear in the
variable. This is because the slope or derivative is not independent of the value taken by X;
that is, dY/dX = 2β1X. Hence the above function is not linear in X, since the variable X
appears with a power of 2.
b) Linearity in the parameters: this implies that the parameters (i.e., the β's) are raised to their first
power only. In this interpretation Y = β0 + β1X² is a linear regression model, but Y = β0 + β1²X is
not; the latter is an example of a model that is non-linear in the parameters. Of the two
interpretations of linearity, linearity in the parameters is the one relevant for the development of
regression theory. Thus the term linear regression means a regression that is linear in the
parameters, the β's; it may or may not be linear in the explanatory variables.
The following discussion stresses that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s).
X    100   140   180   220   260
Y     65    75    95   105   115
      70    80   100   110   120
      75    85   105   115   125
      80    90   110   120   130
Table 2.1 Family expenditure and income
Note that from the above table we can construct conditional probabilities. For instance,
P(Y = 65 | X = 100) = 1/4, and likewise P(Y = 115 | X = 260) = 1/4. The following figure clearly shows that
consumption expenditure (Y) on average increases as income (X) increases; that is, the
conditional mean value of Y increases as X increases.
Note from the above table that the average consumption expenditure when income is 100 is
72.5 (= (65 + 70 + 75 + 80)/4). In other words, E(Y | X = 100) = 72.5. This conditional mean
increases as X increases.
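As a quick check of these conditional means, the following short Python sketch (added here for illustration only; the variable names are ours) reproduces E(Y | X) for each income level of Table 2.1:

# Table 2.1: income levels (X) and the four equally likely consumption values (Y)
table_2_1 = {
    100: [65, 70, 75, 80],
    140: [75, 80, 85, 90],
    180: [95, 100, 105, 110],
    220: [105, 110, 115, 120],
    260: [115, 120, 125, 130],
}
for x, y_values in table_2_1.items():
    # Each Y value is equally likely given X, so P(Y = y | X = x) = 1/4
    # and the conditional mean is the simple average of the column.
    cond_mean = sum(y_values) / len(y_values)
    print(x, cond_mean)   # 100 -> 72.5, 140 -> 82.5, ..., 260 -> 122.5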
[Figure: consumption expenditure plotted against income (X = 100, 140, 180, 220, 260), with the conditional means lying on the population regression line (PRF).]
Figure 2.1 Conditional distribution of expenditure for various levels of income
The line in Figure 2.1 is known as the population regression line or, more generally, the
population regression curve. Geometrically, a population regression curve is simply the locus
of the conditional means or expectations of the dependent variable for the fixed values of the
explanatory variable(s).
From the preceding discussion it is clear that each conditional mean E(Y | Xi) is a
function of Xi. Symbolically,

E(Y | Xi) = β0 + β1Xi

where β0 and β1 are unknown but fixed parameters known as the regression coefficients
(the intercept and slope coefficients respectively). The above equation is known as the linear
population regression function. But since an individual consumption expenditure does not
necessarily equal the average expenditure of all families with the same income, we incorporate
the error term. That is,

Yi = E(Y | Xi) + Ui
   = β0 + β1Xi + Ui .......................................................(2.4)

Note that in Table 2.1 we observe that for the same value of X (e.g., 100) we have different values
of Y (65, 70, 75 and 80). Thus the value of Y is also affected by other factors, which are
captured by the error term, U.
If we take the conditional expectation of (2.4) we obtain

E(Yi | Xi) = E(Y | Xi) + E(Ui | Xi) .................................(2.5)

Since

E(Yi | Xi) = E(Y | Xi),

it implies that

E(Ui | Xi) = 0 ...............................(2.6)
Thus, the assumption that the regression line passes through the conditional means of Y implies
that the conditional mean values of Ui (conditional upon the given X's) are zero.
The regression function based on a sample collected from the population is called sample
regression function (SRF).
[Figure: sample regression line (SRF) of consumption expenditure against income.]

Hence, analogous to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample regression function (the sample counterpart of the PRF stated earlier) may be written as:

Ŷi = β̂0 + β̂1Xi

where β̂0 = an estimator of β0, β̂1 = an estimator of β1, and Ûi = an estimator of Ui.
To sum up, because our analysis is based on a single sample from some population our primary
objective in regression analysis is to estimate the PRF given by
Yi = β0 + β1Xi + Ui
on the basis of the SRF
Yi = β̂0 + β̂1Xi + Ûi
We employ SRF because in most of the cases our analysis is based upon a single sample from
some population. But because of sampling fluctuations our estimate of the PRF based on the
SRF is at best an approximate one.
Note that, in such a comparison of the SRF with the PRF, Ŷi may overestimate the true E(Y | Xi) for
the Xi shown therein, while, by the same token, for any Xi to the left of the point A the SRF will
underestimate the true PRF. Such over- and under-estimation is inevitable because of sampling
fluctuations.
Note that there are several methods of constructing the SRF, but as far as regression analysis is
concerned, the method used most extensively is the method of Ordinary Least Squares (OLS).
In other words, how should the SRF be constructed so that β̂0 is as “close” as possible to the
true β0 and β̂1 is as “close” as possible to the true β1, even though we never know the true β0
and β1? We can develop procedures that tell us how to construct the SRF to mirror the PRF as
faithfully as possible, even though we never actually observe the PRF itself.
The method of ordinary least squares has some very attractive statistical properties that have
made it one of the most powerful and popular methods of regression analysis.
Thus β0 + β1X represents the systematic (explained) variation and Ui the random (unexplained)
variation. However, the PRF is not directly observable; hence we estimate it from the SRF. That is,
Yi = β̂0 + β̂1Xi + Ûi
   = Ŷi + Ûi

so that

Ûi = Yi – Ŷi = Yi – β̂0 – β̂1Xi

This shows that the Ûi (the residuals) are simply the differences between the actual and the
estimated Y values; that is, Ûi represents the difference between Yi and Ŷi (the SRF).
Now, given n pairs of observations on Y and X, we would like to determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we may adopt the following
criterion: choose the SRF in such a way that the sum of the residuals, ΣÛi = Σ(Yi – Ŷi), is as small as
possible. However, this sum can be very small (even zero) although the individual Ûi are widely
scattered about the SRF. We avoid this problem if we consider the sum of the squared residuals,
which is the least squares criterion. That is, minimize

ΣÛi² = Σ(Yi – Ŷi)² ..........................................(2.8)

Thus, the least squares method requires the sum of the squared residuals to be as small as
possible. Note that the sum of the residuals (ΣÛi) can be small (even zero) even though the Ûi
are widely spread about the SRF, but this is not possible under the least squares procedure, for
the larger the Ûi (in absolute value), the larger the ΣÛi².
In other words, the least squares method allows us to choose β̂0 and β̂1 as estimators of β0 and β1
respectively, so that

Σ(Yi – β̂0 – β̂1Xi)²

is a minimum.
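The reason for squaring can be seen in a small numerical sketch (hypothetical numbers, added purely for illustration): two very different candidate lines can both have residuals summing to zero, and only the squared-error criterion separates them.

# Toy data (hypothetical): Y happens to lie exactly on the line Y = 2X.
X = [1, 2, 3, 4]
Y = [2, 4, 6, 8]

def residuals(b0, b1):
    # Residuals Y - (b0 + b1*X) for a candidate line
    return [y - (b0 + b1 * x) for x, y in zip(X, Y)]

good_line = residuals(0, 2)   # the "true" line
bad_line = residuals(5, 0)    # a horizontal line through the mean of Y

print(sum(good_line), sum(bad_line))   # both residual sums are 0
print(sum(u ** 2 for u in good_line))  # 0  -> perfect fit
print(sum(u ** 2 for u in bad_line))   # 20 -> clearly worse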
If the deviation of the actual from the estimated values is at a minimum, then our estimation from the
collected sample provides a very good approximation of the true relationship between the
variables.
Note that to estimate the coefficients β0 and β1 we need observations on X, Y and U. Yet U is
never observed, unlike the explanatory variables, and therefore in order to estimate the
function Yi = β0 + β1Xi + Ui we have to make some reasonable (plausible) assumptions about the
shape of the distribution of each Ui (i.e., its mean, variance and covariance with other U's).
These assumptions are guesses about the true, but unobservable, values of Ui.
Note that the PRF, Yi = β0 + β1Xi + Ui, shows that Yi depends on both Xi and Ui. Therefore,
unless we are specific about how Xi and Ui are generated, there is no way we can
make any statistical inference about Yi, nor, as we shall see, about β0 and β1.
Thus the linear regression model is based on certain assumptions: some refer to the
distribution of the random variable Ui, some to the relationship between Ui and the explanatory
variables, and some to the relationship among the explanatory variables themselves. The
following are the assumptions underlying the method of least squares.
Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples; more technically, X is assumed to be non-stochastic. In
other words, our regression analysis is conditional regression analysis, that is, conditional on
the given values of the regressor(s) X.
E.g., recall that for a fixed X value of 100 we have Y values of 65, 70, 75 and 80. Hence
X is assumed to be non-stochastic.
Assumption 3: Ui is a random real variable. The value which U may assume in any one period
depends on chance. It may be positive, negative or zero. Each value has a certain probability of
being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. For each value of X, U may assume various
values, some greater than zero and some smaller than zero; but if we consider all the possible
values of U, for any given value of X, they have an average value equal to zero. Hence the mean
or expected value of the random disturbance term Ui is zero. Symbolically, we have:

E(Ui | Xi) = 0

Assumption 5: Homoscedasticity or equal variance of Ui. Given the value of X, the variance of
Ui is the same for all observations:

Var(Ui | Xi) = E[Ui – E(Ui | Xi)]² = E(Ui² | Xi) = σ² ..........................................(2.9)

Recall that the second equality holds because of assumption 4. Equation (2.9) states that the
variance of Ui for each Xi is some positive constant number equal to σ² (equal variance). This
means that the Y populations corresponding to the various X values have the same variance.
Consider the following figures.
[Figure: two panels, (a) and (b), showing the conditional distribution of Y around the regression line at X1, X2 and X3; in panel (a) the spread is the same at every X, in panel (b) it increases with X.]
Figure 2.5 Variance of the error term for each Xi
Note that in both panels the distribution of the error term is normal; that is, the values of U (for
each Xi) have a bell-shaped symmetrical distribution about their zero mean.
But in panel (a) there is equal variance of the error term (and hence of the Y values) at all values
of X, whereas in panel (b) there is unequal spread, or variance, of the error term. This latter
situation is known as heteroscedasticity, and it can be written as Var(Ui | Xi) = σi², where the
subscript i on σ² indicates that the variance of the Y population is no longer constant.
To understand the rationale behind the homoscedasticity assumption, refer to panel (b), where
the variance increases with X. There, the likelihood is that the Y observations coming from the
population with X = X1 would be closer to the PRF than those coming from the population
corresponding to X = X3. In short, not all Y values corresponding to the various X's would be
equally reliable, reliability being judged by how closely or distantly the Y values are distributed
around their means.
Stated differently, the homoscedasticity assumption says that all Y values corresponding to the
various X's are equally important, since they have the same variance. Thus assumption 5
implies that the conditional variances of Yi are also homoscedastic. That is,

Var(Yi | Xi) = σ²

Notice that, combining the above two assumptions with normality of the disturbances, the
random term Ui is distributed as Ui ~ N(0, σ²); that is, Ui has zero mean and constant variance σ².
Assumption 6: No autocorrelation between the disturbances. Given any two X values, Xi and
Xj (i ≠ j), the correlation between the corresponding disturbances Ui and Uj is zero. This implies
that the error term committed for the i-th observation is independent of the error term committed
for the j-th observation. This is also known as the assumption of no serial correlation.
Symbolically,

cov(Ui, Uj | Xi, Xj) = E{[Ui – E(Ui)] | Xi}{[Uj – E(Uj)] | Xj}
                     = E(Ui | Xi) E(Uj | Xj)
                     = 0

Note that if i = j we are back to assumption 5, because then E(Ui² | Xi) = σ².
No autocorrelation implies that, given Xi, the deviations of any two Y values from their mean
values do not exhibit a systematic pattern.
Assumption 7: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the error term is
independent of the explanatory variable(s). If the two are uncorrelated, X and U have separate
influences on Y; but if X and U are correlated, it is not possible to assess their individual
effects on Y. However, since we have assumed that the X values are fixed (non-random) in
repeated samples, there is no way for X to co-vary with the error term; thus assumption 7 is
not very crucial.
If there is more than one explanatory variable in the relationship, it is assumed that they are not
perfectly correlated with each other. Indeed, the regressors should not even be strongly
correlated; they should not be highly multicollinear.
Assumption 10: The number of observations, n must be greater than the number of parameters
to be estimated. Alternatively the number of observations must be greater than the number of
explanatory variables.
At this point one may ask how realistic these assumptions really are. Note that in any
scientific study we make certain assumptions because they facilitate the development of the
subject matter in gradual steps, not because they are necessarily realistic in the sense that they
replicate reality exactly. What we plan to do is first study the properties of the Classical Linear
Regression Model thoroughly, and then, in unit four, examine what happens if one or more of
the assumptions are not fulfilled.
Note that the OLS method demands that the deviations of the actual from the estimated Y-values
(i.e., Yi – Ŷi) should be as small as possible. The method provides us with unique estimates of
β0 and β1, which are obtained by means of differential calculus.
That is, the sum of squared residual deviations is to be minimized with respect to β̂0 and β̂1 (a
necessary condition for a minimum, or maximum, is that the first derivatives be set equal to
zero). Hence,

∂ΣÛi² / ∂β̂0 = 0 .....................................(2.10)

and

∂ΣÛi² / ∂β̂1 = 0 ..................................... (2.11)

Recall from (2.8) the formula for ΣÛi². The partial derivative of (2.8) with respect to β̂0 is

∂ΣÛi² / ∂β̂0 = –2Σ(Yi – β̂0 – β̂1Xi) = 0 .................... (2.12)
In the same way, the partial derivative of (2.8) with respect to β̂1 is

∂ΣÛi² / ∂β̂1 = –2ΣXi(Yi – β̂0 – β̂1Xi) = 0 ........................ (2.13)

Rearranging (2.12) and (2.13) gives the two normal equations

ΣYi = nβ̂0 + β̂1ΣXi .......................................... (2.14)
ΣXiYi = β̂0ΣXi + β̂1ΣXi² .......................................... (2.15)

Solving these simultaneously yields

β̂1 = (ΣXiYi – β̂0ΣXi) / ΣXi² .......................................... (2.16)

β̂0 = (ΣXi² ΣYi – ΣXi ΣXiYi) / (nΣXi² – (ΣXi)²) .......................................... (2.17)

β̂0 = ΣYi/n – β̂1ΣXi/n ....................................... (2.18)
    = Ȳ – β̂1X̄ ....................................... (2.19)

β̂1 = (nΣXiYi – ΣXi ΣYi) / (nΣXi² – (ΣXi)²) ......................................... (2.20)

Therefore, equations (2.17) or (2.19), together with (2.20), are the least squares estimates, since
they are obtained using the least squares criterion.
Note that (2.20) is expressed in terms of the original sample observations on X and Y. It can be
shown that the estimate β̂1 may also be obtained from the following formula, which is expressed
in deviations of the variables from their means:

β̂1 = Σxiyi / Σxi² ............................................. (2.21)

where xi = Xi – X̄ and yi = Yi – Ȳ. In other words,

β̂1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)² ............................................ (2.22)
Proof

β̂1 = Σxiyi / Σxi² = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²

Expanding the numerator and the denominator,

Σ(Xi – X̄)(Yi – Ȳ) = ΣXiYi – X̄ΣYi – ȲΣXi + nX̄Ȳ
Σ(Xi – X̄)² = ΣXi² – 2X̄ΣXi + nX̄²

Note that X̄ = ΣXi/n and Ȳ = ΣYi/n. Hence,

Σ(Xi – X̄)(Yi – Ȳ) = ΣXiYi – (ΣXi)(ΣYi)/n
Σ(Xi – X̄)² = ΣXi² – (ΣXi)²/n

so that, multiplying numerator and denominator by n,

β̂1 = (nΣXiYi – ΣXiΣYi) / (nΣXi² – (ΣXi)²)

which is (2.20).
Note that in some cases economic theory postulates relationships that have a zero constant
intercept, so that the regression line passes through the origin of the XY plane. For example, a
linear production function for a manufactured product should normally have a zero intercept,
since output is zero when the factor inputs are zero.
In this event we should estimate the function Y = β0 + β1X + U by imposing the restriction
β0 = 0. This is a restricted minimization problem: we minimize

ΣÛi² = Σ(Yi – β̂0 – β̂1Xi)²

subject to the restriction β̂0 = 0.
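Although the text does not carry this restricted problem further here, setting the derivative of Σ(Yi – β̂1Xi)² with respect to β̂1 equal to zero gives the familiar through-the-origin estimator (a sketch added for completeness, not part of the original derivation):

dΣ(Yi – β̂1Xi)² / dβ̂1 = –2ΣXi(Yi – β̂1Xi) = 0,  so that  β̂1 = ΣXiYi / ΣXi²

Note that, with the intercept suppressed, the raw sums ΣXiYi and ΣXi² are used rather than the deviation forms of (2.21).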
Note that estimation of elasticities is possible from an estimated regression line. Recall that the
SRF, Ŷi = β̂0 + β̂1Xi, is the equation of a line whose intercept is β̂0 and whose slope is β̂1. The
coefficient β̂1 is the derivative of Ŷ with respect to X (i.e., dY/dX).
This implies that, for a linear function, the coefficient β̂1 is a component of the elasticity, which
is defined by the formula

η = (dY/Y) / (dX/X) = (dY/dX)(X/Y) ............................................... (2.24)

Substituting β̂1 in place of dY/dX and evaluating at the sample means, we obtain an average
elasticity of the form

η̂ = β̂1 (X̄ / Ȳ) ............................................... (2.25)
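As a numerical illustration (added here, using the values obtained in the worked example below: β̂1 ≈ 0.51, X̄ = 170 and Ȳ = 111), the implied average income elasticity of consumption would be roughly 0.51 × 170/111 ≈ 0.78.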
In passing, note that the least squares estimators β̂0 and β̂1 are point estimators; that is, given the
sample, each estimator provides only a single (point) value of the relevant population parameter.
In conclusion, the regression line obtained using the least squares estimators has the following
properties:
i. It passes through the sample means of Y and X. Recall that β̂0 = Ȳ – β̂1X̄, which can be
rewritten as Ȳ = β̂0 + β̂1X̄.
ii. The mean value of the estimated Y (= Ŷ) is equal to the mean value of the actual Y; that is,
the mean of Ŷ equals Ȳ.
iii. The mean value of the residuals Ûi is equal to zero.
iv. The residuals Ûi are uncorrelated with the predicted Yi; i.e., ΣŶiÛi = 0.
Example: Consider the following table (Table 2.2), which is constructed using raw data on X
and Y, where the sample size is 10.

  Yi    Xi     YiXi      Xi²    xi = Xi–X̄   yi = Yi–Ȳ    xi²     xiyi
  70    80     5600     6400      -90         -41       8100     3690
  65   100     6500    10000      -70         -46       4900     3220
  90   120    10800    14400      -50         -21       2500     1050
  95   140    13300    19600      -30         -16        900      480
 110   160    17600    25600      -10          -1        100       10
 115   180    20700    32400       10           4        100       40
 120   200    24000    40000       30           9        900      270
 140   220    30800    48400       50          29       2500     1450
 155   240    37200    57600       70          44       4900     3080
 150   260    39000    67600       90          39       8100     3510
Sum  1110  1700   205500   322000        0           0      33000    16800
Mean  111   170      -        -          0           0        -        -

Note that columns 3 to 8 of the above table are constructed using the information given in
columns 1 and 2.
We can compute β̂0 for the tabulated figures by applying the formula given in (2.17); that is,

β̂0 = [(322,000)(1,110) – (1,700)(205,500)] / [10(322,000) – (1,700)²] = 8,070,000 / 330,000 ≈ 24.4

Similarly, we can compute β̂1 by using the formula given in (2.20) or (2.21). That is, using (2.20),

β̂1 = [10(205,500) – (1,700)(1,110)] / [10(322,000) – (1,700)²] = 168,000 / 330,000 ≈ 0.51

and, using (2.21),

β̂1 = 16,800 / 33,000 ≈ 0.51
Notice that once we compute β̂1, we can very easily calculate β̂0 using (2.19), β̂0 = Ȳ – β̂1X̄ ≈ 24.4,
so that the estimated regression line is

Ŷi = 24.4 + 0.51Xi ............................... (2.26)

Interpretation of (2.26) reveals that when family income increases by 1 Birr, estimated
consumption expenditure increases by β̂1 = 0.51 Birr, i.e., 51 cents. The value of β̂0 = 24.4
(the intercept) indicates the average level of consumption expenditure when family income is zero.
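The hand computation above can be verified with a short Python sketch (added for illustration; the variable names are ours, and the tiny difference in the intercept is only a matter of rounding):

# Raw data of Table 2.2 (consumption Y and income X)
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

n = len(X)
x_bar = sum(X) / n                                              # 170
y_bar = sum(Y) / n                                              # 111

# Deviation-form sums used in (2.21)
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))   # 16,800
sum_x2 = sum((x - x_bar) ** 2 for x in X)                       # 33,000

b1 = sum_xy / sum_x2                                            # about 0.509 (reported as 0.51)
b0 = y_bar - b1 * x_bar                                         # about 24.45 (reported as 24.4)
print(b0, b1)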
1. The following results have been obtained from a sample of 11 observations on the value of
sales (Y) of a firm and corresponding prices (X).
X̄ = 519.18,  Ȳ = 217.82
ΣXi² = 3,134,543,  ΣXiYi = 1,296,836
a) Estimate the regression line (function) and interpret the results.
b) Compute the price elasticity of sales using average values of X and Y.
2.3.2 Properties of Least Squares Estimators: The Gauss-Markov Theorem
Note that there are various econometric methods by which we may obtain estimates of the
parameters of economic relationships. To choose among these methods, we use their desirable
properties as criteria.
As noted in the previous discussion, given the assumptions of the classical linear regression
model, the least squares estimators possess some ideal or optimum properties. These properties
are contained in the well-known Gauss-Markov theorem.
To understand this theorem, we need to consider the best linear unbiasedness property of an
estimator. That is, an estimator, say the OLS estimator β̂i, is said to be a best linear unbiased
estimator (BLUE) of βi if the following hold:
i. Linear estimator: it is linear, that is, a linear function of a random variable such as the
dependent variable Y in the regression model. Thus, an estimator is linear if it is a linear
function of the sample observations, i.e., if it is determined by a linear combination of
the sample data. Given the sample observations Y1, Y2, …, Yn, a linear estimator has the form

K1Y1 + K2Y2 + … + KnYn ............................................. (2.27)

where the Ki's are some constants.
For example, the sample mean Ȳ is a linear estimator because

Ȳ = (1/n)ΣYi = (1/n)(Y1 + Y2 + … + Yn)
ii. Unbiased estimator: an estimator is said to be unbiased if its average or expected value,
E(β̂1), is equal to the true value, β1.
The bias of an estimator is defined as the difference between its expected value and the
true parameter; that is,

Bias = E(β̂) – β
so that β̂ is unbiased if E(β̂) = β.
Consider the probability densities of two estimators, β̃ and β̂, of the true parameter β. If the
density of β̂ is centred on β, then β̂ is unbiased. β̃, on the other hand, has a bias of size E(β̃) – β,
indicating the inequality between E(β̃) and β.
Note that the property of unbiasedness does not mean that β̂ = β; it says only that, if we could
undertake repeated sampling an infinite number of times, we would get the correct estimate “on
the average”.
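This idea of being correct "on the average" can be illustrated with a small Monte Carlo sketch (an added illustration with arbitrarily chosen true parameters, not part of the original text): with fixed X values and freshly drawn disturbances in each replication, the average of the OLS slope estimates settles near the true slope.

import random

random.seed(0)
beta0, beta1, sigma = 10.0, 0.5, 5.0         # "true" parameters (chosen arbitrarily)
X = list(range(1, 21))                       # X fixed in repeated sampling (assumption 2)
x_bar = sum(X) / len(X)

estimates = []
for _ in range(5000):                        # repeated samples
    U = [random.gauss(0, sigma) for _ in X]  # disturbances with zero mean, constant variance
    Yrep = [beta0 + beta1 * x + u for x, u in zip(X, U)]
    y_bar = sum(Yrep) / len(Yrep)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Yrep)) / sum((x - x_bar) ** 2 for x in X)
    estimates.append(b1)

print(sum(estimates) / len(estimates))       # close to the true value 0.5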
iii. Minimum variance estimator (or best estimator): an estimator is best when it has the
smallest variance compared with any other estimator obtained by other econometric
methods. Symbolically, β̂ is best if

E[β̂ – E(β̂)]² < E[β̃ – E(β̃)]²

or, more formally,

Var(β̂) < Var(β̃)
Sometimes one estimator is unbiased but has a large variance, while an alternative estimator is
biased but has a small variance. Choice between these two alternative estimators may be based
on the mean square error (MSE) criterion,

MSE(β̂) = E(β̂ – β)² ............................. (2.28)

This is equal to the variance of the estimator plus the square of its bias. That is,

MSE(β̂) = Var(β̂) + Bias²(β̂) .................................. (2.29)
Note that the trade-off between low bias and low variance is formalized by using as a criterion
the minimization of a weighted average of the bias and the variance, i.e., choosing the
estimator that minimizes this weighted average.
Notice that the property of minimum variance in itself is not important: an estimate may have a
very small variance and a large bias, so that we have a small variance around the “wrong” mean.
Similarly, the property of unbiasedness by itself is not particularly desirable unless it is coupled
with a small variance.
We can prove that the least squares estimators are BLUE provided that the random term U
satisfies some general assumptions, namely that the U has zero mean and constant variance.
This proposition, together with the set of conditions under which it is true, is known as Gauss-
Markov least squares theorem
[Figure: nested boxes showing all estimators, the subset of linear estimators, and, within it, the subset of linear unbiased estimators.]

The box above reveals that not all estimators are linear and, furthermore, that not all linear
estimators are unbiased: the unbiased linear estimators are a subset of the linear estimators. In
the group of linear unbiased estimators, the OLS estimator β̂ has the smallest variance. Hence,
the OLS estimators possess all three properties, i.e., they are BLUE.
The following discussion proves that β̂0 and β̂1 are unbiased estimators of β0 and β1
respectively. Moreover, it shows how to arrive at the variances and standard errors of the
estimates β̂0 and β̂1.
Starting from (2.21),

β̂1 = Σxiyi / Σxi² = Σxi(Yi – Ȳ) / Σxi²
    = ΣxiYi / Σxi² – ȲΣxi / Σxi²
    = ΣxiYi / Σxi²                     (since Σxi = 0)
    = ΣKiYi,   where Ki = xi / Σxi²

Note that ΣKi = 0 and ΣKiXi = 1. Substituting Yi = β0 + β1Xi + Ui gives

β̂1 = β0ΣKi + β1ΣKiXi + ΣKiUi = β1 + ΣKiUi .................... (2.30)

Taking expectations and using E(Ui) = 0,

E(β̂1) = β1 + ΣKiE(Ui) = β1

Therefore, β̂1 is an unbiased estimator of β1.
By the definition of variance, we can write

Var(β̂1) = E[β̂1 – E(β̂1)]²

Notice that, since E(β̂1) = β1, it follows that

Var(β̂1) = E(β̂1 – β1)²

By rearranging (2.30), the above result can be written as

Var(β̂1) = E(ΣKiUi)²
        = E(K1²U1² + K2²U2² + … + Kn²Un² + 2K1K2U1U2 + … + 2Kn-1KnUn-1Un)

Recall that Var(Ui) = σu² = E[Ui – E(Ui)]², which is equal to E(Ui²) because E(Ui) = 0.
Furthermore, E(UiUj) = 0 for i ≠ j. Thus it follows that

Var(β̂1) = σu²ΣKi²
        = σu² / Σxi² .................................................... (2.31)

since ΣKi² = Σxi² / (Σxi²)² = 1 / Σxi².
The variance (and standard error) of β̂0 can be obtained following the same line of reasoning.
Recall from (2.19) that β̂0 = Ȳ – β̂1X̄. Moreover, remember that from the PRF we can write
Ȳ = β0 + β1X̄ + Ū. Substituting this into β̂0 = Ȳ – β̂1X̄ gives

β̂0 – β0 = –X̄(β̂1 – β1) + Ū ................................ (2.32)

so that

Var(β̂0) = E(β̂0 – β0)²
        = E[–X̄(β̂1 – β1) + Ū]²
        = E[X̄²(β̂1 – β1)² + Ū² – 2X̄(β̂1 – β1)Ū] ................................ (2.33)
        = X̄²E(β̂1 – β1)² + E(Ū²) – 2X̄E[(β̂1 – β1)Ū]
Note that Ū = ΣUi/n, so that

E(Ū²) = (1/n²)E[(U1 + U2 + … + Un)²]
      = (1/n²)E(U1² + U2² + … + Un²)         (the cross terms vanish since E(UiUj) = 0)
      = (1/n²)(nσ²)
      = σ²/n

Moreover, the last term in (2.33) is zero, because E[(β̂1 – β1)Ū] = E[(ΣKiUi)(ΣUj/n)] = (σ²/n)ΣKi = 0.
Therefore, using this information we can adjust (2.32) to obtain

Var(β̂0) = X̄²σu² / Σxi² + σu²/n
        = σu²(1/n + X̄²/Σxi²)

Var(β̂0) = σu² ΣXi² / (nΣxi²) .................................................. (2.34)

s.e.(β̂0) = √[σu² ΣXi² / (nΣxi²)] ................................................. (2.35)
Moreover, the covariance between β̂0 and β̂1 describes how β̂0 and β̂1 are related:

Cov(β̂0, β̂1) = E[(β̂0 – β0)(β̂1 – β1)]

Using the expression for (β̂0 – β0) given in (2.32), we can rewrite this as

Cov(β̂0, β̂1) = E{[–X̄(β̂1 – β1) + Ū](β̂1 – β1)}
            = 0 – X̄E(β̂1 – β1)²

Note that E(β̂1 – β1)² is equal to Var(β̂1). Hence, using (2.31), we obtain

Cov(β̂0, β̂1) = –X̄ σu² / Σxi² ................................................. (2.36)
Note from (2.31) and (2.34) that the formulas for the variances of β̂0 and β̂1 involve the variance
of the random term U, σu². However, the true variance of Ui cannot be computed, since the
values of Ui are not observable. But we may obtain an unbiased estimate of σu² from the
expression

σ̂u² = ΣÛi² / (n – k) ................................................. (2.37)

where k (which is 2 in this case) stands for the number of parameters to be estimated, so that
n – k represents the degrees of freedom. Remember that

ΣÛi² = Σ(Yi – Ŷi)² = Σ(Yi – β̂0 – β̂1Xi)² .............. (2.38)

Therefore, in calculating the variances of β̂0 and β̂1 we make use of σ̂u² in place of σu², since the
latter is unknown.
Thus far we have been concerned with the problem of estimating the regression coefficients, their
standard errors, and some of their properties. The next stage is to establish criteria for judging
the goodness of the parameter estimates. In this connection we consider the goodness of fit
of the fitted regression line to a set of data; that is, we shall find out how “well” the sample
regression line fits the data. This is called the statistical criterion, or first-order test, for the
evaluation of the parameter estimates. The econometric criteria, or second-order tests, will be
examined in unit four. The two most commonly used tests under the statistical criteria are the
square of the correlation coefficient, r², and a test based on the standard errors of the estimates.
I. The Coefficient of Determination, r² (the goodness of fit)
It is clear that if all the observations were to lie on the regression line, we would obtain a “perfect
fit”, but this is rarely the case. Hence, knowledge of the dispersion of the observations
around the regression line is essential, because the closer the observations are to the line, the better
the goodness of fit, that is, the better the explanation of the variations of Y by the changes in the
explanatory variables. In general, the coefficient of determination r² is a summary measure
that tells how well the sample regression line fits the data. We will show that a measure of the
goodness of fit is the square of the correlation coefficient, r.
By fitting the line Ŷi = β̂0 + β̂1Xi we try to obtain the explanation of the variation of the
dependent variable Y produced by the changes in the explanatory variable X. However, the fact
that the observations deviate from the estimated line shows that the regression line explains
only a part of the total variation of the dependent variable. A part of the variation, defined as
Ûi = Yi – Ŷi, remains unexplained. Note the following:
a) We may compute the total variation of the dependent variable by comparing each value of
Y to the mean value Ȳ and adding all the resulting squared deviations:

[Total variation in Y] = Σyi² = Σ(Yi – Ȳ)² .............................. (2.39)

(We square the simple deviations because Σyi = Σ(Yi – Ȳ) = 0.)

b) In the same way, we define the deviations of the regressed (i.e., estimated from the line)
values Ŷi from the mean value, ŷi = Ŷi – Ȳ. This is the part of the total variation of Yi
which is explained by the regression line. Thus, the sum of the squares of these deviations
is the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi – Ȳ)² ............................... (2.40)

c) Recall that we have defined the error term as the difference Ûi = Yi – Ŷi. This is the part of
the variation of the dependent variable which is not explained by the regression line and is
attributed to the existence of the disturbance variable U. Thus the sum of the squared
residuals gives the total unexplained variation of the dependent variable Y around its mean:

[Unexplained variation] = ΣÛi² = Σ(Yi – Ŷi)² ............................... (2.41)

Adding the explained and unexplained parts,

Σ(Yi – Ȳ)² = Σ(Ŷi – Ȳ)² + Σ(Yi – Ŷi)² ............................................ (2.42)
This shows that the total variation in the observed Y values about their mean value can be
partitioned into two parts, one attributed to the regression line and the other to random
forces, because not all actual Y observations lie on the fitted line. In other words, the total sum of
squares (TSS) is equal to the explained sum of squares (ESS) plus the residual sum of squares (RSS).
Symbolically,
TSS = ESS + RSS ....................................................... (2.43)
Or, in deviation form,

Σyi² = Σŷi² + ΣÛi² ....................................................... (2.44)

Dividing both sides of (2.43) by TSS gives

1 = ESS/TSS + RSS/TSS ........................................... (2.45)

that is,

1 = Σ(Ŷi – Ȳ)² / Σ(Yi – Ȳ)² + ΣÛi² / Σ(Yi – Ȳ)² ........................................... (2.46)
We now define r² as the first of these ratios:

r² = Σ(Ŷi – Ȳ)² / Σ(Yi – Ȳ)² = Σŷi² / Σyi² ..................................................... (2.47)
Notice that (2.47) is nothing but ESS/TSS. Thus r² is the square of the correlation coefficient,
and it determines the proportion of the variation in Y which is explained by variations in X. For
this reason r² is also called the coefficient of determination: it measures the proportion of the
total variation in Y explained by the regression model.
Equivalently, using (2.46),

r² = 1 – ΣÛi² / Σ(Yi – Ȳ)² ................................................. (2.48)

The relationship between r² and the slope β̂1 means that r² may also be computed as

r² = β̂1 Σxy / Σy² ................................................... (2.49)

   = β̂1² Σx² / Σy² ................................................... (2.50)
Note that if we are working with cross-section data, an r² value of 0.5 may represent a good fit,
whereas for time-series data 0.5 may be too low. This means there is no hard-and-fast rule as to
how high r² should be; generally, however, the higher the value of r², the better the fit.
The adjusted coefficient of determination
One major problem with r² is that it is non-decreasing in the number of regressors; that is, it
increases (or at least does not fall) when additional variables are included in the model. For
example, for the model Yi = β0 + β1Xi let r² = 0.5; when we increase the number of variables to
Yi = β0 + β1X1 + β2X2, the r² will be at least 0.5. Hence, to arrive at a higher r², one may be
tempted to include irrelevant variables in the model: a completely nonsensical variable can be
included and r² will not fall. On top of this, note that including additional variables reduces the
degrees of freedom, and the lower the degrees of freedom, the less reliable the model. To
correct for this defect we adjust r² by taking into account the degrees of freedom, which clearly
decrease as new regressors are included in the function. The expression for the adjusted
coefficient of multiple determination is discussed in the next unit.
Example: Consider Table 2.2 and the results obtained from it. Calculate Var(β̂0), Var(β̂1) and r².
Solution: Notice that we can construct the values of the estimated Y and of the error term U.
Recall that we found Ŷi = 24.4 + 0.51Xi. Thus, for each Xi of Table 2.2, we can compute Ŷi;
once Ŷi is obtained, subtracting it from each Yi gives the estimated error term, ûi = Yi – Ŷi.

[Table: for each observation, yi², the fitted value Ŷi and the residual ûi = Yi – Ŷi; for the last observation, for example, yi² = 1521, Ŷi = 156.82 and ûi = –6.82. The column sums are used below.]
Var(β̂0) = σ̂u² ΣXi² / (nΣxi²)

Replacing σu² by σ̂u² = ΣÛi² / (n – k), where n = 10 and k = 2, we get σ̂u² = 42.16. Hence

Var(β̂0) = 42.16(322,000) / [10(33,000)] = 41.13

Var(β̂1) = σ̂u² / Σxi² = 42.16 / 33,000 = 0.0013
We can calculate r² by using (2.47), (2.48), (2.49) or (2.50). For this example we use (2.49)
and (2.50):

Using (2.49), r² = 0.51(16,800) / 8,890 = 0.96

Using (2.50), r² = (0.51)²(33,000) / 8,890 = 0.96
II. Testing the Significance of the Parameter Estimates
In addition to r², the reliability of the estimates (β̂0, β̂1) should be tested; that is, we must see
whether the estimates are statistically reliable. Since β̂0 and β̂1 are sample estimates of the
parameters β0 and β1, the significance of the parameter estimates must be examined. Note that,
given the assumption of a normally distributed error term, the distribution of the estimates β̂0
and β̂1 is also normal; that is, β̂i ~ N[E(β̂i), Var(β̂i)].
More formally,

β̂0 ~ N(β0, σu² ΣXi² / (nΣxi²))

and

β̂1 ~ N(β1, σu² / Σxi²)
n xi2
The standard test of significance is explained through the standard error test and the t-test.
a) The Standard-Error test of the Least Squares Estimates
The least squares estimates ̂ 0 and ˆ1 are obtained from a sample of observations of Y and X.
Since sampling errors are inevitable in all estimates, it is necessary to apply tests of
significance in order to measure the size of the error and determine the degree of confidence in
the validity of the estimates.
Among a number of tests in this regard, we will examine the standard error test. This test helps
us to decide whether the estimates β̂0 and β̂1 are significantly different from zero, i.e., whether
the sample from which they have been estimated might have come from a population whose
true parameters are zero (β0 = 0 and/or β1 = 0). Formally, we test the null hypothesis
H0: βi = 0 (i.e., X and Y have no relationship)
against the alternative hypothesis:
H1: βi ≠ 0 (i.e., Y and X have a relationship)
This is a two-tailed (or two sided) hypothesis. Very often such a two-sided alternative
hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about
the direction in which the alternative hypothesis should move from the null hypothesis.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
Sometimes we have a strong a priori or theoretical expectation (or expectations based on some
previous empirical work) that the alternative hypothesis is one sided or unidirectional rather
than two-sided, as just discussed.
For instance, in a consumption–income function C = β0 + β1Y one could postulate:
H0: β1 ≤ 0.3
H1: β1 > 0.3
That is, perhaps economic theory or prior empirical work suggests that the marginal propensity
to consume (β1) is greater than 0.3. [Note: students are strongly advised to refer to and grasp the
discussion in units 7 and 8 of the course Statistics for Economics.]
Recall that in order to test a hypothesis of the kind discussed above we need to make use of the Z-
and t-tests.
b) The Z-test of the least squares estimates
Recall from the Statistics for Economics course that the Z-test is applicable only if
a) the population variance is known, or
b) the population variance is unknown, and provided that the sample size is sufficiently
large (n > 30).
In econometric applications the population variance of Y is unknown. However, if we have a
large sample (n > 30) we may still use the standard normal distribution and perform the Z test.
If these conditions cannot be fulfilled, we apply the student’s t-test.
Recall that in the Statistics for Economics course we learned the formula which transforms the
value of a sample mean X̄ into t units, as shown below:

t = (X̄ – μ) / (Sx/√n), with n – 1 degrees of freedom,

where μ = value of the population mean, Sx² = sample estimate of the population variance,
and n = sample size.
Accordingly, the variable

t = (β̂i – βi) / √Var(β̂i) = (β̂i – βi) / s.e.(β̂i) ............................................... (2.51)

follows the t distribution, where βi is the hypothesized value of βi. If the computed t value, t*,
falls in the rejection region, we reject the null hypothesis, that is, we accept that the estimate β̂i is
statistically significant.

[Figure: the t distribution with the acceptance region between –tα/2 and tα/2 and rejection regions in the two tails.]

If, on the other hand, the computed t* falls inside the acceptance region (i.e., –tα/2 < t* < tα/2), we
accept H0, which implies that β̂i has an insignificant or marginal contribution to the model.
Recall that if the test is one-tailed, the rejection region is found on one side only; hence we
reject H0 if t* > tα or t* < –tα.
Note that the t-test can be performed in an approximate way by simple inspection. For (n – k) >
8, if the observed t* is greater than 2 (or smaller than –2), we reject the null hypothesis at the 5
percent level of significance. If, on the other hand, the observed t* is smaller than 2 (but greater
than –2), we accept the null hypothesis at the 5% level of significance.
Given (2.51), the sample value of t* will be greater than 2 if the relevant estimate (β̂0 or β̂1)
is at least twice its standard error. In other words, we reject the null hypothesis if

t* > 2, i.e., if β̂i > 2 s.e.(β̂i) ........................................................... (2.53)
Example: Suppose that from a sample of size n = 20 we estimate the following consumption
function:

Ĉ = 100 + 0.70Y
    (75.5) (0.21)

where the figures in brackets are the standard errors of the coefficients β̂0 = 100 and β̂1 = 0.70.
Are the estimates significant?

For β̂0:  t* = β̂0 / s.e.(β̂0) = 100 / 75.5 = 1.32

For β̂1:  t* = β̂1 / s.e.(β̂1) = 0.70 / 0.21 = 3.3
Note that for β̂0, since the calculated value (1.32) is less than the table value (2.10, the critical t
value for n – k = 18 degrees of freedom at the 5% level), we cannot reject H0: β0 = 0; thus the
estimate of β0 is insignificant. But for β̂1 the calculated value (3.3) is greater than the table value
(2.10), so we reject H0: β1 = 0, indicating that the estimated value of β1 is indeed significant for
the relationship between the two variables.
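A short Python sketch of the same calculation (added for illustration; it uses scipy only to look up the critical value, which could equally be read from a t table):

from scipy.stats import t

n, k = 20, 2
b0, se_b0 = 100, 75.5
b1, se_b1 = 0.70, 0.21

t0 = b0 / se_b0                      # about 1.32
t1 = b1 / se_b1                      # about 3.33
t_crit = t.ppf(0.975, df=n - k)      # about 2.10 (5% level, two-tailed, 18 d.f.)

print(abs(t0) > t_crit)              # False -> cannot reject H0: beta0 = 0
print(abs(t1) > t_crit)              # True  -> reject H0: beta1 = 0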
In conclusion, note that if a researcher obtains a high r² value and the estimates have low standard
errors, then the result is good. In practice, however, such an ideal situation is rare; rather, we
may have a low r² with low standard errors, or a high r² with high standard errors. There is no
general agreement among econometricians in this case, so the main issue is whether to aim for a
high r² or for low standard errors of the parameter estimates.
In general, r² is more important if the model is to be used for forecasting, whereas the standard
errors become more important when the purpose of the exercise is the explanation or analysis of
economic phenomena and the estimation of reliable values of the economic relationships.