CH 11 Slides

ECO 227Y1 Foundations of Econometrics
Kuan Xu
University of Toronto
[email protected]
April 3, 2024
Kuan Xu (UofT) ECO 227 April 3, 2024 1 / 41

Ch 11 Linear Models and Estimation by Least Squares
1 Introduction
2 Linear Statistical Models
3 The Method of Least Squares
4 Properties of the Least-Squares Estimators: Simple Linear Regression
5 Inferences Concerning the Parameters βi

Introduction (1)
In this chapter, we attempt to explain the estimation of a population

relationship between Y and X (or X1 , X2 , . . . , Xn ) in a simple (or
multiple) linear regression model, which goes beyond the previous
univariate analysis on the mean (proportion) or their unconditional
differences.
We further extend the method of least squares briefly introduced in
Ch 9.
Here, Y is called the dependent variable whereas X ’s are called
independent variables. The expressions with in(dependence) are not
in(dependence) in the probability sense. They are merely the names
referring to what can be changed and what will follow those changes.

Introduction (2)
The linear regression models are based on the deterministic linear

models:
y = β0 + β1 x
or
y = β0 + β1 x1 + β2 x2 + · · · + βk xk .
The linear regression models take into account some random
variations (ε) of the relationship modeled by their deterministic
counterparts:
Y = β0 + β1 x + ε
or
Y = β0 + β1 x1 + β2 x2 + · · · + βk xk + ε.

Introduction (3)
Fig. 11.1, p. 564
Figure: Plot of Data and Underlying Relationship

Introduction (4)
In view of the above figure, we might write the model for the data as
E (Y ) = β0 + β1 x
or, equivalently,
Y = β0 + β1 x + ε.
We may add the distribution assumption to the above model.
Fig. 11.2, p. 565
Figure: Graph of Y = β0 + β1 x
Introduction (5)
The natural and social science may post many different theories. For
example, in economics, we postulate that the level of wage is a
function of the level of education attainment as a theory; that is,
wage = β0 + β1 edu + ε,
where β1 > 0. Using the model, we are able to estimate β1 and test
H0 : β1 = β10 against Ha : β1 < β10 .
If the above model is not refuted by the data, we may use it for
decision making and prediction.

Linear Statistical Models (1)
The meaning of “linear” in linear statistical models should be clarified.

In the context of E (Y ) = β0 + β1 x, the word “linear” refer to the
fact that E (Y ) is the linear function of β0 and β1 , not of x.
Let us look at some other linear statistical models:
E (W ) = β0 + β1 ln(x),
where W = ln(Y ).
E (Y ) = β0 + β2 x + β3 x 2 .
E (Y ) = β0 + β1 x1 + β2 x2 .
E (Y ) = β0 + β1 x1 + β2 x2 + β3 x1 x2 + β4 x12 + β5 x22 .

Fig. 11.3, p. 567
Figure: E (Y ) = β0 + β2 x1 + β3 x2

Fig. 11.4, p. 568
Figure: E (Y ) = β0 + β1 x1 + β2 x2 + β3 x1 x2 + β4 x12 + β5 x22

A Linear Statistical Model

A linear statistical model relating a random response Y to a set of
independent variables x1 , x2 , . . . , xk is of the form
Y = β0 + β1 x1 + β2 x2 + · · · + βk xk + ε,
where β0 , β1 , . . . , βk are unknown parameters, ε is a random variable, and

the variables x1 , x2 , . . . , xk assume known values. We also assume that
E (ε) = 0 and V (ε) = σ 2 . Hence,
E (Y ) = β0 + β1 x1 + β2 x2 + · · · + βk xk .
For given data yi , x1i , x2i , . . . , xki , where i = 1, 2, . . . , n, we can use the
method of least squares to derive estimates for β0 , β1 , . . . , βk .
Note that, for given values of xj (j = 1, 2, . . . , k), V (Y ) = V (ε) = σ 2 .

The Method of Least Squares (1)
We show how to use the method of least squares to estimate the simple
linear regression model Y = β0 + β1 x + ε. Assume that we can find some
estimators for β0 and β1 , β̃0 and β̃1 . Then we will calculate the predicted
value of Y and the error of the estimation. For i = 1, 2, . . . , n,
ỹi = β̃0 + β̃1 xi ,
and
ε̃i = yi − ỹi = yi − (β̃0 + β̃1 xi ).
We can define the sum of squared errors as
n
X n
X
ε̃2i = [yi − (β̃0 + β̃1 xi )]2 .
i=1 i=1
The method of least squares is to solve the following problem:
n
X
min ε̃2i .
β̃0 ,β̃1
i=1

We can use the following figure to show the idea of the method of least
squares.
Fig. 11.5, p. 569

Because ni=1 ε̃2i is a quadratic

P
function that has the minimum, we can
take the derivatives of ni=1 ε̃2i with respect to β̃0 and β̃1 , respectively, and
P
set the derivatives to zero to obtain the first-order conditions (f.o.c.’s):1
Pn 2 n n
∂ i=1 ε̃i
X X
= −2 [yi − (β̂0 + β̂1 xi )] = 0. ⇒ ε̂i = 0.
∂ β̃0 i=1 i=1
Pn 2 n n
∂ i=1 ε̃i
X X
= −2 [yi − (β̂0 + β̂1 xi )]xi = 0. ⇒ ε̂i xi = 0.
∂ β̃1 i=1 i=1
These two f.o.c. are also called the least-squares equations or normal
equations.
1
Note that β̃j becomes β̂j for j = 0, 1. And β̂0 and β̂0 are the optimal and unique
estimators.
Solve the two f.o.c. for β̂0 and β̂1 .
X X
β̂0 n + β̂1 xi = yi
i i
X X X
β̂0 xi + β̂1 xi2 = xi yi
i i i
det(Ai )
Using the Cramer’s rule—for Aβ = q, the solution is βi = det(A) —to find
β̂1 .
P P P
n i xi yi − i xi i yi
β̂1 =
n i xi2 − i xi i xi
P P P
P 1P P
i xi yi − n i xi i yi
= P 2 1 P 2
x − n ( i xi )
P i i
i (x − x̄)(yi − ȳ )
= Pi 2
i (xi − x̄)

Solving for β̂0 is straightforward; just write the sample regression model as
yi = β̂0 + β̂1 xi + ε̂i .

P
Sum yi for all i = 1, 2, . . . , n and divided i yi by n to get
ȳ = β̂0 + β̂1 x̄ + ε̄,

P
where ε̄ = 0 because i ε̂i = 0 according to the f.o.c. Therefore, we have
β̂0 = ȳ − β̂1 x̄.

Let Sxy = i (xi − x̄)(yi − ȳ ) and Sxx = i (xi − x̄)2 . We can summarize
P P
our derivations of the least-squares estimators as follows.
Least-Squares Estimators for the Simple Linear Regression Model
Sxy
(a) β̂1 = Sxx .
(b) β̂0 = ȳ − β̂1 ȳ .

Example:
Find β̂0 and β̂1 and the fitted regression line ŷi given the data:
Table 11.1, p. 571

Solution:
Table 11.2, p. 571

Solution (continued):
1P
7 − 15 (0)(5)
P P
i xi yi − n i xi i yi
β̂1 = P 2 1 P 2
= 1 2
= .7.
x
i i − n ( i x i ) 10 − n (0)
5
β̂0 = ȳ − β̂1 x̄ = − (.7)(0) = 1.
5
For i = 1, 2, . . . , n,
ŷi = β̂0 + β̂i xi = 1 + .7xi .

Solution (continued): We can draw ŷi through the data as shown below.
Fig. 11.6, p. 572

Properties of the Least-Squares Estimators: Simple Linear
Regression (1)
We now evaluate the properties of β̂1 and β̂0 .
P
First, focus on E (β̂1 ) = β1 and E (β̂0 ) = β0 . Because i (xi − x̄) = 0, we
can write P
i (xi − x̄)Yi
β̂1 = .
Sxx
P
i (xi − x̄)Yi
E (β̂1 ) = E
Sxx
P
(x
i i − x̄)E (Yi )
=
Sxx
P
i (x i − x̄)(β0 + β1 xi )
=
Sxx
P
X (xi − x̄)xi
= β0 (xi − x̄) + β1 i
Sxx
i
Sxx
= β1 = β1 .
Sxx
Regression (2)
Because β̂0 = Ȳ − β̂1 x̄, we have
E (β̂0 ) = E (Ȳ ) − E (β̂1 )x̄

= β0 + β1 x̄ − β1 x̄
= β0 .
Second, we focus on V (β̂1 ), V (β̂0 ), and Cov (β̂0 , β̂1 ) P

i (xi −x̄)Yi
It is rather straightforward to derive V (β̂1 ). Recall β̂1 = Sxx .
P
i (xi − x̄)Yi
V (β̂1 ) = V
Sxx
2 X
1
= V ((xi − x̄)Yi )
Sxx
i

Regression (3)
2 X
1
V (β̂1 ) = (xi − x̄)2 V (Yi )
Sxx
i
σ2
= . ∵ V (Yi ) = σ 2 .
Sxx
It will take more steps to derive V (β̂0 ). We start from β̂0 = Ȳ − β̂1 x̄.
V (β̂0 ) = V (Ȳ ) +x̄ 2 V (β̂1 ) − 2x̄ Cov (Ȳ , β̂1 ) .

O O
| {z } | {z }
A B
OA : V (Ȳ ) = V (ε̄) = 1
n2
P
i V (εi ) = n1 V (εi ) = σ2
n .

Regression (4)
OB : Rewrite β̂ 1 =
P
i (xi −x̄)Yi
Sxx =
P
i ci Yi , where ci = xi −x̄
Sxx and
P
i ci = 0.
Then, we have
X
1 X
C (Ȳ , β̂1 ) = Cov Yi , ci Yi
n
X ci X X cj
= V (Yi ) + Cov (Yi , Yj )
n n
i̸=j
X ci
2
= σ + 0, ∵ Yi and Yj are independent
n
= 0.
O O
Using the results in A and B , we get
σ2 σ2 x̄ 2

1
V (β̂0 ) = + x̄ 2 = σ2 +
n Sxx n Sxx

Regression (5)
Sxx + nx̄ 2 σ 2 xi2

P
V (β̂0 ) = σ 2 = ,
nSxx nSxx
because Sxx = (xi − x̄)2 =
P 2
xi − nx̄ 2 .
P
With the results we have had so far, it is rather straightforward to show
Cov (β̂0 , β̂1 ) by applying Theorem 5.12.
Cov (β̂0 , β̂1 ) = Cov (Ȳ − β̂1 x̄, β̂1 )

= Cov (Ȳ , β̂1 ) − Cov (β̂1 x̄, β̂1 )
= 0 − x̄V (β̂1 )
−x̄σ 2
= .
Sxx
It is interesting to note that when x̄ = 0, Cov (β̂0 , β̂1 ) = 0.

Regression (6)
σ2
P 2
xi σ2
Note that we have so far derived V (β̂0 ) = nSxx , V (β̂1 ) = Sxx , and
−x̄σ 2
Cov (β̂0 , β̂1 ) =Sxx ,
where σ2is unknown and must be estimated. For
this simple regression,, the unbiased estimator for σ 2 is
n n
1 X 1 X
S2 = (Yi − Ŷi )2 = [Yi − (β̂0 + β̂1 xi )]2 .
n−2 n−2
i=1 i=1
Here, SSE = ni=1 (Yi − Ŷi )2 is used to denote the sum of squared errors.
P
It can be shown
(n − 2)S 2
∼ χ2 (n − 2).
σ2

Regression (7)
Because of the abovePresult, we have E (S 2 ) = σ 2 . We can substitute S 2

σ2 xi2 2 2
for σ 2 in V (β̂0 ) = nS xx
, V (β̂1 ) = Sσxx , and Cov (β̂0 , β̂1 ) = −x̄σ
S xx
to
S2
P 2
xi S2 −x̄S 2
obtain V̂ (β̂0 ) = nSxx , V̂ (β̂1 ) = Sxx , and Cov (β̂0 , β̂1 ) = Sxx .
d
To simplify the notation, we adopt the following convention for this
course.
S 2 = 0.36667 S = 0.60553
P 2
c00 = xi /(nSxx ) = .2 Var (β̂0 ) = σ 2 c00 V̂ (β̂0 ) = S 2 c00 = (0.2708)2
c11 = 1/Sxx = .1 V (β̂1 ) = σ 2 c11 V̂ (β̂1 ) = S 2 c11 = (0.19149)2
c01 = −x̄/Sxx = 0 Cov (β̂0 , β̂1 ) = σ 2 c01 d (β̂0 , β̂1 ) = S 2 c01 = 0
Cov
Note that the data in the previous example are used to get the values of S 2 , S, c00 , c11 , c01 , V̂ (β̂0 ), V̂ (β̂1 ), and Cov
d (β̂0 , β̂1 ).

Regression (8)
We use R to estimate the model in the previous example using lm

command.
> # data
> x = c(-2, -1, 0, 1, 2)
> y = c(0, 0, 1, 1, 3)
> # sample size n
> n = length(y)
> # plot data
> #plot(x,y)
> # estimate model
> my.model = lm(y~x)
> # report my.model results
> summary(my.model)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.000 0.271 3.69 0.034 *
x 0.700 0.191 3.66 0.035 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.606 on 3 degrees of freedom

Multiple R-squared: 0.817,Adjusted R-squared: 0.756
F-statistic: 13.4 on 1 and 3 DF, p-value: 0.0354

Regression (9)
3.0
2.5
2.0
1.5
y
1.0
0.5
0.0
−2 −1 0 1 2
Figure: Plot for the Example Generated by R

Regression (10)
> # get residuals u

> u = my.model$residuals
> # calculate S^2
> s2 = sum(u^2)/(n-2)
> s2
[1] 0.36667
> s = sqrt(s2)
> s
[1] 0.60553
> # Sxx
> Sxx = sum((x-mean(x))^2)
> Sxx
[1] 10
> # c00
> c00 = sum(x^2)/(n*Sxx)
> c00
[1] 0.2
> # c11
> c11 = 1/Sxx
> c11
[1] 0.1
> # c01
> c01 = -mean(x)/Sxx
> c01
[1] 0

Regression (11)
> # Vhat.beta0
> Vhat.beta0 = s2*c00
> Vhat.beta0
[1] 0.073333
> Std.err.beta0 = sqrt(Vhat.beta0)
> Std.err.beta0
[1] 0.2708
> # Vhat.beta1
> Vhat.beta1 = s2*c11
> Vhat.beta1
[1] 0.036667
> Std.err.beta1 = sqrt(Vhat.beta1)
> Std.err.beta1
[1] 0.19149
> # Cov.hat.beta0.beta1
> Cov.hat.beta0.beta1 = s2*c01
> Cov.hat.beta0.beta1
[1] 0

Inferences Concerning the Parameters βi (1)
Test of Hypothesis for βi
H0 : βi = βi0 .

βi > βi0 ,

Ha : βi < βi0 ,

βi ̸= βi0 .

β̂i − βi0
Test statistic: T = √ .
S cii

t > tα ,

Rejection region: t < −tα ,

|t| > tα/2 .

Note that tα is the critical value of t such that P(t > tα ) = α based on n − 2 df.
> summary(my.model)
Coefficients:
(Intercept) 1.000 0.271 3.69 0.034 *
x 0.700 0.191 3.66 0.035 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


> # Test H_0: beta_i = 0 versus H_a: beta_i neq 0
> # extract b0 and b1
> b0 = my.model$coefficients[1]
> b1 = my.model$coefficients[2]
> # T test statistic
> T1 = (b1-0)/(s*sqrt(c11))
> T1
x
3.6556
> p.value.T1 = 2*pt(T1,n-2,lower.tail = FALSE)
> p.value.T1
x
0.035353
> T0 = (b0-0)/(s*sqrt(c00))
> T0
(Intercept)
3.6927
> p.value.T0 = 2*pt(T0,n-2,lower.tail = FALSE)
> p.value.T0
(Intercept)
0.034451
> # get t.half.alpha
> sig = .05
> t.half.alpha = qt(sig/2, n-2,lower.tail = FALSE)
> t.half.alpha
[1] 3.1824
> isTRUE(abs(T1)> t.half.alpha)
[1] TRUE
> # reject H_0
> isTRUE(abs(T0)> t.half.alpha)
[1] TRUE
> # reject H_0

A 100(1 − α)% Confidence Interval for βi

√
β̂i ± tα/2 S cii ,
where i = 0, 1.
> # calculate the 100*(1-alpha)% confidence interval for beta_i

> lower.bound.b1 = b1-t.half.alpha*s*sqrt(c11)
> lower.bound.b1
x
0.090608
> upper.bound.b1 = b1+t.half.alpha*s*sqrt(c11)
> upper.bound.b1
x
1.3094
> lower.bound.b0 = b0-t.half.alpha*s*sqrt(c00)
> lower.bound.b0
(Intercept)
0.13819
> upper.bound.b0 = b0+t.half.alpha*s*sqrt(c00)
> upper.bound.b0
(Intercept)
1.8618

Goodness of Fit
For the simple linear regression model Y = β0 + β1 x + ε, we can estimate
the model using data {yi , xi }ni=1 . We list three ways of measuring
goodness of fit.
(ŷ −ȳ )2
P
1. R squared: R 2 = Pi i 2.
i (yi −ȳ )
V̂ (ŷ ) Sŷ2
2. Multiple R squared: Multiple R 2 = V̂ (y )
= Sy2
.
(y −ŷ )2 /(n−k)
P
3. Adjusted R squared: Adjusted R 2 = 1 − Pi (yi −ȳi )2 /(n−1) , where k = 2
i i
for the simple linear regression model and n = # of observations.
Remarks: Note that (1) and (2) are the same.

> summary(my.model)
Coefficients:
(Intercept) 1.000 0.271 3.69 0.034 *
x 0.700 0.191 3.66 0.035 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> # R squares of various kinds

> # get yhat
> yhat = my.model$fitted.values
> # Multiple R-squared
> Multiple.R.squared = var(yhat)/var(y)
> Multiple.R.squared
[1] 0.81667
> # R-squared
> R.squared = sum((yhat-mean(y))^2)/sum((y-mean(y))^2)
> R.squared
[1] 0.81667
> # Adjusted R-squared
> Adjusted.R.squared = 1 - (sum((y-yhat)^2)/(n-2))/(sum((y-mean(y))^2)/(n-1))
> Adjusted.R.squared
[1] 0.75556
> # Recall my.model has Multiple R-squared: 0.817,Adjusted R-squared: 0.756

F Test for Overall Significance of the Simple Linear Regression Model

For the simple linear regression model Y = β0 + β1 x + ε, we can estimate
the model using data {yi , xi }ni=1 . We can test
H0 : β 0 = β 1 = 0
against
Ha : At least one of βi′ s is not zero.
(ŷi − ȳ )2 /(k − 1)
P
Test statistic: F = P i 2
∼ F (k − 1, n − k)
i (yi − ŷi ) /(n − k)
Rejection region: RR = {F > Fα },
where P(F > Fα ) = α, k = 2 for the simple linear regression model and
n = # of observations.
> # F test for overall signficance of my.model

> F.test = (sum((yhat-mean(y))^2)/(2-1))/(sum((y-yhat)^2)/(n-2)) # follows F(k-1=1,n-k=3)
> F.test
[1] 13.364
> p.value.F.test = pf(F.test,1,3,lower.tail = FALSE)
> p.value.F.test
[1] 0.035353
> # Recall my.model has F-statistic: 13.4 on 1 and 3 DF, p-value: 0.0354

The End

CH 11 Slides

Uploaded by

Copyright:

Available Formats

CH 11 Slides

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CH 11 Slides

Uploaded by

Copyright:

Available Formats

ECO 227Y1 Foundations of Econometrics

Kuan Xu (UofT) ECO 227 April 3, 2024 1 / 41

2 Linear Statistical Models

3 The Method of Least Squares

4 Properties of the Least-Squares Estimators: Simple Linear Regression

5 Inferences Concerning the Parameters βi

Kuan Xu (UofT) ECO 227 April 3, 2024 2 / 41

In this chapter, we attempt to explain the estimation of a population

Kuan Xu (UofT) ECO 227 April 3, 2024 3 / 41

The linear regression models are based on the deterministic linear

Kuan Xu (UofT) ECO 227 April 3, 2024 4 / 41

Fig. 11.1, p. 564

Figure: Plot of Data and Underlying Relationship

Kuan Xu (UofT) ECO 227 April 3, 2024 5 / 41

Fig. 11.2, p. 565

Kuan Xu (UofT) ECO 227 April 3, 2024 7 / 41

The meaning of “linear” in linear statistical models should be clarified.

Kuan Xu (UofT) ECO 227 April 3, 2024 8 / 41

Fig. 11.3, p. 567

Kuan Xu (UofT) ECO 227 April 3, 2024 9 / 41

Fig. 11.4, p. 568

Figure: E (Y ) = β0 + β1 x1 + β2 x2 + β3 x1 x2 + β4 x12 + β5 x22

Kuan Xu (UofT) ECO 227 April 3, 2024 10 / 41

A Linear Statistical Model

where β0 , β1 , . . . , βk are unknown parameters, ε is a random variable, and

Kuan Xu (UofT) ECO 227 April 3, 2024 11 / 41

Kuan Xu (UofT) ECO 227 April 3, 2024 12 / 41

Fig. 11.5, p. 569

Kuan Xu (UofT) ECO 227 April 3, 2024 13 / 41

Because ni=1 ε̃2i is a quadratic

Kuan Xu (UofT) ECO 227 April 3, 2024 15 / 41

yi = β̂0 + β̂1 xi + ε̂i .

ȳ = β̂0 + β̂1 x̄ + ε̄,

β̂0 = ȳ − β̂1 x̄.

Kuan Xu (UofT) ECO 227 April 3, 2024 16 / 41

Kuan Xu (UofT) ECO 227 April 3, 2024 17 / 41

Table 11.1, p. 571

Kuan Xu (UofT) ECO 227 April 3, 2024 18 / 41

Table 11.2, p. 571

Kuan Xu (UofT) ECO 227 April 3, 2024 19 / 41

Kuan Xu (UofT) ECO 227 April 3, 2024 20 / 41

Fig. 11.6, p. 572

Kuan Xu (UofT) ECO 227 April 3, 2024 21 / 41

E (β̂0 ) = E (Ȳ ) − E (β̂1 )x̄

Second, we focus on V (β̂1 ), V (β̂0 ), and Cov (β̂0 , β̂1 ) P

Kuan Xu (UofT) ECO 227 April 3, 2024 23 / 41

V (β̂0 ) = V (Ȳ ) +x̄ 2 V (β̂1 ) − 2x̄ Cov (Ȳ , β̂1 ) .

Kuan Xu (UofT) ECO 227 April 3, 2024 24 / 41

Kuan Xu (UofT) ECO 227 April 3, 2024 25 / 41

Sxx + nx̄ 2 σ 2 xi2

Cov (β̂0 , β̂1 ) = Cov (Ȳ − β̂1 x̄, β̂1 )

It is interesting to note that when x̄ = 0, Cov (β̂0 , β̂1 ) = 0.

Kuan Xu (UofT) ECO 227 April 3, 2024 27 / 41

Because of the abovePresult, we have E (S 2 ) = σ 2 . We can substitute S 2

Kuan Xu (UofT) ECO 227 April 3, 2024 28 / 41

We use R to estimate the model in the previous example using lm

Residual standard error: 0.606 on 3 degrees of freedom

Kuan Xu (UofT) ECO 227 April 3, 2024 29 / 41

Figure: Plot for the Example Generated by R