
FECO Note 2 - Simple Linear Regression

Xuan Chinh Mai

February 3, 2018

Contents

1 Simple Linear Regression Model

2 OLS Estimator

3 Properties of the OLS Estimator

4 Goodness-of-fit

5 Limitations of the Simple Linear Regression

1 Simple Linear Regression Model

Functional form:

Y = β0 + β1 X + U    (1)

where Y is the dependent variable, β0 the intercept, β1 the slope, X the regressor, and U the error term.

Assumptions of the model:

A1 Linear in parameters

A2 Zero mean condition: E(U|X) = E(U) = 0

A3 Random sampling: {(Xi, Yi)}ᵢ₌₁ᴺ is i.i.d.

A4 Finite fourth moments (no outliers): 0 < E(X⁴) < ∞ and 0 < E(Y⁴) < ∞

A5 Homoskedasticity: Var(U|X) = σU²

From these assumptions, the conditional expectation of Y given X is a linear function of X:

E(Y|X) ≡ g(X) = β0 + β1 X

Before going further, it is useful to note two algebraic identities that will make things simpler.
∑ᵢ₌₁ᴺ (Xi − X̄) = 0    (2)

∑ᵢ₌₁ᴺ Xi (Xi − X̄) = ∑ᵢ₌₁ᴺ (Xi − X̄ + X̄)(Xi − X̄)
                 = ∑ᵢ₌₁ᴺ (Xi − X̄)² + X̄ ∑ᵢ₌₁ᴺ (Xi − X̄)
                 = ∑ᵢ₌₁ᴺ (Xi − X̄)²    [the second term vanishes by (2)]

⇒ ∑ᵢ₌₁ᴺ Xi (Xi − X̄) = ∑ᵢ₌₁ᴺ (Xi − X̄)²    (3)

2 OLS Estimator

Estimating the model by OLS gives the fitted values and residuals:

Ŷi = β̂0 + β̂1 Xi  and  Ûi = Yi − Ŷi,  ∀i ∈ {1, ..., N}    (4)


The estimators β̂0 and β̂1 are chosen to minimize the sum of squared residuals:

min_(β̂0, β̂1)  ∑ᵢ₌₁ᴺ Ûi² = ∑ᵢ₌₁ᴺ (Yi − β̂0 − β̂1 Xi)²

First order conditions:


∂/∂β̂0 :  ∑ᵢ₌₁ᴺ Ûi = 0    (5)

∂/∂β̂1 :  ∑ᵢ₌₁ᴺ Ûi Xi = 0    (6)

From 5, we have:
∑ᵢ₌₁ᴺ Ûi = 0  ⇔  β̂0 = Ȳ − β̂1 X̄    (7)

From 6, we have:
∑ᵢ₌₁ᴺ Ûi Xi = 0

⇔ ∑ᵢ₌₁ᴺ (Yi − β̂0 − β̂1 Xi)(Xi − X̄) = 0    [by (5)]

⇔ ∑ᵢ₌₁ᴺ [(Yi − Ȳ) − β̂1 (Xi − X̄)](Xi − X̄) = 0    [by (7)]

⇔ ∑ᵢ₌₁ᴺ (Yi − Ȳ)(Xi − X̄) = β̂1 ∑ᵢ₌₁ᴺ (Xi − X̄)²

⇔ β̂1 = ∑ᵢ₌₁ᴺ (Yi − Ȳ)(Xi − X̄) / ∑ᵢ₌₁ᴺ (Xi − X̄)²    (8)

From 7 and 8, we have the formulas for the OLS Estimators:


β̂0 = Ȳ − β̂1 X̄   and   β̂1 = ∑ᵢ₌₁ᴺ (Yi − Ȳ)(Xi − X̄) / ∑ᵢ₌₁ᴺ (Xi − X̄)²
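
As a numerical illustration, here is a minimal Python sketch that applies these formulas directly; the simulated data and the parameter values (β0 = 1, β1 = 2) are made up for the example, and NumPy is assumed to be available.

import numpy as np

# Hypothetical sample: true beta0 = 1 and beta1 = 2 are chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 1.0 + 2.0 * X + rng.normal(size=100)

# OLS estimates from (7) and (8)
beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Fitted values and residuals, equation (4)
Y_fit = beta0_hat + beta1_hat * X
U_hat = Y - Y_fit

print(beta0_hat, beta1_hat)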

Moreover, from 7 and 8, we can derive some other useful identities:
β̂1 = ∑ᵢ₌₁ᴺ (Yi − Ȳ)(Xi − X̄) / ∑ᵢ₌₁ᴺ (Xi − X̄)²

⇔ β̂1 = ∑ᵢ₌₁ᴺ [(β0 + β1 Xi + Ui) − Ȳ](Xi − X̄) / ∑ᵢ₌₁ᴺ (Xi − X̄)²    [by (1)]

⇔ β̂1 = β1 ∑ᵢ₌₁ᴺ Xi (Xi − X̄) / ∑ᵢ₌₁ᴺ (Xi − X̄)² + ∑ᵢ₌₁ᴺ (Xi − X̄)Ui / ∑ᵢ₌₁ᴺ (Xi − X̄)²    [by (2)]

⇔ β̂1 = β1 + ∑ᵢ₌₁ᴺ (Xi − X̄)Ui / ∑ᵢ₌₁ᴺ (Xi − X̄)²    (9)    [by (3)]

Ȳ = (1/N) ∑ᵢ₌₁ᴺ (β0 + β1 Xi + Ui) = β0 + β1 X̄ + Ū    (10)

β̂0 = Ȳ − β̂1 X̄  ⇔  β̂0 = β0 + (β1 − β̂1) X̄ + Ū    (11)    [by (10)]

3 Properties of the OLS Estimator

Remark: when dealing with a conditional expectation such as E(Ui Xi | Xi), the variables behind the conditioning bar are held fixed and can therefore be treated as constants, so we can take them out of the expectation, e.g. E(Ui Xi | Xi) = Xi E(Ui | Xi). Under the first four assumptions (A1-A4) of the Simple Linear Regression Model:
" PN # PN
 
9 i=1 (X i − X)U i
(Xi − X)E(Ui |Xi ) A2
E β̂1 Xi = E β1 + PN Xi = β1 + i=1PN = β1

2
2
i=1 (Xi − X) i=1 (Xi − X)

N
 
11
h   i 1 X A2
E β̂0 Xi = E β0 + β1 − β̂1 X + U Xi = β0 +[β1 − E (β1 |Xi )] X+ E (Ui |Xi ) = β0

N i=1
Hence, the OLS estimators of β0 and β1 are unbiased
   
E β̂0 Xi = β0 and E β̂1 Xi = β1
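
Unbiasedness can be checked numerically with a small Monte Carlo sketch in Python (the data-generating values below are made up for the illustration): averaging β̂1 over many independent samples drawn from the model should give a number close to the true β1.

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, N, reps = 1.0, 2.0, 50, 5000   # hypothetical true parameters and design

estimates = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=N)
    U = rng.normal(size=N)                   # drawn independently of X, so E(U|X) = 0 (A2)
    Y = beta0 + beta1 * X + U
    estimates[r] = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)

print(estimates.mean())                      # should be close to beta1 = 2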

Moreover, under the fifth assumption (A5), when the error term Ui is homoskedastic, the variances of these estimators are:
" PN #
 
9 (X i − X)U
i
V ar β̂1 Xi = V ar β1 + Pi=1 X i

N 2
i=1 (X i − X)
PN
(Xi − X)2 V ar(Ui |Xi )
= i=1hP i2
N 2
i=1 (X i − X)
PN
(Xi − X)2
= σU2 hP i=1 i2
N 2
i=1 (Xi − X)
  σU2 P 1 σU2
⇔ V ar β̂1 Xi = PN → (12)

i=1 (Xi − X)2 N V ar(X)
  h   i
11
V ar β̂0 Xi = V ar β0 + β1 − β̂1 X + U Xi

 
2 
= X V ar β1 − β̂1 Xi + V ar U Xi

2
  1
= X V ar β̂1 Xi + V ar (Ui |Xi )

N
2
12 2 σU 1
= X PN + σU2
i=1 (Xi − X)
2 N
PN 2 PN 2
2 i=1 X + i=1 Xi − X
= σU 2
N N
P
i=1 Xi − X
PN h 2 i
i=1 Xi − X + X − 2X Xi − X
= σU2 2
N N
P
i=1 Xi − X
PN
2
  X2 2
P σU E (X )
2
⇔ V ar β̂0 Xi = σU PN i=1 i
2
→ (13)

2
N i=1 Xi − X N V ar(X)

Therefore, when N is large, from 12 and 13 we have:

Var(β̂0 | Xi) = σU² ∑ᵢ₌₁ᴺ Xi² / [N ∑ᵢ₌₁ᴺ (Xi − X̄)²] ≈ σU² E(X²) / [N Var(X)]

and

Var(β̂1 | Xi) = σU² / ∑ᵢ₌₁ᴺ (Xi − X̄)² ≈ (1/N) σU² / Var(X)

When assumption A5 doesn't hold, in large samples the OLS estimators β̂0 and β̂1 still have a jointly normal sampling distribution, with variances:

Var(β̂1) = (1/N) Var[(Xi − µX)Ui] / (σX²)²   and   Var(β̂0) = (1/N) Var(Hi Ui) / [E(Hi²)]²,   where Hi ≡ 1 − [E(Xi) / E(Xi²)] Xi
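
These heteroskedasticity-robust variances can be estimated by replacing the population moments with sample analogues; the Python sketch below is one minimal plug-in version (the function name is my own, the residuals Ûi stand in for the unobserved Ui, and it ignores the usual finite-sample degrees-of-freedom corrections).

import numpy as np

def robust_var_beta1(X, Y):
    # Plug-in analogue of Var(beta1_hat) = Var[(Xi - muX) Ui] / (N * Var(X)^2)
    beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    U_hat = Y - beta0_hat - beta1_hat * X          # residuals replace the unknown errors
    num = np.var((X - X.mean()) * U_hat)           # sample analogue of Var[(Xi - muX) Ui]
    return num / (len(X) * np.var(X) ** 2)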

It can be seen that the variances of β̂0 and β̂1 converge to zero as N → ∞, so they are consistent: when N is large, β̂0 and β̂1 are close to β0 and β1 with high probability. The larger Var(X) is, the smaller Var(β̂1) is, which means that the more spread out the sample of X is, the easier it is to trace out the relationship between Y and X. Moreover, the smaller Var(U) is, the smaller Var(β̂1) is: if the errors U are smaller, the data scatter more tightly around the regression line, so its slope is estimated more accurately.
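
Both comparative statics can be checked with a short simulation sketch (the parameter values are illustrative only): increasing N or the spread of X should shrink the Monte Carlo variance of β̂1 roughly in line with σU² / (N · Var(X)).

import numpy as np

def sim_var_beta1(N, x_scale, reps=4000, seed=2):
    # Monte Carlo variance of beta1_hat for a given sample size and regressor spread
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        X = x_scale * rng.normal(size=N)
        Y = 1.0 + 2.0 * X + rng.normal(size=N)
        est[r] = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    return est.var()

print(sim_var_beta1(50, 1.0), sim_var_beta1(200, 1.0))   # larger N      -> smaller variance
print(sim_var_beta1(50, 1.0), sim_var_beta1(50, 2.0))    # larger Var(X) -> smaller variance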

Since U is unobserved, σU² is unknown and has to be estimated. Note that:

Var(U|X) = E(U²|X) − [E(U|X)]² = E(U²|X)    [by A2]

⇔ E(U²|X) = σU² = E(U²)    (14)

Hence, an unbiased estimator of σU² is

σ̂U² ≡ [1/(N − 2)] ∑ᵢ₌₁ᴺ Ûi²
   
then the standard errors of β̂0 and β̂1 (the estimated standard deviations of their sampling distributions) are:

SE(β̂1) ≡ σ̂U / √(∑ᵢ₌₁ᴺ (Xi − X̄)²)   and   SE(β̂0) ≡ σ̂U √[∑ᵢ₌₁ᴺ Xi² / (N ∑ᵢ₌₁ᴺ (Xi − X̄)²)]
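
Under homoskedasticity these standard errors follow directly from the residuals; here is a minimal Python sketch (the helper name is my own) for any paired arrays X, Y:

import numpy as np

def ols_standard_errors(X, Y):
    # Homoskedasticity-only standard errors of the OLS intercept and slope
    N = len(X)
    beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    U_hat = Y - beta0_hat - beta1_hat * X
    sigma2_hat = np.sum(U_hat ** 2) / (N - 2)        # unbiased estimator of sigma_U^2
    se_beta1 = np.sqrt(sigma2_hat / np.sum((X - X.mean()) ** 2))
    se_beta0 = np.sqrt(sigma2_hat * np.sum(X ** 2) / (N * np.sum((X - X.mean()) ** 2)))
    return se_beta0, se_beta1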

4 Goodness-of-fit

The goodness-of-fit of the model is measured by R-squared, which shows how well the model explains the variation in the sample. It is the ratio of the explained variation to the total variation. Define the following terms:

Total Sum of Squares:  TSS = ∑ᵢ₌₁ᴺ (Yi − Ȳ)²    (15)

Explained Sum of Squares:  ESS = ∑ᵢ₌₁ᴺ (Ŷi − Ȳ)²    (16)

Sum of Squared Residuals:  SSR = ∑ᵢ₌₁ᴺ (Yi − Ŷi)² = ∑ᵢ₌₁ᴺ Ûi²    (17)

From 15, we have:


TSS = ∑ᵢ₌₁ᴺ (Yi − Ȳ)² = ∑ᵢ₌₁ᴺ [(Yi − Ŷi) + (Ŷi − Ȳ)]²
    = ∑ᵢ₌₁ᴺ (Yi − Ŷi)² + ∑ᵢ₌₁ᴺ (Ŷi − Ȳ)² + 2 ∑ᵢ₌₁ᴺ (Yi − Ŷi)(Ŷi − Ȳ)

where:
∑ᵢ₌₁ᴺ (Yi − Ŷi)(Ŷi − Ȳ) = ∑ᵢ₌₁ᴺ Ûi (Ŷi − Ȳ) = −Ȳ ∑ᵢ₌₁ᴺ Ûi + ∑ᵢ₌₁ᴺ Ûi (β̂0 + β̂1 Xi)
                        = −Ȳ ∑ᵢ₌₁ᴺ Ûi + β̂0 ∑ᵢ₌₁ᴺ Ûi + β̂1 ∑ᵢ₌₁ᴺ Ûi Xi = 0    [by (5) and (6)]

hence,
∑ᵢ₌₁ᴺ (Yi − Ȳ)² = ∑ᵢ₌₁ᴺ (Yi − Ŷi)² + ∑ᵢ₌₁ᴺ (Ŷi − Ȳ)²

⇔ TSS = ESS + SSR    (18)

Then the formula for R-squared is:

R² = ESS / TSS = 1 − SSR / TSS,    0 ≤ R² ≤ 1    (19)
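
The decomposition in (18) and formula (19) translate directly into code; a brief Python sketch (the helper name is my own, and Y_fit is assumed to come from an OLS fit with an intercept, as required for the decomposition to hold):

import numpy as np

def r_squared(Y, Y_fit):
    # R^2 via the decomposition TSS = ESS + SSR
    tss = np.sum((Y - Y.mean()) ** 2)        # total sum of squares, (15)
    ess = np.sum((Y_fit - Y.mean()) ** 2)    # explained sum of squares, (16)
    ssr = np.sum((Y - Y_fit) ** 2)           # sum of squared residuals, (17)
    return 1.0 - ssr / tss                   # equals ess / tss when TSS = ESS + SSR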

5 Limitations of the Simple Linear Regression

The Simple Linear Regression (SLR) model performs poorly in certain cases, especially when
one of its assumptions doesn’t hold:

• The SLR estimated by OLS does poorly when there are outliers (A4 doesn't hold)

• OLS doesn't necessarily imply a causal effect, so without further assumptions the meaning of the coefficients should be interpreted carefully

• The linear regression does poorly at very low/high levels of the regressor, since the fit depends on the spread of the sample

• A negative intercept should be interpreted with caution
