FECO Note 2 - Simple Linear Regression: Xuan Chinh Mai
February 3, 2018
Contents

1 Simple Linear Regression Model
2 OLS Estimator
4 Goodness-of-fit
1 Simple Linear Regression Model
Functional form:
$$\underbrace{Y}_{\text{Dependent Variable}} = \underbrace{\beta_0}_{\text{Intercept}} + \underbrace{\beta_1}_{\text{Slope}}\underbrace{X}_{\text{Regressor}} + \underbrace{U}_{\text{Error Term}} \qquad (1)$$
The model relies on the following assumptions:

A1 Linearity in parameters
A2 Zero conditional mean of the error term: $E(U \mid X) = 0$
A3 $(X_i, Y_i)$, $i = 1, \dots, N$, are independently and identically distributed
A4 Finite fourth moments (no large outliers): $0 < E(X^4) < \infty$ and $0 < E(Y^4) < \infty$
A5 Homoskedasticity: $Var(U_i \mid X_i) = \sigma_U^2$
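As a concrete illustration of the functional form (1) and assumptions A1-A5, here is a minimal Python sketch that simulates such a sample; the parameter values, sample size, and distributions are arbitrary choices for the example, not taken from the note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative values (not from the note)
beta0, beta1, sigma_u, N = 2.0, 0.5, 1.0, 200

X = rng.normal(loc=10.0, scale=3.0, size=N)      # regressor
U = rng.normal(loc=0.0, scale=sigma_u, size=N)   # error term: E(U|X) = 0, homoskedastic
Y = beta0 + beta1 * X + U                        # functional form (1)
```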
Before going further, it is useful to establish a few algebraic identities that will simplify the derivations below.
$$\sum_{i=1}^{N}\left(X_i - \bar{X}\right) = 0 \qquad (2)$$
\begin{align*}
\sum_{i=1}^{N} X_i\left(X_i - \bar{X}\right) &= \sum_{i=1}^{N}\left(X_i - \bar{X} + \bar{X}\right)\left(X_i - \bar{X}\right) \\
&= \sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 + \bar{X}\sum_{i=1}^{N}\left(X_i - \bar{X}\right) \\
&\overset{(2)}{=} \sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2
\end{align*}
$$\Rightarrow \sum_{i=1}^{N} X_i\left(X_i - \bar{X}\right) = \sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 \qquad (3)$$
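Identities (2) and (3) are easy to confirm numerically; the following Python sketch checks them on an arbitrary simulated sample.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=50)
Xbar = X.mean()

# Identity (2): deviations from the mean sum to zero
print(np.isclose(np.sum(X - Xbar), 0.0))
# Identity (3): sum of X_i*(X_i - Xbar) equals the sum of squared deviations
print(np.isclose(np.sum(X * (X - Xbar)), np.sum((X - Xbar) ** 2)))
```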
2 OLS Estimator
The OLS estimators $(\hat{\beta}_0, \hat{\beta}_1)$ minimize the sum of squared residuals $\sum_{i=1}^{N}\hat{U}_i^2$, where $\hat{U}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$; the first-order conditions (5) and (6) require $\sum_{i=1}^{N}\hat{U}_i = 0$ and $\sum_{i=1}^{N}\hat{U}_i X_i = 0$. From (5), we have:
$$\sum_{i=1}^{N}\hat{U}_i = 0 \;\Leftrightarrow\; \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \qquad (7)$$
From (6), we have:
\begin{align*}
\sum_{i=1}^{N}\hat{U}_i X_i &= 0 \\
\overset{(5)}{\Leftrightarrow}\quad \sum_{i=1}^{N}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)\left(X_i - \bar{X}\right) &= 0 \\
\overset{(7)}{\Leftrightarrow}\quad \sum_{i=1}^{N}\left[Y_i - \bar{Y} - \hat{\beta}_1\left(X_i - \bar{X}\right)\right]\left(X_i - \bar{X}\right) &= 0 \\
\Leftrightarrow\quad \sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right) &= \hat{\beta}_1\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2
\end{align*}
$$\Leftrightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \qquad (8)$$
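A minimal sketch of formulas (7) and (8) in Python, with the result checked against numpy's built-in least-squares fit; the data-generating values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = rng.normal(10.0, 3.0, size=N)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

# Equation (8): slope estimator
beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
# Equation (7): intercept estimator
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Compare with numpy's least-squares fit (coefficients returned highest degree first)
slope, intercept = np.polyfit(X, Y, deg=1)
print(np.allclose([beta1_hat, beta0_hat], [slope, intercept]))
```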
Moreover, from (7) and (8) we can derive some further useful expressions:
\begin{align*}
\hat{\beta}_1 &= \frac{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \\
\overset{(1)}{\Leftrightarrow}\quad \hat{\beta}_1 &= \frac{\sum_{i=1}^{N}\left[\left(\beta_0 + \beta_1 X_i + U_i\right) - \bar{Y}\right]\left(X_i - \bar{X}\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \\
\overset{(2)}{\Leftrightarrow}\quad \hat{\beta}_1 &= \beta_1\frac{\sum_{i=1}^{N}X_i\left(X_i - \bar{X}\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} + \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)U_i}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \\
\overset{(3)}{\Leftrightarrow}\quad \hat{\beta}_1 &= \beta_1 + \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)U_i}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \qquad (9)
\end{align*}

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N}\left(\beta_0 + \beta_1 X_i + U_i\right) = \beta_0 + \beta_1\bar{X} + \bar{U} \qquad (10)$$

\begin{align*}
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1\bar{X} \\
\overset{(10)}{\Leftrightarrow}\quad \hat{\beta}_0 &= \beta_0 + \left(\beta_1 - \hat{\beta}_1\right)\bar{X} + \bar{U} \qquad (11)
\end{align*}
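Since the errors are observed in a simulation, the decomposition (9) can be verified directly; a Python sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta0, beta1 = 100, 2.0, 0.5
X = rng.normal(10.0, 3.0, size=N)
U = rng.normal(0.0, 1.0, size=N)
Y = beta0 + beta1 * X + U

beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)

# Equation (9): beta1_hat = beta1 + sum((X_i - Xbar) U_i) / sum((X_i - Xbar)^2)
sampling_error = np.sum((X - X.mean()) * U) / np.sum((X - X.mean()) ** 2)
print(np.isclose(beta1_hat, beta1 + sampling_error))
```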
Remark: when dealing with a conditional expectation such as $E(U_i X_i \mid X_i)$, the variables after the conditioning bar are held fixed and can therefore be treated as constants, so they can be taken out of the expectation operator, e.g. $E(U_i X_i \mid X_i) = X_i E(U_i \mid X_i)$. Under the first four assumptions (A1-A4) of the Simple Linear Regression Model:
\begin{align*}
E\left(\hat{\beta}_1 \mid X_i\right) &\overset{(9)}{=} E\left[\beta_1 + \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)U_i}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \,\middle|\, X_i\right] = \beta_1 + \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)E\left(U_i \mid X_i\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \overset{A2}{=} \beta_1 \\
E\left(\hat{\beta}_0 \mid X_i\right) &\overset{(11)}{=} E\left[\beta_0 + \left(\beta_1 - \hat{\beta}_1\right)\bar{X} + \bar{U} \,\middle|\, X_i\right] = \beta_0 + \left[\beta_1 - E\left(\hat{\beta}_1 \mid X_i\right)\right]\bar{X} + \frac{1}{N}\sum_{i=1}^{N}E\left(U_i \mid X_i\right) \overset{A2}{=} \beta_0
\end{align*}
Hence, the OLS estimators of $\beta_0$ and $\beta_1$ are unbiased:
$$E\left(\hat{\beta}_0 \mid X_i\right) = \beta_0 \quad \text{and} \quad E\left(\hat{\beta}_1 \mid X_i\right) = \beta_1$$
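A small Monte Carlo sketch of the unbiasedness result, holding X fixed across replications so that the simulation conditions on X; the design is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta0, beta1, reps = 50, 2.0, 0.5, 5000
X = rng.normal(10.0, 3.0, size=N)          # fixed design: condition on X

b0_draws, b1_draws = [], []
for _ in range(reps):
    U = rng.normal(0.0, 1.0, size=N)       # E(U|X) = 0 (A2)
    Y = beta0 + beta1 * X + U
    b1 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    b1_draws.append(b1)
    b0_draws.append(b0)

# Averages of the estimates should be close to the true beta0 and beta1
print(np.mean(b0_draws), np.mean(b1_draws))
```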
Moreover, under the fifth assumption (A5), when the error term $U_i$ is homoskedastic, the variances of these estimators are:
\begin{align*}
Var\left(\hat{\beta}_1 \mid X_i\right) &\overset{(9)}{=} Var\left[\beta_1 + \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)U_i}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \,\middle|\, X_i\right] \\
&= \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 Var\left(U_i \mid X_i\right)}{\left[\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2\right]^2} \\
&= \sigma_U^2\,\frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}{\left[\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2\right]^2}
\end{align*}
$$\Leftrightarrow Var\left(\hat{\beta}_1 \mid X_i\right) = \frac{\sigma_U^2}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \;\overset{P}{\longrightarrow}\; \frac{1}{N}\frac{\sigma_U^2}{Var(X)} \qquad (12)$$
\begin{align*}
Var\left(\hat{\beta}_0 \mid X_i\right) &\overset{(11)}{=} Var\left[\beta_0 + \left(\beta_1 - \hat{\beta}_1\right)\bar{X} + \bar{U} \,\middle|\, X_i\right] \\
&= \bar{X}^2\, Var\left(\beta_1 - \hat{\beta}_1 \mid X_i\right) + Var\left(\bar{U} \mid X_i\right) \\
&= \bar{X}^2\, Var\left(\hat{\beta}_1 \mid X_i\right) + \frac{1}{N}Var\left(U_i \mid X_i\right) \\
&\overset{(12)}{=} \bar{X}^2\,\frac{\sigma_U^2}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} + \frac{1}{N}\sigma_U^2 \\
&= \sigma_U^2\,\frac{\sum_{i=1}^{N}\bar{X}^2 + \sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}{N\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \\
&\overset{(2)}{=} \sigma_U^2\,\frac{\sum_{i=1}^{N}\left[\left(X_i - \bar{X}\right)^2 + \bar{X}^2 + 2\bar{X}\left(X_i - \bar{X}\right)\right]}{N\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}
\end{align*}
$$\Leftrightarrow Var\left(\hat{\beta}_0 \mid X_i\right) = \sigma_U^2\,\frac{\sum_{i=1}^{N}X_i^2}{N\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \;\overset{P}{\longrightarrow}\; \frac{\sigma_U^2\, E\left(X^2\right)}{N\, Var(X)} \qquad (13)$$
and
$$Var\left(\hat{\beta}_1 \mid X_i\right) = \frac{\sigma_U^2}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \approx \frac{1}{N}\frac{\sigma_U^2}{Var(X)}$$
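Continuing the same fixed-design Monte Carlo, the simulated variances of the estimators can be compared with the homoskedastic formulas (12) and (13); again, the design is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta0, beta1, sigma_u, reps = 50, 2.0, 0.5, 1.0, 20000
X = rng.normal(10.0, 3.0, size=N)          # fixed design: condition on X
Sxx = np.sum((X - X.mean()) ** 2)

b0_draws, b1_draws = [], []
for _ in range(reps):
    U = rng.normal(0.0, sigma_u, size=N)   # homoskedastic errors (A5)
    Y = beta0 + beta1 * X + U
    b1 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    b1_draws.append(b1)
    b0_draws.append(b0)

# Equation (12): Var(beta1_hat | X) = sigma_u^2 / sum (X_i - Xbar)^2
print(np.var(b1_draws), sigma_u ** 2 / Sxx)
# Equation (13): Var(beta0_hat | X) = sigma_u^2 * sum X_i^2 / (N * sum (X_i - Xbar)^2)
print(np.var(b0_draws), sigma_u ** 2 * np.sum(X ** 2) / (N * Sxx))
```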
When assumption A5 does not hold, the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ still have a jointly normal sampling distribution in large samples, but the variances of that distribution no longer simplify to (12) and (13) and must be estimated with heteroskedasticity-robust formulas.
It can be seen that the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ converge to zero as $N \to \infty$, so the estimators are consistent: when $N$ is large, $\hat{\beta}_0$ and $\hat{\beta}_1$ are close to $\beta_0$ and $\beta_1$ with high probability. The larger $Var(X)$ is, the smaller $Var(\hat{\beta}_1)$ is; in other words, the more spread out the sample of $X$ is, the easier it is to trace out the relationship between $Y$ and $X$. Moreover, the smaller $Var(U)$ is, the smaller $Var(\hat{\beta}_1)$ is: if the errors $U$ are smaller, the data scatter more tightly around the regression line, so its slope can be estimated more accurately.
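A brief sketch of the consistency claim: as N grows, the slope estimate settles near the true β1 (the parameters below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1 = 2.0, 0.5

for N in (50, 500, 5000, 50000):
    X = rng.normal(10.0, 3.0, size=N)
    Y = beta0 + beta1 * X + rng.normal(0.0, 1.0, size=N)
    b1 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    print(N, b1)   # estimates approach beta1 = 0.5 as N increases
```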
4 Goodness-of-fit
The goodness-of-fit of the model is measured by $R^2$ (R-squared), which shows how well the model explains the variation of the dependent variable in the sample. It is the ratio of the explained variation to the total variation. Define the following terms:
$$\text{Total Sum of Squares: } TSS = \sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 \qquad (15)$$

$$\text{Explained Sum of Squares: } ESS = \sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y}\right)^2 \qquad (16)$$

$$\text{Sum of Squared Residuals: } SSR = \sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{N}\hat{U}_i^2 \qquad (17)$$
Note that the cross term between the residuals and the explained deviations is zero:
\begin{align*}
\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)\left(\hat{Y}_i - \bar{Y}\right) &= \sum_{i=1}^{N}\hat{U}_i\left(\hat{Y}_i - \bar{Y}\right) = -\bar{Y}\sum_{i=1}^{N}\hat{U}_i + \sum_{i=1}^{N}\hat{U}_i\left(\hat{\beta}_0 + \hat{\beta}_1 X_i\right) \\
&\overset{(5),(6)}{=} -\bar{Y}\sum_{i=1}^{N}\hat{U}_i + \hat{\beta}_0\sum_{i=1}^{N}\hat{U}_i + \hat{\beta}_1\sum_{i=1}^{N}\hat{U}_i X_i = 0
\end{align*}
hence,
$$\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 = \sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2 + \sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y}\right)^2 \;\Leftrightarrow\; TSS = ESS + SSR \qquad (18)$$
so that $R^2 = ESS/TSS = 1 - SSR/TSS$.
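A Python sketch computing TSS, ESS, SSR and R² for a simulated sample, confirming the decomposition (18); the data-generating values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
X = rng.normal(10.0, 3.0, size=N)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

b1 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

TSS = np.sum((Y - Y.mean()) ** 2)       # (15)
ESS = np.sum((Y_hat - Y.mean()) ** 2)   # (16)
SSR = np.sum((Y - Y_hat) ** 2)          # (17)

print(np.isclose(TSS, ESS + SSR))       # decomposition (18)
print("R-squared:", ESS / TSS)
```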
The Simple Linear Regression (SLR) model performs poorly in certain cases, especially when one of its assumptions doesn't hold:
• SLR estimated by OLS does poorly when there are outliers (A4 doesn't hold)
• OLS does not by itself imply a causal effect, so without further assumptions the coefficients should be interpreted carefully
• The linear regression does poorly at very low or very high levels of the regressor, since the quality of the fit depends on the spread of the sample