
Lecture 8

OLS Asymptotics
• Consider the population model:
• $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$
• Till now we have seen that, given a sample size n, the OLS estimator $\hat{\beta}_j$ has a sampling distribution
• The distribution comes from the estimates from all possible samples of size $n$
• The distribution is centred around $\beta_j$, since under the relevant assumptions $E(\hat{\beta}_j) = \beta_j$
• What happens as $n \to \infty$?
• First we go through the concept of consistency for an estimator.
Definition: Consistent Estimator
• We will define this for a general case.
• Consider an estimator $\hat{\theta}_n$ of a population parameter $\theta$, obtained from a sample of size n.
• The estimator $\hat{\theta}_n$ is a consistent estimator of $\theta$ if $\hat{\theta}_n$ converges to $\theta$ as $n \to \infty$.
• Note that $\hat{\theta}_n$ is a random variable, with a probability distribution generated by the different possible samples of size n.
• How, then, do we define convergence in this case?
• We define $\hat{\theta}_n$ to be a consistent estimator of $\theta$ if, for all $\varepsilon > 0$,
$P(|\hat{\theta}_n - \theta| \geq \varepsilon) \to 0$ as $n \to \infty$.
• This is denoted as $\hat{\theta}_n \to_p \theta$ or $\text{plim}\, \hat{\theta}_n = \theta$.
• We also call $\theta$ the probability limit of $\hat{\theta}_n$.
• This means that the probability that the random variable $\hat{\theta}_n$ is very different from $\theta$ (that is, that $|\hat{\theta}_n - \theta|$ is large) tends to 0 as n tends to infinity.
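• To make this concrete, here is a minimal simulation sketch (the estimator, the exponential distribution, $\theta = 2$, and $\varepsilon = 0.1$ are illustrative choices, not from the lecture): take $\hat{\theta}_n$ to be the sample mean and watch $P(|\hat{\theta}_n - \theta| \geq \varepsilon)$ shrink as n grows.

```python
import numpy as np

# Consistency demo: theta_hat_n = sample mean of Exponential(scale=2) draws,
# whose probability limit is theta = 2. We estimate P(|theta_hat_n - theta| >= eps)
# by Monte Carlo for increasing n. (Distribution, theta, and eps are illustrative.)
rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 10_000

for n in [10, 100, 1_000, 10_000]:
    draws = rng.exponential(scale=theta, size=(reps, n))
    theta_hat = draws.mean(axis=1)                    # one estimate per sample
    prob = np.mean(np.abs(theta_hat - theta) >= eps)  # Monte Carlo probability
    print(f"n = {n:6d}: P(|theta_hat - theta| >= {eps}) ~ {prob:.3f}")
```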
OLS estimators are consistent estimators
• It can be shown that, under the same assumptions as for unbiasedness, $\hat{\beta}_j$ is a consistent estimator of $\beta_j$.
• Thus $\text{plim}\, \hat{\beta}_j = \beta_j$
• The multivariate case is a little more difficult, so we prove the result for the simple regression model (one regressor).
Interpretation of consistency
• Given a sample size n, the OLS estimator $\hat{\beta}_j$ has a sampling distribution with mean $\beta_j$
• Consistency implies that this distribution becomes more and more tightly concentrated around $\beta_j$ as n grows
• In the limit, the distribution of $\hat{\beta}_j$ collapses to the single point $\beta_j$
• Assumptions for consistency:
• Assumption 1: Linear in Parameters: The population model can be written as:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$
• Assumption 2: Random Sampling: We have a random sample $\{(y_i, x_i): i = 1, \ldots, n\}$ following the above population model.
• Assumption 3: No Perfect Collinearity: In the sample (and therefore in the population), none of the independent variables is constant and there is no exact linear relationship between the variables.
• Assumption 4: Zero Mean and Zero Correlation: $\text{cov}(x_j, u) = 0$ for $j = 1, \ldots, k$, and $E(u) = 0$

• Under Assumptions 1-4, $\hat{\beta}_j$ is a consistent estimator of $\beta_j$. A simulation sketch of this tightening is given below.
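• One way to visualize consistency is to simulate the sampling distribution of $\hat{\beta}_1$ at several sample sizes and watch it concentrate around $\beta_1$. This is a hypothetical sketch; the population values ($\beta_0 = 1$, $\beta_1 = 0.5$, normal $x$ and $u$) are illustrative, chosen so that Assumptions 1-4 hold.

```python
import numpy as np

# Sampling distribution of the OLS slope at increasing n.
# Illustrative population model: y = 1 + 0.5*x + u, u independent of x,
# so Assumptions 1-4 hold and plim(beta1_hat) = 0.5.
rng = np.random.default_rng(1)
beta0, beta1, reps = 1.0, 0.5, 5_000

for n in [25, 250, 2_500]:
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)            # E(u) = 0, cov(x, u) = 0
        y = beta0 + beta1 * x + u
        # OLS slope: cov(x, y) / var(x), both with 1/n normalization
        estimates[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    print(f"n = {n:5d}: mean = {estimates.mean():.4f}, sd = {estimates.std():.4f}")
```

• The standard deviation across replications falls roughly like $1/\sqrt{n}$, while the mean stays at $\beta_1$.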


Proof: OLS Estimators are consistent estimators
• We will prove the result for the simple regression model; it can be generalized to the multivariate case.
• Consider the model: $y = \beta_0 + \beta_1 x + u$
• In this case:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})\,y_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \beta_1 + \frac{\sum_{i=1}^{n}(x_i - \bar{x})\,u_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

$$= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\,u_i}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

$$= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} x_i u_i - \bar{x}\bar{u}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$

$$= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} x_i u_i - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)\left(\frac{1}{n}\sum_{i=1}^{n} u_i\right)}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2} \quad \ldots\ldots (1)$$

• Note that $\beta_1$ is a constant. Thus we need to find the probability limit of the second term as $n \to \infty$.
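• As a sanity check on the algebra, one can verify decomposition (1) numerically on simulated data. This is a sketch with made-up values ($\beta_0 = 1$, $\beta_1 = 0.5$, normal $x$ and $u$); the identity itself holds for any sample.

```python
import numpy as np

# Numerical check of decomposition (1) on one simulated sample
# (beta0 = 1, beta1 = 0.5, and normal errors are illustrative choices).
rng = np.random.default_rng(2)
n, beta0, beta1 = 1_000, 1.0, 0.5
x = rng.normal(size=n)
u = rng.normal(size=n)
y = beta0 + beta1 * x + u

# Left-hand side: the usual centered OLS slope formula.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Right-hand side of (1): beta1 plus the sample-moment ratio.
num = np.mean(x * u) - x.mean() * u.mean()
den = np.mean(x ** 2) - x.mean() ** 2
print(beta1_hat, beta1 + num / den)   # the two numbers agree (up to float rounding)
```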
Weak Law of Large Numbers (WLLN)
• Before proceeding further, we use a theorem which implies that, as n tends to infinity, we can replace the sample averages by their population quantities.
• Thus the numerator and the denominator converge to their respective population quantities
• This is called the weak law of large numbers

• Weak Law of Large Numbers (WLLN): Let $Z_1, Z_2, \ldots, Z_n$ be independent, identically distributed random variables with expectation $\mu$, so $E(Z_1) = E(Z_2) = \cdots = E(Z_n) = \mu$. Assume $|\mu| < \infty$. Then

$$\text{plim}\; \frac{1}{n}\sum_{i=1}^{n} Z_i = \mu$$
• Using the WLLN, (1) can be written as:

$$\text{plim}\; \hat{\beta}_1 = \beta_1 + \frac{E(xu) - E(x)E(u)}{E(x^2) - (E(x))^2} = \beta_1 + \frac{\text{cov}(x, u)}{\text{var}(x)}$$
• Zero conditional mean assumption:
$E(u \mid x) = 0 \quad \ldots (2)$
• Whenever (2) holds:
$E(xu) = E[E(xu \mid x)] = E[x\,E(u \mid x)] = 0 \quad \ldots (3)$
$E(u) = E[E(u \mid x)] = 0 \quad \ldots (4)$
• Under (3) and (4):
$\text{cov}(x, u) = E(xu) - E(x)E(u) = 0$
• Note that (2) implies (3) and (4). Thus whenever (2) holds, (3) and (4) must necessarily hold.
• However, (2) need not hold whenever (3) and (4) hold true.
• Thus (3) and (4) are weaker conditions than (2).
• Note that we do not need the stronger condition (2) for consistency. If we assume (3) and (4):
$\text{plim}\; \hat{\beta}_1 = \beta_1$
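• The formula $\text{plim}\, \hat{\beta}_1 = \beta_1 + \text{cov}(x,u)/\text{var}(x)$ also shows what happens when the zero-correlation condition fails. A hypothetical sketch (the jointly normal design and $\text{cov}(x,u) = 0.3$ are illustrative numbers): with $\text{var}(x) = 1$, $\hat{\beta}_1$ converges to $\beta_1 + 0.3$ rather than $\beta_1$.

```python
import numpy as np

# When cov(x, u) != 0, beta1_hat converges to beta1 + cov(x,u)/var(x).
# Here x and u are jointly normal with cov(x, u) = 0.3 and var(x) = 1
# (illustrative values), so the probability limit is 0.5 + 0.3 = 0.8.
rng = np.random.default_rng(3)
beta0, beta1, cov_xu = 1.0, 0.5, 0.3
cov_matrix = [[1.0, cov_xu], [cov_xu, 1.0]]

for n in [100, 10_000, 1_000_000]:
    x, u = rng.multivariate_normal([0, 0], cov_matrix, size=n).T
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    print(f"n = {n:8d}: beta1_hat = {b1:.4f}  (plim = {beta1 + cov_xu:.1f})")
```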
Asymptotic Normality and Large Sample Inference
• For hypothesis testing, we need the sampling distribution of the OLS
estimators.
• Under the classical linear model assumptions (including the normality
assumptions), we can show that the sampling distributions are normal.
• This result is the basis for deriving the t and F distributions that we use so often in applied
econometrics.
• If the errors $u_1, u_2, \ldots, u_n$ are random draws from some distribution other than the normal, the $\hat{\beta}_j$ will not be normally distributed, which means that the procedure used previously for hypothesis testing would not work.
• The normality assumption also implies that the distribution of $y$ given $x_1, x_2, \ldots, x_k$ is normal.
• Because y is observed and u is not, in a particular application, it is much easier to think
about whether the distribution of y is likely to be normal.
• In fact, in many cases, y might not have a conditional normal distribution.
• Suppose the dependent variable is the number of arrests of young men during
a particular year.
• In the population, most men are not arrested during the year, and the vast majority are
arrested one time at the most.
• Because the dependent variable takes on only two values for most of the sample, it
cannot be close to being normally distributed in the population.
• Does this mean that, in such analyses, we must abandon the t statistics for
determining which variables are statistically significant?
• Fortunately, the answer to this question is no.
• Even though the 𝑦𝑖 are not from a normal distribution, we can conclude that the OLS
estimators satisfy asymptotic normality under some conditions, which means they are
approximately normally distributed in large enough sample sizes.
Asymptotic normality
• Let $\{Z_n : n = 1, 2, \ldots\}$ be a sequence of random variables such that, for all numbers $z$,
$P(Z_n < z) \to \Phi(z)$ as $n \to \infty$
• where $\Phi(z)$ is the standard normal cumulative distribution function.
• Then $Z_n$ is said to have an asymptotic standard normal distribution.
• In this case, we often write $Z_n \overset{a}{\sim} N(0,1)$ (the "a" above the tilde stands for "asymptotically").
• Assumptions for asymptotic normality:
• Assumption 1: Linear in Parameters: The population model can be written as:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$
• Assumption 2: Random Sampling: We have a random sample $\{(y_i, x_i): i = 1, \ldots, n\}$ following the above population model.
• Assumption 3: No Perfect Collinearity: In the sample (and therefore in the population), none of the independent variables is constant and there is no exact linear relationship between the variables.
• Assumption 4: Zero Conditional Mean: $E(u \mid x_1, x_2, \ldots, x_k) = 0$
• Assumption 5: Homoskedasticity: $\text{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$
• Under the assumptions given in the last slide:

• (i) $\sqrt{n}\,(\hat{\beta}_j - \beta_j) \overset{a}{\sim} N(0,\, \sigma^2/a_j^2)$, where $\sigma^2/a_j^2$ is the asymptotic variance of $\sqrt{n}\,(\hat{\beta}_j - \beta_j)$; for the slope coefficients, $a_j^2 = \text{plim}\left(n^{-1}\sum_{i=1}^{n} \hat{r}_{ij}^2\right)$, where the $\hat{r}_{ij}$ are the residuals from the regression of $x_j$ on the other independent variables.

• (ii) $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$.

• (iii) $\dfrac{\hat{\beta}_j - \beta_j}{\sigma/\sqrt{SST_j\,(1 - R_j^2)}} \overset{a}{\sim} N(0,1)$ and $\dfrac{\hat{\beta}_j - \beta_j}{\hat{\sigma}/\sqrt{SST_j\,(1 - R_j^2)}} \overset{a}{\sim} N(0,1)$
• This asymptotic normality is useful because the normality assumption has been
dropped.
• What the theorem given in the last slide says is that, regardless of the population
distribution of u, the OLS estimators, when properly standardized, have approximate
standard normal distributions.
• Notice how the standardized $\hat{\beta}_j$ has an asymptotic standard normal distribution whether we divide the difference $\hat{\beta}_j - \beta_j$ by $sd(\hat{\beta}_j)$ (which we do not observe because it depends on $\sigma^2$) or by $se(\hat{\beta}_j)$ (which we can compute from our data because it depends on $\hat{\sigma}^2$).
• In other words, from an asymptotic point of view it does not matter that we have to replace $\sigma^2$ with $\hat{\sigma}^2$.
• Of course, replacing $\sigma^2$ with $\hat{\sigma}^2$ affects the exact distribution of the standardized $\hat{\beta}_j - \beta_j$.
• We saw earlier that under the classical linear model assumptions, $\dfrac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j \mid X)}$ has an exact $N(0,1)$ distribution and $\dfrac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j \mid X)}$ has an exact $t_{n-k-1}$ distribution.
• However, the asymptotic distribution is the same in both cases.
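• A quick numerical sketch of this convergence (using standard library quantile functions; $k = 2$ regressors and $\alpha = 0.05$ are illustrative choices): the exact $t_{n-k-1}$ critical values approach the $N(0,1)$ critical value as n grows.

```python
from scipy.stats import t, norm

# Exact t critical values approach the N(0,1) critical value as n grows
# (k = 2 regressors and alpha = 0.05 are illustrative).
alpha, k = 0.05, 2
for n in [10, 30, 120, 1_000]:
    df = n - k - 1
    print(f"n = {n:5d}: t critical = {t.ppf(1 - alpha / 2, df):.3f}, "
          f"z critical = {norm.ppf(1 - alpha / 2):.3f}")
```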
Large Sample Inference
• OLS asymptotically normal ⇒ in large samples we can use the usual t
and F statistics for inference without assuming normality of errors
• E.g.
• $H_0: \beta_j = \alpha_j$, $H_1: \beta_j \neq \alpha_j$, at significance level $\alpha$
• Under Assumptions 1-5, the t-statistic satisfies (under $H_0$):
$t = \dfrac{\hat{\beta}_j - \alpha_j}{\hat{\sigma}/\sqrt{SST_j\,(1 - R_j^2)}} \overset{a}{\sim} N(0,1)$

• Reject $H_0$ if $|t|$ is greater than the critical value corresponding to the $100(1 - \alpha/2)$th percentile of the standard normal distribution
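• A simulation sketch of this procedure (the model $y = 1 + 0.5x + u$ and the recentered chi-square errors are illustrative, not from the lecture): even with skewed, non-normal errors, a t test based on the normal critical value rejects a true $H_0$ at close to the nominal 5% rate in large samples.

```python
import numpy as np

# Large-sample t test with non-normal (chi-square) errors.
# Illustrative model: y = 1 + 0.5*x + u, u = chi2(1) - 1 so E(u) = 0 but u is skewed.
# We test the true H0: beta1 = 0.5 at alpha = 0.05 using the N(0,1) critical value
# 1.96 and check that the rejection rate is close to 5%.
rng = np.random.default_rng(4)
beta0, beta1, n, reps, crit = 1.0, 0.5, 1_000, 5_000, 1.96

rejections = 0
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.chisquare(df=1, size=n) - 1.0
    y = beta0 + beta1 * x + u
    sxx = np.sum((x - x.mean()) ** 2)          # SST_1 (one regressor, R_1^2 = 0)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)  # consistent estimator of sigma^2
    se_b1 = np.sqrt(sigma2_hat / sxx)
    t_stat = (b1 - beta1) / se_b1
    rejections += abs(t_stat) > crit

print(f"Rejection rate under true H0: {rejections / reps:.3f} (nominal 0.05)")
```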
Appendix
Central limit theorem
• Let $Y_1, Y_2, \ldots, Y_n$ be iid random variables with mean $\mu$ and variance $\sigma^2$. Then

$$Z_n = \frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{Y}_n - \mu)}{\sigma}$$

• has an asymptotic standard normal distribution
• Practically, we often treat $\bar{Y}_n$ as being asymptotically normally distributed with mean $\mu$ and variance $\sigma^2/n$
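• A quick sketch of the CLT in action (exponential draws with $\mu = \sigma = 1$ are an illustrative choice): $P(Z_n \leq z)$ is close to $\Phi(z)$ even though the $Y_i$ themselves are far from normal.

```python
import numpy as np
from scipy.stats import norm

# CLT demo: Y_i ~ Exponential(1), so mu = 1 and sigma = 1 (illustrative choice).
# We compare the empirical P(Z_n <= z) with Phi(z) for a few z values.
rng = np.random.default_rng(5)
mu, sigma, n, reps = 1.0, 1.0, 500, 100_000

y = rng.exponential(scale=mu, size=(reps, n))
z_n = np.sqrt(n) * (y.mean(axis=1) - mu) / sigma   # standardized sample means

for z in [-1.96, -1.0, 0.0, 1.0, 1.96]:
    print(f"z = {z:5.2f}: P(Z_n <= z) ~ {np.mean(z_n <= z):.3f}, "
          f"Phi(z) = {norm.cdf(z):.3f}")
```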
