Lecture 8
OLS Asymptotics
• Consider the population model:
• y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u
• So far we have seen that, given a sample size n, the OLS estimator β̂ⱼ
has a sampling distribution
• The distribution comes from the estimates from all possible samples of size 𝑛
• The distribution is centred around βⱼ, since under the relevant assumptions
E(β̂ⱼ) = βⱼ
• What happens when n → ∞?
• First we go through the concept of consistency of an
estimator
Definition: Consistent Estimator
• We will define this for a general case.
• Consider an estimator θ̂ₙ of a population parameter θ. The estimator θ̂ₙ is
obtained from a sample of size n.
• The estimator θ̂ₙ is a consistent estimator of the parameter θ if θ̂ₙ converges to θ
as n → ∞
• Note that θ̂ₙ is a random variable and has a probability distribution based on different samples
of size n
• So how do we define convergence in this case?
• We define θ̂ₙ to be a consistent estimator of θ if, for all ε > 0,
P(|θ̂ₙ − θ| ≥ ε) → 0 as n → ∞.
• This is denoted as θ̂ₙ →ᵖ θ or plim θ̂ₙ = θ.
• We also call θ the probability limit of θ̂ₙ
• This means that the probability that the random variable θ̂ₙ is very different from θ
(that is, |θ̂ₙ − θ| is large) tends to 0 as n tends to infinity
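This definition can be illustrated numerically. The sketch below (a hypothetical example, taking the sample mean as the estimator θ̂ₙ of the mean θ = 2 of an exponential population, with ε = 0.1) estimates P(|θ̂ₙ − θ| ≥ ε) by repeated sampling for increasing n:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0   # true population mean (exponential with scale 2)
eps = 0.1     # tolerance epsilon in the consistency definition
reps = 2000   # number of samples drawn at each sample size

probs = []
for n in [10, 100, 1000, 10000]:
    # draw `reps` samples of size n and compute theta_hat_n for each sample
    samples = rng.exponential(scale=theta, size=(reps, n))
    theta_hat = samples.mean(axis=1)
    # estimated P(|theta_hat_n - theta| >= eps)
    probs.append(float(np.mean(np.abs(theta_hat - theta) >= eps)))

print(probs)  # the estimated probabilities shrink towards 0 as n grows
```

The same pattern holds for any consistent estimator: for a fixed ε, the probability of a deviation of at least ε vanishes as n grows.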
OLS estimators are consistent estimators
• It can be shown that under the same assumptions as unbiasedness, β̂ⱼ is a
consistent estimator of βⱼ.
• Thus plim β̂ⱼ = βⱼ
• The multivariate case is a little more difficult, so we prove it for the simple
regression model (one regressor).
Interpretation of consistency
• Given a sample size n, the OLS estimator β̂ⱼ has a sampling distribution with
mean βⱼ
• Consistency implies that this distribution becomes more and more tightly
concentrated around βⱼ
• In the limit, the distribution of β̂ⱼ collapses to a single point, βⱼ
• Assumptions for consistency:
• Assumption 1: Linear in parameters: The population model can be written as:
y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u
• Assumption 2: Random Sampling: We have a random sample {(yᵢ, xᵢ): i = 1, …, n}
following the above population model.
• Assumption 3: No Perfect Collinearity: In the sample (and therefore in the
population), none of the variables is a constant and there is no exact linear
relationship between the variables.
• Assumption 4: Zero Mean and Zero Correlation: cov(xⱼ, u) = 0 and E(u) = 0
• For the simple regression model, the OLS slope estimator can be decomposed as
β̂₁ = β₁ + [Σᵢ(xᵢ − x̄)uᵢ] / [Σᵢ(xᵢ − x̄)²]
• Note that β₁ is a constant. Thus we need to find the probability limit of the second term as n →
∞.
Weak Law of Large Numbers (WLLN)
• Before proceeding further we use a theorem which implies that, as n tends to infinity,
we can replace sample averages by their population counterparts.
• Thus the numerator and the denominator of the second term converge to their respective population quantities
• This is called the weak law of large numbers
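A simulation sketch of this argument (a hypothetical simple regression with assumed values β₀ = 1, β₁ = 0.5 and uniform, non-normal errors): by the WLLN the sample covariance and sample variance in the slope formula converge to cov(x, u) = 0 and var(x), so β̂₁ settles down at β₁ as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5              # hypothetical true parameters

estimates = []
for n in [50, 500, 5000, 50000]:
    x = rng.normal(size=n)
    u = rng.uniform(-1.0, 1.0, size=n)   # non-normal errors with E(u) = 0, cov(x, u) = 0
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    # OLS slope = sample cov(x, y) / sample var(x); the WLLN drives both
    # averages to their population counterparts, so b1_hat -> beta1
    b1_hat = (xd @ y) / (xd @ xd)
    estimates.append(float(b1_hat))

print(estimates)  # the estimates cluster ever closer to 0.5 as n increases
```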
• (iii) (β̂ⱼ − βⱼ) / [σ/√(SSTⱼ(1 − Rⱼ²))] ~ᵃ N(0, 1)
• (β̂ⱼ − βⱼ) / [σ̂/√(SSTⱼ(1 − Rⱼ²))] ~ᵃ N(0, 1)
• This asymptotic normality is useful because the normality assumption has been
dropped.
• What the theorem given in the last slide says is that, regardless of the population
distribution of u, the OLS estimators, when properly standardized, have approximate
standard normal distributions.
• Notice how the standardized β̂ⱼ has an asymptotic standard normal distribution
whether we divide the difference β̂ⱼ − βⱼ by sd(β̂ⱼ) (which we do not observe because
it depends on σ²) or by se(β̂ⱼ) (which we can compute from our data because it
depends on σ̂²).
• In other words, from an asymptotic point of view it does not matter that we have to
replace σ² with σ̂².
• Of course, replacing σ² with σ̂² affects the exact distribution of the standardized β̂ⱼ − βⱼ.
• We saw earlier that under the classical linear model assumptions, (β̂ⱼ − βⱼ)/sd(β̂ⱼ) has an exact
Normal(0, 1) distribution conditional on X, and (β̂ⱼ − βⱼ)/se(β̂ⱼ) has an exact tₙ₋ₖ₋₁ distribution conditional on X.
• However, the asymptotic distributions are the same.
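A quick simulation check of this asymptotic normality (a hypothetical simple regression with an assumed slope β₁ = 0.5 and deliberately skewed, non-normal errors): if the standardized estimator is approximately N(0, 1), then |z| should fall below 1.96 in roughly 95% of samples.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta1 = 200, 5000, 0.5

z = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.exponential(1.0, size=n) - 1.0   # skewed, clearly non-normal errors with mean 0
    y = beta1 * x + u
    xd = x - x.mean()
    b1_hat = (xd @ y) / (xd @ xd)            # OLS slope
    resid = (y - y.mean()) - b1_hat * xd     # OLS residuals
    sigma2_hat = (resid @ resid) / (n - 2)   # estimate of sigma^2
    se = np.sqrt(sigma2_hat / (xd @ xd))     # se(b1_hat) = sigma_hat / sqrt(SST_x)
    z[r] = (b1_hat - beta1) / se             # standardized estimator

coverage = float(np.mean(np.abs(z) < 1.96))
print(coverage)  # should be close to 0.95 if z is approximately N(0, 1)
```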
Large Sample Inference
• OLS asymptotically normal ⇒ in large samples we can use the usual t
and F statistics for inference without assuming normality of errors
• E.g.
• H₀: βⱼ = αⱼ, H₁: βⱼ ≠ αⱼ at significance level α
• Under assumptions 1-5:
• t-statistic: t = (β̂ⱼ − αⱼ)/se(β̂ⱼ) ~ᵃ N(0, 1) under H₀, where se(β̂ⱼ) = σ̂/√(SSTⱼ(1 − Rⱼ²))
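A worked sketch of this large-sample test (a hypothetical simple regression with an assumed true slope of 0.5, testing the true null H₀: β₁ = 0.5 with non-normal errors): the t-statistic is compared to the standard normal, and the two-sided p-value uses Φ(z) = (1 + erf(z/√2))/2.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, beta1, alpha_j = 500, 0.5, 0.5    # H0: beta1 = 0.5 happens to be true here

x = rng.normal(size=n)
u = rng.uniform(-1.0, 1.0, size=n)   # non-normal errors; normality is not assumed
y = beta1 * x + u

xd = x - x.mean()
b1_hat = (xd @ y) / (xd @ xd)                        # OLS slope
resid = (y - y.mean()) - b1_hat * xd                 # OLS residuals
se = math.sqrt((resid @ resid) / (n - 2) / (xd @ xd))  # se(b1_hat)

t = (b1_hat - alpha_j) / se
# two-sided p-value from the standard normal CDF, Phi(z) = (1 + erf(z/sqrt(2))) / 2
p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
print(t, p)
```

Since H₀ is true in this setup, the p-value exceeds any conventional significance level in most simulated samples, so we would usually (and correctly) fail to reject.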