Assignment3SolNew_Fall2024 (1)
Sahar Parsa
Fall 2024
The solution of this assignment will be released on Friday, September 27th 2024. You will work on algebraic
properties of the OLS estimators. You are encouraged to discuss your solutions with your colleagues on
Piazza. However, you should also work on your own solutions, since it will be great practice for your exam.
It’s also good practice to report regression results cleanly, rather than simply copying down the tables generated by R.
Be clear and concise in your answers.
Question 1
Define the three assumptions to derive the OLS estimator properties. Choose one and discuss whether it is a
plausible assumption. Give a specific example where the assumption might be violated.
Answer
The three assumptions to derive the properties of the OLS estimator are
1. Mean independence: E[εᵢ | Xᵢ] = 0
2. The sample is i.i.d.
3. Finite fourth moments, namely E[Xᵢ⁴] < ∞ and E[Yᵢ⁴] < ∞
We also saw one more assumption in class, i.e., we need variation in X. Without variation in X, the β is not
well defined as β = Cov(Y, X)/V ar(X) and if V ar(X) = 0, then β is undefined. We will focus on the three
main assumptions in this question.
In many cases the mean independence assumption is violated. A classic example is when researchers run regressions of wages on explanatory variables, for instance

Earningsᵢ = α + β × Educᵢ + εᵢ.

Unobserved ability is part of εᵢ, and more able workers tend to acquire more education, so E[εᵢ | Educᵢ] ≠ 0 and the assumption fails.
Question 2
In your assignment, you are asked to show that

σ̂²_Z = (Σᵢ₌₁ᴺ Ẑᵢ²) / (N − 2)

is an unbiased and consistent estimator for σ²_Z.
Answer
In this question, we will accept intuitive answers.
We know from the estimator of the variance of a random variable that we need to divide by N − 1 to get an unbiased estimator of the variance. This is because we lose one degree of freedom: we must first estimate the population mean in order to estimate the population variance. When N is large, it hardly matters. But let’s show this result formally:
E[Σᵢ₌₁ᴺ (Xᵢ − X̄)²] = E[Σᵢ₌₁ᴺ Xᵢ² − N X̄²]
                  = Σᵢ₌₁ᴺ E[Xᵢ²] − N E[X̄²]
                  = N(σ²_X + µ²_X) − N(σ²_X/N + µ²_X)
                  = (N − 1) σ²_X
Hence

(Σᵢ₌₁ᴺ (Xᵢ − X̄)²) / (N − 1)

would be an unbiased estimator of σ²_X.
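The degrees-of-freedom correction above can be checked with a quick simulation. This is an illustrative sketch in Python (the assignment itself uses R); the population parameters are made up for the example:

```python
import numpy as np

# Monte Carlo check: draw many samples of size N from a population with
# known variance, and average the divide-by-(N-1) variance estimator.
rng = np.random.default_rng(0)
mu, sigma, N, reps = 2.0, 3.0, 10, 200_000
samples = rng.normal(mu, sigma, size=(reps, N))
s2 = samples.var(axis=1, ddof=1)   # ddof=1 divides by N-1
print(s2.mean())                   # close to sigma**2 = 9
```

The average of the estimator across repeated samples lands on σ² = 9, confirming unbiasedness; with `ddof=0` (dividing by N) it would be biased downward by the factor (N − 1)/N.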
The principle is the same for the variance of εi . Note that
σ̂²_ε = (Σᵢ₌₁ᴺ ε̂ᵢ²) / (N − 2) = (Σᵢ₌₁ᴺ (Yᵢ − α̂ − β̂Xᵢ)²) / (N − 2)
where both α̂ and β̂ are estimated through OLS, so we lose two degrees of freedom when we estimate σ̂ε2 .
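The same logic can be checked numerically for σ̂²_ε: in a simulated regression with known error variance, dividing the residual sum of squares by N − 2 should be approximately unbiased. A Python sketch with invented parameters:

```python
import numpy as np

# Sketch: verify that sum(resid^2)/(N-2) is approximately unbiased for
# the error variance in a simple OLS regression (simulated data).
rng = np.random.default_rng(1)
alpha, beta, sigma_eps, N, reps = 1.0, 2.0, 1.5, 20, 20_000
est = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, N)
    y = alpha + beta * x + rng.normal(0, sigma_eps, N)
    b, a = np.polyfit(x, y, 1)          # OLS slope and intercept
    resid = y - (a + b * x)
    est[r] = (resid ** 2).sum() / (N - 2)
print(est.mean())                        # close to sigma_eps**2 = 2.25
```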
Question 3
Show that
SST = SSE + SSR
Σᵢ (Yᵢ − Ȳ)² = Σᵢ (Ŷᵢ − Ȳ)² + Σᵢ (Yᵢ − Ŷᵢ)² + 2 Σᵢ (Yᵢ − Ŷᵢ)(Ŷᵢ − Ȳ)
It just remains to show that the last term on the right hand side is zero.
Noting that Ŷi = α̂ + β̂Xi , Ȳ = α̂ + β̂ X̄, and that Yi − Ŷi = (Yi − Ȳ ) − (Ŷi − Ȳ ) we have that the last term
is equal to
2 Σᵢ [(Yᵢ − Ȳ) − β̂(Xᵢ − X̄)] · β̂(Xᵢ − X̄)
This is equal to
2β̂ Σᵢ [(Xᵢ − X̄)(Yᵢ − Ȳ) − β̂(Xᵢ − X̄)²]

Substituting β̂ = Σⱼ(Xⱼ − X̄)(Yⱼ − Ȳ) / Σⱼ(Xⱼ − X̄)²:

2β̂ ( Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) − Σᵢ (Xᵢ − X̄)² · [Σⱼ(Xⱼ − X̄)(Yⱼ − Ȳ) / Σⱼ(Xⱼ − X̄)²] )
Cancelling terms this finally reduces to
2β̂ ( Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) − Σⱼ (Xⱼ − X̄)(Yⱼ − Ȳ) ) = 2β̂ · 0 = 0
There is more than one way to answer this question. I will accept all answers.
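The identity can also be verified numerically. A small Python sketch (simulated data, illustrative only) fits an OLS line with an intercept and checks SST = SSE + SSR, using the document's convention that SSE is the explained sum of squares:

```python
import numpy as np

# Numerical check of SST = SSE + SSR for an OLS fit WITH an intercept
# (the cross term vanishes because the intercept is included).
rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 100)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, 100)
b, a = np.polyfit(x, y, 1)
yhat = a + b * x
sst = ((y - y.mean()) ** 2).sum()
sse = ((yhat - y.mean()) ** 2).sum()   # explained sum of squares
ssr = ((y - yhat) ** 2).sum()          # residual sum of squares
print(sst, sse + ssr)                  # equal up to floating-point error
```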
Question 4
1. Explain the differences between these three equations:
Ŷ = αOLS + βOLS X
Y = α + βX + ε
Y = αOLS + βOLS X + e
Answer The first equation gives the fitted value Ŷ: the prediction of Y from X using the estimated OLS coefficients. The second equation is the true population regression model, where ε is the unobservable error between the population line and the observed Y. The third equation writes the OLS estimation of the population regression model for the observed data: Y decomposed into the OLS fitted value plus the residual e.
2. What is the difference between e and ε from the previous question?
Answer We cannot observe ε in equation 2. The residual e is its sample counterpart: we use e to approximate ε.
3. Suppose you consider a model where Y = βX + ε, what is the consequence for ē? What about SeX ?
Explain your answer intuitively or algebraically.
Answer Intuitively, we are setting α = 0, which leaves us less flexibility to minimize the SSR compared to the case with an intercept. We should therefore expect the residuals ê² to be larger, since we have imposed an extra restriction on the model. Moreover, since we arbitrarily set α = 0 when the true intercept may not be zero, the error term absorbs the intercept, so Cov(X, ε) in the misspecified model need not be 0. In applications, setting α = 0 can sometimes make the estimator appear more significant. However, it does not always make sense to set α = 0. Consider the association between wheat harvests and precipitation during the year: even without precipitation, we would still expect some wheat to be harvested, which makes setting α = 0 implausible.
Algebraically, we solve the minimization problem for the model without an intercept, just as we did for the OLS model with an intercept:

Yᵢ = β̂Xᵢ + eᵢ

min_β̂ Σᵢ₌₁ⁿ (yᵢ − β̂xᵢ)²

Taking the first-order condition:

−2 Σᵢ₌₁ⁿ (yᵢ − β̂xᵢ)xᵢ = 0,  i.e.  Σᵢ₌₁ⁿ eᵢxᵢ = 0.

Solving the condition, we get:
β̂ = (Σᵢ₌₁ⁿ xᵢyᵢ) / (Σᵢ₌₁ⁿ xᵢ²)

However, while the first-order condition forces Σᵢ₌₁ⁿ eᵢxᵢ = 0, nothing forces the residuals to sum to zero:

Σᵢ₌₁ⁿ eᵢ = Σᵢ₌₁ⁿ yᵢ − (Σᵢ₌₁ⁿ xᵢyᵢ / Σᵢ₌₁ⁿ xᵢ²) Σᵢ₌₁ⁿ xᵢ,

which is not zero in general. Therefore ē ≠ 0. And since S_eX is proportional to Σᵢ eᵢxᵢ − n ē x̄ = −n ē x̄, the sample covariance S_eX need not be zero either.
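A quick numerical illustration of this point, sketched in Python with made-up data: fitting through the origin forces Σeᵢxᵢ = 0 by construction, yet the residual mean stays away from zero when the true intercept is nonzero.

```python
import numpy as np

# Sketch: in a regression WITHOUT an intercept, the FOC forces
# sum(e_i * x_i) = 0, but the residuals need not average to zero.
rng = np.random.default_rng(3)
x = rng.uniform(1, 5, 200)
y = 4.0 + 0.5 * x + rng.normal(0, 1.0, 200)   # true model has an intercept
beta_hat = (x * y).sum() / (x ** 2).sum()      # no-intercept OLS slope
e = y - beta_hat * x
print((e * x).sum())   # ~0, by the first-order condition
print(e.mean())        # NOT ~0 in general
```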
4. Explain what a sampling distribution is and why an estimator has a sampling distribution.
Answer A sampling distribution is the probability distribution of a statistic obtained across a large number of samples drawn from a specific population: the distribution of the different values the statistic could take over repeated sampling.
An estimator has a sampling distribution because the samples are chosen randomly. Each different sample yields a different estimate (a realized value of the estimator) for the same estimator, so the estimator is itself a random variable with a corresponding probability distribution. That is why an estimator has a sampling distribution.
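To make this concrete, the following Python sketch (simulated population, invented parameters) draws many random samples and computes the OLS slope in each; the collection of slopes traces out the estimator’s sampling distribution:

```python
import numpy as np

# Sketch: the OLS slope is a random variable -- each random sample gives
# a different estimate, and together they form a sampling distribution.
rng = np.random.default_rng(4)
alpha, beta, N, reps = 1.0, 2.0, 50, 5_000
betas = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, N)
    y = alpha + beta * x + rng.normal(0, 1.0, N)
    betas[r] = np.polyfit(x, y, 1)[0]   # slope estimate for this sample
print(betas.mean(), betas.std())        # centered near beta = 2, with spread
```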
Question 5
Suppose you are interested in the relationship between earnings and the number of years of education:
Earningsi = α + β × Educi + εi ,
a. Explain what εi is. What is included in εi ?
Answer εᵢ collects all the factors other than years of education that affect Earnings. It is the unobservable error term representing the gap between the earnings predicted by years of education and actual earnings.
Any other factor can be included, such as health condition, IQ, EQ, family connections, college location, etc.
b. Do you think it is likely that E(εi |Educi ) ̸= 0? Explain.
Yes, it is very likely that E(εᵢ|Educᵢ) ≠ 0, because factors such as health condition and family background are plausibly associated with years of education. Thus the conditional expectation of εᵢ is highly unlikely to equal zero; in other words, Cov(Educᵢ, εᵢ) ≠ 0.
c. If the assumption is not satisfied, what is the consequence in terms of the properties of the OLS estimator of α? What about the properties of β?
Both α̂ and β̂ are biased. Below are a proof and an example showing the bias.
Proof First, we prove that β̂ is biased.
We know that:

β̂_OLS = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²   (1)
And:

Yᵢ = α + βXᵢ + εᵢ   (2)

Ȳ = α + βX̄ + ε̄   (3)

Subtracting (3) from (2):

Yᵢ − Ȳ = β(Xᵢ − X̄) + (εᵢ − ε̄)   (4)
Plugging (4) into (1), we get:

β̂_OLS = β + Σᵢ₌₁ⁿ εᵢ(xᵢ − x̄) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
From the law of iterated expectations, we have:

E[β̂_OLS] = β + E[ Σᵢ εᵢ(xᵢ − x̄) / Σᵢ (xᵢ − x̄)² ]
          = β + E[ E[ Σᵢ εᵢ(xᵢ − x̄) / Σᵢ (xᵢ − x̄)² | x ] ]
          = β + E[ Σᵢ E[εᵢ|x](xᵢ − x̄) / Σᵢ (xᵢ − x̄)² ]

When the mean independence assumption fails, E[εᵢ|x] does not necessarily equal 0, so the last term need not vanish. Thus E[β̂_OLS] ≠ β: the estimator of β is biased.
Next, we prove that α̂_OLS is biased.

α̂_OLS = Ȳ − β̂_OLS X̄

E[α̂_OLS] = E[Ȳ] − E[β̂_OLS X̄]
          = E[Ȳ] − E[(β + Σᵢ E[εᵢ|x](xᵢ − x̄) / Σᵢ (xᵢ − x̄)²) X̄]
          = E[Ȳ] − β E[X̄] − E[(Σᵢ E[εᵢ|x](xᵢ − x̄) / Σᵢ (xᵢ − x̄)²) X̄]

The last term does not necessarily equal 0, since E[εᵢ|x] does not necessarily equal 0. Thus α̂_OLS is also biased.
A simple example: Suppose the true population regression model is the multivariate model Y = α + βX + γZ + ε, and suppose Z is associated with X through Z = d + f X + u. Then, if we absorb Z into the error term of our original univariate regression, we get:

Y = (α + γd) + (β + γf)X + (γu + ε) = α′ + β′X + ε′

This example shows that if we ignore a factor Z that is associated with X, we get a biased estimator of the coefficient on X. Here, α′ = α + γd and β′ = β + γf.
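The omitted-variable example can be illustrated by simulation. In this Python sketch (all coefficient values are invented for the example), regressing Y on X alone recovers β + γf rather than β:

```python
import numpy as np

# Sketch of the omitted-variable example: Y = a + b*X + g*Z + eps with
# Z = d + f*X + u. Regressing Y on X alone should recover b + g*f, not b.
rng = np.random.default_rng(5)
a, b, g, d, f = 1.0, 2.0, 1.5, 0.5, 0.8
N = 200_000
x = rng.uniform(0, 10, N)
z = d + f * x + rng.normal(0, 1.0, N)
y = a + b * x + g * z + rng.normal(0, 1.0, N)
slope = np.polyfit(x, y, 1)[0]
print(slope)   # close to b + g*f = 2.0 + 1.5*0.8 = 3.2, not b = 2.0
```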
Question 6
Consider the following population linear regression model: Yi = α + βXi + εi .
Give the formula for the OLS estimator of β. Explain how the formula is being derived. What is the intuition
behind the OLS estimator?
β̂ = Cov(X, Y) / Var(X)
This is derived from solving the minimization problem below:
min_{α̂,β̂} Σᵢ₌₁ⁿ (yᵢ − α̂ − β̂xᵢ)²

Let W = Σᵢ₌₁ⁿ (yᵢ − α̂ − β̂xᵢ)².
First, we take the first-order conditions with respect to both α̂ and β̂:

∂W/∂α̂ = −2 Σᵢ₌₁ⁿ (yᵢ − α̂ − β̂xᵢ) = 0   (1)

∂W/∂β̂ = −2 Σᵢ₌₁ⁿ (yᵢ − α̂ − β̂xᵢ)xᵢ = 0   (2)
Then, we solve these two conditions for α̂ and β̂.
From equation (1), we have:
Σᵢ₌₁ⁿ (yᵢ − α̂ − β̂xᵢ) = 0

Σᵢ₌₁ⁿ yᵢ − n α̂ − β̂ Σᵢ₌₁ⁿ xᵢ = 0

Since Σᵢ₌₁ⁿ yᵢ = nȳ and Σᵢ₌₁ⁿ xᵢ = nx̄, this gives α̂ = ȳ − β̂x̄. Substituting into equation (2):
Σᵢ₌₁ⁿ (yᵢ − (ȳ − β̂x̄) − β̂xᵢ)xᵢ = 0

Σᵢ₌₁ⁿ [yᵢxᵢ − (ȳ − β̂x̄)xᵢ − β̂xᵢ²] = 0

Σᵢ₌₁ⁿ yᵢxᵢ − ȳ Σᵢ₌₁ⁿ xᵢ + β̂x̄ Σᵢ₌₁ⁿ xᵢ − β̂ Σᵢ₌₁ⁿ xᵢ² = 0
β̂ = (Σᵢ₌₁ⁿ xᵢyᵢ − nx̄ȳ) / (Σᵢ₌₁ⁿ xᵢ² − nx̄²) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = Cov(X, Y) / Var(X)
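As a numerical sanity check, the following Python sketch (simulated data) confirms that the sample-covariance formula matches a standard least-squares fit:

```python
import numpy as np

# Check that the derived formula Cov(X,Y)/Var(X) matches a library OLS fit.
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 500)
beta_formula = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta_polyfit = np.polyfit(x, y, 1)[0]   # least-squares slope
print(beta_formula, beta_polyfit)       # identical up to floating-point error
```

Note that the ratio is the same whichever degrees-of-freedom convention is used for the covariance and variance, as long as it is used consistently in both.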
Question 7
Which of the following can cause OLS estimators to be biased?
(i) The variance of the population linear regression model error term depends on X.
(ii) Omitting an important variable.
(iii) A sample where Xi and Xj are not independent.
Solution:
Only (ii), omitting an important variable, can cause bias in the OLS estimators, and this is true only when the omitted variable is correlated with the included regressor. This is related to assumption A1 (the conditional mean zero assumption). If the omitted variable is uncorrelated with the regressor, the OLS estimator is not biased (recall the unbiasedness proof: we only needed E(ε|X) = 0). We did not use or need points (i) or (iii) to show that the estimator was unbiased. The scenarios in (i) and (iii) affect the standard error of the estimator, not its bias.
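A short simulation, sketched in Python with invented parameters, illustrates the point for (i): even when the error variance depends on X, the OLS slope remains centered on the true β.

```python
import numpy as np

# Sketch for (i): heteroskedastic errors (variance depending on X) leave
# the OLS slope unbiased; only its standard error is affected.
rng = np.random.default_rng(7)
beta, N, reps = 2.0, 100, 20_000
betas = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1, 10, N)
    eps = rng.normal(0, 1.0, N) * x      # error sd grows with x, E[eps|x]=0
    y = 1.0 + beta * x + eps
    betas[r] = np.polyfit(x, y, 1)[0]
print(betas.mean())   # still close to beta = 2
```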
Data Question
Continuing with the dataset you downloaded in the previous question, i.e., the 2016 CPS, which contains observations on weekly earnings, sex, age, race, and education for respondents aged 25-64.
a. Define the univariate population regression model formally.
Answer
Given a pair of random variables (Yᵢ, Xᵢ), the univariate population regression model is

Yᵢ = α + βXᵢ + εᵢ

where the population coefficients (α, β) solve the minimization problem

min_{a,b} E[(Yᵢ − a − bXᵢ)²]

and εᵢ = Yᵢ − α − βXᵢ is the population error.
b. Run the regression and interpret the economic significance of the coefficient.
Answer
library(tidyverse)
library(stargazer)
##
## Please cite as:
##
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(haven)
# Log income: drop zero/missing earnings before taking logs
subset_data[subset_data == 0] <- NA
subset_data <- drop_na(subset_data)
subset_data['logincome'] <- log(subset_data$earnwke)
# Mapping: grade92 education codes to years of education
newvals_grade92 <- c('31'=0,'32'=3,'33'=6,'34'=8,'35'=9,
                     '36'=10,'37'=11,'38'=12,'39'=12,'40'=14,
                     '41'=14,'42'=14,'43'=16,'44'=17,'45'=20,'46'=22)
subset_data['yrs_ed'] <- newvals_grade92[as.character(subset_data$grade92)]
# Elasticity Method
ols <- lm(logincome ~ yrs_ed, data=subset_data)
stargazer(ols,type = 'text')
##
## ================================================
## Dependent variable:
## ----------------------------
## logincome
## ------------------------------------------------
## yrs_ed 0.103***
## (0.003)
##
## Constant 5.180***
## (0.037)
##
## ------------------------------------------------
## Observations 11,025
## R2 0.130
## Adjusted R2 0.130
## Residual Std. Error 0.719 (df = 11023)
## F Statistic 1,642.860*** (df = 1; 11023)
## ================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Elasticity at the means: beta_hat * mean(X) / mean(Y)
0.103 * mean(subset_data$yrs_ed) / mean(subset_data$logincome)
## [1] 0.2205711
Economic Significance: The elasticity evaluated at the means is η_{YX} = β̂ · X̄/Ȳ = 0.103 × 14.251429/6.654985 = 0.2205711. A one percent change in years of education is associated with a 0.22 percent change in log wage (the weekly income growth rate).
# Standardized coefficient: beta_hat * sd(X) / sd(Y)
0.103 * sd(subset_data$yrs_ed) / sd(subset_data$logincome)
## [1] 0.3585958
Economic Significance: A one standard deviation change in years of education is associated with about a 0.36 standard deviation change in log wage (the weekly income growth rate).
Alternatively, one can regress wage (not log wage) on years of education and find that a one percent change in years of education is associated with roughly a 1.48 percent change in wage. Using the standardization method, a one standard deviation change in years of education is associated with a 0.41 standard deviation change in wage.
# Elasticity Method
ols <- lm(earnwke ~ yrs_ed, data=subset_data)
stargazer(ols,type = 'text')
##
## ================================================
## Dependent variable:
## ----------------------------
## earnwke
## ------------------------------------------------
## yrs_ed 102.887***
## (2.201)
##
## Constant -473.941***
## (31.918)
##
## ------------------------------------------------
## Observations 11,025
## R2 0.165
## Adjusted R2 0.165
## Residual Std. Error 620.343 (df = 11023)
## F Statistic 2,185.242*** (df = 1; 11023)
## ================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Elasticity at the means: beta_hat * mean(X) / mean(Y)
102.887 * mean(subset_data$yrs_ed) / mean(subset_data$earnwke)
## [1] 1.477605
# Standardized coefficient: beta_hat * sd(X) / sd(Y)
102.887 * sd(subset_data$yrs_ed) / sd(subset_data$earnwke)
## [1] 0.4067512