ECON7310 Project1


Question 1:

I removed all objects from the Environment, set the working directory to the folder
containing the data file, loaded the readr package, and read the data with the
read_csv() command.

I used summary(cbind()) and sd() to compute the summary statistics; the results are
shown above.

The histograms of the two variables, drawn with hist(), are shown above.
There appears to be a moderately strong positive relationship: the scatter plot fans out
from left to right in the shape of a trumpet, a pattern that suggests heteroskedasticity.

Question 2:
(a).

$$\widehat{Lwage} = \underset{(0.09)}{1.61} + \underset{(0.01)}{0.09} \times educ, \qquad R^2 = 0.18$$
(b).
(c).

$$\widehat{Lwage} = \underset{(0.09)}{1.61} + \underset{(0.01)}{0.09} \times educ, \qquad R^2 = 0.18$$
For women, one additional year of education (educ) is associated with an estimated
increase of 0.09 in Lwage, that is, approximately 9% higher hourly wages on average.
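The percentage reading of the slope follows from the log-level form of the model. A short derivation, using the rounded estimate 0.09 reported above (so the figures are approximate):

$$\Delta \widehat{Lwage} = \hat\beta_1 \, \Delta educ \quad\Longrightarrow\quad \frac{\Delta wage}{wage} \approx \hat\beta_1 \, \Delta educ = 0.09 \quad \text{for } \Delta educ = 1$$

$$\text{exact change: } e^{0.09} - 1 \approx 0.094$$

The approximation (9%) and the exact value (about 9.4%) are close because the coefficient is small.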
𝐻0: 𝛽1 = 0, 𝐻1: 𝛽1 ≠ 0
The p-value for the t-statistic is ≈ 0.000, which is smaller than 0.01, so we reject H0 at
the 1% significance level: the coefficient on educ is statistically significant.
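As a rough cross-check of this test, the t-statistic can be rebuilt from the rounded estimate and standard error reported in the equation (0.09 and 0.01). The sketch below is not the R output itself: the function names are illustrative, and the p-value uses a standard normal approximation, which is an assumption that is reasonable only in large samples.

```python
import math

def t_statistic(estimate, std_error, null_value=0.0):
    """t-statistic for H0: beta = null_value."""
    return (estimate - null_value) / std_error

def two_sided_p_normal(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# Rounded values taken from the reported regression; exact R values may differ.
t = t_statistic(0.09, 0.01)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, p = {p:.3g}")
```

A t-statistic this far from zero gives a p-value that prints as 0.000 at three decimals, which agrees with the conclusion above.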
(d).
For omitted variable bias to occur, the omitted variable working hours per week must
satisfy the following two conditions:
1. working hours per week is a determinant of Lwage (i.e., working hours per week is
part of u); and
2. working hours per week is correlated with the regressor Educ (i.e., corr (working
hours per week, Educ) ≠ 0)
Both conditions must hold for the omission of working hours per week to result in
omitted variable bias, i.e., OLS estimators are biased and inconsistent.
Suppose the true model is $Lwage_i = \beta_0 + \beta_1 educ_i + \beta_2 Z_i + e_i$,
where $Z_i$ is the working hours per week of individual i.
Assume the following story is reasonable:
1. More years of education (educ) raise ln(wage) (Lwage), that is, $\beta_1 > 0$.
2. More working hours per week also raise Lwage, that is, $\beta_2 > 0$.
3. People who work more hours tend to have less education, so corr($Z_i$, $educ_i$) < 0.
If the equation is estimated without $Z_i$, the effect of $Z_i$ on Lwage is partially
absorbed into the estimated effect of educ on Lwage, and the sign of the bias is the sign
of $\beta_2 \times$ corr($Z_i$, $educ_i$).
Under this story the product is negative, so the OLS estimate of $\beta_1$ will
underestimate the effect of educ on Lwage; it would overestimate the effect only if hours
and education were positively correlated.
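The direction of the bias can be read off the standard large-sample expression for the slope of the regression that omits $Z_i$. This is the textbook omitted-variable-bias formula, written here with the symbols of the true model above:

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \beta_2 \, \frac{\operatorname{cov}(educ_i, Z_i)}{\operatorname{var}(educ_i)}$$

The second term is the bias. It is positive (OLS overstates $\beta_1$) when $\beta_2$ and $\operatorname{cov}(educ_i, Z_i)$ have the same sign, negative (OLS understates $\beta_1$) when their signs differ, and zero when either one is zero.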
(e).

Compared to (c), the estimated coefficient on educ is somewhat smaller: it has fallen
from 0.090 to 0.087, so the regression in (c) may suffer from omitted variable bias.
$$\widehat{Lwage} = \underset{(0.11)}{1.30} + \underset{(0.01)}{0.09} \times educ + \underset{(0.00)}{0.01} \times hrswk, \qquad R^2 = 0.20$$
Holding working hours constant, one additional year of education is associated with
approximately 9% higher hourly wages for women. Holding education constant, one
additional working hour per week (hrswk) is associated with approximately 1% higher
hourly wages.
𝐻0: 𝛽1 = 0, 𝐻1: 𝛽1 ≠ 0
The p-value for the t-statistic is ≈ 0.000, which is smaller than 0.01, so we reject H0 at
the 1% significance level: the coefficient on educ remains statistically significant.
Question 3:
(a).
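As the R output above shows, the 95% confidence interval for the coefficient on educ is
0.08 to 0.11.
The 95% confidence interval is the set of parameter values that cannot be rejected by a
two-sided test at the significance level $\alpha = 0.05$.
We test $H_0: \beta_1 = 0.12$ against $H_1: \beta_1 \neq 0.12$ at $\alpha = 0.05$.
The value 0.12 lies outside the interval, and we reject $H_0$ because p-value = 0.000 < $\alpha$ = 0.05.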
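The link between the confidence interval and the two-sided test can be sketched in a few lines. The function name is illustrative, and the interval endpoints are the rounded values reported above:

```python
def rejected_at_5_percent(ci_low, ci_high, null_value):
    """A two-sided 5% test rejects H0: beta = null_value exactly when
    null_value lies outside the 95% confidence interval."""
    return not (ci_low <= null_value <= ci_high)

# Reported 95% interval for the coefficient on educ (rounded).
outside = rejected_at_5_percent(0.08, 0.11, 0.12)  # hypothesised value 0.12
inside = rejected_at_5_percent(0.08, 0.11, 0.09)   # a value inside the interval
print(outside, inside)
```

Any hypothesised value between 0.08 and 0.11 would not be rejected at the 5% level, while 0.12 is.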
(b).
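$Lwage_i = \beta_0 + \beta_1 educ_i + \beta_2 female_i + u_i, \quad i = 1, \dots, n$
where $female_i$ is binary ($female_i = 0$ or $female_i = 1$).
$\hat\beta_2 = -0.21$ estimates $E[Lwage \mid female = 1, educ] - E[Lwage \mid female = 0, educ]$,
the population difference in group means of Lwage holding education constant. Because
the dependent variable is in logs, women are estimated to earn approximately 21% less
per hour than men with the same education, not 0.21 less in wage units.
In absolute value, $|\hat\beta_{female}| = 0.21$ and $\hat\beta_{educ} = 0.09$ differ by
0.12, so being female is associated with a change in Lwage that is 0.12 log points larger
in magnitude than the change from one extra year of education.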
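Because the coefficient on the female dummy is not small, reading it directly as a percentage is only approximate. A minimal sketch of the exact conversion for a log dependent variable, using the rounded coefficient -0.21 (the function name is illustrative):

```python
import math

def exact_percent_effect(log_coefficient):
    """Exact percentage change in the level of wage implied by a
    coefficient in a regression whose dependent variable is ln(wage)."""
    return 100.0 * (math.exp(log_coefficient) - 1.0)

# Rounded coefficient on the female dummy from the regression above.
female_gap = exact_percent_effect(-0.21)
print(f"{female_gap:.2f}%")
```

The exact gap is somewhat smaller in magnitude than the 21% approximation, since exp(-0.21) - 1 is closer to zero than -0.21.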

(c).

The corresponding p-value is ≈ 0.000. Therefore, the null hypothesis that the
coefficients on the geographic location variables are jointly equal to zero is rejected at
the 1% significance level.
(d).

$$Lwage_i = \beta_0 + \beta_1 black_i + \beta_2 asian_i + u_i$$

$$H_0: \beta_1 = \beta_2, \qquad H_1: \beta_1 \neq \beta_2$$
The corresponding p-value is 0.68, which is larger than 0.01. Therefore, we do not reject
$H_0$ at the 1% significance level.
(e).

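$$Lwage_i = \beta_0 + \beta_1 educ_i + \beta_2 female_i + \beta_3 (educ_i \times female_i) + u_i$$

When $female_i = 0$: $Lwage_i = \beta_0 + \beta_1 educ_i + u_i$
When $female_i = 1$:
$Lwage_i = \beta_0 + \beta_1 educ_i + \beta_2 + \beta_3 (educ_i \times 1) + u_i$
$Lwage_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times educ_i + u_i$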
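The two group-specific equations can be summarised by the effect of one more year of education, which now depends on gender. A short derivation from the interaction model above:

$$\frac{\partial \, Lwage_i}{\partial \, educ_i} = \beta_1 + \beta_3 \, female_i = \begin{cases} \beta_1 & \text{if } female_i = 0 \\ \beta_1 + \beta_3 & \text{if } female_i = 1 \end{cases}$$

So $\beta_3$ is the difference in the return to education between women and men, $\beta_2$ is the difference in intercepts, and the returns are equal across groups exactly when $\beta_3 = 0$.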
(f).

As the R output shows, her predicted hourly wage is 11.19.


Question 4
(a).
The sample means of hourly wage for the four education categories are 12.89, 15.99,
27.84 and 19.17, as shown above (each mean is taken over the observations for which
the corresponding dummy variable equals 1).
(b).

No, we cannot obtain OLS estimates for all of the coefficients. The education dummies
are mutually exclusive and exhaustive, so for every observation they sum to 1, which is
exactly the constant regressor. Including all of them together with the intercept
generates perfect multicollinearity (the dummy variable trap), and R ignores the variable
hs. We therefore need to leave this variable out to obtain OLS estimates, which makes hs
the base category.
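The linear dependence behind this problem can be illustrated with a tiny made-up sample. Everything in the sketch is hypothetical: the six observations are invented, and "col" is an assumed name for the fourth education category, which the text does not name.

```python
# Hypothetical education categories for six people; "col" is an assumed
# name for the fourth category, which the text does not name.
categories = ["lt_hs", "hs", "som_col", "col", "hs", "col"]
levels = ["lt_hs", "hs", "som_col", "col"]

# One 0/1 dummy column per education level.
dummies = {lev: [1 if c == lev else 0 for c in categories] for lev in levels}

# The intercept corresponds to a regressor equal to 1 for every observation.
constant = [1] * len(categories)

# For every observation the dummies sum to the constant regressor, so the
# columns are exactly linearly dependent (perfect multicollinearity).
row_sums = [sum(dummies[lev][i] for lev in levels) for i in range(len(categories))]
print(row_sums == constant)
```

Dropping any one dummy (or the intercept) breaks the dependence, which is why the regression can be estimated once hs is left out.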

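$$\widehat{Wage} = \underset{(0.49)}{15.99} - \underset{(1.11)}{3.11} \times lt\_hs, \qquad R^2 = 0.17$$
$$\widehat{Wage} = \underset{(0.49)}{15.99} + \underset{(0.95)}{11.85} \times hs, \qquad R^2 = 0.17$$
$$\widehat{Wage} = \underset{(0.49)}{15.99} + \underset{(0.83)}{3.18} \times som\_col, \qquad R^2 = 0.17$$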
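With dummy regressors and an omitted base category, the intercept is the base-group mean and each coefficient is a difference from it, so the estimates should reproduce the sample means from part (a). The sketch below checks this with the rounded numbers reported above. One assumption is made: the coefficient 11.85 is treated as belonging to the fourth education category, whose name the text does not give, because 15.99 + 11.85 matches the third reported mean; the key "fourth_category" is a placeholder.

```python
# Rounded estimates from the reported regression output.
intercept = 15.99  # mean wage of the omitted base category

# "fourth_category" is a placeholder name (an assumption, see the lead-in).
coefficients = {"lt_hs": -3.11, "fourth_category": 11.85, "som_col": 3.18}

# Implied group mean = intercept + coefficient on that group's dummy.
implied_means = {name: round(intercept + b, 2) for name, b in coefficients.items()}
print(implied_means)
```

The implied means agree with the values in part (a) up to rounding of the reported coefficients.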
