ECON7310 Project1
I first removed all objects from the “Environment”, then set the working directory to
the folder containing the data file, loaded the readr package, and finally read the data
with the read_csv() command.
I used summary(cbind()) and sd() to compute the summary statistics; the results are
shown above.
Histograms of the two variables, produced with hist(), are shown above.
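The computation described above can be sketched as follows. The data are simulated and the column names wage and educ are assumptions, since the original data file is not reproduced in this write-up.

```r
# Simulated stand-in data; the variable names are hypothetical.
set.seed(1)
dat <- data.frame(
  wage = exp(rnorm(200, mean = 2.5, sd = 0.5)),
  educ = sample(8:20, 200, replace = TRUE)
)

# Minimum, quartiles, median, mean and maximum for both variables at once.
stats <- summary(cbind(wage = dat$wage, educ = dat$educ))
print(stats)

# summary() does not report standard deviations, so compute them separately.
sds <- sapply(dat, sd)
print(sds)
```

Calling sd() on each column through sapply() avoids repeating the call once per variable.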
There appears to be a moderately strong positive relationship: the scatter plot slopes
upward from left to right and fans out in the shape of a trumpet.
Question 2:
(a).
The estimate is somewhat smaller: it has fallen from 0.090 to 0.087, so
the regression in (c) may suffer from omitted variable bias.
Compared to (c):
$$\widehat{Lwage} = \underset{(0.11)}{1.30} + \underset{(0.01)}{0.09} \times educ + \underset{(0.00)}{0.01} \times hrswk, \qquad R^2 = 0.20$$
Holding hours worked per week constant, a female with one more year of education
(educ) is estimated to earn about 9% more. Holding education constant, a female who
works one more hour per week (hrswk) is estimated to earn about 1% more.
𝐻0: 𝛽1 = 0, 𝐻1: 𝛽1 ≠ 0
The p-value for the t-statistic is ≈ 0.000, which is smaller than 0.01, so we reject H0
at the 1% significance level.
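As a rough check on the order of magnitude, the t-statistic can be rebuilt from the rounded educ coefficient and standard error in the reported regression. This sketch assumes that 𝛽1 is the coefficient on educ and uses the large-sample normal approximation, because the sample size is not stated here; the exact figures in the R output will differ slightly.

```r
# Rounded values for educ from the reported regression (assumed to be beta_1).
b1  <- 0.09
se1 <- 0.01

# t-statistic for H0: beta_1 = 0.
t_stat <- (b1 - 0) / se1

# Two-sided p-value using the standard normal approximation.
p_val <- 2 * pnorm(-abs(t_stat))

print(t_stat)
print(p_val < 0.01)
```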
Question 3:
(a).
As the R output above shows, the 95% CI for the coefficient on educ is 0.08 to 0.11.
The 95% confidence interval is the set of parameter values that cannot be rejected by
a two-sided test at significance level 𝛼 = 0.05.
We test 𝐻0 : 𝛽1 = 0.12 at 𝛼 = 0.05.
We reject 𝐻0 because p-value = 0.000 < 𝛼 = 0.05; equivalently, 0.12 lies outside the
95% CI of 0.08 to 0.11.
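The link between the confidence interval and the two-sided test can be illustrated on simulated data. This is only a sketch: the data-generating values and the sample size are assumptions, not the project data.

```r
# Simulated data; coefficients and sample size are illustrative assumptions.
set.seed(42)
n     <- 300
educ  <- sample(8:20, n, replace = TRUE)
lwage <- 1.3 + 0.09 * educ + rnorm(n, sd = 0.5)
fit   <- lm(lwage ~ educ)

b0 <- 0.12  # hypothesised value of the educ coefficient

# 95% confidence interval for the educ coefficient.
ci <- confint(fit, "educ", level = 0.95)

# Two-sided t-test of H0: beta_1 = b0.
est    <- coef(summary(fit))["educ", "Estimate"]
se     <- coef(summary(fit))["educ", "Std. Error"]
t_stat <- (est - b0) / se
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit))

# The two decisions agree: b0 is outside the CI exactly when p < 0.05.
outside_ci <- b0 < ci[1, 1] || b0 > ci[1, 2]
reject     <- p_val < 0.05
print(c(outside_ci = outside_ci, reject = reject))
```

Both checks use the same estimate, standard error and t critical value, which is why they always lead to the same decision.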
(b).
𝐿𝑤𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐𝑖 + 𝛽2 𝑓𝑒𝑚𝑎𝑙𝑒𝑖 + 𝑢𝑖 , 𝑖 = 1 … 𝑛
where 𝑓𝑒𝑚𝑎𝑙𝑒𝑖 is binary (𝑓𝑒𝑚𝑎𝑙𝑒𝑖 = 0 𝑜𝑟 𝑓𝑒𝑚𝑎𝑙𝑒𝑖 = 1)
𝛽2 = 𝐸[𝐿𝑤𝑎𝑔𝑒 | 𝑒𝑑𝑢𝑐, 𝑓𝑒𝑚𝑎𝑙𝑒 = 1] − 𝐸[𝐿𝑤𝑎𝑔𝑒 | 𝑒𝑑𝑢𝑐, 𝑓𝑒𝑚𝑎𝑙𝑒 = 0] = −0.21, the population
difference in group means holding education constant: a female is estimated to earn
about 21% less per hour than a male with the same education. In absolute value,
𝛽𝑓𝑒𝑚𝑎𝑙𝑒 = 0.21 and 𝛽𝑒𝑑𝑢𝑐 = 0.09, a difference of 0.12, so being female shifts Lwage by
0.12 more in magnitude than an extra year of education.
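Reading a dummy coefficient of −0.21 as a percentage difference is an approximation. Assuming Lwage is the natural log of the hourly wage (an assumption, since the variable definition is not restated here), the exact percentage difference can be computed as follows.

```r
# Coefficient on female from the regression above.
b_female <- -0.21

# Approximate percentage difference: 100 * beta.
approx_pct <- 100 * b_female

# Exact percentage difference implied by a log-level model: 100 * (exp(beta) - 1).
exact_pct <- 100 * (exp(b_female) - 1)

print(round(c(approximate = approx_pct, exact = exact_pct), 1))
```

The exact figure is a gap of roughly 19%, somewhat smaller in magnitude than the 21% approximation.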
(c).
The corresponding p-value is ≈ 0.000. Therefore, the null hypothesis that the
coefficients on the geographic location variables are jointly equal to zero is rejected at
the 1% significance level.
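A joint test of this kind can be sketched by comparing a restricted and an unrestricted model with anova(). The data are simulated and the region dummies north and south are hypothetical names; this is not necessarily the command used to produce the result above.

```r
# Simulated data; variable names and effect sizes are illustrative assumptions.
set.seed(7)
n      <- 500
educ   <- sample(8:20, n, replace = TRUE)
region <- sample(c("north", "south", "west"), n, replace = TRUE)
north  <- as.numeric(region == "north")  # west is the omitted base category
south  <- as.numeric(region == "south")
lwage  <- 1.3 + 0.09 * educ + 0.3 * north - 0.3 * south + rnorm(n, sd = 0.5)

# Restricted model imposes H0: both region coefficients are zero.
restricted   <- lm(lwage ~ educ)
unrestricted <- lm(lwage ~ educ + north + south)

# F-test of the joint restriction.
f_test <- anova(restricted, unrestricted)
print(f_test)

p_joint <- f_test[["Pr(>F)"]][2]
print(p_joint < 0.01)
```

Note that anova() uses homoskedasticity-only standard errors; a heteroskedasticity-robust version would need an additional package.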
(d).
No, we cannot obtain OLS estimates for all of the coefficients. The variable hs takes the
value 0 for every observation in the sample, so it is a constant and is therefore a perfect
linear function of the intercept. Including both hs and the constant in the regression
produces perfect multicollinearity, which is why hs is ignored in the output. We need
to drop this variable to obtain the OLS estimates of the remaining coefficients.
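The situation described above can be reproduced in a small simulation. The data and coefficient values are assumptions for illustration; only the fact that hs equals 0 for every observation comes from the answer.

```r
# Simulated data in which hs is 0 for every observation.
set.seed(3)
n     <- 100
educ  <- sample(8:20, n, replace = TRUE)
hs    <- rep(0, n)
lwage <- 1.3 + 0.09 * educ + rnorm(n, sd = 0.5)

# lm() detects the perfect collinearity and cannot estimate the hs coefficient.
fit <- lm(lwage ~ educ + hs)
print(coef(fit))  # the hs coefficient is reported as NA
```

The intercept and the educ coefficient are still estimated, which matches the point that dropping hs recovers the OLS estimates.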