Econometrics Assignment 3

Reza Brianca
October 31, 2016
Part 1
Use the data in meap00_01.RData to answer this question. The original source of this data set is the Michigan Department of Education (www.michigan.gov/mde).
a. Estimate the model
$$\text{math4} = \beta_0 + \beta_1\,\text{lunch} + \beta_2 \log(\text{enroll}) + \beta_3 \log(\text{exppp}) + u$$
by OLS and obtain the usual standard errors and the robust standard errors. How do they generally compare?
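The usual OLS regression gave the following result (usual standard errors in parentheses):
$$\widehat{\text{math4}} = \underset{(19.962)}{91.932} - \underset{(0.015)}{0.449}\,\text{lunch} - \underset{(0.94)}{5.399}\,\log(\text{enroll}) + \underset{(2.098)}{3.525}\,\log(\text{exppp})$$
The regression with robust standard errors gave the following result (robust standard errors in brackets):
$$\widehat{\text{math4}} = \underset{[23.087]}{91.932} - \underset{[0.016]}{0.449}\,\text{lunch} - \underset{[1.13]}{5.399}\,\log(\text{enroll}) + \underset{[2.353]}{3.525}\,\log(\text{exppp})$$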
This model had R² = 0.373 and adjusted R² = 0.372, with a residual standard error of 15.302 on 1688 degrees of freedom. The two sets of results shared the same coefficient estimates, and the usual standard errors were generally smaller than the robust standard errors, although for the lunch coefficient (β1) the difference was only 0.001. This is a common pattern when heteroskedasticity is present: robust standard errors allow for heteroskedasticity of unknown form, while the usual standard errors assume homoskedasticity.
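As a rough illustration of how the two kinds of standard errors differ, the sketch below computes both for a one-regressor model in plain Python. The data are made up (they are not the MEAP data), the function name `slope_standard_errors` is hypothetical, and the robust formula is the simplest White (HC0) variant, which is an assumption because the answer above does not say which variant was used.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, usual SE, HC0 robust SE) for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]            # x_i - mean(x)
    sst_x = sum(d * d for d in dev)           # total variation in x
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Usual SE: assumes one common error variance (homoskedasticity).
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2 / sst_x)

    # HC0 robust SE: each squared residual is weighted by (x_i - mean(x))^2.
    se_robust = math.sqrt(sum(d * d * u * u for d, u in zip(dev, resid))) / sst_x
    return b1, se_usual, se_robust

# Made-up data: the largest residuals sit at the extreme x values.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [-1, -7, -1, 1, 3, 1, 11]
b1, se_usual, se_robust = slope_standard_errors(x, y)
print(f"slope={b1:.3f}  usual SE={se_usual:.3f}  robust SE={se_robust:.3f}")
```

In this toy data the error spread grows toward the extremes of x, so the robust standard error comes out larger than the usual one, the same direction as in the answer above.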
b. Apply the White test for heteroskedasticity. What is the value of the F test? What do you conclude?
In this analysis, H0 was that the model is homoskedastic. The White test can be obtained by modifying the Breusch-Pagan test so that the auxiliary regression uses the squares and cross products of all explanatory variables in the model.
The White test for heteroskedasticity gave F = 229.78 with a very low p-value. Therefore, we could reject H0, meaning the errors in this model were not homoskedastic.
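For reference, the F form of the test can be written as below. This is the standard textbook formulation and only a sketch: the symbols $q$ (number of regressors in the auxiliary regression), $v$ (its error term), and $R^2_{\hat{u}^2}$ (its R²) are notation introduced here, and the answer above does not state which $q$ was used.

```latex
% Auxiliary regression of the squared OLS residuals
\hat{u}^2 = \delta_0 + \delta_1\,\text{lunch} + \delta_2 \log(\text{enroll}) + \delta_3 \log(\text{exppp})
          + (\text{squares and cross products}) + v

% F statistic for H0: all slope coefficients in the auxiliary regression are zero
F = \frac{R^2_{\hat{u}^2} / q}{\left(1 - R^2_{\hat{u}^2}\right) / (n - q - 1)}
```

Under H0 this statistic is approximately distributed as $F_{q,\,n-q-1}$, so a large value such as the one reported leads to rejection.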
c. Obtain $\hat{g}_i$ as the fitted values from the regression of $\log(\hat{u}_i^2)$ on $\widehat{\text{math4}}_i$, $\widehat{\text{math4}}_i^2$, where $\widehat{\text{math4}}_i$ are the OLS fitted values and the $\hat{u}_i$ are the OLS residuals. Let $\hat{h}_i = \exp(\hat{g}_i)$. Use the $\hat{h}_i$ to obtain WLS estimates. Are there big differences with the OLS coefficients?
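From part (a), the OLS estimates were:
$$\widehat{\text{math4}} = \underset{(19.962)}{91.932} - \underset{(0.015)}{0.449}\,\text{lunch} - \underset{(0.94)}{5.399}\,\log(\text{enroll}) + \underset{(2.098)}{3.525}\,\log(\text{exppp})$$
The WLS estimates were:
$$\widehat{\text{math4}} = \underset{(16.51)}{50.478} - \underset{(0.015)}{0.449}\,\text{lunch} - \underset{(0.83)}{2.647}\,\log(\text{enroll}) + \underset{(1.685)}{6.474}\,\log(\text{exppp})$$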
The WLS and OLS estimates differed in several ways, although every coefficient kept its sign and all of the variables remained statistically significant (lexppp only at the 10% level under OLS). The coefficient values changed as follows:
1. The intercept decreased from 91.932 to 50.478.
2. The coefficient on lenroll moved from −5.399 to −2.647, a smaller magnitude, implying that school enrollment had less impact on the share of students with satisfactory math scores.
3. The coefficient on lexppp increased from 3.525 to 6.474, implying that expenditure per pupil had more impact on the share of students with satisfactory math scores.
4. The coefficient on lunch changed very little (by less than 0.001).
5. lexppp became highly statistically significant compared with the OLS model.
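The weighting procedure described in the question can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used for the answer: the data are simulated, and the helper names `solve`, `wls`, `predict`, and `fgls` are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def wls(X, y, w=None):
    """Weighted least squares via the normal equations (w=None gives OLS)."""
    w = w or [1.0] * len(y)
    k = len(X[0])
    xtwx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, X)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for wi, r, yi in zip(w, X, y)) for i in range(k)]
    return solve(xtwx, xtwy)

def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def fgls(X, y):
    """OLS, then log(u^2) on fitted and fitted^2, then WLS with weights 1/h."""
    b_ols = wls(X, y)
    fitted = predict(X, b_ols)
    log_u2 = [math.log((yi - fi) ** 2) for yi, fi in zip(y, fitted)]
    Z = [[1.0, f, f * f] for f in fitted]
    g_hat = predict(Z, wls(Z, log_u2))        # fitted values of log(u^2)
    h_hat = [math.exp(g) for g in g_hat]      # estimated variance function
    b_wls = wls(X, y, [1.0 / h for h in h_hat])
    return b_ols, b_wls, h_hat

# Simulated data (not the MEAP data): error spread grows with x.
random.seed(0)
x = [random.uniform(1, 10) for _ in range(300)]
X = [[1.0, xi] for xi in x]
y = [2.0 + 3.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]
b_ols, b_wls, h_hat = fgls(X, y)
print("OLS:", b_ols)
print("WLS:", b_wls)
```

Taking the exponential of the fitted log-variance keeps every estimated weight positive, which is the reason the variance regression is run in logs.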
d. Obtain the standard errors for WLS that allow misspecification of the variance function. Do these differ much
from the usual WLS standard errors?
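The usual WLS regression result was as follows (usual standard errors in parentheses):
$$\widehat{\text{math4}} = \underset{(16.51)}{50.478} - \underset{(0.015)}{0.449}\,\text{lunch} - \underset{(0.83)}{2.647}\,\log(\text{enroll}) + \underset{(1.685)}{6.474}\,\log(\text{exppp})$$
The robust WLS regression result was as follows (robust standard errors in brackets):
$$\widehat{\text{math4}} = \underset{[18.925]}{50.478} - \underset{[0.014]}{0.448}\,\text{lunch} - \underset{[1.054]}{2.647}\,\log(\text{enroll}) + \underset{[1.812]}{6.474}\,\log(\text{exppp})$$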
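The robust standard errors were larger than the usual WLS standard errors for the intercept, lenroll, and lexppp, and marginally smaller for lunch (0.014 versus 0.015). Note that lenroll was now significant at a weaker level (p-value < .05) than with the usual WLS standard errors (p-value < .01).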
e. For estimating the effect of spending on math4, does OLS or WLS appear to be more precise?
Answer: WLS appeared to be more precise than OLS. The standard error on lexppp fell from 2.098 under OLS to 1.685 under WLS, and its p-value fell from 9.31% under OLS to 0.012% under WLS. The same pattern held with robust standard errors: the p-value for lexppp was 13.44% under OLS and 0.036% under WLS.
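The reported significance levels can be cross-checked from the coefficients and standard errors quoted in parts (a), (c), and (d). The sketch below uses the standard normal approximation to the t distribution, which is an assumption but a mild one with 1688 degrees of freedom; the function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(coef, se):
    """Return (t statistic, two-sided p-value) using the normal approximation."""
    t = coef / se
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    return t, math.erfc(abs(t) / math.sqrt(2))

# (coefficient on log(exppp), standard error) as reported in the text above.
reported = {
    "OLS, usual SE": (3.525, 2.098),
    "OLS, robust SE": (3.525, 2.353),
    "WLS, usual SE": (6.474, 1.685),
    "WLS, robust SE": (6.474, 1.812),
}

results = {label: two_sided_p(coef, se) for label, (coef, se) in reported.items()}
for label, (t, p) in results.items():
    print(f"{label:15s} t = {t:6.3f}   p = {100 * p:.3f}%")
```

The approximate p-values land close to the 9.31%, 13.44%, 0.012%, and 0.036% quoted in the answer, with small gaps attributable to the normal approximation and to rounding of the inputs.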
Part 2
Use the data in nbasal.RData to answer this question. A regression equation is needed to study the factors that influence salaries of NBA players. We use log(wage) as the dependent variable. Potential factors that will affect a player's wage include experience (exper, coll), game participation (games, avgmin), position (forward, center, guard), performance (points, rebounds, assists), prestige (draft, allstar), and demographic factors (black, children, marr). For this question, it is fine if you only consider these variables in level form.
a. Identify if there is any multicollinearity problem
Out of the 22 variables provided in nbasal.RData, 8 contributed to multicollinearity, namely wage, exper, age, minutes, guard, avgmin, agesq, and marr * black.
After removing the missing values, a correlation test showed several highly correlated pairs of independent variables involving these 8 variables and the remaining 14. The top three pairs were:
1. age with exper (r(238) = 0.94, p-value < .001)
2. points with avgmin (r(238) = 0.87, p-value < .001)
3. points with minutes (r(238) = 0.82, p-value < .001)
Therefore, multicollinearity was likely among these variables, and we could keep only one variable from each highly correlated pair in the model.
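To show what these correlations imply, the sketch below computes a Pearson correlation by hand and converts each reported r into a variance inflation factor. Two assumptions are worth flagging: the toy vectors are made up (not the NBA data), and the formula 1 / (1 − r²) is the VIF only for a model with exactly two regressors, so it is a rough lower-bound style indication rather than the VIF of the full model. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_vif(r):
    """VIF for a two-regressor model whose regressors have correlation r."""
    return 1.0 / (1.0 - r * r)

# Made-up vectors, only to show the correlation calculation.
r_toy = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Correlations reported in the list above.
reported = {"age, exper": 0.94, "points, avgmin": 0.87, "points, minutes": 0.82}
vifs = {pair: pairwise_vif(r) for pair, r in reported.items()}
for pair, vif in vifs.items():
    print(f"{pair:16s} r = {reported[pair]:.2f}   pairwise VIF = {vif:.2f}")
```

A correlation of 0.94 already inflates the sampling variance of each coefficient in the pair by a factor of more than eight, which supports keeping only one of age and exper.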
b. Find the model(s) with the lowest AIC by using forward and backward stepwise selections
The forward selection model with the lowest AIC was as follows (standard errors in parentheses):
$$\widehat{\text{lwage}} = \underset{(0.133)}{5.993} + \underset{(0.08)}{0.022}\,\text{avgmin} + \underset{(0.01)}{0.064}\,\text{exper} - \underset{(0.002)}{0.01}\,\text{draft} + \underset{(0.015)}{0.043}\,\text{points} - \underset{(0.076)}{0.133}\,\text{guard} - \underset{(0.142)}{0.242}\,\text{allstar}$$
This model had R² = 0.52 and adjusted R² = 0.508, with a residual standard error of 0.568 on 233 degrees of freedom. exper and draft were highly significant (p-value < .001), avgmin and points were significant at a weaker level (p-value < .01), and guard and allstar were significant only at the 10% level (p-value < .1).
The backward selection model with the lowest AIC was as follows (standard errors in parentheses):
$$\widehat{\text{lwage}} = \underset{(0.133)}{5.975} + \underset{(0.08)}{0.022}\,\text{avgmin} + \underset{(0.01)}{0.066}\,\text{exper} - \underset{(0.002)}{0.011}\,\text{draft} + \underset{(0.015)}{0.042}\,\text{points} - \underset{(0.142)}{0.234}\,\text{allstar}$$
This model had R² = 0.514 and adjusted R² = 0.503, with a residual standard error of 0.57 on 234 degrees of freedom.
In the backward selection, guard did not enter the final model: it was removed in the first backward iteration, after which the procedure reached <none>, meaning that no further removal lowered the AIC. exper and draft were highly significant (p-value < .001), followed by avgmin and points (p-value < .01), and allstar became statistically insignificant.
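The forward half of the procedure can be sketched as below. This is an illustration under assumptions rather than the routine used for the answer: the data are simulated, the variable names `x1`, `x2`, and `noise` are hypothetical, and the AIC is computed up to an additive constant as n·log(RSS/n) + 2k, where k counts the estimated coefficients.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def aic(columns, y):
    """AIC (up to a constant) of OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return n * math.log(rss / n) + 2 * k

def forward_select(data, y):
    """Add the variable that lowers AIC the most until no addition helps."""
    selected, best = [], aic([], y)
    while True:
        scores = {name: aic([data[v] for v in selected + [name]], y)
                  for name in data if name not in selected}
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:      # corresponds to "<none>" being best
            break
        selected.append(name)
        best = scores[name]
    return selected, best

# Simulated data: y depends on x1 and x2 but not on noise.
random.seed(1)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "noise")}
y = [1.0 + 2.0 * a - 1.5 * b + random.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
selected, final_aic = forward_select(data, y)
print(selected, round(final_aic, 2))
```

Backward selection runs the same comparison in reverse, starting from the full model and dropping the variable whose removal lowers the AIC the most, which is why the two directions can stop at different models.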
c. Investigate the four residual plots of model(s) from (b). Are residual plots satisfactory? Comment
The results for the forward stepwise selection were as follows:
[Figure: four diagnostic plots for the forward-selection model: Residuals vs Fitted Values, Scale-Location, QQ Plot, Leverage vs Residuals]
For the forward-selection model, the Residuals vs Fitted plot flagged observations 24, 29, and 166 as lying furthest from the rest. The normal QQ plot and the Scale-Location plot flagged the same three observations. In the Residuals vs Leverage plot, the flagged observations were 103, 104, and 166.
The results for the backward stepwise selection were as follows:
[Figure: four diagnostic plots for the backward-selection model: Residuals vs Fitted Values, Scale-Location, QQ Plot, Leverage vs Residuals]
For the backward-selection model, the Residuals vs Fitted plot again flagged observations 24, 29, and 166 as lying furthest from the rest. The normal QQ plot and the Scale-Location plot flagged the same three observations. In the Residuals vs Leverage plot, the flagged observations were 103, 104, and 166.
Based on the diagnostics for these two models, the residual plots were consistent across both selections, and the outliers in our sample were observations 24, 29, 103, 104, and 166. The models could be considered satisfactory for the following reasons:
1. The residuals showed no distinctive pattern in the Residuals vs Fitted plot, suggesting that no systematic (for example nonlinear) relationship was left in the residuals.
2. The residuals mostly followed the straight line in the normal QQ plot, suggesting that they were approximately normally distributed.
3. The residuals were spread roughly equally across the range of fitted values in the Scale-Location plot, so we could assume that the model had equal variance (homoskedasticity).
4. Although there were several outliers among the observations, none of them was influential, as they all remained inside the Cook's distance contours in the Residuals vs Leverage plot.
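Point 4 relies on Cook's distance, which combines the size of a residual with the leverage of the observation. A minimal sketch for a one-regressor model follows; the data are made up and deliberately extreme (they are not the NBA data), the function name `cooks_distance` is hypothetical, and the thresholds of 0.5 and 1 mentioned afterwards are common rules of thumb rather than something stated in the answer above.

```python
def cooks_distance(x, y):
    """Return (Cook's distances, leverages) for OLS of y on an intercept and x."""
    n, p = len(x), 2                          # p = number of estimated coefficients
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dev)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(u * u for u in resid) / (n - p)  # residual variance estimate

    # Leverage in simple regression: h_i = 1/n + (x_i - mean(x))^2 / SST_x.
    leverage = [1.0 / n + d * d / sst_x for d in dev]

    # Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2.
    distance = [u * u / (p * s2) * h / (1.0 - h) ** 2 for u, h in zip(resid, leverage)]
    return distance, leverage

x = [-3, -2, -1, 0, 1, 2, 3]
y = [-1, -7, -1, 1, 3, 1, 11]
distance, leverage = cooks_distance(x, y)
for xi, d, h in zip(x, distance, leverage):
    print(f"x = {xi:2d}   leverage = {h:.3f}   Cook's D = {d:.3f}")
```

In this toy data the two end points combine large residuals with high leverage and exceed a Cook's distance of 1, which is the kind of observation that would fall outside the contours in a Residuals vs Leverage plot.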
d. Repeat (b) without avgmin. Comment on the differences
The forward selection model with the lowest AIC was as follows (standard errors in parentheses):
$$\widehat{\text{lwage}} = \underset{(0.119)}{6.064} + \underset{(0.01)}{0.059}\,\text{points} + \underset{(0.01)}{0.065}\,\text{exper} - \underset{(0.002)}{0.01}\,\text{draft} - \underset{(0.14)}{0.325}\,\text{allstar} + \underset{(0.016)}{0.042}\,\text{rebounds} + \underset{(0.021)}{0.036}\,\text{assists}$$
This model had R² = 0.515 and adjusted R² = 0.502, with a residual standard error of 0.571 on 233 degrees of freedom. exper, draft, and points were highly significant (p-value < .001), allstar and rebounds were significant at the 5% level (p-value < .05), and assists was significant only at the 10% level (p-value < .1).
The backward selection model with the lowest AIC was as follows (standard errors in parentheses):
$$\begin{aligned}\widehat{\text{lwage}} = {} & \underset{(0.162)}{5.811} + \underset{(0.011)}{0.064}\,\text{exper} + \underset{(0.097)}{0.249}\,\text{forward} + \underset{(0.125)}{0.272}\,\text{center} + \underset{(0.009)}{0.069}\,\text{points} \\ & + \underset{(0.025)}{0.058}\,\text{assists} - \underset{(0.002)}{0.01}\,\text{draft} - \underset{(0.142)}{0.372}\,\text{allstar} + \underset{(0.095)}{0.161}\,\text{black}\end{aligned}$$
This model had R² = 0.52 and adjusted R² = 0.504, with a residual standard error of 0.57 on 231 degrees of freedom.
Without the variable avgmin, the two models differed as follows:
1. The forward selection used 6 variables (points, exper, draft, allstar, rebounds, and assists), while the backward selection used 8 variables (exper, forward, center, points, assists, draft, allstar, and black).
2. The intercept was higher in the forward selection model (6.064) than in the backward selection model (5.811).
3. The variables exper, draft, and points appeared in both models and remained highly significant (p-value < .001) in each.
4. allstar was more significant in the backward selection (p-value < .01) than in the forward selection (p-value < .05).
5. assists was also more significant in the backward selection (p-value < .05) than in the forward selection (p-value < .1).
