Tute6Answers ECON339


Tutorial 6 (Week 7)

Assumptions and Diagnostics

Tutorial assignment:

What might Ramsey’s RESET test be used for?

Ans: Ramsey’s RESET test is a test of whether the functional form of the
regression is appropriate. In other words, we test whether the relationship
between the dependent variable and the independent variables really should be
linear, or whether a non-linear form would be more appropriate. The test works
by adding powers of the fitted values from the original regression as extra
regressors in a second regression. If the appropriate model were a linear one,
the powers of the fitted values would not be significant in this second
regression.
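The mechanics of the test can be sketched in Python on synthetic data (a minimal illustration only — the tutorial itself runs the test in EViews, and the data-generating numbers below are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
y = x**2 + rng.normal(0, 1, n)           # true relationship is quadratic

# Restricted model: regress y on a constant and x only
Xr = np.column_stack([np.ones(n), x])
br, rss_r = np.linalg.lstsq(Xr, y, rcond=None)[:2]
fitted = Xr @ br

# Unrestricted model: add the squared fitted values (RESET with power 2)
Xu = np.column_stack([Xr, fitted**2])
bu, rss_u = np.linalg.lstsq(Xu, y, rcond=None)[:2]

# F-test on the added term: if it is significant, the linear form is rejected
q, k = 1, Xu.shape[1]
F = ((rss_r[0] - rss_u[0]) / q) / (rss_u[0] / (n - k))
p_value = stats.f.sf(F, q, n - k)
print(F, p_value)                        # tiny p-value here: reject linearity
```

Because the data were generated from a quadratic model, the squared fitted values pick up the curvature the linear model misses, and the null of correct (linear) specification is rejected.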

What could be done if it were found that the RESET test failed?

Ans: The test is performed under the null hypothesis that the model is linear.
Rejection of the null implies that a non-linear model is supported by the data;
however, the test does not tell us what the correct non-linear functional form
is.

If the regression fails Ramsey’s RESET test, the easiest “solution” is probably
to transform all of the variables into logarithms. This has the effect of
turning a multiplicative model into an additive one.

If the logged model still fails the test, then we have to accept that the
relationship between the dependent variable and the independent variables is
probably not linear after all. We must then either estimate a non-linear model
for the data (which is beyond the scope of this course) or go back to the
drawing board and run a different regression containing different variables.
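The log-transformation remedy can be illustrated on synthetic data (a sketch; the constants 3.0 and 1.5 are invented for the example). A multiplicative model y = A·x^β·e^u becomes linear in logs, ln y = ln A + β ln x + u, so OLS on the logged variables recovers β directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
y = 3.0 * x**1.5 * np.exp(rng.normal(0, 0.1, n))    # multiplicative model

# After taking logs the model is linear: ln y = ln 3 + 1.5 ln x + u
X = np.column_stack([np.ones(n), np.log(x)])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(beta)                                          # approximately [ln 3, 1.5]
```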

Objectives:
1. Identify multicollinearity and possible solutions to the problem.
2. Perform a Chow test to determine whether parameters are the same for
different groups.

Question 1
The data are available in the file “hedonic1.xls”. Consider the following
multiple regression model for new houses only (age = 0).

Proc>Set sample> if age=0

a) Estimate the econometric model


ln(SP)_t = β1 + β2·SFLA_t + β3·BEDS_t + β4·BATHS_t + β5·STORIES_t + β6·VACANT_t + u_t

Quick>Estimate equation>
Log(selling_price) c sfla beds baths stories vacant

Dependent Variable: LOG(SELLING_PRICE)
Method: Least Squares
Date: 07/09/05 Time: 10:42
Sample: 1 6660 IF AGE=0
Included observations: 151

Variable            Coefficient   Std. Error   t-Statistic   Prob.

C                   11.24201      0.124654     90.18541      0.0000
SFLA                0.000707      5.48E-05     12.90426      0.0000
BEDS               -0.084611      0.033665    -2.513295      0.0131
BATHS               0.034427      0.084073     0.409495      0.6828
STORIES            -0.141884      0.060394    -2.349299      0.0202
VACANT              0.068117      0.035346     1.927130      0.0559

R-squared           0.784029   Mean dependent var      12.02137
Adjusted R-squared  0.776582   S.D. dependent var      0.431160
S.E. of regression  0.203797   Akaike info criterion  -0.304458
Sum squared resid   6.022332   Schwarz criterion      -0.184566
Log likelihood      28.98658   F-statistic             105.2772
Durbin-Watson stat  0.213214   Prob(F-statistic)       0.000000

b) Do the coefficients take the expected signs? Check for any evidence of
multicollinearity.

The coefficients for BEDS and STORIES do not take the expected signs: we would
expect more bedrooms and more storeys to raise the selling price, but this is
not borne out by the data for new houses. Vacant houses also appear on average
more expensive than occupied houses, although this effect is not significant at
the 5% level.

One way to check for multicollinearity is to check the correlation coefficients of all
the explanatory variables.

In the workfile window> hold control and click all the explanatory variables> right
click and select to open them as a group.

View> covariance analysis>tick the option correlation>OK

          BATHS      BEDS       SFLA       STORIES    VACANT
BATHS     1.000000   0.676223   0.862870   0.673716  -0.178327
BEDS      0.676223   1.000000   0.657986   0.515717  -0.074098
SFLA      0.862870   0.657986   1.000000   0.658058  -0.131856
STORIES   0.673716   0.515717   0.658058   1.000000  -0.029968
VACANT   -0.178327  -0.074098  -0.131856  -0.029968   1.000000

Correlations above 0.8 are commonly taken as evidence of multicollinearity.
There is, however, no strict cut-off: the higher the correlations between the
explanatory variables, the more severe the problem is likely to be.
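The same correlation check can be sketched in Python (synthetic data with invented parameters, since the real exercise computes the matrix in EViews):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 151
sfla = rng.normal(1600, 600, n)                # floor area
baths = 0.001 * sfla + rng.normal(0, 0.3, n)   # strongly tied to floor area
stories = rng.normal(1.5, 0.5, n)              # unrelated to the others

X = np.column_stack([sfla, baths, stories])
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))

# Flag any pair with |r| above the informal 0.8 threshold
pairs = np.argwhere(np.abs(np.triu(corr, 1)) > 0.8)
print(pairs)                                   # expect the (sfla, baths) pair
```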

Another way to detect multicollinearity is to run auxiliary regressions, in
which each explanatory variable is regressed on the remaining explanatory
variables.

For example:
Quick>Estimate Equation>
Sfla c beds baths stories vacant

Dependent Variable: SFLA
Method: Least Squares
Date: 07/09/05 Time: 12:00
Sample: 1 6660 IF AGE=0
Included observations: 151

Variable            Coefficient   Std. Error   t-Statistic   Prob.

C                  -1312.035     153.8351     -8.528843      0.0000
BEDS                111.6102     50.00764      2.231862      0.0271
BATHS               1015.968     95.17536      10.67469      0.0000
STORIES             204.8847     89.63886      2.285668      0.0237
VACANT              6.589078     53.38976      0.123415      0.9019

R-squared           0.763486   Mean dependent var     1621.993
Adjusted R-squared  0.757006   S.D. dependent var     624.5071
S.E. of regression  307.8469   Akaike info criterion  14.32963
Sum squared resid   13836377   Schwarz criterion      14.42954
Log likelihood     -1076.887   F-statistic            117.8251
Durbin-Watson stat  0.343464   Prob(F-statistic)      0.000000

A high R-squared in the auxiliary regression indicates evidence of
multicollinearity; here R-squared is about 0.76, so SFLA is largely explained
by the other regressors.
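An auxiliary regression can be sketched as follows (synthetic data again; the variance inflation factor VIF = 1/(1 − R²) is a standard summary of the same idea, although the tutorial itself only reports the R-squared):

```python
import numpy as np

def aux_r_squared(X, j):
    """R-squared from regressing column j of X on the remaining columns."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid**2).sum() / ((y - y.mean())**2).sum()

rng = np.random.default_rng(3)
n = 151
sfla = rng.normal(1600, 600, n)
baths = 0.001 * sfla + rng.normal(0, 0.2, n)   # nearly collinear with sfla
stories = rng.normal(1.5, 0.5, n)
X = np.column_stack([sfla, baths, stories])

r2 = aux_r_squared(X, 0)                       # sfla on baths and stories
vif = 1 / (1 - r2)
print(r2, vif)                                 # high R-squared: collinearity
```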

c) What are the possible effects of multicollinearity? What are the possible solutions
to the problem?

Possible consequences include incorrect signs and sizes of the estimated
coefficients and inflated standard errors, so that variables can appear
individually insignificant by t-tests even though they are jointly significant
in an F-test.

Possible solutions include dropping one of the collinear variables, combining
them into a ratio, or gathering more data with which to estimate the model.

d) Create a dummy variable for the entire dataset which has a value of 1 for a new
house and 0 for any other house.

Proc> Set sample> clear the if statement

Genr> new=0
Genr> new=1, and type if age=0 in the sample window next to @all

Alternatively, in the command window above the workfile, type dum1 = (age=0).

A new variable dum1 will appear in the workfile. To check that the dummy has
been created properly, graph dum1: the spikes with value 1 should occur exactly
at the observations where age=0.
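In Python the same logical-expression trick creates the dummy in one line (a sketch with an invented age vector):

```python
import numpy as np

age = np.array([0, 5, 0, 12, 3, 0])
new = (age == 0).astype(int)   # 1 for a new house, 0 otherwise
print(new)                     # [1 0 1 0 0 1]
```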

e) Do a Chow test for the complete dataset to see if the equation changes depending
on whether the house is a new house or not.

Quick>Estimate equation>
Log(selling_price) c sfla beds baths stories vacant new new*sfla new*beds
new*baths new*stories new*vacant

Dependent Variable: LOG(SELLING_PRICE)
Method: Least Squares
Date: 07/09/05 Time: 12:24
Sample: 1 6660
Included observations: 6660

Variable            Coefficient   Std. Error   t-Statistic   Prob.

C                   11.06159      0.016676     663.3423      0.0000
SFLA                0.000465      1.12E-05     41.67081      0.0000
BEDS               -0.025994      0.007025    -3.700036      0.0002
BATHS               0.168266      0.009302     18.08933      0.0000
STORIES            -0.055975      0.010797    -5.184200      0.0000
VACANT             -0.111657      0.007842    -14.23829      0.0000
NEW                 0.180419      0.174618     1.033220      0.3015
NEW*SFLA            0.000242      7.72E-05     3.136150      0.0017
NEW*BEDS           -0.058617      0.047466    -1.234926      0.2169
NEW*BATHS          -0.133838      0.117601    -1.138073      0.2551
NEW*STORIES        -0.085909      0.084904    -1.011839      0.3117
NEW*VACANT          0.179774      0.049907     3.602154      0.0003

R-squared           0.538565   Mean dependent var     11.91930
Adjusted R-squared  0.537802   S.D. dependent var     0.418000
S.E. of regression  0.284178   Akaike info criterion  0.323366
Sum squared resid   536.8724   Schwarz criterion      0.335626
Log likelihood     -1064.810   F-statistic            705.3853
Durbin-Watson stat  0.818210   Prob(F-statistic)      0.000000

View>Coefficient tests>Wald – Coefficient restrictions>
C(7)=0,C(8)=0,C(9)=0,C(10)=0,C(11)=0,C(12)=0

Wald Test:
Equation: EQ01

Test Statistic   Value      df         Probability
F-statistic      4.102574   (6, 6648)  0.0004
Chi-square       24.61545   6          0.0004

H0: The coefficients are the same regardless of whether the house is new
H1: The coefficients change depending on whether the house is new
Assume a 5% level of significance.
p-value = 0.0004 < 0.05, so reject the null.
At the 5% level we conclude that the effects differ between new and existing
houses.
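The dummy-interaction version of the Chow test amounts to an F-test that all the interaction coefficients are jointly zero. It can be sketched on synthetic data (the coefficients and sample sizes below are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 400
x = rng.normal(0, 1, n)
d = (rng.uniform(size=n) < 0.3).astype(float)      # group dummy ("new house")
y = 1.0 + 0.5 * x + 0.4 * d * x + rng.normal(0, 0.5, n)  # slope shifts when d=1

# Restricted model: same coefficients in both groups
Xr = np.column_stack([np.ones(n), x])
rss_r = np.linalg.lstsq(Xr, y, rcond=None)[1][0]

# Unrestricted model: intercept and slope may differ by group
Xu = np.column_stack([Xr, d, d * x])
rss_u = np.linalg.lstsq(Xu, y, rcond=None)[1][0]

q = 2                                              # restrictions: d and d*x
k = Xu.shape[1]
F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
p_value = stats.f.sf(F, q, n - k)
print(F, p_value)                                  # small p: coefficients differ
```

This is the same calculation the Wald test performs on the estimated equation: the restricted model pools the two groups, the unrestricted model lets the dummy and interaction terms shift the coefficients, and the F-statistic compares the two residual sums of squares.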
