Notes 9
Outline
1. Nonlinear regression functions – general comments
2. Nonlinear functions of one variable
3. Nonlinear functions of two variables: interactions
The TestScore – STR relation looks linear (maybe)
But the TestScore – Income relation looks nonlinear
Nonlinear Regression – General Ideas
If a relation between Y and X is nonlinear
• The effect on Y of a change in X depends on the value of X
o That is, the marginal effect of X is not constant
o The slope is not constant
• In this case a linear regression will be mis-specified
o That is, if we specify a linear function, that functional form
will be wrong
• Hence, the estimator of the effect of X on Y will be biased
o It needn’t even be right on average
• The solution is to estimate a regression function that is nonlinear in X
The general nonlinear population regression function
Yi = f(X1i, X2i,…, Xki) + ui, i = 1,…, n
Assumptions
1. E(ui| X1i,X2i,…,Xki) = 0 (same as before)
• Implies that f(X1i, X2i,…, Xki) is the conditional expectation
of Y given the X’s
2. (X1i,…,Xki,Yi) are i.i.d. (same)
3. No big outliers (same)
4. No perfect multicollinearity (same)
Nonlinear Functions of a Single Independent Variable
We’ll look at two complementary approaches
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations
• Y and/or X is transformed by taking its logarithm
• This gives a “percentages” interpretation that makes sense in many applications
1. Polynomials in X
Approximate the population regression function by a polynomial:
Yi = β0 + β1Xi + β2Xi² + … + βrXi^r + ui
Quadratic specification: Yi = β0 + β1Xi + β2Xi² + ui
Cubic specification: Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------
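A minimal Stata sketch of the commands that would produce output like the table above (variable names taken from the output; the California district dataset is assumed to be in memory):

generate avginc2 = avginc^2              // district income squared
regress testscr avginc avginc2, robust   // quadratic specification, heteroskedasticity-robust SEs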
TestScore^ = 607.3 + 3.85Incomei – 0.0423(Incomei)²
            (2.9)   (0.27)        (0.0048)
Predicted change in TestScore for a change in income from $5,000 per capita
to $6,000 per capita:
ΔTestScore^ = (607.3 + 3.85*6 – 0.0423*6²) – (607.3 + 3.85*5 – 0.0423*5²)
            = 3.4
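This arithmetic can be checked in Stata’s display calculator, plugging in the estimated coefficients above:

display (607.3 + 3.85*6 - 0.0423*6^2) - (607.3 + 3.85*5 - 0.0423*5^2)

which returns about 3.38, matching 3.4 after rounding.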
How do you test the null hypothesis that the regression function is a
quadratic against the alternative that the regression function is a cubic?
How do you test the null hypothesis of linearity against the alternative that the population regression is quadratic and/or cubic, that is, a polynomial of degree up to 3?
test avginc2 avginc3   (execute the test command after running the regression)
( 1) avginc2 = 0.0
( 2) avginc3 = 0.0
F( 2, 416) = 37.69
Prob > F = 0.0000
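A sketch of the full Stata sequence behind this test, assuming avginc3 is constructed the same way as avginc2:

generate avginc3 = avginc^3                      // district income cubed
regress testscr avginc avginc2 avginc3, robust   // cubic specification
test avginc2 avginc3                             // joint test: quadratic and cubic coefficients both zero

With F = 37.69 and p < 0.001, the null of linearity is rejected in favor of the polynomial alternative.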
The first-order Taylor approximation of ln(1 + Δx/x) is
ln(1 + Δx/x) ≈ Δx/x = proportional change in x
Examples:
ln(1.01) - ln(1) = .00995 - 0 ≈ 0.01;
ln(1.10) - ln(1) = .0953 - 0 ≈ 0.10
The approximation is better the smaller the change in x
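For reference, a sketch of the series expansion behind this approximation (standard calculus, writing a = Δx/x):

\ln(1+a) = a - \frac{a^2}{2} + \frac{a^3}{3} - \cdots, \qquad |a| < 1

Keeping only the first term gives ln(1 + Δx/x) ≈ Δx/x; the neglected terms are of order (Δx/x)², which is why the approximation improves as the change in x shrinks.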
The three log regression specifications
I.   Linear-log:  Yi = β0 + β1ln(Xi) + ui
II.  Log-linear:  ln(Yi) = β0 + β1Xi + ui
III. Log-log:     ln(Yi) = β0 + β1ln(Xi) + ui
I. Linear-log case
Yi = β0 + β1ln(Xi) + ui
Now ln(X + ΔX) – ln(X) ≈ ΔX/X,
so ΔY ≈ β1(ΔX/X)
or β1 ≈ ΔY/(ΔX/X)
Now ΔX/X = proportional change in X
So for a 1% increase in X, that is, if we multiply X by 1.01, ΔX/X = 0.01, so ΔY ≈ 0.01β1
• The model is now linear in ln(Income), so the linear-log model can be estimated
by OLS:
TestScore^ = 557.8 + 36.42ln(Incomei)
            (3.8)   (1.40)
• Standard errors, confidence intervals, R2 – all the usual tools of regression apply
here
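A minimal Stata sketch of this estimation (the name ln_avginc is illustrative):

generate ln_avginc = ln(avginc)     // log of district income
regress testscr ln_avginc, robust   // linear-log specification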
The linear-log and cubic regression functions
II. Log-linear regression function
ln(Y) = β0 + β1X
so ΔY/Y ≈ ln(Y + ΔY) – ln(Y) = β1ΔX
or β1 ≈ (ΔY/Y)/ΔX
Log-linear case
ln(Yi) = β0 + β1Xi + ui
β1 ≈ (ΔY/Y)/ΔX
• ΔY/Y = proportional change in Y
• So for a change in X by one unit (ΔX = 1), β1 ≈ ΔY/Y, which equals the proportional change in Y
III. Log-log case
ln(Yi) = β0 + β1ln(Xi) + ui
Now ln(Y + ΔY) – ln(Y) ≈ β1[ln(X + ΔX) – ln(X)]
so ΔY/Y ≈ β1(ΔX/X)
or β1 ≈ (ΔY/Y)/(ΔX/X) = the elasticity of Y with respect to X
ln(TestScore)^ = 6.336 + 0.0554ln(Incomei)
                (0.006)  (0.0021)
• A 1% increase in Income is estimated to raise TestScore by 0.0554%.
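The other two specifications follow the same pattern in Stata, now transforming the dependent variable (names illustrative; ln_avginc as in the earlier sketch):

generate ln_testscr = ln(testscr)      // log of test score
regress ln_testscr avginc, robust      // log-linear specification
regress ln_testscr ln_avginc, robust   // log-log specification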
(b) Interactions between a binary and a continuous variable
Yi = β0 + β1Xi + β2Di + ui
• Di is binary, X is continuous
• As specified above, the effect of X on Y (holding D constant) is equal to β1
o In the current specification, this effect does not depend on D
o But we may want to allow for that dependence
o For example
§ the effect of a drug may be different for males and females
§ The effect of lowering the class size by one student per teacher may be
different for districts with many English learners than for districts with
few English learners
• To allow the effect of X to depend on D, include the “interaction term” Di*Xi as a
regressor:
Yi = β0 + β1Xi + β2Di + β3(Di*Xi) + ui
Binary-continuous interactions: the two regression lines
Yi = β0 + β1Xi + β2Di + β3(Di*Xi) + ui
When Di = 0: Yi = β0 + β1Xi + ui, the D=0 regression line
When Di = 1: Yi = β0 + β1Xi + β2 + β3Xi + ui
           = (β0+β2) + (β1+β3)Xi + ui, the D=1 regression line
Binary-continuous interactions: the two regression lines, ctd.
• So by including the interaction term, we effectively estimate
two regression lines
o When D=0, the regression line has
§ Intercept β0
§ Slope β1
o When D=1, the regression line has
§ Intercept β0+β2
§ Slope β1+β3
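A hedged Stata sketch of estimating both lines in one regression (hiel and strXhiel are assumed names for the HiEL dummy and the interaction term):

generate strXhiel = str*hiel                // interaction of class size and the HiEL dummy
regress testscr str hiel strXhiel, robust   // one regression, two lines (D=0 and D=1)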
Binary-continuous interactions, ctd.
Interpreting the coefficients
Yi = β0 + β1Xi + β2Di + β3(Di*Xi) + ui
General rule: compare the various cases
Change X by ΔX, holding D constant: ΔY = (β1 + β3D)ΔX, so the effect of X on Y is β1 + β3D, which depends on D
TestScore^ = 682.2 – 0.97STR + 5.6HiEL – 1.28(STR×HiEL)
• When HiEL = 0:
TestScore^ = 682.2 – 0.97STR
• When HiEL = 1:
TestScore^ = 682.2 – 0.97STR + 5.6 – 1.28STR
           = 687.8 – 2.25STR
• Two regression lines: one for each HiEL group.
• Class size reduction is estimated to have a larger effect when the percent of English learners is large.
Testing hypotheses
TestScore^ = 682.2 – 0.97STR + 5.6HiEL – 1.28(STR×HiEL)
            (11.9)  (0.59)     (19.5)     (0.97)
• The two regression lines have the same slope iff the coefficient on STR×HiEL is zero: t = –1.28/0.97 = –1.32 (not rejected at the 5% level)
• The two regression lines have the same intercept iff the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29 (again, not rejected at the 5% level)
• The two regression lines are the same iff population coefficient on HiEL
= 0 and population coefficient on STR*HiEL = 0: F = 89.94 (p-value <
.001) !!
• We reject the joint hypothesis but neither individual hypothesis
o How can this be? Multicollinearity between HiEL and STR×HiEL makes the individual standard errors large, so the t-tests fail to reject
o It is hard to know which coefficient is non-zero, but the F test
strongly rejects that they are both equal to zero
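In Stata, the individual and joint tests could be run after the interaction regression from the earlier sketch (assumed variable names as before):

regress testscr str hiel strXhiel, robust
test hiel strXhiel   // joint F-test: same intercept and same slope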
(c) Interactions between two continuous variables
Yi = β0 + β1X1i + β2X2i + β3(X1i×X2i) + ui
TestScore^ = 686.3 – 1.12STR – 0.67PctEL + .0012(STR×PctEL)
            (11.8)  (0.59)    (0.37)       (0.019)
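A sketch of the corresponding Stata estimation, assuming the percent-English-learners variable is named pctel (strXpctel is an assumed name for the interaction):

generate strXpctel = str*pctel                // interaction of class size and percent English learners
regress testscr str pctel strXpctel, robust   // continuous-by-continuous interaction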