Notes 9


Econ 120B: Econometrics

Summer Session I 2024


UCSD
Nonlinear Regression Functions
• So far, everything has been linear in the X’s
• But linear approximations are not always good in practice
• Real world relations are not always linear
• The multiple regression framework can be easily extended to
handle regression functions that are nonlinear in one or more X

Outline
1. Nonlinear regression functions – general comments
2. Nonlinear functions of one variable
3. Nonlinear functions of two variables: interactions
The TestScore – STR relation looks
linear (maybe)
But the TestScore – Income
relation looks nonlinear
Nonlinear Regression – General Ideas
If a relation between Y and X is nonlinear
• The effect on Y of a change in X depends on the value of X
o That is, the marginal effect of X is not constant
o The slope is not constant
• In this case a linear regression will be mis-specified
o That is, if we specify a linear function, that functional form
will be wrong
• Hence, the estimator of the effect of X on Y will be biased
o It needn’t even be right on average
• The solution to this is to estimate a regression function that is
nonlinear in X
The general nonlinear population
regression function
Yi = f(X1i, X2i,…, Xki) + ui, i = 1,…, n

Assumptions
1. E(ui| X1i,X2i,…,Xki) = 0 (same as before)
• Implies that f(X1i, X2i,…, Xki) is the conditional expectation
of Y given the X’s
2. (X1i,…,Xki,Yi) are i.i.d. (same)
3. No big outliers (same)
4. No perfect multicollinearity (same)
Nonlinear Functions of a Single
Independent Variable
We’ll look at two complementary approaches
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial

2. Logarithmic transformations
• Y and/or X is transformed by taking its logarithm
• This gives a “percentages” interpretation that makes sense
in many applications
1. Polynomials in X
Approximate the population regression function by a polynomial:

Yi = β0 + β1Xi + β2Xi² + … + βrXiʳ + ui

• This is just the linear multiple regression model – except that


the regressors are powers of X
• Estimation, hypothesis testing, etc. proceeds as in the
multiple regression model using OLS
• The coefficients are trickier to interpret, but the regression
function itself is easily interpretable
Example: the TestScore – Income
relation
Incomei = average district income in the ith district,
measured by the variable avginc in our data set
(thousands of dollars per capita)

Quadratic specification:

TestScorei = β0 + β1Incomei + β2(Incomei)² + ui

Cubic specification:

TestScorei = β0 + β1Incomei + β2(Incomei)² + β3(Incomei)³ + ui


Estimation of the quadratic
specification in STATA
generate avginc2 = avginc*avginc    // create the quadratic regressor
reg testscr avginc avginc2, r

Regression with robust standard errors            Number of obs =     420
                                                  F(  2,   417) =  428.52
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.5562
                                                  Root MSE      =  12.724

------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------

How do you test the null hypothesis of linearity against the alternative
that the regression function is quadratic?
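
Since the quadratic model nests the linear one, this amounts to a t-test on the coefficient of avginc2. As a sketch, the same test can also be run as a Wald test in Stata right after the regression above (the resulting F statistic is the square of the t statistic, with the same p-value):

    test avginc2        // H0: coefficient on avginc2 = 0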
Interpreting the estimated
regression function:
(a) Plot the predicted values
TestScore-hat = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²
                (2.9)   (0.27)        (0.0048)
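
A minimal sketch of how the predicted values might be plotted in Stata after the quadratic regression above (the name testscr_hat is made up for this example):

    predict testscr_hat                       // fitted values from the quadratic fit
    twoway (scatter testscr avginc)       ///
           (line testscr_hat avginc, sort), ///
           xtitle("District income (1000s of dollars per capita)") ytitle("Test score")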
Interpreting the estimated
regression function
(b) Compute “effects” for different values of X

TestScore-hat = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²
                (2.9)   (0.27)        (0.0048)

Predicted change in TestScore for a change in income from $5,000 per capita
to $6,000 per capita:

ΔTestScore-hat = (607.3 + 3.85×6 – 0.0423×6²) – (607.3 + 3.85×5 – 0.0423×5²)
               = 3.4

TestScore-hat = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²

Predicted “effects” for different values of X:

Change in Income ($1000 per capita)     ΔTestScore-hat
from 5 to 6                                  3.4
from 25 to 26                                1.7
from 45 to 46                                0.0
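
Each entry can be reproduced from the estimated coefficients (the intercept cancels in the difference); e.g. with Stata's display calculator:

    display (3.851*6  - .0423*6^2)  - (3.851*5  - .0423*5^2)    //  3.39
    display (3.851*26 - .0423*26^2) - (3.851*25 - .0423*25^2)   //  1.69
    display (3.851*46 - .0423*46^2) - (3.851*45 - .0423*45^2)   //  0.00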

The “effect” of a change in income is greater at low income levels than at
high income levels (perhaps a declining marginal benefit of an increase in
school budgets?)
Caution! What is the effect of a change from 65 to 66?
Don’t extrapolate outside the range of the data!

Estimation of a cubic specification
in STATA
gen avginc3 = avginc*avginc2        // create the cubic regressor
reg testscr avginc avginc2 avginc3, r

Regression with robust standard errors            Number of obs =     420
                                                  F(  3,   416) =  270.18
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.5584
                                                  Root MSE      =  12.707

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   5.018677   .7073505     7.10   0.000     3.628251    6.409104
     avginc2 |  -.0958052   .0289537    -3.31   0.001    -.1527191   -.0388913
     avginc3 |   .0006855   .0003471     1.98   0.049     3.27e-06    .0013677
       _cons |    600.079   5.102062   117.61   0.000     590.0499     610.108
------------------------------------------------------------------------------

How do you test the null hypothesis that the regression function is a
quadratic against the alternative that the regression function is a cubic?
How do you test the null hypothesis of linearity against the alternative
that the population regression is quadratic and/or cubic, that is, a
polynomial of degree up to 3?

H0: coefficients on Income² and Income³ = 0
H1: at least one of these coefficients is nonzero

test avginc2 avginc3     // execute after running the cubic regression

( 1) avginc2 = 0.0
( 2) avginc3 = 0.0

F( 2, 416) = 37.69
Prob > F = 0.0000

The hypothesis that the population regression is linear is rejected at the
1% significance level against the alternative that it is a polynomial of
degree up to 3.
Summary: polynomial regression
functions
Yi = β0 + β1Xi + β2Xi² + … + βrXiʳ + ui
• Estimation: by OLS after defining new regressors
• Coefficients have complicated interpretations
• To interpret the estimated regression function:
• plot predicted values as a function of x
• compute predicted ΔY/ΔX at different values of x
• Hypotheses concerning degree r can be tested by t- (for single
coefficients) and F- (for multiple coefficients) tests on the appropriate
blocks of variables
• Choice of degree r
• plot the data
• t- and F-tests
• check sensitivity of estimated effects
• judgment
2. Logarithmic functions of Y and/or X
• ln(X) = the natural logarithm of X

• Logarithmic transforms permit modeling relations in percentage terms

Here’s why: when Δx is small,

ln(x + Δx) – ln(x) = ln(1 + Δx/x)

The first-order Taylor approximation of ln(1 + Δx/x) is

ln(1 + Δx/x) ≈ Δx/x = proportional change in x

The approximation is very good when Δx is small


2. Logarithmic functions of Y and/or X

In sum, when Δx is small,

ln(x + Δx) – ln(x) ≈ Δx/x = proportional change in x

Examples:
ln(1.01) - ln(1) = .00995 - 0 ≈ 0.01;
ln(1.10) - ln(1) = .0953 - 0 ≈ 0.10
The approximation is better the smaller the change in x
The three log regression
specifications

Case              Population regression function

I.   linear-log   Yi = β0 + β1ln(Xi) + ui
II.  log-linear   ln(Yi) = β0 + β1Xi + ui
III. log-log      ln(Yi) = β0 + β1ln(Xi) + ui

• The interpretation of the slope coefficient differs in each case


• The interpretation is found by applying the general “before
and after” rule
o figure out the change in Y for a given change in X
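
In Stata, each case amounts to generating the logged variable(s) and estimating by OLS. A minimal sketch (the names lavginc and ltestscr are made up here):

    gen lavginc  = ln(avginc)      // ln(X)
    gen ltestscr = ln(testscr)     // ln(Y)
    reg testscr  lavginc,  r       // I.   linear-log
    reg ltestscr avginc,   r       // II.  log-linear
    reg ltestscr lavginc,  r       // III. log-log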
I. Linear-log regression function
Y = β0 + β1ln(X)                                   (b)

Now change X:   Y + ΔY = β0 + β1ln(X + ΔX)         (a)

Subtract (a) – (b):   ΔY = β1[ln(X + ΔX) – ln(X)]

Now ln(X + ΔX) – ln(X) ≈ ΔX/X,

so ΔY ≈ β1 (ΔX/X)

or β1 ≈ ΔY / (ΔX/X)
Linear-log case
Yi = β0 + β1ln(Xi) + ui

For small ΔX,

β1 ≈ ΔY / (ΔX/X)

Now ΔX/X = proportional change in X

So for a 1% increase in X, that is, if we multiply X by 1.01, then ΔX/X = 0.01

⇒ ΔY/0.01 = β1 ⇒ ΔY = 0.01β1, which is the predicted change in Y


Example: TestScore vs. ln(Income)
• First define the new regressor, ln(Income)

• The model is now linear in ln(Income), so the linear-log model can be estimated
by OLS:

TestScore-hat = 557.8 + 36.42 ln(Incomei)
                (3.8)   (1.40)

• So a 1% increase in Income is associated with 0.01β1 = 0.01×36.42 ≈ 0.36
  more points on the test

• Standard errors, confidence intervals, R² – all the usual tools of regression
  apply here
The linear-log and cubic regression
functions
II. Log-linear regression function
ln(Y) = β0 + β1X                                   (b)

Now change X:   ln(Y + ΔY) = β0 + β1(X + ΔX)       (a)

Subtract (a) – (b):   ln(Y + ΔY) – ln(Y) = β1ΔX

so ΔY/Y ≈ ln(Y + ΔY) – ln(Y) = β1ΔX

or β1 ≈ (ΔY/Y) / ΔX
Log-linear case
ln(Yi) = β0 + β1Xi + ui

β1 ≈ (ΔY/Y) / ΔX

• ΔY/Y = proportional change in Y
• So for a change in X by one unit (ΔX = 1), β1 ≈ ΔY/Y, which equals the
  proportional change in Y
• That is, if X increases by 1 unit, Y changes by (100×β1)%


III. Log-log regression function
ln(Yi) = β0 + β1ln(Xi) + ui (b)

Now change X: ln(Y + ΔY) = β0 + β1ln(X + ΔX) (a)

Subtract: ln(Y + ΔY) – ln(Y) = β1[ln(X + ΔX) – ln(X)]

so ΔY/Y ≈ β1 (ΔX/X)

or β1 ≈ (ΔY/Y) / (ΔX/X)
Log-log case
ln(Yi) = β0 + β1ln(Xi) + ui

For small ΔX,

β1 ≈ (ΔY/Y) / (ΔX/X)

• Hence a 100×(ΔY/Y) percent change in Y is associated with a 100×(ΔX/X)
  percent change in X
• E.g. a 1% change in X is associated with a β1% change in Y
• That is, in the log-log specification, β1 has the interpretation of an
  elasticity


Example: ln(TestScore) vs. ln(Income)
• First define a new dependent variable, ln(TestScore), and the new
  regressor, ln(Income)
• The model is now a linear regression of ln(TestScore) against ln(Income),
  which can be estimated by OLS:

ln(TestScore)-hat = 6.336 + 0.0554 ln(Incomei)
                    (0.006)  (0.0021)

A 1% increase in Income is associated with an increase of 0.0554% in
TestScore.
Example: ln(TestScore) vs. ln(Income)
ln(TestScore)-hat = 6.336 + 0.0554 ln(Incomei)
                    (0.006)  (0.0021)

• For example, suppose income increases from $10,000 to $11,000, or by 10%
• Then TestScore increases by approximately
0.0554*10% = 0.554%
• If TestScore = 650, this corresponds to an increase of
0.00554*650 = 3.6 points
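
As a quick check of the arithmetic:

    display 0.0554*10        // percent change in TestScore: .554
    display 0.00554*650      // in points, from a base of 650: 3.601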
The log-linear and log-log specifications:

• Note the vertical axis
• Neither seems to fit as well as the cubic or linear-log specification
Summary: Logarithmic transformations
• Three cases, differing in whether Y and/or X is transformed by taking
logarithms.
• The regression is linear in the new variable(s) ln(Y) and/or ln(X), and the
coefficients can be estimated by OLS.
• Hypothesis tests and confidence intervals are now implemented and
interpreted as usual
• The interpretation of β1 differs from case to case
• Choice of specification should be guided by
• Judgment: which interpretation makes the most sense in your
application?
• Tests
• Plotting predicted values
Interactions Between
Independent Variables
• Perhaps a class size reduction is more effective in some
circumstances than in others
• Perhaps smaller classes help more if there are many English
learners, who need individual attention
• That is, ΔTestScore/ΔSTR might depend on PctEL
• More generally, ΔY/ΔX1 might depend on X2
• How to model such “interactions” between X1 and X2?
• We first consider binary X’s, then continuous X’s
(a) Interactions between two binary
variables
Yi = β0 + β1D1i + β2D2i + ui

• D1i, D2i are binary
• β1 is the effect of changing D1 = 0 to D1 = 1
  o In this specification, this effect doesn’t depend on the value of D2
• To allow the effect of changing D1 to depend on D2, include the
  “interaction term” D1i*D2i as a regressor:

Yi = β0 + β1D1i + β2D2i + β3(D1i*D2i) + ui


Interpreting the coefficients
Yi = β0 + β1D1i + β2D2i + β3(D1i*D2i) + ui
General rule: compare the various cases
E(Yi|D1i=0, D2i=d2) = β0 + β2d2 (b)
E(Yi|D1i=1, D2i=d2) = β0 + β1 + β2d2 + β3d2 (a)
subtract (a) – (b):
E(Yi|D1i=1, D2i=d2) – E(Yi|D1i=0, D2i=d2) = β1 + β3d2
• The effect of D1 depends on d2 (what we wanted)
• β1 = effect of D1 on Y when D2 = 0
• β3 = increment to the effect of D1 on Y, when D2 = 1
Example: TestScore, STR, English
learners
Let
  HiSTR = 1 if STR ≥ 20, 0 if STR < 20
  HiEL  = 1 if PctEL ≥ 10, 0 if PctEL < 10

TestScore-hat = 664.1 – 18.2 HiEL – 1.9 HiSTR – 3.5 (HiSTR×HiEL)
                (1.4)   (2.3)       (1.9)       (3.1)

• Effect of HiSTR when HiEL = 0 is –1.9
• Effect of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4
• Class size reduction is estimated to have a bigger effect when the
  percent of English learners is large
• But this interaction isn’t statistically significant: t = 3.5/3.1 = 1.12
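
A sketch of how this regression could be estimated in Stata, assuming the student-teacher ratio and English-learner share are stored as str and el_pct (the dummy names are arbitrary, and the comparisons assume no missing values):

    gen histr = (str >= 20)           // HiSTR
    gen hiel  = (el_pct >= 10)        // HiEL
    gen histrXhiel = histr*hiel       // interaction term
    reg testscr hiel histr histrXhiel, r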
(b) Interactions between continuous and
binary variables
Yi = β0 + β1Xi + β2Di + ui

• Di is binary, X is continuous
• As specified above, the effect of X on Y (holding constant D) is equal to β1
  o In the current specification, this effect does not depend on D
  o But we may want to allow for that dependence
  o For example
    § the effect of a drug may be different for males and females
    § the effect of lowering the class size by one student per teacher may be
      different for districts with many English learners than for districts
      with few English learners
• To allow the effect of X to depend on D, include the “interaction term”
  Di*Xi as a regressor:

Yi = β0 + β1Xi + β2Di + β3(Di*Xi) + ui
Binary-continuous interactions: the
two regression lines
Yi = β0 + β1X i + β2Di + β3(Di*Xi) + ui

Observations with Di= 0 (the “D = 0” group):

Yi = β0 + β1Xi + ui The D=0 regression line

Observations with Di= 1 (the “D = 1” group):

Yi = β0 + β1Xi + β2 + β3Xi + ui
= (β0+β2) + (β1+β3)Xi + ui The D=1 regression line
Binary-continuous interactions: the
two regression lines
• So by including the interaction term, we effectively estimate
two regression lines
o When D = 0, the regression line has
  § Intercept β0
  § Slope β1
o When D = 1, the regression line has
  § Intercept β0 + β2
  § Slope β1 + β3
Binary-continuous interactions, ctd.
Interpreting the coefficients
Yi = β0 + β1Xi + β2Di + β3(Di*Xi) + ui
General rule: compare the various cases

Y = β0 + β1X + β2D + β3(D*X)                            (a)

Change X:

Y + ΔY = β0 + β1(X + ΔX) + β2D + β3[D*(X + ΔX)]         (b)

Subtract (b) – (a):

ΔY = β1ΔX + β3DΔX,   or   ΔY/ΔX = β1 + β3D

• The effect of X depends on D (what we wanted)
• β3 = increment to the effect (slope) of X, when D = 1
Example: TestScore, STR, HiEL
(=1 if PctEL ≥10)
TestScore-hat = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
                (11.9)  (0.59)     (19.5)     (0.97)

• When HiEL = 0:
    TestScore-hat = 682.2 – 0.97 STR
• When HiEL = 1:
    TestScore-hat = 682.2 – 0.97 STR + 5.6 – 1.28 STR
                  = 687.8 – 2.25 STR
• Two regression lines: one for each HiEL group.
• Class size reduction is estimated to have a larger effect when
the percent of English learners is large.
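
A sketch of the corresponding Stata commands, reusing the hiel dummy defined earlier; lincom then recovers the slope of the HiEL = 1 line (here –2.25) together with its standard error:

    gen strXhiel = str*hiel
    reg testscr str hiel strXhiel, r
    lincom str + strXhiel        // slope for the HiEL = 1 group: β1 + β3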
Testing hypotheses
TestScore-hat = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
                (11.9)  (0.59)     (19.5)     (0.97)
• The two regression lines have the same slope iff the coefficient on
STR×HiEL is zero: t = –1.28/0.97 = –1.32 (this is not rejected at 5%
level)
• The two regression lines have the same intercept iff the coefficient on
  HiEL is zero: t = 5.6/19.5 = 0.29 (again, this is not rejected at the 5% level)
• The two regression lines are the same iff population coefficient on HiEL
= 0 and population coefficient on STR*HiEL = 0: F = 89.94 (p-value <
.001) !!
• We reject the joint hypothesis but neither individual hypothesis
  o How can this be? Multicollinearity makes the standard errors large,
    leading to non-rejection in the individual t-tests
  o It is hard to know which coefficient is nonzero, but the F-test strongly
    rejects that they are both equal to zero
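
After the regression in the sketch above, the individual and joint hypotheses can be checked with Stata's test command; a minimal sketch:

    test hiel            // same intercept?  (t-test equivalent)
    test strXhiel        // same slope?      (t-test equivalent)
    test hiel strXhiel   // same line?       (the joint F-test on the slide)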
(c) Interactions between two
continuous variables
Yi = β0 + β1X1i + β2X2i + ui

• X1, X2 are continuous


• As specified, the effect of X1 doesn’t depend on X2
• As specified, the effect of X2 doesn’t depend on X1
• To allow the effect of X1 to depend on X2, include the
“interaction term” X1i*X2i as a regressor:

Yi = β0 + β1X1i + β2X2i + β3(X1i*X2i) + ui


Interpreting the coefficients:
Yi = β0 + β1X1i + β2X2i + β3(X1i*X2i) + ui

General rule: compare the various cases


Y = β0 + β1X1 + β2X2 + β3(X1*X2)                              (b)

Now change X1:

Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2 + β3[(X1 + ΔX1)*X2]         (a)

Subtract (a) – (b):

ΔY = β1ΔX1 + β3X2ΔX1,   or   ΔY/ΔX1 = β1 + β3X2
• The effect of X1 depends on X2 (what we wanted)
• β3 = increment to the effect (slope) of X1 from a unit change
in X2
Example: TestScore, STR, PctEL
TestScore-hat = 686.3 – 1.12 STR – 0.67 PctEL + .0012 (STR×PctEL)
                (11.8)  (0.59)     (0.37)       (0.019)

The estimated effect of class size reduction is nonlinear because the size
of the effect itself depends on PctEL:

ΔTestScore/ΔSTR = –1.12 + .0012 PctEL

PctEL     ΔTestScore/ΔSTR
0         –1.12
20%       –1.12 + .0012×20 = –1.10
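
A sketch of this specification in Stata, again assuming PctEL is stored as el_pct; lincom gives the estimated effect of STR at a chosen value of PctEL, with its standard error:

    gen strXel = str*el_pct
    reg testscr str el_pct strXel, r
    lincom str + 20*strXel       // ΔTestScore/ΔSTR at PctEL = 20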
Hypothesis tests

TestScore-hat = 686.3 – 1.12 STR – 0.67 PctEL + .0012 (STR×PctEL)
                (11.8)  (0.59)     (0.37)       (0.019)

• Does the population coefficient on STR×PctEL = 0?
  t = .0012/.019 = .06 ⇒ can’t reject the null at the 5% level
• Does the population coefficient on STR = 0?
  t = –1.12/0.59 = –1.90 ⇒ can’t reject the null at the 5% level
• Do the coefficients on both STR and STR×PctEL = 0?
  F = 3.89 (p-value = .021) ⇒ reject the null at the 5% level (!)
• Why? High but imperfect multicollinearity
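
These three tests can be run after the regression in the sketch above:

    test strXel          // coefficient on STR×PctEL = 0?
    test str             // coefficient on STR = 0?
    test str strXel      // both = 0? (the joint F-test reported on the slide)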
Summary: Nonlinear Regression
Functions
• Using functions of the independent variables, such as ln(X) or X1*X2,
  allows us to recast a large family of nonlinear regression functions as
  multiple regressions.
• Estimation and inference proceed in the same way as in
the linear multiple regression model.
• Interpretation of the coefficients is model-specific, but the
general rule is to compute effects by comparing different
cases (different value of the original X’s)
• Many nonlinear specifications are possible, so you must
use judgment:
• What nonlinear effect do you want to analyze?
• What makes sense in your application?
Application: Nonlinear Effects on Test
Scores of the Student-Teacher Ratio
Nonlinear specifications let us examine more nuanced questions
about the Test score – STR relation, such as:

1. Are there nonlinear effects of class size reduction on test scores?
   • E.g. does a reduction from 35 to 30 have the same effect as a
     reduction from 20 to 15?
2. Are there nonlinear interactions between PctEL and STR?
• E.g. are small classes more effective when there are many
English learners?
Strategy for Question #1 (different
effects for different STR?)
• Estimate linear and nonlinear functions of STR, holding constant
relevant demographic variables
• EL_Pct
• Income
• Meal_Pct (fraction on free/subsidized lunch)
• See whether adding the nonlinear terms makes an “economically
important” quantitative difference
• “Economic” or “real-world” importance is different from statistical
  significance
• Test for whether the nonlinear terms are significant
Strategy for Question #2 (interactions
between PctEL and STR?)
• Estimate linear and nonlinear functions of STR, interacted with EL_Pct.
• If the specification is nonlinear (with STR, STR², STR³), then you need to
  add interactions with all the terms so that the entire functional form can
  be different, depending on the level of EL_Pct.
• We will use a binary-continuous interaction specification by adding
  HiEL×STR, HiEL×STR², and HiEL×STR³
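
A sketch of that specification in Stata, reusing the hiel dummy and assuming the variable names str, el_pct, meal_pct, and avginc (the interaction names are made up):

    gen str2 = str^2
    gen str3 = str^3
    gen hielXstr  = hiel*str          // interact HiEL with every STR term
    gen hielXstr2 = hiel*str2
    gen hielXstr3 = hiel*str3
    reg testscr str str2 str3 hiel hielXstr hielXstr2 hielXstr3 ///
        el_pct meal_pct avginc, r
    test hielXstr hielXstr2 hielXstr3   // does the STR profile differ by HiEL?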
What is a good “base” specification?
The TestScore – Income relation:

The logarithmic specification is better behaved near the extremes of the
sample, especially for large values of income.
Tests of joint hypotheses:

What can you conclude about question #1? About question #2?
Enjoy the rest of the summer! ☺
