Nonlinear Panel Data


Whitney Newey

Fall 2007

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

Panel data allow us to control for individual effects that are correlated with the regressors.

It is well known how to do this in linear models with additive effects.
Nonlinear models are harder.
General setup:
Data: $Y_i = [Y_{i1}, \ldots, Y_{iT}]'$, $X_i = [X_{i1}, \ldots, X_{iT}]'$, $(i = 1, \ldots, n)$.
A linear model:

$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}$, $E[\varepsilon_{it} | X_i, \alpha_i] = 0$.

Alternative, equivalent formulation:

$E[Y_{it} | X_i, \alpha_i] = X_{it}'\beta + \alpha_i$.

This specifies the conditional mean of $Y_i$ given $X_i$, $\alpha_i$, and $\beta$. A likelihood specifies the conditional pdf $f(y | x, \beta, \alpha)$ of $Y_i$ given $X_i$, $\alpha_i$, and the parameter vector $\beta$.

Example: Normal linear model: For $e_T$ a $T \times 1$ vector of 1's, $Y_i | (X_i, \alpha_i) \sim N(X_i\beta + \alpha_i e_T, \sigma^2 I_T)$. This is a distributional version of the linear model.
Binary choice model: $Y_{it} \in \{0, 1\}$; e.g. labor force participation. $Y_{it}$, $(t = 1, \ldots, T)$, independent, $\Pr(Y_{it} = 1 | X_i, \alpha_i) = G(X_{it}'\beta + \alpha_i)$.
Count data: $Y_{i1}, \ldots, Y_{iT}$ independent, $Y_{it} | X_i, \alpha_i$ Poisson with mean $\exp(X_{it}'\beta + \alpha_i)$.

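The linear model method is to transform the data so that $\alpha_i$ drops out. Differencing gives

$E[Y_{it} - Y_{i,t-1} | X_i] = X_{it}'\beta + E[\alpha_i | X_i] - (X_{i,t-1}'\beta + E[\alpha_i | X_i]) = (X_{it} - X_{i,t-1})'\beta$.

In a nonlinear model, $\alpha_i$ does not drop out when we difference.

Binary choice example (What about the linear probability model?):

$E[Y_{it} - Y_{i,t-1} | X_i] = E[G(X_{it}'\beta + \alpha_i) - G(X_{i,t-1}'\beta + \alpha_i) | X_i]$.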
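
The differencing argument for the linear model can be checked numerically. The following is a minimal sketch, not part of the original notes: the two-period design, the scalar regressor, and the names `simulate_linear_panel` and `ols_slope` are all illustrative assumptions. The effect $\alpha_i$ enters the regressors directly, so pooled least squares is contaminated while least squares on first differences should recover $\beta$.

```python
import random

def simulate_linear_panel(n, beta, seed=0):
    """Two-period linear panel; alpha_i is correlated with X (assumed design)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        # Regressors load on alpha, so the effect is correlated with X.
        x1 = alpha + rng.gauss(0.0, 1.0)
        x2 = alpha + rng.gauss(0.0, 1.0)
        y1 = beta * x1 + alpha + rng.gauss(0.0, 1.0)
        y2 = beta * x2 + alpha + rng.gauss(0.0, 1.0)
        rows.append((x1, x2, y1, y2))
    return rows

def ols_slope(xs, ys):
    """Slope coefficient from least squares with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rows = simulate_linear_panel(n=5000, beta=1.0)

# Pooled OLS ignores alpha_i and picks up its correlation with X.
beta_pooled = ols_slope([r[0] for r in rows] + [r[1] for r in rows],
                        [r[2] for r in rows] + [r[3] for r in rows])

# First differences remove alpha_i before estimation.
beta_diff = ols_slope([r[1] - r[0] for r in rows],
                      [r[3] - r[2] for r in rows])

print("pooled:", round(beta_pooled, 3), "differenced:", round(beta_diff, 3))
```

In this design the pooled slope has probability limit $\beta + 1/2$, while the differenced slope is centered at $\beta$.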



Fixed Effects and the Incidental Parameters Problem

Fixed effects is maximizing the log-likelihood over each $\alpha_i$ as well as $\beta$.

Fixed effects is generally inconsistent in nonlinear models as $n$ grows with $T$ fixed.
In a linear model, least squares treating $\alpha_i$ as a parameter to be estimated is consistent.
Maximum likelihood treating $\alpha_i$ as a parameter to be estimated is generally not.
This is known as the incidental parameters problem.
It is caused by only having $T$ observations to estimate each $\alpha_i$, so that as $n$ grows the estimate of $\alpha_i$ remains random.
In linear models this randomness gets "averaged out." In nonlinear models it does not.


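Limit of the fixed effects estimator as $n$ grows with $T$ fixed. The estimator is

$\hat\beta = \arg\max_{\beta, \alpha_1, \ldots, \alpha_n} \frac{1}{n} \sum_{i=1}^n \ln f(Y_i | X_i, \beta, \alpha_i)$.

Concentrate out $\alpha_i$: for fixed $\beta$, each fixed effect is given by $\hat\alpha_i(\beta) = \arg\max_\alpha \ln f(Y_i | X_i, \beta, \alpha)$.

Substitute in and maximize over $\beta$ to get $\hat\beta$,

$\hat\beta = \arg\max_\beta \frac{1}{n} \sum_{i=1}^n \ln f(Y_i | X_i, \beta, \hat\alpha_i(\beta))$.

By the usual extremum estimator argument, as $n$ grows for fixed $T$ the estimator has plim $\beta_T = \arg\max_\beta E[\ln f(Y_i | X_i, \beta, \hat\alpha_i(\beta))]$.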


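$\beta_T = \arg\max_\beta E[\ln f(Y_i | X_i, \beta, \hat\alpha_i(\beta))]$, where $\hat\alpha_i(\beta) = \arg\max_\alpha \ln f(Y_i | X_i, \beta, \alpha)$.

Randomness in $\hat\alpha_i(\beta)$ leads to inconsistency of $\hat\beta$.

If $\hat\alpha_i(\beta)$ were replaced by $\alpha_i(\beta) = \arg\max_\alpha E[\ln f(Y_i | X_i, \beta, \alpha) | X_i, \alpha_i]$,

we would get consistency. This is like measurement error in a nonlinear model.
Example: Binary logit, $Y_{it} \in \{0, 1\}$, $G(u) = e^u/(1 + e^u)$. It is known that, for $T = 2$, the fixed effects estimator $\hat\beta_{FE}$ satisfies
$\hat\beta_{FE} \to_p 2\beta_0$.

Bias in $\hat\beta$ can be severe. It is not so severe in the Tobit model.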
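
The factor of two in the logit example can be reproduced in a small simulation. This is a sketch under stated assumptions, not the notes' own computation: $T = 2$, a scalar regressor, and a design chosen only for illustration. It uses two facts that follow from the first-order conditions with $T = 2$: individuals with $Y_{i1} = Y_{i2}$ do not affect the estimate of $\beta$, and for the remaining "switchers" $\hat\alpha_i(\beta) = -(X_{i1} + X_{i2})\beta/2$, so the concentrated log-likelihood is $2\sum_i \ln G(z_i\beta/2)$ with $z_i = (Y_{i2} - Y_{i1})(X_{i2} - X_{i1})$. The conditional logit estimator discussed later in these notes maximizes $\sum_i \ln G(z_i\beta)$ over the same switchers. The function name `fit_scaled_logit` is hypothetical.

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def fit_scaled_logit(z, scale, iterations=50):
    """Newton's method for the b that maximizes sum_i ln G(scale * z_i * b)."""
    b = 0.0
    for _ in range(iterations):
        grad, hess = 0.0, 0.0
        for zi in z:
            u = scale * zi
            p = logistic(u * b)
            grad += u * (1.0 - p)          # derivative of ln G(u * b)
            hess -= u * u * p * (1.0 - p)  # second derivative (negative)
        step = grad / hess
        b -= step
        if abs(step) < 1e-12:
            break
    return b

rng = random.Random(0)
beta0 = 1.0
z = []
for _ in range(4000):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    alpha = 0.5 * (x1 + x2) + rng.gauss(0.0, 1.0)  # effect correlated with X
    y1 = 1 if rng.random() < logistic(x1 * beta0 + alpha) else 0
    y2 = 1 if rng.random() < logistic(x2 * beta0 + alpha) else 0
    if y1 != y2:                                   # keep switchers only
        z.append((y2 - y1) * (x2 - x1))

beta_cmle = fit_scaled_logit(z, 1.0)  # conditional logit
beta_fe = fit_scaled_logit(z, 0.5)    # fixed effects MLE, alpha concentrated out

print("conditional logit:", round(beta_cmle, 3), "fixed effects:", round(beta_fe, 3))
```

Because the two objective functions differ only by the rescaling of $\beta$, the fixed effects estimate is exactly twice the conditional logit estimate in every sample, which is where the limit $2\beta_0$ comes from.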



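Example: Gaussian linear model. The FE estimator of $\sigma^2$ converges to

$\sigma_T^2 = \frac{T-1}{T}\sigma^2$.

Bias in estimates of marginal effects is less severe. In binary choice, the marginal effect of moving from $\bar X$ to $\tilde X$ is

$\int [G(\tilde X'\beta_0 + \alpha) - G(\bar X'\beta_0 + \alpha)] F(d\alpha)$.

The fixed effects estimator is

$\sum_{i=1}^n [G(\tilde X'\hat\beta + \hat\alpha_i) - G(\bar X'\hat\beta + \hat\alpha_i)]/n$.

Hahn and Newey (2004) show quite small biases for probit.
We return to this below.
We now discuss how to get consistent estimators.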


Conditional Maximum Likelihood

Occasionally there is a statistic $S_i$ such that $\alpha_i$ drops out of the conditional likelihood of $Y_i$ given $X_i$ and $S_i$. That is, $f(Y_i | X_i, S_i, \beta, \alpha_i) = f(Y_i | X_i, S_i, \beta)$.
Conditional MLE (CMLE):

$\hat\beta = \arg\max_\beta \sum_{i=1}^n \ln f(Y_i | X_i, S_i, \beta)$.

Consistent and asymptotically normal, and asymptotically efficient when the distribution of $\alpha_i$ conditional on $X_i$ is unrestricted. The problem is that $S_i$ only exists in a few cases, including the Gaussian linear model, logit binary choice, the Poisson model for count data, and the proportional hazards model. In most other models there is no such $S_i$, so conditional MLE has limited usefulness.
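
As an illustration of conditional MLE, consider the Poisson case with $S_i = \sum_t Y_{it}$. Conditional on $S_i$, the counts are multinomial with probabilities $e^{X_{it}\beta}/\sum_s e^{X_{is}\beta}$, which do not involve $\alpha_i$. The sketch below is illustrative only: the scalar regressor, the simulated design, and the names `poisson_draw`, `simulate_poisson_panel`, and `poisson_cmle` are assumptions, and the Poisson sampler is a simple textbook method rather than anything from the notes.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_poisson_panel(n, T, beta, seed=0):
    """Poisson panel with mean exp(x * beta + alpha), alpha correlated with X."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(T)]
        alpha = sum(x) / T + 0.5 * rng.gauss(0.0, 1.0)
        y = [poisson_draw(rng, math.exp(xt * beta + alpha)) for xt in x]
        data.append((x, y))
    return data

def poisson_cmle(data, iterations=50):
    """Newton's method on the conditional (multinomial) log-likelihood."""
    b = 0.0
    for _ in range(iterations):
        grad, hess = 0.0, 0.0
        for x, y in data:
            total = sum(y)  # S_i; individuals with S_i = 0 contribute nothing
            w = [math.exp(xt * b) for xt in x]
            denom = sum(w)
            m1 = sum(wt * xt for wt, xt in zip(w, x)) / denom
            m2 = sum(wt * xt * xt for wt, xt in zip(w, x)) / denom
            grad += sum(yt * xt for yt, xt in zip(y, x)) - total * m1
            hess -= total * (m2 - m1 * m1)
        step = grad / hess
        b -= step
        if abs(step) < 1e-10:
            break
    return b

data = simulate_poisson_panel(n=2000, T=3, beta=0.5)
beta_cmle = poisson_cmle(data)
print("Poisson CMLE:", round(beta_cmle, 3))
```

No estimate of any $\alpha_i$ is formed at any point, which is why the incidental parameters problem does not arise here.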

Identification Issue:
$\beta_0$ may not be identified in the semiparametric model where the conditional pdf of $Y_i$ given $X_i$, $\alpha_i$ is specified as $f(y | x, \beta, \alpha)$ and the conditional pdf of $\alpha_i$ given $X_i$ is unspecified.
Chamberlain (1992): $T = 2$; $\Pr(Y_{it} = 1 | X_i, \alpha_i) = G(\delta_0 d_{it} + x_{it}'\beta_0 + \alpha_i)$, $d_{i1} = 0$, $d_{i2} = 1$, $G'(u) > 0$ everywhere, other regularity conditions. If $X_i$ is bounded then $\beta_0$ is not identified if $G(u)$ is not logistic.
Also can show that $\beta_0$ is not identified for $T = 2$, $\Pr(Y_{it} = 1 | X_i, \alpha_i) = \Phi(\beta_0 X_{it} + \alpha_i)$, $X_{it} \in \{0, 1\}$. See following graph.


The extent of nonidentification (e.g. for censored models) is not clear.

There is no consistent estimator in nonidentified cases.
Could directly estimate the identified set.
Recent progress: Honore and Tamer (2006) and other work.
Difficult when $X_{it}$ takes on many values.
Other approaches are a) restrict the distribution of $\alpha_i$ given $X_i$; b) find clever estimators for identified models; c) large $T$ fixed effect bias corrections.


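Correlated Random Effects:

Restricts the conditional distribution of $\alpha_i$ given $X_i$.
Here we consider parametric models; there are nonparametric and semiparametric versions.
Let $g(\alpha | X, \gamma)$ be the conditional pdf of $\alpha$ given $X$.
The likelihood of $Y$ given $X$ integrates out $\alpha$, as in

$f(Y | X, \beta, \gamma) = \int f(Y | X, \beta, \alpha) g(\alpha | X, \gamma) d\alpha$.

The MLE is given by

$(\hat\beta, \hat\gamma) = \arg\max_{\beta, \gamma} \frac{1}{n} \sum_{i=1}^n \ln f(Y_i | X_i, \beta, \gamma) = \arg\max_{\beta, \gamma} \frac{1}{n} \sum_{i=1}^n \ln \int f(Y_i | X_i, \beta, \alpha) g(\alpha | X_i, \gamma) d\alpha$.

Consistency of $\hat\beta$ depends on $g(\alpha | X, \gamma)$ being correctly specified.

The integral may be difficult to calculate.
Also, it is hard to form $g(\alpha | X, \gamma)$ in a time-consistent fashion.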
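
When the integral has no closed form, one option is one-dimensional numerical integration over $\alpha$. The sketch below is only an illustration under assumptions that are not in the notes: a probit model with unit-variance errors and a scalar regressor, a normal $g(\alpha | X, \gamma)$ with mean `mean_alpha` and standard deviation `sd_alpha` (for example a linear index in $X_i$), and a plain midpoint rule in place of a more refined quadrature. The function name `cre_likelihood` is hypothetical.

```python
import math

def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def cre_likelihood(y, x, beta, mean_alpha, sd_alpha, n_grid=2000, width=8.0):
    """Midpoint-rule approximation to the integral of
    prod_t Pr(Y_t = y_t | x_t, alpha) against a N(mean_alpha, sd_alpha^2) density."""
    lower = mean_alpha - width * sd_alpha
    step = 2.0 * width * sd_alpha / n_grid
    total = 0.0
    for k in range(n_grid):
        a = lower + (k + 0.5) * step
        like = 1.0
        for yt, xt in zip(y, x):
            sign = 1.0 if yt == 1 else -1.0
            like *= norm_cdf(sign * (xt * beta + a))  # probit probability of y_t
        total += like * norm_pdf((a - mean_alpha) / sd_alpha) / sd_alpha * step
    return total

# With T = 1 the integral has a known closed form, which gives a check.
approx = cre_likelihood([1], [0.3], beta=1.0, mean_alpha=0.2, sd_alpha=0.7)
exact = norm_cdf((0.3 * 1.0 + 0.2) / math.sqrt(1.0 + 0.7 ** 2))
print(approx, exact)

# With T = 2 the four possible outcome patterns should have probabilities summing to one.
patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
total_prob = sum(cre_likelihood(p, [0.3, -0.4], 1.0, 0.2, 0.7) for p in patterns)
```

The log of this quantity, summed over individuals, is the objective that the correlated random effects MLE maximizes over $(\beta, \gamma)$.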


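Example: Correlated random effects probit.

$Y_{it} = 1(Y_{it}^* > 0)$ where, conditional on $(X_i, \alpha_i)$, $Y_{i1}^*, \ldots, Y_{iT}^*$ are independent and $Y_{it}^*$ has distribution $N(X_{it}'\beta_0 + \alpha_i, \sigma_t^2)$. Let $x_i = \mathrm{vec}(X_i')$ be the vector of all observations across $t$ on the regressors. Suppose also that the conditional distribution of $\alpha_i$ given $X_i$ is $N(x_i'\lambda_0, \sigma_\alpha^2)$. Note that conditional on $X_i$,

$Y_{it}^* \sim N(X_{it}'\beta_0 + x_i'\lambda_0, \sigma_t^2 + \sigma_\alpha^2)$.

Then for $\theta = (\beta', \lambda', \sigma_1^2, \ldots, \sigma_T^2, \sigma_\alpha^2)'$ and $e_t$ the $t$th $T \times 1$ unit vector,

$\Pr(Y_{it} = 1 | X_i, \theta) = \int \Phi\left(\frac{X_{it}'\beta + \alpha}{\sigma_t}\right) \frac{1}{\sigma_\alpha}\phi\left(\frac{\alpha - x_i'\lambda}{\sigma_\alpha}\right) d\alpha = \Phi\left(\frac{X_{it}'\beta + x_i'\lambda}{\sqrt{\sigma_t^2 + \sigma_\alpha^2}}\right) = \Phi(x_i'\pi_t)$, $\pi_t = \frac{e_t \otimes \beta + \lambda}{\sqrt{\sigma_t^2 + \sigma_\alpha^2}}$.

This is a marginal likelihood for $Y_{it}$. The joint likelihood is very complicated: $Y_{i1}, \ldots, Y_{iT}$ are not independent conditional on $X_i$. This is generally true in models where we integrate out $\alpha_i$.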

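Estimation: Do marginal likelihood (probit) period by period to get $\hat\pi_1, \ldots, \hat\pi_T$. Let $\psi_t = \sqrt{\sigma_t^2 + \sigma_\alpha^2}$, $(t = 1, \ldots, T)$, where we normalize $\psi_1 = 1$, so that $\psi_t\pi_t = e_t \otimes \beta + \lambda$. Reparameterize so that $\theta = (\beta', \lambda', \psi_2, \ldots, \psi_T)'$ and for $\pi = (\pi_1', \ldots, \pi_T')'$ let

$h(\pi, \theta) = \begin{pmatrix} \psi_1\pi_1 - e_1 \otimes \beta - \lambda \\ \vdots \\ \psi_T\pi_T - e_T \otimes \beta - \lambda \end{pmatrix}$.

We can then do minimum distance, using $\hat\pi = (\hat\pi_1', \ldots, \hat\pi_T')'$ mentioned above:

$\hat\theta = \arg\min_\theta h(\hat\pi, \theta)'\hat W h(\hat\pi, \theta)$.

$h(\hat\pi, \theta)$ is linear in $\theta$ so this is easy to do. Efficient two-step estimator: For $\hat V$ an estimator of the joint asymptotic variance of $\hat\pi$, let $\tilde\theta = \arg\min_\theta h(\hat\pi, \theta)'\hat V^{-1}h(\hat\pi, \theta)$. Then let $\hat D = \mathrm{diag}(I, \tilde\psi_2 I, \ldots, \tilde\psi_T I)$ where $I$ is an identity matrix with the same dimension as $\pi_t$. Then $\hat D\hat V\hat D$ is an estimator of the variance of $\sqrt{n}\,h(\hat\pi, \theta_0)$, so optimal minimum distance is

$\hat\theta = \arg\min_\theta h(\hat\pi, \theta)'(\hat D\hat V\hat D)^{-1}h(\hat\pi, \theta)$.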


Empirical example from Chamberlain (1984).

Labor force participation, with $n = 924$ and $T = 4$ (four years: 1968, 70, 72, 74).
Two $X_{it}$: number of children under 6 and number of children. Here are the results (standard errors in parentheses):

Probit               Logit
-.121    -.058       -.573    -.336
(.046)   (.029)      (.115)   (.120)

Quite different estimates; ratios are similar. Correlated random effects depends on $T$ in an essential way.
Many coefficients. A more parsimonious model is $\alpha_i \sim N(\bar X_i'\lambda, \sigma_\alpha^2)$, $\bar X_i = \sum_{t=1}^T X_{it}/T$.


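Marginal Effects

The marginal effect for a change in $X$ from $\bar X$ to $\tilde X$ is, for $F(\alpha)$ the CDF of $\alpha_i$, $\mu_t(\tilde X) - \mu_t(\bar X)$, where

$\mu_t(X) = \int \Phi((X'\beta_0 + \alpha)/\sigma_t) F(d\alpha)$.

By iterated expectations, holding $X$ fixed, with $\varepsilon_{it} \sim N(0, \sigma_t^2)$ the latent error,

$\mu_t(X) = E[1(X'\beta_0 + \alpha_i + \varepsilon_{it} > 0)] = E[E[1(X'\beta_0 + \alpha_i + \varepsilon_{it} > 0) | X_i]] = E\left[\Phi\left(\frac{X'\beta_0 + x_i'\lambda_0}{\sqrt{\sigma_t^2 + \sigma_\alpha^2}}\right)\right]$.

This object can be estimated by replacing the parameters with their estimates and the expectation with a sample average,

$\hat\mu_t(X) = \sum_{i=1}^n \Phi\left(\frac{X'\hat\beta + x_i'\hat\lambda}{\sqrt{\hat\sigma_t^2 + \hat\sigma_\alpha^2}}\right)/n$.

It would be interesting to compare this estimator with the fixed effects marginal effect in the empirical example.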


Some Semiparametric Results

Some distribution-free results that are useful.
Poisson model: Conditional on $(X_i, \alpha_i)$, $Y_{it}$ is independent over time and Poisson with mean $e^{X_{it}'\beta + \alpha_i}$. Good model for patents; see Hausman, Hall, Griliches (1984). Wooldridge showed that consistency of CMLE only requires $E[Y_{it} | X_i, \alpha_i] = e^{X_{it}'\beta + \alpha_i}$.
Binary choice: Manski maximum score estimator; conditions for consistency include infinite support.
Tobit: Honore.
Manski and Honore require homoskedasticity over time.
Does not hold in linear model applications.


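Large T Fixed Effects Bias Correction

Let $\theta_T$ denote the plim of the fixed effects estimator.

As $T$ grows, $\lim_{T \to \infty} \theta_T = \theta_0$.
Under smoothness,

$\theta_T = \theta_0 + \frac{B}{T} + O\left(\frac{1}{T^2}\right)$.

Example: Gaussian linear model

$\sigma_T^2 = \frac{T-1}{T}\sigma^2 = \sigma^2 - \frac{\sigma^2}{T} = \sigma^2 + \frac{B}{T}$, $B = -\sigma^2$.

Also, as $n$ and $T$ grow, we should have

$(nT)^{1/2}(\hat\theta - \theta_T) \to_d N(0, \Omega)$.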

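$\theta_T = \theta_0 + \frac{B}{T} + O\left(\frac{1}{T^2}\right)$, $(nT)^{1/2}(\hat\theta - \theta_T) \to_d N(0, \Omega)$.

As a way to think about how bad fixed effects bias can be, consider $n/T \to \rho$.

$(nT)^{1/2}(\hat\theta - \theta_0) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(\theta_T - \theta_0)$
$= (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}\frac{B}{T} + O((nT)^{1/2}/T^2)$
$\to_d N(B\rho^{1/2}, \Omega)$.

Here there is asymptotic bias.

Consequently, the usual asymptotic confidence intervals are incorrect.

Asymptotic normality of $\hat\theta$, centered at its probability limit, is like the misspecification result (e.g. White, 1982).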


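Analytical Bias Correction

Find a formula for $B$, construct an estimator $\hat B$. The bias corrected estimator is $\hat\theta^1 = \hat\theta - \hat B/T$. To show when this works, suppose

$(nT)^{1/2}(\hat B - B)/T \to_p 0$.

For example, if $\hat B$ itself has $(nT)^{1/2}(\hat B - B)$ asymptotically normal then this holds. Plugging in as before we get

$(nT)^{1/2}(\hat\theta^1 - \theta_0) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(\theta_T - \theta_0 - B/T) + (nT)^{1/2}(B - \hat B)/T$
$= (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(B - \hat B)/T + O((nT)^{1/2}/T^2)$
$\to_d N(0, \Omega)$.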

Iterated Analytical Correction

Often the bias formula will depend on $\theta$, so that $B = B(\theta)$.

Can iterate the bias correction:
$\hat\theta^j = \hat\theta - \hat B(\hat\theta^{j-1})/T$. Iterating to convergence would give $\hat\theta^\infty = \hat\theta - \hat B(\hat\theta^\infty)/T$. Does not improve asymptotic properties. Can improve small sample properties.
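
The iteration can be sketched as a simple fixed-point loop. The function name `iterate_bias_correction`, the stopping rule, and the numerical values are illustrative assumptions; the bias function $B(\theta) = -\theta$ is the one from the Neyman and Scott variance example later in these notes, for which the fixed point is known to be $T\hat\theta/(T-1)$.

```python
def iterate_bias_correction(theta_hat, bias_fn, T, tol=1e-12, max_iter=1000):
    """Iterate theta_j = theta_hat - B(theta_{j-1}) / T until the change is below tol."""
    theta = theta_hat
    for _ in range(max_iter):
        new_theta = theta_hat - bias_fn(theta) / T
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

def neyman_scott_bias(theta):
    """B(theta) = -theta, as in the Gaussian variance example."""
    return -theta

theta_hat, T = 0.75, 4  # hypothetical fixed effects estimate and panel length

one_step = theta_hat - neyman_scott_bias(theta_hat) / T
iterated = iterate_bias_correction(theta_hat, neyman_scott_bias, T)

print("one step:", one_step, "iterated:", iterated)
```

With these values the one-step correction gives 0.9375, while the iteration converges to $T\hat\theta/(T-1) = 1$.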


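Jackknife Bias Correction

Use how $\hat\theta$ changes with $T$ to form an implicit bias correction.

Does not require a formula for $B$.
Let $\hat\theta_{(t)}$ denote the fixed effects estimator not using the $t$th time period.
The jackknife estimator is

$\tilde\theta = T\hat\theta - (T-1)\sum_{t=1}^T \hat\theta_{(t)}/T$.

Explain with the expansion

$\theta_T = \theta_0 + \frac{B}{T} + \frac{D}{T^2} + O\left(\frac{1}{T^3}\right)$.

The limit of $\tilde\theta$ for fixed $T$, and how it changes with $T$, shows the bias correction:

$\tilde\theta \to_p T\theta_T - (T-1)\theta_{T-1} = \theta_0 + \left(\frac{1}{T} - \frac{1}{T-1}\right)D + O\left(\frac{1}{T^2}\right) = \theta_0 + O\left(\frac{1}{T^2}\right)$.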


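Example: Variance estimation in Gaussian model (Neyman and Scott, 1948):

$z_{it}$ is i.i.d. with distribution $N(\alpha_i, \theta_0)$.

Here $\theta_T = \frac{T-1}{T}\theta_0 = \theta_0 - \frac{\theta_0}{T}$.

Thus $B = -\theta_0$. Analytical correction:

$\hat\theta^1 = \hat\theta + \hat\theta/T \to_p \left(\frac{T-1}{T} + \frac{T-1}{T^2}\right)\theta_0 = \frac{T^2-1}{T^2}\theta_0$.

It is not consistent for fixed $T$. Iterating the analytical correction gives $\hat\theta^\infty = \hat\theta + \hat\theta^\infty/T$, so $\hat\theta^\infty = \frac{T}{T-1}\hat\theta$. Can also show that this is the jackknife. Here $\hat\theta^\infty$ is consistent for fixed $T$.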
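
The claims in this example can be checked in a short simulation. This is a sketch with illustrative choices that are not in the notes ($n = 2000$, $T = 4$, $\theta_0 = 1$, normally distributed $\alpha_i$); the function names `fe_variance` and `jackknife` are hypothetical. The fixed effects MLE of $\theta_0$ is the average squared deviation of $z_{it}$ from each individual's own mean.

```python
import random

def fe_variance(z):
    """Fixed effects MLE of the variance: within-individual squared deviations over nT."""
    n, T = len(z), len(z[0])
    total = 0.0
    for row in z:
        mean = sum(row) / T
        total += sum((v - mean) ** 2 for v in row)
    return total / (n * T)

def jackknife(z):
    """T * theta_hat minus (T - 1) times the average leave-one-period-out estimate."""
    T = len(z[0])
    full = fe_variance(z)
    leave_out = [fe_variance([row[:t] + row[t + 1:] for row in z]) for t in range(T)]
    return T * full - (T - 1) * sum(leave_out) / T

rng = random.Random(1)
theta0, n, T = 1.0, 2000, 4
z = []
for _ in range(n):
    alpha = rng.gauss(0.0, 2.0)  # individual effect (assumed design)
    z.append([rng.gauss(alpha, theta0 ** 0.5) for _ in range(T)])

theta_fe = fe_variance(z)              # centered near (T - 1) / T * theta0
theta_one_step = theta_fe + theta_fe / T
theta_iterated = T / (T - 1) * theta_fe
theta_jack = jackknife(z)

print(round(theta_fe, 3), round(theta_one_step, 3),
      round(theta_iterated, 3), round(theta_jack, 3))
```

In this model the jackknife and the iterated analytical correction coincide in every sample, not just in the limit.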


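Monte Carlo Example: Like Heckman (1981). Design is:

$y_{it} = 1(x_{it}\theta_0 + \alpha_i + \varepsilon_{it} > 0)$, $\alpha_i \sim N(0, 1)$, $\varepsilon_{it} \sim N(0, 1)$, $x_{it} = t/10 + x_{i,t-1}/2 + u_{it}$, $x_{i0} = u_{i0}$, $u_{it} \sim U(-1/2, 1/2)$. $N = 100$, $T = 8$; $\theta_0 = 1, 1$.
The marginal effect is the average derivative of $\Phi(x\theta_0 + \alpha)$, $\mu = \theta_0 E[\phi(\bar x\theta_0 + \alpha_i)]$. The fixed effects estimator of this object is

$\hat\mu = \hat\theta \sum_{i=1}^n \phi(\bar x\hat\theta + \hat\alpha_i)/n$.

Consider analytical and jackknife bias corrections.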
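
A data generator for a design of this form might look as follows. It is a sketch only: the function name `simulate_design` is hypothetical, and it assumes that period 0 serves solely as the initial condition $x_{i0} = u_{i0}$, with periods $t = 1, \ldots, T$ observed, which the notes do not state explicitly.

```python
import random

def simulate_design(n, T, theta0, seed=0):
    """Draw (x, y) with y_it = 1(x_it * theta0 + alpha_i + eps_it > 0)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        prev = rng.uniform(-0.5, 0.5)  # x_i0 = u_i0 (initial condition, not observed)
        xi, yi = [], []
        for t in range(1, T + 1):
            cur = t / 10.0 + prev / 2.0 + rng.uniform(-0.5, 0.5)
            eps = rng.gauss(0.0, 1.0)
            yi.append(1 if cur * theta0 + alpha + eps > 0 else 0)
            xi.append(cur)
            prev = cur
        x.append(xi)
        y.append(yi)
    return x, y

x, y = simulate_design(n=100, T=8, theta0=1.0)
print(len(x), len(x[0]), sum(map(sum, y)) / (100 * 8))
```

Each replication of a Monte Carlo study would call the generator with a different seed and then apply the estimators being compared.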


Table Three: Properties of Estimator of $\theta_0$, $T = 8$.

Estimator     Mean    Med.    SD      p; .05   p; .10
MLE           1.18    1.17    .151    .267     .370
Jackknife     .953    .950    .119    .056     .102
Analytic      1.05    1.05    .134    .062     .135
Analytic-M    1.05    1.05    .132    .060     .126

Table Five: Properties of Estimator of $\theta_0$, $T = 4$.

Estimator     Mean    Med.    SD      p; .05   p; .10
MLE           1.42    1.41    .397    .269     .373
Jackknife     .752    .743    .262    .100     .177
Analytic      1.12    1.11    .306    .055     .101
Analytic-M    1.21    1.20    .335    .102     .172


Table Four: Properties of Estimator of $\hat\mu/\mu_0$, $T = 8$.

Estimator     Mean    Med.    SD      p; .05   p; .10
MLE           1.02    1.02    .131    .078     .140
Jackknife     1.00    .992    .130    .086     .159
Analytic      1.02    1.02    .133    .090     .153
Analytic-M    1.02    1.02    .131    .087     .154

Table Six: Properties of Estimator of $\hat\mu/\mu_0$, $T = 4$.

Estimator     Mean    Med.    SD      p; .05   p; .10
MLE           1.00    1.00    .257    .103     .168
Jackknife     1.06    1.05    .307    .159     .224
Analytic      .996    .994    .265    .113     .178
Analytic-M    1.05    1.05    .266    .117     .185


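Bounds for Marginal Effects:

Assume $X_{it} \in \{0, 1\}$ and $\Pr(Y_{it} = 1 | X_i, \alpha_i) = \Phi(\beta_0 X_{it} + \alpha_i)$.

Object of interest:

$\mu_0 = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)] F_0(d\alpha)$,

the average change in the probability of $Y_{it} = 1$.

Let $\bar 0$ and $\bar 1$ denote $T \times 1$ vectors of 0's and 1's respectively.
Define

$\Delta = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)] F_0(d\alpha | X_i \notin \{\bar 0, \bar 1\})$.

Then $\Delta$ is identified.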

$\Delta = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)] F_0(d\alpha | X_i \notin \{\bar 0, \bar 1\})$ is identified.

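Proof: Consider $X \notin \{\bar 0, \bar 1\}$. Then there is $t(X)$ such that $x_{t(X)} = 1$ and $s(X)$ such that $x_{s(X)} = 0$. Then we have

$E[y_{i,t(X)} - y_{i,s(X)} | X_i = X] = E[E[y_{i,t(X)} - y_{i,s(X)} | X_i = X, \alpha_i] | X_i = X] = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)] F_0(d\alpha | X_i = X)$.

Let $P(X) = \Pr(X_i = X)$. Then, averaging over the $X \notin \{\bar 0, \bar 1\}$ with weights proportional to $P(X)$,

$\Delta = \frac{\sum_{X \notin \{\bar 0, \bar 1\}} P(X) E[y_{i,t(X)} - y_{i,s(X)} | X_i = X]}{1 - P(\bar 0) - P(\bar 1)}$.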


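$\Delta = \frac{\sum_{X \notin \{\bar 0, \bar 1\}} P(X) E[y_{i,t(X)} - y_{i,s(X)} | X_i = X]}{1 - P(\bar 0) - P(\bar 1)}$.

Cannot identify $\int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)] F_0(d\alpha | \bar x)$ for $\bar x \in \{\bar 0, \bar 1\}$. $\Delta$ is overidentified for $T > 2$. Simple estimator:

Let $\bar n = \#\{i : X_i \notin \{\bar 0, \bar 1\}\}$.

$\hat\Delta = \frac{1}{\bar n} \sum_{X \notin \{\bar 0, \bar 1\}} \sum_{\{i | X_i = X\}} (y_{i,t(X)} - y_{i,s(X)})$.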
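
The simple estimator can be sketched in a few lines. The function name `estimate_delta` and the toy data are illustrative assumptions, as is the rule for choosing $t(X)$ and $s(X)$: here they are the first period with $x = 1$ and the first period with $x = 0$, but any rule that depends only on $X$ would serve.

```python
def estimate_delta(X, Y):
    """Average of y_{i,t(X)} - y_{i,s(X)} over individuals whose X_i
    is neither all zeros nor all ones."""
    diffs = []
    for xi, yi in zip(X, Y):
        if sum(xi) not in (0, len(xi)):  # X_i contains both a 0 and a 1
            t = xi.index(1)              # t(X): first period with x = 1 (assumed rule)
            s = xi.index(0)              # s(X): first period with x = 0 (assumed rule)
            diffs.append(yi[t] - yi[s])
    if not diffs:
        raise ValueError("no individual has variation in X over time")
    return sum(diffs) / len(diffs)

# Toy data with T = 2; the third and fourth individuals never change X and are dropped.
X = [[0, 1], [1, 0], [1, 1], [0, 0], [0, 1]]
Y = [[0, 1], [1, 1], [1, 1], [0, 0], [0, 1]]

delta_hat = estimate_delta(X, Y)
print(delta_hat)
```

For $T > 2$ there are several valid choices of $(t(X), s(X))$ for most $X$, which is the sense in which the parameter is overidentified.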


Bounds for $\mu_0$.
Let $D = 1(\Delta > 0)$. Let $P = P(\bar 0) + P(\bar 1)$. Then

$(1 - P)\Delta - (1 - D)P \le \mu_0 \le (1 - P)\Delta + DP$.

Tight bounds use the form $\Phi(\beta_0 + \alpha) - \Phi(\alpha)$.
Bounds shrink to a point exponentially fast as $T$ grows.
There are $2^T$ possible $X$, so $P(\bar 0) + P(\bar 1)$ will shrink like $C2^{-T}$ for some constant $C$. This fast shrinkage rate might be conjectured from the bias corrections. In smooth models (all derivatives existing) one can form a bias correction that approaches the truth at rate $T^{-J}$ for any integer $J$.
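
The bounds are easy to compute once the identified effect and the probability of the two constant sequences are in hand. The sketch below takes the inequality in the form $(1 - P)\Delta - (1 - D)P \le \mu_0 \le (1 - P)\Delta + DP$, with $\Delta$ the identified average effect and $D = 1(\Delta > 0)$. The function name `marginal_effect_bounds` and the numbers are hypothetical, and the shrinkage illustration assumes $X_{it}$ i.i.d. Bernoulli(1/2), so that $P = 2 \cdot 2^{-T}$; that assumption is not from the notes.

```python
def marginal_effect_bounds(delta, p_constant):
    """Bounds on mu_0 given the identified effect delta for individuals whose X
    varies, and p_constant = P(all zeros) + P(all ones)."""
    d = 1 if delta > 0 else 0
    lower = (1.0 - p_constant) * delta - (1 - d) * p_constant
    upper = (1.0 - p_constant) * delta + d * p_constant
    return lower, upper

lower, upper = marginal_effect_bounds(delta=0.2, p_constant=0.25)
print(lower, upper)

# Width of the bounds when X_it is i.i.d. Bernoulli(1/2) (illustrative assumption).
widths = {}
for T in (2, 4, 8, 16):
    p = 2.0 * 0.5 ** T
    lo, hi = marginal_effect_bounds(0.2, p)
    widths[T] = hi - lo
print(widths)
```

The width of the interval equals $P$, so under this assumption it halves each time $T$ increases by one.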
