
Efficient Conditional Quantile Estimation: The Time Series Case

2006

In this paper we consider the problem of efficient estimation in conditional quantile models with time series data. Our first result is to derive the semiparametric efficiency bound in time series models of conditional quantiles; this is a nontrivial extension of a large body of work on efficient estimation, which has traditionally focused on models with independent and identically distributed data. In particular, we generalize the bound derived by Newey and Powell (1990) to the case where the data is weakly dependent and heterogeneous. We then proceed by constructing an M-estimator which achieves the semiparametric efficiency bound. Our efficient M-estimator is obtained by minimizing an objective function which depends on a nonparametric estimator of the conditional distribution of the variable of interest rather than its density.

UC San Diego Recent Work. eScholarship permalink: https://escholarship.org/uc/item/78842570 (publication date 2006-10-01).

UNIVERSITY OF CALIFORNIA, SAN DIEGO, DEPARTMENT OF ECONOMICS
DISCUSSION PAPER 2006-10, October 2006

EFFICIENT CONDITIONAL QUANTILE ESTIMATION: THE TIME SERIES CASE

Ivana Komunjer, Department of Economics, University of California, San Diego
Quang Vuong, Department of Economics, The Pennsylvania State University

Keywords: semiparametric efficiency, time series models, dependence, parametric submodels, conditional quantiles.

Affiliations and contact information. Komunjer: Department of Economics, University of California San Diego. Vuong: Department of Economics, Penn State University.
Acknowledgments: Earlier versions of this paper were presented at the EEA/EESM 2003 meetings in Stockholm, the NSF/NBER 2004 Time Series conference at SMU and the CEME 2005 conference at MIT. Many thanks to Rob Engle, Clive Granger, Jin Hahn, Bruce Hansen, Cheng Hsiao, Guido Imbens, Guido Kuersteiner, Guy Laroque, Essie Maasoumie, Roger Moon, Whitney Newey, James Powell, Bernard Salanié, Ruey Tsay, Hal White and to all the participants at the Ohio State, Penn State, UC Berkeley, UC San Diego, CREST/INSEE and USC econometrics seminars.

1. Introduction

The purpose of this paper is to study the problem of asymptotically efficient estimation in models for conditional quantiles. We provide answers to the following closely related questions: what is the semiparametric efficiency bound for the parameters of a given conditional quantile when the data is weakly dependent and heterogeneous? Is efficient estimation possible in such models and, if so, what is an efficient conditional quantile estimator?

The computation of semiparametric efficiency bounds in models with conditional moment restrictions, which include the one studied here, has been considered by numerous authors (Chamberlain, 1986, 1987; Robinson, 1987; Hansen, Heaton, and Ogaki, 1988; Newey, 1990a,b, 1993; Hahn, 1997; Bickel, Klaassen, Ritov, and Wellner, 1998; Brown and Newey, 1998; Ai and Chen, 2003; Cosslett, 2004; Newey, 2004). Our contribution to this large literature is twofold. First, we derive the semiparametric efficiency bound in models with a conditional quantile restriction, allowing the data to be weakly dependent and/or heterogeneous. Second, we propose a new estimator for conditional quantiles which attains the semiparametric efficiency bound. Our results are important because they require neither independence nor identical distribution of the data.
The first of those assumptions, independence, has been prevalent in the existing literature on efficient estimation, for reasons which pertain to the very definition of the semiparametric efficiency bound. Depending on how we characterize the bound, as an “infimum” or as a “supremum”, there are two approaches to its computation. Most of the above literature, with the exception of Chamberlain (1987), has used the “infimum” approach, which can be summarized as follows. Consider a model in which the parameter vector of interest θ is identified via a conditional moment restriction. Assume that the model is regular in the sense of Begun, Hall, Huang, and Wellner (1983) and Newey (1990b). A familiar approach to estimating θ is to use semiparametric estimators such as GMM (Hansen, 1982), M- (Huber, 1967) or instrumental variable estimators. Associated with the choice of a particular semiparametric estimator is its asymptotic covariance matrix. Hence, to the set of all semiparametric estimators corresponds a set of positive semidefinite matrices. The crucial property of this set is its orthogonal structure (Bickel, 1982; Begun, Hall, Huang, and Wellner, 1983; Chamberlain, 1986; Newey, 1990b): any matrix Ω in this set is the covariance matrix of the sum of a Gaussian random variable, with positive semidefinite covariance matrix V, and an independent noise, so that Ω − V is positive semidefinite. The matrix V, which is the infimum of this set, is the semiparametric efficiency bound for θ. This characterization of the semiparametric efficiency bound is the starting point of the “infimum” approach to its computation. Essentially geometric, the “infimum” approach uses projection arguments to find V. As such, it requires certain orthogonality conditions, which in econometric terms correspond to the requirement that the random variables involved be independent (Bickel, 1982).
Hence, most of the “infimum” approach literature has focused exclusively on models with independent observations.[1] In models in which we relax the independence assumption, the projection arguments are difficult to implement, which makes dealing with time series data difficult. Considerations such as these have led Ai and Chen (2003), for example, to conclude: “although our results [...] can be easily extended to weakly dependent time series data, the problem of semiparametric efficiency bound with time series data is nontrivial.”

In this paper, we use the alternative, “supremum”, approach pioneered by Chamberlain (1987). In his seminal paper on semiparametric efficiency bounds in models with conditional moment restrictions, Chamberlain (1987) compares the asymptotic distribution of an efficient GMM estimator (efficient in the sense of Hansen, 1982) with that of a maximum likelihood estimator (MLE). The key property of the MLE is that it is efficient when correctly specified. Hence any MLE in which the specified likelihood is consistent with the conditional moment restriction and which contains the data generating process must have an asymptotic covariance matrix no larger than the semiparametric efficiency bound. In other words, the semiparametric efficiency bound can be defined as the supremum of the asymptotic covariance matrices of the MLEs in all parametric submodels which satisfy the conditional moment restrictions and contain the data generating process; this is the key insight behind Stein’s (1956) characterization of semiparametric efficiency bounds and the starting point of the “supremum” approach.

[1] Hansen, Heaton, and Ogaki (1988) is an important exception. Their approach, however, is based on the assumption that some transformation (a forward filter) of the moment function used in the conditional moment restriction is serially uncorrelated (see their equation (4.2) and the discussion thereof). Hence, unless the parameters involved in the forward filter transformation are known, the approach of Hansen, Heaton, and Ogaki (1988) is not applicable. For example, in models with conditional moment restrictions in which the variables follow an ARMA(p, q) process, with lags p and q known, one needs to know the q MA parameters in order to construct the forward filter.

Chamberlain (1987) implements the “supremum” approach in the case where the random variables involved in the conditional moment restriction are independent and identically distributed (iid). In the iid case, the efficient (in the sense of Hansen, 1982) GMM estimator and the MLE obtained when the data is generated from a multinomial distribution are both asymptotically normally distributed, with asymptotic covariance matrices respectively equal to Ω and $I^{-1}$, where $I$ is the Fisher information matrix of the multinomial model. When the data has finite support, Chamberlain (1987) shows that Ω and $I^{-1}$ are the same. Hence, they must be equal to the semiparametric efficiency bound V. Given that any distribution can be approximated arbitrarily well by a multinomial distribution, the general expression for the bound follows. The iid assumption plays an important role in Chamberlain’s (1987) construction of the semiparametric bounds; without it the multinomial approximation is no longer valid, making the extension of Chamberlain’s (1987) results to time series data difficult.

The first contribution of this paper is to extend Chamberlain’s (1987) results to weakly dependent data, by using the “supremum” characterization of the semiparametric efficiency bound, initially due to Stein (1956). In particular, we focus on models with conditional quantile restrictions. In such models, there is no published work prior to ours on asymptotically efficient estimation which allows the data to be weakly dependent.
Hence, our first contribution is to fill the gaps in the extant literature on efficient conditional quantile estimation (Newey and Powell, 1990; Koenker and Zhao, 1996; Zhao, 2001) and derive the semiparametric efficiency bound in weakly dependent time series models with conditional quantile restrictions. Our “supremum” approach is somewhat different from that used by Chamberlain (1987). We start by constructing a matrix V which is a potential candidate for the semiparametric efficiency bound. Such a candidate matrix is obtained as a minimum within a family of asymptotic covariance matrices of conditional quantile M-estimators that are consistent for the parameters of a correctly specified conditional quantile model. With the candidate matrix V in hand, we follow the insightful approach of Stein (1956) and look for a parametric submodel that is “as difficult” as the semiparametric model. In other words, we construct a fully parametric model that satisfies the conditional quantile restriction, contains the data generating process, and in which the inverse of the Fisher information matrix equals V. This second step is what distinguishes our work from the rest of the literature on asymptotically efficient estimation: specifically, we are able to derive the least favorable parametric submodel analytically. Our result on the semiparametric efficiency bound is general: we derive it under the sole assumption that the model satisfies the conditional quantile restriction. In particular, when constructing V, we do not make any additional assumptions regarding the properties of the residuals from the (nonlinear) quantile regression: they can be dependent and nonidentically distributed. Hence, for the first time in the literature on efficient estimation, we are able to derive the semiparametric efficiency bound in conditional quantile models with time series data that are dependent and conditionally heteroskedastic.
The second contribution of this paper is to propose a new conditional quantile estimator that is efficient. We note that the problem of constructing an efficient estimator is even more difficult than that of computing the semiparametric efficiency bound. Though to some extent applicable to time series data, the projection methods used in the “infimum” approach shed no light on how to construct efficient estimators. As already pointed out by Hansen, Heaton, and Ogaki (1988), “although [they] delineate the sense of approximation required for the sequences of GMM estimators to get arbitrarily close to the efficiency bound, [they] do not show how to construct estimators that actually attain the efficiency bound.” It is an open question whether procedures along the lines of Newey (1990a,b, 1993, 2004) can be extended to models with time series data. Our second contribution to the literature on efficient estimation is to show how, at least in models with conditional quantile restrictions, the “supremum” approach naturally leads to estimators that are efficient.

Standard approaches to constructing an efficient estimator are as follows. Given a consistent estimator of the parameter of interest θ, take a step away from it in the direction given by the efficient score; the resulting estimator is then efficient. An example of this construction method is Newey and Powell’s (1990) “one-step” estimator for the parameters of a quantile regression. Alternatively, instead of taking a step away from an initial consistent estimator of θ, we can use it to construct a set of weights (functions of the efficient score) and compute the corresponding weighted estimator; the weighted estimator is also efficient. An example of this method is Zhao’s (2001) weighted conditional quantile estimator.
More recently, extending the conditional empirical likelihood (CEL) approach of Kitamura, Tripathi, and Ahn (2004), Otsu (2003) constructs an efficient estimator in the quantile regression model in the iid case.

We propose an efficient conditional quantile MINPIN-type estimator (Andrews, 1994a) whose construction differs from the previous ones in two ways. First, our efficient estimator does not require a preliminary consistent estimate of the parameter of interest; in this respect it is similar to the estimator proposed by Otsu (2003). While Otsu’s (2003) efficient estimator is based on the empirical likelihood principle, our efficient estimator is obtained by minimizing an efficient M-objective function. Second, our efficient estimator depends on a nonparametric estimate of the true conditional distribution, unlike Newey and Powell’s (1990) and Zhao’s (2001) efficient estimators, which depend on nonparametric estimates of the true conditional density. For these two reasons, we can expect our efficient estimator to behave better in small samples than the efficient estimators proposed by Newey and Powell (1990) and Zhao (2001). In particular, whenever it is easier to estimate conditional distributions than densities (Hansen, 2004a,b), we would expect our efficient estimator to perform better than the existing ones.

The remainder of the paper is organized as follows. In Section 2 we define our notation and introduce models for conditional quantiles. Section 3 characterizes the class of M-estimators that are consistent for the parameters of such models, provided they are correctly specified. In the same section we show that such estimators are also asymptotically normally distributed, with an asymptotic covariance matrix whose expression depends on the form of the M-objective function being minimized. In Section 4, we derive the minimum of the above family of matrices and show that it corresponds to the semiparametric efficiency bound.
An efficient conditional quantile estimator is constructed in Section 5, which concludes the paper. We relegate all the proofs to the end of the paper.

2. Setup

2.1. Notation. Consider a stochastic sequence (a time series) $X \equiv \{X_t, t \in \mathbb{N}\}$ defined on a probability space $(\Omega, \mathcal{B}, P)$, where $X : \Omega \to \mathbb{R}^{(m+1)\mathbb{N}}$ and $\mathbb{R}^{(m+1)\mathbb{N}}$ is the product space generated by taking a copy of $\mathbb{R}^{m+1}$ for each integer, i.e. $\mathbb{R}^{(m+1)\mathbb{N}} \equiv \times_{t=1}^{\infty} \mathbb{R}^{m+1}$, $m \in \mathbb{N}$. We partition the random vector $X_t$ as $X_t = (Y_t, W_t')'$ and are interested in the distribution of its first (scalar) component, denoted $Y_t$, conditional on the random $m$-vector $W_t$. In particular, we allow $W_t$ to contain lagged values of $Y_t$ (particularly interesting for time series applications) together with other (exogenous) components. The family of subfields $\{\mathcal{W}_t, t \in \mathbb{N}\}$ with $\mathcal{W}_t \equiv \sigma(W_1, \dots, W_t)$ corresponds to the information set generated by the sequence of conditioning vectors up to time $t$.

We use standard notation and let $P(Y_t \in A \mid \mathcal{W}_t)$ denote the conditional distribution of $Y_t$, with $A$ an element of the Borel σ-algebra on $\mathbb{R}$. To simplify, we assume that for any $T \geq 1$, the joint distribution of $(Y_1, W_1, \dots, Y_T, W_T)$ has a strictly positive continuous density $p_T$ on $\mathbb{R}^{(m+1)T}$, so that conditional densities are everywhere defined.[2] Then, for every $t$, $1 \leq t \leq T$, $T \geq 1$, we let $F_t^0(\cdot)$ denote the distribution function of $Y_t$ conditional upon $\mathcal{W}_t$, i.e. $F_t^0(y) \equiv P(Y_t \leq y \mid \mathcal{W}_t)$ for every $y \in \mathbb{R}$, and we call $f_t^0(\cdot)$ the corresponding conditional probability density. Of course, $F_t^0(\cdot)$ (like $f_t^0(\cdot)$) is unknown, and we assume that it belongs to $\mathcal{F}$, the set of all absolutely continuous distribution functions with continuously differentiable densities on $\mathbb{R}$. Throughout the paper we assume that for every $t$, $1 \leq t \leq T$, $T \geq 1$, $f_t^0(\cdot)$ and its derivative are bounded, so that there exist constants $M_0, M_1 > 0$ such that $\sup_{t \geq 1} \sup_{y \in \mathbb{R}} f_t^0(y) \leq M_0 < \infty$ and $\sup_{t \geq 1} \sup_{y \in \mathbb{R}} |df_t^0(y)/dy| \leq M_1 < \infty$.
If $V$ is a real $n$-vector, $V \equiv (V_1, \dots, V_n)'$, then $|V|$ denotes the $L_2$-norm of $V$, i.e. $|V|^2 \equiv V'V = \sum_{i=1}^{n} V_i^2$. If $M$ is a real $n \times n$ matrix, $M \equiv (M_{ij})_{1 \leq i,j \leq n}$, then $|M|$ denotes the $L_\infty$-norm of $M$, i.e. $|M| \equiv \max_{1 \leq i,j \leq n} |M_{ij}|$, and $M^+$ denotes a generalized inverse of $M$. If $A$ is a positive definite $n \times n$ matrix, then $A^{-1/2} = P$, where $P$ is invertible and such that $PAP' = I_d$, with $I_d$ the $n \times n$ identity matrix.

Let $f : E \to \mathbb{R}$, $V \mapsto f(V)$, with $E \subseteq \mathbb{R}^n$ and $V = (V_1, \dots, V_n)'$, be continuously differentiable to order $R \geq 1$ on $E$. Let $r \equiv (r_1, \dots, r_n) \in \mathbb{N}^n$: if $|r| \leq R$, then $D^r f(V) \equiv \partial^{|r|} f(V) / \partial V_1^{r_1} \cdots \partial V_n^{r_n}$, where $|r| \equiv r_1 + \dots + r_n$ represents the order of differentiation. If $r = 0$, then $D^0 f(V) = f(V)$. Further, let $r! \equiv r_1! \cdots r_n!$ and $V^r \equiv V_1^{r_1} \cdots V_n^{r_n}$. Then, for any $(V, V_0) \in E^2$, the (familiar) expression in a Taylor expansion of order $R$ can be written as
$$\sum_{|r| \leq R} \frac{D^r f(V_0)}{r!} (V - V_0)^r \equiv \sum_{k=0}^{R} \sum_{(j_1, \dots, j_k) \in \{1, \dots, n\}^k} \frac{1}{k!} \frac{\partial^k f(V_0)}{\partial V_{j_1} \cdots \partial V_{j_k}} (V_{j_1} - V_{0 j_1}) \cdots (V_{j_k} - V_{0 j_k}).$$
For example, when $R = 1$, we have $\sum_{|r| \leq 1} D^r f(V_0) (V - V_0)^r = f(V_0) + \sum_{i=1}^{n} [\partial f(V_0)/\partial V_i](V_i - V_{0i})$ (Schwartz, 1997). When $R \geq 2$, we let $\nabla_V f(V)$ denote the gradient of $f$, $\nabla_V f(V) \equiv (\partial f(V)/\partial V_1, \dots, \partial f(V)/\partial V_n)'$, and use $\Delta_{VV} f(V)$ to denote its Hessian matrix, $\Delta_{VV} f(V) \equiv (\partial^2 f(V)/\partial V_i \partial V_j)_{1 \leq i,j \leq n}$.

Finally, the function $\mathbb{1} : \mathbb{R} \to [0, 1]$ denotes the Heaviside (or indicator) function: for any $x \in \mathbb{R}$, we have $\mathbb{1}(x) = 0$ if $x \leq 0$, and $\mathbb{1}(x) = 1$ if $x > 0$ (Bracewell, 2000). The Heaviside function is the indefinite integral of the Dirac delta function $\delta : \mathbb{R} \to \mathbb{R}$, with $\mathbb{1}(x) = \int_a^x d\delta$, where $a$ is an arbitrary (possibly infinite) negative constant, $a \leq 0$.

[2] This excludes, for example, the possibility that $W_t$ contains indicator functions of lags of $Y_t$ or of other variables.

2.2. Models for conditional quantiles.
In this paper we do not consider the conditional distribution $F_t^0(\cdot)$ in its entirety but rather focus on a particular conditional quantile of $Y_t$. In recent years, conditional quantiles have been of particular interest in both applied and theoretical work in economics, in which numerous choices for the conditioning variables have been proposed.[3] In order to keep our analysis both simple and general, we introduce the following notation: for a given $\alpha \in (0, 1)$, let $\mathcal{M}$ denote a model for the conditional $\alpha$-quantile of $Y_t$, $\mathcal{M} \equiv \{q_\alpha(W_t, \theta)\}$, with an unknown parameter $\theta$ in $\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^k$ with non-empty interior, $\mathring{\Theta} \neq \emptyset$. In what follows we restrict our attention to conditional quantile models $\mathcal{M}$ in which the following set of conditions is satisfied:

(A0) (i) the model $\mathcal{M}$ is identified on $\Theta$, i.e. for any $(\theta_1, \theta_2) \in \Theta^2$ we have $q_\alpha(W_t, \theta_1) = q_\alpha(W_t, \theta_2)$, a.s.-$P$, for every $t$, $1 \leq t \leq T$, $T \geq 1$, if and only if $\theta_1 = \theta_2$; (ii) for every $t$, $1 \leq t \leq T$, $T \geq 1$, the function $q_\alpha(W_t, \cdot) : \Theta \to \mathbb{R}$ is twice continuously differentiable on $\Theta$ a.s.-$P$; (iii) for every $t$, $1 \leq t \leq T$, $T \geq 1$, the matrix $\nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)'$ is of full rank a.s.-$P$ for every $\theta \in \Theta$.

The set of conditions in (A0) is fairly standard and is generally satisfied by a wide variety of conditional quantile models. In what follows, we shall always assume that $\mathcal{M}$ is a conditional quantile model in which properties (A0)(i)-(iii) above hold. Further, for any given $\mathcal{M}$ we shall denote by $\mathcal{Q}$ the range of $q_\alpha$, i.e. $\mathcal{Q} \equiv \{q_t \in \mathbb{R} : q_t = q_\alpha(W_t, \theta), \theta \in \Theta, W_t \in \mathbb{R}^m\}$, $\mathcal{Q} \subseteq \mathbb{R}$.

One crucial assumption that we make in our analysis, and which is of a different nature than the conditions above, is that the model $\mathcal{M}$ is correctly specified, so that there exists some true parameter value $\theta_0$ such that $F_t^0(q_\alpha(W_t, \theta_0)) = \alpha$ for every $t$, $1 \leq t \leq T$, $T \geq 1$. In other words, we assume the following:

(A1) given $\alpha \in (0, 1)$, there exists $\theta_0 \in \mathring{\Theta}$ such that $E[\mathbb{1}(q_\alpha(W_t, \theta_0) - Y_t) \mid \mathcal{W}_t] = \alpha$, a.s.-$P$, for every $t$, $1 \leq t \leq T$, $T \geq 1$.

In other words, for any $t$, $1 \leq t \leq T$, $T \geq 1$, the difference between the indicator variable above and $\alpha$ is assumed to be orthogonal to any $\mathcal{W}_t$-measurable random variable.

[3] Since the seminal work by Koenker and Bassett (1978), numerous authors have studied the problems of conditional quantile estimation (Koenker and Bassett, 1978; Powell, 1984, 1986; Newey and Powell, 1990; Pollard, 1991; Portnoy, 1991; Koenker and Zhao, 1996; Buchinsky and Hahn, 1998; Khan, 2001; Cai, 2002; Kim and White, 2003; Komunjer, 2005b) and specification testing (Koenker and Bassett, 1982; Zheng, 1998; Bierens and Ginther, 2001; Horowitz and Spokoiny, 2002; Koenker and Xiao, 2002; Kim and White, 2003; Angrist, Chernozhukov, and Fernandez-Val, 2006). An excellent review of applications of quantile regressions in economics (Buchinsky, 1994; Chernozhukov and Hong, 2002; Angrist, Chernozhukov, and Fernandez-Val, 2006) can be found in Koenker and Hallock (2001).

3. M-estimators for conditional quantiles

In this paper we consider a particular family of conditional quantile estimators known as M- (or extremum) estimators (Huber, 1967). M-estimators for $\theta_0$, denoted $\theta_T$, are obtained by minimizing criterion functions $\Psi_T(\theta)$ of the form $\Psi_T(\theta) \equiv T^{-1} \sum_{t=1}^{T} \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)$ where, for every $t$, $1 \leq t \leq T$, $T \geq 1$, $\varphi$ is a real function of the variable of interest $Y_t$, the quantile $q_\alpha(W_t, \theta)$ and a (possibly infinite-dimensional) random variable $\xi_t : \Omega \to \mathcal{E}_t$, i.e. $\varphi : \mathbb{R} \times \mathcal{Q} \times \mathcal{E}_t \to \mathbb{R}$. The variable $\xi_t$ can be thought of as a shape parameter of the objective function $\varphi$. We assume the following:

(A2) (i) for every $t$, $1 \leq t \leq T$, $T \geq 1$, $\xi_t$ is $\mathcal{W}_t$-measurable; (ii) for every $t$, $1 \leq t \leq T$, $T \geq 1$, the function $\varphi(\cdot, \cdot, \cdot)$ is twice continuously differentiable a.s.-$P$ on $\mathbb{R} \times \mathcal{Q} \times \mathcal{E}_t$ with respect to its second argument ($q_t$).

By assumption (A2)(i), the random variable $\xi_t$ is allowed to depend only on variables contained in $\mathcal{W}_t$.
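To make (A1) and the criterion $\Psi_T(\theta)$ concrete, the following Python sketch simulates a hypothetical time series model that is not taken from the paper: $Y_t = \theta_0 Y_{t-1} + e_t$ with $\theta_0 = 0.5$ and iid innovations $e_t$ whose $\alpha$-quantile is zero, so that the model $q_\alpha(W_t, \theta) = \theta Y_{t-1}$, with $W_t = Y_{t-1}$, satisfies (A1). The objective is $\varphi(Y_t, q_t) = [\alpha - \mathbb{1}(q_t - Y_t)](Y_t - q_t)$, minimized by a crude grid search rather than a proper optimizer. The sample size, seed, $\alpha = 0.25$ and the grid are arbitrary choices.

```python
import random
from statistics import NormalDist

ALPHA = 0.25
THETA0 = 0.5  # hypothetical true parameter in q_alpha(W_t, theta) = theta * Y_{t-1}
T = 3000


def indicator(x):
    """Heaviside convention of Section 2.1: 1 if x > 0, 0 if x <= 0."""
    return 1.0 if x > 0 else 0.0


def simulate(n, seed=12345):
    """Y_t = THETA0 * Y_{t-1} + e_t with iid e_t whose alpha-quantile is 0, so (A1) holds."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(ALPHA)
    y = [0.0]
    for _ in range(n):
        e = rng.gauss(0.0, 1.0) - z_alpha  # P(e < 0) = P(Z < z_alpha) = ALPHA
        y.append(THETA0 * y[-1] + e)
    return y


def psi(theta, y):
    """Psi_T(theta) = T^{-1} sum_t [alpha - 1(q_t - Y_t)](Y_t - q_t), q_t = theta * Y_{t-1}."""
    total = 0.0
    for t in range(1, len(y)):
        q = theta * y[t - 1]
        total += (ALPHA - indicator(q - y[t])) * (y[t] - q)
    return total / (len(y) - 1)


y = simulate(T)
# Sample analogue of (A1): share of t with Y_t < q_alpha(W_t, theta0).
hit_rate = sum(indicator(THETA0 * y[t - 1] - y[t]) for t in range(1, len(y))) / T
grid = [i / 400 for i in range(401)]  # theta in [0, 1], step 0.0025
theta_hat = min(grid, key=lambda th: psi(th, y))
print(round(hit_rate, 3), theta_hat)
```

In a finite sample, the share of observations with $Y_t < q_\alpha(W_t, \theta_0)$ should be close to, but not exactly, $\alpha$, and the grid minimizer of $\Psi_T$ should lie near $\theta_0 = 0.5$.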
Assumption (A2)(i) thus requires that the functional form (or shape) of $\varphi$ not depend on any variable that is observed after time $t$. We shall see in subsequent sections that the $\mathcal{W}_t$-measurability of $\xi_t$ is not trivially satisfied. In particular, if we consider objective functions $\varphi$ that depend on some estimator based on the observations of $Y_t$ and $W_t$ up to time $T$ (kernel estimators of conditional distributions or densities are an example), then (A2)(i) fails to hold. The requirement (A2)(ii) that, for given realizations of $Y_t$ and $\xi_t$, $\varphi$ be twice continuously differentiable with respect to $q_t$ on $\mathcal{Q}$ a.s.-$P$ allows for objective functions such as $|Y_t - q_t|$ or $[\alpha - \mathbb{1}(q_t - Y_t)](Y_t - q_t)$, for example. Note that in those two cases the shape $\xi_t$ of $\varphi$ remains constant over time.

An important subfamily of the class of M-estimators defined above is that of quasi-maximum likelihood estimators (QMLEs) (White, 1982; Gourieroux, Monfort, and Trognon, 1984). If, in addition to (A2), we assume that there exists a real function $c : \mathbb{R} \times \mathcal{E}_t \to \mathbb{R}$, $(y, \xi_t) \mapsto c(y, \xi_t) < \infty$, independent of $q_t$ and such that $\int_{\mathbb{R}} \exp[c(y, \xi_t) - \varphi(y, q_t, \xi_t)]\, dy = 1$ for all $(q_t, \xi_t) \in \mathcal{Q} \times \mathcal{E}_t$, then we can let $l_t(\cdot, q_t) \equiv \exp[c(\cdot, \xi_t) - \varphi(\cdot, q_t, \xi_t)]$, and $l_t(\cdot, q_t)$ can be interpreted as the (quasi) likelihood of $Y_t$ conditional on $W_t$. Hence, any minimum $\theta_T$ of the function $\Psi_T(\theta)$ above is also a maximum of the (quasi) log-likelihood function $L_T(\theta) \equiv T^{-1} \sum_{t=1}^{T} \ln l_t(Y_t, q_\alpha(W_t, \theta))$ (Komunjer, 2005b). However, due to the above “integrability” constraint on $\varphi(\cdot, q_t, \xi_t)$, the class of QMLEs is smaller than that of M-estimators.[4] We shall see in subsequent sections that this difference plays an important role in efficient conditional quantile estimation.

We now focus on M-estimators for $\theta_0$ that are consistent.

3.1. Class of consistent M-estimators.
What are necessary conditions for the M-estimator $\theta_T$ satisfying (A2) to be consistent for the true conditional quantile parameter $\theta_0$ in (A1)? The key idea behind the answer to this question is fairly simple. Assume that the process $\{X_t\}$ and the functions $\varphi(\cdot, \cdot, \xi_t)$ are such that $\theta_T - \theta_T^0 \xrightarrow{p} 0$, where $\theta_T^0$ is the unique minimum of $E[\Psi_T(\theta)] \equiv T^{-1} \sum_{t=1}^{T} E[\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)]$ on $\mathring{\Theta}$.[5] Then a necessary requirement for consistency of $\theta_T$ is that $\theta_T^0 - \theta_0 \to 0$ as $T$ becomes large. In what follows, we restrict our attention to estimators $\theta_T$ such that $\theta_T^0$ remains constant, i.e. for all $T \geq 1$ we have $\theta_T^0 = \theta_\infty^0$. Then, the class of M-estimators that are consistent for $\theta_0$ is obtained by considering all the functions $\varphi(\cdot, \cdot, \xi_t)$ under which $\theta_\infty^0 = \theta_0$.

Note that the requirement of having $\theta_T^0 = \theta_0$ for all $T \geq 1$ is stronger than that of having $\theta_T^0 \to \theta_0$.[6] This implies that $\theta_0$ can be consistently estimated by minimizing objective functions that are different from the ones derived below, as long as the expected value of this difference converges uniformly to zero with $T$. An important example in which the condition $\theta_T^0 = \theta_0$ for all $T \geq 1$ fails is when the shape $\xi_t$ of the objective function $\varphi$ depends on observations up to time $T$, and hence is not $\mathcal{W}_t$-measurable, as in the case of the estimator $\hat{\theta}_T$ proposed in Section 5. In that case, $\hat{\theta}_T$ is consistent provided the difference between its (M-) objective function $\hat{\Psi}_T$ and an (M-) objective function $\Psi_T^*$ derived in Theorem 3 converges uniformly to zero with $T$.

We now provide a more formal treatment of consistency. A set of sufficient assumptions for $\theta_T - \theta_\infty^0 \xrightarrow{p} 0$ to hold is as follows (see, e.g., Theorem 2.1 in Newey and McFadden, 1994):

(A3) $\{X_t\}$ and $\varphi(\cdot, \cdot, \xi_t)$ are such that: (i) for every $t$, $1 \leq t \leq T$, $T \geq 1$, and every $\theta \in \Theta$, $|D^r \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)| \leq m_r(Y_t, W_t, \xi_t)$, a.s.-$P$, where $E[m_r(Y_t, W_t, \xi_t)] < \infty$, for $r = 0, 1, 2$; for any $T \geq 1$, (ii) $E[\Psi_T(\theta)]$ is uniquely minimized at $\theta_\infty^0 \in \mathring{\Theta}$, and (iii) $\sup_{\theta \in \Theta} |\Psi_T(\theta) - E[\Psi_T(\theta)]| \xrightarrow{p} 0$.

[4] We call $\int_{\mathbb{R}} \exp[c(y, \xi_t) - \varphi(y, q_t, \xi_t)]\, dy = 1$ for all $(q_t, \xi_t) \in \mathcal{Q} \times \mathcal{E}_t$ the “integrability” constraint. This requirement is stronger than $\exp[-\varphi(\cdot, q_t, \xi_t)]$ being integrable with respect to the Lebesgue measure on $\mathbb{R}$.

[5] $\theta_T^0$ is also called the pseudo-true value of the parameter $\theta$.

[6] See White (1994, pp. 69-70) for a discussion of the requirement $\theta_T^0 = \theta_0$.

Note that the above are not primitive conditions for consistency of $\theta_T$. For example, the integrability of $D^r \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)$ with respect to the probability $P$ implied by (A3)(i) involves more primitive conditions on the existence of different moments of $Y_t$, $W_t$ and $\xi_t$. Condition (A3)(ii) states that $\theta_\infty^0$ is a minimum of $E[\Psi_T(\theta)]$ and that this minimum is moreover unique. The first requirement involves more primitive conditions on $\partial\varphi/\partial q_t$, $\partial^2\varphi/\partial q_t^2$ and $\nabla_\theta q_\alpha$, which depend on the shape $\xi_t$ of $\varphi$ and the functional form of $q_\alpha$. For example, a sufficient set of conditions for $\theta_\infty^0$ to be a minimum is that $T^{-1} \sum_{t=1}^{T} E[\nabla_\theta \varphi(Y_t, q_\alpha(W_t, \theta_\infty^0), \xi_t)] = 0$ and that $T^{-1} \sum_{t=1}^{T} E[\Delta_{\theta\theta} \varphi(Y_t, q_\alpha(W_t, \theta_\infty^0), \xi_t)]$ be positive definite (Schwartz, 1997). Finally, the uniform convergence condition (A3)(iii) can be obtained by applying an appropriate uniform law of large numbers to the sequence $\{\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)\}$. Implicit in (A3)(iii) are primitive assumptions on the dependence structure and heterogeneity of the process $\{X_t\}$, and on the properties of $\varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$. A simple example is one where $\{X_t\}$ is iid and the functions $\varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$ are Lipschitz-$L_1$ a.s.-$P$ on $\Theta$ (see, e.g., Definition A.2.3 in White, 1994).

The above pseudo-true value $\theta_\infty^0$ of the parameter $\theta$ equals the true value $\theta_0$ if and only if, for any $T \geq 1$, $\theta_0$ minimizes $E[\Psi_T(\theta)]$.
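For intuition on (A3)(ii) and on when the pseudo-true value equals $\theta_0$, consider the simplest hypothetical case: iid $Y_t \sim N(0, 1)$, no conditioning variables, $q_\alpha(W_t, \theta) = \theta$ and $\varphi(Y_t, q_t) = [\alpha - \mathbb{1}(q_t - Y_t)](Y_t - q_t)$. Writing $\phi$ and $\Phi$ for the standard normal density and distribution function, a direct computation gives
$$E[\varphi(Y_t, \theta)] = \phi(\theta) + \theta[\Phi(\theta) - \alpha], \qquad \frac{d}{d\theta} E[\varphi(Y_t, \theta)] = \Phi(\theta) - \alpha, \qquad \frac{d^2}{d\theta^2} E[\varphi(Y_t, \theta)] = \phi(\theta) > 0,$$
so the first- and second-order conditions above hold at $\theta = \Phi^{-1}(\alpha)$, the true $\alpha$-quantile. The Python sketch below checks the closed form against a midpoint-rule integral and locates its minimum on a grid; $\alpha = 0.3$ and the grid are arbitrary choices.

```python
from statistics import NormalDist

STD = NormalDist()  # hypothetical Y_t ~ N(0, 1), constant quantile q = theta
ALPHA = 0.3


def expected_check_loss(theta):
    """Closed form: E[(alpha - 1(theta - Y))(Y - theta)] = phi(theta) + theta (Phi(theta) - alpha)."""
    return STD.pdf(theta) + theta * (STD.cdf(theta) - ALPHA)


def expected_check_loss_numeric(theta, lo=-10.0, hi=10.0, n=20_000):
    """Same expectation by the midpoint rule, as a cross-check of the closed form."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        weight = ALPHA - (1.0 if theta - y > 0 else 0.0)
        total += weight * (y - theta) * STD.pdf(y)
    return total * h


grid = [-2 + i / 1000 for i in range(4001)]  # theta in [-2, 2], step 0.001
theta_star = min(grid, key=expected_check_loss)
gradient = STD.cdf(theta_star) - ALPHA  # first-order condition, close to 0
curvature = STD.pdf(theta_star)         # second derivative, strictly positive

print(theta_star, round(STD.inv_cdf(ALPHA), 4))
print(expected_check_loss(0.7), expected_check_loss_numeric(0.7))
```

The grid minimizer should match $\Phi^{-1}(0.3) \approx -0.524$ up to the grid step, and the two evaluations of the expected loss should agree to several decimals.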
A necessary and sufficient requirement for $\theta_\infty^0 = \theta_0$ is given in the following theorem.

Theorem 1 (Necessary and sufficient condition for consistency). Assume that (A0), (A2) and (A3) hold. If the true parameter $\theta_0$ satisfies the conditional moment condition in (A1), then the M-estimator $\theta_T$ is consistent for $\theta_0$, i.e. $\theta_T - \theta_0 \xrightarrow{p} 0$, if and only if there exist a real function $A(\cdot, \cdot) : \mathbb{R} \times \mathcal{E}_t \to \mathbb{R}$ that is twice continuously differentiable and strictly increasing with respect to its first argument ($q_t$ or $Y_t$) a.s.-$P$ on $\mathcal{Q} \times \mathcal{E}_t$, and a real function $B(\cdot, \cdot) : \mathbb{R} \times \mathcal{E}_t \to \mathbb{R}$, such that
$$\varphi(Y_t, q_t, \xi_t) = [\alpha - \mathbb{1}(q_t - Y_t)][A(Y_t, \xi_t) - A(q_t, \xi_t)] + B(Y_t, \xi_t),$$
a.s.-$P$ on $\mathbb{R} \times \mathcal{Q} \times \mathcal{E}_t$, for every $t$, $1 \leq t \leq T$, $T \geq 1$.[7]

In other words, if for any given sample size $T \geq 1$ we are interested in consistently estimating the conditional quantile parameter of a continuously distributed random variable $Y_t$ by using an M-estimator $\theta_T$, then we must employ an objective function $\Psi_T(\cdot) = T^{-1} \sum_{t=1}^{T} \varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$ with
$$\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t) = [\alpha - \mathbb{1}(q_\alpha(W_t, \theta) - Y_t)][A(Y_t, \xi_t) - A(q_\alpha(W_t, \theta), \xi_t)] + B(Y_t, \xi_t), \tag{1}$$
a.s.-$P$, for every $t$, $1 \leq t \leq T$. Using objective functions of this form is also a sufficient condition for $\theta_T$ to be consistent for the true parameter $\theta_0$ of a correctly specified model for the conditional $\alpha$-quantile.

[7] The real functions $A$ and $B$ in Theorem 1 need not have the same shape parameter: we can let $\xi_t \equiv (\xi_{At}', \xi_{Bt}')'$, where $\xi_{At}$ and $\xi_{Bt}$ are the shapes of $A(\cdot, \xi_{At})$ and $B(\cdot, \xi_{Bt})$, respectively. For simplicity, we write $A(\cdot, \xi_t)$ and $B(\cdot, \xi_t)$ with the understanding that changing the shape of $A$ need not affect the shape of $B$, and vice versa.

Given that we restrict our attention to objective functions in which (A2)(ii) holds, the function $A(\cdot, \xi_t)$ in Theorem 1 needs to be twice continuously differentiable a.s.-$P$ on $\mathcal{Q}$. The continuity and differentiability of $A(\cdot, \xi_t)$ need not hold on $\mathbb{R} \setminus \mathcal{Q}$.
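The sufficiency part of Theorem 1 can be illustrated numerically. With $B \equiv 0$, a continuous $Y_t$ with distribution function $F$, and $a(q) \equiv \partial A(q)/\partial q$, differentiating $E[\varphi]$ with respect to a constant quantile $q$ gives $a(q)[F(q) - \alpha]$, which vanishes at the true $\alpha$-quantile whatever the strictly increasing $A$. The Python sketch below, again for a hypothetical iid $Y_t \sim N(0, 1)$ with $\alpha = 0.3$, computes $E[\varphi]$ by a midpoint rule for three illustrative choices of $A$ (the identity, $y^3 + y$ and the standard normal distribution function) and locates each minimum on a grid.

```python
from statistics import NormalDist

STD = NormalDist()  # hypothetical Y ~ N(0, 1)
ALPHA = 0.3
LO, HI, N = -8.0, 8.0, 4000  # midpoint-rule range and number of cells
H = (HI - LO) / N
NODES = [LO + (i + 0.5) * H for i in range(N)]
WEIGHTS = [STD.pdf(y) * H for y in NODES]

# Illustrative strictly increasing choices of A (no shape parameter xi_t here).
CHOICES = {
    "identity": lambda y: y,
    "cubic": lambda y: y ** 3 + y,
    "normal_cdf": STD.cdf,
}


def expected_phi(q, A, A_nodes):
    """E[(alpha - 1(q - Y)) (A(Y) - A(q))], with B = 0, by the midpoint rule."""
    A_q = A(q)
    return sum(
        (ALPHA - (1.0 if q - y > 0 else 0.0)) * (A_y - A_q) * w
        for y, A_y, w in zip(NODES, A_nodes, WEIGHTS)
    )


Q_GRID = [-1.5 + i / 100 for i in range(201)]  # q in [-1.5, 0.5], step 0.01
minimizers = {}
for name, A in CHOICES.items():
    A_nodes = [A(y) for y in NODES]  # A evaluated once at the integration nodes
    minimizers[name] = min(Q_GRID, key=lambda q: expected_phi(q, A, A_nodes))

print(round(STD.inv_cdf(ALPHA), 3), minimizers)
```

All three minimizers should agree with $\Phi^{-1}(0.3) \approx -0.524$ up to the grid step and the integration error. A decreasing $A$ would flip the sign of the derivative and turn this point into a maximum, which is why strict monotonicity is required.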
The fact that Theorem 1 imposes no requirements on $A(\cdot, \xi_t)$ outside the range $\mathcal{Q}$ of $q_\alpha(W_t, \theta)$ is not surprising, given that changing the objective function outside $\mathcal{Q}$ does not affect the values of $\partial\varphi/\partial q_t$, and therefore has no effect on the optimum of $\Psi_T$. The fact that $A(\cdot, \xi_t)$ is necessarily strictly increasing a.s.-$P$ on $\mathcal{Q}$ comes from the requirement (A3)(ii) that $\theta_\infty^0$ be an interior minimum of $E[\Psi_T(\theta)]$ on $\Theta$. As previously, there are no requirements on the monotonicity of $A(\cdot, \xi_t)$ on $\mathbb{R} \setminus \mathcal{Q}$. Finally, note that there are no restrictions on the function $B(\cdot, \xi_t)$, as expected, since changing it does not affect the optimum of the objective function $\Psi_T$. In what follows we set $B(\cdot, \xi_t)$ identically equal to 0, which does not affect any of our results but has the benefit of simplifying the notation.

Well-known examples of conditional quantile estimators that satisfy Theorem 1 are:
(1) Koenker and Bassett’s (1978) unweighted quantile regression estimator, for which $A(y, \xi_t) = y$ for all $y \in \mathbb{R}$;
(2) Powell’s (1984, 1986) left (right) censored quantile regression estimator, obtained when, for all $y \in \mathbb{R}$, $A(y, \xi_t) = \max\{y, c_t\}$ ($A(y, \xi_t) = \min\{y, c_t\}$) with an observed censoring point $c_t$;[8]
(3) the weighted quantile regression estimator proposed by Newey and Powell (1990) and Zhao (2001), in which, for all $y \in \mathbb{R}$, $A(y, \xi_t) = \omega_t y$, where $\omega_t$ is some nonnegative weight, as well as its censored version, for which $A(y, \xi_t) = \omega_t \max\{y, c_t\}$.

[8] Note that $A(\cdot, \xi_t) = \max\{\cdot, c_t\}$ satisfies the strict monotonicity requirement a.s.-$P$ on $\mathcal{Q}$ because, in the censored quantile regression case, $q_\alpha(W_t, \theta_0) > c_t$, a.s.-$P$, as elegantly discussed by Powell (1984, pp. 4-6). The intuition behind this inequality is simple: suppose $Y_t = c_t$, a.s.-$P$, for all $t$, $1 \leq t \leq T$, $T \geq 1$. Then any value $\theta$ for which $q_\alpha(W_t, \theta) \leq c_t$, a.s.-$P$, for all $t$, $1 \leq t \leq T$, $T \geq 1$, is a minimum of $E[\Psi_T(\theta)]$, which in that case equals 0. This violates the uniqueness assumption (A3)(ii), and hence affects the consistency of $\theta_T$. The latter is restored by requiring that $q_\alpha(W_t, \theta_0) > c_t$, a.s.-$P$, for a large enough portion of the sample (see Assumption R.1 in Powell, 1984). An analogous result holds for the right censored case.

In particular, the class of objective functions $\Psi_T$ leading to consistent conditional quantile M-estimators is larger than that leading to consistent QMLEs. In order to simplify the comparison between M-estimators and QMLEs, assume that at any point in time $t$, $1 \leq t \leq T$, $T \geq 1$, the conditional $\alpha$-quantile of $Y_t$ can take any real value, so $\mathcal{Q} = \mathbb{R}$. As pointed out previously, the main difference between the two classes of estimators lies in the “integrability” condition on the pseudo-densities. Compare the objective function in Theorem 1 with the family of tick-exponential pseudo-densities which give consistent QMLEs for $\theta_0$ (Komunjer, 2005b):
$$f_\alpha(Y_t, q_t, \xi_t) \equiv \alpha(1 - \alpha)\, a(Y_t, \xi_t) \exp\{[\mathbb{1}(q_t - Y_t) - \alpha][A(Y_t, \xi_t) - A(q_t, \xi_t)]\},$$
with $A(\cdot, \xi_t)$ twice continuously differentiable and strictly increasing a.s.-$P$ on $\mathbb{R}$, with derivative $a(y, \xi_t) \equiv \partial A(y, \xi_t)/\partial y$.[9] For $f_\alpha(\cdot, q_t, \xi_t)$ to be a probability density on $\mathbb{R}$, we need $\lim_{y \to \pm\infty} A(y, \xi_t) = \pm\infty$ for any $t$, $1 \leq t \leq T$, $T \geq 1$.[10] This limit condition restricts the possible choice of functions $A(\cdot, \xi_t)$ in Theorem 1. For example, consider any distribution function $F_t(\cdot)$ in $\mathcal{F}$ having a density $f_t(\cdot)$ that is continuously differentiable a.s.-$P$, and let
$$A(y, \xi_{Ft}) \equiv F_t(y) \tag{2}$$
for any $y \in \mathbb{R}$. Note that the parameter $\xi_{Ft}$ in the objective function $A(\cdot, \xi_{Ft})$ in Equation (2) corresponds to the conditional distribution $F_t(\cdot)$, which is stochastic and $\mathcal{W}_t$-measurable.
Under the assumptions of Theorem 1, the M-estimator $\theta^F_T$, which minimizes $\Psi^F_T(\theta) \equiv T^{-1}\sum_{t=1}^T \varphi(Y_t, q_\alpha(W_t,\theta), \xi_{Ft})$ with
$$\varphi(Y_t, q_\alpha(W_t,\theta), \xi_{Ft}) \equiv [\alpha - \mathbb{1}(q_\alpha(W_t,\theta) - Y_t)][F_t(Y_t) - F_t(q_\alpha(W_t,\theta))], \qquad (3)$$
is consistent for $\theta^0$; however, the corresponding function $A(\cdot,\xi_{Ft})$ in Equation (2), being bounded between 0 and 1, does not satisfy the above limit condition. As a consequence, the class of consistent QMLEs is strictly smaller than that of consistent M-estimators. In subsequent sections we show that the limit restrictions on $A(\cdot,\xi_{Ft})$ play a particularly important role in efficient conditional quantile estimation, by constructing an efficient M-estimator whose objective function is of the form (3).

To summarize, we have shown that an M-estimator $\theta_T$ that satisfies (A2) is consistent for $\theta^0$ only if the objective functions $\varphi(\cdot,\cdot,\xi_t)$ are of the form given in Theorem 1. The conditions provided in Theorem 1 are not only necessary but also sufficient for consistency. From the functional form of $\varphi(\cdot,\cdot,\xi_t)$ in Equation (1), it follows that the asymptotic properties of $\theta_T$ depend only on the choice of $A(\cdot,\xi_t)$, since changing $B(\cdot,\xi_t)$ does not affect the minimum of $\Psi_T(\theta)$. Before considering a particular class of functions $A(\cdot,\xi_t)$ which makes the asymptotics of $\theta_T$ optimal, we need the asymptotic distribution of the latter.

[9] It is straightforward to see that $\varphi(Y_t, q_t, \xi_t)$ in Theorem 1 and $f_\alpha(Y_t, q_t, \xi_t)$ in Komunjer (2005b) have the same optimum.

[10] The limit conditions on $A(\cdot,\xi_t)$ follow directly from the quantile restriction $\int_{-\infty}^{q_t} f_\alpha(y, q_t, \xi_t)\,dy = \alpha$, which is equivalent to $(1-\alpha)\exp[-(1-\alpha)A(q_t,\xi_t)]\int_{-\infty}^{q_t} a(y,\xi_t)\exp[(1-\alpha)A(y,\xi_t)]\,dy = 1$, so that, upon the change of variable $u \equiv A(y,\xi_t)$, necessarily $A(q_t,\xi_t) \to -\infty$ as $q_t \to -\infty$. Combining the above quantile restriction with the condition $\int_{\mathbb{R}} f_\alpha(y, q_t, \xi_t)\,dy = 1$ yields the result for the limit at $+\infty$ by a similar reasoning.
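The claim that any strictly increasing $A(\cdot,\xi_t)$ yields a valid quantile objective can be checked numerically in the simplest case. The sketch below is a hypothetical illustration, not the authors' code: it assumes an iid sample, a constant quantile model $q_\alpha(W_t,\theta) = \theta$, the convention $\mathbb{1}(x) = 1$ for $x > 0$, and a standard Gaussian CDF as one admissible bounded choice of $F_t$ in Equation (2) (it need not be the true distribution). The Koenker-Bassett choice $A(y) = y$ and the bounded choice $A = F$ should both be minimized at the sample $\alpha$-quantile.

```python
import math
import numpy as np

def indicator(x):
    # 1I(x): equals 1 when x > 0 and 0 otherwise (assumed convention)
    return (np.asarray(x) > 0).astype(float)

def phi(y, q, alpha, A):
    """Objective (1) with B = 0: [alpha - 1I(q - y)] * [A(y) - A(q)]."""
    return (alpha - indicator(q - y)) * (A(y) - A(q))

def normal_cdf(x):
    # One admissible bounded choice A = F in Eq. (2); not assumed to be the true CDF
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def argmin_constant_quantile(y, alpha, A, grid):
    # Constant quantile model q_alpha(W_t, theta) = theta; minimize Psi_T over a grid
    psi = [phi(y, q, alpha, A).mean() for q in grid]
    return grid[int(np.argmin(psi))]

rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=501)      # heavy-tailed iid sample
alpha = 0.3
grid = np.sort(y)                        # a minimizer is always an order statistic
q_kb = argmin_constant_quantile(y, alpha, lambda v: v, grid)   # A(y) = y
q_f = argmin_constant_quantile(y, alpha, normal_cdf, grid)     # A(y) = F(y)
k = math.ceil(alpha * y.size)            # position of the sample alpha-quantile
print(q_kb == q_f == grid[k - 1])
```

Because a strictly increasing $A$ preserves the ordering of $Y_t$ and $q_t$, the two objectives take different values but share the same minimizer; only the bounded version avoids any moment requirement on $Y_t$.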
We derive the asymptotic distribution of $\theta_T$ in the next section.

3.2. Asymptotic Distribution. We start by imposing the following assumptions, in addition to (A0)-(A2):

(A4) for every $t$, $1\le t\le T$, $T\ge 1$, the functions $A(\cdot,\xi_t): \mathbb{R}\to\mathbb{R}$ in Theorem 1 have bounded first and second derivatives, i.e. there exist constants $K > 0$ and $L > 0$ such that $0 < \partial A(q_t,\xi_t)/\partial q_t \le K$ and $|\partial^2 A(q_t,\xi_t)/\partial q_t^2| \le L$, a.s.-$P$ on $Q\times E_t$;
(A5) $\theta^0$ is an interior point of $\Theta$;
(A6) the sequence $\{(Y_t, W_t')'\}$ is $\alpha$-mixing with $\alpha$ of size $-r/(r-2)$, with $r > 2$;
(A7) for some $\varepsilon > 0$: (i) $\sup_{1\le t\le T,\,T\ge 1} E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|^{2(r+\varepsilon)}] < \infty$ and $\sup_{1\le t\le T,\,T\ge 1} E[\sup_{\theta\in\Theta}|\Delta_{\theta\theta} q_\alpha(W_t,\theta)|^{r+\varepsilon}] < \infty$; (ii) $\sup_{1\le t\le T,\,T\ge 1} E[\sup_{\theta\in\Theta}|A(q_\alpha(W_t,\theta),\xi_t)|^{r+\varepsilon}] < \infty$ and $\sup_{1\le t\le T,\,T\ge 1} E[|A(Y_t,\xi_t)|^{r+\varepsilon}] < \infty$.

The above assumptions provide a set of sufficient conditions for the asymptotic normality of $\theta_T$ that are primitive, unlike the ones for consistency in (A3). In addition to (A1) and (A2), we now require the functions $A(\cdot,\xi_t)$ to have bounded first and second derivatives (A4). The boundedness property is used to show that the functions $\varphi(Y_t, q_\alpha(W_t,\cdot), \xi_t)$ are Lipschitz-$L_1$ on $\Theta$ a.s.-$P$. This implies that any pointwise convergence in $\theta$ becomes uniform on $\Theta$. Note that we can obtain a similar implication by an alternative argument if the objective functions $\varphi(Y_t, q_\alpha(W_t,\cdot), \xi_t)$ are convex in the parameter $\theta$. This elegant convexity approach has, for example, been used by Pollard (1991), Hjort and Pollard (1993) and Knight (1998) to derive the asymptotic normality of Koenker and Bassett's (1978) standard quantile regression estimator. In the case of this estimator, the functions $A(\cdot,\xi_t)$ are linear and hence the functions $\varphi(Y_t, q_\alpha(W_t,\cdot), \xi_t)$ are convex in $\theta$, no matter which conditional quantile model $q_\alpha$ in (A0) we choose.[11] Unfortunately, convexity in $\theta$ of the objective functions $\varphi(Y_t, q_\alpha(W_t,\cdot), \xi_t)$ does not hold for general (nonlinear) $A(\cdot,\xi_t)$, such as the ones proposed in Equation (3).
Therefore, we cannot rely on the convexity argument in our asymptotic normality proof. We are forced to abide by the classical approach which, though generally applicable, has the disadvantage of being more complicated and requires stronger regularity conditions, such as the ones in (A4). Our assumptions on the heterogeneity and dependence structure of the data are, on the other hand, fairly weak. We allow the sequence $\{(Y_t, W_t')'\}$ to be nonstationary, and our strong mixing (i.e. $\alpha$-mixing) assumption in (A6) allows for a wide variety of dependence structures (White, 2001). Assumption (A6) is further accompanied by a series of moment conditions in (A7) which guarantee that the appropriate law of large numbers and central limit theorem can be applied. In the special case corresponding to Koenker and Bassett's (1978) quantile regression estimator for linear models $q_\alpha(W_t,\theta) = \theta' W_t$, the set of moment conditions (A7) reduces to $\sup_{1\le t\le T,\,T\ge 1} E[|W_t|^{2(r+\varepsilon)}] < \infty$ and $\sup_{1\le t\le T,\,T\ge 1} E[|Y_t|^{r+\varepsilon}] < \infty$.

The asymptotic distribution of $\theta_T$ is given in the following theorem.

Theorem 2 (Asymptotic Distribution). Under (A0)-(A2) and (A4)-(A7), we have $(\Sigma^0_T)^{-1/2}\Delta^0_T\sqrt{T}(\theta_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, where
$$\Delta^0_T \equiv T^{-1}\sum_{t=1}^T E[a(q_\alpha(W_t,\theta^0),\xi_t)\, f^0_t(q_\alpha(W_t,\theta^0))\, \nabla_\theta q_\alpha(W_t,\theta^0)\nabla_\theta q_\alpha(W_t,\theta^0)'],$$
$$\Sigma^0_T \equiv T^{-1}\sum_{t=1}^T \alpha(1-\alpha) E[(a(q_\alpha(W_t,\theta^0),\xi_t))^2\, \nabla_\theta q_\alpha(W_t,\theta^0)\nabla_\theta q_\alpha(W_t,\theta^0)'],$$
and $a(q_t,\xi_t) \equiv \partial A(q_t,\xi_t)/\partial q_t$ a.s.-$P$ on $Q\times E_t$.

[11] Recall that $\varphi(Y_t, q_\alpha(W_t,\cdot), \xi_t)$ is convex in a neighborhood of $\theta^0$ if and only if the real function $s \mapsto [\varphi(Y_t, q_\alpha(W_t, \theta^0 + \nu s), \xi_t) - \varphi(Y_t, q_\alpha(W_t,\theta^0), \xi_t)]/s$ is increasing in $s\in\mathbb{R}$ ($\nu\in\mathbb{R}^k$). This condition holds for any model $q_\alpha$ in (A0) only if the functions $A(\cdot,\xi_t)$ have zero convexity, i.e. are linear.
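The sandwich form in Theorem 2 is easy to evaluate once the expectations are replaced by averages over $t$. The sketch below is illustrative only and assumes a hypothetical linear model $q_\alpha(W_t,\theta) = \theta' W_t$ with $W_t = (1, x_t)'$, Gaussian errors, and a made-up conditional scale $1 + |x_t|$; the function name `sandwich_covariance` is not from the text. It compares Koenker and Bassett's choice $a \equiv 1$ with the choice $a = f^0_t$ (Equation (3) with $f_t = f^0_t$) and checks that the difference of the two covariance matrices is positive semidefinite, in line with Theorem 3.

```python
import numpy as np
from statistics import NormalDist

def sandwich_covariance(a, f0, grad, alpha):
    """Delta^{-1} Sigma Delta^{-1} of Theorem 2, with E[.] replaced by averages over t.

    a    : (T,) slopes a(q_alpha(W_t, theta0), xi_t) = dA/dq at the true quantile
    f0   : (T,) true conditional densities f_t^0 evaluated at the true quantile
    grad : (T, k) gradients of q_alpha(W_t, theta) with respect to theta at theta0
    """
    outer = np.einsum("ti,tj->tij", grad, grad)                      # (T, k, k)
    delta = np.mean((a * f0)[:, None, None] * outer, axis=0)
    sigma = alpha * (1 - alpha) * np.mean((a ** 2)[:, None, None] * outer, axis=0)
    delta_inv = np.linalg.inv(delta)
    return delta_inv @ sigma @ delta_inv

rng = np.random.default_rng(1)
T, alpha = 2000, 0.25
x = rng.standard_normal(T)
grad = np.column_stack([np.ones(T), x])   # gradient of theta'W_t is W_t = (1, x_t)'
scale = 1.0 + np.abs(x)                   # hypothetical conditional scale of the errors
nd = NormalDist()
f0 = nd.pdf(nd.inv_cdf(alpha)) / scale    # Gaussian density at its alpha-quantile, rescaled

v_kb = sandwich_covariance(np.ones(T), f0, grad, alpha)   # Koenker-Bassett: A(y) = y
v_eff = sandwich_covariance(f0, f0, grad, alpha)          # Eq. (3) with f_t = f_t^0
outer = np.einsum("ti,tj->tij", grad, grad)
bound = alpha * (1 - alpha) * np.linalg.inv(np.mean((f0 ** 2)[:, None, None] * outer, axis=0))
print(np.linalg.eigvalsh(v_kb - v_eff).min() >= -1e-8, np.allclose(v_eff, bound))
```

With $a = f^0_t$ the sandwich collapses to $\alpha(1-\alpha)\{T^{-1}\sum_t E[(f^0_t)^2\nabla_\theta q_\alpha\nabla_\theta q_\alpha']\}^{-1}$, which is the matrix $V^0_T$ of Theorem 3.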
In particular, the M-estimator $\theta^F_T$ proposed in Equation (3) satisfies the conditions of Theorem 2, provided the conditional probability densities $f_t(\cdot)$ are differentiable a.s.-$P$ on $\mathbb{R}$ with bounded first derivatives, so that $|f_t'(y)| \le L$ a.s.-$P$ on $\mathbb{R}$. Moreover, the moment conditions in (A7) are less stringent for $\theta^F_T$ than for Koenker and Bassett's (1978) estimator: if the conditional quantile model is linear, for example, they reduce to $E[|W_t|^{2(r+\varepsilon)}] < \infty$. The moment conditions imposed on $Y_t$ disappear in the case of $\theta^F_T$ simply because any conditional distribution function $F_t(\cdot)$ is bounded between 0 and 1, so that we always have $E[\sup_{\theta\in\Theta}|F_t(q_\alpha(W_t,\theta))|^{r+\varepsilon}] \le 1$ and $E[|F_t(Y_t)|^{r+\varepsilon}] \le 1$, and (A7)(ii) is automatically satisfied. This difference is of particular importance in applications in which we have reason to believe that moments of $Y_t$ of order higher than 2 do not exist. In such applications, the asymptotic properties of Koenker and Bassett's (1978) estimator are unclear. On the other hand, $\theta^F_T$ still converges in distribution at the usual $\sqrt{T}$ rate.

Using the results of Theorem 2, the asymptotic distribution of $\theta^F_T$ is $(\Sigma^{0,F}_T)^{-1/2}\Delta^{0,F}_T\sqrt{T}(\theta^F_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, with
$$\Delta^{0,F}_T \equiv T^{-1}\sum_{t=1}^T E[f_t(q_\alpha(W_t,\theta^0))\, f^0_t(q_\alpha(W_t,\theta^0))\, \nabla_\theta q_\alpha(W_t,\theta^0)\nabla_\theta q_\alpha(W_t,\theta^0)']$$
and
$$\Sigma^{0,F}_T \equiv T^{-1}\sum_{t=1}^T \alpha(1-\alpha) E[(f_t(q_\alpha(W_t,\theta^0)))^2\, \nabla_\theta q_\alpha(W_t,\theta^0)\nabla_\theta q_\alpha(W_t,\theta^0)'].$$

Clearly, changing the distribution function $F_t(\cdot)$ in Equation (2), and hence in Equation (3), affects the asymptotic covariance matrix of the corresponding M-estimator $\theta^F_T$ through the density term $f_t(\cdot)$ appearing in the expressions of $\Delta^{0,F}_T$ and $\Sigma^{0,F}_T$. In particular, this result suggests that appropriate choices of $F_t(\cdot)$ in Equation (3) lead to efficiency improvements over Koenker and Bassett's (1978) conditional quantile estimator.
Specifically, when the values of $f_t(\cdot)$ and of the true conditional density $f^0_t(\cdot)$ coincide at the true quantile $q_\alpha(W_t,\theta^0)$, we have $\Sigma^{0,F}_T(\Delta^{0,F}_T)^{-1} = \alpha(1-\alpha)\,\mathrm{Id}$. In other words, this particular choice of $f_t(\cdot)$ seems to lead to a conditional quantile M-estimator with the minimum asymptotic covariance matrix. In the next section we make this heuristic argument more rigorous by exploring the questions of minimum variance and efficient estimation in more detail.

4. Semiparametric Efficiency Bound

Our first step in discussing the asymptotic efficiency of conditional quantile estimators is to rank all the consistent and asymptotically normal estimators constructed in the previous section by their asymptotic variances. Note that this ranking is useful, as we do not allow M-estimators to be superefficient, i.e. to have asymptotic variances which, for some true parameter value, are smaller than that of the maximum likelihood estimator. Superefficiency is ruled out by our continuity assumptions on $f^0_t(\cdot)$ and $q_\alpha(W_t,\cdot)$ in (A0)(ii) and on $a(\cdot,\xi_t)$ in Theorem 1. Typically, the asymptotic distribution of superefficient estimators is discontinuous in the true parameters, and our continuity assumptions rule out this discontinuity.

Theorem 3 (Minimum Asymptotic Variance). Assume that (A0)-(A2) and (A4)-(A7) hold. Then the set of matrices $(\Delta^0_T)^{-1}\Sigma^0_T(\Delta^0_T)^{-1}$ has a minimum $V^0_T$ given by
$$V^0_T \equiv \alpha(1-\alpha)\Big\{T^{-1}\sum_{t=1}^T E[(f^0_t(q_\alpha(W_t,\theta^0)))^2\, \nabla_\theta q_\alpha(W_t,\theta^0)\nabla_\theta q_\alpha(W_t,\theta^0)']\Big\}^{-1}.$$
Moreover, an M-estimator $\theta^*_T$ of the parameter $\theta^0$ obtained by minimizing $\Psi^*_T(\theta) \equiv T^{-1}\sum_{t=1}^T \varphi(Y_t, q_\alpha(W_t,\theta), \xi^*_t)$ attains $V^0_T$, i.e. $(V^0_T)^{-1/2}\sqrt{T}(\theta^*_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, if and only if $\varphi(Y_t, q_t, \xi^*_t) = [\alpha - \mathbb{1}(q_t - Y_t)][F^0_t(Y_t) - F^0_t(q_t)]$ a.s.-$P$ on $\mathbb{R}\times Q\times E_t$, for every $t$, $1\le t\le T$, $T\ge 1$.

Theorem 3 shows two important results. Firstly, the matrix $V^0_T$ is the minimum of the asymptotic variances of all the consistent and asymptotically normal M-estimators of $\theta^0$ that satisfy (A2).
In other words, for any $\xi_t$ and $A(\cdot,\xi_t)$ in Theorem 1, the difference between the corresponding asymptotic covariance matrix $(\Delta^0_T)^{-1}\Sigma^0_T(\Delta^0_T)^{-1}$ and $V^0_T$ is always positive semidefinite. Secondly, there exists a unique M-estimator $\theta^*_T$ whose asymptotic covariance matrix equals $V^0_T$. This estimator is obtained by minimizing the objective function $\Psi^*_T(\theta) = T^{-1}\sum_{t=1}^T \varphi(Y_t, q_\alpha(W_t,\theta), \xi^*_t)$, in which
$$\varphi(Y_t, q_\alpha(W_t,\theta), \xi^*_t) = [\alpha - \mathbb{1}(q_\alpha(W_t,\theta) - Y_t)][F^0_t(Y_t) - F^0_t(q_\alpha(W_t,\theta))], \qquad (4)$$
a.s.-$P$, for every $t$, $1\le t\le T$, $T\ge 1$. In particular, the shape $\xi^*_t$ of the optimal objective function in Equation (4) is that of the true conditional distribution $F^0_t(\cdot)$, which is stochastic and $W_t$-measurable, as required by Assumption (A0)(i). Even though our estimator $\theta^*_T$ satisfies all the assumptions in (A2), its computation is not feasible in practice: in order to construct $\theta^*_T$ we would need to know the true conditional distribution $F^0_t(\cdot)$, whose inverse, the conditional $\alpha$-quantile, is the very object that we are trying to estimate. We come back to this important feasibility issue in Section 5. What Theorem 3 does not show is whether $V^0_T$ is also the semiparametric efficiency bound for $\theta^0$, in addition to being the minimum of the set of asymptotic covariance matrices of consistent and asymptotically normal M-estimators.

4.1. Stein's (1956) approach: an example. In order to show that $V^0_T$ in Theorem 3 is the semiparametric efficiency bound in time series models satisfying the conditional quantile restriction (A1), we follow the ingenious approach of Stein (1956). Stein's (1956) original concern was the possibility of estimating the true parameter adaptively: can we estimate the parameter $\theta^0$ in the conditional quantile restriction (A1) as precisely as if we knew the set of true conditional densities $f^0 \equiv \{f^0_t(\cdot), 1\le t\le T, T\ge 1\}$ up to some finite dimensional parameter?
If the set of true conditional densities $f^0 \equiv \{f^0_t(\cdot), 1\le t\le T, T\ge 1\}$ in the conditional quantile restriction (A1) were known up to a finite dimensional parameter, then we could easily construct an estimate of $\theta^0$ whose asymptotic covariance matrix attains the classical Cramér-Rao bound. As an illustration, consider the following conditionally heteroskedastic (CH) model with linear heteroskedasticity:
$$Y_t = \beta^{0\prime} V_t + (1 + |\gamma^{0\prime} R_t|)\,U_t, \qquad (5)$$
where $W_t \equiv (V_t', R_t')'$, the process $\{(Y_t, W_t')'\}$ is $\alpha$-mixing, the error sequence $\{U_t\}$ is independent of $\{W_t\}$ and iid with some absolutely continuous distribution function $H_0(\cdot)$ (continuous density $h_0(\cdot)$) such that $E(U_t) = 0$ and $E(U_t^2) = 1$, and where $\beta^0$ and $\gamma^0$ denote the true values of the parameters $\beta\in B\subseteq\mathbb{R}^b$ and $\gamma\in\Gamma\subseteq\mathbb{R}^c$. Letting $V_t \equiv (1, Y_{t-1})'$ and $R_t \equiv U_{t-1}$, the above equation reduces to a well-known AR(1)-ARCH model, for example (Koenker and Zhao, 1996).[12]

4.1.1. Case 1: no nuisance parameter. Assume that the distribution function $H_0(\cdot)$ is known. In financial applications $h_0(\cdot)$ is typically chosen to be a standardized Gaussian or Student-t density. The conditional density of $Y_t$ in the CH model (5) then equals $f^0_t(y) = (1 + |\gamma^{0\prime}R_t|)^{-1} h_0([1 + |\gamma^{0\prime}R_t|]^{-1}[y - \beta^{0\prime}V_t])$, and its conditional $\alpha$-quantile is given by $\beta^{0\prime}V_t + H_0^{-1}(\alpha)(1 + |\gamma^{0\prime}R_t|)$. Here, the parameter of interest is $\theta \equiv (\beta', \gamma')' \in \Theta \equiv B\times\Gamma$, $\Theta\subseteq\mathbb{R}^k$ with $k \equiv b + c$. Note that $\theta$ is the only unknown parameter of the conditional density $f^0_t(\cdot)$. Hence, we are in the case where the true conditional density is known up to a finite dimensional parameter. The true value $\theta^0 \equiv (\beta^{0\prime}, \gamma^{0\prime})'$ of $\theta$ can be estimated by using a maximum likelihood approach.
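To make Case 1 concrete, the following sketch simulates the AR(1)-ARCH special case of model (5), with $V_t = (1, Y_{t-1})'$ and $R_t = U_{t-1}$, and evaluates the conditional quantile $\beta^{0\prime}V_t + H_0^{-1}(\alpha)(1 + |\gamma^{0\prime}R_t|)$. As assumptions not taken from the text, $H_0$ is the standard Gaussian distribution and the parameter values are arbitrary; the helper names are hypothetical. The fraction of observations falling below the computed quantile should be close to $\alpha$.

```python
import numpy as np
from statistics import NormalDist

def simulate_ar1_arch(T, beta, gamma, rng):
    """Y_t = beta0 + beta1 * Y_{t-1} + (1 + |gamma * U_{t-1}|) U_t, i.e. model (5)
    with V_t = (1, Y_{t-1})', R_t = U_{t-1} and standard Gaussian U_t (an assumption)."""
    u = rng.standard_normal(T + 1)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = beta[0] + beta[1] * y[t - 1] + (1.0 + abs(gamma * u[t - 1])) * u[t]
    return y, u

def conditional_quantile(y, u, beta, gamma, alpha):
    """q_alpha(W_t, theta0) = beta0'V_t + H0^{-1}(alpha) (1 + |gamma0 R_t|), t = 1..T."""
    z = NormalDist().inv_cdf(alpha)   # H0^{-1}(alpha) for a standard Gaussian H0
    return beta[0] + beta[1] * y[:-1] + z * (1.0 + np.abs(gamma * u[:-1]))

rng = np.random.default_rng(2)
beta, gamma, alpha = (0.1, 0.5), 0.3, 0.1
y, u = simulate_ar1_arch(100_000, beta, gamma, rng)
q = conditional_quantile(y, u, beta, gamma, alpha)
coverage = np.mean(y[1:] <= q)        # empirical share of Y_t below its alpha-quantile
print(round(coverage, 3))
```

Here the event $Y_t \le q_\alpha(W_t,\theta^0)$ is the same as $U_t \le H_0^{-1}(\alpha)$, which is why the coverage does not depend on the dynamics of $Y_t$.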
Under standard regularity conditions (Bickel, 1982; Newey, 2004), the maximum likelihood estimator (MLE) $\tilde\theta_T$ of $\theta^0$ is known to be efficient: $(I^0_T)^{1/2}\sqrt{T}(\tilde\theta_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, where $I^0_T$ is the Fisher information matrix, $I^0_T \equiv T^{-1}\sum_{t=1}^T E[(\nabla_\theta \ln f^0_t(Y_t))(\nabla_\theta \ln f^0_t(Y_t))']$, in which the gradient is evaluated at $\theta^0$.[13]

4.1.2. Case 2: finite dimensional nuisance parameter. In many interesting situations, the true density $h_0(\cdot)$ of $U_t$ in the CH model (5) is not entirely known, and this uncertainty adversely affects the precision of the M-estimates of $\theta^0$. A familiar case is the one where the error $U_t$ belongs to some parametric family of distributions indexed by a finite dimensional parameter $\tau$. For example, instead of being standardized Gaussian, we can assume $H_0(\cdot)$ to be a standardized Asymmetric Power Distribution (APD) with unknown exponent and asymmetry parameters (Komunjer, 2005a). In other words, the true distribution function of $U_t$ is of the form $H_0(\cdot, \tau^0)$, where $\tau^0 \in \Upsilon \subset \mathbb{R}^+_* \times (0,1)$ is the unknown parameter of the APD family. Here, the true set of conditional densities $f^0$ belongs to the parametric family $\mathcal{P} \equiv \{f(\eta), \eta\in\Pi\}$ with $f(\eta) \equiv \{f_t(\cdot,\eta): \mathbb{R}\to\mathbb{R}^+_*, 1\le t\le T, T\ge 1\}$, indexed by a finite dimensional parameter $\eta\in\Pi$, $\Pi\subseteq\mathbb{R}^p$: $\eta \equiv (\beta', \gamma', \tau')' \in \Pi \equiv B\times\Gamma\times\Upsilon$ and $p \equiv b + c + 2$. The members $f(\eta)$ of $\mathcal{P}$ are such that $f_t(y,\eta) = (1 + |\gamma' R_t|)^{-1} h_0([1 + |\gamma' R_t|]^{-1}[y - \beta' V_t], \tau)$ for all $t$, $1\le t\le T$, $T\ge 1$, and the conditional quantile parameter $\theta$ is now given by $\theta \equiv (\beta', \gamma', q)' \in \Theta \equiv B\times\Gamma\times Q$, $\Theta\subseteq\mathbb{R}^k$ with $k \equiv b + c + 1$.[14] In this interesting situation, the parameter of interest $\theta$ has a lower dimensionality than $\eta$: $\dim\theta = k$ and $\dim\eta = p = k + 1$. We write $\theta = \theta(\eta)$, with $\theta: \Pi\to\Theta$ being some continuously differentiable function, and interpret the rest of $\eta$ as a nuisance parameter (Stein, 1956; Bickel, 1982).

Similar to the previous case, we assume that the above parametric model $f(\eta)$ is regular (Bickel, 1982; Newey, 2004), that all the conditional densities $f_t(\cdot,\eta)$ satisfy the conditional quantile restriction (A1) and are continuously differentiable on $\mathbb{R}$ for each $\eta\in\Pi$, and that $f_t(Y_t,\cdot)$ is continuously differentiable on $\Pi$ a.s.-$P$. Let $\eta^0$ index the true set of conditional densities of $Y_t$, i.e. $f(\eta^0) = f^0$, so that the true value of interest $\theta^0$ is now written as $\theta^0 = \theta(\eta^0)$, where $\eta^0 \equiv (\beta^{0\prime}, \gamma^{0\prime}, \tau^{0\prime})'$. Also, let $I_T(\eta)$ denote the Fisher information matrix of the parametric model $\mathcal{P}$, $I_T(\eta) \equiv T^{-1}\sum_{t=1}^T E[(\nabla_\eta \ln f_t(Y_t,\eta))(\nabla_\eta \ln f_t(Y_t,\eta))']$. Then, an estimator $\tilde\theta_T$ of $\theta^0$ is efficient if and only if $(C^0_T)^{-1/2}\sqrt{T}(\tilde\theta_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, with $C^0_T \equiv \nabla_\eta\theta(\eta^0)(I_T(\eta^0))^+\nabla_\eta\theta(\eta^0)'$. In the special case where the sequence $\{(Y_t, W_t')'\}$ is iid, several authors have derived necessary and sufficient conditions for the MLE to be efficient (see, e.g., Conditions S and S* in Stein, 1956; Bickel, 1982; Manski, 1984); these are typically expressed as orthogonality conditions on the gradient of the log-likelihood $\nabla_\eta \ln f_t(Y_t, \eta^0)$.

4.1.3. Case 3: infinite dimensional nuisance parameter. Now consider the more realistic situation in which the true density of $U_t$ in Equation (5) is entirely unknown. Instead, $f^0$ is only known to belong to a class $\mathcal{S}$ which contains all parametric families such as $\mathcal{P}$. Unlike in $\mathcal{P}$, the sets of densities in $\mathcal{S}$ are indexed by an additional infinite dimensional parameter.

[12] In that case we moreover assume that the parameter spaces $B$ and $\Gamma$ are such that the standard stationarity and invertibility conditions hold.

[13] Following Bickel (1982) and Newey (2004), the regularity conditions imposed are: $[f^0(\cdot)]^{1/2}$ is mean square differentiable with respect to $\theta^0$, and the Fisher information matrix $I^0_T$ is nonsingular and continuous in $\theta^0$ on $\Theta$.
In the case of our CH model (5), this infinite dimensional parameter is the unknown probability density $h_0(\cdot)$ of the error term $U_t$. The density $h_0(\cdot)$ could be, for example, Gaussian, Student-t, Gamma or any other probability density in a set $\mathcal{H}$, the set of all families $h$ of probability densities which are parametrized by $\tau$ and satisfy some appropriate conditions, such as being standardized. The set $\mathcal{S}$ is the union of all parametric subfamilies $\mathcal{P}_h \equiv \{f_h(\eta), \eta\in\Pi\}$ obtained when $h$ varies across $\mathcal{H}$. For any given $h\in\mathcal{H}$, the parametric submodel $f_h(\eta)$ is defined as $f_h(\eta) \equiv \{f_{ht}(\cdot,\eta): \mathbb{R}\to\mathbb{R}^+_*, 1\le t\le T, T\ge 1\}$ and is assumed to satisfy standard regularity conditions (Bickel, 1982; Newey, 2004). We let $I_{hT}(\eta) \equiv T^{-1}\sum_{t=1}^T E[(\nabla_\eta \ln f_{ht}(Y_t,\eta))(\nabla_\eta \ln f_{ht}(Y_t,\eta))']$ be the Fisher information matrix of the parametric submodel $\mathcal{P}_h$. In particular, the matrix $I_{hT}(\eta^0)$, in which $f_h(\eta^0) = f^0$, is such that $C^0_{hT} \equiv \nabla_\eta\theta(\eta^0)(I_{hT}(\eta^0))^+\nabla_\eta\theta(\eta^0)'$ is nonsingular. In addition, we assume that for any $\eta\in\Pi$ and $h\in\mathcal{H}$, the conditional densities $f_{ht}(\cdot,\eta)$ satisfy the conditional quantile restriction (A1) and are continuously differentiable on $\mathbb{R}$, and that for any $h\in\mathcal{H}$, $f_{ht}(Y_t,\cdot)$ is continuously differentiable a.s.-$P$ on $\Pi$. Then, the semiparametric efficiency bound for the conditional quantile parameter $\theta^0$ is defined as the supremum of $C^0_{hT}$ over those $h$. If such a bound is attained by a particular family $h^*$, then $\mathcal{P}^* \equiv \mathcal{P}_{h^*}$ is called the least favorable parametric submodel.

4.2. Least favorable parametric submodel. Following Stein's (1956) ingenious definition, $V^0_T$ in Theorem 3 is the semiparametric efficiency bound if and only if there exists a parametric submodel $\mathcal{P}_{h^*}$ in which the MLE $\tilde\theta^*_T$ of the true parameter $\theta^0$ has the same asymptotic covariance matrix $V^0_T$.

[14] The set $Q$ corresponds to the range of $\alpha$-quantiles of $U_t$ when the parameter $\tau$ of its distribution function $H_0(\cdot,\tau)$ varies in $\Upsilon$.
The following theorem exhibits the least favorable parametric submodel which satisfies the conditional quantile restriction (A1).

Theorem 4 (Least Favorable Parametric Submodel). Given $M$ and the set of true conditional densities $f^0 \equiv \{f^0_t, 1\le t\le T, T\ge 1\}$, consider the parametric submodel $\mathcal{P}^* \equiv \{f^*(\theta), \theta\in\Theta\}$ parametrized by the conditional quantile parameter $\theta$, in which $f^*(\theta) \equiv \{f^*_t(\cdot,\theta): \mathbb{R}\to\mathbb{R}^+_*, 1\le t\le T, T\ge 1\}$ with
$$f^*_t(y,\theta) \equiv \frac{f^0_t(y)\,\alpha(1-\alpha)\lambda(\theta)\exp\{\lambda(\theta)[F^0_t(y) - F^0_t(q_\alpha(W_t,\theta))][\mathbb{1}(q_\alpha(W_t,\theta) - y) - \alpha]\}}{1 - \exp\{\lambda(\theta)[1 - F^0_t(q_\alpha(W_t,\theta)) - \mathbb{1}(q_\alpha(W_t,\theta) - y)][\mathbb{1}(q_\alpha(W_t,\theta) - y) - \alpha]\}} \qquad (6)$$
for all $y\in\mathbb{R}$, where $\lambda(\theta) \equiv \Lambda(\theta - \theta^0)$ and $\Lambda: \mathbb{R}^k\to\mathbb{R}$ is at least twice continuously differentiable on $\mathbb{R}^k$ with $\Lambda(\cdot) > 0$ on $\mathbb{R}^k\setminus\{0\}$, $\Lambda(0) = 0$, $\nabla_\theta\Lambda(0) = 0$, $\Delta_{\theta\theta}\Lambda(0)$ nonsingular and $|\Delta_{\theta\theta}\Lambda(\cdot)| < \infty$ in a neighborhood of 0.[15] Then, under (A0)(ii) and (A1), $\mathcal{P}^*$ is a parametric submodel in $\mathcal{S}$, i.e.: (i) for any $t$, $1\le t\le T$, $T\ge 1$, $f^*_t(\cdot,\theta)$ is a probability density for all $\theta\in\Theta$; (ii) for any $t$, $1\le t\le T$, $T\ge 1$, $f^*_t(\cdot,\theta)$ satisfies the conditional quantile restriction $E_\theta[\mathbb{1}(q_\alpha(W_t,\theta) - Y_t) - \alpha \mid W_t] = 0$ a.s.-$P$ for all $\theta\in\Theta$, where $E_\theta(\cdot\mid W_t)$ denotes the conditional expectation under the density $f^*_t(\cdot,\theta)$ for $Y_t$ given $W_t$; (iii) $f^0\in\mathcal{P}^*$. Moreover, under (A0)-(A1) and (A5)-(A7)(i), $\mathcal{P}^*$ is the least favorable submodel in $\mathcal{S}$, i.e. the asymptotic distribution of the MLE $\tilde\theta^*_T$ associated with $\mathcal{P}^*$ is $(V^0_T)^{-1/2}\sqrt{T}(\tilde\theta^*_T - \theta^0) \xrightarrow{d} N(0, \mathrm{Id})$, where $V^0_T$ is as defined in Theorem 3.

Because $\mathcal{P}^*$ is a parametric submodel of the set $\mathcal{S}$ of all densities satisfying the conditional quantile restriction in (A1), the semiparametric efficiency bound for $\theta^0$ is, by Stein's (1956) definition, at least as large as the asymptotic variance of the above MLE $\tilde\theta^*_T$; Theorem 4 shows that the latter equals $V^0_T$.

[15] A simple function $\Lambda(\cdot)$ in Equation (6) which satisfies the conditions of Theorem 4 is $\Lambda(x) = x'x$.
On the other hand, in Theorem 3 we have shown that $V^0_T$ is also the minimum of the asymptotic variances of the consistent and asymptotically normal M-estimators of $\theta^0$. It follows, first, that the semiparametric efficiency bound is $V^0_T$, and, second, that the parametric model $\mathcal{P}^*$ is the least favorable parametric submodel in $\mathcal{S}$. The first result, that $V^0_T$ is the semiparametric efficiency bound, has the following interpretation: when the only thing we know about the model is that it satisfies the conditional quantile restriction (A1), we cannot estimate the true conditional quantile parameter $\theta^0$ with precision higher than that given by $V^0_T$. Note that our result uses the moment restriction (A1) only; we do not make any additional assumptions regarding the properties of the "error" term $Y_t - q_\alpha(W_t,\theta)$ (other than those contained in (A1) and (A6)). In particular, we allow $Y_t - q_\alpha(W_t,\theta)$ to be dependent and nonidentically distributed.

Perhaps the most important aspect of Theorem 4 is that it relaxes the independence assumption. As far as time series data are concerned, two leading situations in which independence is violated come to mind. The first is the CH model (5): $W_t$ contains serially dependent exogenous variables and/or lags of $Y_t$, and the residuals are uncorrelated and conditionally heteroskedastic.[16] There are some results on this case in Newey and Powell (1990), under the additional assumption that $\{(Y_t, W_t')'\}$ is iid. The authors derive the semiparametric efficiency bound for the parameters in the linear quantile regression $q_\alpha(W_t,\theta) = \theta' W_t$ by allowing for conditional heteroskedasticity (given $W_t$) in the "error" term $Y_t - \theta' W_t$. The first part of Theorem 4 generalizes Newey and Powell's (1990) results to the case where the sequence $\{(Y_t, W_t')'\}$ is weakly dependent and heterogeneous, as in (A6).
Unsurprisingly, when the data are iid and $q_\alpha$ is linear, the bound $V^0_T$ reduces to $V^0 \equiv \alpha(1-\alpha)\{E[(f^0_t(q_\alpha(W_t,\theta^0)))^2 W_t W_t']\}^{-1}$, derived by Newey and Powell (1990).[17] In the second time series situation of interest, the residuals themselves are correlated in addition to being heteroskedastic. Note that this situation is not covered by the CH model (5); however, our assumption (A1) does not exclude the possibility that $Y_t - q_\alpha(W_t,\theta)$ be correlated. So far there exist no results on the semiparametric efficiency bound which cover this dependent case. To the best of our knowledge, Theorem 4 provides the first result on attainable asymptotic efficiency for nonlinear (and possibly censored) conditional quantile models when the data are dependent.

The second result of Theorem 4, an analytic expression of the least favorable parametric submodel, is entirely new and not yet seen in the literature on efficient estimation under conditional moment restrictions. The density $f^*_t(\cdot,\theta)$ in Equation (6) is not of the "tick-exponential" form derived by Komunjer (2005b): it depends on the true density $f^0_t(\cdot)$ as well as on the true value $\theta^0$, and contains terms such as $\lambda(\theta)$. In the least favorable parametric submodel $\mathcal{P}^*$, $\theta$ parametrizes both the conditional quantile model $M$ and the shape of $f^*_t(\cdot,\theta)$; in other words, the shape of $f^*_t(\cdot,\theta)$ is now determined by $f^0_t(\cdot)$ and $\theta$ (see Figure 1 for a purely location model of a conditional median).

[Figure 1. Case $\alpha = .5$, $q_\alpha(W_t,\theta) = \theta$ and $f^0_t(y) = \exp(-2|y|)$: the true density $f^0_t(y)$ and the densities $f^*(y,\theta)$ for $\theta = 3/4$ and $\theta = 7/8$, plotted for $y$ between $-2$ and $1.5$.]

[16] In the CH model (5) we have: $Y_t - q_\alpha(W_t,\theta^0) = (1 + |\gamma^{0\prime}R_t|)[U_t - \mu^0 - \sigma^0 H_0^{-1}(\alpha)]$.

[17] This result is a special case of the result derived by Chamberlain (1987) for models with conditional moment restrictions.
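Properties (i) and (ii) of Theorem 4 can be checked numerically in the setting of Figure 1 ($\alpha = .5$, $q_\alpha(W_t,\theta) = \theta$, $f^0_t(y) = \exp(-2|y|)$, so that $\theta^0 = 0$), using $\Lambda(x) = x'x$ from footnote 15. The sketch below is an illustration, not the authors' code; it assumes the convention $\mathbb{1}(x) = 1$ for $x > 0$ and uses a plain trapezoid rule on a grid truncated at $\pm 20$. The density should integrate to one, put mass $\alpha$ below $\theta$, and jump at $y = \theta$ when $\theta \ne \theta^0$.

```python
import numpy as np

ALPHA = 0.5
THETA0 = 0.0   # true median of f0(y) = exp(-2|y|)

def f0(y):
    return np.exp(-2.0 * np.abs(y))

def F0(y):
    # CDF of f0; both branches are evaluated, so keep |y| moderate to avoid overflow
    return np.where(y < 0, 0.5 * np.exp(2.0 * y), 1.0 - 0.5 * np.exp(-2.0 * y))

def f_star(y, theta, alpha=ALPHA):
    """Least favorable density (6) with q_alpha(W_t, theta) = theta and Lambda(x) = x'x."""
    lam = (theta - THETA0) ** 2
    if lam == 0.0:
        return f0(y)                      # at theta = theta0, (6) reduces to the true density
    q = theta
    ind = (q - y > 0).astype(float)       # 1I(q - y), assumed to be 1 when q > y
    num = f0(y) * alpha * (1 - alpha) * lam * np.exp(lam * (F0(y) - F0(q)) * (ind - alpha))
    den = 1.0 - np.exp(lam * (1.0 - F0(q) - ind) * (ind - alpha))
    return num / den

def trapezoid(fx, x):
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0)

theta = 0.75
left = np.linspace(-20.0, theta - 1e-12, 200_001)    # y < theta
right = np.linspace(theta, 20.0, 200_001)            # y >= theta
mass_below = trapezoid(f_star(left, theta), left)
total = mass_below + trapezoid(f_star(right, theta), right)
jump = (f_star(np.array([theta - 1e-9]), theta)[0], f_star(np.array([theta]), theta)[0])
print(round(total, 4), round(mass_below, 4), jump)
```

Splitting the integral at $y = \theta$ keeps the trapezoid rule away from the discontinuity; the change of variable $u = F^0_t(y)$ shows that the two pieces integrate exactly to $\alpha$ and $1 - \alpha$.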
In particular, the density $f^*_t(\cdot,\theta)$ is discontinuous for all values of $\theta$ different from $\theta^0$; when $\theta = \theta^0$, the density $f^*_t(\cdot,\theta^0)$ equals the true density $f^0_t(\cdot)$, which is continuous. With the semiparametric efficiency bound $V^0_T$ in hand, we now turn to the problem of constructing a conditional quantile estimator which actually attains the bound.

5. Efficient Conditional Quantile Estimator

As already pointed out in Section 4, the shape $\xi^*_t$ of the optimal objective function $\varphi(\cdot,\cdot,\xi^*_t)$ in Equation (4) is that of the true conditional distribution $F^0_t(\cdot)$, which is unknown. Hence, the M-estimator $\theta^*_T$ is infeasible in practice. We construct our (feasible) efficient conditional quantile estimator $\hat\theta_T$ by replacing $F^0_t(\cdot)$ in Equation (4) with a nonparametric estimator $\hat F_t(\cdot)$. It remains to be shown that the estimator $\hat\theta_T$ retains the same asymptotic variance $V^0_T$. Note that $\hat\theta_T$ is constructed without using any knowledge about the true $F^0_t(\cdot)$. It will then follow that the semiparametric efficiency bound $V^0_T$ can be attained, and that the feasible estimator $\hat\theta_T$ is semiparametrically efficient.

We let $g^0_t(\cdot)$ and $\bar g^0_T(\cdot)$ be, respectively, the true density of $W_t$ and the average true density $\bar g^0_T(\cdot) \equiv T^{-1}\sum_{t=1}^T g^0_t(\cdot)$ of $\{W_1, \dots, W_T\}$, and make the following assumptions:[18]

(A8) for every $T\ge 1$, $\bar g^0_T(\cdot)$ is continuously differentiable of order $R\ge 1$ on $\mathbb{R}^m$ with $\sup_{T\ge 1}\sup_{w\in\mathbb{R}^m}|D^r \bar g^0_T(w)| < \infty$ for every $0\le |r|\le R$;
(A9) (i) for every $t$, $1\le t\le T$, $T\ge 1$, $F^0_t(\cdot) = F^0(\cdot\mid W_t)$ and $f^0_t(\cdot) = f^0(\cdot\mid W_t)$; (ii) the function $F^0(\cdot\mid\cdot): \mathbb{R}^{m+1}\to[0,1]$ is continuously differentiable of order $R+2$ with $\sup_{(y,w)\in\mathbb{R}^{m+1}}|D^r F^0(y\mid w)| < \infty$ for every $0\le |r|\le R+2$;
(A10) for some $\gamma > 0$ and any vanishing sequence $\{c_T\}$: (i) $\int_{\{w:\bar g^0_T(w) < c_T\}} \bar g^0_T(w)\,dw = o(1)$, (ii) $\int_{\{w:\bar g^0_T(w) < c_T\}} |\nabla_\theta q_\alpha(w,\theta^0)|\,\bar g^0_T(w)\,dw = O(c_T^{\gamma})$, and (iii) $\int_{\{w:\bar g^0_T(w) < c_T\}} f^0[q_\alpha(w,\theta^0)\mid w]\,|\nabla_\theta q_\alpha(w,\theta^0)|\,\bar g^0_T(w)\,dw = O(c_T^{2\gamma})$.
Assumptions (A8) and (A9)(ii) are standard smoothness assumptions on the true densities $g^0_t(\cdot)$ and $f^0_t(\cdot)$; they adapt assumptions NP2 and NP3 used in Andrews (1995) to the case where the regression function is the conditional distribution (and density) of $Y_t$. On the other hand, assumption (A9)(i) is an additional assumption we need to impose on the true distribution of $Y_t$ conditional upon $W_t$ in order to construct an estimator that attains the semiparametric efficiency bound. The content of this assumption is twofold. First, it states that no information other than that contained in $W_t$ is useful in constructing the conditional distribution (and density) of $Y_t$. Note that this is a strengthening of our assumption (A1), which says that $W_t$ contains all the relevant information for the conditional $\alpha$-quantile of $Y_t$. Second, assumption (A9)(i) implies that the distribution of $Y_t$ conditional on $W_t$ should be the same as that of $Y_s$ conditional on $W_s$, for any $s \ne t$.

[18] Recall from Section 2.1 that all the components of $W_t$ are continuous.

Assumption (A10)(i) is weak, as it is satisfied if the sequence of probability measures $\{\bar P^0_T(\cdot)\}$ associated with the average densities $\{\bar g^0_T(\cdot)\}$ is tight, which is itself implied by the tightness of $\{W_t\}$, or equivalently $W_t = O_p(1)$.[19] The latter is obviously satisfied if the $W_t$'s are identically distributed, but it also holds for dependent and heterogeneous $W_t$'s if $\{W_t\}$ is uniformly integrable, and a fortiori if $\sup_{1\le t\le T,\,T\ge 1} E[|W_t|^{1+\varepsilon}] < \infty$ for some $\varepsilon > 0$. Assumptions (A10)(ii) and (A10)(iii) are stronger and are used to ensure that the bias of $\hat\Psi_T(\theta)$ vanishes at a $\sqrt{T}$ rate. They are similar to conditions that eliminate the asymptotic bias when stochastic trimming is employed, as in Härdle and Stoker (1989) and Lavergne and Vuong (1996). They require that the tails of $\bar g^0_T(\cdot)$ vanish sufficiently fast given the tail behaviors of $|\nabla_\theta q_\alpha(\cdot,\theta^0)|$ and $f^0[q_\alpha(\cdot,\theta^0)\mid\cdot]$.
For instance, if $\sup_{w\in\mathbb{R}^m}|\nabla_\theta q_\alpha(w,\theta^0)| < \infty$ and $\sup_{(y,w)\in\mathbb{R}^{m+1}} f^0(y\mid w) < \infty$, a sufficient (but not necessary) condition for (A10) is that $\int_{\{w:\bar g^0_T(w) < c_T\}} \bar g^0_T(w)\,dw = O(c_T^{2\gamma})$, which is a condition on the vanishing rate of the tails of the average density $\bar g^0_T(\cdot)$.

The true conditional distribution $F^0(\cdot\mid\cdot)$ can be estimated by the kernel estimator $\hat F(\cdot\mid\cdot)$ defined as $\hat F(\cdot\mid w) = 0$ if $\hat g(w) = 0$, and $\hat F(y\mid w) \equiv \hat G(y,w)/\hat g(w)$ if $\hat g(w) \ne 0$, with
$$\hat G(y,w) \equiv \frac{1}{T h_{wT}^m}\sum_{s=1}^T L\Big(\frac{y - Y_s}{h_{yT}}\Big) K\Big(\frac{w - W_s}{h_{wT}}\Big), \qquad (7)$$
$$\hat g(w) \equiv \frac{1}{T h_{wT}^m}\sum_{s=1}^T K\Big(\frac{w - W_s}{h_{wT}}\Big), \qquad (8)$$
where $L(y) \equiv \int \mathbb{1}(y - u) K_0(u)\,du$, $K(\cdot)$ is a multivariate kernel, $K_0(\cdot)$ is a univariate kernel, and $h_{wT}$ and $h_{yT}$ are two nonstochastic positive bandwidths. The corresponding kernel estimator of the true conditional density $f^0(\cdot\mid\cdot)$ is given by $\partial\hat F(\cdot\mid\cdot)/\partial y$, while $\hat g(\cdot)$ can be viewed as a kernel estimator of the average true density $\bar g^0_T(\cdot)$. In order to eliminate aberrant behavior of kernel estimators of the conditional distribution (density) of $Y_t$ in regions where the densities of $\{W_t\}$ are small, we define $\hat F_t(\cdot) \equiv d_t \hat F(\cdot\mid W_t)$, where $d_t \equiv \mathbb{1}(\hat g(W_t) - b_T)$ effectively deletes (trims out) observations for which $\hat g(W_t) < b_T$, with $\{b_T\}$ a sequence of positive constants. That is, $\hat F_t(\cdot)$ is a trimmed nonparametric estimator of the true conditional distribution $F^0_t(\cdot)$, which we now use to construct our (feasible) estimator $\hat\theta_T$: namely, $\hat\theta_T$ is obtained by minimizing the objective function $\hat\Psi_T(\theta)$ given in Equation (9).

[19] By definition (Billingsley, 1995), the tightness of $\{\bar P^0_T(\cdot)\}$ means that for every $\varepsilon\in(0,1)$ there exists $M_\varepsilon < \infty$ such that $\inf_{T\ge 1}\bar P^0_T([-M_\varepsilon, M_\varepsilon]^m) > 1 - \varepsilon$. Now,
$$\int_{\{w:\bar g^0_T(w) < c_T\}} \bar g^0_T(w)\,dw = \int_{\{w\in[-M_\varepsilon,M_\varepsilon]^m:\,\bar g^0_T(w) < c_T\}} \bar g^0_T(w)\,dw + \int_{\{w\notin[-M_\varepsilon,M_\varepsilon]^m:\,\bar g^0_T(w) < c_T\}} \bar g^0_T(w)\,dw \le c_T(2M_\varepsilon)^m + \bar P^0_T(\mathbb{R}^m\setminus[-M_\varepsilon,M_\varepsilon]^m) < c_T(2M_\varepsilon)^m + \varepsilon,$$
showing that (A10)(i) holds, as $c_T = o(1)$ and $\varepsilon$ is arbitrary.
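A minimal sketch of the trimmed estimator $\hat F_t(\cdot) = d_t\hat F(\cdot\mid W_t)$ built from Equations (7)-(8), for scalar $W_t$ ($m = 1$). As assumptions not taken from the text, it uses Gaussian choices for both $K$ and $K_0$ (so $L$ is the standard normal CDF), fixed bandwidths $h_{wT} = h_{yT} = 0.3$, a trimming constant $b_T = 0.05$, and a simulated design with $F^0(y\mid w) = \Phi(y - w)$; the function names are hypothetical. A Gaussian kernel meets the moment conditions of (A11) only for $R \le 2$; higher-order kernels can take negative values, in which case $\hat F$ need not stay in $[0,1]$.

```python
import math
import numpy as np

def gauss_kernel(x):
    # K(.) for m = 1: standard normal density (a second-order kernel)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def gauss_cdf(x):
    # L(y) = integral of 1I(y - u) K0(u) du, with K0 the standard normal density
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def trimmed_conditional_cdf(y_eval, Y, W, h_w, h_y, b_T):
    """Return d_t * F_hat(y_eval[t] | W_t) from Eqs. (7)-(8) and the trimming flags d_t."""
    T = len(Y)
    Kw = gauss_kernel((W[:, None] - W[None, :]) / h_w)    # K((W_t - W_s)/h_w), shape (T, T)
    g_hat = Kw.sum(axis=1) / (T * h_w)                      # Eq. (8) at w = W_t
    Ly = gauss_cdf((y_eval[:, None] - Y[None, :]) / h_y)    # L((y - Y_s)/h_y)
    G_hat = (Ly * Kw).sum(axis=1) / (T * h_w)               # Eq. (7) at (y_eval[t], W_t)
    F_hat = np.where(g_hat > 0, G_hat / np.where(g_hat > 0, g_hat, 1.0), 0.0)
    d = (g_hat > b_T).astype(float)                         # d_t = 1I(g_hat(W_t) - b_T)
    return d * F_hat, d

rng = np.random.default_rng(3)
T = 800
W = rng.standard_normal(T)
Y = W + rng.standard_normal(T)          # hypothetical design: F0(y | w) = Phi(y - w)
F_mid, d = trimmed_conditional_cdf(W, Y, W, h_w=0.3, h_y=0.3, b_T=0.05)
F_up, _ = trimmed_conditional_cdf(W + 1.0, Y, W, h_w=0.3, h_y=0.3, b_T=0.05)
kept = d == 1.0
print(kept.mean(), F_mid[kept].mean(), F_up[kept].mean())
```

In this design the true values are $F^0(W_t\mid W_t) = 0.5$ and $F^0(W_t + 1\mid W_t) = \Phi(1) \approx 0.84$; the smoothing in both $y$ and $w$ pulls the second estimate slightly toward $0.5$ for fixed bandwidths.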
The objective function $\hat\Psi_T(\theta)$ is
$$\hat\Psi_T(\theta) \equiv T^{-1} \sum_{t=1}^T \varphi(Y_t, q_\alpha(W_t, \theta), \hat\xi_t), \qquad (9)$$
in which $\varphi(Y_t, q_\alpha(W_t, \theta), \hat\xi_t) \equiv [\alpha - \mathbb{1}(q_\alpha(W_t, \theta) - Y_t)][\hat F_t(Y_t) - \hat F_t(q_\alpha(W_t, \theta))]$ for every $t$, $1 \le t \le T$, $T \ge 1$. In other words, our (feasible) estimator $\hat\theta_T$ minimizes a modified version $\hat\Psi_T(\cdot)$ of the efficient M-objective function $\Psi_T^*(\cdot)$, in which the true conditional distribution of $Y_t$ given $W_t$ has been replaced with a nonparametric estimator. As a consequence, $\hat\theta_T$ is a MINPIN-type estimator (Andrews, 1994a).$^{20}$ The shape parameter $\hat\xi_t$ of the objective function in Equation (9) is now equal to $\hat F_t(\cdot)$.

To establish the asymptotic properties of our feasible estimator $\hat\theta_T$, we impose the following conditions on the kernels:

(A11) (i) for any $r = (r_1, \ldots, r_m) \in \mathbb{N}^m$, the kernel $K(\cdot)$ satisfies $\sup_{w \in \mathbb{R}^m} |K(w)| < \infty$, $\int K(w)\,dw = 1$, $\int w^r K(w)\,dw = 0$ if $1 \le |r| \le R - 1$, and $\int w^r K(w)\,dw < \infty$ if $|r| = R$; (ii) $K(\cdot)$ has a Fourier transform $\phi(\cdot)$ that is absolutely integrable, i.e. $\int |\phi(w)|\,dw < \infty$; (iii) $\sup_{y \in \mathbb{R}} |K_0(y)| < \infty$, $\int K_0(y)\,dy = 1$, $\int y^r K_0(y)\,dy = 0$ if $1 \le r \le R - 1$, and $\int y^R K_0(y)\,dy < \infty$; (iv) the kernel $K_0(\cdot)$ is continuously differentiable on $\mathbb{R}$ with derivative satisfying $\sup_{y \in \mathbb{R}} |K_0'(y)| < \infty$.

Assumptions (A11)(i)-(iv) are standard and are satisfied, for example, by the multivariate normal-based kernels considered by Bierens (1987):
$$K(w) = (2\pi)^{-m/2} \sum_{j=1}^J a_j |b_j|^{-m} \exp[-w'w/(2b_j^2)],$$
where $J \ge R/2$ is a positive integer and $\{(a_j, b_j) : j \le J\}$ are constants that satisfy $\sum_{j=1}^J a_j = 1$ and $\sum_{j=1}^J a_j b_j^{2l} = 0$ for $l = 1, \ldots, J - 1$.

We now turn to the asymptotic properties of our feasible estimator $\hat\theta_T$. Note that the shape $\hat\xi_t$ of the objective function in Equation (9) depends on all the data up to time $T$ and hence is not $W_t$-measurable, as required by assumption (A2)(i). Consequently, the results of Theorems 1 and 2 do not apply to $\hat\theta_T$, and its asymptotic properties need to be derived separately. We first establish the consistency of $\hat\theta_T$.

Theorem 5 (Consistency of $\hat\theta_T$).
Suppose that (A0)-(A1), (A5)-(A7)(i), (A8)-(A10)(i) and (A11) hold. If $b_T = o(1)$ with $b_T \sqrt{T} h_{wT}^m \to \infty$, $b_T/h_{wT}^R \to \infty$ and $b_T/h_{yT}^R \to \infty$ as $T \to \infty$, then $\hat\theta_T \xrightarrow{p} \theta_0$.

The assumptions on the trimming parameter $b_T$ and the bandwidths $h_{yT}$ and $h_{wT}$ imply that $b_T$ does not vanish too rapidly and that $h_{yT} \to 0$, $h_{wT} \to 0$ and $\sqrt{T} h_{wT}^m \to \infty$ as $T \to \infty$. Though stronger than necessary, the latter condition is typically used when deriving uniform convergence rates using the Fourier transform $\phi(\cdot)$ of $K(\cdot)$ (Bierens, 1983; Andrews, 1995). In particular, when $R \le m/2$, this condition excludes the optimal bandwidth $h_{wT}^{opt} \sim T^{-1/(2R+m)}$ obtained by Stone (1980, 1982) and Truong and Stone (1992).

$^{20}$ Though $\hat\theta_T$ is a member of the MINPIN family, the objective function in Equation (9) does not satisfy the assumptions used by Andrews (1994a).

To derive the asymptotic normality of our efficient estimator $\hat\theta_T$, we strengthen our dependence assumption (A6):

(A6') the sequence $\{(Y_t, W_t')'\}$ is (i) strictly stationary and (ii) $\beta$-mixing with $\beta$ of size $-r/(r-2)$, with $2 < r < 3$.

The proof of our result uses Lemma 3 in Arcones (1995), which requires strict stationarity and $\beta$-mixing with $r > 2$. Note that $\beta$-mixing (absolute regularity) in (A6')(ii) is intermediate between $\alpha$-mixing (strong mixing), the weakest of these forms of mixing, and $\phi$-mixing (uniform mixing), the strongest (Bradley, 1986). As such, our weak dependence assumption is stronger than the $\alpha$-mixing used by Robinson (1983), for example. Assumption (A6')(ii) also requires the size of the $\beta$-mixing process to lie between $-\infty$ and $-3$; in other words, we limit the amount of dependence allowed in $\{(Y_t, W_t')'\}$.$^{21}$ For comparison, Truong and Stone (1992) use the condition $\beta_t = O(\rho^t)$ as $t \to \infty$, for some $\rho$ with $0 < \rho < 1$, in order to estimate the conditional quantile nonparametrically at the optimal rate.
Their condition implies $\beta$-mixing of arbitrary size, and hence of size $-r/(r-2)$ for some $r$ with $2 < r < 3$. We can now establish the efficiency of $\hat\theta_T$.

Theorem 6 (Efficiency of $\hat\theta_T$). Suppose that Assumptions (A0)-(A1), (A5), (A6'), (A7)(i) and (A8)-(A11) hold. If $b_T = o(T^{-1/(4\gamma)})$ with $b_T T^{1/4} h_{yT} h_{wT}^m \to \infty$, $b_T/(T^{1/4} h_{wT}^R) \to \infty$ and $b_T/(T^{1/4} h_{yT}^R) \to \infty$ as $T \to \infty$, then $\hat\theta_T$ is efficient: $\sqrt{T} (\bar V_T^0)^{-1/2} (\hat\theta_T - \theta_0) \xrightarrow{d} N(0, Id)$, where
$$\bar V_T^0 \equiv \alpha(1-\alpha) \Big\{ T^{-1} \sum_{t=1}^T E\big[ (f^0(q_\alpha(W_t, \theta_0) \mid W_t))^2 \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)' \big] \Big\}^{-1}$$
is the semiparametric efficiency bound.

$^{21}$ Note that the $-\infty$ case, obtained when $r = 2$, corresponds to independence. As the proof of Lemma 10 shows, assumption (A6')(ii) is stronger than necessary: it can be replaced by $\beta$-mixing with mixing coefficients $\beta_t$ that satisfy $\sum_{t=1}^{T-1} t \beta_t^{(r-2)/r} = O(\sqrt{T})$.

The conditions on the trimming parameter and bandwidths are stronger than in Theorem 5. They can be written as
$$\max\Big\{ \frac{1}{T^{1/4} h_{yT} h_{wT}^m},\ T^{1/4} h_{wT}^R,\ T^{1/4} h_{yT}^R \Big\} \ll b_T \ll \frac{1}{T^{1/(4\gamma)}},$$
where $a_T \ll c_T$ means that $a_T < c_T$ for $T$ sufficiently large. This implies $T^{1/(4\gamma) - 1/4} \ll h_{yT} h_{wT}^m \ll T^{-[(m+1)/R][1/(4\gamma) + 1/4]}$. Hence, necessary conditions are $\gamma > 1$ and $R > (m+1)(\gamma+1)/(\gamma-1)$.$^{22}$ For instance, when $m = 1$, $R = 3$ and $\gamma = 6$, a feasible choice is $h_{yT} \propto T^{-1/10}$, $h_{wT} \propto T^{-1/10}$ and $b_T \propto T^{-1/21}$. Moreover, if $R > (m+1)(3\gamma+1)/[2(\gamma-1)]$, one can choose the $L_2$-optimal bandwidths $h_{yT}^* \propto T^{-2R/[(2R+m)(2R+m+1)]}$ and $h_{wT}^* \propto T^{-1/(2R+m)}$ for estimating $f^0(y \mid w)\bar g_T^0(w)$ and $\bar g_T^0(w)$.$^{23}$ For instance, when $m = 1$, $R = 4$ and $\gamma = 6$, the $L_2$-optimal bandwidths $h_{yT}^* \propto T^{-8/90}$ and $h_{wT}^* \propto T^{-1/9}$ can be chosen together with the trimming parameter $b_T \propto T^{-1/21}$. In this respect, our estimator $\hat\theta_T$ differs from many semiparametric estimators that are $\sqrt{T}$-asymptotically normal under assumptions that imply undersmoothing and thus exclude the $L_2$-optimal bandwidth.
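When every sequence is a power of $T$, the rate conditions of Theorem 6 reduce to strict inequalities between exponents, since $T^p \ll T^q$ whenever $p < q$. The Python sketch below assumes $h_{yT} \propto T^{-e_y}$, $h_{wT} \propto T^{-e_w}$ and $b_T \propto T^{-e_b}$, uses exact rational arithmetic (`fractions.Fraction`), and re-checks the two numerical examples given above together with the remark after Theorem 5; all helper names are hypothetical.

```python
from fractions import Fraction as Fr


def theorem6_rates_ok(m, R, gamma, e_y, e_w, e_b):
    """Rate conditions of Theorem 6 for h_yT ~ T^-e_y, h_wT ~ T^-e_w, b_T ~ T^-e_b.

    For powers of T, T^p << T^q holds whenever p < q, so each condition is a
    strict inequality between exponents.
    """
    lower = max(
        Fr(-1, 4) + e_y + m * e_w,   # exponent of 1 / (T^{1/4} h_yT h_wT^m)
        Fr(1, 4) - R * e_w,          # exponent of T^{1/4} h_wT^R
        Fr(1, 4) - R * e_y,          # exponent of T^{1/4} h_yT^R
    )
    upper = Fr(-1, 4 * gamma)        # exponent of T^{-1/(4 gamma)}
    return lower < -e_b < upper


def necessary_conditions(m, R, gamma):
    """gamma > 1 and R > (m + 1)(gamma + 1)/(gamma - 1)."""
    return gamma > 1 and R > Fr((m + 1) * (gamma + 1), gamma - 1)


def l2_optimal_exponents(m, R):
    """Exponents of h*_yT ~ T^{-2R/[(2R+m)(2R+m+1)]} and h*_wT ~ T^{-1/(2R+m)}."""
    return Fr(2 * R, (2 * R + m) * (2 * R + m + 1)), Fr(1, 2 * R + m)


def l2_optimal_allowed(m, R, gamma):
    """Sufficient condition stated in the text: R > (m + 1)(3 gamma + 1)/[2(gamma - 1)]."""
    return gamma > 1 and R > Fr((m + 1) * (3 * gamma + 1), 2 * (gamma - 1))


def theorem5_excludes_stone_rate(m, R):
    """sqrt(T) h_wT^m -> infinity fails for h_wT ~ T^{-1/(2R+m)} exactly when R <= m/2."""
    return Fr(1, 2) - Fr(m, 2 * R + m) <= 0


# First example in the text: m = 1, R = 3, gamma = 6
print(theorem6_rates_ok(1, 3, 6, Fr(1, 10), Fr(1, 10), Fr(1, 21)))  # True
# Second example: L2-optimal bandwidths with m = 1, R = 4, gamma = 6
e_y, e_w = l2_optimal_exponents(1, 4)
print(e_y, e_w)                                                      # 4/45 1/9
print(theorem6_rates_ok(1, 4, 6, e_y, e_w, Fr(1, 21)))              # True
```

With $R = 3$ instead, the $L_2$-optimal exponents make the lower bound exponent equal to 0, so no trimming rate works, which is consistent with the threshold $R > 19/5$ obtained for $m = 1$, $\gamma = 6$.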
$^{22}$ As indicated in Lavergne and Vuong (1996, p. 209), we have $c_T^2 = o\big(\int_{\{w:\, \bar g_T^0(w) < c_T\}} \bar g_T^0(w)\,dw\big)$ when $\bar g_T^0(\cdot)$ is continuously differentiable on $\mathbb{R}^m$ and monotonically decreasing in the tails, whether or not the support of $\bar g_T^0(\cdot)$ is bounded. Under the same conditions, it can be shown that $c_T = o\big(\int_{\{w:\, \bar g_T^0(w) < c_T\}} |w| \bar g_T^0(w)\,dw\big)$ when the support of $\bar g_T^0(\cdot)$ is $\mathbb{R}^m$. Hence, (A10)(ii) implies $\gamma < 1$ when $q_\alpha(w, \theta_0) = w'\theta_0$, which contradicts $\gamma > 1$. On the other hand, when the support of $\bar g_T^0(\cdot)$ is bounded (uniformly in $T$), it can be shown that $\int_{\{w:\, \bar g_T^0(w) < c_T\}} |w| \bar g_T^0(w)\,dw = O(c_T^{2-\delta})$, where $\delta > 0$ can be arbitrarily close to zero depending on $\bar g_T^0(\cdot)$. Hence, our assumptions allow the linear quantile specification $q_\alpha(w, \theta_0) = w'\theta_0$, provided the support of $W_t$ is bounded (uniformly in $T$). Bounded supports, however, are not required, as our assumptions allow for unbounded ones; in that case $f^0[q_\alpha(w, \theta_0) \mid w]$ and $|\nabla_\theta q_\alpha(w, \theta_0)|$ should vanish in the tails of $W_t$ at rates that make (A10) compatible with the trimming and bandwidth conditions.

$^{23}$ See Stone (1980, 1982), where $h_{yT}^*$ solves $(h_{yT}^* h_{wT}^{*m})^{1/(m+1)} \propto T^{-1/(2R+m+1)}$.

Without assumption (A9)(i) we would not be able to construct a conditional quantile estimator that attains $\bar V_T^0$. Note, however, that our general expression for $V_T^0$ derived in Theorem 3 remains valid whether or not we are able to construct an efficient estimator; this is one of the advantages of the "supremum" characterization of the semiparametric efficiency bound.

Our efficient M-estimator $\hat\theta_T$ is asymptotically equivalent to the "one-step" estimator proposed by Newey and Powell (1990), the weighted quantile regression estimator of Zhao (2001), and the CEL estimator of Otsu (2003). Two important features distinguish our efficient estimator from the previous ones. First, similar to Otsu's (2003) CEL estimator, our M-estimator $\hat\theta_T$ does not require a preliminary consistent estimate of $\theta_0$.
It is well documented that such a preliminary step causes poor small-sample performance in GMM estimation (Altonji and Segal, 1996).$^{24}$ Second, the objective functions $\varphi(\cdot, \cdot, \hat\xi_t)$ used in the construction of $\hat\theta_T$ depend on a nonparametric estimator of the distribution function $F^0(\cdot \mid \cdot)$, whereas the efficient estimators of Newey and Powell (1990) and Zhao (2001) depend on nonparametric estimators of the density $f^0(\cdot \mid \cdot)$.$^{25}$ Both features can potentially affect the small-sample properties of these efficient estimators.

6. Conclusion

The contributions of this paper are twofold. First, it derives the semiparametric efficiency bound $V_T^0$ for parameters of conditional quantiles in time series models with weakly dependent and/or heterogeneous data. Our bound $V_T^0$ generalizes expressions previously derived in the literature on efficient conditional quantile estimation; in particular, we allow the data to exhibit dependence and/or conditional heteroskedasticity. Second, the paper shows that efficient estimation is possible in models for conditional quantiles in which the true conditional distribution does not depend on any variables other than those entering the quantile. In such models, the semiparametric efficiency bound equals $\bar V_T^0$, and we are able to construct an M-estimator $\hat\theta_T$ that actually attains the bound. Our efficient estimator differs from previous ones and is of the MINPIN type, as the efficient M-objective function that it minimizes depends on a nonparametric estimator of the conditional distribution.

An interesting by-product of the paper is to show that the class of M-estimators is rich enough to contain efficient estimators, at least in models for conditional quantiles. In general, one can think of the class of GMM estimators as the widest one. Then comes the class of M-estimators, which can be viewed as just-identified GMM estimators.
Finally comes the class of QMLEs, that is, the class of M-estimators whose objective functions satisfy an additional "integrability" condition and can thus be interpreted as quasi-likelihoods. In models for conditional quantiles, efficient estimators do not belong to the class of QMLEs, but they are contained in the class of M-estimators. Hence, at least from a semiparametric efficiency viewpoint, nothing is gained by considering GMM rather than M-estimators, whereas important efficiency improvements are made by going from QMLEs to M-estimators.

$^{24}$ In models with unconditional moment restrictions, Newey and Smith (2004) show how empirical likelihood-based methods improve the finite-sample properties of GMM.

$^{25}$ In particular, when $F^0(\cdot \mid \cdot)$ and $f^0(\cdot \mid \cdot)$ are estimated by kernel methods, there is always one fewer smoothing parameter to choose for conditional distributions (Hansen, 2004a,b). For example, in the iid case, our efficient estimator $\hat\theta_T$ can be constructed using the empirical distribution function.

Finally, the "supremum" approach we use to derive the semiparametric efficiency bound $V_T^0$ does not seem to suffer from the strong independence assumptions traditionally imposed in the literature on efficient estimation. Our construction of the least favorable parametric submodel and the corresponding MLE does not depend on any particular dependence or heterogeneity structure of the data. We therefore conjecture that it can be generalized fairly easily to accommodate general moment restrictions.
The steps to follow in constructing semiparametric efficiency bounds in models with time series data seem to be: (1) construct the largest class of M-estimators that are consistent for the true parameter $\theta_0$ of the conditional moment restriction at hand; (2) within this class, find the minimum asymptotic covariance matrix, which is a candidate matrix $V$ for the bound, and the M-estimator that attains this minimum; (3) use its expression to derive the least favorable parametric submodel of the initial semiparametric model; (4) show that the inverse of the Fisher information matrix in this submodel equals $V$. It then follows that $V$ is the semiparametric efficiency bound. While step (3) is perhaps the crucial one, we have little guidance on how exactly to construct the least favorable parametric submodel under general moment restrictions. This seems to be an important topic, which we leave for future research.

7. Proofs

Proof of Theorem 1. First, note that (A2)-(A3), together with the compactness of the parameter space $\Theta$, are sufficient conditions for $\theta_T$ to be consistent for $\theta_0^\infty \in \mathring\Theta$ (see, e.g., Theorem 2.1 in Newey and McFadden, 1994). We now show that, under the correct conditional quantile model specification assumption (A1), we have $\theta_0^\infty = \theta_0$ for any $T \ge 1$ if and only if there exist a real function $A(\cdot, \xi_t) : \mathbb{R} \to \mathbb{R}$, twice continuously differentiable and strictly increasing a.s.-$P$ on $Q$ with derivative $a(y, \xi_t) \equiv \partial A(y, \xi_t)/\partial y$, and a real function $B(\cdot, \xi_t) : \mathbb{R} \to \mathbb{R}$, such that, for any $T \ge 1$ and every $t$, $1 \le t \le T$,
$$\varphi(Y_t, q_t, \xi_t) = [\alpha - \mathbb{1}(q_t - Y_t)][A(Y_t, \xi_t) - A(q_t, \xi_t)] + B(Y_t, \xi_t), \quad \text{a.s.-}P \text{ on } \mathbb{R} \times Q \times E_t. \qquad (10)$$
We treat the two implications contained in this equivalence separately. We start with the sufficiency part of the proof and show that if, for any $T \ge 1$ and every $t$, $1 \le t \le T$, $\varphi(\cdot, \cdot, \xi_t)$ is as in equation (10), then $\theta_0^\infty = \theta_0$ for any $T \ge 1$, i.e. $\theta_0$ is also a minimizer of $E[\Psi_T(\theta)]$ on $\mathring\Theta$.
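The sufficiency claim can be illustrated in the simplest special case, with no covariates and $q_\alpha(W_t, \theta) = \theta$ (an assumption made only for this sketch). The derivative in $q$ of the sample average of (10) is $a(q)[\hat F_n(q) - \alpha]$, with $\hat F_n$ the empirical cdf and $a > 0$, so every strictly increasing $A$ should lead to the same minimizer, the empirical $\alpha$-quantile. The Python sketch below checks this on simulated Exp(1) data for three hypothetical choices of $A$ (with $B = 0$): $A(v) = v$ gives the usual check loss, and $A(v) = 1 - e^{-v}$ coincides with the true cdf on $v \ge 0$.

```python
import numpy as np


def phi(y, q, alpha, A):
    """phi(y, q) = [alpha - 1I(q - y)][A(y) - A(q)], i.e. (10) with B = 0."""
    return (alpha - (y <= q)) * (A(y) - A(q))


A_CHOICES = {
    "A(v) = v (check loss)": lambda v: v,
    "A(v) = arctan(v)": np.arctan,
    "A(v) = 1 - exp(-v)": lambda v: 1.0 - np.exp(-v),
}


def grid_minimizer(y, alpha, A, grid):
    """Minimize the sample average of phi over a grid of candidate quantiles."""
    losses = phi(y[None, :], grid[:, None], alpha, A).mean(axis=1)
    return float(grid[np.argmin(losses)])


def minimizers(y, alpha, grid):
    return {name: grid_minimizer(y, alpha, A, grid) for name, A in A_CHOICES.items()}


rng = np.random.default_rng(0)
y = rng.exponential(size=4000)                   # hypothetical iid sample
print(minimizers(y, 0.25, np.linspace(0.0, 1.5, 301)))
print("empirical quantile:", np.quantile(y, 0.25), "true quantile:", -np.log(0.75))
```

All three minimizers should agree up to the grid spacing, even though the objective functions have very different shapes; only the function $a$, and hence the asymptotic variance, changes with $A$.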
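Given (A3)(i), we know that $\nabla_\theta E[\Psi_T(\theta)] = T^{-1} \sum_{t=1}^T E[\nabla_\theta \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)]$. From (10) and the a.s.-$P$ twice continuous differentiability of $A(\cdot, \xi_t)$ on $Q$, for any $t$, $1 \le t \le T$, $T \ge 1$, we have
$$\begin{aligned} E[\nabla_\theta \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)] &= E\{\nabla_\theta q_\alpha(W_t, \theta)\, a(q_\alpha(W_t, \theta), \xi_t)[\mathbb{1}(q_\alpha(W_t, \theta) - Y_t) - \alpha]\} \\ &= E\{\nabla_\theta q_\alpha(W_t, \theta)\, a(q_\alpha(W_t, \theta), \xi_t)\, E[\mathbb{1}(q_\alpha(W_t, \theta) - Y_t) - \alpha \mid W_t]\}, \end{aligned}$$
so that, by the correct model specification assumption (A1), $E[\mathbb{1}(q_\alpha(W_t, \theta_0) - Y_t) - \alpha \mid W_t] = 0$ a.s.-$P$ for every $t$, $1 \le t \le T$, $T \ge 1$, and hence $\nabla_\theta E[\Psi_T(\theta_0)] = 0$.

Similarly, $\Delta_{\theta\theta} E[\Psi_T(\theta)] = T^{-1} \sum_{t=1}^T E[\Delta_{\theta\theta} \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)]$ and
$$\begin{aligned} E[\Delta_{\theta\theta} \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)] &= E\Big\{\Big[\frac{\partial a(q_\alpha(W_t, \theta), \xi_t)}{\partial y} \nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)' + a(q_\alpha(W_t, \theta), \xi_t) \Delta_{\theta\theta} q_\alpha(W_t, \theta)\Big][\mathbb{1}(q_\alpha(W_t, \theta) - Y_t) - \alpha]\Big\} \\ &\quad + E[\nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)' a(q_\alpha(W_t, \theta), \xi_t) \delta(q_\alpha(W_t, \theta) - Y_t)] \\ &= E\Big\{\Big[\frac{\partial a(q_\alpha(W_t, \theta), \xi_t)}{\partial y} \nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)' + a(q_\alpha(W_t, \theta), \xi_t) \Delta_{\theta\theta} q_\alpha(W_t, \theta)\Big] E[\mathbb{1}(q_\alpha(W_t, \theta) - Y_t) - \alpha \mid W_t]\Big\} \\ &\quad + E\{\nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)' a(q_\alpha(W_t, \theta), \xi_t) E[\delta(q_\alpha(W_t, \theta) - Y_t) \mid W_t]\}, \end{aligned}$$
so that, by (A1),
$$\begin{aligned} \Delta_{\theta\theta} E[\Psi_T(\theta_0)] &= T^{-1} \sum_{t=1}^T E\{\nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)' a(q_\alpha(W_t, \theta_0), \xi_t) E[\delta(q_\alpha(W_t, \theta_0) - Y_t) \mid W_t]\} \\ &= T^{-1} \sum_{t=1}^T E[\nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)' a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))], \end{aligned} \qquad (11)$$
where, for every $t$, $1 \le t \le T$, $f_t^0(\cdot)$ is the true probability density function of $Y_t$ conditional on $W_t$.

We now show that $\Delta_{\theta\theta} E[\Psi_T(\theta_0)] \gg 0$. By (11), for any $\chi \in \mathbb{R}^k$, $\chi' \Delta_{\theta\theta} E[\Psi_T(\theta_0)] \chi = 0$ only if $T^{-1} \sum_{t=1}^T E[\chi' \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)' \chi\, a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))] = 0$. Now note that, for any $t$, $1 \le t \le T$, and $T \ge 1$,
$$E[\chi' \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)' \chi\, a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))] = E[(\chi' \nabla_\theta q_\alpha(W_t, \theta_0))^2 a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))] \ge 0 \qquad (12)$$
for any $\chi \in \mathbb{R}^k$, since $a(q_\alpha(W_t, \theta_0), \xi_t) > 0$ a.s.-$P$ and $f_t^0(q_\alpha(W_t, \theta_0)) > 0$ a.s.-$P$. Taking into account the inequality in (12), for any $\chi \in \mathbb{R}^k$, $\chi' \Delta_{\theta\theta} E[\Psi_T(\theta_0)] \chi = 0$ only if $E[(\chi' \nabla_\theta q_\alpha(W_t, \theta_0))^2 a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))] = 0$ for all $t$, $1 \le t \le T$, $T \ge 1$. Using again the strict positivity of $a(\cdot, \xi_t)$ and $f_t^0(\cdot)$, this last equality holds only if $\chi' \nabla_\theta q_\alpha(W_t, \theta_0) = 0$ a.s.-$P$ for every $t$, $1 \le t \le T$, $T \ge 1$. This, together with (A0)(iii), implies that $\chi = 0$. We conclude that $\Delta_{\theta\theta} E[\Psi_T(\theta_0)] \gg 0$, and therefore $\theta_0$ is a minimizer of $E[\Psi_T(\theta)]$ on $\mathring\Theta$. Since by (A3)(ii) this minimizer is unique, we have $\theta_0^\infty = \theta_0$ for any $T \ge 1$, which completes the sufficiency part of the proof.

We now show that the functional form of $\varphi(\cdot, \cdot, \xi_t)$ in (10) is necessary for $\theta_0^\infty = \theta_0$ to hold for any $T \ge 1$. Given the differentiability of $E[\Psi_T(\theta)]$ on $\Theta$ by (A3)(i), a necessary requirement for $\theta_0^\infty = \theta_0$ is that the first-order condition $\nabla_\theta E[\Psi_T(\theta_0)] = 0$ be satisfied, which is equivalent to
$$T^{-1} \sum_{t=1}^T E\Big\{\nabla_\theta q_\alpha(W_t, \theta_0)\, E\Big[\frac{\partial \varphi}{\partial q_t}(Y_t, q_\alpha(W_t, \theta_0), \xi_t) \,\Big|\, W_t\Big]\Big\} = 0.$$
Since this equality needs to hold for any $T \ge 1$, any choice of conditional quantile model $\mathcal{M}$ and any true parameter $\theta_0 \in \mathring\Theta$, we need to find a necessary condition for the implication
$$E[\mathbb{1}(q_\alpha(W_t, \theta_0) - Y_t) - \alpha \mid W_t] = 0 \text{ a.s.-}P \;\Rightarrow\; E\Big[\frac{\partial \varphi}{\partial q_t}(Y_t, q_\alpha(W_t, \theta_0), \xi_t) \,\Big|\, W_t\Big] = 0 \text{ a.s.-}P \qquad (13)$$
to hold for all $t$, $1 \le t \le T$, $T \ge 1$, and all absolutely continuous distribution functions $F_t^0$ in $\mathcal{F}$. We now show that
$$\frac{\partial \varphi}{\partial q_t}(Y_t, q_\alpha(W_t, \theta_0), \xi_t) = a(q_\alpha(W_t, \theta_0), \xi_t)[\mathbb{1}(q_\alpha(W_t, \theta_0) - Y_t) - \alpha], \quad \text{a.s.-}P, \qquad (14)$$
for any $\theta_0 \in \mathring\Theta$ and any $t$, $1 \le t \le T$, $T \ge 1$, where $a(\cdot, \xi_t) : \mathbb{R} \to \mathbb{R}$ is strictly positive a.s.-$P$ on $Q$, is a necessary condition for (13). Using a generalized Farkas lemma (Lemma 8.1, p. 240, vol. 1, in Gourieroux and Monfort, 1995), (13) implies that there exists a $W_t$-measurable random variable $a_t$ such that
$$\frac{\partial \varphi}{\partial q_t}(Y_t, q_\alpha(W_t, \theta_0), \xi_t) = a_t[\mathbb{1}(q_\alpha(W_t, \theta_0) - Y_t) - \alpha], \quad \text{a.s.-}P.$$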
Since the left-hand side depends only on $Y_t$, $q_\alpha(W_t, \theta_0)$ and $\xi_t$, the same must hold for the right-hand side. Hence, $a_t$ can only depend on $q_\alpha(W_t, \theta_0)$ and $\xi_t$, and we can write $a_t = a(q_\alpha(W_t, \theta_0), \xi_t)$; so the equality in (14) holds. We now need to show that $a(\cdot, \xi_t)$ is strictly positive a.s.-$P$ on $Q$. A necessary condition for $\theta_0 \in \mathring\Theta$ to be a minimizer of $E[\Psi_T(\theta)]$ (in addition to the above first-order condition) is that $\chi' \Delta_{\theta\theta} E[\Psi_T(\theta_0)] \chi \ge 0$ for every $\chi \in \mathbb{R}^k$ (existence of $\Delta_{\theta\theta} E[\Psi_T(\theta)]$ is ensured by (A3)(i)).$^{26}$ Taking into account (14) and our previous computations leading to (11), we have
$$\chi' \Delta_{\theta\theta} E[\Psi_T(\theta_0)] \chi = T^{-1} \sum_{t=1}^T \chi' E[\Delta_{\theta\theta} \varphi(Y_t, q_\alpha(W_t, \theta_0), \xi_t)] \chi = T^{-1} \sum_{t=1}^T E[(\chi' \nabla_\theta q_\alpha(W_t, \theta_0))^2 a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0))].$$
Hence, the quadratic form $\chi' \Delta_{\theta\theta} E[\Psi_T(\theta_0)] \chi$ is nonnegative for any $T \ge 1$, any conditional quantile model $\mathcal{M}$, any true value $\theta_0 \in \mathring\Theta$ and any conditional density $f_t^0(\cdot)$ only if $a(q_\alpha(W_t, \theta_0), \xi_t) \ge 0$ a.s.-$P$ for all $t$, $1 \le t \le T$, $T \ge 1$. Note that the uniqueness of the solution $\theta_0$ implies that $a(q_t, \xi_t) > 0$ a.s.-$P$ for any $q_t \in Q$ and all $t$, $1 \le t \le T$, $T \ge 1$.

The remainder of the proof is straightforward: we need to integrate the necessary condition (14) with respect to $q_t$. Note that (14) can be written as
$$\frac{\partial \varphi}{\partial q_t}(Y_t, q_\alpha(W_t, \theta_0), \xi_t) = \begin{cases} (1-\alpha)\, a(q_\alpha(W_t, \theta_0), \xi_t), & \text{if } Y_t \le q_\alpha(W_t, \theta_0), \\ -\alpha\, a(q_\alpha(W_t, \theta_0), \xi_t), & \text{if } Y_t > q_\alpha(W_t, \theta_0), \end{cases} \quad \text{a.s.-}P,$$
for any $\theta_0 \in \mathring\Theta$ and any $t$, $1 \le t \le T$, $T \ge 1$. Together with the continuity of $\varphi(Y_t, \cdot, \xi_t)$ a.s.-$P$ on $Q$ in (A2)(ii), this integrates into
$$\varphi(Y_t, q_\alpha(W_t, \theta_0), \xi_t) = B(Y_t, \xi_t) + \begin{cases} (1-\alpha)[A(q_\alpha(W_t, \theta_0), \xi_t) - A(Y_t, \xi_t)], & \text{if } Y_t \le q_\alpha(W_t, \theta_0), \\ -\alpha[A(q_\alpha(W_t, \theta_0), \xi_t) - A(Y_t, \xi_t)], & \text{if } Y_t > q_\alpha(W_t, \theta_0), \end{cases}$$
a.s.-$P$, where, for every $t$, $1 \le t \le T$, $T \ge 1$, $A(\cdot, \xi_t)$ is an indefinite integral of $a(\cdot, \xi_t)$, $A(q_t, \xi_t) \equiv \int_{\underline q}^{q_t} a(r, \xi_t)\,dr$ for some $\underline q \in \mathbb{R}$, and $B(\cdot, \xi_t) : \mathbb{R} \to \mathbb{R}$ is a real function. Note that the above equality has to hold for any $\theta_0 \in \mathring\Theta$, so that
$$\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t) = B(Y_t, \xi_t) + [\alpha - \mathbb{1}(q_\alpha(W_t, \theta) - Y_t)][A(Y_t, \xi_t) - A(q_\alpha(W_t, \theta), \xi_t)], \quad \text{a.s.-}P, \qquad (15)$$
for every $t$, $1 \le t \le T$, $T \ge 1$, and all $\theta \in \Theta$; this is a necessary condition for the M-estimator $\theta_T$ to be consistent for $\theta_0$. Equality (15) implies that, for any $t$, $1 \le t \le T$, $T \ge 1$, $\varphi(Y_t, q_t, \xi_t) = B(Y_t, \xi_t) + [\alpha - \mathbb{1}(q_t - Y_t)][A(Y_t, \xi_t) - A(q_t, \xi_t)]$ a.s.-$P$ on $\mathbb{R} \times Q \times E_t$. $\square$

$^{26}$ Note that this requirement is weaker than the positive definiteness of $\Delta_{\theta\theta} E[\Psi_T(\theta_0)]$, $\Delta_{\theta\theta} E[\Psi_T(\theta_0)] \gg 0$, which is a sufficient condition for $\theta_0$ to be a minimum.

Proof of Theorem 2. To show that Theorem 2 holds, we first show that, under the primitive conditions given in (A0)-(A2) and (A4)-(A7), $\theta_T$ is consistent for $\theta_0$, i.e. $\theta_T - \theta_0 \xrightarrow{p} 0$. We proceed by checking that all the assumptions for consistency used by Komunjer (2005b) in her Theorem 3 hold. Since her proof of consistency for the family of tick-exponential QMLEs in Theorem 3 does not require any assumptions on the limits at $\pm\infty$ of the functions $A(\cdot, \xi_t)$, it applies directly to the M-estimator $\theta_T$ defined in (A2). Assumptions A2 and A3 in Komunjer (2005b) are satisfied by imposing our (A5) and (A4), respectively. The $\alpha$-mixing condition A4 in Komunjer (2005b) and the assumption that $W_t$ is a function of some finite number of lags of $X_t$, stated in A0.iv in Komunjer (2005b), are used to ensure that $\{(Y_t, W_t')'\}$ is $\alpha$-mixing with $\alpha$ of the same size $-r/(r-2)$, $r > 2$. Here, we directly impose the mixing of the sequence $\{(Y_t, W_t')'\}$ in our (A6), which is sufficient for the proof of Theorem 3 in Komunjer (2005b) to go through. Finally, the moment conditions A5 in Komunjer (2005b) follow directly from our (A7) and the fact that $E[\sup_{\theta \in \Theta} |\nabla_\theta q_\alpha(W_t, \theta)|] \le \max\{1, E[\sup_{\theta \in \Theta} |\nabla_\theta q_\alpha(W_t, \theta)|^2]\} < \infty$.
Hence we can use the results of Theorem 3 in Komunjer (2005b), which correspond to the case where the conditional quantile model is correctly specified (A1); this proves the consistency of $\theta_T$. Similarly, we derive asymptotic normality by using the results of Corollary 5 in Komunjer (2005b). The boundedness of the second derivative of $A(\cdot, \xi_t)$ contained in assumption A3' in Komunjer (2005b) is directly implied by (A4). The moment condition in assumption A5' in Komunjer (2005b) follows from our (A7). Finally, in our setup we have assumed that the true conditional density $f_t^0(\cdot)$ of $Y_t$ is strictly positive and bounded on $\mathbb{R}$, which verifies assumption A6 in Komunjer (2005b). Hence, from Corollary 5 in Komunjer (2005b), we know that $\sqrt{T} (\Sigma_T^0)^{-1/2} \Delta_T^0 (\theta_T - \theta_0) \xrightarrow{d} N(0, Id)$, where
$$\Delta_T^0 = T^{-1} \sum_{t=1}^T E[a(q_\alpha(W_t, \theta_0), \xi_t) f_t^0(q_\alpha(W_t, \theta_0)) \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'], \qquad (16)$$
and
$$\Sigma_T^0 = T^{-1} \sum_{t=1}^T \alpha(1-\alpha) E[(a(q_\alpha(W_t, \theta_0), \xi_t))^2 \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)']. \qquad (17)$$
$\square$

Proof of Theorem 3. The proof of this theorem is inspired by a similar result in Gourieroux, Monfort, and Trognon (1984). Let $V_T^0$ be as defined in Theorem 3 and consider the difference $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} - V_T^0$.
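The inequality established in this proof can also be checked numerically. The Python sketch below uses a hypothetical heteroskedastic design that is not taken from the paper: $q_\alpha(w, \theta) = \theta_1 + \theta_2 w$, so that $\nabla_\theta q_\alpha = (1, w)'$; normal errors with conditional scale $\sigma(w) = 1 + 0.5 w^2$, so that $f^0(q_\alpha(w, \theta_0) \mid w) = \phi(z_\alpha)/\sigma(w)$; and expectations replaced by equally weighted averages over a grid of $w$ values. It forms the sandwich $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}$ from (16)-(17) for several positive choices of $a$ and compares it with $\alpha(1-\alpha)\{E[f^2 \nabla_\theta q_\alpha \nabla_\theta q_\alpha']\}^{-1}$, the expression for $V_T^0$ used in this proof.

```python
from statistics import NormalDist

import numpy as np


def outer_products(grad):
    """grad has shape (n, k); returns the (n, k, k) array of grad_i grad_i'."""
    return grad[:, :, None] * grad[:, None, :]


def sandwich(a, f, grad, alpha):
    """(Delta)^{-1} Sigma (Delta)^{-1} with Delta and Sigma as in (16)-(17),
    expectations replaced by equally weighted averages over the design points."""
    G = outer_products(grad)
    Delta = np.mean((a * f)[:, None, None] * G, axis=0)
    Sigma = alpha * (1 - alpha) * np.mean((a ** 2)[:, None, None] * G, axis=0)
    D_inv = np.linalg.inv(Delta)
    return D_inv @ Sigma @ D_inv


def efficiency_bound(f, grad, alpha):
    """alpha (1 - alpha) {E[f^2 grad grad']}^{-1}."""
    G = outer_products(grad)
    return alpha * (1 - alpha) * np.linalg.inv(np.mean((f ** 2)[:, None, None] * G, axis=0))


def min_eigenvalue(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return float(np.linalg.eigvalsh((M + M.T) / 2).min())


alpha = 0.25
w = np.linspace(-2.0, 2.0, 401)                  # hypothetical design points for W_t
grad = np.column_stack([np.ones_like(w), w])     # gradient of q = theta_1 + theta_2 * w
sigma = 1.0 + 0.5 * w ** 2                       # hypothetical conditional scale
z_alpha = NormalDist().inv_cdf(alpha)
f = NormalDist().pdf(z_alpha) / sigma            # f^0(q_alpha(w, theta_0) | w), normal errors

V = efficiency_bound(f, grad, alpha)
weights = {"a = 1 (check loss)": np.ones_like(w), "a = exp(w)": np.exp(w), "a = f (efficient)": f}
for name, a in weights.items():
    print(name, min_eigenvalue(sandwich(a, f, grad, alpha) - V))
```

The smallest eigenvalue of the difference should be nonnegative for every positive $a$ (up to rounding) and essentially zero for $a \propto f$, which is the case $A(\cdot, \xi_t^*) \propto F_t^0(\cdot)$ singled out below.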
We show that this difference is positive semidefinite for any $A(\cdot, \xi_t)$, $1 \le t \le T$, $T \ge 1$, as in Theorem 1:
$$\begin{aligned} (\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} - V_T^0 &= V_T^0 (V_T^0)^{-1} V_T^0 - V_T^0 \Delta_T^0 (\Delta_T^0)^{-1} - (\Delta_T^0)^{-1} \Delta_T^0 V_T^0 + (\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} \\ &= T^{-1} \sum_{t=1}^T E\Big\{ V_T^0 \Big[\frac{(f_t^0(q_\alpha(W_t, \theta_0)))^2}{\alpha(1-\alpha)} \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'\Big] V_T^0 \\ &\qquad - V_T^0 [f_t^0(q_\alpha(W_t, \theta_0))\, a(q_\alpha(W_t, \theta_0), \xi_t) \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'] (\Delta_T^0)^{-1} \\ &\qquad - (\Delta_T^0)^{-1} [f_t^0(q_\alpha(W_t, \theta_0))\, a(q_\alpha(W_t, \theta_0), \xi_t) \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'] V_T^0 \\ &\qquad + (\Delta_T^0)^{-1} [\alpha(1-\alpha) (a(q_\alpha(W_t, \theta_0), \xi_t))^2 \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'] (\Delta_T^0)^{-1} \Big\}, \end{aligned}$$
so that
$$(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} - V_T^0 = \frac{1}{\alpha(1-\alpha)} T^{-1} \sum_{t=1}^T E[\chi_t \chi_t'],$$
where, for every $t$, $1 \le t \le T$, $T \ge 1$, we let $\chi_t \equiv [f_t^0(q_\alpha(W_t, \theta_0)) V_T^0 - \alpha(1-\alpha) a(q_\alpha(W_t, \theta_0), \xi_t) (\Delta_T^0)^{-1}] \nabla_\theta q_\alpha(W_t, \theta_0)$, and $a(y, \xi_t) \equiv \partial A(y, \xi_t)/\partial y$ as before. Hence, for any $A(\cdot, \xi_t)$, $1 \le t \le T$, $T \ge 1$, such that $a(\cdot, \xi_t) > 0$ a.s.-$P$ on $Q$, the matrix $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} - V_T^0$ is positive semidefinite. In other words, $V_T^0$ is the lower bound of the set of asymptotic covariance matrices $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1}$ obtained with functions $A(\cdot, \xi_t)$ satisfying the conditions of Theorem 1.

We now show that this lower bound $V_T^0$ is attained by an M-estimator $\theta_T^*$ if and only if its objective function is $\Psi_T^*(\theta) \equiv T^{-1} \sum_{t=1}^T \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t^*)$ with
$$\varphi(Y_t, q_t, \xi_t^*) = [\alpha - \mathbb{1}(q_t - Y_t)][F_t^0(Y_t) - F_t^0(q_t)], \quad \text{a.s.-}P, \qquad (18)$$
on $\mathbb{R} \times Q \times E_t$, for every $t$, $1 \le t \le T$, $T \ge 1$. We first show the necessary part of this equivalence: $V_T^0$ is attained only if, for any $T \ge 1$, there exist $\xi_t^*$ and $A(\cdot, \xi_t^*)$, $1 \le t \le T$, such that
$$T^{-1} \sum_{t=1}^T E(\chi_t \chi_t') = 0. \qquad (19)$$
Since (19) needs to hold for any $T \ge 1$, it implies that $E(\chi_t \chi_t') = 0$ for every $t$, $1 \le t \le T$, $T \ge 1$. Taking into account that $\chi_t \chi_t'$ is positive semidefinite a.s.-$P$, these equalities hold only if, for every $t$, $1 \le t \le T$, $T \ge 1$, we have $\chi_t \chi_t' = 0$ a.s.-$P$.
Hence, $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} = V_T^0$ for any $T \ge 1$ if and only if $\chi_t = 0$ a.s.-$P$ for every $t$, $1 \le t \le T$, $T \ge 1$, which, combined with (A0)(iii), is equivalent to
$$\frac{f_t^0(q_\alpha(W_t, \theta_0))}{\alpha(1-\alpha)\, a(q_\alpha(W_t, \theta_0), \xi_t^*)} V_T^0 \Delta_T^0 = Id, \quad \text{a.s.-}P,$$
for all $t$, $1 \le t \le T$, $T \ge 1$. This in turn implies that, for every $t$, $1 \le t \le T$, $T \ge 1$, and any $q_t \in Q$,
$$a(q_t, \xi_t^*) = c\, \frac{f_t^0(q_t)}{\alpha(1-\alpha)} \quad \text{and} \quad V_T^0 \Delta_T^0 = c \cdot Id,$$
where $c$ is some strictly positive real constant, $c > 0$. Note that the above condition is equivalent to $a(q_t, \xi_t^*) = c f_t^0(q_t)/[\alpha(1-\alpha)]$ alone, which, by integration with respect to $q_t$, gives that for every $t$, $1 \le t \le T$, $T \ge 1$, and any $q_t \in Q$,
$$A(q_t, \xi_t^*) = \frac{c}{\alpha(1-\alpha)} F_t^0(q_t) + d, \qquad (20)$$
with $d \in \mathbb{R}$. Condition (20) is both necessary and sufficient for the equality in (19) to hold for any $T \ge 1$. It is important to note that changing the value of $A(\cdot, \xi_t^*)$ outside $Q$ does not affect the minima of $E[\Psi_T]$, so $A(\cdot, \xi_t^*)$ can take arbitrary values on $\mathbb{R} \setminus Q$. To keep the notation simple, and without altering the general validity of our result, we set $A(y, \xi_t^*) = c F_t^0(y)/[\alpha(1-\alpha)] + d$ for all $y \in \mathbb{R}$. Moreover, changing the constants $c$ and $d$ does not affect the value of $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1}$, so they can be chosen arbitrarily in $\mathbb{R}_+^* \times \mathbb{R}$ for any $T \ge 1$. For example, we can let $c = \alpha(1-\alpha)$ and $d = 0$, in which case
$$A(y, \xi_t^*) = F_t^0(y), \quad \text{for all } y \in \mathbb{R}; \qquad (21)$$
this completes the proof of the necessary part.

Now we show that, under (A0)-(A1), (A5)-(A6) and (A7)(i), the M-estimator $\theta_T^*$ obtained by minimizing $\Psi_T^*(\theta)$ associated with (18) is such that $\sqrt{T} (V_T^0)^{-1/2} (\theta_T^* - \theta_0) \xrightarrow{d} N(0, Id)$. Note that the shape $\xi_t^*$ of $A(\cdot, \xi_t^*)$ corresponds to the true conditional distribution $F_t^0(\cdot)$, which is stochastic and $W_t$-measurable, thereby satisfying (A2)(i). Moreover, $F_t^0(\cdot)$ is twice continuously differentiable with bounded $f_t^0(y)$ and $|df_t^0(y)/dy|$, which satisfies (A2)(ii) and (A4).
Moreover, since $F_t^0(\cdot)$ is bounded by 1, the moment conditions in (A7)(ii) hold automatically. Hence, we can apply Theorem 2 to show that, under (A0)-(A1), (A5)-(A6) and (A7)(i), $\theta_T^*$ with $A(\cdot, \xi_t^*)$ as in (21) is asymptotically normally distributed, $\sqrt{T} (\Sigma_T^0)^{-1/2} \Delta_T^0 (\theta_T^* - \theta_0) \xrightarrow{d} N(0, Id)$, with $\Delta_T^0 = T^{-1} \sum_{t=1}^T E\{[f_t^0(q_\alpha(W_t, \theta_0))]^2 \nabla_\theta q_\alpha(W_t, \theta_0) \nabla_\theta q_\alpha(W_t, \theta_0)'\}$ and $\Sigma_T^0 = \alpha(1-\alpha) \Delta_T^0$, so that $(\Delta_T^0)^{-1} \Sigma_T^0 (\Delta_T^0)^{-1} = V_T^0$. $\square$

Proof of Theorem 4. The following lemma shows that (i)-(iii) in Theorem 4 hold:

Lemma 7. The parametric submodel $\mathcal{P}^*$ defined by (6) is a submodel of $\mathcal{S}$.

In order to show that $\mathcal{P}^*$ is the least favorable model, consider estimating the parameter $\theta$ in $\mathcal{P}^*$ by the MLE $\tilde\theta_T^*$, which maximizes the log-likelihood $L_T(\theta) \equiv T^{-1} \sum_{t=1}^T \ln f_t^*(Y_t, \theta)$.

STEP 1: First, we establish the consistency of $\tilde\theta_T^*$ by checking that conditions (i)-(iv) of Theorem 2.1 in Newey and McFadden (1994) hold. Given (A0)(i), we know that $\ln f_t^*(Y_t, \theta) \neq \ln f_t^*(Y_t, \theta_0)$ a.s.-$P$ whenever $\theta \neq \theta_0$ (see, for example, Figure 1); this verifies the uniqueness condition (i) of Theorem 2.1. The compactness condition (ii) of Theorem 2.1 holds by assumption. Using $q_t(\theta) = q_\alpha(W_t, \theta)$, we have
$$\ln f_t^*(Y_t, \theta) = \ln[\alpha(1-\alpha) f_t^0(Y_t)] + \ln \lambda(\theta) + \lambda(\theta)[F_t^0(Y_t) - F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta) - Y_t) - \alpha] - \ln\big(1 - \exp\{\lambda(\theta)[\mathbb{1}(q_t(\theta) - Y_t) - \alpha][1 - \mathbb{1}(q_t(\theta) - Y_t) - F_t^0(q_t(\theta))]\}\big),$$
showing that $E[\ln f_t^*(Y_t, \theta)]$ is continuous on $\Theta$ and that $E[\sup_{\theta \in \Theta} |\ln f_t^*(Y_t, \theta)|^{r+\epsilon}] < \infty$ for all $t$, $1 \le t \le T$, $T \ge 1$, and some $\epsilon > 0$; this verifies condition (iii) of Theorem 2.1. We show the uniform convergence condition (iv) of Theorem 2.1 by following the same steps as in the proof of Theorem 3 in Komunjer (2005b). To simplify the notation, let
$$x(\theta) \equiv [\mathbb{1}(q_t(\theta) - Y_t) - \alpha][1 - \mathbb{1}(q_t(\theta) - Y_t) - F_t^0(q_t(\theta))] \quad \text{and} \quad u(z) \equiv \frac{\exp z}{1 - \exp z}, \qquad (22)$$
for $\theta \in \Theta$ and $z \in \mathbb{R}_-$. Note that $-1 < x(\theta) < 0$ and $-\lambda(\theta) < \lambda(\theta) x(\theta) < 0$ on $\Theta$ a.s.-$P$.
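As a sanity check on the submodel, the density implied by the log-likelihood $\ln f_t^*(Y_t, \theta)$ above should integrate to one and put mass exactly $\alpha$ below $q_t(\theta)$, whatever the value of $\theta$ (integrating each branch in closed form gives $\alpha$ and $1 - \alpha$). The Python sketch below checks both properties numerically, treating $q = q_t(\theta)$ and $\lambda = \lambda(\theta) > 0$ as fixed numbers and assuming, for illustration only, a standard normal $F_t^0$. It is a reconstruction from the displayed log-likelihood, not the paper's definition (6) itself; the function names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def F0(y):
    """Assumed true conditional cdf F_t^0: standard normal (illustration only)."""
    return 0.5 * (1.0 + _erf(np.asarray(y, dtype=float) / math.sqrt(2.0)))


def f0(y):
    y = np.asarray(y, dtype=float)
    return np.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi)


def f_star(y, q, lam, alpha):
    """exp of ln f_t^*(y, theta) as displayed above, with q = q_t(theta), lam = lambda(theta)."""
    y = np.asarray(y, dtype=float)
    ind = (y <= q).astype(float)                 # 1I(q_t(theta) - y)
    Fq = float(F0(q))
    x = (ind - alpha) * (1.0 - ind - Fq)         # x(theta) in (22), always negative
    kernel = np.exp(lam * (F0(y) - Fq) * (ind - alpha))
    return alpha * (1 - alpha) * f0(y) * lam * kernel / (-np.expm1(lam * x))


def trapezoid(values, grid):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


def masses(q, lam, alpha, lo=-12.0, hi=12.0, n=20001):
    """Mass of f_star below and above q; the density jumps at q, so integrate each side."""
    left = np.linspace(lo, q, n)
    right = np.linspace(np.nextafter(q, np.inf), hi, n)
    return trapezoid(f_star(left, q, lam, alpha), left), trapezoid(f_star(right, q, lam, alpha), right)


for q, lam, alpha in [(0.3, 2.0, 0.25), (-1.0, 0.5, 0.6)]:
    below, above = masses(q, lam, alpha)
    print(f"q={q}, lambda={lam}, alpha={alpha}: mass below q = {below:.6f}, total = {below + above:.6f}")
```

Both printed masses should match $\alpha$ and 1 up to quadrature error, which is what makes $q_t(\theta)$ the conditional $\alpha$-quantile at every point of the submodel.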
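We have
$$\begin{aligned} \nabla_\theta \ln f_t^*(Y_t, \theta) &= \frac{\nabla_\theta \lambda(\theta)}{\lambda(\theta)} + \nabla_\theta \lambda(\theta)[F_t^0(Y_t) - F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta) - Y_t) - \alpha] - \lambda(\theta) f_t^0(q_t(\theta)) \nabla_\theta q_t(\theta)[\mathbb{1}(q_t(\theta) - Y_t) - \alpha] \\ &\quad + \lambda(\theta)[F_t^0(Y_t) - F_t^0(q_t(\theta))] \delta(q_t(\theta) - Y_t) \nabla_\theta q_t(\theta) + u(\lambda(\theta) x(\theta)) \nabla_\theta(\lambda(\theta) x(\theta)), \end{aligned} \qquad (23)$$
where $\nabla_\theta(\lambda(\theta) x(\theta)) = \nabla_\theta \lambda(\theta) x(\theta) + \lambda(\theta) \nabla_\theta x(\theta)$ and
$$\nabla_\theta x(\theta) = \{f_t^0(q_t(\theta))[\alpha - \mathbb{1}(q_t(\theta) - Y_t)] + \delta(q_t(\theta) - Y_t)[\alpha - F_t^0(q_t(\theta))]\} \nabla_\theta q_t(\theta). \qquad (24)$$
(The equality in (24) follows from (22) and the fact that $[\mathbb{1}(\cdot)]^2 = \mathbb{1}(\cdot)$.) Note that $u(z) = -1/z - 1/2 + o(1)$ in the neighborhood of 0 and that $\lambda(\theta) x(\theta) = o_p(1)$ in the neighborhood of $\theta_0$, so
$$u(\lambda(\theta) x(\theta)) \nabla_\theta(\lambda(\theta) x(\theta)) = -\frac{\nabla_\theta(\lambda(\theta) x(\theta))}{\lambda(\theta) x(\theta)} + o_p(1) = -\frac{\nabla_\theta \lambda(\theta)}{\lambda(\theta)} - \frac{\nabla_\theta x(\theta)}{x(\theta)} + o_p(1) \qquad (25)$$
in the neighborhood of $\theta_0$. In particular, combining (23), (25), (24) and (22), we get
$$\begin{aligned} \nabla_\theta \ln f_t^*(Y_t, \theta_0) &= -\nabla_\theta q_t(\theta_0) \frac{f_t^0(q_t(\theta_0))[\alpha - \mathbb{1}(q_t(\theta_0) - Y_t)] + \delta(q_t(\theta_0) - Y_t)[\alpha - F_t^0(q_t(\theta_0))]}{[\mathbb{1}(q_t(\theta_0) - Y_t) - \alpha][1 - \mathbb{1}(q_t(\theta_0) - Y_t) - F_t^0(q_t(\theta_0))]} \\ &= -\frac{1}{\alpha(1-\alpha)} \nabla_\theta q_t(\theta_0) f_t^0(q_t(\theta_0))[\mathbb{1}(q_t(\theta_0) - Y_t) - \alpha], \end{aligned} \qquad (26)$$
where the second equality uses $x(\theta_0) = -\alpha(1-\alpha)$ and $F_t^0(q_t(\theta_0)) = \alpha$. Using $-1 < x(\theta) < 0$ on $\Theta$ a.s.-$P$, so that
$$\{1 + \lambda(\theta) x(\theta) u(\lambda(\theta) x(\theta))\} \Big| \frac{\nabla_\theta \lambda(\theta)}{\lambda(\theta)} \Big| \le |x(\theta) \nabla_\theta \lambda(\theta)|,$$
we then have
$$\sup_{\theta \in \Theta} |\nabla_\theta \ln f_t^*(Y_t, \theta)| \le 2 \sup_{\theta \in \Theta} |\nabla_\theta \lambda(\theta)| + \sup_{\theta \in \Theta} |\lambda(\theta)| M_0 |\nabla_\theta q_t(\theta)| + C_1 \sup_{\theta \in \Theta} \Big| \frac{f_t^0(q_t(\theta)) \nabla_\theta q_t(\theta)}{1 - \mathbb{1}(q_t(\theta) - Y_t) - F_t^0(q_t(\theta))} \Big|, \quad \text{a.s.-}P, \qquad (27)$$
where $C_1 \equiv \sup_{x \in [0, \sup_{\theta \in \Theta} \lambda(\theta)]} |x/(1 - \exp(-x))| < \infty$. We have $F_t^0(q_t(\theta)) \in (a, b)$ for all $t \ge 1$ and $\theta \in \Theta$, with $a > 0$ and $b < 1$, so $C_2 \equiv \sup_{t \ge 1} \sup_{y \in \mathbb{R}} \sup_{\theta \in \Theta} |1 - \mathbb{1}(q_t(\theta) - y) - F_t^0(q_t(\theta))|^{-1} < \infty$, and the last term of the above inequality is bounded above by $C_1 C_2 M_0 \sup_{\theta \in \Theta} |\nabla_\theta q_t(\theta)|$.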
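From (A7)(i) we know that $E[\sup_{\theta \in \Theta} |\nabla_\theta q_t(\theta)|] < \infty$, so $E[\sup_{\theta \in \Theta} |\nabla_\theta \ln f_t^*(Y_t, \theta)|] < \infty$ for all $t$, $1 \le t \le T$, $T \ge 1$, which shows that equation (25) in Komunjer (2005b) holds; together with (A6) and $E[\sup_{\theta \in \Theta} |\ln f_t^*(Y_t, \theta)|^{r+\epsilon}] < \infty$ for all $t$, $1 \le t \le T$, $T \ge 1$, this establishes condition (iv) of Theorem 2.1 and completes the proof of consistency.

STEP 2: We now show that the MLE $\tilde\theta_T^*$ is asymptotically normal by checking that conditions (i)-(v) of Theorem 7.2 in Newey and McFadden (1994), applied to $\nabla_\theta L_T(\theta)$, hold. We first establish the asymptotic first-order condition $\sqrt{T} \nabla_\theta L_T(\tilde\theta_T^*) \xrightarrow{p} 0$ by following the same steps as in the proof of Lemma A1 in Komunjer (2005b): for every $j = 1, \ldots, k$, let $\tilde G_{T,j}^*(h)$ be the right-derivative of $\tilde L_{T,j}^*(h) \equiv T^{-1} \sum_{t=1}^T \ln f_t^*(Y_t, \tilde\theta_T^* + h e_j)$, where $\{e_j\}_{j=1}^k$ is the standard basis of $\mathbb{R}^k$ and $h \in \mathbb{R}$ is such that $\tilde\theta_T^* + h e_j \in \Theta$ for all $j = 1, \ldots, k$. Since $\tilde L_{T,j}^*(0) = L_T(\tilde\theta_T^*)$ for every $j = 1, \ldots, k$, the functions $h \mapsto \tilde L_{T,j}^*(h)$ achieve their maximum at $h = 0$; hence, for $\varepsilon > 0$, $\tilde G_{T,j}^*(\varepsilon) \le \tilde G_{T,j}^*(0) \le \tilde G_{T,j}^*(-\varepsilon)$, with $\tilde G_{T,j}^*(\varepsilon) \le 0$ and $\tilde G_{T,j}^*(-\varepsilon) \ge 0$. Therefore $|\tilde G_{T,j}^*(0)| \le \tilde G_{T,j}^*(-\varepsilon) - \tilde G_{T,j}^*(\varepsilon)$. Taking the limit of this inequality as $\varepsilon \to 0$, we get
$$|\tilde G_{T,j}^*(0)| \le T^{-1} \sum_{t=1}^T \Big[ \Big| \frac{\partial \lambda(\tilde\theta_T^*)}{\partial \theta_j} \Big| + \Big| \lambda(\tilde\theta_T^*) f_t^0(q_t(\tilde\theta_T^*)) \frac{\partial q_t(\tilde\theta_T^*)}{\partial \theta_j} \Big| \Big] [1 + 2C_1]\, \mathbb{1}\{q_t(\tilde\theta_T^*) = Y_t\}.$$
Hence
$$P\big(\sqrt{T} |\nabla_\theta L_T(\tilde\theta_T^*)| > \varepsilon\big) \le P\Big(\sqrt{T} \max_{1 \le j \le k} |\tilde G_{T,j}^*(0)| > \varepsilon\Big) \le P\Big( \sum_{t=1}^T \Big[ \Big| \frac{\partial \lambda(\tilde\theta_T^*)}{\partial \theta_j} \Big| + \Big| \lambda(\tilde\theta_T^*) f_t^0(q_t(\tilde\theta_T^*)) \frac{\partial q_t(\tilde\theta_T^*)}{\partial \theta_j} \Big| \Big] \mathbb{1}\{q_t(\tilde\theta_T^*) = Y_t\} > \varepsilon \sqrt{T} (1 + 2C_1)^{-1} \Big).$$
The facts that $P(\mathbb{1}\{q_t(\tilde\theta_T^*) = Y_t\} \neq 0) = 0$ and that $E\big[\big|\partial \lambda(\tilde\theta_T^*)/\partial \theta_j\big| + \big|\lambda(\tilde\theta_T^*) f_t^0(q_t(\tilde\theta_T^*))\, \partial q_t(\tilde\theta_T^*)/\partial \theta_j\big|\big]$ is bounded then ensure that $\lim_{T \to \infty} P\big(\sqrt{T} |\nabla_\theta L_T(\tilde\theta_T^*)| > \varepsilon\big) = 0$. Condition (i) of Theorem 7.2 follows from the correct specification of $f_t^*(\cdot)$ (see (iii) in Theorem 4).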
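By (A5), $\theta_0$ is an interior point of $\Theta$, so condition (iii) of Theorem 7.2 holds. We now check the differentiability of $E[\nabla_\theta L_T(\theta)]$ and the nonsingularity condition (ii) of Theorem 7.2. We have $E[\nabla_\theta L_T(\theta)]=T^{-1}\sum_{t=1}^TE[\nabla_\theta\ln f_t^*(Y_t,\theta)]$. Using (23) and (24), the latter is easily shown to be differentiable at any $\theta\in\mathring\Theta$. We now show that $\nabla_\theta E[\nabla_\theta'L_T(\theta_0)]=T^{-1}\sum_{t=1}^TE[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]$ and that the latter is nonsingular. For $u(z)$ in (22) we have $du(z)/dz=u(z)+[u(z)]^2$. Hence, for any $t$, $1\le t\le T$, $T\ge1$,
$$
\begin{aligned}
\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta) ={}& \frac{\Delta_{\theta\theta}\lambda(\theta)}{\lambda(\theta)}-\frac{\nabla_\theta\lambda(\theta)\nabla_\theta\lambda(\theta)'}{[\lambda(\theta)]^2}+\Delta_{\theta\theta}\lambda(\theta)[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]\\
&+2\nabla_\theta\lambda(\theta)\nabla_\theta q_t(\theta)'\big\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[F_t^0(Y_t)-F_t^0(q_t(\theta))]\big\}\\
&+\lambda(\theta)\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Big\{\frac{df_t^0(q_t(\theta))}{dq}[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]-2f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)+\frac{d\delta(q_t(\theta)-Y_t)}{dq}[F_t^0(Y_t)-F_t^0(q_t(\theta))]\Big\}\\
&+\lambda(\theta)\Delta_{\theta\theta}q_t(\theta)\big\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+[F_t^0(Y_t)-F_t^0(q_t(\theta))]\delta(q_t(\theta)-Y_t)\big\}\\
&+\big[u(\lambda(\theta)x(\theta))+(u(\lambda(\theta)x(\theta)))^2\big](\nabla_\theta(\lambda(\theta)x(\theta)))(\nabla_\theta(\lambda(\theta)x(\theta)))'\\
&+u(\lambda(\theta)x(\theta))\Delta_{\theta\theta}(\lambda(\theta)x(\theta)),
\end{aligned}\tag{28}
$$
where $\Delta_{\theta\theta}(\lambda(\theta)x(\theta))=\Delta_{\theta\theta}\lambda(\theta)x(\theta)+2\nabla_\theta\lambda(\theta)\nabla_\theta x(\theta)'+\lambda(\theta)\Delta_{\theta\theta}x(\theta)$ and
$$
\begin{aligned}
\Delta_{\theta\theta}x(\theta) ={}& \Big\{\frac{df_t^0(q_t(\theta))}{dq}[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]-2f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)+\frac{d\delta(q_t(\theta)-Y_t)}{dq}[\alpha-F_t^0(q_t(\theta))]\Big\}\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\\
&+\big\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\big\}\Delta_{\theta\theta}q_t(\theta).
\end{aligned}
$$
Now note that $u(z)+[u(z)]^2=1/z^2-1/12+o(1)$ in the neighborhood of 0, so that
$$
\begin{aligned}
&\big[u(\lambda(\theta)x(\theta))+(u(\lambda(\theta)x(\theta)))^2\big](\nabla_\theta(\lambda(\theta)x(\theta)))(\nabla_\theta(\lambda(\theta)x(\theta)))'\\
&\quad=\frac{\nabla_\theta\lambda(\theta)\nabla_\theta\lambda(\theta)'}{[\lambda(\theta)]^2}+2\frac{\nabla_\theta\lambda(\theta)\nabla_\theta x(\theta)'}{\lambda(\theta)x(\theta)}+\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\left\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}^2+o_p(1),
\end{aligned}\tag{29}
$$
in the neighborhood of $\theta_0$.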
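Similarly,
$$
\begin{aligned}
u(\lambda(\theta)x(\theta))\Delta_{\theta\theta}(\lambda(\theta)x(\theta)) ={}& -\frac{\Delta_{\theta\theta}\lambda(\theta)}{\lambda(\theta)}-\frac12\Delta_{\theta\theta}\lambda(\theta)x(\theta)-2\frac{\nabla_\theta\lambda(\theta)\nabla_\theta x(\theta)'}{\lambda(\theta)x(\theta)}\\
&-\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\left\{\frac{df_t^0(q_t(\theta))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}-2\frac{f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)}{x(\theta)}+\frac{d\delta(q_t(\theta)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}\\
&-\Delta_{\theta\theta}q_t(\theta)\left\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}+o_p(1),
\end{aligned}\tag{30}
$$
in the neighborhood of $\theta_0$. Combining (28) with (29) and (30), we then get that, for any $t$, $1\le t\le T$, $T\ge1$,
$$
\begin{aligned}
\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta) ={}& \Delta_{\theta\theta}\lambda(\theta)\left\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac{x(\theta)}{2}\right\}\\
&+\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\left\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}^2\\
&-\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\left\{\frac{df_t^0(q_t(\theta))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}-2\frac{f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)}{x(\theta)}+\frac{d\delta(q_t(\theta)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}\\
&-\Delta_{\theta\theta}q_t(\theta)\left\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\right\}+o_p(1),
\end{aligned}\tag{31}
$$
in the neighborhood of $\theta_0$. Using $\alpha=F_t^0(q_t(\theta_0))$ and $x(\theta_0)=-\alpha(1-\alpha)$, we have
$$
|\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)| \le \frac52|\Delta_{\theta\theta}\lambda(\theta_0)| + |\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'|\left(\frac{M_0^2}{[\alpha(1-\alpha)]^2}+\frac{M_1}{\alpha(1-\alpha)}\right)+|\Delta_{\theta\theta}q_t(\theta_0)|\frac{M_0}{\alpha(1-\alpha)}+o_p(1),
$$
with $|\Delta_{\theta\theta}\lambda(\theta_0)|<\infty$. From (A7)(i) we have $E[|\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'|]<\infty$ and $E[|\Delta_{\theta\theta}q_t(\theta_0)|]<\infty$, so the expectation of the right-hand side of the above inequality is finite. Hence $\nabla_\theta E[\nabla_\theta'\ln f_t^*(Y_t,\theta_0)]=E[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]$ for any $t$, $1\le t\le T$, $T\ge1$, and so $\nabla_\theta E[\nabla_\theta'L_T(\theta_0)]=T^{-1}\sum_{t=1}^TE[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]$, as desired.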
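Now consider $E[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]$. For any $t$, $1\le t\le T$, $T\ge1$, we have
$$
\begin{aligned}
&E\left(\Delta_{\theta\theta}\lambda(\theta_0)\left\{[F_t^0(Y_t)-F_t^0(q_t(\theta_0))][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]-\frac{x(\theta_0)}{2}\right\}\right)\\
&\quad=\Delta_{\theta\theta}\lambda(\theta_0)\left[E\big([F_t^0(Y_t)-\alpha][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]\big)+\frac12\alpha(1-\alpha)\right]\\
&\quad=\Delta_{\theta\theta}\lambda(\theta_0)\left[-\frac12\alpha(1-\alpha)+\frac12\alpha(1-\alpha)\right]=0,
\end{aligned}
$$
since
$$
\begin{aligned}
E_t\big([F_t^0(Y_t)-\alpha][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]\big) &= (1-\alpha)\int_{-\infty}^{q_t(\theta_0)}[F_t^0(y)-\alpha]f_t^0(y)\,dy-\alpha\int_{q_t(\theta_0)}^{+\infty}[F_t^0(y)-\alpha]f_t^0(y)\,dy\\
&=(1-\alpha)\Big[\tfrac12[F_t^0(y)-\alpha]^2\Big]_{-\infty}^{q_t(\theta_0)}-\alpha\Big[\tfrac12[F_t^0(y)-\alpha]^2\Big]_{q_t(\theta_0)}^{+\infty}
=-\frac12\alpha(1-\alpha).
\end{aligned}
$$
In addition, $\alpha=F_t^0(q_t(\theta_0))$ and $x(\theta_0)=-\alpha(1-\alpha)$, so
$$
\begin{aligned}
&E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\left\{f_t^0(q_t(\theta_0))\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}+\delta(q_t(\theta_0)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\right\}^2\right)\\
&\quad=E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'E_t\left\{\frac{[f_t^0(q_t(\theta_0))]^2[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]^2}{\alpha^2(1-\alpha)^2}\right\}\right)
=E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\right),
\end{aligned}
$$
where the last equality uses $E_t([\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]^2)=\alpha(1-\alpha)$, a.s.-$P$. Similarly,
$$
\begin{aligned}
&E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\left\{\frac{df_t^0(q_t(\theta_0))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}-2\frac{f_t^0(q_t(\theta_0))\delta(q_t(\theta_0)-Y_t)}{x(\theta_0)}+\frac{d\delta(q_t(\theta_0)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\right\}\right)\\
&\quad=E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'E_t\left\{\frac{df_t^0(q_t(\theta_0))}{dq}\frac{[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]}{\alpha(1-\alpha)}+2\frac{f_t^0(q_t(\theta_0))\delta(q_t(\theta_0)-Y_t)}{\alpha(1-\alpha)}\right\}\right)
=2E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\right),
\end{aligned}
$$
where the last equality uses $E_t(\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha)=0$, a.s.-$P$, and $E_t(\delta(q_t(\theta_0)-Y_t))=f_t^0(q_t(\theta_0))$, a.s.-$P$. Finally, the same reasoning gives
$$
E\left(\Delta_{\theta\theta}q_t(\theta_0)\left\{f_t^0(q_t(\theta_0))\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}+\delta(q_t(\theta_0)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\right\}\right)=0.
$$
Combining the above results with (31) then yields
$$
E[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]=-E\left(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\right),\tag{32}
$$
for all $t$, $1\le t\le T$, $T\ge1$.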
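Two conditional moments drive (32): $E_t([F_t^0(Y_t)-\alpha][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha])=-\alpha(1-\alpha)/2$ and $E_t([\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]^2)=\alpha(1-\alpha)$. Neither depends on $F_t^0$. Because $f_t^0>0$, the variable $U=F_t^0(Y_t)$ is Uniform(0,1) given $W_t$, and $\{Y_t\le q_t(\theta_0)\}=\{U\le\alpha\}$. The sketch below relies only on that fact. It evaluates both moments with a midpoint rule on $[0,1]$. With $\alpha n$ an integer, $\alpha$ falls on a cell boundary, so the rule is exact up to rounding for these piecewise-linear integrands.

```python
def conditional_moments(alpha: float, n: int = 20_000) -> tuple[float, float]:
    """Midpoint-rule values of E[(U - a)(1{U <= a} - a)] and E[(1{U <= a} - a)^2], U ~ Uniform(0, 1).

    U plays the role of F_t^0(Y_t), and 1I(q_t(theta0) - Y_t) = 1{Y_t <= q_t(theta0)} = 1{U <= alpha}.
    """
    cross = 0.0
    square = 0.0
    for i in range(n):
        u = (i + 0.5) / n                      # midpoint of the i-th cell of [0, 1]
        ind = 1.0 if u <= alpha else 0.0
        cross += (u - alpha) * (ind - alpha)
        square += (ind - alpha) ** 2
    return cross / n, square / n


for alpha in (0.1, 0.25, 0.5, 0.9):
    cross, square = conditional_moments(alpha)
    print(alpha, cross, -0.5 * alpha * (1 - alpha), square, alpha * (1 - alpha))
```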
Hence, for any $\chi\in\mathbb R^k$,
$$
\chi'\nabla_\theta E[\nabla_\theta'L_T(\theta_0)]\chi=-T^{-1}\sum_{t=1}^TE\left(|\nabla_\theta q_t(\theta_0)'\chi|^2\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\right)\le0,
$$
with equality if and only if $\chi=0$. Hence $\nabla_\theta E[\nabla_\theta'L_T(\theta_0)]$ is negative definite, and therefore nonsingular. We now check condition (iv) of Theorem 7.2 using a CLT for $\alpha$-mixing sequences (e.g., Theorem 5.20 in White, 2001, p. 130). By (A6), for any $\theta\in\mathring\Theta$, the sequence $\{\nabla_\theta\ln f_t^*(Y_t,\theta)\}$ is strong mixing (i.e., $\alpha$-mixing) with $\alpha$ of size $-r/(r-2)$, $r>2$ (see, e.g., Theorem 3.49 in White, 2001, p. 50). Moreover, (23) and (A1) give $E[\nabla_\theta\ln f_t^*(Y_t,\theta_0)]=0$, and (A7)(i) gives $E[|\nabla_\theta\ln f_t^*(Y_t,\theta_0)|^r]\le\{M_0/[\alpha(1-\alpha)]\}^rE[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|^r]<\infty$, for all $t$, $1\le t\le T$, $T\ge1$. Now,
$$
\begin{aligned}
\operatorname{Var}\Big(T^{-1/2}\sum_{t=1}^T\nabla_\theta\ln f_t^*(Y_t,\theta_0)\Big)&=E\Big(T^{-1}\sum_{t=1}^T\nabla_\theta\ln f_t^*(Y_t,\theta_0)\nabla_\theta\ln f_t^*(Y_t,\theta_0)'\Big)\\
&=E\left(T^{-1}\sum_{t=1}^T\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]^2}{[\alpha(1-\alpha)]^2}\right)=V_T^0,
\end{aligned}
$$
where the first equality uses $E_t(\nabla_\theta\ln f_t^*(Y_t,\theta_0))=0$, a.s.-$P$, which is implied by (A1), and the last equality uses $E_t([\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]^2)=\alpha(1-\alpha)$, a.s.-$P$. Applying Theorem 5.20 in White (2001), we then have $(V_T^0)^{-1/2}\sqrt T\,\nabla_\theta L_T(\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, with $V_T^0$ as defined in Theorem 3. Finally, we check the stochastic equicontinuity condition (v) of Theorem 7.2 by verifying that all the assumptions of Theorem 7.3 in Newey and McFadden (1994) hold. (The main reason for using Theorem 7.3 is that it places no restrictions on the dependence structure of $\{(Y_t,W_t')'\}$.) For any $t$, $1\le t\le T$, $T\ge1$, let $r_t(\theta)=|\nabla_\theta\ln f_t^*(Y_t,\theta)-\nabla_\theta\ln f_t^*(Y_t,\theta_0)-\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)'(\theta-\theta_0)|/|\theta-\theta_0|$, for $\theta\in\mathring\Theta$.
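The score (26) and the variance $V_T^0$ computed above can be checked by simulation. The sketch below uses a hypothetical toy model that is not from the paper. $W_t$ is a stationary Gaussian AR(1) with unit variance and $Y_t=\theta_0W_t+e_t$ with $e_t\sim N(0,1)$. Then $q_t(\theta)=z_\alpha+\theta W_t$, $\nabla_\theta q_t=W_t$ and $f_t^0(q_t(\theta_0))=\phi(z_\alpha)$, so $V_T^0=\phi(z_\alpha)^2/[\alpha(1-\alpha)]$. The sample mean of the score should be near 0 and its second moment near $V_T^0$.

```python
import math
import random
from statistics import NormalDist


def score_moments(alpha=0.25, theta0=0.8, rho=0.5, T=100_000, seed=0):
    """Sample mean and second moment of the score (26) in a hypothetical linear quantile model."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)
    f_q = std_normal.pdf(z_alpha)                  # f_t^0(q_t(theta0)) in the toy model
    w = rng.gauss(0.0, 1.0)                        # W_0 drawn from the stationary law
    s_sum = s2_sum = 0.0
    for _ in range(T):
        w = rho * w + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)   # AR(1), Var(W_t) = 1
        y = theta0 * w + rng.gauss(0.0, 1.0)
        q = z_alpha + theta0 * w                   # true conditional alpha-quantile
        ind = 1.0 if q - y >= 0 else 0.0           # 1I(q_t(theta0) - Y_t)
        s = -w * f_q * (ind - alpha) / (alpha * (1.0 - alpha))   # score (26), grad q_t = W_t
        s_sum += s
        s2_sum += s * s
    v_theory = f_q ** 2 / (alpha * (1.0 - alpha))  # V_T^0 with E[W_t^2] = 1
    return s_sum / T, s2_sum / T, v_theory


mean_s, second_moment, v = score_moments()
print(mean_s, second_moment, v)
```

The Monte Carlo second moment matches $V_T^0$ only up to simulation error of about 1% at this sample size.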
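Using $u(z)=-1/z-1/2+o(1)$ in the neighborhood of 0 and $\lambda(\theta)x(\theta)=o_p(|\theta-\theta_0|)$ in the neighborhood of $\theta_0$, we have, from (23), (26) and (28), $r_t(\theta)\le r_t^{(1)}(\theta)+r_t^{(2)}(\theta)+r_t^{(3)}(\theta)+o_p(1)$, where
$$
\begin{aligned}
r_t^{(1)}(\theta)&\equiv\left|[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac{x(\theta)}{2}\right|\frac{|\nabla_\theta\lambda(\theta)-\Delta_{\theta\theta}\lambda(\theta)'(\theta-\theta_0)|}{|\theta-\theta_0|},\\
r_t^{(2)}(\theta)&\equiv\left|\frac{f_t^0(q_t(\theta))}{2}[\mathbb{1}(q_t(\theta)-Y_t)-\alpha]+\delta(q_t(\theta)-Y_t)\left[\frac{F_t^0(q_t(\theta))}{2}-F_t^0(Y_t)+\frac\alpha2\right]\right|\frac{|\lambda(\theta)\nabla_\theta q_t(\theta)|}{|\theta-\theta_0|},\\
r_t^{(3)}(\theta)&\equiv\left|\frac{\nabla_\theta x(\theta)}{x(\theta)}-\frac{\nabla_\theta x(\theta_0)}{x(\theta_0)}-\frac{\Delta_{\theta\theta}x(\theta)'(\theta-\theta_0)}{x(\theta)}+\frac{\nabla_\theta x(\theta)\nabla_\theta x(\theta)'(\theta-\theta_0)}{[x(\theta)]^2}\right|\Big/|\theta-\theta_0|.
\end{aligned}
$$
With probability one, $r_t^{(1)}(\theta)\le2|\nabla_\theta\lambda(\theta)-\Delta_{\theta\theta}\lambda(\theta)'(\theta-\theta_0)|/|\theta-\theta_0|$ for any $\theta\in\mathring\Theta$. Since $\lambda(\cdot)$ is twice continuously differentiable on $\mathbb R^k$, with probability one $r_t^{(1)}(\theta)\to0$ as $\theta\to\theta_0$, and there exists $\varepsilon_1>0$ such that
$$
E\Big(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}r_t^{(1)}(\theta)\Big)<\infty.\tag{33}
$$
Now note that $|f_t^0(q_t(\theta))[\mathbb{1}(q_t(\theta)-Y_t)-\alpha]|\le M_0$ for any $\theta\in\mathring\Theta$, so
$$
\begin{aligned}
r_t^{(2)}(\theta)&\le\frac12\big\{M_0+\delta(q_t(\theta)-Y_t)[F_t^0(q_t(\theta))-2F_t^0(Y_t)+\alpha]\big\}\frac{|\lambda(\theta)\nabla_\theta q_t(\theta)|}{|\theta-\theta_0|}\\
&\le\frac12\big\{M_0+\delta(q_t(\theta)-Y_t)[F_t^0(q_t(\theta))-2F_t^0(Y_t)+\alpha]\big\}|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|
\end{aligned}
$$
for some $\theta_c\equiv c\theta_0+(1-c)\theta$ with $c\in(0,1)$. Three facts then give $r_t^{(2)}(\theta)\to0$ with probability one as $\theta\to\theta_0$: $\nabla_\theta\lambda(\cdot)$ is continuous on $\mathbb R^k$, $\nabla_\theta\lambda(\theta_0)=0$, and $\delta(q_t(\theta_0)-Y_t)[F_t^0(q_t(\theta_0))-2F_t^0(Y_t)+\alpha]=0$. Moreover, for some $\theta_d\equiv d\theta_0+(1-d)\theta$, $d\in(0,1)$, with all suprema taken over $\theta\in\mathring\Theta$ with $|\theta-\theta_0|<\varepsilon_1$,
$$
\begin{aligned}
E\Big(\sup r_t^{(2)}(\theta)\Big)
&\le E\left(\sup\Big\{\frac{M_0}{2}+E_t\Big(\delta(q_t(\theta)-Y_t)\Big|\frac{F_t^0(q_t(\theta))}{2}-F_t^0(Y_t)+\frac\alpha2\Big|\Big)\Big\}\times|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\right)\\
&\le\frac{M_0}{2}E\Big(\sup|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\Big)+\frac12E\Big(\sup|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\cdot|\alpha-F_t^0(q_t(\theta))|f_t^0(q_t(\theta))\Big)\\
&\le\frac{M_0}{2}\sup|\nabla_\theta\lambda(\theta_c)|\times\Big[E\Big(\sup|\nabla_\theta q_t(\theta)|\Big)+M_0E\Big(\sup|\nabla_\theta q_t(\theta)|\cdot|\nabla_\theta q_t(\theta_d)|\Big)\Big]<\infty,
\end{aligned}\tag{34}
$$
where the last inequality uses the continuity of $\nabla_\theta\lambda(\cdot)$ on $\mathbb R^k$, (A7)(i) and the Cauchy-Schwarz inequality. Finally, let $r_x(\theta)=[x(\theta_0)-x(\theta)-\nabla_\theta x(\theta)'(\theta_0-\theta)]/|\theta_0-\theta|$ and $R_x(\theta)=[\nabla_\theta x(\theta_0)-\nabla_\theta x(\theta)-\Delta_{\theta\theta}x(\theta)'(\theta_0-\theta)]/|\theta_0-\theta|$. With probability one, both $\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|r_x(\theta)|$ and $\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|R_x(\theta)|$ tend to 0 as $\theta\to\theta_0$. This implies that with probability one $r_t^{(3)}(\theta)\to0$ as $\theta\to\theta_0$. Moreover, with the same range for the suprema,
$$
\begin{aligned}
E\Big(\sup r_t^{(3)}(\theta)\Big)&\le E\Big(\sup[|r_x(\theta)|+|R_x(\theta)|]/|x(\theta)|\Big)\\
&\le E\Big(\sup[1/|x(\theta)|]\Big(\sup|r_x(\theta)|+\sup|R_x(\theta)|\Big)\Big)<\infty,
\end{aligned}\tag{35}
$$
The last inequality uses the fact that $\sup_{t\ge1}\sup_{\theta\in\Theta}F_t^0(q_t(\theta))\in(a,b)$ with $a>0$ and $b<1$, so $C_3\equiv\sup_{t\ge1}\sup_{y\in\mathbb R}\sup_{\theta\in\Theta}\big|[\mathbb{1}(q_t(\theta)-y)-\alpha][1-\mathbb{1}(q_t(\theta)-y)-F_t^0(q_t(\theta))]\big|^{-1}<\infty$. Combining (33)-(35) then shows that with probability one $r_t(\theta)\to0$ as $\theta\to\theta_0$, and that $E(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}r_t(\theta))<\infty$. It remains to show that for all $\theta$ in a neighborhood of $\theta_0$ we have $T^{-1}\sum_{t=1}^T\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)\xrightarrow{p}\nabla_\theta E[\nabla_\theta'L_T(\theta)]$. By (A6), for any $\theta\in\mathring\Theta$, the sequence $\{\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)\}$ is strong mixing (i.e., $\alpha$-mixing) with $\alpha$ of size $-r/(r-2)$, $r>2$ (see, e.g., Theorem 3.49 in White, 2001, p. 50). Now note that, given $\theta\in\mathring\Theta$, there exists $\theta_a=a\theta_0+(1-a)\theta$, $a\in(0,1)$, such that for any $\eta>0$
$$
\begin{aligned}
P\big(\delta(q_t(\theta)-Y_t)|\alpha-F_t^0(q_t(\theta))|>\eta\big)&\le E\big(|\alpha-F_t^0(q_t(\theta))|f_t^0(q_t(\theta))\big)/\eta\\
&\le|\theta-\theta_0|E\big(|\nabla_\theta q_t(\theta_a)|f_t^0(q_t(\theta_a))f_t^0(q_t(\theta))\big)/\eta\\
&\le|\theta-\theta_0|M_0^2E[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|]/\eta,
\end{aligned}\tag{36}
$$
so that, in a neighborhood of $\theta_0$, $\delta(q_t(\theta)-Y_t)|\alpha-F_t^0(q_t(\theta))|=o_p(1)$.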
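Similarly,
$$
\begin{aligned}
P\big([d\delta(q_t(\theta)-Y_t)/dq]\,|\alpha-F_t^0(q_t(\theta))|>\eta\big)&\le E\big(|\alpha-F_t^0(q_t(\theta))|\,df_t^0(q_t(\theta))/dq\big)/\eta\\
&\le|\theta-\theta_0|M_0M_1E[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|]/\eta,
\end{aligned}\tag{37}
$$
where the first inequality uses the fact that $E_t(d\delta(q_t(\theta)-Y_t)/dq)=df_t^0(q_t(\theta))/dq$, a.s.-$P$. From (31) we have that, for any $t$, $1\le t\le T$, $T\ge1$,
$$
\begin{aligned}
\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\Delta_{\theta\theta}\lambda(\theta)\left\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac{x(\theta)}{2}\right\}\\
&+\frac{\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'}{[x(\theta)]^2}\Big\{\big(f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\big)^2+\big(\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\big)^2\\
&\qquad-x(\theta)[df_t^0(q_t(\theta))/dq][\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+x(\theta)[d\delta(q_t(\theta)-Y_t)/dq][\alpha-F_t^0(q_t(\theta))]\Big\}\\
&-\frac{\Delta_{\theta\theta}q_t(\theta)}{x(\theta)}\big\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\big\}+o_p(1),
\end{aligned}
$$
in a neighborhood of $\theta_0$. Combined with (36) and (37), this gives
$$
\begin{aligned}
\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\Delta_{\theta\theta}\lambda(\theta)\left\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac{x(\theta)}{2}\right\}\\
&+\frac{\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'}{[x(\theta)]^2}\Big\{\big(f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\big)^2-x(\theta)[df_t^0(q_t(\theta))/dq][\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\Big\}\\
&-\frac{\Delta_{\theta\theta}q_t(\theta)}{x(\theta)}f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+o_p(1),
\end{aligned}
$$
so that, for a given $\varepsilon>0$, there is a positive constant $n_{r,\varepsilon}$ such that
$$
|\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)|^{r+\varepsilon}\le n_{r,\varepsilon}\Big(|\Delta_{\theta\theta}\lambda(\theta)|^{r+\varepsilon}(5/2)^{r+\varepsilon}+|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|^{r+\varepsilon}C_3^{2(r+\varepsilon)}\{M_0^2+M_1\}^{r+\varepsilon}+|\Delta_{\theta\theta}q_t(\theta)|^{r+\varepsilon}C_3^{r+\varepsilon}M_0^{r+\varepsilon}\Big)+o_p(1),
$$
in a neighborhood of $\theta_0$. Using (A7)(i) and the fact that $|\Delta_{\theta\theta}\lambda(\theta)|<\infty$ in a neighborhood of $\theta_0$, we therefore have $E[|\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)|^{r+\varepsilon}]<\infty$. The weak LLN then follows from Corollary 3.48 in White (2001). This completes the proof of asymptotic normality of the MLE $\tilde\theta_T^*$. □

Proof of Lemma 7. We proceed in two steps.

STEP 1: To prove (i) and (iii), we start by showing that for any $\theta\in\Theta\setminus\{\theta_0\}$, the function $f_t^*(\cdot,\theta)$ in (6) is a probability density, for all $t$, $1\le t\le T$, $T\ge1$. First, note that for any $\theta\in\Theta\setminus\{\theta_0\}$, $f_t^*(\cdot,\theta)$ is continuous and $f_t^*(\cdot,\theta)>0$ on $\mathbb R$. Thus it suffices to show that $\int_{\mathbb R}f_t^*(y,\theta)\,dy=1$.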
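Consider the change of variable $u\equiv\lambda(\theta)F_t^0(y)$. The map $y\mapsto\lambda(\theta)F_t^0(y)$ is strictly increasing because $\lambda(\theta)=\Lambda(\theta-\theta_0)>0$ and $f_t^0(\cdot)$ is strictly positive, and $du=\lambda(\theta)f_t^0(y)\,dy$. To simplify the notation, we let $q_t(\theta)\equiv q_\alpha(W_t,\theta)$. Noting that $\mathbb{1}(q_t(\theta)-y)=\mathbb{1}[\lambda(\theta)F_t^0(q_t(\theta))-u]$, we have
$$
\begin{aligned}
\int_{\mathbb R}f_t^*(y,\theta)\,dy={}&\int_0^{\lambda(\theta)F_t^0(q_t(\theta))}\frac{\alpha(1-\alpha)\exp\{(1-\alpha)[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\,du
+\int_{\lambda(\theta)F_t^0(q_t(\theta))}^{\lambda(\theta)}\frac{\alpha(1-\alpha)\exp\{-\alpha[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp\{-\alpha\lambda(\theta)[1-F_t^0(q_t(\theta))]\}}\,du\\
={}&\frac{\alpha\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\Big[\exp[(1-\alpha)u]\Big]_0^{\lambda(\theta)F_t^0(q_t(\theta))}
+\frac{(1-\alpha)\exp[\alpha\lambda(\theta)F_t^0(q_t(\theta))]}{1-\exp\{-\alpha\lambda(\theta)[1-F_t^0(q_t(\theta))]\}}\Big[-\exp(-\alpha u)\Big]_{\lambda(\theta)F_t^0(q_t(\theta))}^{\lambda(\theta)}\\
={}&\alpha+(1-\alpha)=1,
\end{aligned}
$$
which shows that $f_t^*(\cdot,\theta)$ is a probability density for any $\theta\in\Theta\setminus\{\theta_0\}$. We now show that this is also true for $\theta_0$ and that $f_t^*(\cdot,\theta_0)=f_t^0(\cdot)$. For this, let
$$
P_t(\theta)\equiv\alpha(1-\alpha)\lambda(\theta)\exp\{\lambda(\theta)[F_t^0(y)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-y)-\alpha]\},\tag{38}
$$
$$
Q_t(\theta)\equiv1-\exp\{\lambda(\theta)[1-F_t^0(q_t(\theta))-\mathbb{1}(q_t(\theta)-y)][\mathbb{1}(q_t(\theta)-y)-\alpha]\},\tag{39}
$$
so that $f_t^*(y,\theta)=f_t^0(y)P_t(\theta)/Q_t(\theta)$. By (A0)(ii), the functions $P_t$ and $Q_t$ are at least twice continuously differentiable on $\Theta$ a.s.-$P$. Thus, for every $(\theta,\theta_0)\in\Theta^2$, we can write their respective second-order Taylor expansions as
$$
P_t(\theta)=\sum_{|l|\le2}\frac{D^lP_t(\theta_0)}{l!}(\theta-\theta_0)^l+o(|\theta-\theta_0|^2),\tag{40}
$$
$$
Q_t(\theta)=\sum_{|l|\le2}\frac{D^lQ_t(\theta_0)}{l!}(\theta-\theta_0)^l+o(|\theta-\theta_0|^2).\tag{41}
$$
Straightforward though lengthy computations show that, for any function $\lambda(\theta)=\Lambda(\theta-\theta_0)$ such that $\nabla_\theta\Lambda(0)=0$ and $\Delta_{\theta\theta}\Lambda(0)$ is nonsingular, we have
$$
P_t(\theta_0)=0,\quad D^1P_t(\theta_0)=0,\quad D^2P_t(\theta_0)=\alpha(1-\alpha)D^2\lambda(\theta_0),\tag{42}
$$
and
$$
Q_t(\theta_0)=0,\quad D^1Q_t(\theta_0)=0,\quad D^2Q_t(\theta_0)=\alpha(1-\alpha)D^2\lambda(\theta_0).\tag{43}
$$
Hence
$$
P_t(\theta)=\tfrac12\alpha(1-\alpha)D^2\lambda(\theta_0)(\theta-\theta_0)^2+o(|\theta-\theta_0|^2),\tag{44}
$$
$$
Q_t(\theta)=\tfrac12\alpha(1-\alpha)D^2\lambda(\theta_0)(\theta-\theta_0)^2+o(|\theta-\theta_0|^2).\tag{45}
$$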
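The two integrals above can be checked numerically, together with the moment restriction verified in Step 2: the mass of $f_t^*(\cdot,\theta)$ below $q_t(\theta)$ equals $\alpha$. The sketch below makes several hypothetical choices. $F_t^0$ is the standard normal CDF (the argument above uses only the change of variable $u=\lambda(\theta)F_t^0(y)$). Values of $\lambda(\theta)$ and $q_t(\theta)$ are fixed directly rather than through $\Lambda$ and $q_\alpha$. The density $f_t^*=f_t^0P_t/Q_t$ from (38)-(39) is integrated with Simpson's rule separately on each side of $q_t(\theta)$, where the indicator switches branch. The sketch also checks the limit $f_t^*/f_t^0\to1$ as $\lambda(\theta)\to0$ when $F_t^0(q_t(\theta))=\alpha$, which is the case used for $\theta\to\theta_0$.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # hypothetical choice: F_t^0 = standard normal CDF, f_t^0 = its density


def f_star_piece(y, lam, q, alpha, below):
    """f_t^*(y, theta) = f_t^0(y) P_t(theta) / Q_t(theta), with P_t and Q_t as in (38)-(39).

    lam stands for lambda(theta) > 0 and q for q_t(theta); `below` fixes the branch of
    1I(q_t(theta) - y): 1 on y <= q and 0 on y > q, so each piece is smooth in y.
    """
    ind = 1.0 if below else 0.0
    F_y, F_q = STD.cdf(y), STD.cdf(q)
    p = alpha * (1.0 - alpha) * lam * math.exp(lam * (F_y - F_q) * (ind - alpha))
    denom = -math.expm1(lam * (1.0 - F_q - ind) * (ind - alpha))   # Q_t = 1 - exp(...)
    return STD.pdf(y) * p / denom


def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def lemma7_masses(lam, q, alpha, lo=-12.0, hi=12.0):
    """Return (mass below q, total mass) of f_t^*(., theta)."""
    below = simpson(lambda y: f_star_piece(y, lam, q, alpha, True), lo, q)
    above = simpson(lambda y: f_star_piece(y, lam, q, alpha, False), q, hi)
    return below, below + above


def ratio_to_f0(y, lam, alpha):
    """f_t^*(y) / f_t^0(y) when q_t is the true alpha-quantile, i.e. F_t^0(q_t) = alpha."""
    q = STD.inv_cdf(alpha)
    return f_star_piece(y, lam, q, alpha, y <= q) / STD.pdf(y)


print(lemma7_masses(lam=3.0, q=0.4, alpha=0.25))   # close to (0.25, 1.0)
print([ratio_to_f0(y, 1e-6, 0.25) for y in (-1.0, 0.3, 2.0)])
```

The mass below $q_t(\theta)$ equals $\alpha$ even though $F_t^0(q_t(\theta))\ne\alpha$ here. This is the tilt that Step 2 exploits.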
Given the nonsingularity of $\Delta_{\theta\theta}\Lambda(0)$, an immediate consequence of l'Hôpital's rule and (44)-(45) is that $\lim_{\theta\to\theta_0}P_t(\theta)/Q_t(\theta)=1$. Hence, by a.s.-$P$ continuity of $f_t^*(y,\cdot)$ on $\Theta$, we have, for any $y\in\mathbb R$, $f_t^*(y,\theta_0)=\lim_{\theta\to\theta_0}f_t^*(y,\theta)=f_t^0(y)$. This shows that $f_t^*(\cdot,\theta)$ is a probability density for any $\theta\in\Theta$, and that $f_t^*(\cdot,\theta_0)=f_t^0(\cdot)$, so that $f^0\in\mathcal P^*$, as desired.

STEP 2: It remains to show that the parametric model $\mathcal P^*$ satisfies the conditional moment restriction in (ii) for all $\theta\in\Theta$. This restriction clearly holds when $\theta=\theta_0$, because $f_t^*(\cdot,\theta_0)=f_t^0(\cdot)$ and $[\theta_0,f_t^0(\cdot)]$ satisfies (A1) by assumption. When $\theta\ne\theta_0$, using again the change of variable $u\equiv\lambda(\theta)F_t^0(y)$, we have
$$
E_\theta[\mathbb{1}(q_t(\theta)-Y_t)|W_t]=\int_{-\infty}^{q_t(\theta)}f_t^*(y,\theta)\,dy=\int_0^{\lambda(\theta)F_t^0(q_t(\theta))}\frac{\alpha(1-\alpha)\exp\{(1-\alpha)[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\,du=\alpha.
$$
□

Proof of Theorem 5. From Theorem 3 we know that $\theta_T^*$, which minimizes $\Psi_T^*(\theta)$, is consistent for $\theta_0$. Thus, to establish the consistency of $\hat\theta_T$, it suffices to show that $\hat\Psi_T(\theta)-\Psi_T^*(\theta)$ converges uniformly (in $\theta$) to zero, i.e., $\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T^*(\theta)|=o_p(1)$. For this we need a uniform consistency property of $D^\lambda\hat G(\cdot,\cdot)$, where $D^\lambda$ denotes the $\lambda$th derivative with respect to $y$.

Lemma 8. Suppose that (A6), (A8)-(A9) and (A11) hold. Then $\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H^0_{\lambda T}(y,w)|=O_p[1/(\sqrt T h^\lambda_{yT}h^m_{wT})]+O_p(h^R_{wT})+O_p(h^R_{yT})$, where $H^0_{\lambda T}(y,w)\equiv D^\lambda F^0(y|w)\bar g^0_T(w)$ and $\lambda=0,1,2$.

We will also need the uniform consistency of $\hat g(\cdot)$ for $\bar g^0_T(\cdot)\equiv(1/T)\sum_{t=1}^Tg_t^0(\cdot)$:
$$
\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|=O_p[1/(\sqrt T h^m_{wT})]+O_p(h^R_{wT}),\tag{46}
$$
which follows from Theorem 1(a) in Andrews (1995) with $\eta=\infty$, given (A6), (A8) and (A11)(i)-(ii). We let $q_t(\theta)=q_\alpha(W_t,\theta)$ as previously, and set $\bar b_T\equiv b_T+\varepsilon_T$ and $d_t\equiv\mathbb{1}[\bar g^0_T(W_t)-\bar b_T]$. Let $\Psi_T(\theta)$ be $\hat\Psi_T(\theta)$ with $\hat F_t(\cdot)\equiv\hat d_t\hat F(\cdot|W_t)$, where $\hat d_t\equiv\mathbb{1}[\hat g(W_t)-b_T]$, replaced by $d_t\hat F(\cdot|W_t)$, i.e.
$$
\Psi_T(\theta)=\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-\hat F(q_t(\theta)|W_t)],\tag{47}
$$
where $\{\varepsilon_T\}>0$ is an appropriate vanishing sequence. The remainder of the proof adapts the consistency proof of Theorem 1 in Lavergne and Vuong (1996). Let $\varepsilon_T$ be such that $\varepsilon_T=o(b_T)$, $\varepsilon_T\sqrt T h^m_{wT}\to\infty$, $\varepsilon_T/h^R_{wT}\to\infty$ and $\varepsilon_T/h^R_{yT}\to\infty$. Since
$$
\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T^*(\theta)|\le\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T(\theta)|+\sup_{\theta\in\Theta}|\Psi_T(\theta)-\Psi_T^*(\theta)|,
$$
where $\Psi_T(\theta)$ is defined in Equation (47), it suffices to prove that both terms on the right-hand side are $o_p(1)$. Given Lemma 8 and Equation (46), we will use
$$
a_T^{-1}\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H^0_{0T}(y,w)|=o_p(1),\tag{48}
$$
$$
a_T^{-1}\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|=o_p(1),\tag{49}
$$
which hold for any sequence $\{a_T\}$ satisfying $a_T\sqrt T h^m_{wT}\to\infty$, $a_T/h^R_{wT}\to\infty$ and $a_T/h^R_{yT}\to\infty$. We will also use the identity
$$
\hat F(y|w)-F^0(y|w)=\frac{1}{\hat g(w)}[\hat G(y,w)-H^0_{0T}(y,w)]-\frac{F^0(y|w)}{\hat g(w)}[\hat g(w)-\bar g^0_T(w)].\tag{50}
$$

STEP 1: We first show that $\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T(\theta)|=o_p(1)$. We have
$$
\hat\Psi_T(\theta)-\Psi_T(\theta)=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-\hat F(q_t(\theta)|W_t)]=\Delta\hat\Psi_{1T}-\Delta\hat\Psi_{2T}+\Delta\hat\Psi_{3T},
$$
where $J_t=\hat d_t(1-d_t)$, $H_t=(1-\hat d_t)d_t$ and
$$
\begin{aligned}
\Delta\hat\Psi_{1T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-F^0(Y_t|W_t)],\\
\Delta\hat\Psi_{2T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(q_t(\theta)|W_t)-F^0(q_t(\theta)|W_t)],\\
\Delta\hat\Psi_{3T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][F^0(Y_t|W_t)-F^0(q_t(\theta)|W_t)].
\end{aligned}
$$
Note that $H_t\le\mathbb{1}[|\hat g(W_t)-\bar g^0_T(W_t)|-\varepsilon_T]$. Moreover, by construction of $\varepsilon_T$, Property (49) holds with $a_T=\varepsilon_T$, so the event $\{\sup_w|\hat g(w)-\bar g^0_T(w)|\ge\varepsilon_T\}$ has asymptotic probability 0. Hence $\sup_{1\le t\le T,T\ge1}H_t=0$ with probability approaching one, and we need to consider the $J_t$ terms only. Namely, it suffices to show that $\sup_{\theta\in\Theta}\Delta\hat\Psi^J_{jT}=o_p(1)$ for $j=1,2,3$.
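A minimal sketch of the feasible trimmed objective $\hat\Psi_T(\theta)=(1/T)\sum_t\hat d_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-\hat F(q_t(\theta)|W_t)]$ may make the estimator concrete. It is the counterpart of (47) in which $d_t$ is replaced by the estimated indicator $\hat d_t=\mathbb{1}[\hat g(W_t)-b_T]$. Here $\hat F(y|w)=\hat G(y,w)/\hat g(w)$, with $\hat G(y,w)=(Th^m_w)^{-1}\sum_sL((y-Y_s)/h_y)K((w-W_s)/h_w)$, the form used in the proof of Lemma 8. Everything else is a hypothetical illustration rather than the authors' implementation:
- scalar $W_t$ ($m=1$);
- Gaussian $K_0$ and $K$, which are second-order kernels, so $L$ is the normal CDF;
- one bandwidth $T^{-1/5}$ and $b_T=0.02$;
- a toy specification $q_t(\theta)=z_\alpha+\theta W_t$ with $z_\alpha$ known;
- a grid search in place of a proper optimizer.

```python
import math
import random
from statistics import NormalDist

SQRT2PI = math.sqrt(2.0 * math.pi)


def K(u):
    """Gaussian kernel, used here (an assumption) for both K in w and K0 in y."""
    return math.exp(-0.5 * u * u) / SQRT2PI


def L(u):
    """Integrated kernel L(u) = ∫ 1I(u - v) K0(v) dv, i.e. the N(0,1) CDF for Gaussian K0."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


class CondCDF:
    """F̂(y|W_t) = Ĝ(y, W_t)/ĝ(W_t) for scalar W (m = 1), evaluated at the sample points W_t."""

    def __init__(self, ys, ws, h_y, h_w):
        self.ys, self.h_y = ys, h_y
        self.wts = [[K((wt - ws_) / h_w) for ws_ in ws] for wt in ws]   # K((W_t - W_s)/h_w)
        self.g_hat = [sum(row) / (len(ws) * h_w) for row in self.wts]   # ĝ(W_t)

    def at(self, y, t):
        row = self.wts[t]
        num = sum(k * L((y - ys_) / self.h_y) for k, ys_ in zip(row, self.ys))
        return num / sum(row)              # the common factor 1/(T h_w) cancels in Ĝ/ĝ


def psi_hat(theta, F, ys, ws, alpha, z_alpha, b_T, F_at_y):
    """Trimmed objective (1/T) Σ d̂_t [α - 1I(q_t(θ) - Y_t)] [F̂(Y_t|W_t) - F̂(q_t(θ)|W_t)]."""
    total = 0.0
    for t, (y, w) in enumerate(zip(ys, ws)):
        if F.g_hat[t] < b_T:               # d̂_t = 1I[ĝ(W_t) - b_T] = 0: observation trimmed
            continue
        q = z_alpha + theta * w            # hypothetical model q_t(θ) = z_α + θ W_t
        ind = 1.0 if q - y >= 0 else 0.0
        total += (alpha - ind) * (F_at_y[t] - F.at(q, t))
    return total / len(ys)


def simulate(T, theta0=1.0, rho=0.5, seed=1):
    """Toy data: AR(1) W_t with unit variance, Y_t = θ0 W_t + N(0, 1) noise."""
    rng = random.Random(seed)
    w = rng.gauss(0.0, 1.0)
    ys, ws = [], []
    for _ in range(T):
        w = rho * w + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        ws.append(w)
        ys.append(theta0 * w + rng.gauss(0.0, 1.0))
    return ys, ws


def fit(ys, ws, alpha=0.25, b_T=0.02, grid=None):
    h = len(ys) ** (-0.2)                  # illustrative common bandwidth h_y = h_w
    F = CondCDF(ys, ws, h, h)
    F_at_y = [F.at(y, t) for t, y in enumerate(ys)]
    z_alpha = NormalDist().inv_cdf(alpha)
    grid = grid or [0.1 * k for k in range(21)]
    values = [psi_hat(th, F, ys, ws, alpha, z_alpha, b_T, F_at_y) for th in grid]
    best = min(range(len(grid)), key=values.__getitem__)
    return grid[best], values


ys, ws = simulate(T=300)
theta_hat, values = fit(ys, ws)
print(theta_hat)
```

Because $\hat F(\cdot|W_t)$ is nondecreasing, each summand is non-negative. The objective is a check function evaluated on the $\hat F$ scale, so its population minimizer remains the conditional $\alpha$-quantile even when $\hat F$ is biased.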
Using Identity (50) and the definition of $J_t$, we obtain
$$
|\Delta\hat\Psi^J_{1T}|\le b_T^{-1}\Big[\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H^0_{0T}(y,w)|+\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|\Big]\frac1T\sum_{t=1}^TJ_t.
$$
Because $(1/T)\sum_{t=1}^TJ_t\le1$, Properties (48)-(49) with $a_T=b_T$ give $\sup_{\theta\in\Theta}\Delta\hat\Psi^J_{1T}=o_p(1)$ under our assumptions on $b_T$. Similarly, $\sup_{\theta\in\Theta}|\Delta\hat\Psi^J_{2T}|=o_p(1)$. Regarding $\Delta\hat\Psi^J_{3T}$, we have $|\Delta\hat\Psi^J_{3T}|\le(1/T)\sum_{t=1}^TJ_t$. But $(1/T)\sum_{t=1}^TJ_t\le(1/T)\sum_{t=1}^T(1-d_t)$, with
$$
E\Big[\frac1T\sum_{t=1}^T(1-d_t)\Big]=\frac1T\sum_{t=1}^T\int_{\{w:\bar g^0_T(w)<\bar b_T\}}g_t^0(w)\,dw=\int_{\{w:\bar g^0_T(w)<\bar b_T\}}\bar g^0_T(w)\,dw=o(1),
$$
where the last equality follows by taking $c_T=\bar b_T$ in (A10)(i). Hence $(1/T)\sum_{t=1}^T(1-d_t)=o_p(1)$ by Markov's inequality. Thus,
$$
\frac1T\sum_{t=1}^TJ_t=o_p(1),\tag{51}
$$
and $\sup_{\theta\in\Theta}\Delta\hat\Psi^J_{3T}=o_p(1)$.

STEP 2: We next show that $\sup_{\theta\in\Theta}|\Psi_T(\theta)-\Psi_T^*(\theta)|=o_p(1)$. We have
$$
\begin{aligned}
\Psi_T(\theta)-\Psi_T^*(\theta)={}&\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-F^0(Y_t|W_t)]\\
&-\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(q_t(\theta)|W_t)-F^0(q_t(\theta)|W_t)]\\
&-\frac1T\sum_{t=1}^T(1-d_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][F^0(Y_t|W_t)-F^0(q_t(\theta)|W_t)]
\equiv\Delta\Psi_{1T}-\Delta\Psi_{2T}-\Delta\Psi_{3T}.
\end{aligned}
$$
Thus, it suffices to show that $\sup_{\theta\in\Theta}\Delta\Psi_{jT}=o_p(1)$ for $j=1,2,3$. Because $\varepsilon_T=o(b_T)$, the sequence $\bar b_T\equiv b_T+\varepsilon_T$ satisfies $\bar b_T\sqrt Th^m_{wT}\to\infty$, $\bar b_T/h^R_{wT}\to\infty$ and $\bar b_T/h^R_{yT}\to\infty$, so Properties (48)-(49) hold with $a_T=\bar b_T$. In particular, Property (49) implies $P\big(\inf_{\{w:\bar g^0_T(w)\ge\bar b_T\}}\hat g(w)\ge\bar b_T(1-\eta)\big)\to1$ as $T\to\infty$ for any $\eta\in(0,1)$. Thus, using Identity (50), we have
$$
|\Delta\Psi_{1T}|\le(\bar b_T)^{-1}(1-\eta)^{-1}\Big\{\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H^0_{0T}(y,w)|+\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|\Big\}\frac1T\sum_{t=1}^Td_t,
$$
with probability approaching 1, where $(1/T)\sum_{t=1}^Td_t\le1$. Hence $\sup_{\theta\in\Theta}\Delta\Psi_{1T}=o_p(1)$ by Properties (48)-(49) with $a_T=\bar b_T$. Similarly, $\sup_{\theta\in\Theta}\Delta\Psi_{2T}=o_p(1)$. Regarding $\Delta\Psi_{3T}$, we have $\sup_{\theta\in\Theta}|\Delta\Psi_{3T}|\le(1/T)\sum_{t=1}^T(1-d_t)=o_p(1)$ from Step 1. □

Proof of Lemma 8.
The proof adapts that of Lemma A-1 in Andrews (1995) to incorporate the supremum over $y$-values, which leads to the additional term $O_p(h^R_{yT})$. It proceeds in three steps. Recall that $L(\cdot)$ was defined as $L(y)\equiv\int\mathbb{1}(y-u)K_0(u)\,du$. Let $I_t(y)$ be $L[(y-Y_t)/h_{yT}]$ if $\lambda=0$, $K_0[(y-Y_t)/h_{yT}]$ if $\lambda=1$, and $K_0'[(y-Y_t)/h_{yT}]$ if $\lambda=2$. Thus, omitting the subscript $T$, we have
$$
\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H^0_\lambda(y,w)|=\sup_{(y,w)\in\mathbb R^{m+1}}\left|\frac{1}{Th^\lambda_yh^m_w}\sum_{t=1}^TI_t(y)K\Big(\frac{w-W_t}{h_w}\Big)-D^\lambda F^0(y|w)\frac1T\sum_{t=1}^Tg_t^0(w)\right|
$$
for $\lambda=0,1,2$. The desired result then follows from the three bounds below:

(i)
$$
\sup_{(y,w)\in\mathbb R^{m+1}}\left|\frac{1}{Th^\lambda_yh^m_w}\sum_{t=1}^TI_t(y)K\Big(\frac{w-W_t}{h_w}\Big)-\frac{1}{Th^\lambda_yh^m_w}\sum_{t=1}^TE\Big[I_t(y)K\Big(\frac{w-W_t}{h_w}\Big)\Big]\right|=O_p\left(\frac{1}{\sqrt T\,h^\lambda_yh^m_w}\right),\tag{52}
$$
which is proved in Step 1 by adapting Andrews' (1995) proof of Lemma A-2;

(ii)
$$
\sup_{(y,w)\in\mathbb R^{m+1}}\left|\frac{1}{Th^\lambda_yh^m_w}\sum_{t=1}^TE\Big[I_t(y)K\Big(\frac{w-W_t}{h_w}\Big)\Big]-\frac{1}{Th^m_w}\sum_{t=1}^TE\Big[D^\lambda F^0(y|W_t)K\Big(\frac{w-W_t}{h_w}\Big)\Big]\right|=O_p(h^R_y),\tag{53}
$$
which is proved in Step 2; and

(iii)
$$
\sup_{(y,w)\in\mathbb R^{m+1}}\left|\frac{1}{Th^m_w}\sum_{t=1}^TE\Big[D^\lambda F^0(y|W_t)K\Big(\frac{w-W_t}{h_w}\Big)\Big]-D^\lambda F^0(y|w)\frac1T\sum_{t=1}^Tg_t^0(w)\right|=O_p(h^R_w),\tag{54}
$$
which is proved in Step 3.

STEP 1: When $\lambda=0$, note that $|I_t(y)|\le\int|K_0(u)|\,du<\infty$ by (A11)(iii). When $\lambda=1$, $|I_t(y)|\le\sup_{y\in\mathbb R}|K_0(y)|<\infty$ by (A11)(iii). When $\lambda=2$, $|I_t(y)|\le\sup_{y\in\mathbb R}|K_0'(y)|<\infty$ by (A11)(iv). Hence $I_t(y)$ is bounded by some $C_0<\infty$. Moreover, (A6) and Theorem 3.49 in White (2001) guarantee that, for every $y$, the sequence $\{(I_t(y),W_t')'\}$ is strong mixing with $\alpha$ of size $-r/(r-2)$, $r>2$. Hence, for any $(t,s)$, $1\le t,s\le T$, $T\ge1$, we have $\alpha(|t-s|)=O(|t-s|^{-r/(r-2)-\varepsilon})$ for some $\varepsilon>0$ (see Definition 3.45 in White, 2001), and $C_1\equiv\sum_{s=0}^\infty\alpha(s)<\infty$. Thus, by Billingsley (1995, Lemma 2, p. 365), we have
$$
\big|\operatorname{Cov}\big(I_t(y)\cos(v'W_t),\,I_u(y)\cos(v'W_u)\big)\big|\le4C_0^2\alpha(|t-u|),
$$
for any $v\in\mathbb R^m$ and any $y,t,u\in\mathbb R$.
Hence, instead of (A.15) in Andrews (1995), we have
$$
\operatorname{Var}\Big(\frac1T\sum_{t=1}^TI_t(y)\cos(v'W_t)\Big)\le8C_0^2\frac1T\sum_{s=0}^{T-1}\alpha(s)\le\frac{8C_0^2C_1}{T}.
$$
As this also holds with $\sin(\cdot)$ in place of $\cos(\cdot)$, Lyapunov's inequality implies
$$
E\left|\frac1T\sum_{t=1}^T\big\{I_t(y)\exp(iv'W_t)-E[I_t(y)\exp(iv'W_t)]\big\}\right|\le2C_0\sqrt{\frac{8C_1}{T}},
$$
for any $v\in\mathbb R^m$ and any $y\in\mathbb R$. Let $\mathcal L_T$ denote the left-hand side of (52). Using (A.11) in Andrews (1995) with $\lambda=0$ and the above inequality, we obtain
$$
E(\mathcal L_T)\le E\left(\int\sup_{y\in\mathbb R}\left|\frac{1}{Th^\lambda_y}\sum_{t=1}^T\big\{I_t(y)\exp(iv'W_t)-E[I_t(y)\exp(iv'W_t)]\big\}\right||\phi(h_wv)|\,dv\right)\le2C_0\sqrt{\frac{8C_1}{T}}\frac{1}{h^\lambda_y}\int|\phi(h_wv)|\,dv=\frac{C_2}{\sqrt T\,h^\lambda_yh^m_w},
$$
where $C_2=2C_0\sqrt{8C_1}\int|\phi(u)|\,du<\infty$ by (A11)(ii), using the change of variable $u=h_wv$. Equation (52) then follows by Markov's inequality.

STEP 2: Consider first $\lambda=0$. Using Fubini's Theorem, we note that
$$
E[I_t(y)|W_t=w]=\int L\Big(\frac{y-Y}{h_y}\Big)dF^0(Y|w)=\int\!\!\int\mathbb{1}\Big(\frac{y-Y}{h_y}-u\Big)K_0(u)\,du\,dF^0(Y|w)=\int F^0(y-h_yu|w)K_0(u)\,du,\tag{55}
$$
which does not depend on $t$ because of (A9)(i). When $\lambda=1$, using the change of variable $Y=y-h_yu$, we note that
$$
E\Big[\frac{I_t(y)}{h_y}\Big|W_t=w\Big]=\int\frac1{h_y}K_0\Big(\frac{y-Y}{h_y}\Big)f^0(Y|w)\,dY=\int f^0(y-h_yu|w)K_0(u)\,du.\tag{56}
$$
When $\lambda=2$, using the change of variable $Y=y-h_yu$ and integration by parts, we have
$$
E\Big[\frac{I_t(y)}{h_y^2}\Big|W_t=w\Big]=\int\frac1{h_y^2}K_0'\Big(\frac{y-Y}{h_y}\Big)f^0(Y|w)\,dY=\frac1{h_y}\int f^0(y-h_yu|w)K_0'(u)\,du=\int Df^0(y-h_yu|w)K_0(u)\,du.\tag{57}
$$
Now let $\mathcal L_T(y,w)$ denote the term inside the absolute value on the left-hand side of Equation (53). Combining Equations (55)-(57) with (A11)(iii), we have
$$
\begin{aligned}
\mathcal L_T(y,w)&=\frac{1}{Th^m_w}\sum_{t=1}^TE\left\{\Big[\frac{I_t(y)}{h^\lambda_y}-D^\lambda F^0(y|W_t)\Big]K\Big(\frac{w-W_t}{h_w}\Big)\right\}\\
&=\frac{1}{Th^m_w}\sum_{t=1}^TE\left\{\Big[\int[D^\lambda F^0(y-h_yu|W_t)-D^\lambda F^0(y|W_t)]K_0(u)\,du\Big]K\Big(\frac{w-W_t}{h_w}\Big)\right\}\\
&=\frac{1}{Th^m_w}\sum_{t=1}^T\int\Big[\int[D^\lambda F^0(y-h_yu|W)-D^\lambda F^0(y|W)]K_0(u)\,du\Big]K\Big(\frac{w-W}{h_w}\Big)g_t^0(W)\,dW.
\end{aligned}
$$
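The smoothing bias in (55), $\int F^0(y-h_yu|w)K_0(u)\,du-F^0(y|w)$, is what Step 2 bounds by $O(h^R_y)$. A small numerical illustration uses hypothetical choices: $F^0(\cdot|w)=\Phi$ and a Gaussian $K_0$, which is a second-order kernel ($R=2$), whereas the order required by (A11)(iii) is not restated here. In that case the integral has the closed form $\Phi(y/\sqrt{1+h^2})$, since it equals $P(Z+hU\le y)$ for independent standard normals $Z,U$. Halving $h$ should therefore divide the bias by roughly $2^R=4$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def smoothed_cdf(y: float, h: float, n: int = 4000, lim: float = 8.0) -> float:
    """Simpson approximation of ∫ F(y - h u) K0(u) du with F = Φ and K0 = φ (both assumptions)."""
    step = 2.0 * lim / n
    total = 0.0
    for i in range(n + 1):
        u = -lim + i * step
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * STD.cdf(y - h * u) * STD.pdf(u)
    return total * step / 3.0


def closed_form(y: float, h: float) -> float:
    """The same integral in closed form: P(Z + hU <= y) = Φ(y / sqrt(1 + h^2))."""
    return STD.cdf(y / math.sqrt(1.0 + h * h))


def bias(y: float, h: float) -> float:
    """Smoothing bias E[I_t(y) | W_t = w] - F^0(y|w) for lambda = 0 in this toy case."""
    return smoothed_cdf(y, h) - STD.cdf(y)


print(bias(1.0, 0.1) / bias(1.0, 0.05))   # roughly 4 = 2**R with R = 2
```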
hw T hm w t=1 Hence, taking an Rth-order Taylor expansion of Dλ F 0 (y−hyT u|W ) at y, and using (A11)(iii) we obtain sup (y,w)∈Rm+1 |LT (y, w)| 6 hR y sup |D λ+R (y,w)∈Rm+1 × sup sup ḡT0 (w), 0 F (y|w)| Z R |u K0 (u)|du Z |K(W̃ )|dW̃ T >1 w∈Rm which establishes Equation (53) because of (A8), (A9)(ii), and (A11)(i,iii). STEP 3: The study of the bias (54) is standard as in the proof of Lemma A-3 in Andrews (1995). Using (A9)(i) we have µ ∙ µ ¶¸ ¶ T T Z w−W w − Wt 1 X 1 X λ 0 λ 0 E D F (y|Wt )K D F (y|W )K = gt0 (W )dW m m T hw t=1 hw T hw t=1 hw Z = Hλ0 (y, w − hw W̃ )K(W̃ )dW̃ , where W̃ = (w − W )/hw . Hence, using a Taylor expansion of order R at w together with (A11)(i) we obtain ∙ µ ¶¸ T T 1 X w − Wt 1X 0 λ 0 λ 0 E D F (y|W )K g (w) − D F (y|w) t T hm hw T t=1 t w t=1 Z h i 0 0 = Hλ (y, w − hw W̃ ) − Hλ (y, w) K(W̃ )dW̃ ⎡ ⎤ Z X R R 0 (−1) R ∂ Hλ (y|w − h̃w W̃ ) r1 = ⎣ W̃1 . . . W̃mrm ⎦ K(W̃ )dW̃ , h R! w ∂W1r1 . . . ∂Wmrm |r|=R where 0 < h̃w < hw . This establishes Equation (54) using (A8), (A9)(ii) and (A11)(i). ¤ 54 KOMUNJER AND VUONG Proof of Theorem 6. From Equation (9), the first order conditions associated with θ̂T are √ (58) T ∇θ Ψ̂T (θ̂T ) = 0, a.s. − P, h i P where ∇θ Ψ̂T (θ) = (1/T ) Tt=1 {1I[qt (θ)−Yt ]−α}fˆt [qt (θ)]∇θ qt (θ) and fˆt (y) = dt ∂ Ĝ(y, Wt )/∂y /ĝ(Wt ). Given (A11)(iv) fˆt (·) is continuously differentiable on R. Thus, a first-order Taylor expansion of the condition (58) at θ0 gives √ c √ T ∇θ Ψ̂T (θ0 ) + ∆θθ Ψ̂T (θ̄T ) T (θ̂T − θ0 ) = 0, a.s. − P, (59) c where θ̄T ≡ cθ0 + (1 − c)θ̂T for some c ∈ (0, 1). To establish the theorem, we need two lemmas: Lemma 9. Suppose that (A0)-(A1), (A5)-(A7)(i), (A8)-(A10)(i) and (A11) hold. If bT → 0 √ c ∗ R R with bT T h2yT hm wT → ∞, bT /hwT → ∞ and bT /hyT → ∞, then ∆θθ Ψ̂T (θ̄ T ) − ∆θθ ΨT (θ 0 ) = op (1). In particular, the conditions in Theorem 6 imply the conditions in Lemma 9. Thus c ∆θθ Ψ̂T (θ̄T ) − ∆θθ Ψ∗T (θ0 ) = op (1). Lemma 10. Suppose that all the conditions of Theorem 6 hold. 
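Then $\sqrt T[\nabla_\theta\hat\Psi_T(\theta_0)-\nabla_\theta\Psi^*_T(\theta_0)]=o_p(1)$.

The remainder of the proof is straightforward. Equation (59) and Lemmas 9 and 10 together imply $\sqrt T(\hat\theta_T-\theta_0)=-[\Delta_{\theta\theta}\Psi^*_T(\theta_0)+o_p(1)]^{-1}\big(\sqrt T\,\nabla_\theta\Psi^*_T(\theta_0)+o_p(1)\big)$, a.s.-$P$. Thus $\hat\theta_T$ is $\sqrt T$-asymptotically equivalent to $\theta^*_T$, and the desired result follows. □

Proof of Lemma 9. The assumptions of Theorem 5 are satisfied under those of Lemma 9. Hence $\hat\theta_T\xrightarrow{p}\theta_0$. Moreover, because $\bar\theta^c_T=c\theta_0+(1-c)\hat\theta_T$ for some $c\in(0,1)$, we have $\bar\theta^c_T\xrightarrow{p}\theta_0$. Thus, it suffices to prove that $\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi^*_T(\theta)|=o_p(1)$, where
$$
\begin{aligned}
\Delta_{\theta\theta}\hat\Psi_T(\theta)&=\frac1T\sum_{t=1}^T\hat d_t\{\mathbb{1}[q_t(\theta)-Y_t]-\alpha\}\big\{D^2\hat F[q_t(\theta)|W_t]\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'+D\hat F[q_t(\theta)|W_t]\Delta_{\theta\theta}q_t(\theta)\big\},\\
\Delta_{\theta\theta}\Psi^*_T(\theta)&=\frac1T\sum_{t=1}^T\{\mathbb{1}[q_t(\theta)-Y_t]-\alpha\}\big\{D^2F^0[q_t(\theta)|W_t]\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'+DF^0[q_t(\theta)|W_t]\Delta_{\theta\theta}q_t(\theta)\big\}.
\end{aligned}
$$
Let $\varepsilon_T$ be such that $\varepsilon_T=o(b_T)$, $\varepsilon_T\sqrt Th^2_{yT}h^m_{wT}\to\infty$, $\varepsilon_T/h^R_{wT}\to\infty$ and $\varepsilon_T/h^R_{yT}\to\infty$. Since
$$
\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi^*_T(\theta)|\le\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi_T(\theta)|+\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\Psi_T(\theta)-\Delta_{\theta\theta}\Psi^*_T(\theta)|,
$$
where $\Psi_T(\theta)$ is defined in Equation (47), it suffices to prove that both terms on the right-hand side of the above inequality are $o_p(1)$. Given Lemma 8 and Equation (46), we will use
$$
a_T^{-1}\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H^0_{\lambda T}(y,w)|=o_p(1),\qquad a_T^{-1}\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|=o_p(1),
$$
for $\lambda=1,2$. These hold for any sequence $\{a_T\}$ satisfying $a_T\sqrt Th^2_{yT}h^m_{wT}\to\infty$, $a_T/h^R_{wT}\to\infty$ and $a_T/h^R_{yT}\to\infty$. For $\lambda=1,2$ we will also use the identity
$$
D^\lambda\hat F(y|w)-D^\lambda F^0(y|w)=\frac{1}{\hat g(w)}[D^\lambda\hat G(y,w)-H^0_{\lambda T}(y,w)]-\frac{D^\lambda F^0(y|w)}{\hat g(w)}[\hat g(w)-\bar g^0_T(w)],
$$
which follows from Equation (50). The proof then draws on that of Theorem 5.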
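Identity (50), its derivative version above, and the second-order rearrangement (66) used in the proof of Lemma 10 are exact algebra once $\hat F=\hat G/\hat g$ (or $\hat f=D\hat G/\hat g$) and $H^0=F^0\bar g^0_T$. A quick randomized check confirms both to rounding error. It uses arbitrary numbers, not estimates from data: `F` stands for $F^0(y|w)$ or $f^0(y|w)$, `g_bar` for $\bar g^0_T(w)$, `g_hat` for $\hat g(w)$ and `G_hat` for $\hat G(y,w)$ or $D\hat G(y,w)$.

```python
import random


def identity_residuals(trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Largest absolute residuals of identities (50) and (66) on random hypothetical inputs."""
    rng = random.Random(seed)
    worst50 = worst66 = 0.0
    for _ in range(trials):
        F = rng.uniform(0.0, 3.0)
        g_bar = rng.uniform(0.1, 2.0)
        g_hat = g_bar + rng.uniform(-0.05, 0.05)
        G_hat = rng.uniform(0.0, 5.0)
        H = F * g_bar                                       # H^0(y, w) = F^0(y|w) gbar_T^0(w)
        lhs = G_hat / g_hat - F                             # estimate minus truth
        rhs50 = (G_hat - H) / g_hat - F * (g_hat - g_bar) / g_hat
        rhs66 = ((G_hat - F * g_hat) / g_bar                # term linear in the errors
                 + F * (g_hat - g_bar) ** 2 / (g_hat * g_bar)          # quadratic remainder
                 - (G_hat - H) * (g_hat - g_bar) / (g_hat * g_bar))   # cross remainder
        worst50 = max(worst50, abs(lhs - rhs50))
        worst66 = max(worst66, abs(lhs - rhs66))
    return worst50, worst66


print(identity_residuals())   # both at floating-point rounding level
```

The value of (66) over (65) is that its last two terms are products of two estimation errors. The rates in (62)-(64) control exactly such products.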
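Specifically, in Step 1 we deal with $\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi_T(\theta)=-\Delta\hat\Psi^{\theta\theta}_{1T}-\Delta\hat\Psi^{\theta\theta}_{2T}-\Delta\hat\Psi^{\theta\theta}_{3T}$. For $j=1,2,3$, $\Delta\hat\Psi^{\theta\theta}_{jT}$ is obtained from $\Delta\hat\Psi_{jT}$ by making the following replacements:
- $\hat F(Y_t|W_t)-F^0(Y_t|W_t)$ becomes $\{D^2\hat F[q_t(\theta)|W_t]-D^2F^0[q_t(\theta)|W_t]\}\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'$;
- $\hat F(q_t(\theta)|W_t)-F^0(q_t(\theta)|W_t)$ becomes $\{D\hat F[q_t(\theta)|W_t]-DF^0[q_t(\theta)|W_t]\}\Delta_{\theta\theta}q_t(\theta)$;
- $F^0(Y_t|W_t)-F^0(q_t(\theta)|W_t)$ becomes $D^2F^0[q_t(\theta)|W_t]\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'+DF^0[q_t(\theta)|W_t]\Delta_{\theta\theta}q_t(\theta)$.

We then obtain
$$
|\Delta\hat\Psi^{\theta\theta J}_{1T}|\le b_T^{-1}\Big[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2\hat G(y,w)-H^0_{2T}(y,w)|+\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|\Big]\Big[\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big].
$$
Thus $\sup_{\theta\in\Theta}\Delta\hat\Psi^{\theta\theta J}_{1T}=o_p(1)$, because the Cauchy-Schwarz inequality gives
$$
\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\le\Big(\frac1T\sum_{t=1}^TJ_t\Big)^{1/2}\Big(\frac1T\sum_{t=1}^T\Big(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big)^2\Big)^{1/2}=o_p(1),\tag{60}
$$
by Equation (51) and $(1/T)\sum_{t=1}^T(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|)^2=O_p(1)$. The latter follows from
$$
E\Big[\frac1T\sum_{t=1}^T\Big(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big)^2\Big]\le\sup_{1\le t\le T,T\ge1}E\Big[\Big(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big)^2\Big]<\infty,
$$
using (A7)(i) and Markov's inequality. Similarly, $\sup_{\theta\in\Theta}\Delta\hat\Psi^{\theta\theta J}_{2T}=o_p(1)$, using
$$
\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|=o_p(1).\tag{61}
$$
Regarding $\Delta\hat\Psi^{\theta\theta J}_{3T}$, we have
$$
|\Delta\hat\Psi^{\theta\theta J}_{3T}|\le\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|+\sup_{(y,w)\in\mathbb R^{m+1}}|DF^0(y|w)|\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|,
$$
so $\sup_{\theta\in\Theta}\Delta\hat\Psi^{\theta\theta J}_{3T}=o_p(1)$ by Equations (60)-(61) and (A9)(ii).

In Step 2, we deal with $\Delta_{\theta\theta}\Psi_T(\theta)-\Delta_{\theta\theta}\Psi^*_T(\theta)=-\Delta\Psi^{\theta\theta}_{1T}-\Delta\Psi^{\theta\theta}_{2T}+\Delta\Psi^{\theta\theta}_{3T}$. For $j=1,2,3$, $\Delta\Psi^{\theta\theta}_{jT}$ is obtained from $\Delta\Psi_{jT}$ by the same three replacements as in Step 1.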
We then obtain
$$
|\Delta\Psi^{\theta\theta}_{1T}|\le(\bar b_T)^{-1}(1-\eta)^{-1}\Big[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2\hat G(y,w)-H^0_{2T}(y,w)|+\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g^0_T(w)|\Big]\Big[\frac1T\sum_{t=1}^Td_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big],
$$
with probability approaching 1. Here $(1/T)\sum_{t=1}^Td_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\le(1/T)\sum_{t=1}^T\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|=O_p(1)$, by Markov's inequality and (A7)(i). Hence $\sup_{\theta\in\Theta}\Delta\Psi^{\theta\theta}_{1T}=o_p(1)$. Similarly, $\sup_{\theta\in\Theta}\Delta\Psi^{\theta\theta}_{2T}=o_p(1)$. Regarding $\Delta\Psi^{\theta\theta}_{3T}$, we have
$$
\begin{aligned}
|\Delta\Psi^{\theta\theta}_{3T}|\le{}&\Big[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\Big]\Big[\frac1T\sum_{t=1}^T(1-d_t)\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big]+\Big[\sup_{(y,w)\in\mathbb R^{m+1}}|DF^0(y|w)|\Big]\Big[\frac1T\sum_{t=1}^T(1-d_t)\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|\Big]\\
\le{}&\Big[\sup|D^2F^0(y|w)|\Big]\Big[\frac1T\sum_{t=1}^T(1-d_t)^2\Big]^{1/2}\Big[\frac1T\sum_{t=1}^T\Big(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Big)^2\Big]^{1/2}\\
&+\Big[\sup|DF^0(y|w)|\Big]\Big[\frac1T\sum_{t=1}^T(1-d_t)^2\Big]^{1/2}\Big[\frac1T\sum_{t=1}^T\Big(\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|\Big)^2\Big]^{1/2},
\end{aligned}
$$
which is $o_p(1)$ because $(1/T)\sum_{t=1}^T(1-d_t)^2=(1/T)\sum_{t=1}^T(1-d_t)=o_p(1)$ from Step 1 of the proof of Theorem 5. □

Proof of Lemma 10. Note that for any density $f_t(\cdot)$ we have $E\big(f_t[q_t(\theta_0)]\nabla_\theta q_t(\theta_0)\{\mathbb{1}[q_t(\theta_0)-Y_t]-\alpha\}\big)=0$. Lemma 10 could therefore be established from two properties. The first is the stochastic equicontinuity at $f^0(\cdot|\cdot)$ of the vector process $\nu_T(f)=(1/\sqrt T)\sum_{t=1}^Tf[q_t(\theta_0)|W_t]\nabla_\theta q_t(\theta_0)\{\mathbb{1}[q_t(\theta_0)-Y_t]-\alpha\}$ with respect to some pseudo-metric $\rho(f_1,f_2)$. The second is the consistency of $\hat f(\cdot|\cdot)=\mathbb{1}[\hat g(\cdot)-b_T]D\hat G(\cdot,\cdot)/\hat g(\cdot)$ for $f^0(\cdot|\cdot)$ with respect to $\rho(\cdot,\cdot)$. See Andrews (1994b) for general results on stochastic equicontinuity. These results, however, require a more elaborate trimming than the one used here, in view of Andrews (1995, p. 571). We thus prove Lemma 10 directly. Though more complex, our proof draws on the asymptotic normality proof of Theorem 1 in Lavergne and Vuong (1996). For similar asymptotic normality proofs in the iid case, see also Robinson (1988) and Hardle and Stoker (1989). As previously, we let $q_t^0=q_\alpha(W_t,\theta_0)$.
Moreover, let $\varepsilon_T$ be such that $\varepsilon_T=o(b_T)$, $\varepsilon_TT^{1/4}h_{yT}h^m_{wT}\to\infty$, $\varepsilon_T/(T^{1/4}h^R_{yT})\to\infty$ and $\varepsilon_T/(T^{1/4}h^R_{wT})\to\infty$. As
$$\nabla_\theta\hat\Psi_T(\theta_0)-\nabla_\theta\Psi^*_T(\theta_0)=\big[\nabla_\theta\hat\Psi_T(\theta_0)-\nabla_\theta\Psi_T(\theta_0)\big]+\big[\nabla_\theta\Psi_T(\theta_0)-\nabla_\theta\Psi^*_T(\theta_0)\big],$$
where $\Psi_T(\theta)$ is defined in Equation (47), it suffices to prove that both terms on the right-hand side of the above equality are $o_p(T^{-1/2})$. Given Lemma 8 and Equation (46), we shall use
$$a_T^{-2}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w)-H^0_{1T}(y,w)|^2=o_p(T^{-1/2}),\tag{62}$$
$$a_T^{-2}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w)-H^0_{1T}(y,w)|\sup_{w\in\mathbb{R}^m}|\hat g(w)-\bar g^0_T(w)|=o_p(T^{-1/2}),\tag{63}$$
$$a_T^{-2}\sup_{w\in\mathbb{R}^m}|\hat g(w)-\bar g^0_T(w)|^2=o_p(T^{-1/2}),\tag{64}$$
which hold for any sequence $\{a_T\}$ satisfying $a_TT^{1/4}h_{yT}h^m_{wT}\to\infty$, $a_T/(T^{1/4}h^R_{wT})\to\infty$ and $a_T/(T^{1/4}h^R_{yT})\to\infty$. We shall also use the identities
$$\hat f(y|w)-f^0(y|w)=\frac{1}{\hat g(w)}[D\hat G(y,w)-H^0_{1T}(y,w)]-\frac{f^0(y|w)}{\hat g(w)}[\hat g(w)-\bar g^0_T(w)]\tag{65}$$
$$=\frac{D\hat G(y,w)-f^0(y|w)\hat g(w)}{\bar g^0_T(w)}+\frac{f^0(y|w)}{\hat g(w)\bar g^0_T(w)}[\hat g(w)-\bar g^0_T(w)]^2-\frac{1}{\hat g(w)\bar g^0_T(w)}[D\hat G(y,w)-H^0_{1T}(y,w)][\hat g(w)-\bar g^0_T(w)].\tag{66}$$

STEP 1: We first show that $\nabla_\theta\hat\Psi_T(\theta_0)-\nabla_\theta\Psi_T(\theta_0)=o_p(T^{-1/2})$. We have
$$\sqrt T\big[\nabla_\theta\hat\Psi_T(\theta_0)-\nabla_\theta\Psi_T(\theta_0)\big]=\frac{1}{\sqrt T}\sum_{t=1}^T(J_t-H_t)[\mathbb{1}(q^0_t-Y_t)-\alpha]\hat f(q^0_t|W_t)\nabla_\theta q^0_t=\sqrt T\,\Delta\hat\Psi^\theta_{1T}(\theta_0)+\sqrt T\,\Delta\hat\Psi^\theta_{2T}(\theta_0),$$
where $J_t=\hat d_t(1-d_t)$, $H_t=(1-\hat d_t)d_t$ and
$$\sqrt T\,\Delta\hat\Psi^\theta_{1T}(\theta_0)=\frac{1}{\sqrt T}\sum_{t=1}^T(J_t-H_t)[\mathbb{1}(q^0_t-Y_t)-\alpha][\hat f(q^0_t|W_t)-f^0(q^0_t|W_t)]\nabla_\theta q^0_t,$$
$$\sqrt T\,\Delta\hat\Psi^\theta_{2T}(\theta_0)=\frac{1}{\sqrt T}\sum_{t=1}^T(J_t-H_t)[\mathbb{1}(q^0_t-Y_t)-\alpha]f^0(q^0_t|W_t)\nabla_\theta q^0_t.$$
As $H_t\le\mathbb{1}[|\hat g(W_t)-\bar g^0_T(W_t)|-\varepsilon_T]$, and the event $\{\sup_w|\hat g(w)-\bar g^0_T(w)|>\varepsilon_T\}$ has asymptotic probability 0 because Property (64) holds with $a_T=\varepsilon_T$ by construction of $\varepsilon_T$, we have $\sup_{1\le t\le T,\,T\ge1}H_t=0$ with probability approaching one. Hence, we need to consider the $J_t$ terms only. Namely, it suffices to show that $\Delta\hat\Psi^{\theta J}_{jT}(\theta_0)=o_p(T^{-1/2})$ for $j=1,2$. Using Equality (65) and the definition of $J_t$, we obtain
$$|\Delta\hat\Psi^{\theta J}_{1T}(\theta_0)|\le b_T^{-1}\Big[\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w)-H^0_{1T}(y,w)|+\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\sup_{w\in\mathbb{R}^m}|\hat g(w)-\bar g^0_T(w)|\Big]\Big[\frac1T\sum_{t=1}^TJ_t|\nabla_\theta q^0_t|\Big],$$
$$|\Delta\hat\Psi^{\theta J}_{2T}(\theta_0)|\le\frac1T\sum_{t=1}^TJ_tf^0(q^0_t|W_t)|\nabla_\theta q^0_t|.$$
But $(1/T)\sum_{t=1}^TJ_t|\nabla_\theta q^0_t|\le(1/T)\sum_{t=1}^T(1-d_t)|\nabla_\theta q^0_t|=O_p(b_T^\gamma)$ and $(1/T)\sum_{t=1}^TJ_tf^0(q^0_t|W_t)|\nabla_\theta q^0_t|\le(1/T)\sum_{t=1}^T(1-d_t)f^0(q^0_t|W_t)|\nabla_\theta q^0_t|=O_p(b_T^{2\gamma})$ by the Markov inequality combined with (A10)(ii)-(iii), where $c_T=\bar b_T=O(b_T)$. Hence, using $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)<\infty$ by (A9)(ii) combined with Properties (62) and (64) where $a_T=b_T$, we obtain $\Delta\hat\Psi^{\theta J}_{1T}(\theta_0)=O_p(T^{-1/4}b_T^\gamma)$ and $\Delta\hat\Psi^{\theta J}_{2T}(\theta_0)=O_p(b_T^{2\gamma})$. Since $b_T=o(T^{-1/(4\gamma)})$, we obtain $\Delta\hat\Psi^{\theta J}_{jT}(\theta_0)=o_p(T^{-1/2})$ for $j=1,2$, as desired.

STEP 2: We next show that $\nabla_\theta\Psi_T(\theta_0)-\nabla_\theta\Psi^*_T(\theta_0)=o_p(T^{-1/2})$. We have $\nabla_\theta\Psi_T(\theta_0)=\mu_0+[\nabla_\theta\Psi_T(\theta_0)-\mu_0]$, where $\mu_0\equiv T^{-1}\sum_{t=1}^Td_t[\mathbb{1}(q^0_t-Y_t)-\alpha]f^0(q^0_t|W_t)\nabla_\theta q^0_t$ and
$$\begin{aligned}\nabla_\theta\Psi_T(\theta_0)-\mu_0&=\frac1T\sum_{t=1}^Td_t[\mathbb{1}(q^0_t-Y_t)-\alpha][\hat f(q^0_t|W_t)-f^0(q^0_t|W_t)]\nabla_\theta q^0_t\\&=\frac1T\sum_{t=1}^Td_t[\mathbb{1}(q^0_t-Y_t)-\alpha]\frac{D\hat G(q^0_t,W_t)-f^0(q^0_t|W_t)\hat g(W_t)}{\bar g^0_T(W_t)}\nabla_\theta q^0_t\\&\quad+\frac1T\sum_{t=1}^Td_t[\mathbb{1}(q^0_t-Y_t)-\alpha]\frac{f^0(q^0_t|W_t)}{\hat g(W_t)\bar g^0_T(W_t)}[\hat g(W_t)-\bar g^0_T(W_t)]^2\nabla_\theta q^0_t\\&\quad-\frac1T\sum_{t=1}^Td_t[\mathbb{1}(q^0_t-Y_t)-\alpha]\frac{D\hat G(q^0_t,W_t)-H^0_{1T}(q^0_t,W_t)}{\hat g(W_t)\bar g^0_T(W_t)}[\hat g(W_t)-\bar g^0_T(W_t)]\nabla_\theta q^0_t\\&\equiv\mu_1+\mu_2-\mu_3,\end{aligned}$$
using Equality (66). Hence, $\nabla_\theta\Psi_T(\theta_0)=\mu_0+\mu_1+\mu_2-\mu_3$. Thus, the proof is complete if $\mu_0=\nabla_\theta\Psi^*_T(\theta_0)+o_p(T^{-1/2})$ and $\mu_j=o_p(T^{-1/2})$ for $j=1,2,3$, as shown next.

STEP 2a: We show that $\mu_0=\nabla_\theta\Psi^*_T(\theta_0)+o_p(T^{-1/2})$. We have
$$\mu_0=\nabla_\theta\Psi^*_T(\theta_0)-\frac1T\sum_{t=1}^T(1-d_t)[\mathbb{1}(q^0_t-Y_t)-\alpha]f^0(q^0_t|W_t)\nabla_\theta q^0_t.$$
Let $\mu_{02}$ denote the second term on the right-hand side of the above equality. Thus, it suffices to show that $\mu_{02}=o_p(T^{-1/2})$.
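Identities (65) and (66) are purely algebraic once $\hat f=D\hat G/\hat g$ on the trimmed set, provided $H^0_{1T}(y,w)=f^0(y|w)\bar g^0_T(w)$. That relation is not restated in this section, so the sketch below takes it as an assumption (it is exactly the choice under which (65) holds). The Python check uses exact rational arithmetic, so equality is tested without rounding error; the function name is hypothetical.

```python
from fractions import Fraction as Fr


def check_identities(DG, g_hat, g_bar, f0):
    """Exact check of identities (65) and (66) for f_hat = DG / g_hat,
    assuming H_1T^0 = f0 * g_bar (illustrative assumption)."""
    H = f0 * g_bar
    lhs = DG / g_hat - f0  # f_hat - f0
    # Right-hand side of (65).
    rhs65 = (DG - H) / g_hat - f0 / g_hat * (g_hat - g_bar)
    # Right-hand side of (66): linear term, quadratic term, cross term.
    rhs66 = ((DG - f0 * g_hat) / g_bar
             + f0 / (g_hat * g_bar) * (g_hat - g_bar) ** 2
             - (DG - H) * (g_hat - g_bar) / (g_hat * g_bar))
    return lhs == rhs65 and lhs == rhs66
```

The point of (66) is that its first term is linear in the estimation errors, while the remaining two are products of errors, which is why Step 2b can dispose of them with Properties (63) and (64).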
But, from Step 1 we know that $|\mu_{02}|\le T^{-1}\sum_{t=1}^T(1-d_t)f^0(q^0_t|W_t)|\nabla_\theta q^0_t|=O_p(b_T^{2\gamma})$. The desired result follows from $b_T=o(T^{-1/(4\gamma)})$.

STEP 2b: Next, we show that $\mu_2$ and $\mu_3$ are both $o_p(T^{-1/2})$. We have
$$\sqrt T|\mu_2|\le\frac{\sqrt T}{b_T\inf_{\{w:\bar g^0_T(w)>b_T\}}|\hat g(w)|}\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\sup_{w\in\mathbb{R}^m}[\hat g(w)-\bar g^0_T(w)]^2\frac1T\sum_{t=1}^T|\nabla_\theta q^0_t|.$$
But Property (64) with $a_T=b_T$ implies $(b_T)^{-1}\sup_w|\hat g(w)-\bar g^0_T(w)|=o_p(T^{-1/4})=o_p(1)$. Hence, for any $\eta\in(0,1)$ we have $\inf_{\{w:\bar g^0_T(w)/b_T>1\}}|\hat g(w)|/b_T>1-\eta$ with probability approaching one. Thus, with probability approaching one,
$$\sqrt T|\mu_2|\le\frac{\sqrt T}{(b_T)^2(1-\eta)}\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\sup_{w\in\mathbb{R}^m}[\hat g(w)-\bar g^0_T(w)]^2\frac1T\sum_{t=1}^T|\nabla_\theta q^0_t|,$$
which is $o_p(1)$ by Property (64) with $a_T=b_T$, as $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)<\infty$ and $(1/T)\sum_{t=1}^T|\nabla_\theta q^0_t|=O_p(1)$ as noted earlier. That is, $\mu_2=o_p(T^{-1/2})$. Similarly,
$$\sqrt T|\mu_3|\le\frac{\sqrt T}{b_T\inf_{\{w:\bar g^0_T(w)>b_T\}}|\hat g(w)|}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w)-H^0_{1T}(y,w)|\sup_{w\in\mathbb{R}^m}|\hat g(w)-\bar g^0_T(w)|\times\frac1T\sum_{t=1}^T|\nabla_\theta q^0_t|,$$
which shows that $\mu_3=o_p(T^{-1/2})$ using the same argument with Property (63).

STEP 2c: Lastly, we show that $\mu_1=o_p(T^{-1/2})$. Let $K_{0T}(\cdot)\equiv(1/h_{yT})K_0(\cdot/h_{yT})$ and $K_T(\cdot)\equiv(1/h^m_{wT})K(\cdot/h_{wT})$. Thus, from the definitions of $D\hat G(y,w)$ and $\hat g(w)$, we have
$$\mu_1=\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\frac{\mathbb{1}(q^0_t-Y_t)-\alpha}{\bar g^0_T(W_t)}\big[K_{0T}(q^0_t-Y_s)-f^0(q^0_t|W_t)\big]K_T(W_t-W_s)\nabla_\theta q^0_t\,d_t\equiv L+\frac{T-1}{T}U,$$
where $L$ and $U$ are the diagonal and U-statistic parts, defined as
$$L\equiv\frac{1}{T^2}\sum_{t=1}^T\frac{\mathbb{1}(q^0_t-Y_t)-\alpha}{\bar g^0_T(W_t)}\big[K_{0T}(q^0_t-Y_t)-f^0(q^0_t|W_t)\big]K_T(0)\nabla_\theta q^0_t\,d_t,$$
$$U\equiv\frac{1}{T(T-1)}\sum_{1\le t\ne s\le T}u_{Tts},\qquad u_{Tts}\equiv\tfrac12\big(u^0_{Tts}+u^0_{Tst}\big)\equiv h_T(Y_t,W_t,Y_s,W_s),$$
$$u^0_{Tts}=\frac{\mathbb{1}(q^0_t-Y_t)-\alpha}{\bar g^0_T(W_t)}\big[K_{0T}(q^0_t-Y_s)-f^0(q^0_t|W_t)\big]K_T(W_t-W_s)\,d_t\nabla_\theta q^0_t,$$
$$u^0_{Tst}=\frac{\mathbb{1}(q^0_s-Y_s)-\alpha}{\bar g^0_T(W_s)}\big[K_{0T}(q^0_s-Y_t)-f^0(q^0_s|W_s)\big]K_T(W_s-W_t)\,d_s\nabla_\theta q^0_s,$$
for $1\le t\ne s\le T$. Note that $h_T(Y_t,W_t,Y_s,W_s)$ is symmetric in $(Y_t,W_t)$ and $(Y_s,W_s)$.
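Steps 1 to 2b repeatedly combine $b_T=o(T^{-1/(4\gamma)})$ with Properties (62)-(64) taken at $a_T=b_T$, which require $b_TT^{1/4}h_{yT}h^m_{wT}\to\infty$, $b_T/(T^{1/4}h^R_{yT})\to\infty$ and $b_T/(T^{1/4}h^R_{wT})\to\infty$. To see when these requirements can hold at once, the hypothetical helper below assumes polynomial rates $h_{yT}=T^{-a_y}$, $h_{wT}=T^{-a_w}$ and $b_T=T^{-c}$ (our illustrative choice, not the paper's) and checks the corresponding strict inequalities between exponents.

```python
from fractions import Fraction as Fr


def rate_conditions_hold(a_y, a_w, c, m, R, gamma):
    """Hypothetical check under polynomial rates h_yT = T**-a_y, h_wT = T**-a_w,
    b_T = T**-c. A power T**e diverges iff e > 0 and vanishes iff e < 0."""
    quarter = Fr(1, 4)
    conditions = [
        c > quarter / gamma,              # b_T = o(T^{-1/(4 gamma)})
        quarter - c - a_y - m * a_w > 0,  # b_T T^{1/4} h_yT h_wT^m -> infinity
        R * a_y - quarter - c > 0,        # b_T / (T^{1/4} h_yT^R) -> infinity
        R * a_w - quarter - c > 0,        # b_T / (T^{1/4} h_wT^R) -> infinity
    ]
    return all(conditions)
```

For instance, with $m=1$, $R=4$ and $\gamma=4$, the exponents $a_y=a_w=1/12$ and $c=7/100$ satisfy every inequality, whereas with $\gamma=2$ no admissible $c$ exists at these bandwidths, since $1/(4\gamma)=1/8$ exceeds $1/4-a_y-ma_w=1/12$.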
Hence, it suffices to show that $L$ and $U$ are both $o_p(T^{-1/2})$. For $L$ we have
$$\sqrt T|L|\le\frac{1}{\sqrt T\,b_Th^m_{wT}}\Big[\frac{1}{h_{yT}}\sup_{y\in\mathbb{R}}|K_0(y)|+\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\Big]|K(0)|\,\frac1T\sum_{t=1}^T|\nabla_\theta q^0_t|,$$
where $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)<\infty$, $\sup_{y\in\mathbb{R}}|K_0(y)|<\infty$ and $|K(0)|<\infty$ by (A11)(iii) and (A9)(ii). As $(1/T)\sum_{t=1}^T|\nabla_\theta q^0_t|=O_p(1)$ by (A7)(i), (A5) and the Markov inequality, we obtain $\sqrt T L=o_p(1)$ because $\sqrt T\,b_Th_{yT}h^m_{wT}=T^{1/4}\,\bar b_TT^{1/4}h_{yT}h^m_{wT}\,(1+o(1))\to\infty$, using $b_T=\bar b_T(1+o(1))$.

It remains to be shown that $U=o_p(T^{-1/2})$. Because of the stationarity assumption (A6')(i), we have $\bar g^0_T(\cdot)=g^0_t(\cdot)\equiv g^0(\cdot)$. Moreover, from the Hoeffding decomposition (see e.g. Arcones (1995, eq. 1.7)), we have $U=U_0+2U_1+U_2$, where
$$U_0=\iiiint h_T(y_1,w_1,y_2,w_2)\prod_{t=1}^2\big[f^0(y_t|w_t)g^0(w_t)\,dy_t\,dw_t\big],\tag{67}$$
$$U_1=\frac1T\sum_{t=1}^Th_{T1}(Y_t,W_t),\tag{68}$$
$$U_2=\frac{1}{T(T-1)}\sum_{1\le t\ne s\le T}h_{T2}(Y_t,W_t,Y_s,W_s),\tag{69}$$
$$h_{T1}(y_1,w_1)=\iint h_T(y_1,w_1,y_2,w_2)f^0(y_2|w_2)g^0(w_2)\,dy_2\,dw_2-U_0,\tag{70}$$
$$h_{T2}(y_1,w_1,y_2,w_2)=h_T(y_1,w_1,y_2,w_2)-h_{T1}(y_1,w_1)-h_{T1}(y_2,w_2)-U_0.\tag{71}$$
Note that $U_0\ne E[U]$, as $\prod_{t=1}^2[f^0(y_t|w_t)g^0(w_t)]$ is not the joint density of $(Y_1,W_1,Y_2,W_2)$, while $h_{T1}(\cdot)$ and $h_{T2}(\cdot)$ are canonical kernels, i.e. symmetric kernels satisfying $E[h_{T1}(Y_1,W_1)]=0$ and $E[h_{T2}(y_1,w_1,Y_2,W_2)]=0$, respectively, as noted by Arcones (1995). Thus, it suffices to show that $\sqrt T U_k=o_p(1)$ for $k=0,1,2$.

STEP 2c(i): We start by showing that $\sqrt T U_0=o_p(1)$.
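The decomposition $U=U_0+2U_1+U_2$ with kernels (70)-(71) is an algebraic identity: it holds for any sample once $U_0$ and $h_{T1}$ are defined by integrating against a product measure, which is why it stays valid here even though that product measure is not the joint law of the data and $U_0\ne E[U]$. The Python sketch below verifies it exactly for a discrete law $P$ standing in for $f^0(y|w)g^0(w)\,dy\,dw$ (an illustrative assumption; the function name is hypothetical), together with the canonical properties $\int h_{T1}\,dP=0$ and $\int h_{T2}(x,\cdot)\,dP=0$.

```python
from fractions import Fraction as Fr
from itertools import product


def hoeffding_parts(h, support, probs, sample):
    """Hoeffding decomposition U = U0 + 2*U1 + U2 of the U-statistic with symmetric
    kernel h, projected on P x P, where P is a discrete law on `support` with
    weights `probs` (stand-in for f0(y|w) g0(w) dy dw)."""
    law = list(zip(support, probs))
    # U0: integral of h against the product measure, as in (67).
    U0 = sum(p1 * p2 * h(x1, x2) for (x1, p1), (x2, p2) in product(law, repeat=2))

    def h1(x):  # first-order kernel, as in (70)
        return sum(p * h(x, x2) for x2, p in law) - U0

    def h2(x, y):  # second-order kernel, as in (71)
        return h(x, y) - h1(x) - h1(y) - U0

    T = len(sample)
    pairs = [(t, s) for t in range(T) for s in range(T) if t != s]
    U = sum(h(sample[t], sample[s]) for t, s in pairs) / Fr(T * (T - 1))
    U1 = sum(h1(x) for x in sample) / Fr(T)
    U2 = sum(h2(sample[t], sample[s]) for t, s in pairs) / Fr(T * (T - 1))
    return U, U0, U1, U2, h1, h2
```

The sample passed in need not be drawn from $P$; the identity only uses that every ordered pair $t\ne s$ contributes $h_{T1}(X_t)+h_{T1}(X_s)$, so that the first-order terms sum to $2(T-1)\sum_th_{T1}(X_t)$.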
In fact, we have $U_0=0$, as Equation (67) gives
$$\begin{aligned}U_0&=\iiiint\tfrac12\big(u^0_{T12}+u^0_{T21}\big)\prod_{t=1}^2\big[f^0(y_t|w_t)g^0(w_t)\,dy_t\,dw_t\big]\\&=\frac12\iint\Big\{\int\big[K_{0T}(q^0_1-y_2)-f^0(q^0_1|w_1)\big]f^0(y_2|w_2)\,dy_2\Big\}\Big\{\int\big[\mathbb{1}(q^0_1-y_1)-\alpha\big]f^0(y_1|w_1)\,dy_1\Big\}K_T(w_1-w_2)\frac{d_1\nabla_\theta q^0_1}{g^0(w_1)}g^0(w_1)g^0(w_2)\,dw_1\,dw_2\\&\quad+\frac12\iint\Big\{\int\big[K_{0T}(q^0_2-y_1)-f^0(q^0_2|w_2)\big]f^0(y_1|w_1)\,dy_1\Big\}\Big\{\int\big[\mathbb{1}(q^0_2-y_2)-\alpha\big]f^0(y_2|w_2)\,dy_2\Big\}K_T(w_2-w_1)\frac{d_2\nabla_\theta q^0_2}{g^0(w_2)}g^0(w_1)g^0(w_2)\,dw_1\,dw_2,\end{aligned}$$
where $\int[\mathbb{1}(q^0_t-y_t)-\alpha]f^0(y_t|w_t)\,dy_t=0$ for any $t$ by assumptions (A1) and (A9)(i).

STEP 2c(ii): We now show that $\sqrt T U_1=o_p(1)$. By the Markov inequality it suffices to show that $E(TU_1^2)=o(1)$. But assumption (A6') and Lemma 3 in Arcones (1995) with $p=r$ imply
$$E(TU_1^2)=T^{-1}E\Big[\Big(\sum_{1\le t\le T}h_{T1}(Y_t,W_t)\Big)^2\Big]\le c\Big(T^{-1}+T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-2)/r}\Big)M_{T1}^2,\tag{72}$$
where $\beta_t$ are the mixing coefficients of $\{(Y_t,W_t')'\}$, $c$ is a universal constant and $M_{T1}=\sup_{1\le t<\infty}[E|h_{T1}(Y_t,W_t)|^r]^{1/r}$. Note that Lemma 3 in Arcones (1995) is written for canonical kernels that are independent of $T$. It is, however, easy to see from his proofs that this lemma, and Lemma 8, which is used to prove it, both hold even when the canonical kernels depend on $T$, as $h_{T1}(\cdot)$ and $h_{T2}(\cdot)$ do. From (A6')(ii) we know that $\sum_{t=1}^\infty\beta_t^{(r-2)/r}<\infty$ (see e.g. White 2001 for the definition of the size); hence $T^{-1}+T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-2)/r}=O(1)$. We now show that $M_{T1}\to0$.
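The fact behind $U_0=0$ is that $\int[\mathbb{1}(q-y)-\alpha]f(y)\,dy=F(q)-\alpha$, which vanishes when $q$ is the $\alpha$-quantile of $F$. The Python sketch below checks this numerically with a Gaussian density standing in for $f^0(\cdot|w)$ (an assumption made only for illustration) using a midpoint rule; the function name is hypothetical.

```python
from statistics import NormalDist


def quantile_moment(alpha, mu=0.0, sigma=1.0, n=100_000):
    """Midpoint-rule approximation of the integral of [1I(q - y) - alpha] f(y) dy,
    where f is the N(mu, sigma^2) density (stand-in for f0(.|w)) and q is its
    alpha-quantile. The exact value is F(q) - alpha = 0."""
    dist = NormalDist(mu, sigma)
    q = dist.inv_cdf(alpha)  # alpha-quantile of the stand-in distribution
    lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * step
        indicator = 1.0 if q - y >= 0.0 else 0.0  # 1I(q - y)
        total += (indicator - alpha) * dist.pdf(y) * step
    return total
```

The residual is of the order of the grid step, coming from the discontinuity of the indicator at $q$; any other value of $q$ would leave a nonzero integral equal to $F(q)-\alpha$.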
As $U_0=0$ and the integral of $u^0_{T21}$ with respect to $f^0(y_2|w_2)g^0(w_2)\,dy_2\,dw_2$ is zero because $\int[\mathbb{1}(q^0_2-y_2)-\alpha]f^0(y_2|w_2)\,dy_2=0$, we have from Equation (70)
$$\begin{aligned}|h_{T1}(y_1,w_1)|&=\Big|\tfrac12\big[\mathbb{1}(q^0_1-y_1)-\alpha\big]\frac{d_1\nabla_\theta q^0_1}{g^0(w_1)}\int\Big\{\int\big[K_{0T}(q^0_1-y_2)-f^0(q^0_1|w_1)\big]f^0(y_2|w_2)\,dy_2\Big\}K_T(w_1-w_2)g^0(w_2)\,dw_2\Big|\\&\le\frac{|\nabla_\theta q^0_1|}{2b_T}\Big|\int\Big\{\int K_0(u)\big[f^0(q^0_1-uh_{yT}|w_2)-f^0(q^0_1|w_1)\big]du\Big\}K_T(w_1-w_2)g^0(w_2)\,dw_2\Big|\\&\le\frac{|\nabla_\theta q^0_1|}{2b_T}\Big|\int\Big\{\int K_0(u)\big[f^0(q^0_1-uh_{yT}|w_2)-f^0(q^0_1|w_2)\big]du\Big\}K_T(w_1-w_2)g^0(w_2)\,dw_2\Big|\\&\quad+\frac{|\nabla_\theta q^0_1|}{2b_T}\Big|\int\big[f^0(q^0_1|w_2)-f^0(q^0_1|w_1)\big]K_T(w_1-w_2)g^0(w_2)\,dw_2\Big|\\&\le\frac{|\nabla_\theta q^0_1|}{2b_T}\int\Big|\int K_0(u)\big[f^0(q^0_1-uh_{yT}|w_2)-f^0(q^0_1|w_2)\big]du\Big|\,|K_T(w_1-w_2)|\,g^0(w_2)\,dw_2\\&\quad+\frac{|\nabla_\theta q^0_1|}{2b_T}\Big|\int\big[f^0(q^0_1|w_2)g^0(w_2)-f^0(q^0_1|w_1)g^0(w_1)\big]K_T(w_1-w_2)\,dw_2\Big|\\&\quad+\frac{|\nabla_\theta q^0_1|}{2b_T}f^0(q^0_1|w_1)\Big|\int\big[g^0(w_2)-g^0(w_1)\big]K_T(w_1-w_2)\,dw_2\Big|\\&\le\frac{|\nabla_\theta q^0_1|}{b_T}O(h^R_{yT})\int|K(v)|g^0(w_1-vh_{wT})\,dv\\&\quad+\frac{|\nabla_\theta q^0_1|}{2b_T}\Big|\int\big[f^0(q^0_1|w_1-vh_{wT})g^0(w_1-vh_{wT})-f^0(q^0_1|w_1)g^0(w_1)\big]K(v)\,dv\Big|\\&\quad+\frac{|\nabla_\theta q^0_1|}{2b_T}f^0(q^0_1|w_1)\Big|\int\big[g^0(w_1-vh_{wT})-g^0(w_1)\big]K(v)\,dv\Big|\\&\le\frac{|\nabla_\theta q^0_1|}{b_T}\Big\{O(h^R_{yT})+\big[1+f^0(q^0_1|w_1)\big]O(h^R_{wT})\Big\},\end{aligned}$$
so that
$$|h_{T1}(y_1,w_1)|\le\frac{|\nabla_\theta q^0_1|}{b_T}\Big\{O(h^R_{yT})+O(h^R_{wT})\Big\},\tag{73}$$
where we have used the changes of variables $u=(q^0_1-y_2)/h_{yT}$ and $v=(w_1-w_2)/h_{wT}$, combined with (A8), (A9)(ii), (A11)(i,iii) and Taylor expansions of order $R$ of the integrands. As $E|\nabla_\theta q^0_1|^r\le\sup_{1\le t\le T,\,T\ge1}E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|^r]<\infty$ by (A5) and (A7)(i), it follows that $(E|h_{T1}(Y_t,W_t)|^r)^{1/r}\le(1/b_T)\{O(h^R_{yT})+O(h^R_{wT})\}$ uniformly in $t$. Hence, $M_{T1}\le(1/b_T)\{O(h^R_{yT})+O(h^R_{wT})\}$.
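The orders $O(h^R_{yT})$ and $O(h^R_{wT})$ come from Taylor expanding the integrands. Assuming, as these orders suggest, that $K_0$ integrates to one, has vanishing moments of orders $1,\dots,R-1$ and satisfies $\int|u|^R|K_0(u)|\,du<\infty$, and that $\partial^R_yf^0(y|w)$ is bounded, the $u$-integral works out as follows, with $\tilde q_u$ a point between $q^0_1-uh_{yT}$ and $q^0_1$; the $v$-integrals are handled by the analogous multivariate expansion.

$$\int K_0(u)\big[f^0(q^0_1-uh_{yT}|w)-f^0(q^0_1|w)\big]du=\sum_{j=1}^{R-1}\frac{(-h_{yT})^j}{j!}\,\partial^j_yf^0(q^0_1|w)\int u^jK_0(u)\,du+\frac{(-h_{yT})^R}{R!}\int u^RK_0(u)\,\partial^R_yf^0(\tilde q_u|w)\,du,$$
$$\Big|\int K_0(u)\big[f^0(q^0_1-uh_{yT}|w)-f^0(q^0_1|w)\big]du\Big|\le\frac{h^R_{yT}}{R!}\sup_{(y,w)\in\mathbb{R}^{m+1}}\big|\partial^R_yf^0(y|w)\big|\int|u|^R|K_0(u)|\,du=O(h^R_{yT}),$$

since every term of the finite sum vanishes under the moment conditions.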
Given $b_T=\bar b_T[1+o(1)]$, $h^R_{yT}=o(b_T)$ and $h^R_{wT}=o(b_T)$, which follow from $\bar b_T/(T^{1/4}h^R_{yT})\to\infty$ and $\bar b_T/(T^{1/4}h^R_{wT})\to\infty$ respectively, we have that $M_{T1}=o(1)$. Combining Property (72) and (A6')(i) then gives $E(TU_1^2)=o(1)$, as desired.

STEP 2c(iii): Finally, we show that $\sqrt T U_2=o_p(1)$. Again, by the Markov inequality it suffices to show that $E(TU_2^2)=o(1)$. Similarly to the previous case, Assumption (A6') and Lemma 3 in Arcones (1995) with $p=2r$ imply
$$\begin{aligned}E(TU_2^2)&=\Big(\frac{T}{T-1}\Big)^2T^{-3}E\Big[\Big(\sum_{1\le t\ne s\le T}h_{T2}(Y_t,W_t,Y_s,W_s)\Big)^2\Big]\\&\le\Big(\frac{T}{T-1}\Big)^2c\Big(T^{-1}+T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-1)/r}\Big)M_{T2}^2,\end{aligned}\tag{74}$$
where $c$ is a universal constant and $M_{T2}=\sup_{1\le t\ne s<\infty}[E|h_{T2}(Y_t,W_t,Y_s,W_s)|^{2r}]^{1/(2r)}$. We now show that $T^{-1}+T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-1)/r}=O(1/\sqrt T)$ and that $M_{T2}=o(T^{1/4})$. The first property is implied by $\sum_{t=1}^\infty t\beta_t^{(r-1)/r}<\infty$, for which it suffices to show that $\sum_{t=\tau}^{2\tau}t\beta_t^{(r-1)/r}\to0$ as $\tau\to\infty$. As previously, from (A6')(ii) we know that $\sum_{t=1}^\infty\beta_t^{(r-2)/r}<\infty$; hence $\beta_t^{(r-2)/r}t\to0$ as $t\to\infty$ and $\beta_t\le t^{r/(2-r)}$ for $t$ large enough. Thus $\sum_{t=\tau}^{2\tau}t\beta_t^{(r-1)/r}\le\sum_{t=\tau}^{2\tau}t^{-1/(r-2)}$, which vanishes when $2<r<3$, as assumed. For the second property, we bound $M_{T2}$. From Equations (71), (73) and $U_0=0$ we obtain
$$\begin{aligned}|h_{T2}(y_1,w_1,y_2,w_2)|&\le|h_T(y_1,w_1,y_2,w_2)|+\frac{|\nabla_\theta q^0_1|+|\nabla_\theta q^0_2|}{b_T}\big\{O(h^R_{yT})+O(h^R_{wT})\big\}\\&\le\frac{|\nabla_\theta q^0_1|+|\nabla_\theta q^0_2|}{b_T}\big\{O(h^{-1}_{yT}h^{-m}_{wT})+O(h^R_{yT})+O(h^R_{wT})\big\},\end{aligned}$$
where the second inequality follows from the definitions of $u^0_{T12}$ and $u^0_{T21}$, with $\sup_{y\in\mathbb{R}}|K_0(y)|<\infty$, $\sup_{w\in\mathbb{R}^m}|K(w)|<\infty$ and $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)<\infty$ by (A11)(i,iii) and (A9)(ii). Thus, by the Minkowski inequality we obtain
$$\begin{aligned}M_{T2}&\le\sup_{1\le t\ne s<\infty}\Big\{\big[E|\nabla_\theta q^0_t|^{2r}\big]^{1/(2r)}+\big[E|\nabla_\theta q^0_s|^{2r}\big]^{1/(2r)}\Big\}\times\Big\{O\Big(\frac{1}{b_Th_{yT}h^m_{wT}}\Big)+O\Big(\frac{h^R_{yT}}{b_T}\Big)+O\Big(\frac{h^R_{wT}}{b_T}\Big)\Big\}\\&=O\Big(\frac{1}{b_Th_{yT}h^m_{wT}}\Big)+O\Big(\frac{h^R_{yT}}{b_T}\Big)+O\Big(\frac{h^R_{wT}}{b_T}\Big)\end{aligned}$$
by (A7)(i). Given $b_T=\bar b_T[1+o(1)]$, $h^R_{yT}=o(b_T)$, $h^R_{wT}=o(b_T)$ and $\bar b_TT^{1/4}h_{yT}h^m_{wT}\to\infty$, we have $M_{T2}=o(T^{1/4})$, as desired.
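The exponent arithmetic behind the bound on the mixing coefficients in Step 2c(iii) can be written out as follows. It uses only $\beta_t\le t^{r/(2-r)}$ for large $t$, and it shows that, from some $t$ onward, the summands $t\beta_t^{(r-1)/r}$ are dominated by a convergent $p$-series.

$$\beta_t\le t^{r/(2-r)}=t^{-\frac{r}{r-2}}\;\Longrightarrow\;t\,\beta_t^{(r-1)/r}\le t\cdot t^{-\frac{r-1}{r-2}}=t^{-\frac{1}{r-2}},$$
$$\sum_{t=\tau}^{2\tau}t^{-\frac{1}{r-2}}\le\sum_{t=\tau}^{\infty}t^{-\frac{1}{r-2}}\xrightarrow[\tau\to\infty]{}0\quad\text{because}\quad\frac{1}{r-2}>1\iff2<r<3.$$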
Thus, Equation (74) implies $E(TU_2^2)=o(1)$. □

References

Ai, C., and X. Chen (2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, 71, 1795–1843.
Altonji, J., and L. Segal (1996): “Small Sample Bias in GMM Estimation of Covariance Structures,” Journal of Business and Economic Statistics, 14, 353–366.
Andrews, D. W. K. (1994a): “Asymptotics for Semiparametric Econometric Models Via Stochastic Equicontinuity,” Econometrica, 62, 43–72.
Andrews, D. W. K. (1994b): “Empirical Process Methods in Econometrics,” in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, pp. 2247–2294. Elsevier Science.
Andrews, D. W. K. (1995): “Nonparametric Kernel Estimation for Semiparametric Econometric Models,” Econometric Theory, 11, 560–596.
Angrist, J., V. Chernozhukov, and I. Fernandez-Val (2006): “Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure,” Econometrica, 74, 539–564.
Arcones, M. A. (1995): “On the Central Limit Theorem for U-Statistics under Absolute Regularity,” Statistics and Probability Letters, 24, 245–249.
Begun, J., W. Hall, W. Huang, and J. Wellner (1983): “Information and Asymptotic Efficiency in Parametric-Nonparametric Models,” Annals of Statistics, 11, 432–452.
Bickel, P. J. (1982): “On Adaptive Estimation,” Annals of Statistics, 10, 647–671.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998): Efficient and Adaptive Estimation for Semiparametric Models. Springer-Verlag, New York.
Bierens, H. J. (1983): “Uniform Consistency of Kernel Estimators of a Regression Function Under Generalized Conditions,” Journal of the American Statistical Association, 78, 699–707.
Bierens, H. J. (1987): “Kernel Estimators of Regression Functions,” in Advances in Econometrics: Fifth World Congress, ed. by T. Bewley, vol. 1 of Advances in Econometrics: Fifth World Congress, pp. 99–144. Cambridge University Press, New York.
Bierens, H. J., and D. Ginther (2001): “Integrated Conditional Moment Testing of Quantile Regression Models,” Empirical Economics, 26, 307–324.
Billingsley, P. (1995): Probability and Measure. John Wiley and Sons, New York, 3rd edn.
Bracewell, R. N. (2000): The Fourier Transform and Its Applications. McGraw-Hill, 3rd edn.
Bradley, R. C. (1986): “Basic Properties of Strong Mixing Conditions,” in Dependence in Probability and Statistics, ed. by E. Eberlein and M. S. Taqqu, pp. 165–192. Birkhauser, Boston.
Brown, B., and W. K. Newey (1998): “Efficient Semiparametric Estimation of Expectations,” Econometrica, 66, 453–464.
Buchinsky, M. (1994): “Changes in the US Wage Structure 1963-1987: Application of Quantile Regression,” Econometrica, 62, 405–458.
Buchinsky, M., and J. Hahn (1998): “An Alternative Estimator for the Censored Quantile Regression Model,” Econometrica, 66, 653–671.
Cai, Z. (2002): “Regression Quantiles for Time Series,” Econometric Theory, 18, 169–192.
Chamberlain, G. (1986): “Asymptotic Efficiency in Semi-Parametric Models with Censoring,” Journal of Econometrics, 32, 189–218.
Chamberlain, G. (1987): “Asymptotic Efficiency in Estimation with Conditional Moment Restrictions,” Journal of Econometrics, 34, 305–334.
Chernozhukov, V., and H. Hong (2002): “Three-Step Censored Quantile Regression and Extramarital Affairs,” Journal of the American Statistical Association, 97, 872–882.
Cosslett, S. R. (2004): “Efficient Semiparametric Estimation of Censored and Truncated Regressions Via a Smoothed Self-Consistency Equation,” Econometrica, 72, 1277–1293.
Gourieroux, C., and A. Monfort (1995): Statistics and Econometric Models. Cambridge University Press.
Gourieroux, C., A. Monfort, and A. Trognon (1984): “Pseudo Maximum Likelihood Methods: Theory,” Econometrica, 52, 681–700.
Hahn, J. (1997): “Efficient Estimation of Panel Data Models with Sequential Moment Restrictions,” Journal of Econometrics, 79, 1–21.
Hansen, B. E. (2004a): “Nonparametric Estimation of Smooth Conditional Distributions,” University of Wisconsin, Madison.
Hansen, B. E. (2004b): “Uniform Convergence Rates for Kernel Estimation,” University of Wisconsin, Madison.
Hansen, L. P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50, 1029–1054.
Hansen, L. P., J. Heaton, and M. Ogaki (1988): “Efficiency Bounds Implied by Multiperiod Conditional Moment Restrictions,” Journal of the American Statistical Association, 83, 863–871.
Hardle, W., and T. Stoker (1989): “Investigating Smooth Multiple Regression by the Method of Average Derivatives,” Journal of the American Statistical Association, 84, 986–995.
Hjort, N. L., and D. Pollard (1993): “Asymptotics for Minimizers of Convex Processes,” Yale University.
Horowitz, J., and V. G. Spokoiny (2002): “An Adaptive Rate-Optimal Test of Linearity for Median Regression Models,” Journal of the American Statistical Association, 97, 822–835.
Huber, P. J. (1967): “The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley. University of California Press.
Khan, S. (2001): “Two-Stage Rank Estimation of Quantile Index Models,” Journal of Econometrics, 100, 319–335.
Kim, T.-H., and H. White (2003): “Estimation, Inference, and Specification Analysis for Possibly Misspecified Quantile Regression,” in Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, ed. by T. Fomby and R. C. Hill, pp. 107–132. Elsevier, New York.
Kitamura, Y., G. Tripathi, and H. Ahn (2004): “Empirical Likelihood-Based Inference in Conditional Moment Restriction Models,” Econometrica, 72, 1667–1714.
Knight, K. (1998): “Limiting Distributions for L1-Regression Estimators under General Conditions,” Annals of Statistics, 26, 755–770.
Koenker, R., and G. Bassett, Jr. (1978): “Regression Quantiles,” Econometrica, 46(1), 33–50.
Koenker, R., and G. Bassett, Jr. (1982): “Robust Tests for Heteroscedasticity Based on Regression Quantiles,” Econometrica, 50(1), 43–62.
Koenker, R., and K. F. Hallock (2001): “Quantile Regression,” Journal of Economic Perspectives, 15(4), 143–156.
Koenker, R., and Z. Xiao (2002): “Inference on the Quantile Regression Process,” Econometrica, 70, 1583–1612.
Koenker, R., and Q. Zhao (1996): “Conditional Quantile Estimation and Inference for ARCH Models,” Econometric Theory, 12, 793–813.
Komunjer, I. (2005a): “Asymmetric Power Distribution: Theory and Applications to Risk Measurement,” University of California, San Diego.
Komunjer, I. (2005b): “Quasi-Maximum Likelihood Estimation for Conditional Quantiles,” Journal of Econometrics, 128, 137–164.
Lavergne, P., and Q. Vuong (1996): “Nonparametric Selection of Regressors: The Nonnested Case,” Econometrica, 64, 207–219.
Newey, W. K., and R. J. Smith (2004): “Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators,” Econometrica, 72, 219–256.
Newey, W. K. (1990a): “Efficient Instrumental Variables Estimation of Nonlinear Models,” Econometrica, 58, 809–837.
Newey, W. K. (1990b): “Semiparametric Efficiency Bounds,” Journal of Applied Econometrics, 5, 99–135.
Newey, W. K. (1993): “Efficient Estimation of Models with Conditional Moment Restrictions,” in Handbook of Statistics, Volume 11: Econometrics, ed. by G. S. Maddala, C. R. Rao, and H. D. Vinod, pp. 419–454. North Holland, Amsterdam.
Newey, W. K. (2004): “Efficient Semiparametric Estimation Via Moment Restrictions,” Econometrica, 72, 1877–1897.
Newey, W. K., and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, pp. 2113–2247. Elsevier Science.
Newey, W. K., and J. L. Powell (1990): “Efficient Estimation of Linear and Type I Censored Regression Models Under Conditional Quantile Restrictions,” Econometric Theory, 6, 295–317.
Otsu, T. (2003): “Empirical Likelihood for Quantile Regression,” University of Wisconsin, Madison.
Pollard, D. (1991): “Asymptotics for Least Absolute Deviation Regression Estimators,” Econometric Theory, 7, 186–199.
Portnoy, S. (1991): “Behavior of Regression Quantiles in Non-Stationary, Dependent Cases,” Journal of Multivariate Analysis, 38, 100–113.
Powell, J. L. (1984): “Least Absolute Deviations Estimation for the Censored Regression Model,” Journal of Econometrics, 25, 303–325.
Powell, J. L. (1986): “Censored Regression Quantiles,” Journal of Econometrics, 32, 143–155.
Robinson, P. M. (1983): “Nonparametric Estimators for Time Series,” Journal of Time Series Analysis, 4, 185–207.
Robinson, P. M. (1987): “Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form,” Econometrica, 55, 875–891.
Robinson, P. M. (1988): “Root-N Consistent Semiparametric Regression,” Econometrica, 56, 931–954.
Schwartz, L. (1997): Analyse. Hermann, Paris.
Stein, C. (1956): “Efficient Nonparametric Testing and Estimation,” in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 187–196, Berkeley. University of California Press.
Stone, C. J. (1980): “Optimal Rates of Convergence for Nonparametric Estimators,” Annals of Statistics, 8, 1348–1360.
Stone, C. J. (1982): “Optimal Global Rates of Convergence for Nonparametric Regression,” Annals of Statistics, 10, 1040–1053.
Truong, Y. K., and C. J. Stone (1992): “Nonparametric Function Estimation Involving Time Series,” Annals of Statistics, 20, 77–97.
White, H. (1982): “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50, 1–25.
White, H. (2001): Asymptotic Theory for Econometricians. Academic Press, San Diego.
Zhao, Q. (2001): “Asymptotically Efficient Median Regression in the Presence of Heteroskedasticity of Unknown Form,” Econometric Theory, 17, 765–784.
Zheng, J. X. (1998): “A Consistent Nonparametric Test of Parametric Regression Models under Conditional Quantile Restrictions,” Econometric Theory, 14, 123–138.