UC San Diego
Recent Work
Title
Efficient Conditional Quantile Estimation: The Time Series Case
Permalink
https://escholarship.org/uc/item/78842570
Authors
Komunjer, Ivana
Vuong, Quang
Publication Date
2006-10-01
2006-10
UNIVERSITY OF CALIFORNIA, SAN DIEGO
DEPARTMENT OF ECONOMICS
Efficient Conditional Quantile Estimation: The Time Series Case
By
Ivana Komunjer
Department of Economics
University of California, San Diego
and
Quang Vuong
Department of Economics
The Pennsylvania State University
DISCUSSION PAPER 2006-10
October 2006
Abstract. In this paper we consider the problem of efficient estimation in conditional
quantile models with time series data. Our first result is to derive the semiparametric efficiency bound in time series models of conditional quantiles; this is a nontrivial extension of
a large body of work on efficient estimation, which has traditionally focused on models with
independent and identically distributed data. In particular, we generalize the bound derived
by Newey and Powell (1990) to the case where the data is weakly dependent and heterogeneous. We then proceed by constructing an M-estimator which achieves the semiparametric
efficiency bound. Our efficient M-estimator is obtained by minimizing an objective function
which depends on a nonparametric estimator of the conditional distribution of the variable
of interest rather than its density.
Keywords: semiparametric efficiency, time series models, dependence, parametric submodels, conditional quantiles.
Affiliations and Contact information. Komunjer: Department of Economics, University of California San Diego ([email protected]). Vuong: Department of Economics, Penn State University ([email protected]).
Acknowledgments: Earlier versions of this paper were presented at the EEA/ESEM 2003 meetings in Stockholm, the NSF/NBER 2004 Time Series conference at SMU and the CEME 2005 conference at MIT. Many thanks to Rob Engle, Clive Granger, Jin Hahn, Bruce Hansen, Cheng Hsiao, Guido Imbens, Guido Kuersteiner, Guy Laroque, Essie Maasoumi, Roger Moon, Whitney Newey, James Powell, Bernard Salanié, Ruey Tsay, Hal White and to all the participants at the Ohio State, Penn State, UC Berkeley, UC San Diego, CREST/INSEE and USC econometric seminars.
1. Introduction
The purpose of this paper is to study the problem of asymptotically efficient estimation in
models for conditional quantiles. We provide answers to the following closely related questions: what is the semiparametric efficiency bound for the parameters of a given conditional
quantile, when the data is weakly dependent and heterogeneous? Is efficient estimation
possible in such models, and if so, what is an efficient conditional quantile estimator?
The computation of semiparametric efficiency bounds in models with conditional moment
restrictions–which include the one studied here–has been considered by numerous authors (Chamberlain, 1986, 1987; Robinson, 1987; Hansen, Heaton, and Ogaki, 1988; Newey,
1990a,b, 1993; Hahn, 1997; Bickel, Klaassen, Ritov, and Wellner, 1998; Brown and Newey,
1998; Ai and Chen, 2003; Cosslett, 2004; Newey, 2004). Our contribution to this large literature is twofold. First, we derive the semiparametric efficiency bound in models with a
conditional quantile restriction allowing the data to be weakly dependent and/or heterogeneous. Second, we propose a new estimator for conditional quantiles which actually attains
the semiparametric efficiency bound. Our results are important because they require neither
independence nor identical distribution of the data.
The first of those assumptions–independence–has been prevalent in the existing literature on efficient estimation, for reasons which pertain to the very definition of the semiparametric efficiency bound. Depending on how we characterize the bound–as an “infimum” or
as a “supremum”–there are two approaches to its computation. Most of the above literature, with the exception of Chamberlain (1987), has used the “infimum” approach, which
can be summarized as follows.
Consider a model in which the parameter vector of interest θ is identified via a conditional
moment restriction. Assume that the model is regular in the sense of Begun, Hall, Huang,
and Wellner (1983) and Newey (1990b). A familiar approach to estimating θ is by using
semiparametric estimators such as GMM (Hansen, 1982), M- (Huber, 1967) or instrumental
variable estimators. Associated with the choice of a particular semiparametric estimator
is its covariance matrix. Hence, to the set of all semiparametric estimators corresponds
a set of positive semidefinite matrices. The crucial property of this set is its orthogonal
structure (Bickel, 1982; Begun, Hall, Huang, and Wellner, 1983; Chamberlain, 1986; Newey,
1990b): any matrix Ω in this set can be written as the covariance matrix of the sum of a Gaussian random variable, with positive semidefinite covariance matrix V, and an independent noise term. The matrix V, which is the infimum of this set, is the semiparametric efficiency bound for θ.
This characterization of the semiparametric efficiency bound is the starting point of the
“infimum” approach to its computation. Essentially geometric, the “infimum” approach uses
projection arguments to find V . As such, it requires certain orthogonality conditions, which
in econometric terms correspond to the requirement that the random variables involved be
independent (Bickel, 1982). Hence, most of the “infimum” approach literature has exclusively focused on models with independent observations.1 In models in which we relax the
independence assumption, the projection arguments are difficult to implement, which makes
dealing with time series data difficult. Considerations such as these have led Ai and Chen
(2003), for example, to conclude: “although our results [...] can be easily extended to weakly
dependent time series data, the problem of semiparametric efficiency bound with time series
data is nontrivial.”
In this paper, we use the alternative–“supremum”–approach pioneered by Chamberlain
(1987). In his seminal paper on semiparametric efficiency bounds in models with conditional
moment restrictions, Chamberlain (1987) compares the asymptotic distribution of an efficient GMM estimator–efficient in the sense of Hansen (1982)–with that of a maximum
likelihood estimator (MLE). The key property of the MLE is that it is efficient when correctly specified. Hence any MLE in which the specified likelihood is consistent with the conditional moment restriction and which contains the data generating process must have an asymptotic covariance matrix no larger than the semiparametric efficiency bound.
In other words, the semiparametric efficiency bound can be defined as the supremum of
asymptotic covariance matrices of all parametric submodels which satisfy the conditional
moment restrictions and contain the data generating process–this is the key insight behind
Stein’s (1956) characterization of semiparametric efficiency bounds and the starting point of
the “supremum” approach.
1 Hansen, Heaton, and Ogaki (1988) is an important exception. Their approach, however, is based on the assumption that some transformation (a forward filter) of the moment function used in the conditional moment restriction is serially uncorrelated (see their equation (4.2) and the discussion thereof). Hence, unless the parameters involved in the forward filter transformation are known, the approach of Hansen, Heaton, and Ogaki (1988) is not applicable. For example, in models with conditional moment restrictions in which the variables follow an ARMA(p, q) process, with lags p and q known, one needs to know the q MA parameters in order to construct the forward filter.
Chamberlain (1987) implements the “supremum” approach in the case where the random
variables involved in the conditional moment restriction are independent and identically
distributed (iid). In the iid case, the efficient (in the sense of Hansen, 1982) GMM estimator
and the MLE obtained when the data is generated from a multinomial distribution are both
asymptotically normally distributed with asymptotic covariance matrices respectively equal
to Ω and $I^{-1}$, where I is the Fisher information matrix of the multinomial model. When
the data has finite support, Chamberlain (1987) shows that Ω and $I^{-1}$ are the same. Hence,
they must be equal to the semiparametric efficiency bound V . Given that any distribution
can be approximated arbitrarily well by a multinomial distribution, the general expression
for the bound follows. The iid assumption plays an important role in Chamberlain’s (1987)
construction of the semiparametric bounds; without it the multinomial approximation is
no longer valid, making the extension of Chamberlain’s (1987) results to time series data
difficult.
The first contribution of this paper is to extend Chamberlain’s (1987) results to weakly
dependent data, by using the “supremum” characterization of the semiparametric efficiency
bound, initially due to Stein (1956). In particular, we focus on models with conditional
quantile restrictions. In such models, there is no published work prior to ours on asymptotically efficient estimation which would allow for the data to be weakly dependent. Hence, our
first contribution is to fill the gaps in the extant literature on efficient conditional quantile
estimation (Newey and Powell, 1990; Koenker and Zhao, 1996; Zhao, 2001) and derive the
semiparametric efficiency bound in weakly dependent time series models with conditional
quantile restrictions.
Our “supremum” approach is somewhat different from that used by Chamberlain (1987).
We start by constructing a matrix V that is a candidate for the semiparametric efficiency bound. This candidate matrix is obtained as the minimum within a family of asymptotic covariance matrices of conditional quantile M-estimators that are consistent for the parameters of a correctly specified conditional quantile model. With the candidate matrix V in hand, we follow the insightful approach of Stein (1956) and look for a parametric submodel that is “as difficult” as the semiparametric model. In other words, we construct a fully parametric model that satisfies the conditional quantile restriction, contains the data generating process, and in which the inverse of the Fisher information matrix equals V.
This second step is what distinguishes our work from the rest of the literature on asymptotically efficient estimation–specifically, we are able to analytically derive the least favorable
parametric submodel.
Our result on the semiparametric efficiency bound is general: we derive it under the sole
assumption that the model satisfies the conditional quantile restriction. In particular, when
constructing V , we do not make any additional assumptions regarding the properties of the
residuals from the (nonlinear) quantile regression: they can be dependent and nonidentically
distributed. Hence, for the first time in the literature on efficient estimation, we are able to
derive the semiparametric efficiency bound in conditional quantile models with time series
data that are dependent and conditionally heteroskedastic.
The second contribution of this paper is to propose a new conditional quantile estimator
that is efficient. We note that the problem of constructing an efficient estimator is even
more difficult than that of computing the semiparametric efficiency bound. Though to some
extent applicable to time series data, the projection methods used in the “infimum” approach
shed no light on how to construct efficient estimators. As already pointed out by Hansen,
Heaton, and Ogaki (1988), “although [they] delineate the sense of approximation required
for the sequences of GMM estimators to get arbitrarily close to the efficiency bound, [they]
do not show how to construct estimators that actually attain the efficiency bound.” It is an
open question whether procedures along the lines of Newey (1990a,b, 1993, 2004) can
be extended to models with time series data. Our second contribution to the literature on
efficient estimation is to show how–at least in models with conditional quantile restrictions–
the “supremum” approach naturally leads to estimators that are efficient.
Standard approaches to constructing an efficient estimator are as follows: given a consistent estimator of the parameter of interest θ, take a step away from it in a direction
predicted by the efficient score; the resulting estimator is then efficient. An example of this
construction method is Newey and Powell’s (1990) “one-step” estimator for the parameters
of a quantile regression. Alternatively, instead of taking a step away from an initial consistent estimator of θ, we can use it to construct a set of weights–functions of the efficient
score–and compute the corresponding weighted estimator; the weighted estimator is also
efficient. An example of this method is Zhao’s (2001) weighted conditional quantile estimator. More recently, extending the conditional empirical likelihood (CEL) approach by
Kitamura, Tripathi, and Ahn (2004), Otsu (2003) constructs an efficient estimator in the
quantile regression model in the iid case.
We propose an efficient conditional quantile MINPIN-type estimator (Andrews, 1994a)
whose construction differs from the previous ones in two ways. First, our efficient estimator
does not require a preliminary consistent estimate of the parameter of interest, hence it is
similar to the estimator proposed by Otsu (2003). While Otsu’s (2003) efficient estimator is
based on the empirical likelihood principle, our efficient estimator is obtained by minimizing
an efficient M-objective function. Second, our efficient estimator depends on a nonparametric
estimate of the true conditional distribution, unlike Newey and Powell’s (1990) and Zhao’s
(2001) efficient estimators which depend on nonparametric estimates of the true conditional
density. For these two reasons, we can expect our efficient estimator to behave better in
small samples than the efficient estimators proposed by Newey and Powell (1990) and Zhao
(2001). In particular, whenever it is easier to estimate the conditional distributions than
densities (Hansen, 2004a,b), we would expect our efficient estimator to perform better than
the existing ones.
The remainder of the paper is organized as follows: in Section 2 we define our notation and introduce models for conditional quantiles. Section 3 characterizes the class of M-estimators that are consistent for the parameters of such models, provided they are correctly specified. In the same section we show that such estimators are also asymptotically normally distributed with an asymptotic covariance matrix whose expression depends on the form of the M-objective
function being minimized. In Section 4, we derive the minimum bound of the above family of
matrices and show that it corresponds to the semiparametric efficiency bound. An efficient
conditional quantile estimator is constructed in Section 5, which concludes the paper. We
relegate all the proofs to the end of the paper.
2. Setup
2.1. Notation. Consider a stochastic sequence (a time series) $X \equiv \{X_t, t \in \mathbb{N}\}$ defined on a probability space $(\Omega, \mathcal{B}, P)$, where $X : \Omega \to \mathbb{R}^{(m+1)\mathbb{N}}$ and $\mathbb{R}^{(m+1)\mathbb{N}}$ is the product space generated by taking a copy of $\mathbb{R}^{m+1}$ for each integer, i.e. $\mathbb{R}^{(m+1)\mathbb{N}} \equiv \times_{t=1}^{\infty} \mathbb{R}^{m+1}$, $m \in \mathbb{N}$.
We partition the random vector $X_t$ as $X_t = (Y_t, W_t')'$ and are interested in the distribution of its first (scalar) component, denoted $Y_t$, conditional on the random m-vector $W_t$. In particular, we allow $W_t$ to contain lagged values of $Y_t$ (of particular interest in time series applications) together with other (exogenous) components. The family of subfields $\{\mathcal{W}_t, t \in \mathbb{N}\}$ with $\mathcal{W}_t \equiv \sigma(W_1, \ldots, W_t)$ corresponds to the information set generated by the sequence of conditioning vectors up to time t.
We use standard notation and let $P(Y_t \in A \mid W_t)$ denote the conditional distribution of $Y_t$, with A an element of the Borel σ-algebra on $\mathbb{R}$. To simplify, we assume that for any $T \geq 1$, the joint distribution of $(Y_1, W_1, \ldots, Y_T, W_T)$ has a strictly positive continuous density $p_T$ on $\mathbb{R}^{(m+1)T}$, so that conditional densities are everywhere defined.2 Then, for every t, $1 \leq t \leq T$, $T \geq 1$, we let $F_t^0(\cdot)$ denote the distribution function of $Y_t$ conditional upon $W_t$, i.e. $F_t^0(y) \equiv P(Y_t \leq y \mid W_t)$ for every $y \in \mathbb{R}$, and we call $f_t^0(\cdot)$ the corresponding conditional probability density. Of course, $F_t^0(\cdot)$ (like $f_t^0(\cdot)$) is unknown, and we assume that it belongs to F, the set of all absolutely continuous distribution functions with continuously differentiable densities on $\mathbb{R}$. Throughout the paper we assume that for every t, $1 \leq t \leq T$, $T \geq 1$, $f_t^0(\cdot)$ and its derivative are bounded, so that there exist constants $M_0, M_1 > 0$ such that $\sup_{t \geq 1} \sup_{y \in \mathbb{R}} f_t^0(y) \leq M_0 < \infty$ and $\sup_{t \geq 1} \sup_{y \in \mathbb{R}} |df_t^0(y)/dy| \leq M_1 < \infty$.
If V is a real n-vector, $V \equiv (V_1, \ldots, V_n)'$, then $|V|$ denotes the $L_2$-norm of V, i.e. $|V|^2 \equiv V'V = \sum_{i=1}^{n} V_i^2$. If M is a real $n \times n$-matrix, $M \equiv (M_{ij})_{1 \leq i,j \leq n}$, then $|M|$ denotes the $L_\infty$-norm of M, i.e. $|M| \equiv \max_{1 \leq i,j \leq n} |M_{ij}|$, and $M^+$ denotes a generalized inverse of M. If A is a positive definite $n \times n$-matrix, then $A^{-1/2} = P$, where P is an invertible matrix such that $PAP' = I_d$ and $I_d$ denotes the $n \times n$ identity matrix. Let $f : E \to \mathbb{R}$, $V \mapsto f(V)$, with $E \subseteq \mathbb{R}^n$ and $V = (V_1, \ldots, V_n)'$, be continuously differentiable to order $R \geq 1$ on E. Let $r \equiv (r_1, \ldots, r_n) \in \mathbb{N}^n$: if $|r| \leq R$, then $D^r f(V) \equiv \partial^{|r|} f(V) / \partial V_1^{r_1} \cdots \partial V_n^{r_n}$, where $|r| \equiv r_1 + \cdots + r_n$ represents the order of derivation. If $r = 0$, then $D^0 f(V) = f(V)$. Further, let $r! \equiv r_1! \cdots r_n!$ and $V^r \equiv V_1^{r_1} \cdots V_n^{r_n}$. Then, for any $(V, V_0) \in E^2$, the (familiar) expression in a Taylor expansion of order R can be written as
$$\sum_{|r| \leq R} \frac{D^r f(V_0)}{r!} (V - V_0)^r \equiv \sum_{k=0}^{R} \sum_{(j_1, \ldots, j_k) \in \{1, \ldots, n\}^k} \frac{1}{k!} \frac{\partial^k f(V_0)}{\partial V_{j_1} \cdots \partial V_{j_k}} (V_{j_1} - V_{0 j_1}) \cdots (V_{j_k} - V_{0 j_k}).$$
For example, when $R = 1$, we have $\sum_{|r| \leq 1} D^r f(V_0)(V - V_0)^r = f(V_0) + \sum_{i=1}^{n} [\partial f(V_0)/\partial V_i](V_i - V_{0i})$ (Schwartz, 1997). When $R \geq 2$, we let $\nabla_V f(V)$ denote the gradient of f, $\nabla_V f(V) \equiv (\partial f(V)/\partial V_1, \ldots, \partial f(V)/\partial V_n)'$, and use $\Delta_{VV} f(V)$ to denote its Hessian matrix, $\Delta_{VV} f(V) \equiv (\partial^2 f(V)/\partial V_i \partial V_j)_{1 \leq i,j \leq n}$. Finally, the function $1\!\mathrm{I} : \mathbb{R} \to [0, 1]$ denotes the Heaviside (or indicator) function: for any $x \in \mathbb{R}$, we have $1\!\mathrm{I}(x) = 0$ if $x \leq 0$, and $1\!\mathrm{I}(x) = 1$ if $x > 0$ (Bracewell, 2000). The Heaviside function is the indefinite integral of the Dirac delta function $\delta : \mathbb{R} \to \mathbb{R}$, with $1\!\mathrm{I}(x) = \int_a^x \delta(u)\,du$, where a is an arbitrary (possibly infinite) negative constant, $a \leq 0$.
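The equivalence of the two forms of the Taylor expansion rests on a counting fact: for a function that is continuously differentiable to order R, mixed partial derivatives commute, so each ordered tuple $(j_1, \ldots, j_k)$ produces $D^r f$ for the multi-index r recording how often each coordinate appears, and exactly $k!/r!$ ordered tuples map to a given r. A minimal Python sketch that checks this count for small n and R (the function names are illustrative, not taken from the paper):

```python
import itertools
import math
from collections import Counter


def multi_index_weight(r):
    """Coefficient 1/r! multiplying D^r f(V0)(V - V0)^r in the multi-index form."""
    return 1.0 / math.prod(math.factorial(ri) for ri in r)


def tuple_weights(n, k):
    """Total weight 1/k! collected by each multi-index r from the ordered tuples
    (j1, ..., jk) in {1, ..., n}^k (coordinates are indexed from 0 here)."""
    counts = Counter()
    for tup in itertools.product(range(n), repeat=k):
        r = tuple(tup.count(i) for i in range(n))  # how often each coordinate appears
        counts[r] += 1
    return {r: c / math.factorial(k) for r, c in counts.items()}


def check_taylor_identity(n, R):
    """True if both sides of the identity put the same weight on every D^r f."""
    for k in range(R + 1):
        for r, w in tuple_weights(n, k).items():
            if not math.isclose(w, multi_index_weight(r)):
                return False
    return True


print(check_taylor_identity(n=3, R=4))  # True
```

The check is purely combinatorial: it verifies the weights attached to each derivative, not the derivatives themselves.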
2 This excludes the possibility that $W_t$ contains indicator functions of lags of $Y_t$ or of other variables, for example.
2.2. Models for conditional quantiles. In this paper we do not consider the conditional
distribution Ft0 (·) in its entirety but rather focus on a particular conditional quantile of Yt .
In recent years, conditional quantiles have been of particular interest in both applied and
theoretical work in economics in which numerous choices for the conditioning variables have
been proposed.3 In order to keep our analysis both simple and general, we introduce the
following notation: for a given α ∈ (0, 1), let M denote a model for the conditional α-quantile of Yt, M ≡ {qα(Wt, θ)}, with an unknown parameter θ in Θ, where Θ is a compact subset of $\mathbb{R}^k$ with non-empty interior, $\mathring{\Theta} \neq \emptyset$. In what follows we restrict our attention to conditional quantile models M in which the following conditions are satisfied:
(A0) (i) the model M is identified on Θ, i.e. for any $(\theta_1, \theta_2) \in \Theta^2$ we have $q_\alpha(W_t, \theta_1) = q_\alpha(W_t, \theta_2)$, a.s. − P, for every t, $1 \leq t \leq T$, $T \geq 1$, if and only if $\theta_1 = \theta_2$; (ii) for every t, $1 \leq t \leq T$, $T \geq 1$, the function $q_\alpha(W_t, \cdot) : \Theta \to \mathbb{R}$ is twice continuously differentiable on Θ a.s. − P; (iii) for every t, $1 \leq t \leq T$, $T \geq 1$, the matrix $\nabla_\theta q_\alpha(W_t, \theta) \nabla_\theta q_\alpha(W_t, \theta)'$ is of full rank a.s. − P for every θ ∈ Θ.
The set of conditions in (A0) is fairly standard and generally satisfied for a wide variety of conditional quantile models. In what follows, we shall always assume that M is a conditional quantile model in which properties (A0)(i)-(iii) above hold. Further, for any given M we shall denote by Q the range of qα, i.e. $Q \equiv \{q_t \in \mathbb{R} : q_t = q_\alpha(W_t, \theta), \theta \in \Theta, W_t \in \mathbb{R}^m\}$, $Q \subseteq \mathbb{R}$.
One crucial assumption that we make in our analysis, which is of a different nature from the conditions above, is that the model M is correctly specified, so that there exists some true parameter value θ0 such that $F_t^0(q_\alpha(W_t, \theta_0)) = \alpha$ for every t, $1 \leq t \leq T$, $T \geq 1$. In other words, we assume the following:
(A1) given α ∈ (0, 1), there exists $\theta_0 \in \mathring{\Theta}$ such that $E[1\!\mathrm{I}(q_\alpha(W_t, \theta_0) - Y_t) \mid W_t] = \alpha$, a.s. − P, for every t, $1 \leq t \leq T$, $T \geq 1$.
3 Since the seminal work by Koenker and Bassett (1978), numerous authors have studied the problems of
conditional quantile estimation (Koenker and Bassett, 1978; Powell, 1984, 1986; Newey and Powell, 1990;
Pollard, 1991; Portnoy, 1991; Koenker and Zhao, 1996; Buchinsky and Hahn, 1998; Khan, 2001; Cai, 2002;
Kim and White, 2003; Komunjer, 2005b) and specification testing (Koenker and Bassett, 1982; Zheng, 1998;
Bierens and Ginther, 2001; Horowitz and Spokoiny, 2002; Koenker and Xiao, 2002; Kim and White, 2003;
Angrist, Chernozhukov, and Fernandez-Val, 2006). An excellent review of applications of quantile regressions
in economics (Buchinsky, 1994; Chernozhukov and Hong, 2002; Angrist, Chernozhukov, and Fernandez-Val,
2006) can be found in Koenker and Hallock (2001).
In other words, for any t, $1 \leq t \leq T$, $T \geq 1$, the difference between the indicator variable above and α is assumed to be orthogonal to any $\mathcal{W}_t$-measurable random variable.
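Assumption (A1) can be checked by simulation in a simple dependent, conditionally heteroskedastic model. The sketch below assumes, purely for illustration, an ARCH(1)-type process $Y_t = \sqrt{\omega + \gamma Y_{t-1}^2}\,\varepsilon_t$ with iid standard normal $\varepsilon_t$ and $W_t = Y_{t-1}$, so that the conditional α-quantile is $q_\alpha(W_t, \theta) = \Phi^{-1}(\alpha)\sqrt{\omega + \gamma W_t^2}$ with θ = (ω, γ). At the true θ0, the sample mean of $1\!\mathrm{I}(q_\alpha(W_t, \theta_0) - Y_t) - \alpha$, and its sample covariance with the $W_t$-measurable variable $W_t^2$, should both be close to zero:

```python
import math
import random
from statistics import NormalDist


def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise, as in Section 2.1."""
    return 1.0 if x > 0 else 0.0


def simulate_arch1(T, omega, gamma, seed=0):
    """Simulate Y_0 = 0 and Y_t = sqrt(omega + gamma * Y_{t-1}^2) * eps_t (illustrative model)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(T):
        sigma = math.sqrt(omega + gamma * y[-1] ** 2)
        y.append(sigma * rng.gauss(0.0, 1.0))
    return y


def arch_quantile(w, theta, alpha):
    """Conditional alpha-quantile of Y_t given W_t = Y_{t-1} = w."""
    omega, gamma = theta
    return NormalDist().inv_cdf(alpha) * math.sqrt(omega + gamma * w ** 2)


alpha, theta0 = 0.1, (0.2, 0.5)
y = simulate_arch1(100_000, *theta0, seed=42)
pairs = list(zip(y[:-1], y[1:]))  # (W_t, Y_t) with W_t = Y_{t-1}

# Generalized residuals 1I(q - Y_t) - alpha; (A1) says their conditional mean is zero.
resid = [heaviside(arch_quantile(w, theta0, alpha) - yt) - alpha for w, yt in pairs]
mean_resid = sum(resid) / len(resid)
# Orthogonality to the W_t-measurable variable W_t^2.
orth = sum(e * w ** 2 for e, (w, _) in zip(resid, pairs)) / len(resid)
print(round(mean_resid, 3), round(orth, 3))
```

Both printed numbers should be small relative to α; the exact values depend on the seed.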
3. M-estimators for conditional quantiles
In this paper we consider a particular family of conditional quantile estimators known as M- (or extremum) estimators (Huber, 1967). M-estimators for θ0, denoted θT, are obtained by minimizing criterion functions ΨT(θ) of the form $\Psi_T(\theta) \equiv T^{-1} \sum_{t=1}^{T} \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)$, where for every t, $1 \leq t \leq T$, $T \geq 1$, ϕ is a real function of the variable of interest Yt, the quantile qα(Wt, θ), and a (possibly infinite-dimensional) random variable $\xi_t : \Omega \to E_t$, i.e. $\varphi : \mathbb{R} \times Q \times E_t \to \mathbb{R}$. The variable ξt can be thought of as a shape parameter of the objective function ϕ. We assume the following:
(A2) (i) for every t, $1 \leq t \leq T$, $T \geq 1$, ξt is $\mathcal{W}_t$-measurable; (ii) for every t, $1 \leq t \leq T$, $T \geq 1$, the function ϕ(·, ·, ·) is twice continuously differentiable a.s. − P on $\mathbb{R} \times Q \times E_t$ with respect to its second argument (qt).
By assumption (A2)(i), the random variable ξt is allowed to depend only on variables contained in $\mathcal{W}_t$. In other words, the functional form (or shape) of ϕ cannot depend on any variable that is observed after time t. We shall see in subsequent sections that the $\mathcal{W}_t$-measurability of ξt is not trivially satisfied. In particular, if we consider objective functions ϕ that depend on some estimator based on the observations of Yt and Wt up to time T (kernel estimators of conditional distributions or densities are an example), then (A2)(i) fails to hold.
The requirement (A2)(ii) that, for given realizations of Yt and ξt, ϕ be twice continuously differentiable with respect to qt on Q a.s. − P allows for objective functions such as |Yt − qt| or [α − 1I(qt − Yt)](Yt − qt), for example: both fail to be differentiable only where qt = Yt, which, since Yt is continuously distributed, is a P-null event for any given qt. Note that in these two cases the shape ξt of ϕ remains constant over time.
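As a minimal illustration (not the authors' estimator), take the hypothetical location model qα(Wt, θ) = θ, whose shape ξt is constant, and let ϕ be the second objective above, the check function. Because ΨT is then convex and piecewise linear in θ with kinks at the observations, its minimum is attained at a data point, so a search over the sample returns a sample α-quantile:

```python
def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def check_loss(y, q, alpha):
    """phi(y, q) = [alpha - 1I(q - y)](y - q), the quantile 'check' function."""
    return (alpha - heaviside(q - y)) * (y - q)


def psi_T(theta, ys, alpha):
    """Sample criterion Psi_T(theta) for the location model q_alpha(W_t, theta) = theta."""
    return sum(check_loss(y, theta, alpha) for y in ys) / len(ys)


def m_estimate(ys, alpha):
    """Minimize Psi_T over the observations (enough here, because Psi_T is
    convex and piecewise linear with kinks at the data points)."""
    return min(ys, key=lambda theta: psi_T(theta, ys, alpha))


ys = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
print(m_estimate(ys, 0.5))   # 3.0, the sample median
print(m_estimate(ys, 0.25))  # 1.0, a sample 0.25-quantile
print(2 * check_loss(5.0, 2.0, 0.5) == abs(5.0 - 2.0))  # True: at alpha = 1/2, phi = |y - q| / 2
```

The last line shows that, at α = 1/2, the two objectives mentioned in the text differ only by the factor 2 and hence share the same minimizer.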
An important subfamily of the class of M-estimators defined above is that of quasi-maximum likelihood estimators (QMLEs) (White, 1982; Gourieroux, Monfort, and Trognon, 1984). If, in addition to (A2), we assume that there exists a real function $c : \mathbb{R} \times E_t \to \mathbb{R}$, $(y, \xi_t) \mapsto c(y, \xi_t) < \infty$, independent of $q_t$, and such that $\int_{\mathbb{R}} \exp[c(y, \xi_t) - \varphi(y, q_t, \xi_t)]\,dy = 1$ for all $(q_t, \xi_t) \in Q \times E_t$, then we can let $l_t(\cdot, q_t) \equiv \exp[c(\cdot, \xi_t) - \varphi(\cdot, q_t, \xi_t)]$, and $l_t(\cdot, q_t)$ can be interpreted as the (quasi) likelihood of Yt conditional on Wt. Hence, since $\ln l_t = c - \varphi$ and c does not depend on θ, any minimum θT of the function ΨT(θ) above is also a maximum of the (quasi) log-likelihood function $L_T(\theta) \equiv T^{-1} \sum_{t=1}^{T} \ln l_t(Y_t, q_\alpha(W_t, \theta))$ (Komunjer, 2005b). However, due to the above
“integrability” constraint on ϕ(·, qt, ξt), the class of QMLEs is smaller than that of M-estimators.4 We shall see in subsequent sections that this difference plays a very important role in efficient conditional quantile estimation. We now focus on M-estimators for θ0 that are consistent.
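For the check function, the integrability constraint of footnote 4 can be met. Integrating the two exponential tails gives $\int_{\mathbb{R}} \exp[-\varphi(y, q)]\,dy = 1/(1-\alpha) + 1/\alpha$, so the constant $c = \ln[\alpha(1-\alpha)]$, which does not depend on q, works, and $l_t$ is then an asymmetric Laplace density whose α-quantile is q. The sketch below confirms this numerically with a trapezoid rule; the truncation range, step count, and parameter values are arbitrary illustrative choices:

```python
import math


def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def check_loss(y, q, alpha):
    """phi(y, q) = [alpha - 1I(q - y)](y - q)."""
    return (alpha - heaviside(q - y)) * (y - q)


def quasi_likelihood(y, q, alpha):
    """l(y, q) = exp[c - phi(y, q)] with c = ln(alpha(1 - alpha)), which is free of q."""
    c = math.log(alpha * (1.0 - alpha))
    return math.exp(c - check_loss(y, q, alpha))


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


alpha, q, half_width, n = 0.3, 1.5, 150.0, 300_000
density = lambda y: quasi_likelihood(y, q, alpha)
total_mass = trapezoid(density, q - half_width, q + half_width, n)
mass_below_q = trapezoid(density, q - half_width, q, n // 2)
print(round(total_mass, 6), round(mass_below_q, 6))  # approximately 1.0 and 0.3
```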
3.1. Class of consistent M-estimators. What conditions are necessary for the M-estimator θT satisfying (A2) to be consistent for the true conditional quantile parameter θ0 in (A1)? The key idea behind the answer to this question is fairly simple. Assume that the process {Xt} and the functions ϕ(·, ·, ξt) are such that $\theta_T - \theta^0_T \xrightarrow{p} 0$, where $\theta^0_T$ is the unique minimum of $E[\Psi_T(\theta)] \equiv T^{-1} \sum_{t=1}^{T} E[\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)]$ on $\mathring{\Theta}$.5 Then a necessary requirement for consistency of θT is that $\theta^0_T - \theta_0 \to 0$ as T becomes large. In what follows, we restrict our attention to estimators θT such that $\theta^0_T$ remains constant, i.e. $\theta^0_T = \theta^0_\infty$ for all $T \geq 1$. Then, the class of M-estimators that are consistent for θ0 is obtained by considering all the functions ϕ(·, ·, ξt) under which $\theta^0_\infty = \theta_0$.
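The role of the pseudo-true value can be seen in a hypothetical iid location model with exponentially distributed Yt, whose true median is θ0 = ln 2. Minimizing a sample average of the check function (α = 1/2) targets the median, whereas minimizing a sample average of the squared loss (y − q)², which is not of the form characterized in Theorem 1, targets the mean, 1, so that its pseudo-true value differs from θ0 and the resulting M-estimator is inconsistent for the median. In this sketch, large-sample averages stand in for the expectations E[ΨT(θ)], and the grid is an arbitrary choice:

```python
import math
import random


def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def check_loss(y, q, alpha):
    """[alpha - 1I(q - y)](y - q): of the Theorem 1 form with A(y) = y."""
    return (alpha - heaviside(q - y)) * (y - q)


def squared_loss(y, q, alpha):
    # alpha is unused; it is kept so that both losses share one signature.
    return (y - q) ** 2


def argmin_on_grid(loss, ys, alpha, grid):
    """Grid minimizer of the sample average of loss(y, q, alpha) over q."""
    return min(grid, key=lambda q: sum(loss(y, q, alpha) for y in ys) / len(ys))


rng = random.Random(1)
ys = [rng.expovariate(1.0) for _ in range(4000)]
grid = [i / 100 for i in range(201)]  # q in [0, 2] with step 0.01
q_check = argmin_on_grid(check_loss, ys, 0.5, grid)
q_squared = argmin_on_grid(squared_loss, ys, 0.5, grid)
print(q_check, math.log(2))  # close to each other
print(q_squared)             # close to the mean 1, not to ln 2
```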
Note that the requirement $\theta^0_T = \theta_0$ for all $T \geq 1$ is stronger than $\theta^0_T \to \theta_0$.6 Consequently, θ0 can also be consistently estimated by minimizing objective functions that differ from the ones derived below, as long as the expected value of this difference converges uniformly to zero with T. An important example in which the condition $\theta^0_T = \theta_0$ for all $T \geq 1$ fails is when the shape ξt of the objective function ϕ depends on observations up to time T, and hence is not $\mathcal{W}_t$-measurable, as in the case of the estimator $\hat{\theta}_T$ proposed in Section 5. In that case, $\hat{\theta}_T$ is consistent provided the difference between its (M-) objective function $\hat{\Psi}_T$ and an (M-) objective function $\Psi^*_T$ derived in Theorem 3 converges uniformly to zero with T.
We now provide a more formal treatment of consistency. A set of sufficient assumptions for $\theta_T - \theta^0_\infty \xrightarrow{p} 0$ to hold is as follows (see, e.g., Theorem 2.1 in Newey and McFadden, 1994):
(A3) {Xt} and ϕ(·, ·, ξt) are such that: (i) for every t, $1 \leq t \leq T$, $T \geq 1$, and every θ ∈ Θ, $|D^r \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)| \leq m_r(Y_t, W_t, \xi_t)$, a.s. − P, where $E[m_r(Y_t, W_t, \xi_t)] < \infty$, for r = 0, 1, 2; for any $T \geq 1$, (ii) $E[\Psi_T(\theta)]$ is uniquely minimized at $\theta^0_\infty \in \mathring{\Theta}$, and (iii) $\sup_{\theta \in \Theta} |\Psi_T(\theta) - E[\Psi_T(\theta)]| \xrightarrow{p} 0$.
4 We call $\int_{\mathbb{R}} \exp[c(y, \xi_t) - \varphi(y, q_t, \xi_t)]\,dy = 1$ for all $(q_t, \xi_t) \in Q \times E_t$ the “integrability” constraint. This requirement is stronger than $\exp[-\varphi(\cdot, q_t, \xi_t)]$ being integrable with respect to the Lebesgue measure on $\mathbb{R}$.
5 $\theta^0_T$ is also called the pseudo-true value of the parameter θ.
6 See White (1994, pp. 69-70) for a discussion of the requirement $\theta^0_T = \theta_0$.
Note that the above are not primitive conditions for consistency of θT. For example, the integrability of $D^r \varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)$ with respect to the probability P implied by (A3)(i) involves more primitive conditions on the existence of different moments of Yt, Wt and ξt. Condition (A3)(ii) states that $\theta^0_\infty$ is a minimum of E[ΨT(θ)] and that this minimum is moreover unique. The first requirement involves more primitive conditions on $\partial\varphi/\partial q_t$, $\partial^2\varphi/\partial q_t^2$ and $\nabla_\theta q_\alpha$, which depend on the shape ξt of ϕ and the functional form of qα. For example, a sufficient set of conditions for $\theta^0_\infty$ to be a (local) minimum is that $T^{-1} \sum_{t=1}^{T} E[\nabla_\theta \varphi(Y_t, q_\alpha(W_t, \theta^0_\infty), \xi_t)] = 0$ and $T^{-1} \sum_{t=1}^{T} E[\Delta_{\theta\theta} \varphi(Y_t, q_\alpha(W_t, \theta^0_\infty), \xi_t)] \gg 0$, i.e. is positive definite (Schwartz, 1997). Finally, the uniform convergence condition (A3)(iii) can be obtained by applying an appropriate uniform law of large numbers to the sequence $\{\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t)\}$. Implicit in (A3)(iii) are primitive assumptions on the dependence structure and heterogeneity of the process {Xt}, and on the properties of $\varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$. A simple example is one where {Xt} is iid and the functions $\varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$ are Lipschitz-L1 a.s. − P on Θ (see, e.g., Definition A.2.3 in White, 1994).
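Condition (A3)(iii) can be visualized in the simplest iid case. Assuming, for illustration only, that Yt is iid N(0, 1), that qα(Wt, θ) = θ, and that ϕ is the check function, the expectation has the closed form $E[\Psi_T(\theta)] = \Phi'(\theta) + \theta[\Phi(\theta) - \alpha]$, where Φ is the standard normal distribution function and Φ' its density. The sketch computes the supremum of $|\Psi_T(\theta) - E[\Psi_T(\theta)]|$ over a grid of θ values for a small and a large T; the grid stands in for Θ:

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def check_loss(y, q, alpha):
    """[alpha - 1I(q - y)](y - q)."""
    return (alpha - heaviside(q - y)) * (y - q)


def expected_psi(theta, alpha):
    """E[Psi_T(theta)] for iid N(0, 1) data: Phi'(theta) + theta * (Phi(theta) - alpha)."""
    return STD_NORMAL.pdf(theta) + theta * (STD_NORMAL.cdf(theta) - alpha)


def sup_deviation(T, alpha, grid, seed):
    """Supremum over the grid of |Psi_T(theta) - E[Psi_T(theta)]| for one simulated sample."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(T)]
    return max(
        abs(sum(check_loss(y, theta, alpha) for y in ys) / T - expected_psi(theta, alpha))
        for theta in grid
    )


grid = [-2.0 + 0.1 * i for i in range(41)]  # theta in [-2, 2]
for T in (50, 20_000):
    print(T, round(sup_deviation(T, 0.25, grid, seed=3), 4))
```

With dependent data, the same experiment would require a uniform law of large numbers suited to the dependence structure, as noted above.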
The above pseudo-true value $\theta^0_\infty$ of the parameter θ equals the true value θ0 if and only if, for any $T \geq 1$, θ0 minimizes E[ΨT(θ)]. A necessary and sufficient requirement for $\theta^0_\infty = \theta_0$ is given in the following theorem.
Theorem 1 (Necessary and sufficient condition for consistency). Assume that (A0), (A2) and (A3) hold. If the true parameter θ0 satisfies the conditional moment condition in (A1), then the M-estimator θT is consistent for θ0, i.e. $\theta_T - \theta_0 \xrightarrow{p} 0$, if and only if there exist a real function $A(\cdot, \cdot) : \mathbb{R} \times E_t \to \mathbb{R}$ that is twice continuously differentiable and strictly increasing with respect to its first argument (qt or Yt) a.s. − P on $Q \times E_t$, and a real function $B(\cdot, \cdot) : \mathbb{R} \times E_t \to \mathbb{R}$, such that
$$\varphi(Y_t, q_t, \xi_t) = [\alpha - 1\!\mathrm{I}(q_t - Y_t)][A(Y_t, \xi_t) - A(q_t, \xi_t)] + B(Y_t, \xi_t),$$
a.s. − P on $\mathbb{R} \times Q \times E_t$, for every t, $1 \leq t \leq T$, $T \geq 1$.7
In other words, if for any given sample size $T \geq 1$ we are interested in consistently estimating the conditional quantile parameter of a continuously distributed random variable Yt by using an M-estimator θT, then we must employ an objective function $\Psi_T(\cdot) = T^{-1} \sum_{t=1}^{T} \varphi(Y_t, q_\alpha(W_t, \cdot), \xi_t)$ with
$$\varphi(Y_t, q_\alpha(W_t, \theta), \xi_t) = [\alpha - 1\!\mathrm{I}(q_\alpha(W_t, \theta) - Y_t)][A(Y_t, \xi_t) - A(q_\alpha(W_t, \theta), \xi_t)] + B(Y_t, \xi_t), \qquad (1)$$
7 The real functions A and B in Theorem 1 need not have the same shape parameter: we can let $\xi_t \equiv (\xi_{At}', \xi_{Bt}')'$, where $\xi_{At}$ and $\xi_{Bt}$ are the shapes of $A(\cdot, \xi_{At})$ and $B(\cdot, \xi_{Bt})$, respectively. For simplicity, we write A(·, ξt) and B(·, ξt) with the understanding that changing the shape of A need not affect the shape of B and vice versa.
a.s. − P, for every t, $1 \leq t \leq T$. Using objective functions of this form is also a sufficient condition for θT to be consistent for the true parameter θ0 of a correctly specified model for the conditional α-quantile.
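In a hypothetical location model (qα(Wt, θ) = θ, B = 0), the sufficiency part of Theorem 1 is easy to see numerically: the α-quantile is preserved by strictly increasing transformations, so minimizing the sample average of (1) returns the same sample α-quantile whichever strictly increasing A is used, while a decreasing A breaks the result. A small sketch with arbitrary illustrative choices of A (A(y) = y gives the Koenker and Bassett check function):

```python
import math


def heaviside(x):
    """1I(x) = 1 if x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def phi_theorem1(y, q, alpha, A):
    """Objective (1) with B = 0: [alpha - 1I(q - y)][A(y) - A(q)]."""
    return (alpha - heaviside(q - y)) * (A(y) - A(q))


def m_estimate(ys, alpha, A):
    # With A strictly increasing the criterion is a check function in A(q),
    # so its minimum over q is attained at one of the observations.
    return min(ys, key=lambda q: sum(phi_theorem1(y, q, alpha, A) for y in ys))


ys = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
increasing = {"identity": lambda v: v, "exp": math.exp, "cubic": lambda v: v ** 3 + v}
for name, A in increasing.items():
    print(name, m_estimate(ys, 0.25, A))  # 1.0 for every strictly increasing A
print("decreasing", m_estimate(ys, 0.25, lambda v: -v))  # 9.0: no longer a 0.25-quantile
```

With the decreasing choice A(v) = −v, (1) becomes the negative of the check function, so the search picks the observation that maximizes the check criterion instead.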
Given that we restrict our attention to objective functions in which (A2)(ii) holds, the
function A(·, ξ t ) in Theorem 1 needs to be twice continuously differentiable a.s. − P on Q.
The continuity and differentiability of A(·, ξ t ) need not hold on R\Q. The fact that there
are no requirements on A(·, ξ t ) outside the range of qα (Wt , θ) is not surprising, given that
changing the objective function outside Q does not affect the values of ∂ϕ/∂qt , and therefore
has no effect on the optimum of ΨT. The fact that A(·, ξt) is necessarily strictly increasing a.s. − P on Q comes from the requirement (A3)(ii) that $\theta^0_\infty$ be an interior minimum of E[ΨT(θ)] on Θ. As before, there are no requirements on the monotonicity of A(·, ξt) on
R\Q. Finally, note that there are no restrictions on the function B(·, ξ t ), as expected, since
changing it does not affect the optimum of the objective function ΨT . In what follows we
set B(·, ξ t ) identically equal to 0, which does not affect any of our results but has the benefit
of simplifying the notation.
Well-known examples of conditional quantile estimators that satisfy Theorem 1 are: (1) Koenker and Bassett’s (1978) unweighted quantile regression estimator, for which $A(y,\xi_t)=y$ for all $y\in\mathbb{R}$; (2) Powell’s (1984, 1986) left (right) censored quantile regression estimator, obtained when, for all $y\in\mathbb{R}$, $A(y,\xi_t)=\max\{y,c_t\}$ ($A(y,\xi_t)=\min\{y,c_t\}$) with an observed censoring point $c_t$;⁸ (3) the weighted quantile regression estimator proposed by Newey and Powell (1990) and Zhao (2001), in which $A(y,\xi_t)=\omega_t y$ for all $y\in\mathbb{R}$, where $\omega_t$ is some nonnegative weight, as well as its censored version, for which $A(y,\xi_t)=\omega_t\max\{y,c_t\}$.
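The three examples can be checked numerically in a few lines. With $B=0$, the Theorem 1 objective is $[\alpha-\mathbb{1}(q_t-Y_t)][A(Y_t,\xi_t)-A(q_t,\xi_t)]$, and the sketch below confirms that $A(y)=y$ reproduces the familiar check function, that the weighted version is a rescaling, and that the censored choice stays nonnegative. The indicator convention $\mathbb{1}(x)=1$ for $x\ge0$, the censoring point $c=0$ and the weight $\omega_t=2$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

def indicator(x):
    # 1I(x) = 1 if x >= 0 and 0 otherwise (assumed convention; immaterial for continuous Y_t)
    return (np.asarray(x, dtype=float) >= 0).astype(float)

def phi(y, q, alpha, A):
    # Theorem 1 objective with B(., xi_t) = 0: [alpha - 1I(q - y)] * [A(y) - A(q)]
    return (alpha - indicator(q - y)) * (A(y) - A(q))

def check_loss(u, alpha):
    # Koenker-Bassett check function rho_alpha(u) = u * (alpha - 1{u < 0})
    return u * (alpha - (u < 0))

alpha = 0.25
y = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
q = 0.5

kb = phi(y, q, alpha, lambda v: v)                        # (1) A(y) = y
c = 0.0                                                   # hypothetical censoring point
powell = phi(y, q, alpha, lambda v: np.maximum(v, c))     # (2) A(y) = max{y, c}
weighted = phi(y, q, alpha, lambda v: 2.0 * v)            # (3) hypothetical weight omega_t = 2

print(kb, powell, weighted)
```

Because only differences $A(Y_t)-A(q_t)$ enter, any positive rescaling of $A$ leaves the minimizer unchanged, which is why the weighted objective is simply twice the unweighted one here.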
⁸ Note that $A(\cdot,\xi_t)=\max\{\cdot,c_t\}$ satisfies the strict monotonicity requirement a.s.-$P$ on $Q$ because, in the censored quantile regression case, $q_\alpha(W_t,\theta_0)>c_t$ a.s.-$P$, as elegantly discussed by Powell (1984, pp. 4-6). The intuition behind this inequality is simple: suppose $Y_t=c_t$ a.s.-$P$ for all $t$, $1\le t\le T$, $T\ge1$. Then any value $\theta_0$ for which $q_\alpha(W_t,\theta_0)\le c_t$ a.s.-$P$ for all $t$, $1\le t\le T$, $T\ge1$, is a minimum of $E[\Psi_T(\theta)]$, which in that case equals 0. This violates the uniqueness assumption (A3)(ii), and hence affects the consistency of $\theta_T$. Consistency is restored by requiring that $q_\alpha(W_t,\theta_0)>c_t$ a.s.-$P$ for a large enough portion of the sample (see Assumption R.1 in Powell, 1984). An analogous result holds for the right censored case.

In particular, the class of objective functions $\Psi_T$ leading to consistent conditional quantile M-estimators is larger than the class leading to consistent QMLEs. To simplify the comparison between M-estimators and QMLEs, assume that at any point in time $t$, $1\le t\le T$, $T\ge1$, the conditional α-quantile of $Y_t$ can take any real value, so $Q=\mathbb{R}$. As pointed out previously, the main difference between the two classes of estimators lies in the “integrability” condition on the pseudo-densities. Compare the objective function in Theorem 1 with the family of tick-exponential pseudo-densities which give consistent QMLEs for $\theta_0$ (Komunjer, 2005b):
$$f_\alpha(Y_t,q_t,\xi_t)\equiv\alpha(1-\alpha)\,a(Y_t,\xi_t)\exp\{[\mathbb{1}(q_t-Y_t)-\alpha][A(Y_t,\xi_t)-A(q_t,\xi_t)]\},$$
with $A(\cdot,\xi_t)$ twice continuously differentiable and strictly increasing a.s.-$P$ on $\mathbb{R}$, with derivative $a(y,\xi_t)\equiv\partial A(y,\xi_t)/\partial y$.⁹ For $f_\alpha(\cdot,q_t,\xi_t)$ to be a probability density on $\mathbb{R}$, we need $\lim_{y\to\pm\infty}A(y,\xi_t)=\pm\infty$ for any $t$, $1\le t\le T$, $T\ge1$.¹⁰ This limit condition restricts the possible choice of functions $A(\cdot,\xi_t)$ in Theorem 1.
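The role of the limit condition can be seen by integrating the tick-exponential pseudo-density numerically. In the sketch below, $A(y)=y$ yields a proper density (an asymmetric Laplace) with mass exactly α below $q_t$, while a bounded $A$ (a logistic CDF, chosen here purely for illustration) yields neither unit mass nor the quantile restriction. The values α = 0.3, $q_t=0$ and the midpoint-rule integration are assumptions of this sketch.

```python
import numpy as np

alpha, q = 0.3, 0.0

def tick_exp_density(y, A, a):
    # f_alpha(y, q) = alpha (1 - alpha) a(y) exp{[1I(q - y) - alpha][A(y) - A(q)]}
    ind = (q - y >= 0).astype(float)
    return alpha * (1 - alpha) * a(y) * np.exp((ind - alpha) * (A(y) - A(q)))

def midpoint(fn, lo, hi, n=400_000):
    # composite midpoint rule; the nodes never coincide with the kink at y = q
    h = (hi - lo) / n
    x = lo + h * (np.arange(n) + 0.5)
    return h * fn(x).sum()

def logistic_cdf(y):
    return 1.0 / (1.0 + np.exp(-y))

def logistic_pdf(y):
    p = logistic_cdf(y)
    return p * (1.0 - p)

choices = {
    "A(y) = y": (lambda y: y, lambda y: np.ones_like(y)),
    "A(y) = logistic CDF": (logistic_cdf, logistic_pdf),  # bounded, like Equation (2)
}
results = {}
for name, (A, a) in choices.items():
    f = lambda y: tick_exp_density(y, A, a)
    below = midpoint(f, -80.0, q)
    total = below + midpoint(f, q, 80.0)
    results[name] = (below, total)
    print(f"{name}: mass below q = {below:.4f}, total mass = {total:.4f}")
```

For the bounded choice, the change of variable $u=A(y)$ used in footnote 10 gives the masses in closed form, $\alpha(1-e^{-(1-\alpha)A(q)})$ below $q$ and $(1-\alpha)(1-e^{-\alpha(1-A(q))})$ above it, both strictly smaller than α and 1 − α.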
For example, consider any distribution function $F_t(\cdot)$ in $\mathcal{F}$ having a density $f_t(\cdot)$ that is continuously differentiable a.s.-$P$, and let
$$A(y,\xi_{Ft})\equiv F_t(y) \tag{2}$$
for any $y\in\mathbb{R}$. Note that the parameter $\xi_{Ft}$ in the objective function $A(\cdot,\xi_{Ft})$ in Equation (2) corresponds to the conditional distribution $F_t(\cdot)$, which is stochastic and $W_t$-measurable. Under the assumptions of Theorem 1, the M-estimator $\theta_T^F$, which minimizes $\Psi_T^F(\theta)\equiv T^{-1}\sum_{t=1}^T\varphi(Y_t,q_\alpha(W_t,\theta),\xi_{Ft})$ with
$$\varphi(Y_t,q_\alpha(W_t,\theta),\xi_{Ft})\equiv[\alpha-\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)][F_t(Y_t)-F_t(q_\alpha(W_t,\theta))], \tag{3}$$
is consistent for $\theta_0$; however, the corresponding function $A(\cdot,\xi_{Ft})$ in Equation (2), being bounded between 0 and 1, does not satisfy the above limit condition. As a consequence, the class of consistent QMLEs is strictly smaller than that of consistent M-estimators. In subsequent sections we show that the limit restrictions on $A(\cdot,\xi_{Ft})$ play a particularly important role for efficient conditional quantile estimation, by constructing an efficient M-estimator whose objective function is of the form (3).
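A small numerical sketch of why Equation (3) still delivers a consistent estimator despite its bounded $A$: in the pure location model $q_\alpha(W_t,\theta)=\theta$ with a fixed $F_t=F$, the change of variable $u=F(q)$ turns $\Psi_T^F$ into an ordinary check-loss objective in $u$, so its minimizer is a sample α-quantile, which is preserved when mapping back through the strictly increasing $F$. The logistic $F$ and the Student-t data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.25, 101
y = rng.standard_t(df=3, size=n)      # heavy-tailed sample (illustrative)

def F(v):
    # hypothetical choice of F_t: a logistic CDF, bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-v))

def psi_F(q):
    # sample objective Psi_T^F of Equation (3) for q_alpha(W_t, theta) = theta
    ind = (q - y >= 0).astype(float)
    return np.mean((alpha - ind) * (F(y) - F(q)))

# In u = F(q) the objective is piecewise linear with kinks at F(y_i),
# so it suffices to evaluate it at the data points.
values = np.array([psi_F(v) for v in y])
q_hat = y[np.argmin(values)]

# For non-integer alpha * n the check-loss minimizer is the ceil(alpha * n)-th order statistic
k = int(np.ceil(alpha * n))
print(q_hat, np.sort(y)[k - 1])
```

Because no moment of $Y_t$ enters $\Psi_T^F$, the heavy tails of the simulated data are harmless, a point the discussion of Theorem 2 returns to.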
To summarize, we have shown that an M-estimator $\theta_T$ that satisfies (A2) is consistent for $\theta_0$ only if the objective functions $\varphi(\cdot,\cdot,\xi_t)$ are of the form given in Theorem 1. The conditions provided in Theorem 1 are not only necessary but also sufficient for consistency. From the functional form of $\varphi(\cdot,\cdot,\xi_t)$ in Equation (1), it follows that the asymptotic properties of $\theta_T$ depend only on the choice of $A(\cdot,\xi_t)$, since changing $B(\cdot,\xi_t)$ does not affect the minimum of $\Psi_T(\theta)$. Before considering a particular class of functions $A(\cdot,\xi_t)$ that makes the asymptotics of $\theta_T$ optimal, we need the asymptotic distribution of $\theta_T$, which we derive in the next section.

⁹ It is straightforward to see that $\varphi(Y_t,q_t,\xi_t)$ in Theorem 1 and $f_\alpha(Y_t,q_t,\xi_t)$ in Komunjer (2005b) have the same optimum.

¹⁰ The limit conditions on $A(\cdot,\xi_t)$ follow directly from the quantile restriction $\int_{-\infty}^{q_t}f_\alpha(y,q_t,\xi_t)\,dy=\alpha$, which is equivalent to $(1-\alpha)\exp[-(1-\alpha)A(q_t,\xi_t)]\int_{-\infty}^{q_t}a(y,\xi_t)\exp[(1-\alpha)A(y,\xi_t)]\,dy=1$, so that, upon the change of variable $u\equiv A(y,\xi_t)$, necessarily $A(q_t,\xi_t)\to-\infty$ as $q_t\to-\infty$. Combining the above quantile restriction with the condition $\int_{\mathbb{R}}f_\alpha(y,q_t,\xi_t)\,dy=1$ yields the result for the limit at $+\infty$ by similar reasoning.
3.2. Asymptotic Distribution. We start by imposing the following assumptions, in addition to (A0)-(A2):
(A4) for every $t$, $1\le t\le T$, $T\ge1$, the functions $A(\cdot,\xi_t):\mathbb{R}\to\mathbb{R}$ in Theorem 1 have bounded first and second derivatives, i.e. there exist constants $K>0$ and $L>0$ such that $0<\partial A(q_t,\xi_t)/\partial q_t\le K$ and $|\partial^2A(q_t,\xi_t)/\partial q_t^2|\le L$, a.s.-$P$ on $Q\times E_t$;
(A5) $\theta_0$ is an interior point of $\Theta$;
(A6) the sequence $\{(Y_t,W_t')'\}$ is α-mixing with α of size $-r/(r-2)$, with $r>2$;
(A7) for some $\varepsilon>0$: (i) $\sup_{1\le t\le T,T\ge1}E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|^{2(r+\varepsilon)}]<\infty$ and $\sup_{1\le t\le T,T\ge1}E[\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_\alpha(W_t,\theta)|^{r+\varepsilon}]<\infty$; (ii) $\sup_{1\le t\le T,T\ge1}E[\sup_{\theta\in\Theta}|A(q_\alpha(W_t,\theta),\xi_t)|^{r+\varepsilon}]<\infty$ and $\sup_{1\le t\le T,T\ge1}E[|A(Y_t,\xi_t)|^{r+\varepsilon}]<\infty$.
The above assumptions provide a set of sufficient conditions for the asymptotic normality of $\theta_T$ that are primitive, unlike the conditions for consistency in (A3). In addition to (A1) and (A2), we now require the functions $A(\cdot,\xi_t)$ to have bounded first and second derivatives (A4). The boundedness property is used to show that the functions $\varphi(Y_t,q_\alpha(W_t,\cdot),\xi_t)$ are Lipschitz-$L_1$ on $\Theta$ a.s.-$P$. This implies that any pointwise convergence in $\theta$ becomes uniform on $\Theta$. A similar implication can be obtained by an alternative argument if the objective functions $\varphi(Y_t,q_\alpha(W_t,\cdot),\xi_t)$ are convex in the parameter $\theta$. This elegant convexity approach has, for example, been used by Pollard (1991), Hjort and Pollard (1993) and Knight (1998) to derive the asymptotic normality of Koenker and Bassett’s (1978) standard quantile regression estimator. For this estimator, the functions $A(\cdot,\xi_t)$ are linear and hence the $\varphi(Y_t,q_\alpha(W_t,\cdot),\xi_t)$ are convex in $\theta$, no matter which conditional quantile model $q_\alpha$ in (A0) we choose.¹¹ Unfortunately, the convexity in $\theta$ of the objective functions $\varphi(Y_t,q_\alpha(W_t,\cdot),\xi_t)$ does not hold for general (nonlinear) $A(\cdot,\xi_t)$, such as the ones proposed in Equation (3). Therefore, we cannot rely on the convexity argument in our asymptotic normality proof. We are forced to use the classical approach which, though generally applicable, is more complicated and requires stronger regularity conditions, such as those in (A4).

¹¹ Recall that $\varphi(Y_t,q_\alpha(W_t,\cdot),\xi_t)$ is convex in a neighborhood of $\theta_0$ if and only if the real function $s\mapsto[\varphi(Y_t,q_\alpha(W_t,\theta_0+\nu s),\xi_t)-\varphi(Y_t,q_\alpha(W_t,\theta_0),\xi_t)]/s$ is increasing in $s\in\mathbb{R}$ ($\nu\in\mathbb{R}^k$). This condition holds for any model $q_\alpha$ in (A0) only if the functions $A(\cdot,\xi_t)$ have zero convexity, i.e. are linear.
Our assumptions on the heterogeneity and dependence structure of the data are, on the other hand, fairly weak. We allow the sequence $\{(Y_t,W_t')'\}$ to be nonstationary, and our strong mixing (i.e. α-mixing) assumption in (A6) allows for a wide variety of dependence structures (White, 2001). Assumption (A6) is further accompanied by a series of moment conditions in (A7) which guarantee that the appropriate law of large numbers and central limit theorem can be applied. In the special case corresponding to Koenker and Bassett’s (1978) quantile regression estimator for linear models $q_\alpha(W_t,\theta)=\theta'W_t$, the moment conditions (A7) reduce to $\sup_{1\le t\le T,T\ge1}E[|W_t|^{2(r+\varepsilon)}]<\infty$ and $\sup_{1\le t\le T,T\ge1}E[|Y_t|^{r+\varepsilon}]<\infty$.
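The asymptotic distribution of $\theta_T$ is given in the following theorem.

Theorem 2 (Asymptotic Distribution). Under (A0)-(A2) and (A4)-(A7), we have $(\Sigma_T^0)^{-1/2}\Delta_T^0\sqrt{T}(\theta_T-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, where
$$\Delta_T^0\equiv T^{-1}\sum_{t=1}^T E[a(q_\alpha(W_t,\theta_0),\xi_t)\,f_t^0(q_\alpha(W_t,\theta_0))\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'],$$
$$\Sigma_T^0\equiv T^{-1}\sum_{t=1}^T\alpha(1-\alpha)E[(a(q_\alpha(W_t,\theta_0),\xi_t))^2\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'],$$
and $a(q_t,\xi_t)\equiv\partial A(q_t,\xi_t)/\partial q_t$ a.s.-$P$ on $Q\times E_t$.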
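To see what Theorem 2 implies in a concrete design, the matrices $\Delta_T^0$ and $\Sigma_T^0$ can be evaluated directly. The sketch below assumes a heteroskedastic linear median model, $q_\alpha(W_t,\theta)=\theta'W_t$ with $W_t=(1,X_t)'$, $X_t\sim U(0,2)$ and errors $(1+X_t)\,N(0,1)$ (all hypothetical choices), and replaces expectations by averages over simulated $W_t$. It compares Koenker and Bassett's choice $a\equiv1$ with Equation (3) when $f_t$ matches $f_t^0$ at the quantile.

```python
import numpy as np
from statistics import NormalDist

def sandwich(a, f0, G, alpha):
    """(Delta_T^0)^{-1} Sigma_T^0 (Delta_T^0)^{-1} from Theorem 2.

    a  : (T,) values of a(q_alpha(W_t, theta0), xi_t)
    f0 : (T,) true conditional densities f_t^0 at the true quantile
    G  : (T, k) array whose rows are grad_theta q_alpha(W_t, theta0)'
    Expectations are approximated by averages over simulated W_t.
    """
    T = len(a)
    Delta = (G * (a * f0)[:, None]).T @ G / T
    Sigma = alpha * (1 - alpha) * (G * (a ** 2)[:, None]).T @ G / T
    Dinv = np.linalg.inv(Delta)
    return Dinv @ Sigma @ Dinv

rng = np.random.default_rng(1)
alpha, T = 0.5, 50_000
X = rng.uniform(0.0, 2.0, size=T)
G = np.column_stack([np.ones(T), X])        # gradient of a linear model is W_t
std_normal = NormalDist()
f0 = std_normal.pdf(std_normal.inv_cdf(alpha)) / (1.0 + X)   # hypothetical scale 1 + X_t

V_kb = sandwich(np.ones(T), f0, G, alpha)   # Koenker-Bassett: A(y) = y, so a = 1
V_F = sandwich(f0, f0, G, alpha)            # Equation (3) with f_t = f_t^0 at the quantile
V_0 = alpha * (1 - alpha) * np.linalg.inv((G * (f0 ** 2)[:, None]).T @ G / T)

print(np.diag(V_kb), np.diag(V_F))
```

Under this heteroskedastic design the diagonal of `V_kb` exceeds that of `V_F`, while `V_F` coincides with the closed form `V_0`; without heteroskedasticity the two choices would give the same answer.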
In particular, the M-estimator $\theta_T^F$ proposed in Equation (3) satisfies the conditions of Theorem 2, provided the conditional probability densities $f_t(\cdot)$ are differentiable a.s.-$P$ on $\mathbb{R}$ with bounded first derivatives, so that $|f_t'(y)|\le L$ a.s.-$P$ on $\mathbb{R}$. Moreover, the moment conditions in (A7) are less stringent for $\theta_T^F$ than for Koenker and Bassett’s (1978) estimator: if the conditional quantile model is linear, for example, they reduce to $E[|W_t|^{2(r+\varepsilon)}]<\infty$. The moment conditions imposed on $Y_t$ disappear in the case of $\theta_T^F$ simply because any conditional distribution function $F_t(\cdot)$ is bounded between 0 and 1, so that we always have $E[\sup_{\theta\in\Theta}|F_t(q_\alpha(W_t,\theta))|^{r+\varepsilon}]\le1$ and $E[|F_t(Y_t)|^{r+\varepsilon}]\le1$; hence (A7)(ii) is automatically satisfied. This difference is of particular importance in applications in which we have reason to believe that moments of $Y_t$ of order higher than 2 do not exist. In such applications, the asymptotic properties of Koenker and Bassett’s (1978) estimator are unclear. On the other hand, $\theta_T^F$ still converges in distribution at the usual $\sqrt{T}$ rate.
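Using the results of Theorem 2, the asymptotic distribution of $\theta_T^F$ is $(\Sigma_T^{0,F})^{-1/2}\Delta_T^{0,F}\sqrt{T}(\theta_T^F-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, with
$$\Delta_T^{0,F}\equiv T^{-1}\sum_{t=1}^T E[f_t(q_\alpha(W_t,\theta_0))\,f_t^0(q_\alpha(W_t,\theta_0))\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)']$$
and
$$\Sigma_T^{0,F}\equiv T^{-1}\sum_{t=1}^T\alpha(1-\alpha)E[(f_t(q_\alpha(W_t,\theta_0)))^2\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'].$$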
Clearly, changing the distribution function $F_t(\cdot)$ in Equation (2), and hence in Equation (3), affects the asymptotic covariance matrix of the corresponding M-estimator $\theta_T^F$ through the density term $f_t(\cdot)$ appearing in the expressions of $\Delta_T^{0,F}$ and $\Sigma_T^{0,F}$. In particular, this result suggests that appropriate choices of $F_t(\cdot)$ in Equation (3) lead to efficiency improvements over Koenker and Bassett’s (1978) conditional quantile estimator. Specifically, when the values of $f_t(\cdot)$ and of the true conditional density $f_t^0(\cdot)$ coincide at the true quantile $q_\alpha(W_t,\theta_0)$, we have $\Sigma_T^{0,F}(\Delta_T^{0,F})^{-1}=\alpha(1-\alpha)\,\mathrm{Id}$. In other words, this particular choice of $f_t(\cdot)$ appears to lead to a conditional quantile M-estimator with the minimum asymptotic covariance matrix. In the next section we make this heuristic argument rigorous by exploring the questions of minimum variance and efficient estimation in more detail.
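Spelled out, the heuristic is a one-line computation. Writing $q_t^0\equiv q_\alpha(W_t,\theta_0)$ and $g_t\equiv\nabla_\theta q_\alpha(W_t,\theta_0)$ (shorthand introduced only for this display), the condition $f_t(q_t^0)=f_t^0(q_t^0)$ makes $\Delta_T^{0,F}$ proportional to $\Sigma_T^{0,F}$, so the sandwich covariance collapses:

$$
\begin{aligned}
\Delta_T^{0,F} &= T^{-1}\sum_{t=1}^T E\big[(f_t^0(q_t^0))^2\,g_tg_t'\big]=\frac{\Sigma_T^{0,F}}{\alpha(1-\alpha)},\\
(\Delta_T^{0,F})^{-1}\Sigma_T^{0,F}(\Delta_T^{0,F})^{-1} &= \alpha(1-\alpha)\,(\Delta_T^{0,F})^{-1}=\alpha(1-\alpha)\Big\{T^{-1}\sum_{t=1}^T E\big[(f_t^0(q_t^0))^2\,g_tg_t'\big]\Big\}^{-1}.
\end{aligned}
$$

The right-hand side is the matrix $V_T^0$ that appears in Theorem 3 below.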
4. Semiparametric Efficiency Bound
Our first step in discussing the asymptotic efficiency of conditional quantile estimators is to rank all the consistent and asymptotically normal estimators constructed in the previous section by their asymptotic variances. Note that this ranking is meaningful because we do not allow M-estimators to be superefficient, i.e. to have asymptotic variances that, for some true parameter value, are smaller than that of the maximum likelihood estimator. Superefficiency is ruled out by our continuity assumptions on $f_t^0(\cdot)$ and $q_\alpha(W_t,\cdot)$ in (A0)(ii) and on $a(\cdot,\xi_t)$ in Theorem 1. Typically, the asymptotic distribution of superefficient estimators is discontinuous in the true parameters, and our continuity assumptions rule out this discontinuity.
Theorem 3 (Minimum Asymptotic Variance). Assume that (A0)-(A2) and (A4)-(A7) hold. Then the set of matrices $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}$ has a minimum $V_T^0$ given by
$$V_T^0\equiv\alpha(1-\alpha)\Big\{T^{-1}\sum_{t=1}^T E[(f_t^0(q_\alpha(W_t,\theta_0)))^2\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)']\Big\}^{-1}.$$
Moreover, an M-estimator $\theta_T^*$ of the parameter $\theta_0$ obtained by minimizing $\Psi_T^*(\theta)\equiv T^{-1}\sum_{t=1}^T\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t^*)$ attains $V_T^0$, i.e. $(V_T^0)^{-1/2}\sqrt{T}(\theta_T^*-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, if and only if $\varphi(Y_t,q_t,\xi_t^*)=[\alpha-\mathbb{1}(q_t-Y_t)][F_t^0(Y_t)-F_t^0(q_t)]$ a.s.-$P$ on $\mathbb{R}\times Q\times E_t$, for every $t$, $1\le t\le T$, $T\ge1$.
Theorem 3 establishes two important results. First, the matrix $V_T^0$ is the minimum of the asymptotic variances of all the consistent and asymptotically normal M-estimators of $\theta_0$ that satisfy (A2). In other words, for any $\xi_t$ and $A(\cdot,\xi_t)$ in Theorem 1, the difference between the corresponding asymptotic covariance matrix $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}$ and $V_T^0$ is always positive semidefinite. Second, there exists a unique M-estimator $\theta_T^*$ whose asymptotic covariance matrix equals $V_T^0$. This estimator is obtained by minimizing the objective function $\Psi_T^*(\theta)=T^{-1}\sum_{t=1}^T\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t^*)$, in which
$$\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t^*)=[\alpha-\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)][F_t^0(Y_t)-F_t^0(q_\alpha(W_t,\theta))] \tag{4}$$
a.s.-$P$, for every $t$, $1\le t\le T$, $T\ge1$. In particular, the shape $\xi_t^*$ of the optimal objective function in Equation (4) is that of the true conditional distribution $F_t^0(\cdot)$, which is stochastic and $W_t$-measurable as required by Assumption (A0)(i). Even though our estimator $\theta_T^*$ satisfies all the assumptions in (A2), it cannot be computed in practice: to construct $\theta_T^*$ we would need to know the true conditional distribution $F_t^0(\cdot)$, whose inverse, the conditional α-quantile, is the very object we are trying to estimate. We come back to this important feasibility issue in Section 5.
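The first claim of Theorem 3, that $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}-V_T^0$ is positive semidefinite for every admissible $A$, can be probed numerically. The sketch below draws arbitrary positive values in place of $a(q_t,\xi_t)$, computes the corresponding covariance with expectations replaced by sample averages, and records the smallest eigenvalue of the difference; the gradients, the densities $f_t^0$ at the quantile and all constants are hypothetical. It then checks that $a(q_t,\xi_t)=f_t^0(q_t)$, the choice induced by $A=F_t^0$ in Equation (4), attains the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, T = 0.3, 20_000
G = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # rows: grad_theta q_alpha(W_t, theta0)'
f0 = rng.uniform(0.2, 1.5, size=T)    # hypothetical f_t^0 at the true quantile

def avg_outer(w):
    # T^{-1} sum_t w_t g_t g_t'
    return (G * w[:, None]).T @ G / T

def avar(a):
    # (Delta_T^0)^{-1} Sigma_T^0 (Delta_T^0)^{-1} for given values a_t = a(q_t, xi_t)
    Dinv = np.linalg.inv(avg_outer(a * f0))
    return Dinv @ (alpha * (1 - alpha) * avg_outer(a ** 2)) @ Dinv

V0 = alpha * (1 - alpha) * np.linalg.inv(avg_outer(f0 ** 2))

# Smallest eigenvalue of avar(a) - V0 for 200 arbitrary positive choices of a
min_eigs = [np.linalg.eigvalsh(avar(rng.uniform(0.05, 2.0, size=T)) - V0).min()
            for _ in range(200)]

# The bound is reached when a_t = f_t^0(q_t), i.e. A = F_t^0 as in Equation (4)
V_star = avar(f0)
print(min(min_eigs))
```

The inequality is a matrix Cauchy-Schwarz (Schur complement) argument, so it holds exactly for sample averages as well, not only in the limit; the numerical check therefore illustrates the algebra rather than the asymptotics.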
What Theorem 3 does not show is whether $V_T^0$ is also the semiparametric efficiency bound for $\theta_0$, in addition to being the minimum of the set of asymptotic covariance matrices of consistent and asymptotically normal M-estimators.
4.1. Stein’s (1956) approach: an example. In order to show that $V_T^0$ in Theorem 3 is the semiparametric efficiency bound in time series models satisfying the conditional quantile restriction (A1), we follow the ingenious approach of Stein (1956). Stein’s (1956) original concern was the possibility of estimating the true parameter adaptively: can we estimate the parameter $\theta_0$ in the conditional quantile restriction (A1) as precisely as if we knew the set of true conditional densities $f^0\equiv\{f_t^0(\cdot),1\le t\le T,T\ge1\}$ up to some finite dimensional parameter?
If the set of true conditional densities $f^0\equiv\{f_t^0(\cdot),1\le t\le T,T\ge1\}$ in the conditional quantile restriction (A1) were known up to a finite dimensional parameter, then we could easily construct an estimator of $\theta_0$ whose asymptotic covariance matrix attains the classical Cramér-Rao bound. As an illustration, consider the following conditionally heteroskedastic (CH) model with linear heteroskedasticity
$$Y_t=\beta_0'V_t+(1+|\gamma_0'R_t|)U_t, \tag{5}$$
where $W_t\equiv(V_t',R_t')'$, the process $\{(Y_t,W_t')'\}$ is α-mixing, the error sequence $\{U_t\}$ is independent of $\{W_t\}$ and iid with some absolutely continuous distribution function $H_0(\cdot)$ (with continuous density $h_0(\cdot)$) such that $E(U_t)=0$ and $E(U_t^2)=1$, and where $\beta_0$ and $\gamma_0$ denote the true values of the parameters $\beta\in B\subseteq\mathbb{R}^b$ and $\gamma\in\Gamma\subseteq\mathbb{R}^c$. For example, letting $V_t\equiv(1,Y_{t-1})'$ and $R_t\equiv U_{t-1}$, the above equation reduces to the well-known AR(1)-ARCH model (Koenker and Zhao, 1996).¹²
4.1.1. Case 1: no nuisance parameter. Assume that the distribution function $H_0(\cdot)$ is known. In financial applications $h_0(\cdot)$ is typically chosen to be a standardized Gaussian or Student-t density. The conditional density of $Y_t$ in the CH model (5) then equals $f_t^0(y)=(1+|\gamma_0'R_t|)^{-1}h_0([1+|\gamma_0'R_t|]^{-1}[y-\beta_0'V_t])$, and its conditional α-quantile is given by $\beta_0'V_t+H_0^{-1}(\alpha)(1+|\gamma_0'R_t|)$. Here, the parameter of interest is $\theta\equiv(\beta',\gamma')'\in\Theta\equiv B\times\Gamma$, $\Theta\subseteq\mathbb{R}^k$ with $k\equiv b+c$. Note that $\theta$ is the only unknown parameter of the conditional density $f_t^0(\cdot)$. Hence, we are in the case where the true conditional density is known up to a finite dimensional parameter. The true value $\theta_0\equiv(\beta_0',\gamma_0')'$ of $\theta$ can be estimated by maximum likelihood. Under standard regularity conditions (Bickel, 1982; Newey, 2004), the maximum likelihood estimator (MLE) $\tilde\theta_T$ of $\theta_0$ is known to be efficient: $(I_T^0)^{1/2}\sqrt{T}(\tilde\theta_T-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, where $I_T^0$ is the Fisher information matrix, $I_T^0\equiv T^{-1}\sum_{t=1}^T E[(\nabla_\theta\ln f_t^0(Y_t))(\nabla_\theta\ln f_t^0(Y_t))']$, in which the gradient is evaluated at $\theta_0$.¹³
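To make Case 1 concrete, the sketch below simulates the AR(1)-ARCH special case of model (5), with $V_t=(1,Y_{t-1})'$ and $R_t=U_{t-1}$, under a standard Gaussian $h_0$ and hypothetical parameter values, and checks that $Y_t$ falls below the conditional α-quantile $\beta_0'V_t+H_0^{-1}(\alpha)(1+|\gamma_0'R_t|)$ with frequency close to α. It is a sanity check of the quantile formula, not an estimator.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
alpha, T = 0.1, 100_000
beta0 = (0.2, 0.5)     # hypothetical intercept and AR(1) coefficient
gamma0 = 0.4           # hypothetical heteroskedasticity coefficient

U = rng.standard_normal(T + 1)     # assumption: h0 is the standard Gaussian density
Y = np.zeros(T + 1)
for t in range(1, T + 1):
    # Model (5) with V_t = (1, Y_{t-1})' and R_t = U_{t-1}
    Y[t] = beta0[0] + beta0[1] * Y[t - 1] + (1.0 + abs(gamma0 * U[t - 1])) * U[t]

# Case 1 conditional alpha-quantile: beta0'V_t + H0^{-1}(alpha) (1 + |gamma0'R_t|)
z = NormalDist().inv_cdf(alpha)
q = beta0[0] + beta0[1] * Y[:-1] + z * (1.0 + np.abs(gamma0 * U[:-1]))
coverage = np.mean(Y[1:] <= q)     # empirical frequency of Y_t <= q_t
print(coverage)
```

Given the past, the event $Y_t\le q_t$ is the event $U_t\le H_0^{-1}(\alpha)$, which is why the empirical frequency tracks α even though $\{Y_t\}$ is serially dependent and conditionally heteroskedastic.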
4.1.2. Case 2: finite dimensional nuisance parameter. In many interesting situations, the true density $h_0(\cdot)$ of $U_t$ in the CH model (5) is not entirely known, and this uncertainty adversely affects the precision of the M-estimates of $\theta_0$. A familiar case is the one where the error $U_t$ belongs to some parametric family of distributions indexed by a finite dimensional parameter $\tau$. For example, instead of being standardized Gaussian, $H_0(\cdot)$ can be assumed to be a standardized Asymmetric Power Distribution (APD) with unknown exponent and asymmetry parameters (Komunjer, 2005a). In other words, the true distribution function of $U_t$ is of the form $H_0(\cdot,\tau_0)$, where $\tau_0\in\Upsilon\subset\mathbb{R}_+^*\times(0,1)$ is the unknown parameter of the APD family. Here, the true set of conditional densities $f^0$ belongs to the parametric family $\mathcal{P}\equiv\{f(\eta),\eta\in\Pi\}$ with $f(\eta)\equiv\{f_t(\cdot,\eta):\mathbb{R}\to\mathbb{R}_+^*,1\le t\le T,T\ge1\}$, indexed by a finite-dimensional parameter $\eta\in\Pi$, $\Pi\subseteq\mathbb{R}^p$: $\eta\equiv(\beta',\gamma',\tau')'\in\Pi\equiv B\times\Gamma\times\Upsilon$ and $p\equiv b+c+2$. The members $f(\eta)$ of $\mathcal{P}$ are such that $f_t(y,\eta)=(1+|\gamma'R_t|)^{-1}h_0([1+|\gamma'R_t|]^{-1}[y-\beta'V_t],\tau)$ for all $t$, $1\le t\le T$, $T\ge1$, and the conditional quantile parameter $\theta$ is now given by $\theta\equiv(\beta',\gamma',q)'\in\Theta\equiv B\times\Gamma\times Q$, $\Theta\subseteq\mathbb{R}^k$ with $k\equiv b+c+1$.¹⁴ In this situation, the parameter of interest $\theta$ has a lower dimension than $\eta$: $\dim\theta=k$ and $\dim\eta=p=k+1$. We write $\theta=\theta(\eta)$, with $\theta:\Pi\to\Theta$ some continuously differentiable function, and interpret the rest of $\eta$ as a nuisance parameter (Stein, 1956; Bickel, 1982).

¹² In that case we additionally assume that the parameter spaces $B$ and $\Gamma$ are such that the standard stationarity and invertibility conditions hold.

¹³ Following Bickel (1982) and Newey (2004), the regularity conditions imposed are: $[f^0(\cdot)]^{1/2}$ is mean-square differentiable with respect to $\theta_0$, and the Fisher information matrix $I_T^0$ is nonsingular and continuous in $\theta_0$ on $\Theta$.
As in the previous case, we assume that the above parametric model $f(\eta)$ is regular (Bickel, 1982; Newey, 2004), that all the conditional densities $f_t(\cdot,\eta)$ satisfy the conditional quantile restriction (A1) and are continuously differentiable on $\mathbb{R}$ for each $\eta\in\Pi$, and that $f_t(Y_t,\cdot)$ is continuously differentiable on $\Pi$ a.s.-$P$. Let $\eta_0$ index the true set of conditional densities of $Y_t$, i.e. $f(\eta_0)=f^0$, so that the true value of interest is now written as $\theta_0=\theta(\eta_0)$, where $\eta_0\equiv(\beta_0',\gamma_0',\tau_0')'$. Also, let $I_T(\eta)$ denote the Fisher information matrix of the parametric model $\mathcal{P}$, $I_T(\eta)\equiv T^{-1}\sum_{t=1}^T E[(\nabla_\eta\ln f_t(Y_t,\eta))(\nabla_\eta\ln f_t(Y_t,\eta))']$. Then an estimator $\tilde\theta_T$ of $\theta_0$ is efficient if and only if $(C_T^0)^{-1/2}\sqrt{T}(\tilde\theta_T-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, with $C_T^0\equiv\nabla_\eta\theta(\eta_0)(I_T(\eta_0))^+\nabla_\eta\theta(\eta_0)'$. In the special case where the sequence $\{(Y_t,W_t')'\}$ is iid, several authors have derived necessary and sufficient conditions for the MLE to be efficient (see, e.g., Conditions S and S* in Stein, 1956; Bickel, 1982; Manski, 1984); these are typically expressed as orthogonality conditions on the gradient of the log-likelihood $\nabla_\eta\ln f_t(Y_t,\eta_0)$.
4.1.3. Case 3: infinite dimensional nuisance parameter. Now consider the more realistic situation in which the true density of $U_t$ in Equation (5) is entirely unknown. Instead, $f^0$ is only known to belong to a class $\mathcal{S}$ which contains all parametric families such as $\mathcal{P}$. Unlike in $\mathcal{P}$, the sets of densities in $\mathcal{S}$ are indexed by an additional infinite dimensional parameter. In the case of our CH model (5), this infinite dimensional parameter is the unknown probability density $h_0(\cdot)$ of the error term $U_t$. The density $h_0(\cdot)$ could be, for example, Gaussian, Student-t, Gamma or any other probability density in a set $\mathcal{H}$, the set of all families $h$ of probability densities that are parametrized by $\tau$ and satisfy some appropriate conditions, such as being standardized.

The set $\mathcal{S}$ is the union of all parametric sub-families $\mathcal{P}_h\equiv\{f_h(\eta),\eta\in\Pi\}$ obtained when $h$ varies across $\mathcal{H}$. For any given $h\in\mathcal{H}$, the parametric submodel $f_h(\eta)$ is defined as $f_h(\eta)\equiv\{f_{ht}(\cdot,\eta):\mathbb{R}\to\mathbb{R}_+^*,1\le t\le T,T\ge1\}$ and is assumed to satisfy standard regularity conditions (Bickel, 1982; Newey, 2004). We let $I_{hT}(\eta)\equiv T^{-1}\sum_{t=1}^T E[(\nabla_\eta\ln f_{ht}(Y_t,\eta))(\nabla_\eta\ln f_{ht}(Y_t,\eta))']$ be the Fisher information matrix of the parametric submodel $\mathcal{P}_h$. In particular, the matrix $I_{hT}(\eta_0)$, in which $f_h(\eta_0)=f^0$, is such that $C_{hT}^0\equiv\nabla_\eta\theta(\eta_0)'(I_{hT}(\eta_0))^+\nabla_\eta\theta(\eta_0)$ is nonsingular.

¹⁴ The set $Q$ corresponds to the range of α-quantiles of $U_t$ when the parameter $\tau$ of its distribution function $H_0(\cdot,\tau)$ varies in $\Upsilon$.
In addition, we assume that for any $\eta\in\Pi$ and $h\in\mathcal{H}$, the conditional densities $f_{ht}(\cdot,\eta)$ satisfy the conditional quantile restriction (A1) and are continuously differentiable on $\mathbb{R}$, and that for any $h\in\mathcal{H}$, $f_{ht}(Y_t,\cdot)$ is continuously differentiable a.s.-$P$ on $\Pi$. Then the semiparametric efficiency bound for the conditional quantile parameter $\theta_0$ is defined as the supremum of $C_{hT}^0$ over those $h$. If such a bound is attained by a particular family $h^*$, then $\mathcal{P}^*\equiv\mathcal{P}_{h^*}$ is called the least favorable parametric submodel.
4.2. Least favorable parametric submodel. Following Stein’s (1956) ingenious definition, $V_T^0$ in Theorem 3 is the semiparametric efficiency bound if and only if there exists a parametric submodel $\mathcal{P}_{h^*}$ in which the MLE $\tilde\theta_T^*$ of the true parameter $\theta_0$ has the same asymptotic covariance matrix $V_T^0$. The following theorem exhibits the least favorable parametric submodel that satisfies the conditional quantile restriction (A1).
Theorem 4 (Least Favorable Parametric Submodel). Given $\mathcal{M}$ and the set of true conditional densities $f^0\equiv\{f_t^0,1\le t\le T,T\ge1\}$, consider the parametric submodel $\mathcal{P}^*\equiv\{f^*(\theta),\theta\in\Theta\}$ parametrized by the conditional quantile parameter $\theta$, in which $f^*(\theta)\equiv\{f_t^*(\cdot,\theta):\mathbb{R}\to\mathbb{R}_+^*,1\le t\le T,T\ge1\}$ with
$$f_t^*(y,\theta)\equiv f_t^0(y)\,\frac{\alpha(1-\alpha)\lambda(\theta)\exp\{\lambda(\theta)[F_t^0(y)-F_t^0(q_\alpha(W_t,\theta))][\mathbb{1}(q_\alpha(W_t,\theta)-y)-\alpha]\}}{1-\exp\{\lambda(\theta)[1-F_t^0(q_\alpha(W_t,\theta))-\mathbb{1}(q_\alpha(W_t,\theta)-y)][\mathbb{1}(q_\alpha(W_t,\theta)-y)-\alpha]\}} \tag{6}$$
for all $y\in\mathbb{R}$, where $\lambda(\theta)\equiv\Lambda(\theta-\theta_0)$ and $\Lambda:\mathbb{R}^k\to\mathbb{R}$ is at least twice continuously differentiable on $\mathbb{R}^k$ with $\Lambda(\cdot)>0$ on $\mathbb{R}^k\setminus\{0\}$, $\Lambda(0)=0$, $\nabla_\theta\Lambda(0)=0$, $\Delta_{\theta\theta}\Lambda(0)$ nonsingular and $|\Delta_{\theta\theta}\Lambda(\cdot)|<\infty$ in a neighborhood of 0.¹⁵ Then, under (A0)(ii) and (A1), $\mathcal{P}^*$ is a parametric submodel in $\mathcal{S}$, i.e.:
(i) for any $t$, $1\le t\le T$, $T\ge1$, $f_t^*(\cdot,\theta)$ is a probability density for all $\theta\in\Theta$;
(ii) for any $t$, $1\le t\le T$, $T\ge1$, $f_t^*(\cdot,\theta)$ satisfies the conditional quantile restriction $E_\theta[\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)-\alpha\,|\,W_t]=0$ a.s.-$P$ for all $\theta\in\Theta$, where $E_\theta(\cdot|W_t)$ denotes the conditional expectation under the density $f_t^*(\cdot,\theta)$ for $Y_t$ given $W_t$;
(iii) $f^0\in\mathcal{P}^*$.
Moreover, under (A0)-(A1) and (A5)-(A7)(i), $\mathcal{P}^*$ is the least favorable submodel in $\mathcal{S}$, i.e. the asymptotic distribution of the MLE $\tilde\theta_T^*$ associated with $\mathcal{P}^*$ is $(V_T^0)^{-1/2}\sqrt{T}(\tilde\theta_T^*-\theta_0)\xrightarrow{d}N(0,\mathrm{Id})$, where $V_T^0$ is as defined in Theorem 3.

¹⁵ A simple function $\Lambda(\cdot)$ in Equation (6) that satisfies the conditions of Theorem 4 is $\Lambda(x)=x'x$.
Because $\mathcal{P}^*$ is a parametric submodel of the set $\mathcal{S}$ of all densities satisfying the conditional quantile restriction in (A1), the semiparametric efficiency bound for $\theta_0$ is, by Stein’s (1956) definition, at least as large as the asymptotic variance of the above MLE $\tilde\theta_T^*$; Theorem 4 shows that the latter equals $V_T^0$. On the other hand, in Theorem 3 we have shown that $V_T^0$ is also the minimum of the asymptotic variances of the consistent and asymptotically normal M-estimators of $\theta_0$. It follows, first, that the semiparametric efficiency bound is $V_T^0$, and second, that the parametric model $\mathcal{P}^*$ is the least favorable parametric submodel in $\mathcal{S}$.
The first result, that $V_T^0$ is the semiparametric efficiency bound, has the following interpretation: when the only thing we know about the model is that it satisfies the conditional quantile restriction (A1), we cannot estimate the true conditional quantile parameter $\theta_0$ with precision higher than that given by $V_T^0$. Note that our result uses the moment restriction (A1) only; we make no additional assumptions regarding the properties of the “error” term $Y_t-q_\alpha(W_t,\theta)$ (other than those contained in (A1) and (A6)). In particular, we allow $Y_t-q_\alpha(W_t,\theta)$ to be dependent and nonidentically distributed.
Perhaps the most important aspect of Theorem 4 is that it relaxes the independence assumption. As far as time series data are concerned, two leading situations in which independence is violated come to mind. The first is the CH model (5): $W_t$ contains serially dependent exogenous variables and/or lags of $Y_t$, and the residuals are uncorrelated and conditionally heteroskedastic.¹⁶ There are some results on this case in Newey and Powell (1990), under the additional assumption that $\{(Y_t,W_t')'\}$ is iid. The authors derive the semiparametric efficiency bound for the parameters in the linear quantile regression $q_\alpha(W_t,\theta)=\theta'W_t$ by allowing for conditional heteroskedasticity (given $W_t$) in the “error” term $Y_t-\theta'W_t$. The first part of Theorem 4 generalizes Newey and Powell’s (1990) results to the case where the sequence $\{(Y_t,W_t')'\}$ is weakly dependent and heterogeneous, as in (A6). Unsurprisingly, when the data are iid and $q_\alpha$ is linear, the bound $V_T^0$ reduces to $V^0\equiv\alpha(1-\alpha)\{E[(f_t^0(q_\alpha(W_t,\theta_0)))^2W_tW_t']\}^{-1}$, derived by Newey and Powell (1990).¹⁷ In the second time series situation of interest, the residuals themselves are correlated in addition to being heteroskedastic. This situation is not covered by the CH model (5); however, our assumption (A1) does not exclude the possibility that $Y_t-q_\alpha(W_t,\theta)$ be correlated. So far there exist no results on the semiparametric efficiency bound that cover this dependent case. To the best of our knowledge, Theorem 4 provides the first result on attainable asymptotic efficiency for nonlinear (and possibly censored) conditional quantile models when the data are dependent.

¹⁶ In the CH model (5) we have $Y_t-q_\alpha(W_t,\theta_0)=(1+|\gamma_0'R_t|)[U_t-\mu_0-\sigma_0H_0^{-1}(\alpha)]$.

¹⁷ This result is a special case of the result derived by Chamberlain (1987) for models with conditional moment restrictions.

Figure 1. Case α = .5, $q_\alpha(W_t,\theta)=\theta$ and $f_t^0(y)=\exp(-2|y|)$. [Plot of $f(y,\theta)$ against $y$ for θ = 3/4 and θ = 7/8, together with the true density $f_t^0(y)$.]
The second result of Theorem 4, an analytic expression for the least favorable parametric submodel, is entirely new and has not appeared in the literature on efficient estimation under conditional moment restrictions. The density $f_t^*(\cdot,\theta)$ in Equation (6) is not of the tick-exponential form derived by Komunjer (2005b): it depends on the true density $f_t^0(\cdot)$ as well as on the true value $\theta_0$, and contains terms such as $\lambda(\theta)$. In the least favorable parametric submodel $\mathcal{P}^*$, $\theta$ parametrizes both the conditional quantile model $\mathcal{M}$ and the shape of $f_t^*(\cdot,\theta)$; in other words, the shape of $f_t^*(\cdot,\theta)$ is now determined by $f_t^0(\cdot)$ and $\theta$ (see Figure 1 for a pure location model of a conditional median). In particular, the density $f_t^*(\cdot,\theta)$ is discontinuous for all values of $\theta$ different from $\theta_0$; when $\theta=\theta_0$, the density $f_t^*(\cdot,\theta_0)$ equals the true density $f_t^0(y)$, which is continuous.
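Properties (i) and (ii) of Theorem 4, and the discontinuity just described, can be checked numerically in the setting of Figure 1 (α = .5, $q_\alpha(W_t,\theta)=\theta$, $f_t^0(y)=\exp(-2|y|)$, so $\theta_0=0$). The sketch below evaluates Equation (6) with $\Lambda(x)=x'x$ from footnote 15 and integrates it by a midpoint rule on either side of the jump at $y=\theta$; the integration range and grid are assumptions of the sketch.

```python
import numpy as np

ALPHA, THETA0 = 0.5, 0.0     # Figure 1: conditional median; exp(-2|y|) has median 0

def f0(y):
    return np.exp(-2.0 * np.abs(y))

def F0(y):
    # CDF of f0 (a Laplace distribution with scale 1/2)
    y = np.asarray(y, dtype=float)
    return np.where(y < 0.0, 0.5 * np.exp(2.0 * y), 1.0 - 0.5 * np.exp(-2.0 * y))

def f_star(y, theta, alpha=ALPHA):
    """Least favorable density of Equation (6) with q_alpha(W_t, theta) = theta."""
    lam = (theta - THETA0) ** 2              # lambda(theta) = Lambda(theta - theta0), Lambda(x) = x'x
    if lam == 0.0:
        return f0(y)                          # f*(., theta0) = f0
    q, p = theta, F0(theta)
    ind = np.asarray(q - y >= 0.0, dtype=float)   # 1I(q - y)
    num = alpha * (1.0 - alpha) * lam * np.exp(lam * (F0(y) - p) * (ind - alpha))
    den = 1.0 - np.exp(lam * (1.0 - p - ind) * (ind - alpha))
    return f0(y) * num / den

def midpoint(fn, lo, hi, n=200_000):
    h = (hi - lo) / n
    x = lo + h * (np.arange(n) + 0.5)
    return h * fn(x).sum()

results = {}
for theta in (0.75, 0.875):                   # the two values plotted in Figure 1
    below = midpoint(lambda y: f_star(y, theta), -20.0, theta)   # P_theta(Y <= q)
    above = midpoint(lambda y: f_star(y, theta), theta, 20.0)
    results[theta] = (below, below + above)
    print(f"theta={theta}: P(Y <= q) = {below:.6f}, total mass = {below + above:.6f}")

# Left and right limits at y = theta differ: f*(., theta) jumps at the quantile
left = f_star(np.array([0.75 - 1e-9]), 0.75)[0]
right = f_star(np.array([0.75 + 1e-9]), 0.75)[0]
```

The exact values follow from the substitution $u=F_t^0(y)$: the mass below $q_\alpha$ is α and the mass above is 1 − α for every $\lambda(\theta)>0$, which is precisely what the normalizing denominator in Equation (6) is designed to achieve.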
With the semiparametric efficiency bound $V_T^0$ in hand, we now turn to the problem of constructing a conditional quantile estimator which actually attains the bound.
5. Efficient Conditional Quantile Estimator
As already pointed out in Section 4, the shape $\xi_t^*$ of the optimal objective function $\varphi(\cdot,\cdot,\xi_t^*)$ in Equation (4) is that of the true conditional distribution $F_t^0(\cdot)$, which is unknown. Hence, the M-estimator $\theta_T^*$ is infeasible in practice. We construct our (feasible) efficient conditional quantile estimator $\hat\theta_T$ by replacing $F_t^0(\cdot)$ in Equation (4) with a nonparametric estimator $\hat F_t(\cdot)$. It remains to be shown that the estimator $\hat\theta_T$ retains the same asymptotic variance $V_T^0$. Note that $\hat\theta_T$ is constructed without using any knowledge about the true $F_t^0(\cdot)$. It will then follow that the semiparametric efficiency bound $V_T^0$ can be attained, and that the feasible estimator $\hat\theta_T$ is semiparametrically efficient.
Let $g_t^0(\cdot)$ denote the true density of $W_t$ and $\bar g_T^0(\cdot)\equiv T^{-1}\sum_{t=1}^T g_t^0(\cdot)$ the average true density of $\{W_1,\dots,W_T\}$. We make the following assumptions:¹⁸
(A8) for every $T\ge1$, $\bar g_T^0(\cdot)$ is continuously differentiable of order $R\ge1$ on $\mathbb{R}^m$ with $\sup_{T\ge1}\sup_{w\in\mathbb{R}^m}|D^r\bar g_T^0(w)|<\infty$ for every $0\le|r|\le R$;
(A9) (i) for every $t$, $1\le t\le T$, $T\ge1$, $F_t^0(\cdot)=F^0(\cdot|W_t)$ and $f_t^0(\cdot)=f^0(\cdot|W_t)$; (ii) the function $F^0(\cdot|\cdot):\mathbb{R}^{m+1}\to[0,1]$ is continuously differentiable of order $R+2$ with $\sup_{(y,w)\in\mathbb{R}^{m+1}}|D^rF^0(y|w)|<\infty$ for every $0\le|r|\le R+2$;
(A10) for some $\gamma>0$ and any vanishing sequence $\{c_T\}$: (i) $\int_{\{w:\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw=o(1)$, (ii) $\int_{\{w:\bar g_T^0(w)<c_T\}}|\nabla_\theta q_\alpha(w,\theta_0)|\,\bar g_T^0(w)\,dw=O(c_T^{\gamma})$, and (iii) $\int_{\{w:\bar g_T^0(w)<c_T\}}f^0[q_\alpha(w,\theta_0)|w]\,|\nabla_\theta q_\alpha(w,\theta_0)|\,\bar g_T^0(w)\,dw=O(c_T^{2\gamma})$.
Assumptions (A8) and (A9)(ii) are standard smoothness assumptions on the true densities $g_t^0(\cdot)$ and $f_t^0(\cdot)$; they adapt assumptions NP2 and NP3 used in Andrews (1995) to the case where the regression function is the conditional distribution (and density) of $Y_t$. On the other hand, assumption (A9)(i) is an additional assumption we need to impose on the true distribution of $Y_t$ conditional upon $W_t$ in order to construct an estimator that attains the semiparametric efficiency bound. The content of this assumption is twofold. First, it states that no information other than that contained in $W_t$ is useful in constructing the conditional distribution (and density) of $Y_t$. This strengthens our assumption (A1), which says that $W_t$ contains all the relevant information for the conditional α-quantile of $Y_t$. Second, assumption (A9)(i) implies that the distribution of $Y_t$ conditional on $W_t$ is the same as that of $Y_s$ conditional on $W_s$, for any $s\ne t$.
¹⁸ Recall from Section 2.1 that all the components of $W_t$ are continuous.
Assumption (A10)(i) is weak, as it is satisfied if the sequence of probability measures $\{\bar P_T^0(\cdot)\}$ associated with the average densities $\{\bar g_T^0(\cdot)\}$ is tight, which is itself implied by the tightness of $\{W_t\}$, or equivalently $W_t=O_p(1)$.¹⁹ The latter is obviously satisfied if the $W_t$’s are identically distributed, but it also holds for dependent and heterogeneous $W_t$’s if $\{W_t\}$ is uniformly integrable, and a fortiori if $\sup_{1\le t\le T,T\ge1}E[|W_t|^{1+\varepsilon}]<\infty$ for some $\varepsilon>0$. Assumptions (A10)(ii) and (A10)(iii) are stronger; they are used to ensure that the bias of $\hat\Psi_T(\theta)$ vanishes at a $\sqrt{T}$-rate. They are similar to the conditions that eliminate the asymptotic bias when stochastic trimming is employed, as in Härdle and Stoker (1989) and Lavergne and Vuong (1996). They require that the tails of $\bar g_T^0(\cdot)$ vanish sufficiently fast given the tail behavior of $|\nabla_\theta q_\alpha(\cdot,\theta_0)|$ and $f^0[q_\alpha(\cdot,\theta_0)|\cdot]$. For instance, if $\sup_{w\in\mathbb{R}^m}|\nabla_\theta q_\alpha(w,\theta_0)|<\infty$ and $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)<\infty$, a sufficient (but not necessary) condition for (A10) is that $\int_{\{w:\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw=O(c_T^{2\gamma})$, which is a condition on the vanishing rate of the tails of the average density $\bar g_T^0(\cdot)$.
The true conditional distribution $F^0(\cdot|\cdot)$ can be estimated by the kernel estimator $\hat F(\cdot|\cdot)$ defined as $\hat F(\cdot|w)=0$ if $\hat g(w)=0$, and $\hat F(y|w)\equiv\hat G(y,w)/\hat g(w)$ if $\hat g(w)\ne0$, with
$$\hat G(y,w)\equiv\frac{1}{T h_{wT}^m}\sum_{s=1}^T L\Big(\frac{y-Y_s}{h_{yT}}\Big)K\Big(\frac{w-W_s}{h_{wT}}\Big), \tag{7}$$
$$\hat g(w)\equiv\frac{1}{T h_{wT}^m}\sum_{s=1}^T K\Big(\frac{w-W_s}{h_{wT}}\Big), \tag{8}$$
where $L(y)\equiv\int\mathbb{1}(y-u)K_0(u)\,du$, $K(\cdot)$ is a multivariate kernel, $K_0(\cdot)$ is a univariate kernel, and $h_{wT}$ and $h_{yT}$ are two nonstochastic positive bandwidths. The corresponding kernel estimator of the true conditional density $f^0(\cdot|\cdot)$ is given by $\partial\hat F(\cdot|\cdot)/\partial y$, while $\hat g(\cdot)$ can be viewed as a kernel estimator of the average true density $\bar g_T^0(\cdot)$.
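A direct transcription of Equations (7)-(8) at a single conditioning point is sketched below. The paper only restricts the kernels through (A11); here $K$ is taken to be a product Gaussian kernel and $K_0$ a logistic density, so that $L$ is the logistic CDF. Both are second-order kernels, so the sketch makes no attempt at the higher-order bias conditions. The bandwidths and the simulated design (true $F^0(y|w)=\Phi(y-w)$) are hypothetical, and trimming is not included.

```python
import numpy as np

def gauss_kernel(u):
    # product Gaussian kernel K on R^m (an assumption); u has shape (..., m)
    m = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (m / 2)

def L_logistic(v):
    # L(v) = integral of 1I(v - u) K0(u) du, i.e. the CDF of K0; K0 logistic (an assumption)
    return 1.0 / (1.0 + np.exp(-v))

def F_hat(y, w, Y, W, h_w, h_y):
    """Kernel estimator F_hat(y | w) of Equations (7)-(8) at one point w.

    y : (n_y,) evaluation points; w : (m,) conditioning point;
    Y : (T,) and W : (T, m) the sample.
    """
    T, m = W.shape
    Kw = gauss_kernel((w - W) / h_w)                    # K((w - W_s) / h_wT), shape (T,)
    g_hat = Kw.sum() / (T * h_w ** m)                   # Equation (8)
    if g_hat == 0.0:
        return np.zeros_like(y, dtype=float)            # F_hat(.|w) = 0 when g_hat(w) = 0
    Lmat = L_logistic((y[:, None] - Y[None, :]) / h_y)  # L((y - Y_s) / h_yT), shape (n_y, T)
    G_hat = Lmat @ Kw / (T * h_w ** m)                  # Equation (7)
    return G_hat / g_hat

rng = np.random.default_rng(4)
T = 4000
W = rng.standard_normal((T, 1))
Y = W[:, 0] + rng.standard_normal(T)                    # true F0(y | w) = Phi(y - w)
y_grid = np.linspace(-3.0, 3.0, 13)
F_est = F_hat(y_grid, np.array([0.0]), Y, W, h_w=0.3, h_y=0.3)
print(np.round(F_est, 3))
```

Because $L$ is nondecreasing and the kernel weights are nonnegative, $\hat F(\cdot|w)$ is automatically a nondecreasing function with values in $[0,1]$, so the feasible objective built from it keeps the monotonicity required by Theorem 1.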
¹⁹ By definition (Billingsley, 1995), the tightness of $\{\bar P_T^0(\cdot)\}$ means that for every $\varepsilon\in(0,1)$ there exists $M_\varepsilon<\infty$ such that $\inf_{T\ge1}\bar P_T^0([-M_\varepsilon,M_\varepsilon]^m)\ge1-\varepsilon$. Now,
$$\int_{\{w:\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw=\int_{\{w\in[-M_\varepsilon,M_\varepsilon]^m:\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw+\int_{\{w\notin[-M_\varepsilon,M_\varepsilon]^m:\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw\le c_T(2M_\varepsilon)^m+\bar P_T^0(\mathbb{R}^m\setminus[-M_\varepsilon,M_\varepsilon]^m)\le c_T(2M_\varepsilon)^m+\varepsilon,$$
showing that (A10)(i) holds, as $c_T=o(1)$ and $\varepsilon$ is arbitrary.

In order to eliminate the aberrant behavior of kernel estimators of the conditional distribution (density) of $Y_t$ in regions where the densities of $\{W_t\}$ are small, we define $\hat F_t(\cdot)\equiv d_t\hat F(\cdot|W_t)$, where $d_t\equiv\mathbb{1}(\hat g(W_t)-b_T)$ effectively deletes (trims out) observations for which $\hat g(W_t)<b_T$, with $\{b_T\}$ a sequence of positive constants. That is, $\hat F_t(\cdot)$ is a trimmed nonparametric estimator of the true conditional distribution $F_t^0(\cdot)$, which we now use to construct our (feasible) estimator $\hat\theta_T$. Namely, $\hat\theta_T$ is obtained by minimizing the objective function $\hat\Psi_T(\theta)\equiv T^{-1}\sum_{t=1}^T\varphi(Y_t,q_\alpha(W_t,\theta),\hat\xi_t)$, in which
$$\varphi(Y_t,q_\alpha(W_t,\theta),\hat\xi_t)\equiv[\alpha-\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)][\hat F_t(Y_t)-\hat F_t(q_\alpha(W_t,\theta))], \tag{9}$$
for every t, 1 6 t 6 T, T > 1. In other words, our (feasible) estimator θ̂T minimizes a
modified version Ψ̂T (·) of the efficient M—objective function Ψ∗T (·) in which we have replaced
the true conditional distribution of Yt given Wt with a nonparametric estimator. As a
consequence, θ̂T is a MINPIN-type estimator (Andrews, 1994a).20 The shape parameter ξ̂ t
of the objective function in Equation (9) is now equal to F̂t (·).
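The feasible objective in Equation (9) can be sketched in the same hypothetical setting (scalar $W_t$, Gaussian $K$, logistic $K_0$) for an assumed linear specification $q_\alpha(w,\theta)=\theta_1+\theta_2 w$. The sketch computes the trimmed $\hat F_t(\cdot)$, evaluates $\hat\Psi_T(\theta)$, and minimizes it over a coarse grid; the data-generating process, bandwidths, trimming level $b_T$, grid, and the helper `make_psi_hat` are illustrative assumptions, and the grid search merely stands in for whatever optimizer one would use in practice.

```python
import numpy as np
from statistics import NormalDist

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def logistic_L(u):
    return np.exp(-np.logaddexp(0.0, -u))   # integrated logistic kernel (logistic CDF)

def make_psi_hat(Y, W, alpha, h_y, h_w, b_T):
    """Return theta -> Psi_hat_T(theta) from Equation (9) for the hypothetical linear
    specification q_alpha(w, theta) = theta[0] + theta[1] * w (scalar W, m = 1)."""
    T = len(Y)
    Kw = gauss_kernel((W[:, None] - W[None, :]) / h_w)   # K((W_t - W_s) / h_w)
    g_hat = Kw.sum(axis=1) / (T * h_w)                   # g_hat(W_t), equation (8)
    d = (g_hat - b_T >= 0).astype(float)                 # trimming d_t = 1I(g_hat(W_t) - b_T)
    Kw_sum = Kw.sum(axis=1)                              # > 0 for a Gaussian kernel

    def F_t(points):
        # trimmed estimator F_hat_t(points_t) = d_t * F_hat(points_t | W_t), one point per t
        L = logistic_L((points[:, None] - Y[None, :]) / h_y)
        return d * (L * Kw).sum(axis=1) / Kw_sum

    F_at_Y = F_t(Y)                                      # F_hat_t(Y_t) does not depend on theta

    def psi_hat(theta):
        q = theta[0] + theta[1] * W
        ind = (q - Y >= 0).astype(float)                 # 1I(q_alpha(W_t, theta) - Y_t)
        return float(np.mean((alpha - ind) * (F_at_Y - F_t(q))))
    return psi_hat

# Hypothetical data-generating process with a linear conditional quantile:
# Y = 1 + W + (0.5 + W) e, W ~ U(0, 1), e ~ N(0, 1)  =>  q_alpha(w) = (1 + 0.5 z) + (1 + z) w.
rng = np.random.default_rng(1)
T, alpha = 400, 0.25
W = rng.uniform(0.0, 1.0, T)
Y = 1.0 + W + (0.5 + W) * rng.standard_normal(T)
z = NormalDist().inv_cdf(alpha)
theta0 = np.array([1.0 + 0.5 * z, 1.0 + z])

psi_hat = make_psi_hat(Y, W, alpha, h_y=0.15, h_w=0.1, b_T=0.2)
grid = [np.array([a, b]) for a in np.linspace(0.0, 1.4, 15) for b in np.linspace(-0.4, 1.0, 15)]
theta_hat = min(grid, key=psi_hat)                       # crude grid minimization of Psi_hat_T
print("theta0:", np.round(theta0, 3), "grid minimizer:", theta_hat)
```

With $W_t\sim U(0,1)$ and $h_{wT}=0.1$, $\hat g(W_t)$ should stay well above $b_T=0.2$, so the trimming indicator is computed but inactive in this example; it matters when $\hat g$ approaches zero in the tails of $W_t$.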
In order to establish the asymptotic properties of our feasible estimator $\hat\theta_T$, we impose the following conditions on the kernels:
(A11) (i) for any $r=(r_1,\dots,r_m)\in\mathbb{N}^m$, the kernel $K(\cdot)$ satisfies $\sup_{w\in\mathbb{R}^m}|K(w)|<\infty$, $\int K(w)\,dw=1$, $\int w^rK(w)\,dw=0$ if $1\le|r|\le R-1$, and $\int w^rK(w)\,dw<\infty$ if $|r|=R$; (ii) $K(\cdot)$ has a Fourier transform $\phi(\cdot)$ that is absolutely integrable, i.e. $\int|\phi(w)|\,dw<\infty$; (iii) $\sup_{y\in\mathbb{R}}|K_0(y)|<\infty$, $\int K_0(y)\,dy=1$, $\int y^rK_0(y)\,dy=0$ if $1\le r\le R-1$, and $\int y^RK_0(y)\,dy<\infty$; (iv) the kernel $K_0(\cdot)$ is continuously differentiable on $\mathbb{R}$ with derivative satisfying $\sup_{y\in\mathbb{R}}|K_0'(y)|<\infty$.
Assumptions (A11)(i)-(iv) are standard and are satisfied, for example, by the multivariate normal-based kernels considered by Bierens (1987): $K(w)=(2\pi)^{-m/2}\sum_{j=1}^Ja_j|b_j|^{-m}\exp[-w'w/(2b_j^2)]$, where $J\ge R/2$ is a positive integer and $\{(a_j,b_j):j\le J\}$ are constants that satisfy $\sum_{j=1}^Ja_j=1$ and $\sum_{j=1}^Ja_jb_j^{2l}=0$ for $l=1,\dots,J-1$.
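Once the scales $b_j$ are fixed, the conditions on $\{(a_j,b_j)\}$ form a small linear system for the weights $a_j$. The Python sketch below assumes distinct scales and the arbitrary choice $b=(1,2)$ with $J=2$ (enough for $R=4$), and checks the moments in the univariate case $m=1$; the function names are hypothetical. Because the system is of Vandermonde type in $b_j^2$, it has a unique solution whenever the $|b_j|$ are distinct.

```python
import math
import numpy as np

def bierens_weights(b):
    """Solve sum_j a_j = 1 and sum_j a_j b_j^(2l) = 0 for l = 1, ..., J-1 (b_j distinct, nonzero)."""
    b = np.asarray(b, dtype=float)
    J = len(b)
    M = np.vstack([b ** (2 * l) for l in range(J)])   # row l contains b_j^(2l); row 0 is all ones
    rhs = np.zeros(J)
    rhs[0] = 1.0
    return np.linalg.solve(M, rhs)

def bierens_kernel(w, a, b):
    """K(w) = (2 pi)^(-m/2) sum_j a_j |b_j|^(-m) exp(-w'w / (2 b_j^2)); w has shape (n, m)."""
    w = np.atleast_2d(w)
    m = w.shape[1]
    sq = np.sum(w * w, axis=1)
    terms = [aj * abs(bj) ** (-m) * np.exp(-sq / (2.0 * bj**2)) for aj, bj in zip(a, b)]
    return (2.0 * np.pi) ** (-m / 2.0) * np.sum(terms, axis=0)

def univariate_moment(a, b, r):
    """r-th moment of the m = 1 kernel: sum_j a_j E[(b_j Z)^r] with Z ~ N(0, 1)."""
    if r % 2 == 1:
        return 0.0                                     # odd moments vanish by symmetry
    double_factorial = math.prod(range(r - 1, 0, -2))  # (r - 1)!!, equal to 1 for r = 0 or 2
    return float(sum(aj * bj**r * double_factorial for aj, bj in zip(a, b)))

# R = 4 needs J >= R/2 = 2 components; the scales b = (1, 2) are an arbitrary illustrative choice.
b = [1.0, 2.0]
a = bierens_weights(b)
print("a =", a)
print("moments r = 0..4:", [univariate_moment(a, b, r) for r in range(5)])
```

Since the $2l$-th moment of a $N(0,b_j^2)$ density is $(2l-1)!!\,b_j^{2l}$, the constraints make all moments of order $1,\dots,2J-1$ vanish, which is why $J\ge R/2$ components suffice for a kernel of order $R$. Note that one weight is negative, so the kernel takes negative values, as higher-order kernels must.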
We now turn to the asymptotic properties of our feasible estimator $\hat\theta_T$. Note that the shape $\hat\xi_t$ of the objective function in Equation (9) depends on all the data up to time $T$ and hence is not $W_t$-measurable, as required by assumption (A2)(i). Consequently, the results of Theorems 1 and 2 do not apply to $\hat\theta_T$, and its asymptotic properties need to be derived separately. We first establish the consistency of $\hat\theta_T$.
Theorem 5 (Consistency of $\hat\theta_T$). Suppose that (A0)-(A1), (A5)-(A7)(i), (A8)-(A10)(i) and (A11) hold. If $b_T=o(1)$ with $b_T\sqrt{T}h_{wT}^m\to\infty$, $b_T/h_{wT}^R\to\infty$ and $b_T/h_{yT}^R\to\infty$ as $T\to\infty$, then $\hat\theta_T\xrightarrow{p}\theta_0$.
$^{20}$ Though $\hat\theta_T$ is a member of the MINPIN family, our objective function associated with Equation (9) does not satisfy the assumptions used by Andrews (1994a).

The assumptions on the trimming parameter $b_T$ and the bandwidths $h_{yT}$ and $h_{wT}$ imply that $b_T$ does not vanish too rapidly and that $h_{yT}\to0$, $h_{wT}\to0$ and $\sqrt{T}h_{wT}^m\to\infty$ as $T$ goes to infinity. Though stronger than necessary, the latter condition is typically used when deriving uniform convergence rates using the Fourier transform $\phi(\cdot)$ of $K(\cdot)$ (Bierens, 1983; Andrews, 1995). In particular, when $R\le m/2$, this condition excludes the optimal bandwidth $h_{wT}^{opt}\sim T^{-1/(2R+m)}$ obtained by Stone (1980, 1982) and Truong and Stone (1992).
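The exclusion of the optimal bandwidth when $R\le m/2$ follows from a one-line exponent calculation using only the rates quoted above:

```latex
\sqrt{T}\,\big(h^{opt}_{wT}\big)^m \sim T^{1/2}\,T^{-m/(2R+m)} = T^{\frac{2R-m}{2(2R+m)}},
\qquad\text{so}\qquad
\sqrt{T}\,\big(h^{opt}_{wT}\big)^m \to \infty \iff 2R-m>0 \iff R>m/2 .
```

For $R\le m/2$ the exponent is nonpositive; for example, $m=4$ and $R=2$ give exponent $0$, so $\sqrt{T}(h^{opt}_{wT})^m$ stays bounded.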
In order to derive the asymptotic normality of our efficient estimator $\hat\theta_T$, we strengthen our dependence assumption (A6):
(A6') the sequence $\{(Y_t,W_t')'\}$ is (i) strictly stationary and (ii) β-mixing with β of size $-r/(r-2)$, with $2<r<3$;
The proof of our result uses Lemma 3 in Arcones (1995), which requires strict stationarity and β-mixing with $r>2$. Note that β-mixing (or absolute regularity) in (A6')(ii) is a condition intermediate between α-mixing (strong mixing), which is the weakest of these three conditions, and φ-mixing (uniform mixing), which is the strongest (Bradley, 1986). As such, our weak dependence assumption is stronger than the α-mixing used by Robinson (1983), for example. Assumption (A6')(ii) also requires the size of the β-mixing process to lie between $-\infty$ and $-3$. In other words, we limit the amount of dependence allowed in $\{(Y_t,W_t')'\}$.$^{21}$ For comparison, Truong and Stone (1992) use the condition $\beta_t=O(\rho^t)$ as $t\to\infty$, for some $\rho$ with $0<\rho<1$, in order to estimate the conditional quantile nonparametrically at the optimal rate. Their condition implies β-mixing of arbitrary size, and hence of size $-r/(r-2)$ for some $r$ with $2<r<3$.
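To see why a geometric rate is compatible with (A6')(ii), and also with the weaker summability condition stated in footnote 21, suppose, purely as an illustration, that $\beta_t\le C\rho^t$ for all $t$, with $C<\infty$ and $0<\rho<1$. Then $t^\lambda\beta_t\to0$ for every $\lambda>0$, which is the sense in which the size is arbitrary, and with $q\equiv\rho^{(r-2)/r}\in(0,1)$:

```latex
\sum_{t=1}^{T-1} t\,\beta_t^{(r-2)/r}
\;\le\; C^{(r-2)/r}\sum_{t=1}^{\infty} t\,q^{t}
\;=\; C^{(r-2)/r}\,\frac{q}{(1-q)^{2}} \;<\; \infty ,
```

so the sum is $O(1)$, and a fortiori $O(\sqrt{T})$, for every $2<r<3$.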
We can now establish the efficiency of θ̂T .
Theorem 6 (Efficiency of $\hat\theta_T$). Suppose that Assumptions (A0)-(A1), (A5), (A6'), (A7)(i) and (A8)-(A11) hold. If $b_T=o(T^{-1/(4\gamma)})$ with $b_TT^{1/4}h_{yT}h_{wT}^m\to\infty$, $b_T/(T^{1/4}h_{wT}^R)\to\infty$ and $b_T/(T^{1/4}h_{yT}^R)\to\infty$ as $T\to\infty$, then $\hat\theta_T$ is efficient:
$$\sqrt{T}\,(\bar V_T^0)^{-1/2}(\hat\theta_T-\theta_0)\xrightarrow{d}N(0,Id),$$
where
$$\bar V_T^0\equiv\alpha(1-\alpha)\Big\{T^{-1}\sum_{t=1}^TE\big[(f^0(q_\alpha(W_t,\theta_0)|W_t))^2\,\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big]\Big\}^{-1}$$
is the semiparametric efficiency bound.
$^{21}$ Note that the $-\infty$ case, obtained when $r=2$, corresponds to independence. As the proof of Lemma 10 shows, assumption (A6')(ii) is stronger than necessary: we can replace it by β-mixing with mixing coefficients $\beta_t$ that satisfy $\sum_{t=1}^{T-1}t\,\beta_t^{(r-2)/r}=O(\sqrt{T})$.
The conditions on the trimming parameter and bandwidths are stronger than in Theorem 5. They can be written as
$$\max\Big\{\frac{1}{T^{1/4}h_{yT}h_{wT}^m},\;T^{1/4}h_{wT}^R,\;T^{1/4}h_{yT}^R\Big\}\ll b_T\ll\frac{1}{T^{1/(4\gamma)}},$$
where $a_T\ll c_T$ means that $a_T<c_T$ for $T$ sufficiently large. This implies $T^{1/(4\gamma)-1/4}\ll h_{yT}h_{wT}^m\ll T^{-[(m+1)/R][1/(4\gamma)+1/4]}$. Hence, necessary conditions are $\gamma>1$ and $R>(m+1)(\gamma+1)/(\gamma-1)$.$^{22}$ For instance, when $m=1$, $R=3$ and $\gamma=6$, a feasible choice is $h_{yT}\propto T^{-1/10}$, $h_{wT}\propto T^{-1/10}$ and $b_T\propto T^{-1/21}$. Moreover, if $R>(m+1)(3\gamma+1)/[2(\gamma-1)]$, one can choose the $L_2$-optimal bandwidths $h_{yT}^*\propto T^{-2R/[(2R+m)(2R+m+1)]}$ and $h_{wT}^*\propto T^{-1/(2R+m)}$ for estimating $f^0(y|w)\bar g_T^0(w)$ and $\bar g_T^0(w)$.$^{23}$ For instance, when $m=1$, $R=4$ and $\gamma=6$, the $L_2$-optimal bandwidths $h_{yT}^*\propto T^{-8/90}$ and $h_{wT}^*\propto T^{-1/9}$ can be chosen, with trimming parameter $b_T\propto T^{-1/21}$. In particular, our estimator $\hat\theta_T$ differs from many semiparametric ones that are $\sqrt{T}$-asymptotically normal under assumptions that imply undersmoothing and thus exclude the $L_2$-optimal bandwidth.
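The rate arithmetic in the two numerical examples can be checked mechanically when every sequence is a power of $T$. The Python sketch below assumes $h_{yT}\propto T^{-e_y}$, $h_{wT}\propto T^{-e_w}$ and $b_T\propto T^{-e_b}$, and reads $a_T\ll c_T$ as a strict inequality between the exponents of $T$; the helper names are hypothetical. It also confirms the identity stated in footnote 23 for the $L_2$-optimal exponents.

```python
from fractions import Fraction as Fr

def r_thresholds(m, gamma):
    """Lower bounds on R quoted in the text: feasibility, and use of L2-optimal bandwidths."""
    return (Fr((m + 1) * (gamma + 1), gamma - 1),
            Fr((m + 1) * (3 * gamma + 1), 2 * (gamma - 1)))

def l2_optimal_exponents(m, R):
    """Exponents (e_y, e_w) with h*_yT ~ T^(-e_y) and h*_wT ~ T^(-e_w)."""
    return Fr(2 * R, (2 * R + m) * (2 * R + m + 1)), Fr(1, 2 * R + m)

def theorem6_ok(m, R, gamma, e_y, e_w, e_b):
    """Check the displayed conditions of Theorem 6 when h_yT ~ T^-e_y, h_wT ~ T^-e_w, b_T ~ T^-e_b.

    For pure powers of T, a_T << c_T holds exactly when the exponent of a_T is the smaller one.
    """
    q = Fr(1, 4)
    lower = [-q + e_y + m * e_w,    # exponent of 1 / (T^{1/4} h_yT h_wT^m)
             q - R * e_w,           # exponent of T^{1/4} h_wT^R
             q - R * e_y]           # exponent of T^{1/4} h_yT^R
    upper = -Fr(1, 4 * gamma)       # exponent of 1 / T^{1/(4 gamma)}
    return all(x < -e_b for x in lower) and -e_b < upper

print(r_thresholds(1, 6))                                          # thresholds for m = 1, gamma = 6
print(theorem6_ok(1, 3, 6, Fr(1, 10), Fr(1, 10), Fr(1, 21)))       # first example in the text
e_y, e_w = l2_optimal_exponents(1, 4)
print(e_y, e_w, theorem6_ok(1, 4, 6, e_y, e_w, Fr(1, 21)))         # second example in the text
```

With $m=1$, $R=3$ and $\gamma=6$, by contrast, the $L_2$-optimal exponents make $1/(T^{1/4}h_{yT}h_{wT}^m)$ of exact order $T^0$, so `theorem6_ok(1, 3, 6, *l2_optimal_exponents(1, 3), e_b)` returns `False` for every $e_b$, in line with the requirement $R>19/5$.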
Without assumption (A9)(i) we would not be able to construct a conditional quantile estimator that attains $\bar V_T^0$. Note, however, that our general expression for $V_T^0$ derived in Theorem 3 remains valid whether or not we are able to construct an efficient estimator; this is one of the advantages of using the “supremum” characterization of the semiparametric efficiency bound.
Our efficient M-estimator $\hat\theta_T$ is asymptotically equivalent to the ‘one-step’ estimator proposed by Newey and Powell (1990), the weighted quantile regression estimator of Zhao (2001), and the CEL estimator of Otsu (2003). Two important features distinguish our efficient estimator from the previous ones. First, similar to Otsu's (2003) CEL estimator, our M-estimator $\hat\theta_T$ does not require a preliminary consistent estimate of $\theta_0$. Such a preliminary step has been found to cause poor small-sample performance in GMM estimation (Altonji and Segal, 1996).$^{24}$ Second, the objective functions $\varphi(\cdot,\cdot,\hat\xi_t)$ used in the construction of $\hat\theta_T$ depend on a nonparametric estimator of the distribution function $F^0(\cdot|\cdot)$, whereas the efficient estimators of Newey and Powell (1990) and Zhao (2001) depend on nonparametric estimators of the density $f^0(\cdot|\cdot)$.$^{25}$ Both features can potentially affect the small-sample properties of these efficient estimators.

$^{22}$ As indicated in Lavergne and Vuong (1996, p. 209), we have $c_T^2=o\big(\int_{\{w:\,\bar g_T^0(w)<c_T\}}\bar g_T^0(w)\,dw\big)$ when $\bar g_T^0(\cdot)$ is continuously differentiable on $\mathbb{R}^m$ and monotonically decreasing in the tails, whether or not the support of $\bar g_T^0(\cdot)$ is bounded. Under the same conditions, it can be shown that $c_T=o\big(\int_{\{w:\,\bar g_T^0(w)<c_T\}}|w|\,\bar g_T^0(w)\,dw\big)$ when the support of $\bar g_T^0(\cdot)$ is $\mathbb{R}^m$. Hence, (A10)(ii) implies $\gamma<1$ when $q_\alpha(w,\theta_0)=w'\theta_0$, which contradicts $\gamma>1$. On the other hand, when the support of $\bar g_T^0(\cdot)$ is bounded (uniformly in $T$), it can be shown that $\int_{\{w:\,\bar g_T^0(w)<c_T\}}|w|\,\bar g_T^0(w)\,dw=O(c_T^{2-\delta})$, where $\delta>0$ can be arbitrarily close to zero depending on $\bar g_T^0(\cdot)$. Hence, our assumptions allow the linear quantile specification $q_\alpha(w,\theta_0)=w'\theta_0$, provided the support of $W_t$ is bounded (uniformly in $T$). Bounded supports, however, are not required, as our assumptions allow for unbounded ones. In this case, $f^0[q_\alpha(w,\theta_0)|w]$ and $|\nabla_\theta q_\alpha(w,\theta_0)|$ should vanish in the tails of $W_t$ at appropriate rates for (A10) and the trimming/bandwidth conditions to be compatible.
$^{23}$ See Stone (1980, 1982), where $h_{yT}^*$ solves $(h_{yT}^*h_{wT}^{*m})^{1/(m+1)}\propto T^{-1/(2R+m+1)}$.
6. Conclusion
The contributions of this paper are twofold. First, it derives the semiparametric efficiency bound $V_T^0$ for parameters of conditional quantiles in time series models with weakly dependent and/or heterogeneous data. Our bound $V_T^0$ generalizes expressions previously derived in the literature on efficient conditional quantile estimation; in particular, we allow the data to exhibit dependence and/or conditional heteroskedasticity. The second result of the paper is to show that efficient estimation is possible in models for conditional quantiles in which the true conditional distribution does not depend on any variables other than those entering the quantile. In such models, the semiparametric efficiency bound equals $\bar V_T^0$, and we are able to construct an M-estimator $\hat\theta_T$ that actually attains the bound. Our efficient estimator differs from previous ones and is of the MINPIN type, as the efficient M-objective function that it minimizes depends on a nonparametric estimator of the conditional distribution.
An interesting by-product of the paper is to show that the class of M-estimators is rich enough to contain estimators that are efficient, at least in models for conditional quantiles. In general, one can think of the class of GMM estimators as being the widest one. Then comes the class of M-estimators, which can be viewed as just-identified GMM estimators. Finally comes the class of QMLEs, that is, the class of M-estimators whose objective functions satisfy an additional “integrability” condition and can thus be interpreted as quasi-likelihoods. In models for conditional quantiles, efficient estimators do not belong to the class of QMLEs, but are contained in the class of M-estimators. Hence, at least from a semiparametric efficiency viewpoint, no advantage is gained by considering GMM over M-estimators. However, important efficiency improvements are made by going from QMLEs to M-estimators.

$^{24}$ In models with unconditional moment restrictions, Newey and Smith (2004) show how empirical likelihood based methods improve the finite-sample properties of GMM.
$^{25}$ In particular, when estimating $F^0(\cdot|\cdot)$ and $f^0(\cdot|\cdot)$ by kernel estimators, there is always one fewer smoothing parameter to choose for conditional distributions (Hansen, 2004a,b). For example, in the iid case, our efficient estimator $\hat\theta_T$ can be constructed by using the empirical distribution function.
Finally, the “supremum” approach we use to derive the semiparametric efficiency bound $V_T^0$ does not seem to require the strong independence assumptions traditionally imposed in the literature on efficient estimation. Our construction of the least favorable parametric submodel and the corresponding MLE does not depend on any particular dependence or heterogeneity structure of the data. We conjecture that it can thus be generalized fairly easily to accommodate general moment restrictions. The steps to follow in the construction of semiparametric efficiency bounds in models with time series data seem to be: (1) construct the largest class of M-estimators that are consistent for the true parameter $\theta_0$ of the conditional moment restriction at hand; (2) within this class, find the minimum asymptotic covariance matrix (this is a candidate matrix $V$ for the bound) and the M-estimator that attains this minimum; (3) use its expression to derive the least favorable parametric submodel of the initial semiparametric model; (4) show that the inverse of the Fisher information matrix in this submodel equals $V$. It then follows that $V$ is the semiparametric efficiency bound. While step (3) is perhaps the crucial one, we have little guidance on how exactly to construct the least favorable parametric submodel under general moment restrictions. This seems to be an important topic, which we leave for future research.
7. Proofs
Proof of Theorem 1. First, note that (A2)-(A3), together with the compactness of the parameter space $\Theta$, are sufficient conditions for $\theta_T$ to be consistent for $\theta_0^\infty\in\mathring\Theta$ (see, e.g., Theorem 2.1 in Newey and McFadden, 1994). We now show that, under the correct conditional quantile model specification assumption (A1), we have $\theta_0^\infty=\theta_0$ for any $T\ge1$ if and only if there exist a real function $A(\cdot,\xi_t):\mathbb{R}\to\mathbb{R}$, twice continuously differentiable and strictly increasing a.s.-$P$ on $Q$ with derivative $a(y,\xi_t)\equiv\partial A(y,\xi_t)/\partial y$, and a real function $B(\cdot,\xi_t):\mathbb{R}\to\mathbb{R}$, such that, for any $T\ge1$ and every $t$, $1\le t\le T$,
$$\varphi(Y_t,q_t,\xi_t)=[\alpha-\mathbb{1}(q_t-Y_t)][A(Y_t,\xi_t)-A(q_t,\xi_t)]+B(Y_t,\xi_t),\ \text{a.s.-}P,\ \text{on }\mathbb{R}\times Q\times E_t. \tag{10}$$
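The sufficiency direction of this equivalence can be illustrated numerically in the simplest possible setting: an iid sample with no conditioning variables, $B\equiv0$, and three hypothetical smooth, strictly increasing choices of $A$ (identity, tanh, arctan). In each case the sample average of (10) should be minimized, up to the grid resolution, at the sample $\alpha$-quantile, because its derivative in $q$ is $a(q)[\hat F_n(q)-\alpha]$. This is only an illustration of the first-order argument below under these assumptions, not part of the proof.

```python
import numpy as np

def phi(y, q, alpha, A):
    """Objective of form (10) with B = 0: [alpha - 1(q - y >= 0)] * [A(y) - A(q)]."""
    ind = (q - y >= 0).astype(float)
    return (alpha - ind) * (A(y) - A(q))

def grid_minimizer(y, alpha, A, grid):
    """Minimize the sample average of phi over a grid of candidate quantiles q."""
    values = np.array([phi(y, q, alpha, A).mean() for q in grid])
    return grid[np.argmin(values)]

rng = np.random.default_rng(0)
alpha = 0.25
y = rng.standard_normal(5000)
grid = np.linspace(-2.0, 2.0, 801)          # step 0.005

# Three strictly increasing, smooth choices of A; A(y) = y gives the usual check loss.
choices = {"identity": lambda v: v, "tanh": np.tanh, "arctan": np.arctan}
q_hats = {name: grid_minimizer(y, alpha, A, grid) for name, A in choices.items()}
q_sample = np.quantile(y, alpha)
print(q_hats, round(float(q_sample), 3))
```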
We treat separately the two implications contained in the above equivalence. We start with the sufficiency part of the proof and show that if, for any $T\ge1$ and every $t$, $1\le t\le T$, $\varphi(\cdot,\cdot,\xi_t)$ is as in equation (10) above, then $\theta_0^\infty=\theta_0$ for any $T\ge1$, i.e. $\theta_0$ is also a minimizer of $E[\Psi_T(\theta)]$ on $\mathring\Theta$. Given (A3)(i), we know that $\nabla_\theta E[\Psi_T(\theta)]=T^{-1}\sum_{t=1}^TE[\nabla_\theta\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t)]$. From (10) and the a.s.-$P$ twice continuous differentiability of $A(\cdot,\xi_t)$ on $Q$, for any $t$, $1\le t\le T$, $T\ge1$, we have
$$\begin{aligned}
E[\nabla_\theta\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t)]
&=E\{\nabla_\theta q_\alpha(W_t,\theta)\,a(q_\alpha(W_t,\theta),\xi_t)[\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)-\alpha]\}\\
&=E\{\nabla_\theta q_\alpha(W_t,\theta)\,a(q_\alpha(W_t,\theta),\xi_t)\,E[\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)-\alpha\,|\,W_t]\},
\end{aligned}$$
so that by using the correct model specification assumption (A1) we get $E[\mathbb{1}(q_\alpha(W_t,\theta_0)-Y_t)-\alpha\,|\,W_t]=0$, a.s.-$P$, for every $t$, $1\le t\le T$, $T\ge1$, and hence $\nabla_\theta E[\Psi_T(\theta_0)]=0$.
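Similarly, $\Delta_{\theta\theta}E[\Psi_T(\theta)]=T^{-1}\sum_{t=1}^TE[\Delta_{\theta\theta}\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t)]$ and
$$\begin{aligned}
E[\Delta_{\theta\theta}\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t)]
&=E\Big\{\Big[\frac{\partial a(q_\alpha(W_t,\theta),\xi_t)}{\partial y}\nabla_\theta q_\alpha(W_t,\theta)\nabla_\theta q_\alpha(W_t,\theta)'+a(q_\alpha(W_t,\theta),\xi_t)\Delta_{\theta\theta}q_\alpha(W_t,\theta)\Big][\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)-\alpha]\Big\}\\
&\quad+E\big[\nabla_\theta q_\alpha(W_t,\theta)\nabla_\theta q_\alpha(W_t,\theta)'\,a(q_\alpha(W_t,\theta),\xi_t)\,\delta(q_\alpha(W_t,\theta)-Y_t)\big]\\
&=E\Big\{\Big[\frac{\partial a(q_\alpha(W_t,\theta),\xi_t)}{\partial y}\nabla_\theta q_\alpha(W_t,\theta)\nabla_\theta q_\alpha(W_t,\theta)'+a(q_\alpha(W_t,\theta),\xi_t)\Delta_{\theta\theta}q_\alpha(W_t,\theta)\Big]E[\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)-\alpha\,|\,W_t]\Big\}\\
&\quad+E\big\{\nabla_\theta q_\alpha(W_t,\theta)\nabla_\theta q_\alpha(W_t,\theta)'\,a(q_\alpha(W_t,\theta),\xi_t)\,E[\delta(q_\alpha(W_t,\theta)-Y_t)\,|\,W_t]\big\},
\end{aligned}$$
so that, by using (A1),
$$\begin{aligned}
\Delta_{\theta\theta}E[\Psi_T(\theta_0)]
&=T^{-1}\sum_{t=1}^TE\big\{\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\,a(q_\alpha(W_t,\theta_0),\xi_t)\,E[\delta(q_\alpha(W_t,\theta_0)-Y_t)\,|\,W_t]\big\}\\
&=T^{-1}\sum_{t=1}^TE\big[\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\,a(q_\alpha(W_t,\theta_0),\xi_t)\,f_t^0(q_\alpha(W_t,\theta_0))\big],
\end{aligned}\tag{11}$$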
where, for every $t$, $1\le t\le T$, $f_t^0(\cdot)$ is the true probability density function of $Y_t$ conditional on $W_t$. We now show that $\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\gg0$, i.e. that it is positive definite. By using (11), we know that, for any $\chi\in\mathbb{R}^k$, $\chi'\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\chi=0$ only if $T^{-1}\sum_{t=1}^TE[\chi'\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\chi\,a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))]=0$. Now, note that for any $t$, $1\le t\le T$, and $T\ge1$,
$$\begin{aligned}
&E[\chi'\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\chi\,a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))]\\
&\qquad=E[(\chi'\nabla_\theta q_\alpha(W_t,\theta_0))^2a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))]\ge0,
\end{aligned}\tag{12}$$
for any $\chi\in\mathbb{R}^k$, since we know that $a(q_\alpha(W_t,\theta_0),\xi_t)>0$, a.s.-$P$, and $f_t^0(q_\alpha(W_t,\theta_0))>0$, a.s.-$P$. Taking into account the inequality in (12), we have that, for any $\chi\in\mathbb{R}^k$, $\chi'\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\chi=0$ only if $E[(\chi'\nabla_\theta q_\alpha(W_t,\theta_0))^2a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))]=0$ for all $t$, $1\le t\le T$, $T\ge1$. Using again the strict positivity of $a(\cdot,\xi_t)$ and $f_t^0(\cdot)$, this last equality is true only if $\chi'\nabla_\theta q_\alpha(W_t,\theta_0)=0$, a.s.-$P$, for every $t$, $1\le t\le T$, $T\ge1$. This, together with (A0)(iii), implies that $\chi=0$. From there we conclude that $\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\gg0$ and therefore that $\theta_0$ is a minimizer of $E[\Psi_T(\theta)]$ on $\mathring\Theta$. Since by (A3)(ii) this minimizer is unique, we have that, for any $T\ge1$, $\theta_0^\infty=\theta_0$, which completes the sufficiency part of the proof.
We now show that the functional form of $\varphi(\cdot,\cdot,\xi_t)$ in (10) is necessary for $\theta_0^\infty=\theta_0$ to hold for any $T\ge1$. Given the differentiability of $E[\Psi_T(\theta)]$ on $\Theta$ by (A3)(i), a necessary requirement for $\theta_0^\infty=\theta_0$ is that the first-order condition $\nabla_\theta E[\Psi_T(\theta_0)]=0$ be satisfied, which is equivalent to
$$T^{-1}\sum_{t=1}^TE\Big\{\nabla_\theta q_\alpha(W_t,\theta_0)\,E\Big[\frac{\partial\varphi}{\partial q_t}(Y_t,q_\alpha(W_t,\theta_0),\xi_t)\,\Big|\,W_t\Big]\Big\}=0.$$
Since the above equality needs to hold for any $T\ge1$, any choice of conditional quantile model $\mathcal{M}$ and any true parameter $\theta_0\in\mathring\Theta$, we need to find a necessary condition for the implication
$$E[\mathbb{1}(q_\alpha(W_t,\theta_0)-Y_t)-\alpha\,|\,W_t]=0,\ \text{a.s.-}P\;\Longrightarrow\;E\Big[\frac{\partial\varphi}{\partial q_t}(Y_t,q_\alpha(W_t,\theta_0),\xi_t)\,\Big|\,W_t\Big]=0,\ \text{a.s.-}P, \tag{13}$$
to hold for all $t$, $1\le t\le T$, $T\ge1$, and all absolutely continuous distribution functions $F_t^0$ in $\mathcal{F}$. We now show that
$$\frac{\partial\varphi}{\partial q_t}(Y_t,q_\alpha(W_t,\theta_0),\xi_t)=a(q_\alpha(W_t,\theta_0),\xi_t)[\mathbb{1}(q_\alpha(W_t,\theta_0)-Y_t)-\alpha],\ \text{a.s.-}P, \tag{14}$$
for any $\theta_0\in\mathring\Theta$ and any $t$, $1\le t\le T$, $T\ge1$, where $a(\cdot,\xi_t):\mathbb{R}\to\mathbb{R}$ is strictly positive a.s.-$P$ on $Q$, is a necessary condition for (13). Using a generalized Farkas lemma (Gourieroux and Monfort, 1995, vol. 1, Lemma 8.1, p. 240), (13) implies that there exists a $W_t$-measurable random variable $a_t$ such that
$$\frac{\partial\varphi}{\partial q_t}(Y_t,q_\alpha(W_t,\theta_0),\xi_t)=a_t[\mathbb{1}(q_\alpha(W_t,\theta_0)-Y_t)-\alpha],\ \text{a.s.-}P.$$
Since the left-hand side only depends on $Y_t$, $q_\alpha(W_t,\theta_0)$ and $\xi_t$, the same must hold for the right-hand side. Hence, $a_t$ can only depend on $q_\alpha(W_t,\theta_0)$ and $\xi_t$, and we can write $a_t=a(q_\alpha(W_t,\theta_0),\xi_t)$; so the equality in (14) holds.
We now need to show that $a(\cdot,\xi_t)$ is strictly positive a.s.-$P$ on $Q$. A necessary condition for $\theta_0\in\mathring\Theta$ to be a minimizer of $E[\Psi_T(\theta)]$ (in addition to the above first-order condition) is that, for every $\chi\in\mathbb{R}^k$, the quadratic form satisfies $\chi'\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\chi\ge0$ (existence of $\Delta_{\theta\theta}E[\Psi_T(\theta)]$ is ensured by (A3)(i)).$^{26}$ Taking into account (14) and our previous computations leading to (11), we have
$$\chi'\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\chi=T^{-1}\sum_{t=1}^T\chi'E[\Delta_{\theta\theta}\varphi(Y_t,q_\alpha(W_t,\theta_0),\xi_t)]\chi=T^{-1}\sum_{t=1}^TE[(\chi'\nabla_\theta q_\alpha(W_t,\theta_0))^2a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))].$$
Hence, the quadratic form $\chi'\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\chi$ is nonnegative for any $T\ge1$, any conditional quantile model $\mathcal{M}$, any true value $\theta_0\in\mathring\Theta$ and any conditional density $f_t^0(\cdot)$ only if $a(q_\alpha(W_t,\theta_0),\xi_t)\ge0$, a.s.-$P$, for all $t$, $1\le t\le T$, $T\ge1$. Note that the uniqueness of the solution $\theta_0$ implies that $a(q_t,\xi_t)>0$, a.s.-$P$, for any $q_t\in Q$ and for all $t$, $1\le t\le T$, $T\ge1$.
The remainder of the proof is straightforward: we need to integrate the necessary condition (14) with respect to $q_t$. Note that (14) can be written
$$\frac{\partial\varphi}{\partial q_t}(Y_t,q_\alpha(W_t,\theta_0),\xi_t)=\begin{cases}(1-\alpha)\,a(q_\alpha(W_t,\theta_0),\xi_t), & \text{if } Y_t\le q_\alpha(W_t,\theta_0),\\ -\alpha\,a(q_\alpha(W_t,\theta_0),\xi_t), & \text{if } Y_t>q_\alpha(W_t,\theta_0),\end{cases}\quad\text{a.s.-}P,$$
for any $\theta_0\in\mathring\Theta$ and for any $t$, $1\le t\le T$, $T\ge1$. Together with the continuity of $\varphi(Y_t,\cdot,\xi_t)$ a.s.-$P$ on $Q$ in (A2)(ii), the above integrates into
$$\varphi(Y_t,q_\alpha(W_t,\theta_0),\xi_t)=B(Y_t,\xi_t)+\begin{cases}(1-\alpha)[A(q_\alpha(W_t,\theta_0),\xi_t)-A(Y_t,\xi_t)], & \text{if } Y_t\le q_\alpha(W_t,\theta_0),\\ -\alpha[A(q_\alpha(W_t,\theta_0),\xi_t)-A(Y_t,\xi_t)], & \text{if } Y_t>q_\alpha(W_t,\theta_0),\end{cases}$$
a.s.-$P$, where, for every $t$, $1\le t\le T$, $T\ge1$, $A(\cdot,\xi_t)$ is an indefinite integral of $a(\cdot,\xi_t)$, $A(q_t,\xi_t)\equiv\int_{\underline a}^{q_t}a(r,\xi_t)\,dr$ for some constant $\underline a\in\mathbb{R}$, and $B(\cdot,\xi_t):\mathbb{R}\to\mathbb{R}$ is a real function. Note that the above equality has to hold for any $\theta_0\in\mathring\Theta$, so that
$$\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t)=B(Y_t,\xi_t)+[\alpha-\mathbb{1}(q_\alpha(W_t,\theta)-Y_t)][A(Y_t,\xi_t)-A(q_\alpha(W_t,\theta),\xi_t)],\ \text{a.s.-}P, \tag{15}$$
for every $t$, $1\le t\le T$, $T\ge1$, and for all $\theta\in\Theta$; this is a necessary condition for the M-estimator $\theta_T$ to be consistent for $\theta_0$. Equality (15) implies that, for any $t$, $1\le t\le T$, $T\ge1$,
$$\varphi(Y_t,q_t,\xi_t)=B(Y_t,\xi_t)+[\alpha-\mathbb{1}(q_t-Y_t)][A(Y_t,\xi_t)-A(q_t,\xi_t)],\ \text{a.s.-}P\text{ on }\mathbb{R}\times Q\times E_t.$$
$\square$

$^{26}$ Note that this requirement is weaker than the positive definiteness of $\Delta_{\theta\theta}E[\Psi_T(\theta_0)]$, $\Delta_{\theta\theta}E[\Psi_T(\theta_0)]\gg0$, which is a sufficient condition for $\theta_0$ to be a minimum.
Proof of Theorem 2. To show that Theorem 2 holds, we first show that, under the primitive conditions given in (A0)-(A2) and (A4)-(A7), $\theta_T$ is consistent for $\theta_0$, i.e. $\theta_T-\theta_0\xrightarrow{p}0$. We proceed by checking that all the assumptions for consistency used by Komunjer (2005b) in her Theorem 3 hold. Given that her proof of consistency for the family of tick-exponential QMLEs derived in Theorem 3 does not require any assumptions on the limits at $\pm\infty$ of the functions $A(\cdot,\xi_t)$, it applies directly to the M-estimator $\theta_T$ defined in (A2). Assumptions A2 and A3 in Komunjer (2005b) are satisfied by imposing our (A5) and (A4), respectively. The α-mixing condition A4 in Komunjer (2005b), and the assumption stated in A0.iv of Komunjer (2005b) that $W_t$ is a function of some finite number of lags of $X_t$, are used to ensure that $\{(Y_t,W_t')'\}$ is α-mixing with α of the same size $-r/(r-2)$, $r>2$. Here, we directly impose the mixing of the sequence $\{(Y_t,W_t')'\}$ in our (A6), which is sufficient for the proof of Theorem 3 in Komunjer (2005b) to go through. Finally, the moment conditions A5 in Komunjer (2005b) directly follow from our (A7) and the fact that $E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|]\le\max\{1,E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|^2]\}<\infty$. Hence we can use the results of Theorem 3 in Komunjer (2005b), corresponding to the case where the conditional quantile model is correctly specified (A1), which proves the consistency of $\theta_T$. Similarly, we derive asymptotic normality by using the results of Corollary 5 in Komunjer (2005b). The boundedness of the second derivative of $A(\cdot,\xi_t)$ contained in assumption A3' in Komunjer (2005b) is directly implied by (A4). The moment condition in assumption A5' in Komunjer (2005b) follows from our (A7). Finally, in our setup we have assumed that the true conditional density $f_t^0(\cdot)$ of $Y_t$ is strictly positive and bounded on $\mathbb{R}$, which verifies assumption A6 in Komunjer (2005b). Hence, from Corollary 5 in Komunjer (2005b), we know that $\sqrt{T}(\Sigma_T^0)^{-1/2}\Delta_T^0(\theta_T-\theta_0)\xrightarrow{d}N(0,Id)$, where
$$\Delta_T^0=T^{-1}\sum_{t=1}^TE\big[a(q_\alpha(W_t,\theta_0),\xi_t)f_t^0(q_\alpha(W_t,\theta_0))\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big], \tag{16}$$
and
$$\Sigma_T^0=T^{-1}\sum_{t=1}^T\alpha(1-\alpha)E\big[(a(q_\alpha(W_t,\theta_0),\xi_t))^2\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big]. \tag{17}$$
$\square$
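Proof of Theorem 3. The proof of this theorem is inspired by a similar result in Gourieroux, Monfort, and Trognon (1984). Let $V_T^0$ be as defined in Theorem 3 and consider the difference $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}-V_T^0$. We show that this difference is positive semidefinite for any $A(\cdot,\xi_t)$, $1\le t\le T$, $T\ge1$, as in Theorem 1:
$$\begin{aligned}
&(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}-V_T^0\\
&=V_T^0(V_T^0)^{-1}V_T^0-V_T^0\Delta_T^0(\Delta_T^0)^{-1}-(\Delta_T^0)^{-1}\Delta_T^0V_T^0+(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}\\
&=T^{-1}\sum_{t=1}^TE\Big\{V_T^0\Big[\frac{(f_t^0(q_\alpha(W_t,\theta_0)))^2}{\alpha(1-\alpha)}\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\Big]V_T^0\\
&\qquad-V_T^0\big[f_t^0(q_\alpha(W_t,\theta_0))a(q_\alpha(W_t,\theta_0),\xi_t)\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big](\Delta_T^0)^{-1}\\
&\qquad-(\Delta_T^0)^{-1}\big[f_t^0(q_\alpha(W_t,\theta_0))a(q_\alpha(W_t,\theta_0),\xi_t)\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big]V_T^0\\
&\qquad+(\Delta_T^0)^{-1}\big[\alpha(1-\alpha)(a(q_\alpha(W_t,\theta_0),\xi_t))^2\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big](\Delta_T^0)^{-1}\Big\},
\end{aligned}$$
so that
$$(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}-V_T^0=\frac{1}{\alpha(1-\alpha)}\,T^{-1}\sum_{t=1}^TE[\chi_t\chi_t'],$$
where, for every $t$, $1\le t\le T$, $T\ge1$, we let
$$\chi_t\equiv\big[f_t^0(q_\alpha(W_t,\theta_0))V_T^0-\alpha(1-\alpha)a(q_\alpha(W_t,\theta_0),\xi_t)(\Delta_T^0)^{-1}\big]\nabla_\theta q_\alpha(W_t,\theta_0),$$
and $a(y,\xi_t)\equiv\partial A(y,\xi_t)/\partial y$ as previously. Hence, for any $A(\cdot,\xi_t)$, $1\le t\le T$, $T\ge1$, such that $a(\cdot,\xi_t)>0$, a.s.-$P$ on $Q$, the matrix $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}-V_T^0$ is positive semidefinite. In other words, the matrix $V_T^0$ is the lower bound of the set of asymptotic covariance matrices $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}$ obtained with functions $A(\cdot,\xi_t)$ satisfying the conditions of Theorem 1.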
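The inequality just established can be checked numerically. The Python sketch below assumes a hypothetical location-scale design ($W_t\sim U(0,1)$, conditional scale $0.5+W_t$, Gaussian errors, linear $q_\alpha$) and replaces $T^{-1}\sum_tE[\cdot]$ by an average over simulated $W_t$'s; the weight functions $a$ are arbitrary positive choices and the helper `avar_and_bound` is hypothetical. Because the algebra above holds for any averaging measure, the gap should be positive semidefinite up to rounding error, and zero when $a$ is proportional to $f_t^0$.

```python
import numpy as np
from statistics import NormalDist

def avar_and_bound(a, f, grad_q, alpha):
    """Sandwich (Delta^-1 Sigma Delta^-1) from (16)-(17) and the bound V, with T^-1 sum_t E[.]
    replaced by an average over simulated W_t (a hypothetical stand-in for the expectations)."""
    outer = grad_q[:, :, None] * grad_q[:, None, :]                       # (n, k, k)
    Delta = np.mean((a * f)[:, None, None] * outer, axis=0)
    Sigma = alpha * (1 - alpha) * np.mean((a**2)[:, None, None] * outer, axis=0)
    V = alpha * (1 - alpha) * np.linalg.inv(np.mean((f**2)[:, None, None] * outer, axis=0))
    Delta_inv = np.linalg.inv(Delta)
    return Delta_inv @ Sigma @ Delta_inv, V

# Hypothetical location-scale model: q_alpha(w, theta) = theta_1 + theta_2 w, W ~ U(0, 1),
# conditional scale 0.5 + w, Gaussian errors, so f_t^0(q_t) = phi(z_alpha) / (0.5 + W_t).
rng = np.random.default_rng(0)
n, alpha = 10_000, 0.25
w = rng.uniform(0.0, 1.0, n)
grad_q = np.column_stack([np.ones(n), w])
z = NormalDist().inv_cdf(alpha)
f = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) / (0.5 + w)

gaps = {}
weights = {"a = 1 (check loss)": np.ones(n), "a = 1 + w^2": 1.0 + w**2, "a = f (efficient)": f}
for name, a in weights.items():
    avar, V = avar_and_bound(a, f, grad_q, alpha)
    gaps[name] = avar - V
    sym = (gaps[name] + gaps[name].T) / 2.0
    print(f"{name}: smallest eigenvalue of the gap = {np.linalg.eigvalsh(sym).min():.3e}")
```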
We now show that this lower bound $V_T^0$ is attained by an M-estimator $\theta_T^*$ if and only if its objective function corresponds to $\Psi_T^*(\theta)\equiv T^{-1}\sum_{t=1}^T\varphi(Y_t,q_\alpha(W_t,\theta),\xi_t^*)$ with
$$\varphi(Y_t,q_t,\xi_t^*)=[\alpha-\mathbb{1}(q_t-Y_t)][F_t^0(Y_t)-F_t^0(q_t)],\ \text{a.s.-}P, \tag{18}$$
on $\mathbb{R}\times Q\times E_t$, for every $t$, $1\le t\le T$, $T\ge1$. We first show the necessary part of this equivalence: $V_T^0$ is attained only if, for any $T\ge1$, there exist $\xi_t^*$ and $A(\cdot,\xi_t^*)$, $1\le t\le T$, such that
$$T^{-1}\sum_{t=1}^TE(\chi_t\chi_t')=0. \tag{19}$$
The above needs to hold for any $T\ge1$; hence (19) implies that $E(\chi_t\chi_t')=0$ for every $t$, $1\le t\le T$, $T\ge1$. Since the matrix $\chi_t\chi_t'$ is positive semidefinite a.s.-$P$, these equalities hold only if, for every $t$, $1\le t\le T$, $T\ge1$, we have $\chi_t\chi_t'=0$, a.s.-$P$. Hence, $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}=V_T^0$ for any $T\ge1$ if and only if, for every $t$, $1\le t\le T$, $T\ge1$, $\chi_t=0$, a.s.-$P$, which combined with (A0)(iii) is equivalent to
$$\frac{f_t^0(q_\alpha(W_t,\theta_0))}{\alpha(1-\alpha)\,a(q_\alpha(W_t,\theta_0),\xi_t^*)}\,V_T^0\Delta_T^0=Id,\ \text{a.s.-}P,$$
for all $t$, $1\le t\le T$, $T\ge1$. This in turn implies that, for every $t$, $1\le t\le T$, $T\ge1$, and any $q_t\in Q$,
$$a(q_t,\xi_t^*)=c\,\frac{f_t^0(q_t)}{\alpha(1-\alpha)}\quad\text{and}\quad V_T^0\Delta_T^0=c\cdot Id,$$
where $c$ is some strictly positive real constant, $c>0$. Note that the above condition is equivalent to $a(q_t,\xi_t^*)=cf_t^0(q_t)/[\alpha(1-\alpha)]$ alone, which by integration with respect to $q_t$ gives that, for every $t$, $1\le t\le T$, $T\ge1$, and any $q_t\in Q$,
$$A(q_t,\xi_t^*)=c\,\frac{F_t^0(q_t)}{\alpha(1-\alpha)}+d, \tag{20}$$
with $d\in\mathbb{R}$. Condition (20) is both a necessary and a sufficient condition for the equality in (19) to hold for any $T\ge1$. It is important to note that changing the value of $A(\cdot,\xi_t^*)$ outside $Q$ does not affect the minima of $E[\Psi_T]$, so $A(\cdot,\xi_t^*)$ can take arbitrary values on $\mathbb{R}\setminus Q$. To keep the notation simple, and without altering the general validity of our result, we set $A(y,\xi_t^*)=cF_t^0(y)/[\alpha(1-\alpha)]+d$ for all $y\in\mathbb{R}$. Moreover, changing the constants $c$ and $d$ does not affect the value of $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}$, so that they can be arbitrarily chosen in $\mathbb{R}_+^*\times\mathbb{R}$ for any $T\ge1$. For example, we can let $c=\alpha(1-\alpha)$ and $d=0$, in which case
$$A(y,\xi_t^*)=F_t^0(y), \tag{21}$$
for all $y\in\mathbb{R}$; this completes the proof of the necessary part.
Now, we show that, under (A0)-(A1), (A5)-(A6) and (A7)(i), the M-estimator $\theta_T^*$ obtained by minimizing $\Psi_T^*(\theta)$ associated with (18) is such that $\sqrt{T}(V_T^0)^{-1/2}(\theta_T^*-\theta_0)\xrightarrow{d}N(0,Id)$. Note that the shape $\xi_t^*$ of $A(\cdot,\xi_t^*)$ corresponds to the true conditional distribution $F_t^0(\cdot)$, which is stochastic and $W_t$-measurable, thereby satisfying (A2)(i). Moreover, $F_t^0(\cdot)$ is twice continuously differentiable with bounded $f_t^0(y)$ and $|df_t^0(y)/dy|$, which satisfies (A2)(ii) and (A4). In addition, since $F_t^0(\cdot)$ is bounded by 1, the moment conditions in (A7)(ii) automatically hold. Hence, we can apply Theorem 2 to show that, under (A0)-(A1), (A5)-(A6) and (A7)(i), $\theta_T^*$ with $A(\cdot,\xi_t^*)$ as in (21) is asymptotically normally distributed, $\sqrt{T}(\Sigma_T^0)^{-1/2}\Delta_T^0(\theta_T^*-\theta_0)\xrightarrow{d}N(0,Id)$, with
$$\Delta_T^0=T^{-1}\sum_{t=1}^TE\big\{[f_t^0(q_\alpha(W_t,\theta_0))]^2\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\big\},$$
and $\Sigma_T^0=\alpha(1-\alpha)\Delta_T^0$, so that $(\Delta_T^0)^{-1}\Sigma_T^0(\Delta_T^0)^{-1}=V_T^0$.
$\square$
Proof of Theorem 4. The following lemma shows that (i)-(iii) in Theorem 4 hold:
Lemma 7. The parametric submodel $P^*$ defined by (6) is a submodel of $S$.
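A property one would naturally expect of the submodel in Lemma 7 is that each $f_t^*(\cdot,\theta)$ is a proper density whose $\alpha$-quantile is $q_t(\theta)$. A quick numerical check is sketched below: it exponentiates the expression for $\ln f_t^*(Y_t,\theta)$ displayed in STEP 1 of the proof, assumes a standard logistic $F_t^0$ for convenience, and treats $q_t(\theta)$ and $\lambda(\theta)>0$ as arbitrary numbers, since (6) and $\lambda(\cdot)$ are not restated here. The helper names are hypothetical.

```python
import numpy as np

def F0(y):
    """Assumed F_t^0: standard logistic CDF (stable form)."""
    return np.exp(-np.logaddexp(0.0, -y))

def f0(y):
    p = F0(y)
    return p * (1.0 - p)                              # logistic density

def f_star(y, q, lam, alpha):
    """exp of ln f_t^*(y, theta) as displayed in STEP 1, with q = q_t(theta), lam = lambda(theta) > 0."""
    ind = (q - y >= 0).astype(float)                  # 1I(q_t(theta) - y)
    x = (ind - alpha) * (1.0 - ind - F0(q))           # x(theta) from (22); always negative
    num = alpha * (1.0 - alpha) * f0(y) * lam * np.exp(lam * (F0(y) - F0(q)) * (ind - alpha))
    return num / (-np.expm1(lam * x))                 # divide by 1 - exp(lam * x)

def midpoint_integral(fun, lo, hi, n=400_000):
    h = (hi - lo) / n
    return float(fun(lo + h * (np.arange(n) + 0.5)).sum() * h)

alpha = 0.25
results = {}
for q, lam in [(0.3, 2.0), (-1.0, 0.5), (1.5, 5.0)]:   # arbitrary q_t(theta) and lambda(theta) values
    below = midpoint_integral(lambda y: f_star(y, q, lam, alpha), -40.0, q)
    above = midpoint_integral(lambda y: f_star(y, q, lam, alpha), q, 40.0)
    results[(q, lam)] = (below, below + above)
    print(f"q = {q}, lambda = {lam}: P(Y <= q) = {below:.6f}, total mass = {below + above:.6f}")
```

The substitution $u=F_t^0(y)$ shows the same thing analytically: the mass below $q_t(\theta)$ equals $\alpha$ and the mass above equals $1-\alpha$, for any $\lambda(\theta)>0$.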
In order to show that $P^*$ is the least favorable model, consider estimating the parameter $\theta$ in $P^*$ by using the MLE $\tilde\theta_T^*$, which maximizes the log-likelihood $L_T(\theta)\equiv T^{-1}\sum_{t=1}^T\ln f_t^*(Y_t,\theta)$.
STEP 1: First, we establish the consistency of $\tilde\theta_T^*$ by checking that conditions (i)-(iv) of Theorem 2.1 in Newey and McFadden (1994) hold. Given (A0)(i), we know that $\ln f_t^*(Y_t,\theta)\neq\ln f_t^*(Y_t,\theta_0)$, a.s.-$P$, whenever $\theta\neq\theta_0$ (see Figure 1 for an example); this verifies the uniqueness condition (i) of Theorem 2.1. The compactness condition (ii) of Theorem 2.1 follows by assumption. Using $q_t(\theta)=q_\alpha(W_t,\theta)$, we have
$$\begin{aligned}
\ln f_t^*(Y_t,\theta)&=\ln[\alpha(1-\alpha)f_t^0(Y_t)]+\ln\lambda(\theta)+\lambda(\theta)[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]\\
&\quad-\ln\big(1-\exp\{\lambda(\theta)[\mathbb{1}(q_t(\theta)-Y_t)-\alpha][1-\mathbb{1}(q_t(\theta)-Y_t)-F_t^0(q_t(\theta))]\}\big),
\end{aligned}$$
showing that $E[\ln f_t^*(Y_t,\theta)]$ is continuous on $\Theta$ and that $E[\sup_{\theta\in\Theta}|\ln f_t^*(Y_t,\theta)|^{r+\varepsilon}]<\infty$ for all $t$, $1\le t\le T$, $T\ge1$, and $\varepsilon>0$; this verifies condition (iii) of Theorem 2.1. We show the uniform convergence condition (iv) of Theorem 2.1 by following the same steps as in the proof of Theorem 3 in Komunjer (2005b). To simplify the notation, let
$$x(\theta)\equiv[\mathbb{1}(q_t(\theta)-Y_t)-\alpha][1-\mathbb{1}(q_t(\theta)-Y_t)-F_t^0(q_t(\theta))]\quad\text{and}\quad u(z)\equiv\frac{\exp z}{1-\exp z}, \tag{22}$$
for $\theta\in\Theta$ and $z\in\mathbb{R}_-$. Note that $-1<x(\theta)<0$ and $-\lambda(\theta)<\lambda(\theta)x(\theta)<0$ on $\Theta$, a.s.-$P$. We have
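$$\begin{aligned}
\nabla_\theta\ln f_t^*(Y_t,\theta)&=\frac{\nabla_\theta\lambda(\theta)}{\lambda(\theta)}+\nabla_\theta\lambda(\theta)[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]\\
&\quad-\lambda(\theta)f_t^0(q_t(\theta))\nabla_\theta q_t(\theta)[\mathbb{1}(q_t(\theta)-Y_t)-\alpha]\\
&\quad+\lambda(\theta)[F_t^0(Y_t)-F_t^0(q_t(\theta))]\delta(q_t(\theta)-Y_t)\nabla_\theta q_t(\theta)\\
&\quad+u(\lambda(\theta)x(\theta))\nabla_\theta(\lambda(\theta)x(\theta)),
\end{aligned}\tag{23}$$
where $\nabla_\theta(\lambda(\theta)x(\theta))=\nabla_\theta\lambda(\theta)x(\theta)+\lambda(\theta)\nabla_\theta x(\theta)$ and
$$\nabla_\theta x(\theta)=\big\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\big\}\nabla_\theta q_t(\theta). \tag{24}$$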
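(The equality in (24) follows from (22) and the fact that $[\mathbb{1}(\cdot)]^2=\mathbb{1}(\cdot)$.) Note that $u(z)=-1/z-1/2+o(1)$ in the neighborhood of 0 and that $\lambda(\theta)x(\theta)=o_p(1)$ in the neighborhood of $\theta_0$, so
$$u(\lambda(\theta)x(\theta))\nabla_\theta(\lambda(\theta)x(\theta))=-\frac{\nabla_\theta(\lambda(\theta)x(\theta))}{\lambda(\theta)x(\theta)}+o_p(1)=-\frac{\nabla_\theta\lambda(\theta)}{\lambda(\theta)}-\frac{\nabla_\theta x(\theta)}{x(\theta)}+o_p(1), \tag{25}$$
in the neighborhood of $\theta_0$. In particular, combining (23), (25), (24) and (22), we get
$$\begin{aligned}
\nabla_\theta\ln f_t^*(Y_t,\theta_0)&=-\nabla_\theta q_t(\theta_0)\,\frac{f_t^0(q_t(\theta_0))[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]+\delta(q_t(\theta_0)-Y_t)[\alpha-F_t^0(q_t(\theta_0))]}{[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha][1-\mathbb{1}(q_t(\theta_0)-Y_t)-F_t^0(q_t(\theta_0))]}\\
&=-\frac{1}{\alpha(1-\alpha)}\nabla_\theta q_t(\theta_0)f_t^0(q_t(\theta_0))[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha],
\end{aligned}\tag{26}$$
where the second equality uses $x(\theta_0)=-\alpha(1-\alpha)$ and $F_t^0(q_t(\theta_0))=\alpha$.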
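As a side calculation (not a step reproduced from the paper), squaring the score (26) links the information in $P^*$ to $V_T^0$. Under (A1), $E[\mathbb{1}(q_t(\theta_0)-Y_t)\,|\,W_t]=\alpha$, hence $E[(\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha)^2\,|\,W_t]=\alpha-2\alpha^2+\alpha^2=\alpha(1-\alpha)$, and

```latex
\begin{aligned}
E\big[\nabla_\theta\ln f_t^*(Y_t,\theta_0)\nabla_\theta\ln f_t^*(Y_t,\theta_0)'\,\big|\,W_t\big]
&=\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha^2(1-\alpha)^2}\,\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\,
  E\big[(\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha)^2\,\big|\,W_t\big]\\
&=\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\,\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)',\\
T^{-1}\sum_{t=1}^T E\big[\nabla_\theta\ln f_t^*(Y_t,\theta_0)\nabla_\theta\ln f_t^*(Y_t,\theta_0)'\big]
&=\frac{1}{\alpha(1-\alpha)}\,T^{-1}\sum_{t=1}^T E\big\{[f_t^0(q_t(\theta_0))]^2\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\big\}
 =(V_T^0)^{-1}.
\end{aligned}
```

The last equality uses $V_T^0=\alpha(1-\alpha)(\Delta_T^0)^{-1}$ with $\Delta_T^0=T^{-1}\sum_tE\{[f_t^0(q_\alpha(W_t,\theta_0))]^2\nabla_\theta q_\alpha(W_t,\theta_0)\nabla_\theta q_\alpha(W_t,\theta_0)'\}$, as obtained at the end of the proof of Theorem 3.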
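Using $-1<x(\theta)<0$ on $\Theta$, a.s.-$P$, we obtain
$$\Big|\frac{\nabla_\theta\lambda(\theta)}{\lambda(\theta)}\{1+\lambda(\theta)x(\theta)u(\lambda(\theta)x(\theta))\}\Big|\le|x(\theta)\nabla_\theta\lambda(\theta)|,$$
and we then have
$$\sup_{\theta\in\Theta}|\nabla_\theta\ln f_t^*(Y_t,\theta)|\le2\sup_{\theta\in\Theta}|\nabla_\theta\lambda(\theta)|+\sup_{\theta\in\Theta}\big\{|\lambda(\theta)|M_0|\nabla_\theta q_t(\theta)|\big\}+C_1\sup_{\theta\in\Theta}\Big|\frac{f_t^0(q_t(\theta))\nabla_\theta q_t(\theta)}{1-\mathbb{1}(q_t(\theta)-Y_t)-F_t^0(q_t(\theta))}\Big|,\ \text{a.s.-}P, \tag{27}$$
where $C_1\equiv\sup_{x\in[0,\,\sup_{\theta\in\Theta}\lambda(\theta)]}|x/(1-\exp(-x))|<\infty$. We have $F_t^0(q_t(\theta))\in(a,b)$ for all $t\ge1$ and $\theta\in\Theta$, with $a>0$ and $b<1$, so $C_2\equiv\sup_{t\ge1}\sup_{y\in\mathbb{R}}\sup_{\theta\in\Theta}\big(|1-\mathbb{1}(q_t(\theta)-y)-F_t^0(q_t(\theta))|^{-1}\big)<\infty$, and the last term of the above inequality is bounded above by $C_1C_2M_0\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|$. From (A7)(i) we know that $E[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|]<\infty$, so $E[\sup_{\theta\in\Theta}|\nabla_\theta\ln f_t^*(Y_t,\theta)|]<\infty$ for all $t$, $1\le t\le T$, $T\ge1$, which shows that equation (25) in Komunjer (2005b) holds; together with (A6) and $E[\sup_{\theta\in\Theta}|\ln f_t^*(Y_t,\theta)|^{r+\varepsilon}]<\infty$ for all $t$, $1\le t\le T$, $T\ge1$, this establishes condition (iv) of Theorem 2.1 and completes the proof of consistency.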
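The two properties of $u(z)$ used in (25) and in the Hessian computation of STEP 2, and the value $x(\theta_0)=-\alpha(1-\alpha)$ used in (26), are easy to confirm numerically. The short Python sketch below does so at a few arbitrary points; the helper names are hypothetical.

```python
import math

def u(z):
    """u(z) = exp(z) / (1 - exp(z)) from (22), for z < 0; expm1 keeps precision near zero."""
    return math.exp(z) / (-math.expm1(z))

def x_value(ind, F_q, alpha):
    """x(theta) from (22), given the indicator value ind and F_t^0(q_t(theta)) = F_q."""
    return (ind - alpha) * (1.0 - ind - F_q)

# Expansion used in (25): u(z) + 1/z + 1/2 -> 0 as z -> 0 from below.
remainders = [u(z) + 1.0 / z + 0.5 for z in (-1e-1, -1e-2, -1e-3)]

# Identity used in STEP 2: du/dz = u + u^2, checked by a central finite difference.
z0, h = -0.7, 1e-5
du_numeric = (u(z0 + h) - u(z0 - h)) / (2.0 * h)
du_identity = u(z0) + u(z0) ** 2

# At theta_0, F_t^0(q_t) = alpha, so x(theta_0) = -alpha (1 - alpha) whatever the indicator.
alpha = 0.25
x_both = (x_value(1.0, alpha, alpha), x_value(0.0, alpha, alpha))
print(remainders, du_numeric - du_identity, x_both)
```

The remainders shrink roughly like $-z/12$, consistent with the expansion $u(z)=-1/z-1/2-z/12+O(z^3)$ implied by the Bernoulli-number series of $w/(e^w-1)$.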
STEP 2: We now show that the MLE $\tilde\theta_T^*$ is asymptotically normal by checking that conditions (i)-(v) of Theorem 7.2 in Newey and McFadden (1994), applied to $\nabla_\theta L_T(\theta)$, hold. We first establish the asymptotic first-order condition $\sqrt{T}\nabla_\theta L_T(\tilde\theta_T^*)\xrightarrow{p}0$ by following the same steps as in the proof of Lemma A1 in Komunjer (2005b): for every $j=1,\dots,k$, let $\tilde G_{T,j}^*(h)$ be the right-derivative of $\tilde L_{T,j}^*(h)\equiv T^{-1}\sum_{t=1}^T\ln f_t^*(Y_t,\tilde\theta_T^*+he_j)$, where $\{e_j\}_{j=1}^k$ is the standard basis of $\mathbb{R}^k$, and $h\in\mathbb{R}$ is such that $\tilde\theta_T^*+he_j\in\Theta$ for all $j=1,\dots,k$. Since, for every $j=1,\dots,k$, $\tilde L_{T,j}^*(0)=L_T(\tilde\theta_T^*)$, the functions $h\mapsto\tilde L_{T,j}^*(h)$ achieve their maximum at $h=0$, and we have, for $\varepsilon>0$, $\tilde G_{T,j}^*(\varepsilon)\le\tilde G_{T,j}^*(0)\le\tilde G_{T,j}^*(-\varepsilon)$, with $\tilde G_{T,j}^*(\varepsilon)\le0$ and $\tilde G_{T,j}^*(-\varepsilon)\ge0$. Therefore $|\tilde G_{T,j}^*(0)|\le\tilde G_{T,j}^*(-\varepsilon)-\tilde G_{T,j}^*(\varepsilon)$. By taking the limit of this inequality as $\varepsilon\to0$, we get
$$|\tilde G_{T,j}^*(0)|\le T^{-1}[1+2C_1]\sum_{t=1}^T\Big[\Big|\frac{\partial\lambda(\tilde\theta_T^*)}{\partial\theta_j}\Big|+\Big|\lambda(\tilde\theta_T^*)f_t^0(q_t(\tilde\theta_T^*))\frac{\partial q_t(\tilde\theta_T^*)}{\partial\theta_j}\Big|\Big]\mathbb{1}\{q_t(\tilde\theta_T^*)=Y_t\}.$$
Hence
$$P\bigl(\sqrt T|\nabla_\theta L_T(\tilde\theta_T^*)|>\varepsilon\bigr)\le P\Bigl(\sqrt T\max_{1\le j\le k}|\tilde G^*_{T,j}(0)|>\varepsilon\Bigr)\le P\left(\sum_{t=1}^T\left[\left|\frac{\partial\lambda(\tilde\theta_T^*)}{\partial\theta_j}\right|+\left|\lambda(\tilde\theta_T^*)f_t^0(q_t(\tilde\theta_T^*))\frac{\partial q_t(\tilde\theta_T^*)}{\partial\theta_j}\right|\right]\mathbb{1}\{q_t(\tilde\theta_T^*)=Y_t\}>\varepsilon\sqrt T(1+2C_1)^{-1}\right).$$
Two facts then ensure that lim_{T→∞} P(√T|∇θ LT(θ̃∗T)| > ε) = 0: P(1I{qt(θ̃∗T) = Yt} ≠ 0) = 0, and E[|∂λ(θ̃∗T)/∂θj| + |λ(θ̃∗T)ft0(qt(θ̃∗T)) ∂qt(θ̃∗T)/∂θj|] is bounded. Condition (i) of Theorem 7.2 follows from the correct specification of ft(·) (see (iii) in Theorem 4). By (A5), θ0 is an interior point of Θ, so condition (iii) of Theorem 7.2 holds.
We now check the differentiability of E[∇θ LT(θ)] and the nonsingularity condition (ii) of Theorem 7.2. We have E[∇θ LT(θ)] = T^{−1} Σ_{t=1}^T E[∇θ ln ft∗(Yt, θ)]. Using (23) and (24), the latter is easily shown to be differentiable at any θ ∈ Θ̊. We now show that ∇θ E[∇′θ LT(θ0)] = T^{−1} Σ_{t=1}^T E[∆θθ ln ft∗(Yt, θ0)] and that the latter is nonsingular. For u(z) in (22) we have du(z)/dz = u(z) + [u(z)]², hence, for any t, 1 ≤ t ≤ T, T ≥ 1,
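$$\begin{aligned}\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\frac{\Delta_{\theta\theta}\lambda(\theta)}{\lambda(\theta)}-\frac{\nabla_\theta\lambda(\theta)\nabla_\theta\lambda(\theta)'}{[\lambda(\theta)]^2}+\Delta_{\theta\theta}\lambda(\theta)[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]\\&+2\nabla_\theta\lambda(\theta)\nabla_\theta q_t(\theta)'\bigl\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[F_t^0(Y_t)-F_t^0(q_t(\theta))]\bigr\}\\&+\lambda(\theta)\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Bigl\{\frac{df_t^0(q_t(\theta))}{dq}[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]-2f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)+[F_t^0(Y_t)-F_t^0(q_t(\theta))]\frac{d\delta(q_t(\theta)-Y_t)}{dq}\Bigr\}\\&+\lambda(\theta)\Delta_{\theta\theta}q_t(\theta)\bigl\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+[F_t^0(Y_t)-F_t^0(q_t(\theta))]\delta(q_t(\theta)-Y_t)\bigr\}\\&+\bigl[u(\lambda(\theta)x(\theta))+(u(\lambda(\theta)x(\theta)))^2\bigr]\bigl(\nabla_\theta(\lambda(\theta)x(\theta))\bigr)\bigl(\nabla_\theta(\lambda(\theta)x(\theta))\bigr)'\\&+u(\lambda(\theta)x(\theta))\Delta_{\theta\theta}(\lambda(\theta)x(\theta)),\end{aligned}\tag{28}$$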
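where ∆θθ(λ(θ)x(θ)) = ∆θθλ(θ)x(θ) + 2∇θλ(θ)∇θx(θ)′ + λ(θ)∆θθx(θ) and
$$\begin{aligned}\Delta_{\theta\theta}x(\theta)={}&\Bigl\{\frac{df_t^0(q_t(\theta))}{dq}[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]-2f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)+\frac{d\delta(q_t(\theta)-Y_t)}{dq}[\alpha-F_t^0(q_t(\theta))]\Bigr\}\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\\&+\bigl\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\bigr\}\Delta_{\theta\theta}q_t(\theta).\end{aligned}$$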
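Now, note that u(z) + [u(z)]² = 1/z² − 1/12 + o(1) in the neighborhood of 0, so that
$$\begin{aligned}\bigl[u(\lambda(\theta)x(\theta))+(u(\lambda(\theta)x(\theta)))^2\bigr]\bigl(\nabla_\theta(\lambda(\theta)x(\theta))\bigr)\bigl(\nabla_\theta(\lambda(\theta)x(\theta))\bigr)'={}&\frac{\nabla_\theta\lambda(\theta)\nabla_\theta\lambda(\theta)'}{[\lambda(\theta)]^2}+2\frac{\nabla_\theta\lambda(\theta)\nabla_\theta x(\theta)'}{\lambda(\theta)x(\theta)}\\&+\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Bigl\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}^2+o_p(1),\end{aligned}\tag{29}$$
in the neighborhood of θ0. Similarly,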
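$$\begin{aligned}u(\lambda(\theta)x(\theta))\Delta_{\theta\theta}(\lambda(\theta)x(\theta))={}&-\frac{\Delta_{\theta\theta}\lambda(\theta)}{\lambda(\theta)}-\frac12\Delta_{\theta\theta}\lambda(\theta)x(\theta)-2\frac{\nabla_\theta\lambda(\theta)\nabla_\theta x(\theta)'}{\lambda(\theta)x(\theta)}\\&-\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Bigl\{\frac{df_t^0(q_t(\theta))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}-2\frac{f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)}{x(\theta)}+\frac{d\delta(q_t(\theta)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}\\&-\Delta_{\theta\theta}q_t(\theta)\Bigl\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}+o_p(1),\end{aligned}\tag{30}$$
in the neighborhood of θ0. Combining (28) with (29) and (30), we then get that, for any t, 1 ≤ t ≤ T, T ≥ 1,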
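$$\begin{aligned}\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\Delta_{\theta\theta}\lambda(\theta)\Bigl\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac12x(\theta)\Bigr\}\\&+\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Bigl\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}^2\\&-\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'\Bigl\{\frac{df_t^0(q_t(\theta))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}-2\frac{f_t^0(q_t(\theta))\delta(q_t(\theta)-Y_t)}{x(\theta)}+\frac{d\delta(q_t(\theta)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}\\&-\Delta_{\theta\theta}q_t(\theta)\Bigl\{f_t^0(q_t(\theta))\frac{[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]}{x(\theta)}+\delta(q_t(\theta)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta))]}{x(\theta)}\Bigr\}+o_p(1),\end{aligned}\tag{31}$$
in the neighborhood of θ0. Using α = Ft0(qt(θ0)) and x(θ0) = −α(1 − α), we have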
$$|\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)|\le\frac52|\Delta_{\theta\theta}\lambda(\theta_0)|+|\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'|\left(\frac{M_0^2}{[\alpha(1-\alpha)]^2}+\frac{M_1}{\alpha(1-\alpha)}\right)+|\Delta_{\theta\theta}q_t(\theta_0)|\frac{M_0}{\alpha(1-\alpha)}+o_p(1),$$
with |∆θθλ(θ0)| < ∞. From (A7)(i) we have E[|∇θqt(θ0)∇θqt(θ0)′|] < ∞ and E[|∆θθqt(θ0)|] < ∞. Hence the expectation of the right-hand side of the above inequality is finite, and ∇θE[∇′θ ln ft∗(Yt, θ0)] = E[∆θθ ln ft∗(Yt, θ0)] for any t, 1 ≤ t ≤ T, T ≥ 1. It follows that ∇θE[∇′θLT(θ0)] = T^{−1} Σ_{t=1}^T E[∆θθ ln ft∗(Yt, θ0)], as desired.
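The expansions of u(·) used in (29), (30) and below can be checked numerically. Equation (22) is not reproduced in this section, so the sketch below assumes u(z) = exp(z)/(1 − exp(z)). This is the solution of du/dz = u + u² with a simple pole at 0, and it is consistent with u(z) = −1/z − 1/2 + o(1). The check uses small negative z, since z = λ(θ)x(θ) < 0. It confirms three things: the first-order remainder behaves like −z/12, u + u² − (1/z² − 1/12) vanishes, and the ODE holds.

```python
import math

def u(z: float) -> float:
    """Assumed form of u in (22): u(z) = exp(z) / (1 - exp(z)), z != 0."""
    return math.exp(z) / -math.expm1(z)

def first_order_remainder(z: float) -> float:
    """u(z) - (-1/z - 1/2); the series gives -z/12 + O(z**3)."""
    return u(z) + 1.0 / z + 0.5

def second_order_remainder(z: float) -> float:
    """u(z) + u(z)**2 - (1/z**2 - 1/12); the series gives z**2/240 + O(z**4)."""
    return u(z) + u(z) ** 2 - 1.0 / z ** 2 + 1.0 / 12.0

def ode_residual(z: float, h: float = 1e-6) -> float:
    """Central-difference check of du/dz = u + u**2."""
    du = (u(z + h) - u(z - h)) / (2.0 * h)
    return du - (u(z) + u(z) ** 2)

for z in (-0.5, -0.1, -0.01):
    print(z, first_order_remainder(z), second_order_remainder(z), ode_residual(z))
```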
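Now consider E[∆θθ ln ft∗(Yt, θ0)]; for any t, 1 ≤ t ≤ T, T ≥ 1, we have
$$\begin{aligned}E\Bigl(\Delta_{\theta\theta}\lambda(\theta_0)\Bigl\{[F_t^0(Y_t)-F_t^0(q_t(\theta_0))][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]-\frac12x(\theta_0)\Bigr\}\Bigr)&=\Delta_{\theta\theta}\lambda(\theta_0)\Bigl[E\bigl([F_t^0(Y_t)-\alpha][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]\bigr)+\frac12\alpha(1-\alpha)\Bigr]\\&=\Delta_{\theta\theta}\lambda(\theta_0)\Bigl[-\frac12\alpha(1-\alpha)+\frac12\alpha(1-\alpha)\Bigr]\\&=0,\end{aligned}$$
since
$$\begin{aligned}E_t\bigl([F_t^0(Y_t)-\alpha][\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]\bigr)&=(1-\alpha)\int_{-\infty}^{q_t(\theta_0)}[F_t^0(y)-\alpha]f_t^0(y)\,dy-\alpha\int_{q_t(\theta_0)}^{+\infty}[F_t^0(y)-\alpha]f_t^0(y)\,dy\\&=(1-\alpha)\Bigl[\frac12[F_t^0(y)-\alpha]^2\Bigr]_{-\infty}^{q_t(\theta_0)}-\alpha\Bigl[\frac12[F_t^0(y)-\alpha]^2\Bigr]_{q_t(\theta_0)}^{+\infty}\\&=-\frac12\alpha(1-\alpha).\end{aligned}$$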
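The identity Et([Ft0(Yt) − α][1I(qt(θ0) − Yt) − α]) = −α(1 − α)/2 depends on two facts. First, when Ft0 is continuous, U = Ft0(Yt) is conditionally uniform on (0, 1). Second, 1I(qt(θ0) − Yt) = 1I(U ≤ α). The sketch below is an illustration, not part of the proof. It checks the identity for a few values of α with a midpoint rule in U. The integrand is continuous and piecewise linear, so the rule is essentially exact.

```python
def cross_moment(alpha: float, n: int = 100_000) -> float:
    """Midpoint-rule value of E[(U - alpha) * (1{U <= alpha} - alpha)], U ~ Uniform(0, 1).

    By the probability integral transform this equals
    E_t([F(Y) - alpha][1I(q - Y) - alpha]) when F is continuous and q is the
    conditional alpha-quantile.
    """
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        indicator = 1.0 if u <= alpha else 0.0
        total += (u - alpha) * (indicator - alpha)
    return total / n

for alpha in (0.1, 0.25, 0.5, 0.9):
    print(alpha, cross_moment(alpha), -0.5 * alpha * (1 - alpha))
```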
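In addition, α = Ft0(qt(θ0)) and x(θ0) = −α(1 − α), so
$$\begin{aligned}&E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\Bigl\{f_t^0(q_t(\theta_0))\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}+\delta(q_t(\theta_0)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\Bigr\}^2\Bigr)\\&\quad=E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\,E_t\Bigl\{\frac{[f_t^0(q_t(\theta_0))]^2[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]^2}{\alpha^2(1-\alpha)^2}\Bigr\}\Bigr)\\&\quad=E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\Bigr),\end{aligned}$$
where the last equality uses Et([1I(qt(θ0) − Yt) − α]²) = α(1 − α), a.s. − P. Similarly,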
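$$\begin{aligned}&E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\Bigl\{\frac{df_t^0(q_t(\theta_0))}{dq}\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}-2\frac{f_t^0(q_t(\theta_0))\delta(q_t(\theta_0)-Y_t)}{x(\theta_0)}+\frac{d\delta(q_t(\theta_0)-Y_t)}{dq}\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\Bigr\}\Bigr)\\&\quad=E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\,E_t\Bigl\{\frac{df_t^0(q_t(\theta_0))}{dq}\frac{[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]}{\alpha(1-\alpha)}+2\frac{f_t^0(q_t(\theta_0))\delta(q_t(\theta_0)-Y_t)}{\alpha(1-\alpha)}\Bigr\}\Bigr)\\&\quad=2E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\Bigr),\end{aligned}$$
where the last equality uses Et(1I(qt(θ0) − Yt) − α) = 0, a.s. − P, and Et(δ(qt(θ0) − Yt)) = ft0(qt(θ0)), a.s. − P. Finally, using the same reasoning gives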
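$$E\Bigl(\Delta_{\theta\theta}q_t(\theta_0)\Bigl\{f_t^0(q_t(\theta_0))\frac{[\alpha-\mathbb{1}(q_t(\theta_0)-Y_t)]}{x(\theta_0)}+\delta(q_t(\theta_0)-Y_t)\frac{[\alpha-F_t^0(q_t(\theta_0))]}{x(\theta_0)}\Bigr\}\Bigr)=0.$$
Combining the above results then yields, by (31),
$$E[\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta_0)]=-E\Bigl(\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\Bigr),\tag{32}$$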
for all t, 1 ≤ t ≤ T, T ≥ 1. Hence, for any χ ∈ R^k,
$$\chi'\,\nabla_\theta E[\nabla_\theta'L_T(\theta_0)]\,\chi=-T^{-1}\sum_{t=1}^TE\Bigl(|\nabla_\theta q_t(\theta_0)'\chi|^2\frac{[f_t^0(q_t(\theta_0))]^2}{\alpha(1-\alpha)}\Bigr)\le0,$$
with equality if and only if χ = 0. Hence ∇θE[∇′θLT(θ0)] is negative definite, and therefore nonsingular.
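The negative-definiteness argument can be made concrete with a hypothetical linear quantile specification, qt(θ) = θ1 + θ2 Wt, so that ∇θ qt(θ0) = (1, Wt)′. The sketch also assumes errors for which ft0(qt(θ0)) is the standard normal density at its α-quantile. Both the regressor values and the normal error law are illustrative assumptions. The code builds M = −∇θE[∇′θLT(θ0)] from (32), averaged over t. It then checks that χ′Mχ equals a positive constant times the average of |∇θqt(θ0)′χ|². This makes M positive definite once the Wt are not all equal, and singular when they are.

```python
from statistics import NormalDist

def information_matrix(w_values, alpha):
    """M = T^{-1} sum_t grad q_t grad q_t' * f^2 / (alpha (1 - alpha)) for the
    hypothetical model q_t(theta) = theta_1 + theta_2 * W_t, with f the N(0, 1)
    density evaluated at its alpha-quantile (assumed error law)."""
    z = NormalDist().inv_cdf(alpha)
    scale = NormalDist().pdf(z) ** 2 / (alpha * (1.0 - alpha))
    T = len(w_values)
    m11 = scale * 1.0
    m12 = scale * sum(w_values) / T
    m22 = scale * sum(w * w for w in w_values) / T
    return [[m11, m12], [m12, m22]]

def quad_form(M, chi):
    """chi' M chi for a 2 x 2 matrix M."""
    return sum(chi[i] * M[i][j] * chi[j] for i in range(2) for j in range(2))

w = [0.5, 1.0, 1.5, 2.0, 2.5]
M = information_matrix(w, alpha=0.25)
det = M[0][0] * M[1][1] - M[0][1] ** 2
print(M, det, quad_form(M, [1.0, -0.5]))
```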
We now check condition (iv) of Theorem 7.2 using a CLT for α-mixing sequences (e.g. Theorem 5.20 in White, 2001, p.130). By (A6), for any θ ∈ Θ̊, the sequence {∇θ ln ft∗(Yt, θ)} is strong mixing (i.e. α-mixing) with α of size −r/(r − 2), r > 2 (see, e.g., Theorem 3.49 in White, 2001, p.50). Moreover, (23) and (A1) give E[∇θ ln ft∗(Yt, θ0)] = 0, and (A7)(i) gives E[|∇θ ln ft∗(Yt, θ0)|^r] ≤ {M0/[α(1 − α)]}^r E[sup_{θ∈Θ} |∇θ qt(θ)|^r] < ∞, for all t, 1 ≤ t ≤ T, T ≥ 1. Now,
$$\begin{aligned}\operatorname{Var}\Bigl(T^{-1/2}\sum_{t=1}^T\nabla_\theta\ln f_t^*(Y_t,\theta_0)\Bigr)&=E\Bigl(T^{-1}\sum_{t=1}^T\nabla_\theta\ln f_t^*(Y_t,\theta_0)\nabla_\theta\ln f_t^*(Y_t,\theta_0)'\Bigr)\\&=E\Bigl(T^{-1}\sum_{t=1}^T\frac{[f_t^0(q_t(\theta_0))]^2[\mathbb{1}(q_t(\theta_0)-Y_t)-\alpha]^2}{[\alpha(1-\alpha)]^2}\nabla_\theta q_t(\theta_0)\nabla_\theta q_t(\theta_0)'\Bigr)\\&=V_T^0,\end{aligned}$$
where the first equality uses Et(∇θ ln ft∗(Yt, θ0)) = 0, a.s. − P, which is implied by (A1). The last equality uses Et([1I(qt(θ0) − Yt) − α]²) = α(1 − α), a.s. − P. Applying Theorem 5.20 in White (2001), we then have (V_T^0)^{−1/2} √T ∇θLT(θ0) →d N(0, Id), with V_T^0 as defined in Theorem 3.
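As an illustration, the score at θ0 is −∇θqt(θ0) ft0(qt(θ0))[1I(qt(θ0) − Yt) − α]/[α(1 − α)]. Its second moment should equal minus the expected Hessian in (32). This is the information equality behind V_T^0. The Monte Carlo sketch below takes a scalar θ with ∇θ qt(θ0) = 1. It assumes f is the standard normal density at its α-quantile, a hypothetical choice. Only the Bernoulli(α) law of 1I(qt(θ0) − Yt) matters here.

```python
import random
from statistics import NormalDist

def score_moments_mc(alpha: float, n: int = 200_000, seed: int = 0):
    """Return (mean of s_t, mean of s_t**2, f**2 / (alpha (1 - alpha))) where
    s_t = -f * [1{Y_t <= q_t} - alpha] / (alpha (1 - alpha)) and grad q_t = 1.
    f is the N(0, 1) density at its alpha-quantile (assumption)."""
    rng = random.Random(seed)
    f = NormalDist().pdf(NormalDist().inv_cdf(alpha))
    c = alpha * (1.0 - alpha)
    total, total_sq = 0.0, 0.0
    for _ in range(n):
        hit = 1.0 if rng.random() <= alpha else 0.0   # 1I(q_t(theta_0) - Y_t)
        s = -f * (hit - alpha) / c
        total += s
        total_sq += s * s
    return total / n, total_sq / n, f * f / c

mean_s, mean_s2, target = score_moments_mc(0.25)
print(mean_s, mean_s2, target)
```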
Finally, we check the stochastic equicontinuity condition (v) of Theorem 7.2 by verifying that all the assumptions of Theorem 7.3 in Newey and McFadden (1994) hold. (The main reason for using Theorem 7.3 is that it does not put any restrictions on the dependence structure of {(Yt, Wt′)′}.) For any t, 1 ≤ t ≤ T, T ≥ 1, let
$$r_t(\theta)=\frac{|\nabla_\theta\ln f_t^*(Y_t,\theta)-\nabla_\theta\ln f_t^*(Y_t,\theta_0)-\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)'(\theta-\theta_0)|}{|\theta-\theta_0|},$$
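for θ ∈ Θ̊. We use u(z) = −1/z − 1/2 + o(1) in the neighborhood of 0 and λ(θ)x(θ) = op(|θ − θ0|) in the neighborhood of θ0. From (23), (26) and (28) we then have r_t(θ) ≤ r_t^(1)(θ) + r_t^(2)(θ) + r_t^(3)(θ) + op(1), where
$$\begin{aligned}r_t^{(1)}(\theta)&\equiv\Bigl|[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac{x(\theta)}{2}\Bigr|\frac{|\nabla_\theta\lambda(\theta)-\Delta_{\theta\theta}\lambda(\theta)'(\theta-\theta_0)|}{|\theta-\theta_0|},\\r_t^{(2)}(\theta)&\equiv\Bigl|\frac{f_t^0(q_t(\theta))}{2}[\mathbb{1}(q_t(\theta)-Y_t)-\alpha]+\delta(q_t(\theta)-Y_t)\Bigl[\frac{F_t^0(q_t(\theta))}{2}-F_t^0(Y_t)+\frac{\alpha}{2}\Bigr]\Bigr|\frac{|\lambda(\theta)\nabla_\theta q_t(\theta)|}{|\theta-\theta_0|},\\r_t^{(3)}(\theta)&\equiv\Bigl|\frac{\nabla_\theta x(\theta)}{x(\theta)}-\frac{\nabla_\theta x(\theta_0)}{x(\theta_0)}-\frac{\Delta_{\theta\theta}x(\theta)'(\theta-\theta_0)}{x(\theta)}+\frac{\nabla_\theta x(\theta)\nabla_\theta x(\theta)'(\theta-\theta_0)}{[x(\theta)]^2}\Bigr|\Big/|\theta-\theta_0|.\end{aligned}$$
With probability one, r_t^(1)(θ) ≤ 2|∇θλ(θ) − ∆θθλ(θ)′(θ − θ0)|/|θ − θ0| for any θ ∈ Θ̊. Since λ(·) is twice continuously differentiable on R^k, with probability one r_t^(1)(θ) → 0 as θ → θ0. Moreover, there exists ε1 > 0 such that
$$E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}r_t^{(1)}(\theta)\Bigr)<\infty.\tag{33}$$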
Now, note that |ft0(qt(θ))[1I(qt(θ) − Yt) − α]| ≤ M0 for any θ ∈ Θ̊, so
$$\begin{aligned}r_t^{(2)}(\theta)&\le\frac12\bigl\{M_0+\delta(q_t(\theta)-Y_t)[F_t^0(q_t(\theta))-2F_t^0(Y_t)+\alpha]\bigr\}\frac{|\lambda(\theta)\nabla_\theta q_t(\theta)|}{|\theta-\theta_0|}\\&\le\frac12\bigl\{M_0+\delta(q_t(\theta)-Y_t)[F_t^0(q_t(\theta))-2F_t^0(Y_t)+\alpha]\bigr\}|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\end{aligned}$$
for some θc ≡ cθ0 + (1 − c)θ with c ∈ (0, 1). Three facts hold: ∇θλ(·) is continuous on R^k, ∇θλ(θ0) = 0, and δ(qt(θ0) − Yt)[Ft0(qt(θ0)) − 2Ft0(Yt) + α] = 0. Hence, with probability one, r_t^(2)(θ) → 0 as θ → θ0. Moreover, for some θd ≡ dθ0 + (1 − d)θ, d ∈ (0, 1),
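$$\begin{aligned}E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}r_t^{(2)}(\theta)\Bigr)&\le E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}\Bigl\{\frac{M_0}{2}+E_t\Bigl(\delta(q_t(\theta)-Y_t)\Bigl|\frac{F_t^0(q_t(\theta))}{2}-F_t^0(Y_t)+\frac{\alpha}{2}\Bigr|\Bigr)\Bigr\}\times|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\Bigr)\\&\le\frac{M_0}{2}E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\Bigr)+\frac12E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|\nabla_\theta\lambda(\theta_c)|\cdot|\nabla_\theta q_t(\theta)|\cdot|\alpha-F_t^0(q_t(\theta))|f_t^0(q_t(\theta))\Bigr)\\&\le\frac{M_0}{2}\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|\nabla_\theta\lambda(\theta_c)|\times\Bigl[E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|\nabla_\theta q_t(\theta)|\Bigr)+M_0E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|\nabla_\theta q_t(\theta)|\cdot|\nabla_\theta q_t(\theta_d)|\Bigr)\Bigr]<\infty,\end{aligned}\tag{34}$$
where the last inequality uses the continuity of ∇θλ(·) on R^k, (A7)(i) and the Cauchy-Schwarz inequality. Finally, let r_x(θ) = [x(θ0) − x(θ) − ∇θx(θ)′(θ0 − θ)]/|θ0 − θ| and R_x(θ) = [∇θx(θ0) − ∇θx(θ) − ∆θθx(θ)′(θ0 − θ)]/|θ0 − θ|. Note that with probability one sup_{θ∈Θ̊:|θ−θ0|<ε1} |r_x(θ)| → 0 and sup_{θ∈Θ̊:|θ−θ0|<ε1} |R_x(θ)| → 0 as θ → θ0. This implies that with probability one r_t^(3)(θ) → 0 as θ → θ0. Moreover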
$$\begin{aligned}E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}r_t^{(3)}(\theta)\Bigr)&\le E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}[|r_x(\theta)|+|R_x(\theta)|]/|x(\theta)|\Bigr)\\&\le E\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}[1/|x(\theta)|]\Bigl(\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|r_x(\theta)|+\sup_{\theta\in\mathring\Theta:|\theta-\theta_0|<\varepsilon_1}|R_x(\theta)|\Bigr)\Bigr)<\infty,\end{aligned}\tag{35}$$
where the last inequality uses the fact that sup_{t≥1} sup_{θ∈Θ} Ft0(qt(θ)) ∈ (a, b) with a > 0 and b < 1. Consequently, C3 ≡ sup_{t≥1} sup_{y∈R} sup_{θ∈Θ} (|[1I(qt(θ) − y) − α][1 − 1I(qt(θ) − y) − Ft0(qt(θ))]|^{−1}) < ∞. Combining results (33)-(35), with probability one r_t(θ) → 0 as θ → θ0, and E(sup_{θ∈Θ̊:|θ−θ0|<ε1} r_t(θ)) < ∞. It remains to show that T^{−1} Σ_{t=1}^T ∆θθ ln ft∗(Yt, θ) →p ∇θE[∇′θLT(θ)] for all θ in a neighborhood of θ0. By (A6), for any θ ∈ Θ̊,
the sequence {∆θθ ln ft∗(Yt, θ)} is strong mixing (i.e. α-mixing) with α of size −r/(r − 2), r > 2 (see, e.g., Theorem 3.49 in White, 2001, p.50). Now note that, given θ ∈ Θ̊, there exists θa = aθ0 + (1 − a)θ, a ∈ (0, 1), such that for any η > 0
$$\begin{aligned}P\bigl(\delta(q_t(\theta)-Y_t)|\alpha-F_t^0(q_t(\theta))|>\eta\bigr)&\le E\bigl(|\alpha-F_t^0(q_t(\theta))|f_t^0(q_t(\theta))\bigr)/\eta\\&\le|\theta-\theta_0|E\bigl(|\nabla_\theta q_t(\theta_a)|f_t^0(q_t(\theta_a))f_t^0(q_t(\theta))\bigr)/\eta\\&\le|\theta-\theta_0|M_0^2E[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|]/\eta,\end{aligned}\tag{36}$$
so that δ(qt(θ) − Yt)|α − Ft0(qt(θ))| = op(1) in a neighborhood of θ0. Similarly,
$$\begin{aligned}P\bigl([d\delta(q_t(\theta)-Y_t)/dq]|\alpha-F_t^0(q_t(\theta))|>\eta\bigr)&\le E\bigl(|\alpha-F_t^0(q_t(\theta))|\,df_t^0(q_t(\theta))/dq\bigr)/\eta\\&\le|\theta-\theta_0|M_0M_1E[\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)|]/\eta,\end{aligned}\tag{37}$$
where the first inequality uses the fact that Et(dδ(qt(θ) − Yt)/dq) = dft0(qt(θ))/dq, a.s. − P.
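From (31) we have that, for any t, 1 ≤ t ≤ T, T ≥ 1,
$$\begin{aligned}\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\Delta_{\theta\theta}\lambda(\theta)\Bigl\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac12x(\theta)\Bigr\}\\&+\frac{\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'}{[x(\theta)]^2}\Bigl\{\bigl(f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\bigr)^2+\bigl(\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\bigr)^2\\&\qquad-x(\theta)[df_t^0(q_t(\theta))/dq][\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+x(\theta)[d\delta(q_t(\theta)-Y_t)/dq][\alpha-F_t^0(q_t(\theta))]\Bigr\}\\&-\frac{\Delta_{\theta\theta}q_t(\theta)}{x(\theta)}\bigl\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]+\delta(q_t(\theta)-Y_t)[\alpha-F_t^0(q_t(\theta))]\bigr\}+o_p(1),\end{aligned}$$
in a neighborhood of θ0. Combined with (36) and (37), this gives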
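$$\begin{aligned}\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)={}&\Delta_{\theta\theta}\lambda(\theta)\Bigl\{[F_t^0(Y_t)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-Y_t)-\alpha]-\frac12x(\theta)\Bigr\}\\&+\frac{\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'}{[x(\theta)]^2}\Bigl\{\bigl(f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\bigr)^2-x(\theta)[df_t^0(q_t(\theta))/dq][\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\Bigr\}\\&-\frac{\Delta_{\theta\theta}q_t(\theta)}{x(\theta)}\bigl\{f_t^0(q_t(\theta))[\alpha-\mathbb{1}(q_t(\theta)-Y_t)]\bigr\}+o_p(1),\end{aligned}$$
so that, for a given ε > 0, there is a positive constant n_{r,ε} such that
$$|\Delta_{\theta\theta}\ln f_t^*(Y_t,\theta)|^{r+\varepsilon}\le n_{r,\varepsilon}\Bigl(|\Delta_{\theta\theta}\lambda(\theta)|^{r+\varepsilon}(5/2)^{r+\varepsilon}+|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|^{r+\varepsilon}C_3^{2(r+\varepsilon)}\{M_0^2+M_1\}^{r+\varepsilon}+|\Delta_{\theta\theta}q_t(\theta)|^{r+\varepsilon}C_3^{r+\varepsilon}M_0^{r+\varepsilon}\Bigr)+o_p(1),$$
in a neighborhood of θ0. Using (A7)(i) and the fact that |∆θθλ(θ)| < ∞ in a neighborhood of θ0, we therefore have E[|∆θθ ln ft∗(Yt, θ)|^{r+ε}] < ∞. The weak LLN then follows from Corollary 3.48 in White (2001). This completes the proof of asymptotic normality of the MLE θ̃∗T. ∎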
Proof of Lemma 7. We proceed in two steps.
STEP 1: To prove (i) and (iii), we start by showing that, for any θ ∈ Θ\{θ0}, the function ft∗(·, θ) in (6) is a probability density for all t, 1 ≤ t ≤ T, T ≥ 1. First, note that for any θ ∈ Θ\{θ0}, ft∗(·, θ) is continuous and ft∗(·, θ) > 0 on R. Thus it suffices to show that ∫_R ft∗(y, θ)dy = 1. Consider the change of variable u ≡ λ(θ)Ft0(y). The map λ(θ)Ft0(·) is strictly increasing in y, since λ(θ) = Λ(θ − θ0) > 0 and ft0(·) is strictly positive, so du = λ(θ)ft0(y)dy. To simplify the notation, we let qt(θ) ≡ qα(Wt, θ). Noting that 1I(qt(θ) − y) = 1I[λ(θ)Ft0(qt(θ)) − u], we have
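$$\begin{aligned}\int_{\mathbb R}f_t^*(y,\theta)\,dy={}&\int_0^{\lambda(\theta)F_t^0(q_t(\theta))}\frac{\alpha(1-\alpha)\exp\{(1-\alpha)[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\,du+\int_{\lambda(\theta)F_t^0(q_t(\theta))}^{\lambda(\theta)}\frac{\alpha(1-\alpha)\exp\{-\alpha[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp\{-\alpha\lambda(\theta)[1-F_t^0(q_t(\theta))]\}}\,du\\={}&\frac{\alpha\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\Bigl[\exp[(1-\alpha)u]\Bigr]_0^{\lambda(\theta)F_t^0(q_t(\theta))}-\frac{(1-\alpha)\exp[\alpha\lambda(\theta)F_t^0(q_t(\theta))]}{1-\exp\{-\alpha\lambda(\theta)[1-F_t^0(q_t(\theta))]\}}\Bigl[\exp(-\alpha u)\Bigr]_{\lambda(\theta)F_t^0(q_t(\theta))}^{\lambda(\theta)}\\={}&\alpha+(1-\alpha)=1,\end{aligned}$$
which shows that ft∗(·, θ) is a probability density for any θ ∈ Θ\{θ0}.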
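The computation above can be checked numerically. The sketch integrates ft∗(·, θ) = ft0(·)Pt(θ)/Qt(θ), using the forms of Pt and Qt given in (38)-(39) below. It assumes a standard normal Ft0 and treats λ(θ) > 0 and qt(θ) as free numbers, both of which are hypothetical choices. The integral is split at y = qt(θ), where the density has a kink, so that each Simpson rule sees a smooth integrand. The result should be 1 for any q and any λ > 0, whether or not Ft0(q) = α.

```python
import math
from statistics import NormalDist

N01 = NormalDist()   # assumed F0 = N(0, 1)

def tilted_piece(y, q, lam, alpha, below):
    """f*(y, theta) = f0(y) P_t / Q_t on the branch y <= q (below=True) or y > q.
    lam stands for lambda(theta) > 0 and q for q_t(theta)."""
    ind = 1.0 if below else 0.0                       # 1I(q_t(theta) - y)
    Fq = N01.cdf(q)
    P = alpha * (1.0 - alpha) * lam * math.exp(lam * (N01.cdf(y) - Fq) * (ind - alpha))
    Q = -math.expm1(lam * (1.0 - Fq - ind) * (ind - alpha))   # 1 - exp(...)
    return N01.pdf(y) * P / Q

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return s * h / 3.0

def total_mass(q, lam, alpha, lo=-12.0, hi=12.0):
    """Integral of f*(., theta) over R, split at the kink y = q."""
    below = simpson(lambda y: tilted_piece(y, q, lam, alpha, True), lo, q)
    above = simpson(lambda y: tilted_piece(y, q, lam, alpha, False), q, hi)
    return below + above

for q, lam, alpha in [(0.3, 2.0, 0.25), (-1.0, 0.5, 0.9), (1.5, 5.0, 0.5)]:
    print(q, lam, alpha, total_mass(q, lam, alpha))
```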
We now show that this is also true for θ0 and that ft∗(·, θ0) = ft0(·). For this, let
$$P_t(\theta)\equiv\alpha(1-\alpha)\lambda(\theta)\exp\{\lambda(\theta)[F_t^0(y)-F_t^0(q_t(\theta))][\mathbb{1}(q_t(\theta)-y)-\alpha]\},\tag{38}$$
$$Q_t(\theta)\equiv1-\exp\{\lambda(\theta)[1-F_t^0(q_t(\theta))-\mathbb{1}(q_t(\theta)-y)][\mathbb{1}(q_t(\theta)-y)-\alpha]\},\tag{39}$$
so that ft∗(y, θ) = ft0(y)Pt(θ)/Qt(θ). By (A0)(ii), the functions Pt and Qt are at least twice continuously differentiable on Θ a.s. − P. Thus, for every (θ, θ0) ∈ Θ², we can write their respective Taylor developments of order two as
$$P_t(\theta)=\sum_{|l|\le2}\frac{D^lP_t(\theta_0)}{l!}(\theta-\theta_0)^l+o(|\theta-\theta_0|^2),\tag{40}$$
$$Q_t(\theta)=\sum_{|l|\le2}\frac{D^lQ_t(\theta_0)}{l!}(\theta-\theta_0)^l+o(|\theta-\theta_0|^2).\tag{41}$$
Straightforward though lengthy computations show that, for any function λ(θ) = Λ(θ − θ0) such that ∇θΛ(0) = 0 and ∆θθΛ(0) is nonsingular, we have
$$P_t(\theta_0)=0,\quad D^1P_t(\theta_0)=0,\quad D^2P_t(\theta_0)=\alpha(1-\alpha)D^2\lambda(\theta_0),\tag{42}$$
and
$$Q_t(\theta_0)=0,\quad D^1Q_t(\theta_0)=0,\quad D^2Q_t(\theta_0)=\alpha(1-\alpha)D^2\lambda(\theta_0).\tag{43}$$
Hence
$$P_t(\theta)=\frac12\alpha(1-\alpha)D^2\lambda(\theta_0)(\theta-\theta_0)^2+o(|\theta-\theta_0|^2),\tag{44}$$
$$Q_t(\theta)=\frac12\alpha(1-\alpha)D^2\lambda(\theta_0)(\theta-\theta_0)^2+o(|\theta-\theta_0|^2).\tag{45}$$
Given the nonsingularity of ∆θθΛ(0), an immediate consequence of l'Hôpital's rule and (44)-(45) is that lim_{θ→θ0} Pt(θ)/Qt(θ) = 1. Hence, by a.s. − P continuity of ft∗(y, ·) on Θ, we have ft∗(y, θ0) = lim_{θ→θ0} ft∗(y, θ) = ft0(y) for any y ∈ R. This shows that ft∗(·, θ) is a probability density for any θ ∈ Θ and that ft∗(·, θ0) = ft0(·), so that f0 ∈ P∗, as desired.
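The limit Pt(θ)/Qt(θ) → 1 can be illustrated by letting λ(θ) → 0 while holding qt(θ) at the true α-quantile of Ft0, which mimics θ → θ0. The sketch assumes a standard normal Ft0 (a hypothetical choice) and reports max_y |ft∗(y, θ) − ft0(y)| on a grid. By (44)-(45), the gap should shrink roughly in proportion to λ.

```python
import math
from statistics import NormalDist

N01 = NormalDist()   # assumed F0 = N(0, 1)

def f_star(y, q, lam, alpha):
    """f*(y, theta) = f0(y) P_t / Q_t from (38)-(39); lam stands for lambda(theta) > 0."""
    ind = 1.0 if y <= q else 0.0
    Fq = N01.cdf(q)
    P = alpha * (1.0 - alpha) * lam * math.exp(lam * (N01.cdf(y) - Fq) * (ind - alpha))
    Q = -math.expm1(lam * (1.0 - Fq - ind) * (ind - alpha))
    return N01.pdf(y) * P / Q

def max_gap_to_f0(lam, alpha, lo=-5.0, hi=5.0, n=1000):
    """max over a grid of |f*(y) - f0(y)|, with q at the true alpha-quantile."""
    q = N01.inv_cdf(alpha)
    ys = [lo + (hi - lo) * k / n for k in range(n + 1)]
    return max(abs(f_star(y, q, lam, alpha) - N01.pdf(y)) for y in ys)

gaps = [max_gap_to_f0(lam, alpha=0.25) for lam in (1e-1, 1e-2, 1e-3, 1e-4)]
print(gaps)
```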
STEP 2: It remains to show that this parametric model P∗ satisfies the conditional moment restriction in (ii) for all θ ∈ Θ. This restriction is clearly satisfied when θ = θ0, as ft∗(·, θ0) = ft0(·) and [θ0, ft0(·)] satisfies (A1) by assumption. When θ ≠ θ0, using again the change of variable u ≡ λ(θ)Ft0(y), we have
$$E_\theta[\mathbb{1}(q_t(\theta)-Y_t)|W_t]=\int_{-\infty}^{q_t(\theta)}f_t^*(y,\theta)\,dy=\int_0^{\lambda(\theta)F_t^0(q_t(\theta))}\frac{\alpha(1-\alpha)\exp\{(1-\alpha)[u-\lambda(\theta)F_t^0(q_t(\theta))]\}}{1-\exp[-(1-\alpha)\lambda(\theta)F_t^0(q_t(\theta))]}\,du=\alpha.$$
∎
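The conditional moment restriction of Step 2 can be checked in the same way. Under ft∗(·, θ), the mass below qt(θ) should equal α for any λ(θ) > 0 and any qt(θ), even when Ft0(qt(θ)) is far from α. The sketch assumes a standard normal Ft0, and the (q, λ, α) triples are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()   # assumed F0 = N(0, 1)

def f_star_below(y, q, lam, alpha):
    """Branch y <= q of f*(y, theta) = f0(y) P_t / Q_t from (38)-(39)."""
    Fq = N01.cdf(q)
    P = alpha * (1.0 - alpha) * lam * math.exp(lam * (N01.cdf(y) - Fq) * (1.0 - alpha))
    Q = -math.expm1(-(1.0 - alpha) * lam * Fq)
    return N01.pdf(y) * P / Q

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return s * h / 3.0

def prob_below(q, lam, alpha, lo=-12.0):
    """P_theta(Y_t <= q_t(theta) | W_t) computed under f*(., theta)."""
    return simpson(lambda y: f_star_below(y, q, lam, alpha), lo, q)

cases = [(0.3, 2.0, 0.25), (-1.0, 0.5, 0.9), (1.5, 5.0, 0.5)]
for q, lam, alpha in cases:
    print(q, round(N01.cdf(q), 4), prob_below(q, lam, alpha), alpha)
```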
Proof of Theorem 5. From Theorem 3 we know that θ∗T, which minimizes Ψ∗T(θ), is consistent for θ0. Thus, to establish the consistency of θ̂T, it suffices to show that Ψ̂T(θ) − Ψ∗T(θ) converges uniformly (in θ) to zero, i.e. sup_{θ∈Θ} |Ψ̂T(θ) − Ψ∗T(θ)| = op(1). For this we need a uniform consistency property of D^λĜ(·, ·), where D^λ denotes the λth derivative with respect to y.
Lemma 8. Suppose that (A6), (A8)-(A9), (A11) hold. Then
$$\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H_{\lambda T}^0(y,w)|=O_p\bigl[1/(\sqrt Th_{yT}^\lambda h_{wT}^m)\bigr]+O_p(h_{wT}^R)+O_p(h_{yT}^R),$$
where H0λT(y, w) ≡ D^λF0(y|w)ḡ0T(w) and λ = 0, 1, 2.
We will also need the uniform consistency of ĝ(·) for ḡ0T(·) ≡ (1/T) Σ_{t=1}^T gt0(·):
$$\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|=O_p\bigl[1/(\sqrt Th_{wT}^m)\bigr]+O_p(h_{wT}^R),\tag{46}$$
which follows from Theorem 1(a) in Andrews (1995) with η = ∞, given (A6), (A8) and (A11)(i)-(ii). As before, we let qt(θ) = qα(Wt, θ). We also let b̄T ≡ bT + εT and dt ≡ 1I[ḡ0T(Wt) − b̄T]. Finally, we let ΨT(θ) be Ψ̂T(θ) with F̂t(·) ≡ d̂t F̂(·|Wt) replaced by dt F̂(·|Wt), i.e.
$$\Psi_T(\theta)=\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-\hat F(q_t(\theta)|W_t)],\tag{47}$$
where {εT} > 0 is an appropriate vanishing sequence. The remainder of the proof adapts the consistency proof of Theorem 1 in Lavergne and Vuong (1996). Let εT be such that εT = o(bT), εT√T h^m_wT → ∞, εT/h^R_wT → ∞ and εT/h^R_yT → ∞. As
$$\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T^*(\theta)|\le\sup_{\theta\in\Theta}|\hat\Psi_T(\theta)-\Psi_T(\theta)|+\sup_{\theta\in\Theta}|\Psi_T(\theta)-\Psi_T^*(\theta)|,$$
where ΨT(θ) is defined in Equation (47), it suffices to prove that both terms on the right-hand side are op(1). Given Lemma 8 and Equation (46), we will use
$$a_T^{-1}\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H_{0T}^0(y,w)|=o_p(1),\tag{48}$$
$$a_T^{-1}\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|=o_p(1),\tag{49}$$
which hold for any sequence {aT} satisfying aT√T h^m_wT → ∞, aT/h^R_wT → ∞ and aT/h^R_yT → ∞. We will also use the identity
$$\hat F(y|w)-F^0(y|w)=\frac{1}{\hat g(w)}[\hat G(y,w)-H_{0T}^0(y,w)]-\frac{F^0(y|w)}{\hat g(w)}[\hat g(w)-\bar g_T^0(w)].\tag{50}$$
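A minimal sketch of the smoothed estimator behind F̂(y|w) = Ĝ(y, w)/ĝ(w) may help fix ideas. It takes a scalar conditioning variable (m = 1) and assumes Gaussian kernels for both K0 and K, so that L(y) = ∫1I(y − u)K0(u)du is the standard normal cdf. The paper's (A11) may call for higher-order kernels, so this is only a second-order illustration. The bandwidths, the trimming level bT and the simulated design are hypothetical. The last lines confirm that (50) is an algebraic identity once H0_{0T} = F0 ḡ0T.

```python
import math
import random

def gauss_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gauss_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kernel_estimates(y, w, Y, W, h_y, h_w):
    """(G_hat(y, w), g_hat(w)) for m = 1 with Gaussian K0 and K (assumed):
    G_hat = (T h_w)^{-1} sum_t L((y - Y_t)/h_y) K((w - W_t)/h_w), L = Phi."""
    T = len(Y)
    kw = [gauss_pdf((w - wt) / h_w) for wt in W]
    G = sum(gauss_cdf((y - yt) / h_y) * k for yt, k in zip(Y, kw)) / (T * h_w)
    g = sum(kw) / (T * h_w)
    return G, g

def trimmed_conditional_cdf(y, w, Y, W, h_y, h_w, b_T):
    """1I[g_hat(w) - b_T] * G_hat(y, w) / g_hat(w)."""
    G, g = kernel_estimates(y, w, Y, W, h_y, h_w)
    return G / g if g >= b_T else 0.0

def identity_50_rhs(G_hat, g_hat, F0, g_bar):
    """Right-hand side of (50) with H0_{0T} = F0 * g_bar."""
    return (G_hat - F0 * g_bar) / g_hat - F0 * (g_hat - g_bar) / g_hat

# Hypothetical design: W_t ~ U(0, 1), Y_t = W_t + N(0, 1), so F0(y|w) = Phi(y - w).
rng = random.Random(1)
T = 500
W = [rng.random() for _ in range(T)]
Y = [w + rng.gauss(0.0, 1.0) for w in W]
grid = [-1.0 + 0.25 * k for k in range(13)]
F_hat = [trimmed_conditional_cdf(y, 0.5, Y, W, h_y=0.3, h_w=0.2, b_T=0.05) for y in grid]
print([round(v, 3) for v in F_hat])
```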
STEP 1: We first show that sup_{θ∈Θ} |Ψ̂T(θ) − ΨT(θ)| = op(1). We have
$$\hat\Psi_T(\theta)-\Psi_T(\theta)=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-\hat F(q_t(\theta)|W_t)]=\Delta\hat\Psi_{1T}-\Delta\hat\Psi_{2T}+\Delta\hat\Psi_{3T},$$
where Jt = d̂t(1 − dt), Ht = (1 − d̂t)dt and
$$\begin{aligned}\Delta\hat\Psi_{1T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-F^0(Y_t|W_t)],\\\Delta\hat\Psi_{2T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(q_t(\theta)|W_t)-F^0(q_t(\theta)|W_t)],\\\Delta\hat\Psi_{3T}&=\frac1T\sum_{t=1}^T(J_t-H_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][F^0(Y_t|W_t)-F^0(q_t(\theta)|W_t)].\end{aligned}$$
Note that Ht ≤ 1I[|ĝ(Wt) − ḡ0T(Wt)| − εT]. Property (49) holds with aT = εT by construction of εT, so the event {sup_w |ĝ(w) − ḡ0T(w)| ≥ εT} has asymptotic probability 0. Hence sup_{1≤t≤T, T≥1} Ht = 0 with probability approaching one, and we need to consider the Jt terms only. Let ∆Ψ̂^J_jT denote ∆Ψ̂jT with Jt − Ht replaced by Jt. It then suffices to show that sup_{θ∈Θ} ∆Ψ̂^J_jT = op(1) for j = 1, 2, 3. Using Identity (50) and the definition of Jt, we obtain
$$|\Delta\hat\Psi^J_{1T}|\le b_T^{-1}\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H_{0T}^0(y,w)|+\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|\Bigr]\frac1T\sum_{t=1}^TJ_t.$$
Because (1/T) Σ_{t=1}^T Jt ≤ 1, we get sup_{θ∈Θ} ∆Ψ̂^J_1T = op(1) in view of Properties (48)-(49) with aT = bT, under our assumptions on bT. Similarly, sup_{θ∈Θ} |∆Ψ̂^J_2T| = op(1). Regarding ∆Ψ̂^J_3T, we have |∆Ψ̂^J_3T| ≤ (1/T) Σ_{t=1}^T Jt. But (1/T) Σ_{t=1}^T Jt ≤ (1/T) Σ_{t=1}^T (1 − dt), with
$$E\Bigl[\frac1T\sum_{t=1}^T(1-d_t)\Bigr]=\frac1T\sum_{t=1}^T\int_{\{w:\bar g_T^0(w)<\bar b_T\}}g_t^0(w)\,dw=\int_{\{w:\bar g_T^0(w)<\bar b_T\}}\bar g_T^0(w)\,dw=o(1),$$
where the last equality follows by taking cT = b̄T in (A10)(i). Hence (1/T) Σ_{t=1}^T (1 − dt) = op(1) by the Markov inequality. Thus,
$$\frac1T\sum_{t=1}^TJ_t=o_p(1),\tag{51}$$
and sup_{θ∈Θ} ∆Ψ̂^J_3T = op(1).
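The trimming bookkeeping in Step 1 rests on two elementary facts. First, Jt − Ht = d̂t − dt. Second, Ht = 1 forces |ĝ(Wt) − ḡ0T(Wt)| > εT. The sketch below checks both on a grid of density values. It assumes d̂t = 1I[ĝ(Wt) − bT] and dt = 1I[ḡ0T(Wt) − (bT + εT)], with 1I(x) = 1 for x ≥ 0. The values of bT and εT are hypothetical.

```python
from itertools import product

def trimming_terms(g_hat, g_bar, b_T, eps_T):
    """Return (d_hat, d, J, H) for one observation, assuming
    d_hat = 1I[g_hat - b_T] and d = 1I[g_bar - (b_T + eps_T)], 1I(x) = 1 for x >= 0."""
    d_hat = 1 if g_hat >= b_T else 0
    d = 1 if g_bar >= b_T + eps_T else 0
    J = d_hat * (1 - d)
    H = (1 - d_hat) * d
    return d_hat, d, J, H

b_T, eps_T = 0.10, 0.02
grid = [k / 100 for k in range(31)]
for g_hat, g_bar in product(grid, grid):
    d_hat, d, J, H = trimming_terms(g_hat, g_bar, b_T, eps_T)
    assert J - H == d_hat - d                  # (J_t - H_t) = d_hat_t - d_t
    if H == 1:
        assert abs(g_hat - g_bar) > eps_T      # H_t <= 1I[|g_hat - g_bar| - eps_T]
print("checked", len(grid) ** 2, "pairs")
```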
STEP 2: We next show that sup_{θ∈Θ} |ΨT(θ) − Ψ∗T(θ)| = op(1). We have
$$\begin{aligned}\Psi_T(\theta)-\Psi_T^*(\theta)={}&\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(Y_t|W_t)-F^0(Y_t|W_t)]\\&-\frac1T\sum_{t=1}^Td_t[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][\hat F(q_t(\theta)|W_t)-F^0(q_t(\theta)|W_t)]\\&-\frac1T\sum_{t=1}^T(1-d_t)[\alpha-\mathbb{1}(q_t(\theta)-Y_t)][F^0(Y_t|W_t)-F^0(q_t(\theta)|W_t)]\\\equiv{}&\Delta\Psi_{1T}-\Delta\Psi_{2T}-\Delta\Psi_{3T}.\end{aligned}$$
Thus, it suffices to show that sup_{θ∈Θ} ∆ΨjT = op(1) for j = 1, 2, 3. Because εT = o(bT), the sequence b̄T ≡ bT + εT satisfies b̄T√T h^m_wT → ∞, b̄T/h^R_wT → ∞ and b̄T/h^R_yT → ∞. Hence Properties (48)-(49) hold with aT = b̄T. In particular, Property (49) implies P(inf_{w: ḡ0T(w) ≥ b̄T} ĝ(w) ≥ b̄T(1 − η)) → 1 as T → ∞ for any η ∈ (0, 1). Thus, using Identity (50), we have
$$|\Delta\Psi_{1T}|\le(\bar b_T)^{-1}(1-\eta)^{-1}\Bigl\{\sup_{(y,w)\in\mathbb R^{m+1}}|\hat G(y,w)-H_{0T}^0(y,w)|+\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|\Bigr\}\frac1T\sum_{t=1}^Td_t,$$
with probability approaching 1, where (1/T) Σ_{t=1}^T dt ≤ 1. Hence sup_{θ∈Θ} ∆Ψ1T = op(1), using Properties (48)-(49) with aT = b̄T. Similarly, sup_{θ∈Θ} ∆Ψ2T = op(1). Regarding ∆Ψ3T, we have sup_{θ∈Θ} |∆Ψ3T| ≤ (1/T) Σ_{t=1}^T (1 − dt) = op(1) from Step 1.
∎
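To illustrate what the consistency result delivers, the sketch minimizes the objective (1/T) Σ [α − 1I(qt(θ) − Yt)][F(Yt|Wt) − F(qt(θ)|Wt)] over a grid. It makes several simplifying assumptions. The model is a hypothetical linear quantile model qt(θ) = θWt, with iid data. The true conditional cdf is used as an oracle in place of the kernel estimate F̂, and no trimming is applied. Every summand is nonnegative, and the population objective has derivative f(q)[F(q) − α] in q, which is why the minimizer sits at the conditional α-quantile. This is a simplified illustration, not the paper's estimator with estimated F̂.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def psi_objective(theta, Y, W, F_Y, cdf_cond, alpha):
    """Psi(theta) = T^{-1} sum_t [alpha - 1I(q_t - Y_t)][F(Y_t|W_t) - F(q_t|W_t)]
    for the hypothetical model q_t(theta) = theta * W_t. F_Y holds F(Y_t|W_t)."""
    total = 0.0
    for y, w, Fy in zip(Y, W, F_Y):
        q = theta * w
        ind = 1.0 if y <= q else 0.0
        total += (alpha - ind) * (Fy - cdf_cond(q, w))
    return total / len(Y)

# Assumed design: Y_t = theta0 * W_t + Z_t - z_alpha, so q_alpha(W_t, theta0) = theta0 * W_t.
alpha, theta0, T = 0.25, 1.0, 1000
z_alpha = N01.inv_cdf(alpha)
rng = random.Random(0)
W = [1.0 + rng.random() for _ in range(T)]
Y = [theta0 * w + rng.gauss(0.0, 1.0) - z_alpha for w in W]

def cdf_cond(y, w):
    """Oracle F0(y|w), used here in place of the kernel estimate F_hat."""
    return N01.cdf(y - theta0 * w + z_alpha)

F_Y = [cdf_cond(y, w) for y, w in zip(Y, W)]
grid = [0.5 + 0.005 * k for k in range(201)]
values = [psi_objective(th, Y, W, F_Y, cdf_cond, alpha) for th in grid]
theta_hat = grid[min(range(len(grid)), key=values.__getitem__)]
print(theta_hat)
```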
Proof of Lemma 8. The proof adapts that of Lemma A-1 in Andrews (1995) to incorporate the supremum over y-values, which leads to the additional term Op(h^R_yT). It is done in three steps. Recall that L(·) was defined as L(y) ≡ ∫1I(y − u)K0(u)du. Let It(y) be L[(y − Yt)/hyT] if λ = 0, K0[(y − Yt)/hyT] if λ = 1, and K0′[(y − Yt)/hyT] if λ = 2. Thus, omitting the subscript T, we have
$$\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H_\lambda^0(y,w)|=\sup_{(y,w)\in\mathbb R^{m+1}}\Bigl|\frac{1}{Th_y^\lambda h_w^m}\sum_{t=1}^TI_t(y)K\Bigl(\frac{w-W_t}{h_w}\Bigr)-D^\lambda F^0(y|w)\frac1T\sum_{t=1}^Tg_t^0(w)\Bigr|$$
for λ = 0, 1, 2. The desired result then follows from: (i)
$$\sup_{(y,w)\in\mathbb R^{m+1}}\Bigl|\frac{1}{Th_y^\lambda h_w^m}\sum_{t=1}^TI_t(y)K\Bigl(\frac{w-W_t}{h_w}\Bigr)-\frac{1}{Th_y^\lambda h_w^m}\sum_{t=1}^TE\Bigl[I_t(y)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]\Bigr|=O_p\Bigl(\frac{1}{\sqrt Th_y^\lambda h_w^m}\Bigr),\tag{52}$$
which is proved in Step 1 by adapting Andrews' (1995) proof of Lemma A-2, (ii)
$$\sup_{(y,w)\in\mathbb R^{m+1}}\Bigl|\frac{1}{Th_y^\lambda h_w^m}\sum_{t=1}^TE\Bigl[I_t(y)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]-\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl[D^\lambda F^0(y|W_t)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]\Bigr|=O_p(h_y^R),\tag{53}$$
which is proved in Step 2, and (iii)
$$\sup_{(y,w)\in\mathbb R^{m+1}}\Bigl|\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl[D^\lambda F^0(y|W_t)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]-D^\lambda F^0(y|w)\frac1T\sum_{t=1}^Tg_t^0(w)\Bigr|=O_p(h_w^R),\tag{54}$$
which is proved in Step 3.
STEP 1: When λ = 0, note that |It(y)| ≤ ∫|K0(u)|du < ∞ by (A11)(iii). When λ = 1, |It(y)| ≤ sup_{y∈R} |K0(y)| < ∞ by (A11)(iii). When λ = 2, |It(y)| ≤ sup_{y∈R} |K0′(y)| < ∞ by (A11)(iv). Hence It(y) is bounded by some C0 < ∞. Moreover, (A6) and Theorem 3.49 in White (2001) guarantee that, for every y, the sequence {(It(y), Wt′)′} is strong mixing with α of size −r/(r − 2), r > 2. Hence, for any (t, s), 1 ≤ t, s ≤ T, T ≥ 1, we have α(|t − s|) = O(|t − s|^{−r/(r−2)−ε}) for some ε > 0 (see Definition 3.45 in White, 2001), and C1 ≡ Σ_{s=0}^∞ α(s) < ∞. Thus, by Billingsley (1995, Lemma 2, p.365), we have
$$\bigl|\operatorname{Cov}\bigl(I_t(y)\cos(v'W_t),\,I_u(y)\cos(v'W_u)\bigr)\bigr|\le4C_0^2\alpha(|t-u|),$$
for any v ∈ R^m, any y ∈ R and any t, u. Hence, instead of (A.15) in Andrews (1995), we have
$$\operatorname{Var}\Bigl(\frac1T\sum_{t=1}^TI_t(y)\cos(v'W_t)\Bigr)\le8C_0^2\,\frac1T\sum_{s=0}^{T-1}\alpha(s)\le\frac{8C_0^2C_1}{T}.$$
As this also holds with sin(·) replacing cos(·), the Lyapunov inequality implies
$$E\Bigl|\frac1T\sum_{t=1}^T\bigl\{I_t(y)\exp(iv'W_t)-E[I_t(y)\exp(iv'W_t)]\bigr\}\Bigr|\le2C_0\sqrt{\frac{8C_1}{T}},$$
for any v ∈ R^m and any y ∈ R. Let LT denote the left-hand side of (52). Using (A.11) in Andrews (1995) with λ = 0 and the above inequality, we obtain
$$E(L_T)\le E\Bigl(\sup_{y\in\mathbb R}\frac{1}{h_y^\lambda}\int\Bigl|\frac1T\sum_{t=1}^T\bigl\{I_t(y)\exp(iv'W_t)-E[I_t(y)\exp(iv'W_t)]\bigr\}\Bigr||\phi(h_wv)|\,dv\Bigr)\le2C_0\sqrt{\frac{8C_1}{T}}\,\frac{1}{h_y^\lambda}\int|\phi(h_wv)|\,dv=\frac{C_2}{\sqrt Th_y^\lambda h_w^m},$$
where C2 = 2C0√(8C1) ∫|φ(u)|du < ∞ by (A11)(ii), using the change of variable u = hw v. Equation (52) then follows from the Markov inequality.
STEP 2: Consider first λ = 0. Using Fubini's Theorem, we note that
$$E[I_t(y)|W_t=w]=\int L\Bigl(\frac{y-Y}{h_y}\Bigr)dF^0(Y|w)=\int\!\!\int\mathbb{1}\Bigl(\frac{y-Y}{h_y}-u\Bigr)K_0(u)\,du\,dF^0(Y|w)=\int F^0(y-h_yu|w)K_0(u)\,du,\tag{55}$$
which does not depend on t because of (A9)(i). When λ = 1, using the change of variable Y = y − hy u, we note that
$$E\Bigl[\frac{I_t(y)}{h_y}\Big|W_t=w\Bigr]=\int\frac{1}{h_y}K_0\Bigl(\frac{y-Y}{h_y}\Bigr)f^0(Y|w)\,dY=\int f^0(y-h_yu|w)K_0(u)\,du.\tag{56}$$
When λ = 2, using the change of variable Y = y − hy u and integration by parts, we have
$$E\Bigl[\frac{I_t(y)}{h_y^2}\Big|W_t=w\Bigr]=\int\frac{1}{h_y^2}K_0'\Bigl(\frac{y-Y}{h_y}\Bigr)f^0(Y|w)\,dY=\frac{1}{h_y}\int f^0(y-h_yu|w)K_0'(u)\,du=\int Df^0(y-h_yu|w)K_0(u)\,du.\tag{57}$$
Now, let LT(y, w) denote the term inside the absolute value on the left-hand side of Equation (53). Combining Equations (55)-(57) with (A11)(iii), we have
$$\begin{aligned}L_T(y,w)&=\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl\{\Bigl[\frac{I_t(y)}{h_y^\lambda}-D^\lambda F^0(y|W_t)\Bigr]K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr\}\\&=\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl\{\Bigl[\int[D^\lambda F^0(y-h_yu|W_t)-D^\lambda F^0(y|W_t)]K_0(u)\,du\Bigr]K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr\}\\&=\frac{1}{Th_w^m}\sum_{t=1}^T\int\Bigl[\int[D^\lambda F^0(y-h_yu|W)-D^\lambda F^0(y|W)]K_0(u)\,du\Bigr]K\Bigl(\frac{w-W}{h_w}\Bigr)g_t^0(W)\,dW.\end{aligned}$$
Hence, taking an Rth-order Taylor expansion of D^λF0(y − hyT u|W) at y and using (A11)(iii), we obtain
$$\sup_{(y,w)\in\mathbb R^{m+1}}|L_T(y,w)|\le h_y^R\sup_{(y,w)\in\mathbb R^{m+1}}|D^{\lambda+R}F^0(y|w)|\int|u^RK_0(u)|\,du\int|K(\tilde W)|\,d\tilde W\times\sup_{T\ge1}\sup_{w\in\mathbb R^m}\bar g_T^0(w),$$
which establishes Equation (53) because of (A8), (A9)(ii), and (A11)(i,iii).
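The O(h^R_y) smoothing bias in (53) can be seen in closed form in a simple case. The sketch assumes F0(·|w) = Φ and a Gaussian K0, which is a second-order kernel, so R = 2. The right-hand side of (55) is then ∫Φ(y − hu)φ(u)du = Φ(y/√(1 + h²)). The code evaluates (55) by Simpson's rule and compares it with this closed form. It then checks that bias/h² approaches −yφ(y)/2, the leading Taylor term.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_cdf(y, h, n=4000, lim=10.0):
    """Right-hand side of (55), int F0(y - h u) K0(u) du, with F0 = Phi and
    K0 = phi (assumed), computed by the composite Simpson rule on [-lim, lim]."""
    step = 2.0 * lim / n
    s = 0.0
    for k in range(n + 1):
        u = -lim + k * step
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        s += weight * Phi(y - h * u) * phi(u)
    return s * step / 3.0

y = 0.7
for h in (0.2, 0.1, 0.05):
    bias = smoothed_cdf(y, h) - Phi(y)
    print(h, bias, bias / h ** 2, -y * phi(y) / 2.0)
```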
STEP 3: The study of the bias (54) is standard, as in the proof of Lemma A-3 in Andrews (1995). Using (A9)(i) we have
$$\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl[D^\lambda F^0(y|W_t)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]=\frac{1}{Th_w^m}\sum_{t=1}^T\int D^\lambda F^0(y|W)K\Bigl(\frac{w-W}{h_w}\Bigr)g_t^0(W)\,dW=\int H_\lambda^0(y,w-h_w\tilde W)K(\tilde W)\,d\tilde W,$$
where W̃ = (w − W)/hw. Hence, using a Taylor expansion of order R at w together with (A11)(i), we obtain
$$\begin{aligned}&\frac{1}{Th_w^m}\sum_{t=1}^TE\Bigl[D^\lambda F^0(y|W_t)K\Bigl(\frac{w-W_t}{h_w}\Bigr)\Bigr]-D^\lambda F^0(y|w)\frac1T\sum_{t=1}^Tg_t^0(w)=\int\bigl[H_\lambda^0(y,w-h_w\tilde W)-H_\lambda^0(y,w)\bigr]K(\tilde W)\,d\tilde W\\&\quad=\int\Bigl[\sum_{|r|=R}\frac{(-1)^R}{R!}h_w^R\frac{\partial^RH_\lambda^0(y,w-\tilde h_w\tilde W)}{\partial W_1^{r_1}\cdots\partial W_m^{r_m}}\tilde W_1^{r_1}\cdots\tilde W_m^{r_m}\Bigr]K(\tilde W)\,d\tilde W,\end{aligned}$$
where 0 < h̃w < hw. This establishes Equation (54) using (A8), (A9)(ii) and (A11)(i).
∎
Proof of Theorem 6. From Equation (9), the first-order conditions associated with θ̂T are
$$\sqrt T\nabla_\theta\hat\Psi_T(\hat\theta_T)=0,\quad\text{a.s.}-P,\tag{58}$$
where ∇θΨ̂T(θ) = (1/T) Σ_{t=1}^T {1I[qt(θ) − Yt] − α} f̂t[qt(θ)] ∇θqt(θ) and f̂t(y) = d̂t [∂Ĝ(y, Wt)/∂y]/ĝ(Wt). Given (A11)(iv), f̂t(·) is continuously differentiable on R. Thus, a first-order Taylor expansion of condition (58) at θ0 gives
$$\sqrt T\nabla_\theta\hat\Psi_T(\theta_0)+\Delta_{\theta\theta}\hat\Psi_T(\bar\theta_T^c)\sqrt T(\hat\theta_T-\theta_0)=0,\quad\text{a.s.}-P,\tag{59}$$
where θ̄cT ≡ cθ0 + (1 − c)θ̂T for some c ∈ (0, 1). To establish the theorem, we need two lemmas:
Lemma 9. Suppose that (A0)-(A1), (A5)-(A7)(i), (A8)-(A10)(i) and (A11) hold. If bT → 0 with bT√T h²yT h^m_wT → ∞, bT/h^R_wT → ∞ and bT/h^R_yT → ∞, then ∆θθΨ̂T(θ̄cT) − ∆θθΨ∗T(θ0) = op(1).
In particular, the conditions in Theorem 6 imply the conditions in Lemma 9. Thus ∆θθΨ̂T(θ̄cT) − ∆θθΨ∗T(θ0) = op(1).
Lemma 10. Suppose that all the conditions of Theorem 6 hold. Then √T[∇θΨ̂T(θ0) − ∇θΨ∗T(θ0)] = op(1).
The remainder of the proof is straightforward. Equation (59) and Lemmas 9 and 10 together imply
$$\sqrt T(\hat\theta_T-\theta_0)=-[\Delta_{\theta\theta}\Psi_T^*(\theta_0)+o_p(1)]^{-1}\bigl(\sqrt T\nabla_\theta\Psi_T^*(\theta_0)+o_p(1)\bigr),\quad\text{a.s.}-P.$$
Thus θ̂T is √T-asymptotically equivalent to θ∗T, and the desired result follows.
∎
Proof of Lemma 9. Note that the assumptions of Theorem 5 are satisfied under those of Lemma 9. Hence θ̂T →p θ0. Moreover, because θ̄cT = cθ0 + (1 − c)θ̂T for some c ∈ (0, 1), we have θ̄cT →p θ0. Thus, it suffices to prove that sup_{θ∈Θ} |∆θθΨ̂T(θ) − ∆θθΨ∗T(θ)| = op(1), where
$$\begin{aligned}\Delta_{\theta\theta}\hat\Psi_T(\theta)&=\frac1T\sum_{t=1}^T\hat d_t\{\mathbb{1}[q_t(\theta)-Y_t]-\alpha\}\bigl\{D^2\hat F[q_t(\theta)|W_t]\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'+D\hat F[q_t(\theta)|W_t]\Delta_{\theta\theta}q_t(\theta)\bigr\},\\\Delta_{\theta\theta}\Psi_T^*(\theta)&=\frac1T\sum_{t=1}^T\{\mathbb{1}[q_t(\theta)-Y_t]-\alpha\}\bigl\{D^2F^0[q_t(\theta)|W_t]\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'+DF^0[q_t(\theta)|W_t]\Delta_{\theta\theta}q_t(\theta)\bigr\}.\end{aligned}$$
Let εT be such that εT = o(bT), εT√T h²yT h^m_wT → ∞, εT/h^R_wT → ∞ and εT/h^R_yT → ∞. As
$$\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi_T^*(\theta)|\le\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\hat\Psi_T(\theta)-\Delta_{\theta\theta}\Psi_T(\theta)|+\sup_{\theta\in\Theta}|\Delta_{\theta\theta}\Psi_T(\theta)-\Delta_{\theta\theta}\Psi_T^*(\theta)|,$$
where ΨT(θ) is defined in Equation (47), it suffices to prove that both terms on the right-hand side of the above inequality are op(1). Given Lemma 8 and Equation (46), we will use
$$a_T^{-1}\sup_{(y,w)\in\mathbb R^{m+1}}|D^\lambda\hat G(y,w)-H_{\lambda T}^0(y,w)|=o_p(1),\qquad a_T^{-1}\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|=o_p(1),$$
for λ = 1, 2, which hold for any sequence {aT} satisfying aT√T h²yT h^m_wT → ∞, aT/h^R_wT → ∞ and aT/h^R_yT → ∞. For λ = 1, 2 we will also use the identity
$$D^\lambda\hat F(y|w)-D^\lambda F^0(y|w)=\frac{1}{\hat g(w)}[D^\lambda\hat G(y,w)-H_{\lambda T}^0(y,w)]-\frac{D^\lambda F^0(y|w)}{\hat g(w)}[\hat g(w)-\bar g_T^0(w)],$$
which follows from Equation (50). The proof then draws on that of Theorem 5. Specifically, in Step 1 we deal with ∆θθΨ̂T(θ) − ∆θθΨT(θ) = −∆Ψ̂^θθ_1T − ∆Ψ̂^θθ_2T − ∆Ψ̂^θθ_3T. Each ∆Ψ̂^θθ_jT, j = 1, 2, 3, is equal to ∆Ψ̂jT with three substitutions. F̂(Yt|Wt) − F0(Yt|Wt) is replaced by {D²F̂[qt(θ)|Wt] − D²F0[qt(θ)|Wt]}∇θqt(θ)∇θqt(θ)′. F̂(qt(θ)|Wt) − F0(qt(θ)|Wt) is replaced by {DF̂[qt(θ)|Wt] − DF0[qt(θ)|Wt]}∆θθqt(θ). F0(Yt|Wt) − F0(qt(θ)|Wt) is replaced by D²F0[qt(θ)|Wt]∇θqt(θ)∇θqt(θ)′ + DF0[qt(θ)|Wt]
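∆θθqt(θ), respectively. We then obtain
$$|\Delta\hat\Psi^{\theta\theta J}_{1T}|\le b_T^{-1}\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2\hat G(y,w)-H_{2T}^0(y,w)|+\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Bigr].$$
Thus sup_{θ∈Θ} ∆Ψ̂^θθJ_1T = op(1), since the Cauchy-Schwarz inequality gives
$$\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\le\Bigl(\frac1T\sum_{t=1}^TJ_t\Bigr)^{1/2}\Bigl(\frac1T\sum_{t=1}^T\bigl(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\bigr)^2\Bigr)^{1/2}=o_p(1).\tag{60}$$
This uses Equation (51) and (1/T) Σ_{t=1}^T (sup_{θ∈Θ} |∇θqt(θ)∇θqt(θ)′|)² = Op(1). The latter follows from
$$E\Bigl[\frac1T\sum_{t=1}^T\bigl(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\bigr)^2\Bigr]\le\sup_{1\le t\le T,\,T\ge1}E\Bigl[\bigl(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\bigr)^2\Bigr]<\infty,$$
using (A7)(i) and the Markov inequality. Similarly, sup_{θ∈Θ} ∆Ψ̂^θθJ_2T = op(1) using
$$\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|=o_p(1).\tag{61}$$
Regarding ∆Ψ̂^θθJ_3T, we have
$$|\Delta\hat\Psi^{\theta\theta J}_{3T}|\le\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\,\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|+\sup_{(y,w)\in\mathbb R^{m+1}}|DF^0(y|w)|\,\frac1T\sum_{t=1}^TJ_t\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|,$$
which shows that sup_{θ∈Θ} ∆Ψ̂^θθJ_3T = op(1), using Equations (60)-(61) and (A9)(ii).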
In Step 2, we deal with ∆θθΨT(θ) − ∆θθΨ∗T(θ) = −∆Ψ^θθ_1T − ∆Ψ^θθ_2T + ∆Ψ^θθ_3T. Each ∆Ψ^θθ_jT, j = 1, 2, 3, is equal to ∆ΨjT with three substitutions. F̂(Yt|Wt) − F0(Yt|Wt) is replaced by {D²F̂[qt(θ)|Wt] − D²F0[qt(θ)|Wt]}∇θqt(θ)∇θqt(θ)′. F̂(qt(θ)|Wt) − F0(qt(θ)|Wt) is replaced by {DF̂[qt(θ)|Wt] − DF0[qt(θ)|Wt]}∆θθqt(θ). F0(Yt|Wt) − F0(qt(θ)|Wt) is replaced by D²F0[qt(θ)|Wt]∇θqt(θ)∇θqt(θ)′ + DF0[qt(θ)|Wt]∆θθqt(θ). We then obtain
$$|\Delta\Psi^{\theta\theta}_{1T}|\le(\bar b_T)^{-1}(1-\eta)^{-1}\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2\hat G(y,w)-H_{2T}^0(y,w)|+\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\sup_{w\in\mathbb R^m}|\hat g(w)-\bar g_T^0(w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^Td_t\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Bigr],$$
with probability approaching 1. Here (1/T) Σ_{t=1}^T dt sup_{θ∈Θ} |∇θqt(θ)∇θqt(θ)′| ≤ (1/T) Σ_{t=1}^T sup_{θ∈Θ} |∇θqt(θ)∇θqt(θ)′| = Op(1) by the Markov inequality and (A7)(i). Hence sup_{θ∈Θ} ∆Ψ^θθ_1T = op(1). Similarly, sup_{θ∈Θ} ∆Ψ^θθ_2T = op(1). Regarding ∆Ψ^θθ_3T, we have
$$\begin{aligned}|\Delta\Psi^{\theta\theta}_{3T}|\le{}&\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^T(1-d_t)\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\Bigr]+\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|DF^0(y|w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^T(1-d_t)\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|\Bigr]\\\le{}&\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|D^2F^0(y|w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^T(1-d_t)^2\Bigr]^{1/2}\Bigl[\frac1T\sum_{t=1}^T\bigl(\sup_{\theta\in\Theta}|\nabla_\theta q_t(\theta)\nabla_\theta q_t(\theta)'|\bigr)^2\Bigr]^{1/2}\\&+\Bigl[\sup_{(y,w)\in\mathbb R^{m+1}}|DF^0(y|w)|\Bigr]\Bigl[\frac1T\sum_{t=1}^T(1-d_t)^2\Bigr]^{1/2}\Bigl[\frac1T\sum_{t=1}^T\bigl(\sup_{\theta\in\Theta}|\Delta_{\theta\theta}q_t(\theta)|\bigr)^2\Bigr]^{1/2},\end{aligned}$$
which is op(1), as (1/T) Σ_{t=1}^T (1 − dt)² = (1/T) Σ_{t=1}^T (1 − dt) = op(1) from Step 1 of Theorem 5.
∎
³
Proof of Lemma 10. Note that for any density ft (·), we have E ft [qt (θ0 )]∇θ qt (θ0 ){1I[qt (θ0 )−
´
Yt ] − α} = 0. Thus, Lemma 10 could be established from (i) the stochastic equicontinuity
√ P
at f 0 (·|·) of the vector process ν T (f) = (1/ T ) Tt=1 f [qt (θ0 )|Wt ]∇θ qt (θ0 ){1I[qt (θ0 ) −Yt ] −α}
ˆ
with respect to some pseudo-metric ρ(f1 , f2 ), and (ii) the consistency of f(·|·)
= 1I[ĝ(·) −
bT ]DĜ(·, ·)/ĝ(·) to f 0 (·|·) with respect to ρ(·, ·). See Andrews (1994b) for some general results
on stochastic equicontinuity. These require, however, a more elaborate trimming than the
one used here in view of Andrews (1995, p.571). We thus prove Lemma 10 directly.
Though more complex, our proof draws from the asymptotic normality proof of Theorem 1 in Lavergne and Vuong (1996). For similar asymptotic normality proofs in the iid case, see also Robinson (1988) and Hardle and Stoker (1989). As previously, we let $q_t^0 = q_\alpha(W_t, \theta_0)$.
Moreover, let
T /(T
1/4 R
hyT )
T
be such that
→ ∞. As
∇θ Ψ̂T (θ0 ) −
∇θ Ψ∗T (θ0 )
T
= o(bT ),
TT
1/4
hyT hm
wT → ∞,
T /(T
1/4 R
hwT )
→ ∞ and
$$= \left[\nabla_\theta\hat\Psi_T(\theta_0) - \nabla_\theta\Psi_T(\theta_0)\right] + \left[\nabla_\theta\Psi_T(\theta_0) - \nabla_\theta\Psi^*_T(\theta_0)\right],$$
where $\Psi_T(\theta)$ is defined in Equation (47), it suffices to prove that both terms on the right-hand side of the above equality are $o_p(T^{-1/2})$. Given Lemma 8 and Equation (46) we shall use
$$a_T^{-2}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w) - H_{1T}^0(y,w)|^2 = o_p(T^{-1/2}),\tag{62}$$
$$a_T^{-2}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w) - H_{1T}^0(y,w)|\,\sup_{w\in\mathbb{R}^m}|\hat g(w) - \bar g_T^0(w)| = o_p(T^{-1/2}),\tag{63}$$
$$a_T^{-2}\sup_{w\in\mathbb{R}^m}|\hat g(w) - \bar g_T^0(w)|^2 = o_p(T^{-1/2}),\tag{64}$$
which hold for any sequence $\{a_T\}$ satisfying $a_T T^{1/4}h_{yT}h_{wT}^m \to \infty$, $a_T/(T^{1/4}h_{wT}^R) \to \infty$ and $a_T/(T^{1/4}h_{yT}^R) \to \infty$. We shall also use the identities
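$$\hat f(y|w) - f^0(y|w) = \frac{1}{\hat g(w)}[D\hat G(y,w) - H_{1T}^0(y,w)] - \frac{f^0(y|w)}{\hat g(w)}[\hat g(w) - \bar g_T^0(w)]\tag{65}$$
$$= \frac{D\hat G(y,w) - f^0(y|w)\hat g(w)}{\bar g_T^0(w)} + \frac{f^0(y|w)}{\hat g(w)\bar g_T^0(w)}[\hat g(w) - \bar g_T^0(w)]^2 - \frac{1}{\hat g(w)\bar g_T^0(w)}[D\hat G(y,w) - H_{1T}^0(y,w)][\hat g(w) - \bar g_T^0(w)].\tag{66}$$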
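Identities (65) and (66) are purely algebraic once $H_{1T}^0(y,w) = f^0(y|w)\bar g_T^0(w)$, which is what makes (65) exact on the untrimmed region where $\hat f = D\hat G/\hat g$. That reading of $H_{1T}^0$ is assumed here, since its definition lies outside this section. The sketch below evaluates both right-hand sides against $D\hat G/\hat g - f^0$ at arbitrary positive scalar values; the variable names are illustrative.

```python
import random


def check_identities(DG, g_hat, g_bar, f0):
    """Evaluate the left side and both right sides of identities (65)-(66).

    DG, g_hat, g_bar, f0 stand in for DG_hat(y, w), g_hat(w), gbar_T^0(w)
    and f^0(y|w). Assumes H_1T^0 = f0 * g_bar and the untrimmed case,
    where f_hat = DG / g_hat.
    """
    H1 = f0 * g_bar
    lhs = DG / g_hat - f0
    rhs65 = (DG - H1) / g_hat - f0 * (g_hat - g_bar) / g_hat
    rhs66 = ((DG - f0 * g_hat) / g_bar
             + f0 * (g_hat - g_bar) ** 2 / (g_hat * g_bar)
             - (DG - H1) * (g_hat - g_bar) / (g_hat * g_bar))
    return lhs, rhs65, rhs66


if __name__ == "__main__":
    random.seed(0)
    print(check_identities(*[random.uniform(0.1, 2.0) for _ in range(4)]))
```

Identity (66) is the one used in Step 2: its first term is linear in the kernel sums, while the other two are products of estimation errors and are therefore of smaller order.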
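STEP 1: We first show that $\nabla_\theta\hat\Psi_T(\theta_0) - \nabla_\theta\Psi_T(\theta_0) = o_p(T^{-1/2})$. We have
$$\sqrt{T}\left[\nabla_\theta\hat\Psi_T(\theta_0) - \nabla_\theta\Psi_T(\theta_0)\right] = \frac{1}{\sqrt{T}}\sum_{t=1}^T(J_t - H_t)[\mathbb{1}(q_t^0 - Y_t) - \alpha]\hat f(q_t^0|W_t)\nabla_\theta q_t^0 = \sqrt{T}\,\Delta\hat\Psi_{1T}^\theta(\theta_0) + \sqrt{T}\,\Delta\hat\Psi_{2T}^\theta(\theta_0),$$
where $J_t = d_t(1 - d_t)$, $H_t = (1 - d_t)d_t$ and
$$\sqrt{T}\,\Delta\hat\Psi_{1T}^\theta(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^T(J_t - H_t)[\mathbb{1}(q_t^0 - Y_t) - \alpha][\hat f(q_t^0|W_t) - f^0(q_t^0|W_t)]\nabla_\theta q_t^0,$$
$$\sqrt{T}\,\Delta\hat\Psi_{2T}^\theta(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^T(J_t - H_t)[\mathbb{1}(q_t^0 - Y_t) - \alpha]f^0(q_t^0|W_t)\nabla_\theta q_t^0.$$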
As Ht 6 1I[|ĝ(Wt ) − ḡT0 (Wt )| −
T]
and the event {supw |ĝ(w) − ḡT0 (w)| >
probability 0 because Property (64) holds with aT =
T
T}
has asymptotic
by construction of
T,
we have
$\sup_{1\le t\le T,\,T\ge 1}H_t = 0$ with probability approaching one. Hence, we need to consider the $J_t$ terms only. Namely, it suffices to show that $\Delta\hat\Psi_{jT}^{\theta J}(\theta_0) = o_p(T^{-1/2})$ for $j = 1, 2$. Using Equality (65) and the definition of $J_t$, we obtain
$$|\Delta\hat\Psi_{1T}^{\theta J}(\theta_0)| \le b_T^{-1}\left[\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w) - H_{1T}^0(y,w)| + \sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\,\sup_{w\in\mathbb{R}^m}|\hat g(w) - \bar g_T^0(w)|\right]\left[\frac{1}{T}\sum_{t=1}^T J_t|\nabla_\theta q_t^0|\right],$$
$$|\Delta\hat\Psi_{2T}^{\theta J}(\theta_0)| \le \frac{1}{T}\sum_{t=1}^T J_t f^0(q_t^0|W_t)|\nabla_\theta q_t^0|.$$
But $(1/T)\sum_{t=1}^T J_t|\nabla_\theta q_t^0| \le (1/T)\sum_{t=1}^T(1-d_t)|\nabla_\theta q_t^0| = O_p(b_T^\gamma)$ and $(1/T)\sum_{t=1}^T J_t f^0(q_t^0|W_t)|\nabla_\theta q_t^0| \le (1/T)\sum_{t=1}^T(1-d_t)f^0(q_t^0|W_t)|\nabla_\theta q_t^0| = O_p(b_T^{2\gamma})$ by Markov inequality combined with (A10)(ii)-(iii), where $c_T = b_T = O(b_T)$. Hence, using $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w) < \infty$ by (A9)(ii) combined with Properties (62) and (64) where $a_T = b_T$, we obtain $\Delta\hat\Psi_{1T}^{\theta J}(\theta_0) = O_p(T^{-1/4}b_T^\gamma)$ and $\Delta\hat\Psi_{2T}^{\theta J}(\theta_0) = O_p(b_T^{2\gamma})$. Since $b_T = o(T^{-1/(4\gamma)})$, we obtain $\Delta\hat\Psi_{jT}^{\theta J}(\theta_0) = o_p(T^{-1/2})$ for $j = 1, 2$, as desired.
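The rate bookkeeping of Step 1 can be checked mechanically when every tuning sequence is a power of $T$. The sketch assumes, purely for illustration, $b_T = T^{-a}$, $h_{yT} = T^{-c_y}$ and $h_{wT} = T^{-c_w}$. It checks the conditions attached to Properties (62)-(64) with $a_T = b_T$ (namely $b_T T^{1/4}h_{yT}h^m_{wT}\to\infty$, $b_T/(T^{1/4}h^R_{yT})\to\infty$ and $b_T/(T^{1/4}h^R_{wT})\to\infty$), the trimming condition $b_T = o(T^{-1/(4\gamma)})$, and whether the two Step 1 rates $T^{-1/4}b_T^\gamma$ and $b_T^{2\gamma}$ are $o(T^{-1/2})$. The values $m = 1$, $R = 6$, $\gamma = 3$ in the usage line are hypothetical choices, not values taken from the paper's assumptions.

```python
from fractions import Fraction as Fr


def step1_rates_ok(a, c_y, c_w, m, R, gamma):
    """Exponent bookkeeping for b_T = T^-a, h_yT = T^-c_y, h_wT = T^-c_w.

    A sequence T^e diverges iff e > 0, and T^e = o(T^d) iff e < d.
    Exact rational arithmetic avoids rounding at the boundaries.
    """
    q = Fr(1, 4)
    conds = {
        # b_T T^{1/4} h_yT h_wT^m -> infinity
        "product": -a + q - c_y - m * c_w > 0,
        # b_T / (T^{1/4} h_yT^R) -> infinity
        "bias_y": -a - q + R * c_y > 0,
        # b_T / (T^{1/4} h_wT^R) -> infinity
        "bias_w": -a - q + R * c_w > 0,
        # b_T = o(T^{-1/(4 gamma)})
        "trimming": -a < -Fr(1, 4) / gamma,
    }
    # Step 1 rates: T^{-1/4} b_T^gamma and b_T^{2 gamma} versus T^{-1/2}
    conds["dPsi_1"] = -q - gamma * a < -Fr(1, 2)
    conds["dPsi_2"] = -2 * gamma * a < -Fr(1, 2)
    return conds


if __name__ == "__main__":
    print(step1_rates_ok(Fr(1, 10), Fr(7, 100), Fr(7, 100), m=1, R=6, gamma=3))
```

In this bookkeeping the trimming condition is exactly what delivers both Step 1 rates, since all three reduce to $a\gamma > 1/4$.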
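STEP 2: We next show that $\nabla_\theta\Psi_T(\theta_0) - \nabla_\theta\Psi_T^*(\theta_0) = o_p(T^{-1/2})$. We have $\nabla_\theta\Psi_T(\theta_0) = \mu_0 + [\nabla_\theta\Psi_T(\theta_0) - \mu_0]$, where $\mu_0 \equiv T^{-1}\sum_{t=1}^T d_t[\mathbb{1}(q_t^0 - Y_t) - \alpha]f^0(q_t^0|W_t)\nabla_\theta q_t^0$ and
$$\begin{aligned}
\nabla_\theta\Psi_T(\theta_0) - \mu_0 &= \frac{1}{T}\sum_{t=1}^T d_t[\mathbb{1}(q_t^0 - Y_t) - \alpha][\hat f(q_t^0|W_t) - f^0(q_t^0|W_t)]\nabla_\theta q_t^0\\
&= \frac{1}{T}\sum_{t=1}^T d_t[\mathbb{1}(q_t^0 - Y_t) - \alpha]\frac{D\hat G(q_t^0,W_t) - f^0(q_t^0|W_t)\hat g(W_t)}{\bar g_T^0(W_t)}\nabla_\theta q_t^0\\
&\quad + \frac{1}{T}\sum_{t=1}^T d_t[\mathbb{1}(q_t^0 - Y_t) - \alpha]\frac{f^0(q_t^0|W_t)}{\hat g(W_t)\bar g_T^0(W_t)}[\hat g(W_t) - \bar g_T^0(W_t)]^2\nabla_\theta q_t^0\\
&\quad - \frac{1}{T}\sum_{t=1}^T d_t[\mathbb{1}(q_t^0 - Y_t) - \alpha]\frac{D\hat G(q_t^0,W_t) - H_{1T}^0(q_t^0,W_t)}{\hat g(W_t)\bar g_T^0(W_t)}[\hat g(W_t) - \bar g_T^0(W_t)]\nabla_\theta q_t^0\\
&\equiv \mu_1 + \mu_2 - \mu_3,
\end{aligned}$$
using Equality (66). Hence, $\nabla_\theta\Psi_T(\theta_0) = \mu_0 + \mu_1 + \mu_2 - \mu_3$. Thus, the proof is complete if $\mu_0 = \nabla_\theta\Psi_T^*(\theta_0) + o_p(T^{-1/2})$ and $\mu_j = o_p(T^{-1/2})$ for $j = 1, 2, 3$, as shown next.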
STEP 2a: We show that $\mu_0 = \nabla_\theta\Psi_T^*(\theta_0) + o_p(T^{-1/2})$. We have
$$\mu_0 = \nabla_\theta\Psi_T^*(\theta_0) - \frac{1}{T}\sum_{t=1}^T(1 - d_t)[\mathbb{1}(q_t^0 - Y_t) - \alpha]f^0(q_t^0|W_t)\nabla_\theta q_t^0.$$
Let $\mu_{02}$ denote the second term on the right-hand side of the above equality. Thus, it suffices to show that $\mu_{02} = o_p(T^{-1/2})$. But, from Step 1 we know that $|\mu_{02}| \le T^{-1}\sum_{t=1}^T(1 - d_t)f^0(q_t^0|W_t)|\nabla_\theta q_t^0| = O_p(b_T^{2\gamma})$. The desired result follows from $b_T = o(T^{-1/(4\gamma)})$.
STEP 2b: Next, we show that $\mu_2 = \mu_3 = o_p(T^{-1/2})$. We have
$$|\sqrt{T}\mu_2| \le \frac{\sqrt{T}}{b_T\inf_{\{w:\bar g_T^0(w)\ge b_T\}}|\hat g(w)|}\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\,\sup_{w\in\mathbb{R}^m}[\hat g(w) - \bar g_T^0(w)]^2\,\frac{1}{T}\sum_{t=1}^T|\nabla_\theta q_t^0|.$$
But Property (64) with $a_T = b_T$ implies $(b_T)^{-1}\sup_w|\hat g(w) - \bar g_T^0(w)| = o_p(T^{-1/4}) = o_p(1)$. Hence, for any $\eta \in (0,1)$ we have $\inf_{\{w:\bar g_T^0(w)/b_T\ge 1\}}|\hat g(w)|/b_T > 1 - \eta$ with probability approaching one. Thus, with probability approaching one,
$$|\sqrt{T}\mu_2| \le \frac{\sqrt{T}}{(b_T)^2(1-\eta)}\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\,\sup_{w\in\mathbb{R}^m}[\hat g(w) - \bar g_T^0(w)]^2\,\frac{1}{T}\sum_{t=1}^T|\nabla_\theta q_t^0|,$$
which is an $o_p(1)$ by Property (64) with $a_T = b_T$, as $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w) < \infty$ and $(1/T)\sum_{t=1}^T|\nabla_\theta q_t^0| = O_p(1)$ as noted earlier. That is, $\mu_2 = o_p(T^{-1/2})$. Similarly,
$$|\sqrt{T}\mu_3| \le \frac{\sqrt{T}}{b_T\inf_{\{w:\bar g_T^0(w)\ge b_T\}}|\hat g(w)|}\sup_{(y,w)\in\mathbb{R}^{m+1}}|D\hat G(y,w) - H_{1T}^0(y,w)|\,\sup_{w\in\mathbb{R}^m}|\hat g(w) - \bar g_T^0(w)|\times\frac{1}{T}\sum_{t=1}^T|\nabla_\theta q_t^0|,$$
which shows that $\mu_3 = o_p(T^{-1/2})$ using the same argument with Property (63).
STEP 2c: Lastly, we show that $\mu_1 = o_p(T^{-1/2})$. Let $K_{0T}(\cdot) \equiv (1/h_{yT})K_0(\cdot/h_{yT})$ and $K_T(\cdot) \equiv (1/h_{wT}^m)K(\cdot/h_{wT})$. Thus, from the definitions of $D\hat G(y,w)$ and $\hat g(w)$ we have
$$\mu_1 = \frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T d_t\,\frac{\mathbb{1}(q_t^0 - Y_t) - \alpha}{\bar g_T^0(W_t)}\left[K_{0T}(q_t^0 - Y_s) - f^0(q_t^0|W_t)\right]K_T(W_t - W_s)\nabla_\theta q_t^0 \equiv L + \frac{T-1}{T}U,$$
where $L$ and $U$ are the diagonal and U-statistic parts defined as
$$L \equiv \frac{1}{T^2}\sum_{t=1}^T d_t\,\frac{\mathbb{1}(q_t^0 - Y_t) - \alpha}{\bar g_T^0(W_t)}\left[K_{0T}(q_t^0 - Y_t) - f^0(q_t^0|W_t)\right]K_T(0)\nabla_\theta q_t^0,$$
$$U \equiv \frac{1}{T(T-1)}\sum_{1\le t\ne s\le T}u_{Tts},$$
$$u_{Tts} = \frac{1}{2}\left(u^0_{Tts} + u^0_{Tst}\right) \equiv h_T(Y_t,W_t,Y_s,W_s),$$
$$u^0_{Tts} = \left[K_{0T}(q_t^0 - Y_s) - f^0(q_t^0|W_t)\right]K_T(W_t - W_s)\,d_t\,\frac{\mathbb{1}(q_t^0 - Y_t) - \alpha}{\bar g_T^0(W_t)}\nabla_\theta q_t^0,$$
$$u^0_{Tst} = \left[K_{0T}(q_s^0 - Y_t) - f^0(q_s^0|W_s)\right]K_T(W_s - W_t)\,d_s\,\frac{\mathbb{1}(q_s^0 - Y_s) - \alpha}{\bar g_T^0(W_s)}\nabla_\theta q_s^0,$$
for $1 \le t \ne s \le T$. Note that $h_T(Y_t,W_t,Y_s,W_s)$ is symmetric in $(Y_t,W_t)$ and $(Y_s,W_s)$. Hence, it suffices to show that $L$ and $U$ are both $o_p(T^{-1/2})$.
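For $L$ we have
$$|\sqrt{T}L| \le \frac{1}{\sqrt{T}\,b_Th_{wT}^m}\left[\frac{1}{h_{yT}}\sup_{y\in\mathbb{R}}|K_0(y)| + \sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w)\right]|K(0)|\,\frac{1}{T}\sum_{t=1}^T|\nabla_\theta q_t^0|,$$
where $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w) < \infty$, $\sup_{y\in\mathbb{R}}|K_0(y)| < \infty$ and $|K(0)| < \infty$ by (A11)(iii) and (A9)(ii). As $(1/T)\sum_{t=1}^T|\nabla_\theta q_t^0| = O_p(1)$ by (A7)(i), (A5) and Markov inequality, we obtain $\sqrt{T}L = o_p(1)$ because $\sqrt{T}\,b_Th_{yT}h_{wT}^m = b_TT^{1/4}h_{yT}h_{wT}^m\,T^{1/4} \to \infty$ using $b_T = b_T(1 + o(1))$.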
It remains to be shown that $U = o_p(T^{-1/2})$. Because of the stationarity assumption (A6’)(i), we have $\bar g_T^0(\cdot) = g_t^0(\cdot) \equiv g^0(\cdot)$. Moreover, from the Hoeffding decomposition (see e.g. Arcones (1995, eq. 1.7)), we have $U = U_0 + 2U_1 + U_2$, where
$$U_0 = \int\!\!\int\!\!\int\!\!\int h_T(y_1,w_1,y_2,w_2)\prod_{t=1}^2\left[f^0(y_t|w_t)g^0(w_t)\,dy_t\,dw_t\right],\tag{67}$$
$$U_1 = \frac{1}{T}\sum_{t=1}^T h_{T1}(Y_t,W_t),\tag{68}$$
$$U_2 = \frac{1}{T(T-1)}\sum_{1\le t\ne s\le T}h_{T2}(Y_t,W_t,Y_s,W_s),\tag{69}$$
$$h_{T1}(y_1,w_1) = \int\!\!\int h_T(y_1,w_1,y_2,w_2)f^0(y_2|w_2)g^0(w_2)\,dy_2\,dw_2 - U_0,\tag{70}$$
$$h_{T2}(y_1,w_1,y_2,w_2) = h_T(y_1,w_1,y_2,w_2) - h_{T1}(y_1,w_1) - h_{T1}(y_2,w_2) - U_0.\tag{71}$$
Note that $U_0 \ne E[U]$, as $\prod_{t=1}^2[f^0(y_t|w_t)g^0(w_t)]$ is not the joint density of $(Y_1,W_1,Y_2,W_2)$, while $h_{T1}(\cdot)$ and $h_{T2}(\cdot)$ are canonical kernels, i.e. symmetric kernels satisfying $E[h_{T1}(Y_1,W_1)] = 0$ and $E[h_{T2}(y_1,w_1,Y_2,W_2)] = 0$, respectively, as noted by Arcones (1995). Thus, it suffices to show that $\sqrt{T}U_k = o_p(1)$ for $k = 0, 1, 2$.
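The decomposition $U = U_0 + 2U_1 + U_2$ defined by (67)-(71) is an algebraic identity that holds for any sample path, dependent or not; what matters is only the reference measure used in the integrals (here $f^0g^0$ in each argument). The sketch below checks this, and the canonical-kernel property, on a finite support where the integrals in (67) and (70) become weighted sums. The support size, the weights `p` and the symmetric kernel matrix `H` are illustrative assumptions standing in for $f^0(y|w)g^0(w)$ and $h_T$.

```python
import numpy as np


def hoeffding_parts(H, p, X):
    """Hoeffding decomposition of a U-statistic on a finite support.

    H : symmetric (K, K) array of kernel values h(i, j)
    p : (K,) reference probabilities (stand-in for f0 * g0 in each argument)
    X : (T,) integer sample path on {0, ..., K-1}; may be serially dependent
    """
    T = len(X)
    off = ~np.eye(T, dtype=bool)                  # pairs with t != s
    U = H[np.ix_(X, X)][off].sum() / (T * (T - 1))
    U0 = p @ H @ p                                # analogue of (67)
    h1 = H @ p - U0                               # analogue of (70)
    h2 = H - h1[:, None] - h1[None, :] - U0       # analogue of (71)
    U1 = h1[X].mean()                             # analogue of (68)
    U2 = h2[np.ix_(X, X)][off].sum() / (T * (T - 1))  # analogue of (69)
    return U, U0, U1, U2, h1, h2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, T = 6, 50
    A = rng.normal(size=(K, K))
    H = A + A.T
    p = rng.random(K)
    p /= p.sum()
    X = np.zeros(T, dtype=int)
    for t in range(1, T):                         # a serially dependent path
        X[t] = (X[t - 1] + rng.integers(0, 2)) % K
    U, U0, U1, U2, h1, h2 = hoeffding_parts(H, p, X)
    print(U, U0 + 2 * U1 + U2)
```

Under dependence the sample path is not drawn from the product measure, so $U_0$ need not equal $E[U]$; the identity itself is unaffected, and the work lies in bounding $U_1$ and $U_2$.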
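STEP 2c(i): We start by showing that $\sqrt{T}U_0 = o_p(1)$. In fact, we have $U_0 = 0$, as Equation (67) gives
$$\begin{aligned}
U_0 &= \int\!\!\int\!\!\int\!\!\int\frac{1}{2}\left(u^0_{T12} + u^0_{T21}\right)\prod_{t=1}^2\left[f^0(y_t|w_t)g^0(w_t)\,dy_t\,dw_t\right]\\
&= \frac{1}{2}\int\!\!\int\left\{\int\left[K_{0T}(q_1^0 - y_2) - f^0(q_1^0|w_1)\right]f^0(y_2|w_2)\,dy_2\right\}\\
&\qquad\times\left\{\int\left[\mathbb{1}(q_1^0 - y_1) - \alpha\right]f^0(y_1|w_1)\,dy_1\right\}K_T(w_1 - w_2)\frac{d_1\nabla_\theta q_1^0}{g^0(w_1)}g^0(w_1)g^0(w_2)\,dw_1\,dw_2\\
&\quad + \frac{1}{2}\int\!\!\int\left\{\int\left[K_{0T}(q_2^0 - y_1) - f^0(q_2^0|w_2)\right]f^0(y_1|w_1)\,dy_1\right\}\\
&\qquad\times\left\{\int\left[\mathbb{1}(q_2^0 - y_2) - \alpha\right]f^0(y_2|w_2)\,dy_2\right\}K_T(w_2 - w_1)\frac{d_2\nabla_\theta q_2^0}{g^0(w_2)}g^0(w_1)g^0(w_2)\,dw_1\,dw_2,
\end{aligned}$$
where $\int[\mathbb{1}(q_t^0 - y_t) - \alpha]f^0(y_t|w_t)\,dy_t = 0$ for any $t$ by assumptions (A1) and (A9)(i).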
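The fact that $U_0 = 0$ rests on $\int[\mathbb{1}(q_t^0 - y) - \alpha]f^0(y|w_t)\,dy = F^0(q_t^0|w_t) - \alpha = 0$ when $q_t^0$ is the conditional $\alpha$-quantile, reading $\mathbb{1}(x)$ as the indicator of $x \ge 0$. A quick numerical illustration, assuming a Gaussian conditional distribution (an illustrative choice, not one made in the paper), integrates the check-function term against the density with a midpoint rule; the function name is hypothetical.

```python
from statistics import NormalDist


def check_moment(alpha, mu=0.0, sigma=1.0, n=200_000, width=12.0):
    """Midpoint-rule approximation of int [1I(q - y) - alpha] f(y) dy,
    where f is the N(mu, sigma^2) density, q its alpha-quantile, and
    1I(x) = 1 if x >= 0 and 0 otherwise."""
    dist = NormalDist(mu, sigma)
    q = dist.inv_cdf(alpha)
    lo = mu - width * sigma
    dy = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        indicator = 1.0 if q - y >= 0 else 0.0
        total += (indicator - alpha) * dist.pdf(y) * dy
    return total


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, check_moment(alpha, mu=1.0, sigma=1.5, n=100_000))
```

The same cancellation removes the $u^0_{T21}$ contribution when $h_{T1}$ is computed in Step 2c(ii).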
STEP 2c(ii): We now show that $\sqrt{T}U_1 = o_p(1)$. By Markov inequality it suffices to show that $E(TU_1^2) = o(1)$. But assumption (A6’) and Lemma 3 in Arcones (1995) with $p = r$ imply
$$E(TU_1^2) = T^{-1}E\left[\Big(\sum_{1\le t\le T}h_{T1}(Y_t,W_t)\Big)^2\right] \le c\left(T^{-1} + T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-2)/r}\right)M_{T1}^2,\tag{72}$$
where $\beta_t$ are the mixing coefficients of $\{(Y_t, W_t')'\}$, $c$ is a universal constant and $M_{T1} = \sup_{1\le t<\infty}[E|h_{T1}(Y_t,W_t)|^r]^{1/r}$. Note that Lemma 3 in Arcones (1995) is written for canonical kernels that are independent of $T$. It is, however, easy to see from his proofs that this lemma and Lemma 8, which is used to prove it, both hold even when canonical kernels depend on $T$, as $h_{T1}(\cdot)$ and $h_{T2}(\cdot)$ do here. From (A6’)(ii) we know that $\sum_{t=1}^\infty\beta_t^{(r-2)/r} < \infty$ (see e.g. White 2001 for the definition of the size); hence $T^{-1} + T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-2)/r} = O(1)$. We now show that $M_{T1} \to 0$. As $U_0 = 0$ and the integral of $u^0_{T21}$ with respect to $f^0(y_2|w_2)g^0(w_2)\,dy_2\,dw_2$ is zero because $\int[\mathbb{1}(q_2^0 - y_2) - \alpha]f^0(y_2|w_2)\,dy_2 = 0$, we have from Equation (70)
$$\begin{aligned}
|h_{T1}(y_1,w_1)| &= \left|\frac{1}{2}\left[\mathbb{1}(q_1^0 - y_1) - \alpha\right]\frac{d_1\nabla_\theta q_1^0}{g^0(w_1)}\int\left\{\int\left[K_{0T}(q_1^0 - y_2) - f^0(q_1^0|w_1)\right]f^0(y_2|w_2)\,dy_2\right\}K_T(w_1 - w_2)g^0(w_2)\,dw_2\right|\\
&\le \frac{|\nabla_\theta q_1^0|}{2b_T}\left|\int\left\{\int K_0(u)\left[f^0(q_1^0 - uh_{yT}|w_2) - f^0(q_1^0|w_1)\right]du\right\}K_T(w_1 - w_2)g^0(w_2)\,dw_2\right|\\
&\le \frac{|\nabla_\theta q_1^0|}{2b_T}\left|\int\left\{\int K_0(u)\left[f^0(q_1^0 - uh_{yT}|w_2) - f^0(q_1^0|w_2)\right]du\right\}K_T(w_1 - w_2)g^0(w_2)\,dw_2\right|\\
&\quad + \frac{|\nabla_\theta q_1^0|}{2b_T}\left|\int\left[f^0(q_1^0|w_2) - f^0(q_1^0|w_1)\right]K_T(w_1 - w_2)g^0(w_2)\,dw_2\right|\\
&\le \frac{|\nabla_\theta q_1^0|}{2b_T}\int\left|\int K_0(u)\left[f^0(q_1^0 - uh_{yT}|w_2) - f^0(q_1^0|w_2)\right]du\right||K_T(w_1 - w_2)|g^0(w_2)\,dw_2\\
&\quad + \frac{|\nabla_\theta q_1^0|}{2b_T}\left|\int\left[f^0(q_1^0|w_2)g^0(w_2) - f^0(q_1^0|w_1)g^0(w_1)\right]K_T(w_1 - w_2)\,dw_2\right|\\
&\quad + \frac{|\nabla_\theta q_1^0|}{2b_T}f^0(q_1^0|w_1)\left|\int\left[g^0(w_2) - g^0(w_1)\right]K_T(w_1 - w_2)\,dw_2\right|\\
&\le \frac{|\nabla_\theta q_1^0|}{b_T}O(h_{yT}^R)\int|K(v)|g^0(w_1 - vh_{wT})\,dv\\
&\quad + \frac{|\nabla_\theta q_1^0|}{2b_T}\left|\int\left[f^0(q_1^0|w_1 - vh_{wT})g^0(w_1 - vh_{wT}) - f^0(q_1^0|w_1)g^0(w_1)\right]K(v)\,dv\right|\\
&\quad + \frac{|\nabla_\theta q_1^0|}{2b_T}f^0(q_1^0|w_1)\left|\int\left[g^0(w_1 - vh_{wT}) - g^0(w_1)\right]K(v)\,dv\right|\\
&\le \frac{|\nabla_\theta q_1^0|}{b_T}\left\{O(h_{yT}^R) + \left[1 + f^0(q_1^0|w_1)\right]O(h_{wT}^R)\right\},
\end{aligned}$$
so
$$|h_{T1}(y_1,w_1)| \le \frac{|\nabla_\theta q_1^0|}{b_T}\left\{O(h_{yT}^R) + O(h_{wT}^R)\right\},\tag{73}$$
where we have used the change of variables $u = (q_1^0 - y_2)/h_{yT}$ and $v = (w_1 - w_2)/h_{wT}$ combined with (A8), (A9)(ii), (A11)(i,iii) and Taylor expansions of order $R$ of the integrands. As $E|\nabla_\theta q_1^0|^r \le \sup_{1\le t\le T,\,T\ge 1}E[\sup_{\theta\in\Theta}|\nabla_\theta q_\alpha(W_t,\theta)|^r] < \infty$ by (A5) and (A7)(i), it follows that $(E|h_{T1}(Y_t,W_t)|^r)^{1/r} \le (1/b_T)\{O(h_{yT}^R) + O(h_{wT}^R)\}$ uniformly in $t$. Hence, $M_{T1} \le (1/b_T)\{O(h_{yT}^R) + O(h_{wT}^R)\}$. Given $b_T = b_T[1 + o(1)]$, $h_{yT}^R = o(b_T)$ and $h_{wT}^R = o(b_T)$, which follow from $b_T/(T^{1/4}h_{yT}^R) \to \infty$ and $b_T/(T^{1/4}h_{wT}^R) \to \infty$ respectively, we have that $M_{T1} = o(1)$. Combining Property (72) and (A6’)(i) then gives $E(TU_1^2) = o(1)$ as desired.
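The $O(h_{yT}^R)$ and $O(h_{wT}^R)$ terms in (73) come from the change of variables followed by an order-$R$ Taylor expansion: for a kernel of order $R$, $\int K_0(u)[f(q - uh) - f(q)]\,du = O(h^R)$. The sketch below illustrates the case $R = 2$ with a Gaussian kernel and a standard normal $f$ (both illustrative assumptions; the paper's kernels may be of higher order). For this pair the smoothed value is the $N(0, 1 + h^2)$ density at $q$, and the bias divided by $h^2$ should approach $f''(q)/2$.

```python
import math


def phi(x):
    """Standard normal density; plays the role of both K0 and f here."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smoothing_bias(q, h, n=20_001, width=10.0):
    """Riemann sum on a uniform grid over [-width, width] of
    int K(u) [f(q - u h) - f(q)] du, with K = f = standard normal density
    (a second-order kernel, R = 2)."""
    du = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        u = -width + i * du
        total += phi(u) * (phi(q - u * h) - phi(q)) * du
    return total


if __name__ == "__main__":
    q = 0.5
    limit = 0.5 * (q * q - 1.0) * phi(q)  # f''(q) / 2 for the standard normal
    for h in (0.4, 0.2, 0.1, 0.05):
        print(h, smoothing_bias(q, h) / h**2, limit)
```

With a kernel of order $R > 2$ the first $R - 1$ moments vanish and the same computation would give a bias of order $h^R$, which is what the bandwidth conditions $h^R = o(b_T)$ are designed to absorb.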
STEP 2c(iii): Finally, we show that $\sqrt{T}U_2 = o_p(1)$. Again, by Markov inequality it suffices to show that $E(TU_2^2) = o(1)$. Similar to the previous case, Assumption (A6’) and Lemma 3 in Arcones (1995) with $p = 2r$ imply
$$E(TU_2^2) = \left(\frac{T}{T-1}\right)^2T^{-3}E\left[\Big(\sum_{1\le t\ne s\le T}h_{T2}(Y_t,W_t,Y_s,W_s)\Big)^2\right] \le c\left(\frac{T}{T-1}\right)^2\left(T^{-1} + T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-1)/r}\right)M_{T2}^2,\tag{74}$$
where $c$ is a universal constant and $M_{T2} = \sup_{1\le t\ne s<\infty}[E|h_{T2}(Y_t,W_t,Y_s,W_s)|^{2r}]^{1/(2r)}$. We now show that $T^{-1} + T^{-1}\sum_{t=1}^{T-1}t\beta_t^{(r-1)/r} = O(1/\sqrt{T})$ and that $M_{T2} = o(T^{1/4})$. The first property is implied by $\sum_{t=1}^\infty t\beta_t^{(r-1)/r} < \infty$, for which it suffices to show that $\sum_{t=\tau}^{2\tau}t\beta_t^{(r-1)/r} \to 0$ as $\tau \to \infty$. As previously, from (A6’)(ii) we know that $\sum_{t=1}^\infty\beta_t^{(r-2)/r} < \infty$; hence $\beta_t^{(r-2)/r}t \to 0$ as $t \to \infty$ and $\beta_t \le t^{r/(2-r)}$ for $t$ large enough. Thus $\sum_{t=\tau}^{2\tau}t\beta_t^{(r-1)/r} \le \sum_{t=\tau}^{2\tau}t^{-1/(r-2)}$, which vanishes when $2 < r < 3$ as assumed. For the second property, we bound $M_{T2}$. From Equations (71), (73) and $U_0 = 0$ we obtain
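$$\begin{aligned}
|h_{T2}(y_1,w_1,y_2,w_2)| &\le |h_T(y_1,w_1,y_2,w_2)| + \frac{|\nabla_\theta q_1^0| + |\nabla_\theta q_2^0|}{b_T}\left\{O(h_{yT}^R) + O(h_{wT}^R)\right\}\\
&\le \frac{|\nabla_\theta q_1^0| + |\nabla_\theta q_2^0|}{b_T}\left\{O(h_{yT}^{-1}h_{wT}^{-m}) + O(h_{yT}^R) + O(h_{wT}^R)\right\},
\end{aligned}$$
where the second inequality follows from the definitions of $u^0_{T12}$ and $u^0_{T21}$, and from $\sup_{y\in\mathbb{R}}|K_0(y)| < \infty$, $\sup_{w\in\mathbb{R}^m}|K(w)| < \infty$ and $\sup_{(y,w)\in\mathbb{R}^{m+1}}f^0(y|w) < \infty$ by (A11)(i,iii) and (A9)(ii). Thus, by Minkowski inequality we obtain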
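$$\begin{aligned}
M_{T2} &\le \sup_{1\le t\ne s<\infty}\left\{[E|\nabla_\theta q_t^0|^{2r}]^{1/(2r)} + [E|\nabla_\theta q_s^0|^{2r}]^{1/(2r)}\right\}\times\left\{O\!\left(\frac{1}{b_Th_{yT}h_{wT}^m}\right) + O\!\left(\frac{h_{yT}^R}{b_T}\right) + O\!\left(\frac{h_{wT}^R}{b_T}\right)\right\}\\
&= O\!\left(\frac{1}{b_Th_{yT}h_{wT}^m}\right) + O\!\left(\frac{h_{yT}^R}{b_T}\right) + O\!\left(\frac{h_{wT}^R}{b_T}\right)
\end{aligned}$$
by (A7)(i). Given $b_T = b_T[1 + o(1)]$, $h_{yT}^R = o(b_T)$, $h_{wT}^R = o(b_T)$ and $b_TT^{1/4}h_{yT}h_{wT}^m \to \infty$, we have $M_{T2} = o(T^{1/4})$ as desired. Thus, Equation (74) implies $E(TU_2^2) = o(1)$. □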
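The tail-sum argument in Step 2c(iii) can be illustrated numerically. If $\beta_t$ decays exactly at the boundary rate $t^{r/(2-r)}$ (an assumption made for illustration), then $t\beta_t^{(r-1)/r} = t^{-1/(r-2)}$, and the dyadic block sums $\sum_{t=\tau}^{2\tau}t^{-1/(r-2)}$ behave like $\tau^{1-1/(r-2)}$, which vanishes exactly when $2 < r < 3$. The sketch computes these block sums for one value of $r$ on each side of 3; the function name is hypothetical.

```python
def block_sum(tau, r):
    """sum_{t=tau}^{2 tau} t * beta_t^{(r-1)/r} with beta_t = t^{r/(2-r)}."""
    total = 0.0
    for t in range(tau, 2 * tau + 1):
        beta = t ** (r / (2.0 - r))
        total += t * beta ** ((r - 1.0) / r)
    return total


if __name__ == "__main__":
    for r in (2.5, 3.5):
        print(r, [round(block_sum(tau, r), 4) for tau in (10, 100, 1000, 10000)])
```

For $r = 2.5$ the terms are $t^{-2}$ and the block sums shrink roughly like $1/(2\tau)$; for $r = 3.5$ the terms are $t^{-2/3}$ and the block sums grow like $\tau^{1/3}$.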
References
Ai, C., and X. Chen (2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, 71, 1795–1843.
Altonji, J., and L. Segal (1996): “Small Sample Bias in GMM Estimation of Covariance Structures,” Journal of Business and Economic Statistics, 14, 353–366.
Andrews, D. W. K. (1994a): “Asymptotics for Semiparametric Econometric Models Via Stochastic Equicontinuity,” Econometrica, 62, 43–72.
Andrews, D. W. K. (1994b): “Empirical Process Methods in Econometrics,” in Handbook of Econometrics, ed. by R. F. Engle, and D. L. McFadden, pp. 2247–2294. Elsevier Science.
Andrews, D. W. K. (1995): “Nonparametric Kernel Estimation for Semiparametric Econometric Models,” Econometric Theory, 11, 560–596.
Angrist, J., V. Chernozhukov, and I. Fernandez-Val (2006): “Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure,” Econometrica, 74, 539–564.
Arcones, M. A. (1995): “On the Central Limit Theorem for U-Statistics under Absolute Regularity,” Statistics and Probability Letters, 24, 245–249.
Begun, J., W. Hall, W. Huang, and J. Wellner (1983): “Information and Asymptotic Efficiency in Parametric-Nonparametric Models,” Annals of Statistics, 11, 432–452.
Bickel, P. J. (1982): “On Adaptive Estimation,” Annals of Statistics, 10, 647–671.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998): Efficient and Adaptive Estimation for Semiparametric Models. Springer-Verlag, New York.
Bierens, H. J. (1983): “Uniform Consistency of Kernel Estimators of a Regression Function Under Generalized Conditions,” Journal of the American Statistical Association, 78, 699–707.
Bierens, H. J. (1987): “Kernel Estimators of Regression Functions,” in Advances in Econometrics: Fifth World Congress, ed. by T. Bewley, vol. 1 of Advances in Econometrics: Fifth World Congress, pp. 99–144. Cambridge University Press, New York.
Bierens, H. J., and D. Ginther (2001): “Integrated Conditional Moment Testing of Quantile Regression Models,” Empirical Economics, 26, 307–324.
Billingsley, P. (1995): Probability and Measure. John Wiley and Sons, New York, 3rd edn.
Bracewell, R. N. (2000): The Fourier Transform and Its Applications. McGraw-Hill, 3rd edn.
Bradley, R. C. (1986): “Basic Properties of Strong Mixing Conditions,” in Dependence in Probability and Statistics, ed. by E. Eberlein, and M. S. Taqqu, pp. 165–192. Birkhauser, Boston.
Brown, B., and W. K. Newey (1998): “Efficient Semiparametric Estimation of Expectations,” Econometrica, 66, 453–464.
Buchinsky, M. (1994): “Changes in the US Wage Structure 1963-1987: Application of Quantile Regression,” Econometrica, 62, 405–458.
Buchinsky, M., and J. Hahn (1998): “An Alternative Estimator for the Censored Quantile Regression Model,” Econometrica, 66, 653–671.
Cai, Z. (2002): “Regression Quantiles for Time Series,” Econometric Theory, 18, 169–192.
Chamberlain, G. (1986): “Asymptotic Efficiency in Semi-Parametric Models with Censoring,” Journal of Econometrics, 32, 189–218.
Chamberlain, G. (1987): “Asymptotic Efficiency in Estimation with Conditional Moment Restrictions,” Journal of Econometrics, 34, 305–334.
Chernozhukov, V., and H. Hong (2002): “Three-Step Censored Quantile Regression and Extramarital Affairs,” Journal of the American Statistical Association, 97, 872–882.
Cosslett, S. R. (2004): “Efficient Semiparametric Estimation of Censored and Truncated Regressions Via a Smoothed Self-Consistency Equation,” Econometrica, 72, 1277–1293.
Gourieroux, C., and A. Monfort (1995): Statistics and Econometric Models. Cambridge University Press.
Gourieroux, C., A. Monfort, and A. Trognon (1984): “Pseudo Maximum Likelihood Methods: Theory,” Econometrica, 52, 681–700.
Hahn, J. (1997): “Efficient Estimation of Panel Data Models with Sequential Moment Restrictions,” Journal of Econometrics, 79, 1–21.
Hansen, B. E. (2004a): “Nonparametric Estimation of Smooth Conditional Distributions,” University of Wisconsin, Madison.
Hansen, B. E. (2004b): “Uniform Convergence Rates for Kernel Estimation,” University of Wisconsin, Madison.
Hansen, L. P. (1982): “Large Sample Properties of Generalized Method of Moment Estimators,” Econometrica, 50, 1029–1054.
Hansen, L. P., J. Heaton, and M. Ogaki (1988): “Efficiency Bounds Implied by Multiperiod Conditional Moment Restrictions,” Journal of the American Statistical Association, 83, 863–871.
Hardle, W., and T. Stoker (1989): “Investigating Smooth Multiple Regression by the Method of Average Derivatives,” Journal of the American Statistical Association, 84, 986–995.
Hjort, N. L., and D. Pollard (1993): “Asymptotics for Minimizers of Convex Processes,” Yale University.
Horowitz, J., and V. G. Spokoiny (2002): “An Adaptive Rate-Optimal Test of Linearity for Median Regression Models,” Journal of the American Statistical Association, 97, 822–835.
Huber, P. J. (1967): “The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions,” in Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability, Berkeley. University of California Press.
Khan, S. (2001): “Two-Stage Rank Estimation of Quantile Index Models,” Journal of Econometrics, 100, 319–335.
Kim, T.-H., and H. White (2003): “Estimation, Inference, and Specification Analysis for Possibly Misspecified Quantile Regression,” in Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, ed. by T. Fomby, and R. C. Hill, pp. 107–132. Elsevier, New York.
Kitamura, Y., G. Tripathi, and H. Ahn (2004): “Empirical Likelihood-Based Inference in Conditional Moment Restriction Models,” Econometrica, 72, 1667–1714.
Knight, K. (1998): “Limiting Distributions for L1-Regression Estimators under General Conditions,” Annals of Statistics, 26, 755–770.
Koenker, R., and G. Bassett, Jr. (1978): “Regression Quantiles,” Econometrica, 46(1), 33–50.
Koenker, R., and G. Bassett, Jr. (1982): “Robust Tests for Heteroscedasticity Based on Regression Quantiles,” Econometrica, 50(1), 43–62.
Koenker, R., and K. F. Hallock (2001): “Quantile Regression,” Journal of Economic Perspectives, 15(4), 143–156.
Koenker, R., and Z. Xiao (2002): “Inference on the Quantile Regression Process,” Econometrica, 70, 1583–1612.
Koenker, R., and Q. Zhao (1996): “Conditional Quantile Estimation and Inference for ARCH Models,” Econometric Theory, 12, 793–813.
Komunjer, I. (2005a): “Asymmetric Power Distribution: Theory and Applications to Risk Measurement,” University of California, San Diego.
Komunjer, I. (2005b): “Quasi-Maximum Likelihood Estimation for Conditional Quantiles,” Journal of Econometrics, 128, 137–164.
Lavergne, P., and Q. Vuong (1996): “Nonparametric Selection of Regressors: The Nonnested Case,” Econometrica, 64, 207–219.
Newey, W., and R. J. Smith (2004): “Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators,” Econometrica, 72, 219–256.
Newey, W. K. (1990a): “Efficient Instrumental Variables Estimation of Nonlinear Models,” Econometrica, 58, 809–837.
Newey, W. K. (1990b): “Semiparametric Efficiency Bounds,” Journal of Applied Econometrics, 5, 99–135.
Newey, W. K. (1993): “Efficient Estimation of Models with Conditional Moment Restrictions,” in Handbook of Statistics, Volume 11: Econometrics, ed. by G. S. Maddala, C. R. Rao, and H. D. Vinod, pp. 419–454. North Holland, Amsterdam.
Newey, W. K. (2004): “Efficient Semiparametric Estimation Via Moment Restrictions,” Econometrica, 72, 1877–1897.
Newey, W. K., and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, ed. by R. F. Engle, and D. L. McFadden, pp. 2113–2247. Elsevier Science.
Newey, W. K., and J. L. Powell (1990): “Efficient Estimation of Linear and Type I Censored Regression Models Under Conditional Quantile Restrictions,” Econometric Theory, 6, 295–317.
Otsu, T. (2003): “Empirical Likelihood for Quantile Regression,” University of Wisconsin, Madison.
Pollard, D. (1991): “Asymptotics for Least Absolute Deviation Regression Estimators,” Econometric Theory, 7, 186–199.
Portnoy, S. (1991): “Behavior of Regression Quantiles in Non-Stationary, Dependent Cases,” Journal of Multivariate Analysis, 38, 100–113.
Powell, J. L. (1984): “Least Absolute Deviations Estimation for the Censored Regression Model,” Journal of Econometrics, 25, 303–325.
Powell, J. L. (1986): “Censored Regression Quantiles,” Journal of Econometrics, 32, 143–155.
Robinson, P. M. (1983): “Nonparametric Estimators for Time Series,” Journal of Time Series Analysis, 4, 185–207.
Robinson, P. M. (1987): “Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form,” Econometrica, 55, 875–891.
Robinson, P. M. (1988): “Root-N Consistent Semiparametric Regression,” Econometrica, 56, 931–954.
Schwartz, L. (1997): Analyse. Hermann, Paris.
Stein, C. (1956): “Efficient Nonparametric Testing and Estimation,” in Proceedings of the Third Berkeley Symposium in Mathematical Statistics and Probability, vol. 1, pp. 187–196, Berkeley. University of California Press.
Stone, C. J. (1980): “Optimal Rates of Convergence for Nonparametric Estimators,” Annals of Statistics, 8, 1348–1360.
Stone, C. J. (1982): “Optimal Global Rates of Convergence for Nonparametric Regression,” Annals of Statistics, 10, 1040–1053.
Truong, Y. K., and C. J. Stone (1992): “Nonparametric Function Estimation Involving Time Series,” Annals of Statistics, 20, 77–97.
White, H. (1982): “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50, 1–25.
White, H. (2001): Asymptotic Theory for Econometricians. Academic Press, San Diego.
Zhao, Q. (2001): “Asymptotically Efficient Median Regression in the Presence of Heteroskedasticity of Unknown Form,” Econometric Theory, 17, 765–784.
Zheng, J. X. (1998): “A Consistent Nonparametric Test of Parametric Regression Models under Conditional Quantile Restrictions,” Econometric Theory, 14, 123–138.