
Math. Proc. Camb. Phil. Soc. (1994), 115, 335
Printed in Great Britain

Optimal choice between parametric and non-parametric bootstrap estimates

BY STEPHEN MAN SING LEE
Statistical Laboratory, University of Cambridge, 16 Mill Lane, Cambridge CB2 1SB

(Received 23 July 1993; revised 12 October 1993)


Abstract
A parametric bootstrap estimate (PB) may be more accurate than its non-parametric version (NB) if the parametric model upon which it is based is, at least approximately, correct. Construction of an optimal estimator based on both PB and NB is pursued with the aim of minimizing the mean squared error. Our approach is to pick an empirical estimate of the optimal tuning parameter $\epsilon \in [0,1]$ which minimizes the mean squared error of $\epsilon\,\mathrm{NB} + (1-\epsilon)\,\mathrm{PB}$. The resulting hybrid estimator is shown to be more reliable than either PB or NB uniformly over a rich class of distributions. Theoretical asymptotic results show that the asymptotic error of this hybrid estimator is quite close in distribution to the smaller of the errors of PB and NB. All these errors typically have the same convergence rate of order $O(n^{-1/2})$. A particular example is also presented to illustrate the fact that this hybrid estimate can indeed be strictly better than either of the pure bootstrap estimates in terms of minimizing mean squared error. Two simulation studies were conducted to verify the theoretical results and demonstrate the good practical performance of the hybrid method.

1. Introduction
The idea of bootstrapping in statistical estimation was put forward by Efron [2]. The bootstrap method can be applied in both parametric and non-parametric fashions. In the context of estimation of a statistical functional $\alpha(F)$ based on a random sample of size $n$ from an unknown distribution $F$, the parametric bootstrap estimate (PB) and the non-parametric bootstrap estimate (NB) amount to replacing $F$ by $F_{\hat{\theta}_n}$ and $F_n$ respectively, where $\hat{\theta}_n$ is some estimate of the index $\theta$ parametrizing a family $\{F_\theta\}$ thought to contain $F$, and $F_n$ is the usual empirical distribution with equal probability placed on each data point.
In the literature PB and NB have been discussed mainly as two quite separate procedures. Their relative performance depends largely on the correctness of our assumed parametric model. The estimate PB is generally more accurate if our proposed model is sufficiently close to $F$, whereas NB is the better choice if the parametric model is far from correct. The question of which one to use in circumstances where knowledge of the underlying distribution is vague remains unanswered. A straightforward criterion to guide our choice between the two bootstrap estimates is necessary.
Hjort [5] looks briefly at an intermediate bootstrap estimate (IB) obtained by replacing $F$ with a mixture distribution $\epsilon F_n + (1-\epsilon) F_{\hat{\theta}_n}$, where $\epsilon \in [0,1]$ is a pseudo-parameter to be determined in some optimal sense. Inspired by Hjort's suggestion we propose a slightly different $\epsilon$-hybrid estimator, defined as
$$T_{n,\epsilon} = \epsilon\,\alpha(F_n) + (1-\epsilon)\,\alpha(F_{\hat{\theta}_n}),$$
where $\epsilon \in [0,1]$ is, in the terminology of [7], a 'tuning parameter'. The asymptotic properties of the $\epsilon$-hybrid estimator are, unlike those of Hjort's approach, easy to study. Moreover, the optimal $\epsilon$ for $T_{n,\epsilon}$ is known explicitly, so that it can be computed directly, whereas Hjort's estimate (IB) generally requires resampling from a class of distributions indexed by $\epsilon$ in order to determine the approximate value of the optimal $\epsilon$.
Section 2 gives a full theoretical account of the asymptotic properties of our $\epsilon$-hybrid estimator. It will be shown that $T_{n,\epsilon}$ combines the merits of PB and NB in a robust way. The tuning parameter $\epsilon$, if picked to minimize the mean squared error of $T_{n,\epsilon}$, results in an optimal hybrid estimator. In practice, $\epsilon$ is determined by minimizing the empirical mean squared error of $T_{n,\epsilon}$. An exact expression for the optimal $\epsilon$ is readily available and its sample version can either be computed directly or approximated via bootstrap resampling. The optimal properties of $T_{n,\epsilon}$ are demonstrated through two simple examples in Section 3. The second example shows that $T_{n,\epsilon}$ can even be strictly better than either PB or NB, uniformly over a rich class of distributions. Simulation results are obtained and closely support our theory. A few open questions are raised in Section 4 about possible generalization of our procedure.

2. Theory of the $\epsilon$-hybrid estimator

2.1. Problem specification. A random sample of $n$ observations
$$\mathscr{X} = \{X_1, X_2, \ldots, X_n\}$$
is drawn from an unknown distribution $F$. We suspect $F$ belongs to a parametric family of distributions,
$$\mathscr{F} = \{F_\theta : \theta \in \Theta\},$$
where $\Theta \subseteq \mathbb{R}^p$ is a non-empty parameter subspace. We wish to estimate a real-valued statistical functional $\alpha(F)$ based on $\mathscr{X}$.
Define $F_n$ to be the empirical distribution of $\mathscr{X}$, that is, the distribution that places an equal mass of $n^{-1}$ on each data point in $\mathscr{X}$. Define $\hat{\theta}_n$ to be the maximum likelihood estimator (MLE) of $\theta$ based on $\mathscr{X}$ assuming $F \in \mathscr{F}$. Then the parametric bootstrap estimator (PB) and non-parametric bootstrap estimator (NB) are respectively $\alpha(F_{\hat{\theta}_n})$ and $\alpha(F_n)$. Define an $\epsilon$-hybrid estimator $T_{n,\epsilon}$ to be a convex combination of PB and NB, namely,
$$T_{n,\epsilon} = \epsilon\,\alpha(F_n) + (1-\epsilon)\,\alpha(F_{\hat{\theta}_n}),$$
for $\epsilon \in [0,1]$.
Suppose now $\epsilon_n(F)$ minimizes the mean squared error (MSE) of $T_{n,\epsilon}$, that is,
$$\mathrm{MSE}_F(T_{n,\epsilon_n(F)}) = \inf_{0 \le \epsilon \le 1} \mathrm{MSE}_F(T_{n,\epsilon}) = \inf_{0 \le \epsilon \le 1} \mathbb{E}_F\{[T_{n,\epsilon} - \alpha(F)]^2\}.$$
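Since $T_{n,\epsilon} - \alpha(F) = \epsilon v_n + (1-\epsilon) w_n$, where $v_n = \alpha(F_n) - \alpha(F)$ and $w_n = \alpha(F_{\hat{\theta}_n}) - \alpha(F)$ as defined in Section 2.4 below, the MSE is a quadratic in $\epsilon$ and the constrained minimizer can be written down in closed form. The following display is a worked restatement of the computation carried out in the proof of Theorem 3:
$$\mathrm{MSE}_F(T_{n,\epsilon}) = \mathbb{E}_F[w_n^2] + 2\epsilon\,\mathbb{E}_F[w_n(v_n - w_n)] + \epsilon^2\,\mathbb{E}_F[(v_n - w_n)^2],$$
so that
$$\epsilon_n(F) = \min\left\{1,\ \max\left\{0,\ \frac{\mathbb{E}_F[w_n(w_n - v_n)]}{\mathbb{E}_F[(w_n - v_n)^2]}\right\}\right\}.$$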

Note that the optimal $\epsilon_n(F)$ depends on the unknown distribution $F$ and is therefore not generally available for evaluation. It may, however, be estimated by its non-parametric bootstrap estimate $\epsilon_n(F_n)$. Write
$$\hat{\epsilon}_n = \epsilon_n(F_n)$$
for simplicity. We shall study the asymptotic properties of both the theoretical optimal $\epsilon$-hybrid estimator $T_{n,\epsilon_n(F)}$ and the empirical optimal $\epsilon$-hybrid estimator $T_{n,\hat{\epsilon}_n}$.
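As a concrete illustration of the procedure just described, the following sketch computes NB, PB and the empirical optimal hybrid for a generic functional, approximating the bootstrap moments that define $\hat{\epsilon}_n$ by Monte Carlo resampling. It is a minimal sketch, not the paper's software: the callables alpha, alpha_param, fit_mle and rvs are assumptions supplied by the user, and B is the number of bootstrap resamples.

    import numpy as np

    def hybrid_estimate(x, alpha, alpha_param, fit_mle, rvs, B=2000, rng=None):
        # x: data sample.  alpha(sample): functional at the empirical distribution.
        # alpha_param(theta): closed form for alpha(F_theta).  fit_mle(sample): MLE.
        # rvs(theta, n): sample from F_theta (unused here but kept for generality).
        rng = np.random.default_rng(rng)
        n = len(x)
        theta_hat = fit_mle(x)
        nb = alpha(x)                      # non-parametric bootstrap estimate
        pb = alpha_param(theta_hat)        # parametric bootstrap estimate
        # Bootstrap errors of NB and PB as estimates of alpha(F_n):
        e_nb = np.empty(B)
        e_pb = np.empty(B)
        for b in range(B):
            xs = rng.choice(x, size=n, replace=True)   # resample X* from F_n
            e_nb[b] = alpha(xs) - nb                   # alpha(F_n*) - alpha(F_n)
            e_pb[b] = alpha_param(fit_mle(xs)) - nb    # alpha(F_{theta*}) - alpha(F_n)
        # Quadratic MSE in eps is minimized at a moment ratio, clamped to [0,1]:
        num = np.mean(e_pb * (e_pb - e_nb))
        den = np.mean((e_pb - e_nb) ** 2)
        eps = min(1.0, max(0.0, num / den))
        return eps * nb + (1 - eps) * pb, eps

For the variance functional with an exponential model, as in Example 1 below, one would take alpha = lambda s: s.var(), alpha_param = lambda t: 1/t**2 and fit_mle = lambda s: 1/s.mean().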
2.2. Notation and assumptions. Before laying down the basic assumptions underlying the theoretical results to be derived later, we first introduce some notation:
1. $\|\cdot\|$ is the usual Euclidean norm defined on the space of real matrices, so that
$$\|A\| = \left(\sum_{i=1}^p \sum_{j=1}^q a_{ij}^2\right)^{1/2},$$
where $A = (a_{ij})$ is any $p \times q$ matrix. Thus, in particular,
$$\|\mathbf{x}\| = \left(\sum_{i=1}^p x_i^2\right)^{1/2} \quad \forall\, \mathbf{x} \in \mathbb{R}^p.$$

2. For any differentiable function $g: \Omega \to \mathbb{R}$ with $\Omega \subseteq \mathbb{R}^p$, write $\nabla g = (\partial g/\partial x_1, \ldots, \partial g/\partial x_p)^T$.
3. For any twice differentiable function $g: \Omega \to \mathbb{R}$ with $\Omega \subseteq \mathbb{R}^p$, let $\mathscr{H}g$ denote its Hessian matrix, that is,
$$\mathscr{H}g(x_1, \ldots, x_p) = \left(\frac{\partial^2 g}{\partial x_j\,\partial x_k}\right)_{j,k=1}^p = \begin{pmatrix} \dfrac{\partial^2 g}{\partial x_1^2} & \cdots & \dfrac{\partial^2 g}{\partial x_1\,\partial x_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 g}{\partial x_p\,\partial x_1} & \cdots & \dfrac{\partial^2 g}{\partial x_p^2} \end{pmatrix}.$$

4. Let $Y_n$ be a random variable such that along almost all sample sequences $X_1, X_2, \ldots$, the conditional distribution of $Y_n$ given $(X_1, \ldots, X_n)$ converges weakly to a limit distribution $G$ corresponding to a random variable $Y$ say. Then we write
$$Y_n \,|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} Y \ (\text{or } G) \quad \text{a.s.},$$
where 'a.s.' stands for 'almost surely' as usual.
5. Similarly, if $Y_n$ converges in conditional probability to $Y$ given $(X_1, \ldots, X_n)$ along almost all random sequences $X_1, X_2, \ldots$, which means given any $\varepsilon > 0$,
$$P[\,|Y_n - Y| > \varepsilon \,|\, X_1, \ldots, X_n] \to 0 \quad \text{a.s.},$$
we then write $Y_n \,|\, X_1, \ldots, X_n \xrightarrow{P} Y$ a.s.


6. If $Y_n$ is uniformly integrable with respect to the conditional distribution given $(X_1, \ldots, X_n)$ along almost all random sequences $X_1, X_2, \ldots$, that is,
$$\lim_{c \to \infty} \limsup_{n} \mathbb{E}[\,|Y_n|\,1\{|Y_n| > c\} \,|\, X_1, \ldots, X_n] = 0 \quad \text{a.s.},$$
where $1\{E\}$ is the indicator function of the event $E$, then we write
$$Y_n \,|\, X_1, \ldots, X_n \ \text{is u.i. a.s.}$$
7. A superscript '$*$' is used to indicate the bootstrap version of any statistic based on a bootstrap resample $\mathscr{X}^*$ drawn randomly from $\mathscr{X}$ with replacement. For example, the empirical distribution of $\mathscr{X}^*$ is denoted by $F_n^*$.
We now make the following assumptions:
(A 1) For all distribution functions $G, H$,
$$\alpha(H) = \alpha(G) + \int \psi_G(x)\,dH(x) + \mathrm{Rem}(H, G),$$
with $\int \psi_G(x)\,dG(x) = 0$.
(A 2) Under the true distribution $F$, $\psi_F$ as determined in (A 1) satisfies $\mathbb{E}_F[\psi_F^2(X)] < \infty$.
(A 3) The equation
$$\mathbb{E}_F[\nabla \log f(X; \theta)] = 0$$
has a unique solution $\theta_0$ for $\theta \in \Theta$, where $f(x; \theta)$ is the density function corresponding to the distribution $F_\theta$, so that $F_{\theta_0}$ is the member of $\mathscr{F}$ 'closest' to the true underlying distribution $F$.
(A 4) There exists a real-valued function $H$, defined on $\mathbb{R}$ with $\mathbb{E}_F[H(X)] < \infty$, such that for all $x \in \mathbb{R}$ and $i, j, k = 1, 2, \ldots, p$,
$$\left|\frac{\partial^3 \log f(x; \theta)}{\partial \theta_i\,\partial \theta_j\,\partial \theta_k}\right| \le H(x)$$
for all $\theta$ in a neighbourhood of $\theta_0$.
(A 5) $\mathrm{Var}_F[\nabla \log f(X; \theta)|_{\theta_0}]$ and $\mathbb{E}_F[\mathscr{H} \log f(X; \theta)|_{\theta_0}]$ exist, with the latter being negative definite.
(A 6) Define a function $\beta: \Theta \to \mathbb{R}$ by $\beta(\theta) = \alpha(F_\theta)$. Then $\beta$ is twice differentiable and $\|\mathscr{H}\beta\|$ is bounded in a neighbourhood of $\theta_0$.
(A 7) $\sqrt{n}\,\mathrm{Rem}(F_n, F) \xrightarrow{P} 0$ and $\sqrt{n}\,\mathrm{Rem}(F_n^*, F_n) \,|\, X_1, \ldots, X_n \xrightarrow{P} 0$ a.s.
(A 8) Almost sure convergence holds as follows:
$$\frac{1}{n}\sum_{i=1}^n [\psi_{F_n}(X_i) - \psi_F(X_i)]^2 \to 0 \quad \text{a.s.},$$
$$\frac{1}{n}\sum_{i=1}^n \left\|\nabla \log f(X_i; \theta)|_{\hat{\theta}_n} - \nabla \log f(X_i; \theta)|_{\theta_0}\right\|^2 \to 0 \quad \text{a.s.},$$
and
$$\frac{1}{n}\sum_{i=1}^n \left[\frac{\partial^2}{\partial \theta_j\,\partial \theta_k} \log f(X_i; \theta)\Big|_{\hat{\theta}_n} - \frac{\partial^2}{\partial \theta_j\,\partial \theta_k} \log f(X_i; \theta)\Big|_{\theta_0}\right] \to 0 \quad \text{a.s.},$$
for all $j, k = 1, \ldots, p$.
(A 9) The quantities $n[\alpha(F_n) - \alpha(F)]^2$ and $n[\alpha(F_{\hat{\theta}_n}) - \alpha(F_{\theta_0})]^2$ are u.i. with respect to $F$. Also, for any given sample sequence $X_1, X_2, \ldots$,
$$n[\alpha(F_n^*) - \alpha(F_n)]^2 \,|\, X_1, \ldots, X_n \quad \text{and} \quad n[\alpha(F_{\hat{\theta}_n^*}) - \alpha(F_{\hat{\theta}_n})]^2 \,|\, X_1, \ldots, X_n$$
are u.i. a.s.
Roughly speaking, the above assumptions hold in general if:
1. $F$ has finite moments of sufficiently high orders;
2. all the distributions $F_\theta \in \mathscr{F}$ are regular and $\mathscr{F}$ is parametrized by $\theta$ in some smooth manner (Serfling [8] provides a thorough account of the regularity conditions required for the theory of parametric maximum likelihood estimation); and
3. the functional to be estimated, $\alpha(\cdot)$, is Hadamard differentiable with a well-behaved influence function $\psi$. See [4] for a definition of Hadamard differentiability.
2.3. Preliminary results. Two lemmas are presented here to establish convergence of bootstrap statistics conditional on a given sample of observations $X_1, \ldots, X_n$. They are analogous to the Central Limit Theorem and Weak Law of Large Numbers for ordinary random variables.
LEMMA A. Suppose $X_1, X_2, \ldots$ is a sequence of independent multivariate random variables drawn from a distribution $G$ on $\mathbb{R}^p$. Let $q: \mathbb{R}^p \times \mathscr{G} \to \mathbb{R}^k$ be a statistical functional such that
$$\mathbb{E}_G[\|q(X, G)\|^2] < \infty,$$
where $\mathscr{G}$ is some convex class of distributions containing $G$ and all point masses. Let $G_n$ be the empirical distribution of $X_1, X_2, \ldots, X_n$. Given $X_1, \ldots, X_n$, let $(X_1^*, \ldots, X_n^*)$ be a random sample drawn from $G_n$. Then along almost all sample sequences $X_1, X_2, \ldots$, as $n \to \infty$,
$$(\mathrm{i})\quad \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{q(X_i^*, G) - \frac{1}{n}\sum_{j=1}^n q(X_j, G)\right\} \,\Big|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} N_k(0, \mathrm{Var}_G[q(X, G)]).$$
Furthermore, if
$$\frac{1}{n}\sum_{i=1}^n \|q(X_i, G_n) - q(X_i, G)\|^2 \to 0 \quad \text{a.s.},$$
then
$$(\mathrm{ii})\quad \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{q(X_i^*, G_n) - \frac{1}{n}\sum_{j=1}^n q(X_j, G_n)\right\} \,\Big|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} N_k(0, \mathrm{Var}_G[q(X, G)]),$$
along almost all sample sequences $X_1, X_2, \ldots$.
Note that the additional assumption required for result (ii) is usually satisfied by functionals $q(x, G)$ sufficiently smooth with respect to $G$ under moment conditions on $X$.
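The conditional central limit behaviour in Lemma A is easy to visualize by simulation. The following sketch (a toy illustration of mine, not from the paper) takes $q(x, G) = x$ in dimension $k = 1$, holds one sample fixed, and checks that the normalized bootstrap means are approximately $N(0, \mathrm{Var}_G[X])$.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    x = rng.exponential(scale=1.0, size=n)        # one fixed sample from G = Exp(1)

    # Conditional distribution of the normalized bootstrap mean, given the sample:
    B = 20000
    xs = rng.choice(x, size=(B, n), replace=True)  # B resamples from G_n
    t = np.sqrt(n) * (xs.mean(axis=1) - x.mean())  # statistic in Lemma A(i), q(x,G)=x

    print(t.mean(), t.var())   # close to 0 and Var_G[X] = 1 for large n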
Sketch proof of Lemma A. The first result (i) can be derived easily from the proof of Theorem 2.1 in [1]. That paper also suggests the second result (ii) for a scalar functional $q$. For a general $k$-dimensional setting, we can easily show by Chebyshev's inequality that the difference between the statistic in (ii) and the statistic in (i),
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{q(X_i^*, G_n) - \frac{1}{n}\sum_{j=1}^n q(X_j, G_n)\right\} - \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{q(X_i^*, G) - \frac{1}{n}\sum_{j=1}^n q(X_j, G)\right\} \,\Big|\, X_1, \ldots, X_n \xrightarrow{P} 0 \quad \text{a.s.}$$
By the previous result (i), the second term in the above difference converges in conditional distribution to $N_k(0, \mathrm{Var}_G[q(X, G)])$, and hence result (ii) follows. ∎
LEMMA B. Suppose $X_1, X_2, \ldots$ is a sequence of independent random variables drawn from $G$ on $\mathbb{R}$. Suppose
$$\mathbb{E}_G[|X|] < \infty.$$
Let $X_1^*, \ldots, X_n^*$ be a random sample drawn from $G_n$, the empirical distribution based on $(X_1, \ldots, X_n)$. Let $\tau: \mathbb{R} \times \mathscr{G} \to \mathbb{R}$ be a statistical functional, where $\mathscr{G}$ is a convex class of distributions containing $G$ and all point masses. Suppose $\tau$ satisfies
$$\mathbb{E}_G[|\tau(X, G)|] < \infty.$$
Then we have
$$(\mathrm{i})\quad \frac{1}{n}\sum_{i=1}^n \tau(X_i^*, G) \,\Big|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_G[\tau(X, G)] \quad \text{a.s.}$$
Further, if
$$\frac{1}{n}\sum_{i=1}^n |\tau(X_i, G_n) - \tau(X_i, G)| \to 0 \quad \text{a.s.},$$
then
$$(\mathrm{ii})\quad \frac{1}{n}\sum_{i=1}^n \tau(X_i^*, G_n) \,\Big|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_G[\tau(X, G)] \quad \text{a.s.}$$
As in Lemma A, the additional condition stated for Lemma B (ii) is usually satisfied by sufficiently smooth functionals $\tau(x, G)$ under moment conditions on $X$.
Sketch proof of Lemma B. Let $\Gamma_1$ be the set of distributions $H$ satisfying the condition $\int |x|\,dH(x) < \infty$. Define a metric $d_1$ on $\Gamma_1$ such that $d_1(H_1, H_2)$ is the infimum of $\mathbb{E}|X - Y|$ where $X, Y$ have marginal distributions $H_1, H_2$ respectively. Without ambiguity we can write $d_1(Y, Z) = d_1(H_1, H_2)$, where $Y \sim H_1$ and $Z \sim H_2$. Let $H^{(n)}$ be the distribution of $n^{-1}\sum_{i=1}^n Y_i$ with $Y_1, \ldots, Y_n$ independently distributed from $H$. Then for all $H_1, H_2$ in $\Gamma_1$,
$$d_1(H_1^{(n)}, H_2^{(n)}) \le d_1(H_1, H_2).$$
Also, we have $d_1(H_n, H) \to 0$ as $n \to \infty$ if and only if $H_n \to H$ weakly and
$$\int |x|\,dH_n(x) \to \int |x|\,dH(x).$$
The details of $d_1$ can be found in [1]. Now take $H$ as the distribution of $\tau(X_1, G)$ with $X_1, X_2, \ldots$ independently distributed from $G$. Let $H_n$ be the conditional distribution of $\tau(X_1^*, G)$ with the $X_i^*$'s independently distributed from $G_n$. Clearly $H, H_n \in \Gamma_1$ a.s. We can deduce from the Strong Law of Large Numbers (SLLN) that
$$d_1(H_n^{(n)}, H^{(n)}) \le d_1(H_n, H) \to 0 \quad \text{a.s.}$$
On the other hand, SLLN and a slightly modified version of the Dominated Convergence Theorem imply the result
$$d_1(H^{(n)}, \mathbb{E}_G\tau(X, G)) \to 0.$$
Thus
$$d_1(H_n^{(n)}, \mathbb{E}_G\tau(X, G)) \le d_1(H_n^{(n)}, H^{(n)}) + d_1(H^{(n)}, \mathbb{E}_G\tau(X, G)) \to 0 \quad \text{a.s.},$$
and $H_n^{(n)} \to \mathbb{E}_G\tau(X, G)$ weakly a.s. So result (i) follows.
To prove (ii), we let $J_n$ be the conditional distribution of $\tau(X^*, G_n)$ with $X^* \sim G_n$. It is clear that $J_n \in \Gamma_1$ a.s. Note that SLLN and the assumption
$$\frac{1}{n}\sum_{i=1}^n |\tau(X_i, G_n) - \tau(X_i, G)| \to 0 \quad \text{a.s.}$$
together imply
$$\int |x|\,dJ_n(x) \to \mathbb{E}_G|\tau(X, G)| \quad \text{a.s.} \qquad (\dagger)$$
Also, Chebyshev's inequality shows that
$$\tau(X^*, G_n) - \tau(X^*, G) \,|\, X_1, \ldots, X_n \xrightarrow{P} 0 \quad \text{a.s.}$$
But we know that $\tau(X^*, G) \,|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} \tau(X, G)$ a.s., using results from the proof of (i). Therefore, by Slutsky's Theorem,
$$\tau(X^*, G_n) \xrightarrow{\mathscr{D}} \tau(X, G)$$
conditional on $X_1, \ldots, X_n$ a.s. Hence
$$J_n \to H \ \text{weakly a.s.} \qquad (\ddagger)$$
From $(\dagger)$ and $(\ddagger)$, we have
$$d_1(J_n, H) \to 0 \quad \text{a.s.}$$
Therefore
$$d_1(J_n^{(n)}, \mathbb{E}_G\tau(X, G)) \le d_1(J_n^{(n)}, H^{(n)}) + d_1(H^{(n)}, \mathbb{E}_G\tau(X, G)) \le d_1(J_n, H) + d_1(H^{(n)}, \mathbb{E}_G\tau(X, G)) \to 0 \quad \text{a.s.}$$
So result (ii) follows. ∎
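The metric $d_1$ used above is the Mallows (Wasserstein-1) distance, and its behaviour on empirical distributions can be checked numerically. Below is a small illustration of mine (not from the paper) using SciPy's one-dimensional Wasserstein distance, which for sample inputs computes exactly the $d_1$ distance between the two empirical distributions.

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(1)
    big = rng.normal(size=10**6)                 # proxy for the limit distribution G
    for n in (10, 100, 1000, 10000):
        gn = rng.normal(size=n)                  # sample defining G_n
        print(n, wasserstein_distance(gn, big))  # d_1(G_n, G) -> 0 as n grows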
2.4. Asymptotic properties of $\epsilon$-hybrid estimators. In this section we shall develop the asymptotic properties of $\hat{\epsilon}_n$ and $T_{n,\hat{\epsilon}_n}$, as well as of the theoretical optimal estimator $T_{n,\epsilon_n(F)}$. For simplicity, we write $\epsilon_n = \epsilon_n(F)$, $\hat{\epsilon}_n = \epsilon_n(F_n)$, $v_n = \alpha(F_n) - \alpha(F)$, $w_n = \alpha(F_{\hat{\theta}_n}) - \alpha(F)$, $v_n^* = \alpha(F_n^*) - \alpha(F_n)$, $w_n^* = \alpha(F_{\hat{\theta}_n^*}) - \alpha(F_{\hat{\theta}_n})$, and $\gamma = \alpha(F_{\theta_0}) - \alpha(F)$.
2.4.1. Asymptotic joint distribution of pure bootstrap estimates. First we study the joint distribution of the non-parametric bootstrap estimate $\alpha(F_n)$ and the parametric bootstrap estimate $\alpha(F_{\hat{\theta}_n})$. By (A 1) we have
$$\sqrt{n}\binom{v_n}{w_n} = \sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_F(X_i) + \mathrm{Rem}(F_n, F)}{\beta(\hat{\theta}_n) - \beta(\theta_0) + \gamma},$$
which represents the error of the pure bootstrap estimates for $\alpha(F)$. Here $\beta$ and $\theta_0$ are defined as in (A 3) and (A 6). We shall first prove the following theorem.
THEOREM 1. The joint distribution of
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \psi_F(X_i),\; \beta(\hat{\theta}_n) - \beta(\theta_0)\right)^T$$
is asymptotically bivariate normal with zero mean and variance covariance matrix
$$\Sigma_F = \begin{pmatrix} \mathbb{E}_F[\psi_F^2(X)] & \rho_F^T J_F(\theta_0)^{-1} \nabla\beta|_{\theta_0} \\ \nabla\beta|_{\theta_0}^T J_F(\theta_0)^{-1} \rho_F & \nabla\beta|_{\theta_0}^T J_F(\theta_0)^{-1} K_F(\theta_0) J_F(\theta_0)^{-1} \nabla\beta|_{\theta_0} \end{pmatrix},$$
where $\rho_F = \mathbb{E}_F[\psi_F(X)\,\nabla \log f(X; \theta)|_{\theta_0}]$, $J_F(\theta_0) = -\mathbb{E}_F[\mathscr{H} \log f(X; \theta)|_{\theta_0}]$, and $K_F(\theta_0) = \mathrm{Var}_F[\nabla \log f(X; \theta)|_{\theta_0}]$.


Proof of Theorem 1. By assumption (A 4), there exists a neighbourhood $N(\theta_0)$ of $\theta_0$ such that for all $\mathscr{X}$ and $\boldsymbol{\lambda}, \theta \in N(\theta_0)$, we have
$$A_n(\mathscr{X}, \boldsymbol{\lambda}) = A_n(\mathscr{X}, \theta) + \left[B_n(\mathscr{X}, \theta) + \tfrac{1}{2} C_n(\mathscr{X})\,W_n(\mathscr{X}; \boldsymbol{\lambda}, \theta)\right](\boldsymbol{\lambda} - \theta), \qquad (1)$$
where
$$A_n(\mathscr{X}, \theta) = \frac{1}{n}\sum_{i=1}^n \nabla \log f(X_i; \theta), \qquad B_n(\mathscr{X}, \theta) = \frac{1}{n}\sum_{i=1}^n \mathscr{H} \log f(X_i; \theta),$$
$$C_n(\mathscr{X}) = \frac{1}{n}\sum_{i=1}^n H(X_i) \quad \text{as defined in (A 4)},$$
and $W_n(\mathscr{X}; \boldsymbol{\lambda}, \theta)$ is a $p \times p$ matrix satisfying
$$\{W_n(\mathscr{X}; \boldsymbol{\lambda}, \theta)\}_{jk} = \sum_{l=1}^p (\lambda_l - \theta_l)\,\xi_{jkl}(\mathscr{X}; \boldsymbol{\lambda}, \theta)$$
for some $|\xi_{jkl}(\mathscr{X}; \boldsymbol{\lambda}, \theta)| \le 1$ and $j, k, l = 1, 2, \ldots, p$. Take $\theta = \theta_0$. Then SLLN and assumptions (A 3), (A 4) and (A 5) imply
$$A_n(\mathscr{X}, \theta_0) = \frac{1}{n}\sum_{i=1}^n \nabla \log f(X_i; \theta)|_{\theta_0} \to \mathbb{E}_F \nabla \log f(X; \theta)|_{\theta_0} = 0 \quad \text{a.s.},$$
$$B_n(\mathscr{X}, \theta_0) = \frac{1}{n}\sum_{i=1}^n \mathscr{H} \log f(X_i; \theta)|_{\theta_0} \to \mathbb{E}_F \mathscr{H} \log f(X; \theta)|_{\theta_0} = -J_F(\theta_0) \quad \text{a.s.},$$
and
$$C_n(\mathscr{X}) = \frac{1}{n}\sum_{i=1}^n H(X_i) \to \mathbb{E}_F H(X) \quad \text{a.s.}$$
Thus, with probability 1 and for large enough $n$, there exists a sequence $\hat{\theta}_n$ converging to $\theta_0$ such that $A_n(\mathscr{X}, \hat{\theta}_n) = 0$; see [8, 4.2.2, pp. 144-9]. Then we have
$$\|W_n(\mathscr{X}; \hat{\theta}_n, \theta_0)\| \le p^{3/2}\,\|\hat{\theta}_n - \theta_0\| \to 0 \quad \text{a.s. as } n \to \infty.$$
Therefore, for sufficiently large $n$, we have from (1) that
$$\hat{\theta}_n - \theta_0 = Q_n(\mathscr{X}; \hat{\theta}_n, \theta_0)\,A_n(\mathscr{X}, \theta_0) \quad \text{a.s.,} \qquad (2)$$
where
$$Q_n(\mathscr{X}; \hat{\theta}_n, \theta_0) = -\left(B_n(\mathscr{X}, \theta_0) + \tfrac{1}{2} C_n(\mathscr{X})\,W_n(\mathscr{X}; \hat{\theta}_n, \theta_0)\right)^{-1} \to J_F(\theta_0)^{-1} \quad \text{a.s.},$$
since $J_F(\theta_0)$ is positive definite by (A 5). Consider the Taylor expansion for $\beta(\theta)$, which is valid by assumption (A 6),
$$\beta(\hat{\theta}_n) - \beta(\theta_0) = \nabla\beta|_{\theta_0}^T (\hat{\theta}_n - \theta_0) + \tfrac{1}{2}(\hat{\theta}_n - \theta_0)^T\,\mathscr{H}\beta|_{\theta_1}\,(\hat{\theta}_n - \theta_0)$$
for some $\theta_1$ with $\|\theta_1 - \theta_0\| \le \|\hat{\theta}_n - \theta_0\|$. Without ambiguity we hereby drop the arguments of $Q_n(\mathscr{X}; \hat{\theta}_n, \theta_0)$ and $A_n(\mathscr{X}, \theta_0)$ for simplicity. So
$$\beta(\hat{\theta}_n) - \beta(\theta_0) = S_n^T A_n \quad \text{a.s.},$$
where
$$S_n^T = \nabla\beta|_{\theta_0}^T Q_n + \tfrac{1}{2} A_n^T Q_n^T\,\mathscr{H}\beta|_{\theta_1}\,Q_n \to \nabla\beta|_{\theta_0}^T J_F(\theta_0)^{-1} \quad \text{a.s.},$$
since $\|\mathscr{H}\beta|_{\theta_1}\|$ is bounded in $N(\theta_0)$ by (A 6).
We know by the Central Limit Theorem (CLT) that
$$\sqrt{n}\,\frac{1}{n}\sum_{i=1}^n \psi_F(X_i) \xrightarrow{\mathscr{D}} N(0, \mathbb{E}_F[\psi_F^2(X)])$$
and
$$\sqrt{n}\,A_n(\mathscr{X}, \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left\{\nabla \log f(X_i; \theta)|_{\theta_0} - \mathbb{E}_F \nabla \log f(X; \theta)|_{\theta_0}\right\} \xrightarrow{\mathscr{D}} N_p(0, K_F(\theta_0)),$$
where $K_F(\theta_0) = \mathrm{Var}_F[\nabla \log f(X; \theta)|_{\theta_0}]$ exists by (A 5). Therefore
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \psi_F(X_i),\; \beta(\hat{\theta}_n) - \beta(\theta_0)\right)^T \xrightarrow{\mathscr{D}} N_2(0, \Sigma_F), \qquad (3)$$
where $\Sigma_F$ is defined in the statement of the theorem. ∎
Applying Theorem 1 to the bivariate error of the pure bootstrap estimates, we conclude
$$\sqrt{n}\binom{v_n}{w_n} = \sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_F(X_i)}{\beta(\hat{\theta}_n) - \beta(\theta_0)} + \sqrt{n}\binom{\mathrm{Rem}(F_n, F)}{\gamma}, \quad \text{with} \quad \sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_F(X_i)}{\beta(\hat{\theta}_n) - \beta(\theta_0)} \xrightarrow{\mathscr{D}} N_2(0, \Sigma_F). \qquad (4)$$
The exact asymptotic behaviour of the bivariate error depends on the correctness of our proposed parametric model, as well as on the remainder function $\mathrm{Rem}(F_n, F)$ (which depends on the functional $\alpha$).
2.4.2. Asymptotic joint bootstrap distribution of pure bootstrap estimates. Next we consider the bootstrap resampled version of the bivariate error of the pure bootstrap estimates, namely
$$\sqrt{n}\binom{\alpha(F_n^*) - \alpha(F_n)}{\alpha(F_{\hat{\theta}_n^*}) - \alpha(F_n)} = \sqrt{n}\binom{v_n^*}{w_n^* - v_n + w_n}.$$
Note that
$$\sqrt{n}\binom{v_n^*}{w_n^* - v_n + w_n} = \sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_{F_n}(X_i^*)}{\beta(\hat{\theta}_n^*) - \beta(\hat{\theta}_n)} + \sqrt{n}\binom{\mathrm{Rem}(F_n^*, F_n)}{w_n - v_n}.$$
A similar argument to that used in Section 2.4.1 yields the bootstrap version of Theorem 1, given as follows.
THEOREM 2. The joint bootstrap distribution of
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \psi_{F_n}(X_i^*),\; \beta(\hat{\theta}_n^*) - \beta(\hat{\theta}_n)\right)^T,$$
conditional on $\mathscr{X}$, converges weakly to a bivariate normal distribution with zero mean and variance covariance matrix $\Sigma_F$ given in Theorem 1, along almost all sample sequences $X_1, X_2, \ldots$.
Proof of Theorem 2. The proof follows closely that of Theorem 1. Take $\theta = \hat{\theta}_n$ in (1), which is valid because $\hat{\theta}_n \in N(\theta_0)$ a.s. for sufficiently large $n$. Also, replace $\mathscr{X}$ by $\mathscr{X}^*$. Thus
$$A_n(\mathscr{X}^*, \boldsymbol{\lambda}) = A_n(\mathscr{X}^*, \hat{\theta}_n) + \left[B_n(\mathscr{X}^*, \hat{\theta}_n) + \tfrac{1}{2} C_n(\mathscr{X}^*)\,W_n(\mathscr{X}^*; \boldsymbol{\lambda}, \hat{\theta}_n)\right](\boldsymbol{\lambda} - \hat{\theta}_n)$$
for all $\boldsymbol{\lambda} \in N(\theta_0)$. Since $\hat{\theta}_n$ can be seen as a functional of $F_n$, applying Lemma B (ii) together with (A 5) and (A 8), we get
$$A_n(\mathscr{X}^*, \hat{\theta}_n) \,|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_F \nabla \log f(X; \theta)|_{\theta_0} = 0 \quad \text{a.s.}$$
and
$$B_n(\mathscr{X}^*, \hat{\theta}_n) \,|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_F \mathscr{H} \log f(X; \theta)|_{\theta_0} = -J_F(\theta_0) \quad \text{a.s.}$$
Moreover, using Lemma B (i) with (A 4), we have
$$C_n(\mathscr{X}^*) \,|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_F H(X) \quad \text{a.s.}$$
All these convergences, together with the fact that $W_n(\mathscr{X}^*; \boldsymbol{\lambda}, \hat{\theta}_n) = O(\|\boldsymbol{\lambda} - \hat{\theta}_n\|)$ a.s., imply that there exists a sequence $\theta_n^*$ such that
$$\theta_n^* \,|\, X_1, \ldots, X_n \xrightarrow{P} \theta_0 \quad \text{a.s.} \quad \text{and} \quad P[A_n(\mathscr{X}^*, \theta_n^*) = 0 \,|\, X_1, \ldots, X_n] \to 1 \quad \text{a.s.},$$
or in abbreviation,
$$A_n(\mathscr{X}^*, \theta_n^*) = 0 \ \text{with conditional probability (w.c.p.)} \to 1 \ \text{a.s.}$$
Hence
$$W_n(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n) \,|\, X_1, \ldots, X_n \xrightarrow{P} 0 \quad \text{a.s.}$$
Thus
$$\theta_n^* - \hat{\theta}_n = Q_n(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n)\,A_n(\mathscr{X}^*, \hat{\theta}_n) \quad \text{w.c.p.} \to 1 \ \text{a.s.},$$
where
$$Q_n(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n) \,|\, X_1, \ldots, X_n \xrightarrow{P} J_F(\theta_0)^{-1} \quad \text{a.s.}$$
Therefore
$$\beta(\theta_n^*) - \beta(\hat{\theta}_n) = S_n^{*T} A_n(\mathscr{X}^*, \hat{\theta}_n) \quad \text{w.c.p.} \to 1 \ \text{a.s.},$$
where
$$S_n^{*T} = \nabla\beta|_{\hat{\theta}_n}^T\,Q_n(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n) + \tfrac{1}{2} A_n(\mathscr{X}^*, \hat{\theta}_n)^T\,Q_n^T(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n)\,\mathscr{H}\beta|_{\theta_1^*}\,Q_n(\mathscr{X}^*; \theta_n^*, \hat{\theta}_n),$$
with $\|\theta_1^* - \hat{\theta}_n\| \le \|\theta_n^* - \hat{\theta}_n\|$ w.c.p. $\to 1$ a.s. Note that $\hat{\theta}_n \to \theta_0$ a.s. and $\nabla\beta(\theta)$ is continuous in $\theta$. Thus
$$\nabla\beta|_{\hat{\theta}_n} \to \nabla\beta|_{\theta_0} \quad \text{a.s.} \quad \text{and} \quad S_n^{*T} \,|\, X_1, \ldots, X_n \xrightarrow{P} \nabla\beta|_{\theta_0}^T\,J_F(\theta_0)^{-1} \quad \text{a.s.}$$
According to Lemma B (ii) and assumptions (A 2) and (A 8), we have also
$$\frac{1}{n}\sum_{i=1}^n \psi_{F_n}^2(X_i^*) \,\Big|\, X_1, \ldots, X_n \xrightarrow{P} \mathbb{E}_F \psi_F^2(X) \quad \text{a.s.}$$
Note that
$$\mathbb{E}_F\left[\psi_F^2(X) + \|\nabla \log f(X; \theta)|_{\theta_0}\|^2\right] < \infty$$
by (A 2) and (A 5), and
$$\frac{1}{n}\sum_{i=1}^n \left\|\big(\psi_{F_n}(X_i),\ \nabla \log f(X_i; \theta)|_{\hat{\theta}_n}^T\big) - \big(\psi_F(X_i),\ \nabla \log f(X_i; \theta)|_{\theta_0}^T\big)\right\|^2 \to 0 \quad \text{a.s.}$$
by (A 8). Hence Lemma A (ii) implies that, conditional on $X_1, \ldots, X_n$, the vector $\sqrt{n}\,\big(n^{-1}\sum_{i=1}^n \psi_{F_n}(X_i^*),\ A_n(\mathscr{X}^*, \hat{\theta}_n)^T\big)^T$ converges weakly a.s. to a $(1+p)$-dimensional normal limit with zero mean and covariance $\mathrm{Var}_F\big[(\psi_F(X),\ \nabla \log f(X; \theta)|_{\theta_0}^T)^T\big]$. Therefore
$$\sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_{F_n}(X_i^*)}{\beta(\theta_n^*) - \beta(\hat{\theta}_n)} \,\Bigg|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} N_2(0, \Sigma_F) \quad \text{a.s.}$$
This completes the proof. ∎
Thus, by Theorem 2, we have a result analogous to (4):
$$\sqrt{n}\binom{n^{-1}\sum_{i=1}^n \psi_{F_n}(X_i^*)}{\beta(\hat{\theta}_n^*) - \beta(\hat{\theta}_n)} \,\Bigg|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} N_2(0, \Sigma_F) \quad \text{a.s.} \qquad (5)$$
2.4.3. Asymptotic distribution of optimal $\epsilon$-hybrid estimators. For convenience we hereby let $(U, V)^T$ be a bivariate random variable having the same distribution $N_2(0, \Sigma_F)$ as the limit obtained in (5). Define a function $\zeta: \mathbb{R} \to [0,1]$ as
$$\zeta(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}$$
Note that $\zeta$ is a bounded continuous function on $\mathbb{R}$. Define also
$$\Gamma_F = \frac{\mathbb{E}[V(V-U)]}{\mathbb{E}[(V-U)^2]}.$$
The next theorem summarizes the main results of this paper.


THEOREM 3. Under assumptions (A 1) to (A 9), the theoretical and empirical optimal tuning parameters and $\epsilon$-hybrid estimators satisfy the following:
(i) Suppose $\gamma = 0$, that is, $\alpha(F_{\theta_0}) = \alpha(F)$, which covers the case where the postulated parametric model is correct. Then we have, as $n \to \infty$,
$$\epsilon_n \to \zeta(\Gamma_F) \quad \text{and} \quad \hat{\epsilon}_n \xrightarrow{\mathscr{D}} \zeta\!\left(\frac{Z + \Gamma_F}{Z + 1}\right),$$
where $Z$ is a random variable with a $\chi_1^2$ distribution. Also, the corresponding $\epsilon$-hybrid estimators have the following asymptotic distributions,
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} \zeta(\Gamma_F)\,U + (1 - \zeta(\Gamma_F))\,V$$
and
$$\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} \zeta\!\left(\frac{Z + \Gamma_F}{Z + 1}\right) U + \left(1 - \zeta\!\left(\frac{Z + \Gamma_F}{Z + 1}\right)\right) V, \quad \text{where } Z = \frac{(V-U)^2}{\mathbb{E}[(V-U)^2]},$$
as $n \to \infty$.
(ii) Suppose $\gamma \ne 0$, that is, $\alpha(F_{\theta_0}) \ne \alpha(F)$. Then we have, as $n \to \infty$,
$$\epsilon_n = 1 + o(n^{-1/2}) \quad \text{and} \quad \hat{\epsilon}_n = 1 + o_p(n^{-1/2}).$$
The corresponding $\epsilon$-hybrid estimators satisfy
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} U \quad \text{and} \quad \sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} U,$$
as $n \to \infty$.
For $\gamma = 0$, Theorem 3 (i) shows that our asymptotically optimal $\epsilon$ is $\zeta(\Gamma_F)$. Both the true and empirical optimal $\epsilon$-hybrid estimates have a rate of convergence of order $O(n^{-1/2})$. While the true optimal $T_{n,\epsilon_n}$ is asymptotically normal, its empirical version $T_{n,\hat{\epsilon}_n}$ has an asymptotic distribution very close to normal but with a slightly greater dispersion. On the other hand, if $\gamma \ne 0$, the theoretical optimal tuning parameter $\epsilon_n \to 1$ as $n \to \infty$ and our asymptotically optimal choice is therefore NB. This sounds reasonable, as we have postulated a wrong model for the underlying distribution and a non-parametric estimate is therefore intuitively superior. Our empirical optimal $\hat{\epsilon}_n$ is consistent with the theoretical optimal $\epsilon_n$ as $n$ tends to infinity. The theoretical and empirical estimates, $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$, are consistent with each other asymptotically and have the same asymptotic distribution $N(0, \sigma_F^2)$ as does NB, where $\sigma_F^2 = \mathbb{E}_F[\psi_F^2(X)]$.
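The limiting random variables in Theorem 3 (i) are simple functions of a bivariate normal and are therefore easy to simulate. The sketch below (an illustration of mine, not from the paper) draws $(U, V)$ from an assumed $\Sigma_F$, forms $Z = (V-U)^2/\mathbb{E}[(V-U)^2]$, and compares the dispersions of the two limit laws; with the diagonal $\Sigma_F$ used here one can check that the $T_{n,\hat{\epsilon}_n}$ limit is somewhat more dispersed than the $T_{n,\epsilon_n}$ limit.

    import numpy as np

    rng = np.random.default_rng(2)
    # Assumed covariance of (U, V); any positive semi-definite matrix will do.
    sigma = np.array([[2.0, 0.0], [0.0, 4.0]])
    u, v = rng.multivariate_normal([0.0, 0.0], sigma, size=10**6).T

    d2 = (v - u) ** 2
    z = d2 / d2.mean()                           # Z = (V-U)^2/E[(V-U)^2], approx chi^2_1
    gamma_f = np.mean(v * (v - u)) / d2.mean()   # Monte Carlo Gamma_F for this Sigma_F

    eps_hat = np.clip((z + gamma_f) / (z + 1.0), 0.0, 1.0)   # limit of eps_hat_n
    eps_opt = min(1.0, max(0.0, gamma_f))                    # zeta(Gamma_F)

    lim_emp = eps_hat * u + (1.0 - eps_hat) * v   # limit of sqrt(n)[T_{n,eps_hat} - alpha]
    lim_opt = eps_opt * u + (1.0 - eps_opt) * v   # limit of sqrt(n)[T_{n,eps_n} - alpha]
    print(lim_emp.var(), lim_opt.var())           # empirical version is more dispersed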
Proof of Theorem 3. Suppose first that $\gamma = 0$. From (4) and (A 7), we have by Slutsky's Theorem,
$$\sqrt{n}\binom{v_n}{w_n} \xrightarrow{\mathscr{D}} \binom{U}{V}. \qquad (6)$$
Consider
$$\mathrm{MSE}_F(T_{n,\epsilon}) = \mathbb{E}_F\{[\epsilon v_n + (1-\epsilon) w_n]^2\}.$$
This is a simple quadratic polynomial in $\epsilon$ and is minimized over $\epsilon \in [0,1]$ at
$$\epsilon = \epsilon_n = \zeta\!\left(\frac{\mathbb{E}_F[w_n(w_n - v_n)]}{\mathbb{E}_F[(w_n - v_n)^2]}\right),$$
provided the denominator is non-zero, or equivalently, PB and NB coincide with probability less than 1 under $F$. We can assume so because the case where PB and NB are almost everywhere identical is not of interest. The result $\epsilon_n \to \zeta(\Gamma_F)$ follows by using (6) and (A 9). Similarly, we obtain an expression for the bootstrap estimate of $\epsilon_n$,
$$\hat{\epsilon}_n = \zeta\!\left(\frac{\mathbb{E}_{F_n}[(w_n^* + w_n - v_n)(w_n^* - v_n^* + w_n - v_n)]}{\mathbb{E}_{F_n}[(w_n^* - v_n^* + w_n - v_n)^2]}\right). \qquad (7)$$
Substituting $\epsilon_n$ and $\hat{\epsilon}_n$ for $\epsilon$ respectively in $T_{n,\epsilon}$, we have
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] = \sqrt{n}\,[\epsilon_n(v_n - w_n) + w_n] \qquad (8)$$
and
$$\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] = \sqrt{n}\,[\hat{\epsilon}_n(v_n - w_n) + w_n]. \qquad (9)$$
Note that all the quantities (7), (8) and (9) are continuous functions of $\sqrt{n}(v_n, w_n)$, $n\mathbb{E}_{F_n}[v_n^{*2}]$, $n\mathbb{E}_{F_n}[w_n^{*2}]$, $n\mathbb{E}_{F_n}[v_n^* w_n^*]$, $\sqrt{n}\,\mathbb{E}_{F_n}[v_n^*]$ and $\sqrt{n}\,\mathbb{E}_{F_n}[w_n^*]$, with
$$n\mathbb{E}_{F_n}[v_n^{*2}] \to \mathbb{E}[U^2], \quad n\mathbb{E}_{F_n}[w_n^{*2}] \to \mathbb{E}[V^2], \quad n\mathbb{E}_{F_n}[v_n^* w_n^*] \to \mathbb{E}[UV], \qquad (10)$$
and
$$\sqrt{n}\,\mathbb{E}_{F_n}[v_n^*] \to 0 \quad \text{and} \quad \sqrt{n}\,\mathbb{E}_{F_n}[w_n^*] \to 0 \quad \text{a.s.}$$
The almost sure convergence (10) follows from (A 9) and from noting that (A 7) and (5) together imply
$$\sqrt{n}\binom{v_n^*}{w_n^*} \,\Bigg|\, X_1, \ldots, X_n \xrightarrow{\mathscr{D}} \binom{U}{V} \quad \text{a.s.}$$
by Slutsky's Theorem. Hence, using Slutsky's Theorem again on (7), (8) and (9) and putting
$$Z = \frac{(V-U)^2}{\mathbb{E}[(V-U)^2]} \sim \chi_1^2,$$
the required convergence results for $\hat{\epsilon}_n$ and the optimal $\epsilon$-hybrid estimators follow immediately.
Next we look at the case where $\gamma \ne 0$. Application of Slutsky's Theorem to (4) and (A 7) now gives us
$$\sqrt{n}\,v_n \xrightarrow{\mathscr{D}} U \quad \text{and} \quad w_n \xrightarrow{P} \gamma.$$
By (A 9) we have $n v_n^2$ and $n(w_n - \gamma)^2$ uniformly integrable. Therefore $\sqrt{n}\,v_n w_n$ is also uniformly integrable. Thus
$$\mathbb{E}_F[\sqrt{n}\,v_n w_n] \to \mathbb{E}[U]\,\gamma = 0$$
and
$$\sqrt{n}\,\mathbb{E}_F[v_n(w_n - v_n)] = \mathbb{E}_F[\sqrt{n}\,v_n w_n] - \mathbb{E}_F[\sqrt{n}\,v_n^2] = o(1),$$
since $w_n \xrightarrow{P} \gamma$ and $\sqrt{n}\,v_n \xrightarrow{\mathscr{D}} U$. Therefore we have
$$1 - \epsilon_n = -\frac{\mathbb{E}_F[v_n(w_n - v_n)]}{\mathbb{E}_F[(w_n - v_n)^2]} = o(n^{-1/2}),$$
since $\mathbb{E}_F[(w_n - v_n)^2] \to \gamma^2 \ne 0$. The almost sure convergence (10) still holds, as the value of $\gamma$ does not affect the asymptotic behaviour of the random quantities in question. Thus we can deduce
$$\mathbb{E}_{F_n}[v_n^*(w_n^* - v_n^*)] = n^{-1}\,\mathbb{E}[U(V-U)] + o(n^{-1}) \quad \text{a.s.},$$
$$\mathbb{E}_{F_n}[(w_n^* - v_n^*)^2] = n^{-1}\,\mathbb{E}[(V-U)^2] + o(n^{-1}) \quad \text{a.s.},$$
together with the analogous expansions for the remaining bootstrap moments, and hence
$$\hat{\epsilon}_n = 1 + o_p(n^{-1/2}),$$
using expression (7) again. Lastly we consider the actual hybrid estimates $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$. Clearly,
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] = \sqrt{n}\,v_n + \sqrt{n}(1 - \epsilon_n)(w_n - v_n) \xrightarrow{\mathscr{D}} U,$$
and similarly $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} U$. This completes the whole proof. ∎
Now we look at a particular case where $\Theta \subseteq \mathbb{R}$ with $F = F_{\theta_0}$, so that our postulated parametric model is correct. We want to find out what kind of conditions guarantee the convergence of the optimal $\epsilon_n$ to 0, or in other words, PB being our asymptotically optimal choice. First we need a lemma as stated below, the proof of which follows that of Theorem 8.1.3 in [4].
Let $\mathscr{C}(\mathbb{R})$ be the space of continuous real-valued functions on $\mathbb{R}$ with finite limits at $\pm\infty$. Equip $\mathscr{C}(\mathbb{R})$ with the sup-norm $\|\cdot\|_\infty$.
LEMMA C. Suppose the mapping $\theta \mapsto F_\theta$ from $\Theta$ to $\mathscr{C}(\mathbb{R})$ is continuous. Suppose also that $\alpha$ is a statistical functional defined on $\mathscr{C}(\mathbb{R})$. Assume $\alpha$ is Hadamard differentiable at $F_{\theta_0}$ for some $\theta_0 \in \Theta$, such that
(i) $F_{\theta_0}$ is strictly increasing, and
(ii) as $\delta \to 0$,
$$\frac{f_{\theta_0+\delta} - f_{\theta_0}}{\delta} \to \frac{\partial f_\theta}{\partial \theta}\Big|_{\theta_0} \quad \text{in } \mathscr{L}^2(F_{\theta_0}),$$
where $f_\theta = F_\theta'$ exists and $\mathscr{L}^2(F_{\theta_0})$ is the usual integrated squared norm with respect to the measure $F_{\theta_0}$ on a suitable space of real functions. Then we have
$$\alpha(F_{\theta_0+\delta}) - \alpha(F_{\theta_0}) - \alpha'_{F_{\theta_0}}(F_{\theta_0+\delta} - F_{\theta_0}) = o(\delta).$$

Proof of Lemma C. Since $\alpha$ is Hadamard differentiable at $F_{\theta_0}$, its derivative $\alpha'_{F_{\theta_0}}: \mathscr{C}(\mathbb{R}) \to \mathbb{R}$ exists, where
$$\alpha'_{F_{\theta_0}}(G - F_{\theta_0}) = \int \psi_{F_{\theta_0}}(x)\,dG(x).$$
Define a function
$$g(x) = \int_{-\infty}^x \frac{\partial f_\theta(t)}{\partial \theta}\Big|_{\theta_0}\,dt.$$
By the regularity properties of $\{F_\theta\}$, $g$ is well-defined and $g \in \mathscr{C}(\mathbb{R})$. Now define
$$H_\delta = \begin{cases} (F_{\theta_0+\delta} - F_{\theta_0})/\delta & \text{if } \delta \ne 0, \\ g & \text{if } \delta = 0. \end{cases}$$
Consider the function space
$$\mathscr{K} = \{H_\delta : \delta \in [-1, 1]\}$$
and the mapping $\phi: [-1, 1] \to \mathscr{K}$ such that $\phi(\delta) = H_\delta$. We claim that $\phi$ is continuous on $[-1, 1]$. For $\delta \ne 0$, $H_\delta$ is obviously continuous in $\delta$ since $\theta \mapsto F_\theta$ is assumed to be continuous. So $\phi$ is continuous at all $\delta \ne 0$. Moreover, using the Cauchy-Schwarz inequality and assumption (ii), we have
$$\sup_{x \in \mathbb{R}} |H_\delta(x) - g(x)| = \sup_{x \in \mathbb{R}} \left|\int_{-\infty}^x \left[\frac{f_{\theta_0+\delta}(t) - f_{\theta_0}(t)}{\delta} - \frac{\partial f_\theta(t)}{\partial \theta}\Big|_{\theta_0}\right] dt\right| \to 0$$
as $\delta \to 0$. Therefore $\phi$ is continuous at $\delta = 0$ as well and our claim is valid. Thus $\mathscr{K} = \phi([-1, 1])$ is compact. By Hadamard differentiability of $\alpha$ at $F_{\theta_0}$,
$$\frac{\alpha(F_{\theta_0} + \delta H) - \alpha(F_{\theta_0}) - \alpha'_{F_{\theta_0}}(\delta H)}{\delta} \to 0 \quad \text{as } \delta \to 0,$$
uniformly for all $H \in \mathscr{K}$. Therefore, taking $H = H_\delta$,
$$\frac{\alpha(F_{\theta_0+\delta}) - \alpha(F_{\theta_0}) - \alpha'_{F_{\theta_0}}(F_{\theta_0+\delta} - F_{\theta_0})}{\delta} \to 0 \quad \text{as } \delta \to 0.$$
Hence the result follows. ∎


The following theorem characterizes a common situation in which PB is the asymptotically optimal choice.
THEOREM 4. Suppose $\Theta \subseteq \mathbb{R}$ and $F = F_{\theta_0}$. Assume the conditions prescribed in Lemma C. Then we have $\Gamma_F = 0$. Hence
$$\epsilon_n \to 0 \quad \text{and} \quad \hat{\epsilon}_n \xrightarrow{\mathscr{D}} \frac{Z}{Z+1},$$
with $Z \sim \chi_1^2$.
Proof of Theorem 4. Using the notation given in Theorem 1, we have clearly
$$J_F(\theta_0) = K_F(\theta_0),$$
and
$$\rho_F = \int \psi_{F_{\theta_0}}(x)\,\frac{\partial}{\partial \theta} \log f(x; \theta)\Big|_{\theta_0}\,f(x; \theta_0)\,dx = \int \psi_{F_{\theta_0}}(x)\,d\!\left[\frac{\partial F_\theta(x)}{\partial \theta}\Big|_{\theta_0}\right].$$
Suppose all the conditions prescribed in Lemma C hold. Then we have
$$\mathbb{E}[V(V-U)] = \frac{1}{J_F(\theta_0)}\,\frac{d\beta}{d\theta}\Big|_{\theta_0}\left(\frac{d\beta}{d\theta}\Big|_{\theta_0} - \rho_F\right) = 0,$$
since
$$\frac{d\beta}{d\theta}\Big|_{\theta_0} = \alpha'_{F_{\theta_0}}\!\left(\frac{\partial F_\theta}{\partial \theta}\Big|_{\theta_0}\right) = \rho_F$$
according to Lemma C. Therefore $\Gamma_F = 0$. The convergence of the optimal tuning parameters follows from Theorem 3 (i). ∎
Theorem 4 is a fairly general result, since the conditions in Lemma C are usually fulfilled by reasonable classes of distributions. Clearly, the theoretical optimal $T_{n,\epsilon_n}$ is asymptotic to PB under the conditions of Theorem 4. However, the empirical estimate $T_{n,\hat{\epsilon}_n}$ still has a non-trivial limit distribution.
The case in which $\alpha(F) = \alpha(F_{\theta_0})$ and $F \ne F_{\theta_0}$ may give rise to quite different results. We shall come back to this in Section 3.2, where an example will be given to illustrate the non-trivial asymptotic behaviour in this case.
2.4.4. Asymptotic MSE of optimal $\epsilon$-hybrid estimators.
Case 1. Suppose that $\gamma = 0$. According to assumption (A 9) we deduce that both $n[T_{n,\epsilon_n} - \alpha(F)]^2$ and $n[T_{n,\hat{\epsilon}_n} - \alpha(F)]^2$ are uniformly integrable. Therefore,
$$\mathrm{MSE}_F(T_{n,\epsilon_n}) = n^{-1}\,\mathbb{E}\{[\zeta(\Gamma_F)\,U + (1 - \zeta(\Gamma_F))\,V]^2\} + o(n^{-1}),$$
and similarly for $\mathrm{MSE}_F(T_{n,\hat{\epsilon}_n})$, with the limit random variable given in Theorem 3 (i). Now we consider three separate cases.
(i) $\Gamma_F \ge 1$: Here $\zeta(\Gamma_F) = 1$. Both the asymptotic errors satisfy
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} U \quad \text{and} \quad \sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} U.$$
Our asymptotically optimal choice is NB, and the empirical version $T_{n,\hat{\epsilon}_n}$ is consistent with the true optimal estimate $T_{n,\epsilon_n}$ asymptotically.
(ii) $0 < \Gamma_F < 1$: Then $\zeta(\Gamma_F) = \Gamma_F$. Here $T_{n,\epsilon_n}$ is asymptotically normal with variance strictly less than those of $U$ and $V$, since $\Gamma_F$ is the unique and non-trivial $\epsilon \in (0,1)$ which minimizes the asymptotic variance of convex combinations of PB and NB. The asymptotic distribution and MSE of $T_{n,\hat{\epsilon}_n}$ are no less complicated than our previous general results. Note that $\mathrm{MSE}_F(T_{n,\hat{\epsilon}_n})$ is asymptotically slightly bigger than $\mathrm{MSE}_F(T_{n,\epsilon_n})$, owing to the more dispersed asymptotic distribution of $T_{n,\hat{\epsilon}_n}$, although they both have the same rate of convergence.
(iii) $\Gamma_F \le 0$: Then $\zeta(\Gamma_F) = 0$. So now
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} V \quad \text{and} \quad \mathrm{MSE}_F(T_{n,\epsilon_n}) = n^{-1}\,\mathbb{E}[V^2] + o(n^{-1}).$$
PB is the asymptotically optimal choice. Again there is no simpler corresponding expression for $T_{n,\hat{\epsilon}_n}$.
Case 2. Suppose now $\gamma \ne 0$. Note that both the theoretical and empirical optimal $\epsilon$-hybrid estimators have the same asymptotic distribution $N(0, \sigma_F^2)$ from Theorem 3 (ii). We should expect their MSEs to be asymptotic to $\sigma_F^2/n$ as well. This is certainly the case for $T_{n,\epsilon_n}$, since with probability 1, $T_{n,\epsilon_n}$ satisfies
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] = \sqrt{n}\,v_n + \sqrt{n}(1 - \epsilon_n)(w_n - v_n),$$
where $\sqrt{n}(1 - \epsilon_n) \to 0$. Therefore it is uniformly integrable and hence
$$\mathrm{MSE}_F(T_{n,\epsilon_n}) = \sigma_F^2/n + o(n^{-1}).$$
Under the further condition that $n(1 - \hat{\epsilon}_n)^2$ be uniformly integrable, we have also $n[T_{n,\hat{\epsilon}_n} - \alpha(F)]^2$ to be uniformly integrable, so that
$$\mathrm{MSE}_F(T_{n,\hat{\epsilon}_n}) = \sigma_F^2/n + o(n^{-1}).$$
Note that the condition that $n(1 - \hat{\epsilon}_n)^2$ be uniformly integrable is usually satisfied under stricter moment conditions on the underlying distribution $F$.
2.5. Summary and comments. We have shown that PB is the asymptotically optimal estimate under some mild conditions if our proposed parametric model is correct, as stated in Theorem 4. However, the empirical optimal $\hat{\epsilon}_n$ has a non-degenerate limit distribution corresponding to the random variable $Z/(Z+1)$, where $Z \sim \chi_1^2$. The density function of $Z/(Z+1)$ is given by the formula
$$f(x) = \frac{1}{\sqrt{2\pi}}\,x^{-1/2}(1-x)^{-3/2}\exp\left(-\frac{x}{2(1-x)}\right), \quad 0 < x < 1.$$
Fig. 1. Density function of $Z/(Z+1)$, where $Z \sim \chi_1^2$.
Figure 1 shows that this function is highly concentrated around zero, and is more or less evenly distributed over the rest of the interval $[0,1]$ with moderate density. This means we are quite likely to end up with a non-trivial convex combination of PB and NB as our hybrid estimate, where PB plays a much more substantial role in the combination. Nevertheless, $\mathrm{MSE}_F(T_{n,\hat{\epsilon}_n})$ proves to be only slightly bigger than $\mathrm{MSE}_F(T_{n,\epsilon_n})$, and both have the same convergence rate of order $O(n^{-1})$.
On the other hand, if the proposed parametric model is wrong and $\alpha(F_{\theta_0}) \ne \alpha(F)$, then we have $\hat{\epsilon}_n \xrightarrow{P} 1$ and our empirical estimate $T_{n,\hat{\epsilon}_n}$ is exactly consistent with the true optimal $T_{n,\epsilon_n}$, which is asymptotic to NB. Similarly their MSEs are asymptotically identical.
One nice property of $T_{n,\hat{\epsilon}_n}$ is that, whether or not our suggested parametric model is correct, it can adjust itself automatically to do the right thing. It should be borne in mind that the pure and hybrid bootstrap estimates considered so far have the same convergence rate of order $O(n^{-1/2})$. (In the case of PB, we need the assumption that our postulated model is correct.) This is due to the fact that these estimates are all based on the empirical distribution $F_n$, which is known to be a $\sqrt{n}$-consistent estimate of $F$ under certain regularity conditions. What the hybrid estimate does is to reduce the first order MSE size without changing the convergence rate.
There exist particular cases when $\alpha(F) = \alpha(F_{\theta_0})$ where $T_{n,\epsilon_n}$ is asymptotic to neither PB nor NB but to a non-trivial convex combination of them. This means $T_{n,\hat{\epsilon}_n}$ can sometimes be not just a trade-off but a strictly better estimate than the pure bootstraps. Section 3.2 gives a particular example to illustrate this point.
Finally, a remark should be made on the computational requirements of our empirical optimal $\epsilon$-hybrid estimator $T_{n,\hat{\epsilon}_n}$. Recall that the empirical optimal tuning parameter $\hat{\epsilon}_n$ is given by expression (7).
In general the functional $\alpha$ cannot be expressed as a directly computable formula. Bootstrap resampling is usually required to approximate the bootstrap estimates involved in the formula for $\hat{\epsilon}_n$. The presence of quantities like $\alpha(F_n^*)$ and $\alpha(F_{\hat{\theta}_n^*})$ generally calls for a double bootstrap procedure, which is very time-consuming. There exist other non-parametric methods of estimating MSEs without the need for bootstrap resampling. For example, jackknife estimates of standard error and bias can be used to derive an estimate of $\mathrm{MSE}_F(\cdot)$. Alternatively, the delta method, similar to the jackknife, is also applicable here. See [3] for details of these estimation methods. It is shown there that these methods can be regarded in some sense as approximations to the bootstrap method. We may thus expect some loss of accuracy in exchange for saving a second level of bootstrapping.
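As an illustration of the jackknife alternative mentioned above, the sketch below estimates the MSE of a generic estimator by combining the jackknife variance estimate with the squared jackknife bias estimate. This is a minimal sketch of the standard jackknife formulas, not a construction taken from the paper; stat is a user-supplied function of a sample.

    import numpy as np

    def jackknife_mse(x, stat):
        # Jackknife estimate of MSE = variance + bias^2 for stat(x).
        n = len(x)
        full = stat(x)
        loo = np.array([stat(np.delete(x, i)) for i in range(n)])  # leave-one-out values
        bias = (n - 1) * (loo.mean() - full)                 # jackknife bias estimate
        var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)  # jackknife variance estimate
        return var + bias ** 2

Applying this with stat equal to each $\epsilon$-combination of NB and PB gives a resampling-free approximation to the empirical MSE curve in $\epsilon$.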

3. Examples of the $\epsilon$-hybrid estimator

Two separate examples are considered here to verify the asymptotic theory of $T_{n,\epsilon}$ as developed in Section 2.4. Both examples focus on the estimation of variance, with the negative exponential family indexed by the scale parameter $\theta$ as our proposed underlying parametric model. The second example illustrates the possibility of $T_{n,\epsilon_n}$ being superior to either PB or NB.
Note that in the case where $\alpha(F)$ is the variance of $F$, there exist closed form expressions for $\alpha(F)$ so that no double bootstrap procedure is required. Moreover, the MSE of the corresponding parametric and non-parametric bootstrap estimates of $\alpha(F)$ can be expressed explicitly in terms of population moments, which allows us to compute the MSE exactly. The same is also true for the empirical optimal tuning parameter $\hat{\epsilon}_n$, which admits an explicit formula in terms of sample moments.
3.1. Example 1: True distribution is Gamma. Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from $F$, taken to be a $\Gamma(a, \theta)$ distribution, whose probability density function is given by
$$f(x) = \frac{\theta^a}{\Gamma(a)}\,x^{a-1} e^{-\theta x}, \quad x > 0,$$
where $a$ and $\theta$ are the shape and scale parameters respectively. We are interested in estimating the functional $\alpha(F) = \mathrm{Var}_F(X)$ with $X \sim F$. The parametric model contemplated here is the family of exponential distributions indexed by the scale parameter $\theta$, that is,
$$\mathscr{F} = \{F_\theta : F_\theta(x) = 1\{x > 0\}\,(1 - e^{-\theta x}),\ \theta > 0\}.$$
The MLE of $\theta$ assuming $\mathscr{F}$ to be correct is easily found to be $\hat{\theta}_n = 1/\bar{X}$. The two bootstrap estimates are then:
$$\alpha(F_{\hat{\theta}_n}) = \bar{X}^2, \qquad \text{(PB)}$$
$$\alpha(F_n) = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2. \qquad \text{(NB)}$$
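In this setting every quantity in the procedure has a closed form, so the empirical optimal hybrid can be computed with a single level of resampling. The following sketch (an illustration of mine consistent with the formulas above, not code from the paper) approximates $\hat{\epsilon}_n$ by Monte Carlo over bootstrap resamples and returns the hybrid estimate.

    import numpy as np

    def hybrid_variance_estimate(x, B=2000, rng=None):
        # PB = Xbar^2 under the exponential model; NB = sample variance (1/n version).
        rng = np.random.default_rng(rng)
        n = len(x)
        nb = np.mean((x - x.mean()) ** 2)
        pb = x.mean() ** 2
        xs = rng.choice(x, size=(B, n), replace=True)   # resamples from F_n
        e_nb = xs.var(axis=1) - nb                      # alpha(F_n*) - alpha(F_n)
        e_pb = xs.mean(axis=1) ** 2 - nb                # alpha(F_{theta*}) - alpha(F_n)
        num = np.mean(e_pb * (e_pb - e_nb))
        den = np.mean((e_pb - e_nb) ** 2)
        eps = min(1.0, max(0.0, num / den))             # empirical optimal epsilon
        return eps * nb + (1 - eps) * pb, eps

    rng = np.random.default_rng(3)
    x = rng.gamma(shape=1.0, scale=1.0, size=100)   # Gamma(1,1): the model is correct
    print(hybrid_variance_estimate(x, rng=rng))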

Figure 2 shows the ratio of the respective mean squared errors (MSEs) under a Gamma distribution with varying $a$ and $\theta = 1$, for different sample sizes $n$. It is found that PB outperforms NB if the true distribution is close enough to our proposed model, which is the case $a = 1$.
Fig. 2. Example 1. Ratio of MSE(PB) to MSE(NB) under $\Gamma(a, 1)$ for different sample sizes $n$.

3.1.1. Theoretical asymptotic results. First we apply the asymptotic theory developed in Section 2.4 to this particular example. Note that for any distributions $F$ and $G$,
$$\alpha(G) = \alpha(F) + \int \psi_F(x)\,dG(x) - (\mathbb{E}_G[X] - \mathbb{E}_F[X])^2,$$
where
$$\psi_F(x) = x^2 - \mathbb{E}_F[X^2] - 2\mathbb{E}_F[X]\,(x - \mathbb{E}_F[X]).$$
Suppose the true underlying distribution $F$ is $\Gamma(\alpha_1, \theta_1)$. The postulated parametric family is negative exponential, so that $f(x; \theta) = \theta e^{-\theta x}$ for $x > 0$. Therefore, we have $\alpha(F) = \alpha_1/\theta_1^2$ and $\beta(\theta) = \alpha(F_\theta) = \theta^{-2}$. The MLE of $\theta$ is obtained as $\hat{\theta}_n = 1/\bar{X}$. The nearest $\theta$ describing $F$ is taken to be the solution $\theta_0$ for $\theta$ to the equation
$$\mathbb{E}_F\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right] = \mathbb{E}_F\left[\frac{1}{\theta} - X\right] = 0.$$
The solution is given by $\theta_0 = \theta_1/\alpha_1$. It is easy to check that assumptions (A 1)-(A 9) are satisfied here. After a few simple calculations we obtain
$$\Sigma_F = \frac{1}{\theta_1^4}\begin{pmatrix} 2\alpha_1(\alpha_1 + 3) & 4\alpha_1^2 \\ 4\alpha_1^2 & 4\alpha_1^3 \end{pmatrix}.$$
The two cases $\gamma = 0$ and $\gamma \ne 0$ correspond to $\alpha_1 = 1$ and $\alpha_1 \ne 1$ respectively, since
$$\gamma = \alpha(F_{\theta_0}) - \alpha(F) = \frac{\alpha_1^2}{\theta_1^2} - \frac{\alpha_1}{\theta_1^2} = \frac{\alpha_1(\alpha_1 - 1)}{\theta_1^2}.$$
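For completeness, here is how the entries of $\Sigma_F$ above can be checked directly from the moments of $F = \Gamma(\alpha_1, \theta_1)$; this verification is mine, using the central moments $\mu_2 = \alpha_1/\theta_1^2$, $\mu_3 = 2\alpha_1/\theta_1^3$ and $\mu_4 = 3\alpha_1(\alpha_1+2)/\theta_1^4$, together with the delta-method form $V = 2\mathbb{E}_F[X]\cdot\lim\sqrt{n}(\bar{X} - \mathbb{E}_F[X])$ of the PB error:
$$\mathrm{Var}(U) = \mathrm{Var}_F[\psi_F(X)] = \mu_4 - \mu_2^2 = \frac{2\alpha_1(\alpha_1+3)}{\theta_1^4},$$
$$\mathrm{Var}(V) = 4(\mathbb{E}_F X)^2\,\mu_2 = \frac{4\alpha_1^3}{\theta_1^4}, \qquad \mathrm{Cov}(U, V) = 2\mathbb{E}_F X\,\mu_3 = \frac{4\alpha_1^2}{\theta_1^4},$$
so that $\mathbb{E}[(V-U)^2] = (4\alpha_1^3 - 6\alpha_1^2 + 6\alpha_1)/\theta_1^4$ and $\mathbb{E}[V(V-U)] = (4\alpha_1^3 - 4\alpha_1^2)/\theta_1^4$, in agreement with the expressions used below.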
Case 1. Suppose first that $\alpha_1 = 1$, so that $\gamma = 0$. We have
$$\Gamma_F = \frac{\mathbb{E}[V(V-U)]}{\mathbb{E}[(V-U)^2]} = \frac{4\alpha_1^3 - 4\alpha_1^2}{4\alpha_1^3 - 6\alpha_1^2 + 6\alpha_1} = 0.$$
Therefore we deduce $\epsilon_n \to 0$ and
$$\hat{\epsilon}_n \xrightarrow{\mathscr{D}} \frac{Z}{Z+1},$$
where $Z \sim \chi_1^2$. Moreover, the asymptotic behaviour of $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$ is given by
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} V \sim N(0, 4\theta_1^{-4})$$
and
$$\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} V + \frac{(U-V)^3}{(U-V)^2 + \mathbb{E}[(U-V)^2]},$$
where $(U, V)^T \sim N_2(0, \Sigma_F)$.
Case 2. Suppose now that $\alpha_1 \ne 1$. Here $\epsilon_n$ satisfies
$$\epsilon_n = 1 + o(n^{-1/2})$$
for sufficiently large $n$. Therefore we have
$$\sqrt{n}\,[T_{n,\epsilon_n} - \alpha(F)] \xrightarrow{\mathscr{D}} N(0, 2\alpha_1(\alpha_1 + 3)\,\theta_1^{-4}).$$
A similar argument and the fact that
$$\mathbb{E}_{F_n} \mathrm{Rem}(F_n^*, F_n) = -\mathbb{E}_{F_n}[(\bar{X}^* - \bar{X})^2] = -\frac{1}{n^2}\sum_{i=1}^n (X_i - \bar{X})^2 = O(n^{-1}) \quad \text{a.s.}$$
imply
$$\hat{\epsilon}_n = 1 + o_p(n^{-1/2}) \quad \text{and} \quad \sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha(F)] \xrightarrow{\mathscr{D}} N(0, 2\alpha_1(\alpha_1 + 3)\,\theta_1^{-4}).$$
3.1.2. Simulation results. Here we take $\theta_1 = 1$ so that our parameter of interest has the value $\alpha_1$. Two hundred data samples of size $n$ were drawn from $\Gamma(\alpha_1, 1)$, for $n = 10, 10^2, 10^3, 10^4$ and $\alpha_1 = 0.5, 1, 1.5, 4$. For each $n$ and $\alpha_1$ the frequency distributions of $\sqrt{n}\,[T_{n,\epsilon_n} - \alpha_1]$ and $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha_1]$ based on the 200 samples were formed into a series of histograms. They are compared with the densities of the theoretical limit distributions. Theory shows that
(i) for $\alpha_1 = 1$,
$$\sqrt{n}\,[T_{n,\epsilon_n} - 1] \xrightarrow{\mathscr{D}} N(0, 4) \quad \text{and} \quad \sqrt{n}\,[T_{n,\hat{\epsilon}_n} - 1] \xrightarrow{\mathscr{D}} V + \frac{(U-V)^3}{(U-V)^2 + 4},$$
where $(U, V)^T \sim N_2(0, \Sigma_F)$ with $\alpha_1 = \theta_1 = 1$, so that $\mathbb{E}[(U-V)^2] = 4$. Note that
$$\mathrm{Var}\left[V + \frac{(U-V)^3}{(U-V)^2 + 4}\right] \approx 5.868,$$
which is bigger than the asymptotic variance 4 of the normalized optimal $T_{n,\epsilon_n}$. Figure 3 compares the densities of the above two limit distributions;
Fig. 3. Example 1. Comparison of asymptotic distributions of $\sqrt{n}\,[T_{n,\epsilon_n} - 1]$ and $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - 1]$ under $\Gamma(1, 1)$.
(ii) for $\alpha_1 \ne 1$,
$$\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha_1] \xrightarrow{\mathscr{D}} N(0, 2\alpha_1(\alpha_1 + 3)).$$
Frequency histograms of $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha_1]$ corresponding to $n = 10^4$ are shown in Figure 4, together with their respective asymptotic density curves. Results for $\sqrt{n}\,[T_{n,\epsilon_n} - \alpha_1]$ are fairly similar and so they are not presented here. Observe that the empirical histograms shown in Figure 4 follow closely their limiting normal or almost normal density curves. Table 1 lists the sample variances of $\sqrt{n}\,[T_{n,\epsilon_n} - \alpha_1]$ and $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha_1]$ together with their asymptotic variances. Note that the figures for the sample variances listed in Table 1 were obtained by averaging over two hundred random samples drawn from $\Gamma(\alpha_1, 1)$. It is found that the experimental results agree remarkably well with the theoretical asymptotic results.
In order to verify our theoretical results further, frequency distributions of $\hat{\epsilon}_n$ for $\alpha_1 = 1$ and of $n(1 - \hat{\epsilon}_n)$ for $\alpha_1 = 4$ were also constructed. Figure 5 shows the situations for the largest attempted sample size $n = 10^4$. Note the close resemblance of the case $\alpha_1 = 1$ to the density of $Z/(Z+1)$ with $Z \sim \chi_1^2$ as shown in Figure 1. Also, in the case where $\alpha_1 = 4$ we find that $n(1 - \hat{\epsilon}_n)$ seems to be bounded in a very small interval. These observations are all consistent with our asymptotic theory.
Fig. 4. Example 1. Frequency histograms of $\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \alpha_1]$ under $\Gamma(\alpha_1, 1)$ for $n = 10000$ and $\alpha_1 = 0.5, 1, 1.5, 4$. The theoretical asymptotic densities are also plotted for comparison. In the case of $\alpha_1 = 1$, the asymptotic density is obtained by simulation of $10^5$ data points.

Table 1. Sampling results from Example 1

$\alpha_1$   $n$       Sample variance of $\sqrt{n}[T_{n,\hat{\epsilon}_n} - \alpha_1]$   Sample variance of $\sqrt{n}[T_{n,\epsilon_n} - \alpha_1]$
1      10        6.244     5.227
       100       5.554     3.657
       1000      5.076     3.823
       10000     6.054     4.392
       Asymptotic variance:    5.868     4.000
0.5    10        1.931     0.804
       100       2.410     2.329
       1000      3.114     3.156
       10000     3.172     3.181
       Asymptotic variance:    3.500     3.500
1.5    10        12.79     16.61
       100       13.66     13.57
       1000      14.69     13.61
       10000     13.11     13.04
       Asymptotic variance:    13.50     13.50
4      10        37.63     35.80
       100       60.19     59.16
       1000      58.01     57.86
       10000     56.45     56.43
       Asymptotic variance:    56.00     56.00
Fig. 5. Example 1. Frequency histograms of $\hat{\epsilon}_n$ under $\Gamma(1, 1)$ and of $n(1 - \hat{\epsilon}_n)$ under $\Gamma(4, 1)$, for $n = 10000$.
Lastly, the performances of $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$ are compared with the pure PB and NB by means of their MSEs, for different sample sizes $n$ over a range of $\alpha_1$. The MSEs of $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$ were approximated by averaging over two hundred random samples, whereas the MSEs of PB and NB were computed exactly from their closed form expressions. Figure 6 demonstrates two typical cases where $n = 10$ and $n = 200$. It is found that the theoretically optimal $T_{n,\epsilon_n}$ performs well uniformly over $\alpha_1 \in [0, 2]$. The empirical $T_{n,\hat{\epsilon}_n}$ is excellent for values of $\alpha_1$ where NB is favoured. It is a bit less than optimal when PB is favoured, where its asymptotic variance is slightly bigger than the optimal one. Generally speaking, $T_{n,\hat{\epsilon}_n}$ performs reasonably well uniformly over a wide class of underlying distributions.
3.2. Example 2: True distribution is Normal. This example illustrates a possible case where $F \ne F_{\theta_0}$ and yet $\alpha(F) = \alpha(F_{\theta_0})$.
Suppose the true underlying distribution $F$ is normal with mean $\mu$ and variance $\mu^2$. The contemplated parametric model is still negative exponential, indexed by the scale parameter $\theta$. Our purpose is to estimate $\alpha(F) = \mathrm{Var}_F[X] = \mu^2$. So we have the same PB and NB as in Example 1.
3.2.1. Theoretical asymptotic results. As before, we perform the steps necessary for deriving the asymptotic properties for this example. The influence function $\psi_G(\cdot)$ corresponding to $\alpha(\cdot)$ is the same as in the previous example. Here the nearest $\theta$ describing $F$ is easily found to be $\theta_0 = \mu^{-1}$, giving $\alpha(F_{\theta_0}) = \alpha(F) = \mu^2$. A little algebra shows that
$$\Sigma_F = \begin{pmatrix} 2\mu^4 & 0 \\ 0 & 4\mu^4 \end{pmatrix}.$$
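The entries of this $\Sigma_F$ can be verified in the same way as in Example 1; the following check is mine, using the normal moments $\mathbb{E}(X-\mu)^3 = 0$ and $\mathbb{E}(X-\mu)^4 = 3\mu^4$ for $X \sim N(\mu, \mu^2)$:
$$\mathrm{Var}(U) = \mathbb{E}(X-\mu)^4 - \mu^4 = 2\mu^4, \qquad \mathrm{Var}(V) = 4(\mathbb{E}X)^2\,\mathrm{Var}(X) = 4\mu^4,$$
$$\mathrm{Cov}(U, V) = 2\mu\,\mathbb{E}(X-\mu)^3 = 0.$$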
Fig. 6. Example 1. Comparison of MSEs of $T_{n,\hat{\epsilon}_n}$ and $T_{n,\epsilon_n}$ with those of PB and NB under $\Gamma(\alpha_1, 1)$, for $n = 10$ and $n = 200$.

Suppose now $\mu \ne 0$. Then we have
$$\Gamma_F = \frac{\mathbb{E}[V(V-U)]}{\mathbb{E}[(V-U)^2]} = \frac{4\mu^4}{6\mu^4} = \frac{2}{3},$$
so that $\epsilon_n \to \tfrac{2}{3}$ and
$$\sqrt{n}\,[T_{n,\epsilon_n} - \mu^2] \xrightarrow{\mathscr{D}} \tfrac{2}{3} U + \tfrac{1}{3} V \sim N(0, \tfrac{4}{3}\mu^4).$$
Thus we have a non-trivial asymptotically optimal combination of PB and NB. Note also that
$$\mathrm{Var}(U) = 2\mu^4 \quad \text{and} \quad \mathrm{Var}(V) = 4\mu^4,$$
whereas
$$\mathrm{Var}(\tfrac{2}{3} U + \tfrac{1}{3} V) = \tfrac{4}{3}\mu^4.$$
This shows that the optimal hybrid estimator $T_{n,\epsilon_n}$ actually outperforms both PB and NB asymptotically in terms of MSEs. As for the asymptotics of our empirical estimate $T_{n,\hat{\epsilon}_n}$, we have
$$\hat{\epsilon}_n \xrightarrow{\mathscr{D}} \frac{Z + 2/3}{Z + 1}$$
with $Z \sim \chi_1^2$, and
$$\sqrt{n}\,[T_{n,\hat{\epsilon}_n} - \mu^2] \xrightarrow{\mathscr{D}} \frac{Z + 2/3}{Z + 1}\,U + \frac{1/3}{Z + 1}\,V, \quad \text{where } Z = \frac{(V-U)^2}{\mathbb{E}[(V-U)^2]}.$$

0-25 •

0-20 •

015 •

0 10 •

005

0 0 •

-10 -5 10
Fig. 7. Example 2. - Comparison of asymptotic distributions of y/n\Tnjn — V2],
], V«[NB-v/2] and ^/n[VB-^2] under 2V(2s, 2^)"

Fig. 8. Example 2. Comparison of MSEs of $T_{n,\hat{\epsilon}_n}$ and $T_{n,\epsilon_n}$ with those of PB and NB under $N(\mu, \mu^2)$ for $n = 100$.

The variance of the above limit distribution is difficult to compute explicitly. However, we should expect it to lie strictly between $\tfrac{4}{3}\mu^4$ and $2\mu^4 = \min(2\mu^4, 4\mu^4)$. Figure 7 compares the densities of the asymptotic distributions of $T_{n,\epsilon_n}$, $T_{n,\hat{\epsilon}_n}$, PB and NB, for the case where $\mu^4 = 2$. Clearly, both optimal $\epsilon$-hybrid estimators are better than either PB or NB. This example convinces us that using a convex combination of PB and NB can be strictly beneficial in some circumstances.
3.2.2. Simulation results. Five hundred samples of size $n$ were drawn from $N(\mu, \mu^2)$, for different values of $n$ and $\mu$. The MSEs of $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$ were compared with those of NB and PB. Here MSE(PB) and MSE(NB) were exact, whereas $\mathrm{MSE}(T_{n,\epsilon_n})$ and $\mathrm{MSE}(T_{n,\hat{\epsilon}_n})$ were approximated by averaging over the 500 samples. The case for $n = 100$ is shown in Figure 8. We see that the MSE of the hybrid estimator is strictly less than that of either PB or NB throughout the entire range of $\mu$, except at the point $\mu = 0$. Of course the theoretically optimal $T_{n,\epsilon_n}$ performs even better than the empirical $T_{n,\hat{\epsilon}_n}$, as can be seen from Figure 8.

4. Further discussion and comments

A few questions concerning our $\epsilon$-hybrid approach will be raised and discussed briefly in this section.
4.1. Restrictions on the tuning parameter. The pure and $\epsilon$-hybrid bootstrap estimators discussed so far can be expressed in the form $e_n \alpha(F_n) + (1 - e_n)\alpha(F_{\hat{\theta}_n})$, where $e_n$ is some random quantity bounded within the interval $[0,1]$ based on the given sample data. In fact, the pure bootstrap estimates correspond to non-random values of $e_n$. There is no particular reason why we must restrict our choice of $e_n$ to the interval $[0,1]$. If $\epsilon$ is allowed to assume values in the whole real line, the corresponding asymptotic results become as follows:
(i) If $\alpha(F) = \alpha(F_{\theta_0})$, then
$$\epsilon_n \to \Gamma_F \quad \text{and} \quad \hat{\epsilon}_n \xrightarrow{\mathscr{D}} \frac{Z + \Gamma_F}{Z + 1},$$
with $Z \sim \chi_1^2$.
(ii) If $\alpha(F) \ne \alpha(F_{\theta_0})$, then
$$\sqrt{n}\,(\epsilon_n - 1) \to 0 \quad \text{and} \quad \sqrt{n}\,(\hat{\epsilon}_n - 1) \xrightarrow{P} 0.$$
The asymptotic distributions of $T_{n,\epsilon_n}$ and $T_{n,\hat{\epsilon}_n}$ follow in an obvious way. The only possible trouble occurs when deducing the large sample behaviour of their MSEs, where the property of uniform integrability is made use of. However, we note that both $\epsilon_n$ and $\hat{\epsilon}_n$ are still bounded asymptotically with probability one. Only when $\Gamma_F \notin [0,1]$ and $\alpha(F) = \alpha(F_{\theta_0})$ should $\epsilon_n$ and $\hat{\epsilon}_n$ converge to some limit beyond the range $[0,1]$. The danger of an unbounded optimal $\epsilon$ is thus minimal.
4.2. Dependence of functionals on sample size. The asymptotic theory developed in Section 2.4 is valid under the assumption that $\alpha(F)$ is independent of $n$. What happens if the parameter of interest is of the form $\alpha_n(F)$? This is in fact a rather common situation. Take as an example the standard deviation of the variance-stabilized sample correlation coefficient of a bivariate sample, which is defined as
$$\alpha_n(F) = \mathrm{SD}_F\!\left[\tanh^{-1}\!\left(\frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2\right\}^{1/2}}\right)\right], \qquad (12)$$
where $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ is a random sample drawn from a bivariate distribution $F$ and $(\bar{X}, \bar{Y})$ is the usual sample mean. This quantity depends on the sample size $n$ in a complicated and implicit manner. Whether the asymptotic theory still applies here remains an open question.
However, we can often rewrite $\alpha_n(F)$ as an asymptotic expansion such as
$$\alpha_n(F) = \sum_{i=1}^k b_i(n)\,a_{(i)}(F) + o(b_k(n)) \quad \text{as } n \to \infty,$$
where $\{a_{(1)}(F), a_{(2)}(F), \ldots\}$ is a sequence of sample-size free functionals of $F$ and $b_i(n)$ is some distribution-free function of $n$ such that
$$b_{i+1}(n) = o(b_i(n)) \quad \text{as } n \to \infty.$$
It is therefore possible to derive the asymptotic properties corresponding to the separate $a_{(i)}$'s. The overall asymptotic behaviour of any estimate of $\alpha_n(F)$ may be obtained by combining these individual results. In fact, sensible conclusions relevant to estimation of $\alpha_n(F)$ may often be expected to be obtained by study of estimation of the first order term $a_{(1)}(F)$ alone.
Silverman and Young [9] show that this is indeed the case in the example where $\alpha_n(F)$ is given by (12). They simplify their study of estimation of (12) via an approximate formula given by [6, p. 251], namely
$$\alpha_n(F) = n^{-1/2}\,a_{(1)}(F) + O(n^{-3/2}),$$
where the first order term $a_{(1)}(F)$ admits an explicit expression in terms of moments of $F$.
4.3. Other combinations of NB and PB. Obviously $T_{n,\epsilon} = \epsilon\,\mathrm{NB} + (1-\epsilon)\,\mathrm{PB}$ is not the only way of mixing NB with PB. It has been chosen because of its relatively simple structure, which allows for easy derivation of an optimal estimator as well as its asymptotic properties. Hjort [5] suggests a slightly different intermediate bootstrap estimate (IB), $\alpha(\epsilon F_n + (1-\epsilon) F_{\hat{\theta}_n})$. Note that IB reduces to $T_{n,\epsilon}$ if $\alpha$ is a linear functional. How IB differs from $T_{n,\epsilon}$ theoretically remains to be studied. It seems inevitable that the optimal choice of $\epsilon$ in IB is much harder to determine, because $\mathrm{MSE}_F(\mathrm{IB})$ is no longer a simple quadratic polynomial in $\epsilon$, unless $\alpha$ is 'linear' enough. A direct approach to finding the empirical optimal $\epsilon$ for IB is to minimize the empirical version of the MSE, given by
$$\mathbb{E}_{F_n}\{[\alpha(\epsilon F_n^* + (1-\epsilon) F_{\hat{\theta}_n^*}) - \alpha(F_n)]^2\},$$
over $\epsilon$ in some predetermined interval. This process would typically call for a double bootstrap procedure, in which double bootstrap resamples are drawn from $\epsilon F_n^* + (1-\epsilon) F_{\hat{\theta}_n^*}$ at some mesh of $\epsilon$ values and the empirical optimal $\epsilon$ chosen from this finite set of values, possibly by means of interpolation. The need for double bootstrap resampling from a variety of distributions of the form $\epsilon F_n^* + (1-\epsilon) F_{\hat{\theta}_n^*}$ makes the estimate IB practically less attractive than $T_{n,\hat{\epsilon}_n}$, which requires double bootstrap resampling from $F_n^*$ and $F_{\hat{\theta}_n^*}$ only, as has been pointed out in Section 2.5.
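To make the comparison concrete, here is a minimal sketch of the grid-search procedure just described, specialized to the variance functional and exponential model of Section 3 so that α itself needs no extra resampling to evaluate (for a general functional, each evaluation of α would itself require a further bootstrap level). Drawing from the mixture $\epsilon F_n^* + (1-\epsilon)F_{\hat{\theta}_n^*}$ is implemented by choosing each observation from $F_n^*$ with probability $\epsilon$ and from the fitted exponential otherwise. The grid and resample counts are arbitrary illustrative choices, not values from the paper.

    import numpy as np

    def ib_estimate(x, eps_grid=np.linspace(0.0, 1.0, 11), B=500, rng=None):
        rng = np.random.default_rng(rng)
        n = len(x)
        alpha_fn = np.mean((x - x.mean()) ** 2)        # alpha(F_n): variance functional
        risk = np.zeros(len(eps_grid))
        for _ in range(B):
            xs = rng.choice(x, size=n, replace=True)   # first-level resample from F_n
            theta_s = 1.0 / xs.mean()                  # bootstrap MLE under Exp(theta)
            for j, eps in enumerate(eps_grid):
                # second-level resample from the mixture eps*F_n* + (1-eps)*F_{theta*}
                pick = rng.random(n) < eps
                y = np.where(pick,
                             rng.choice(xs, size=n, replace=True),
                             rng.exponential(scale=1.0 / theta_s, size=n))
                risk[j] += (np.mean((y - y.mean()) ** 2) - alpha_fn) ** 2
        eps_opt = eps_grid[np.argmin(risk)]            # empirical optimal eps for IB
        # For the variance functional, alpha at the mixture has a closed form: the
        # mixture mean is Xbar (since 1/theta_hat = Xbar) and its second moment
        # interpolates between the empirical and fitted-exponential second moments.
        m2 = eps_opt * np.mean(x ** 2) + (1.0 - eps_opt) * 2.0 * x.mean() ** 2
        return m2 - x.mean() ** 2, eps_opt

At $\epsilon = 1$ this returns NB and at $\epsilon = 0$ it returns PB, as it should.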
4.4. Criterion of optimality. Apart from the mean squared error, other criteria may sometimes be necessary, especially when the second moment of the bootstrap estimate fails to exist. Any measure of estimation error is suitable as a criterion provided it is well defined, non-parametric, easy to calculate and sensitive to the difference between NB and PB. It is hoped that the resulting hybrid estimate is rather stable despite the choice of criterion employed in its construction.

I should like to thank my research supervisor Dr Alastair Young for many helpful
suggestions and discussions on the ideas and results presented in this paper, and Dr
Pat Altham for her advice on many practical areas. This work was carried out whilst
in receipt of a research scholarship from the Croucher Foundation.

REFERENCES
[1] P. J. BICKEL and D. A. FREEDMAN. Some asymptotic theory for the bootstrap. Ann. Statist. 9 (1981), 1196-1217.
[2] B. EFRON. Bootstrap methods: another look at the jackknife. Ann. Statist. 7 (1979), 1-26.
[3] B. EFRON. Jackknife-after-bootstrap standard errors and influence functions. (With discussion.) J. Roy. Statist. Soc. Ser. B 54 (1992), 83-127.
[4] L. T. FERNHOLZ. von Mises Calculus for Statistical Functionals (Springer-Verlag, 1983).
[5] N. L. HJORT. Contribution to the discussion of David Hinkley's lectures on bootstrapping techniques. Written version presented at the Nordic Conference in Mathematical Statistics. Scand. J. Statist., to appear.
[6] M. G. KENDALL and A. STUART. The Advanced Theory of Statistics, vol. 1, 4th edn. (Griffin, 1977).
[7] C. LEGER and J. P. ROMANO. Bootstrap choice of tuning parameters. Ann. Inst. Statist. Math. 42 (1990), 709-735.
[8] R. J. SERFLING. Approximation Theorems of Mathematical Statistics (Wiley, 1980).
[9] B. W. SILVERMAN and G. A. YOUNG. The bootstrap: to smooth or not to smooth? Biometrika 74 (1987), 469-479.
