Optimal Choice Between Parametric and Non-Parametric Bootstrap Estimates
Stephen Man Sing Lee
The University of Hong Kong
1. Introduction
The idea of bootstrapping in statistical estimation was put forward by Efron [2].
The bootstrap method can be applied in both parametric and non-parametric
fashions. In the context of estimation of a statistical functional α(F) based on a random sample of size n from an unknown distribution F, the parametric bootstrap estimate (PB) and the non-parametric bootstrap estimate (NB) amount to replacing F by F_{θ̂_n} and F_n respectively, where θ̂_n is some estimate of the index θ parametrizing a family {F_θ} thought to contain F, and F_n is the usual empirical distribution with equal probability placed on each data point.
In the literature PB and NB have been discussed mainly as two quite separate
procedures. Their relative performance depends largely on correctness of our
assumption of the parametric model. The estimate PB is generally more accurate if
our proposed model is sufficiently close to F, whereas NB is the better choice if the
parametric model is far from correct. The question of which one to use in
circumstances where knowledge of the underlying distribution is vague remains
unanswered. A straightforward criterion to guide our choice between the two
bootstrap estimates is necessary.
Hjort [5] looks briefly at an intermediate bootstrap estimate (IB) obtained by replacing F with a mixture distribution εF_{θ̂_n} + (1 − ε)F_n, where ε ∈ [0, 1] is a pseudo-parameter to be determined in some optimal sense. Inspired by Hjort's suggestion we propose a slightly different ε-hybrid estimator, defined as

T_{n,ε} = ε α(F_n) + (1 − ε) α(F_{θ̂_n}),

where ε ∈ [0, 1] is, in the terminology of [7], a 'tuning parameter'. The asymptotic properties of the ε-hybrid estimator are, unlike those of Hjort's approach, easy to study.
Moreover, the optimal ε for T_{n,ε} is known explicitly and so can be computed directly, whereas Hjort's estimate (IB) generally requires resampling from a class of distributions indexed by ε in order to determine the approximate value of the optimal ε.
Section 2 gives a full theoretical account of the asymptotic properties of our ε-hybrid estimator. It will be shown that T_{n,ε} combines the merits of PB and NB in a robust way. The tuning parameter ε, if picked to minimize the mean squared error of T_{n,ε}, results in an optimal hybrid estimator. In practice, ε is determined by minimizing the empirical mean squared error of T_{n,ε}. An exact expression for the optimal ε is readily available, and its sample version can either be computed directly or approximated via bootstrap resampling. The optimal properties of T_{n,ε} are demonstrated through two simple examples in Section 3. The second example shows that T_{n,ε} can even be strictly better than either PB or NB, uniformly over a rich class of distributions. Simulation results closely support our theory. A few open questions are raised in Section 4 about possible generalizations of our procedure.
for ε ∈ [0, 1].

Suppose now ε_n(F) minimizes the mean squared error (MSE) of T_{n,ε}, that is,

MSE_F(T_{n,ε_n(F)}) = inf_{0≤ε≤1} MSE_F(T_{n,ε}) = inf_{0≤ε≤1} E_F{[T_{n,ε} − α(F)]²}.
Note that the optimal ε_n(F) depends on the unknown distribution F and is therefore not generally available for evaluation. It may, however, be estimated by its non-parametric bootstrap estimate ε_n(F_n). Write

ε̂_n = ε_n(F_n)

for simplicity. We shall study the asymptotic properties of both the theoretical optimal ε-hybrid estimator T_{n,ε_n(F)} and the empirical optimal ε-hybrid estimator T_{n,ε̂_n}.
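To fix ideas, the ε-hybrid construction can be sketched in a few lines of code. The exponential working model and the variance functional below are illustrative choices only (they match Example 1 of Section 3), and the function name is ours, not the paper's:

```python
import numpy as np

def hybrid_estimate(x, eps):
    """T_{n,eps} = eps * NB + (1 - eps) * PB for alpha(F) = Var(X),
    under an exponential working model F_theta (an illustrative choice)."""
    x = np.asarray(x, dtype=float)
    theta_hat = 1.0 / x.mean()        # MLE of the exponential index
    pb = 1.0 / theta_hat ** 2         # alpha(F_{theta_hat}): variance of Exp(theta_hat)
    nb = x.var()                      # alpha(F_n): empirical variance
    return eps * nb + (1.0 - eps) * pb

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)
print(hybrid_estimate(x, 0.0), hybrid_estimate(x, 1.0))  # PB and NB respectively
```

With ε = 1 the estimate reduces to NB, and with ε = 0 to PB; intermediate ε gives the convex combination studied below.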
2.2. Notation and assumptions. Before laying down the basic assumptions underlying the theoretical results to be derived later, we first introduce some notation:
1. ‖·‖ is the usual Euclidean norm defined on the space of real matrices, so that ‖(a_{ij})‖² = Σ_i Σ_j a_{ij}².
2. ∇g denotes the gradient vector (∂g/∂x_1, …, ∂g/∂x_p)ᵀ of a function g of p variables.
3. ℋg denotes the Hessian matrix of g, with (j, k)th entry ∂²g/∂x_j ∂x_k.
4. Let Y_n be a random variable such that along almost all sample sequences X_1, X_2, …, the conditional distribution of Y_n given (X_1, …, X_n) converges weakly to a limit distribution G corresponding to a random variable Y, say. Then we write

Y_n | X_1, …, X_n →d Y (or G) a.s.,

where 'a.s.' stands for 'almost surely' as usual.
5. Similarly, Y_n converges in conditional probability to Y given (X_1, …, X_n) along almost all random sequences X_1, X_2, …, if, given any ε > 0,

P[|Y_n − Y| > ε | X_1, …, X_n] → 0 a.s.,
with ∫ ψ_G(x) dG(x) = 0.
(A 5) Var_F[∇ log f(X; θ)|_{θ_0}] and E_F[ℋ log f(X; θ)|_{θ_0}] exist, with the latter being negative definite.
(A 6) Define a function β: Θ → ℝ by β(θ) = α(F_θ). Then β is twice differentiable and ‖ℋβ‖ is bounded in a neighbourhood of θ_0.
(A 7) √n Rem(F_n, F) →p 0 and √n Rem(F*_n, F_n) | X_1, …, X_n →p 0 a.s.
(A 8) Almost sure convergence holds as follows:
a.s.,
n^{−1} Σ_{i=1}^n ‖∇ log f(X_i; θ)|_{θ̂_n} − ∇ log f(X_i; θ)|_{θ_0}‖² → 0 a.s.
where 𝒢 is some convex class of distributions containing G and all point masses. Let G_n be the empirical distribution of X_1, X_2, …, X_n. Given X_1, …, X_n, let (X*_1, …, X*_n) be a random sample drawn from G_n. Then along almost all sample sequences X_1, X_2, …, as n → ∞,

(i) n^{−1/2} Σ_{i=1}^n {q(X*_i, G) − E_G[q(X, G)]} | X_1, …, X_n →d N(0, Var_G[q(X, G)]) a.s.

Furthermore, if

n^{−1} Σ_{i=1}^n ‖q(X*_i, G_n) − q(X*_i, G)‖² →p 0 given X_1, …, X_n a.s.,

then

(ii) n^{−1/2} Σ_{i=1}^n {q(X*_i, G_n) − E_G[q(X, G)]} | X_1, …, X_n →d N(0, Var_G[q(X, G)]) a.s.

The additional condition required for result (ii) is usually satisfied by functionals q(x, G) sufficiently smooth with respect to G under moment conditions on X.
Sketch proof of Lemma A. The first result (i) can be derived easily from the proof of Theorem 2.1 in [1]. That paper also suggests the second result (ii) for a scalar functional q. For a general k-dimensional setting, we can easily show by Chebyshev's inequality that

n^{−1/2} Σ_{i=1}^n {q(X*_i, G_n) − q(X*_i, G)} | X_1, …, X_n →p 0 a.s.

By the previous result (i), the second term in the above difference converges in conditional distribution to N_k(0, Var_G[q(X, G)]), and hence result (ii) follows. ∎
LEMMA B. Suppose X_1, X_2, …, are a sequence of independent random variables drawn from G on ℝ. Suppose

E_G[|X|] < ∞.

Let X*_1, …, X*_n be a random sample drawn from G_n, the empirical distribution based on (X_1, …, X_n). Let τ: ℝ × 𝒢 → ℝ be a statistical functional, where 𝒢 is a convex class of distributions containing G and all point masses. Suppose τ satisfies E_G[|τ(X, G)|] < ∞. Then

(i) n^{−1} Σ_{i=1}^n τ(X*_i, G) | X_1, …, X_n →p E_G[τ(X, G)] a.s.

Further, if

n^{−1} Σ_{i=1}^n |τ(X*_i, G_n) − τ(X*_i, G)| →p 0 given X_1, …, X_n a.s.,

then

(ii) n^{−1} Σ_{i=1}^n τ(X*_i, G_n) | X_1, …, X_n →p E_G[τ(X, G)] a.s.

As in Lemma A, the additional condition stated for Lemma B (ii) is usually satisfied by sufficiently smooth functionals τ(X, G) under moment conditions on X.
Sketch proof of Lemma B. Let Γ_1 be the set of distributions H satisfying the condition ∫ |x| dH(x) < ∞. Define a metric d_1 on Γ_1 such that d_1(H_1, H_2) is the infimum of E|X − Y| where X, Y have marginal distributions H_1, H_2 respectively. Without ambiguity we can write d_1(Y, Z) = d_1(H_1, H_2), where Y ~ H_1 and Z ~ H_2. Let H^{(n)} be the distribution of n^{−1} Σ_{i=1}^n Y_i with Y_1, …, Y_n independently distributed from H. Then for all H_1, H_2 in Γ_1,

d_1(H_1^{(n)}, H_2^{(n)}) ≤ d_1(H_1, H_2), and ∫ |x| dH^{(n)}(x) ≤ ∫ |x| dH(x).

The details of d_1 can be found in [1]. Now take H as the distribution of τ(X_1, G) with X_1, X_2, …, independently distributed from G. Let H_n be the conditional distribution of τ(X*_1, G) with the X*_i independently distributed from G_n. Clearly H, H_n ∈ Γ_1 a.s. We can deduce from the Strong Law of Large Numbers (SLLN) that

d_1(H_n^{(n)}, H^{(n)}) ≤ d_1(H_n, H) → 0 a.s.

On the other hand, SLLN and a slightly modified version of the Dominated Convergence Theorem imply the result

d_1(H^{(n)}, E_G τ(X, G)) → 0.

Thus

d_1(H_n^{(n)}, E_G τ(X, G)) ≤ d_1(H_n^{(n)}, H^{(n)}) + d_1(H^{(n)}, E_G τ(X, G)) → 0 a.s.,

and H_n^{(n)} → E_G τ(X, G) weakly a.s. So result (i) follows.

To prove (ii), we let J_n be the conditional distribution of τ(X*_1, G_n) with X*_1 ~ G_n. It is clear that J_n ∈ Γ_1 a.s. Note that SLLN and the assumption of Lemma B (ii) imply

τ(X*_1, G_n) − τ(X*_1, G) | X_1, …, X_n →p 0 a.s.,

so that τ(X*_1, G_n) | X_1, …, X_n →d τ(X, G) a.s.
which represents the error of the pure bootstrap estimates for α(F). Here β and θ_0 are defined as in (A 3) and (A 6). We shall first prove the following theorem.

THEOREM 1. The joint distribution of the above bivariate error, scaled by √n, is asymptotically bivariate normal with zero mean and variance–covariance matrix Σ_F, where

ρ_F = E_F[ψ_F(X) ∇ log f(X; θ)|_{θ_0}].
B_n(𝒳; θ) = −(1/n) Σ_{i=1}^n ℋ log f(X_i; θ),

for some |W_{n,jkl}(𝒳; λ, θ)| ≤ 1 and j, k, l = 1, 2, …, p. Take θ = θ_0. Then SLLN and assumptions (A 3), (A 4) and (A 5) imply

Δ_n(𝒳; θ_0) = (1/n) Σ_{i=1}^n ∇ log f(X_i; θ)|_{θ_0} → E_F[∇ log f(X; θ)|_{θ_0}] = 0 a.s.
Thus, with probability 1 and for large enough n, there exists a sequence θ̂_n converging to θ_0 such that Δ_n(𝒳; θ̂_n) = 0; see [8, Section 4.2.2, pp. 144–149]. A Taylor expansion of Δ_n(𝒳; θ) about θ_0, evaluated at θ̂_n, gives

0 = Δ_n(𝒳; θ_0) − B_n(𝒳; θ_1)(θ̂_n − θ_0)

for some θ_1 with ‖θ_1 − θ_0‖ ≤ ‖θ̂_n − θ_0‖. Without ambiguity we hereby drop the arguments of B_n and Δ_n for simplicity. The expansion, combined with the central limit theorem for Δ_n, yields the asymptotic distribution

N_2(0, Σ_F), (3)

where Σ_F is defined in the statement of the theorem. ∎
Applying Theorem 1 to the bivariate error of the pure bootstrap estimates, we conclude

√n ( α(F_n) − α(F), α(F_{θ̂_n}) − α(F_{θ_0}) ) →d N_2(0, Σ_F). (4)

The exact asymptotic behaviour of the bivariate error depends on the correctness of our proposed parametric model, as well as on the remainder function Rem(F_n, F) (which depends on the functional α).
2.4.2. Asymptotic joint bootstrap distribution of pure bootstrap estimates. Next we consider the bootstrap resampled version of the bivariate error of the pure bootstrap estimates, namely

√n ( α(F*_n) − α(F_n), α(F_{θ̂*_n}) − α(F_n) ).

Note that the first component can be decomposed as n^{−1/2} Σ_{i=1}^n ψ_{F_n}(X*_i) plus the remainder term √n Rem(F*_n, F_n). A similar argument to that used in Section 2.4.1 yields the bootstrap version of Theorem 1, given as follows.

THEOREM 2. Given X_1, …, X_n, the joint bootstrap distribution of the centred bivariate error converges weakly to N_2(0, Σ_F) a.s.
Proof of Theorem 2. The proof follows closely that of Theorem 1. Take θ = θ̂_n in (1), which is valid because θ̂_n ∈ N(θ_0) a.s. for sufficiently large n. Also, replace 𝒳 by 𝒳*; the expansion then holds for all λ ∈ N(θ_0). Since θ̂_n can be seen as a functional of F_n, applying Lemma B (ii) together with (A 5) and (A 8) gives the bootstrap analogues of the limits used in the proof of Theorem 1, in particular

Δ_n(𝒳*; θ)|_{θ̂_n} | X_1, …, X_n →p 0 a.s.

All these convergences, together with the fact that W_n(𝒳*; λ, θ̂_n) = O(‖λ − θ̂_n‖) a.s., imply that there exists a sequence θ̂*_n such that, given X_1, …, X_n, the centred bivariate error converges in conditional distribution to N_2(0, Σ_F) a.s. This completes the proof. ∎
Thus, by Theorem 2, we have a result analogous to (4):

√n ( n^{−1} Σ_{i=1}^n ψ_{F_n}(X*_i), β(θ̂*_n) − β(θ̂_n) ) | X_1, …, X_n →d N_2(0, Σ_F) a.s. (5)
2.4.3. Asymptotic distribution of optimal ε-hybrid estimators. For convenience we hereby let (U, V) be a bivariate random variable having the same N_2(0, Σ_F) distribution as the limit obtained in (5). Define a function ξ: ℝ → [0, 1] by

ξ(x) = 0 if x < 0;  ξ(x) = x if 0 ≤ x ≤ 1;  ξ(x) = 1 if x > 1.

Note that ξ is a bounded continuous function on ℝ. Define also

γ_F = E[V(V − U)] / E[(V − U)²].
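In sample terms, γ_F can be approximated by replacing the moments of (U, V) with bootstrap moments of the NB and PB errors, giving the empirical tuning parameter ε̂_n = ξ(γ̂_n). A rough sketch, again assuming the illustrative exponential working model and variance functional of Example 1 (resample sizes and all names are ours):

```python
import numpy as np

def xi(x):
    """Truncation of x to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)

def empirical_epsilon(x, n_boot=2000, seed=0):
    """Sketch of eps_hat = xi(gamma_hat), where gamma_hat replaces
    E[V(V-U)] / E[(V-U)^2] by bootstrap moments: u = NB replicate error,
    v = PB replicate error, both computed on non-parametric resamples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    nb, pb = x.var(), x.mean() ** 2
    xs = rng.choice(x, size=(n_boot, n), replace=True)  # non-parametric resamples
    u = xs.var(axis=1) - nb          # alpha(F_n*) - alpha(F_n)
    v = xs.mean(axis=1) ** 2 - pb    # beta(theta_n*) - beta(theta_n), exponential model
    d = v - u
    return xi(np.mean(v * d) / np.mean(d ** 2))

eps_hat = empirical_epsilon(np.random.default_rng(0).exponential(2.0, 60))
print(eps_hat)
```

The ratio is scale-free, so the √n normalizations of U and V cancel and can be omitted.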
THEOREM 3. (i) Suppose γ = 0, that is, α(F_{θ_0}) = α(F). Then

ε_n → ξ(γ_F) and ε̂_n →d ξ((Z + γ_F)/(Z + 1)),

where Z is a random variable with a χ²₁ distribution. Also, as n → ∞, the true optimal T_{n,ε_n} is asymptotically normal, while its empirical version T_{n,ε̂_n} has a limit law determined jointly by (U, V) and Z.
(ii) Suppose γ ≠ 0, that is, α(F_{θ_0}) ≠ α(F). Then we have, as n → ∞,

ε_n = 1 + o(n^{−1/2}) and ε̂_n = 1 + o_p(n^{−1/2}).

The corresponding ε-hybrid estimators satisfy

√n[T_{n,ε_n} − α(F)] →d N(0, σ²_F) and √n[T_{n,ε̂_n} − α(F)] →d N(0, σ²_F)

as n → ∞.
For γ = 0, Theorem 3 (i) shows that our asymptotically optimal ε is ξ(γ_F). Both the true and empirical optimal ε-hybrid estimates have a rate of convergence of order O(n^{−1/2}). While the true optimal T_{n,ε_n} is asymptotically normal, its empirical version T_{n,ε̂_n} has an asymptotic distribution very close to normal, with a slightly greater dispersion. On the other hand, if γ ≠ 0, the theoretical optimal tuning parameter ε_n → 1 as n → ∞ and our asymptotically optimal choice is therefore NB. This sounds reasonable, as we have postulated a wrong model for the underlying distribution and a non-parametric estimate is therefore intuitively superior. Our empirical optimal ε̂_n is consistent with the theoretical optimal ε_n as n tends to infinity. The theoretical and empirical estimates, T_{n,ε_n} and T_{n,ε̂_n}, are asymptotically consistent with each other and have the same asymptotic distribution N(0, σ²_F) as does NB.
Proof of Theorem 3. Suppose first that γ = 0. From (4) and (A 7), we have by Slutsky's Theorem

√n ( α(F_n) − α(F), α(F_{θ̂_n}) − α(F) ) →d (U, V). (6)

Consider

MSE_F(T_{n,ε}) = E_F{[ε(α(F_n) − α(F)) + (1 − ε)(α(F_{θ̂_n}) − α(F))]²}.

This is a simple quadratic polynomial in ε and is minimized over ε ∈ [0, 1] at

ε = ε_n = ξ( E_F[(α(F_{θ̂_n}) − α(F))(α(F_{θ̂_n}) − α(F_n))] / E_F[(α(F_{θ̂_n}) − α(F_n))²] ), (7)

with ε̂_n given by the analogous expression, (8) and (9), in which the moments are replaced by their bootstrap estimates; these bootstrap moments, suitably normalized, converge to the corresponding moments of (U, V) a.s. (10)
The almost sure convergence (10) follows from (A 9) and from noting that (A 7) and (5) together imply, by Slutsky's Theorem, the corresponding conditional convergence of the bootstrap errors. Hence, using Slutsky's Theorem again on (7), (8) and (9) and putting

Z = (V − U)²/E[(V − U)²] ~ χ²₁,

the required convergence results for ε̂_n and the optimal ε-hybrid estimators follow immediately.
Next we look at the case where γ ≠ 0. Application of Slutsky's Theorem to (4) and (A 7) now gives the corresponding weak limit, with the parametric error component centred at γ. The almost sure convergence (10) still holds, as the value of γ does not affect the asymptotic behaviour of the random quantities in question. Thus we can deduce

E*[U*(V* − U*)] = n^{−1} E[U(V − U)] + o(n^{−1}) a.s.,

with analogous expansions a.s. for the remaining bootstrap moments, and hence

ε̂_n = 1 + o_p(n^{−1/2}),

using expression (7) again. Lastly we consider the actual hybrid estimates T_{n,ε_n} and T_{n,ε̂_n}. Clearly, both differ from α(F_n) by o_p(n^{−1/2}), and so share the N(0, σ²_F) limit of NB. This completes the whole proof. ∎
Now we look at a particular case where Θ ⊆ ℝ with F = F_{θ_0}, so that our postulated parametric model is correct. We want to find out what kind of conditions guarantee the convergence of the optimal ε_n to 0, or in other words, PB being our asymptotically optimal choice. First we need the lemma stated below, the proof of which follows that of Theorem 8.1.3 in [4].
Let 𝒞(ℝ) be the space of continuous real-valued functions on ℝ with finite limits at ±∞. Equip 𝒞(ℝ) with the sup-norm ‖·‖_∞.

LEMMA C. Suppose the mapping θ ↦ F_θ from Θ to 𝒞(ℝ) is continuous. Suppose also that α is a statistical functional defined on 𝒞(ℝ). Assume α is Hadamard differentiable at F_{θ_0} for some θ_0 ∈ Θ, and that
(i) F_{θ_0} is strictly increasing, and
(ii) as δ → 0,

(f_{θ_0+δ} − f_{θ_0})/δ → ∂f_θ/∂θ|_{θ_0} in ℒ²(F_{θ_0}),

where f_θ = F′_θ exists and ℒ²(F_{θ_0}) is the usual integrated squared norm with respect to the measure F_{θ_0} on a suitable space of real functions. Then we have

α(F_{θ_0+δ}) − α(F_{θ_0}) − α′_{F_{θ_0}}(F_{θ_0+δ} − F_{θ_0}) = o(δ).
Here α′_{F_{θ_0}}(G − F_{θ_0}) denotes the Hadamard derivative of α at F_{θ_0} in the direction G − F_{θ_0}. Define a function

g(x) = ∫_{−∞}^{x} (∂f_θ/∂θ)|_{θ_0}(t) dt,

and, for δ ∈ [−1, 1], set

H_δ = (F_{θ_0+δ} − F_{θ_0})/δ if δ ≠ 0, and H_0 = g if δ = 0.

Consider the function space

{H_δ : δ ∈ [−1, 1]}

and the mapping φ: [−1, 1] → 𝒞(ℝ) such that φ(δ) = H_δ. We claim that φ is continuous on [−1, 1]. For δ ≠ 0, H_δ is obviously continuous in δ, since θ ↦ F_θ is assumed to be continuous. So φ is continuous at all δ ≠ 0. Moreover, using the Cauchy–Schwarz inequality and assumption (ii), we have

sup_{x∈ℝ} |H_δ(x) − g(x)| → 0

as δ → 0. Therefore φ is continuous at δ = 0 as well, and our claim is valid. Thus K = φ[−1, 1] is compact. By Hadamard differentiability of α at F_{θ_0},

[α(F_{θ_0} + δH) − α(F_{θ_0}) − α′_{F_{θ_0}}(δH)]/δ → 0 as δ → 0

uniformly for all H ∈ K. Therefore

[α(F_{θ_0+δ}) − α(F_{θ_0}) − α′_{F_{θ_0}}(F_{θ_0+δ} − F_{θ_0})]/δ → 0 as δ → 0. ∎
E[V(V − U)] = 0

according to Lemma C. Therefore γ_F = 0. The convergence of the optimal tuning parameters follows from Theorem 3 (i). ∎
Theorem 4 is a fairly general result, since the conditions in Lemma C are usually fulfilled by reasonable classes of distributions. Clearly, the theoretical optimal T_{n,ε_n} is asymptotic to PB under the conditions of Theorem 4. However, the empirical estimate T_{n,ε̂_n} still has a non-trivial limit distribution.

The case in which α(F) = α(F_{θ_0}) and F ≠ F_{θ_0} may give rise to quite different results. We shall come back to this in Section 3.2, where an example will be given to illustrate the non-trivial asymptotic behaviour in this case.
2.4.4. Asymptotic MSE of optimal ε-hybrid estimators.

Case 1. Suppose that γ = 0. According to assumption (A 9) we deduce that both n[T_{n,ε_n} − α(F)]² and n[T_{n,ε̂_n} − α(F)]² are uniformly integrable. Therefore the MSEs of the two estimators are asymptotic to the second moments of their respective limit laws.

Now we consider three separate cases.
(i) γ_F ≥ 1: here ξ(γ_F) = 1, and both asymptotic errors coincide with that of NB.
(ii) γ_F ≤ 0: here ξ(γ_F) = 0, and PB is the asymptotically optimal choice; again there is no simpler corresponding expression for T_{n,ε̂_n}.
(iii) 0 < γ_F < 1: here ξ(γ_F) = γ_F, and the optimal hybrid is a non-trivial convex combination of PB and NB.
Case 2. Suppose now γ ≠ 0. Note that both the theoretical and empirical optimal ε-hybrid estimators have the same asymptotic distribution N(0, σ²_F), from Theorem 3 (ii). We should expect their MSEs to be asymptotic to σ²_F/n as well. This is certainly the case for T_{n,ε_n}, since with probability 1, ε_n = 1 + o(n^{−1/2}). Under the further condition that n(1 − ε̂_n)² be uniformly integrable, we also have

n[T_{n,ε̂_n} − α(F)]²

uniformly integrable, so that MSE_F(T_{n,ε̂_n}) = σ²_F/n + o(n^{−1}). Note that the condition that n(1 − ε̂_n)² be uniformly integrable is usually satisfied under stricter moment conditions on the underlying distribution F.
2.5. Summary and comments. We have shown that PB is the asymptotically optimal estimate under some mild conditions if our proposed parametric model is correct, as stated in Theorem 4. However, the empirical optimal ε̂_n has a non-degenerate limit distribution corresponding to the random variable Z/(Z + 1), where Z ~ χ²₁. The density function of Z/(Z + 1) is given by the formula

f(t) = (2π)^{−1/2} t^{−1/2} (1 − t)^{−3/2} exp{−t/(2(1 − t))}, 0 < t < 1.
Fig. 1. Density function of Z/(Z + 1), where Z ~ χ²₁.
Figure 1 shows that this density is highly concentrated around zero, and is more or less evenly spread over the rest of the interval [0, 1] with moderate density. This means we are quite likely to end up with a non-trivial convex combination of PB and NB as our hybrid estimate, with PB playing the more substantial role in the combination. Nevertheless, MSE_F(T_{n,ε̂_n}) proves to be only slightly bigger than MSE_F(T_{n,ε_n}), and both have the same convergence rate of order O(n^{−1}).
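The concentration of Z/(Z + 1) near zero can be checked directly by simulation; a small sketch of our own:

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.chisquare(df=1, size=100_000)
w = z / (z + 1.0)              # limit variable of the empirical optimal epsilon-hat
frac_small = np.mean(w < 0.1)  # proportion of mass packed near zero
print(round(frac_small, 3))
```

Roughly a quarter of the mass falls below 0.1, even though the remaining mass spreads fairly evenly over [0, 1], matching the shape of Figure 1.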
On the other hand, if the proposed parametric model is wrong and α(F_{θ_0}) ≠ α(F), then we have ε̂_n → 1 and our empirical estimate T_{n,ε̂_n} is exactly consistent with the true optimal T_{n,ε_n}, which is asymptotic to NB. Similarly, their MSEs are asymptotically identical.

One nice property of T_{n,ε̂_n} is that, whether or not our suggested parametric model is correct, it can adjust itself automatically to do the right thing. It should be borne in mind that the pure and hybrid bootstrap estimates considered so far have the same convergence rate of order O(n^{−1/2}). (In the case of PB, we need the assumption that our postulated model is correct.) This is due to the fact that these estimates are all based on the empirical distribution F_n, which is known to be a √n-consistent estimate of F under certain regularity conditions. What the hybrid estimate does is to reduce the first-order MSE without changing the convergence rate.

There exist particular cases, when α(F) = α(F_{θ_0}), where T_{n,ε_n} is asymptotic to neither PB nor NB but to a non-trivial convex combination of them. This means T_{n,ε̂_n} can sometimes be not just a trade-off but a strictly better estimate than the pure bootstraps. Section 3.2 gives a particular example to illustrate this point.
Finally, a remark should be made on the computational requirements of our empirical optimal ε-hybrid estimator T_{n,ε̂_n}. Recall that the empirical optimal tuning parameter ε̂_n is given by the sample version of (7). In general the functional α cannot be expressed as a directly computable formula. Bootstrap resampling is usually required to approximate the bootstrap estimates involved in the formula for ε̂_n. The presence of quantities like α(F*_n) and α(F_{θ̂*_n}) generally calls for a double bootstrap procedure, which is very time-consuming. There exist other non-parametric methods of estimating MSEs without the need for bootstrap resampling. For example, jackknife estimates of standard error and bias can be used to derive an estimate of MSE_F(·). Alternatively, the delta method, similar to the jackknife, is also applicable here. See [3] for details of these estimation methods. It is shown there that these methods can be regarded in some sense as approximations to the bootstrap method. We may thus expect some loss of accuracy in exchange for saving a second level of bootstrapping.
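As a sketch of the jackknife alternative just mentioned (our own construction, not the precise recipe of [3]), a jackknife MSE estimate can combine the usual jackknife bias and variance estimates:

```python
import numpy as np

def jackknife_mse(x, stat):
    """Jackknife estimate of MSE(stat) = bias^2 + variance, where `stat`
    maps a 1-D sample to a scalar.  The bias^2 + variance combination is
    our illustrative choice of MSE estimate."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    full = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])  # leave-one-out replicates
    bias = (n - 1) * (loo.mean() - full)                 # jackknife bias estimate
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)  # jackknife variance estimate
    return bias ** 2 + var

print(jackknife_mse([1.0, 2.0, 3.0, 4.0], lambda a: float(np.var(a))))
```

Such an estimate avoids the inner level of resampling entirely, at the cost of the approximation error discussed above.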
where a and θ are the shape and scale parameters respectively of the underlying Gamma distribution. We are interested in estimating the functional α(F) = Var_F(X) with X ~ F. The parametric model contemplated here is the family of exponential distributions indexed by the scale parameter θ, that is,

ℱ = {F_θ : F_θ(x) = 1_{(x>0)}(1 − e^{−θx}), θ > 0}.

The MLE of θ, assuming ℱ to be correct, is easily found to be θ̂_n = 1/X̄. The two bootstrap estimates are then

α(F_{θ̂_n}) = X̄², (PB)
α(F_n) = n^{−1} Σ_{i=1}^n (X_i − X̄)². (NB)

Figure 2 shows the ratio of the respective mean squared errors (MSEs) under a Gamma distribution with varying a and θ = 1, for different sample sizes n. It is found that PB outperforms NB if the true distribution is close enough to our proposed model, which is the case a = 1.
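The comparison underlying Figure 2 can be mimicked by direct Monte Carlo; in this sketch the 'MSEs' are Monte Carlo averages rather than the exact closed-form values, and the simulation sizes are our choice:

```python
import numpy as np

def mse_ratio(a, n, n_sim=4000, rng=None):
    """Monte Carlo sketch of MSE(PB)/MSE(NB) for alpha(F) = Var(X) = a
    under Gamma(a, 1) data, with the exponential working model."""
    rng = np.random.default_rng(rng)
    x = rng.gamma(a, 1.0, size=(n_sim, n))
    pb = x.mean(axis=1) ** 2   # parametric bootstrap estimate X-bar^2
    nb = x.var(axis=1)         # non-parametric estimate (1/n) sum (X_i - X-bar)^2
    return np.mean((pb - a) ** 2) / np.mean((nb - a) ** 2)
```

At a = 1 the working model is correct and the ratio is typically below one; far from a = 1 the parametric bias dominates and the ratio exceeds one, in line with Figure 2.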
Fig. 2. Example 1. Ratio of MSE(PB) to MSE(NB) under Γ(a, 1) for different sample sizes n.
γ = α(F_{θ_0}) − α(F) = a₁² − a₁.

Case 1. Suppose first that a₁ = 1, so that γ = 0. We have

γ_F = E[V(V − U)]/E[(V − U)²] = 0, with E[(V − U)²] = 4,

so that ε_n → ξ(0) = 0, PB is asymptotically optimal, and

√n[T_{n,ε_n} − α(F)] →d N(0, 4),

whereas the empirical optimal ε̂_n has the limit Z/(Z + 1), Z ~ χ²₁, and √n[T_{n,ε̂_n} − α(F)] has a correspondingly more dispersed limit law.
Case 2. Suppose now that a₁ ≠ 1. Here ε̂_n satisfies ε̂_n = 1 + o_p(n^{−1/2}), in accordance with Theorem 3 (ii).

Fig. 3. Example 1. Comparison of the asymptotic distributions of √n[T_{n,ε_n} − 1] and √n[T_{n,ε̂_n} − 1] under Γ(1, 1).

Returning to Case 1, the limit of √n[T_{n,ε̂_n} − 1] is that of

V + (U − V)³/[(U − V)² + 4].

Note that

Var{V + (U − V)³/[(U − V)² + 4]} ≈ 5.868,

which is bigger than the asymptotic variance 4 of the normalized optimal T_{n,ε_n}. Figure 3 compares the densities of these two limit distributions.
Fig. 4. Example 1. Asymptotic densities of the normalized estimates for a₁ = 1.5 and a₁ = 4.
Simulated variances of √n[T_{n,ε̂_n} − α(F)] and √n[T_{n,ε_n} − α(F)]:

Under Γ(1, 1):
n = 10: 6.244, 5.227
n = 100: 5.554, 3.657
n = 1000: 5.076, 3.823
n = 10000: 6.054, 4.392
Asymptotic variance: 5.868, 4.000

Under Γ(4, 1):
n = 10: 37.63, 35.80
n = 100: 60.19, 59.16
n = 1000: 58.01, 57.86
n = 10000: 56.45, 56.43
Asymptotic variance: 56.00, 56.00
Fig. 5. Example 1. Frequency histograms of ε̂_n under Γ(1, 1) and of n(1 − ε̂_n) under Γ(4, 1), for n = 10000.
Lastly, the performances of T_{n,ε_n} and T_{n,ε̂_n} are compared with the pure PB and NB by means of their MSEs, for different sample sizes n over a range of a₁. The MSEs of T_{n,ε_n} and T_{n,ε̂_n} were approximated by averaging over two hundred random samples, whereas the MSEs of PB and NB were computed exactly from their closed-form expressions. Figure 6 demonstrates two typical cases, n = 10 and n = 200. It is found that the theoretically optimal T_{n,ε_n} performs well uniformly over a₁ ∈ [0, 2]. The empirical T_{n,ε̂_n} is excellent for values of a₁ where NB is favoured. It is a little less than optimal when PB is favoured, where its asymptotic variance is slightly bigger than the optimal one. Generally speaking, T_{n,ε̂_n} performs reasonably well uniformly over a wide class of underlying distributions.
3.2. Example 2: True distribution is Normal. This example illustrates a possible case where F ≠ F_{θ_0} and yet α(F) = α(F_{θ_0}).

Suppose the true underlying distribution F is normal with mean μ and variance μ². The contemplated parametric model is still negative exponential indexed by the scale parameter θ. Our purpose is to estimate α(F) = Var_F[X] = μ². So we have the same PB and NB as in Example 1.

3.2.1. Theoretical asymptotic results. As before, we perform the steps necessary for deriving the asymptotic properties for this example. The influence function ψ_G(·) corresponding to α(·) is the same as in the previous example. Here the nearest θ describing F is easily found to be θ_0 = μ^{−1}, giving α(F_{θ_0}) = α(F) = μ². A little algebra shows that
Fig. 6. Example 1. MSEs of PB, NB, T_{n,ε_n} and T_{n,ε̂_n} over a₁ ∈ [0, 2], for n = 10 and n = 200.
γ_F = E[V(V − U)]/E[(V − U)²] = 2/3.

Then we have ε_n → 2/3 and

n MSE_F(T_{n,ε_n}) → (4/3)μ⁴,

whereas

n MSE_F(NB) → 2μ⁴ and n MSE_F(PB) → 4μ⁴.

This shows that the optimal hybrid estimator T_{n,ε_n} actually outperforms both PB and NB asymptotically in terms of MSE. As for the asymptotics of our empirical estimate T_{n,ε̂_n}, we have

ε̂_n →d (Z + 2/3)/(Z + 1)

with Z ~ χ²₁, and
Fig. 7. Example 2. Comparison of the asymptotic distributions of √n[T_{n,ε̂_n} − √2], √n[T_{n,ε_n} − √2], √n[NB − √2] and √n[PB − √2] under N(2^{1/4}, 2^{1/2}).
Fig. 8. Example 2. Estimated MSEs of PB, NB and the hybrid estimates, for n = 100.
and

ε̂_n →d (Z + γ_F)/(Z + 1)

with Z ~ χ²₁.
(ii) If α(F) ≠ α(F_{θ_0}), then both ε_n and ε̂_n again tend to 1, as in Theorem 3 (ii).

The asymptotic distributions of T_{n,ε_n} and T_{n,ε̂_n} follow in an obvious way. The only possible trouble occurs in deducing the large-sample behaviour of their MSEs, where the property of uniform integrability is made use of. However, we note that both ε_n and ε̂_n are still bounded asymptotically with probability one. Only when γ_F ∉ [0, 1] and α(F) = α(F_{θ_0}) would ε_n and ε̂_n converge to some limit beyond the range [0, 1]. The danger of an unbounded optimal ε is thus minimal.
4.2. Dependence of functionals on sample size. The asymptotic theory developed in Section 2.4 is valid under the assumption that α(F) is independent of n. What happens if the parameter of interest is of the form α_n(F)? This is in fact a rather common situation. Take as an example the standard deviation of the variance-stabilized sample correlation coefficient of a bivariate sample,

z_n = tanh^{−1}{ Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / [Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)²]^{1/2} }, (12)

so that the functional of interest, α_n(F), is the standard deviation of z_n under F, which depends on n.
over ε in some predetermined interval. This process would typically call for a double bootstrap procedure, in which double bootstrap resamples are drawn from εF*_n + (1 − ε)F*_{θ̂_n} at some mesh of ε values and the empirical optimal ε is chosen from this finite set of values, possibly by means of interpolation. The need for double bootstrap resampling from a variety of distributions of the form εF*_n + (1 − ε)F*_{θ̂_n} makes the estimate IB practically less attractive than T_{n,ε̂_n}, which requires double bootstrap resampling from F*_n and F*_{θ̂_n} only, as has been pointed out in Section 2.5.
4.4. Criterion of optimality. Apart from the mean squared error, other criteria may sometimes be necessary, especially when the second moment of the bootstrap estimate fails to exist. Any measure of estimation error is suitable as a criterion provided it is well defined, non-parametric, easy to calculate and sensitive to the difference between NB and PB. It is hoped that the resulting hybrid estimate is rather stable regardless of the choice of criterion employed in its construction.
I should like to thank my research supervisor Dr Alastair Young for many helpful
suggestions and discussions on the ideas and results presented in this paper, and Dr
Pat Altham for her advice on many practical areas. This work was carried out whilst
in receipt of a research scholarship from the Croucher Foundation.
REFERENCES
[1] P. J. BICKEL and D. A. FREEDMAN. Some asymptotic theory for the bootstrap. Ann. Statist. 9 (1981), 1196–1217.
[2] B. EFRON. Bootstrap methods: another look at the jackknife. Ann. Statist. 7 (1979), 1–26.
[3] B. EFRON. Jackknife-after-bootstrap standard errors and influence functions (with discussion). J. Roy. Statist. Soc. Ser. B 54 (1992), 83–127.
[4] L. T. FERNHOLZ. von Mises Calculus for Statistical Functionals (Springer-Verlag, 1983).
[5] N. L. HJORT. Contribution to the discussion of David Hinkley's lectures on bootstrapping techniques. Written version presented at the Nordic Conference in Mathematical Statistics. Scand. J. Statist., to appear.
[6] M. G. KENDALL and A. STUART. The Advanced Theory of Statistics, vol. 1, 4th edn. (Griffin, 1977).
[7] C. LÉGER and J. P. ROMANO. Bootstrap choice of tuning parameters. Ann. Inst. Statist. Math. 42 (1990), 709–735.
[8] R. J. SERFLING. Approximation Theorems of Mathematical Statistics (Wiley, 1980).
[9] B. W. SILVERMAN and G. A. YOUNG. The bootstrap: to smooth or not to smooth? Biometrika 74 (1987), 469–479.