Statistica Neerlandica (1999) Vol. 53, nr. 3, pp. 287–308
On piecewise linear density estimators
J. Beirlant*
Department of Mathematics, Katholieke Universiteit Leuven,
Celestijnenlaan 200B, B-300 Leuven, Belgium
A. Berlinet
Universite Montpellier II, Place Bataillon, 34095 Montpellier Cedex,
France
L. Györfi
Department of Mathematics and Computer Science, Technical
University of Budapest, 1521 Stoczek u. 2, Budapest, Hungary
We study piecewise linear density estimators from the $L_1$ point of view:
the frequency polygons investigated by SCOTT (1985a) and JONES et al.
(1998), and a new piecewise linear histogram. In contrast to the earlier
proposals, a unique multivariate generalization of the new piecewise
linear histogram is available. All these estimators are shown to be
universally $L_1$ strongly consistent. We derive large deviation inequalities.
For twice differentiable densities with compact support their expected
$L_1$ error is shown to have the same rate of convergence as that of kernel
density estimators. Some simulated examples are presented.
Key Words and Phrases: nonparametric density estimation, histogram,
asymptotics.
1 Introduction
We consider the problem of consistently estimating an unknown probability density
function in $L_1$ from a sample of independent and identically distributed (i.i.d.)
random variables by means of histogram-based estimators. The histogram is attractive
from a computational point of view, but for twice differentiable densities it has a
worse rate of convergence than the kernel density estimator. The frequency polygon
(SCOTT, 1985a), the edge frequency polygon (JONES et al., 1998) and the piecewise
linear histogram introduced below combine the two advantages: they are computationally
efficient (quickly evaluated and updated), and therefore well suited to online
high-speed signal processing, and they have a good rate of convergence for
smooth densities.
* e-mail:
[email protected]
© VVS, 1999. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

In order to estimate a univariate density $f$ we are given $X_1, \ldots, X_n$, a sequence of
i.i.d. random variables with common density $f$. Let $\mathcal{P} = \{A_{nj}\}$ and $\mathcal{R} = \{A_{nj}^-, A_{nj}^+\}$ be
partitions of the real line by finite intervals such that $A_{nj}, A_{nj}^-, A_{nj}^+$ have respective
lengths $h_{nj}, h_{nj}/2, h_{nj}/2$, and $A_{nj} = A_{nj}^- \cup A_{nj}^+$. Let $\mu_n$ denote the empirical measure
associated with the sample, so that
$$\mu_n(A) = \frac{\#\{i : X_i \in A\}}{n}$$
for any Borel subset $A$ of the real line. Let $\hat g_n$ and $g_n$ stand for the histograms defined
by the partitions $\mathcal{P}$ and $\mathcal{R}$ respectively, and let $x \in \mathbb{R}$. If $A_{nj}$ and $B_{nj}$ denote the sets of
the partitions $\mathcal{P}$ and $\mathcal{R}$ to which $x$ belongs, we have
$$\hat g_n(x) = \frac{\mu_n(A_{nj})}{h_{nj}} \quad\text{and}\quad g_n(x) = \frac{2\mu_n(B_{nj})}{h_{nj}}.$$
By the definition of the histograms,
$$\int_{A_{nj}} \hat g_n = \int_{A_{nj}^-} g_n + \int_{A_{nj}^+} g_n,$$
so that if $a_{nj}, a_{nj}^-, a_{nj}^+$ are the centers of $A_{nj}, A_{nj}^-, A_{nj}^+$ respectively, then the three points
$O_{nj}^- = (a_{nj}^-, g_n(a_{nj}^-))$, $O_{nj} = (a_{nj}, \hat g_n(a_{nj}))$, $O_{nj}^+ = (a_{nj}^+, g_n(a_{nj}^+))$ lie on one line,
and this line defines the piecewise linear histogram $\hat f_{PL}$ on $A_{nj}$ (see Figure 1):
$$\hat f_{PL}(x) = \hat g_n(a_{nj}) + 2\,\frac{x - a_{nj}}{h_{nj}}\,\bigl(g_n(a_{nj}^+) - g_n(a_{nj}^-)\bigr) \tag{1}$$
for $x \in A_{nj}$. Note that
$$\hat g_n(a_{nj}) = \frac{g_n(a_{nj}^+) + g_n(a_{nj}^-)}{2}.$$
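The construction of the piecewise linear histogram can be sketched in a few lines of code. The following is a minimal illustration, assuming equal cell widths $h$ and a user-chosen origin; the function name `piecewise_linear_histogram` and its arguments are our own and do not come from any library.

```python
import math

def piecewise_linear_histogram(sample, origin, h):
    """Return x -> f_PL(x) for equal-width cells [origin + j*h, origin + (j+1)*h)."""
    n = len(sample)
    left, right = {}, {}  # counts in the left / right half of each cell j
    for xi in sample:
        j = math.floor((xi - origin) / h)
        centre = origin + (j + 0.5) * h
        half = right if xi >= centre else left
        half[j] = half.get(j, 0) + 1

    def f_pl(x):
        j = math.floor((x - origin) / h)
        centre = origin + (j + 0.5) * h
        g_minus = 2.0 * left.get(j, 0) / (n * h)   # g_n at the centre of the left half-cell
        g_plus = 2.0 * right.get(j, 0) / (n * h)   # g_n at the centre of the right half-cell
        g_hat = 0.5 * (g_minus + g_plus)           # ordinary histogram height on the cell
        return g_hat + 2.0 * (x - centre) / h * (g_plus - g_minus)

    return f_pl


# Toy data (hypothetical): three points in the left half of [0, 1), one in the right half.
f_pl = piecewise_linear_histogram([0.1, 0.2, 0.3, 0.9], origin=0.0, h=1.0)
```

On this toy sample the estimate passes through the two half-cell histogram heights at the half-cell centres and through the ordinary histogram height at the cell centre, as the three-point collinearity argument requires.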
Fig. 1. The histogram and the piecewise linear histogram on the cell $A_{nj}$.
Fig. 2. The frequency polygon between $a_{nj}$ and $a_{n,j+1}$.
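The frequency polygon $\hat f_P$ is based on the partition $\mathcal{P}$ and is defined by the broken
line joining the neighboring points $O_{nj}$ (see Figure 2). In the case $h_n = h_{nj}$ for all $j$, we
have
$$\hat f_P(x) = \frac12\bigl(\hat g_n(a_{nj}) + \hat g_n(a_{n,j+1})\bigr) + \frac{1}{h_n}\Bigl(x - a_{nj} - \frac{h_n}{2}\Bigr)\bigl(\hat g_n(a_{n,j+1}) - \hat g_n(a_{nj})\bigr) \tag{2}$$
for $x \in A_{nj}^+ \cup A_{n,j+1}^-$. The $L_2$ properties of $\hat f_P$ have been studied by SCOTT (1985a).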
Finally the edge frequency polygon $\hat f_{EP}$, introduced by JONES et al. (1998), connects
values at the bin edges by straight lines, those values being averages of contiguous
histogram ordinates:
$$\hat f_{EP}(x) = 2\,\frac{x - a_{nj}}{h_{nj}}\cdot\frac14\bigl(\hat g_n(a_{n,j+1}) - \hat g_n(a_{n,j-1})\bigr) + \frac14\bigl(\hat g_n(a_{n,j-1}) + 2\hat g_n(a_{nj}) + \hat g_n(a_{n,j+1})\bigr) \tag{3}$$
for $x \in A_{nj}$ (see Figure 3). The $L_2$ properties of $\hat f_{EP}$ have been studied by JONES et al.
(1998).
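For comparison, the two polygon estimators can be evaluated as follows. This is only a sketch under the assumption of a uniform partition with bin width $h$ and a chosen origin; the helper names (`histogram_heights`, `frequency_polygon`, `edge_frequency_polygon`) are hypothetical.

```python
import math

def histogram_heights(sample, origin, h):
    """Return j -> histogram height on cell [origin + j*h, origin + (j+1)*h)."""
    n = len(sample)
    counts = {}
    for xi in sample:
        j = math.floor((xi - origin) / h)
        counts[j] = counts.get(j, 0) + 1
    return lambda j: counts.get(j, 0) / (n * h)

def frequency_polygon(sample, origin, h):
    g = histogram_heights(sample, origin, h)
    def f_p(x):
        # j is the cell whose centre is the nearest bin centre at or to the left of x
        j = math.floor((x - origin) / h - 0.5)
        a_j = origin + (j + 0.5) * h
        return 0.5 * (g(j) + g(j + 1)) + (x - a_j - h / 2) / h * (g(j + 1) - g(j))
    return f_p

def edge_frequency_polygon(sample, origin, h):
    g = histogram_heights(sample, origin, h)
    def f_ep(x):
        j = math.floor((x - origin) / h)
        a_j = origin + (j + 0.5) * h
        slope_part = 2.0 * (x - a_j) / h * 0.25 * (g(j + 1) - g(j - 1))
        return slope_part + 0.25 * (g(j - 1) + 2.0 * g(j) + g(j + 1))
    return f_ep


# Toy data (hypothetical): histogram heights 0.25, 0.5, 0.25 on [0,1), [1,2), [2,3).
sample = [0.5, 1.5, 1.6, 2.5]
f_p = frequency_polygon(sample, 0.0, 1.0)
f_ep = edge_frequency_polygon(sample, 0.0, 1.0)
```

The frequency polygon interpolates the histogram heights at the bin centres, whereas the edge frequency polygon takes the average of the two adjacent heights at each bin edge.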
Frequency polygons have been widely used and their practical behavior is well
known, although theoretical questions about them are still unanswered. As will be
shown in this paper, piecewise linear histograms share with frequency and edge
frequency polygons good rates of convergence. These kinds of piecewise linear
estimates therefore require further theoretical investigation to obtain finite distance
comparisons and automatic methods for selecting the number of cells. Thorough
simulation studies should also be carried out. Here we show a few examples to
illustrate the practical behavior of the piecewise linear histogram compared with
the standard histogram, the frequency polygon and the edge frequency polygon.
Fig. 3. The edge frequency polygon on $A_{nj}$.
Two samples of 200 pseudo i.i.d. random numbers were generated, the first from
the standard exponential distribution, the second from the normal $N(0, 1)$
distribution. In the first case we built the histogram with four equi-spaced cells on
$[0, 5]$ together with the associated estimators. The results are shown in Figures 4, 5
and 6.

Fig. 4. Exponential density, histogram and frequency polygon.
Fig. 5. Exponential density, histogram and edge frequency polygon.
Fig. 6. Exponential density, histogram and piecewise linear histogram.

In the second case the histogram was built with four equi-spaced cells on $[-4, 4]$.
Results are shown in Figures 7, 8 and 9. In these figures the underlying density and the
histogram are drawn in thin lines and the density estimate in thick lines. These two
examples were selected because they exhibit some characteristic features of piecewise
linear histograms: some capability of correcting certain problems of functional
estimates such as edge effects, underestimation of peaks and overestimation of local
minima. They are discontinuous like histograms and can be negative: $\hat f_{PL}$ is a density
if and only if in any cell one has $g_n(a_{nj}^+) \le 3g_n(a_{nj}^-)$ and $g_n(a_{nj}^-) \le 3g_n(a_{nj}^+)$. However,
$\hat f_{PL}$ always has integral 1 because $\int \hat f_{PL} = \int g_n = 1$, whereas the frequency polygons
$\hat f_P$ and $\hat f_{EP}$ are nonnegative and continuous but do not integrate to one in general.
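The nonnegativity condition and the unit integral can be checked directly from the definition of $\hat f_{PL}$. As a short worked step, using only the quantities already defined, evaluate the linear piece at the two endpoints of the cell (where a linear function attains its extremes) and integrate it over the cell:

```latex
\hat f_{PL}\Bigl(a_{nj} \pm \tfrac{h_{nj}}{2}\Bigr)
  = \hat g_n(a_{nj}) \pm \bigl(g_n(a_{nj}^+) - g_n(a_{nj}^-)\bigr)
  = \frac{3\,g_n(a_{nj}^{\pm}) - g_n(a_{nj}^{\mp})}{2},
\qquad
\int_{A_{nj}} \hat f_{PL} = h_{nj}\,\hat g_n(a_{nj}) = \mu_n(A_{nj}).
```

Both endpoint values are nonnegative exactly when neither half-cell ordinate exceeds three times the other, and summing the second identity over $j$ gives total mass 1, since the linear term integrates to zero over each cell.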
Fig. 7. Normal density, histogram and frequency polygon.
Fig. 8. Normal density, histogram and edge frequency polygon.

In order to diminish the sensitivity with respect to the choice of origin, i.e. the left
endpoint of the first cell, an average shifted piecewise linear histogram can be
constructed by averaging $m$ piecewise linear histograms with slightly shifted origins,
following the idea of average shifted histograms proposed in SCOTT (1985b). Figure 10
shows the result of an average shifted piecewise linear histogram for the normal
density of Figures 7–9 with the same $h = 1$, $m = 10$ and origins set at $-4 + i/10$, $0 \le i \le 9$.
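The averaging over shifted origins can be sketched as below. This is an illustrative implementation only, assuming equal cell widths and shifts of $ih/m$ for $i = 0, \ldots, m-1$; the function names are hypothetical and the code recounts the sample at each evaluation for clarity rather than speed.

```python
import math

def pl_histogram_value(sample, origin, h, x):
    """Piecewise linear histogram at x for cells [origin + j*h, origin + (j+1)*h)."""
    n = len(sample)
    j = math.floor((x - origin) / h)
    lo = origin + j * h
    centre = lo + 0.5 * h
    n_minus = sum(1 for xi in sample if lo <= xi < centre)       # left half-cell count
    n_plus = sum(1 for xi in sample if centre <= xi < lo + h)    # right half-cell count
    g_minus = 2.0 * n_minus / (n * h)
    g_plus = 2.0 * n_plus / (n * h)
    return 0.5 * (g_minus + g_plus) + 2.0 * (x - centre) / h * (g_plus - g_minus)

def averaged_shifted_pl(sample, origin, h, m, x):
    """Average of m piecewise linear histograms with origins origin + i*h/m."""
    return sum(pl_histogram_value(sample, origin + i * h / m, h, x)
               for i in range(m)) / m


# Toy data (hypothetical): two observations, two shifted origins.
value = averaged_shifted_pl([0.25, 0.75], origin=0.0, h=1.0, m=2, x=0.5)
```

With $m = 1$ the function reduces to the plain piecewise linear histogram, which gives a simple sanity check.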
Unlike frequency polygons and edge frequency polygons, for which different
multivariate extensions exist (see e.g. Section 4.2 in SCOTT, 1992), the extension of the
piecewise linear histogram is unique and rather straightforward.
Fig. 9. Normal density, histogram and piecewise linear histogram.
Fig. 10. Normal density and average shifted piecewise linear histogram.

Let $\mathcal{P} = \{A_{nj}\}$ be a partition of $\mathbb{R}^d$ by rectangles, i.e.
$$A_{nj} = I_{nj}^1 \times I_{nj}^2 \times \cdots \times I_{nj}^d,$$
where $I_{nj}^k$ ($k = 1, \ldots, d$) are intervals. Decompose the interval $I_{nj}^k$ into the union of
two disjoint intervals $I_{nj}^{k-}$ and $I_{nj}^{k+}$ of equal size and put
$$A_{nj}^{k-} = I_{nj}^1 \times \cdots \times I_{nj}^{k-} \times \cdots \times I_{nj}^d \quad\text{and}\quad A_{nj}^{k+} = I_{nj}^1 \times \cdots \times I_{nj}^{k+} \times \cdots \times I_{nj}^d.$$
Then
$$\mathcal{R}_k = \{A_{nj}^{k-}, A_{nj}^{k+}\}$$
are partitions of $\mathbb{R}^d$. Let $\hat g_n$ and $g_n^k$ be the histograms for the partitions $\mathcal{P}$ and $\mathcal{R}_k$
respectively, based on a random sample of size $n$.

Let $a_{nj}$, $a_{nj}^{k-}$ and $a_{nj}^{k+}$ be the centers of the rectangles $A_{nj}$, $A_{nj}^{k-}$ and $A_{nj}^{k+}$
respectively. Then with $x = (x_1, x_2, \ldots, x_d)$ and $a_{nj} = (c_{nj1}, c_{nj2}, \ldots, c_{njd})$ the
multivariate piecewise linear histogram is defined by
$$\hat f_{PL}(x) = \hat g_n(a_{nj}) + 2\sum_{k=1}^{d} \frac{x_k - c_{njk}}{|I_{nj}^k|}\,\bigl\{g_n^k(a_{nj}^{k+}) - g_n^k(a_{nj}^{k-})\bigr\}, \qquad \text{for } x \in A_{nj},$$
where $|I|$ denotes the length of an interval $I$. In Figure 11 the piecewise linear
histogram estimator of a bivariate density is shown. It is built from 200 simulated
pairs of independent variables with density $N(0, 1)$. The rectangle
$[x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$ was divided into $6 \times 6$ cells of equal size.
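The multivariate definition translates almost literally into code. The sketch below assumes a rectangular grid with a common origin and per-coordinate cell widths; the function name `multivariate_pl_value` and its argument layout are our own choices for illustration.

```python
import math

def multivariate_pl_value(sample, origin, h, x):
    """Multivariate piecewise linear histogram at x.

    sample: list of d-tuples; origin, h, x: d-tuples (grid origin, cell widths, point).
    """
    n, d = len(sample), len(x)
    cell = [math.floor((x[k] - origin[k]) / h[k]) for k in range(d)]
    centre = [origin[k] + (cell[k] + 0.5) * h[k] for k in range(d)]
    in_cell = [p for p in sample
               if all(math.floor((p[k] - origin[k]) / h[k]) == cell[k] for k in range(d))]
    volume = math.prod(h)
    value = len(in_cell) / (n * volume)        # histogram height on the rectangle
    for k in range(d):
        n_plus = sum(1 for p in in_cell if p[k] >= centre[k])
        n_minus = len(in_cell) - n_plus
        # each half-rectangle has volume / 2, hence the factor 2
        g_diff = 2.0 * (n_plus - n_minus) / (n * volume)
        value += 2.0 * (x[k] - centre[k]) / h[k] * g_diff
    return value


# Toy data (hypothetical): four points in the unit square.
pts = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.8, 0.9)]
v = multivariate_pl_value(pts, (0.0, 0.0), (1.0, 1.0), (0.75, 0.5))
```

For $d = 1$ the function reduces to the univariate piecewise linear histogram.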
If $f$ and $g$ are probability densities then their $L_1$ distance is defined by
$$\|f - g\| = \int |f(x) - g(x)|\,dx = \int |f - g|.$$
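In simulations the $L_1$ distance usually has to be approximated numerically. A minimal sketch using the midpoint rule is given below; the function name `l1_distance`, the integration interval and the number of steps are assumptions made for illustration.

```python
def l1_distance(f, g, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of |f - g| over [a, b]."""
    width = (b - a) / steps
    return width * sum(abs(f(a + (i + 0.5) * width) - g(a + (i + 0.5) * width))
                       for i in range(steps))


# Two densities on [0, 1]: the uniform density and the triangular density 2x.
uniform = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
triangle = lambda x: 2.0 * x if 0.0 <= x <= 1.0 else 0.0
d = l1_distance(uniform, triangle, 0.0, 1.0)
```

Since both arguments are densities, the distance always lies between 0 and 2; for this pair the integral of $|1 - 2x|$ over $[0, 1]$ equals $1/2$.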
In the remainder of the paper we investigate the $L_1$ properties of these different
piecewise linear density estimators in more detail. We prove their universal strong $L_1$
consistency and give large deviation inequalities in the case of non-uniform partitions.
Furthermore, in the case of uniform partitions we show that, for twice differentiable
densities, they have the same rate of convergence as the kernel estimate, in spite of
the fact that the piecewise linear histogram does not use any information from the
neighboring bins.

Fig. 11. Piecewise linear histogram estimate of a bivariate normal density.

In Section 2 we present consistency, large deviation and rate of convergence results
for the estimators. Section 3 contains the proofs of these results.
2 Theoretical results
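We first state the consistency of the piecewise linear histogram and the frequency polygon.

THEOREM 1 If for any finite interval $S$ centered at the origin
$$\lim_{n\to\infty} \max_{\{j : A_{nj} \cap S \ne \emptyset\}} h_{nj} = 0 \tag{4}$$
and
$$\lim_{n\to\infty} \frac{\#\{j : A_{nj} \cap S \ne \emptyset\}}{n} = 0 \tag{5}$$
then
$$\lim_{n\to\infty} \|f - \hat f_{PL}\| = 0 \quad \text{a.s.}$$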
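THEOREM 2 For any positive integer $n$, let
$$h_n = \inf_j h_{nj} \quad\text{and}\quad H_n = \sup_j h_{nj}.$$
Suppose that
$$\forall n > 0, \qquad H_n / h_n \le C < \infty \tag{6}$$
and that
$$\lim_{n\to\infty} H_n = 0 \quad\text{and}\quad \lim_{n\to\infty} nh_n = \infty.$$
Then
$$\lim_{n\to\infty} \|f - \hat f_P\| = 0 \quad \text{a.s.}$$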
We note that a consistency theorem for the edge frequency polygon is readily
obtained when $h_n = h_{nj}$ for all $j$, as in this case for all $A_{nj}$
$$\int_{A_{nj}} |\hat g_n - \hat f_{EP}| \le 2\int_{A_{nj}} |\hat g_n - \hat f_P|.$$
As for the regular histogram and the kernel density estimator, it is possible to
prove non-asymptotic large deviation inequalities for the centered $L_1$ error
$$\|\hat f - f\| - E\|\hat f - f\|.$$
THEOREM 3 For any partition $\mathcal{P}$, any density $f$ and any $\varepsilon > 0$,
$$P\Bigl(\bigl|\,\|\hat f_{PL} - f\| - E\|\hat f_{PL} - f\|\,\bigr| > \varepsilon\Bigr) \le 2\exp\bigl(-8n\varepsilon^2/25\bigr)$$
and
$$\mathrm{Var}\,\|\hat f_{PL} - f\| \le \frac{25}{16}\,\frac{1}{n}.$$
Note that Theorem 3 imposes no conditions on the partitions, so it holds even for a
non-consistent piecewise linear histogram. For the regular histogram and the kernel
density estimate similar results were proved in DEVROYE (1991), such as
$$\mathrm{Var}\,\|\hat g_n - f\| \le \frac{1}{n}.$$
The increase of the upper bound on the variance here is due to the fact that the
piecewise linear histogram may take negative values, too.
THEOREM 4 For any partition $\mathcal{P}$ satisfying condition (6), any density $f$ and any $\varepsilon > 0$,
$$P\Bigl(\bigl|\,\|\hat f_P - f\| - E\|\hat f_P - f\|\,\bigr| > \varepsilon\Bigr) \le 2\exp\bigl(-2n\varepsilon^2/(C+1)^2\bigr)$$
and
$$\mathrm{Var}\,\|\hat f_P - f\| \le \frac{(C+1)^2}{4n}.$$
For a uniform partition we have $C = 1$. Therefore Theorem 4 gives the same bounds
as for the histogram and the kernel estimates in the case of a uniform partition.

A large deviation theorem for the edge frequency polygon can be derived when
$h_n = h_{nj}$ for all $j$. In this case the fundamental Lemma 2 (see Section 3) holds with
$c_i = 4/n$.
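Since the bounds of Theorems 3 and 4 are non-asymptotic, they can be evaluated directly. The sketch below simply codes the two exponential bounds and inverts the first one for the sample size; the function names are hypothetical, and the bounds are capped at 1 because they are probability bounds.

```python
import math

def pl_deviation_bound(n, eps):
    """Theorem 3 bound for the piecewise linear histogram: 2 exp(-8 n eps^2 / 25)."""
    return min(1.0, 2.0 * math.exp(-8.0 * n * eps ** 2 / 25.0))

def fp_deviation_bound(n, eps, C=1.0):
    """Theorem 4 bound for the frequency polygon: 2 exp(-2 n eps^2 / (C + 1)^2)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2 / (C + 1.0) ** 2))

def sample_size_for(eps, delta):
    """Smallest n for which the Theorem 3 bound is at most delta."""
    return math.ceil(25.0 * math.log(2.0 / delta) / (8.0 * eps ** 2))


n_needed = sample_size_for(eps=0.1, delta=0.05)
```

The sample size returned is the one at which the bound first drops below `delta`; it says nothing about the expected error itself, which is governed by the rate results that follow.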
Given any sequence of density estimators $\{\hat f_n\}$ the rate of convergence of the
expected $L_1$ error $E\|f - \hat f_n\|$ can be arbitrarily slow (DEVROYE, 1983). Therefore in
order to get a rate of convergence for the expected $L_1$ error we need smoothness and
tail conditions on $f$.

In the sequel we assume that $h_n = h_{nj}$ for all $j$. Then the consistency conditions (4)
and (5) become
$$\lim_{n\to\infty} h_n = 0, \qquad \lim_{n\to\infty} nh_n = \infty. \tag{7}$$
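The expectation of the $L_1$ error of the kernel density estimator $\hat f_{K,n}$ with kernel $K$ was
already given in DEVROYE and GYÖRFI (1985, Section 5.1) in the case of twice
differentiable densities $f$:
$$E\|\hat f_{K,n} - f\| = \int \frac{\alpha\sqrt{f}}{\sqrt{nh_n}}\,\psi\!\left(\frac{\sqrt{nh_n^5}\,\beta|f''|}{2\alpha\sqrt{f}}\right) + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
$$\le \sqrt{\frac{2}{\pi}}\,\frac{\alpha\int\sqrt{f}}{\sqrt{nh_n}} + \frac{\beta h_n^2}{2}\int|f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right) \tag{8}$$
where $\psi(a) = E|Z - a|$, $Z$ is a normal $N(0, 1)$ random variable, $\alpha = \sqrt{\int K^2}$ and
$\beta = \int x^2K(x)\,dx$.

The following theorems specify the rate of convergence of the expected $L_1$ error for
the density estimators defined above.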
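THEOREM 5 If the density $f$ has compact support $S$ in $\mathbb{R}$ and is twice continuously
differentiable, then as $n \to \infty$, $h_n \to 0$ and $nh_n \to \infty$,
$$E\|\hat f_{PL} - f\| = \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2+\sqrt5)\right)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \int E\left(\frac{h_n^2}{8}\left|U^2 - \frac13\right||f''| - |Z|\sqrt{1+4U^2}\,\frac{\sqrt{f}}{\sqrt{nh_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
$$\le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2+\sqrt5)\right)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \frac{h_n^2}{18\sqrt3}\int|f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$
uniformly distributed on $[-1, 1]$.

If $h_n = cn^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\, n^{2/5}\,E\|\hat f_{PL} - f\| \le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2+\sqrt5)\right)\frac{\int\sqrt{f}}{\sqrt{c}} + \frac{c^2}{18\sqrt3}\int|f''|.$$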
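The two numerical constants in the upper bound of Theorem 5 are expectations over the uniform variable $U$: the variance constant is $E\sqrt{1+4U^2}$ and the bias constant is $E|U^2 - 1/3|/8$. The following sketch checks both closed forms by a midpoint rule; the helper name `mean_over_uniform` and the number of steps are assumptions made for illustration.

```python
import math

def mean_over_uniform(func, steps=200000):
    """Midpoint-rule approximation of E func(U) for U uniform on [-1, 1]."""
    width = 2.0 / steps
    return 0.5 * width * sum(func(-1.0 + (i + 0.5) * width) for i in range(steps))


# Variance constant: E sqrt(1 + 4 U^2) = sqrt(5)/2 + log(2 + sqrt(5))/4
variance_numeric = mean_over_uniform(lambda u: math.sqrt(1.0 + 4.0 * u * u))
variance_closed = math.sqrt(5.0) / 2.0 + 0.25 * math.log(2.0 + math.sqrt(5.0))

# Bias constant: E |U^2 - 1/3| / 8 = 1 / (18 sqrt(3))
bias_numeric = mean_over_uniform(lambda u: abs(u * u - 1.0 / 3.0)) / 8.0
bias_closed = 1.0 / (18.0 * math.sqrt(3.0))
```

The agreement of the numerical and closed-form values is only a consistency check of the constants, not a verification of the asymptotic expansion itself.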
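THEOREM 6 If the density $f$ has compact support $S$ in $\mathbb{R}$ and is twice continuously
differentiable, then as $n \to \infty$, $h_n \to 0$ and $nh_n \to \infty$,
$$E\|\hat f_P - f\| = \frac{1}{2\sqrt\pi}\bigl(\sqrt2 + \log(1+\sqrt2)\bigr)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \int E\left(\frac{h_n^2}{2}\left(\frac13 - \frac{U^2}{4}\right)|f''| - |Z|\sqrt{1+U^2}\,\frac{\sqrt{2f}}{2\sqrt{nh_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
$$\le \frac{1}{2\sqrt\pi}\bigl(\sqrt2 + \log(1+\sqrt2)\bigr)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \frac{h_n^2}{8}\int|f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$
uniformly distributed on $[-1, 1]$.

If $h_n = cn^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\, n^{2/5}\,E\|\hat f_P - f\| \le \frac{1}{2\sqrt\pi}\bigl(\sqrt2 + \log(1+\sqrt2)\bigr)\frac{\int\sqrt{f}}{\sqrt{c}} + \frac{c^2}{8}\int|f''|.$$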
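THEOREM 7 If the density $f$ has compact support $S$ in $\mathbb{R}$ and is twice continuously
differentiable, then as $n \to \infty$, $h_n \to 0$ and $nh_n \to \infty$,
$$E\|\hat f_{EP} - f\| = \frac{1}{8\sqrt\pi}\bigl(4 + 3\log 3\bigr)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \int E\left(\frac{h_n^2}{8}\left(\frac{22}{3} - U^2\right)|f''| - |Z|\sqrt{\frac38 + \frac18U^2}\,\frac{\sqrt{f}}{\sqrt{nh_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
$$\le \frac{1}{8\sqrt\pi}\bigl(4 + 3\log 3\bigr)\frac{\int\sqrt{f}}{\sqrt{nh_n}} + \frac{7h_n^2}{8}\int|f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$
uniformly distributed on $[-1, 1]$.

If $h_n = cn^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\, n^{2/5}\,E\|\hat f_{EP} - f\| \le \frac{1}{8\sqrt\pi}\bigl(4 + 3\log 3\bigr)\frac{\int\sqrt{f}}{\sqrt{c}} + \frac{7c^2}{8}\int|f''|.$$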
The given upper bounds for the expected $L_1$ error result from the sum of the
expected variation term $E\|\hat f_n - E\hat f_n\|$ (first term) and the bias term
$\|E\hat f_n - f\|$. These upper bounds are minimized by the choice $h_n = c_{opt}\,n^{-1/5}$
with
$$c_{PL,opt} = \left(\frac{9\sqrt6}{4\sqrt\pi}\left(\sqrt5 + \frac12\log(2+\sqrt5)\right)\frac{\int\sqrt{f}}{\int|f''|}\right)^{2/5}$$
in the case of the piecewise linear histogram $\hat f_{PL}$,
$$c_{P,opt} = \left(\frac{1}{\sqrt\pi}\bigl(\sqrt2 + \log(1+\sqrt2)\bigr)\frac{\int\sqrt{f}}{\int|f''|}\right)^{2/5}$$
in the case of the frequency polygon, and
$$c_{EP,opt} = \left(\frac{1}{28\sqrt\pi}\bigl(4 + 3\log 3\bigr)\frac{\int\sqrt{f}}{\int|f''|}\right)^{2/5}$$
in the case of the edge frequency polygon.
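Each optimal constant minimizes a bound of the form $A/\sqrt{c} + Bc^2$, whose minimizer is $c = (A/(4B))^{2/5}$. The sketch below computes it for the piecewise linear histogram and the frequency polygon from the constants of Theorems 5 and 6; the function names are hypothetical, and the two functionals $\int\sqrt{f}$ and $\int|f''|$ are set to 1 purely for illustration.

```python
import math

def optimal_c(variance_const, bias_const, int_sqrt_f, int_abs_f2):
    """Minimiser over c > 0 of A / sqrt(c) + B * c**2."""
    A = variance_const * int_sqrt_f
    B = bias_const * int_abs_f2
    return (A / (4.0 * B)) ** 0.4

def bound(c, variance_const, bias_const, int_sqrt_f, int_abs_f2):
    """Value of the upper bound A / sqrt(c) + B * c**2."""
    return variance_const * int_sqrt_f / math.sqrt(c) + bias_const * int_abs_f2 * c * c


LOG_TERM = math.log(2.0 + math.sqrt(5.0))
# (variance constant, bias constant) pairs taken from Theorems 5 and 6
PL = (math.sqrt(2.0 / math.pi) * (math.sqrt(5.0) / 2.0 + 0.25 * LOG_TERM),
      1.0 / (18.0 * math.sqrt(3.0)))
P = ((math.sqrt(2.0) + math.log(1.0 + math.sqrt(2.0))) / (2.0 * math.sqrt(math.pi)),
     1.0 / 8.0)

# Hypothetical inputs: both functionals of f are set to 1.
c_pl = optimal_c(*PL, 1.0, 1.0)
c_p = optimal_c(*P, 1.0, 1.0)
```

In practice the two functionals of $f$ are unknown and would have to be replaced by reference values or estimates, which is outside the scope of this sketch.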
The ratios of the upper bounds $u_{PL,opt}$, $u_{P,opt}$ and $u_{EP,opt}$ for the $L_1$ error of the
different estimators, evaluated at the optimal $c$-values, are given by $u_{PL,opt}/u_{P,opt} = 1.95$
and $u_{P,opt}/u_{EP,opt} = 1.98$. These factors can be considered as the price one has to pay
for using information from only two histogram cells in the case of the frequency polygon,
respectively from one cell in the case of the piecewise linear histogram, in comparison
with the edge frequency polygon, which uses three cells when estimating the density at
a given position.
The above results can also be used to choose the binwidth $h_n$ appropriately in
practice. For example, from (8) and Theorem 5 it follows that
$$\frac{c_{PL,opt}}{c_{K,opt}} = \left\{\frac{\dfrac{\alpha}{\beta\sqrt{2\pi}}}{\dfrac{9\sqrt6}{4\sqrt\pi}\left(\sqrt5 + \dfrac12\log(2+\sqrt5)\right)}\right\}^{-2/5}$$
where $c_{K,opt}$ denotes the optimal constant when using a normal kernel density
estimator. For practical methods for choosing the optimal bandwidth for such kernel
estimators we refer to DEVROYE (1997).
Concerning the multivariate piecewise linear histogram, the following result shows
its consistency.

THEOREM 8 If for any sphere $S$ centered at the origin
$$\lim_{n\to\infty} \max_{\{j : A_{nj} \cap S \ne \emptyset\}} \mathrm{diam}(A_{nj}) = 0 \tag{9}$$
and
$$\lim_{n\to\infty} \frac{\#\{j : A_{nj} \cap S \ne \emptyset\}}{n} = 0 \tag{10}$$
then
$$\lim_{n\to\infty} \|f - \hat f_{PL}\| = 0 \quad \text{a.s.}$$
Next we specify a sharp upper bound for the rate of convergence of the expected $L_1$
error for the piecewise linear histogram estimator. Again we restrict ourselves to the
case where the bins are equally large: $h_n = |I_{nj}^k|$ for all $j$ and $k$. We denote the second
order partial derivatives of $f$ with respect to dimensions $k_1$ and $k_2$ by $f''_{k_1,k_2}$.

THEOREM 9 If the density $f$ has compact support $S$ in $\mathbb{R}^d$ and is twice continuously
differentiable, then as $n \to \infty$, $h_n \to 0$ and $nh_n^d \to \infty$,
$$E\|\hat f_{PL} - f\| \le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt{1+4d^2}}{2} + \frac{1}{4d}\log\bigl(2d + \sqrt{1+4d^2}\bigr)\right)\frac{\int\sqrt{f}}{\sqrt{nh_n^d}} + \frac{h_n^2}{18\sqrt3}\sum_{k=1}^{d}\int|f''_{k,k}| + \frac{h_n^2}{8}\sum_{k_1=1}^{d}\sum_{k_2 > k_1}\int|f''_{k_1,k_2}| + o\!\left(h_n^2 + \frac{1}{\sqrt{nh_n^d}}\right)$$
3 Proofs
PROOF OF THEOREM 1. We start with a technical lemma, the proof of which is clear
from Figure 1.

LEMMA 1
$$\|\hat f_{PL} - g_n\| = \|\hat g_n - g_n\|/2 = \|\hat g_n - \hat f_{PL}\|/2.$$
Now by ABOU-JAOUDE (1976), under the conditions (4) and (5),
$$\lim_{n\to\infty} \|f - g_n\| = 0 \quad \text{a.s.}$$
and
$$\lim_{n\to\infty} \|f - \hat g_n\| = 0 \quad \text{a.s.}$$
(see also DEVROYE and GYÖRFI, 1985, and BARRON, GYÖRFI and VAN DER MEULEN,
1992). Thus by Lemma 1 and the triangle inequality
$$\|\hat f_{PL} - f\| \le \|\hat f_{PL} - g_n\| + \|g_n - f\| = \|\hat g_n - g_n\|/2 + \|g_n - f\| \le \|\hat g_n - f\|/2 + 3\|g_n - f\|/2 \to 0 \quad \text{a.s.}$$
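PROOF OF THEOREM 2. Consider, as in Figure 2, two contiguous cells $A_{nj}$ and $A_{n,j+1}$
with centers $a_{nj}$ and $a_{n,j+1}$, and write $p_n$ for the frequency polygon $\hat f_P$. Let
$$m_j = \frac{\mu_n(A_{nj} \cup A_{n,j+1})}{h_{nj} + h_{n,j+1}}.$$
Then simple calculations show that
$$\int_{a_{nj}}^{a_{n,j+1}} |p_n - m_j| \le C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - m_j| \le C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - f| + C\int_{a_{nj}}^{a_{n,j+1}} |f - m_j|.$$
One can build two partitions, and therefore two histograms $\tilde g_n$ and $\dot g_n$, by grouping 2 by
2 contiguous cells of the partition $\mathcal{P}$. We have
$$\sum_j \int_{a_{nj}}^{a_{n,j+1}} |f - p_n| \le \sum_j \left(\int_{a_{nj}}^{a_{n,j+1}} |f - m_j| + \int_{a_{nj}}^{a_{n,j+1}} |p_n - m_j|\right)$$
$$\le \sum_j \left((1 + C)\int_{A_{nj} \cup A_{n,j+1}} |f - m_j| + C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - f|\right)$$
$$\le (1 + C)\bigl(\|f - \tilde g_n\| + \|f - \dot g_n\|\bigr) + C\|\hat g_n - f\|.$$
Thus the statement follows from Abou-Jaoude's result on histograms.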
We now turn to the proofs of the large deviation results. To this end we recall a
lemma from MCDIARMID (1989) and DEVROYE (1991).

LEMMA 2 Let $X_1, \ldots, X_n$ be independent random variables and assume that the
measurable function $F : \mathbb{R}^n \to \mathbb{R}$ satisfies
$$\sup_{x_1, \ldots, x_n, x_i'} |F(x_1, \ldots, x_i, \ldots, x_n) - F(x_1, \ldots, x_i', \ldots, x_n)| \le c_i, \qquad 1 \le i \le n.$$
Then for all $\varepsilon > 0$
$$P\bigl(|F(X_1, \ldots, X_n) - EF(X_1, \ldots, X_n)| > \varepsilon\bigr) \le 2\exp\left(-2\varepsilon^2 \Big/ \sum_{i=1}^{n} c_i^2\right)$$
and
$$\mathrm{Var}\,F(X_1, \ldots, X_n) \le \frac14\sum_{i=1}^{n} c_i^2.$$

PROOF OF THEOREM 3. For $x = (x_1, \ldots, x_i, \ldots, x_n) \in \mathbb{R}^n$ let $F(x)$ be equal to the $L_1$
error $\|f - \hat f_{PL}(x; \cdot)\|$, where $\hat f_{PL}(x; \cdot)$ is the piecewise linear histogram estimate built from $x$.
Let $x' = (x_1, \ldots, x_i', \ldots, x_n)$. We have
$$|F(x) - F(x')| \le \|\hat f_{PL}(x; \cdot) - \hat f_{PL}(x'; \cdot)\|.$$
If $x_i$ and $x_i'$ belong to the same cell of $\mathcal{R}$ the above quantities are zero. If $x_i$ belongs to
$A_{nj}^-$ and $x_i'$ belongs to $A_{nk}^+$, or $x_i$ belongs to $A_{nj}^+$ and $x_i'$ belongs to $A_{nk}^-$, it is clear from
Figure 1 that
$$\|\hat f_{PL}(x; \cdot) - \hat f_{PL}(x'; \cdot)\| = \begin{cases} 2/n & \text{if } k = j, \\ 2.5/n & \text{if } k \ne j. \end{cases}$$
Therefore Lemma 2 can be applied with $c_i = 2.5/n$.
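For completeness, the constants of Theorem 3 follow from Lemma 2 by a one-line computation with $c_i = 2.5/n = 5/(2n)$:

```latex
\sum_{i=1}^{n} c_i^2 = n\left(\frac{5}{2n}\right)^2 = \frac{25}{4n},
\qquad
\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2} = \frac{8n\varepsilon^2}{25},
\qquad
\frac14\sum_{i=1}^{n} c_i^2 = \frac{25}{16n}.
```

The second expression is the exponent in the large deviation inequality and the third is the variance bound.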
PROOF OF THEOREM 4. Similar calculations show that Lemma 2 can be applied with
$c_i = (C + 1)/n$.
In the proofs of Theorems 5 and 6 we will use a "Poissonized" version $\tilde g_n$ of the
histogram $g_n$ defined by
$$\tilde g_n(x) = \frac{1}{nh_n}\sum_j P_n(A_{nj})\,I_{A_{nj}}(x)$$
where $P_n$ denotes a Poisson process on $\mathbb{R}$ which can be constructed from the empirical
measure $n\mu_n(A) = \sum_{i=1}^{n} I_A(X_i)$ by replacing the sample size $n$ by a Poisson random
variable $N_n$ with mean $n$, defined on the same probability space as the sequence $X_i$
($i = 1, 2, \ldots$) and independent of this sequence; that is
$$P_n(A) \stackrel{d}{=} N_n\,\mu_{N_n}(A) = \sum_{i=1}^{N_n} I_A(X_i) \quad \text{jointly in } A \in \mathcal{B}(\mathbb{R}).$$
We give only the proof of Theorem 5; those of Theorems 6 and 7 follow the same
method.
PROOF OF THEOREM 5. The proof follows the method of proof set out in BEIRLANT and
GYÖRFI (1998) analyzing the rate of convergence of the multidimensional histogram
estimator.

Since $f$ is of compact support we can assume without loss of generality that an
integer $l_n$ exists such that for $j > l_n$, $\mu(A_{nj}) = 0$, where $\mu(A) = \int_A f$. From this a
Poissonized version $\tilde f_{PL}$ of the piecewise linear histogram can be defined by replacing $g_n$
by $\tilde g_n$. From Lemma 2.3 in BEIRLANT and MASON (1995) it follows that it suffices to
show that the corresponding result holds for $\|\tilde f_{PL} - f\|$. Now
$$E\|\tilde f_{PL} - f\| = \frac1h\sum_{j=1}^{l_n}\sqrt{\frac{\mu(A_{nj})}{n}}\;E\int_{A_{nj}}\Biggl|\frac{P_n(A_{nj}) - n\mu(A_{nj})}{\sqrt{n\mu(A_{nj})}} + 2\,\frac{x - a_{nj}}{h/2}\Biggl(m_j^+\,\frac{P_n(A_{nj}^+) - n\mu(A_{nj}^+)}{\sqrt{n\mu(A_{nj}^+)}} - m_j^-\,\frac{P_n(A_{nj}^-) - n\mu(A_{nj}^-)}{\sqrt{n\mu(A_{nj}^-)}}\Biggr) + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\Biggr|\,dx \tag{11}$$
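with
$$m_j^+ = \sqrt{\frac{\mu(A_{nj}^+)}{\mu(A_{nj})}}, \qquad m_j^- = \sqrt{\frac{\mu(A_{nj}^-)}{\mu(A_{nj})}},$$
and
$$B_{j,n}(x) = \mu(A_{nj}) - hf(x) + 2\,\frac{x - a_{nj}}{h/2}\bigl(\mu(A_{nj}^+) - \mu(A_{nj}^-)\bigr).$$
Lemma 2.2 in BEIRLANT and GYÖRFI (1998) allows one to replace
$$\frac{P_n(A_{nj}^+) - n\mu(A_{nj}^+)}{\sqrt{n\mu(A_{nj}^+)}} \quad\text{and}\quad \frac{P_n(A_{nj}^-) - n\mu(A_{nj}^-)}{\sqrt{n\mu(A_{nj}^-)}}$$
in the above expression by independent standard normal random variables $Z_j^+, Z_j^-$
within the required accuracy $o(h^2 + 1/\sqrt{nh})$. Hence
$$E\|\tilde f_{PL} - f\| = \frac1h\sum_{j=1}^{l_n}\sqrt{\frac{\mu(A_{nj})}{n}}\;E\int_{A_{nj}}\Biggl|m_j^+\Bigl(1 + 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^+ + m_j^-\Bigl(1 - 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^- + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\Biggr|\,dx + o\bigl(h^2 + 1/\sqrt{nh}\bigr) \tag{12}$$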
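Remark now that for any $j$,
$$m_j^+\Bigl(1 + 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^+ + m_j^-\Bigl(1 - 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^-$$
is a normal random variable with mean zero and variance equal to
$$D_{j,n}^2(x) = 1 + 16\,\frac{(x - a_{nj})^2}{h^2} + 8\,\frac{x - a_{nj}}{h}\bigl((m_j^+)^2 - (m_j^-)^2\bigr).$$
Now $|a| = a + 2a^-$ and $(b - a)^- + (b + a)^- = (b - |a|)^-$ for any real $a$ and any
positive $b$, where $y^- = \max(-y, 0)$, and $E(\mathrm{sgn}(Z_j)) = 0$, so that with the use of the
independence of the sign and the absolute value of a zero centered normal random
variable we obtain, with $Z_j$ a standard normal variable,
$$E\int_{A_{nj}}\Biggl|m_j^+\Bigl(1 + 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^+ + m_j^-\Bigl(1 - 2\,\frac{x - a_{nj}}{h/2}\Bigr)Z_j^- + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\Biggr|\,dx$$
$$= E\int_{A_{nj}}\Biggl|Z_jD_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\Biggr|\,dx$$
$$= E\int_{A_{nj}}\Biggl|\,|Z_j|D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j)B_{j,n}(x)\Biggr|\,dx$$
$$= E\int_{A_{nj}}\Biggl(|Z_j|D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j)B_{j,n}(x)\Biggr)dx + 2E\int_{A_{nj}}\Biggl(|Z_j|D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j)B_{j,n}(x)\Biggr)^{-}dx$$
$$= \sqrt{\frac{2}{\pi}}\int_{A_{nj}}D_{j,n}(x)\,dx + E\int_{A_{nj}}\Biggl(|Z_j|D_{j,n}(x) - \sqrt{\frac{n}{\mu(A_{nj})}}\,|B_{j,n}(x)|\Biggr)^{-}dx.$$
In the following lemmas we evaluate $B_{j,n}$ and $D_{j,n}$ in more detail.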
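LEMMA 3 Under (7)
$$B_{j,n}(x) = \frac{h}{2}\left(\frac{h^2}{12} - (x - a_{nj})^2\right)f''(a_{nj}) + o(h^3).$$

PROOF. A Taylor expansion of $f$ around $a_{nj}$ leads to
$$\int_{A_{nj}^+} f = \int_{a_{nj}}^{a_{nj}+h/2}\left(f(a_{nj}) + (u - a_{nj})f'(a_{nj}) + \frac12(u - a_{nj})^2f''(a_{nj}) + o(h^2)\right)du = \frac{h}{2}f(a_{nj}) + \frac{h^2}{8}f'(a_{nj}) + \frac{h^3}{48}f''(a_{nj}) + o(h^3).$$
Similarly
$$\int_{A_{nj}^-} f = \frac{h}{2}f(a_{nj}) - \frac{h^2}{8}f'(a_{nj}) + \frac{h^3}{48}f''(a_{nj}) + o(h^3)$$
and
$$\int_{A_{nj}} f - hf(x) = h\bigl(f(a_{nj}) - f(x)\bigr) + \frac{h^3}{24}f''(a_{nj}) + o(h^3) = -h(x - a_{nj})f'(a_{nj}) - \frac{h}{2}(x - a_{nj})^2f''(a_{nj}) + \frac{h^3}{24}f''(a_{nj}) + o(h^3).$$
The result now follows by combination of the above approximations.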
LEMMA 4 Under (7)
$$\frac1h\int_{A_{nj}}D_{j,n}(x)\,dx = \frac{\sqrt5}{2} + \frac14\log(2+\sqrt5) + o(h).$$

PROOF. As in the proof of the preceding lemma one derives that
$$(m_j^+)^2 - (m_j^-)^2 = \frac{h}{4}\,\frac{f'(a_{nj})}{f(a_{nj})} + o(h).$$
Using the substitution $(x - a_{nj})/(h/2) = u$ we find then that
$$\frac1h\int_{A_{nj}}D_{j,n}(x)\,dx = \frac12\int_{-1}^{1}\sqrt{1 + 4u^2 + h(1 + o(1))\,u\,f'(a_{nj})/f(a_{nj})}\;du,$$
from which the result follows.
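Now
$$E\|\tilde f_{PL} - f\| = \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2+\sqrt5)\right)\sum_{j=1}^{l_n}\sqrt{\frac{\mu(A_{nj})}{n}} + \frac1h\sum_{j=1}^{l_n}E\int_{A_{nj}}\left(|B_{j,n}(x)| - |Z_j|D_{j,n}(x)\sqrt{\frac{\mu(A_{nj})}{n}}\right)^{+}dx + o\bigl(h^2 + 1/\sqrt{nh}\bigr).$$
From Lemmas 3 and 4, Lemmas 5.16 and 5.17 in DEVROYE and GYÖRFI (1985),
applying the substitution $(x - a_{nj})/(h/2) = u$ to the integrals over $A_{nj}$ we now obtain
the result. The inequality in the statement of the theorem follows from the inequality
$(a - b)^+ \le a^+$ for $b \ge 0$.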
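PROOF OF THEOREM 8. By the triangle inequality
$$\int_{A_{nj}}|\hat f_{PL}(x) - E\hat f_{PL}(x)|\,dx \le |\mu_n(A_{nj}) - \mu(A_{nj})| + 2\sum_{k=1}^{d}\int_{A_{nj}}\frac{|x_k - c_{njk}|}{|I_{nj}^k|}\,dx\;\bigl|g_n^k(a_{nj}^{k+}) - Eg_n^k(a_{nj}^{k+}) - g_n^k(a_{nj}^{k-}) + Eg_n^k(a_{nj}^{k-})\bigr|$$
$$= \int_{A_{nj}}|\hat g_n(x) - E\hat g_n(x)|\,dx + 2\sum_{k=1}^{d}\frac{|I_{nj}^k|}{4}\,\frac{\prod_{l=1}^{d}|I_{nj}^l|}{|I_{nj}^k|}\;\bigl|g_n^k(a_{nj}^{k+}) - Eg_n^k(a_{nj}^{k+}) - g_n^k(a_{nj}^{k-}) + Eg_n^k(a_{nj}^{k-})\bigr|$$
$$\le \int_{A_{nj}}|\hat g_n(x) - E\hat g_n(x)|\,dx + \frac12\sum_{k=1}^{d}\prod_{l=1}^{d}|I_{nj}^l|\,\Bigl(\bigl|g_n^k(a_{nj}^{k+}) - Eg_n^k(a_{nj}^{k+})\bigr| + \bigl|g_n^k(a_{nj}^{k-}) - Eg_n^k(a_{nj}^{k-})\bigr|\Bigr)$$
$$= \int_{A_{nj}}|\hat g_n(x) - E\hat g_n(x)|\,dx + \sum_{k=1}^{d}\Bigl(|\mu_n(A_{nj}^{k+}) - \mu(A_{nj}^{k+})| + |\mu_n(A_{nj}^{k-}) - \mu(A_{nj}^{k-})|\Bigr)$$
$$= \int_{A_{nj}}|\hat g_n(x) - E\hat g_n(x)|\,dx + \sum_{k=1}^{d}\int_{A_{nj}}|g_n^k(x) - Eg_n^k(x)|\,dx.$$
Hence the variation term satisfies
$$\lim_{n\to\infty}\|\hat f_{PL} - E\hat f_{PL}\| = 0 \quad \text{a.s.}$$
Concerning the bias term we find that
$$\int_{A_{nj}}|f(x) - E\hat f_{PL}(x)|\,dx \le \int_{A_{nj}}|f(x) - E\hat g_n(a_{nj})|\,dx + 2\sum_{k=1}^{d}\int_{A_{nj}}\frac{|x_k - c_{njk}|}{|I_{nj}^k|}\,dx\;\bigl|Eg_n^k(a_{nj}^{k+}) - Eg_n^k(a_{nj}^{k-})\bigr|$$
$$= \int_{A_{nj}}|f(x) - E\hat g_n(x)|\,dx + \sum_{k=1}^{d}|\mu(A_{nj}^{k+}) - \mu(A_{nj}^{k-})|.$$
Hence
$$\int|f(x) - E\hat f_{PL}(x)|\,dx \le \int|f(x) - E\hat g_n(x)|\,dx + \sum_{k=1}^{d}\sum_j|\mu(A_{nj}^{k+}) - \mu(A_{nj}^{k-})|.$$
The first term is the bias of $\hat g_n$, which tends to 0. The second term also goes to 0. This
can be easily verified for uniformly continuous $f$ with compact support, and then
extended to arbitrary $f$.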
PROOF OF THEOREM 9. The proof follows the same lines as the proof of Theorem 5,
except for a few details which we discuss here. At the step corresponding to (11) the
integrand is replaced by
$$E\Biggl|\frac{P_n(A_{nj}) - n\mu(A_{nj})}{\sqrt{n\mu(A_{nj})}} + 2\sum_{k=1}^{d}\frac{x_k - c_{njk}}{h/2}\Biggl[m_{jk}^+\,\frac{P_n(A_{nj}^{k+}) - n\mu(A_{nj}^{k+})}{\sqrt{n\mu(A_{nj}^{k+})}} - m_{jk}^-\,\frac{P_n(A_{nj}^{k-}) - n\mu(A_{nj}^{k-})}{\sqrt{n\mu(A_{nj}^{k-})}}\Biggr] + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\Biggr| \tag{13}$$
with
$$m_{jk}^+ = \sqrt{\frac{\mu(A_{nj}^{k+})}{\mu(A_{nj})}}, \qquad m_{jk}^- = \sqrt{\frac{\mu(A_{nj}^{k-})}{\mu(A_{nj})}}$$
and
$$B_{j,n}(x) = \mu(A_{nj}) - h^df(x) + 2\sum_{k=1}^{d}\frac{x_k - c_{njk}}{h/2}\bigl(\mu(A_{nj}^{k+}) - \mu(A_{nj}^{k-})\bigr).$$
The first two terms in (13) can now be approximated by a Gaussian random variable
$(1/d)\sum_{k=1}^{d}Z_{jk}^*$ where
$$Z_{jk}^* = m_{jk}^+Z_{jk}^+\Bigl(1 + 2d\,\frac{x_k - c_{njk}}{h/2}\Bigr) + m_{jk}^-Z_{jk}^-\Bigl(1 - 2d\,\frac{x_k - c_{njk}}{h/2}\Bigr)$$
and for each $j, k$, $(Z_{jk}^+, Z_{jk}^-)$ is a couple of independent standard normal random
variables.

In the step corresponding to (12) the equality is then replaced by an inequality
based on
$$E\Biggl|\frac1d\sum_{k=1}^{d}Z_{jk}^*\Biggr| \le \frac1d\sum_{k=1}^{d}E|Z_{jk}^*|.$$
The remaining steps are analogous to the computations followed in the proof of
Theorem 5.
References
ABOU-JAOUDE, S. (1976), Conditions nécessaires et suffisantes de convergence L1 en probabilité
de l'histogramme pour une densité, Annales de l'Institut Henri Poincaré 12, 213–231.
BARRON, A. R., L. GYÖRFI and E. C. VAN DER MEULEN (1992), Distribution estimates consistent
in total variation and in two types of information divergence, IEEE Trans. on Information
Theory 38, 1437–1454.
BEIRLANT, J. and L. GYÖRFI (1998), On the L1-error in histogram density estimation: the
multivariate case, Nonparametric Statistics 9, 197–216.
BEIRLANT, J. and D. M. MASON (1995), On the asymptotic normality of Lp norms of empirical
functionals, Mathematical Methods of Statistics 4, 1–15.
DEVROYE, L. (1983), On arbitrary slow rates of global convergence in density estimation,
Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 62, 475–483.
DEVROYE, L. (1991), Exponential inequalities in nonparametric estimation, in: G. ROUSSAS
(ed.), Nonparametric functional estimation and related topics, p. 31–44, NATO ASI Series,
Kluwer Academic Publishers, Dordrecht.
DEVROYE, L. (1997), Universal smoothing factor selection in density estimation: theory and
practice, (with discussion), Test 6, 223–320.
DEVROYE, L. and L. GYÖRFI (1985), Nonparametric density estimation: the L1 View, Wiley, New
York.
JONES, M. C., M. SAMIUDDIN, A. H. AL-HARBEY and T. A. H. MAATOUK (1998), The edge
frequency polygon, Biometrika 85, 235–239.
MCDIARMID, C. (1989), On the method of bounded differences, in: Surveys in combinatorics
1989, p. 148–188. Cambridge University Press, Cambridge.
SCOTT, D. W. (1985a), Frequency polygons: theory and application, Journal of the American
Statistical Association 80, 348–354.
SCOTT, D. W. (1985b), Average shifted histograms: effective nonparametric density estimators in
several dimensions, Annals of Statistics 13, 1024–1040.
SCOTT, D. W. (1992), Multivariate density estimation: theory, practice and visualization, Wiley,
New York.
Received: October 1996. Revised: January 1998.