
On piecewise linear density estimators

1999, Statistica Neerlandica

We study piecewise linear density estimators from the $L_1$ point of view: the frequency polygons investigated by SCOTT (1985) and JONES et al. (1997), and a new piecewise linear histogram. In contrast to the earlier proposals, a unique multivariate generalization of the new piecewise linear histogram is available. All these estimators are shown to be universally $L_1$ strongly consistent. We derive large deviation inequalities. For twice differentiable densities with compact support their expected $L_1$ error is shown to have the same rate of convergence as have kernel density estimators. Some simulated examples are presented.

Statistica Neerlandica (1999) Vol. 53, nr. 3, pp. 287–308

J. Beirlant*
Department of Mathematics, Katholieke Universiteit Leuven, Celestijnenlaan 200B, B-300 Leuven, Belgium

A. Berlinet
Université Montpellier II, Place Bataillon, 34095 Montpellier Cedex, France

L. Györfi
Department of Mathematics and Computer Science, Technical University of Budapest, 1521 Stoczek u. 2, Budapest, Hungary

Key Words and Phrases: nonparametric density estimation, histogram, asymptotics.

1 Introduction

We consider the problem of estimating consistently an unknown probability density function in $L_1$ from a sample of independent and identically distributed (i.i.d.) random variables by means of histogram-based estimators. The histogram is attractive from a computational point of view, but for twice differentiable densities it has a worse rate of convergence than the kernel density estimator. The frequency polygon (SCOTT, 1985a), the edge frequency polygon (JONES et al., 1998) and the piecewise linear histogram introduced next combine the two advantages: they are computationally efficient (quickly evaluated and updated) and therefore well adapted to online high data speed signal processing. They also have a good rate of convergence for smooth densities.

In order to estimate a univariate density $f$ we are given $X_1, \ldots$
, $X_n$, a sequence of i.i.d. random variables with common density $f$.

* e-mail: [email protected]
© VVS, 1999. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

Let $\mathcal P = \{A_{nj}\}$ and $\mathcal R = \{A^-_{nj}, A^+_{nj}\}$ be partitions of the real line by finite intervals such that $A_{nj}$, $A^-_{nj}$, $A^+_{nj}$ have respective lengths $h_{nj}$, $h_{nj}/2$, $h_{nj}/2$, and $A_{nj} = A^-_{nj} \cup A^+_{nj}$. Let $\mu_n$ denote the empirical measure associated with the sample, so that
$$\mu_n(A) = \frac{\#\{i : X_i \in A\}}{n}$$
for any Borel subset $A$ of the real line. Let $\hat g_n$ and $g_n$ stand for the histograms defined by the partitions $\mathcal P$ and $\mathcal R$ respectively, and let $x \in \mathbb R$. If $A_{nj}$ and $B_{nj}$ denote the sets of the partitions $\mathcal P$ and $\mathcal R$ to which $x$ belongs we have
$$\hat g_n(x) = \frac{\mu_n(A_{nj})}{h_{nj}} \quad\text{and}\quad g_n(x) = \frac{2\mu_n(B_{nj})}{h_{nj}}.$$
By the definition of the histograms
$$\int_{A_{nj}} \hat g_n = \int_{A^-_{nj}} g_n + \int_{A^+_{nj}} g_n$$
so that if $a_{nj}$, $a^-_{nj}$, $a^+_{nj}$ are the centers of $A_{nj}$, $A^-_{nj}$, $A^+_{nj}$ respectively, then the three points $O^-_{nj} = (a^-_{nj}, g_n(a^-_{nj}))$, $O_{nj} = (a_{nj}, \hat g_n(a_{nj}))$, $O^+_{nj} = (a^+_{nj}, g_n(a^+_{nj}))$ are contained in one line, and this line defines the piecewise linear histogram $\hat f_{PL}$ on $A_{nj}$ (see Figure 1):
$$\hat f_{PL}(x) = \hat g_n(a_{nj}) + \frac{2(x - a_{nj})}{h_{nj}}\left(g_n(a^+_{nj}) - g_n(a^-_{nj})\right) \tag{1}$$
for $x \in A_{nj}$. Remark that
$$\hat g_n(a_{nj}) = \frac{g_n(a^+_{nj}) + g_n(a^-_{nj})}{2}.$$

Fig. 1. The histogram and the piecewise linear histogram on the cell $A_{nj}$.

Fig. 2. The frequency polygon between $a_{nj}$ and $a_{n,j+1}$.

The frequency polygon $\hat f_P$ is based on the partition $\mathcal P$ and is defined by the broken line joining the neighboring points $(O_{nj})$ (see Figure 2). In the case $h_n = h_{nj}$ for all $j$, we have
$$\hat f_P(x) = \frac12\left(\hat g_n(a_{nj}) + \hat g_n(a_{n,j+1})\right) + \frac{1}{h_n}\left(x - a_{nj} - \frac{h_n}{2}\right)\left(\hat g_n(a_{n,j+1}) - \hat g_n(a_{nj})\right) \tag{2}$$
for $x \in A^+_{nj} \cup A^-_{n,j+1}$. The $L_2$ properties of $\hat f_P$ have been studied by SCOTT (1985a).

Finally the edge frequency polygon $\hat f_{EP}$, introduced by JONES et al.
(1998), connects values at the bin edges by straight lines, those values being averages of contiguous histogram ordinates:
$$\hat f_{EP}(x) = \frac{2(x - a_{nj})}{h_{nj}}\cdot\frac14\left(\hat g_n(a_{n,j+1}) - \hat g_n(a_{n,j-1})\right) + \frac14\left(\hat g_n(a_{n,j-1}) + 2\hat g_n(a_{nj}) + \hat g_n(a_{n,j+1})\right) \tag{3}$$
for $x \in A_{nj}$ (see Figure 3). The $L_2$ properties of $\hat f_{EP}$ have been studied by JONES et al. (1998).

Fig. 3. The edge frequency polygon on $A_{nj}$.

Frequency polygons have been widely used and their practical behavior is well known, although theoretical questions about them are still unanswered. As will be shown in this paper the piecewise linear histograms share with frequency and edge frequency polygons nice rates of convergence. So these kinds of piecewise linear estimates require further theoretical investigations to get finite distance comparisons and automatic methods to select the number of cells. Thorough simulation studies should be carried out. Here we show a few examples to illustrate the practical behavior of the estimators. This will give some idea of the behavior of the piecewise linear histogram compared with the standard histogram, the frequency polygon and the edge frequency polygon.

Two samples of 200 pseudo i.i.d. random numbers were generated. The first one with the standard exponential distribution, the second with the normal $N(0, 1)$ distribution. In the first case we built the histogram with four equi-spaced cells on $[0, 5]$ together with the associated estimators. The results are shown in Figures 4, 5 and 6. In the second case the histogram was built with four equi-spaced cells on $[-4, 4]$. Results are shown in Figures 7, 8 and 9. In these figures the underlying density and the histogram are in thin lines and the density estimate in thick lines.

Fig. 4. Exponential density, histogram and frequency polygon.

Fig. 5. Exponential density, histogram and edge frequency polygon.

Fig. 6. Exponential density, histogram and piecewise linear histogram.

These two examples were selected because they exhibit some characteristic features of piecewise linear histograms: some capability of correcting some problems of functional estimates such as edge effects, underestimation of peaks and overestimation of local minima. They are discontinuous like histograms and can be negative. $\hat f_{PL}$ is a density if and only if in any cell one has $g_n(a^+_{nj}) \le 3 g_n(a^-_{nj})$ and $g_n(a^-_{nj}) \le 3 g_n(a^+_{nj})$. However, $\hat f_{PL}$ has integral 1 because of $\int \hat f_{PL} = \int \hat g_n = 1$, whereas the frequency polygons $\hat f_P$ and $\hat f_{EP}$ are nonnegative and continuous but do not integrate to one in general.

In order to diminish the sensitivity with respect to the choice of origin or left endpoint of the first cell an average shifted piecewise linear histogram can be constructed averaging $m$ linear histograms with slightly shifted origins, following the idea of average shifted histograms as proposed in SCOTT (1985b). In Figure 10 we propose the result of an average shifted piecewise histogram for the normal density of Figures 7–9 with the same $h = 1$, $m = 10$ with origins set at $-4 - i/10$, $0 \le i \le 9$.

Fig. 7. Normal density, histogram and frequency polygon.

Fig. 8. Normal density, histogram and edge frequency polygon.

Fig. 9. Normal density, histogram and piecewise linear histogram.

Fig. 10. Normal density and average shifted piecewise linear histogram.

Unlike frequency polygons and edge frequency polygons for which different multivariate extensions exist (see e.g. section 4.2 in SCOTT, 1992), the extension of the piecewise linear histogram is unique and rather straightforward. Let $\mathcal P = \{A_{nj}\}$ be the partition of $\mathbb R^d$ by rectangles, i.e.
$$A_{nj} = I^{(1)}_{nj} \times I^{(2)}_{nj} \times \cdots \times I^{(d)}_{nj}$$
where $I^{(k)}_{nj}$ $(k = 1, \ldots, d)$ are intervals.
Decompose the interval $I^{(k)}_{nj}$ into the union of two disjoint intervals $I^{(k)-}_{nj}$ and $I^{(k)+}_{nj}$ of equal size and put
$$A^{(k)-}_{nj} = I^{(1)}_{nj} \times \cdots \times I^{(k)-}_{nj} \times \cdots \times I^{(d)}_{nj}$$
and
$$A^{(k)+}_{nj} = I^{(1)}_{nj} \times \cdots \times I^{(k)+}_{nj} \times \cdots \times I^{(d)}_{nj}.$$
Then $\mathcal R_k = \{A^{(k)-}_{nj}, A^{(k)+}_{nj}\}$ are partitions of $\mathbb R^d$. Let $\hat g_n$ and $g^{(k)}_n$ be histograms for the partitions $\mathcal P$ and $\mathcal R_k$ respectively based on a pure random sample of size $n$. Let $a_{nj}$, $a^{(k)-}_{nj}$ and $a^{(k)+}_{nj}$ be the centers of the rectangles $A_{nj}$, $A^{(k)-}_{nj}$ and $A^{(k)+}_{nj}$ respectively, then with $x = (x_1, x_2, \ldots, x_d)$ and $a_{nj} = (c_{nj1}, c_{nj2}, \ldots, c_{njd})$ the multivariate piecewise linear histogram is defined by
$$\hat f_{PL}(x) = \hat g_n(a_{nj}) + 2\sum_{k=1}^{d} \frac{x_k - c_{njk}}{|I^{(k)}_{nj}|}\left\{g^{(k)}_n(a^{(k)+}_{nj}) - g^{(k)}_n(a^{(k)-}_{nj})\right\}, \quad\text{for } x \in A_{nj},$$
where $|I|$ denotes the length of an interval $I$.

In Figure 11 the piecewise linear histogram estimator of a bivariate density is shown. It is built from 200 simulated values of couples of independent variables with density $N(0, 1)$. The rectangle $(x_{\min}, x_{\max}) \times (y_{\min}, y_{\max})$ was divided into $6 \times 6$ cells with the same size.

Fig. 11. Piecewise linear histogram estimate of a bivariate normal density.

If $f$ and $g$ are probability densities then their $L_1$ distance is defined by
$$\|f - g\| = \int |f(x) - g(x)|\,dx = \int |f - g|.$$
In the remainder of the paper we investigate the $L_1$ properties of these different piecewise linear density estimators in more detail. We prove their universal strong $L_1$ consistency and give large deviation inequalities in case of non-uniform partitions. Furthermore in case of uniform partitions we show that, for twice differentiable densities, they have the same rate of convergence as the kernel estimate; this in spite of the fact that the piecewise linear histogram does not use any information from the neighboring bins.

In section 2 we present consistency, large deviation and rate of convergence results for the estimators. Section 3 contains the proofs of these results.
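The univariate definitions (1), (2) and (3) and the $L_1$ distance can be sketched in a few lines of Python. This is only a minimal illustration for a uniform partition, not the authors' implementation: the function names, the `origin` argument (left endpoint of cell 0), the bin width `h = 1` in the demo and the midpoint-rule integrator are all assumptions introduced here.

```python
import math
import random

def cell_index(x, origin, h):
    """Index j of the cell [origin + j*h, origin + (j+1)*h) containing x."""
    return math.floor((x - origin) / h)

def hist(sample, origin, h, j):
    """Histogram height mu_n(A_j) / h on cell j of the partition with bin width h."""
    count = sum(1 for xi in sample if cell_index(xi, origin, h) == j)
    return count / (len(sample) * h)

def f_pl(x, sample, origin, h):
    """Piecewise linear histogram, equation (1)."""
    j = cell_index(x, origin, h)
    a = origin + (j + 0.5) * h                        # cell center a_nj
    g_minus = hist(sample, origin, h / 2, 2 * j)      # g_n on the left half cell
    g_plus = hist(sample, origin, h / 2, 2 * j + 1)   # g_n on the right half cell
    g_hat = 0.5 * (g_minus + g_plus)                  # histogram height on A_nj
    return g_hat + 2.0 * (x - a) / h * (g_plus - g_minus)

def f_p(x, sample, origin, h):
    """Frequency polygon, equation (2): x lies between the centers a_j and a_{j+1}."""
    j = math.floor((x - origin) / h - 0.5)
    a = origin + (j + 0.5) * h
    g0, g1 = hist(sample, origin, h, j), hist(sample, origin, h, j + 1)
    return 0.5 * (g0 + g1) + ((x - a) / h - 0.5) * (g1 - g0)

def f_ep(x, sample, origin, h):
    """Edge frequency polygon, equation (3)."""
    j = cell_index(x, origin, h)
    a = origin + (j + 0.5) * h
    gl, gc, gr = (hist(sample, origin, h, j + k) for k in (-1, 0, 1))
    return 2.0 * (x - a) / h * 0.25 * (gr - gl) + 0.25 * (gl + 2.0 * gc + gr)

def l1_distance(f, g, lo, hi, m=400):
    """Midpoint-rule approximation of the L1 distance restricted to [lo, hi]."""
    step = (hi - lo) / m
    pts = (lo + (k + 0.5) * step for k in range(m))
    return sum(abs(f(t) - g(t)) for t in pts) * step

if __name__ == "__main__":
    random.seed(1)
    sample = [random.expovariate(1.0) for _ in range(200)]  # standard exponential
    true_density = lambda t: math.exp(-t)
    for name, est in (("PL", f_pl), ("P", f_p), ("EP", f_ep)):
        err = l1_distance(true_density, lambda t: est(t, sample, 0.0, 1.0), 0.0, 5.0)
        print(name, round(err, 3))
```

The sketch recomputes bin counts on every call, which keeps the code close to the formulas; a practical version would tabulate the half-cell counts once, since all three estimators depend on the sample only through those counts.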
2 Theoretical results

We first state the consistency of the piecewise linear histogram and the polygon.

THEOREM 1 If for any finite interval $S$ centered at the origin
$$\lim_{n\to\infty}\ \max_{\{j:\,A_{nj}\cap S \ne \emptyset\}} h_{nj} = 0 \tag{4}$$
and
$$\lim_{n\to\infty} \frac{\#\{j : A_{nj}\cap S \ne \emptyset\}}{n} = 0 \tag{5}$$
then
$$\lim_{n\to\infty} \|f - \hat f_{PL}\| = 0 \quad\text{a.s.}$$

THEOREM 2 For any positive integer $n$, let
$$h_n = \inf_j h_{nj} \quad\text{and}\quad H_n = \sup_j h_{nj}.$$
Suppose that
$$\forall n > 0,\quad H_n / h_n \le C < \infty \tag{6}$$
and that
$$\lim_{n\to\infty} H_n = 0 \quad\text{and}\quad \lim_{n\to\infty} n h_n = \infty.$$
Then
$$\lim_{n\to\infty} \|f - \hat f_P\| = 0 \quad\text{a.s.}$$

We note that a consistency theorem for the edge frequency polygon is readily obtained when $h_n = h_{nj}$ for all $j$, as in this case for all $A_{nj}$
$$\int_{A_{nj}} |\hat g_n - \hat f_{EP}| \le 2\int_{A_{nj}} |\hat g_n - \hat f_P|.$$

Similarly to the regular histogram and to the kernel density estimator it is possible to prove non-asymptotic large deviation inequalities for the centered $L_1$ error:
$$\|\hat f - f\| - E\|\hat f - f\|.$$

THEOREM 3 For any partition $\mathcal P$, any density $f$ and any $\varepsilon > 0$,
$$P\left(\left|\,\|\hat f_{PL} - f\| - E\|\hat f_{PL} - f\|\,\right| > \varepsilon\right) \le 2\exp\left(-8n\varepsilon^2/25\right)$$
and
$$\mathrm{Var}\left(\|\hat f_{PL} - f\|\right) \le \frac{25}{16}\,\frac{1}{n}.$$

Note that Theorem 3 has no conditions on the partitions, so it holds even for nonconsistent locally linear histogram. For regular histogram and kernel density estimate similar results were proved in DEVROYE (1991) such that
$$\mathrm{Var}\left(\|\hat g_n - f\|\right) \le \frac{1}{n}.$$
The increase of the upper bound on the variance here is due to the fact that the piecewise linear histogram may take negative values, too.

THEOREM 4 For any partition $\mathcal P$ satisfying condition (6), any density $f$ and any $\varepsilon > 0$,
$$P\left(\left|\,\|\hat f_P - f\| - E\|\hat f_P - f\|\,\right| > \varepsilon\right) \le 2\exp\left(-2n\varepsilon^2/(C+1)^2\right)$$
and
$$\mathrm{Var}\left(\|\hat f_P - f\|\right) \le \frac{(C+1)^2}{4n}.$$

For a uniform partition we have $C = 1$. Therefore Theorem 4 gives the same bounds as for the histogram and the kernel estimates in case of a uniform partition. A large deviation theorem for the edge frequency polygon can be derived when $h_n = h_{nj}$ for all $j$. In this case the fundamental Lemma 2 (see section 3) holds with $c_i = 4/n$.
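The bounds of Theorems 3 and 4 are explicit, so they can be evaluated directly. The following sketch tabulates them and inverts the tail bound for a sample size; the function names are hypothetical, and `sample_size_for` relies only on the generic form $2\exp(-2n\varepsilon^2/c^2)$ that results from bounded differences $c_i = c/n$ (with $c = 2.5$ for the piecewise linear histogram and $c = C + 1$ for the frequency polygon).

```python
import math

def tail_bound_pl(n, eps):
    """Theorem 3: bound on P(| ||f_PL - f|| - E||f_PL - f|| | > eps)."""
    return min(1.0, 2.0 * math.exp(-8.0 * n * eps ** 2 / 25.0))

def tail_bound_fp(n, eps, C=1.0):
    """Theorem 4 for a partition with H_n / h_n <= C (C = 1 for uniform bins)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2 / (C + 1.0) ** 2))

def variance_bound_pl(n):
    """Theorem 3: Var ||f_PL - f|| <= 25 / (16 n)."""
    return 25.0 / (16.0 * n)

def variance_bound_fp(n, C=1.0):
    """Theorem 4: Var ||f_P - f|| <= (C + 1)^2 / (4 n)."""
    return (C + 1.0) ** 2 / (4.0 * n)

def sample_size_for(eps, delta, c):
    """Smallest n with 2 * exp(-2 n eps^2 / c^2) <= delta, where c_i = c / n."""
    return math.ceil(c ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, tail_bound_pl(n, 0.1), tail_bound_fp(n, 0.1))
    print(sample_size_for(0.1, 0.05, 2.5), sample_size_for(0.1, 0.05, 2.0))
```

Because these inequalities hold for every density and every $n$, the sample sizes they suggest are conservative; they control the deviation of the $L_1$ error from its mean, not the size of the mean itself.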
Given any sequence of density estimators $\{\hat f_n\}$ the rate of convergence of the expected $L_1$ error $E\|f - \hat f_n\|$ can be arbitrarily slow (DEVROYE, 1983). Therefore in order to get a rate of convergence for the expected $L_1$ error we need smoothness and tail conditions on $f$.

In the sequel we assume that $h_n = h_{nj}$ for all $j$. Then the consistency conditions (4) and (5) are as follows:
$$\lim_{n\to\infty} h_n = 0, \qquad \lim_{n\to\infty} n h_n = \infty. \tag{7}$$

The expectation of the $L_1$-error of the kernel density estimator $\hat f_{K,n}$ with kernel $K$ was already given in DEVROYE and GYÖRFI (1985, Section 5.1) in case of twice differentiable densities $f$:
$$E\left(\|\hat f_{K,n} - f\|\right) = \int \frac{\alpha\sqrt f}{\sqrt{n h_n}}\;\psi\!\left(\frac{\sqrt{n h_n^5}\,\beta\,|f''|}{2\alpha\sqrt f}\right) + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
$$\le \alpha\sqrt{\frac{2}{\pi}}\;\frac{\int\sqrt f}{\sqrt{n h_n}} + \frac{\beta h_n^2}{2}\int |f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right) \tag{8}$$
where $\psi(a) = E|Z - a|$, $Z$ is a normal $N(0, 1)$ random variable and $\alpha = \sqrt{\int K^2}$, $\beta = \int x^2 K(x)\,dx$. The following theorems specify the rate of convergence of the expected $L_1$-error for the density estimators defined above.

THEOREM 5 If the density $f$ has a compact support $S$ in $\mathbb R$ and is twice continuously differentiable, then as $n \to \infty$, $h_n \to 0$ and $n h_n \to \infty$
$$E\left(\|\hat f_{PL} - f\|\right) = \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2 + \sqrt5)\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \int E\left(\frac{h_n^2}{8}\left|U^2 - \frac13\right| |f''| - |Z|\sqrt{1 + 4U^2}\,\frac{\sqrt f}{\sqrt{n h_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
$$\le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2 + \sqrt5)\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \frac{h_n^2}{18\sqrt3}\int |f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$ uniformly distributed on $[-1, 1]$.
If $h_n = c\,n^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\ n^{2/5}\,E\left(\|\hat f_{PL} - f\|\right) \le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2 + \sqrt5)\right)\frac{\int\sqrt f}{\sqrt c} + \frac{c^2}{18\sqrt3}\int |f''|.$$

THEOREM 6 If the density $f$ has a compact support $S$ in $\mathbb R$ and is twice continuously differentiable, then as $n \to \infty$, $h_n \to 0$ and $n h_n \to \infty$
$$E\left(\|\hat f_P - f\|\right) = \frac{1}{2\sqrt\pi}\left(\sqrt2 + \log(1 + \sqrt2)\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \int E\left(\frac{h_n^2}{4}\left(\frac23 - \frac{U^2}{2}\right)|f''| - |Z|\sqrt{1 + U^2}\,\frac{\sqrt{2f}}{2\sqrt{n h_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
$$\le \frac{1}{2\sqrt\pi}\left(\sqrt2 + \log(1 + \sqrt2)\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \frac{h_n^2}{8}\int |f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$ uniformly distributed on $[-1, 1]$.
If $h_n = c\,n^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\ n^{2/5}\,E\left(\|\hat f_P - f\|\right) \le \frac{1}{2\sqrt\pi}\left(\sqrt2 + \log(1 + \sqrt2)\right)\frac{\int\sqrt f}{\sqrt c} + \frac{c^2}{8}\int |f''|.$$

THEOREM 7 If the density $f$ has a compact support $S$ in $\mathbb R$ and is twice continuously differentiable, then as $n \to \infty$, $h_n \to 0$ and $n h_n \to \infty$
$$E\left(\|\hat f_{EP} - f\|\right) = \frac{1}{8\sqrt\pi}\left(4 + 3\log 3\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \int E\left(\frac{h_n^2}{8}\left(\frac{22}{3} - U^2\right)|f''| - |Z|\sqrt{\frac38 + \frac18 U^2}\,\frac{\sqrt f}{\sqrt{n h_n}}\right)^{+} + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
$$\le \frac{1}{8\sqrt\pi}\left(4 + 3\log 3\right)\frac{\int\sqrt f}{\sqrt{n h_n}} + \frac{7 h_n^2}{8}\int |f''| + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n}}\right)$$
where $Z$ and $U$ are independent random variables, $Z$ normal $N(0, 1)$ distributed and $U$ uniformly distributed in $[-1, 1]$.
If $h_n = c\,n^{-1/5}$ for some $c > 0$ then
$$\limsup_{n\to\infty}\ n^{2/5}\,E\left(\|\hat f_{EP} - f\|\right) \le \frac{1}{8\sqrt\pi}\left(4 + 3\log 3\right)\frac{\int\sqrt f}{\sqrt c} + \frac{7c^2}{8}\int |f''|.$$

The given upper bounds for the approximation of the expected value of the $L_1$-error result from the sum of the expected variation term $E\|\hat f_n - E\hat f_n\|$ (first term), and the bias term $\|E\hat f_n - f\|$. These upper bounds are minimized by the choice $h_n = c_{opt}\,n^{-1/5}$ with
$$c_{PL,opt} = \left(\frac94\sqrt{\frac{6}{\pi}}\left(\sqrt5 + \frac12\log(2 + \sqrt5)\right)\frac{\int\sqrt f}{\int |f''|}\right)^{2/5}$$
in case of the piecewise linear histogram $\hat f_{PL}$,
$$c_{P,opt} = \left(\frac{1}{\sqrt\pi}\left(\sqrt2 + \log(1 + \sqrt2)\right)\frac{\int\sqrt f}{\int |f''|}\right)^{2/5}$$
in case of the frequency polygon, and
$$c_{EP,opt} = \left(\frac{1}{28\sqrt\pi}\left(4 + 3\log 3\right)\frac{\int\sqrt f}{\int |f''|}\right)^{2/5}$$
in case of the edge frequency polygon.
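Each of the three upper bounds has the form $A\int\sqrt f/\sqrt c + B\,c^2\int|f''|$, whose minimizer over $c > 0$ satisfies $c^{5/2} = A\int\sqrt f / (4B\int|f''|)$. The sketch below encodes the constants $A$ and $B$ read off Theorems 5 to 7 and evaluates the optimal $c$ numerically. The names are hypothetical, and the triweight density used in the demo is an assumption chosen only because it is compactly supported and twice continuously differentiable; it does not appear in the paper.

```python
import math

S5 = math.sqrt(5.0)
# (A, B): for h_n = c * n**(-1/5) the bound reads  A * I1 / sqrt(c) + B * c**2 * I2,
# with I1 = integral of sqrt(f) and I2 = integral of |f''|.
CONSTANTS = {
    "PL": (math.sqrt(2 / math.pi) * (S5 / 2 + math.log(2 + S5) / 4),
           1 / (18 * math.sqrt(3))),
    "P": ((math.sqrt(2) + math.log(1 + math.sqrt(2))) / (2 * math.sqrt(math.pi)),
          1 / 8),
    "EP": ((4 + 3 * math.log(3)) / (8 * math.sqrt(math.pi)), 7 / 8),
}

def bound(kind, c, i1, i2):
    """Upper bound on limsup n^(2/5) E||f_hat - f|| for h_n = c * n^(-1/5)."""
    a, b = CONSTANTS[kind]
    return a * i1 / math.sqrt(c) + b * c ** 2 * i2

def c_opt(kind, i1, i2):
    """Minimizer of bound(kind, c, i1, i2) over c > 0."""
    a, b = CONSTANTS[kind]
    return (a * i1 / (4 * b * i2)) ** 0.4

def integrate(fn, lo, hi, m=20000):
    """Midpoint rule on [lo, hi]."""
    step = (hi - lo) / m
    return sum(fn(lo + (k + 0.5) * step) for k in range(m)) * step

# Illustrative density (an assumption, not from the paper): triweight on [-1, 1].
def triweight(x):
    return 35 / 32 * (1 - x * x) ** 3

def triweight_second_derivative(x):
    return 105 / 16 * (1 - x * x) * (5 * x * x - 1)

I1 = integrate(lambda x: math.sqrt(triweight(x)), -1.0, 1.0)
I2 = integrate(lambda x: abs(triweight_second_derivative(x)), -1.0, 1.0)

if __name__ == "__main__":
    for kind in CONSTANTS:
        c = c_opt(kind, I1, I2)
        print(kind, "c_opt =", round(c, 3), "bound =", round(bound(kind, c, I1, I2), 3))
```

For a density with unknown $\int\sqrt f$ and $\int|f''|$ these two functionals would have to be estimated or replaced by reference values, which is outside the scope of this sketch.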
The ratios of the upper bounds $u_{PL,opt}$, $u_{P,opt}$ and $u_{EP,opt}$ for the $L_1$ error of the different estimators evaluated at the optimal $c$-values are given by $u_{PL,opt}/u_{P,opt} \approx 1.95$ and $u_{P,opt}/u_{EP,opt} \approx 1.98$. These factors can be considered as the price one has to pay for using information from two histogram cells in case of the frequency polygon, respectively from one cell in case of the piecewise linear histogram, in comparison with the edge frequency polygon which uses three cells when estimating the density at a given position.

The above results can also be used to choose the binwidth $h_n$ appropriately in practice. For example, from (8) and Theorem 5 it follows that
$$\frac{c_{PL,opt}}{c_{K,opt}} = \left\{\frac{\dfrac{\alpha}{\beta\sqrt{2\pi}}}{\dfrac94\sqrt{\dfrac{6}{\pi}}\left(\sqrt5 + \dfrac12\log(2 + \sqrt5)\right)}\right\}^{-2/5}$$
where $c_{K,opt}$ denotes the optimal constant when using a normal kernel density estimator. For practical methods for choosing optimal bandwidth for such kernel estimators we refer to DEVROYE (1997).

Concerning the multivariate piecewise linear histogram the following result shows the consistency of the multivariate piecewise linear density estimator.

THEOREM 8 If for any sphere $S$ centered at the origin
$$\lim_{n\to\infty}\ \max_{\{j:\,A_{nj}\cap S \ne \emptyset\}} \mathrm{diam}(A_{nj}) = 0 \tag{9}$$
and
$$\lim_{n\to\infty} \frac{\#\{j : A_{nj}\cap S \ne \emptyset\}}{n} = 0 \tag{10}$$
then
$$\lim_{n\to\infty} \|f - \hat f_{PL}\| = 0 \quad\text{a.s.}$$

Next we specify a sharp upper bound for the rate of convergence of the expected $L_1$-error for the piecewise linear histogram estimator. Also we restrict ourselves to the case where the bins are equally large: $h_n = |I^{(k)}_{nj}|$ for all $j$ and $k$. We denote the second order partial derivatives of $f$ with respect to dimensions $k_1$ and $k_2$ by $f''_{k_1,k_2}$.

THEOREM 9 If the density $f$ has a compact support $S$ in $\mathbb R^d$ and is twice continuously differentiable, then as $n \to \infty$, $h_n \to 0$ and $n h_n^d \to \infty$
$$E\left(\|\hat f_{PL} - f\|\right) \le \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt{1 + 4d^2}}{2} + \frac{1}{4d}\log\left(2d + \sqrt{1 + 4d^2}\right)\right)\frac{\int\sqrt f}{\sqrt{n h_n^d}} + \frac{h_n^2}{18\sqrt3}\sum_{k=1}^{d}\int |f''_{k,k}| + \frac{h_n^2}{8}\sum_{k_1}\sum_{k_2 > k_1}\int |f''_{k_1,k_2}| + o\!\left(h_n^2 + \frac{1}{\sqrt{n h_n^d}}\right).$$

3 Proofs

PROOF OF THEOREM 1. We start with a technical lemma, the proof of which is clear from Figure 1.

LEMMA 1
$$\|\hat f_{PL} - g_n\| = \|\hat g_n - g_n\|/2 = \|\hat g_n - \hat f_{PL}\|/2.$$

Now by ABOU-JAOUDE (1976) under the conditions (4) and (5)
$$\lim_{n\to\infty} \|f - g_n\| = 0 \quad\text{a.s.}$$
and
$$\lim_{n\to\infty} \|f - \hat g_n\| = 0 \quad\text{a.s.}$$
(see also DEVROYE and GYÖRFI, 1985, and BARRON, GYÖRFI and VAN DER MEULEN, 1992). Thus by Lemma 1 and the triangle inequality
$$\|\hat f_{PL} - f\| \le \|\hat f_{PL} - g_n\| + \|g_n - f\| = \|\hat g_n - g_n\|/2 + \|g_n - f\| \le \|\hat g_n - f\|/2 + 3\|g_n - f\|/2 \to 0 \quad\text{a.s.}$$

PROOF OF THEOREM 2. Consider, as in Figure 1, two contiguous cells $A_{nj}$ and $A_{n,j+1}$ with centers $a_{nj}$ and $a_{n,j+1}$. Let
$$\mu_j = \frac{\mu(A_{nj}\cup A_{n,j+1})}{h_{nj} + h_{n,j+1}}.$$
Then simple calculations show that, with $p_n$ standing for the frequency polygon $\hat f_P$,
$$\int_{a_{nj}}^{a_{n,j+1}} |p_n - \mu_j| \le C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - \mu_j| \le C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - f| + C\int_{a_{nj}}^{a_{n,j+1}} |f - \mu_j|.$$
One can build two partitions and therefore two histograms $\tilde g_n$ and $\dot g_n$ by grouping 2 by 2 contiguous cells of the partition $\mathcal P$. We have
$$\sum_j \int_{a_{nj}}^{a_{n,j+1}} |f - p_n| \le \sum_j\left(\int_{a_{nj}}^{a_{n,j+1}} |f - \mu_j| + \int_{a_{nj}}^{a_{n,j+1}} |p_n - \mu_j|\right)$$
$$\le \sum_j\left((1 + C)\int_{A_{nj}\cup A_{n,j+1}} |f - \mu_j| + C\int_{a_{nj}}^{a_{n,j+1}} |\hat g_n - f|\right)$$
$$\le (1 + C)\left(\|f - \tilde g_n\| + \|f - \dot g_n\|\right) + C\|\hat g_n - f\|.$$
Thus the statement follows from Abou-Jaoude's result on histograms.

We now turn to the proofs of the large deviation results. To this end we recall a lemma from MCDIARMID (1989) and DEVROYE (1991).

LEMMA 2 Let $X_1, \ldots, X_n$ be independent random variables and assume that the measurable function $F : \mathbb R^n \to \mathbb R$ satisfies
$$\sup_{x_1,\ldots,x_n,\,x_i'} |F(x_1, \ldots, x_i, \ldots, x_n) - F(x_1, \ldots, x_i', \ldots, x_n)| \le c_i, \quad 1 \le i \le n.$$
Then for all $\varepsilon > 0$
$$P\left(|F(X_1, \ldots, X_n) - EF(X_1, \ldots, X_n)| > \varepsilon\right) \le 2\exp\left(-2\varepsilon^2 \Big/ \sum_{i=1}^{n} c_i^2\right)$$
and
$$\mathrm{Var}\left(F(X_1, \ldots, X_n)\right) \le \frac14\sum_{i=1}^{n} c_i^2.$$

PROOF OF THEOREM 3. For $x = (x_1, \ldots, x_i, \ldots, x_n) \in \mathbb R^n$ let $F(x)$ be equal to the $L_1$ error $\|f - \hat f_{PL}(x; \cdot)\|$ where $\hat f_{PL}(x; \cdot)$ is the local linear histogram estimate built from $x$. Let $x' = (x_1, \ldots, x_i', \ldots, x_n)$. We have
$$|F(x) - F(x')| \le \|\hat f_{PL}(x; \cdot) - \hat f_{PL}(x'; \cdot)\|.$$
If $x_i$ and $x_i'$ belong to the same cell of $\mathcal R$ the above quantities are zero. If $x_i$ belongs to $A^-_{nj}$ and $x_i'$ belongs to $A^+_{nk}$, or $x_i$ belongs to $A^+_{nj}$ and $x_i'$ belongs to $A^-_{nk}$, it is clear from Figure 1 that
$$\|\hat f_{PL}(x; \cdot) - \hat f_{PL}(x'; \cdot)\| = \begin{cases} 2/n & \text{if } k = j \\ 2.5/n & \text{if } k \ne j. \end{cases}$$
Therefore Lemma 2 can be applied with $c_i = 2.5/n$.

PROOF OF THEOREM 4. Similar calculations show that Lemma 2 can be applied with $c_i = (C + 1)/n$.

In the proofs of Theorems 5 and 6 we will use a "Poissonized" version $\tilde g_n$ of the histogram $g_n$ defined by
$$\tilde g_n(x) = \frac{1}{n h_n}\sum_j P_n(A_{nj})\,I_{A_{nj}}(x)$$
where $P_n$ denotes a Poisson process on $\mathbb R$ which can be constructed from the empirical measure $n\mu_n(A) = \sum_{i=1}^{n} I_A(X_i)$ by replacing the sample size $n$ by a Poisson random variable $N_n$ with mean $n$, defined on the same probability space as the sequence $X_i$ $(i = 1, 2, \ldots)$ and independent of this sequence; that is
$$P_n(A) \stackrel{d}{=} N_n\,\mu_{N_n}(A) = \sum_{i=1}^{N_n} I_A(X_i) \quad\text{jointly in } A \in \mathcal B(\mathbb R).$$
We give only the proof of Theorem 5; those of Theorems 6 and 7 follow the same methods.

PROOF OF THEOREM 5. The proof follows the method of proof set out in BEIRLANT and GYÖRFI (1998) analyzing the rate of convergence of the multidimensional histogram estimator.

Since $f$ is of compact support we can assume without loss of generality that an integer $l_n$ exists such that for $j > l_n$, $\mu(A_{nj}) = 0$ where $\mu(A) = \int_A f$. From this a Poissonized version $\tilde f_{PL}$ of the piecewise linear histogram can be defined replacing $g_n$ by $\tilde g_n$. From Lemma 2.3 in BEIRLANT and MASON (1995) it follows that it suffices to show that the corresponding result holds for $\|\tilde f_{PL} - f\|$.
Now
$$E\|\tilde f_{PL} - f\| = \sum_{j=1}^{l_n} \sqrt{\frac{\mu(A_{nj})}{n}}\;\frac1h\,E\int_{A_{nj}} \left|\frac{P_n(A_{nj}) - n\mu(A_{nj})}{\sqrt{n\mu(A_{nj})}} + 2\,\frac{x - a_{nj}}{h/2}\left(\mu_j^+\,\frac{P_n(A^+_{nj}) - n\mu(A^+_{nj})}{\sqrt{n\mu(A^+_{nj})}} - \mu_j^-\,\frac{P_n(A^-_{nj}) - n\mu(A^-_{nj})}{\sqrt{n\mu(A^-_{nj})}}\right) + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\right| dx \tag{11}$$
with
$$\mu_j^+ = \sqrt{\frac{\mu(A^+_{nj})}{\mu(A_{nj})}}, \qquad \mu_j^- = \sqrt{\frac{\mu(A^-_{nj})}{\mu(A_{nj})}},$$
and
$$B_{j,n}(x) = \left(\mu(A_{nj}) - h f(x)\right) + 2\,\frac{x - a_{nj}}{h/2}\left(\mu(A^+_{nj}) - \mu(A^-_{nj})\right).$$
Lemma 2.2 in BEIRLANT and GYÖRFI (1998) allows one to replace
$$\frac{P_n(A^+_{nj}) - n\mu(A^+_{nj})}{\sqrt{n\mu(A^+_{nj})}} \quad\text{and}\quad \frac{P_n(A^-_{nj}) - n\mu(A^-_{nj})}{\sqrt{n\mu(A^-_{nj})}}$$
in the above expression by independent standard normal random variables $Z_j^+$, $Z_j^-$ within the required accuracy $o(h^2 + 1/\sqrt{nh})$. Hence
$$E\|\tilde f_{PL} - f\| = \frac1h\sum_{j=1}^{l_n} \sqrt{\frac{\mu(A_{nj})}{n}}\;E\int_{A_{nj}} \left|\mu_j^+\left(1 + 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^+ + \mu_j^-\left(1 - 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^- + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\right| dx + o(h^2 + 1/\sqrt{nh}).$$
Remark now that for any $j$,
$$\mu_j^+\left(1 + 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^+ + \mu_j^-\left(1 - 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^-$$
is a normal random variable with mean zero and variance equal to
$$D_{j,n}^2(x) = 1 + 16\,\frac{(x - a_{nj})^2}{h^2} + 8\,\frac{x - a_{nj}}{h}\left((\mu_j^+)^2 - (\mu_j^-)^2\right).$$
Now $|a| = a + 2(-a)^+$, $(a - b)^+ + (-a - b)^+ = (|a| - b)^+$ for any real $a$ and any positive $b$, and $E(\mathrm{sgn}(Z_j)) = 0$, so that with the use of the independence of the sign and the absolute value of a zero centered normal random variable we obtain, with $Z_j$ a standard normal variable,
$$E\int_{A_{nj}} \left|\mu_j^+\left(1 + 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^+ + \mu_j^-\left(1 - 2\,\frac{x - a_{nj}}{h/2}\right)Z_j^- + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\right| dx$$
$$= E\int_{A_{nj}} \left|Z_j D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\right| dx$$
$$= E\int_{A_{nj}} \left|\,|Z_j| D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j) B_{j,n}(x)\right| dx$$
$$= E\int_{A_{nj}} \left(|Z_j| D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j) B_{j,n}(x)\right) dx + 2E\int_{A_{nj}} \left(-|Z_j| D_{j,n}(x) - \sqrt{\frac{n}{\mu(A_{nj})}}\,\mathrm{sgn}(Z_j) B_{j,n}(x)\right)^{+} dx$$
$$= \sqrt{\frac{2}{\pi}}\int_{A_{nj}} D_{j,n}(x)\,dx + E\int_{A_{nj}} \left(-|Z_j| D_{j,n}(x) + \sqrt{\frac{n}{\mu(A_{nj})}}\,|B_{j,n}(x)|\right)^{+} dx.$$
In the following lemmas we evaluate $B_{j,n}$ and $D_{j,n}$ in more detail.
LEMMA 3 Under (7)
$$B_{j,n}(x) = \frac h2\left(\frac{h^2}{12} - (x - a_{nj})^2\right) f''(a_{nj}) + o(h^3). \tag{12}$$

PROOF. A Taylor expansion of $f$ around $a_{nj}$ leads to
$$\int_{A^+_{nj}} f = \int_{a_{nj}}^{a_{nj} + h/2} \left[f(a_{nj}) + (u - a_{nj}) f'(a_{nj}) + \frac12 (u - a_{nj})^2 f''(a_{nj}) + o(h^2)\right] du = \frac h2 f(a_{nj}) + \frac{h^2}{8} f'(a_{nj}) + \frac{h^3}{48} f''(a_{nj}) + o(h^3).$$
Similarly
$$\int_{A^-_{nj}} f = \frac h2 f(a_{nj}) - \frac{h^2}{8} f'(a_{nj}) + \frac{h^3}{48} f''(a_{nj}) + o(h^3)$$
and
$$\int_{A_{nj}} f - h f(x) = h\left(f(a_{nj}) - f(x)\right) + \frac{h^3}{24} f''(a_{nj}) + o(h^3) = -h(x - a_{nj}) f'(a_{nj}) - \frac h2 (x - a_{nj})^2 f''(a_{nj}) + \frac{h^3}{24} f''(a_{nj}) + o(h^3).$$
The result now follows by combination of the above approximations.

LEMMA 4 Under (7)
$$\frac1h\int_{A_{nj}} D_{j,n}(x)\,dx = \frac{\sqrt5}{2} + \frac14\log(2 + \sqrt5) + o(h).$$

PROOF. As in the proof of the preceding lemma one derives that
$$(\mu_j^+)^2 - (\mu_j^-)^2 = \frac h4\,\frac{f'(a_{nj})}{f(a_{nj})} + o(h).$$
Using the substitution $(x - a_{nj})/(h/2) = u$ we find then that
$$\frac1h\int_{A_{nj}} D_{j,n}(x)\,dx = \frac12\int_{-1}^{1} \sqrt{1 + 4u^2 + h(1 + o(1))\,u\,(f'/f)(a_{nj})}\;du$$
from which the result follows.

Now
$$E\|\tilde f_{PL} - f\| = \sqrt{\frac{2}{\pi}}\left(\frac{\sqrt5}{2} + \frac14\log(2 + \sqrt5)\right)\sum_{j=1}^{l_n} \sqrt{\frac{\mu(A_{nj})}{n}} + \frac1h\sum_{j=1}^{l_n} E\int_{A_{nj}} \left(|B_{j,n}(x)| - |Z_j|\,D_{j,n}(x)\sqrt{\frac{\mu(A_{nj})}{n}}\right)^{+} dx + o(h^2 + 1/\sqrt{nh}).$$
From Lemmas 3 and 4, Lemmas 5.16 and 5.17 in DEVROYE and GYÖRFI (1985), applying the substitution $(x - a_{nj})/(h/2) = u$ to the integrals over $A_{nj}$ we now obtain the result. The inequality in the statement of the theorem follows from the inequality $(a - b)^+ \le a^+$.

PROOF OF THEOREM 8.
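By the triangle inequality
$$\int_{A_{nj}} |\hat f_{PL}(x) - E\hat f_{PL}(x)|\,dx \le |\mu_n(A_{nj}) - \mu(A_{nj})| + 2\sum_{k=1}^{d}\int_{A_{nj}} \frac{|x_k - c_{njk}|}{|I^{(k)}_{nj}|}\left|g^{(k)}_n(a^{(k)+}_{nj}) - E g^{(k)}_n(a^{(k)+}_{nj}) - \left(g^{(k)}_n(a^{(k)-}_{nj}) - E g^{(k)}_n(a^{(k)-}_{nj})\right)\right| dx$$
$$= |\mu_n(A_{nj}) - \mu(A_{nj})| + 2\sum_{k=1}^{d} \frac{|I^{(k)}_{nj}|}{4}\,\frac{\prod_{l=1}^{d} |I^{(l)}_{nj}|}{|I^{(k)}_{nj}|}\left|g^{(k)}_n(a^{(k)+}_{nj}) - E g^{(k)}_n(a^{(k)+}_{nj}) - \left(g^{(k)}_n(a^{(k)-}_{nj}) - E g^{(k)}_n(a^{(k)-}_{nj})\right)\right|$$
$$\le |\mu_n(A_{nj}) - \mu(A_{nj})| + \frac12\sum_{k=1}^{d}\left(|\mu_n(A^{(k)+}_{nj}) - \mu(A^{(k)+}_{nj})| + |\mu_n(A^{(k)-}_{nj}) - \mu(A^{(k)-}_{nj})|\right)$$
$$= \int_{A_{nj}} |\hat g_n(x) - E\hat g_n(x)|\,dx + \frac12\sum_{k=1}^{d}\int_{A_{nj}} |g^{(k)}_n(x) - E g^{(k)}_n(x)|\,dx.$$
Hence the variation term satisfies
$$\lim_{n\to\infty} \|\hat f_{PL} - E\hat f_{PL}\| = 0 \quad\text{a.s.}$$
Concerning the bias term we find that
$$\int_{A_{nj}} |f(x) - E\hat f_{PL}(x)|\,dx \le \int_{A_{nj}} |f(x) - E\hat g_n(a_{nj})|\,dx + 2\sum_{k=1}^{d}\int_{A_{nj}} \frac{|x_k - c_{njk}|}{|I^{(k)}_{nj}|}\left|E g^{(k)}_n(a^{(k)+}_{nj}) - E g^{(k)}_n(a^{(k)-}_{nj})\right| dx$$
$$\le \int_{A_{nj}} |f(x) - E\hat g_n(x)|\,dx + \frac12\sum_{k=1}^{d} |\mu(A^{(k)+}_{nj}) - \mu(A^{(k)-}_{nj})|.$$
Hence
$$\int |f(x) - E\hat f_{PL}(x)|\,dx \le \int |f(x) - E\hat g_n(x)|\,dx + \frac12\sum_{k=1}^{d}\sum_j |\mu(A^{(k)+}_{nj}) - \mu(A^{(k)-}_{nj})|.$$
The first term is the bias of $\hat g_n$ which tends to 0. The second term also goes to 0. This can be easily verified for uniformly continuous $f$ with compact support, and then extended to arbitrary $f$.

PROOF OF THEOREM 9. The proof follows the same lines as the proof of Theorem 5, except for a few details which we discuss here.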
At the step corresponding to (11) the integrand is replaced by
$$E\left|\frac{P_n(A_{nj}) - n\mu(A_{nj})}{\sqrt{n\mu(A_{nj})}} + 2\sum_{k=1}^{d} \frac{x_k - c_{njk}}{h/2}\left[\mu_{jk}^+\,\frac{P_n(A^{(k)+}_{nj}) - n\mu(A^{(k)+}_{nj})}{\sqrt{n\mu(A^{(k)+}_{nj})}} - \mu_{jk}^-\,\frac{P_n(A^{(k)-}_{nj}) - n\mu(A^{(k)-}_{nj})}{\sqrt{n\mu(A^{(k)-}_{nj})}}\right] + \sqrt{\frac{n}{\mu(A_{nj})}}\,B_{j,n}(x)\right| \tag{13}$$
with
$$\mu_{jk}^+ = \sqrt{\frac{\mu(A^{(k)+}_{nj})}{\mu(A_{nj})}}, \qquad \mu_{jk}^- = \sqrt{\frac{\mu(A^{(k)-}_{nj})}{\mu(A_{nj})}}$$
and
$$B_{j,n}(x) = \left(\mu(A_{nj}) - h^d f(x)\right) + 2\sum_{k=1}^{d} \frac{x_k - c_{njk}}{h/2}\left(\mu(A^{(k)+}_{nj}) - \mu(A^{(k)-}_{nj})\right).$$
The first two terms in (13) can now be approximated by a Gaussian random variable $(1/d)\sum_{k=1}^{d} Z^*_{jk}$ where
$$Z^*_{jk} = \mu_{jk}^+ Z_{jk}^+\left(1 + 2d\,\frac{x_k - c_{njk}}{h/2}\right) + \mu_{jk}^- Z_{jk}^-\left(1 - 2d\,\frac{x_k - c_{njk}}{h/2}\right)$$
and for each $j, k$, $(Z_{jk}^-, Z_{jk}^+)$ is a couple of independent standard normal random variables. In the step corresponding to (12) the equality is then replaced by an inequality based on
$$E\left|\frac1d\sum_{k=1}^{d} Z^*_{jk}\right| \le \frac1d\sum_{k=1}^{d} E|Z^*_{jk}|.$$
The remaining steps are analogous to the computations followed in the proof of Theorem 5.

References

ABOU-JAOUDE, S. (1976), Conditions nécessaires et suffisantes de convergence L1 en probabilité de l'histogramme pour une densité, Annales de l'Institut Henri Poincaré 12, 213–231.
BARRON, A. R., L. GYÖRFI and E. C. VAN DER MEULEN (1992), Distribution estimates consistent in total variation and in two types of information divergence, IEEE Trans. on Information Theory 38, 1437–1454.
BEIRLANT, J. and L. GYÖRFI (1998), On the L1-error in histogram density estimation: the multivariate case, Nonparametric Statistics 9, 197–216.
BEIRLANT, J. and D. M. MASON (1995), On the asymptotic normality of Lp norms of empirical functionals, Mathematical Methods of Statistics 4, 1–15.
DEVROYE, L. (1983), On arbitrary slow rates of global convergence in density estimation, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 62, 475–483.
DEVROYE, L. (1991), Exponential inequalities in nonparametric estimation, in: G. ROUSSAS (ed.), Nonparametric functional estimation and related topics, p. 31–44, NATO ASI Series, Kluwer Academic Publishers, Dordrecht.
DEVROYE, L. (1997), Universal smoothing factor selection in density estimation: theory and practice, (with discussion), Test 6, 223–320.
DEVROYE, L. and L. GYÖRFI (1985), Nonparametric density estimation: the L1 View, Wiley, New York.
JONES, M. C., M. SAMIUDDIN, A. H. AL-HARBEY and T. A. H. MAATOUK (1998), The edge frequency polygon, Biometrika 85, 235–239.
MCDIARMID, C. (1989), On the method of bounded differences, in: Surveys in combinatorics 1989, p. 148–188. Cambridge University Press, Cambridge.
SCOTT, D. W. (1985a), Frequency polygons: theory and application, Journal of the American Statistical Association 80, 348–354.
SCOTT, D. W. (1985b), Average shifted histograms: effective nonparametric density estimators in several dimensions, Annals of Statistics 13, 1024–1040.
SCOTT, D. W. (1992), Multivariate density estimation: theory, practice and visualization, Wiley, New York.

Received: October 1996. Revised: January 1998.