Estimation in Inverse Problems and second-generation wavelets
Dominique Picard and Gérard Kerkyacharian
Abstract. We consider the problem of recovering a function f when we receive a blurred (by a
linear operator) and noisy version: Y_ε = Kf + εẆ. We will have as guides two famous examples
of such inverse problems: the deconvolution problem and the Wicksell problem. The direct problem (K
is the identity) isolates the denoising operation. It cannot be solved unless one accepts estimating
a smoothed version of f: for instance, if f has an expansion on a basis, this smoothing might
correspond to stopping the expansion at some stage m. A crucial problem then lies in finding a
balance for m, considering the fact that for m large the difference between f and its smoothed
version is small, whereas the random effect introduces an error which increases with m. In the
true inverse problem, in addition to denoising we have to 'invert the operator' K, an operation
which not only creates the usual difficulties, but also introduces the necessity to control the additional
instability due to the inversion of the random noise. Our purpose here is to emphasize the fact that
in such a problem there generally exists a basis which is fully adapted to the problem, in which, for
instance, the inversion remains very stable: this is the Singular Value Decomposition (SVD) basis. On the
other hand, the SVD basis might be difficult to determine and to manipulate numerically; it also might
not be appropriate for the accurate description of the solution with a small number of parameters.
Moreover, in many practical situations, the signal exhibits inhomogeneous regularity, and its local
features are especially interesting to recover. In such cases, other bases (in particular localised
bases such as wavelet bases) may be much more appropriate to give a good representation of the
object at hand. Our approach here will be to produce estimation procedures that try to keep the
advantages of localisation without losing the stability and computability of SVD decompositions.
We will especially consider two cases. In the first one (which is the case of the deconvolution
example), we show that a fairly simple algorithm (WAVE-VD), using an appropriate thresholding
technique performed on a standard wavelet system, enables us to estimate the object with rates
which are almost optimal, up to logarithmic factors, for any L_p loss function and on the whole range
of Besov spaces. In the second case (which is the case of the Wicksell example, where the SVD basis
lies in the range of Jacobi polynomials), we prove that quite a similar algorithm (NEED-VD) can
be performed, provided the standard wavelet system is replaced by a second-generation wavelet-type
basis: the needlets. We use here the construction (essentially following the work of Petrushev
and co-authors) of a localised frame linked with a prescribed basis (here Jacobi polynomials),
using a Littlewood-Paley decomposition combined with a cubature formula. Section 5 describes
the direct case (K = I). It has its own interest and will act as a guide for understanding the
'true' inverse models for a reader who is unfamiliar with nonparametric statistical estimation;
it can be read first. Section 1 introduces the general inverse problem and describes the examples
of deconvolution and the Wicksell problem. A review of standard methods is given, with a special
focus on SVD methods. Section 2 describes the WAVE-VD procedure. Sections 3 and 4 give a
description of the needlet construction and the performances of the NEED-VD procedure.
Mathematics Subject Classification (2000). Primary 62G07, 62G20, 62C20.
Keywords. Nonparametric estimation, denoising, inverse models, thresholding, Meyer wavelet,
singular value decomposition, Littlewood-Paley decomposition.
1. Inverse Models
Let H and K be two Hilbert spaces and K : f ∈ H ↦ Kf ∈ K a linear operator. The
standard linear ill-posed inverse problem consists in recovering a good approximation f_ε of
f, the solution of
g = Kf,    (1)
when only a perturbation g_ε of g is observed. In this paper, we will consider the case where
this perturbation is an additive stochastic white noise. Namely, for f ∈ H, we observe Y_ε defined by
the following equation:
Y_ε = Kf + εẆ.    (2)
Here ε is the amplitude of the noise. It is supposed to be a small parameter which will tend to
0, and our error will be measured in terms of this small parameter.
Ẇ is a K-white noise: i.e. for any g, h in K, ξ(g) := (Ẇ, g)_K and ξ(h) := (Ẇ, h)_K
form a centered gaussian vector with variances ‖g‖²_K, ‖h‖²_K and
covariance (g, h)_K (with the obvious extension when one considers k functions instead of
2).
Equation (2) means that for any g in K we observe Y_ε(g) := (Y_ε, g)_K = (Kf, g)_K +
εξ(g), where ξ(g) ∼ N(0, ‖g‖²_K), and Y_ε(g), Y_ε(h) are independent random variables for
orthogonal functions g and h.
The case where K is the identity is called the 'direct model' and is summarized as a
memento in section 5. The reader who is unfamiliar with nonparametric statistical estimation is invited to refer to this section, which will act as a guide for understanding
the more general inverse models. In particular, it is recalled in section 5 that the model
(2) appears as an approximation of models arising in real practical situations, for
instance the case where (2) is replaced by a discretisation.
1.1. Two examples: Deconvolution and Wicksell’s problems.
1.1.1. Deconvolution. The following problem is probably one of the most famous inverse problems in signal processing. In the deconvolution problem, we consider the following
operator: let H = K be the set of square integrable periodic functions, with
the standard L²([0,1]) norm, and consider
f ∈ H ↦ Kf(u) = ∫_0^1 γ(u − t) f(t) dt ∈ H,    (3)
where γ is a known function of H, which is generally assumed to be regular
(often in the sense that its Fourier coefficients γ̂_k behave like k^{−ν}). Another very usual example
is the box-car function.
The following figures show four original signals to recover, which are well-known
test signals of the statistical literature. They display typical features which are difficult to
restore: bumps, blocks and Doppler effects. The second and third series of pictures show
their deformation after blurring (i.e. convolution with a regular function) and addition
of a noise. These figures show how the convolution regularizes the signal, making it very
difficult to recover, especially in its high frequency features. Their statistical investigation
can be found in [22].
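To fix ideas, the following minimal sketch simulates a discretised analogue of model (3): a signal is blurred by periodic convolution with a smooth kernel and corrupted by white noise. The signal, the kernel width and the noise level are hypothetical choices made purely for illustration, not the ones used in the figures.

```python
import numpy as np

# Discretised analogue of model (3): blur by circular convolution, add noise.
n = 1024
t = np.arange(n) / n
f = np.sin(2 * np.pi * t) + (np.abs(t - 0.5) < 0.05)      # smooth part plus a "block"
gamma = np.exp(-(((t + 0.5) % 1.0) - 0.5) ** 2 / (2 * 0.02 ** 2))
gamma /= gamma.sum()                                      # periodic blurring kernel
Kf = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(gamma)))  # convolution = product in Fourier
eps = 0.05                                                # hypothetical noise amplitude
y = Kf + eps * np.random.default_rng(1).normal(size=n)    # noisy-blurred observation
```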
A variant of this problem consists in observing Y_1, ..., Y_n, n independent and identically
distributed random variables, where each Y_i may be written as the sum Y_i =
X_i + U_i, where X_i and U_i are again independent, the distribution of U_i is known with
density γ, and we want to recover the common density of the X_i's. The direct problem is
the case where U_i = 0 for all i, and corresponds to a standard density estimation
problem (see section 5.1). Hence the variables U_i act as perturbations of the X_i's,
whose density is to be recovered.
[Figure: four original test signals, (a) Lidar, (b) Bumps, (c) Blocks, (d) Doppler, followed by their blurred versions and their noisy-blurred versions.]
1.1.2. Wicksell's problem. Another typical example is the following classical Wicksell
problem [42]. Suppose a population of spheres is embedded in a medium. The spheres have
radii that may be assumed to be drawn independently from a density f. A random plane
slice is taken through the medium, and those spheres that are intersected by the plane
furnish circles whose radii are the points of observation Y_1, ..., Y_n. The unfolding problem
is then to infer the density of the sphere radii from the observed circle radii. This unfolding
problem also arises in medicine, where the spheres might be tumors in an animal's liver
[36], as well as in numerous other contexts (biological, engineering, ...); see for instance [9].
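As an aside, here is a minimal sketch of how such data can be simulated, assuming (purely for illustration) a Beta density for the sphere radii; the size-biased selection step reflects the fact that a random plane hits a sphere with probability proportional to its radius.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# hypothetical sphere-radius density f: a Beta(2, 4) on [0, 1]
r = rng.beta(2.0, 4.0, size=5 * n)
# a random plane intersects a sphere with probability proportional to its radius
keep = rng.uniform(0.0, r.max(), size=r.size) < r     # size-biased selection
r = r[keep][:n]
# given a hit, the distance from the plane to the center is uniform on [0, r]
d = rng.uniform(0.0, r)
y = np.sqrt(r ** 2 - d ** 2)        # observed circle radii Y_1, ..., Y_n
```

The unfolding problem is then to recover the density f from the sample y.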
Following [42] and [23], Wicksell's problem corresponds to the following operator:
H = L²([0,1], dµ),  dµ(x) = (4x)^{−1} dx,
K = L²([0,1], dλ),  dλ(y) = 4π^{−1} (1 − y²)^{1/2} dy,
Kf(y) = (π/4) y (1 − y²)^{−1/2} ∫_y^1 (x² − y²)^{−1/2} f(x) dµ(x).
Notice, however, that in this presentation, again in order to avoid additional technicalities, we detail this problem in the white noise framework, which is simpler than the
original problem expressed above in density terms.
1.2. Singular Value Decomposition and projection methods. Let us
begin with a quick description of well-known methods in inverse problems with random
noise.
Under the assumption that K is compact, there exist two orthonormal bases (SVD bases),
(e_k) of H and (g_k) of K respectively, and a sequence (b_k) tending to 0 as k goes to
infinity, such that
K e_k = b_k g_k,  K* g_k = b_k e_k,
where K* is the adjoint operator.
For the sake of simplicity we suppose in the sequel that K and K* are injective. Otherwise
we have to take care of the kernels of these operators. The bases (e_k) and (g_k) are called
singular value bases, whereas the b_k's are simply called singular values.
Deconvolution. In this case simple calculations prove that the SVD bases (e_k) and (g_k) both
coincide with the Fourier basis. The singular values correspond to the Fourier
coefficients of the function γ:
b_k = γ̂_k.    (4)
Wicksell. In this case, following [23], we have the following SVD:
e_k(x) = 4(k + 1)^{1/2} x² P_k^{0,1}(2x² − 1),  g_k(y) = U_{2k+1}(y),
where P_k^{0,1} is the Jacobi polynomial of type (0,1) with degree k and U_k is the second-kind Chebyshev
polynomial with degree k. The singular values are
b_k = (π/16) (1 + k)^{−1/2}.    (5)
1.2.1. SVD Method. The Singular Value Decomposition (SVD) of K,
Kf = Σ_k b_k ⟨f, e_k⟩ g_k,
gives rise to approximations of the type
f_ε = Σ_{k=0}^{N} b_k^{−1} ⟨g_ε, g_k⟩ e_k,
where N = N(ε) has to be chosen properly. This SVD method is very attractive theoretically and can be shown to be asymptotically optimal in many situations (see Mathé
and Pereverzev [31], Cavalier and Tsybakov [7], Mair and Ruymgaart [29]). It also has
the big advantage of performing a quick and stable inversion of the operator. However, it
suffers from different types of limitations: the SVD bases might be difficult to determine
and manipulate numerically; secondly, while these bases are fully adapted to describe the
operator K, they might not be appropriate for the accurate description of the solution with
a small number of parameters.
Moreover, in many practical situations, the signal exhibits inhomogeneous regularity, and its
local features are especially interesting to recover. In such cases, other bases (in particular
localised bases such as wavelet bases) may be much more appropriate to give a good
representation of the object at hand.
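In the sequence space spanned by the SVD bases, the method reduces to dividing the first N observed coefficients by the singular values. The sketch below illustrates this on a toy sequence model; the coefficient decay, ν, ε and N are hypothetical values chosen for illustration.

```python
import numpy as np

def svd_estimate(Y, b, N):
    """Truncated SVD estimator f_eps = sum_{k<=N} b_k^{-1} <g_eps, g_k> e_k,
    expressed in sequence space: Y[k] stands for the observation <Y_eps, g_k>."""
    coef = np.zeros_like(Y)
    coef[:N + 1] = Y[:N + 1] / b[:N + 1]   # invert the operator on the first modes
    return coef                            # coefficients of f_eps in the basis (e_k)

# toy sequence model with polynomially decaying singular values b_k ~ k^{-nu}
rng = np.random.default_rng(0)
K_max, nu, eps = 2000, 1.0, 1e-3
k = np.arange(1, K_max + 1)
f_coef = k ** -1.5                         # hypothetical true coefficients of f
b = k ** (-nu)
Y = b * f_coef + eps * rng.normal(size=K_max)
f_hat = svd_estimate(Y, b, N=80)           # N = N(eps) balances bias and variance
```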
1.2.2. Projection methods. Projection methods, which are defined as solutions of (1)
restricted to finite dimensional subspaces H_N and K_N (of dimension N), also give rise to
attractive approximations of f, by properly choosing the subspaces and the tuning parameter N (Dicken and Maass [10], Mathé and Pereverzev [31]), together with their nonlinear
counterparts (Cavalier and Tsybakov [7], Cavalier et al. [6], Tsybakov [40], Goldenshluger
and Pereverzev [19], Efromovich and Koltchinskii [16]). In the case where H = K and K
is a self-adjoint operator, the system is particularly simple to solve, since the restricted
operator K_N is symmetric positive definite. This is the so-called Galerkin method. Obviously, restricting to finite subspaces has similar effects and can also be seen as a Tikhonov
regularisation, i.e. minimizing the least squares functional penalised by a regularisation
term.
The advantage of the Galerkin method is to allow the choice of the basis. However, it
suffers from the drawback of being unstable in many cases.
Comparing the SVD and Galerkin methods exactly states one main difficulty of the
problem. The possible antagonism between the SVD basis, in which the inversion of the
system is easy, and a 'localised' basis, in which the signal is sparsely represented, is the
issue we are trying to address here.
1.3. Cut-off, linear methods, thresholding. The reader may profitably refer
to subsections 5.3 and 5.4, where the linear methods and thresholding techniques are
presented in detail in the direct case.
SVD as well as Galerkin methods are very sensitive to the choice of the tuning parameter
N(ε). This problem can be solved theoretically; however, the solution depends heavily on
prior regularity assumptions on the solution, which have to be known in advance.
In the last ten years, many nonlinear methods have been developed, especially in
the direct case, with the objective of automatically adapting to the unknown smoothness
and local singular behavior of the solution. In the direct case, one of the most attractive
methods is probably wavelet thresholding, since it allies numerical simplicity to asymptotic
optimality on a large variety of functional classes such as Besov or Sobolev classes.
To adapt this approach to inverse problems, Donoho [14] introduced a wavelet-like decomposition specifically adapted to the operator K (wavelet-vaguelette decomposition)
and provided a thresholding algorithm on this decomposition. In Abramovich and Silverman [1], this method was compared with the similar vaguelette-wavelet decomposition.
Other wavelet approaches might be mentioned, such as Antoniadis and Bigot [2], Antoniadis et al. [3] and, especially for the deconvolution problem, Pensky and Vidakovic [37], Fan
and Koo [17], Kalifa and Mallat [24], Neelamani et al. [34].
Later, Cohen et al. [8] introduced an algorithm combining a Galerkin inversion with a
thresholding algorithm.
The approach developed in the sequel is greatly influenced by these previous works.
The accent we put here is on constructing (when necessary) new-generation wavelet-type
bases well adapted to the operator K, instead of sticking to the standard wavelet bases
and thereby reducing the range of potential operators covered by the method.
2. WAVE-VD-type estimation
We explain here the basic idea of the method, which is very simple. Let us expand f using
a well-suited basis (to be defined later: the 'wavelet-type' basis):
f = Σ_λ (f, ψ_λ)_H ψ_λ.
Using the Parseval identity, we have β_λ = (f, ψ_λ)_H = Σ_i f_i ψ_λ^i, for f_i = (f, e_i)_H and ψ_λ^i =
(ψ_λ, e_i)_H. Let us put Y_i = (Y_ε, g_i)_K; we have
Y_i = (Kf, g_i)_K + εξ_i = (f, K*g_i)_H + εξ_i = (Σ_j f_j e_j, K*g_i)_H + εξ_i = b_i f_i + εξ_i,
where the ξ_i's form a sequence of i.i.d. centered gaussian variables with variance 1. Then
β̂_λ = Σ_i (Y_i / b_i) ψ_λ^i
is such that E(β̂_λ) = β_λ (i.e. its average value is β_λ): it is a plausible estimate of β_λ. Let
us now put ourselves in a multiresolution setting, taking λ = (j, k), for j ≥ 0, k belonging
to a set χ_j, and consider:
f̂ = Σ_{j=−1}^{J} Σ_{k∈χ_j} t(β̂_{jk}) ψ_{jk},
where t is a thresholding operator. (The reader who is unfamiliar with thresholding
techniques may profitably refer to section 5.4.)
t(β̂_{jk}) = β̂_{jk} I{|β̂_{jk}| ≥ κ t_ε σ_j},  t_ε = ε √(log 1/ε).    (6)
Here κ is a tuning parameter of the method which will be properly chosen later. A main
difference with the direct case is the fact that the thresholding depends on the
resolution level through the constant σ_j, which will also be specified later. However, our
main discussion will be on how to choose the basis (ψ_{jk}). We will see that in order to
perform in an optimal way, the method needs the basis (ψ_{jk}) to be 'coherent' with the
SVD basis.
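In sequence space, the whole procedure amounts to a few lines. The following sketch uses a hypothetical interface (Psi stores the coefficients ψ^i_{jk} of each wavelet in the SVD basis, and σ_j is computed from these coefficients rather than taken from condition (10)); it is an illustration of the estimator defined by (6), not the authors' implementation.

```python
import numpy as np

def wave_vd(Y, b, Psi, eps, kappa=1.0):
    """Sketch of the WAVE-VD estimator in the sequence model Y_i = b_i f_i + eps*xi_i.

    Y   : observed coefficients (Y_eps, g_i) in the SVD basis, length N
    b   : singular values b_i
    Psi : dict level j -> array Psi[j][k, i] = psi^i_{jk} (wavelet in SVD basis)
    kappa: tuning parameter of (6); its theoretical calibration is in Theorem 2.1.
    Returns the thresholded wavelet coefficients t(beta_hat_{jk}), level by level.
    """
    t_eps = eps * np.sqrt(np.log(1.0 / eps))
    out = {}
    for j, P in Psi.items():
        beta_hat = P @ (Y / b)           # beta_hat_{jk} = sum_i (Y_i / b_i) psi^i_{jk}
        # level-dependent scale: largest sigma_{jk} over k, in the spirit of (10)
        sigma_j = np.sqrt(np.max(np.sum((P / b) ** 2, axis=1)))
        out[j] = np.where(np.abs(beta_hat) >= kappa * t_eps * sigma_j, beta_hat, 0.0)
    return out
```

The estimate f̂ is then reconstructed as Σ_{j,k} t(β̂_{jk}) ψ_{jk}.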
We will particularly focus on two situations (corresponding to the two examples discussed in the introduction). In the first type of cases, the operator has the Fourier basis as
SVD basis. In this case, this 'coherence' is easily obtained with 'standard' wavelets (still,
not any kind of standard wavelets, as will be seen). However, more difficult problems (and
typically Wicksell's problem) require, when these coherence conditions have to be combined with the desired localisation property of the basis, the construction of new objects:
second-generation-type wavelets.
2.1. WAVE-VD in a wavelet scenario. It is well known (see for instance Meyer
[32]) that wavelet bases provide characterisations of smoothness spaces such as Hölder
spaces Lip(s), Sobolev spaces W_p^s, as well as Besov spaces B_{pq}^s, for a range of indices s
depending on the wavelet ψ. For the scale of Besov spaces, which includes as particular
cases Lip(s) = B_{∞∞}^s (if s ∉ ℕ) and W_p^s = B_{pp}^s (if p = 2), the characterisation has the
following form:
if f = Σ_{j≥0} Σ_{k∈ℤ} β_{jk} ψ_{jk},
‖f‖_{B_{pq}^s} ∼ ‖(2^{j[s+1/2−1/p]} ‖β_{j·}‖_{l_p})_{j≥0}‖_{l_q}.    (7)
In this section, we take {ψ_{jk}, j ≥ −1, k ∈ χ_j} to be a standard wavelet basis. More
precisely, we suppose as usual that ψ_{−1,k} stands for the scaling function, and that for any j ≥ −1, χ_j
is a set included in ℕ whose size is of order 2^j. Moreover, we assume that the following
properties hold: there exist constants c_p, C_p, D_p such that
c_p 2^{j(p/2−1)} ≤ ‖ψ_{jk}‖_p^p ≤ C_p 2^{j(p/2−1)},    (8)
‖Σ_{k∈χ_j} u_k ψ_{jk}‖_p^p ≤ D_p Σ_{k∈χ_j} |u_k|^p ‖ψ_{jk}‖_p^p, for any sequence (u_k).    (9)
As in section 5, we consider the loss of a decision f̂, when the truth is f, as the L_p norm
‖f̂ − f‖_p, and its associated risk:
E‖f̂ − f‖_p^p.
The following theorem evaluates this risk when the strategy is the one introduced in the previous section, and when the true function belongs to a Besov ball
(f ∈ B_{π,r}^s(M) ⟺ ‖f‖_{B_{π,r}^s} ≤ M). One nice property of this estimation procedure is that
it does not need a priori knowledge of this regularity to get a good rate of convergence.
Theorem 2.1. If we assume that 1 < p < ∞, 2ν + 1 > 0, and
σ_j² := Σ_i [ψ_{jk}^i / b_i]² ≤ C 2^{2jν}, ∀ j ≥ 0,    (10)
and if we put κ² ≥ 16p, 2^J = [t_ε]^{−2/(2ν+1)}, then,
if f belongs to B_{π,r}^s(M) with π ≥ 1, s ≥ 1/π, r ≥ 1 (with the restriction r ≤ π if
s = (2ν + 1)(p/(2π) − 1/2)), we have
E‖f̂ − f‖_p^p ≤ C log(1/ε)^{p−1} [ε² log(1/ε)]^{αp},    (11)
with:
α = s / (1 + 2(ν + s)), if s ≥ (2ν + 1)(p/(2π) − 1/2);
α = (s − 1/π + 1/p) / (1 + 2(ν + s − 1/π)), if 1/π ≤ s < (2ν + 1)(p/(2π) − 1/2).
Remarks:
1. Condition (10) is essential here. As will be shown later, this condition links
the wavelet system with the singular value decomposition of the kernel K. If we set
ourselves in the deconvolution case, the SVD basis is the Fourier basis, in such a way
that the ψ_{jk}^i are simply the Fourier coefficients of ψ_{jk}. If we choose as wavelet basis the
periodized Meyer wavelet basis (see Meyer [32] and Mallat [30]), conditions (8) and
(9) are satisfied. In addition, as the Meyer wavelet has the remarkable property of
being compactly supported in the Fourier domain, simple calculations prove that, for
any j ≥ 0 and any k, the number of i's such that ψ_{jk}^i ≠ 0 is finite and of order 2^j. Then,
if we assume to be in the so-called 'regular' case (b_k ∼ k^{−ν} for all k), it is easy to
establish that (10) is true (a numerical illustration is sketched after these remarks). This condition also holds in more general cases of the
deconvolution setting, such as box-car deconvolution; see [22], [28].
2. These results are minimax (see [43]) up to logarithmic factors. This means that if
we consider the best estimator in its worst performance over a given Besov ball,
this estimator attains a rate of convergence which is the one given in (11), up to
logarithmic factors.
3. If we compare these results to the rates of convergence obtained in the direct model
(see subsections 5.3 and 5.4), we see that the difference (up to logarithmic terms)
essentially lies in the parameter ν, which acts as a reducing factor of the rate of
convergence. This parameter quantifies the extra difficulty created by the inverse
problem. It is often called the coefficient of ill-posedness. If we recall that in the deconvolution case the coefficients b_k are the Fourier coefficients of the function γ, the
ill-posedness coefficient then clearly appears to be closely related to the regularity of
the blurring function.
✸
This result has been proved in the deconvolution case in [22]. The proof of this theorem is
given in Appendix I.
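The numerical illustration announced in Remark 1 goes as follows: we mimic a band-limited wavelet by a normalized coefficient vector supported on the Fourier band (2^{j−1}, 2^{j+1}) (arbitrary normalized values replace the actual Meyer coefficients, which is enough to see the mechanism), take b_i = i^{−ν} with a hypothetical ν, and check that σ_j² / 2^{2jν} stays bounded.

```python
import numpy as np

nu = 0.5                        # hypothetical ill-posedness coefficient
rng = np.random.default_rng(0)
for j in range(3, 12):
    i = np.arange(2 ** (j - 1) + 1, 2 ** (j + 1))  # Fourier band of a Meyer-type wavelet
    psi = rng.normal(size=i.size)
    psi /= np.linalg.norm(psi)                     # sum_i (psi^i_jk)^2 = ||psi_jk||^2 = 1
    b = i ** (-nu)                                 # regular case b_i ~ i^-nu
    sigma2 = np.sum((psi / b) ** 2)
    print(j, sigma2 / 2 ** (2 * j * nu))           # ratio stays bounded: condition (10)
```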
2.2. WAVE-VD in a Jacobi scenario: NEED-VD. We have seen that the
results given above hold under the condition (10) on the wavelet basis.
Let us first appreciate how the condition (10) links the 'wavelet-type' basis to the SVD
basis (e_k). To see this, let us put ourselves in the regular case:
b_i ∼ i^{−ν}
(by this we mean, more precisely, that there exist two positive constants c and c′ such that
c′ i^{−ν} ≤ b_i ≤ c i^{−ν}).
If (10) is true, we have:
C 2^{2jν} ≥ Σ_m Σ_{2^m ≤ i < 2^{m+1}} [ψ_{jk}^i / b_i]².
Hence, for all m ≥ j:
Σ_{2^m ≤ i < 2^{m+1}} [ψ_{jk}^i]² ≤ c 2^{2ν(j−m)}.
This suggests the necessity of constructing a 'wavelet-type' basis whose 'support' at
level j with respect to the SVD basis (that is, in the index i) is concentrated on the integers between 2^j
and 2^{j+1}, with exponential decay beyond this band. This is exactly the case of Meyer's
wavelet when the SVD basis is the Fourier basis.
In the general case of an arbitrary linear operator giving rise to an arbitrary SVD basis
(e_k), if in addition to (10) we require a localisation condition on the basis, we do not know
whether such a construction can be performed. However, in some cases, even quite far from
deconvolution, such as Wicksell's problem, one can build a 'second-generation wavelet-type'
basis with exactly these properties.
The following construction, due to Petrushev and collaborators ([33], [39], [38]), exactly
realizes the paradigm mentioned above, producing a frame (the needlet basis) which
has the property of being localised, in the case where the linear operator has Jacobi polynomials as SVD
basis (as well as in different other cases, such as spherical harmonics, Hermite functions and
Laguerre polynomials).
3. Petrushev construction of Needlets
Frames were introduced in the 1950s by Duffin and Schaeffer [15] to represent functions
via over-complete sets. Frames, including tight frames, arise naturally in wavelet analysis on
ℝ^d. Tight frames, which are very close to orthonormal bases, are especially useful in signal
and image processing.
We will see that the following construction has the advantage of being easily computable
and of producing well localised tight frames built on a specified orthonormal basis.
We recall the following definition.
Definition 3.1. Let H be a Hilbert space and (e_n) a sequence in H; (e_n) is a tight frame
(with constant 1) if:
∀f ∈ H, ‖f‖² = Σ_n |⟨f, e_n⟩|².
Let now Y be a metric space and µ a finite measure. Let us suppose that we have the
following decomposition:
L²(Y, µ) = ⊕_{k=0}^{∞} H_k,
where the H_k's are finite dimensional spaces. For the sake of simplicity, we suppose that H_0
is reduced to the constants.
Let L_k be the orthogonal projection on H_k:
∀f ∈ L²(Y, µ), L_k(f)(x) = ∫_Y f(y) L_k(x, y) dµ(y),
where
L_k(x, y) = Σ_{i=1}^{l_k} e_i^k(x) ē_i^k(y),
l_k being the dimension of H_k and (e_i^k)_{i=1,...,l_k} an orthonormal basis of H_k. Let us observe that
we have the following property of the projection operators:
∫ L_k(x, y) L_m(y, z) dµ(y) = δ_{k,m} L_k(x, z).    (12)
The construction, also inspired by the paper of Frazier, Jawerth and Weiss [18], is based
on two fundamental steps: a Littlewood-Paley decomposition and a discretization, which are
summarized in the two following subsections.
3.1. Littlewood-Paley decomposition. Let ϕ be a C^∞ function supported in
|ξ| ≤ 1, such that 1 ≥ ϕ(ξ) ≥ 0 and ϕ(ξ) = 1 if |ξ| ≤ 1/2. Let us define:
a²(ξ) = ϕ(ξ/2) − ϕ(ξ) ≥ 0,
so that
∀|ξ| ≥ 1, Σ_j a²(ξ/2^j) = 1.    (13)
Let us now define the operator
Λ_j = Σ_{k≥0} a²(k/2^j) L_k
and the associated kernel
Λ_j(x, y) = Σ_{k≥0} a²(k/2^j) L_k(x, y) = Σ_{2^{j−1} < k < 2^{j+1}} a²(k/2^j) L_k(x, y).
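The function a is easy to produce numerically. The sketch below builds one admissible choice of ϕ from the classical exp(−1/x) bump (any C^∞ cutoff equal to 1 on [0, 1/2] and to 0 beyond 1 would do) and checks the partition of unity (13) on a grid:

```python
import numpy as np

def phi(t):
    """A C-infinity cutoff: 1 for |t| <= 1/2, 0 for |t| >= 1, built from exp(-1/x)."""
    u = np.clip(2.0 * (np.abs(np.asarray(t, dtype=float)) - 0.5), 0.0, 1.0)
    f = lambda v: np.exp(-1.0 / np.maximum(v, 1e-300))  # f(0) underflows cleanly to 0
    return f(1.0 - u) / (f(u) + f(1.0 - u))             # smooth step from 1 down to 0

def a2(t):
    """a^2(t) = phi(t/2) - phi(t) >= 0, supported in 1/2 <= |t| <= 2."""
    return phi(t / 2.0) - phi(t)

# check of the partition of unity (13): sum_j a^2(xi/2^j) = 1 for |xi| >= 1
xi = np.linspace(1.0, 40.0, 500)
total = sum(a2(xi / 2 ** j) for j in range(12))
print(np.max(np.abs(total - 1.0)))    # ~0, up to floating point error
```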
The following proposition holds.
Proposition 3.2.
∀f ∈ H, f = lim_{J→∞} L_0(f) + Σ_{j=0}^{J} Λ_j(f).    (14)
For M_j(x, y) = Σ_k a(k/2^j) L_k(x, y),
Λ_j(x, y) = ∫ M_j(x, z) M_j(z, y) dµ(z).    (15)
Proof.
L_0(f) + Σ_{j=0}^{J} Λ_j(f) = (L_0 + Σ_{j=0}^{J} Σ_k a²(k/2^j) L_k)(f) = Σ_k ϕ(k/2^{J+1}) L_k(f).
So:
‖Σ_k ϕ(k/2^{J+1}) L_k(f) − f‖² = Σ_{l≥2^{J+1}} ‖L_l(f)‖² + Σ_{2^J ≤ l < 2^{J+1}} ‖L_l(f)(1 − ϕ(l/2^{J+1}))‖²
≤ Σ_{l≥2^J} ‖L_l(f)‖² → 0, when J → ∞,
and (15) is a simple consequence of (12).
3.2. Discretization. Let us define
K_k = ⊕_{m=0}^{k} H_m,    (16)
and let us assume that the following additional assumptions hold:
1. f ∈ K_k ⟹ f̄ ∈ K_k;
2. f ∈ K_k, g ∈ K_l ⟹ fg ∈ K_{k+l};
3. Quadrature formula: for all k ∈ ℕ, there exist a finite subset χ_k of Y and positive
real numbers λ_ξ > 0, indexed by the elements ξ of χ_k, such that
∀f ∈ K_k, ∫ f dµ = Σ_{ξ∈χ_k} λ_ξ f(ξ).
Then the operator M_j defined in the subsection above is such that M_j(x, z) = M_j(z, x)
and
z ↦ M_j(x, z) ∈ K_{2^{j+1}−1},
so that
z ↦ M_j(x, z) M_j(z, y) ∈ K_{2^{j+2}−2},
and we can write:
Λ_j(x, y) = ∫ M_j(x, z) M_j(z, y) dµ(z) = Σ_{ξ∈χ_{2^{j+2}−2}} λ_ξ M_j(x, ξ) M_j(ξ, y).
This implies:
Λ_j f(x) = ∫ Λ_j(x, y) f(y) dµ(y) = Σ_{ξ∈χ_{2^{j+2}−2}} λ_ξ M_j(x, ξ) ∫ M_j(ξ, y) f(y) dµ(y)
= Σ_{ξ∈χ_{2^{j+2}−2}} √λ_ξ M_j(x, ξ) ∫ √λ_ξ M_j(y, ξ) f(y) dµ(y).
This can be summarized in the following way: if we put χ_{2^{j+2}−2} = Z_j and
ψ_{j,ξ}(x) := √λ_ξ M_j(x, ξ),
then
Λ_j f(x) = Σ_{ξ∈Z_j} ⟨f, ψ_{j,ξ}⟩ ψ_{j,ξ}(x).
Proposition 3.3. The family (ψ_{j,ξ})_{j∈ℕ, ξ∈Z_j} is a tight frame.
Proof. As
f = lim_{J→∞} (L_0(f) + Σ_{j≤J} Λ_j(f)),
we have
‖f‖² = lim_{J→∞} (⟨L_0(f), f⟩ + Σ_{j≤J} ⟨Λ_j(f), f⟩),
but
⟨Λ_j(f), f⟩ = Σ_{ξ∈Z_j} ⟨f, ψ_{j,ξ}⟩ ⟨ψ_{j,ξ}, f⟩ = Σ_{ξ∈Z_j} |⟨f, ψ_{j,ξ}⟩|²,
and, if ψ_0 is a normalized constant, ⟨L_0(f), f⟩ = |⟨f, ψ_0⟩|², so that
‖f‖² = |⟨f, ψ_0⟩|² + Σ_{j∈ℕ, ξ∈Z_j} |⟨f, ψ_{j,ξ}⟩|²,
but this is exactly the characterization of a tight frame we gave.
3.3. Localisation properties. This construction has been performed in different
frameworks by Petrushev and coauthors, giving in each situation very nice localisation
properties.
The following figure (thanks to Paolo Baldi) is an illustration of this phenomenon:
it shows a needlet constructed as explained above using Legendre polynomials of degree
2^8. The highly oscillating function is a Legendre polynomial of degree 2^8, whereas the
localised one is a needlet centered approximately in the middle of the interval. Its localisation properties are remarkable considering the fact that both functions are polynomials
of the same order.
[Figure: a needlet built from Legendre polynomials of degree 2^8 (localised curve), plotted on [−1, 1] together with a single Legendre polynomial of degree 2^8 (highly oscillating curve).]
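A figure of this kind can be reproduced with a few lines of code. The sketch below builds a Legendre needlet on [−1, 1] (with dµ = dx, so that H_k = span{P_k}): it combines the Littlewood-Paley weights a(k/2^j) with a Gauss-Legendre quadrature, which is exact for polynomials of degree 2^{j+2} − 2 as required. The functions phi and a2 repeat the cutoff of the previous sketch so the block is self-contained, and the level and node index are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def phi(t):                      # same C-infinity cutoff as in the previous sketch
    u = np.clip(2.0 * (np.abs(np.asarray(t, dtype=float)) - 0.5), 0.0, 1.0)
    f = lambda v: np.exp(-1.0 / np.maximum(v, 1e-300))
    return f(1.0 - u) / (f(u) + f(1.0 - u))

def a2(t):
    return phi(t / 2.0) - phi(t)

def needlet(j, m, x):
    """psi_{j,xi}(x) = sqrt(lambda_xi) M_j(x, xi) for the m-th Gauss-Legendre node xi."""
    kmax = 2 ** (j + 1) - 1                    # a(k/2^j) vanishes for k >= 2^{j+1}
    k = np.arange(kmax + 1)
    a_k = np.sqrt(np.maximum(a2(k / 2.0 ** j), 0.0))
    nodes, weights = leggauss(2 ** (j + 1))    # exact up to degree 2^{j+2}-1 >= 2^{j+2}-2
    xi, lam = nodes[m], weights[m]             # cubature point and weight lambda_xi
    norm = (2 * k + 1) / 2.0                   # Pbar_k = sqrt((2k+1)/2) P_k is orthonormal
    P_xi = legvander(np.array([xi]), kmax)[0]  # P_k(xi)
    P_x = legvander(np.atleast_1d(x), kmax)    # P_k(x) for each x
    return np.sqrt(lam) * P_x @ (a_k * norm * P_xi)

x = np.linspace(-1.0, 1.0, 2001)
psi = needlet(j=5, m=16, x=x)   # localised around the 17th node of the level-5 cubature
```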
In the case of the sphere of ℝ^{d+1}, where the spaces H_k are spanned by spherical harmonics, the following localisation property is proved in Narcowich, Petrushev and Ward [33]: for any k there exists a constant C_k such that
|ψ_{jη}(ξ)| ≤ C_k 2^{dj/2} / [1 + 2^j arccos⟨η, ξ⟩]^k.
A similar result exists in the case of Laguerre polynomials on ℝ_+ [25].
In the case of Jacobi polynomials on the interval with Jacobi weight, the following localisation property is proved in Petrushev and Xu [38]: for any k there exist constants C, c such that
|ψ_{jη}(cos θ)| ≤ C 2^{j/2} / [(1 + (2^j |θ − arccos η|)^k) √(w_{αβ}(2^j, cos θ))],
where w_{αβ}(n, x) = (1 − x + n^{−2})^{α+1/2} (1 + x + n^{−2})^{β+1/2}, −1 ≤ x ≤ 1, if α > −1/2, β > −1/2.
4. NEED-VD in the Jacobi case
Let us now come back to the estimation algorithm.
We consider now an operator K such that
H = L²(I, dγ(x)), I = [−1, 1], dγ(x) = ω_{α,β}(x) dx, ω_{α,β}(x) = (1 − x)^α (1 + x)^β; α > −1/2, β > −1/2.
For the sake of simplicity, let us suppose α ≥ β (otherwise we can exchange the parameters).
Let P_k be the normalized Jacobi polynomials for this weight. We suppose that these
polynomials appear as the SVD basis of the operator K, as is the case for the Wicksell
problem, with β = 0, α = 1, b_k ∼ k^{−1/2}.
4.1. Needlets and condition (10). Let us define the 'needlets' as constructed
above:
ψ_{j,η_k}(x) = √(b_{j,η_k}) Σ_l â(l/2^{j−1}) P_l(x) P_l(η_k),    (17)
the b_{j,η_k} denoting here the cubature weights.
(17)
The following proposition proves that such a construction always implies the condition
(10) in the regular case.
Proposition 4.1. As soon as ψj,ηk is a frame, if bi ∼ i−ν , then
σj2 :=
i
X ψjk
[
]2 ≤ C22jν
b
i
i
Proof. : As soon as the family ψj,ηk is a frame (not necessarily tight), as the elements of a
i
frame are bounded, and the set {i, ψjk
6= 0} is included into the set {C1 2j , . . . , C2 2j }, we
have,
i
X ψjk
[
]2 ≤ C2jν kψj,ηk k2 ≤ C ′ 2jν
b
i
i
4.2. Convergence results in the Jacobi case. The following theorem is the
analogue of Theorem 2.1 in this case. As can be seen, the results here are at the same
time more difficult to obtain (the following theorem does not cover the same range as the
previous one) and richer, since they furnish new rates of convergence.
Theorem 4.2. Let us suppose that we are in the Jacobi case as stated above (α ≥ β > −1/2).
We put
t_ε = ε √(log 1/ε),  2^J = t_ε^{−2/(1+2ν)},
choose κ ≥ 16p [1 + (α/2 − (α+1)/p)_+], and suppose that we are in the regular case, i.e.
b_i ∼ i^{−ν},  ν > −1/2.
Then if f = Σ_j Σ_k β_{j,η_k} ψ_{j,η_k} is such that
(Σ_k |β_{j,η_k}|^p ‖ψ_{j,η_k}‖_p^p)^{1/p} ≤ ρ_j 2^{−js},  (ρ_j)_j ∈ l_r,
then
E‖f̂ − f‖_p^p ≤ C [log(1/ε)]^{p−1} [ε √(log(1/ε))]^{µp},
with:
1. if p < 2 + 1/(α + 1/2),
µ = s / (s + ν + 1/2);
2. if p > 2 + 1/(α + 1/2),
µ = s / (s + ν + α + 1 − 2(1+α)/p).
This theorem is proved in Kerkyacharian et al. [27]. Simulation results on these methods are given there, showing that their performances are far above those of the usual SVD methods
in several cases. It is interesting to notice that the rates of convergence obtained
here agree with the minimax rates evaluated in Johnstone and Silverman [23], where the
case p = 2 is considered. But the second case (p > 2 + 1/(α + 1/2)) shows a rate of convergence
which is new in the literature. In [27], where the whole range of Besov bodies is considered,
more atypical rates are given.
5. Direct Models (K = I): a memento
5.1. The density model. The most famous nonparametric model consists in observing
n i.i.d. random variables having a common density f on the interval [0,1], and trying to
give an estimation of f.
A standard route to perform this estimation consists in expanding the density f in an
orthonormal basis {e_k, k ∈ ℕ} of a Hilbert space H (assuming implicitly that f belongs
to H):
f = Σ_{l∈ℕ} θ_l e_l.
If H happens to be the space L² = {g : [0,1] → ℝ, ‖g‖₂² := ∫_0^1 g² < ∞}, we observe
that
θ_l = ∫_0^1 e_l(x) f(x) dx = E e_l(X_i).
Replacing the expectation by its empirical counterpart leads to a standard estimate for θ_l:
θ̂_l = (1/n) Σ_{i=1}^{n} e_l(X_i).
At this step, the simplest choice of estimate for f is obviously:
f̂_m = Σ_{l=1}^{m} θ̂_l e_l.    (18)
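As an illustration, here is a minimal sketch of the projection estimator (18); the basis (the trigonometric basis of L²[0,1]), the sample distribution and the cut-off m are all hypothetical choices made purely for illustration.

```python
import numpy as np

def density_projection(X, m):
    """Projection density estimate (18) on the trigonometric basis of L2[0,1]:
    e_0 = 1, e_{2l-1}(x) = sqrt(2) cos(2 pi l x), e_{2l}(x) = sqrt(2) sin(2 pi l x)."""
    def e(l, x):
        if l == 0:
            return np.ones_like(x)
        freq = (l + 1) // 2
        trig = np.cos if l % 2 else np.sin
        return np.sqrt(2.0) * trig(2.0 * np.pi * freq * x)
    theta = [np.mean(e(l, X)) for l in range(m + 1)]  # theta_hat_l = (1/n) sum_i e_l(X_i)
    return lambda x: sum(t * e(l, x) for l, t in enumerate(theta))

X = np.random.default_rng(0).beta(2.0, 5.0, size=2000)   # hypothetical sample
f_hat = density_projection(X, m=15)
```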
5.2. From the density to the white noise model. Before analysing the
properties of the estimator defined above, let us observe that the previous approach (representing f by its coefficients {θ_k, k ≥ 0}) leads to summarizing the information in the
following sequence model:
{θ̂_k, k ≥ 0}.    (19)
We can write θ̂_k =: θ_k + u_k, with
u_k = (1/n) Σ_{i=1}^{n} [e_k(X_i) − θ_k].
The central limit theorem is a relatively convincing argument that the model (19) may be
approximated by the following one:
{θ̂_k = θ_k + η_k/√n, k ≥ 0},    (20)
where the η_k's form a sequence of i.i.d. gaussian centered variables with fixed
variance σ², say. Such an approximation requires more delicate calculations than these
quick arguments; it is rigorously proved in Nussbaum [35], see also Brown and Low [5].
This model is the sequence space model associated to the following global observation, the so-called white noise model (with ε = n^{−1/2}):
dY_t = f(t) dt + ε dW_t, t ∈ [0,1],
where for any ϕ ∈ L²([0,1], dt), ∫_{[0,1]} ϕ dY_t = ∫_{[0,1]} f ϕ dt + ε ∫_{[0,1]} ϕ dW_t is observable.
(20) formally consists in considering all the observables obtained for ϕ = e_k, for all
k in ℕ. Among nonparametric situations, the white noise model considered above is one
of the simplest, at least technically. Mostly for this reason, this model has been given a
central place in statistics, particularly by the Russian school, following Ibragimov and
Hasminskii (see for instance their book [20]). However, it arises as an appropriate large
sample limit of more general nonparametric models, such as regression with random design,
non independent spectrum estimation, or diffusion models; see for instance [21], [4].
5.3. The linear estimation: how to choose the tuning parameter m?
In (18), the choice of m is crucial.
To better understand the situation, let us have a look at the risk of the strategy f̂_m. If
we consider that, when deciding f̂_m when f is the truth, we have a loss of order ‖f̂_m − f‖₂²,
then our risk will be the following mathematical expectation:
E‖f̂_m − f‖₂².
Of course this way of measuring our risk is arguable, since there is no particular reason for
the L² norm to reflect well the features we want to recover in the signal. For instance, an
L^∞ norm could be preferred because it is easier to visualize. In general, several L_p norms
are considered (as is the case in sections 2.1 and 4.2). Here we restrict to the L² case
for the sake of simplicity.
To avoid technical difficulties, we set ourselves in the case of a white noise model,
considering that we observe the sequence defined in (20). Hence,
E(θ̂_l − θ_l)² = (1/n) ∫_0^1 e_l(x)² dx = 1/n := ε².
We are now able to obtain:
E‖f̂_m − f‖₂² = Σ_{l≤m} E(θ̂_l − θ_l)² + Σ_{l>m} θ_l² ≤ m ε² + Σ_{l>m} θ_l².
If we now assume that
∀k ∈ ℕ*, Σ_{l>k} θ_l² ≤ M k^{−2s}    (21)
for some s > 0, which is here an index of regularity directly connected to the size of the
compact set of l² in which the function f is supposed to lie, we get
E‖f̂_m − f‖₂² ≤ m ε² + M m^{−2s}.
We observe that the RHS is the sum of two terms: one (called the stochastic term) is
increasing in m and reflects the fact that, because of the noise, the more coefficients we
have to estimate, the larger the global error will be. The second one (called the bias term
or approximation term) does not depend on the noise and is decreasing in m. The RHS is
optimised by choosing m = m*(s) := c(s, M) ε^{−2/(1+2s)}. Then
E‖f̂_{m*(s)} − f‖₂² ≤ c′(s, M) ε^{4s/(1+2s)}.
Let us observe that the more regular f is supposed to be (in the sense that s is larger), the
fewer coefficients we need to estimate: a very irregular function (s close to 0) requires almost
as many as ε^{−2} = n coefficients, which corresponds to estimating as many coefficients as the
number of available observations (in the density model, for instance). The rate obtained
above can be proved to be optimal in the following (minimax) sense: if we consider
the best estimator in its worst performance over the class of functions verifying (21), this
estimator attains a rate of convergence which is (up to a constant) the one given above.
See Tsybakov [41] for a detailed review of the minimax point of view.
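The bias-variance balance above is easy to visualize numerically: the sketch below (with arbitrary values of ε, s and M) minimizes m ε² + M m^{−2s} over m and compares the minimizer with the theoretical order ε^{−2/(1+2s)}.

```python
import numpy as np

eps, s, M = 1e-3, 2.0, 1.0                 # hypothetical noise level and regularity
m = np.arange(1, 5000)
risk = m * eps ** 2 + M * m ** (-2.0 * s)  # stochastic term + bias term
m_star = m[np.argmin(risk)]
print(m_star, eps ** (-2.0 / (1 + 2 * s))) # both of order eps^{-2/(1+2s)}
```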
5.4. The thresholding estimation. Let us now suppose that the constant s,
which plays an essential role in the construction of the previous estimator, is not known.
This is realistic, since it is extremely rare to know in advance that the function we are seeking has a specified regularity. Also, the previous approach takes very seriously into account
the order in which the basis is taken. Let us now present a very elegant way of addressing
both of these issues at the same time. Thresholding techniques, which have long been known
to engineers in electronics and telecommunications, were introduced in statistics in
Donoho and Johnstone [11] and later in a series of papers on wavelet thresholding [12],
[13]. They ally numerical simplicity to asymptotic optimality.
One starts from a different kind of observation: let us introduce the following estimate:
f̃ = Σ_{k=0}^{B} θ̂_k I{|θ̂_k| ≥ κ t_ε} e_k.    (22)
(22)
Here the point of view is the following. We choose B very large (i.e. almost corresponding
to s = 0)
B = ε−2 log 1/ε.
But instead of keeping all the coefficients which numbers are between 0 and B, we decide to
kill those which are not above the threshold tε . The intuitive justification of this choice is
the following. Assuming that f has some kind of regularity conditions like (21) (unknown,
but real...), essentially means that the coefficients θk of f are small except a small number
of them. Hence, in the reconstruction of f , only these ones will be significant. tε is chosen,
in such a way that the noise θ̂k − θk due to the randomness of the observation might be
neglected:
tε = ε[log 1/ε]−1/2 .
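In code, the estimator (22) is essentially a one-liner; the sketch below applies it to a toy sparse sequence observed in gaussian noise (the value of κ and the signal are arbitrary illustrative choices).

```python
import numpy as np

def threshold_estimate(theta_hat, eps, kappa=6.0):
    """Hard thresholding as in (22); theta_hat is assumed to be already
    restricted to its first B coordinates, and kappa = 6 is arbitrary."""
    t_eps = eps * np.sqrt(np.log(1.0 / eps))
    return np.where(np.abs(theta_hat) >= kappa * t_eps, theta_hat, 0.0)

# toy sequence model (20): a sparse theta observed in noise of amplitude eps
rng = np.random.default_rng(0)
eps = 1e-2
theta = np.zeros(1000)
theta[[3, 17, 60]] = [1.0, -0.7, 0.4]      # the few significant coefficients
theta_tilde = threshold_estimate(theta + eps * rng.normal(size=1000), eps)
```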
Now, let us assume another type of condition on f, namely: there exists a constant
0 < q < 2 such that
sup_{λ>0} λ^q #{k, |θ_k| ≥ λ} ≤ M.    (23)
Then:
E‖f̃ − f‖₂² = Σ_{l≤B} E(θ̂_l I{|θ̂_l| ≥ κ t_ε} − θ_l)² + Σ_{l>B} θ_l²
≤ Σ_{l≤B} E(θ̂_l − θ_l)² I{|θ_l| ≥ κ t_ε/2} + Σ_l θ_l² I{|θ_l| ≤ 2κ t_ε}
+ Σ_{l≤B} E[(θ̂_l − θ_l)² + θ_l²] I{|θ̂_l − θ_l| ≥ κ t_ε/2} + Σ_{l>B} θ_l².
Now, using the following probabilistic bounds,
E(θ̂_l − θ_l)² = ε²,  P(|θ̂_l − θ_l| ≥ λ) ≤ 2 exp(−λ²/2ε²), ∀λ > 0,
and the fact that condition (23) implies
Σ_l θ_l² I{|θ_l| ≤ 2κ t_ε} ≤ C t_ε^{2−q},
we get
E‖f̃ − f‖₂² ≤ M ε² t_ε^{−q} + C′ t_ε^{2−q} + ε^{κ²/8} B + Σ_{l>B} θ_l².
It remains now to choose κ² ≥ 32 to get
E‖f̃ − f‖₂² ≤ C′ t_ε^{2−q} + Σ_{l>B} θ_l²,
and if we assume, in addition to (23),
∀k ∈ ℕ*, Σ_{l>k} θ_l² ≤ M k^{−(2−q)/2},    (24)
we get
E‖f̃ − f‖₂² ≤ C″ t_ε^{2−q}.
Note that the interesting point in this construction is that the regularity conditions which
are assumed on the function f are not known by the statistician, since they do not enter
into the construction of the procedure. This property is called adaptation.
Now, to compare with the previous section, let us take q = 2/(1+2s); it is not difficult
to prove that as soon as f verifies (21), it automatically verifies (23) and (24). Hence
f̃ and f̂_{m*(s)} have the same rate of convergence, up to a logarithmic term. If we neglect
this logarithmic loss, we substantially gain the fact that we need not know a priori the
regularity conditions on the target function. It can also be proved that in fact conditions (23)
and (24) define a set which is substantially larger than the set defined by condition
(21): for instance, its entropy is strictly larger (see [26]).
6. Appendix I: Proof of Theorem 2.1
In this proof, C will denote an absolute constant which may change from one line to the
other.
We can always suppose p ≥ π. Indeed, if π ≥ p, it is very simple to see that B_{π,r}^s(M) is
included in B_{p,r}^s(M′), since 2^{j[s+1/2−1/p]} ‖β_{j·}‖_{l_p} ≤ C 2^{j[s+1/2−1/π]} ‖β_{j·}‖_{l_π} (as the cardinality of χ_j is
of order 2^j).
First we have the following decomposition:
E‖f̂ − f‖_p^p ≤ 2^{p−1} {E‖Σ_{j=−1}^{J} Σ_{k∈χ_j} (t(β̂_{jk}) − β_{jk}) ψ_{jk}‖_p^p + ‖Σ_{j>J} Σ_{k∈χ_j} β_{jk} ψ_{jk}‖_p^p}
=: I + II.
The term II is easy to analyse, as follows. Since f belongs to B_{π,r}^s(M), using standard
embedding results (which in this case simply follow from direct comparisons between l_q
norms), we have that f also belongs to B_{p,r}^{s−(1/π−1/p)_+}(M′), for some constant M′. Hence
‖Σ_{j>J} Σ_{k∈χ_j} β_{jk} ψ_{jk}‖_p ≤ C 2^{−J[s−(1/π−1/p)_+]}.
Then we only need to verify that [s − (1/π − 1/p)_+]/(1+2ν) is always larger than α, which
is not difficult.
Bounding the term I is more involved. Using the triangular inequality together with the Hölder
inequality, and property (9) for the second line, we get
I ≤ 2^{p−1} J^{p−1} Σ_{j=−1}^{J} E‖Σ_{k∈χ_j} (t(β̂_{jk}) − β_{jk}) ψ_{jk}‖_p^p
≤ 2^{p−1} J^{p−1} D_p Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p.
Now we separate four cases:
Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p
= Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p [I{|β̂_{jk}| ≥ κ t_ε σ_j} + I{|β̂_{jk}| < κ t_ε σ_j}]
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} E|β̂_{jk} − β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β̂_{jk}| ≥ κ t_ε σ_j} [I{|β_{jk}| ≥ (κ/2) t_ε σ_j} + I{|β_{jk}| < (κ/2) t_ε σ_j}]
+ |β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β̂_{jk}| ≤ κ t_ε σ_j} [I{|β_{jk}| ≥ 2κ t_ε σ_j} + I{|β_{jk}| < 2κ t_ε σ_j}]
=: Bb + Bs + Sb + Ss.
If we notice that β̂_{jk} − β_{jk} = Σ_i ((Y_i − b_i f_i)/b_i) ψ_{jk}^i = ε Σ_i ξ_i ψ_{jk}^i/b_i is a centered gaussian
random variable with variance ε² Σ_i [ψ_{jk}^i/b_i]², we have, using standard properties of the gaussian
distribution (recall that we set σ_j² := Σ_i [ψ_{jk}^i/b_i]² ≤ C 2^{2jν}, and denote by s_q the qth absolute
moment of the standard gaussian distribution), for any q ≥ 1:
E|β̂_{jk} − β_{jk}|^q ≤ s_q σ_j^q ε^q,
P{|β̂_{jk} − β_{jk}| ≥ (κ/2) t_ε σ_j} ≤ 2 ε^{κ²/8}.
Hence,
Bb ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} s_p σ_j^p ε^p ‖ψ_{jk}‖_p^p I{|β_{jk}| ≥ (κ/2) t_ε σ_j},
Ss ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β_{jk}| < 2κ t_ε σ_j}.
And,
Bs ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} [E|β̂_{jk} − β_{jk}|^{2p}]^{1/2} [P{|β̂_{jk} − β_{jk}| ≥ (κ/2) t_ε σ_j}]^{1/2} ‖ψ_{jk}‖_p^p I{|β_{jk}| < (κ/2) t_ε σ_j}
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} s_{2p}^{1/2} σ_j^p ε^p 2^{1/2} ε^{κ²/16} ‖ψ_{jk}‖_p^p
≤ C Σ_{j=−1}^{J} 2^{jp(ν+1/2)} ε^p ε^{κ²/16} ≤ C ε^{κ²/16}.
Now, if we remark that the β_{jk} are necessarily all bounded by some constant (depending
on M) since f belongs to B_{π,r}^s(M), and using (8),
Sb ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p P{|β̂_{jk} − β_{jk}| ≥ κ t_ε σ_j} I{|β_{jk}| ≥ 2κ t_ε σ_j}
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p 2 ε^{κ²/8} I{|β_{jk}| ≥ 2κ t_ε σ_j}
≤ C Σ_{j=−1}^{J} 2^{jp/2} ε^{κ²/8} ≤ C ε^{κ²/8} 2^{Jp/2} = C ε^{κ²/8} t_ε^{−p/(2ν+1)}.
It is easy to check that in all cases, if κ² ≥ 16p, the terms Bs and Sb are smaller than
the rates announced in the theorem.
We have, using (8) and condition (10), for any z ≥ 0:
Bb ≤ C ε^p Σ_{j=−1}^{J} 2^{j(νp+p/2−1)} Σ_{k∈χ_j} I{|β_{jk}| ≥ (κ/2) t_ε σ_j}
≤ C ε^p Σ_{j=−1}^{J} 2^{j(νp+p/2−1)} Σ_{k∈χ_j} |β_{jk}|^z [t_ε σ_j]^{−z}
≤ C t_ε^{p−z} Σ_{j=−1}^{J} 2^{j[ν(p−z)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^z.
Also, for any p ≥ z ≥ 0,
Ss ≤ C Σ_{j=−1}^{J} 2^{j(p/2−1)} Σ_{k∈χ_j} |β_{jk}|^z σ_j^{p−z} [t_ε]^{p−z}
≤ C [t_ε]^{p−z} Σ_{j=−1}^{J} 2^{j[ν(p−z)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^z.
So in both cases we have the same bound to investigate. We will write this bound in
the following form (forgetting the constant):
I + II = t_ε^{p−z₁} [Σ_{j=−1}^{j₀} 2^{j[ν(p−z₁)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{z₁}] + t_ε^{p−z₂} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−z₂)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{z₂}].
The constants z_i and j₀ will be chosen depending on the cases.
Let us first consider the case where s ≥ (ν + 1/2)(p/π − 1); put
q = p(2ν+1) / (2(s+ν)+1)
and observe that on the considered domain, q ≤ π and p > q. In the sequel it will be used
that we automatically have s = (ν + 1/2)(p/q − 1). Now, taking z₂ = π, we get:
II ≤ t_ε^{p−π} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−π)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^π].
Now, as
p/(2q) − 1/π + ν(p/q − 1) = s + 1/2 − 1/π
and
Σ_{k∈χ_j} |β_{jk}|^π = 2^{−jπ(s+1/2−1/π)} τ_j^π
with (τ_j)_j ∈ l_r (this is a consequence of the fact that f ∈ B_{π,r}^s(M) and (7)), we can write:
II ≤ t_ε^{p−π} Σ_{j>j₀} 2^{jp(1−π/q)(ν+1/2)} τ_j^π ≤ C t_ε^{p−π} 2^{j₀ p(1−π/q)(ν+1/2)}.
The last inequality is true for any r ≥ 1 if π > q, and for r ≤ π if π = q. Notice that π = q
is equivalent to s = (ν + 1/2)(p/π − 1). Now if we choose j₀ such that 2^{j₀ (p/q)(ν+1/2)} ∼ t_ε^{−1}, we get
the bound
t_ε^{p−q},
which exactly gives the rate announced in the theorem for this case.
As for the first part of the sum (before j₀), we have, taking now z₁ = q̃, with q̃ ≤ π, so
that [2^{−j} Σ_{k∈χ_j} |β_{jk}|^{q̃}]^{1/q̃} ≤ [2^{−j} Σ_{k∈χ_j} |β_{jk}|^π]^{1/π}, and using again (7):
I ≤ t_ε^{p−q̃} [Σ_{j=−1}^{j₀} 2^{j[ν(p−q̃)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{q̃}]
≤ t_ε^{p−q̃} [Σ_{j=−1}^{j₀} 2^{j[ν(p−q̃)+p/2−q̃/π]} (Σ_{k∈χ_j} |β_{jk}|^π)^{q̃/π}]
≤ t_ε^{p−q̃} Σ_{j=−1}^{j₀} 2^{j(ν+1/2)p(1−q̃/q)} τ_j^{q̃}
≤ C t_ε^{p−q̃} 2^{j₀ (ν+1/2)p(1−q̃/q)}
≤ C t_ε^{p−q}.
The last two lines are valid if q̃ is chosen strictly smaller than q (this is possible since
π ≥ q).
Let us now consider the case where s < (ν + 1/2)(p/π − 1), and choose now
q = (p(2ν+1) − 2) / (2(s + ν − 1/π) + 1),
in such a way that we easily verify that
p − q = 2p (s − 1/π + 1/p) / (1 + 2(ν + s − 1/π)),  q − π = [(p − π)(1 + 2ν) − 2πs] / (2(s + ν − 1/π) + 1) > 0,
the last quantity being positive because s is supposed to be smaller than (ν + 1/2)(p/π − 1).
Furthermore we also have
s + 1/2 − 1/π = ν(p/q − 1) + p/(2q) − 1/q.
Hence, taking z₁ = π and using again the fact that f belongs to B_{π,r}^s(M),
I ≤ t_ε^{p−π} Σ_{j=−1}^{j₀} 2^{j[ν(p−π)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^π
≤ t_ε^{p−π} Σ_{j=−1}^{j₀} 2^{j(ν+1/2−1/p)(p/q)(q−π)} τ_j^π
≤ C t_ε^{p−π} 2^{j₀ (ν+1/2−1/p)(p/q)(q−π)}.
This is true since q − π > 0 and ν + 1/2 − 1/p is also strictly positive because of our constraints.
If we now take 2^{j₀ (p/q)(ν+1/2−1/p)} ∼ t_ε^{−1}, we get the bound
t_ε^{p−q},
which is the rate announced in the theorem for this case.
Again, for II, we have, taking now z₂ = q̃ > q (> π):
II ≤ t_ε^{p−q̃} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−q̃)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{q̃}]
≤ C t_ε^{p−q̃} Σ_{j=j₀+1}^{J} 2^{j(ν+1/2−1/p)(p/q)(q−q̃)} τ_j^{q̃}
≤ C t_ε^{p−q̃} 2^{j₀ (ν+1/2−1/p)(p/q)(q−q̃)}
≤ C t_ε^{p−q}.
References
[1] F. Abramovich and B. W. Silverman. Wavelet decomposition approaches to statistical inverse
problems. Biometrika, 85(1):115–129, 1998.
[2] A. Antoniadis and J. Bigot. Poisson inverse models. 2004. Preprint Grenoble.
[3] A. Antoniadis, J. Fan, and I. Gijbels. A wavelet method for unfolding sphere size distributions. Canad. J. Statist., 29:265–290, 2001.
[4] Lawrence D. Brown, T. Tony Cai, Mark G. Low, and Cun-Hui Zhang. Asymptotic equivalence
theory for nonparametric regression with random design. Ann. Statist., 30(3):688–707, 2002.
[5] Lawrence D. Brown and Mark G. Low. Asymptotic equivalence of nonparametric regression
and white noise. Ann. Statist., 24(6):2384–2398, 1996.
[6] L. Cavalier, G. K. Golubev, D. Picard, and A. B. Tsybakov. Oracle inequalities for inverse
problems. Ann. Statist., 30(3):843–874, 2002.
[7] Laurent Cavalier and Alexandre Tsybakov. Sharp adaptation for inverse problems with random noise. Probab. Theory Related Fields, 123(3):323–354, 2002.
[8] Albert Cohen, Marc Hoffmann, and Markus Reiß. Adaptive wavelet Galerkin methods for
linear inverse problems. SIAM J. Numer. Anal., 42(4):1479–1501 (electronic), 2004.
[9] L.M. Cruz-Orive. Distribution-free estimation of sphere size distributions from slabs showing
overprojections and truncations, with a review of previous methods. J. Microscopy, 131:265–
290, 1983.
[10] V. Dicken and P. Maass. Wavelet-Galerkin methods for ill-posed problems. J. Inverse Ill-Posed Probl., 4(3):203–221, 1996.
[11] D. L. Donoho and I. M. Johnstone. Minimax risk over ℓp -balls for ℓq -error. Probability Theory
and Related Fields, 99:277–303, 1994.
[12] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: Asymptopia? Journal of the Royal Statistical Society, Series B, 57:301–369, 1995. With Discussion.
[13] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Density estimation by
wavelet thresholding. Annals of Statistics, 24:508–539, 1996.
[14] David L. Donoho. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal., 2(2):101–126, 1995.
[15] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math.
Soc., 72:341–366, 1952.
[16] Sam Efromovich and Vladimir Koltchinskii. On inverse problems with unknown operators.
IEEE Trans. Inform. Theory, 47(7):2876–2894, 2001.
[17] J. Fan and J.K. Koo. Wavelet deconvolution. IEEE Transactions on Information Theory,
48(3):734–747, 2002.
[18] M. Frazier, B. Jawerth, and G. Weiss. Littlewood-Paley theory and the study of function spaces. CBMS Regional Conference Series in Mathematics, 79. AMS, 1991.
[19] Alexander Goldenshluger and Sergei V. Pereverzev. On adaptive inverse estimation of linear
functionals in Hilbert scales. Bernoulli, 9(5):783–807, 2003.
[20] I. A. Ibragimov and R. Z. Hasminskii. Statistical estimation, volume 16 of Applications of
Mathematics. Springer-Verlag, New York, 1981.
[21] Michael Jähnisch and Michael Nussbaum. Asymptotic equivalence for a model of independent
non identically distributed observations. Statist. Decisions, 21(3):197–218, 2003.
[22] Iain M. Johnstone, Gérard Kerkyacharian, Dominique Picard, and Marc Raimondo. Wavelet
deconvolution in a periodic setting. J. R. Stat. Soc. Ser. B Stat. Methodol., 66(3):547–573,
2004.
[23] Iain M. Johnstone and Bernard W. Silverman. Discretization effects in statistical inverse
problems. J. Complexity, 7(1):1–34, 1991.
[24] Jérôme Kalifa and Stéphane Mallat. Thresholding estimators for linear inverse problems and
deconvolutions. Ann. Statist., 31(1):58–109, 2003.
[25] G. Kerkyacharian, P. Petrushev, D. Picard, and Y. Xu. Localized polynomials and frames induced by Laguerre functions. 2005.
[26] G. Kerkyacharian and D. Picard. Thresholding algorithms and well-concentrated bases. Test,
9(2), 2000.
[27] G. Kerkyacharian, D. Picard, P. Petrushev, and T. Willer. NEED-VD: second generation wavelets for estimation in inverse problems. 2006. LPMA 2006.
[28] G. Kerkyacharian, D. Picard, and M. Raimondo. Adaptive boxcar deconvolution on full Lebesgue measure sets. 2005. Preprint LPMA.
[29] Bernard A. Mair and Frits H. Ruymgaart. Statistical inverse estimation in Hilbert scales.
SIAM J. Appl. Math., 56(5):1424–1444, 1996.
[30] Stéphane Mallat. A wavelet tour of signal processing. Academic Press Inc., San Diego, CA,
1998.
[31] Peter Mathé and Sergei V. Pereverzev. Geometry of linear ill-posed problems in variable
Hilbert scales. Inverse Problems, 19(3):789–803, 2003.
[32] Yves Meyer. Ondelettes et opérateurs. I. Actualités Mathématiques. [Current Mathematical
Topics]. Hermann, Paris, 1990.
[33] F. J. Narcowich, P. Petrushev, and J. D. Ward. Localized tight frames on spheres. 2005. IMI 2005:04.
[34] R. Neelamani, H. Choi, and R. Baraniuk. Wavelet-based deconvolution for ill-conditioned systems. 2000. http://www-dsp.rice.edu/publications/pub/neelsh98icassp.pdf.
[35] Michael Nussbaum. Asymptotic equivalence of density estimation and Gaussian white noise.
Ann. Statist., 24(6):2399–2430, 1996.
[36] D. Nychka, G. Wahba, S. Goldfarb, and T. Pugh. Cross validated spline methods for the estimation of three-dimensional tumor size distributions from observations on two-dimensional cross sections. J. Amer. Statist. Assoc., 79:832–846, 1984.
[37] M. Pensky and B. Vidakovic. Adaptive wavelet estimator for nonparametric density deconvolution. Annals of Statistics, 27:2033–2053, 1999.
[38] P. Petrushev and Y. Xu. Localized polynomial frames on the interval with Jacobi weights. 2005. IMI 2005.
[39] P. Petrushev and Y. Xu. Localized polynomial kernels and frames (needlets) on the ball. 2005. IMI 2005.
[40] Alexandre Tsybakov. On the best rate of adaptive estimation in some inverse problems. C.
R. Acad. Sci. Paris Sér. I Math., 330(9):835–840, 2000.
[41] Alexandre B. Tsybakov. Introduction à l’estimation non-paramétrique, volume 41 of
Mathématiques & Applications (Berlin) [Mathematics & Applications]. Springer-Verlag,
Berlin, 2004.
[42] S. D. Wicksell. The corpuscle problem: a mathematical study of a biometric problem. Biometrika, 17:84–99, 1925.
[43] T. Willer. Deconvolution in white noise with a random blurring effect. 2005. LPMA 2005.
LPMA, Universités Paris VII et Paris X, CNRS, 175 rue du Chevaleret, 75013 Paris, France