Estimation in Inverse Problems and second-generation wavelets
Dominique Picard and Gérard Kerkyacharian
Abstract. We consider the problem of recovering a function f when we receive a blurred (by a
linear operator) and noisy version: Y_ε = Kf + εẆ. We will have as guides two famous examples
of such inverse problems: the deconvolution problem and the Wicksell problem. The direct problem (K
is the identity) isolates the denoising operation. It cannot be solved unless one accepts estimating
a smoothed version of f: for instance, if f has an expansion on a basis, this smoothing might
correspond to stopping the expansion at some stage m. A crucial problem then lies in finding a
balance for m, considering the fact that for m large the difference between f and its smoothed
version is small, whereas the random effect introduces an error which increases with m. In the
true inverse problem, in addition to denoising we have to 'invert the operator' K, an operation
which not only creates the usual difficulties, but also introduces the necessity to control the additional
instability due to the inversion of the random noise. Our purpose here is to emphasize the fact that
in such a problem there generally exists a basis which is fully adapted to the problem, in which, for
instance, the inversion remains very stable: this is the Singular Value Decomposition (SVD) basis. On the
other hand, the SVD basis might be difficult to determine and to manipulate numerically; it also might
not be appropriate for the accurate description of the solution with a small number of parameters.
Moreover, in many practical situations, the signal exhibits inhomogeneous regularity, and its local
features are especially interesting to recover. In such cases, other bases (in particular localised
bases such as wavelet bases) may be much more appropriate to give a good representation of the
object at hand. Our approach here will be to produce estimation procedures that try to keep the
advantages of localisation without losing the stability and computability of SVD decompositions.
We will especially consider two cases. In the first one (which is the case of the deconvolution
example), we show that a fairly simple algorithm (WAVE-VD), using an appropriate thresholding
technique performed on a standard wavelet system, enables us to estimate the object with rates
which are almost optimal, up to logarithmic factors, for any L_p loss function and on the whole range
of Besov spaces. In the second case (which is the case of the Wicksell example, where the SVD basis
lies in the range of Jacobi polynomials), we prove that quite a similar algorithm (NEED-VD) can
be performed, provided the standard wavelet system is replaced by a second-generation wavelet-type
basis: the needlets. We use here the construction (essentially following the work of Petrushev
and co-authors) of a localised frame linked with a prescribed basis (here Jacobi polynomials),
using a Littlewood-Paley decomposition combined with a cubature formula. Section 5 describes
the direct case (K = I). It has its own interest and will act as a guide for understanding the
'true' inverse models for a reader who is unfamiliar with nonparametric statistical estimation;
it can be read first. Section 1 introduces the general inverse problem and describes the examples
of deconvolution and the Wicksell problem. A review of standard methods is given, with a special
focus on SVD methods. Section 2 describes the WAVE-VD procedure. Sections 3 and 4 give a
description of the needlet construction and the performances of the NEED-VD procedure.
Mathematics Subject Classification (2000). Primary 62G07, 62G20, 62C20.
Keywords. Nonparametric estimation, denoising, inverse models, thresholding, Meyer wavelet,
singular value decomposition, Littlewood-Paley decomposition.
1. Inverse Models
Let H and K be two Hilbert spaces and K : f ∈ H ↦ Kf ∈ K a linear operator. The
standard linear ill-posed inverse problem consists in recovering a good approximation f_ε of
f, the solution of
g = Kf,    (1)
when only a perturbation g_ε of g is observed. In this paper, we will consider the case where
this perturbation is an additive stochastic white noise. Namely, for f ∈ H, we observe Y_ε defined by
the following equation:
Y_ε = Kf + εẆ.    (2)
Here ε is the amplitude of the noise. It is supposed to be a small parameter which will tend to
0, and our error will be measured in terms of this small parameter.
Ẇ is a K-white noise: i.e. for any g, h in K, ξ(g) := (Ẇ, g)_K and ξ(h) := (Ẇ, h)_K
form a centered gaussian vector with variances ‖g‖²_K, ‖h‖²_K and
covariance (g, h)_K (with the obvious extension when one considers k functions instead of
2).
Equation (2) means that for any g in K we observe Y_ε(g) := (Y_ε, g)_K = (Kf, g)_K +
εξ(g), where ξ(g) ∼ N(0, ‖g‖²_K), and Y_ε(g), Y_ε(h) are independent random variables for
orthogonal functions g and h.
The case where K is the identity is called the 'direct model' and is summarized as a
memento in section 5. The reader who is unfamiliar with nonparametric statistical estimation is invited to refer to this section, which will act as a guide for understanding
the more general inverse models. In particular, it is recalled in section 5 that the model
(2) appears as an approximation of models arising in real practical situations, for
instance the case where (2) is replaced by a discretisation.
1.1. Two examples: Deconvolution and Wicksell’s problems.
1.1.1. Deconvolution. The following problem is probably one of the most famous inverse problems in signal processing. In the deconvolution problem, we consider the following
operator: let H = K be the set of square integrable periodic functions, with
the standard L²([0,1]) norm, and consider
f ∈ H ↦ Kf(u) = ∫_0^1 γ(u − t) f(t) dt ∈ H,    (3)
where γ is a known function of H, which is generally assumed to be regular
(often in the sense that its Fourier coefficients γ̂_k behave like k^{−ν}). Another very usual example
is the box-car function.
The following figures show four original signals to recover, which are well-known
test signals of the statistical literature. They display typical features which are difficult to
restore: bumps, blocks and Doppler effects. The second and third series of pictures show
their deformation after blurring (i.e. convolution with a regular function) and addition
of a noise. These figures show how the convolution regularizes the signal, making it very
difficult to recover, especially in its high frequency features. Their statistical investigation
can be found in [22].
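To fix ideas, the following minimal sketch simulates a discretised analogue of model (3): a signal is blurred by periodic convolution with a smooth kernel and corrupted by white noise. The signal, the kernel width and the noise level are hypothetical choices made purely for illustration, not the ones used in the figures.

```python
import numpy as np

# Discretised analogue of model (3): blur by circular convolution, add noise.
n = 1024
t = np.arange(n) / n
f = np.sin(2 * np.pi * t) + (np.abs(t - 0.5) < 0.05)      # smooth part plus a "block"
gamma = np.exp(-(((t + 0.5) % 1.0) - 0.5) ** 2 / (2 * 0.02 ** 2))
gamma /= gamma.sum()                                      # periodic blurring kernel
Kf = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(gamma)))  # convolution = product in Fourier
eps = 0.05                                                # hypothetical noise amplitude
y = Kf + eps * np.random.default_rng(1).normal(size=n)    # noisy-blurred observation
```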
A variant of this problem consists in observing Y_1, ..., Y_n, n independent and identically
distributed random variables, where each Y_i may be written as the sum Y_i =
X_i + U_i, where X_i and U_i are again independent, the distribution of U_i is known with
density γ, and we want to recover the common density of the X_i's. The direct problem is
the case where U_i = 0 for all i, and corresponds to a standard density estimation
problem (see section 5.1). Hence the variables U_i act as perturbations of the X_i's,
whose density is to be recovered.
[Figure: four original test signals, (a) Lidar, (b) Bumps, (c) Blocks, (d) Doppler, followed by their blurred versions and their noisy-blurred versions.]
1.1.2. Wicksell's problem. Another typical example is the following classical Wicksell
problem [42]. Suppose a population of spheres is embedded in a medium. The spheres have
radii that may be assumed to be drawn independently from a density f. A random plane
slice is taken through the medium, and those spheres that are intersected by the plane
furnish circles whose radii are the points of observation Y_1, ..., Y_n. The unfolding problem
is then to infer the density of the sphere radii from the observed circle radii. This unfolding
problem also arises in medicine, where the spheres might be tumors in an animal's liver
[36], as well as in numerous other contexts (biological, engineering, ...); see for instance [9].
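As an aside, here is a minimal sketch of how such data can be simulated, assuming (purely for illustration) a Beta density for the sphere radii; the size-biased selection step reflects the fact that a random plane hits a sphere with probability proportional to its radius.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# hypothetical sphere-radius density f: a Beta(2, 4) on [0, 1]
r = rng.beta(2.0, 4.0, size=5 * n)
# a random plane intersects a sphere with probability proportional to its radius
keep = rng.uniform(0.0, r.max(), size=r.size) < r     # size-biased selection
r = r[keep][:n]
# given a hit, the distance from the plane to the center is uniform on [0, r]
d = rng.uniform(0.0, r)
y = np.sqrt(r ** 2 - d ** 2)        # observed circle radii Y_1, ..., Y_n
```

The unfolding problem is then to recover the density f from the sample y.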
Following [42] and [23], Wicksell's problem corresponds to the following operator:
H = L²([0,1], dµ),  dµ(x) = (4x)^{−1} dx,
K = L²([0,1], dλ),  dλ(y) = 4π^{−1} (1 − y²)^{1/2} dy,
Kf(y) = (π/4) y (1 − y²)^{−1/2} ∫_y^1 (x² − y²)^{−1/2} f(x) dµ(x).
Notice, however, that in this presentation, again in order to avoid additional technicalities, we detail this problem in the white noise framework, which is simpler than the
original problem expressed above in density terms.
1.2. Singular Value Decomposition and projection methods. Let us
begin with a quick description of well-known methods in inverse problems with random
noise.
Under the assumption that K is compact, there exist two orthonormal bases (SVD bases),
(e_k) of H and (g_k) of K respectively, and a sequence (b_k) tending to 0 as k goes to
infinity, such that
K e_k = b_k g_k,  K* g_k = b_k e_k,
where K* is the adjoint operator.
For the sake of simplicity we suppose in the sequel that K and K* are injective. Otherwise
we have to take care of the kernels of these operators. The bases (e_k) and (g_k) are called
singular value bases, whereas the b_k's are simply called singular values.
Deconvolution. In this case simple calculations prove that the SVD bases (e_k) and (g_k) both
coincide with the Fourier basis. The singular values correspond to the Fourier
coefficients of the function γ:
b_k = γ̂_k.    (4)
Wicksell. In this case, following [23], we have the following SVD:
e_k(x) = 4(k + 1)^{1/2} x² P_k^{0,1}(2x² − 1),  g_k(y) = U_{2k+1}(y),
where P_k^{0,1} is the Jacobi polynomial of type (0,1) with degree k and U_k is the second-kind Chebyshev
polynomial with degree k. The singular values are
b_k = (π/16) (1 + k)^{−1/2}.    (5)
1.2.1. SVD Method. The Singular Value Decomposition (SVD) of K,
Kf = Σ_k b_k ⟨f, e_k⟩ g_k,
gives rise to approximations of the type
f_ε = Σ_{k=0}^{N} b_k^{−1} ⟨g_ε, g_k⟩ e_k,
where N = N(ε) has to be chosen properly. This SVD method is very attractive theoretically and can be shown to be asymptotically optimal in many situations (see Mathé
and Pereverzev [31], Cavalier and Tsybakov [7], Mair and Ruymgaart [29]). It also has
the big advantage of performing a quick and stable inversion of the operator. However, it
suffers from different types of limitations: the SVD bases might be difficult to determine
and manipulate numerically; secondly, while these bases are fully adapted to describe the
operator K, they might not be appropriate for the accurate description of the solution with
a small number of parameters.
Moreover, in many practical situations, the signal exhibits inhomogeneous regularity, and its
local features are especially interesting to recover. In such cases, other bases (in particular
localised bases such as wavelet bases) may be much more appropriate to give a good
representation of the object at hand.
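In the sequence space spanned by the SVD bases, the method reduces to dividing the first N observed coefficients by the singular values. The sketch below illustrates this on a toy sequence model; the coefficient decay, ν, ε and N are hypothetical values chosen for illustration.

```python
import numpy as np

def svd_estimate(Y, b, N):
    """Truncated SVD estimator f_eps = sum_{k<=N} b_k^{-1} <g_eps, g_k> e_k,
    expressed in sequence space: Y[k] stands for the observation <Y_eps, g_k>."""
    coef = np.zeros_like(Y)
    coef[:N + 1] = Y[:N + 1] / b[:N + 1]   # invert the operator on the first modes
    return coef                            # coefficients of f_eps in the basis (e_k)

# toy sequence model with polynomially decaying singular values b_k ~ k^{-nu}
rng = np.random.default_rng(0)
K_max, nu, eps = 2000, 1.0, 1e-3
k = np.arange(1, K_max + 1)
f_coef = k ** -1.5                         # hypothetical true coefficients of f
b = k ** (-nu)
Y = b * f_coef + eps * rng.normal(size=K_max)
f_hat = svd_estimate(Y, b, N=80)           # N = N(eps) balances bias and variance
```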
1.2.2. Projection methods. Projection methods, which are defined as solutions of (1)
restricted to finite dimensional subspaces H_N and K_N (of dimension N), also give rise to
attractive approximations of f, by properly choosing the subspaces and the tuning parameter N (Dicken and Maass [10], Mathé and Pereverzev [31]), together with their nonlinear
counterparts (Cavalier and Tsybakov [7], Cavalier et al. [6], Tsybakov [40], Goldenshluger
and Pereverzev [19], Efromovich and Koltchinskii [16]). In the case where H = K and K
is a self-adjoint operator, the system is particularly simple to solve, since the restricted
operator K_N is symmetric positive definite. This is the so-called Galerkin method. Obviously, restricting to finite subspaces has similar effects and can also be seen as a Tikhonov
regularisation, i.e. minimizing the least squares functional penalised by a regularisation
term.
The advantage of the Galerkin method is to allow the choice of the basis. However, it
suffers from the drawback of being unstable in many cases.
Comparing the SVD and Galerkin methods exactly states one main difficulty of the
problem. The possible antagonism between the SVD basis, in which the inversion of the
system is easy, and a 'localised' basis, in which the signal is sparsely represented, is the
issue we are trying to address here.
1.3. Cut-off, linear methods, thresholding. The reader may profitably refer
to subsections 5.3 and 5.4, where the linear methods and thresholding techniques are
presented in detail in the direct case.
SVD as well as Galerkin methods are very sensitive to the choice of the tuning parameter
N(ε). This problem can be solved theoretically; however, the solution depends heavily on
prior regularity assumptions on the solution, which have to be known in advance.
In the last ten years, many nonlinear methods have been developed, especially in
the direct case, with the objective of automatically adapting to the unknown smoothness
and local singular behavior of the solution. In the direct case, one of the most attractive
methods is probably wavelet thresholding, since it allies numerical simplicity to asymptotic
optimality on a large variety of functional classes such as Besov or Sobolev classes.
To adapt this approach to inverse problems, Donoho [14] introduced a wavelet-like decomposition specifically adapted to the operator K (wavelet-vaguelette decomposition)
and provided a thresholding algorithm on this decomposition. In Abramovich and Silverman [1], this method was compared with the similar vaguelette-wavelet decomposition.
Other wavelet approaches might be mentioned, such as Antoniadis and Bigot [2], Antoniadis et al. [3] and, especially for the deconvolution problem, Pensky and Vidakovic [37], Fan
and Koo [17], Kalifa and Mallat [24], Neelamani et al. [34].
Later, Cohen et al. [8] introduced an algorithm combining a Galerkin inversion with a
thresholding algorithm.
The approach developed in the sequel is greatly influenced by these previous works.
The accent we put here is on constructing (when necessary) new-generation wavelet-type
bases well adapted to the operator K, instead of sticking to the standard wavelet bases
and thereby reducing the range of potential operators covered by the method.
2. WAVE-VD-type estimation
We explain here the basic idea of the method, which is very simple. Let us expand f using
a well-suited basis (to be defined later: the 'wavelet-type' basis):
f = Σ_λ (f, ψ_λ)_H ψ_λ.
Using the Parseval identity, we have β_λ = (f, ψ_λ)_H = Σ_i f_i ψ_λ^i, for f_i = (f, e_i)_H and ψ_λ^i =
(ψ_λ, e_i)_H. Let us put Y_i = (Y_ε, g_i)_K; we have
Y_i = (Kf, g_i)_K + εξ_i = (f, K*g_i)_H + εξ_i = (Σ_j f_j e_j, K*g_i)_H + εξ_i = b_i f_i + εξ_i,
where the ξ_i's form a sequence of i.i.d. centered gaussian variables with variance 1. Then
β̂_λ = Σ_i (Y_i / b_i) ψ_λ^i
is such that E(β̂_λ) = β_λ (i.e. its average value is β_λ): it is a plausible estimate of β_λ. Let
us now put ourselves in a multiresolution setting, taking λ = (j, k), for j ≥ 0, k belonging
to a set χ_j, and consider:
f̂ = Σ_{j=−1}^{J} Σ_{k∈χ_j} t(β̂_{jk}) ψ_{jk},
where t is a thresholding operator. (The reader who is unfamiliar with thresholding
techniques may profitably refer to section 5.4.)
t(β̂_{jk}) = β̂_{jk} I{|β̂_{jk}| ≥ κ t_ε σ_j},  t_ε = ε √(log 1/ε).    (6)
Here κ is a tuning parameter of the method which will be properly chosen later. A main
difference with the direct case is the fact that the thresholding depends on the
resolution level through the constant σ_j, which will also be specified later. However, our
main discussion will be on how to choose the basis (ψ_{jk}). We will see that in order to
perform in an optimal way, the method needs the basis (ψ_{jk}) to be 'coherent' with the
SVD basis.
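In sequence space, the whole procedure amounts to a few lines. The following sketch uses a hypothetical interface (Psi stores the coefficients ψ^i_{jk} of each wavelet in the SVD basis, and σ_j is computed from these coefficients rather than taken from condition (10)); it is an illustration of the estimator defined by (6), not the authors' implementation.

```python
import numpy as np

def wave_vd(Y, b, Psi, eps, kappa=1.0):
    """Sketch of the WAVE-VD estimator in the sequence model Y_i = b_i f_i + eps*xi_i.

    Y   : observed coefficients (Y_eps, g_i) in the SVD basis, length N
    b   : singular values b_i
    Psi : dict level j -> array Psi[j][k, i] = psi^i_{jk} (wavelet in SVD basis)
    kappa: tuning parameter of (6); its theoretical calibration is in Theorem 2.1.
    Returns the thresholded wavelet coefficients t(beta_hat_{jk}), level by level.
    """
    t_eps = eps * np.sqrt(np.log(1.0 / eps))
    out = {}
    for j, P in Psi.items():
        beta_hat = P @ (Y / b)           # beta_hat_{jk} = sum_i (Y_i / b_i) psi^i_{jk}
        # level-dependent scale: largest sigma_{jk} over k, in the spirit of (10)
        sigma_j = np.sqrt(np.max(np.sum((P / b) ** 2, axis=1)))
        out[j] = np.where(np.abs(beta_hat) >= kappa * t_eps * sigma_j, beta_hat, 0.0)
    return out
```

The estimate f̂ is then reconstructed as Σ_{j,k} t(β̂_{jk}) ψ_{jk}.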
We will particularly focus on two situations (corresponding to the two examples discussed in the introduction). In the first type of cases, the operator has the Fourier basis as
SVD basis. In this case, this 'coherence' is easily obtained with 'standard' wavelets (still,
not any kind of standard wavelets, as will be seen). However, more difficult problems (and
typically Wicksell's problem) require, when these coherence conditions have to be combined with the desired localisation property of the basis, the construction of new objects:
second-generation-type wavelets.
2.1. WAVE-VD in a wavelet scenario. It is well known (see for instance Meyer
[32]) that wavelet bases provide characterisations of smoothness spaces such as Hölder
spaces Lip(s), Sobolev spaces W_p^s, as well as Besov spaces B_{pq}^s, for a range of indices s
depending on the wavelet ψ. For the scale of Besov spaces, which includes as particular
cases Lip(s) = B_{∞∞}^s (if s ∉ ℕ) and W_p^s = B_{pp}^s (if p = 2), the characterisation has the
following form:
if f = Σ_{j≥0} Σ_{k∈ℤ} β_{jk} ψ_{jk},
‖f‖_{B_{pq}^s} ∼ ‖(2^{j[s+1/2−1/p]} ‖β_{j·}‖_{l_p})_{j≥0}‖_{l_q}.    (7)
In this section, we take {ψ_{jk}, j ≥ −1, k ∈ χ_j} to be a standard wavelet basis. More
precisely, we suppose as usual that ψ_{−1,k} stands for the scaling function, and that for any j ≥ −1, χ_j
is a set included in ℕ whose size is of order 2^j. Moreover, we assume that the following
properties hold: there exist constants c_p, C_p, D_p such that
c_p 2^{j(p/2−1)} ≤ ‖ψ_{jk}‖_p^p ≤ C_p 2^{j(p/2−1)},    (8)
‖Σ_{k∈χ_j} u_k ψ_{jk}‖_p^p ≤ D_p Σ_{k∈χ_j} |u_k|^p ‖ψ_{jk}‖_p^p, for any sequence (u_k).    (9)
As in section 5, we consider the loss of a decision f̂, when the truth is f, as the L_p norm
‖f̂ − f‖_p, and its associated risk:
E‖f̂ − f‖_p^p.
The following theorem evaluates this risk when the strategy is the one introduced in the previous section, and when the true function belongs to a Besov ball
(f ∈ B_{π,r}^s(M) ⟺ ‖f‖_{B_{π,r}^s} ≤ M). One nice property of this estimation procedure is that
it does not need a priori knowledge of this regularity to get a good rate of convergence.
Theorem 2.1. If we assume that 1 < p < ∞, 2ν + 1 > 0, and
σ_j² := Σ_i [ψ_{jk}^i / b_i]² ≤ C 2^{2jν}, ∀ j ≥ 0,    (10)
and if we put κ² ≥ 16p, 2^J = [t_ε]^{−2/(2ν+1)}, then,
if f belongs to B_{π,r}^s(M) with π ≥ 1, s ≥ 1/π, r ≥ 1 (with the restriction r ≤ π if
s = (2ν + 1)(p/(2π) − 1/2)), we have
E‖f̂ − f‖_p^p ≤ C log(1/ε)^{p−1} [ε² log(1/ε)]^{αp},    (11)
with:
α = s / (1 + 2(ν + s)), if s ≥ (2ν + 1)(p/(2π) − 1/2);
α = (s − 1/π + 1/p) / (1 + 2(ν + s − 1/π)), if 1/π ≤ s < (2ν + 1)(p/(2π) − 1/2).
Remarks:
1. Condition (10) is essential here. As will be shown later, this condition links
the wavelet system with the singular value decomposition of the kernel K. If we set
ourselves in the deconvolution case, the SVD basis is the Fourier basis, in such a way
that the ψ_{jk}^i are simply the Fourier coefficients of ψ_{jk}. If we choose as wavelet basis the
periodized Meyer wavelet basis (see Meyer [32] and Mallat [30]), conditions (8) and
(9) are satisfied. In addition, as the Meyer wavelet has the remarkable property of
being compactly supported in the Fourier domain, simple calculations prove that, for
any j ≥ 0 and any k, the number of i's such that ψ_{jk}^i ≠ 0 is finite and of order 2^j. Then,
if we assume to be in the so-called 'regular' case (b_k ∼ k^{−ν} for all k), it is easy to
establish that (10) is true (a numerical illustration is sketched after these remarks). This condition also holds in more general cases of the
deconvolution setting, such as box-car deconvolution; see [22], [28].
2. These results are minimax (see [43]) up to logarithmic factors. This means that if
we consider the best estimator in its worst performance over a given Besov ball,
this estimator attains a rate of convergence which is the one given in (11), up to
logarithmic factors.
3. If we compare these results to the rates of convergence obtained in the direct model
(see subsections 5.3 and 5.4), we see that the difference (up to logarithmic terms)
essentially lies in the parameter ν, which acts as a reducing factor of the rate of
convergence. This parameter quantifies the extra difficulty created by the inverse
problem. It is often called the coefficient of ill-posedness. If we recall that in the deconvolution case the coefficients b_k are the Fourier coefficients of the function γ, the
ill-posedness coefficient then clearly appears to be closely related to the regularity of
the blurring function.
✸
This result has been proved in the deconvolution case in [22]. The proof of this theorem is
given in Appendix I.
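The numerical illustration announced in Remark 1 goes as follows: we mimic a band-limited wavelet by a normalized coefficient vector supported on the Fourier band (2^{j−1}, 2^{j+1}) (arbitrary normalized values replace the actual Meyer coefficients, which is enough to see the mechanism), take b_i = i^{−ν} with a hypothetical ν, and check that σ_j² / 2^{2jν} stays bounded.

```python
import numpy as np

nu = 0.5                        # hypothetical ill-posedness coefficient
rng = np.random.default_rng(0)
for j in range(3, 12):
    i = np.arange(2 ** (j - 1) + 1, 2 ** (j + 1))  # Fourier band of a Meyer-type wavelet
    psi = rng.normal(size=i.size)
    psi /= np.linalg.norm(psi)                     # sum_i (psi^i_jk)^2 = ||psi_jk||^2 = 1
    b = i ** (-nu)                                 # regular case b_i ~ i^-nu
    sigma2 = np.sum((psi / b) ** 2)
    print(j, sigma2 / 2 ** (2 * j * nu))           # ratio stays bounded: condition (10)
```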
2.2. WAVE-VD in a Jacobi scenario: NEED-VD. We have seen that the
results given above hold under the condition (10) on the wavelet basis.
Let us first appreciate how the condition (10) links the 'wavelet-type' basis to the SVD
basis (e_k). To see this, let us put ourselves in the regular case:
b_i ∼ i^{−ν}
(by this we mean, more precisely, that there exist two positive constants c and c′ such that
c′ i^{−ν} ≤ b_i ≤ c i^{−ν}).
If (10) is true, we have:
C 2^{2jν} ≥ Σ_m Σ_{2^m ≤ i < 2^{m+1}} [ψ_{jk}^i / b_i]².
Hence, for all m ≥ j:
Σ_{2^m ≤ i < 2^{m+1}} [ψ_{jk}^i]² ≤ c 2^{2ν(j−m)}.
This suggests the necessity of constructing a 'wavelet-type' basis whose 'support' at
level j with respect to the SVD basis (that is, in the index i) is concentrated on the integers between 2^j
and 2^{j+1}, with exponential decay beyond this band. This is exactly the case of Meyer's
wavelet when the SVD basis is the Fourier basis.
In the general case of an arbitrary linear operator giving rise to an arbitrary SVD basis
(e_k), if in addition to (10) we require a localisation condition on the basis, we do not know
whether such a construction can be performed. However, in some cases, even quite far from
deconvolution, such as Wicksell's problem, one can build a 'second-generation wavelet-type'
basis with exactly these properties.
The following construction, due to Petrushev and collaborators ([33], [39], [38]), exactly
realizes the paradigm mentioned above, producing a frame (the needlet basis) which
has the property of being localised, in the case where the linear operator has Jacobi polynomials as SVD
basis (as well as in different other cases, such as spherical harmonics, Hermite functions and
Laguerre polynomials).
3. Petrushev construction of Needlets
Frames were introduced in the 1950s by Duffin and Schaeffer [15] to represent functions
via over-complete sets. Frames, including tight frames, arise naturally in wavelet analysis on
ℝ^d. Tight frames, which are very close to orthonormal bases, are especially useful in signal
and image processing.
We will see that the following construction has the advantage of being easily computable
and of producing well localised tight frames built on a specified orthonormal basis.
We recall the following definition.
Definition 3.1. Let H be a Hilbert space and (e_n) a sequence in H; (e_n) is a tight frame
(with constant 1) if:
∀f ∈ H, ‖f‖² = Σ_n |⟨f, e_n⟩|².
Let now Y be a metric space and µ a finite measure. Let us suppose that we have the
following decomposition:
L²(Y, µ) = ⊕_{k=0}^{∞} H_k,
where the H_k's are finite dimensional spaces. For the sake of simplicity, we suppose that H_0
is reduced to the constants.
Let L_k be the orthogonal projection on H_k:
∀f ∈ L²(Y, µ), L_k(f)(x) = ∫_Y f(y) L_k(x, y) dµ(y),
where
L_k(x, y) = Σ_{i=1}^{l_k} e_i^k(x) ē_i^k(y),
l_k being the dimension of H_k and (e_i^k)_{i=1,...,l_k} an orthonormal basis of H_k. Let us observe that
we have the following property of the projection operators:
∫ L_k(x, y) L_m(y, z) dµ(y) = δ_{k,m} L_k(x, z).    (12)
The construction, also inspired by the paper of Frazier, Jawerth and Weiss [18], is based
on two fundamental steps: a Littlewood-Paley decomposition and a discretization, which are
summarized in the two following subsections.
3.1. Littlewood-Paley decomposition. Let ϕ be a C^∞ function supported in
|ξ| ≤ 1, such that 1 ≥ ϕ(ξ) ≥ 0 and ϕ(ξ) = 1 if |ξ| ≤ 1/2. Let us define:
a²(ξ) = ϕ(ξ/2) − ϕ(ξ) ≥ 0,
so that
∀|ξ| ≥ 1, Σ_j a²(ξ/2^j) = 1.    (13)
Let us now define the operator
Λ_j = Σ_{k≥0} a²(k/2^j) L_k
and the associated kernel
Λ_j(x, y) = Σ_{k≥0} a²(k/2^j) L_k(x, y) = Σ_{2^{j−1} < k < 2^{j+1}} a²(k/2^j) L_k(x, y).
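The function a is easy to produce numerically. The sketch below builds one admissible choice of ϕ from the classical exp(−1/x) bump (any C^∞ cutoff equal to 1 on [0, 1/2] and to 0 beyond 1 would do) and checks the partition of unity (13) on a grid:

```python
import numpy as np

def phi(t):
    """A C-infinity cutoff: 1 for |t| <= 1/2, 0 for |t| >= 1, built from exp(-1/x)."""
    u = np.clip(2.0 * (np.abs(np.asarray(t, dtype=float)) - 0.5), 0.0, 1.0)
    f = lambda v: np.exp(-1.0 / np.maximum(v, 1e-300))  # f(0) underflows cleanly to 0
    return f(1.0 - u) / (f(u) + f(1.0 - u))             # smooth step from 1 down to 0

def a2(t):
    """a^2(t) = phi(t/2) - phi(t) >= 0, supported in 1/2 <= |t| <= 2."""
    return phi(t / 2.0) - phi(t)

# check of the partition of unity (13): sum_j a^2(xi/2^j) = 1 for |xi| >= 1
xi = np.linspace(1.0, 40.0, 500)
total = sum(a2(xi / 2 ** j) for j in range(12))
print(np.max(np.abs(total - 1.0)))    # ~0, up to floating point error
```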
The following proposition holds.
Proposition 3.2.
∀f ∈ H, f = lim_{J→∞} L_0(f) + Σ_{j=0}^{J} Λ_j(f).    (14)
For M_j(x, y) = Σ_k a(k/2^j) L_k(x, y),
Λ_j(x, y) = ∫ M_j(x, z) M_j(z, y) dµ(z).    (15)
Proof.
L_0(f) + Σ_{j=0}^{J} Λ_j(f) = (L_0 + Σ_{j=0}^{J} Σ_k a²(k/2^j) L_k)(f) = Σ_k ϕ(k/2^{J+1}) L_k(f).
So:
‖Σ_k ϕ(k/2^{J+1}) L_k(f) − f‖² = Σ_{l≥2^{J+1}} ‖L_l(f)‖² + Σ_{2^J ≤ l < 2^{J+1}} ‖L_l(f)(1 − ϕ(l/2^{J+1}))‖²
≤ Σ_{l≥2^J} ‖L_l(f)‖² → 0, when J → ∞,
and (15) is a simple consequence of (12).
3.2. Discretization. Let us define
K_k = ⊕_{m=0}^{k} H_m,    (16)
and let us assume that the following additional assumptions hold:
1. f ∈ K_k ⟹ f̄ ∈ K_k;
2. f ∈ K_k, g ∈ K_l ⟹ fg ∈ K_{k+l};
3. Quadrature formula: for all k ∈ ℕ, there exist a finite subset χ_k of Y and positive
real numbers λ_ξ > 0, indexed by the elements ξ of χ_k, such that
∀f ∈ K_k, ∫ f dµ = Σ_{ξ∈χ_k} λ_ξ f(ξ).
Then the operator M_j defined in the subsection above is such that M_j(x, z) = M_j(z, x)
and
z ↦ M_j(x, z) ∈ K_{2^{j+1}−1},
so that
z ↦ M_j(x, z) M_j(z, y) ∈ K_{2^{j+2}−2},
and we can write:
Λ_j(x, y) = ∫ M_j(x, z) M_j(z, y) dµ(z) = Σ_{ξ∈χ_{2^{j+2}−2}} λ_ξ M_j(x, ξ) M_j(ξ, y).
This implies:
Λ_j f(x) = ∫ Λ_j(x, y) f(y) dµ(y) = Σ_{ξ∈χ_{2^{j+2}−2}} λ_ξ M_j(x, ξ) ∫ M_j(ξ, y) f(y) dµ(y)
= Σ_{ξ∈χ_{2^{j+2}−2}} √λ_ξ M_j(x, ξ) ∫ √λ_ξ M_j(y, ξ) f(y) dµ(y).
This can be summarized in the following way: if we put χ_{2^{j+2}−2} = Z_j and
ψ_{j,ξ}(x) := √λ_ξ M_j(x, ξ),
then
Λ_j f(x) = Σ_{ξ∈Z_j} ⟨f, ψ_{j,ξ}⟩ ψ_{j,ξ}(x).
Proposition 3.3. The family (ψ_{j,ξ})_{j∈ℕ, ξ∈Z_j} is a tight frame.
Proof. As
f = lim_{J→∞} (L_0(f) + Σ_{j≤J} Λ_j(f)),
we have
‖f‖² = lim_{J→∞} (⟨L_0(f), f⟩ + Σ_{j≤J} ⟨Λ_j(f), f⟩),
but
⟨Λ_j(f), f⟩ = Σ_{ξ∈Z_j} ⟨f, ψ_{j,ξ}⟩ ⟨ψ_{j,ξ}, f⟩ = Σ_{ξ∈Z_j} |⟨f, ψ_{j,ξ}⟩|²,
and, if ψ_0 is a normalized constant, ⟨L_0(f), f⟩ = |⟨f, ψ_0⟩|², so that
‖f‖² = |⟨f, ψ_0⟩|² + Σ_{j∈ℕ, ξ∈Z_j} |⟨f, ψ_{j,ξ}⟩|²,
but this is exactly the characterization of a tight frame we gave.
3.3. Localisation properties. This construction has been performed in different
frameworks by Petrushev and coauthors, giving in each situation very nice localisation
properties.
The following figure (thanks to Paolo Baldi) is an illustration of this phenomenon:
it shows a needlet constructed as explained above using Legendre polynomials of degree
2^8. The highly oscillating function is a Legendre polynomial of degree 2^8, whereas the
localised one is a needlet centered approximately in the middle of the interval. Its localisation properties are remarkable considering the fact that both functions are polynomials
of the same order.
[Figure: a needlet built from Legendre polynomials of degree 2^8 (localised curve), plotted on [−1, 1] together with a single Legendre polynomial of degree 2^8 (highly oscillating curve).]
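A figure of this kind can be reproduced with a few lines of code. The sketch below builds a Legendre needlet on [−1, 1] (with dµ = dx, so that H_k = span{P_k}): it combines the Littlewood-Paley weights a(k/2^j) with a Gauss-Legendre quadrature, which is exact for polynomials of degree 2^{j+2} − 2 as required. The functions phi and a2 repeat the cutoff of the previous sketch so the block is self-contained, and the level and node index are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def phi(t):                      # same C-infinity cutoff as in the previous sketch
    u = np.clip(2.0 * (np.abs(np.asarray(t, dtype=float)) - 0.5), 0.0, 1.0)
    f = lambda v: np.exp(-1.0 / np.maximum(v, 1e-300))
    return f(1.0 - u) / (f(u) + f(1.0 - u))

def a2(t):
    return phi(t / 2.0) - phi(t)

def needlet(j, m, x):
    """psi_{j,xi}(x) = sqrt(lambda_xi) M_j(x, xi) for the m-th Gauss-Legendre node xi."""
    kmax = 2 ** (j + 1) - 1                    # a(k/2^j) vanishes for k >= 2^{j+1}
    k = np.arange(kmax + 1)
    a_k = np.sqrt(np.maximum(a2(k / 2.0 ** j), 0.0))
    nodes, weights = leggauss(2 ** (j + 1))    # exact up to degree 2^{j+2}-1 >= 2^{j+2}-2
    xi, lam = nodes[m], weights[m]             # cubature point and weight lambda_xi
    norm = (2 * k + 1) / 2.0                   # Pbar_k = sqrt((2k+1)/2) P_k is orthonormal
    P_xi = legvander(np.array([xi]), kmax)[0]  # P_k(xi)
    P_x = legvander(np.atleast_1d(x), kmax)    # P_k(x) for each x
    return np.sqrt(lam) * P_x @ (a_k * norm * P_xi)

x = np.linspace(-1.0, 1.0, 2001)
psi = needlet(j=5, m=16, x=x)   # localised around the 17th node of the level-5 cubature
```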
In the case of the sphere of ℝ^{d+1}, where the spaces H_k are spanned by spherical harmonics, the following localisation property is proved in Narcowich, Petrushev and Ward [33]: for any k there exists a constant C_k such that
|ψ_{jη}(ξ)| ≤ C_k 2^{dj/2} / [1 + 2^j arccos⟨η, ξ⟩]^k.
A similar result exists in the case of Laguerre polynomials on ℝ_+ [25].
In the case of Jacobi polynomials on the interval with Jacobi weight, the following localisation property is proved in Petrushev and Xu [38]: for any k there exist constants C, c such that
|ψ_{jη}(cos θ)| ≤ C 2^{j/2} / [(1 + (2^j |θ − arccos η|)^k) √(w_{αβ}(2^j, cos θ))],
where w_{αβ}(n, x) = (1 − x + n^{−2})^{α+1/2} (1 + x + n^{−2})^{β+1/2}, −1 ≤ x ≤ 1, if α > −1/2, β > −1/2.
4. NEED-VD in the Jacobi case
Let us now come back to the estimation algorithm.
We consider now an operator K such that
H = L²(I, dγ(x)), I = [−1, 1], dγ(x) = ω_{α,β}(x) dx, ω_{α,β}(x) = (1 − x)^α (1 + x)^β; α > −1/2, β > −1/2.
For the sake of simplicity, let us suppose α ≥ β (otherwise we can exchange the parameters).
Let P_k be the normalized Jacobi polynomials for this weight. We suppose that these
polynomials appear as the SVD basis of the operator K, as is the case for the Wicksell
problem, with β = 0, α = 1, b_k ∼ k^{−1/2}.
4.1. Needlets and condition (10). Let us define the 'needlets' as constructed
above:
ψ_{j,η_k}(x) = √(b_{j,η_k}) Σ_l â(l/2^{j−1}) P_l(x) P_l(η_k),    (17)
the b_{j,η_k} denoting here the cubature weights.
(17)
The following proposition proves that such a construction always implies the condition
(10) in the regular case.
Proposition 4.1. As soon as ψj,ηk is a frame, if bi ∼ i−ν , then
σj2 :=
i
X ψjk
[
]2 ≤ C22jν
b
i
i
Proof. : As soon as the family ψj,ηk is a frame (not necessarily tight), as the elements of a
i
frame are bounded, and the set {i, ψjk
6= 0} is included into the set {C1 2j , . . . , C2 2j }, we
have,
i
X ψjk
[
]2 ≤ C2jν kψj,ηk k2 ≤ C ′ 2jν
b
i
i
4.2. Convergence results in the Jacobi case. The following theorem is the
analogue of Theorem 2.1 in this case. As can be seen, the results here are at the same
time more difficult to obtain (the following theorem does not cover the same range as the
previous one) and richer, since they furnish new rates of convergence.
Theorem 4.2. Let us suppose that we are in the Jacobi case as stated above (α ≥ β > −1/2).
We put
t_ε = ε √(log 1/ε),  2^J = t_ε^{−2/(1+2ν)},
choose κ ≥ 16p [1 + (α/2 − (α+1)/p)_+], and suppose that we are in the regular case, i.e.
b_i ∼ i^{−ν},  ν > −1/2.
Then if f = Σ_j Σ_k β_{j,η_k} ψ_{j,η_k} is such that
(Σ_k |β_{j,η_k}|^p ‖ψ_{j,η_k}‖_p^p)^{1/p} ≤ ρ_j 2^{−js},  (ρ_j)_j ∈ l_r,
then
E‖f̂ − f‖_p^p ≤ C [log(1/ε)]^{p−1} [ε √(log(1/ε))]^{µp},
with:
1. if p < 2 + 1/(α + 1/2),
µ = s / (s + ν + 1/2);
2. if p > 2 + 1/(α + 1/2),
µ = s / (s + ν + α + 1 − 2(1+α)/p).
This theorem is proved in Kerkyacharian et al. [27]. Simulation results on these methods are given there, showing that their performances are far above those of the usual SVD methods
in several cases. It is interesting to notice that the rates of convergence obtained
here agree with the minimax rates evaluated in Johnstone and Silverman [23], where the
case p = 2 is considered. But the second case (p > 2 + 1/(α + 1/2)) shows a rate of convergence
which is new in the literature. In [27], where the whole range of Besov bodies is considered,
more atypical rates are given.
5. Direct Models (K = I): a memento
5.1. The density model. The most famous nonparametric model consists in observing
n i.i.d. random variables having a common density f on the interval [0,1], and trying to
give an estimation of f.
A standard route to perform this estimation consists in expanding the density f in an
orthonormal basis {e_k, k ∈ ℕ} of a Hilbert space H (assuming implicitly that f belongs
to H):
f = Σ_{l∈ℕ} θ_l e_l.
If H happens to be the space L² = {g : [0,1] → ℝ, ‖g‖₂² := ∫_0^1 g² < ∞}, we observe
that
θ_l = ∫_0^1 e_l(x) f(x) dx = E e_l(X_i).
Replacing the expectation by its empirical counterpart leads to a standard estimate for θ_l:
θ̂_l = (1/n) Σ_{i=1}^{n} e_l(X_i).
At this step, the simplest choice of estimate for f is obviously:
f̂_m = Σ_{l=1}^{m} θ̂_l e_l.    (18)
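As an illustration, here is a minimal sketch of the projection estimator (18); the basis (the trigonometric basis of L²[0,1]), the sample distribution and the cut-off m are all hypothetical choices made purely for illustration.

```python
import numpy as np

def density_projection(X, m):
    """Projection density estimate (18) on the trigonometric basis of L2[0,1]:
    e_0 = 1, e_{2l-1}(x) = sqrt(2) cos(2 pi l x), e_{2l}(x) = sqrt(2) sin(2 pi l x)."""
    def e(l, x):
        if l == 0:
            return np.ones_like(x)
        freq = (l + 1) // 2
        trig = np.cos if l % 2 else np.sin
        return np.sqrt(2.0) * trig(2.0 * np.pi * freq * x)
    theta = [np.mean(e(l, X)) for l in range(m + 1)]  # theta_hat_l = (1/n) sum_i e_l(X_i)
    return lambda x: sum(t * e(l, x) for l, t in enumerate(theta))

X = np.random.default_rng(0).beta(2.0, 5.0, size=2000)   # hypothetical sample
f_hat = density_projection(X, m=15)
```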
5.2. From the density to the white noise model. Before analysing the
properties of the estimator defined above, let us observe that the previous approach (representing f by its coefficients {θ_k, k ≥ 0}) leads to summarizing the information in the
following sequence model:
{θ̂_k, k ≥ 0}.    (19)
We can write θ̂_k =: θ_k + u_k, with
u_k = (1/n) Σ_{i=1}^{n} [e_k(X_i) − θ_k].
The central limit theorem is a relatively convincing argument that the model (19) may be
approximated by the following one:
{θ̂_k = θ_k + η_k/√n, k ≥ 0},    (20)
where the η_k's form a sequence of i.i.d. gaussian centered variables with fixed
variance σ², say. Such an approximation requires more delicate calculations than these
quick arguments; it is rigorously proved in Nussbaum [35], see also Brown and Low [5].
This model is the sequence space model associated to the following global observation, the so-called white noise model (with ε = n^{−1/2}):
dY_t = f(t) dt + ε dW_t, t ∈ [0,1],
where for any ϕ ∈ L²([0,1], dt), ∫_{[0,1]} ϕ dY_t = ∫_{[0,1]} f ϕ dt + ε ∫_{[0,1]} ϕ dW_t is observable.
(20) formally consists in considering all the observables obtained for ϕ = e_k, for all
k in ℕ. Among nonparametric situations, the white noise model considered above is one
of the simplest, at least technically. Mostly for this reason, this model has been given a
central place in statistics, particularly by the Russian school, following Ibragimov and
Hasminskii (see for instance their book [20]). However, it arises as an appropriate large
sample limit of more general nonparametric models, such as regression with random design,
non independent spectrum estimation, or diffusion models; see for instance [21], [4].
5.3. The linear estimation: how to choose the tuning parameter m?
In (18), the choice of m is crucial.
To better understand the situation, let us have a look at the risk of the strategy f̂_m. If
we consider that, when deciding f̂_m when f is the truth, we have a loss of order ‖f̂_m − f‖₂²,
then our risk will be the following mathematical expectation:
E‖f̂_m − f‖₂².
Of course this way of measuring our risk is arguable, since there is no particular reason for
the L² norm to reflect well the features we want to recover in the signal. For instance, an
L^∞ norm could be preferred because it is easier to visualize. In general, several L_p norms
are considered (as is the case in sections 2.1 and 4.2). Here we restrict to the L² case
for the sake of simplicity.
To avoid technical difficulties, we set ourselves in the case of a white noise model,
considering that we observe the sequence defined in (20). Hence,
E(θ̂_l − θ_l)² = (1/n) ∫_0^1 e_l(x)² dx = 1/n := ε².
We are now able to obtain:
E‖f̂_m − f‖₂² = Σ_{l≤m} E(θ̂_l − θ_l)² + Σ_{l>m} θ_l² ≤ m ε² + Σ_{l>m} θ_l².
If we now assume that
∀k ∈ ℕ*, Σ_{l>k} θ_l² ≤ M k^{−2s}    (21)
for some s > 0, which is here an index of regularity directly connected to the size of the
compact set of l² in which the function f is supposed to lie, we get
E‖f̂_m − f‖₂² ≤ m ε² + M m^{−2s}.
We observe that the RHS is the sum of two terms: one (called the stochastic term) is
increasing in m and reflects the fact that, because of the noise, the more coefficients we
have to estimate, the larger the global error will be. The second one (called the bias term
or approximation term) does not depend on the noise and is decreasing in m. The RHS is
optimised by choosing m = m*(s) := c(s, M) ε^{−2/(1+2s)}. Then
E‖f̂_{m*(s)} − f‖₂² ≤ c′(s, M) ε^{4s/(1+2s)}.
Let us observe that the more regular f is supposed to be (in the sense that s is larger), the
fewer coefficients we need to estimate: a very irregular function (s close to 0) requires almost
as many as ε^{−2} = n coefficients, which corresponds to estimating as many coefficients as the
number of available observations (in the density model, for instance). The rate obtained
above can be proved to be optimal in the following (minimax) sense: if we consider
the best estimator in its worst performance over the class of functions verifying (21), this
estimator attains a rate of convergence which is (up to a constant) the one given above.
See Tsybakov [41] for a detailed review of the minimax point of view.
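The bias-variance balance above is easy to visualize numerically: the sketch below (with arbitrary values of ε, s and M) minimizes m ε² + M m^{−2s} over m and compares the minimizer with the theoretical order ε^{−2/(1+2s)}.

```python
import numpy as np

eps, s, M = 1e-3, 2.0, 1.0                 # hypothetical noise level and regularity
m = np.arange(1, 5000)
risk = m * eps ** 2 + M * m ** (-2.0 * s)  # stochastic term + bias term
m_star = m[np.argmin(risk)]
print(m_star, eps ** (-2.0 / (1 + 2 * s))) # both of order eps^{-2/(1+2s)}
```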
5.4. The thresholding estimation. Let us now suppose that the constant s,
which plays an essential role in the construction of the previous estimator, is not known.
This is realistic, since it is extremely rare to know in advance that the function we are seeking has a specified regularity. Also, the previous approach takes very seriously into account
the order in which the basis is taken. Let us now present a very elegant way of addressing
both of these issues at the same time. Thresholding techniques, which have long been known
to engineers in electronics and telecommunications, were introduced in statistics in
Donoho and Johnstone [11] and later in a series of papers on wavelet thresholding [12],
[13]. They ally numerical simplicity to asymptotic optimality.
One starts from a different kind of observation: let us introduce the following estimate:
f̃ = Σ_{k=0}^{B} θ̂_k I{|θ̂_k| ≥ κ t_ε} e_k.    (22)
(22)
Here the point of view is the following. We choose B very large (i.e. almost corresponding
to s = 0)
B = ε−2 log 1/ε.
But instead of keeping all the coefficients which numbers are between 0 and B, we decide to
kill those which are not above the threshold tε . The intuitive justification of this choice is
the following. Assuming that f has some kind of regularity conditions like (21) (unknown,
but real...), essentially means that the coefficients θk of f are small except a small number
of them. Hence, in the reconstruction of f , only these ones will be significant. tε is chosen,
in such a way that the noise θ̂k − θk due to the randomness of the observation might be
neglected:
tε = ε[log 1/ε]−1/2 .
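In code, the estimator (22) is essentially a one-liner; the sketch below applies it to a toy sparse sequence observed in gaussian noise (the value of κ and the signal are arbitrary illustrative choices).

```python
import numpy as np

def threshold_estimate(theta_hat, eps, kappa=6.0):
    """Hard thresholding as in (22); theta_hat is assumed to be already
    restricted to its first B coordinates, and kappa = 6 is arbitrary."""
    t_eps = eps * np.sqrt(np.log(1.0 / eps))
    return np.where(np.abs(theta_hat) >= kappa * t_eps, theta_hat, 0.0)

# toy sequence model (20): a sparse theta observed in noise of amplitude eps
rng = np.random.default_rng(0)
eps = 1e-2
theta = np.zeros(1000)
theta[[3, 17, 60]] = [1.0, -0.7, 0.4]      # the few significant coefficients
theta_tilde = threshold_estimate(theta + eps * rng.normal(size=1000), eps)
```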
Now, let us assume another type of condition on f, namely: there exists a constant
0 < q < 2 such that
sup_{λ>0} λ^q #{k, |θ_k| ≥ λ} ≤ M.    (23)
Then:
E‖f̃ − f‖₂² = Σ_{l≤B} E(θ̂_l I{|θ̂_l| ≥ κ t_ε} − θ_l)² + Σ_{l>B} θ_l²
≤ Σ_{l≤B} E(θ̂_l − θ_l)² I{|θ_l| ≥ κ t_ε/2} + Σ_l θ_l² I{|θ_l| ≤ 2κ t_ε}
+ Σ_{l≤B} E[(θ̂_l − θ_l)² + θ_l²] I{|θ̂_l − θ_l| ≥ κ t_ε/2} + Σ_{l>B} θ_l².
Now, using the following probabilistic bounds,
E(θ̂_l − θ_l)² = ε²,  P(|θ̂_l − θ_l| ≥ λ) ≤ 2 exp(−λ²/2ε²), ∀λ > 0,
and the fact that condition (23) implies
Σ_l θ_l² I{|θ_l| ≤ 2κ t_ε} ≤ C t_ε^{2−q},
we get
E‖f̃ − f‖₂² ≤ M ε² t_ε^{−q} + C′ t_ε^{2−q} + ε^{κ²/8} B + Σ_{l>B} θ_l².
It remains now to choose κ² ≥ 32 to get
E‖f̃ − f‖₂² ≤ C′ t_ε^{2−q} + Σ_{l>B} θ_l²,
and if we assume, in addition to (23),
∀k ∈ ℕ*, Σ_{l>k} θ_l² ≤ M k^{−(2−q)/2},    (24)
we get
E‖f̃ − f‖₂² ≤ C″ t_ε^{2−q}.
Note that the interesting point in this construction is that the regularity conditions which
are assumed on the function f are not known by the statistician, since they do not enter
into the construction of the procedure. This property is called adaptation.
Now, to compare with the previous section, let us take q = 2/(1+2s); it is not difficult
to prove that as soon as f verifies (21), it automatically verifies (23) and (24). Hence
f̃ and f̂_{m*(s)} have the same rate of convergence, up to a logarithmic term. If we neglect
this logarithmic loss, we substantially gain the fact that we need not know a priori the
regularity conditions on the target function. It can also be proved that in fact conditions (23)
and (24) define a set which is substantially larger than the set defined by condition
(21): for instance, its entropy is strictly larger (see [26]).
6. Appendix I: Proof of Theorem 2.1
In this proof, C will denote an absolute constant which may change from one line to the
other.
We can always suppose p ≥ π. Indeed, if π ≥ p, it is very simple to see that B_{π,r}^s(M) is
included in B_{p,r}^s(M′), since 2^{j[s+1/2−1/p]} ‖β_{j·}‖_{l_p} ≤ C 2^{j[s+1/2−1/π]} ‖β_{j·}‖_{l_π} (as the cardinality of χ_j is
of order 2^j).
First we have the following decomposition:
E‖f̂ − f‖_p^p ≤ 2^{p−1} {E‖Σ_{j=−1}^{J} Σ_{k∈χ_j} (t(β̂_{jk}) − β_{jk}) ψ_{jk}‖_p^p + ‖Σ_{j>J} Σ_{k∈χ_j} β_{jk} ψ_{jk}‖_p^p}
=: I + II.
The term II is easy to analyse, as follows. Since f belongs to B_{π,r}^s(M), using standard
embedding results (which in this case simply follow from direct comparisons between l_q
norms), we have that f also belongs to B_{p,r}^{s−(1/π−1/p)_+}(M′), for some constant M′. Hence
‖Σ_{j>J} Σ_{k∈χ_j} β_{jk} ψ_{jk}‖_p ≤ C 2^{−J[s−(1/π−1/p)_+]}.
Then we only need to verify that [s − (1/π − 1/p)_+]/(1+2ν) is always larger than α, which
is not difficult.
Bounding the term I is more involved. Using the triangular inequality together with the Hölder
inequality, and property (9) for the second line, we get
I ≤ 2^{p−1} J^{p−1} Σ_{j=−1}^{J} E‖Σ_{k∈χ_j} (t(β̂_{jk}) − β_{jk}) ψ_{jk}‖_p^p
≤ 2^{p−1} J^{p−1} D_p Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p.
Now we separate four cases:
Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p
= Σ_{j=−1}^{J} Σ_{k∈χ_j} E|t(β̂_{jk}) − β_{jk}|^p ‖ψ_{jk}‖_p^p [I{|β̂_{jk}| ≥ κ t_ε σ_j} + I{|β̂_{jk}| < κ t_ε σ_j}]
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} E|β̂_{jk} − β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β̂_{jk}| ≥ κ t_ε σ_j} [I{|β_{jk}| ≥ (κ/2) t_ε σ_j} + I{|β_{jk}| < (κ/2) t_ε σ_j}]
+ |β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β̂_{jk}| ≤ κ t_ε σ_j} [I{|β_{jk}| ≥ 2κ t_ε σ_j} + I{|β_{jk}| < 2κ t_ε σ_j}]
=: Bb + Bs + Sb + Ss.
If we notice that β̂_{jk} − β_{jk} = Σ_i ((Y_i − b_i f_i)/b_i) ψ_{jk}^i = ε Σ_i ξ_i ψ_{jk}^i/b_i is a centered gaussian
random variable with variance ε² Σ_i [ψ_{jk}^i/b_i]², we have, using standard properties of the gaussian
distribution (recall that we set σ_j² := Σ_i [ψ_{jk}^i/b_i]² ≤ C 2^{2jν}, and denote by s_q the qth absolute
moment of the standard gaussian distribution), for any q ≥ 1:
E|β̂_{jk} − β_{jk}|^q ≤ s_q σ_j^q ε^q,
P{|β̂_{jk} − β_{jk}| ≥ (κ/2) t_ε σ_j} ≤ 2 ε^{κ²/8}.
Hence,
Bb ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} s_p σ_j^p ε^p ‖ψ_{jk}‖_p^p I{|β_{jk}| ≥ (κ/2) t_ε σ_j},
Ss ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p I{|β_{jk}| < 2κ t_ε σ_j}.
And,
Bs ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} [E|β̂_{jk} − β_{jk}|^{2p}]^{1/2} [P{|β̂_{jk} − β_{jk}| ≥ (κ/2) t_ε σ_j}]^{1/2} ‖ψ_{jk}‖_p^p I{|β_{jk}| < (κ/2) t_ε σ_j}
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} s_{2p}^{1/2} σ_j^p ε^p 2^{1/2} ε^{κ²/16} ‖ψ_{jk}‖_p^p
≤ C Σ_{j=−1}^{J} 2^{jp(ν+1/2)} ε^p ε^{κ²/16} ≤ C ε^{κ²/16}.
Now, if we remark that the β_{jk} are necessarily all bounded by some constant (depending
on M) since f belongs to B_{π,r}^s(M), and using (8),
Sb ≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p P{|β̂_{jk} − β_{jk}| ≥ κ t_ε σ_j} I{|β_{jk}| ≥ 2κ t_ε σ_j}
≤ Σ_{j=−1}^{J} Σ_{k∈χ_j} |β_{jk}|^p ‖ψ_{jk}‖_p^p 2 ε^{κ²/8} I{|β_{jk}| ≥ 2κ t_ε σ_j}
≤ C Σ_{j=−1}^{J} 2^{jp/2} ε^{κ²/8} ≤ C ε^{κ²/8} 2^{Jp/2} = C ε^{κ²/8} t_ε^{−p/(2ν+1)}.
It is easy to check that in all cases, if κ² ≥ 16p, the terms Bs and Sb are smaller than
the rates announced in the theorem.
We have, using (8) and condition (10), for any z ≥ 0:
Bb ≤ C ε^p Σ_{j=−1}^{J} 2^{j(νp+p/2−1)} Σ_{k∈χ_j} I{|β_{jk}| ≥ (κ/2) t_ε σ_j}
≤ C ε^p Σ_{j=−1}^{J} 2^{j(νp+p/2−1)} Σ_{k∈χ_j} |β_{jk}|^z [t_ε σ_j]^{−z}
≤ C t_ε^{p−z} Σ_{j=−1}^{J} 2^{j[ν(p−z)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^z.
Also, for any p ≥ z ≥ 0,
Ss ≤ C Σ_{j=−1}^{J} 2^{j(p/2−1)} Σ_{k∈χ_j} |β_{jk}|^z σ_j^{p−z} [t_ε]^{p−z}
≤ C [t_ε]^{p−z} Σ_{j=−1}^{J} 2^{j[ν(p−z)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^z.
So in both cases we have the same bound to investigate. We will write this bound in
the following form (forgetting the constant):
I + II = t_ε^{p−z₁} [Σ_{j=−1}^{j₀} 2^{j[ν(p−z₁)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{z₁}] + t_ε^{p−z₂} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−z₂)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{z₂}].
The constants z_i and j₀ will be chosen depending on the cases.
Let us first consider the case where s ≥ (ν + 1/2)(p/π − 1); put
q = p(2ν+1) / (2(s+ν)+1)
and observe that on the considered domain, q ≤ π and p > q. In the sequel it will be used
that we automatically have s = (ν + 1/2)(p/q − 1). Now, taking z₂ = π, we get:
II ≤ t_ε^{p−π} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−π)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^π].
Now, as
p/(2q) − 1/π + ν(p/q − 1) = s + 1/2 − 1/π
and
Σ_{k∈χ_j} |β_{jk}|^π = 2^{−jπ(s+1/2−1/π)} τ_j^π
with (τ_j)_j ∈ l_r (this is a consequence of the fact that f ∈ B_{π,r}^s(M) and (7)), we can write:
II ≤ t_ε^{p−π} Σ_{j>j₀} 2^{jp(1−π/q)(ν+1/2)} τ_j^π ≤ C t_ε^{p−π} 2^{j₀ p(1−π/q)(ν+1/2)}.
The last inequality is true for any r ≥ 1 if π > q, and for r ≤ π if π = q. Notice that π = q
is equivalent to s = (ν + 1/2)(p/π − 1). Now if we choose j₀ such that 2^{j₀ (p/q)(ν+1/2)} ∼ t_ε^{−1}, we get
the bound
t_ε^{p−q},
which exactly gives the rate announced in the theorem for this case.
As for the first part of the sum (before j₀), we have, taking now z₁ = q̃, with q̃ ≤ π, so
that [2^{−j} Σ_{k∈χ_j} |β_{jk}|^{q̃}]^{1/q̃} ≤ [2^{−j} Σ_{k∈χ_j} |β_{jk}|^π]^{1/π}, and using again (7):
I ≤ t_ε^{p−q̃} [Σ_{j=−1}^{j₀} 2^{j[ν(p−q̃)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{q̃}]
≤ t_ε^{p−q̃} [Σ_{j=−1}^{j₀} 2^{j[ν(p−q̃)+p/2−q̃/π]} (Σ_{k∈χ_j} |β_{jk}|^π)^{q̃/π}]
≤ t_ε^{p−q̃} Σ_{j=−1}^{j₀} 2^{j(ν+1/2)p(1−q̃/q)} τ_j^{q̃}
≤ C t_ε^{p−q̃} 2^{j₀ (ν+1/2)p(1−q̃/q)}
≤ C t_ε^{p−q}.
The last two lines are valid if q̃ is chosen strictly smaller than q (this is possible since
π ≥ q).
Let us now consider the case where s < (ν + 1/2)(p/π − 1), and choose now
q = (p(2ν+1) − 2) / (2(s + ν − 1/π) + 1),
in such a way that we easily verify that
p − q = 2p (s − 1/π + 1/p) / (1 + 2(ν + s − 1/π)),  q − π = [(p − π)(1 + 2ν) − 2πs] / (2(s + ν − 1/π) + 1) > 0,
the last quantity being positive because s is supposed to be smaller than (ν + 1/2)(p/π − 1).
Furthermore we also have
s + 1/2 − 1/π = ν(p/q − 1) + p/(2q) − 1/q.
Hence, taking z₁ = π and using again the fact that f belongs to B_{π,r}^s(M),
I ≤ t_ε^{p−π} Σ_{j=−1}^{j₀} 2^{j[ν(p−π)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^π
≤ t_ε^{p−π} Σ_{j=−1}^{j₀} 2^{j(ν+1/2−1/p)(p/q)(q−π)} τ_j^π
≤ C t_ε^{p−π} 2^{j₀ (ν+1/2−1/p)(p/q)(q−π)}.
This is true since q − π > 0 and ν + 1/2 − 1/p is also strictly positive because of our constraints.
If we now take 2^{j₀ (p/q)(ν+1/2−1/p)} ∼ t_ε^{−1}, we get the bound
t_ε^{p−q},
which is the rate announced in the theorem for this case.
Again, for II, we have, taking now z₂ = q̃ > q (> π):
II ≤ t_ε^{p−q̃} [Σ_{j=j₀+1}^{J} 2^{j[ν(p−q̃)+p/2−1]} Σ_{k∈χ_j} |β_{jk}|^{q̃}]
≤ C t_ε^{p−q̃} Σ_{j=j₀+1}^{J} 2^{j(ν+1/2−1/p)(p/q)(q−q̃)} τ_j^{q̃}
≤ C t_ε^{p−q̃} 2^{j₀ (ν+1/2−1/p)(p/q)(q−q̃)}
≤ C t_ε^{p−q}.
References
[1] F. Abramovich and B. W. Silverman. Wavelet decomposition approaches to statistical inverse
problems. Biometrika, 85(1):115–129, 1998.
[2] A. Antoniadis and J. Bigot. Poisson inverse models. 2004. Preprint Grenoble.
[3] A. Antoniadis, J. Fan, and I. Gijbels. A wavelet method for unfolding sphere size distributions. Canad. J. Statist., 29:265–290, 2001.
[4] Lawrence D. Brown, T. Tony Cai, Mark G. Low, and Cun-Hui Zhang. Asymptotic equivalence
theory for nonparametric regression with random design. Ann. Statist., 30(3):688–707, 2002.
[5] Lawrence D. Brown and Mark G. Low. Asymptotic equivalence of nonparametric regression
and white noise. Ann. Statist., 24(6):2384–2398, 1996.
[6] L. Cavalier, G. K. Golubev, D. Picard, and A. B. Tsybakov. Oracle inequalities for inverse
problems. Ann. Statist., 30(3):843–874, 2002.
[7] Laurent Cavalier and Alexandre Tsybakov. Sharp adaptation for inverse problems with random noise. Probab. Theory Related Fields, 123(3):323–354, 2002.
[8] Albert Cohen, Marc Hoffmann, and Markus Reiß. Adaptive wavelet Galerkin methods for
linear inverse problems. SIAM J. Numer. Anal., 42(4):1479–1501 (electronic), 2004.
[9] L.M. Cruz-Orive. Distribution-free estimation of sphere size distributions from slabs showing
overprojections and truncations, with a review of previous methods. J. Microscopy, 131:265–
290, 1983.
[10] V. Dicken and P. Maass. Wavelet-Galerkin methods for ill-posed problems. J. Inverse Ill-Posed Probl., 4(3):203–221, 1996.
[11] D. L. Donoho and I. M. Johnstone. Minimax risk over ℓp -balls for ℓq -error. Probability Theory
and Related Fields, 99:277–303, 1994.
[12] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: Asymptopia? Journal of the Royal Statistical Society, Series B, 57:301–369, 1995. With Discussion.
[13] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Density estimation by
wavelet thresholding. Annals of Statistics, 24:508–539, 1996.
[14] David L. Donoho. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal., 2(2):101–126, 1995.
[15] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math.
Soc., 72:341–366, 1952.
[16] Sam Efromovich and Vladimir Koltchinskii. On inverse problems with unknown operators.
IEEE Trans. Inform. Theory, 47(7):2876–2894, 2001.
[17] J. Fan and J.K. Koo. Wavelet deconvolution. IEEE Transactions on Information Theory,
48(3):734–747, 2002.
[18] M. Frazier, B. Jawerth, and G. Weiss. Littlewood-Paley theory and the study of function spaces. CBMS Regional Conference Series in Mathematics, 79. AMS, 1991.
[19] Alexander Goldenshluger and Sergei V. Pereverzev. On adaptive inverse estimation of linear
functionals in Hilbert scales. Bernoulli, 9(5):783–807, 2003.
[20] I. A. Ibragimov and R. Z. Hasminskii. Statistical estimation, volume 16 of Applications of
Mathematics. Springer-Verlag, New York, 1981.
[21] Michael Jähnisch and Michael Nussbaum. Asymptotic equivalence for a model of independent
non identically distributed observations. Statist. Decisions, 21(3):197–218, 2003.
[22] Iain M. Johnstone, Gérard Kerkyacharian, Dominique Picard, and Marc Raimondo. Wavelet
deconvolution in a periodic setting. J. R. Stat. Soc. Ser. B Stat. Methodol., 66(3):547–573,
2004.
[23] Iain M. Johnstone and Bernard W. Silverman. Discretization effects in statistical inverse
problems. J. Complexity, 7(1):1–34, 1991.
[24] Jérôme Kalifa and Stéphane Mallat. Thresholding estimators for linear inverse problems and
deconvolutions. Ann. Statist., 31(1):58–109, 2003.
[25] G. Kerkyacharian, P. Petrushev, D. Picard, and Y. Xu. Localized polynomials and frames induced by Laguerre functions. 2005.
[26] G. Kerkyacharian and D. Picard. Thresholding algorithms and well-concentrated bases. Test,
9(2), 2000.
[27] G. Kerkyacharian, D. Picard, P. Petrushev, and T. Willer. NEED-VD: second generation wavelets for estimation in inverse problems. 2006. LPMA 2006.
[28] G. Kerkyacharian, D. Picard, and M. Raimondo. Adaptive boxcar deconvolution on full Lebesgue measure sets. 2005. Preprint LPMA.
[29] Bernard A. Mair and Frits H. Ruymgaart. Statistical inverse estimation in Hilbert scales.
SIAM J. Appl. Math., 56(5):1424–1444, 1996.
[30] Stéphane Mallat. A wavelet tour of signal processing. Academic Press Inc., San Diego, CA,
1998.
[31] Peter Mathé and Sergei V. Pereverzev. Geometry of linear ill-posed problems in variable
Hilbert scales. Inverse Problems, 19(3):789–803, 2003.
[32] Yves Meyer. Ondelettes et opérateurs. I. Actualités Mathématiques. [Current Mathematical
Topics]. Hermann, Paris, 1990.
[33] F. J. Narcowich, P. Petrushev, and J. D. Ward. Localized tight frames on spheres. 2005. IMI 2005:04.
[34] R. Neelamani, H. Choi, and R. Baraniuk. Wavelet-based deconvolution for ill-conditioned systems. 2000. http://www-dsp.rice.edu/publications/pub/neelsh98icassp.pdf.
[35] Michael Nussbaum. Asymptotic equivalence of density estimation and Gaussian white noise.
Ann. Statist., 24(6):2399–2430, 1996.
[36] D. Nychka, G. Wahba, S. Goldfarb, and T. Pugh. Cross validated spline methods for the estimation of three-dimensional tumor size distributions from observations on two-dimensional cross sections. J. Amer. Statist. Assoc., 79:832–846, 1984.
[37] M. Pensky and B. Vidakovic. Adaptive wavelet estimator for nonparametric density deconvolution. Annals of Statistics, 27:2033–2053, 1999.
[38] P. Petrushev and Y. Xu. Localized polynomial frames on the interval with Jacobi weights. 2005. IMI 2005.
[39] P. Petrushev and Y. Xu. Localized polynomial kernels and frames (needlets) on the ball. 2005. IMI 2005.
[40] Alexandre Tsybakov. On the best rate of adaptive estimation in some inverse problems. C.
R. Acad. Sci. Paris Sér. I Math., 330(9):835–840, 2000.
[41] Alexandre B. Tsybakov. Introduction à l’estimation non-paramétrique, volume 41 of
Mathématiques & Applications (Berlin) [Mathematics & Applications]. Springer-Verlag,
Berlin, 2004.
[42] S. D. Wicksell. The corpuscle problem: a mathematical study of a biometric problem. Biometrika, 17:84–99, 1925.
[43] T. Willer. Deconvolution in white noise with a random blurring effect. 2005. LPMA 2005.
LPMA, Universités Paris VII et Paris X, CNRS, 175 rue du Chevaleret, 75013 Paris, France