The Curvelet Transform For Image Denoising: Jean-Luc Starck, Emmanuel J. Candès, and David L. Donoho
The Curvelet Transform For Image Denoising: Jean-Luc Starck, Emmanuel J. Candès, and David L. Donoho
The Curvelet Transform For Image Denoising: Jean-Luc Starck, Emmanuel J. Candès, and David L. Donoho
range of wavelet-based tools and ideas have been proposed and studied. Initial efforts included very simple ideas like thresholding of the orthogonal wavelet coefficients of the noisy data, followed by reconstruction. Later efforts found that substantial improvements in perceptual quality could be obtained by translation invariant methods based on thresholding of an undecimated wavelet transform. More recently, tree-based wavelet denoising methods were developed in the context of image denoising, which exploit the tree structure of wavelet coefficients and the so-called parentchild correlations which are present in wavelet coefficients of images with edges. Also, many investigators have experimented with variations on the basic schemesmodifications of thresholding functions, level-dependent thresholding, block thresholding, adaptive choice of threshold, Bayesian conditional expectation nonlinearities, and so on. Extensive efforts by a large number of researchers have produced a body of literature which exhibits substantial progress overall, achieved by combining a sequence of incremental improvements. B. Promising New Approach In this paper, we report initial efforts at image denoising based on a recently introduced family of transformsthe ridgelet and curvelet transformswhich have been proposed as alternatives to wavelet representation of image data. These transforms, to be described further below, are new enough that the underlying theory is still under development. Software for computing these new transforms is still in a formative stage, as various trade-offs and choices are still being puzzled through. Although we have completed an initial software development only, and although the time and effort we have expended in implementation, and in fine-tuning, is miniscule in comparison to the efforts which have been made in image denoising by wavelets, we have been surprised at the degree of success already achieved. We present in this paper evidence that the new approach, in this early state of development, already performs as well as, or better than, mature wavelet image denoising methods. Specifically, we exhibit higher PSNR on standard images such as Barbara and Lenna, across a range of underlying noise levels. (At our website, additional examples are provided.) Our comparisons consider standard wavelet methods such as thresholding of standard undecimated wavelet transforms, thresholding of decimated wavelet transforms, and Bayesian tree-based methods. While of course the evidence provided by a few examples is by itself quite limited, the evidence we present is consistent with the developing theory of curvelet denoising, which predicts that, in recovering images which are smooth away from edges, curvelets will obtain dramatically smaller asymptotic mean square error of reconstruction than wavelet methods. The
I. INTRODUCTION A. Wavelet Image Denoising VER THE last decade, there has been abundant interest in wavelet methods for noise removal in signals and images. In many hundreds of papers published in journals throughout the scientific and engineering disciplines, a wide
Manuscript received January 19, 2001; revised November 21, 2001. This work was supported by the National Science Foundation under Grants DMS 98-72890 (KDI) and DMS 95-05151 and also by AFOSR MURI-95-P49620-96-1-0028. J.-L. Starck is with the CEA-Saclay, DAPNIA/SEI-SAP, F-91191 Gif sur Yvette, France and also with the Department of Statistics, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). E. J. Cands is with the Department of Applied Mathematics, California Institute of Technology, Pasadena, CA 91125 and also with the Department of Statistics, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). D. L. Donoho is with the Department of Statistics, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). Publisher Item Identifier S 1057-7149(02)01734-7.
671
images we study are small in size, so that the asymptotic theory cannot be expected to fully kick in; however, we do observe already, at these limited image sizes, noticeable improvements of the new methods over wavelet denoising. By combining the experiments reported here with the theory being developed elsewhere, we conclude that the new approaches offer a high degree of promise which may repay further development in appropriate image reconstruction problems. C. New Transforms The new ridgelet and curvelet transforms were developed over several years in an attempt to break an inherent limit plaguing wavelet denoising of images. This limit arises from the well-known and frequently depicted fact that the two-dimensional (2-D) wavelet transform of images exhibits large wavelet coefficients even at fine scales, all along the important edges in the image, so that in a map of the large wavelet coefficients one sees the edges of the images repeated at scale after scale. While this effect is visually interesting, it means that many wavelet coefficients are required in order to reconstruct the edges in an image properly. With so many coefficients to estimate, denoising faces certain difficulties. There is, owing to well-known statistical principles, an imposing tradeoff between parsimony and accuracy which even in the best balancing leads to a relatively high mean squared error (MSE). While this tradeoff is intrinsic to wavelet methods (and also to Fourier and many other standard methods), there exist, on theoretical grounds, better denoising schemes for recovering images which are smooth away from edges. For example, asymptotic arguments show that, in a certain continuum model of treating noisy images with formal noise parameter , for smooth away from edges, recovering an image which is whereas the MSE achievable the ideal MSE scales like by wavelet methods scales only like . (For discussions of this white noise model, see [8], [16].) To approach this ideal MSE, one should develop new expansions which accurately represent smooth functions using only a few nonzero coefficients, and which also accurately represent edges using only a few nonzero coefficients. Then, because so few coefficients are required either for the smooth parts or the edge parts, the balance between parsimony and accuracy will be much more favorable and a lower MSE results. The ridgelet transform and curvelet transform were developed explicitly to show that this combined sparsity in representation of both smooth functions and edges is possible. The continuous ridgelet transform provides a sparse representation of both smooth functions and of perfectly straight edges. As introduced in [2], the ridgelet transform in two dimensions allows the representation of arbitrary bivariate by superpositions of elements of the form functions . Here is a wavelet, is a scale parameter, is an orientation parameter, and is a location scalar parameter. These so-called ridgelets are , and along the constant along ridge lines orthogonal direction they are wavelets. Because ridgelets at are localized near lines , fine scale
it is possible to efficiently superpose several terms with common ridge lines (i.e., common and different scales to efficiently approximate singularities along a line. But ridgelets also work well for representing smooth functions, of in fact they represent functions in the Sobolev space functions with two derivatives in mean-square just as efficiently as wavelets (i.e., comparable numbers of terms for same degree of approximation). There are also various discrete ridgelet transformsi.e., expansions into a countable discrete collection of generating elementsbased on ideas of frames and orthobases. For all of these notions, one has frame/basis elements localized near lines at all locations and orientations and ranging though a variety of scales (localization widths). It has been shown that for these schemes, simple thresholding of the discrete ridgelet transform provides near-optimal -term approximations to smooth functions with discontinuities along lines [7], [4], [17]. In short, discrete ridgelet representations solve the problem of sparse approximation to smooth objects with straight edges. In image processing, edges are typically curved rather than straight and ridgelets alone cannot yield efficient representations. However at sufficiently fine scales, a curved edge is almost straight, and so to capture curved edges, one ought to be able to deploy ridgelets in a localized manner, at sufficiently fine scales. Two approaches to localization of ridgelets are possible. 1) Monoscale ridgelets: Here, one thinks of the plane as partitioned into congruent squares of a given fixed sidelength and constructs a system of renormalized ridgelets smoothly localized near each such square [3]. 2) Multiscale ridgelets: Here, one thinks of the plane as subjected to an infinite series of partitions, based on dyadic scales, where each partition, like in the monoscale case, consists of squares of the given dyadic sidelength. The corresponding dictionary of generating elements is a pyramid of windowed ridgelets, renormalized and transported to a wide range of scales and locations, see Sections II-B and IV. Both localization approaches will play important roles in this paper. Curvelets are based on multiscale ridgelets combined with a spatial bandpass filtering operation to isolate different scales [6], [5]. Like ridgelets, curvelets occur at all scales, locations, and orientations. However, while ridgelets all have global length and variable widths, curvelets in addition to a variable width have a variable length and so a variable anisotropy. The length and width at fine scales are related by a scaling length and so the anisotropy increases with law width decreasing scale like a power law. Recent work shows that thresholding of discrete curvelet coefficients provide near-optimal -term representations of otherwise smooth objects with curves. discontinuities along Quantitatively, the -term squared approximation error by . This approxicurvelet thresholding scales like mation-theoretic result implies the following statistical result. By choosing a threshold so that one is reconstructing from the noisy curvelet coefficients in a noisy image largest at noise level , one obtains decay of the MSE almost of order
672
. In contrast, in analyzing objects with edges, wavelets , give an -term squared approximation error only of size and wavelet thresholding gives a corresponding MSE only of and no better. size D. This Paper So according to theory for a certain continuous-space model, discrete ridgelet transforms and discrete curvelet transforms provide near-ideal sparsity of representation of both smooth objects and of objects with edges. In a certain continuous-space statistical theory, this implies that simple thresholding of noisy coefficients in these expansions is a near-optimal method of denoising in the presence of white Gaussian noise. In this paper we provide an initial test of these ideas in a digital image processing setting, where images are available on an -by- grid. We first review some basic ideas about ridgelet and curvelet representations in the continuum. We next use these to develop a series of digital ridgelet and digital curvelet transforms taking digital input data on a Cartesian grid. Next we consider a model denoising problem where we embed some standard images in white noise and apply thresholding in the digital curvelet transform domain. Finally we discuss interpretations and possibilities for future work. Not surprisingly, other researchers have undertaken efforts to implement ridgelet and curvelet transforms, and develop applications. In addition to work mentioned in the body of the article, we would like to point out the work of Do and Vetterli [13], Donoho and Duncan [18]. We would also like to mention the articles of Sahiner and Yagle [21][23], Olson and DeStefano [20], Zhao et al. [30], and Zuidwijk [31], [32] although these references are less directly related. II. CONTINUOUS RIDGELET TRANSFORM The 2-D continuous ridgelet transform in can be defined as follows [2]. We pick a smooth univariate function with sufficient decay and satisfying the admissibility condition (1) which holds if, say, has a vanishing mean will suppose that is normalized so that , each and each For each by the bivariate ridgelet . We . , we define
valid a.e. for functions which are both integrable and square integrable. Furthermore, this formula is stable as one has a Parseval relation (4) Hence, much like the wavelet or Fourier transforms, the identity (3) expresses the fact that one can represent any arbitrary function as a continuous superposition of ridgelets. Discrete analogs of (3) and (4) exist; see [2] or [17] for a slightly different approach. A. Radon Transform A basic tool for calculating ridgelet coefficients is to view ridgelet analysis as a form of wavelet analysis in the Radon domain. We recall that the Radon transform of an object is the collection of line integrals indexed by given by (5) where is the Dirac distribution. The ridgelet coefficients of an object are given by analysis of the Radon transform via
Hence, the ridgelet transform is precisely the application of a one-dimensional (1-D) wavelet transform to the slices of the Radon transform where the angular variable is constant and is varying. B. Ridgelet Pyramids denote a dyadic square and let be the collection of all such for the collection of all dyadic dyadic squares. We write we consquares of scale . Associated to the squares struct a partition of energy as follows. With a nice smooth , we dilate window obeying and transport to all squares at scale , producing a collecsuch that the s, , make up a tion of windows denote the transport operator partition of unity. We also let acting on functions via Let
(2)
const . this function is constant along lines Transverse to these ridges it is a wavelet. Given an integrable , we define its ridgelet coefficients by bivariate function
and, therefore, summing the above equality across squares at a given scale gives We have the exact reconstruction formula (3) (6)
673
The identity (6) expresses the fact that one can represent any function as a superposition of elements of the form ; that is, of ridgelet elements localized near the is the ridgelet squares . For the function (2) with parameters obeying
and, thus, is a windowed ridgelet, supported near the square , hence the name local ridgelet transform. The previous paragraph discussed the construction of local ( fixed). Letting the scale ridgelets of fixed length, roughly vary defines the multiscale ridgelet dictionary by
that is, a whole pyramid of local ridgelets at various lengths and locations. This is, of course, a massively overcomplete representation system and no formula like (6) is available for this multiscale ridgelet pyramid, because it is highly overcomplete. III. APPROXIMATE DIGITAL RIDGELET TRANSFORM So a basic strategy for calculating the continuous ridgelet and transform is first to compute the Radon transform . second, to apply a 1-D wavelet transform to the slices In this section we develop a digital procedure which is inspired by this viewpoint, and is realizable on by numerical arrays. A. Fourier Strategy for Digital Radon Transform A fundamental fact about the Radon transform is the projection-slice formula [12]
Fig. 1. Illustration of the digital polar grid in the frequency domain for an n by n image (n = 8). The figure displays the set of radial lines joining pairs of symmetric points from the boundary of the square. The rectopolar grid is the set of pointsmarked with circlesat the intersection between those radial lines and those which are parallel to the axes.
B. A Polar Sampling Scheme for Digital Data For our implementation of the Cartesian-to-polar conversion, we have used a pseudo-polar grid, in which the pseudo-radial variable has level sets which are squares rather than circles. Starting with Oppenheim and Mersereau [19] this grid has often been called the concentric squares grid in the signal processing literature; in the medical tomography literature it is associated with the linogram, while in [1] it is called the rectopolar grid; see this last reference for a complete bibliographic treatment. The geometry of the rectopolar grid is illustrated on Fig. 1. We select radial lines in the frequency plane obtained by connecting the lying on the boundary of the array origin to the vertices , i.e., such that or . The polar grid ( serves to index a given radial line while the position of the point on that line is indexed by ) that we shall use is the intersection between the set of radial lines and that of Cartesian lines parallel to the axes. To be more specific, the sample points along a radial line whose angle with the vertical axis is less or are obtained by intersecting with the set of horiequal to . Simzontal lines ilarly, the intersection with the vertical lines defines our sample points whenever the angle between and the horizontal axis is less or equal . The cardinality of the rectopolar grid is equal to as to radial lines and sampled values on each of these there are lines. As a result, data structures associated with this grid will have a rectangular format. We observe that this choice corresponds to irregularly spaced values of the angular variable . C. Interpolation to Rectopolar Grid To obtain samples on the rectopolar grid, we should, in general, interpolate from nearby samples at the Cartesian grid. In principle, compare [1], [15], the interpolation of Fourier transforms is a very delicate matter because of the well-known fact that the Fourier transform of an image is highly oscillatory, and the phase contains crucial information about the image. In our
This says that the Radon transform can be obtained by applying the 1-D inverse Fourier transform to the 2-D Fourier transform restricted to radial lines going through the origin. This of course suggests that approximate Radon transforms for digital data can be based on discrete fast Fourier transforms. This is a widely used approach, in the literature of medical imaging and synthetic aperture radar imaging, for which the key approximation errors and artifacts have been widely discussed. In outline, one simply does the following, for gridded , . data 1) 2-D FFT. Compute the 2-D FFT of giving the array , . 2) Cartesian to Polar Conversion. Using an interpolation scheme, substitute the sampled values of the Fourier transform obtained on the square lattice with sampled values of on a polar lattice: that is, on a lattice where the points fall on lines going through the origin. 3) 1-D IFFT. Compute the 1-D IFFT on each line, i.e., for each value of the angular parameter. The use of this strategy in connection with ridgelet transforms has been discussed in the articles [14], [15].
674
approach, however, we use a crude interpolation method: we the value of the Fourier transform simply impute for . taken at the point on the Cartesian grid nearest to There are, of course, more sophisticated ways to realize the Cartesian-to-polar conversion; even simple bilinear interpolation would offer better theoretical accuracy. A very high accuracy approach used in [14] consists in viewing the data as samples of the trigonometric polynomial defined by
(7) on with with . There turns out [14], [1] to be an exact on the polar algorithm for rapidly finding the values of grid. The high-accuracy approach can be used in reverse, allowing for exact reconstruction of the original trigonometric polynomial from its rectopolar samples. Our nearest-neighbor interpolation, although admittedly simple-minded, happens to give good results in our application. In fact numerical experiments show that in overall system performance, it rivals the exact interpolation scheme. This is explainable as follows. Roughly speaking, the high-frequency are associated with terms in the trigonometric polynomial pixels at the edge of the underlying by grid. Our crude interpolation evidently will fail at reconstructing high-frequency terms. However, in the curvelet applicationsee belowwe use a window function to downweight the contributions of our reconstruction near the edge of the image array. So, inaccuracies in reconstruction caused by our crude interpolation can be expected to be located mostly in regions which make little visual impact on the reconstruction. A final point about our implementation. Since we are interested in noise removal artifact removal is very important. At the signal-to-noise ratios (SNRs) we consider, high-order-accuracy interpolation formulas which generate substantial artifacts (as many high-order formulas do) can be less useful than low-order-accuracy schemes which are relatively artifact-free. A known artifact of exact interpolation of trigonometric polynomials: substantial long-range disturbances can be generated by local perturbations such as discontinuities. In this sense, our crude interpolation may actually turn out to be preferable for some purposes. D. Exact Reconstruction and Stability The Cartesian-to-rectopolar conversion we have suggested here is reversible. That is to say, given the rectopolar values output from this method, one can recover the original Cartesian values exactly. To see this, take as given the following: Claim: the assignment of Cartesian points as nearest neighbors of rectopolar points happens in such a way that each Cartesian point is assigned as the nearest neighbor of at least one rectopolar point. It follows from this claim that each value in the original Cartesian input array is copied into at least one place in the output rectopolar array. Hence, perfect reconstruction is oba square lattice; that is,
viously possible in principlejust keeping track of where the entries are have been copied to and undoing the process. Our reconstruction rule obtains, for each point on the Cartesian grid, the arithmetic mean of all the values in the rectopolar grid which have that Cartesian point as their nearest point. This provides a numerically stable left inverse. Indeed, if applied to a perturbed set of rectopolar values, this rule gives an approximate reconstruction of the original unperturbed Cartesian values in which the approximation errors are smaller than the size of the perturbations suffered by the rectopolar values. (This final comment is reassuring in the present denoising context, where our reconstructions will always be made by perturbing the empirical rectopolar FT of the noisy data.) Phrased in mathematical terms this gives
where is a given point on the Cartesian grid and is the set of rectopolar points that are closest to . Stability in , for instance, follows from the observation
since is a partition of the set of rectopolar points, summing this last inequality across the Cartesian grid gives . It remains to explain the italicized claim, because, as we have seen, from it flows the exact reconstruction property and stability of the inverse. Consider the rectopolar points in the hourglass region made of basically vertical lines, i.e., lines which with vertical, and more specifically make an angle less than those points on a single horizontal scan line. Assuming the scan line is not at the extreme top or bottom of the array, these points are spaced strictly less than one unit apart, where our unit is the spacing of the Cartesian grid. Therefore, when we consider a Cartesian grid point belonging to this scan line and ask about and which are closest to it on the the rectopolar points left and right, respectively, these two points cannot be as much . Therefore, at least one of as one unit apart: the two points must be strictly less than 1/2 unit away from the or Cartesian Point, i.e., either . Without loss of generality suppose that . has as its closest Cartesian point. In short, Then clearly every Cartesian point in the strict interior of the hourglass associated with the basically vertical lines arises as the strict closest Cartesian point of at least one rectopolar point. Similar statements can be made about points on the boundary of the hourglass, although the arguments supporting those statements are much simpler, essentially mere inspection. Similar statements can be made about the points in the transposed hourglass. The italicized claim is established. E. One-Dimensional Wavelet Transform To complete the ridgelet transform, we must take a 1-D wavelet transform along the radial variable in Radon space. We now discuss the choice of digital 1-D wavelet transform.
675
Experience has shown that compactly supported wavelets can lead to many visual artifacts when used in conjunction with nonlinear processingsuch as hard-thresholding of individual wavelet coefficientsparticularly for decimated wavelet schemes used at critical sampling. Also, because of the lack of localization of such compactly supported wavelets in the frequency domain, fluctuations in coarse-scale wavelet coefficients can introduce fine-scale fluctuations; this is undesirable in our setting. Here we take a frequency-domain approach, where the discrete Fourier transform is reconstructed from the inverse Radon transform. These considerations lead us to use band-limited waveletwhose support is compact in the Fourier domain rather than the time-domain. Other implementations have made a choice of compact support in the frequency domain as well [14], [15]. However, we have chosen a specific overcomplete system, based on work of Starck et al. [26], [28], who constructed such a wavelet transform and applied it to interferometric image reconstruction. The wavelet transform algorithm is based on a scaling function such that vanishes outside of the interval . We defined the scaling function as a renormalized -spline
Fig. 2. Ridgelet transform flowgraph. Each of the 2n radial lines in the Fourier domain is processed separately. The 1-D inverse FFT is calculated along each radial line followed by a 1-D nonorthogonal wavelet transform. In practice, the 1-D wavelet coefficients are directly calculated in the Fourier space.
and
Because is compactly supported, the sampling theorem shows than one can easily build a pyramid of elements, see [28] for details. This transform enjoys the following features. The wavelet coefficients are directly calculated in the Fourier space. In the context of the ridgelet transform, this allows avoiding the computation of the 1-D inverse Fourier transform along each radial line. Each subband is sampled above the Nyquist rate, hence, avoiding aliasinga phenomenon typically encountered by critically sampled orthogonal wavelet transforms [25]. The reconstruction is trivial. The wavelet coefficients simply need to be co-added to reconstruct the input signal at any given point. In our application, this implies that the ridgelet coefficients simply need to be co-added to reconstruct Fourier coefficients. This wavelet transform introduces an extra redundancy factor, which might be viewed as an objection by advocates of orthogonality and critical sampling. However, we note that our goal in this implementation is not data compression/efficient codingfor which critical sampling might be relevantbut instead noise removal, for which it well-known that overcompleteness can provide substantial advantages [9]. F. Combining the Pieces Fig. 2 shows the flowgraph of the ridgelet transform. The is an image of size ridgelet transform of an image of size , introducing a redundancy factor equal to four. We note that, because our transform is made of a chain of steps, each one of which is invertible, the whole transform is
invertible, and so has the exact reconstruction property. For the same reason, the reconstruction is stable under perturbations of the coefficients. Last but not least, our discrete transform is computationally attractive. Indeed, the algorithm we presented here has low comflops for an image. plexity since it runs in G. Digital and Discrete Ridgelet Transforms Quite frankly, there is an apparent discrepancy between the theory of ridgelets and our proposed digital implementation which requires some justification. In our digital ridgelet pyramid, the number of orientations is constant, i.e., independent of scale whereas in the theory, the number of orientations is inversely proportional to the scale [2]. In other words, the theory imposes a downsampling on the set of orientations by a factor two as one proceeds to the coarser scale. In some sense, our digital implementation increasingly oversamples the angular variable at coarser scales. There is an analogy here with wavelet algorithms for noise removal. In the orthonormal wavelet pyramid, the number of elements per scale and location is fixedindependent of scale. Yet undecimated wavelet transforms traditionally in use for denoising do not exhibit this principle. There are increasingly many elements per scale and location at coarser scales. The practical benefits of such redundant systems are well-established. Because we are working in a denoising setting, there is an advantage in having more orientations than necessary at coarse scales. To continue with this analogy, undecimated wavelet transforms are thought to be translation invariant. Likewise, the digital ridgelet transform we introduced here is in some sense rotation invariant. We want to make unmistakably clear that the digital ridgelet transform we presented is intended for noise removal purposes. Other strategies may be pursued for other intentions.
676
H. Smooth Partitioning: Local Ridgelet Transforms A digital version of the ideas presented in Section II-B decomposes the original by image into smoothly overlapping blocks of sidelength pixels in such a way that the overlap between two vertically adjacent blocks is a rectangular array of ; we use overlap to avoid blocking artifacts. For an size by by image, we count such blocks in each direction. The partitioning introduces redundancy, as a pixel belongs to 4 neighboring blocks. We present two competing strategies to perform the analysis and synthesis: 1) The block values are weighted (analysis) in such a way that the co-addition of all blocks reproduce exactly the original pixel value (synthesis). 2) The block values are those of the image pixel values (analysis) but are weighted when the image is reconstructed (synthesis). Of course, there are intermediate strategies and one could apply smooth windowing at both the analysis and synthesis stage as discussed in Section II-B, for example. In the first approach, the data are smoothly windowed and this presents the advantage to limit the analysis artifacts traditionally associated with boundaries. The drawback, however, is a loss of sensitivity. Indeed, suppose for sake of simplicity that a vertical line with intensity level intersects a given block of size . Without loss of generality assume that the noise standard deviation is equal to 1. When the angular parameter of the Radon transform coincides with that of the line, we obtain a measurement with a signal inwhile the noise standard deviation is equal tensity equal to [in this case, the value of the SNR is ]. If weights to are applied at the analysis stage, the SNR is roughly equal to . Experiments have shown that this sensitivity loss may have substantial effects in filtering applications and, therefore, the second approach seems more appropriate since our goal is image restoration. from its four correWe calculate a pixel value, , namely, , sponding block values of half-size , and with and , in the following way:
Fig. 3. Curvelet transform flowgraph. The figure illustrates the decomposition of the original image into subbands followed by the spatial partitioning of each subband. The ridgelet transform is then applied to each block.
IV. DIGITAL CURVELET TRANSFORM A. Discrete Curvelet Transform of Continuum Functions We now briefly return to the continuum viewpoint of Section II-B. Suppose we set an initial goal to produce a decomposition using the multiscale ridgelet pyramid. The hope is that this would allow us to use thin brushstrokes to reconstruct the image, with all lengths and widths available to us. In particular, this would seem allow us to trace sharp edges precisely using a few elongated elements with very narrow widths. As mentioned in Section II-B, the full multiscale ridgelet pyramid is highly overcomplete. As a consequence, convenient algorithms like simple thresholding will not find sparse decompositions when such good decompositions exist. An important ingredient of the curvelet transform is to restore sparsity by reducing redundancy across scales. In detail, one introduces interscale orthogonality by means of subband filtering. Roughly speaking, different levels of the multiscale ridgelet pyramid are used to represent different subbands of a filter bank output. At the same time, this subband decomposition imposes a relationship between the width and length of the important frame elements so that they are anisotropic and obey width length . The discrete curvelet transform of a continuum function makes use of a dyadic sequence of scales, and a with the property that bank of filters is concentrated near the frequencies the passband filter , e.g.,
(8) . Of course, one might select any other with , smooth, nonincreasing function satisfying, , and obeying the symmetry property . It is worth mentioning that the spatial partitioning introduces a redundancy factor equal to four. Finally, we note that in order to be in better agreement with the theory one should of course introduce a normalizing factor depending upon the block-size. However, since we are concerned about denoising and the thresholding of individual coefficients, the normalization is a nonissue. Renormalizing coefficients automatically renormalizes corresponding thresholds in the exact same way, see Section V.
In wavelet theory, one uses a decomposition into dyadic sub. In contrast, the subbands used in the discrete bands
677
Fig. 4.
A few curvelets.
curvelet transform of continuum functions have the nonstandard . This is nonstandard feature of the discrete form curvelet transform well worth remembering. With the notations of Section II-B, the curvelet decomposition is the sequence of the following steps. Subband Decomposition. The object is decomposed into subbands
We believe that the trous subband filtering algorithm is especially well-adapted to the needs of the digital curvelet transform. The algorithm decomposes an by image as a superposition of the form
Smooth Partitioning. Each subband is smoothly windowed into squares of an appropriate scale (of ) sidelength
where is a coarse or smooth version of the original image and represents the details of at scale , see [28] for subband more information. Thus, the algorithm outputs . [The indexing is such that, here, arrays of size corresponds to the finest scale (high frequencies).] C. Algorithm We now present a sketch of the discrete curvelet transform algorithm: 1) apply the trous algorithm with scales; ; 2) set do 3) for with a block size and a) partition the subband apply the digital ridgelet transform to each block; then ; b) if . c) else The sidelength of the localizing windows is doubled at every other dyadic subband, hence maintaining the fundamental property of the curvelet transform which says that elements of length serve for the analysis and synthesis of the th subabout . Note also that the coarse description of the band is not processed. Finally, Fig. 3 gives an overview of image the organization of the algorithm. This implementation of the curvelet transform is also redunwhenever dant. The redundancy factor is equal to scales are employed. Finally, the method enjoys exact reconstruction and stability, because this invertibility holds for each element of the processing chain. Fig. 4 shows a few curvelets at different scales, orientations and locations.
Renormalization. Each resulting square is renormalized to unit scale (9) Ridgelet Analysis. Each square is analyzed via the discrete ridgelet transform. and In this definition, the two dyadic subbands are merged before applying the ridgelet transform. B. Digital Realization In developing a transform for digital by data which is analogous to the discrete curvelet transform of a continuous , we replace each of the continuum concepts function with the appropriate digital concept mentioned in sections above. In general, the translation is rather obvious and direct. However, experience shows that one modification is essential; we found that, rather than merging the two the two dyadic and as in the theoretsubbands ical work, in the digital application, leaving these subbands separate, applying spatial partitioning to each subband and applying the ridgelet transform on each subband separately led to improved visual and numerical results.
678
V. FILTERING We now apply our digital transforms for removing noise from image data. The methodology is standard and is outlined mainly for the sake of clarity and self-containedness. Suppose that one is given noisy data of the form
is the image to be recovered and is white noise, i.e., . Unlike FFTs or FWTs, our discrete ridgelet (resp. curvelet) transform is not norm-preserving and, therefore, the variance of the noisy ridgelet (resp. curvelet) coefficients will depend on the ridgelet (resp. curvelet) index . For instance, letting denote the discrete curvelet transform matrix, we have . Because the computation of is proof hibitively expensive, we calculated an approximate value the individual variances using Monte-Carlo simulations where are simply estimated by evaluthe diagonal elements of ating the curvelet transforms of a few standard white noise images. be the noisy curvelet coefficients ( ). We use Let the following hard-thresholding rule for estimating the unknown curvelet coefficients if if (10) (11)
where
We use the PSNR as an objective measure of performance. In addition, we used our own visual capabilities to identify artifacts whose effects may not be well-quantified by the PSNR value. The sort of artifacts we are particularly concerned about may be seen on display in the upper right panel of Fig. 5, which displays a wavelet reconstruction. This image has a number of problems near edges. In reconstructing some edges which should follow smooth curves one gets edges which are poorly defined and very choppy in reconstruction (for example in the crown of the hat); also some edges which are accurately reconstructed exhibit oscillatory structure along the edge which is not present in the underlying image (for example in the shoulder and the hat brim). We refer to all such effects as artifacts. Our experiments are reported on Figs. 5 and 6. The latter figure represents a detail of the original image and helps the reader observe the qualitative differences between the different methods (see Table I). We observe the following. The curvelet transform enjoys superior performance over local ridgelet transforms, regardless of the block size. The undecimated wavelet transform approach outputs a PSNR comparable to that obtained via the curvelet transform (the PSNR is slightly better for the multiscale entropy method). The curvelet reconstruction does not contain the quantity of disturbing artifacts along edges that one sees in wavelet reconstructions. An examination of the details of the restored images (Fig. 6) is instructive. One notices that the decimated wavelet transform exhibits distortions of the boundaries and suffers substantial loss of important detail. The undecimated wavelet transform gives better boundaries, but completely omits to reconstruct certain ridges in the hatband. In addition, it exhibits numerous small-scale embedded blemishes; setting higher thresholds to avoid these blemishes would cause even more of the intrinsic structure to be missed. The curvelet reconstructions display higher sensitivity than the wavelet-based reconstructions. In fact both wavelet reconstructions obscure structure in the hatband which was visually detectable in the noisy panel at upper left. In comparison, every structure in the image which is visually detectable in the noisy image is clearly displayed in the curvelet reconstruction. These observations are not limited to the particular experiment displayed here. We have observed similar characteristics in many other experiments; see Fig. 11 for another example. Further results are visible at http://www-stat.stanford.edu/~jstarck. To study the dependency of the curvelet denoising procedure on the noise level, we generated a set of noisy images (the noise standard deviation varies from five to 100) from both Lenna and Barbara. We then compared the three different filtering procedures based, respectively, on the curvelet transform and on the undecimated/decimated wavelet transforms. This series of experiments is summarized in Fig. 7 which displays the PSNR versus the noise standard deviation. These experimental results show that the curvelet transform outperforms wavelets for removing noise from those images, as the curvelet PSNR is systematically higher than the wavelet PSNRsand this, across a broad range of noise levels. Other experiments with color images led to similar results.
In our experiments, we actually chose a scale-dependent value for the first scale while for for ; we have . the others VI. FILTERING EXPERIMENTS A. Who Else? In our first example, a Gaussian noise with a standard deviation equal to 20 was added to the classical Lenna image (512 512). Several methods were used to filter the noisy image. 1) Thresholding of Monoscale ridgelet transforms with scale ( block size) (8, 16, 32, and 64). 2) Thresholding of Curvelet transform. 3) Wavelet denoising methods in the following four families. a) Bi-orthogonal wavelet transform using the Dauchechies-Antonini 7/9 filters (FWT-7/9) and hard thresholding. b) Undecimated bi-orthogonal wavelet transform (UWT-7/9) with hard thresholding; we used for the finest scale, and three for the others. c) Multiscale entropy processing using the undecimated wavelet transform. This method is discussed in [27], [29]. d) Wavelet-domain Hidden Markov Models (WHMM) using Daubechies orthonormal wavelets of length eight. This method [10] attempts to model the joint probability density of the wavelet coefficients and derives the filtered coefficients using an empirical Bayesian approach. We used this rather than a competing method of Simoncelli [24] owing to availability of a convenient software implementation.
679
Fig. 5. (Top left) Noisy image and (top right) filtered images using the decimated wavelet transform, (bottom left) the undecimated wavelet transform and the (bottom right) curvelet transform.
B. Recovery of Linear Features The next experiment (Fig. 8) consists of an artificial image containing a few bars, lines and a square. The intensity is constant along each individual bar; from left to right, the intensities of the ten vertical bars (these are in fact thin rectangles which , are four pixels wide and 170 pixels long) are equal to . The intensity along all the other lines is equal to 1, and the noise standard deviation is 1/2. Displayed images have -transformed in order to better see the results at low been SNR. The curvelet reconstruction of the nonvertical lines is obviously sharper than that obtained using wavelets. The curvelet transform also seems to go one step further as far as the reconstruction of the vertical lines is concerned. Roughly speaking, for those templates, the wavelet transforms stops detecting signal at a SNR equal to one (we defined here the SNR as the intensity level of the pixels on the line, divided by the noise standard deviation of the noise) while the cut-off value equals 0.5 for the curvelet approach. It is important to note that the horizontal and vertical lines correspond to privileged directions for the wavelet transform, because the underlying basis functions are direct products of functions varying solely in the horizontal and vertical directions. Wavelet methods will given even poorer results on lines of the same intensity but
tilting substantially away from the Cartesian axes. Compare the reconstructions of the faint diagonal lines in the image. C. Recovery of Curves In this experiment (Fig. 9), we have added a Gaussian noise to War and Peace, a drawing from Picasso which contains many curved features. Fig. 9 bottom left and right shows, respectively, the restored images by the undecimated wavelet transform and the curvelet transform. Curves are more sharply recovered with the curvelet transform. The authors are working on new methods (some of which will be based on the curvelet transform) to extract and recover curves from noisy data with greater accuracy and, therefore, this example is merely to be taken for illustrative purposes. D. Denoising of a Color Image In a wavelet based denoising scenario, color RGB images are generally mapped into the YUV space, and each YUV band is then filtered independently from the others. The goal here to see whether the curvelet transform would give improved results. We used four of the classical color images, namely Lenna, Peppers, Baboon, and Barbara (all images except perhaps Barbara are available from the USC-SIPI Image Database [11]. We performed the series of experiments described in
680
Fig. 6. (Top left) Noisy image and (top right) the restored images after denoising by means of the DWT, (bottom left) UWT, and (bottom right) the curvelet transform. The diagonal lines of the hat have been recovered with much greater fidelity in the curvelet approach.
TABLE I TABLE OF PSNR VALUES AFTER FILTERING THE NOISY IMAGE [Lenna GAUSSIAN WHITE NOISE (SIGMA 20)]. IMAGES ARE AVAILABLE AT http://www-stat.stanford.edu/~jstarck/lena.html
pleasant. Fig. 11 illustrates this last point. For other examples, please check http://www-stat.stanford.edu/~jstarck. VII. CONCLUSION In this paper, we presented a strategy for digitally implementing both the ridgelet and the curvelet transforms. The resulting implementations have the exact reconstruction property, give stable reconstruction under perturbations of the coefficients, and as deployed in practice, partial reconstructions seem not to suffer from visual artifacts. There are, of course, many competing strategies to translate the theoretical results on ridgelets and curvelets into digital representations. Guided by a series of experiments, we arrived at several innovative choices which we now highlight. 1) Subband Definition. We split the nonstandard frequency arising in theoretical treatments subband of curvelets [5]into the two standard dyadic frequency and and we prosubbands cessed each of them individually. This seems to give better result. 2) Subband Filtering. The trous algorithm is well-adapted to the decomposition into subbands. For instance, an alternative strategy using a decimated 2-D wavelet transform introduces visual artifacts near strong edges, in the
Section VI-A and summarized our findings on Fig. 10 which again displays the PSNR versus the noise standard deviation for the four images. In all cases, the curvelet transform outperforms the wavelet transforms in terms of PSNRat least for moderate and large values of the noise level. In addition, the curvelet transform outputs images that are visually more
681
Fig. 7. PSNR versus noise standard deviation for different denoising methods. The three methods based on the curvelet, undecimated and decimated wavelet transforms are represented with a continuous, dashed, and dotted line, respectively. The left panel corresponds to Lenna, and the right to Barbara.
Fig. 8. Top panels: a geometric image and that same image contaminated with a Gaussian white noise. The bottom left and right panels display the restored images using the undecimated wavelet transform and the curvelet transform, respectively.
form of curve fragments at 90 orientation to the underlying edge direction. 3) Wavelet underlying the Ridgelet Transform. Our ridgelet transform uses a 1-D wavelet transform based on wavelets which are compactly supported in the Fourier domain. If a wavelet, compactly supported in space, is used instead, it appears that thresholding of ridgelet/curvelet coefficients may introduce many visual artifacts in the form of ghost oscillations parallel to strong linear features. A remark about which principles are important: because we are working in a denoising setting, the attraction of traditional transform desideratacritical sampling and orthogonalityis weak. Instead, redundancy, and overcompleteness seem to offer advantages, particularly in avoiding visual artifacts. The work presented here is an initial attempt to address the problem of image denoising using digital analogs of some new mathematical transforms. Our experiments show that curvelet thresholding rivals sophisticated techniques that have been the
object of extensive development over the last decade. We find this encouraging, particularly as there seem to be numerous opportunities for further improvement. Areas for further work clearly include improved interpolation schemes, and improved folding strategies for space partitioning, to mention a few. On the other hand, the digital curvelet transform is nonorthogonal, quite redundant and as a consequence, the noisy coefficients are correlated and one should clearly design thresholding rules taking into account this dependency. There is an obvious tree structure with parent and children curvelet coefficients that might also be used effectively in this setting. We also look forward to testing our transforms on larger datasets in order to fully exploit the multiscale nature of the 2048 or 4096 curvelet transform. Images of size 2048 4096 would be a reasonable target, as those resolutions will undoubtedly become standard over the next few years. As images scale up, the asymptotic theory which suggests that curvelets outperform wavelets may become increasingly
682
Fig. 9. Top panels: a Picasso picture (War and Peace) and that same image contaminated with a Gaussian white noise. The bottom left and right panels display the restored images using the undecimated wavelet transform and the curvelet transform, respectively.
Fig. 10. PSNR versus noise standard deviation using different filtering methods. YUV and curvelet, YUV and undecimated wavelet, and YUV and decimated wavelet transforms are represented, respectively, with a continuous, dashed, and dotted line. The upper left panel corresponds to Lenna (RGB), the upper right to pepper (RGB), the bottom left to Baboon (RGB), and the bottom right to Barbara (RGB).
relevant. The quality of the local reconstructions as illustrated on the zoomed restored images obtained via the curvelet
transform are especially promising. We hope to report on this issue in a forthcoming paper.
683
Fig. 11. Upper left: noisy Barbara image. Upper right: restored image after applying the curvelet transform. Details of the restored images are shown on the bottom left panel (undecimated wavelet transform) and right (curvelet transform) panel.
ACKNOWLEDGMENT The authors would like to thank one referee for some very helpful comments on the original version of the manuscript. REFERENCES
[1] A. Averbuch, R. R. Coifman, D. L. Donoho, M. Israeli, and J. Waldn, Polar FFT, rectopolar FFT, and applications, Stanford Univ., Stanford, CA, Tech. Rep., 2000. [2] E. J. Cands, Harmonic analysis of neural netwoks, Appl. Comput. Harmon. Anal., vol. 6, pp. 197218, 1999. , Monoscale ridgelets for the representation of images with edges, [3] Dept. Statist., Stanford Univ., Stanford, CA, Tech. Rep., 1999, submitted for publication. , On the representation of mutilated Sobolev functions, Dept. [4] Statist., Stanford Univ., Stanford, CA, Tech. Rep., 1999. [5] E. J. Cands and D. L. Donoho, Curvelets, [Online] Available: http://www-stat.stanford.edu/~donoho/Reports/1999/curvelets.pdf, 1999. [6] , CurveletsA surprisingly effective nonadaptive representation for objects with edges, in Curve and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut, and L. L. Schumaker, Eds. Nashville, TN: Vanderbilt Univ. Press, 1999. [7] , Ridgelets: The key to higher-dimensional intermittency?, Phil. Trans. R. Soc. Lond. A., vol. 357, pp. 24952509, 1999.
[8]
[9]
[10]
, Edge-preserving denoising in linear inverse problems: Optimality of curvelet frames, Dept. Statist., Stanford Univ., Stanford, CA, Tech. Rep., 2000. R. R. Coifman and D. L. Donoho, Translation invariant de-noising, in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. New York: Springer-Verlag, 1995, pp. 125150. M. Crouse, R. Nowak, and R. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Trans. Signal Processing, vol. 46, pp. 886902, 1998. USC-SIPI Image Database [Online]. Available: http://sipi.usc.edu/services/database/Database.html. S. R. Deans, The Radon Transform and Some of Its Applications. New York: Wiley, 1983. M. N. Do and M. Vetterli, Orthonormal finite ridgelet transform for image compression, in Proc. IEEE Int. Conf. Image Processing (ICIP), Sept. 2000. D. L. Donoho, Fast ridgelet transforms in dimension 2, Stanford Univ., Dept. Statist., Stanford, CA, Tech. Rep., 1997. , Digital ridgelet transform via rectopolar coordinate transform, Stanford Univ., Stanford, CA, Tech. Rep., 1998. , Wedgelets: Nearly-minimax estimation of edges, Ann. Statist., vol. 27, pp. 859897, 1999. , Orthonormal ridgelets and linear singularities, SIAM J. Math Anal., vol. 31, no. 5, pp. 10621099, 2000. D. L. Donoho and M. R. Duncan, Digital curvelet transform: Strategy, implementation and experiments, Proc. SPIE, vol. 4056, pp. 1229, 2000.
684
[19] R. M. Mersereau and A. V. Oppenheim, Digital reconstruction of multidimensional signals from their projections, Proc. IEEE, vol. 62, pp. 13191338, Oct. 1974. [20] T. Olson and J. DeStefano, Wavelet localization of the Radon transform, IEEE Trans. Signal Processing, vol. 42, pp. 20552067, Aug. 1994. [21] B. Sahiner and A. E. Yagle, On the use of wavelets in inverting the Radon transform, in Conf. Rec. Nuclear Science Symp. Medical Imaging Conf. 1992, vol. 2, 1992, pp. 11291131. , Iterative inversion of the Radon transform using image-adaptive [22] wavelet constraints, in Proc. IEEE 18th Conf. Engineering in Medicine and Biology Society, 1996: Bridging Disciplines for Biomedicine, vol. 2, 1997, pp. 722723. , Iterative inversion of the Radon transform using image-adaptive [23] wavelet constraints, in Proc. ICIP 98, vol. 2, 1998, pp. 709713. [24] E. P. Simoncelli, Bayesian denoising of visual images in the wavelet domain, in Bayesian Inference in Wavelet Based Models, P. Muller and B. Vidakovic, Eds. New York: Springer-Verlag, 1999. [25] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, Shiftable multi-scale transforms [or Whats wrong with orthonormal wavelets], IEEE Trans. Inform. Theory, vol. 38, pp. 587607, Mar. 1992. [26] J. L. Starck, A. Bijaoui, B. Lopez, and C. Perrier, Image reconstruction by the wavelet transform applied to aperture synthesis, Astron. Astrophys., vol. 283, pp. 349360, 1994. [27] J. L. Starck and F. Murtagh, Multiscale entropy filtering, Signal Process., vol. 76, no. 2, pp. 147165, 1999. [28] J. L. Starck, F. Murtagh, and A. Bijaoui, Image Processing and Data Analysis: The Multiscale Approach. Cambridge, U.K.: Cambridge Univ. Press, 1998. [29] J. L. Starck, F. Murtagh, and R. Gastaud, A new entropy measure based on the wavelet transform and noise modeling, IEEE Trans. Circuits Syst. II, vol. 45, pp. 11181124, Aug. 1998. [30] S. Zhao, G. Welland, and G. Wang, Wavelet sampling and localization schemes for the Radon transform in two dimensions, SIAM J. Appl. Math., vol. 57, no. 6, pp. 17491762, 1997. [31] R. A. Zuidwijk, The wavelet X-ray transform, Centre Math. Comput. Sci., Tech. Rep. PNA-R9703, ISSN 1386-3711, 1997. [32] R. A. Zuidwijk and P. M. de Zeeuw, The fast wavelet X-ray transform, Centre Math. Comput. Sci., Tech. Rep. PNA-R9908, ISSN 1386-3711, 1999.
Jean-Luc Starck received the Ph.D. degree from the University Nice-Sophia Antipolis, France, and the Habilitation degree from the University of Paris XI, France. He was a visitor with the European Southern Observatory (ESO) in 1993 and the Department of Statistics, Stanford University, Stanford, CA, in 2000. He has been a Researcher at CEA since 1994 and is on the Infrared Space Observatory (ISO) Project. His research interests include image processing, multiscale methods, and statistical methods in astrophysics. He is author of the book Image Processing and Data Analysis: The Multiscale Approach (Cambridge, U.K.: Cambridge Univ. Press, 1998).
Emmanuel J. Cands graduated from the Ecole Polytechnique, France, and received the M.Sc. degree in applied mathematics from the University of Paris VI, France. He received the Ph.D. degree in statistics from Stanford University, Stanford, CA, where D. L. Donoho was his adviser. He is Assistant Professor of Applied and Computational Mathematics, California Institute of Technology (Caltech), Pasadena. Prior to joining Caltech, he was an Assistant Professor in Statistics, Stanford University. His research interests are in the areas of computational harmonic analysis, approximation theory, statistical estimation and their applications to signal and image processing and scientific computing. Dr. Cands is an Alfred P. Sloan Research Fellow.
David L. Donoho received the A.B. degree (summa cum laude) in statistics from Princeton University, Princeton, NJ, where his senior thesis adviser was J. W. Tukey and the Ph.D. in statistics from Harvard University, Cambridge, MA, where his Ph.D. adviser was P. Huber. He is Professor of Statistics at Stanford University, Stanford, CA. He was a Professor at the University of California, Berkeley, and a Visiting Professor at the Universit de Paris, as well as a Sackler Fellow at Tel Aviv University, Tel Aviv, Israel. His research interests are in harmonic analysis, image representation, and mathematical statistics. Dr. Donoho is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences.