Spherical Coding Algorithm For Wavelet Image Compression
5, MAY 2009
Manuscript received February 15, 2008; revised December 23, 2008. Current version published April 10, 2009. This work was supported in part by the National Science Foundation under Grant DMS9872890 and in part by the Isik University BAP-05B302 Grant. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ricardo L. de Queiroz.
H. F. Ates is with the Department of Electronics Engineering, Isik University, Sile, Istanbul, Turkey (e-mail: hfates@isikun.edu.tr).
M. T. Orchard is with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA (e-mail: orchard@rice.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2009.2014502

I. INTRODUCTION

All image coders are based on some statistical model for natural images and exploit the dependencies implied by that model. The coder is explicitly or implicitly optimized for the specific model and applied to sample images. Therefore, coding efficiency depends on how well the source model matches the true distribution of natural images. In other words, without a realistic source model to begin with, it is not possible to construct an efficient compression algorithm. Building a good source model requires a convenient and efficient representation of the data. The success of transform domain techniques has shown that coders based on DCT or wavelet representations are superior to pixel domain methods. The wavelet domain has been shown to provide a good match to the space-frequency characteristics of natural images. Hence, it is much easier to build a realistic image model in the wavelet domain than, say, in the pixel domain. That is why a simple coder in the wavelet domain could outperform a complex coder
in the pixel domain. Within the last decade, wavelets have exemplified how a good image representation opens the doors to a variety of original and successful image models.

In this paper, we further develop and analyze the spherical representation that was introduced in [1] as a novel way of representing image information in the wavelet domain. We first discuss the essential properties of a good wavelet-based image model and indicate the weaknesses of the existing models. Then, we show why the spherical representation provides a robust framework for building efficient image coders. We suggest that wavelet subbands are best characterized by spatially varying nonhomogeneous processes. Based on the spherical representation, we develop a coding algorithm that handles this nonhomogeneity in an effective and nonparametric way.

Understanding the reasons behind the success of wavelet coders is important for predicting the future directions of image coding. The wavelet transform achieves energy compaction into a few low-pass coefficients plus a sparse set of clustered high-pass coefficients. Such a compact representation is very suitable for a simple yet effective source model. The history of wavelet coders shows an evolution of the models employed for exploiting the energy clustering in the wavelet domain. In the early stages, image information was assumed to be naturally classified into statistically independent wavelet subbands, each of which was modeled as an independent identically distributed (i.i.d.) process. Coding schemes that were successful in DCT-based algorithms, such as run-length coding and vector quantization [2], [3], were tried in wavelet image coding, but demonstrated modest coding gains over standard transform-based algorithms. The breakthrough in wavelet image coding arrived with coders using hierarchical wavelet trees, such as EZW [4], SPIHT [5], and SFQ [6]-[9].
By grouping wavelet coefficients that belong to the same spatial region under a tree structure, these coders were able to adapt to the properties of different regions in the image. Other coders, such as EQ [10], classification-based algorithms [11], and EBCOT [12], achieved improved coding efficiency by introducing more spatial adaptivity in modeling the subbands.

The success of adaptive models is a direct consequence of the special characteristics of image information. Natural images consist of large smooth areas with localized high-frequency structures (i.e., edges) separating them. Edges and texture come in arbitrary locations, orientations, shapes, and sizes in natural images. Since high-frequency information is rather localized, even coarse-level information about the location of high-activity areas allows the coding methods to be successfully adapted to the statistics of different regions. In other words, using such location information, wavelet subbands are modeled as nonhomogeneous processes and coded accordingly.
Recognizing the spatially changing properties of wavelet subbands is crucial for accurate modeling. Equally important is the optimal allocation of bitrate to different parts of a subband having distinct statistical characteristics. Sophisticated adaptive techniques fine-tune models for each coefficient based on the context of its local (scale and/or spatial) neighborhood. A good example is the EQ coder [10], which uses a generalized Gaussian distribution (GGD) with spatially adapted variance for modeling each subband; the variance at each point is estimated from the decoded values in its causal neighborhood unless all the neighborhood coefficients are quantized to zero. Based on the estimated variances, the coefficients are coded in a way that yields overall rate-distortion optimality.

Despite their success, the EQ coder and other adaptive methods can only offer a restricted view of image information in the wavelet domain. For instance, the zerotrees of the EZW coder [4] are able to provide a rather structured separation between significant and insignificant sets of coefficients. The EQ coder is more flexible; however, because of the way the variances are estimated, it assumes a slowly changing variance field for the wavelet subband. It is doubtful whether this level of adaptivity is adequate to accurately model the rich variety of local statistics of wavelet coefficients. A modeling mismatch for each coefficient will contribute to the loss of coding efficiency for the overall image. We claim that parametric descriptions of wavelet coefficient distributions are especially prone to mismatches. In other words, the wavelet image model should not be tied down to a fixed parametric description. A more adaptive coding approach should be developed, which updates its modeling paradigm locally as more information becomes available about the underlying wavelet coefficients.

In this paper, we develop a wavelet-based representation that is general, flexible, and realistic.
The spherical representation is a hierarchical description of how total coefficient energy gets distributed within each wavelet subband. A hierarchical tree of subband energy is formed by summing up the squared coefficients. Phase variables are defined that describe how the energy in a given region is split into the energies of two sub-regions. Phase variables are coded based on a simple and effective model. The nonhomogeneity of wavelet subbands is handled through this nonparametric model of the hierarchy. We discuss why the spherical coding framework is more robust against modeling mismatches than typical parametric techniques. In particular, we explain how our coder improves the coding efficiency by allocating total bitrate according to the local sum of energies within the subband. The local energy is used to adapt the coder to the local statistics of wavelet coefficients. We claim that this approach makes it possible to build highly adaptive and flexible coding algorithms.

Section II defines modeling mismatch and shows its detrimental effects on coder performance using a simple example. Section III motivates and explains the spherical representation, and discusses why this representation is more robust against modeling mismatch while coding the wavelet subbands. Then, Section IV describes the details of the spherical coding algorithm. In Section V, the algorithm is tested on standard test images. Compared to some of the state-of-the-art wavelet coders, the spherical coding algorithm provides comparable or better coding performance.
II. EFFECTS OF MODELING MISMATCH IN CODING

Mismatch in source coding refers to the loss of coding efficiency that results when a coder optimized for a certain source model is applied to a different model. This is an important problem in image coding, since there is no single source model that can successfully describe a variety of different image characteristics. Edges, texture, and smooth regions require different types of characterization. It is not easy to determine the exact statistical nature of each such region. Even if we assume that we could develop correct models for each and every pixel or wavelet coefficient of the image, we would probably need a large set of parameters to define these distributions, and this incurs a heavy cost as side information for the coder. On the other hand, if the parametrization is restricted in some way, as is done in all wavelet coders, modeling mismatch seems inevitable.

We provide a simple example to show quantitatively the effects of mismatch. In lossless coding, the performance loss due to mismatch is measured by the relative entropy between the two distributions, i.e., the distribution for which the coder is designed and the distribution to which the coder is applied. For lossy coding, results from high-rate vector quantization theory can be used to show that the relative entropy between two continuous distributions is a good representative of the mismatch [13]. Suppose that we apply the optimal coder designed for an i.i.d. zero-mean Gaussian process to an independent nonhomogeneous zero-mean Gaussian process with changing variances. The relative entropy is defined as in (1). Using (2) and (3), we get
(4). Each term in this sum is greater than or equal to zero, with equality when the variance assumed by the coder equals the true variance of the coefficient. Hence, the coder loses some efficiency over all coefficients, unless the variance estimate matches the true variance. Fig. 1 plots this relative entropy.

This example illustrates the importance of accurate parameter estimation for coding a nonhomogeneous process. The estimation errors accumulate to reduce the efficiency of the overall coding scheme. In wavelet subbands, this could be a major problem, since the statistics change rapidly from one region to the next and there are not enough samples to perform reliable estimation. The EQ coder tries to reduce mismatch by performing local variance estimation. However, it is not
ATES AND ORCHARD: SPHERICAL CODING ALGORITHM FOR WAVELET IMAGE COMPRESSION
even clear whether the immediate neighbors of a coefficient are reliable enough to estimate its variance. This problem persists in general for all parametric coding algorithms, i.e., for algorithms that try to estimate certain parameters of the coefficient distributions and code the coefficients accordingly. In the next section, we introduce our representation and discuss how robust nonparametric modeling can be carried out in this framework, which we claim has the potential to significantly reduce the damaging effects of modeling mismatch.

III. SPHERICAL REPRESENTATION

The clustering of energy in wavelet subbands motivates the use of spatially varying models in coding the wavelet information. All adaptive wavelet coders introduce some form of nonhomogeneity, either explicitly by parameterizing the distribution of each wavelet coefficient (e.g., the EQ coder and classification-based coders), or implicitly by using different quantization and coding techniques in different parts of the subband (e.g., zerotree coding). In either case, care must be taken not to produce excessive side information in the form of model parameters or a classification map. This limitation compromises the freedom and the flexibility in choosing a matching model for the nonhomogeneous nature of the wavelet subband. As discussed in Section II, model mismatches could result in severe performance loss.

Natural images offer many complications in modeling the existing nonhomogeneity. Different image regions require different characterizations for efficient coding. There does not seem to be a small number of models one can easily define and use to capture the statistical variety observed in natural images. Due to the rich structure of edges and texture, statistical differences need to be recognized within windows of different shapes and sizes, ranging from large chunks of coefficients in smooth regions down to the level of single isolated locations.
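For reference, the mismatch in the Gaussian example of Section II has a standard closed form, sketched below. The notation is assumed here for illustration and may differ from the symbols of the numbered equations: σ_i² is the true variance of the i-th coefficient, σ² is the single variance assumed by the i.i.d. coder, and N is the number of coefficients.

```latex
% Assumed notation: p_i = N(0, \sigma_i^2) is the true density of coefficient i,
% q = N(0, \sigma^2) is the density assumed by the i.i.d. coder.
\begin{align}
D(p_i \,\|\, q)
  &= \int p_i(x)\,\ln\frac{p_i(x)}{q(x)}\,dx
   = \frac{1}{2}\left[\frac{\sigma_i^2}{\sigma^2} - 1
     - \ln\frac{\sigma_i^2}{\sigma^2}\right], \\
D_{\mathrm{total}}
  &= \sum_{i=1}^{N} D(p_i \,\|\, q)
   = \frac{1}{2}\sum_{i=1}^{N}\left[\frac{\sigma_i^2}{\sigma^2} - 1
     - \ln\frac{\sigma_i^2}{\sigma^2}\right]
   \quad \text{(in nats)}.
\end{align}
```

Since t - 1 - ln t is nonnegative and vanishes only at t = 1, every term of the sum is nonnegative and is zero only when σ_i² = σ².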
It is because of these challenges that we have decided to look for flexible representations that can deal with such varying information content. Our representation and coding method share a similar philosophy with the EQ coder in its use of local energy, or equivalently local variance, as a reliable measure of local information content. We suggest that local variance provides sufficient information about how the wavelet coefficients could be efficiently coded. Out of all wavelet coders in the literature, we single out the EQ coder for its effort to offer an infinite mixture model for the wavelet coefficients. That is, each coefficient can possess a generalized Gaussian distribution with a different variance of any positive value. The problem in EQ is the obligation to use the causal neighborhood for variance estimation in order to avoid side information. In cases when local variances exhibit sudden changes, e.g., around high-frequency structures such as edges, the estimated variances are not accurate and this leads to model mismatch.

One way to overcome the problem in EQ is to represent local energy as part of the wavelet information to be coded, and not as extra parameters needed for modeling. In other words, local energy should be implicitly coded as part of the subband information content. If both encoder and decoder have access to local energy information, then coding could be adapted according to this local statistic without any need for side information. With that perspective, it is convenient to define local energy hierarchically, starting from the total energy of the full subband and going down to smaller regions, even down to the energy of a single coefficient. As the size of the region is reduced, the local energy provides a better estimate of the variance of the coefficients in that region. Given the energy in a certain region, the encoder only needs to code how this energy is divided into its sub-regions. This coarse-to-fine strategy successively refines the available local information, and makes coding adaptation more successful.

Motivated by this reasoning, we propose to use the following hierarchical structure to represent a random process (see Fig. 2). In 1-D, the local energies of the process are defined at each level of the hierarchy by (5) and (6). Here, the process could be seen as one of the wavelet subbands of a 1-D signal. In the next section, this formulation is easily extended to 2-D subbands. The energy variables provide local energy information at different resolution levels. The phase variables indicate how the local energy gets split between the two neighboring regions, as given in (7) and (8). The phase variables in a sense represent the difference in information content between the two regions. Going from the top level to the bottom level of the hierarchy, the phase values provide a refinement of the available information in each region of the subband. When the total energy and the phases at all levels of the hierarchy are given, the coefficients are easily determined up to a sign bit; i.e., their magnitudes are known. The sign bits could also be defined as part of the representation if
the phase values at the bottom of the hierarchy cover the full range of angles; the bottom-level relations are then given by (9) and (10).

In this type of representation, we are able to use local energy not only to differentiate between statistically distinct parts of the process but also to provide direct information about the underlying coefficient values. Coding the total energy and the phases can be seen as an alternative to coding the coefficients directly. We might say that, instead of Cartesian coordinates, a spherical coordinate system is used in representing the process; hence, the name spherical representation.

The simple example of Section II helps us better understand the convenience of the spherical representation for coding a nonhomogeneous process. When the process is i.i.d. zero-mean Gaussian, each local energy is the common variance times a chi-square distributed random variable whose number of degrees of freedom equals the number of coefficients in the region. The ratio of two chi-square distributed random variables has an F-distribution; the distribution of the phase variables can be computed accordingly. It can be shown that the joint pdfs satisfy (11). Therefore, the local energy of a node and its phase are mutually independent; it follows that the phase variables are independent random variables at all levels and positions. In theory, we can design an optimal coder for these variables, and this coder is going to have a performance equal to that of the coder designed for the i.i.d. Gaussian process. If the process is in fact nonhomogeneous Gaussian with changing variances, then the total loss of efficiency due to using the optimal spherical coder designed for the i.i.d. case has to be equal to the mismatch calculated in Section II. However, unlike the previous case, the phase variables contribute in different proportions to the total mismatch; the phases at the top levels of the hierarchy cause more mismatch than the ones at the lower levels.

To show this, let us first look at the distribution of a local energy whose coefficients have different variances with a given mean variance. Fig. 3 plots the pdf of this energy for m = 4 and a mean variance of 1. The dashed curve shows the pdf of the corresponding scaled chi-square variable.
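The 1-D representation described above can be sketched in a few lines. This is a minimal illustration under an assumed reading of the definitions: leaf energies are squared coefficients, each parent energy is the sum of its two children, and each phase is the angle whose squared cosine and squared sine give the two energy fractions. The function names and the list layout are hypothetical, and the signs are carried as separate bits rather than folded into the bottom-level phases.

```python
import math


def spherical_forward(x):
    """Return (total_energy, phases, signs) for a sequence of dyadic length.

    phases[0] holds the phases that split pairs of leaves; phases[-1] holds
    the single phase at the top of the tree (assumed layout).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    energies = [v * v for v in x]            # leaf energies
    phases = []
    while len(energies) > 1:
        left, right = energies[0::2], energies[1::2]
        # Phase in [0, pi/2]: tan(phase) = sqrt(E_right / E_left).
        phases.append([math.atan2(math.sqrt(r), math.sqrt(l))
                       for l, r in zip(left, right)])
        energies = [l + r for l, r in zip(left, right)]
    signs = [v < 0 for v in x]
    return energies[0], phases, signs


def spherical_inverse(total_energy, phases, signs):
    """Rebuild the coefficients from the total energy, phases, and signs."""
    energies = [total_energy]
    for level in reversed(phases):           # top of the tree first
        split = []
        for energy, psi in zip(energies, level):
            split.append(energy * math.cos(psi) ** 2)   # left child
            split.append(energy * math.sin(psi) ** 2)   # right child
        energies = split
    return [(-1.0 if s else 1.0) * math.sqrt(e)
            for e, s in zip(energies, signs)]
```

Calling `spherical_forward` on a short sequence and feeding the result to `spherical_inverse` recovers the original values up to floating-point rounding, which is the sense in which the total energy, the phases, and the sign bits form a complete description.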
It turns out that, as long as the individual variances do not deviate too much from their mean value, the local energy is approximately distributed as the mean variance times a chi-square variable. Therefore, for slowly changing variances, we can assume that each of the two child energies of a node is a scaled chi-square variable, where the two scale factors are the mean variances of the corresponding coefficients. Then the ratio of the two child energies follows a scaled F-distribution; the two pdfs involved are given by (12) and (13), the latter up to an appropriate normalization factor. The relative entropy between them is given by (14).

Fig. 3. Pdf for the local energy E(0) and for its scaled chi-square approximation (m = 4, mean variance equal to 1).

The relative entropy is proportional to the number of coefficients in the region, which doubles at each level; hence, it increases exponentially at higher levels of the hierarchy. Even though the relative entropy for the energy ratio is not an exact measure of coding mismatch for the phase variables, we can argue that coding mismatch has to be significantly higher at the top levels of the hierarchy. Since the total mismatch should be equal to that of the case when the coefficients are coded as i.i.d. Gaussian, the coding mismatch has to be relatively small for the phase variables at the lowest level [when compared with the expression in (4)]. Since the upper levels contribute a major portion of the efficiency loss due to mismatch, improving the coding performance at the upper levels significantly improves the coding efficiency of the overall coding scheme. This makes the spherical representation robust against the nonhomogeneity of the underlying process. In other words, without knowing the exact nature of this
nonhomogeneity and without any detailed parametrization, we only need reasonable models for upper-level phase variables in order to have good overall coding results. From a different viewpoint, if the upper levels are optimally coded, then the lower levels will have access to optimally decoded local energies, and the coding at the lower levels will benefit from this information.

The attractiveness of the spherical representation is not limited to its ability to concentrate modeling mismatch in the few upper-level phase values. It also creates a highly adaptive coding framework, where different coding techniques could be developed at different levels of the hierarchy without requiring any side information. Imagine that the optimal codeword for the process is given. The decoded phase variables at a certain level affect the decoded values of the lower-level energies and, eventually, the decoded coefficients. On the other hand, optimal coding of a phase variable is directly related to the decoded energy of its node and, therefore, depends on the phase values decoded above it. This mutual dependency creates difficulties for rate-distortion optimization but also opens the door to many possibilities for innovative coding strategies.

We return to our original example to explain in simple terms how to perform model adaptation using the spherical representation. Imagine that the variances of the nonhomogeneous coefficients are mutually independent. Then the maximum likelihood (ML) estimate for the variance of each coefficient is equal to the square of that coefficient. That is, the values of neighboring coefficients are useless in estimating the variance of a given coefficient. Without any a priori information, the optimal spherical coder based on the i.i.d. assumption will use the average energy of the whole process as the variance estimate of all coefficients. At a certain level, assume that we have the decoded energy of a node. Then, the descendants of this node and the other phase values that belong to the subtree below it can be coded using the optimal spherical coder for an i.i.d. Gaussian process whose variance is the average decoded energy of that node.
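The benefit of re-estimating the variance from subtree energies can be checked numerically. The sketch below assumes an idealized setting that goes beyond the text: quantization is ignored and each decoded subtree energy is replaced by its expected value, so the variance estimate of a subtree is simply the mean of the true variances below it. The per-coefficient loss is the standard relative entropy between zero-mean Gaussians, and the function name is hypothetical.

```python
import math


def mismatch_by_level(variances):
    """Total Gaussian mismatch (in nats) when the variance estimate is shared
    over blocks of size n, n/2, ..., 1 (root of the tree down to the leaves)."""
    n = len(variances)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    totals = []
    size = n
    while size >= 1:
        total = 0.0
        for start in range(0, n, size):
            block = variances[start:start + size]
            estimate = sum(block) / size     # idealized subtree variance
            for v in block:
                ratio = v / estimate
                # Relative entropy between N(0, v) and N(0, estimate).
                total += 0.5 * (ratio - 1.0 - math.log(ratio))
        totals.append(total)
        size //= 2
    return totals
```

For a process whose variance jumps once, such as four coefficients with variance 0.1 followed by four with variance 4, the first entry (a single estimate for the whole process) is large, while the entries for smaller blocks drop to zero as soon as the blocks stop straddling the jump.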
In other words, each subtree will be coded based on its decoded energy and independently of the rest of the spherical tree. Ignoring the quantization effects, this new variance estimate provides a better match than the process-wide estimate for the coefficients at the leaves of this subtree. This means a reduction in the modeling mismatch for this subtree. As we go down the hierarchical tree, the variance estimate for every subtree gets refined at each level, and we could get a significant recovery from the performance loss due to the modeling mismatch of using the process-wide average as the variance estimate. This example illustrates how the different levels of the spherical hierarchy provide a natural refinement of the available information and how this new information could be used for successful model and coder adaptation.

In the next section, we extend the spherical representation to 2-D wavelet subbands and develop a simple coding algorithm that exploits the flexibility and robustness of the spherical framework for efficient coding of the wavelet information.

IV. SPHERICAL CODER IN WAVELET SUBBANDS

The spherical representation can be easily extended to 2-D for use in wavelet image coding. Partial squared sums need to be defined in both the vertical and horizontal directions in an alternating fashion (see Fig. 4). The phase variables and the local energies are indexed by level and by 2-D position, and are defined accordingly for a subband of given dimensions. The spherical coder described in this section codes the wavelet subband through its phase variables plus the total energy and the sign bits. In coding the phase variables, the spherical coder acts on the following assumptions.

The technique is applied independently at each subband. Even though we believe the energy information to be highly redundant across scales, it is a challenging problem to model the dependencies among phase variables in different subbands. We discuss this and other issues at the end of this paper.

The phase variables in each subband are assumed to be independent random variables that are uniformly distributed in [0, π/2]. The independence assumption simplifies the coding procedure. Once again, modeling and coding the intricate dependencies of the phase variables is a challenging and open problem. The use of true histograms (see Fig. 6) in entropy coding achieves very little coding gain, which justifies the use of the uniform distribution.

Independence of phase variables at different levels of the hierarchy implies that each phase variable is also independent of the local energy of its node, since that energy is determined by the phase variables at higher levels (see the definitions below).
Since distortion is measured with respect to the actual decoded values of wavelet coefficients, rate-distortion theory implies that optimal coding of the phase depends on the decoded value of the corresponding local energy. Specifically, assuming independence, optimal coding requires the rate-distortion curve of each phase variable to be normalized by the corresponding decoded local energy (see Fig. 5).

Since the decoded value of the local energy is needed for coding the phase, decoding is performed hierarchically, starting from the top level of the spherical tree and going down to the coefficient level (see Fig. 4).

Given these modeling assumptions, the job of the encoder is to choose the optimal (in the rate-distortion sense) codeword that is admissible within the spherical coding framework. A codeword is admissible if its spherical tree can be decoded with zero distortion at the designated bitrate. More specifically, for such a codeword, the decoded phase and local energy values should be exactly equal to their original values. At a given bitrate, the set of all admissible codewords defines the spherical codebook.

The spherical codebook has a rather complicated and nonlinear structure. It is not a trivial problem to find the optimal codeword that minimizes the distortion for a given set of wavelet coefficients. Building the spherical tree using true coefficient values and coding this original tree does not lead to an optimal answer. This can be easily understood by looking at how the coder behaves in smooth regions of the subband. Insignificant energies could add up to be significant, and the coder could end up spending bitrate coding the total energy, not knowing that this energy comes from an insignificant region of the subband. Since the coder wastes bitrate on coding insignificant sets, the resulting codeword cannot be the optimal one. Ideally, the spherical coding tree has to be constructed using optimally decoded wavelet values. However, there is no obvious way of determining these optimal values.

Here, we propose a simple strategy for estimating the optimal spherical tree. First, wavelet coefficients are thresholded using a dead-zone interval. The dead-zone interval is used to find an initial set of coefficients that must be quantized to zero for rate-distortion efficiency. After thresholding, we perform a Lagrangian cost analysis to find any other set of coefficients that should also be quantized to zero. Going from the bottom to the top of the spherical tree, we compare the Lagrangian cost of zero-quantizing all coefficients of a given node to the best alternative associated with choosing not to do so. The latter is equal to the cost of coding the assigned phase value of the node plus the best cost of the two children nodes (see Fig. 4). At the end, coefficients that belong to zero nodes are set to zero. The spherical tree is constructed and coded with these quantized coefficients. It turns out that this is an effective way of determining the set of zero-quantized coefficients, which is essential for successful coding performance.

In more detail, the spherical coding algorithm is given (for each wavelet subband at different scales and in different orientations, with the subband dimensions assumed above) as follows.

1) Use soft-thresholding to estimate the zero-quantized wavelet coefficients: a coefficient whose magnitude falls inside the dead-zone interval is set to zero.

2) Define the local energies and the phase variables of the spherical tree from the thresholded coefficients, level by level and for both splitting directions.
The decoded values of the phases and local energies are denoted separately from the original values.

3) Optimizing spherical representation: Compare the Lagrangian cost of sending coded values of the wavelet coefficients to the cost of quantizing them all to zero. If the latter cost is smaller, then quantize to zero. Define the Lagrangian cost of each node, starting at the bottom level of the tree. While the top level has not been reached, do: for each node at the current level, define its cost, where the Lagrangian cost for coding the phase of the node (see step 4) is added to the total cost of the two subtrees in order to get the total cost of the coefficients related to this node. Consequently, if zero-quantizing the node is cheaper, all of its coefficients are set to zero. Repeat the same procedure for the remaining nodes. Increment the level and repeat.

Figure panels: (a) u = 1, v = 0, 1; (b) u = 3, v = 2, 3.

4) Decoding: Code and send the total energy. Set the level to the top of the tree. While the bottom level has not been reached, do: for each node at the current level, code the phase with a uniform quantizer for the interval [0, π/2]. Normalize the step size based on the magnitude, such that there is an integer number of quantization cells in the interval (see Fig. 5). Two quantization levels are placed at 0 and π/2. The other quantization points are chosen accordingly. Therefore, the number of quantization levels exceeds the number of cells by one (since 0 and π/2 have quantization intervals of half the size). Note that, for a zero node, there is no need for coding the phase values, and the iteration stops for such a node. In this case, all the coefficients of the node are quantized to zero. For each node, decode the local energies of its two children from the decoded energy and the decoded phase of the node, and repeat for the remaining nodes. Decrement the level and repeat. At the end of decoding, we have the decoded wavelet coefficients.
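The phase quantizer of step 4 can be sketched as follows. The normalization rule is an assumption made for illustration: the nominal step is divided by the square root of the decoded node energy, so that a phase error maps to a roughly constant error in the coefficient domain, and the result is rounded so that an integer number of cells fits in [0, π/2]. The names `num_cells`, `quantize_phase`, and `phase_bits` are hypothetical, and the bit estimate is the self-information under a uniform phase density.

```python
import math

HALF_PI = math.pi / 2


def num_cells(energy, step):
    """Integer number of full-size cells in [0, pi/2].

    Assumption: the phase step is step / sqrt(energy); a result of 0 is
    treated as a zero node whose phase is not coded.
    """
    if energy <= 0.0:
        return 0
    return int(round(HALF_PI * math.sqrt(energy) / step))


def quantize_phase(psi, energy, step):
    """Return (index, reconstructed phase), or None for a zero node."""
    n = num_cells(energy, step)
    if n == 0:
        return None
    cell = HALF_PI / n
    # Levels sit at 0, cell, 2*cell, ..., pi/2, i.e. n + 1 levels in total.
    index = min(n, max(0, int(round(psi / cell))))
    return index, index * cell


def phase_bits(index, n):
    """Self-information in bits under a uniform phase density.

    The two end levels own half-size cells, so they are half as likely.
    """
    prob = 1.0 / (2 * n) if index in (0, n) else 1.0 / n
    return -math.log2(prob)
```

With this rule, nodes with larger decoded energy receive more quantization levels and therefore more bits, which is one way to read the statement that bitrate is allocated according to local energy.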
In the algorithm, the quantization step size and the dead-zone interval size are chosen as the optimal values for best rate-distortion performance for a given Lagrangian multiplier. For a given bitrate, the optimal multiplier is found using the convex bisection algorithm of [14]. Note that the optimal multiplier is equal to the slope of the rate-distortion curve of the spherical coder at its operating point. Starting from two extreme points of the rate-distortion curve, the bisection algorithm shrinks the interval in which the optimal point lies until it converges. If the algorithm does not converge after a certain number of iterations, then the value of the multiplier is incremented or decremented in small steps to find the optimal operating point.

Arithmetic coding is used to code the phase variables. The spherical tree provides a natural context for adaptive arithmetic coding. The coding model of each phase value is adapted based on the pair formed by the corresponding number of quantization levels and the level of the tree. Fig. 6 plots the histograms of phase variables for the highest scale vertical subband at different levels of the spherical tree. The uniform distribution seems to provide a good fit to the phase histograms at lower levels. The distribution gets more peaked around 45 degrees at higher levels of the spherical tree. However, since the number of phase variables drops at higher levels, the use of true histograms in arithmetic coding does not provide much coding gain over the uniform distribution. As a result, the bitrate of the arithmetic coder turns out to be only slightly better than the entropy estimate based on the uniform distribution. In other words, it is justified to use the self-information for estimating the bitrate of each phase variable.

While encoding/decoding the spherical tree, once the algorithm reaches a zeronode, all the coefficients that belong to that node are set to zero and no further bitrate is spent for coding the remaining phase values.
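The bottom-up zeronode decision of step 3 can be sketched as a small dynamic program on a 1-D tree. This is an illustration under simplifying assumptions that are not in the text: every coded phase costs a fixed number of bits (`phase_bits`), every nonzero leaf costs `sign_bits` bits and a fixed distortion `coded_distortion`, and the Lagrangian cost is taken as distortion plus the multiplier times rate. In the actual coder the phase rate depends on the decoded energy of the node.

```python
def zeronode_prune(x, lam, phase_bits=4.0, sign_bits=1.0, coded_distortion=0.0):
    """Return (best Lagrangian cost, zero mask) for a dyadic-length sequence.

    mask[i] is True when coefficient i ends up inside a zero node.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Each node is (energy, best cost, mask of its leaves).
    nodes = []
    for v in x:
        energy = v * v                       # distortion if the leaf is zeroed
        keep_cost = coded_distortion + lam * sign_bits
        if energy <= keep_cost:
            nodes.append((energy, energy, [True]))
        else:
            nodes.append((energy, keep_cost, [False]))
    while len(nodes) > 1:
        merged = []
        for (e_l, c_l, m_l), (e_r, c_r, m_r) in zip(nodes[0::2], nodes[1::2]):
            energy = e_l + e_r
            # Alternative to zeroing: code the phase plus the best children.
            coded_cost = lam * phase_bits + c_l + c_r
            if energy <= coded_cost:
                merged.append((energy, energy, [True] * (len(m_l) + len(m_r))))
            else:
                merged.append((energy, coded_cost, m_l + m_r))
        nodes = merged
    _, cost, mask = nodes[0]
    return cost, mask
```

For example, two coefficients of magnitude 1.5 would each be kept on their own with a multiplier of 1, but the pair is zeroed as a node because the phase cost outweighs the joint energy. This is the kind of decision that a leaf-by-leaf dead zone alone cannot make.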
Therefore, the comparison of Lagrangian cost between the two modes of quantization, namely zeronode quantization and spherical quantization, is essential for achieving successful coding results.

The performance of the algorithm is very much dependent on how the spherical tree is constructed. Note that the optimization step, i.e., step 3, tries to find the set of coefficients that are to be quantized to zero, and does not provide estimates for the decoded values of the remaining coefficients. In other words, zeronode quantization is introduced as the only alternative to standard spherical coding of phase variables. In principle, it is possible to include more sophisticated vector quantization modes
into the search list. Yet, this surely turns Lagrangian optimization into a much harder problem. It is rather challenging to figure out what the best strategy is, and how much better (in terms of total Lagrangian cost) the decoded values can get. The answer lies in the complicated structure of the codebook generated by the spherical quantization and coding strategy. The quantization of the phase is very much dependent on the decoded magnitude, which is related to the previously decoded phase values. Therefore, the possible set of codewords has a structure that is rather complicated to visualize. It is an open research problem to develop a better understanding of the nature of this codebook and to improve the coding algorithm.

V. SIMULATIONS AND DISCUSSIONS

The spherical coder is implemented using biorthogonal linear phase filter pairs in a six-level dyadic decomposition. The quantization step size used for all phase variables in all subbands is the same, up to the necessary normalization factor. The optimal quantization step size and dead-zone interval size are chosen from a finite candidate set. The low-pass subband is arithmetic coded, after applying an (8 × 8) DCT, using the optimal scalar quantizer for a given Lagrangian multiplier.

The performance of the spherical coder is compared to that of some of the best performing coders in the literature [15], including SPIHT [5], SFQ [6], EQ [10], EBCOT [12], and EZBC [16]. The Lena, Goldhill, and Barbara images are used for comparison. All results are for the 9/7 filter pair, except for EQ, which uses the 10/18 filter. The results are reported at 1.00, 0.50, and 0.25 bpp (see Table I). The spherical coder, denoted SPHE in Table I, outperforms SPIHT, and is as good as SFQ, EBCOT, and EZBC in most cases. The slightly better performance of the EQ coder is partially due to the use of the 10/18 filter. In Fig. 7, PSNR for Lena is plotted against different bitrates for SPHE and EBCOT. Except for Barbara, the performance of SPHE is about the same as that of EBCOT, which is the algorithm used in the JPEG2000 standard.
Note that EBCOT uses sophisticated contextual models which can adapt well to the local frequency content of textured regions in images such as Barbara. Considering the simplicity of the modeling choices we have made in the spherical coder, these results are rather encouraging for our future efforts in developing highly efficient and adaptive coding methods based on the spherical representation.

TABLE II PSNR RESULTS FOR SPHE USING 17/11 AND 9/7 FILTERS

Table II provides PSNR results of SPHE using both the 17/11 and 9/7 filter pairs. With the 17/11 filter, PSNR is 0.05–0.1 dB better for Lena and Goldhill, and 0.3–0.4 dB better for Barbara. This is because the 17/11 filter achieves better compaction of energy in wavelet subbands. This energy compaction is more pronounced for textured images such as Barbara.

The performance of the spherical coder could possibly be improved in many different ways. Using uniformly distributed independent phases simplifies the algorithm, and introduces some form of nonhomogeneity. However, this assumption is not quite right for modeling the actual nonhomogeneity of wavelet subbands. There exist complicated high-order dependencies among phase variables. In addition, there exist interband dependencies among phase variables that belong to the same spatial locations in different subbands. Since the spherical representation is robust to coding mismatch, the spherical coder with the independence assumption is still very successful. Based on the discussions of Section III, if we can model these dependencies at different levels of the hierarchy, and manage to decode optimal or close to optimal local energies, we expect significant overall coding gains.

As for the computational cost of the algorithm, the most time-consuming part is finding the optimal parameter set for a target bitrate. Due to this exhaustive optimization, the complexity of the algorithm is comparable to that of SFQ and EQ, and considerably higher than that of the other algorithms mentioned above. However, we believe that an exhaustive search for the optimal parameter set could be avoided by modeling the relationships between these parameters. As for the coding procedure for fixed
values of these parameters, the computational complexity is reasonable. For building the tree, the cost calculations require simple addition and comparison operations at each node. During decoding, the number of quantization bins is computed and uniform scalar quantization is performed for each node. A significant portion of the coding complexity is due to context-based arithmetic coding of the quantized phase variables. For hardware implementation, in both coding stages, the different nodes at a certain level of the hierarchy could be processed in parallel, which could significantly speed up execution.

VI. CONCLUSION

In this paper, we have introduced and analyzed the spherical representation as a convenient and flexible framework for developing adaptive models for wavelet information. A simple application of the framework in wavelet subbands has led to the spherical coding algorithm. The competitive results attained by the spherical coder point towards the potential of such energy-based representations in modeling wavelet subbands.

On a more philosophical note, the spherical coding technique is based on an orthogonal representation which is quite different from the usual orthogonal bases of the Cartesian coordinate system. Instead, the phase variables here could be seen as the basis vectors of the spherical coordinate system. To be more accurate, in its current form, the spherical coder is a combination of both coordinate systems, since the wavelet transformation is applied first and the spherical coordinates are used independently in each subband. The phase coordinates could also be defined in ways other than the hierarchical one used in our algorithm. One might think of various other ways to use these two types of representations together for developing interesting coding strategies. This could lead to a coding theory much more general than the theory of transform coding.

Spherical representation could find interesting applications in fields other than coding wavelet subbands.
One such area is the study of turbulence. Multifractals are extensively used in this field [17], [18], mainly to describe the spatial dissipation of turbulent energy. There exist several techniques to analyze the multifractal nature of given data using different statistical tools, such as the multifractal spectra [19]. In contrast to such global descriptions, spherical representation could be used to develop more localized statistical models for these kinds of processes.

The spherical coder is a basic implementation of the ideal adaptive coding methods that we are looking for, mainly because it does not rely on a deep understanding of natural images. We expect to develop more intelligent coding techniques and achieve much better results if we can model the dependencies that exist among local energy and phase variables. The spherical coder, as described in this paper, is a nonprogressive lossy coding method. Our current work also focuses on different versions of the spherical coder for lossless coding and for progressive coding, obtained by modifying the way in which the spherical tree is constructed and coded.

REFERENCES
[1] H. Ates and M. Orchard, "Wavelet image coding using the spherical representation," in Proc. IEEE Int. Conf. Image Processing, Genova, Italy, Sep. 2005, vol. 1, pp. 89–92.
[2] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Process., vol. 1, no. 2, pp. 205–220, Apr. 1992.
[3] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using vector quantization in the wavelet transform domain," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Albuquerque, NM, Apr. 1990, vol. 4, pp. 2297–2300.
[4] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3445–3462, Dec. 1993.
[5] A. Said and W. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuit Syst. Video Technol., vol. 6, no. 3, pp. 243–250, Jun. 1996.
[6] Z. Xiong, K. Ramchandran, and M. Orchard, "Space-frequency quantization for wavelet image coding," IEEE Trans. Image Process., vol. 6, no. 5, pp. 677–693, May 1997.
[7] Z. Xiong, K. Ramchandran, and M. Orchard, "Joint optimization of scalar and tree-structured quantization of wavelet image decomposition," in Proc. Conf. Rec. 33th Asilomar, Pacific Grove, CA, Nov. 1993, vol. 2, pp. 891–895.
[8] K. Ramchandran and M. T. Orchard, "An investigation of wavelet-based image coding using an entropy-constrained quantization framework," IEEE Trans. Signal Process., vol. 46, no. 2, pp. 342–353, Feb. 1998.
[9] Z. Xiong, N. Galatsanos, and M. Orchard, "Marginal analysis prioritization for image compression based on a hierarchical wavelet decomposition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Minneapolis, MN, Apr. 1993, vol. 5, pp. 546–549.
[10] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp. 221–230.
[11] R. Joshi and H. Jafarkhani et al., "Comparison of different methods of classification in subband coding of images," IEEE Trans. Image Process., vol. 6, no. 11, pp. 1473–1486, Nov. 1997.
[12] D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. Image Process., vol. 9, no. 7, pp. 1219–1235, Jul. 2000.
[13] R. Gray and T. Linder, "Relative entropy and quantizer mismatch," in Proc. Conf. Rec. 36th Asilomar, Nov. 2002, vol. 1, pp. 129–133.
[14] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. Image Process., vol. 2, no. 4, pp. 160–175, Apr. 1993.
[15] Wavelet Image Coding: PSNR Results. [Online]. Available: http://www.icsl.ucla.edu/~ipl/psnr_results.html
[16] S. Hsiang, "Embedded image coding using zeroblocks of subband/wavelet coefficients and context modeling," in Proc. Data Compression Conf., Washington, DC, 2001, pp. 83–92.
[17] C. Meneveau and K. Sreenivasan, "Simple multifractal cascade model for fully developed turbulence," Phys. Rev. Lett., vol. 59, no. 13, pp. 1424–1427, Sept. 1987.
[18] C. Meneveau and K. Sreenivasan, "The multifractal nature of turbulent energy dissipation," J. Fluid Dyn., vol. 224, pp. 429–484, Mar. 1991.
[19] R. Riedi and I. Scheuring, "Conditional and relative multifractal spectra," Fractals, vol. 5, no. 1, pp. 153–168, Mar. 1997.
Hasan F. Ates (S'96–M'04) received the B.S. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 1998, and the M.A. and Ph.D. degrees from the Department of Electrical Engineering, Princeton University, Princeton, NJ, in 2000 and 2004, respectively. He was a Postdoctorate Research Associate at Sabanci University, Istanbul, between 2004 and 2005. He is currently an Assistant Professor in the Department of Electronics Engineering, Isik University, Istanbul, Turkey, which he joined in August 2005. His research interests include image, video and graphics compression, video enhancement, wavelets and multiresolution representations, and computer vision. He is currently working on industrial- and government-sponsored projects related to video coding, high-definition TV and 3-D mesh compression. He is the author/co-author of more than 20 peer-reviewed publications. Dr. Ates serves on the technical committee of various IEEE conferences and journals and was session chair at ICIP 2006.
Michael T. Orchard (F'00) was born in Shanghai, China, and grew up in New York. He received the B.S. and M.S. degrees in electrical engineering from San Diego State University, San Diego, CA, in 1980 and 1986, respectively, and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, in 1988 and 1990, respectively. He has been a Professor in the Department of Electrical and Computer Engineering, Rice University, Houston, TX, since 2001. Prior to joining Rice University, he was on the faculty at the University of Illinois at Urbana-Champaign, Urbana, from 1990 to 1995, and at Princeton University from 1995 to 2001. He consulted with the Visual Communications Department, AT&T Bell Laboratories, from 1988 to 1999. Prof. Orchard was elected IEEE Fellow in 2000 for contributions to the theory and development of image and video compression algorithms.