High Performance Scalable Image Compression With EBCOT: David Taubman, Member, IEEE
Fig. 5. Typical vertical spectra traced from the image domain to the vertically high-pass subbands.
Fig. 7. Appearance of coding passes and quad-tree codes in each block’s embedded bit-stream.
1) "Forward Significance Propagation Pass," P1^(p): In this pass, we visit the sub-block samples in scan-line order, skipping over all samples which are either insignificant or do not have a so-called "preferred neighborhood." For the LH and HL subbands, a sample is said to have a preferred neighborhood if at least one of its two horizontal neighbors is significant. Recall that HL subband code-blocks are transposed so that both LH and HL subbands tend to contain horizontal line segments. The LL subband is treated in the same way for convenience, while the HH subbands' samples are said to have a preferred neighborhood if one or more of the four diagonal neighbors is significant. To each such sample, we apply the ZC and RLC primitives, as appropriate, to identify whether or not the sample first becomes significant in bit-plane p; if so, we invoke the SC primitive to code its sign. We call this the "forward significance propagation pass" because samples which have been found to be significant typically serve as seeds for waves of new significance determination steps which propagate in the direction of the scan.

2) "Reverse Significance Propagation Pass," P2^(p): This coding pass is identical to P1^(p), except that the samples are visited in the reverse order and the notion of a preferred neighborhood is expanded to encompass samples for which at least one of the eight immediate neighbors has already been found to be significant. Of course, we skip over samples for which information was coded in the previous pass.

3) "Magnitude Refinement Pass," P3^(p): During this pass we skip over all samples except those which are already significant and for which no information has been coded in the previous two passes. These samples are processed with the MR primitive.

4) "Normalization Pass," P4^(p): Here we code the least significant bit, in bit-plane p, of all samples not considered in the previous three coding passes, using the ZC and RLC primitives as appropriate; if a sample is found to be significant in this process, its sign is coded immediately using the SC primitive.

IV. LAYER FORMATION AND REPRESENTATION

In this section we consider the second tier coding engine of Fig. 3, which is responsible for efficiently identifying the contribution of each code-block to each bit-stream layer, along with other summary information for the code-blocks. Recall that the final bit-stream is composed of a collection of quality layers, Q_l. Together, layers Q_1 through Q_l contain the initial R_i^(n_i^l) bytes of each code-block, B_i. Here, we write n_i^l for the truncation point selected for the l-th quality layer, with some abuse of the notation established in Section II, where n_i^λ denotes the R-D optimal truncation point corresponding to a threshold of λ on the distortion-rate slope. Thus, n_i^l is short-hand for n_i^(λ_l), with λ_l denoting the distortion-rate threshold selected for layer l. Layer Q_l contains the incremental contribution, R_i^(n_i^l) − R_i^(n_i^(l−1)), from code-block B_i. As illustrated in Fig. 2, some code-blocks might contribute no bytes at all to some layers. Along with these incremental contributions, the length of the segment and the number of new coding passes in the segment must be explicitly identified. Finally, if B_i makes its first nonempty contribution to quality layer Q_l, then the most significant bit-plane, p_i^max, must also be identified, as mentioned in Section III-A.

We focus our description on the two quantities which exhibit substantial inter-block redundancy and show how this redundancy is exploited within the second tier coding engine; full details may be found in [19], [1]. These two quantities are p_i^max and the index, l_i, of the quality layer to which B_i first makes a nonempty contribution. The latter quantity, l_i, is encoded using a separate embedded quad-tree code within each subband as follows. Let A_j^t denote the sequence of quads at level t in the quad-tree, with t = 0 denoting the leaves (the code-blocks themselves) and the largest level denoting the root of the tree, representing the entire subband. Let l_j^t be the index of the first layer in which any code-block in quad A_j^t makes a nonempty contribution, i.e., the minimum of l_i over the code-blocks B_i in A_j^t. In each quality layer, Q_l, a binary quad-tree code is used to identify whether or not l_j^t <= l. That is, a single bit is used to identify whether or not l_j^t <= l for each quad at each level, t, in the tree, skipping over quads for which the value of this bit may be inferred by the decoder, for one of the following two reasons: 1) l_j^t < l, in which case the value of the bit has already been identified in a previous quality layer; or 2) l_j'^(t+1) > l, where A_j^t belongs to the parent quad, A_j'^(t+1), in which case we must have l_j^t > l. To see how this code exploits inter-block redundancy, consider an initial set of lowest quality layers which correspond to very low bit-rates; in this case, it is reasonable to suppose that none of the code-blocks from the highest frequency subbands have sufficiently steep distortion-rate slopes to make any contribution to these layers. The quad-tree code for each such subband consists of a single 0 bit for each of these empty layers. More generally, the distortion-rate slopes for individual code-blocks depend upon local image statistics; if these statistics vary slowly over the image then neighboring code-blocks should have similar or identical l_i values.

The other quantity which exhibits substantial inter-block redundancy is p_i^max. One might consider using a similar embedded quad-tree code to represent this quantity. However, the value of p_i^max is irrelevant until the quality layer in which the code-block first makes a contribution to the bit-stream, and the code-blocks in any given subband do not generally all make their first contribution in the same quality layer. The embedding principle suggests that we should avoid sending any unnecessary information concerning p_i^max until layer l_i. EBCOT achieves this, while still exploiting inter-block redundancy in coding the p_i^max values, by means of a modified embedded quad-tree, which is driven from the leaves rather than the root of the tree. Specifically, let p_j^t denote the corresponding elements of the quad-tree structure built on top of the
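The embedded quad-tree signaling of the first-contribution layers admits a compact sketch. The following Python fragment is illustrative only: the function names are ours, the bit ordering is simplified, and it is not the normative EBCOT/JPEG2000 syntax. For each quality layer l it emits one bit per quad, from the root downward, indicating whether the quad's minimum first-contribution layer is at most l, and it skips exactly those bits a decoder could infer for the two reasons given above.

```python
def build_levels(leaf_layers):
    """leaf_layers: 2-D list holding, for each code-block (leaf), the index
    of the first quality layer to which it contributes. Returns a pyramid
    levels[0] = leaves .. levels[-1] = root, each quad storing the minimum
    leaf value it covers."""
    levels = [leaf_layers]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        prev = levels[-1]
        h, w = len(prev), len(prev[0])
        nxt = [[min(prev[y][x]
                    for y in range(2 * qy, min(2 * qy + 2, h))
                    for x in range(2 * qx, min(2 * qx + 2, w)))
                for qx in range((w + 1) // 2)]
               for qy in range((h + 1) // 2)]
        levels.append(nxt)
    return levels

def encode_layer(levels, l, emitted):
    """Emit the quad-tree bits for quality layer l; `emitted` records quads
    already signaled as significant in earlier layers."""
    bits = []
    for t in range(len(levels) - 1, -1, -1):  # root first, leaves last
        for qy, row in enumerate(levels[t]):
            for qx, lmin in enumerate(row):
                if (t, qy, qx) in emitted:
                    continue  # inferable: already identified in an earlier layer
                if t < len(levels) - 1 and (t + 1, qy // 2, qx // 2) not in emitted:
                    continue  # inferable: parent quad still exceeds l
                bit = 1 if lmin <= l else 0
                bits.append(bit)
                if bit:
                    emitted.add((t, qy, qx))
    return bits
```

A decoder reading the bits in the same order can reconstruct every block's first-contribution layer without side information; when all blocks of a subband are inactive in the lowest layers, each such layer costs a single 0 bit at the root, which is how the inter-block redundancy is exploited.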
1166 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000
V. NUMERICAL RESULTS
Table III provides numerical results to illustrate the performance of the proposed EBCOT algorithm under a variety of conditions. Results are presented for the well-known USC images, "Lenna" and "Barbara," as well as the three most popular test images from the JPEG2000 test suite, "bike," "cafe," and "woman," which are substantially more complex and less blurred than the USC images. The first column of PSNR results corresponds to the well-known SPIHT [13] algorithm with arithmetic coding. The remaining columns are obtained with the EBCOT algorithm, running within the framework of JPEG2000 Verification Model VM3A [20]. In all cases, we use the popular Daubechies 9/7 bi-orthogonal wavelet filters with a five-level transform. For EBCOT we use code-blocks of size 64 × 64 with sub-blocks of size 16 × 16.

Recall that the EBCOT bit-stream is composed of a collection of quality layers and that SNR scalability is obtained by discarding unwanted layers. The second column in the table corresponds to a bit-stream with only one layer, so that the overall bit-stream is not SNR scalable. Results in this case are obtained by generating a separate compressed bit-stream for each of the relevant bit-rates. Each of the remaining columns is obtained by truncating a single bit-stream to the relevant bit-rates. The third column corresponds to a limited form of SNR scalability in which there are only five quality layers, optimized for each of the target bit-rates in the table; this may be sufficient for some applications. The fourth column corresponds to the extreme case in which 50 separate layers are included in the bit-stream, spanning bit-rates ranging from approximately 0.05 bpp to 2.0 bpp; in this case, the layer bit-rates are spaced approximately logarithmically through this range by selecting an appropriate set of distortion-rate slope parameters, λ_l, but no rate-control iteration is performed to adjust the values for specific target bit-rates.

As might be expected, performance decreases as more layers are added to the bit-stream, because the overhead associated with identifying the contributions of each code-block to each layer grows. Nevertheless, performance continues to be competitive with respect to state-of-the-art compression algorithms, significantly outperforming the common reference, SPIHT. All results for the EBCOT algorithm are obtained using a single quantization step size, regardless of the image or bit-rate, with rate control implemented exclusively through truncation of the embedded block bit-streams. For the smaller images, at very low bit-rates the EBCOT results are slightly penalized by the 59-byte header included by the JPEG2000 VM3A software [20]. Some of the results appear to be counter-intuitive. Specifically, at 0.25 bpp, the performance for "Lenna" with five layers appears to be marginally higher than that with only one layer, even though the overhead associated with signaling five layers is undoubtedly
higher. Similar behavior is observed with the "Barbara" image. The explanation for this lies with the observation in Section II that it is not generally possible to find a distortion-rate threshold, λ, which satisfies the rate constraint exactly. As a result, the portion of the bit-stream which is actually decoded in SNR progressive tests sometimes includes part of the next quality layer after that which was optimized for the target bit-rate, as in Fig. 2.

Whereas the first four columns of PSNR results in Table III are obtained using a five-level Mallat decomposition of the form shown in Fig. 1(a), the fifth column corresponds to a five-level decomposition of the form shown in Fig. 1(b), using five targeted bit-stream layers as for the second column. This has been coined the "Spacl" decomposition within the JPEG2000 community. Evidently, this decomposition structure typically leads to lower MSE distortion (higher PSNR) at all but the highest bit-rates. We point out that tree-based coders such as SPIHT cannot readily be adapted to non-Mallat decomposition structures.

VI. VISUAL DISTORTION METRICS AND PERFORMANCE

The numerical results in Table III are obtained with MSE as the distortion metric for the PCRD optimization algorithm of Section II. It is well known that MSE is a poor model for the visual significance of distortion. Various authors (e.g., [7], [8]) have considered the relatively straightforward extension to frequency-weighted MSE, in which the overall image distortion is taken to be a weighted sum of the MSE contributions from each subband, with the weights derived from studies of the contrast sensitivity function (CSF). These approaches have two notable drawbacks: the weights depend strongly upon the angle subtended by each reconstructed pixel at an assumed viewing distance; and the model fails to account for the substantial impact of visual masking effects. In fact, the CSF accounts primarily for the modulation transfer function (MTF) of the physical properties of the optics and aperture of the cones in the human eye; the MTF of the relevant display device is often also incorporated. More generally, we may consider spatially varying distortion metrics which attempt to exploit the masking phenomenon. Watson's work [22] on visual optimization of JPEG compressed images is noteworthy in this regard, as is the work of Höntsch and Karam [3]. In these and other previous works, visual masking effects must be taken into account by explicitly modifying the quantization parameters; scalable compression is not considered; and rate-control must be performed iteratively. The EBCOT algorithm provides an excellent context within which masking phenomena can be exploited without substantially increasing computational complexity or sacrificing other properties such as random access or scalability. In our studies, the following distortion metric has been found to yield significantly better visual image quality than the MSE metric:

D_i = w_{b_i}^2 * Σ_k ( ŝ_i[k] − s_i[k] )^2 / ( V + M_i[k] )^2        (4)

Here w_{b_i}, s_i[k], and ŝ_i[k] are all as in Section II, M_i[k] is the "visual masking strength" at sample k, and V is a "visibility floor" term which establishes the visual significance of distortion in the absence of masking.

The proposed visual masking strength operator, M_i[k], has the form

M_i[k] = (1 / |N_i[k]|) * Σ_{j ∈ N_i[k]} |s_i[j]|^ρ        (5)

Here N_i[k] denotes a neighborhood of samples about sample k and |N_i[k]| denotes the size of this neighborhood. The nonlinear operation is to be understood within the context of normalized image samples with a range of 0 to 1 and a normalized wavelet transform whose low-pass and high-pass analysis filters have unit gain at DC and the Nyquist frequency, respectively. For our experiments, the exponent, ρ, is set to a fixed value, and the neighborhoods, N_i[k], are obtained by partitioning the code-block into 8 × 8 cells, using the same masking strength value for all samples in any given cell. This reduces the complexity of computing the visual distortion metric to a small fraction of that for the entire encoder. Our formulation is closely related to the models used in [22] and [3], which adopt different values for the exponent; in our experiments, our choice appears to give superior visual performance. We use a single small visibility floor, V, for all subbands, so that the distortion metric is rendered independent of any assumptions on the viewing distance, which is highly desirable for uncontrolled viewing conditions.

Not surprisingly, this visual masking metric has a greater effect on image quality when the code-blocks are small; we find the best performance over a wide range of images when using 32 × 32 code-blocks. Fig. 8 provides a comparison of SPIHT [13] with arithmetic coding against EBCOT, operating in the context of JPEG2000 VM3A with this visual distortion metric. When optimizing for MSE alone, the visual quality of EBCOT compressed images is very similar to that of SPIHT. Although only small segments from the 2560 × 2048 image "woman" can be reproduced here, we note that quality is uniformly improved over the entire image. The EBCOT images exhibit substantially less ringing around edges and superior rendition of texture; some details preserved in the EBCOT images are completely lost by the SPIHT algorithm. In fact, for this image we find that the image quality obtained using EBCOT at 0.2 bpp is comparable to that obtained using SPIHT at 0.4 bpp. Similar results are observed with other large images having comparable content, although the method is less effective with some image types. Although we make no assumptions here concerning viewing distance, other studies [8] have shown that the visual masking metric outlined here can be successfully combined with an appropriate global compensation for known CSF characteristics, yielding complementary improvements in quality.

VII. UTILITY OF THE RANDOM ACCESS ATTRIBUTE

Since EBCOT partitions each subband into relatively small code-blocks and codes each of them independently, it would appear to be suited to applications requiring some degree of "random access" into the image. At one extreme we may consider applications which intend to decompress the entire image, but in a different order to that in which it was compressed,
Fig. 8. Comparison between SPIHT (left) and EBCOT using the visual masking metric (right). Shows 500 × 500 regions from the 2560 × 2048 test image "woman," at 0.15 bpp.
e.g., from bottom to top or left to right. Since the code-blocks can be decoded in any order and the wavelet transform may be incrementally implemented in any direction, it is clear that EBCOT is well suited to such applications. At the opposite extreme we consider applications which require only a small, arbitrarily located region in the original image. In general, the number of subband samples represented by the code-blocks required to synthesize the requested image region will be larger than the region itself. In this section we attempt to quantify this inherent inefficiency. Remote browsing applications lie between these two extremes: a client interactively requests regions from a server, which may or may not ultimately constitute the entire image. The efficient realization of such applications depends upon careful caching of code-blocks which will often belong to multiple regions.

For the purposes of this investigation, we assume that a square image region with W × W pixels must be decompressed from an EBCOT bit-stream representing an original image with very much larger dimensions, so that we can ignore boundary conditions. Fig. 9 contains a log-log plot of B_W vs. W, where B_W is the number of bits required to reconstruct the requested region and R is the compressed bit-rate of the original image. Thus, the
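The source of this inefficiency can be illustrated with a small one-dimensional sketch. The function below estimates which code-blocks are needed at each decomposition level to synthesize samples [x0, x1) of the image; the function name, the filter-support extension ext = 4 (a rough figure for 9/7-style filters), and the 64-sample blocks are illustrative assumptions rather than values taken from this paper.

```python
def blocks_for_region(x0, x1, depth, ext=4, block=64):
    """1-D sketch: per decomposition level, the range of code-block indices
    required to synthesize image samples [x0, x1). At each level the region
    maps to half resolution and grows by the filter support `ext` per side."""
    needed = []
    lo, hi = x0, x1
    for _ in range(depth):
        # halve the coordinates and expand by the synthesis filter support
        lo, hi = lo // 2 - ext, (hi + 1) // 2 + ext
        first = max(lo, 0) // block
        last = max(hi - 1, 0) // block
        needed.append(range(first, last + 1))
        # the low-pass band at this level feeds the next, coarser level
    return needed
```

Because the required subband extent always exceeds the mapped region, the fraction of fetched samples that lie outside the region grows as the region shrinks, which is the effect quantified in Fig. 9.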
TABLE IV
CPU DECODING TIMES OBTAINED USING JPEG2000 VM5.1 WITH A 400 MHz PENTIUM II PROCESSOR. CPU TIMES EXPRESSED IN SECONDS PER MILLION IMAGE PIXELS

The independent coding principle in EBCOT ensures that multiple code-blocks may be processed in parallel; moreover, microscopic parallelism at the level of individual coding passes can be exploited for more efficient hardware implementations. To enable this behavior, the arithmetic codeword generation process can be reset at the commencement of each coding pass and the various coding contexts reset to their skewed initial states. The context quantization process for the various coding primitives may also be constrained to be vertically stripe-causal, meaning that samples from future stripes may be considered insignificant when determining the coding contexts. These options typically result in an additional loss of about 0.15 dB with 64 × 64 code-blocks and about 0.35 dB with 32 × 32 code-blocks at modest bit-rates. Most of the ideas behind these parallel processing options are explained in [11], [12].

A so-called lazy coding option has also been introduced, in which the arithmetic coding procedure is completely bypassed for most of the significance propagation and magnitude refinement coding passes. This mode substantially reduces complexity at high bit-rates, with a loss of less than 0.1 dB in compression performance.

IX. CONCLUSIONS

The EBCOT image compression algorithm offers state-of-the-art compression performance together with an unprecedented set of bit-stream features, including resolution scalability, SNR scalability and a "random access" capability. All features can coexist within a single bit-stream without substantial sacrifices in compression efficiency.

The ability to produce independent, finely embedded bit-streams for relatively small blocks of subband samples enables the effective use of post-compression rate-distortion optimization. In turn, this enables the successful exploitation of visual masking, which has been hampered by causality constraints in more conventional compression frameworks. The EBCOT algorithm also introduces the concept of abstract quality layers which are not directly related to the structural properties of the underlying entropy coder. This endows the bit-stream with tremendous flexibility, since the encoder may decide to construct any number of layers from any combination of code-block contributions whatsoever.

REFERENCES

[1] JPEG2000 Verification Model 5.0 (Technical Description), ISO/IEC JTC1/SC29/WG1 N1420, 1999.
[2] H. Everett, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Oper. Res., vol. 11, pp. 399–417, 1963.
[3] I. Höntsch and L. Karam, "APIC: Adaptive perceptual image coding based on subband decomposition with locally adaptive perceptual weighting," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1997, pp. 37–40.
[4] J. Li and S. Lei, "Rate-distortion optimized embedding," in Proc. Picture Coding Symp., Berlin, Germany, Sept. 10–12, 1997, pp. 201–206.
[5] S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–693, July 1989.
[6] Reduced Complexity Entropy Coding, ISO/IEC JTC1/SC29/WG1 N1312, June 1999.
[7] A. Mazzarri and R. Leonardi, "Perceptual embedded image coding using wavelet transforms," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1995, pp. 586–589.
[8] Quality Improvement Using Contrast Sensitivity Filtering, ISO/IEC JTC1/SC29/WG1 N1306, June 1999.
[9] D. Nister and C. Christopoulos, "Lossless region of interest with a naturally progressive still image coding algorithm," in Proc. IEEE Int. Conf. Image Processing, Oct. 1998, pp. 856–860.
[10] E. Ordentlich, M. Weinberger, and G. Seroussi, "A low-complexity modeling approach for embedded coding of wavelet coefficients," in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 1998, pp. 408–417.
[11] E. Ordentlich, D. Taubman, M. Weinberger, G. Seroussi, and M. Marcellin, "Memory efficient scalable line-based image coding," in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 1999, pp. 218–227.
[12] Line-Based Coding—On Merging VMA and VMB, ISO/IEC JTC1/SC29/WG1 N1201, Mar. 1999.
[13] A. Said and W. Pearlman, "A new, fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243–250, June 1996.
[14] J. M. Shapiro, "An embedded hierarchical image coder using zerotrees of wavelet coefficients," in Proc. IEEE Data Compression Conf., Snowbird, UT, 1993, pp. 214–223.
[15] F. Sheng, A. Bilgin, P. Sementilli, and M. Marcellin, "Lossy and lossless image compression using reversible integer wavelet transforms," in Proc. IEEE Int. Conf. Image Processing, Oct. 1998.
[16] D. Taubman and A. Zakhor, "Multi-rate 3-D subband coding of video," IEEE Trans. Image Processing, vol. 3, pp. 572–588, Sept. 1994.
[17] D. Taubman, "Directionality and scalability in image and video compression," Ph.D. dissertation, Univ. Calif., Berkeley, 1994.
[18] Embedded, Independent Block-Based Coding of Subband Data, ISO/IEC JTC1/SC29/WG1 N871R, July 1998.
[19] EBCOT: Embedded Block Coding with Optimized Truncation, ISO/IEC JTC1/SC29/WG1 N1020R, Oct. 1998.
[20] JPEG2000 Verification Model VM3A, ISO/IEC JTC1/SC29/WG1 N1143, Feb. 1999.
[21] Proposal of the Arithmetic Coder for JPEG2000, ISO/IEC JTC1/SC29/WG1 N762, Mar. 1998.
[22] A. B. Watson, "DCT quantization matrices visually optimized for individual images," Proc. SPIE, vol. 1913, pp. 202–216, 1993.

David Taubman (S'93–M'95) received the B.S. degree in computer science and mathematics in 1986 and the B.E. degree in electrical engineering in 1988, both from the University of Sydney, Australia. He received the M.S. degree in 1992 and the Ph.D. degree in 1994, both in electrical engineering, from the University of California, Berkeley.
From 1994 to 1998, he was with Hewlett-Packard Research Laboratories, Palo Alto, CA, joining the University of New South Wales, Sydney, in 1998 as a Senior Lecturer in the School of Electrical Engineering. Since 1998, he has been heavily involved in the working group ISO/IEC JTC1/SC29/WG1, developing the JPEG2000 image compression standard; he is the author of the Verification Model (VM3A) software and accompanying documentation upon which the standard is based. His research interests include highly scalable image and video compression technology, inverse problems in imaging, perceptual modeling and optimization, and network distribution of compressed multimedia content.
Dr. Taubman received the University Medal from the University of Sydney in 1988 for electrical engineering. In the same year, he also received the Institute of Engineers, Australia, Prize and the Texas Instruments Prize for Digital Signal Processing, Australasia. His 1996 paper with A. Zakhor, entitled "A common framework for rate and distortion based scaling of highly scalable compressed video," was awarded Best Paper by the IEEE Circuits and Systems Society in 1997.