High Performance Scalable Image Compression With EBCOT: David Taubman, Member, IEEE

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

1158 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO.

7, JULY 2000

High Performance Scalable Image Compression


with EBCOT
David Taubman, Member, IEEE

Abstract—A new image compression algorithm is proposed,


based on independent Embedded Block Coding with Optimized
Truncation of the embedded bit-streams (EBCOT). The algorithm
exhibits state-of-the-art compression performance while pro-
ducing a bit-stream with a rich set of features, including resolution
and SNR scalability together with a “random access” property.
The algorithm has modest complexity and is suitable for applica-
tions involving remote browsing of large compressed images. The
algorithm lends itself to explicit optimization with respect to MSE
as well as more realistic psychovisual metrics, capable of modeling
the spatially varying visual masking phenomenon.
Index Terms—Embedded coding, image compression, JPEG
2000, random access, rate-distortion optimization, scalability,
visual masking.

(a)
I. INTRODUCTION

T HIS paper describes a novel image compression algorithm


known as EBCOT. The acronym is derived from the de-
scription “embedded block coding with optimized truncation”
(EBCOT) which identifies some of the major contributions
of the algorithm. The EBCOT algorithm is related in various
degrees to much earlier work on scalable image compression.
Noteworthy among its early predecessors are Shapiro’s EZW
(embedded zero-tree wavelet compression) algorithm [14],
Said and Pearlman’s SPIHT (spatial partitioning of images into
hierarchical trees) algorithm [13] and Taubman and Zakhor’s
LZC (layered zero coding) algorithm [16]. Like each of these,
the EBCOT algorithm uses a wavelet transform to generate the
subband samples which are to be quantized and coded, where
the usual dyadic decomposition structure attributed to Mallat (b)
[5] is typical, but other “packet” decompositions are also Fig. 1. (a) Mallat and (b) “Spacl” wavelet decomposition structure for three
supported and occasionally preferable. Fig. 1 illustrates two decomposition levels—five are used in the experiments.
decomposition structures which are considered in this paper.
In each case, the original image is represented in terms of a
efficient compression of the original image at a reduced resolu-
collection of subbands, , , which may be organized
tion or increased distortion. The terms “resolution scalability”
into increasing resolution levels, , . The lowest
and “SNR scalability” are commonly used in connection with
resolution level consists of the single LL subband, .
this idea. We say that a bit-stream is resolution scalable if it
Each successive resolution level, , contains the additional
contains distinct subsets, , representing each successive res-
subbands, , which are required to reconstruct the image with
olution level, . We say that a bit-stream is SNR scalable if it
twice the horizontal and vertical resolution.
contains distinct subsets, , such that together repre-
Scalable compression refers to the generation of a bit-stream
sent the samples from all subbands at some quality (SNR) level,
which contains embedded subsets, each of which represents an
. A bit-stream may be both resolution and SNR scalable if it
contains distinct subsets, , which hold the relevant quality
Manuscript received May 4, 1999; revised October 13, 1999. This work was refinement of only those subbands in resolution level . A key
largely completed while the author was at Hewlett-Packard Research Laborato-
ries, Palo Alto, CA. The associate editor coordinating the review of this manu-
advantage of scalable compression is that the target bit-rate or
script and approving it for publication was Prof. Touradj Ebrahimi. reconstruction resolution need not be known at the time of com-
The author is with the School of Electrical Engineering, University of New pression. A related advantage of practical significance is that the
South Wales, Sydney, Australia.
Publisher Item Identifier S 1057-7149(00)05436-1. image need not be compressed multiple times in order to achieve

1057–7149/00$10.00 © 2000 IEEE


TAUBMAN: HIGH PERFORMANCE SCALABLE IMAGE COMPRESSION WITH EBCOT 1159

a target bit-rate, as is common with the existing JPEG compres-


sion standard.
In EBCOT, each subband is partitioned into relatively small
blocks of samples, which we call code-blocks. EBCOT gen-
erates a separate highly scalable (or embedded) bit-stream for
each code-block, . The bit-stream associated with may
be independently truncated to any of a collection of different
lengths, , where the increase in reconstructed image distor-
tion resulting from these truncations is modeled by . An en-
abling observation leading to the development of the EBCOT
algorithm is that it is possible to independently compress rela-
tive small code-blocks (say 32 32 or 64 64 samples each)
Fig. 2. Progressive appearance of embedded code-block bit-streams in quality
with an embedded bit-stream consisting of a large number of layers. Only nine blocks and three layers shown for simplicity. The shaded
truncation points, , such that most of these truncation points region identifies the block contributions which are discarded by truncating the
lie on the convex hull of the corresponding rate-distortion curve. bit-stream between layers 1 and 2.
To achieve this efficient, fine embedding, the EBCOT block
coding algorithm builds upon the fractional bit-plane coding the corresponding lengths, . Such a bit-stream is clearly
ideas which have recently been introduced by Ordentlich et al. resolution scalable, because the information representing the
[10] and by Li and Lei [4]. The embedded block coding algo- individual code-blocks and hence the subbands and resolution
rithm is developed in Section III. levels is clearly delineated. Also, the bit-stream possesses a
useful “random access” attribute: given any region of interest
A. Efficient One-Pass Rate Control and a wavelet transform with finite support kernels, as is
Given a target bit-rate, say , we can truncate each of common, it is possible to identify the region within each
the independent code-block bit-streams in an optimal way so subband and hence the code-blocks which are required to
as to minimize distortion subject to the bit-rate constraint. We correctly reconstruct the region of interest [9].
refer to this as post-compression rate-distortion (PCRD) opti- Interestingly, this simple bit-stream organization is not itself
mization, because the rate-distortion algorithm is applied after SNR scalable, despite the fact that it is composed of SNR scal-
all the subband samples have been compressed. The PCRD op- able block bit-streams. This is because only a single truncation
timization algorithm is described in Section II. point and length are identified within the final bit-stream for
Although image compression schemes involving rate-dis- each code-block. The EBCOT algorithm overcomes this dif-
tortion optimization abound in the literature, the advantage of ficulty by collecting incremental contributions from the var-
PCRD optimization is its reduced complexity. The image need ious code-blocks into so-called quality layers, , such that
only be compressed once, after which the PCRD algorithm the code-block contributions represented by layers through
consumes negligible computational resources in passing over form a rate-distortion optimal representation of the image,
the embedded block bit-streams. Perhaps even more impor- for each . This is easily achieved with the aid of the PCRD
tantly, there is no need to buffer the entire image or indeed algorithm described in Section II. In this way, truncating the
any quantity comparable to the size of the image. The wavelet bit-stream to any whole number of layers yields a rate-distor-
transform and block coding operations may be implemented tion optimal representation of the image, while truncating to an
incrementally using a relatively small amount of memory intermediate bit-rate yields a bit-stream which is approximately
which is proportional to one linear dimension of the image (say optimal provided the number of quality layers is relatively large.
its width), as explained in [19]. Thus, the only representation of Fig. 2 illustrates the layered bit-stream concept; it also illus-
the image which must be buffered prior to PCRD optimization trates the effect of truncating the bit-stream between the first and
is the embedded block bit-streams, which are generally much second layers. Each quality layer must include auxiliary infor-
smaller than the original image. In fact,it is also possible to mation to identify the size of each code-block’s contribution to
perform the PCRD optimization step incrementally so that the layer. When the number of layers is large, only a subset of the
only a fraction of the compressed block bit-streams need be code-blocks will contribute to any given layer, introducing sub-
buffered. Earlier work on PCRD optimization may be found stantial redundancy in this auxiliary information. To take advan-
in [17]. The key features which distinguish EBCOT from this tage of this, EBCOT introduces a “second tier” coding engine
previous approach are the availability of more finely embedded to compress the auxiliary information for each quality layer.
bit-streams and the use of much smaller blocks of subband EBCOT’s layered bit-stream organization and two-tiered
samples. coding strategy represent a novel departure from current con-
vention. Image compression algorithms previously described
in the literature generate bit-streams whose organization is tied
B. Feature-Rich Bit-Streams
concretely to the structure of the embedded quantization and
The simplest incarnation of the concepts mentioned above is coding algorithm which is used. EBCOT, however, constructs
a bit-stream generated by concatenating the suitably truncated abstract bit-stream layers, whose relationship to the trunca-
representations of each code-block, , including sufficient tion points offered by the underlying block coding engine is
auxiliary information to identify the truncation points, , and entirely arbitrary and is itself compressed. Fig. 3 illustrates
1160 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000

whose embedded bit-streams may be truncated to rates, .


The contribution from to distortion in the reconstructed
image is denoted , for each truncation point, . We are thus
assuming that the relevant distortion metric is additive, i.e.,

(1)

where represents overall image distortion and denotes the


truncation point selected for code-block . In our experimental
work, two different distortion metrics are considered. An addi-
tive distortion metric which approximates Mean Squared Error
(MSE) is obtained by setting

Here, denotes the two-dimensional (2-D) sequence of sub-


Fig. 3. Two-tiered coding structure of the EBCOT image compression band samples in code-block , denotes the quantized rep-
algorithm.
resentation of these samples associated with truncation point ,
this two-tiered compression paradigm. The layered abstraction and denotes the L2-norm of the wavelet basis functions for
and associated coding techniques are discussed further in Sec- the subband, , to which code-block belongs. This approxi-
tion IV. Useful bit-stream organizations range from single-layer mation is valid provided the wavelet transform’s basis functions
streams which possess only the resolution scalable and random are orthogonal or the quantization errors in each of the samples
access attributes, through streams with a few layers targeted are uncorrelated. Neither of these requirements is strictly sat-
at specific bit-rates of interest, and ultimately to streams with isfied; however, the wavelet kernels used in our experimental
a large number of layers, which offer excellent generic SNR investigations in Section V have nearly orthogonal basis func-
scalability, in combination with the resolution scalable and tions. A second distortion metric which correlates more suc-
random access attributes. EBCOT is able to achieve all of these cessfully with perceived visual distortion is investigated in Sec-
features in a single bit-stream while exhibiting state-of-the-art tion VI.
compression performance, as demonstrated in Section V. By We now briefly discuss the optimal selection of the truncation
contrast, well known scalable image compression algorithms points, , so as to minimize distortion subject to a constraint,
such as EZW [14] and SPIHT [13] offer only SNR scalability, , on the available bit-rate, i.e.,
to which the LZC algorithm [16] adds resolution scalability.
The utility of the random access attribute is examined in (2)
Section VII. It is worth pointing out that the IW44 algorithm
in AT&T’s DjVu document compression system also achieves
resolution and SNR scalability in combination with the random The procedure is not novel [2], but is summarized here for com-
access attribute, using similar techniques to EBCOT; however, pleteness.
the abstract layering and PCRD optimization concepts are It is easy to see that any set of truncation points, , which
missing from IW44, which also has a less efficient embedded minimizes
representation for each code-block. The DjVu specification is
available at http://djvu.research.att.com/djvu/sci/djvuspec. (3)
As a result of this rich set of features, modest implementa-
tion complexity, and excellent compression performance, the for some is optimal in the sense that the distortion cannot be
EBCOT algorithm was adopted for inclusion in the evolving reduced without also increasing the overall rate and vice-versa.
JPEG2000 image compression standard at the Los Angeles Thus, if we can find a value of such that the truncation points
international meeting of ISO/IEC JTC1/SC29/WG1 (JPEG which minimize (3) yield , then this set of trunca-
working group) in November 1998. Most features of the algo- tion points must be an optimal solution to our R-D optimization
rithm were initially described in [18] and later in [19] as part of problem. Since we have only a discrete set of truncation points,
this standardization effort, but the work has not previously been we will not generally be able to find a value of for which
published in the public arena. Since its original acceptance for is exactly equal to . Nevertheless, since EBCOT’s
JPEG2000, the algorithm has undergone several modifications
code-blocks are relatively small and there are many truncation
to further reduce implementation complexity [6]; these are
points, it is sufficient in practice to find the smallest value of
outlined in Section VIII for the benefit of readers who are
such that .
interested in the relationship between EBCOT and JPEG2000.
The determination of the optimal truncation points, , for
any given , may be performed very efficiently, based on a small
II. RATE DISTORTION OPTIMIZATION amount of summary information collected during the genera-
Recall that EBCOT partitions the subbands representing the tion of each code-block’s embedded bit-stream. It is clear that
image into a collection of relatively small code-blocks, , we have a separate minimization problem for each code-block,
TAUBMAN: HIGH PERFORMANCE SCALABLE IMAGE COMPRESSION WITH EBCOT 1161

. A simple algorithm to find the truncation point, , which


minimizes , is as follows:
• initialize ; Fig. 4. Deadzone quantizer thresholds.
• for
set and ; explicit sub-block significance coding is that significant sam-
if then update . ples tend to be clustered so that the opportunity frequently ex-
Since this algorithm must be executed for many different ists to dispose of a large number of samples by coding a single
values of , we first find the subset, , of feasible truncation binary symbol. This is the same assumption which underlies
points. Let be an enumeration of these quad-tree and zero-tree coding algorithms as in [14], [13]. In our
feasible truncation points and let the corresponding distor- case, however, we exploit the block-based clustering assump-
tion-rate “slopes” be given by where tion only down to relatively large sub-blocks of size 16 16,
and . Evidently, rather than individual samples.
the slopes must be strictly decreasing, for if
then the truncation point, , could never be selected by the A. Quantization and Significance
above algorithm, regardless of the value of , contradicting
Following the notation of Section II, let de-
the fact that is the set of feasible truncation points. When
note the 2-D sequence of subband samples belonging to code-
restricted to a set of truncation points whose slopes are strictly
block . For the LH (vertically high-pass) subband, as well as
decreasing, the above algorithm reduces to the trivial selection
the HH and LL subbands, and denote horizontal and ver-
so that each such point must
tical position, respectively. For the HL (horizontally high-pass)
be a valid candidate for some value of . It follows that is
subband, however, we first transpose the code-block so that
the largest set of truncation points for which the corresponding
and correspond to vertical and horizontal position, respec-
distortion-rate slopes are strictly decreasing. This unique set
tively, in the original image. This transposition allows us to
may be determined using a conventional convex hull analysis.
treat subbands with LH and HL orientation in exactly the same
In a typical implementation of the EBCOT algorithm, , is
way, thereby simplifying the ensuing description. Let
determined immediately after the bit-stream for has been
denote the sign of and let denote the quan-
generated. The rates, and slopes, , for each ,
tized magnitude, i.e.,
are kept in a compact form along with the embedded bit-stream
until all code-blocks have been compressed, at which point the
search for the optimal and proceeds in a straightforward
manner. It is worth emphasizing that only rate and slope values
must be stored, not the distortion. This requires only a fraction where is the step-size for subband and is the subband to
of the storage for the embedded bit-stream itself. which code-block belongs. Fig. 4 depicts the thresholds for
this so-called “deadzone” quantizer. Evidently the quantizer has
uniformly spaced thresholds, except in the interval containing 0,
III. BLOCK CODING
which is twice as large.
In this section, we describe the actual block coding algo- Let denote the th bit in the binary representation of
rithm, which generates a separate embedded bit-stream for each , where corresponds to the least significant bit. Also,
code-block, . The algorithm relies upon the use of classical let denote the maximum value of (i.e. the most signif-
context adaptive arithmetic coding to efficiently represent a col- icant bit) such that for at least one sample in the
lection of binary symbols. The coder is essentially a bit-plane code-block. It turns out to be most efficient to encode the value
coder, using similar techniques to those of the LZC algorithm of in the second tier coding algorithm, as described in Sec-
[16]. The key enhancements are: 1) the use of “fractional bit- tion IV. The idea behind bit-plane coding is to encode first the
planes,” in which the quantization symbols for any given bit- most significant bits, , for all samples in the code-block,
plane are separated into multiple coding passes; 2) careful re- then the next most significant bits, , and so forth until
duction of the number of model contexts for arithmetic coding; all bit-planes have been encoded. If the bit-stream is truncated
and 3) the code-block is further partitioned into “sub-blocks,” then some or all of the samples in the block may be missing one
with the significance of each sub-block coded explicitly prior or more least significant bits, which is equivalent to having used
to sample-by-sample coding in the significant sub-blocks. The a coarser dead-zone quantizer for the relevant samples, with step
use of fractional bit-planes is motivated by separate work by size , where is the index of the last available bit-plane for
Ordentlich et al. [10] and by Li and Lei [4]; its purpose is to the relevant sample.
ensure a sufficiently fine embedding. Another variation on the In order to efficiently encode , it is important to ex-
fractional bit-plane concept was introduced into an early incar- ploit previously encoded information about the same sample
nation of the JPEG2000 verification model [15]. The introduc- and neighboring samples. We do this primarily by means of
tion of sub-blocks, with explicit coding of whether or not each a binary-valued state variable, , which is initialized to 0,
sub-block contains at least one significant sample in the relevant but transitions to 1 when the relevant sample’s first nonzero
bit-plane, is a useful tool for reducing the model adaptation cost bit-plane, , is encoded. We refer to the state, ,
as well as implementation complexity. The assumption behind as the sample’s “significance.” The point at which a sample
1162 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000

becomes significant depends intimately upon the sequence in TABLE I


which sample values are encoded, which is the subject of Sec- ASSIGNMENT OF THE NINE ZC CONTEXTS BASED ON NEIGHBOURHOOD
SIGNIFICANCE
tion III-D.

B. Sub-Block Significance Coding Front-End


Each block, , is partitioned into a 2-D sequence of sub-
blocks, , whose size is typically 16 16. For each bit-
plane, , we first encode information to identify
those sub-blocks which contain one or more significant sam-
ples; all other sub-blocks are by-passed in the remaining coding
phases for that bit-plane, which reduces complexity as well as
the cost of adapting probability models to highly skewed statis-
tics. Let denote the significance of sub-block ,
in bit-plane . That is, if and only if eight neighbors. In fact, almost all the relevant information ap-
for some ; otherwise, . There are a va- pears to be captured by the significance of these neighbors,
riety of ways to encode the values, , in each successive which we group into the following three categories:
bit-plane, of which one of the most obvious is quad-tree coding. horizontal: we write so that
In brief, we introduce a tree structure by identifying ;
the sub-blocks with the leaf nodes, i.e. , vertical: we write so that
and defining higher levels in the tree according to ;
, . At the diagonal: we write
root of the tree, we have , representing the so that .
entire code-block. In any given bit-plane, , the significance of Neighbors which lie outside the code-block are interpreted as in-
the quads, , is identified one level at a time, starting significant, so as to ensure that the block bit-streams are truly in-
from the root at and working to the leaves at . In our dependent. No such assumption is imposed on neighbors which
implementation, these binary significance symbols are sent to lie outside the relevant sub-block, however, so that sub-blocks
the arithmetic coding engine as uniformly distributed symbols are by no means independent.
without any adaptive model whatsoever; however redundant To minimize both model adaptation cost and implementa-
symbols are always skipped. A symbol, , is redundant tion complexity, we quantize the 256 possible neighborhood
if any of the following conditions holds: 1) the parent quad, configurations to nine distinct coding contexts, with the labels
if any, is insignificant, i.e., ; indicated in Table I. The context assignment for the LH and
2) the quad was significant in the previous bit-plane, i.e., HL bands is identical, because the HL (horizontally high-pass)
; or 3) this is the last quad visited amongst subband’s code-blocks are transposed, as explained in Sec-
those which share the same, significant parent and the other tion III-A. The LH (vertically high-pass) subband responds
siblings are all insignificant. most strongly to horizontal edges in the original image, so
we expect strong correlation amongst horizontally adjacent
C. Bit-Plane Coding Primitives samples; this explains the emphasis on horizontal neighbors in
The purpose of this section is to describe the four different the first three rows of the table.
primitive coding operations which form the foundation of the 2) Run-Length Coding (RLC): The RLC primitive is used
embedded block coding strategy. The primitives are used to code to reduce the average number of binary symbols which must be
processed by the arithmetic coding engine. It is invoked in place
new information for a single sample in some bit-plane, . If the
of the ZC primitive when a horizontal run of insignificant sam-
sample is not yet significant, i.e. , a combination of the
ples is encountered whose immediate neighbors are also all in-
“zero coding” (ZC) and “run-length coding” (RLC) primitives
significant. Specifically, each of the following conditions must
is used to code whether or not the symbol becomes significant
hold: 1) four consecutive samples must all be insignificant, i.e.,
in the current bit-plane; if so, the “sign coding” (SC) primitive , for ; 2) the samples must have
must also be invoked to identify the sign, . If the sample is insignificant neighbors, i.e.,
already significant, the “magnitude refinement” (MR) primitive ; 3) the samples must reside within the same
is used to encode the value of . In every case, a single bi- sub-block; and 4) the horizontal index of the first sample, ,
nary-valued symbol must be coded using a common arithmetic must be even. The last two conditions are enforced only to facil-
coding engine. The probability models used by the arithmetic itate efficient implementations of the symbol grouping scheme.
coder evolve within 18 different contexts: nine for the ZC prim- When a group of four samples satisfies the above conditions,
itive; one for the RLC primitive; five for the ZC primitive; and a single symbol is encoded to identify whether any sample in
three for the MR primitive. the group is significant in the current bit-plane. A separate con-
1) Zero Coding (ZC): The objective here is to code , text is used to model the distribution of this symbol. If any of
given that . Empirical evidence suggests that the the four samples becomes significant, i.e., ,
sample statistics are approximately Markov: the significance of the zero-based index, , of the first such sample is sent as a
sample depends only upon the values of its immediate two-bit quantity with a nonadaptive, uniform probability model.
TAUBMAN: HIGH PERFORMANCE SCALABLE IMAGE COMPRESSION WITH EBCOT 1163

Fig. 5. Typical vertical spectra traced from the image domain to the vertically high-pass subbands.

This is reasonable subject to the assumption that the proba- TABLE II


bility of any sample from the group becoming significant is very ASSIGNMENT OF THE FIVE SC CONTEXTS BASED ON THE SIGNS OF
SIGNIFICANT HORIZONTAL AND VERTICAL NEIGHBORS
small; in this case, the conditional distribution of the run-length,
, given the significance of at least one sample in
the group, is approximately uniform. Empirically, we observe
that the introduction of the RLC primitive tends to slightly im-
prove compression efficiency; its main role, however, is to sub-
stantially reduce the number of symbols which must be coded
and hence implementation complexity.
3) Sign Coding (SC): The SC primitive is used at most once
for each sample, immediately after a previously insignificant
sample is first found to be significant during a ZC or RLC opera-
tion. It turns out that the sign bits, , from adjacent samples,
exhibit substantial statistical dependencies which can be effec-
tively exploited to improve coding efficiency. To understand We further reduce this to five contexts by not distinguishing the
this, consider the LH (vertically high-pass) subbands. We claim case in which opposite neighbors are both significant with the
that horizontally adjacent samples from LH subbands tend to same sign. Let equal 0 if both horizontal neighbors are
insignificant or both are significant with different signs, 1 if at
have the same sign, whereas vertically adjacent samples tend to
least one of the horizontal neighbors is positive and 1 if at
have opposite signs. Equivalently, the LH subband samples have
least one of the horizontal neighbors is negative. Define
predominantly low-pass horizontal power spectra and high-pass
in similar fashion for the vertical neighbors. Table II identifies
vertical power spectra. In the horizontal direction, this is entirely
reasonable, since images typically have low-pass spectra which the five unique probability contexts formed by and ,
along with the sign prediction, , which is used to exploit the
are preserved by the horizontal low-pass filtering and decima-
second type of symmetry mentioned above. The binary valued
tion operations used to generate the LH subbands.
symbol which is coded with respect to the relevant context is
In the vertical direction, the aliasing introduced by the
. In view of the transposition of code-blocks from
high-pass filtering and decimation operations leads to the oppo-
the HL subbands, the same strategy may be applied to the LH
site conclusion. Fig. 5 illustrates the effect of these operations
and HL subbands; for convenience, it is extended without mod-
on the vertical spectrum of a typical image. Again, images
ification to the less important LL and HH subbands as well.
typically have low-pass spectra; even sharp horizontal edges
4) Magnitude Refinement (MR): The objective here is to
yield spectra whose amplitude decreases in inverse proportion
code the value of , given that . Experience
to the vertical frequency. After high-pass filtering, then, the
shows that the conditional distribution of is only weakly
vertical spectrum typically exhibits more energy at the lower
dependent on the magnitude, , represented by the
end of the pass band. Finally, the aliasing associated with ver-
previously encoded bit-planes and also only weakly dependent
tical decimation reverses this trend, so that the actual subband
on the magnitude of neigbouring samples. For this reason, we
samples are primarily high-pass in the vertical direction, which
use only three model contexts for magnitude refinement. It is
substantiates our claim.
convenient to introduce a second state variable, , which
To exploit this redundancy in the sign information, we use five
transitions from 0 to 1 after the the MR primitive is first applied
model contexts for coding , according to the available in-
to . The magnitude refinement context depends upon
formation concerning the signs of the immediate horizontal and
the value of and also on whether or not any immediate
vertical neighbors. Since there are four such neighbors, each of
horizontal or vertical neighbors are significant. Specifically,
which may be insignificant, positive or negative, there would ap-
is coded with context 0 if ,
pear to be unique neighborhood configurations. How-
with context 1 if and , and with
ever, two inherent symmetry properties dramatically reduce this
context 2 if .
number: there is no reason not to expect horizontal and vertical
symmetry amongst the neighborhoods; and the conditional dis-
D. Fractional Bit-Planes and Scanning Order
tribution of given any particular neighborhood should be
identical to the conditional distribution of , given the dual For each bit-plane, , the coding proceeds in a number of dis-
neighborhood with the signs of all neighbors reversed. Taking tinct passes, which we identify as “fractional bit-planes.” In this
these symmetries into account, it is not difficult to show that work we consider a total of four such passes, , , and
the number of unique conditional distributions is at most 13. and we identify the truncation available points with these
1164 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000

end of each coding pass of a conventional bit-plane coder. These


points lie on the dashed rate-distortion curve because discarding
the last bit-planes from the bit-stream is equivalent to multi-
plying the quantization step size by ; neither the distortion
nor the information content and hence entropy are affected by
whether we discard bit-planes or scale the step size. The solid
lines in Fig. 6(a) illustrate the rate-distortion curve which one
could expect to obtain by truncating the bit-stream produced
by a conventional bit-plane coder to an arbitrary bit-rate. Sup-
pose and are the rate-distortion pairs cor-
responding to two adjacent bit-planes, and . If we trun-
cate to some arbitrary bit-rate, , so that a
(a) fraction, , of the samples is refined to bit-plane and the re-
mainder are available only at bit-plane , then we expect to
find and ,
because there is no reason to suppose that the initial samples
coded in each bit-plane pass exhibit different statistics to later
samples. Consequently, the solid lines in Fig. 6(a) are simply the
convex interpolation of the R-D points corresponding to whole
bit-planes. This is necessarily sub-optimal with respect to the
convex dashed rate-distortion curve.
On the other hand, if we separate the code-block samples into
smaller subsets with different statistics, then it is possible to im-
prove upon this behavior by coding the next bit-plane one subset
at a time, starting with the subsets which are expected to offer
the largest reduction in distortion for each extra bit in the code
(b)
length. This de-interleaving of the samples into subsets with dis-
tinct statistics is the goal of fractional bit-plane coding. In the
present work, there are four subsets corresponding to the four
coding passes and the end of the fourth coding pass, , marks
the point at which all samples have been updated to bit-plane .
Thus, the solid dots in Fig. 6(a) and (b) may be associated with
this coding pass. Since the initial coding passes generally have
steeper rate-distortion slopes, the end points of each coding pass
lie below the convex interpolation of the bit-plane termination
points, as indicated in the figure.
The rate-distortion points corresponding to the various
fractional bit-plane coding passes can even lie below the dashed
line in Fig. 6(a). Fig. 6(c) provides empirical evidence for this.
(c)
Recall from Section II that the candidate truncation points for
a given embedded bit-stream are those which lie on the convex
Fig. 6. Rate-distortion properties of (a) regular bit-plane coding and (b)
fractional bit-plane coding. (c) shows the percentage of code-block bit-planes
hull of the rate-distortion curve described by all available
in which each of the four EBCOT coding passes yields a point on the convex truncation points. Fig. 6(c) clearly shows that each of the four
hull of the rate-distortion curve. Data are obtained by averaging results at 1 bpp coding passes frequently generates a point on this convex hull;
from the three most popular images in the JPEG2000 test set: “bike,” “cafe,”
and “woman,” each of which measures 2560 2048. 2 moreover, the rate-distortion points corresponding to fully
coded bit-planes at the end of coding pass , do not always
lie on the convex hull, so that other passes occasionally yield
coding passes, so that is the number of leading bytes from superior rate-distortion performance to that which is achievable
the arithmetic code word, which are required to uniquely decode by coding all samples with a fixed quantization step size.
the symbols in the first fractional bit-plane coding passes. The Fig. 7 provides a helpful illustration of the appearance of in-
reason for introducing multiple coding passes is to ensure that formation within the embedded bit-stream generated for each
each code-block has a finely embedded bit-stream. code-block. Here, denotes the quad-tree code which iden-
Fig. 6 is helpful in understanding the goals and benefits of tifies which sub-blocks are significant in bit-plane . Notice
fractional bit-plane coding. The dashed line in Fig. 6(a) identi- that appears immediately before the final coding pass, ,
fies the rate-distortion curve described by modulating the quan- not the initial coding pass, , for the bit-plane. This means
tization step size and decoding all bit-planes. It is important to that sub-blocks which become significant for the first time in
note that this curve is almost invariably convex. The solid dots bit-plane , are ignored until pass . We now define the roles
in Fig. 6(a) identify the rate and distortion associated with the played by each coding pass.
TAUBMAN: HIGH PERFORMANCE SCALABLE IMAGE COMPRESSION WITH EBCOT 1165

Fig. 7. Appearance of coding passes and quad-tree codes in each block’s embedded bit-stream.

1) “Forward Significance Propagation Pass,” : In this , from code-block . As illustrated in Fig. 2, some
pass, we visit the sub-block samples in scan-line order, skip- code-blocks might contribute no bytes at all to some layers.
ping over all samples which are either insignificant or do not Along with these incremental contributions, the length of the
have a so-called “preferred neighborhood.” For the LH and HL segment, , and the number of new coding passes,
subbands, sample is said to have a preferred neighbor- , in the segment must be explicitly identified. Finally,
hood if at least one of its horizontal neighbors is significant, if makes its first nonempty contribution to quality layer
i.e., . Recall that HL subband code-blocks are then the most significant bit-plane, , must also be identi-
transposed so that both LH and HL subbands tend to contain fied, as mentioned in Section III-A.
horizontal line segments. The LL subband is treated in the same We focus our description on the two quantities which exhibit
way for convenience, while the HH subbands’ samples are said substantial inter-block redundancy and show how this redun-
to have a preferred neighborhood if one or more of the four di- dancy is exploited within the second tier coding engine; full de-
agonal neighbors is significant, i.e., . To tails may be found in [19], [1]. These two quantities are
each such sample, we apply the ZC and RLC primitives, as ap- and the index, , of the quality layer to which first makes a
propriate, to identify whether or not the sample first becomes nonempty contribution. The latter quantity, , is encoded using
significant in bit-plane ; if so, we invoke the SC primitive to a separate embedded quad-tree code within each subband as
code its sign. We call this the “forward significance propagation follows. Let denote the sequence of quads at level in the
pass” because samples which have been found to be significant quad-tree, with denoting the leaves and the root
typically serve as seeds for waves of new significance determi- of the tree, representing the entire subband. Let be the index
nation steps which propagate in the direction of the scan. of the first layer in which any code-block in quad makes a
2) “Reverse Significance Propagation Pass,” : This nonempty contribution, i.e. . In each
coding pass is identical to , except that the samples are quality layer, , a binary quad-tree code is used to identify
visited in the reverse order and the notion of a preferred whether or not . That is, a single bit is used to identify
neighborhood is expanded to encompass samples for which at whether or not for each quad at each level, , in the
least one of the eight immediate neighbors has already been tree, skipping over quads for which the value of this bit may be
found to be significant. Of course, we skip over samples for inferred by the decoder, for one of the following two reasons:
which information was coded in the previous pass. 1) , in which case the value of has already been
3) “Magnitude Refinement Pass,” : During this pass we identified in a previous quality layer; or 2) where
skip over all samples except those which are already significant, belongs to the parent quad, , in which case we must have
i.e. , and for which no information has been coded in . To see how this code exploits inter-block redundancy,
the previous two passes. These samples are processed with the consider an initial set of lowest quality layers which correspond
MR primitive. to very low bit-rates; in this case, it is reasonable to suppose that
4) “Normalization Pass,” : Here we code the least sig- none of the code-blocks from the highest frequency subbands
nificant bit, , of all samples not considered in the previous have sufficiently steep distortion-rate slopes to make any contri-
three coding passes, using the SC and RLC primitives as appro- bution to these layers. The quad-tree code for each such subband
priate; if a sample is found to be significant in this process, its consists of a single 0 bit for each of these empty layers. More
sign is coded immediately using the SC primitive. generally, the distortion-rate slopes for individual code-blocks
depend upon local image statistics; if these statistics vary slowly
over the image then neighboring code-blocks should have sim-
IV. LAYER FORMATION AND REPRESENTATION ilar or identical values.
The other quantity which exhibits substantial inter-block
In this section we consider the second tier coding engine of redundancy is . One might consider using a similar em-
Fig. 3, which is responsible for efficiently identifying the con- bedded quad-tree code to represent this quantity. However, the
tribution of each code-block to each bit-stream layer, along with value of is irrelevant until the quality layer in which
other summary information for the code-blocks. Recall that the the code-block first makes a contribution to the bit-stream
final bit-stream is composed of a collection of quality layers, and the code-blocks in any given subband do not generally
. Together, layers through contain the initial all make their first contribution in the same quality layer. The
bytes of each code-block, . Here, we write for the trun- embedding principle suggests that we should avoid sending
cation point selected for the th quality layer, with some abuse any unnecessary information concerning until layer
of the notation established in Section II where denotes the . EBCOT achieves this, while still exploiting inter-block
R-D optimal truncation point corresponding to a threshold of redundancy in coding the values, by means of a mod-
on the distortion-rate slope. Thus, is short-hand for with ified embedded quad-tree, which is driven from the leaves
denoting the distortion-rate threshold selected for layer . rather than the root of the tree. Specifically, let denote
Layer contains the incremental contribution, the elements of the quad-tree structure built on top of the
1166 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000

code-blocks, , from any given subband, exactly as described TABLE III


above; define . Also, let PSNR RESULTS, MEASURED IN dB, FOR VARIOUS IMAGES AND BIT-RATES
denote the ancestry of quads from which is descended,
so that . Let be
a value which is guaranteed to be larger than for any
code-block, , in the relevant subband. When code-block
first contributes to the bit-stream in quality layer , we code
the value of using the following algorithm.
• For
Send binary digits to identify whether or not
for , skipping all redundant bits. If
then stop.
The redundant bits mentioned above are those corresponding
to conditions which can be inferred either from
previously coded conditions in the same partial quad-tree scan,
i.e. if , or from the partial quad-tree code which
was used to identify for a different code-block, .
In this way, we delay sending information for any condi-
tion, , which is not relevant to the code-blocks
which are contributing for the first time to the quality layer at
hand. As one might expect, efficient implementation strategies
exist for this leaf-driven embedded quad-tree coding algorithm.
At first glance, our policy of exploiting inter-block redundancy
through a second tier coding engine would appear to interfere
with the random access property mentioned in Section I, since
code-blocks are no longer strictly independent. However, the
second tier coding engine operates only on summary informa-
tion for whole code-blocks, rather than individual samples, so
that the second tier decoding process is best viewed as a some-
what elaborated parser for recovering pointers to code-block
segments in the bit-stream.

V. NUMERICAL RESULTS
Table III provides numerical results to illustrate the perfor- the target bit-rates in the table; this may be sufficient for some
mance of the proposed EBCOT algorithm under a variety of applications. The fourth column corresponds to the extreme case
conditions. Results are presented for the well-known USC im- in which 50 separate layers are included in the bit-stream span-
ages, “Lenna” and “Barbara,” as well as the three most pop- ning bit-rates ranging from approximately 0.05 bpp to 2.0 bpp;
ular test images from the JPEG2000 test suite, “bike,” “cafe,” in this case, the layer bit-rates are spaced approximately loga-
and “woman,” which are substantially more complex and less rithmically through this range by selecting an appropriate set of
blurred than the USC images. The first column of PSNR re- distortion-rate slope parameters, , but no rate-control iteration
sults corresponds to the well known SPIHT [13] algorithm with is performed to adjust the values for specific target bit-rates.
arithmetic coding. The remaining columns are obtained with the As might be expected, performance decreases as more layers
EBCOT algorithm, running within the framework of JPEG2000 are added to the bit-stream, because the overhead associated
Verification Model VM3A [20]. In all cases, we use the popular with identifying the contributions of each code-block to each
Daubechies 9/7 bi-orthogonal wavelet filters with a five level layer grows. Nevertheless, performance continues to be com-
transform. For EBCOT we use code-blocks of size 64 64 with petitive with respect to state-of-the-art compression algorithms,
sub-blocks of size 16 16. significantly outperforming the common reference, SPIHT. All
Recall that the EBCOT bit-stream is composed of a collec- results for the EBCOT algorithm are obtained using a single
tion of quality layers and that SNR scalability is obtained by quantization step size, regardless of the image or bit-rate, with
discarding unwanted layers. The second column in the table cor- rate control implemented exclusively through truncation of the
responds to a bit-stream with only one layer, so that the overall embedded block bit-streams. For the smaller images, at very low
bit-stream is not SNR scalable. Results in this case are obtained bit-rates the EBCOT results are slightly penalized by the 59 byte
by generating a separate compressed bit-stream for each of the header included by the JPEG2000 VM3A software [20]. Some
relevant bit-rates. Each of the remaining columns are obtained of the results appear to be counter-intuitive. Specifically, at 0.25
by truncating a single bit-stream to the relevant bit-rates. The bpp, the performance for “Lenna” with five layers appears to be
third column corresponds to a limited form of SNR scalability marginally higher than that with only one layer, even though the
in which there are only five quality layers, optimized for each of overhead associated with signaling five layers is undoubtedly
TAUBMAN: HIGH PERFORMANCE SCALABLE IMAGE COMPRESSION WITH EBCOT 1167

higher. Similar behavior is observed with the “Barbara” image. The proposed visual masking strength operator, , has the
The explanation for this lies with the observation in Section II form
that it is not generally possible to find a distortion-rate threshold,
, which satisfies the rate constraint exactly. As a result, the por-
tion of the bit-stream which is actually decoded in SNR progres- (5)
sive tests sometimes includes part of the next quality layer after
that which was optimized for the target bit-rate, as in Fig. 2.
Whereas the first four columns of PSNR results in Table III Here denotes a neighborhood of samples about and
are obtained using a five-level Mallat decomposition of the form denotes the size of this neighborhood. The nonlinear
shown in Fig. 1(a), the fifth column corresponds to a five-level operation is to be understood within the context of normalized
decomposition of the form shown in Fig. 1(b), using five tar- image samples with a range of 0 to 1 and a normalized wavelet
geted bit-stream layers as for the second column. This has been transform whose low-pass and high-pass analysis filters have
coined the “Spacl” decomposition within the JPEG2000 com- unit gain at DC and the Nyquist frequency, respectively, so that
munity. Evidently, this decomposition structure typically leads . For our experiments, the exponent is set to ,
to lower MSE distortion (higher PSNR) at all but the highest and the neighborhoods, , are obtained by partitioning the
bit-rates. We point out that tree-based coders such as SPIHT code-block into 8 8 cells, using the same masking strength
cannot readily be adapted to non-Mallat decomposition struc- value for all samples in any given cell. This reduces the com-
tures. plexity of computing the visual distortion metric to a small frac-
tion of that for the entire encoder. Our formulation is closely re-
lated to the models used in [22] and [3] in which and
VI. VISUAL DISTORTION METRICS AND PERFORMANCE
, respectively; in our experiments, appears to give
The numerical results in Table III are obtained with MSE as superior visual performance. We use a single small visibility
the distortion metric for the PCRD optimization algorithm of floor, , of for all subbands, so that the distortion metric
Section II. It is well known that MSE is a poor model for the is rendered independent of any assumptions on the viewing dis-
visual significance of distortion. Various authors (e.g., [7], [8]) tance, which is highly desirable for uncontrolled viewing con-
have considered the relatively straightforward extension to fre- ditions.
quency weighted MSE, in which the overall image distortion is Not surprisingly, this visual masking metric has a greater
taken to be a weighted sum of the MSE contributions from each effect on image quality when the code-blocks are small; we find
subband, with the weights derived from studies of the contrast the best performance over a wide range of images when using
sensitivity function (CSF). These approaches have two notable 32 32 code-blocks. Fig. 8 provides a comparison of SPIHT
drawbacks: the weights depend strongly upon the angle sub- [13] with arithmetic coding against EBCOT, operating in the
tended by each reconstructed pixel at an assumed viewing dis- context of JPEG2000 VM3A with this visual distortion metric.
tance; and the model fails to account for the substantial impact When optimizing for MSE alone, the visual quality of EBCOT
of visual masking effects. In fact, the CSF accounts primarily for compressed images is very similar to that of SPIHT. Although
the modulation transfer function (MTF) of the physical proper- only small segments from the 2560 2048 image “woman”
ties of the optics and aperture of the cones in the human eye; the can be reproduced here, we note that quality is uniformly
MTF of the relevant display device is often also incorporated. improved over the entire image. The EBCOT images exhibit
More generally, we may consider spatially varying distortion substantially less ringing around edges and superior rendition
metrics which attempt to exploit the masking phenomenon. of texture; some details preserved in the EBCOT images are
Watson’s work [22] on visual optimization of JPEG com- completely lost by the SPIHT algorithm. In fact, for this image
pressed images is noteworthy in this regard, as is the work of we find that the image quality obtained using EBCOT at 0.2
Höntsch and Karam [3]. In these and other previous works, bpp is comparable to that obtained using SPIHT at 0.4 bpp.
visual masking effects must be taken into account by explicitly Similar results are observed with other large images having
modifying the quantization parameters; scalable compression is comparable content, although the method is less effective with
not considered; and rate-control must be performed iteratively. some image types. Although we make no assumptions here
The EBCOT algorithm provides an excellent context within concerning viewing distance, other studies [8] have shown that
which masking phenomena can be exploited without substan- the visual masking metric outlined here can be successfully
tially increasing computational complexity or sacrificing other combined with an appropriate global compensation for known
properties such as random access or scalability. In our studies, CSF characteristics, yielding complementary improvements in
the following distortion metric has been found to yield signifi- quality.
cantly superior visual image quality than the MSE metric
VII. UTILITY OF THE RANDOM ACCESS ATTRIBUTE
(4)
Since EBCOT partitions each subband into relatively small
code-blocks and codes each of them independently, it would
Here , and are all as in Section II, is the appear to be suited to applications requiring some degree of
“visual masking strength” at sample and is a “visibility “random access” into the image. At one extreme we may con-
floor” term which establishes the visual significance of distor- sider applications which intend to decompress the entire image,
tion in the absence of masking. but in a different order to that in which it was compressed,
1168 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 7, JULY 2000

Fig. 8. Comparison between SPIHT (left) and EBCOT using the visual masking metric (right). Shows 500 2 500 regions from the 2560 2 2048 test image
“woman,” at 0.15 bpp.

e.g., from bottom to top or left to right. Since the code-blocks a server, which may or may not ultimately constitute the entire
can be decoded in any order and the wavelet transform may image. The efficient realization of such applications depends
be incrementally implemented in any direction, it is clear that upon careful caching of code-blocks which will often belong
EBCOT is well suited to such applications. At the opposite ex- to multiple regions.
treme we consider applications which require only a small, ar- For the purposes of this investigation, we assume that a square
bitrarily located region in the original image. In general, the image region with pixels must be decompressed from
number of subband samples represented by the code-blocks re- an EBCOT bit-stream representing an original image with very
quired to synthesize the requested image region will be larger much larger dimensions so that we can ignore boundary condi-
than the region itself. In this section we attempt to quantify this tions. Fig. 9 contains a log-log plot of vs. , where
inherent inefficiency. Remote browsing applications lie between is the number of bits required to reconstruct the requested region
these two extremes: a client interactively requests regions from and is the compressed bit-rate of the original image. Thus, the
For the purposes of this investigation, we assume that a square image region with S × S pixels must be decompressed from an EBCOT bit-stream representing an original image with very much larger dimensions, so that we can ignore boundary conditions. Fig. 9 contains a log-log plot of S′ vs. S, where (S′)²R is the number of bits required to reconstruct the requested region and R is the compressed bit-rate of the original image. Thus, the cost of recovering the requested region is equivalent to the cost of recovering an S′ × S′ image, separately compressed to the same bit-rate. In our case, R = 0.5 bpp. The curves in Fig. 9 are averages, assuming a uniform distribution for the location of the requested region. The relevant subband samples and hence code-blocks are computed as in [9], assuming Daubechies 9/7 wavelet kernels with the usual Mallat decomposition. The transmission cost is determined by taking into account the average bit-rate in each of the subbands, as determined from the statistics of the three large images in Table III.

Fig. 9. Random access efficiency for various regions of size 64 × 64 through to 2048 × 2048.
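To give a concrete feel for this overhead, the following rough sketch (ours, not the accounting actually used to generate Fig. 9) counts the subband samples covered by the code-blocks overlapping an S × S region, assuming a five-level Mallat decomposition, 64 × 64 code-blocks, and a fixed filter half-length of four samples standing in for the 9/7 kernels:

```python
# Rough sketch (ours, not the accounting used for Fig. 9): estimate how
# many subband samples are covered by the code-blocks overlapping an
# S x S region, for a D-level Mallat decomposition. The half-length of
# 4 is a stand-in for the Daubechies 9/7 filter support.

def region_samples(S, D=5, block=64, halflen=4):
    total = 0
    lo, hi = 0, S                      # region interval, current resolution
    for _ in range(D):
        # Each analysis level halves the interval and dilates it by the
        # filter support on either side.
        lo = lo // 2 - halflen
        hi = (hi + 1) // 2 + halflen
        # Round out to code-block boundaries; three detail subbands/level.
        blo = (lo // block) * block
        bhi = -(-hi // block) * block
        total += 3 * (bhi - blo) ** 2
    total += (bhi - blo) ** 2          # the final LL subband
    return total

for S in (64, 256, 1024):
    n = region_samples(S)
    print(f"S = {S:4d}: ~{n} samples, effective size ~{round(n ** 0.5)}")
```

The estimate is deliberately conservative; the curves in Fig. 9 were instead derived from the per-subband bit-rate statistics described above.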
The four curves in Fig. 9 correspond to four different block size configurations: the curves labeled “F64” and “F32” correspond to code-blocks with a fixed size of 64 × 64 and 32 × 32 in every subband; the curves labeled “H64” and “H32” utilize the “frames” mode in the JPEG2000 VM3A software [20] to obtain code-blocks whose size depends upon the resolution level. In the “H64” case, code-blocks of size 64 × 64 are used in the highest resolution subbands, while blocks of size 32 × 32 are used in the next lower resolution level, and so forth. We start with 32 × 32 code-blocks in the “H32” case. Each of the curves in Fig. 9 is also labeled with the average loss in compression performance relative to that reported in Table III. It would appear that the two most attractive configurations for applications requiring this type of random access are those corresponding to “F32” and “H64.” Both exhibit only relatively small losses in compression efficiency for a significant gain in random access efficiency. While “H64” offers superior random access efficiency, we note that it degenerates into the less desirable “H32” case if the image is browsed at half the original resolution.
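For illustration only, the “H64”/“H32” rule can be written as a one-line mapping from resolution level to code-block dimension; the 4 × 4 floor below is our assumption, not a value taken from the VM3A software:

```python
# Tiny sketch (ours) of the "H64"/"H32" rule: the finest resolution
# level uses the top block size and each coarser level halves it.

def code_block_size(level, num_levels=5, top=64):
    # level = num_levels for the finest subbands, 0 for the LL band.
    return max(top >> (num_levels - level), 4)   # 4 x 4 floor is assumed
```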
Evidently, random access granularity is relatively coarse: in the “H64” case, a region of size 256 × 256 requires the same number of bits as an image of size 400 × 400 compressed to the same bit-rate. Nevertheless, the capability is attractive for interactive browsing of very large images, where the requested regions might represent a significant portion of the client’s display.
VIII. RELATIONSHIP TO JPEG2000

Since EBCOT was first adopted as the basis for the JPEG2000 image compression standard, some modifications have been introduced to the entropy coding part of the algorithm described in Section III. Since many readers are likely to have an interest in JPEG2000 and Part I of the standard is now stable [1], we include here a brief summary of these changes. Most of the changes are described in [6].

A. Changes to Enhance Compression Efficiency

In this paper, the 18 probability models used by the conditional arithmetic coder are initialized to the usual equi-probable state. By contrast, in JPEG2000 some of the contexts are started in an assumed highly skewed state to reduce the model adaptation cost in typical images.
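The difference between the two policies can be illustrated as follows; the context count of 19 and the skewed state indices are placeholders for illustration, not the normative JPEG2000 initialization tables:

```python
# Illustrative contrast between the two initialization policies. Each
# context stores an index into the coder's probability state machine
# plus the current MPS symbol.

EQUIPROBABLE = 0                       # state with p(LPS) close to 0.5

def init_contexts_ebcot(num_contexts=18):
    # This paper: every model begins in the equi-probable state.
    return [{"state": EQUIPROBABLE, "mps": 0} for _ in range(num_contexts)]

def init_contexts_jpeg2000(num_contexts=19):
    ctx = [{"state": EQUIPROBABLE, "mps": 0} for _ in range(num_contexts)]
    # Selected contexts begin highly skewed so that little data is wasted
    # adapting them on typical images (indices illustrative only).
    for k, state in {0: 4, 17: 3, 18: 46}.items():
        ctx[k]["state"] = state
    return ctx
```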
B. Changes to Reduce Complexity

A low complexity arithmetic coder, known as the MQ coder [21], has replaced the more classical arithmetic coder used in this paper. The MQ coder avoids multiplications and divisions in a similar manner to the more widely known QM coder. The JPEG2000 entropy coder does not transpose the HL subband’s code-blocks, as described in Section III-A; instead, the corresponding entries in the ZC context assignment map are transposed. The JPEG2000 standard uses only three coding passes per bit-plane instead of four; the forward and reverse significance propagation passes have been merged into a single forward significance propagation pass, whose preferred neighborhood is identical to that of the reverse pass. This ensures that all coding passes have a consistent scan direction, at a small expense in compression efficiency.
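The flavor of the MQ coder’s multiplier-free interval subdivision can be conveyed by the following simplified sketch, in the spirit of the QM/MQ family rather than the normative procedure of [21]; probability state transitions, carry resolution, and byte output are omitted:

```python
# Simplified sketch of multiplier-free interval subdivision, QM/MQ
# style. Keeping the interval width A within [0x8000, 0x10000) lets the
# product A*Qe be approximated by Qe itself, leaving only additions,
# subtractions and shifts. Not the normative procedure of [21].

def code_mps(A, C, Qe):
    A -= Qe                            # tentative MPS sub-interval
    if A & 0x8000 == 0:                # renormalization required
        if A < Qe:                     # conditional exchange of intervals
            A = Qe
        else:
            C += Qe
        A, C = renorm(A, C)
    else:
        C += Qe                        # LPS interval sits below the MPS one
    return A, C

def code_lps(A, C, Qe):
    A -= Qe
    if A < Qe:                         # conditional exchange
        C += Qe
    else:
        A = Qe
    return renorm(A, C)

def renorm(A, C):
    while A < 0x8000:                  # double until A re-enters range
        A <<= 1
        C <<= 1                        # byte output would occur here
    return A, C
```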
Each sub-block is scanned column-by-column, rather than row-by-row, and sub-blocks have been reduced to size 4 × 4 from the optimal size of 16 × 16 considered in this paper. With these very small sub-blocks and highly skewed initialization of the probability models, we found that explicit coding of sub-block significance, as in Section III-B, is no longer justified. The coder behaves as though all sub-blocks are significant from the outset, so that the corresponding bit-stream entries in Fig. 7 are all empty. With this modification, the coder is now more easily understood as operating on stripes of four rows each. Nevertheless, it appears that the most efficient software implementations, such as that in JPEG2000 VM5, are those which exploit properties of the MQ coder to realize the sub-block paradigm at an implementation level.
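The resulting stripe-oriented scan can be sketched as a simple generator (an illustration, not VM source code):

```python
# Sketch of the stripe-oriented scan: the code-block is visited in
# horizontal stripes of four rows, each stripe traversed column by
# column, top to bottom within a column.

def stripe_scan(height, width, stripe=4):
    for y0 in range(0, height, stripe):
        for x in range(width):
            for y in range(y0, min(y0 + stripe, height)):
                yield y, x

# list(stripe_scan(8, 2)) visits (0,0)..(3,0), then (0,1)..(3,1),
# before moving on to the stripe holding rows 4-7.
```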
The cumulative effect of these modifications is an increase of about 40% in software execution speed for the entropy coding part of the system, with an average loss of about 0.15 dB relative to the results reported in Section V. Since the software implementation of the entropy decoder in VM5.1 [1] has been heavily optimized (short of actually resorting to assembly code), the timing results reported in Table IV help to establish the complexity of the EBCOT coder. With small images (e.g., 512 × 512), the JPEG2000 VM5.1 entropy coder has comparable execution speed to the official public domain implementation of SPIHT [13] without arithmetic coding; this coder suffers about 0.5 dB loss relative to that reported in Table III for SPIHT with arithmetic coding. For the larger images of Table IV, SPIHT suffers from inherently nonlocal memory accesses, and runs 6 to 30 times more slowly than JPEG2000 VM5.1, depending on the bit-rate.

TABLE IV
CPU DECODING TIMES OBTAINED USING JPEG2000 VM5.1 WITH A 400 MHz PENTIUM II PROCESSOR. CPU TIMES EXPRESSED IN SECONDS PER MILLION IMAGE PIXELS
C. Options in JPEG2000

JPEG2000 has an optional mode to enable parallel implementation of the coding passes within any code-block. Although the independent coding principle in EBCOT ensures that multiple code-blocks may be processed in parallel, microscopic parallelism at the level of individual coding passes can be exploited for more efficient hardware implementations. To enable this behavior, the arithmetic codeword generation process can be reset at the commencement of each coding pass and the various coding contexts reset to their skewed initial states. The context quantization process for the various coding primitives may also be constrained to be vertically stripe-causal, meaning that samples from future stripes may be considered insignificant when determining the coding contexts. These options typically result in an additional loss of about 0.15 dB with 64 × 64 code-blocks and about 0.35 dB with 32 × 32 code-blocks at modest bit-rates. Most of the ideas behind these parallel processing options are explained in [11], [12].
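The stripe-causal constraint amounts to masking any neighbor that lies in a later stripe when gathering significance information. A minimal sketch, with a hypothetical 0/1 significance map `sig` and diagonal neighbors omitted for brevity:

```python
# Sketch of "vertically stripe-causal" context formation: neighbors in a
# later stripe are treated as insignificant, so each stripe's contexts
# never depend on data coded after it.

def neighbor_counts(sig, y, x, stripe=4, stripe_causal=True):
    rows, cols = len(sig), len(sig[0])
    next_stripe = (y // stripe + 1) * stripe   # first row of the next stripe
    h = v = 0
    for dy, dx in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        ny, nx = y + dy, x + dx
        if not (0 <= ny < rows and 0 <= nx < cols):
            continue
        if stripe_causal and ny >= next_stripe:
            continue                           # future stripe: assume zero
        if dy == 0:
            h += sig[ny][nx]
        else:
            v += sig[ny][nx]
    return h, v                                # inputs to the ZC context lookup
```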
A so-called lazy coding option has also been introduced, in which the arithmetic coding procedure is completely bypassed for most of the significance propagation and magnitude refinement coding passes. This mode substantially reduces complexity at high bit-rates, with a loss of less than 0.1 dB in compression performance.
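The bypass decision itself is trivial to express; in the sketch below, the number of leading bit-planes that remain arithmetically coded is an illustrative parameter, not a value taken from this paper:

```python
# Sketch of the lazy-mode decision: later significance-propagation and
# magnitude-refinement passes emit raw bits, while cleanup passes remain
# arithmetically coded.

SIG_PROP, MAG_REF, CLEANUP = range(3)

def use_raw_bits(pass_type, plane, lazy=True, ac_planes=4):
    """`plane` counts down from the most significant bit-plane (plane 0)."""
    if not lazy or pass_type == CLEANUP:
        return False                   # cleanup passes stay arithmetic coded
    return plane >= ac_planes          # later planes bypass the coder
```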
IX. CONCLUSIONS

The EBCOT image compression algorithm offers state-of-the-art compression performance together with an unprecedented set of bit-stream features, including resolution scalability, SNR scalability, and a “random access” capability. All features can coexist within a single bit-stream without substantial sacrifices in compression efficiency.

The ability to produce independent, finely embedded bit-streams for relatively small blocks of subband samples enables the effective use of post-compression rate-distortion optimization. In turn, this enables the successful exploitation of visual masking, which has been hampered by causality constraints in more conventional compression frameworks. The EBCOT algorithm also introduces the concept of abstract quality layers which are not directly related to the structural properties of the underlying entropy coder. This endows the bit-stream with tremendous flexibility, since the encoder may decide to construct any number of layers from any combination of code-block contributions whatsoever.
REFERENCES

[1] JPEG2000 Verification Model 5.0 (Technical Description), ISO/IEC JTC1/SC29/WG1 N1420, 1999.
[2] H. Everett, “Generalized Lagrange multiplier method for solving problems of optimum allocation of resources,” Oper. Res., vol. 11, pp. 399–417, 1963.
[3] I. Höntsch and L. Karam, “APIC: Adaptive perceptual image coding based on subband decomposition with locally adaptive perceptual weighting,” in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1997, pp. 37–40.
[4] J. Li and S. Lei, “Rate-distortion optimized embedding,” in Proc. Picture Coding Symp., Berlin, Germany, Sept. 10–12, 1997, pp. 201–206.
[5] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–693, July 1989.
[6] Reduced Complexity Entropy Coding, ISO/IEC JTC1/SC29/WG1 N1312, June 1999.
[7] A. Mazzarri and R. Leonardi, “Perceptual embedded image coding using wavelet transforms,” in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1995, pp. 586–589.
[8] Quality Improvement Using Contrast Sensitivity Filtering, ISO/IEC JTC1/SC29/WG1 N1306, June 1999.
[9] D. Nister and C. Christopoulos, “Lossless region of interest with a naturally progressive still image coding algorithm,” in Proc. IEEE Int. Conf. Image Processing, Oct. 1998, pp. 856–860.
[10] E. Ordentlich, M. Weinberger, and G. Seroussi, “A low-complexity modeling approach for embedded coding of wavelet coefficients,” in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 1998, pp. 408–417.
[11] E. Ordentlich, D. Taubman, M. Weinberger, G. Seroussi, and M. Marcellin, “Memory efficient scalable line-based image coding,” in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 1999, pp. 218–227.
[12] Line-Based Coding—On Merging VMA and VMB, ISO/IEC JTC1/SC29/WG1 N1201, Mar. 1999.
[13] A. Said and W. Pearlman, “A new, fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243–250, June 1996.
[14] J. M. Shapiro, “An embedded hierarchical image coder using zerotrees of wavelet coefficients,” in Proc. IEEE Data Compression Conf., Snowbird, UT, 1993, pp. 214–223.
[15] F. Sheng, A. Bilgin, P. Sementilli, and M. Marcellin, “Lossy and lossless image compression using reversible integer wavelet transforms,” in Proc. IEEE Int. Conf. Image Processing, Oct. 1998.
[16] D. Taubman and A. Zakhor, “Multi-rate 3-D subband coding of video,” IEEE Trans. Image Processing, vol. 3, pp. 572–588, Sept. 1994.
[17] D. Taubman, “Directionality and scalability in image and video compression,” Ph.D. dissertation, Univ. Calif., Berkeley, 1994.
[18] Embedded, Independent Block-Based Coding of Subband Data, ISO/IEC JTC1/SC29/WG1 N871R, July 1998.
[19] EBCOT: Embedded Block Coding with Optimized Truncation, ISO/IEC JTC1/SC29/WG1 N1020R, Oct. 1998.
[20] JPEG2000 Verification Model VM3A, ISO/IEC JTC1/SC29/WG1 N1143, Feb. 1999.
[21] Proposal of the Arithmetic Coder for JPEG2000, ISO/IEC JTC1/SC29/WG1 N762, Mar. 1998.
[22] A. B. Watson, “DCT quantization matrices visually optimized for individual images,” Proc. SPIE, vol. 1913, pp. 202–216, 1993.

David Taubman (S’93–M’95) received the B.S. degree in computer science and mathematics in 1986 and the B.E. degree in electrical engineering in 1988, both from the University of Sydney, Australia. He received the M.S. degree in 1992 and the Ph.D. degree in 1994, both in electrical engineering, from the University of California, Berkeley.

From 1994 to 1998, he was with Hewlett-Packard Research Laboratories, Palo Alto, CA, joining the University of New South Wales, Sydney, in 1998 as a Senior Lecturer in the School of Electrical Engineering. Since 1998, he has been heavily involved in the working group ISO/IEC JTC1/SC29/WG1, developing the JPEG2000 image compression standard; he is the author of the Verification Model (VM3A) software and accompanying documentation upon which the standard is based. His research interests include highly scalable image and video compression technology, inverse problems in imaging, perceptual modeling and optimization, and network distribution of compressed multimedia content.

Dr. Taubman received the University Medal from the University of Sydney in 1988 for electrical engineering. In the same year, he also received the Institute of Engineers, Australia, Prize and the Texas Instruments Prize for Digital Signal Processing, Australasia. His 1996 paper with A. Zakhor, entitled “A common framework for rate and distortion based scaling of highly scalable compressed video,” was awarded Best Paper by the IEEE Circuits and Systems Society in 1997.