CHAPTER 20 IMAGE COMPRESSION

20.1 References For Image Compression
which is perfectly readable (at least to those of us for whom English is a first
language) but which requires only 75% of the letters in the complete text. The
moral of this example is that vowels are (usually) unnecessary to the meaning of
text messages. In general, vowels supply redundant information in the message and
often may be discarded without loss of meaning. In an imaging sense, we might say
that vowels supply low-frequency information (bulk properties of the words), while
consonants carry the high-frequency information (information about the “edges”).
Redundancies in text messages or images may be identified and eliminated using
different but analogous schemes. The data compression process often is called coding,
because the representation is altered and may require a reference source (codebook) to
interpret the message in terms of the original symbols. The name coding also implies a
relationship between the processes of message compression and message cryptography,
which are (in a sense) opposites. Compression is the process of removing nonessential
information from the message to reduce the quantity of data; cryptography is the
process of adding nonessential information to different messages to make them look
indistinguishable.
The principles of image compression require an understanding of the fundamental
concepts of information theory, which were laid down by Claude Shannon in the late
1940s, with later contributions by many authors, notably Norbert Wiener and A.I.
Khinchin. The technologies (hardware and software) were very hot topics in the
1980s because of limitations in storage space (as difficult as it may be to believe now,
a common size for a PC hard drive was 30 MBytes in the mid-to-late 1980s). The topic
heated up again in the late 1990s for internet applications because of limitations in
transmission bandwidth.
There are (at least) two classes of data redundancy: objective and subjective. Ob-
jectively redundant data can be removed without any loss of information. In other
words, deleted objectively redundant data may be recovered without loss. Subjec-
tively redundant data may be removed from a message or image without any “visible”
loss of information, though the original image cannot be recovered without error.

20.3 IMAGE COMPRESSION

The gray value of a pixel in a realistic image usually is not selected from a uniformly random
distribution, but rather exhibits some type of correlation with the gray values of other
(usually neighboring) pixels.
An image or other message with redundant data may be compressed without loss of
information by removing some (or even all) of the redundancy. Given the ratio of
nonredundant data in a message or image to the total data, the redundancy R in the
message may be defined as:

R = 1 − (nonredundant data) / (total data)

To quantify the information in a message, first consider an experiment X composed of two
independent experiments Y and Z that have n1 and n2 equally likely outcomes, respectively.
The number n of equally likely outcomes of the composite experiment is:

n = n1 · n2   (1)

and the probability of any single outcome of X is:

p [X] = 1/n = p [Y] · p [Z] = (1/n1) · (1/n2) = 1/(n1 · n2)   (2)
Thus the information I about the outcome of experiment X should be a function f
that satisfies the requirement:

f [n] = f [n1 · n2] = f [n1] + f [n2]

This requirement is satisfied by the logarithm:

I [X] = c · logb [n]

where c is some constant (which can be assumed to be unity for now) and b is the
base of the logarithm such that b > 1. In fact, Khinchin proved that the logarithm
is the only continuous function that satisfies the required properties for any finite
number n of possible outcomes (symbols). This definition of information satisfies the
additivity requirement for information, i.e.,

I [X] = logb [n] = logb [n1 · n2] = logb [n1] + logb [n2] = I [Y] + I [Z]   (7)
The units of information are base-b digits, e.g., decimal digits 0–9 for b = 10 and
binary digits, or “bits,” for b = 2.
If the coin has two (indistinguishable!) heads, then there is only one (“equally likely”)
outcome and pH = 1, pT = 0. The information content in a statement about the
outcome of a toss of a two-headed coin is:

I [X] = log2 [1] = 0 bits per toss

Now consider an unfair coin for which pH = 3/4 and pT = 1/4; the toss may be modeled
as an experiment with four equally likely outcomes:

pA = pB = pC = pD = 0.25,

where outcomes A, B, C are heads and outcome D is a tail. The quantity of informa-
tion in a single experiment with four equally likely outcomes is:
I [X] = log2 [4] = − log2 [1/4] = 2 bits per toss
However, since the probabilities of the two distinguishable outcomes are not equal,
there must be excess information in statements about the identical outcomes that
must be subtracted from the 2 bits. The excess information in the message about a
head is log2 [nH] multiplied by the probability of a head pH. Similarly for a tail, the
excess information is pT log2 [nT], where pT = nT / n and n = nH + nT.

Excess Information for head = (3/4) · log2 [3] = (3/4) · 1.585 = 1.189 bits

Excess Information for tail = (1/4) · log2 [1] = (1/4) · 0 = 0 bits
The general expression for information content in the case of the unfair coin with
probabilities pH and pT is:
I [X] = log2 [n] − (pH log2 [nH] + pT log2 [nT]) = (pH + pT) log2 [n] − pH log2 [nH] − pT log2 [nT]
      = −pH (log2 [nH] − log2 [n]) − pT (log2 [nT] − log2 [n])
      = −pH log2 [nH / n] − pT log2 [nT / n]
      = −pH log2 [pH] − pT log2 [pT] .   (10)
This definition generalizes to an experiment with M possible outcomes having
probabilities pi:

I [X] = − Σ_{i=0}^{M−1} (pi log2 [pi]) , subject to the constraint Σ_{i=0}^{M−1} pi = 1.   (11)
If all M levels are equally probable, so that pi = 1/M for each i, the information
content is:

I [X] = − Σ_{i=0}^{M−1} (1/M) log2 [1/M] = −M · (1/M) · log2 [1/M]
      = − log2 [1/M] = + log2 [M] = − log2 [pi] ,   (12)
I [X] = − Σ_{i=0}^{M−1} pi log2 [pi]   (14)
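Shannon's definition is easy to exercise numerically. A minimal sketch (Python is assumed; the function name is illustrative) that reproduces the coin results above — 0 bits for the two-headed coin and 2 − 1.189 − 0 = 0.811 bits for the unfair coin:

```python
from math import log2

def shannon_entropy(probs):
    """I = -sum(p * log2(p)); terms with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))          # two-headed coin: 0 bits
print(shannon_entropy([0.75, 0.25]))   # unfair coin: ~0.811 bits per toss
print(shannon_entropy([1/256] * 256))  # flat 8-bit histogram: 8 bits per pixel
```

The flat-histogram case confirms eq. (12): equally likely levels give log2 [M] bits.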
For most realistic monochrome images digitized to 8 bits per pixel, the entropy is
in the range of 4-6 bits per pixel. To further investigate the meaning of Shannon's
definition of information, consider that an image with N pixels and M gray levels can
be produced from N experiments, each having M possible outcomes, for a total
of M^N possible distinct images. In the binary image case, the gray values may be
produced by N coin flips. For a 1 × 1 image, N = 1 and the number of possible images
is 2^1 = 2 (0 and 1), and there are also two possible histograms, (1,0) and (0,1). There
are 2^2 = 4 possible two-pixel binary images (11, 10, 01, and 00) and three possible
histograms (2,0), (1,1), and (0,2). Note that there are two images with histogram
(1,1). For N = 3 pixels, there are 2^3 = 8 distinct images (111, 110, 101, 011, 100,
010, 001, 000) and four possible histograms (3,0), (2,1), (1,2), and (0,3). There is one
image each with histogram (3,0) or (0,3), and three images each with histogram (2,1)
and (1,2). The progression of the number of possible binary images divided into a
number of distinct histograms is specified by the set of binomial coefficients:

N [N0, N1] ≡ N! / (N0! · N1!)
where N is the number of pixels in the image, and N0 , N1 are the number of pixels
with level 0 and 1, respectively. Note that the constraint N0 + N1 = N must be
satisfied. The array of the binomial coefficients defines Pascal's triangle as shown
below, where the row number represents the number of pixels in the image, the sum
of the numbers in the row is the number of possible images, the number of groups in
each row is the number of possible histograms, and the number in each group is the
number of images that have a particular histogram:

N = 1 :       1   1
N = 2 :     1   2   1
N = 3 :   1   3   3   1
N = 4 : 1   4   6   4   1
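The binomial bookkeeping can be checked with a few lines; a sketch using math.comb (the function name is illustrative):

```python
from math import comb

def histogram_counts(N):
    """Number of N-pixel binary images having each histogram (N - k, k).

    Row N of Pascal's triangle: entry k counts the images with k pixels
    at level 1 and N - k pixels at level 0.
    """
    return [comb(N, k) for k in range(N + 1)]

# N = 3: histograms (3,0), (2,1), (1,2), (0,3)
print(histogram_counts(3))        # -> [1, 3, 3, 1]
print(sum(histogram_counts(3)))   # -> 8 = 2^3 distinct images
```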
The number of distinct images with a given histogram increases as the histogram
becomes “flatter”. Also recall that Shannon’s definition of information is determined
by the image histogram. Because there are more
distinct images with a flat histogram than with a clustered histogram, the “amount”
of information should be larger for an image with a flat histogram. An image with a
flat histogram indicates that there is maximum uncertainty and maximum informa-
tion content, and thus requires the most bits of data for complete specification.
Now consider the number of possible images with specific histograms for more
“reasonable” image formats. A common size for a monochrome digital image is N ×
N = 5122 pixels and M = 256 levels, so that the number of distinguishable images
is:
M^(N×N) = 256^(512²) = 2^(8 · 2⁹ · 2⁹) = 2^2,097,152 = (10^log10[2])^2,097,152
        ≅ (10^0.30103)^2,097,152 = 10^(0.30103 · 2,097,152) ≅ 10^631,305.66
The size of this number may be gauged against the estimates that there are 10^78
atoms and 10^88 cubic millimeters in the universe. Of these MANY images, only a
single one has the histogram with all pixels clustered at gray level 0. There are 512²
possible images with one pixel at gray level 1 and the rest (262,143) at gray level zero.
The number of images with two pixels at gray level 1 and 262,142 at 0 is:

N [262142, 2, 0, · · · , 0] = 262144! / (262142! · 2! · 0! · · · 0!) ≅ 3.44 · 10^10
If we continue this progression and add more populated gray levels to “flatten” the
histogram, the number of distinguishable images increases very rapidly. Such an
image with a perfectly flat histogram has 512²/256 = 1024 pixels in each level; the
number of such images is:

N [1024_0, 1024_1, · · · , 1024_255] = (512²)! / (1024!)^256 ≅ 10^630,821.
The approximation was derived via Stirling's formula for N! where N is large:

N! ≅ √(2πN) · N^N · e^(−N)   (as N → ∞)

For example, Stirling's formula gives 10! ≅ 3.5987 × 10^6, whereas the actual result
is 10! = 3.6288 × 10^6, so the relative error is less than 1%.
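A quick numerical check of Stirling's formula:

```python
from math import factorial, pi, sqrt, e

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * n**n * e**(-n)."""
    return sqrt(2 * pi * n) * n ** n * e ** (-n)

approx, exact = stirling(10), factorial(10)
print(approx, exact)                # ~3.5987e6 vs 3628800
print(abs(approx - exact) / exact)  # relative error, under 1%
```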
If 254 levels are populated with 1028 pixels each and one level has 1032 pixels (a
“slightly clustered” histogram), the number of images is:

N [1028_0, 1028_1, · · · , 1028_253, 1032_254, 0_255] = (512²)! / ((1028!)^254 · 1032! · 0!) ≅ 10^630,377,
which is smaller than the number of images with a flat histogram (by a factor of
10444 , which is still pretty large!). Note that the number of possible images again is
maximized when the histogram is flat; the uncertainty, the information content, and
thus the number of bits to specify the image are maximized when the histogram is
flat.
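The multinomial counts above are far too large to evaluate directly, but their base-10 logarithms follow from the log-gamma function; a sketch verifying the exponents 630,821 and 630,377 and the factor of 10^444:

```python
from math import lgamma, log

def log10_multinomial(counts):
    """log10 of N! / (n0! * n1! * ...), with N = sum(counts)."""
    n = sum(counts)
    return (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)) / log(10)

flat = log10_multinomial([1024] * 256)                 # perfectly flat histogram
clustered = log10_multinomial([1028] * 254 + [1032])   # slightly clustered
print(round(flat), round(clustered), round(flat - clustered))
# -> 630821 630377 444
```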
The histogram that maximizes the information content may be found by requiring
that the total differential of I vanish:

dI = (∂I/∂p0) dp0 + (∂I/∂p1) dp1 + (∂I/∂p2) dp2 + · · · + (∂I/∂pM−1) dpM−1 = 0,

subject to the constraint that the probabilities sum to a constant:

d ( Σ_{i=0}^{M−1} pi ) = 0
The two conditions may be combined via a Lagrange multiplier λ into a single
functional whose differential must vanish:

L [f ] = I [f ] − λ Σ_{i=0}^{M−1} pi = − Σ_{i=0}^{M−1} pi logb [pi] − Σ_{i=0}^{M−1} λ pi = − Σ_{i=0}^{M−1} pi (logb [pi] + λ) .

dL = Σ_{i=0}^{M−1} (∂L/∂pi) dpi = 0
Because the differentials dpi are arbitrary, a necessary and sufficient condition for
dL to vanish is to have the component derivatives ∂L/∂pi vanish individually. Taking
b = e (the natural logarithm) for convenience:

∂L/∂pi = − (∂pi/∂pi) · (loge [pi] + λ) − pi · (∂/∂pi) (loge [pi] + λ)
        = − (loge [pi] + λ) − pi · (1/pi) = − (1 + loge [pi] + λ) = 0, for all i

loge [pi] = − (1 + λ)
=⇒ pi = e^−(1+λ) , where λ is constant.
Note that the probability pi for the occurrence of the i-th level is a constant, thus the
probabilities of all outcomes must be equal when L [f ] (and hence I [f ]) is maximized.
The constraint allows us to calculate the numerical value of the Lagrange multiplier:

Σ_{i=0}^{M−1} pi = 1 = Σ_{i=0}^{M−1} e^−(1+λ) = M · e^−(1+λ) = M · pi = M · p =⇒ p = 1/M
Based on the previous discussion, the number of 128² 4-bit images is:

Number = 2^(4·128²) = 2^65,536 ≅ 10^19,728
which is STILL many times larger than the estimated number of atoms in the universe!
20.4 LOSSLESS COMPRESSION
7 7 7 2 6 6 6 6 6 2 2 5 5 5 5 5 5 5 5 5
7 3 2 1 6 5 2 2 5 9
where the first digit in the encoded image is the gray value of the first pixel, the
second digit is the number of occurrences, etc. The sequence is reduced from 20 3-bit
numbers (60 bits) to 10 digits (though the long run of “5” at the end requires a 4-bit
number to encode). This means that this sequence might be encoded into 5 3-bit
numbers and 5 4-bit numbers, for a total of 35 bits. If the system were limited to
three bits, the sequence of 9 examples of level “5” would be split into one sequence
of 7 and one of 2:
7 3 2 1 6 5 2 2 5 7 5 2
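The run-length scheme is a few lines of code; a sketch (the helper names are illustrative):

```python
def rle_encode(pixels):
    """Encode a sequence as (value, run-length) pairs."""
    runs = []
    for v in pixels:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

seq = [7,7,7,2,6,6,6,6,6,2,2,5,5,5,5,5,5,5,5,5]
print(rle_encode(seq))  # -> [(7, 3), (2, 1), (6, 5), (2, 2), (5, 9)]
assert rle_decode(rle_encode(seq)) == seq
```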
The Huffman code was devised in 1951 by David Huffman while a
graduate student at MIT. Huffman developed the coding scheme as part of a final
assignment in a course on information theory given by Prof. Robert M. Fano. After
working unsuccessfully on the problem for months, the solution came to Huffman as
he was tossing his notebooks into the trash (related in Scientific American, September
1991, pp. 54-58). He presented his method to Fano, who asked, “Is that all there is to it?”
Huffman’s coding scheme is now ubiquitous. (This story must be a metaphor for
something.) (Huffman, D.A., Proc. IRE 40, 1098-1101, 1952).
The Huffman code assumes that the gray values are selected at random from some
known probability distribution. In other words, it assumes that the gray values of
adjacent pixels are unrelated (which is, of course, not true for meaningful pictorial
images, though perhaps true for images transformed to a different coordinate system).
A source of uncorrelated random numbers is called “memoryless”.
The average length of the Huffman code always is within 1 bit per pixel of the
information content defined by Shannon. The Huffman code removes bits from the
message by discarding objective redundancy. It is lossless and unambiguous. This
first quality means that the original image may be reconstructed without error from
knowledge of the coded image and the code book. The quality of no ambiguity ensures
that only one set of gray levels may be decoded from an ungarbled string of binary
digits encoded by the Huffman procedure.
The Huffman code is perhaps best described by example. The pixels in an image
with 8 gray levels are usually specified by a binary code that requires 3 bits per
pixel. For clarity, the gray levels will be indicated by alphabetic characters.
Consider a 3-bit 100-pixel image with levels distributed as in the following his-
togram:

H [A, B, C, D, E, F, G, H] = [0, 9, 12, 40, 30, 5, 4, 0]

The probability of each gray level therefore is:

p [A] = 0, p [B] = 0.09, p [C] = 0.12, p [D] = 0.40, p [E] = 0.30, p [F ] = 0.05, p [G] = 0.04, p [H] = 0
The information content in this image obtained from Shannon's entropy formula is:

I = − Σ pi log2 [pi] ≅ 2.13 bits per pixel,

which is considerably less than the 3-bit quantization. To derive the Huffman code,
arrange the occupied levels in descending order of probability. One bit is used to
distinguish the least likely pair of levels using either the convention that the binary
digit 1 is assigned to the more probable level and 0 to the less probable, or vice
versa. This is the least significant bit in the Huffman code, as it distinguishes the
most rarely occurring gray levels. The probabilities of the two rarest levels are then
summed (giving a total probability of 0.09 in this case) and the list of gray values
is rearranged in descending order. In this example, the sum of the probabilities
of the rarest levels F and G is equal to the probability of the next level B, and so
reordering is not required. The new pair of least-likely levels in the rearranged list are
distinguished by again assigning a binary digit using the same convention as before.
The last two probabilities are summed, the list is reordered, and the process continues
until bits are assigned to the last pair of probabilities. The last binary digit derived
using this procedure is the most significant bit. The schematic of the entire sequence
is shown in the figure:
Calculation of Huffman code for 3-bit image with probabilities as listed in the text.
The gray levels are arranged in descending order of probability. One bit is used to
distinguish the rarest levels, the probabilities are summed, the list is rearranged in
descending order, and the process continues.
The Huffman code for a specific gray level is the sequence of binary digits assigned
from right to left for that level, i.e., from most significant to least significant bit. The
code for level B is obtained by following the sequence from the right (most significant
bit first) through each junction along the path to level B; the result is the 4-bit code
(1011)₂. The codes
for the other levels are found similarly and the resulting code book is listed above.
The number of bits to encode all pixels having each gray level may be calculated, and
their sum is the average number of bits per pixel required to encode the entire image:
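That bookkeeping can be carried out programmatically; a sketch using Python's heapq. The 0/1 assignment at each merge is an arbitrary convention, so individual codewords may differ from those in the (missing) figure, but the code lengths and the average of 2.17 bits per pixel are reproduced:

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman codebook for {symbol: probability}."""
    heap = [[p, [sym, ""]] for sym, p in sorted(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)      # two least probable entries
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # prepend a bit to every leaf below
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

probs = {"D": 0.40, "E": 0.30, "C": 0.12, "B": 0.09, "F": 0.05, "G": 0.04}
codes = huffman_codes(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print({s: len(codes[s]) for s in codes})  # lengths: D:1 E:2 C:3 B:4 F:5 G:5
print(round(avg, 2))                      # -> 2.17 (bits per pixel)
```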
The message is decoded by examining the order of bits. Since the first bit is not a
“0”, it cannot be the most common gray value “D”. Since the second bit is not a “1”,
the first pixel cannot be an “E”. The third bit is “0”, and therefore the first pixel
must be “C”:
100 00111110111101010110100100011100
Now repeat: the fourth bit is “0”, and only “D” begins with this bit:
100 0 0111110111101010110100100011100
100 0 0 111110111101010110100100011100
The fourth pixel begins with “11”, and therefore is “E”:
100 0 0 11 1110111101010110100100011100
100 0 0 11 11 10111101010110100100011100
The sequence continues until all bits have been decoded:
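The walkthrough can be automated with a buffer that accumulates bits until a codeword matches. The codebook here is inferred from the decoding steps in the text (the code-tree figure itself is not reproduced), so treat the specific codewords as an assumption:

```python
# Codebook inferred from the decoding walkthrough: D=0, E=11, C=100,
# B=1011, F=10100, G=10101 (prefix-free, so decoding is unambiguous).
CODES = {"0": "D", "11": "E", "100": "C", "1011": "B", "10100": "F", "10101": "G"}

def huffman_decode(bitstring, codes=CODES):
    out, buf = [], ""
    for b in bitstring:
        buf += b
        if buf in codes:          # prefix-free: first match is the symbol
            out.append(codes[buf])
            buf = ""
    if buf:
        raise ValueError("bitstream ended mid-codeword")
    return "".join(out)

bits = "100" "00111110111101010110100100011100"
print(huffman_decode(bits))  # begins C, D, D, E, E ... as in the text
```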
This example also may be used to illustrate the pitfall of Huffman coding. If a bit is
garbled (“flipped”), then it is likely that many (if not all) of the following characters
in the message will not be decodable, because the redundancy in the message has
been removed. Consider the example where the sixth bit is flipped from a “1” to a
“0”:
10000011110111101010110100100011100
In this case, the decoded message would be:
The “flipped” bit resulted in the incorrect decoding of the sample “B” as two samples
“DE”.
This code exceeds the theoretical limit of Shannon information by only 0.04
bits/pixel, and reduces the number of bits required to store the image to 2.17/3,
or 72%, of that required for the uncompressed original. The efficiency of the
Huffman code is defined as the ratio of the theoretical limit determined by Shannon's
definition to the average number of bits per pixel for the code. In this case, the
compression efficiency is:

η = 2.131 / 2.17 ≅ 0.982
For real images, lossless compression via a Huffman code on a pixel-by-pixel basis
can achieve compression efficiencies in the range of 0.25 ≤ η ≤ 0.67, a modest im-
provement. Note that the code book must be available to the receiver to allow recon-
struction of the image. This extra data is known as the overhead of the compression
scheme and has not been included in the calculations of compression efficiency. The
length of the Huffman codeword of a gray level whose histogram probability is p is
ideally − log2 [p], e.g., if the probability is 0.125, the ideal codeword length is 3 bits.
If − log2 [p] is significantly different from an integer (e.g., p = 0.09 =⇒ − log2 [p] ≅ 3.47 bits),
the coding efficiency will suffer.
A significant shortcoming of a Huffman code is that it cannot adapt to locally
varying image statistics, or equivalently, a particular Huffman code will not be opti-
mum for a variety of images. In fact, inappropriate application of a Huffman code
can actually lead to an increase in the storage requirements over the original, e.g., if
the image contains many levels which were rare in the original source image.
I [fk | fk−1] = − Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} pi,j log2 [pi,j | pj]
where fk and fk−1 are the gray levels of the kth and (k − 1)st pixels and I [fk |fk−1 ]
is the information in the kth pixel fk given that the (k − 1)st pixel has gray value
fk−1 . The conditional probability of pi,j given pj is denoted [pi,j |pj ]. It may be clearer
to think of the conditional entropy as just the information content of the set of 2-D
gray-level “vectors” f = [f1 , f2 ]
I [f] = − Σ_f p [f] log2 [p [f]]
where the sum is over all the 2-D pixel vectors f. Note that the total number of
gray-level states of the 2-D histogram is the square of the number of gray levels; the
vector histogram of an image with M levels has M 2 bins.
The first-order entropy is the average information content of the experiment at
the k th pixel given that the k − 1st pixel gray value is known. If the gray values
are correlated, the first-order entropy per pixel may be much less than the Shannon
entropy (pixels considered independently); in other words, the vector histogram of the
image is clustered. Note that the Shannon entropy may be considered as the zeroth-
order, or scalar entropy of the image. If the pixel gray levels are independent (random
DMS source), then the first-order entropy will be log2 [M²] = 2 · log2 [M] bits per
pair of pixels because every pair of gray levels will be equally likely; the 2-D vector
histogram is
thus flat, i.e., all 2-D vectors (whose components are gray levels of adjacent pixels)
will be equally likely. The image from a first-order Markov source may have a flat
histogram, but the 2-D histogram of neighboring pixels may be clustered; 2-D vector
coding of such an image will exhibit significant data compression.
Now consider the probabilities of the four cases for two pixels generated by the DMS.
Since the pixels are independent, the joint probability that the current pixel is a “1”
and the previous pixel is a “1” is just the product of the probabilities — there is no
influence of the previous pixel on the choice of the current pixel:

f [n] = [1 1] : p [1 1] = p [1] · p [1] = 0.8 · 0.8 = 0.64

Similarly, the probability that the current pixel is a “0” and the previous pixel
is a “0” is:

f [n] = [0 0] : p [0 0] = p [0] · p [0] = 0.2 · 0.2 = 0.04

f [n] = [1 0] : p [1 0] = p [1] · p [0] = 0.8 · 0.2 = 0.16

f [n] = [0 1] : p [0 1] = p [0] · p [1] = 0.2 · 0.8 = 0.16
Note that the sum of these joint probabilities is still unity, as required. The
entropy of the joint probabilities is:

I = −0.64 log2 [0.64] − 0.04 log2 [0.04] − 0.16 log2 [0.16] − 0.16 log2 [0.16]
= 1.4438 bits per element
Since there are two pixels per element, the entropy of the pixels taken two at a time
is just:

I = (1.4438 bits per pair) / (2 pixels per pair) = 0.7219 bits per pixel
which is identical to the scalar entropy of the DMS. In words, the information in the
pixels from the DMS taken two at a time is identical to that from the DMS taken one
at a time; there is no additional compression because there is no interpixel correlation.
In a realistic imaging situation, we might expect that black and white pixels
will be grouped together in the message. Thus the probabilities p [1 1] and
p [0 0] would be larger in a real image than for a discrete memoryless source. To
ensure that the sum of the probabilities is unity, p [0 1] and p [1 0] would
be expected to decrease, and also to be the same, since we would expect the same
number of transitions from black to white as from white to black. A possible table of
probabilities from a first-order Markov source with p [0] = 0.2 and p [1] = 0.8 would
be:

p [1 1] = 0.80

p [0 0] = 0.16

p [0 1] = p [1 0] = 0.02
I = −0.80 log2 [0.80] − 0.16 log2 [0.16] − 0.02 log2 [0.02] − 0.02 log2 [0.02]
  = 0.9063 bits per element = 0.4532 bits per pixel

This is a significant reduction from the entropy of 0.7219 bits per pixel evaluated for
the memoryless source; the additional compression is due to interpixel correlation.
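Both pairwise entropies are quick to verify; a sketch:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Independent (DMS) pairs with p[1] = 0.8, p[0] = 0.2
dms_pairs = [0.64, 0.04, 0.16, 0.16]
# Correlated (first-order Markov) pairs from the table in the text
markov_pairs = [0.80, 0.16, 0.02, 0.02]

scalar = entropy([0.8, 0.2])
print(scalar)                     # ~0.7219 bits per pixel
print(entropy(dms_pairs) / 2)     # ~0.7219: no gain without correlation
print(entropy(markov_pairs) / 2)  # ~0.4532: gain from interpixel correlation
```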
The concept may be extended to higher-order entropies. If the source is second-
order Markov, the gray value of a pixel is influenced by those of two adjacent pixels. In
this case, the set of 3-D vectors defined by triplets of gray values would be clustered,
and 3-D vector coding will further reduce the number of bits.
As an example of the effect of spatial correlations on image compression, consider
first the 4 × 4 2-bit image shown:

f [n, m] =
0 1 0 0
0 1 2 2
0 1 2 3
1 2 2 3

H [f ] = [5, 4, 5, 2] , p [f ] = [5/16, 4/16, 5/16, 2/16]

I [f ] = − ( (5/16) log2 [5/16] + (1/4) log2 [1/4] + (5/16) log2 [5/16] + (1/8) log2 [1/8] )
       = 1.924 bits/pixel
Because the histogram of this image is approximately “flat”, little (if any) benefit
would be obtained by using a Huffman code. But note that adjacent pairs of pixels
exhibit some correlation; if a pixel is dark (0 or 1), it is more likely that its neighbor
to the right is dark than bright. This is demonstrated by constructing the two-
dimensional histogram of the pairs of left-right pixels of f[n,m]:
H [L, R] =
              LEFT gray value
              0   1   2   3
RIGHT   0 :   0   0   2   0
        1 :   0   1   1   0
        2 :   3   0   0   0
        3 :   1   0   0   0
Note that each pixel appears in only one pair; there is no double-counting. The
sum of the elements of the 2-D histogram is 8, and thus there are 16 pixels (8 pairs) in
f [n, m]. Also note that there is a total of 16 possible elements in the 2-D histogram,
four times as many as in the 1-D histogram; the 2-D histogram of a 2-bit image is
a 4-bit image. Even so, if the 2-D histogram of pixel pairs is clustered, it may be
possible to encode the representation in fewer bits. The entropy of this specific 2-D
506 CHAPTER 20 IMAGE COMPRESSION
histogram is:
I [H [L, R]] = − (3/8) log2 [3/8] + 3 · ( − (1/8) log2 [1/8] ) − (2/8) log2 [2/8] = 2.156 bits/element
Because there are two pixels per element, the information content per pixel in the
2-D histogram is:
(2.156 bits/element) / (2 pixels/element) = 1.078 bits/pixel < 1.924 bits/pixel
The reduction in entropy of the image is due to the correlations between neighboring
gray values. As before, the elements of the 2-D histogram may be coded by a Huffman
procedure. This approach is called vector coding or block coding, because each pixel
pair describes a two-dimensional vector of gray levels. In analogous fashion, the
original Huffman approach to encode individual pixels is sometimes called scalar
coding.
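The scalar-versus-vector comparison can be scripted; a sketch that assumes non-overlapping left-right pairs (one of several possible pairing conventions):

```python
from collections import Counter
from math import log2

def entropy_of_counts(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

img = [[0, 1, 0, 0],
       [0, 1, 2, 2],
       [0, 1, 2, 3],
       [1, 2, 2, 3]]

# scalar (0th-order) entropy from the gray-level histogram
flat = [v for row in img for v in row]
h_scalar = entropy_of_counts(Counter(flat).values())

# vector entropy from non-overlapping (left, right) pixel pairs
pairs = [(row[i], row[i + 1]) for row in img for i in (0, 2)]
h_pair = entropy_of_counts(Counter(pairs).values())

print(round(h_scalar, 3))    # -> 1.924 bits per pixel
print(round(h_pair / 2, 3))  # -> 1.078 bits per pixel
```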
Of course, it is feasible to encode larger blocks, which is equivalent to constructing
gray-level vectors with more dimensions. For example, consider the 4 × 4 2-bit image:
f [n, m] =
0 1 0 1
2 3 2 3
0 1 0 1
2 3 2 3
Consider messages composed from an alphabet of the 26 English letters (ignoring
case) and the space. If these 27 characters were equally probable, the information
content in a message would be:
I = Σ_{i=1}^{27} ( − (1/27) log2 [1/27] ) = − log2 [1/27] = + log2 [27] ≅ 4.755 bits per character
Of course, the probabilities of English characters are not equal. The histogram of the
characters may be determined from a statistical study of words. Abramson gives a
table of character occurrences, plotted as a histogram of letter occurrences in the
English language with the case of the character not considered.
Using this “1-D” histogram of probabilities (i.e., assuming that the characters
occur “independently” but are taken from this histogram of probabilities), the en-
tropy of an English-language message would be somewhat less than for equally likely
occurrences:

I = Σ_{i=1}^{27} ( −pi log2 [pi] ) ≅ 4.080 bits per character
A sample of text that might be selected from this probability distribution is:
If correlations between pairs of characters are included, the entropy per character
decreases further:

I (27 characters as pairs) = Σ_{i=1}^{27} Σ_{j=1}^{27} ( −p [i|j] log2 [p [i|j]] ) ≅ 3.56 bits per character
20.4 LOSSLESS COMPRESSION 509
Note that this text appears to be (perhaps) slightly more “intelligible” than that
selected from the independent histogram — some characters almost form words! (e.g.,
inctore, deamy)
I (27 characters as triplets) = Σ_{i=1}^{27} Σ_{j=1}^{27} Σ_{k=1}^{27} ( −p [i|j|k] log2 [p [i|j|k]] ) ≅ 3.3 bits per character
Note that as more characters are included in a group for statistical computation of the
entropy, the more the typical text resembles English. Shannon continued the process
and estimated the upper and lower bounds of the entropy per character for groups
of English letters up to 15 characters and also for 100 characters. The entropy per
character approaches a limit for more than 9 or so letters in the interval 1 ≤ I ≤ 2
bits per character, but drops to the range 0.6 ≤ I ≤ 1.3 bits for groups of 100.
This means that characters in long strings of English characters are correlated; the
100-dimensional histogram of groups of 100 characters exhibits clustering.
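The same trend can be demonstrated on any correlated text; a sketch with a hypothetical sample string, using cyclic bigrams so that the bigram marginals match the single-character histogram:

```python
from collections import Counter
from math import log2

def entropy_of_counts(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

text = "the cat sat on the mat " * 50   # hypothetical, highly correlated sample

h1 = entropy_of_counts(Counter(text).values())
bigrams = [text[i] + text[(i + 1) % len(text)] for i in range(len(text))]
h2 = entropy_of_counts(Counter(bigrams).values())

print(round(h1, 3))      # bits per character, characters taken singly
print(round(h2 / 2, 3))  # bits per character, characters taken in pairs: smaller
```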
As an example, consider encoding the four-character sequence “0010” when the a priori
character frequencies are 3/4 for “0” and 1/4 for “1”. The process begins with the unit
interval:

0 ≤ x0 < 1

The first character in the message is “0” with known frequency of 3/4; the unit
interval is shrunk to the lower 3/4:

0 ≤ x1 < 3/4 , |x1| = 3/4
The second character also is “0”, and the interval is subdivided to the lower 3/4 again:

0 ≤ x2 < 9/16 , |x2| = 9/16

The third character is “1” with frequency 1/4, so the next subinterval is the upper 1/4
of x2:

(9/16) − (1/4) · (9/16) = 27/64 ≤ x3 < 9/16 , |x3| = 9/64
The fourth character is “0”, so the final subinterval is the lower 3/4 of x3:

27/64 ≤ x4 < 27/64 + (3/4) · (9/64) = 135/256 , |x4| = 27/256

Any point in the subinterval x4 can be used as the code for the string, because that
x4 can only be obtained by that specific sequence of input characters.
Because the probabilities are multiples of powers of two in this specific example,
the length of the subinterval is easily represented as a binary fraction. Though this
is not always true, it is useful to continue the analysis to show how the code may be
represented. Recall that fractional numbers may be represented in binary notation
by breaking up into inverse powers of 2, so that the bit closest to the binary point
represents 2⁻¹, the second bit represents 2⁻², · · · . The endpoints of the interval x4 in
this example are:

27/64 = 0/2 + 1/4 + 1/8 + 0/16 + 1/32 + 1/64 + 0/128 + 0/256 = (0.01101100)₂ → (0.011011)₂

135/256 = 1/2 + 0/4 + 0/8 + 0/16 + 0/32 + 1/64 + 1/128 + 1/256 = (0.10000111)₂

(0.011011)₂ ≤ x4 < (0.10000111)₂
The arithmetic code for the sequence “0010” is produced by selecting the representa-
tion of ANY binary fractional number in the interval x4, though the shortest binary
fractional number will give the shortest code. In this example, the binary fractional
number (0.1)₂ could be selected because it lies between the endpoints. The binary
sequence to the right of the binary point is the encoded sequence, which in this case
is the one-bit number “1”; four characters have been encoded with a single bit.
As a second example, consider encoding of the sequence “1000” with the same a
priori probabilities. The first subinterval is the upper fourth of the unit interval:
3/4 ≤ x1 < 1 , |x1| = 1/4

The second subinterval is the lower 3/4 of x1:

3/4 ≤ x2 < 3/4 + (3/4) · (1/4) = 15/16 , |x2| = 3/16

The third subinterval is the lower 3/4 of x2:

3/4 ≤ x3 < 3/4 + (3/4)² · (1/4) = 57/64 , |x3| = 9/64

The final subinterval is the lower 3/4 of x3:

3/4 ≤ x4 < 3/4 + (3/4)³ · (1/4) = 219/256 , |x4| = 27/256
The binary codes for the endpoints are:

3/4 = 1/2 + 1/4 = (0.11000000)₂ ≤ x4 < 1/2 + 1/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 + 1/256 = (0.11011011)₂
The code for the subinterval is the lower limit (0.11)₂, thus encoding the four-character
sequence with two bits.
Endpoints and lengths for other possible sequences are:

“0000” → [0, (0.75)⁴) = [0, 0.31640625) → |x4| = (0.75)⁴

“1111” → [1 − (0.25)⁴, 1) = [0.99609375, 1) → |x4| = (0.25)⁴ = 0.0039 · · ·
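The subinterval arithmetic of both examples can be reproduced exactly with rational numbers; a sketch:

```python
from fractions import Fraction

def arith_interval(seq, p0=Fraction(3, 4)):
    """Interval [low, low + width) reached after encoding seq with P('0') = p0."""
    low, width = Fraction(0), Fraction(1)
    for c in seq:
        if c == "0":
            width *= p0               # keep the lower fraction of the interval
        else:
            low += width * p0         # jump to the upper fraction
            width *= 1 - p0
    return low, low + width

print(arith_interval("0010"))  # -> (Fraction(27, 64), Fraction(135, 256))
print(arith_interval("1000"))  # -> (Fraction(3, 4), Fraction(219, 256))
```

The code point (0.1)₂ = 1/2 indeed lies inside the first interval.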
Decoding of the two examples will now be demonstrated, where the codes were binary
strings. First, the binary point is added on the left of the code to yield:

x = (0.1)₂ = 1/2
for the first example. The decision point that determines the first character is
located at t1 = 3/4; since x < t1, the first character must be the one with
probability 3/4, i.e., it is “0”. The decision point for the second character is at the
lower 3/4 of the interval determined by t1, i.e., t2 = 9/16. Again x < t2, so the second
character also must be “0”. The decision point for the third character is located at:

t3 = (3/4) · t2 = 27/64 < x = 1/2
Because the coded point is larger than t3, the third character is a “1” with probability
0.25. The final decision point divides the upper quarter of the interval in
the proportion 3:1:

t4 = 135/256 > x
So the fourth character also is 0.
In the second example, the code is:

x = (0.11)₂ = 3/4
A decision which selects a code symbol from a set of symbols is decomposed into
a sequence of binary decisions. The codeword may be considered to be the pointer
to the gray level being coded; the binary codeword defines the steps taken to reach
the symbol; the more probable the sequence, the wider the interval of the pointer,
and the shorter the decision sequence to obtain that codeword. The advantage of
arithmetic coding is that it can be adaptive, i.e., the code can adjust to changes in
the relative probabilities of symbols. The adaptive arithmetic coder assumes some a
priori probability distribution of characters and updates the distribution after sending
(or receiving) each new character.
This is the basis for the Q-coder, which was developed by IBM to encode binary
sequences using a 12-bit register. The coder derives a robust estimate of the prob-
ability distribution as it reads the source symbols. Because of its adaptability, the
Q-coder is effective for nonstationary sources.
1. Select the number of bits k in the codeword, for a total of 2^k codewords. The number k must be greater than m (preferably, k ≫ m).

2. Initialize the codes by setting the first 2^m codes to the available gray levels — this ensures that legitimate codes exist for all pixels, even after the code table is filled.
3. Read the gray level of the first pixel; it is the first element in the string “S”
and is a member of the code table (from step 2).
4. If all pixels have been read, then output the code for “S” (k bits) and quit.
5. If pixels are left, then read the gray level P of the next pixel and append to “S” to create the string “SP”.

6. If “SP” already exists in the code table, then set S → SP and return to step 4.

7. If “SP” is not in the table, then output the code for “S” (k bits).

8. If there are unused codewords left, then add “SP” to the table.

9. Set S → P and return to step 4.
As an example, code the string of 2-bit symbols:

f [n] = αααβααγααδαααβαααγααβααβααααααααβααγ
Assume that k = 4, so that the codebook contains 2^4 = 16 symbols. From step 2, the
first four codes are A ≡ α, B ≡ β, C ≡ γ, and D ≡ δ. The coding process proceeds
in this manner:
1. S1 = “α”, in codebook as “A”
P1 = “α” → S1 P1 = “αα”, not in codebook,
S1 P1 = αα is assigned to the first available symbol “E” in codebook,
First character in output is code for α =“A”
S2 → P1 = “α”
2. P2 = third character = “α”
S2 P2 = “αα”, already exists in codebook as symbol “E”
S3 → S2 P2 = “αα”
3. P3 = β, S3 P3 = “ααβ”, not in codebook
S3 P3 = “ααβ” assigned to “F” in codebook
Second character in coded image = “E” = “αα”
S4 → P3 = “β”
4. P4 = “α”, S4 P4 = “βα”, not in codebook
S4 P4 = “βα” assigned to “G” in codebook
Third character in coded image = “B” = “β”
S5 → P4 = “α”
5. P5 = “α”, S5 P5 = “αα” exists in codebook as “E”
S6 → “αα”
6. P6 = “γ”, S6 P6 = “ααγ”, not in codebook
S6 P6 = “ααγ” assigned to “H” in codebook
Fourth character in coded image = ”E”, code = “AEBE· · · ”
S7 = “γ”
20.4 LOSSLESS COMPRESSION 517
7.· · ·
After all pixels have been interrogated (and if I made no mistakes), the codebook
is:
Symbol String
A α
B β
C γ
D δ
E αα
F ααβ
G βα
H ααγ
I γα
J ααδ
K δα
L ααα
M αβ
N βαα
O ααγα
P ααβα
The entire coded message is “AEBECEDEAGHFPLEFH”. Note that new elements
are added to the codebook quite rapidly, and the later elements typically represent
longer and longer sequences. When the codebook is full, the last element (or the
least used) can be deleted, and the process can proceed with the available elements.
The coded image requires 17 characters at 4 bits per character, for a total of 68 bits,
which is not much of a reduction when compared to the original quantity of 72 bits.
If a 3-bit code had been used, the last character in the codebook would be H,
and the coded message would be AEBECEDEAGECFEEEFH, for a total of 18 3-bit
characters, or 54 bits. Note that the total number of bits for the 3-bit code is less
than for the 4-bit code (which is not the usual case). This illustrates the sensitivity
of the process to the local image statistics.
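The procedure can be sketched directly in code. This is an illustrative implementation of steps 1–9 above (not from the text); note that a strictly greedy coder can emit a slightly different tail of the coded message for the long run of α's than the worked example does, though it builds the identical codebook.

```python
def lzw_encode(source, alphabet, k=4):
    """LZW coder: the table starts with the 2^m source symbols and
    grows until all 2^k codewords have been assigned."""
    table = {s: chr(ord("A") + i) for i, s in enumerate(alphabet)}
    capacity = 2 ** k
    output = []
    S = source[0]                     # step 3: first pixel starts the string
    for P in source[1:]:
        if S + P in table:            # step 6: grow the current string
            S = S + P
            continue
        output.append(table[S])       # step 7: emit the code for S
        if len(table) < capacity:     # step 8: add SP if room remains
            table[S + P] = chr(ord("A") + len(table))
        S = P                         # step 9: restart from P
    output.append(table[S])           # step 4: flush the final string
    return "".join(output), table

# the example string (with the long run of alpha taken as eight characters,
# consistent with the stated total of 72 bits); a=alpha, b=beta, g=gamma, d=delta
f = "aaabaagaadaaabaaagaabaabaaaaaaaabaag"
code, table = lzw_encode(f, "abgd", k=4)
```

Running this reproduces the codebook tabulated above (E = αα through P = ααβα) and the leading characters of the coded message.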
As mentioned above, the LZW algorithm was used in most PC file compressors (archiving programs such as “PKARC” and “PKZIP”) back in the days of small disk drives. In those applications, the files consisted of 8-bit ASCII characters. LZW with a 13-bit codeword was known as “squashing”, while 12-bit LZW was “crunching”.
The histogram of this image is flat with a population of one pixel per bin, and therefore the information content of the image is 6 bits per pixel.
1-D 6-bit “ramp” image f [n] = n + 32 for −32 ≤ n ≤ 31. The histogram is “flat”
with one count per gray level. The information content is 6 bits per pixel.
Because the slope of f [n] is constant, the image of its derivative is constant also. Recall that discrete differentiation can be implemented as a convolution:

df/dn = f [n] ∗ h [n]

h [n] = [+1, −1, 0, 0, · · · ]
For this derivative operation, the discrete transfer function (the discrete Fourier transform of the impulse response h [n]) satisfies:

|H [k]| ∝ |k|
and is nonzero at all samples k except at the origin (i.e., at zero frequency — the
constant part or average value). The derivative of a 1-D image is invertible if the
average (or initial) value is known as a boundary condition. In this example, the
histogram of the derivative has only two occupied gray levels, and the entropy of
the derivative image is 0.116 bits/pixel. This transformed image can be encoded by one-bit characters with a compression efficiency of η = 6/0.116 ≅ 52, which means that this coding scheme produced a bit rate that is very much smaller than the Shannon limit. How is this possible? Because we encoded a property of groups of characters rather than the individual characters.
20.5 TRANSFORM CODING 521
Derivative of the ramp image: f 0 [n] = f [n] − f [n − 1]; the “gray value” of the first
pixel is “0” and all of the rest are “1”. The histogram has 63 pixels at 1 and 1 pixel
at 0, for an information content of 0.116 bits per pixel.
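The entropy figures quoted here are easy to check numerically; a small sketch (illustrative, not from the text):

```python
import math

def entropy(counts):
    """Shannon entropy in bits per pixel, computed from a histogram."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# 6-bit ramp: 64 gray levels, one pixel in each bin -> 6 bits/pixel
print(entropy([1] * 64))

# derivative of the ramp: 63 pixels at gray value 1, one pixel at 0
print(round(entropy([63, 1]), 3))   # -> 0.116
```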
Obviously, derivative images can have negative gray values (though it did not in
this case). In fact, if the dynamic range of f [n] is 64 levels in the interval [0,63] (6
bits per pixel), the theoretical dynamic range of the derivative image is 127 levels
in the interval [-63,63] (not quite 7 bits per pixel). So although the histogram of
the derivative image may have less entropy if the pixels of the original image are
well correlated, an additional bit per pixel must be stored if the image is to be
recovered without loss. If some error in the uncompressed image is tolerable, sparsely populated levels in the transform may be replaced by a similar gray level, or levels with small amplitudes may be quantized to 0. Either process can significantly reduce the information content. The latter process is called coring by some in the image compression community.
The process of encoding the derivative image rather than the original is the basis
for both run-length encoding and differential pulse code modulation (DPCM). These
are examples of predictive coding, where a reduction in entropy is obtained by trans-
mitting the difference between the actual gray value at a pixel and a prediction
obtained by some rule. In run-length encoding of images with large uniform regions
(e.g., binary text images for FAX machines), the transmitted data are the number
of consecutive pixels with the same gray level before the next switch. If the strings
of 0s and 1s are long, run-length encoding can reduce the data stream significantly.
In DPCM, the gray value at a pixel is predicted from some linear combination of
previous gray levels; the error between the actual gray level and the prediction is
quantized and encoded. The predictors of the pixel gray value f [x, y] may include one or several previous pixels, including f [x − 1, y], f [x − 1, y − 1], f [x, y − 1], and f [x + 1, y − 1], as shown below:

f [x − 1, y − 1]   f [x, y − 1]   f [x + 1, y − 1]
f [x − 1, y]       f [x, y]
The prediction f̂ [x, y] is a linear combination of previous gray values, where the aij are the weights of the linear combination. Among the predictors commonly used are:

(1) 1-D first-order (based on one pixel): f̂ [x0, y0] = f [x0 − 1, y0]

(2) 2-D second-order (based on two pixels): f̂ [x0, y0] = (1/2) (f [x0 − 1, y0] + f [x0, y0 − 1])

(3) 2-D third-order (based on three pixels):
In the first case, the difference between the actual and predicted gray values is just the discrete first derivative:

f [x0, y0] − f̂ [x0, y0] = f [x0, y0] − f [x0 − 1, y0] = ∂f/∂x |x=x0
The 2-D predictor usually improves compression efficiency significantly over 1-D
predictors for real images.
In adaptive DPCM, the mathematical form of the predictor may vary based on
the image structure. The compression in DPCM results because the prediction error
is quantized to fewer levels than the original image data. Note that the final image
may exhibit grainy or contouring errors if the minimum quantization level is coarse
and the gray levels vary slowly across the image.
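A minimal 1-D sketch of this scheme (illustrative, not the text's implementation), using the first-order predictor and a uniform quantizer with an assumed step of 4 gray levels:

```python
def dpcm_encode(signal, step=4):
    """First-order DPCM: predict each sample from the previous
    *reconstructed* sample (so encoder and decoder stay synchronized)
    and transmit the quantized prediction error."""
    recon_prev = signal[0]                    # first sample sent verbatim
    codes = [signal[0]]
    for s in signal[1:]:
        error = s - recon_prev                # prediction error
        q = round(error / step)               # coarse quantization of the error
        codes.append(q)
        recon_prev = recon_prev + q * step    # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=4):
    out = [codes[0]]
    for q in codes[1:]:
        out.append(out[-1] + q * step)
    return out

signal = [100, 103, 108, 110, 109, 104, 100]
codes = dpcm_encode(signal)
decoded = dpcm_decode(codes)
```

The transmitted error codes are small integers that need far fewer bits than the original samples; the price is a reconstruction error bounded by half the quantization step.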
while for 2-D images, the form of the space-variant operation is:

F [k, ℓ] = Σ_{n,m} f [n, m] p [n, m; k, ℓ]
In words, the 2-D transform is the product of a 4-D matrix and the 2-D input. The most common such transform in imaging is the discrete Fourier transform (DFT); the mask p [n; k] for the 1-D DFT is the set of 1-D complex-valued sinusoids with spatial frequency k/N:

p [n; k] = exp [−2πink/N] = cos (2πnk/N) − i sin (2πnk/N)
The mask p [n, m; k, ℓ] for 2-D images is the set of 2-D complex-valued sinusoids; they vary in a sinusoidal fashion along one direction and are constant in the orthogonal direction. For an N × N input image, the mathematical expression for the mask function is:

p [n, m; k, ℓ] = exp [−2πi (nk + mℓ) /N] = cos [2π (nk + mℓ) /N] − i sin [2π (nk + mℓ) /N]

The spatial frequencies of the sinusoidal mask indexed by k and ℓ are respectively ξ = k/N and η = ℓ/N. The gray level of each pixel in the transformed image F [k, ℓ]
describes the degree of similarity between f [n, m] and that specific 2-D sinusoid.
Recall that the amplitudes of all pixels in a 1-D sinusoid can be completely specified
by three numbers: the magnitude, spatial frequency, and phase. In other words, the
gray value of a particular sample of a sinusoid is determined completely by any other
pixel if the parameters of the sinusoid are known; a perfect interpixel correlation
exists among the amplitudes. The DFT operation compresses the image information
into the number of bits required to represent those three parameters. Therefore,
an image composed of a small number of sinusoidal components can be compressed
to a very significant degree by converting to its Fourier representation. We begin
by considering a simple 1-D case; the image to be compressed has 64 pixels and is
a 1-D sinusoid of period 32 pixels, as shown below. The image has been sampled
but not quantized, i.e., all real numbers between 0 and 31 are allowed gray levels.
Because the Shannon entropy (information content) is defined as a sum over discrete
probabilities (gray levels), it strictly can not apply to images with real-valued gray
levels; the image must be quantized to calculate the entropy. The histogram of the
sinusoid after quantizing to 64 bins is shown; note that it is approximately flat:
f [n] = 16 + 15 cos (2πn/32), 0 ≤ f [n] ≤ 31

F [ξ] = F [k · ∆ξ] =⇒ F [k] = Σ_{n=−N/2}^{N/2−1} f [n] exp [−2πink/N]
Note that the transform often includes a scale factor of N^−1 that is not included
here. In words, the transform is the sum of gray values of the product of the input
image and the mask function, which is real-valued in the interval [−1, +1]. In general,
the transform F [k] generates noninteger values that typically lie outside the dynamic
range of f [n]. The amplitude of the transform at k = 0 is the sum of the gray values of the image. In the example under consideration, which has a mean gray value of 16, the DC value of the DFT is:

F [0] = 64 · 16 = 1024

The entire discrete spectrum F [k] has 61 samples with value zero, two with value 480 (located at k = ±2 so that ξ = ±k/N = ±1/32) and one with value 1024 at k = 0. The histogram of F [k] generally needs an infinite number of bins to account for the continuously valued range of F [k], but is certainly much less flat than the histogram of f [n]; thus there is less entropy in F [ξ].
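These spectrum values are easy to verify directly; an illustrative sketch using NumPy, whose `np.fft.fft` implements the unscaled DFT convention used above:

```python
import numpy as np

n = np.arange(64)
f = 16 + 15 * np.cos(2 * np.pi * n / 32)      # sinusoid of period 32 pixels

F = np.fft.fft(f)                              # unscaled DFT (no 1/N factor)
mags = np.round(np.abs(F), 6)

print(mags[0])                 # DC term: 64 * 16 = 1024
print(mags[2], mags[-2])       # k = +/-2 components: 480 each
print(np.count_nonzero(mags))  # only 3 of the 64 samples are nonzero
```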
The sinusoid after quantization to 32 gray levels (5 bits per pixel) and the resulting histogram are shown below. The original image entropy is 3.890 bits/pixel. Since the image is quantized, it no longer is a sampled pure sinusoid and the Fourier transform includes extra artifacts. The histogram of the quantized transform indicates that the information content of the transform is only 0.316 bits/pixel.
Because the statistical properties of typical images vary across the image (i.e., the
statistics are shift variant), it is common to apply the selected transform to local
blocks of pixels (often of size 8 × 8 or 16 × 16) that are coded individually using
a spatial transform (to reduce gray-level redundancy), followed by Huffman coding.
The discrete Fourier transform (DFT) is one such transform, but its advantage of generating an equivalent image which exhibits a clustered histogram often is offset somewhat by its implicit assumption that an input array of size N × M pixels actually is one period of an infinite array that is periodic over N × M blocks of pixels. In words, the DFT assumes that
the gray value of a pixel off the edge on one side of the array is identical to the gray
value on the edge of the other side of the array. Of course, it is common for pixels
at opposite edges of the array to have different gray levels (for sky at the top and
for ground at the bottom, for example). The gray-level transitions at the boundaries
of the periods of the array generate artifacts in the DFT known as leakage, or false
frequency components. Though these false frequency terms are necessary to generate
the true sharp edge boundaries of the block, the additional samples of the DFT with
non-zero amplitudes increase redundancy (and thus the entropy) of the transformed
image. Therefore, the potential efficiency of compression using the DFT often suffers.
An additional problem arises because the DFT of a real-valued discrete input image
is a complex-valued Hermitian array. The symmetry of the transform array (even real
part and odd imaginary part) ensures that only half of each part need be stored, but the histograms of both parts must be accounted for when computing the entropy of the transform coefficients.
To reduce the impact of the sharp transitions at the edges of the blocks, as well
as to obtain a transform that is real-valued for a real-valued input image, the discrete
cosine transform (DCT) may be used instead of the DFT. The DCT has become very
important in the image compression community, being the basis transformation for
the JPEG and MPEG compression standards. The DCT of an M × M block may be
viewed as the DFT of a synthetic 2M × 2M block that is created by replicating the
original M × M block after folding about the vertical and horizontal edges:
Consider the computation of the DCT for a 1-D M-pixel block f [n] (0 ≤ n ≤ M − 1). The 2M-pixel synthetic array g[n] is indexed over n (0 ≤ n ≤ 2M − 1) and, for the case M = 8, has the form:

g[n] = f [n] for 0 ≤ n ≤ 7
g[n] = f [15 − n] for 8 ≤ n ≤ 15
g [8] = f [7]
g [9] = f [6]
g [10] = f [5]
g [11] = f [4]
g [12] = f [3]
g [13] = f [2]
g [14] = f [1]
g [15] = f [0]
If the “new” array g [n] is assumed to be periodic over 2M samples, its amplitude is defined for all n.
If this function were symmetric (even), then the circular translation of the 16-point array by 8 pixels to generate g [n − 8 mod 16] would also be an even function.
From the graph, it is apparent that the translated array is not symmetric about the origin; rather, it has been translated by −1/2 pixel from symmetry in the 2M-pixel array. Thus define a new 1-D array c [n] that is shifted to the left by 1/2 pixel:

c [n] = g [n − 1/2]
This result may seem confusing at first; how can a sampled array be translated by 1/2 pixel? For the answer, consider the continuous Fourier transform of a sampled array translated by 1/2 unit:

F1 {c [n]} ≡ C [ξ] = F1 {g [x − 1/2]} = F1 {g [x] ∗ δ [x − 1/2]} = G [ξ] · exp [−2πiξ · (1/2)] = G [ξ] · exp [−iπξ]
Thus the effect of translation by 1/2 pixel on the transform is multiplication by the specific linear-phase factor:

C [k] = F2M {c [n]} = F2M {g [n − 1/2]} = F2M {g [n] ∗ δ [n − 1/2]} = G [k] · exp [−iπk/2M] = G [k] · (cos [πk/2M] − i sin [πk/2M])
where the continuous spatial frequency ξ has been replaced by the sampled frequency k/2M. This function C [k] is the DCT of f [n]. Because the 2M-point translated function c [n] is real and even, so must be the 2M-point discrete spectrum C [k]; therefore only M samples of the spectrum are independent.
g[n] = f [n] : 0 ≤ n ≤ M − 1
g[n] = f [2M − 1 − n] : M ≤ n ≤ 2M − 1
The entire process may be cast into the form of a single equation, though the algebra required to get there is a bit tedious:

C [k] = 2 w [k] Σ_{n=0}^{M−1} f [n] cos [(2n + 1) πk / 2M] for 0 ≤ k ≤ M − 1

where w [k] = 1/2 for k = 0 and w [k] = 1 for 1 ≤ k ≤ M − 1.
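The folding construction can be checked numerically against the closed form; a sketch (illustrative; the w [k] factor is omitted so that both sides use the unweighted convention C [k] = 2 Σ f [n] cos [(2n + 1) πk / 2M]):

```python
import numpy as np

M = 8
rng = np.random.default_rng(0)
f = rng.integers(0, 32, M).astype(float)     # a random 5-bit block

# fold f about its right edge to build the symmetric 2M-point array g[n]
g = np.concatenate([f, f[::-1]])

# DCT via the 2M-point DFT followed by the half-pixel linear-phase factor
k = np.arange(2 * M)
C_fft = (np.fft.fft(g) * np.exp(-1j * np.pi * k / (2 * M))).real

# direct evaluation of C[k] = 2 sum_n f[n] cos[(2n+1) pi k / 2M]
n = np.arange(M)
C_direct = np.array(
    [2 * np.sum(f * np.cos((2 * n + 1) * np.pi * kk / (2 * M)))
     for kk in range(M)])

print(np.allclose(C_fft[:M], C_direct))      # True
```

The agreement confirms that the DFT of the folded array, after removal of the half-pixel phase, contains the M independent DCT samples.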
The corresponding process to compute the 2-D DCT C[k, ℓ] of an M × M-pixel block f [n, m] is:

g [n, m] = f [n, m] for 0 ≤ n, m ≤ M − 1
g [n, m] = f [2M − 1 − n, m] for M ≤ n ≤ 2M − 1, 0 ≤ m ≤ M − 1
g [n, m] = f [n, 2M − 1 − m] for 0 ≤ n ≤ M − 1, M ≤ m ≤ 2M − 1
g [n, m] = f [2M − 1 − n, 2M − 1 − m] for M ≤ n, m ≤ 2M − 1
The 2M × 2M array g [n, m] is transformed via the 2-D DFT to G [k, ℓ], and the DCT is obtained by removing the half-pixel linear phase along each axis:

C [k, ℓ] = exp [−iπ (k + ℓ) / 2M] · G [k, ℓ] for 0 ≤ k, ℓ ≤ M − 1
The form of the forward DCT may be written more simply as:

F [k, ℓ] = (4/N²) w [k] w [ℓ] Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} f [n, m] cos [(2n + 1) kπ / 2N] cos [(2m + 1) ℓπ / 2N]

where:

w [j] ≡ 1/√2 for j = 0, and w [j] ≡ 1 for j = 1, · · · , N − 1

and the corresponding inverse transform is:

f [n, m] = Σ_{k=0}^{N−1} Σ_{ℓ=0}^{N−1} w [k] w [ℓ] F [k, ℓ] cos [(2n + 1) kπ / 2N] cos [(2m + 1) ℓπ / 2N]
The action of the 2-D 8 × 8 block DCT will be detailed for blocks in the 64 × 64
5-bit image LIBERTY.
The image and the 8 × 8 block DCT are shown. Note the bright pixel in each
8 × 8 block; its amplitude is proportional to the average gray value of that block. The
numerical values of the block and the DCT will be shown for two cases. In the first,
the block is taken from the upper right where all gray values are white:
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
31 31 31 31 31 31 31 31
The amplitude of the upper-left pixel in the DCT is the zero-frequency (DC) term;
its amplitude is eight times the average gray value of the block. The other terms in
the DCT are proportional to the amplitude of oscillating sinusoidal components and
often are called the AC terms; in this constant block, the oscillating terms have null
amplitude because all gray values in the block are equal.
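The DC relationship can be checked with a direct implementation of an 8 × 8 DCT in the convention used by these examples (the DC term equals eight times the block average); an illustrative sketch:

```python
import numpy as np

def dct2_8x8(block):
    """2-D DCT of an 8x8 block, scaled so that F[0,0] equals
    eight times the average gray value of the block."""
    N = 8
    n = np.arange(N)
    w = np.full(N, 1.0)
    w[0] = 1 / np.sqrt(2)
    # basis[k, n] = w[k] * cos[(2n+1) k pi / 2N]
    basis = w[:, None] * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    return 0.25 * basis @ block @ basis.T

flat = np.full((8, 8), 31.0)        # the all-white 5-bit block
F = dct2_8x8(flat)
print(F[0, 0])                      # approximately 248 = 8 * 31

ac = F.copy()
ac[0, 0] = 0.0
print(np.abs(ac).max() < 1e-9)      # True: all AC terms vanish for a constant block
```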
The gray values of a second 8 × 8 block located near the center of the image are:
9 7 21 12 11 13 15 13
31 8 9 13 9 12 12 11
31 22 7 10 13 11 11 10
23 8 10 7 10 13 10 11
11 31 31 26 9 8 9 10
31 31 31 14 14 20 9 8
31 31 28 10 29 31 31 30
31 31 21 18 31 31 30 29

with an average gray value of 17.95.
The amplitudes of the DCT of the block near the center of the image are approx-
imately (rounded to one decimal place):
143.6 14.6 9.7 5.7 −3.3 −1.1 4.1 2.9
−30.9 −0.1 −0.9 −0.7 5.0 6.5 1.1 −1.0
13.9 −9.3 1.3 3.7 0.3 −0.4 0.5 3.0
−1.5 3.8 −5.7 −8.6 −6.1 0.3 2.9 0.5
−4.3 −4.7 −5.2 −6.1 −1.2 0.5 −0.9 −0.6
0.3 −5.8 2.2 3.2 −1.1 −1.8 −2.0 3.7
−1.5 4.7 −1.0 0.5 −1.1 −1.6 −0.7 −1.0
5.9 −0.1 −1.8 −5.4 −2.4 −2.4 −4.2 −0.8
Again, the amplitude of the sample in the upper-left corner is eight times the average gray value of 17.95. The other 63 coefficients (the AC terms) are bipolar. A negative
AC coefficient means that the particular cosine term oscillates out of phase by π
radians. The rate of oscillation of an AC component increases with distance from the
DC term, and the direction of oscillation is determined by the orientation relative
to the DC term at the origin; cosines that oscillate horizontally are specified by the
amplitudes along the first row and those that oscillate vertically by the amplitudes
in the first column. Note that the amplitudes of higher-frequency AC components
(away from the upper-left corner) tend to be smaller than the low-frequency terms
(towards the upper left). This is the usual result for realistic imagery, and is utilized
to obtain additional compression in the JPEG standard.
The DCT is useful as a transform for image compression because:

(a) the transform is real-valued for a real-valued input image,

(b) the amplitudes are proportional to the amplitudes of sinusoids with different oscillation rates, and

(c) the amplitudes of higher-frequency terms are smaller for realistic imagery.
Object B: f [n, m] = cos [2π (nξ0 + mη0)], where ξ0 = (1/4) sin (π/4) = √2/8 ≅ 0.1768 and η0 = (1/4) cos (π/4) = √2/8; the sinusoid oscillates two times in the diagonal direction, with period X0 ≅ (0.1768)^−1 ≅ 5.657 pixels.

The “low-frequency” content of the signal concentrates the DCT amplitude near the origin, but the fact that the array is only pseudoperiodic over 8 samples produces larger amplitudes at more pixels in the DCT array.
JPEG Encoding
The gray values of each block first are shifted down by 128 to convert them to bipolar data.
The DCTs of the 8 × 8 blocks are evaluated (data rounded to one decimal place):
If the block were pure white (gray value 255), then the DCT would produce a single nonzero amplitude of +1016 at the DC term and zero elsewhere; if the block were black, the DC coefficient of the DCT would be −1024.
The DCT of the 64 8 × 8 blocks can be displayed in pictorial form:
(image: parrot.tif)
The DCT values are divided by the values in a normalization matrix that accounts for the contrast sensitivity function of the eye. Smaller numbers in the matrix imply finer quantization (more weight given to that coefficient); the largest numbers apply to the high-frequency terms positioned toward the lower-right corner. Note that the DC component (less 128) is divided by a larger number than the neighboring low-frequency AC components.
Q [u, v] =
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 50 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
Quantization divides each DCT coefficient by the corresponding entry of the normalization matrix and rounds the quotient to the nearest integer; note the large number of null coefficients:

Q {F [k, ℓ] / N [k, ℓ]} =
2 −5 0 0 0 0 0 0
−8 −2 1 0 0 0 0 0
−2 3 1 0 0 0 0 0
1 1 0 −1 0 0 0 0
0 0 −1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
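The quantize/dequantize round trip can be sketched directly (illustrative; the matrix F_hat below holds the dequantized DCT values of this example's block, consistent with the printed tables):

```python
import numpy as np

# JPEG luminance normalization matrix Q[u, v] (as tabulated above)
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 50, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]], dtype=float)

def quantize(F, Q):
    """Divide each DCT coefficient by its normalization value and round."""
    return np.round(F / Q).astype(int)

def dequantize(q, Q):
    """Approximate the DCT by multiplying the integers back by Q."""
    return q * Q

# dequantized DCT of the example block (only the upper-left corner is nonzero)
F_hat = np.zeros((8, 8))
F_hat[:5, :5] = [[ 32, -55,   0,   0, 0],
                 [-96, -24,  14,   0, 0],
                 [-28,  39,  16,   0, 0],
                 [ 14,  17,   0, -29, 0],
                 [  0,   0, -37,   0, 0]]

q = quantize(F_hat, Q)       # reproduces the table of quantized integers above
```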
The 2-D 8 × 8 block of quantized coefficients is scanned in a “zig-zag” format to convert the 2-D block to a 1-D sequence of coefficients. Because of the layout of the scan, the largest coefficients should appear first for most “real” images.
The zig-zag path traveled during Huffman coding of the quantized DCT coefficients.
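The scan order can be generated compactly; an illustrative sketch: coefficients on each antidiagonal k + ℓ = s are visited in alternating directions.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the JPEG zig-zag scan."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # traverse each antidiagonal s = i + j, alternating direction
        key=lambda t: (t[0] + t[1], -t[0] if (t[0] + t[1]) % 2 == 0 else t[0]))

path = zigzag_order()
print(path[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```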
String of Characters:

The tail of the zig-zag sequence for this block is:

0 0 0 0 0 0 0 0 0 0 −1 −1 0 × 39

Each nonzero coefficient is represented by a pair of numbers:

1. the run length of consecutive zeros that preceded the current element in the zigzag sequence, and
2. the numerical value of the coefficient.

This string is encoded using a predetermined Huffman code based on the number of zeros in the string to the next nonzero coefficient and the numerical value of the coefficient. Short strings of zeros are encoded with shorter sequences:
(0, 3) (0, 4) (0, 2) (0, 1) (0, 2) (0, 1) (2, 1) (0, 1) (10, 1) (0, 1) EOB
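This run-length pairing is easy to sketch (illustrative; the toy sequence below is chosen to reproduce the pairs listed, since the full 64-coefficient zig-zag string is not reprinted here):

```python
def run_length_pairs(seq):
    """Convert a zig-zag coefficient sequence into (run-of-zeros, value)
    pairs, with 'EOB' marking the all-zero tail of the block."""
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")          # end-of-block: only zeros remain
    return pairs

seq = [3, 4, 2, 1, 2, 1, 0, 0, 1, 1] + [0] * 10 + [1, 1] + [0] * 42
print(run_length_pairs(seq))
# [(0, 3), (0, 4), (0, 2), (0, 1), (0, 2), (0, 1), (2, 1), (0, 1), (10, 1), (0, 1), 'EOB']
```

The EOB symbol replaces the final run of 42 zeros with a single short code, which is where much of the compression arises.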
First we reconstruct the approximate DCT from the Huffman code and multiplication by the normalization matrix:

F̂ [k, ℓ] =
32 −55 0 0 0 0 0 0
−96 −24 14 0 0 0 0 0
−28 39 16 0 0 0 0 0
14 17 0 −29 0 0 0 0
0 0 −37 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
The inverse DCT produces an 8 × 8 block of bipolar “gray values.” These values are rounded and the constant 128 is added back to obtain the recovered 8 × 8 block.
The error in the block is obtained by subtracting the recovered image from the original:

ε [n, m] = f [n, m] − f̂ [n, m] =
+5 −1 −6 −5 +1 +1 −2 −4
−3 +5 +8 +8 0 −8 −1 +3
−3 +6 +6 +1 −4 +2 −1 −1
+3 −2 0 +4 −5 −2 −3 +1
−2 +6 −4 −10 0 −2 −2 −9
+1 −5 −7 +2 +4 −1 −3 −12
−7 −4 +6 +3 0 +8 +4 −3
+1 −4 −2 −3 −3 −1 −1 0
integer {F [k, ℓ] / N [k, ℓ]} =
−10 4 8 −4 4 −1 2 −1
13 −5 −3 1 3 0 −1 0
5 7 −4 −1 −3 0 −1 0
4 1 2 5 0 0 0 1
2 −2 1 0 0 0 −1 −1
−3 0 0 0 0 1 0 0
1 −1 0 −1 0 0 0 0
0 0 0 0 0 0 0 0
20.6 JPEG IMAGE COMPRESSION OF STATIC IMAGES 545
f̂ [n, m] =
170 173 114 169 169 124 187 138
158 149 147 122 180 129 49 136
72 56 149 173 81 81 42 224
117 49 24 154 29 85 54 186
172 111 24 80 100 136 47 198
167 63 134 45 92 120 47 160
159 58 141 41 23 76 23 15
43 192 165 116 28 67 56 125
Note that the error is much larger in several of the locations, because the high-
frequency coefficients that are necessary to produce the “sharp edges” have been
quantized to zero.