• Sports video annotation: using video classification to label video segments with
certain actions such as strokes in table tennis, penalty kicks in soccer games, etc.
• GameStory: a video game analytics challenge in which e-sport games often involv-
ing millions of players and viewers are analyzed. Audio and video streams, com-
mentaries, game data and statistics, interaction traces, and other multimedia data
are used to summarize and predict the development and evolution of the games.
• Live video streaming: requiring ultra-low end-to-end latency for real-time inter-
action between the broadcaster and the viewers. The main challenge is maintaining
QoE (quality of experience) under this latency constraint.
• Violent scenes detection in film: automatically detecting portions of movies
depicting violence. Again, all available aspects, such as text and audio, could be
brought into play.
• Preserving privacy in surveillance videos: methods obscuring private information
(such as faces on Google Earth), so as to render privacy-sensitive elements of video
unrecognizable, while still allowing the video to be viewable by people and usable
for computer vision tasks such as object tracking.
• Deep video understanding: understanding the relationships between different enti-
ties from a long duration movie. The relations can be family, work, social, and
other types. All modalities of multimedia will be exploited and a knowledge base
will be constructed and consulted.
• Large-scale human-centric video analysis: analyzing various crowd and complex
events such as getting off a train, dining in a busy restaurant, earthquake escape,
etc. Issues include multi-person tracking, crowd pose estimation and tracking,
and person-level action recognition.
• Searching and question answering for the spoken web: searching for audio content
within audio content by using an audio query, matching spoken questions with a
collection of spoken answers.
• Multimedia recommender systems: improving the quality of recommender sys-
tems to produce items more relevant to users’ interests. Applications include
movie recommendation, news recommendation, etc.
Solutions to these challenges can be difficult, but the impact can be enormous,
not only to the IT industry, but also to everyone, as we all live in a digital multimedia
world. We want this textbook to bring valuable knowledge about multimedia to you,
and hope you enjoy it and perhaps even contribute to this promising field (maybe for
some of the topics listed above or beyond) in your future career!
1.5 Exercises
1. Using your own words, describe what “multimedia” is. Is multimedia simply a
collection of different types of media?
2. Identify three novel multimedia applications. Discuss why you think these are
novel and their potential impact.
3. Discuss the relation between multimedia and hypermedia.
4. Briefly explain, in your own words, “Memex” and its role regarding hypertext.
Could we carry out the Memex task today? How do you use Memex ideas in your
own work?
5. Discover a current media input, storage, or playback device that is analog. Is it
necessary to convert to digital? What are the pros and cons of being analog or
digital?
6. Your task is to think about the transmission of smell over the Internet. Suppose
we have a smell sensor at one location and wish to transmit the Aroma Vector (let
us say) to a receiver, to reproduce the same sensation. You are asked to design
such a system. List three key issues to consider and two applications of such a
delivery system. Hint: Think about medical applications.
7. Tracking objects or people can be done by both sight and sound. While vision
systems are precise, they are relatively expensive; on the other hand, a pair of
microphones can detect a person’s bearing inaccurately but cheaply. Sensor fusion
of sound and vision is thus useful. Surf the web to find out who is developing
tools for videoconferencing using this kind of multimedia idea.
8. Non-photorealistic graphics means computer graphics that do well enough with-
out attempting to make images that look like camera images. An example is
conferencing. For example, if we track lip movements, we can generate the right
animation to fit our face. If we don’t much like our own face, we can substi-
tute another one—facial-feature modeling can map correct lip movements onto
another model. See if you can find out who is carrying out research on generating
avatars to represent conference participants’ bodies.
9. Watermarking is a means of embedding a hidden message in data. This could
have important legal implications: Is this image copied? Is this image doctored?
Who took it? Where? Think of “messages” that could be sensed while capturing
an image and secretly embedded in the image, so as to answer these questions.
(A similar question arises from the use of cell phones. What could we use
to determine who is using a phone, and where, and when? This could
eliminate the need for passwords and prevent others from using a phone you lost.)
In fact, the output range is also clamped to [1 .. 254] since the Rec. 601 synchroniza-
tion signals are given by codes 0 and 255.
The inverse transform to Eq. (4.41) is as follows [16]:

⎡ R ⎤   ⎡ 0.00456621    0.0000        0.00625893 ⎤ ⎡ Y − 16   ⎤
⎢ G ⎥ = ⎢ 0.00456621   −0.00153632   −0.00318811 ⎥ ⎢ Cb − 128 ⎥        (4.42)
⎣ B ⎦   ⎣ 0.00456621    0.00791071    0.0000     ⎦ ⎣ Cr − 128 ⎦
The YCbCr transform is used in JPEG image compression and MPEG video
compression.
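As a quick sanity check, Eq. (4.42) can be applied directly. The following sketch (the clamping of results to [0, 1] is a typical implementation detail, not part of the equation itself) converts 8-bit Y′CbCr codes to R′G′B′ values:

```python
# Sketch of the Rec. 601 YCbCr -> RGB inverse transform of Eq. (4.42).
# Coefficients are those given in the text; clamping to [0, 1] is an
# added implementation detail, since out-of-range codes can overshoot.

def ycbcr_to_rgb(y, cb, cr):
    """Map 8-bit Y'CbCr codes to R'G'B' values in [0, 1]."""
    r = 0.00456621 * (y - 16) + 0.00625893 * (cr - 128)
    g = (0.00456621 * (y - 16) - 0.00153632 * (cb - 128)
         - 0.00318811 * (cr - 128))
    b = 0.00456621 * (y - 16) + 0.00791071 * (cb - 128)
    return tuple(min(1.0, max(0.0, v)) for v in (r, g, b))

# Neutral codes (Cb = Cr = 128) should give R = G = B.
print(ycbcr_to_rgb(126, 128, 128))
```

Note that Y = 16 maps to black and Y = 235 to white, matching the nominal Rec. 601 code range.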
4.4 Exercises
1. Consider the following set of terms that describe light:
(a) wavelength,
(b) color level,
(c) brightness, and
(d) whiteness.
How would you match each of the following (more vaguely stated) characteris-
tics to each of the above terms?
(a) luminance,
(b) hue,
(c) saturation, and
(d) chrominance.
2. What color is outdoor light? That is, around what wavelength would you guess
the peak power is for a red sunset? For blue sky light?
3. “The LAB gamut covers all colors in the visible spectrum.”
What does that statement mean? Briefly, how does LAB relate to color?—just
be descriptive.
What are (roughly) the relative sizes of the LAB gamut, the CMYK gamut, and
a monitor gamut?
4. Prove that straight lines in (X, Y, Z) space project to straight lines in (x, y)
chromaticity space. That is, let C1 = (X1, Y1, Z1) and C2 = (X2, Y2, Z2) be
two different colors, and let C3 = (X3, Y3, Z3) fall on a line connecting C1
and C2: C3 = αC1 + (1 − α)C2. Then show that (x3, y3) = β(x1, y1) + (1 −
β)(x2, y2) for some β.
5. Where does the chromaticity “horseshoe” shape in Fig. 4.11 come from? Can
we calculate it?
Write a small pseudocode solution for the problem of finding this so-called
“spectrum locus.”
Hint: Figure 4.20a shows the color-matching functions in Fig. 4.10 drawn as a
set of points in 3-space. And Fig. 4.20b shows these points mapped into another
3D set of points.
Hint: Try a programming solution for this problem, to help you answer it more
explicitly.
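One possible shape for the requested pseudocode, sketched in Python. The three color-matching samples below are illustrative stand-ins for the full CIE tables of Fig. 4.10, which real code would load instead:

```python
# Sketch of the "spectrum locus" computation: each wavelength's
# color-matching triple (xbar, ybar, zbar) projects to a chromaticity
# point (x, y) = (X/(X+Y+Z), Y/(X+Y+Z)).  The triples below are
# illustrative placeholders; real code would use the CIE 1931 tables.

samples = [  # (wavelength in nm, xbar, ybar, zbar) -- illustrative only
    (450, 0.336, 0.038, 1.772),
    (520, 0.063, 0.710, 0.078),
    (600, 1.062, 0.631, 0.001),
]

locus = []
for wavelength, xb, yb, zb in samples:
    total = xb + yb + zb
    locus.append((wavelength, xb / total, yb / total))

for wavelength, x, y in locus:
    print(f"{wavelength} nm -> ({x:.3f}, {y:.3f})")
```

Tracing these (x, y) points over all visible wavelengths sweeps out the horseshoe-shaped boundary of Fig. 4.11.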
6. Suppose we use a new set of color-matching functions x̄ new (λ), ȳ new (λ), z̄ new (λ)
with values
In this system, what are the chromaticity values (x, y) of equi-energy white light
E(λ) where E(λ) ≡ 1 for all wavelengths λ? Explain.
7. Repeat the steps leading up to Eq. (4.18), but this time using the NTSC standard—
if you use the number of significant digits as in Table 4.18 you will end up with
the transform in Eq. (4.19).
8. (a) Suppose images are not gamma-corrected by a camcorder. Generally, how
would they appear on a screen?
(b) What happens if we artificially increase the output gamma for stored image
pixels? (One can do this in Photoshop.) What is the effect on the image?
9. Suppose image file values are in 0 .. 255 in each color channel. If we define
R′ = R/255 for the red channel, we wish to carry out gamma correction by
passing a new value R″ ≃ (R′)^(1/2.0) to the display device.
It is common to carry out this operation using integer math. Suppose we approx-
imate the calculation as creating new integer values in 0 .. 255 via

(int) (255 · (R′)^(1/2.0))
(a) Comment (very roughly) on the effect of this operation on the number of
actually available levels for display. Hint: Coding this up in any language
will help you understand the mechanism at work better—and as well, then
you can simply count the output levels.
(b) Which end of the levels 0 .. 255 is affected most by gamma correction, the
low end near 0 or the high end near 255? Why? How much at each end?
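Following the hint, the counting experiment can be coded in a few lines (a sketch; any language would do):

```python
# Count how many distinct display codes survive the integer
# gamma-correction approximation  (int)(255 * (R/255) ** (1/2.0)).

levels = {int(255 * (r / 255) ** (1 / 2.0)) for r in range(256)}
print(len(levels))          # number of distinct output levels

# The low end is stretched (inputs 0 and 1 land far apart), while the
# high end is compressed (many nearby inputs collapse to one output).
print(int(255 * (1 / 255) ** 0.5))           # where input 1 lands
print(255 - int(255 * (254 / 255) ** 0.5))   # gap left near the top
```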
(a) Suppose a color (i.e., an RGB) is divided by 2.0, so that the RGB triple now
has values 0.5 times its former values.
Explain using numerical values:
(i) If gamma correction is applied after the division by 2.0 and before the color
is stored, does the darker RGB have the same hue as the original in the
sense of having the same ratios R:G:B of light emanating from the display
device? (we’re not discussing any psychophysical effects that change our
perception—here we’re just worried about the machine itself).
(ii) If gamma correction is not applied, does the second RGB above, = RGB/2,
have the same hue as the first RGB, when displayed? And are these the
same hues as for the original color as stored, not the light as displayed?
(b) Assuming no gamma correction is applied, for what color triples is the hue
as displayed the same as for the original color as stored?
13. We wish to produce a graphic that is pleasing and easily readable. Suppose we
make the background color pink. What color text font should we use to make
the text most readable? Justify your answer.
14. To make matters simpler for eventual printing, we buy a camera equipped with
CMY sensors, as opposed to RGB sensors (CMY cameras are, in fact, available).
(a) Draw spectral curves roughly depicting what such a camera’s sensitivity to
wavelength might look like.
(b) Could the output of a CMY camera be used to produce ordinary RGB pic-
tures? How?
15. Color ink-jet printers use the CMYK model. When the color ink cyan is sprayed
onto a sheet of white paper,
(i) Why does it look cyan under daylight?
(ii) What color would it appear to be under a blue light? Why?
References
1. D.H. Pritchard, U.S. color television fundamentals—A review. IEEE Trans. Consum. Electron.
23(4), 467–478 (1977)
2. G. Wyszecki, W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and
Formulae, 2nd edn. (Wiley, New York, 2000)
3. R.W.G. Hunt, Color reproduction and color vision modeling, in 1st Color Imaging Conference
on Transforms & Transportability of Color, pp. 1–5. Society for Imaging Science & Technology
(IS&T)/Society for Information Display (SID) joint conference (1993)
4. M.J. Vrhel, R. Gershon, L.S. Iwan, Measurement and analysis of object reflectance spectra.
Color Res. Appl. 19, 4–9 (1994)
5. R.W.G. Hunt, The Reproduction of Color, 6th edn. (Fountain Press, England, 2004)
6. J.F. Hughes, A. van Dam, M. McGuire, D.F. Sklar, J.D. Foley, S.K. Feiner, K. Akeley, Computer
Graphics: Principles and Practice, 3rd edn. (Addison-Wesley, 2013)
7. C.A. Poynton, Color FAQ—frequently asked questions color (2006). http://www.poynton.com/
notes/colour_and_gamma/ColorFAQ.html
8. M.D. Fairchild, Color Appearance Models, 3rd edn. (Addison-Wesley, 2013)
9. D. Travis, Effective Color Displays. (Academic Press, 1991)
10. R.S. Berns, L.A. Taplin, Practical spectral imaging using a color-filter array digital camera.
art-si.org/PDFs/Acquisition/TR_Practical_Spectral_Imaging.pdf
11. C. Fredembach, S. Susstrunk, Colouring the near infrared, in 16th Color Imaging Conference,
pp. 176–182 (2008)
12. International Electrotechnical Commission, IEC 61966-2-1: multimedia systems and
equipment—Colour measurement and management—Part 2-1: colour management—Default
RGB colour space—sRGB. (IEC, Geneva, 2007)
13. M. Stokes, M. Anderson, S. Chandrasekar, R. Motta, A standard default color space for the
internet: sRGB (1996). http://www.color.org/sRGB.html
14. Microsoft Corporation, Colorspace Interchange Using sRGB (2001). http://www.microsoft.
com/whdc/device/display/color/sRGB.mspx
15. P. Urban, Ink limitation for spectral or color constant printing, in 11th Congress of the Interna-
tional Colour Association (2009)
16. C.A. Poynton, A Technical Introduction to Digital Video (Wiley, New York, 1996)
However, we can get into a difficult situation if we try to change the prediction
coefficients that multiply previous quantized values, because that makes a compli-
cated set of equations to solve for these coefficients. Suppose we decide to use a
least-squares approach to solving a minimization, trying to find the best values of
the ai :
min Σ_{n=1}^{N} ( f_n − f̂_n )²                    (6.24)
where here we would sum over a large number of samples f n for the current patch of
speech, say. But because fˆn depends on the quantization, we have a difficult problem
to solve. Also, we should really be changing the fineness of the quantization at the
same time, to suit the signal’s changing nature; this makes things problematical.
Instead, we usually resort to solving the simpler problem that results from using
not f˜n in the prediction but simply the signal f n itself. This is indeed simply solved,
since, explicitly writing in terms of the coefficients ai , we wish to solve
min Σ_{n=1}^{N} ( f_n − Σ_{i=1}^{M} a_i f_{n−i} )²        (6.25)
Differentiation with respect to each of the a_i and setting to zero produces a linear
system of M equations that is easy to solve. (The set of equations is called the
Wiener–Hopf equations.)
Thus, we indeed find a simple way to adaptively change the predictor as we go.
For speech signals, it is common to consider blocks of signal values, just as for
image coding, and adaptively change the predictor, quantizer, or both. If we sample
at 8 kHz, a common block size is 128 samples—16 ms of speech. Figure 6.19 shows
a schematic diagram for the ADPCM coder and decoder [8].
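For concreteness, the least-squares step of Eq. (6.25) can be sketched for a toy order-2 predictor. A real coder would solve the same Wiener–Hopf system with the Levinson–Durbin recursion; here the 2 × 2 normal equations are solved directly:

```python
# Solve the Wiener-Hopf (normal) equations of Eq. (6.25) for an
# order-2 predictor (a1, a2), using the unquantized signal f_n itself.
# Pure-Python sketch; real coders use Levinson-Durbin on this system.

def predictor_coeffs(f):
    M, N = 2, len(f)
    # R[j][i] = sum_n f[n-i]*f[n-j],  r[j] = sum_n f[n]*f[n-j]
    R = [[sum(f[n - i] * f[n - j] for n in range(M, N)) for i in (1, 2)]
         for j in (1, 2)]
    r = [sum(f[n] * f[n - j] for n in range(M, N)) for j in (1, 2)]
    # Cramer's rule for the 2x2 system R a = r.
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    a1 = (r[0] * R[1][1] - r[1] * R[0][1]) / det
    a2 = (R[0][0] * r[1] - R[1][0] * r[0]) / det
    return a1, a2

# A ramp satisfies f[n] = 2*f[n-1] - f[n-2] exactly, so the
# least-squares answer should be a1 = 2, a2 = -1.
ramp = [float(n) for n in range(1, 20)]
print(predictor_coeffs(ramp))
```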
[Fig. 6.19 ADPCM coder and decoder: a the coder converts the 64 kbps A-law or
µ-law PCM input to uniform PCM, subtracts the adaptive predictor's output f̂_n to
form the error e_n, and adaptively quantizes it into the 32 kbps output ẽ_n; b the
decoder adds the 32 kbps input ẽ_n to its own prediction f̂_n to reconstruct f̃_n,
then converts back to 64 kbps A-law or µ-law PCM output.]

6.4 Exercises
(a) If the output was 1 volt for frequencies at mid-range, after a loss of −3 dB
at 18 kHz, what is the output voltage at this frequency?
(b) To compensate for the loss, a listener can adjust the gain (and hence the
output) at different frequencies from an equalizer. If the loss remains −3 dB
and a gain through the equalizer is 6 dB at 18 kHz, what is the output voltage
now?
[Hint: Assume log10 2 = 0.3.]
6. Suppose the sampling frequency is 1.5 times the true frequency. What is the alias
frequency?
7. In a crowded room, we can still pick out and understand a nearby speaker's voice
notwithstanding the fact that general noise levels may be high. This is known
as the "cocktail-party effect": our hearing can localize a sound source by taking
advantage of the difference in phase between the two signals entering our left
and right ears ("binaural auditory perception"). In mono, we could not hear our
neighbor's conversation very well if the noise level were at all high.
State how you think a karaoke machine works.
Hint: the mix for commercial music recordings is such that the “pan” parameter
is different going to the left and right channels for each instrument. That is,
for an instrument, the left, or the right, channel is emphasized. How would the
singer’s track timing have to be recorded in order to make it easy to subtract out
the sound of the singer? (And this is typically done.)
8. The dynamic range of a signal V is the ratio of the maximum to the minimum,
expressed in decibels. The dynamic range expected in a signal is to some extent
an expression of the signal quality. It also dictates the number of bits per sample
needed in order to reduce the quantization noise down to an acceptable level,
e.g., we may like to reduce the noise to at least an order of magnitude below
Vmin .
Suppose the dynamic range for a signal is 60 dB. Can we use 10 bits for this
signal? Can we use 16 bits?
9. Suppose the dynamic range of speech in telephony implies a ratio Vmax /Vmin of
about 256. Using uniform quantization, how many bits should we use to encode
speech, so as to make the quantization noise at least an order of magnitude less
than the smallest detectable telephonic sound?
10. Perceptual nonuniformity is a general term for describing the nonlinearity of
human perception, e.g., when a certain parameter of an audio signal varies,
humans do not necessarily perceive the difference in proportion to the amount
of change.
11. Suppose we mistakenly always use the 0.75 point instead of the 0.50 point in
a quantization interval as the decision point, in deciding to which quantization
level an analog value should be mapped. Above, we have a rough calculation of
SQNR. What effect does this mistake have on the SQNR?
12. State the Nyquist frequency for the following digital sample intervals. Express
the result in Hertz in each case.
(a) 1 millisecond,
(b) 0.005 s, and
(c) 1 h.
13. Draw a diagram showing a sinusoid at 5.5 kHz, and sampling at 8 kHz (just
show eight intervals between samples in your plot). Draw the alias at 2.5 kHz
and show that in the eight sample intervals, exactly 5.5 cycles of the true signal
fit into 2.5 cycles of the alias signal.
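The claim can also be checked numerically (a sketch): at an 8 kHz sampling rate, the 5.5 kHz sinusoid and a phase-reversed 2.5 kHz sinusoid agree at every sample instant, since 5.5 = 8 − 2.5.

```python
# At 8 kHz sampling, a 5.5 kHz sine and its 2.5 kHz alias produce the
# same sample values (up to a phase reversal, because 5.5 = 8 - 2.5).
from math import sin, pi

fs = 8000.0
for k in range(8):                           # the eight sample intervals
    t = k / fs
    true_sample = sin(2 * pi * 5500 * t)
    alias_sample = -sin(2 * pi * 2500 * t)   # phase-reversed alias
    assert abs(true_sample - alias_sample) < 1e-9
print("5.5 kHz samples match the phase-reversed 2.5 kHz alias")
```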
14. In an old Western movie, we notice that a stagecoach wheel appears to be moving
backward at 5◦ per frame, even though the stagecoach is moving forward. To
what is this effect due? What is the true situation?
15. Suppose a signal contains tones at 1, 10, and 21 kHz, and is sampled at the rate
12 kHz (and then processed with an anti-aliasing filter limiting output to 6 kHz).
f̂_n = trunc( ( f̃_{n−1} + f̃_{n−2} ) / 2 ) ,            (6.26)
e_n = f_n − f̂_n .
Also, suppose we adopt the quantizer Eq. (6.21). If the input signal has values
as follows:
then show that the output from a DPCM coder (without entropy coding) is as
follows:
Fig. 6.20 a DPCM reconstructed signal (dotted line) tracks the input signal (solid line). b DPCM
reconstructed signal (dashed line) steers farther and farther from the input signal (solid line)
Figure 6.20a shows how the quantized reconstructed signal tracks the input
signal. As a programming project, write a small piece of code to verify your
results.
(b) Now, suppose by mistake on the coder side we inadvertently use the predictor
for lossless coding, Eq. (6.15), using original values f n instead of quantized
ones f˜n . Show that on the decoder side we end up with reconstructed signal
values as follows:
References
1. B. Truax, Handbook for Acoustic Ecology, 2nd edn. (Cambridge Street Publishing, 1999)
2. K.C. Pohlmann, Principles of Digital Audio, 6th edn. (McGraw-Hill, 2010)
3. J.H. McClellan, R.W. Schafer, M.A. Yoder, DSP First: a Multimedia Approach. (Prentice-Hall
PTR, 1998)
4. J. Heckroth, Tutorial on MIDI and music synthesis. The MIDI Manufacturers Association, POB
3173, La Habra CA 90632-3173 (1995). http://www.harmony-central.com/MIDI/Doc/tutorial.
html
5. P.K. Andleigh, K. Thakrar, Multimedia Systems Design. (Prentice-Hall PTR, 1995)
6. The MIDI Association, Strategic overview and introduction to MIDI 2.0, in The National
Association of Music Merchants (NAMM) Show (2020)
7. K. Sayood, Introduction to Data Compression, 5th edn. (Morgan Kaufmann, San Francisco,
2017)
8. R.L. Freeman, Reference Manual for Telecommunications Engineering, 3rd edn. (Wiley, 2001)
Table 7.7 Comparison of lossless JPEG with other lossless compression programs (compression ratios)

Compression program        Lena    Football   F-18    Flowers
Lossless JPEG              1.45    1.54       2.29    1.26
Optimal lossless JPEG      1.49    1.67       2.71    1.33
Compress (LZW)             0.86    1.24       2.21    0.87
Gzip (LZ77)                1.08    1.36       3.10    1.05
Gzip-9 (optimal LZ77)      1.08    1.36       3.13    1.05
Pack (Huffman coding)      1.02    1.12       1.19    1.00
Predictors 4–7, which consider neighboring nodes in both horizontal and vertical
dimensions, offer slightly better compression (approximately 0.2–0.5 higher) than
predictors 1–3.
Table 7.7 shows a comparison of the compression ratio for several lossless com-
pression techniques using the test images Lena, Football, F-18, and Flowers. These
standard images, used for many purposes in imaging work, are shown on the
textbook website for this chapter.
This chapter has been devoted to the discussion of lossless compression algo-
rithms. It should be apparent that their compression ratio is generally limited (with
a maximum at about 2–3). However, many of the multimedia applications we will
address in the next several chapters require a much higher compression ratio. This
is accomplished by lossy compression schemes.
7.8 Exercises
1. Calculate the entropy of a “checkerboard” image in which half of the pixels are
BLACK and half of them are WHITE.
2. Suppose eight characters have a distribution A:(1), B:(1), C:(1), D:(2), E:(3),
F:(5), G:(5), H:(10). Draw a Huffman tree for this distribution. (Because the
algorithm may group subtrees with equal probability in a different order, your
answer is not strictly unique.)
3. (a) What is the entropy η of the image below, where numbers (0, 20, 50, 99)
denote the gray-level intensities?
99 99 99 99 99 99 99 99
20 20 20 20 20 20 20 20
0 0 0 0 0 0 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 0 0 0 0 0 0
(b) Show step by step how to construct the Huffman tree to encode the above
four intensity values in this image. Show the resulting code for each intensity
value.
(c) What is the average number of bits needed for each pixel, using your Huffman
code? How does it compare to η?
4. Consider an alphabet with two symbols A, B, with probability P(A) = x and
P(B) = 1 − x.
(a) Plot the entropy as a function of x. You might want to use log2 3 = 1.6 and
log2 7 = 2.8.
(b) Discuss why it must be the case that if the probability of the two symbols is
1/2 + ε and 1/2 − ε, with small ε, the entropy is less than the maximum.
(c) Generalize the above result by showing that, for a source generating N sym-
bols, the entropy is maximum when the symbols are all equiprobable.
(d) As a small programming project, write code to verify the conclusions above.
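For part (d), a sketch of the verification code might start from the two-symbol entropy function itself:

```python
# Entropy of a two-symbol source with P(A) = x, P(B) = 1 - x.
from math import log2

def entropy(x):
    if x in (0.0, 1.0):        # lim p*log2(p) = 0 as p -> 0
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

# Maximum of 1 bit at x = 0.5; slightly unequal probabilities
# (1/2 + eps, 1/2 - eps) always give strictly less.
print(entropy(0.5))
print(entropy(0.5 + 0.01))
```

Sweeping x over a fine grid and keeping the maximum gives the plot for part (a) and numeric evidence for parts (b) and (c).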
(a) What are the advantages and disadvantages of arithmetic coding as compared
to Huffman coding?
(b) Suppose the alphabet is [A, B, C], and the known probability distribution is
PA = 0.5, PB = 0.4, PC = 0.1. For simplicity, let’s also assume that both
encoder and decoder know that the length of the messages is always 3, so
there is no need for a terminator.
i. How many bits are needed to encode the message BBB by Huffman
coding?
ii. How many bits are needed to encode the message BBB by arithmetic
coding?
8. (a) What are the advantages of adaptive Huffman coding compared to the orig-
inal Huffman coding algorithm?
(b) Assume that adaptive Huffman coding is used to code an information source
S with a vocabulary of four letters (a, b, c, d). Before any transmission, the
initial coding is a = 00, b = 01, c = 10, d = 11. As in the example illustrated
in Fig. 7.8, a special symbol NEW will be sent before any letter if it is to be
sent the first time.
Figure 7.18 is the adaptive Huffman tree after sending letters aabb. After
that, the additional bitstream received by the decoder for the next few letters
is 01010010101.
9. Work out the details of scaling and incremental coding in arithmetic coding when
the probabilities for the three symbols are A: 0.8, B: 0.02, C: 0.18, and the input
sequence is ACBA.
10. We will use arithmetic coding to encode images.
(a) At most, how many bits does it take to encode the first row of the image
below, where all numbers are gray-level intensity values?
(ii) What is the code (sequence of bits) that the encoder generates for the
four numbers? Explain how you get the code.
11. Work out the details of the encoder and decoder for adaptive binary arithmetic
coding when the input symbols are 01111.
12. Compare the rate of adaptation of adaptive Huffman coding and adaptive arith-
metic coding. What prevents each method from adapting to quick changes in
source statistics?
13. Consider the dictionary-based LZW compression algorithm. Suppose the alpha-
bet is the set of symbols {0,1}. Show the dictionary (symbol sets plus associated
codes) and output for LZW compression of the input
0 1 1 0 0 1 1
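As a cross-check, here is a minimal LZW sketch for the alphabet {0, 1}. Note that code-numbering conventions vary between texts, so a by-hand answer may label the dictionary entries differently; the round trip is what the sketch verifies:

```python
# LZW over alphabet {0, 1}: the encoder grows the dictionary with each
# string-plus-next-symbol; the decoder rebuilds the same dictionary.

def lzw_encode(s, alphabet=("0", "1")):
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    out, cur = [], ""
    for ch in s:
        if cur + ch in dictionary:
            cur += ch
        else:
            out.append(dictionary[cur])
            dictionary[cur + ch] = len(dictionary)
            cur = ch
    out.append(dictionary[cur])
    return out

def lzw_decode(codes, alphabet=("0", "1")):
    dictionary = {i: sym for i, sym in enumerate(alphabet)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # "Code not yet in dictionary" case: entry is prev + prev[0].
        entry = dictionary.get(code, prev + prev[0])
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return "".join(out)

codes = lzw_encode("0110011")
print(codes, "->", lzw_decode(codes))
```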
14. Implement Huffman coding, LZW coding, and arithmetic coding algorithms
using your favorite programming language. Generate at least three types of sta-
tistically different artificial data sources to test your implementation of these
algorithms. Compare and comment on each algorithm’s performance in terms
of compression ratio for each type of data source.
8.10 Exercises
f_X(x) = (1/√(2π)) e^(−x²/2)                    (8.72)

Follow the Lloyd–Max algorithm in this chapter: the other boundary values are
calculated as the midpoints of the reconstruction values. We now have b^0 =
[−∞, −1.5, 0, 1.5, ∞]. Continue one more iteration for i = 1, using Eq. 8.13,
and find y_0^1, y_1^1, y_2^1, and y_3^1, using numerical integration. Also calculate
the squared error of the difference between y^1 and y^0.
Iteration is repeated until the squared error between successive estimates of
the reconstruction levels is below some predefined threshold ε. Write a small
program to implement the Lloyd–Max quantizer described above.
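The small program might be sketched as follows. For the unit-variance Gaussian the centroid integrals have closed forms (∫x φ(x) dx over [a, b] equals φ(a) − φ(b), and ∫φ(x) dx equals Φ(b) − Φ(a)), which stand in for generic numerical integration:

```python
# Lloyd-Max quantizer for the unit-variance Gaussian of Eq. (8.72),
# with 4 reconstruction levels, using closed-form centroid integrals.
from math import erf, exp, inf, pi, sqrt

def phi(x):                       # the pdf of Eq. (8.72)
    return 0.0 if abs(x) == inf else exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):                       # the cumulative distribution
    return 0.5 * (1 + erf(x / sqrt(2)))

def lloyd_max(y, eps=1e-10):
    while True:
        # Boundaries are midpoints of current reconstruction levels.
        b = [-inf] + [(a + c) / 2 for a, c in zip(y, y[1:])] + [inf]
        # Each new level is the centroid of the pdf on its interval.
        y_new = [(phi(b[i]) - phi(b[i + 1])) / (Phi(b[i + 1]) - Phi(b[i]))
                 for i in range(len(y))]
        if sum((a - c) ** 2 for a, c in zip(y, y_new)) < eps:
            return y_new
        y = y_new

levels = lloyd_max([-2.0, -1.0, 1.0, 2.0])
print(levels)   # converges to roughly [-1.51, -0.45, 0.45, 1.51]
```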
4. If the block size for a 2D DCT transform is 8 × 8, and we use only the DC
components to create a thumbnail image, what fraction of the original pixels
would we be using?
5. When the block size is 8 × 8, the definition of the DCT is given in Eq. 8.17.
(a) If an 8 × 8 grayscale image is in the range 0 .. 255, what is the largest value a
DCT coefficient could be, and for what input image? (Also, state all the DCT
coefficient values for that image.)
(b) If we first subtract the value 128 from the whole image and then carry out the
DCT, what is the exact effect on the DCT value F[2, 3]?
(c) Why would we carry out that subtraction? Does the subtraction affect the
number of bits we need to code the image?
(d) Would it be possible to invert that subtraction, in the IDCT? If so, how?
6. Write a simple program or refer to the sample DCT program dct_1D.c in the
book’s website to verify the results in Example 8.2 of the 1D DCT example in
this chapter.
7. As Eq. 8.27 indicates, the 2D DCT can be implemented by the following matrix
multiplications:

F(u, v) = T · f(i, j) · Tᵀ.
8. Write a program to verify that the DCT-matrix T8 as defined in Eqs. 8.29 and
8.30 is an orthogonal matrix, i.e., all its rows and columns are orthogonal unit
vectors (orthonormal vectors).
9. Write a program to verify that the 2D DCT and IDCT matrix implementations
as defined in Eqs. 8.27 and 8.31 are lossless, i.e., they can transform any 8 × 8
values f(i, j) to F(u, v) and back to f(i, j). (Here, we are not concerned with
possible tiny floating-point calculation errors.)
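Exercises 8 and 9 ask for programs of exactly this kind. A pure-Python sketch (no numerical library assumed) that builds the DCT matrix and checks both properties:

```python
# Verify that the 8x8 DCT-matrix (Eqs. 8.29/8.30) is orthonormal and
# that F = T f T^T followed by f = T^T F T is lossless.
from math import cos, pi, sqrt

N = 8
T = [[sqrt(1 / N) if i == 0
      else sqrt(2 / N) * cos((2 * j + 1) * i * pi / (2 * N))
      for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [list(row) for row in zip(*A)]

# T * T^T should be the identity (rows are orthonormal vectors).
I = matmul(T, transpose(T))
ok = all(abs(I[i][j] - (1 if i == j else 0)) < 1e-12
         for i in range(N) for j in range(N))
print("orthonormal:", ok)

# Round-trip an arbitrary block: f -> F = T f T^T -> T^T F T.
f = [[(i * 8 + j) % 256 - 128 for j in range(N)] for i in range(N)]
F = matmul(matmul(T, f), transpose(T))
g = matmul(matmul(transpose(T), F), T)
print("lossless:", all(abs(f[i][j] - g[i][j]) < 1e-9
                       for i in range(N) for j in range(N)))
```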
10. We could use a similar DCT scheme for video streams by using a 3D version of
the DCT. Suppose one color component of a video has pixels f_ijk at position
(i, j) and time k. How could we define its 3D DCT transform?
11. Suppose a uniformly colored sphere is illuminated and has shading varying
smoothly across its surface, as in Fig. 8.27.
(a) What would you expect the DCT coefficients for its image to look like?
(b) What would be the effect on the DCT coefficients of having a checkerboard
of colors on the surface of the sphere?
(c) For the uniformly colored sphere again, describe the DCT values for a block
that straddles the top edge of the sphere, where it meets the black background.
(d) Describe the DCT values for a block that straddles the left edge of the sphere.
12. The Haar wavelet has a scaling function which is defined as follows:

φ(t) = { 1  for 0 ≤ t ≤ 1 ;  0  otherwise }            (8.73)

and its scaling vector is h_0[0] = h_0[1] = 1/√2.
(a) Draw the scaling function, then verify that its dilated translates φ(2t) and
φ(2t − 1) satisfy the dilation Eq. (8.62). Draw the combination of these func-
tions that makes up the full function φ(t).
(b) Derive the wavelet vector h 1 [0], h 1 [1] from Eq. 8.65 and then derive and draw
the Haar wavelet function ψ(t) from Eq. 8.63.
13. Suppose the mother wavelet ψ(t) has vanishing moments M_p up to and including
M_n. Expand f(t) in a Taylor series around t = 0, up to the nth derivative of f
[i.e., up to leftover error of order O(n + 1)]. Evaluate the summation of integrals
produced by substituting the Taylor series into (8.58) and show that the result is
of order O(s^(n+2)).
14. The program wavelet_compression.c on this book’s website is in fact
simple to implement as a MATLAB function (or similar fourth-generation lan-
guage). The advantage in doing so is that the imread function can input image
formats of a great many types, and imwrite can output as desired. Using the
given program as a template, construct a MATLAB program for wavelet-based
image reduction, with perhaps the number of wavelet levels being a function
parameter.
15. It is interesting to find the Fourier transform of functions, and this is easy if you
have available a symbolic manipulation system such as MAPLE. In that lan-
guage, you can just invoke the fourier function and view the answer directly!
As an example, try the following code fragment:
with('inttrans');
f := 1;
F := fourier(f,t,w);
f := exp(-t^2);
F := fourier(f,t,w);
Now the answer should be √π e^(−w²/4): the Fourier transform of a Gaussian is
itself a Gaussian.
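If a symbolic package is not at hand, the same answer can be confirmed numerically (a sketch using a plain midpoint Riemann sum for the Fourier integral; the integration limits and step count are arbitrary choices):

```python
# Numerically verify that the Fourier transform of exp(-t^2) is
# sqrt(pi) * exp(-w^2 / 4), using a midpoint Riemann sum.
from math import exp, sqrt, pi
import cmath

def fourier_of_gaussian(w, lo=-10.0, hi=10.0, n=20000):
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += exp(-t * t) * cmath.exp(-1j * w * t) * dt
    return total

for w in (0.0, 1.0, 2.0):
    numeric = fourier_of_gaussian(w)
    exact = sqrt(pi) * exp(-w * w / 4)
    print(w, abs(numeric - exact))      # errors should be tiny
```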
This function oscillates about the value 0. Use a plotting package to convince
yourself that the function has a zero moment M p for any value of p.
17. Implement both a DCT-based and a wavelet-based image coder. Design your
user interface so that the compressed results from both coders can be seen side
by side for visual comparison. The PSNR for each coded image should also be
shown, for quantitative comparisons.
300 8 Lossy Compression Algorithms
Include a slider bar that controls the target bitrate for both coders. As you change
the target bitrate, each coder should compress the input image in real time and
show the compressed results immediately on your user interface.
Discuss both qualitative and quantitative compression results observed from
your program at target bitrates of 4, 1, and 0.25 bpp.