• Sports video annotation: using video classification to label video segments with
certain actions such as strokes in table tennis, penalty kicks in soccer games, etc.
• GameStory: a video game analytics challenge in which e-sport games often involv-
ing millions of players and viewers are analyzed. Audio and video streams, com-
mentaries, game data and statistics, interaction traces, and other multimedia data
are used to summarize and predict the development and evolution of the games.
• Live video streaming: requiring ultra-low end-to-end latency for real-time inter-
action between the broadcaster and the viewers. The main challenge is maintaining
QoE (quality of experience) under this latency constraint.
• Violent scenes detection in film: automatically detecting portions of movies
depicting violence. Again, all available aspects, such as text and audio, could be
brought into play.
• Preserving privacy in surveillance videos: methods obscuring private information
(such as faces on Google Earth), so as to render privacy-sensitive elements of video
unrecognizable, while still allowing the video to be viewable by people and usable
for computer vision tasks such as object tracking.
• Deep video understanding: understanding the relationships between different enti-
ties from a long duration movie. The relations can be family, work, social, and
other types. All modalities of multimedia will be exploited and a knowledge base
will be constructed and consulted.
• Large-scale human-centric video analysis: analyzing various crowd and complex
events such as getting off a train, dining in a busy restaurant, earthquake escape,
etc. Issues include multi-person tracking, crowd pose estimation and tracking,
and person-level action recognition.
• Searching and question answering for the spoken web: searching for audio content
within audio content by using an audio query, matching spoken questions with a
collection of spoken answers.
• Multimedia recommender systems: improving the quality of recommender sys-
tems to produce items more relevant to users’ interests. Applications include
movie recommendation, news recommendation, etc.
Solutions to these challenges can be difficult, but the impact can be enormous,
not only to the IT industry, but also to everyone, as we all live in a digital multimedia
world. We want this textbook to bring valuable knowledge about multimedia to you,
and hope you enjoy it and perhaps even contribute to this promising field (maybe for
some of the topics listed above or beyond) in your future career!
1.5 Exercises
1. Using your own words, describe what “multimedia” is. Is multimedia simply a
collection of different types of media?
2. Identify three novel multimedia applications. Discuss why you think these are
novel and their potential impact.
3. Discuss the relation between multimedia and hypermedia.
4. Briefly explain, in your own words, “Memex” and its role regarding hypertext.
Could we carry out the Memex task today? How do you use Memex ideas in your
own work?
5. Discover a current media input, storage, or playback device that is analog. Is it
necessary to convert to digital? What are the pros and cons of being analog or
digital?
6. Your task is to think about the transmission of smell over the Internet. Suppose
we have a smell sensor at one location and wish to transmit the Aroma Vector (let
us say) to a receiver, to reproduce the same sensation. You are asked to design
such a system. List three key issues to consider and two applications of such a
delivery system. Hint: Think about medical applications.
7. Tracking objects or people can be done by both sight and sound. While vision
systems are precise, they are relatively expensive; on the other hand, a pair of
microphones can detect a person’s bearing inaccurately but cheaply. Sensor fusion
of sound and vision is thus useful. Surf the web to find out who is developing
tools for videoconferencing using this kind of multimedia idea.
8. Non-photorealistic graphics means computer graphics that do well enough with-
out attempting to make images that look like camera images. An example is
conferencing. For example, if we track lip movements, we can generate the right
animation to fit our face. If we don’t much like our own face, we can substi-
tute another one—facial-feature modeling can map correct lip movements onto
another model. See if you can find out who is carrying out research on generating
avatars to represent conference participants’ bodies.
9. Watermarking is a means of embedding a hidden message in data. This could
have important legal implications: Is this image copied? Is this image doctored?
Who took it? Where? Think of “messages” that could be sensed while capturing
an image and secretly embedded in the image, so as to answer these questions.
(A similar question arises from the use of cell phones. What could we use
to determine who is using a phone, and where, and when? This could
eliminate the need for passwords and prevent others from using a phone you lost.)
In fact, the output range is also clamped to [1 .. 254] since the Rec. 601 synchroniza-
tion signals are given by codes 0 and 255.
The inverse transform to Eq. (4.41) is as follows [16]:

⎡ R ⎤   ⎡ 0.00456621    0.0000        0.00625893 ⎤ ⎡ Y − 16   ⎤
⎢ G ⎥ = ⎢ 0.00456621   −0.00153632   −0.00318811 ⎥ ⎢ Cb − 128 ⎥        (4.42)
⎣ B ⎦   ⎣ 0.00456621    0.00791071    0.0000     ⎦ ⎣ Cr − 128 ⎦
The YCbCr transform is used in JPEG image compression and MPEG video
compression.
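As a quick sanity check, Eq. (4.42) can be applied directly. The following sketch (the clamping of results to [0, 1] is a typical implementation detail, not part of the equation itself) converts 8-bit Y′CbCr codes to R′G′B′ values:

```python
# Sketch of the Rec. 601 YCbCr -> RGB inverse transform of Eq. (4.42).
# Coefficients are those given in the text; clamping to [0, 1] is an
# added implementation detail, since out-of-range codes can overshoot.

def ycbcr_to_rgb(y, cb, cr):
    """Map 8-bit Y'CbCr codes to R'G'B' values in [0, 1]."""
    r = 0.00456621 * (y - 16) + 0.00625893 * (cr - 128)
    g = (0.00456621 * (y - 16) - 0.00153632 * (cb - 128)
         - 0.00318811 * (cr - 128))
    b = 0.00456621 * (y - 16) + 0.00791071 * (cb - 128)
    return tuple(min(1.0, max(0.0, v)) for v in (r, g, b))

# Neutral codes (Cb = Cr = 128) should give R = G = B.
print(ycbcr_to_rgb(126, 128, 128))
```

Note that Y = 16 maps to black and Y = 235 to white, matching the nominal Rec. 601 code range.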
4.4 Exercises
1. Consider the following set of terms that describe light:
(a) wavelength,
(b) color level,
(c) brightness, and
(d) whiteness.
How would you match each of the following (more vaguely stated) characteris-
tics to each of the above terms?
(a) luminance,
(b) hue,
(c) saturation, and
(d) chrominance.
2. What color is outdoor light? That is, around what wavelength would you guess
the peak power is for a red sunset? For blue sky light?
3. “The LAB gamut covers all colors in the visible spectrum.”
What does that statement mean? Briefly, how does LAB relate to color?—just
be descriptive.
What are (roughly) the relative sizes of the LAB gamut, the CMYK gamut, and
a monitor gamut?
4. Prove that straight lines in (X, Y, Z) space project to straight lines in (x, y)
chromaticity space. That is, let C1 = (X1, Y1, Z1) and C2 = (X2, Y2, Z2) be
two different colors, and let C3 = (X3, Y3, Z3) fall on a line connecting C1
and C2: C3 = αC1 + (1 − α)C2. Then show that (x3, y3) = β(x1, y1) + (1 −
β)(x2, y2) for some β.
5. Where does the chromaticity “horseshoe” shape in Fig. 4.11 come from? Can
we calculate it?
Write a small pseudocode solution for the problem of finding this so-called
“spectrum locus.”
Hint: Figure 4.20a shows the color-matching functions in Fig. 4.10 drawn as a
set of points in 3-space. And Fig. 4.20b shows these points mapped into another
3D set of points.
Hint: Try a programming solution for this problem, to help you answer it more
explicitly.
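One possible shape for the requested pseudocode, sketched in Python. The three color-matching samples below are illustrative stand-ins for the full CIE tables of Fig. 4.10, which real code would load instead:

```python
# Sketch of the "spectrum locus" computation: each wavelength's
# color-matching triple (xbar, ybar, zbar) projects to a chromaticity
# point (x, y) = (X/(X+Y+Z), Y/(X+Y+Z)).  The triples below are
# illustrative placeholders; real code would use the CIE 1931 tables.

samples = [  # (wavelength in nm, xbar, ybar, zbar) -- illustrative only
    (450, 0.336, 0.038, 1.772),
    (520, 0.063, 0.710, 0.078),
    (600, 1.062, 0.631, 0.001),
]

locus = []
for wavelength, xb, yb, zb in samples:
    total = xb + yb + zb
    locus.append((wavelength, xb / total, yb / total))

for wavelength, x, y in locus:
    print(f"{wavelength} nm -> ({x:.3f}, {y:.3f})")
```

Tracing these (x, y) points over all visible wavelengths sweeps out the horseshoe-shaped boundary of Fig. 4.11.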
6. Suppose we use a new set of color-matching functions x̄ new (λ), ȳ new (λ), z̄ new (λ)
with values
In this system, what are the chromaticity values (x, y) of equi-energy white light
E(λ) where E(λ) ≡ 1 for all wavelengths λ? Explain.
7. Repeat the steps leading up to Eq. (4.18), but this time using the NTSC standard—
if you use the number of significant digits as in Table 4.18 you will end up with
the transform in Eq. (4.19).
8. (a) Suppose images are not gamma-corrected by a camcorder. Generally, how
would they appear on a screen?
(b) What happens if we artificially increase the output gamma for stored image
pixels? (One can do this in Photoshop.) What is the effect on the image?
9. Suppose image file values are in 0 .. 255 in each color channel. If we define
R′ = R/255 for the red channel, we wish to carry out gamma correction by
passing a new value R″ ≃ (R′)^(1/2.0) to the display device.
It is common to carry out this operation using integer math. Suppose we approx-
imate the calculation as creating new integer values in 0 .. 255 via

(int) (255 · (R′)^(1/2.0))
(a) Comment (very roughly) on the effect of this operation on the number of
actually available levels for display. Hint: Coding this up in any language
will help you understand the mechanism at work better—and as well, then
you can simply count the output levels.
(b) Which end of the levels 0 .. 255 is affected most by gamma correction, the
low end near 0 or the high end near 255? Why? How much at each end?
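Following the hint, the counting experiment can be coded in a few lines (a sketch; any language would do):

```python
# Count how many distinct display codes survive the integer
# gamma-correction approximation  (int)(255 * (R/255) ** (1/2.0)).

levels = {int(255 * (r / 255) ** (1 / 2.0)) for r in range(256)}
print(len(levels))          # number of distinct output levels

# The low end is stretched (inputs 0 and 1 land far apart), while the
# high end is compressed (many nearby inputs collapse to one output).
print(int(255 * (1 / 255) ** 0.5))           # where input 1 lands
print(255 - int(255 * (254 / 255) ** 0.5))   # gap left near the top
```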
(a) Suppose a color (i.e., an RGB) is divided by 2.0, so that the RGB triple now
has values 0.5 times its former values.
Explain using numerical values:
(i) If gamma correction is applied after the division by 2.0 and before the color
is stored, does the darker RGB have the same hue as the original in the
sense of having the same ratios R:G:B of light emanating from the display
device? (we’re not discussing any psychophysical effects that change our
perception—here we’re just worried about the machine itself).
(ii) If gamma correction is not applied, does the second RGB above, = RGB/2,
have the same hue as the first RGB, when displayed? And are these the
same hues as for the original color as stored, not the light as displayed?
(b) Assuming no gamma correction is applied, for what color triples is the hue
as displayed the same as for the original color as stored?
13. We wish to produce a graphic that is pleasing and easily readable. Suppose we
make the background color pink. What color text font should we use to make
the text most readable? Justify your answer.
14. To make matters simpler for eventual printing, we buy a camera equipped with
CMY sensors, as opposed to RGB sensors (CMY cameras are, in fact, available).
(a) Draw spectral curves roughly depicting what such a camera’s sensitivity to
wavelength might look like.
(b) Could the output of a CMY camera be used to produce ordinary RGB pic-
tures? How?
15. Color ink-jet printers use the CMYK model. When the color ink cyan is sprayed
onto a sheet of white paper,
(i) Why does it look cyan under daylight?
(ii) What color would it appear to be under a blue light? Why?
References
1. D.H. Pritchard, U.S. color television fundamentals—A review. IEEE Trans. Consum. Electron.
23(4), 467–478 (1977)
2. G. Wyszecki, W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and
Formulae, 2nd edn. (Wiley, New York, 2000)
3. R.W.G. Hunt, Color reproduction and color vision modeling, in 1st Color Imaging Conference
on Transforms & Transportability of Color, pp. 1–5. Society for Imaging Science & Technology
(IS&T)/Society for Information Display (SID) joint conference (1993)
4. M.J. Vrhel, R. Gershon, L.S. Iwan, Measurement and analysis of object reflectance spectra.
Color Res. Appl. 19, 4–9 (1994)
5. R.W.G. Hunt, The Reproduction of Color, 6th edn. (Fountain Press, England, 2004)
6. J.F. Hughes, A. van Dam, M. McGuire, D.F. Sklar, J.D. Foley, S.K. Feiner, K. Akeley, Computer
Graphics: Principles and Practice, 3rd edn. (Addison-Wesley, 2013)
7. C.A. Poynton, Color FAQ—frequently asked questions color (2006). http://www.poynton.com/
notes/colour_and_gamma/ColorFAQ.html
8. M.D. Fairchild, Color Appearance Models, 3rd edn. (Addison-Wesley, 2013)
9. D. Travis, Effective Color Displays. (Academic Press, 1991)
10. R.S. Berns, L.A. Taplin, Practical spectral imaging using a color-filter array digital camera.
art-si.org/PDFs/Acquisition/TR_Practical_Spectral_Imaging.pdf
11. C. Fredembach, S. Susstrunk, Colouring the near infrared, in 16th Color Imaging Conference,
pp. 176–182 (2008)
12. International Electrotechnical Commission, IEC 61966-2-1: multimedia systems and
equipment—Colour measurement and management—Part 2-1: colour management—Default
RGB colour space—sRGB. (IEC, Geneva, 2007)
13. M. Stokes, M. Anderson, S. Chandrasekar, R. Motta, A standard default color space for the
internet: sRGB (1996). http://www.color.org/sRGB.html
14. Microsoft Corporation, Colorspace Interchange Using sRGB (2001). http://www.microsoft.
com/whdc/device/display/color/sRGB.mspx
15. P. Urban, Ink limitation for spectral or color constant printing, in 11th Congress of the Interna-
tional Colour Association (2009)
16. C.A. Poynton, A Technical Introduction to Digital Video (Wiley, New York, 1996)
However, we can get into a difficult situation if we try to change the prediction
coefficients that multiply previous quantized values, because that makes a compli-
cated set of equations to solve for these coefficients. Suppose we decide to use a
least-squares approach to solving a minimization, trying to find the best values of
the ai :
min Σ_{n=1}^{N} ( f_n − f̂_n )²                    (6.24)
where here we would sum over a large number of samples f n for the current patch of
speech, say. But because fˆn depends on the quantization, we have a difficult problem
to solve. Also, we should really be changing the fineness of the quantization at the
same time, to suit the signal’s changing nature; this makes things problematical.
Instead, we usually resort to solving the simpler problem that results from using
not f˜n in the prediction but simply the signal f n itself. This is indeed simply solved,
since, explicitly writing in terms of the coefficients ai , we wish to solve
min Σ_{n=1}^{N} ( f_n − Σ_{i=1}^{M} a_i f_{n−i} )²        (6.25)
Differentiation with respect to each of the a_i and setting to zero produces a linear
system of M equations that is easy to solve. (The set of equations is called the
Wiener–Hopf equations.)
Thus, we indeed find a simple way to adaptively change the predictor as we go.
For speech signals, it is common to consider blocks of signal values, just as for
image coding, and adaptively change the predictor, quantizer, or both. If we sample
at 8 kHz, a common block size is 128 samples—16 ms of speech. Figure 6.19 shows
a schematic diagram for the ADPCM coder and decoder [8].
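For concreteness, the least-squares step of Eq. (6.25) can be sketched for a toy order-2 predictor. A real coder would solve the same Wiener–Hopf system with the Levinson–Durbin recursion; here the 2 × 2 normal equations are solved directly:

```python
# Solve the Wiener-Hopf (normal) equations of Eq. (6.25) for an
# order-2 predictor (a1, a2), using the unquantized signal f_n itself.
# Pure-Python sketch; real coders use Levinson-Durbin on this system.

def predictor_coeffs(f):
    M, N = 2, len(f)
    # R[j][i] = sum_n f[n-i]*f[n-j],  r[j] = sum_n f[n]*f[n-j]
    R = [[sum(f[n - i] * f[n - j] for n in range(M, N)) for i in (1, 2)]
         for j in (1, 2)]
    r = [sum(f[n] * f[n - j] for n in range(M, N)) for j in (1, 2)]
    # Cramer's rule for the 2x2 system R a = r.
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    a1 = (r[0] * R[1][1] - r[1] * R[0][1]) / det
    a2 = (R[0][0] * r[1] - R[1][0] * r[0]) / det
    return a1, a2

# A ramp satisfies f[n] = 2*f[n-1] - f[n-2] exactly, so the
# least-squares answer should be a1 = 2, a2 = -1.
ramp = [float(n) for n in range(1, 20)]
print(predictor_coeffs(ramp))
```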
[Fig. 6.19 ADPCM coder and decoder: a the coder converts the 64 kbps A-law or
µ-law PCM input to uniform PCM, subtracts the adaptive predictor's output f̂_n to
form the error e_n, and adaptively quantizes it into the 32 kbps output ẽ_n; b the
decoder adds the 32 kbps input ẽ_n to its own prediction f̂_n to reconstruct f̃_n,
then converts back to 64 kbps A-law or µ-law PCM output.]

6.4 Exercises
(a) If the output was 1 volt for frequencies at mid-range, after a loss of −3 dB
at 18 kHz, what is the output voltage at this frequency?
(b) To compensate for the loss, a listener can adjust the gain (and hence the
output) at different frequencies from an equalizer. If the loss remains −3 dB
and a gain through the equalizer is 6 dB at 18 kHz, what is the output voltage
now?
[Hint: Assume log10 2 = 0.3.]
6. Suppose the sampling frequency is 1.5 times the true frequency. What is the alias
frequency?
7. In a crowded room, we can still pick out and understand a nearby speaker's voice
notwithstanding the fact that general noise levels may be high. This is known
as the "cocktail-party effect": our hearing can localize a sound source by taking
advantage of the difference in phase between the two signals entering our left
and right ears ("binaural auditory perception"). In mono, we could not hear our
neighbor's conversation very well if the noise level were at all high.
State how you think a karaoke machine works.
Hint: the mix for commercial music recordings is such that the “pan” parameter
is different going to the left and right channels for each instrument. That is,
for an instrument, the left, or the right, channel is emphasized. How would the
singer’s track timing have to be recorded in order to make it easy to subtract out
the sound of the singer? (And this is typically done.)
8. The dynamic range of a signal V is the ratio of the maximum to the minimum,
expressed in decibels. The dynamic range expected in a signal is to some extent
an expression of the signal quality. It also dictates the number of bits per sample
needed in order to reduce the quantization noise down to an acceptable level,
e.g., we may like to reduce the noise to at least an order of magnitude below
Vmin .
Suppose the dynamic range for a signal is 60 dB. Can we use 10 bits for this
signal? Can we use 16 bits?
9. Suppose the dynamic range of speech in telephony implies a ratio Vmax /Vmin of
about 256. Using uniform quantization, how many bits should we use to encode
speech, so as to make the quantization noise at least an order of magnitude less
than the smallest detectable telephonic sound?
10. Perceptual nonuniformity is a general term for describing the nonlinearity of
human perception, e.g., when a certain parameter of an audio signal varies,
humans do not necessarily perceive the difference in proportion to the amount
of change.
11. Suppose we mistakenly always use the 0.75 point instead of the 0.50 point in
a quantization interval as the decision point, in deciding to which quantization
level an analog value should be mapped. Above, we have a rough calculation of
SQNR. What effect does this mistake have on the SQNR?
12. State the Nyquist frequency for the following digital sample intervals. Express
the result in Hertz in each case.
(a) 1 millisecond,
(b) 0.005 s, and
(c) 1 h.
13. Draw a diagram showing a sinusoid at 5.5 kHz, and sampling at 8 kHz (just
show eight intervals between samples in your plot). Draw the alias at 2.5 kHz
and show that in the eight sample intervals, exactly 5.5 cycles of the true signal
fit into 2.5 cycles of the alias signal.
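The claim can also be checked numerically (a sketch): at an 8 kHz sampling rate, the 5.5 kHz sinusoid and a phase-reversed 2.5 kHz sinusoid agree at every sample instant, since 5.5 = 8 − 2.5.

```python
# At 8 kHz sampling, a 5.5 kHz sine and its 2.5 kHz alias produce the
# same sample values (up to a phase reversal, because 5.5 = 8 - 2.5).
from math import sin, pi

fs = 8000.0
for k in range(8):                           # the eight sample intervals
    t = k / fs
    true_sample = sin(2 * pi * 5500 * t)
    alias_sample = -sin(2 * pi * 2500 * t)   # phase-reversed alias
    assert abs(true_sample - alias_sample) < 1e-9
print("5.5 kHz samples match the phase-reversed 2.5 kHz alias")
```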
14. In an old Western movie, we notice that a stagecoach wheel appears to be moving
backward at 5◦ per frame, even though the stagecoach is moving forward. To
what is this effect due? What is the true situation?
15. Suppose a signal contains tones at 1, 10, and 21 kHz, and is sampled at the rate
12 kHz (and then processed with an anti-aliasing filter limiting output to 6 kHz).
f̂_n = trunc( ( f̃_{n−1} + f̃_{n−2} ) / 2 ) ,            (6.26)
e_n = f_n − f̂_n .
Also, suppose we adopt the quantizer Eq. (6.21). If the input signal has values
as follows:
then show that the output from a DPCM coder (without entropy coding) is as
follows:
Fig. 6.20 a DPCM reconstructed signal (dotted line) tracks the input signal (solid line). b DPCM
reconstructed signal (dashed line) steers farther and farther from the input signal (solid line)
Figure 6.20a shows how the quantized reconstructed signal tracks the input
signal. As a programming project, write a small piece of code to verify your
results.
(b) Now, suppose by mistake on the coder side we inadvertently use the predictor
for lossless coding, Eq. (6.15), using original values f n instead of quantized
ones f˜n . Show that on the decoder side we end up with reconstructed signal
values as follows:
References
1. B. Truax, Handbook for Acoustic Ecology, 2nd edn. (Cambridge Street Publishing, 1999)
2. K.C. Pohlmann, Principles of Digital Audio, 6th edn. (McGraw-Hill, 2010)
3. J.H. McClellan, R.W. Schafer, M.A. Yoder, DSP First: a Multimedia Approach. (Prentice-Hall
PTR, 1998)
4. J. Heckroth, Tutorial on MIDI and music synthesis. The MIDI Manufacturers Association, POB
3173, La Habra CA 90632-3173 (1995). http://www.harmony-central.com/MIDI/Doc/tutorial.
html
5. P.K. Andleigh, K. Thakrar, Multimedia Systems Design. (Prentice-Hall PTR, 1995)
6. The MIDI Association, Strategic overview and introduction to MIDI 2.0, in The National
Association of Music Merchants (NAMM) Show (2020)
7. K. Sayood, Introduction to Data Compression, 5th edn. (Morgan Kaufmann, San Francisco,
2017)
8. R.L. Freeman, Reference Manual for Telecommunications Engineering, 3rd edn. (Wiley, 2001)
Table 7.7 Comparison of lossless JPEG with other lossless compression programs (compression ratios)

Compression program        Lena    Football   F-18    Flowers
Lossless JPEG              1.45    1.54       2.29    1.26
Optimal lossless JPEG      1.49    1.67       2.71    1.33
Compress (LZW)             0.86    1.24       2.21    0.87
Gzip (LZ77)                1.08    1.36       3.10    1.05
Gzip-9 (optimal LZ77)      1.08    1.36       3.13    1.05
Pack (Huffman coding)      1.02    1.12       1.19    1.00
Predictors 4–7, which consider neighboring nodes in both horizontal and vertical
dimensions, offer slightly better compression (approximately 0.2–0.5 higher) than
predictors 1–3.
Table 7.7 shows a comparison of the compression ratio for several lossless com-
pression techniques using the test images Lena, Football, F-18, and Flowers. These
standard images, used for many purposes in imaging work, are shown on the
textbook website for this chapter.
This chapter has been devoted to the discussion of lossless compression algo-
rithms. It should be apparent that their compression ratio is generally limited (with
a maximum at about 2–3). However, many of the multimedia applications we will
address in the next several chapters require a much higher compression ratio. This
is accomplished by lossy compression schemes.
7.8 Exercises
1. Calculate the entropy of a “checkerboard” image in which half of the pixels are
BLACK and half of them are WHITE.
2. Suppose eight characters have a distribution A:(1), B:(1), C:(1), D:(2), E:(3),
F:(5), G:(5), H:(10). Draw a Huffman tree for this distribution. (Because the
algorithm may group subtrees with equal probability in a different order, your
answer is not strictly unique.)
3. (a) What is the entropy η of the image below, where numbers (0, 20, 50, 99)
denote the gray-level intensities?
99 99 99 99 99 99 99 99
20 20 20 20 20 20 20 20
0 0 0 0 0 0 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 50 50 50 50 0 0
0 0 0 0 0 0 0 0
(b) Show step by step how to construct the Huffman tree to encode the above
four intensity values in this image. Show the resulting code for each intensity
value.
(c) What is the average number of bits needed for each pixel, using your Huffman
code? How does it compare to η?
4. Consider an alphabet with two symbols A, B, with probability P(A) = x and
P(B) = 1 − x.
(a) Plot the entropy as a function of x. You might want to use log2 3 = 1.6 and
log2 7 = 2.8.
(b) Discuss why it must be the case that if the probability of the two symbols is
1/2 + ε and 1/2 − ε, with small ε, the entropy is less than the maximum.
(c) Generalize the above result by showing that, for a source generating N sym-
bols, the entropy is maximum when the symbols are all equiprobable.
(d) As a small programming project, write code to verify the conclusions above.
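For part (d), a sketch of the verification code might start from the two-symbol entropy function itself:

```python
# Entropy of a two-symbol source with P(A) = x, P(B) = 1 - x.
from math import log2

def entropy(x):
    if x in (0.0, 1.0):        # lim p*log2(p) = 0 as p -> 0
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

# Maximum of 1 bit at x = 0.5; slightly unequal probabilities
# (1/2 + eps, 1/2 - eps) always give strictly less.
print(entropy(0.5))
print(entropy(0.5 + 0.01))
```

Sweeping x over a fine grid and keeping the maximum gives the plot for part (a) and numeric evidence for parts (b) and (c).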
(a) What are the advantages and disadvantages of arithmetic coding as compared
to Huffman coding?
(b) Suppose the alphabet is [A, B, C], and the known probability distribution is
PA = 0.5, PB = 0.4, PC = 0.1. For simplicity, let’s also assume that both
encoder and decoder know that the length of the messages is always 3, so
there is no need for a terminator.
i. How many bits are needed to encode the message BBB by Huffman
coding?
ii. How many bits are needed to encode the message BBB by arithmetic
coding?
8. (a) What are the advantages of adaptive Huffman coding compared to the orig-
inal Huffman coding algorithm?
(b) Assume that adaptive Huffman coding is used to code an information source
S with a vocabulary of four letters (a, b, c, d). Before any transmission, the
initial coding is a = 00, b = 01, c = 10, d = 11. As in the example illustrated
in Fig. 7.8, a special symbol NEW will be sent before any letter if it is to be
sent the first time.
Figure 7.18 is the adaptive Huffman tree after sending letters aabb. After
that, the additional bitstream received by the decoder for the next few letters
is 01010010101.
9. Work out the details of scaling and incremental coding in arithmetic coding when
the probabilities for the three symbols are A: 0.8, B: 0.02, C: 0.18, and the input
sequence is ACBA.
10. We will use arithmetic coding to encode images.
(a) At most, how many bits does it take to encode the first row of the image
below, where all numbers are gray-level intensity values?
(ii) What is the code (sequence of bits) that the encoder generates for the
four numbers? Explain how you get the code.
11. Work out the details of the encoder and decoder for adaptive binary arithmetic
coding when the input symbols are 01111.
12. Compare the rate of adaptation of adaptive Huffman coding and adaptive arith-
metic coding. What prevents each method from adapting to quick changes in
source statistics?
13. Consider the dictionary-based LZW compression algorithm. Suppose the alpha-
bet is the set of symbols {0,1}. Show the dictionary (symbol sets plus associated
codes) and output for LZW compression of the input
0 1 1 0 0 1 1
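As a cross-check, here is a minimal LZW sketch for the alphabet {0, 1}. Note that code-numbering conventions vary between texts, so a by-hand answer may label the dictionary entries differently; the round trip is what the sketch verifies:

```python
# LZW over alphabet {0, 1}: the encoder grows the dictionary with each
# string-plus-next-symbol; the decoder rebuilds the same dictionary.

def lzw_encode(s, alphabet=("0", "1")):
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    out, cur = [], ""
    for ch in s:
        if cur + ch in dictionary:
            cur += ch
        else:
            out.append(dictionary[cur])
            dictionary[cur + ch] = len(dictionary)
            cur = ch
    out.append(dictionary[cur])
    return out

def lzw_decode(codes, alphabet=("0", "1")):
    dictionary = {i: sym for i, sym in enumerate(alphabet)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # "Code not yet in dictionary" case: entry is prev + prev[0].
        entry = dictionary.get(code, prev + prev[0])
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return "".join(out)

codes = lzw_encode("0110011")
print(codes, "->", lzw_decode(codes))
```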
14. Implement Huffman coding, LZW coding, and arithmetic coding algorithms
using your favorite programming language. Generate at least three types of sta-
tistically different artificial data sources to test your implementation of these
algorithms. Compare and comment on each algorithm’s performance in terms
of compression ratio for each type of data source.
8.10 Exercises
f_X(x) = (1/√(2π)) e^(−x²/2)                    (8.72)

Follow the Lloyd–Max algorithm in this chapter: the other boundary values are
calculated as the midpoints of the reconstruction values. We now have b^0 =
[−∞, −1.5, 0, 1.5, ∞]. Continue one more iteration for i = 1, using Eq. 8.13,
and find y_0^1, y_1^1, y_2^1, and y_3^1, using numerical integration. Also calculate
the squared error of the difference between y^1 and y^0.
Iteration is repeated until the squared error between successive estimates of
the reconstruction levels is below some predefined threshold ε. Write a small
program to implement the Lloyd–Max quantizer described above.
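The small program might be sketched as follows. For the unit-variance Gaussian the centroid integrals have closed forms (∫x φ(x) dx over [a, b] equals φ(a) − φ(b), and ∫φ(x) dx equals Φ(b) − Φ(a)), which stand in for generic numerical integration:

```python
# Lloyd-Max quantizer for the unit-variance Gaussian of Eq. (8.72),
# with 4 reconstruction levels, using closed-form centroid integrals.
from math import erf, exp, inf, pi, sqrt

def phi(x):                       # the pdf of Eq. (8.72)
    return 0.0 if abs(x) == inf else exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):                       # the cumulative distribution
    return 0.5 * (1 + erf(x / sqrt(2)))

def lloyd_max(y, eps=1e-10):
    while True:
        # Boundaries are midpoints of current reconstruction levels.
        b = [-inf] + [(a + c) / 2 for a, c in zip(y, y[1:])] + [inf]
        # Each new level is the centroid of the pdf on its interval.
        y_new = [(phi(b[i]) - phi(b[i + 1])) / (Phi(b[i + 1]) - Phi(b[i]))
                 for i in range(len(y))]
        if sum((a - c) ** 2 for a, c in zip(y, y_new)) < eps:
            return y_new
        y = y_new

levels = lloyd_max([-2.0, -1.0, 1.0, 2.0])
print(levels)   # converges to roughly [-1.51, -0.45, 0.45, 1.51]
```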
4. If the block size for a 2D DCT transform is 8 × 8, and we use only the DC
components to create a thumbnail image, what fraction of the original pixels
would we be using?
5. When the block size is 8 × 8, the definition of the DCT is given in Eq. 8.17.
(a) If an 8 × 8 grayscale image is in the range 0 .. 255, what is the largest value a
DCT coefficient could be, and for what input image? (Also, state all the DCT
coefficient values for that image.)
(b) If we first subtract the value 128 from the whole image and then carry out the
DCT, what is the exact effect on the DCT value F[2, 3]?
(c) Why would we carry out that subtraction? Does the subtraction affect the
number of bits we need to code the image?
(d) Would it be possible to invert that subtraction, in the IDCT? If so, how?
6. Write a simple program or refer to the sample DCT program dct_1D.c in the
book’s website to verify the results in Example 8.2 of the 1D DCT example in
this chapter.
7. As Eq. 8.27 indicates, the 2D DCT can be implemented by the following matrix
multiplications:

F(u, v) = T · f(i, j) · Tᵀ.
8. Write a program to verify that the DCT-matrix T8 as defined in Eqs. 8.29 and
8.30 is an orthogonal matrix, i.e., all its rows and columns are orthogonal unit
vectors (orthonormal vectors).
9. Write a program to verify that the 2D DCT and IDCT matrix implementations
as defined in Eqs. 8.27 and 8.31 are lossless, i.e., they can transform any 8 × 8
values f(i, j) to F(u, v) and back to f(i, j). (Here, we are not concerned with
possible tiny floating-point calculation errors.)
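Exercises 8 and 9 ask for programs of exactly this kind. A pure-Python sketch (no numerical library assumed) that builds the DCT matrix and checks both properties:

```python
# Verify that the 8x8 DCT-matrix (Eqs. 8.29/8.30) is orthonormal and
# that F = T f T^T followed by f = T^T F T is lossless.
from math import cos, pi, sqrt

N = 8
T = [[sqrt(1 / N) if i == 0
      else sqrt(2 / N) * cos((2 * j + 1) * i * pi / (2 * N))
      for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [list(row) for row in zip(*A)]

# T * T^T should be the identity (rows are orthonormal vectors).
I = matmul(T, transpose(T))
ok = all(abs(I[i][j] - (1 if i == j else 0)) < 1e-12
         for i in range(N) for j in range(N))
print("orthonormal:", ok)

# Round-trip an arbitrary block: f -> F = T f T^T -> T^T F T.
f = [[(i * 8 + j) % 256 - 128 for j in range(N)] for i in range(N)]
F = matmul(matmul(T, f), transpose(T))
g = matmul(matmul(transpose(T), F), T)
print("lossless:", all(abs(f[i][j] - g[i][j]) < 1e-9
                       for i in range(N) for j in range(N)))
```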
10. We could use a similar DCT scheme for video streams by using a 3D version of
the DCT. Suppose one color component of a video has pixels f_ijk at position
(i, j) and time k. How could we define its 3D DCT transform?
11. Suppose a uniformly colored sphere is illuminated and has shading varying
smoothly across its surface, as in Fig. 8.27.
(a) What would you expect the DCT coefficients for its image to look like?
(b) What would be the effect on the DCT coefficients of having a checkerboard
of colors on the surface of the sphere?
(c) For the uniformly colored sphere again, describe the DCT values for a block
that straddles the top edge of the sphere, where it meets the black background.
(d) Describe the DCT values for a block that straddles the left edge of the sphere.
12. The Haar wavelet has a scaling function which is defined as follows:

φ(t) = { 1  for 0 ≤ t ≤ 1 ;  0  otherwise }            (8.73)

and its scaling vector is h_0[0] = h_0[1] = 1/√2.
(a) Draw the scaling function, then verify that its dilated translates φ(2t) and
φ(2t − 1) satisfy the dilation Eq. (8.62). Draw the combination of these func-
tions that makes up the full function φ(t).
(b) Derive the wavelet vector h 1 [0], h 1 [1] from Eq. 8.65 and then derive and draw
the Haar wavelet function ψ(t) from Eq. 8.63.
13. Suppose the mother wavelet ψ(t) has vanishing moments M_p up to and including
M_n. Expand f(t) in a Taylor series around t = 0, up to the nth derivative of f
[i.e., up to leftover error of order O(n + 1)]. Evaluate the summation of integrals
produced by substituting the Taylor series into (8.58) and show that the result is
of order O(s^(n+2)).
14. The program wavelet_compression.c on this book’s website is in fact
simple to implement as a MATLAB function (or similar fourth-generation lan-
guage). The advantage in doing so is that the imread function can input image
formats of a great many types, and imwrite can output as desired. Using the
given program as a template, construct a MATLAB program for wavelet-based
image reduction, with perhaps the number of wavelet levels being a function
parameter.
15. It is interesting to find the Fourier transform of functions, and this is easy if you
have available a symbolic manipulation system such as MAPLE. In that lan-
guage, you can just invoke the fourier function and view the answer directly!
As an example, try the following code fragment:
with('inttrans');
f := 1;
F := fourier(f,t,w);
f := exp(-t^2);
F := fourier(f,t,w);
Now the answer should be √π e^(−w²/4): the Fourier transform of a Gaussian is
itself a Gaussian.
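If a symbolic package is not at hand, the same answer can be confirmed numerically (a sketch using a plain midpoint Riemann sum for the Fourier integral; the integration limits and step count are arbitrary choices):

```python
# Numerically verify that the Fourier transform of exp(-t^2) is
# sqrt(pi) * exp(-w^2 / 4), using a midpoint Riemann sum.
from math import exp, sqrt, pi
import cmath

def fourier_of_gaussian(w, lo=-10.0, hi=10.0, n=20000):
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += exp(-t * t) * cmath.exp(-1j * w * t) * dt
    return total

for w in (0.0, 1.0, 2.0):
    numeric = fourier_of_gaussian(w)
    exact = sqrt(pi) * exp(-w * w / 4)
    print(w, abs(numeric - exact))      # errors should be tiny
```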
This function oscillates about the value 0. Use a plotting package to convince
yourself that the function has a zero moment M p for any value of p.
17. Implement both a DCT-based and a wavelet-based image coder. Design your
user interface so that the compressed results from both coders can be seen side
by side for visual comparison. The PSNR for each coded image should also be
shown, for quantitative comparisons.
300 8 Lossy Compression Algorithms
Include a slider bar that controls the target bitrate for both coders. As you change
the target bitrate, each coder should compress the input image in real time and
show the compressed results immediately on your user interface.
Discuss both qualitative and quantitative compression results observed from
your program at target bitrates of 4, 1, and 0.25 bpp.