The Wavelet Tutorial
The Wavelet Tutorial
The Wavelet Tutorial
PART I
by
ROBI POLIKAR
FUNDAMENTAL CONCEPTS
&
AN OVERVIEW OF THE WAVELET THEORY
Welcome to this introductory tutorial on wavelet transforms. The wavelet transform is a
relatively new concept (about 10 years old), but yet there are quite a few articles and
books written on them. owever, most of these books and articles are written by math
people, for the other math people! still most of the math people don"t know what the other
math people are talkin# about (a math professor of mine made this confession). $n other
words, ma%ority of the literature available on wavelet transforms are of little help, if any,
to those who are new to this sub%ect (this is my personal opinion).
When $ first started workin# on wavelet transforms $ have stru##led for many hours and
days to fi#ure out what was #oin# on in this mysterious world of wavelet transforms, due
to the lack of introductory level te&t(s) in this sub%ect. Therefore, $ have decided to write
this tutorial for the ones who are new to the this topic. $ consider myself quite new to the
sub%ect too, and $ have to confess that $ have not fi#ured out all the theoretical details yet.
owever, as far as the en#ineerin# applications are concerned, $ think all the theoretical
details are not necessarily necessary (').
$n this tutorial $ will try to #ive basic principles underlyin# the wavelet theory. The proofs
of the theorems and related equations will not be #iven in this tutorial due to the simple
assumption that the intended readers of this tutorial do not need them at this time.
owever, interested readers will be directed to related references for further and in(depth
information.
$n this document $ am assumin# that you have no back#round knowled#e, whatsoever. $f
you do have this back#round, please disre#ard the followin# information, since it may be
trivial.
)hould you find any inconsistent, or incorrect information in the followin# tutorial,
please feel free to contact me. $ will appreciate any comments on this pa#e.
Robi POLIKAR
***********************************************************************
*
TRANS... WHAT?
+irst of all, why do we need a transform, or what is a transform anyway,
-athematical transformations are applied to si#nals to obtain a further information from
that si#nal that is not readily available in the raw si#nal. $n the followin# tutorial $ will
assume a time(domain si#nal as a raw si#nal, and a si#nal that has been .transformed. by
any of the available mathematical transformations as a processed si#nal.
There are number of transformations that can be applied, amon# which the +ourier
transforms are probably by far the most popular.
-ost of the si#nals in practice, are TIME-DOMAIN si#nals in their raw format. That is,
whatever that si#nal is measurin#, is a function of time. $n other words, when we plot the
si#nal one of the a&es is time (independent variable), and the other (dependent variable)
is usually the amplitude. When we plot time(domain si#nals, we obtain a time-amplitude
representation of the si#nal. This representation is not always the best representation of
the si#nal for most si#nal processin# related applications. $n many cases, the most
distin#uished information is hidden in the frequency content of the si#nal. The frequency
SPECTRUM of a si#nal is basically the frequency components (spectral components) of
that si#nal. The frequency spectrum of a si#nal shows what frequencies e&ist in the
si#nal.
$ntuitively, we all know that the frequency is somethin# to do with the chan#e in rate of
somethin#. $f somethin# ( a mathematical or physical variable, would be the technically
correct term) chan#es rapidly, we say that it is of hi#h frequency, where as if this variable
does not chan#e rapidly, i.e., it chan#es smoothly, we say that it is of low frequency. $f
this variable does not chan#e at all, then we say it has /ero frequency, or no frequency.
+or e&le the publication frequency of a daily newspaper is hi#her than that of a
monthly ma#a/ine (it is published more frequently).
The frequency is measured in cycles0second, or with a more common name, in .ert/..
+or e&le the electric power we use in our daily life in the 1) is 20 / (30 /
elsewhere in the world). This means that if you try to plot the electric current, it will be a
sine wave passin# throu#h the same point 30 times in 1 second. 4ow, look at the
followin# fi#ures. The first one is a sine wave at 5 /, the second one at 10 /, and the
third one at 30 /. 6ompare them.
)o how do we measure frequency, or how do we find the frequency content of a si#nal,
The answer is FOURIER TRANSFORM (FT). $f the +T of a si#nal in time domain is
taken, the frequency(amplitude representation of that si#nal is obtained. $n other words,
we now have a plot with one a&is bein# the frequency and the other bein# the amplitude.
This plot tells us how much of each frequency e&ists in our si#nal.
The frequency a&is starts from /ero, and #oes up to infinity. +or every frequency, we
have an amplitude value. +or e&le, if we take the +T of the electric current that we
use in our houses, we will have one spike at 30 /, and nothin# elsewhere, since that
si#nal has only 30 / frequency component. 4o other si#nal, however, has a +T which is
this simple. +or most practical purposes, si#nals contain more than one frequency
component. The followin# shows the +T of the 30 / si#nal7
+i#ure 1.8 The +T of the 30 / si#nal #iven in +i#ure 1.5
9ne word of caution is in order at this point. 4ote that two plots are #iven in +i#ure 1.8.
The bottom one plots only the first half of the top one. :ue to reasons that are not crucial
to know at this time, the frequency spectrum of a real valued si#nal is always symmetric.
The top plot illustrates this point. owever, since the symmetric part is e&actly a mirror
ima#e of the first part, it provides no additional information, and therefore, this
symmetric second part is usually not shown. $n most of the followin# fi#ures
correspondin# to +T, $ will only show the first half of this symmetric spectrum.
Why do we need the frequency information?
9ften times, the information that cannot be readily seen in the time(domain can be seen
in the frequency domain.
;et"s #ive an e&le from biolo#ical si#nals. )uppose we are lookin# at an <6= si#nal
(<lectro6ardio=raphy, #raphical recordin# of heart"s electrical activity). The typical
shape of a healthy <6= si#nal is well known to cardiolo#ists. >ny si#nificant deviation
from that shape is usually considered to be a symptom of a patholo#ical condition.
This patholo#ical condition, however, may not always be quite obvious in the ori#inal
time(domain si#nal. 6ardiolo#ists usually use the time(domain <6= si#nals which are
recorded on strip(charts to analy/e <6= si#nals. ?ecently, the new computeri/ed <6=
recorders0analy/ers also utili/e the frequency information to decide whether a
patholo#ical condition e&ists. > patholo#ical condition can sometimes be dia#nosed more
easily when the frequency content of the si#nal is analy/ed.
This, of course, is only one simple e&le why frequency content mi#ht be useful.
Today +ourier transforms are used in many different areas includin# all branches of
en#ineerin#.
>lthou#h +T is probably the most popular transform bein# used (especially in electrical
en#ineerin#), it is not the only one. There are many other transforms that are used quite
often by en#ineers and mathematicians. ilbert transform, short(time +ourier transform
(more about this later), Wi#ner distributions, the ?adon Transform, and of course our
featured transformation , the wavelet transform, constitute only a small portion of a
hu#e list of transforms that are available at en#ineer"s and mathematician"s disposal.
<very transformation technique has its own area of application, with advanta#es and
disadvanta#es, and the wavelet transform (WT) is no e&ception.
+or a better understandin# of the need for the WT let"s look at the +T more closely. +T
(as well as WT) is a reversible transform, that is, it allows to #o back and forward
between the raw and processed (transformed) si#nals. owever, only either of them is
available at any #iven time. That is, no frequency information is available in the time(
domain si#nal, and no time information is available in the +ourier transformed si#nal.
The natural question that comes to mind is that is it necessary to have both the time and
the frequency information at the same time,
>s we will see soon, the answer depends on the particular application, and the nature of
the si#nal in hand. ?ecall that the +T #ives the frequency information of the si#nal, which
means that it tells us how much of each frequency e&ists in the si#nal, but it does not tell
us when in time these frequency components e&ist. This information is not required
when the si#nal is so(called stationary .
;et"s take a closer look at this stationarity concept more closely, since it is of paramount
importance in si#nal analysis. )i#nals whose frequency content do not chan#e in time are
called stationary signals . $n other words, the frequency content of stationary si#nals do
not chan#e in time. $n this case, one does not need to know at what times frequency
components exist , since all frequency components exist at all times !!! .
+or e&le the followin# si#nal
&(t)@cos(A*pi*10*t)Bcos(A*pi*A3*t)Bcos(A*pi*30*t)Bcos(A*pi*100*t)
is a stationary si#nal, because it has frequencies of 10, A3, 30, and 100 / at any #iven
time instant. This si#nal is plotted below7
+i#ure 1.3
>nd the followin# is its +T7
+i#ure 1.2
The top plot in +i#ure 1.2 is the (half of the symmetric) frequency spectrum of the si#nal
in +i#ure 1.3. The bottom plot is the /oomed version of the top plot, showin# only the
ran#e of frequencies that are of interest to us. 4ote the four spectral components
correspondin# to the frequencies 10, A3, 30 and 100 /.
6ontrary to the si#nal in +i#ure 1.3, the followin# si#nal is not stationary. +i#ure 1.C
plots a si#nal whose frequency constantly chan#es in time. This si#nal is known as the
.chirp. si#nal. This is a non(stationary si#nal.
+i#ure 1.C
;et"s look at another e&le. +i#ure 1.D plots a si#nal with four different frequency
components at four different time intervals, hence a non(stationary si#nal. The interval 0
to 500 ms has a 100 / sinusoid, the interval 500 to 200 ms has a 30 / sinusoid, the
interval 200 to D00 ms has a A3 / sinusoid, and finally the interval D00 to 1000 ms has a
10 / sinusoid.
+i#ure 1.D
>nd the followin# is its +T7
+i#ure 1.E
:o not worry about the little ripples at this time! they are due to sudden chan#es from one
frequency component to another, which have no si#nificance in this te&t. 4ote that the
amplitudes of hi#her frequency components are hi#her than those of the lower frequency
ones. This is due to fact that hi#her frequencies last lon#er (500 ms each) than the lower
frequency components (A00 ms each). (The e&act value of the amplitudes are not
important).
9ther than those ripples, everythin# seems to be ri#ht. The +T has four peaks,
correspondin# to four frequencies with reasonable amplitudes... ?i#ht
WRONG (!)
Well, not e&actly wron#, but not e&actly ri#ht either...
ere is why7
+or the first si#nal, plotted in +i#ure 1.3, consider the followin# question7
>t what times (or time intervals), do these frequency components occur,
>nswer7
>t all times' ?emember that in stationary si#nals, all frequency components that e&ist in
the si#nal, e&ist throu#hout the entire duration of the si#nal. There is 10 / at all times,
there is 30 / at all times, and there is 100 / at all times.
4ow, consider the same question for the non(stationary si#nal in +i#ure 1.C or in +i#ure
1.D.
>t what times these frequency components occur,
+or the si#nal in +i#ure 1.D, we know that in the first interval we have the hi#hest
frequency component, and in the last interval we have the lowest frequency component.
+or the si#nal in +i#ure 1.C, the frequency components chan#e continuously. Therefore,
for these si#nals the frequency components do not appear at all times'
4ow, compare the +i#ures 1.2 and 1.E. The similarity between these two spectrum should
be apparent. Foth of them show four spectral components at e&actly the same
frequencies, i.e., at 10, A3, 30, and 100 /. 9ther than the ripples, and the difference in
amplitude (which can always be normali/ed), the two spectrums are almost identical,
althou#h the correspondin# time(domain si#nals are not even close to each other. Foth of
the si#nals involves the same frequency components, but the first one has these
frequencies at all times, the second one has these frequencies at different intervals. )o,
how come the spectrums of two entirely different si#nals look very much alike, ?ecall
that the +T #ives the spectral content of the si#nal, but it #ives no information re#ardin#
where in time those spectral components appear . Therefore, +T is not a suitable
technique for non(stationary si#nal, with one e&ception7
+T can be used for non(stationary si#nals, if we are only interested in what spectral
components e&ist in the si#nal, but not interested where these occur. owever, if this
information is needed, i.e., if we want to know, what spectral component occur at what
time (interval) , then +ourier transform is not the ri#ht transform to use.
+or practical purposes it is difficult to make the separation, since there are a lot of
practical stationary si#nals, as well as non(stationary ones. >lmost all biolo#ical si#nals,
for e&le, are non(stationary. )ome of the most famous ones are <6= (electrical
activity of the heart , electrocardio#raph), <<= (electrical activity of the brain,
electroencephalo#raph), and <-= (electrical activity of the muscles, electromyo#ram).
Once again please note that, the FT gives what frequency components (spectral
components) exist in the signal. Nothing more, nothing less.
When the time localization of the spectral components are needed, a transform #ivin#
the T$-<(+?<G1<46H ?<I?<)<4T>T$94 of the si#nal is needed.
THE ULTIMATE SOLUTION:
THE WAVELET TRANSFORM
The Wavelet transform is a transform of this type. $t provides the time(frequency
representation. (There are other transforms which #ive this information too, such as short
time +ourier transform, Wi#ner distributions, etc.)
9ften times a particular spectral component occurrin# at any instant can be of particular
interest. $n these cases it may be very beneficial to know the time intervals these
particular spectral components occur. +or e&le, in <<=s, the latency of an event(
related potential is of particular interest (<vent(related potential is the response of the
brain to a specific stimulus like flash(li#ht, the latency of this response is the amount of
time elapsed between the onset of the stimulus and the response).
Wavelet transform is capable of providin# the time and frequency information
simultaneously, hence #ivin# a time(frequency representation of the si#nal.
ow wavelet transform works is completely a different fun story, and should be
e&plained after short time Fourier Transform (STFT) . The WT was developed as an
alternative to the )T+T. The )T+T will be e&plained in #reat detail in the second part of
this tutorial. $t suffices at this time to say that the WT was developed to overcome some
resolution related problems of the )T+T, as e&plained in Iart $$.
To make a real lon# story short, we pass the time(domain si#nal from various hi#hpass
and low pass filters, which filters out either hi#h frequency or low frequency portions of
the si#nal. This procedure is repeated, every time some portion of the si#nal
correspondin# to some frequencies bein# removed from the si#nal.
ere is how this works7 )uppose we have a si#nal which has frequencies up to 1000 /.
$n the first sta#e we split up the si#nal in to two parts by passin# the si#nal from a
hi#hpass and a lowpass filter (filters should satisfy some certain conditions, so(called
admissibility condition) which results in two different versions of the same si#nal7
portion of the si#nal correspondin# to 0(300 / (low pass portion), and 300(1000 /
(hi#h pass portion).
Then, we take either portion (usually low pass portion) or both, and do the same thin#
a#ain. This operation is called decomposition .
>ssumin# that we have taken the lowpass portion, we now have 5 sets of data, each
correspondin# to the same si#nal at frequencies 0(A30 /, A30(300 /, 300(1000 /.
Then we take the lowpass portion a#ain and pass it throu#h low and hi#h pass filters! we
now have 8 sets of si#nals correspondin# to 0(1A3 /, 1A3(A30 /,A30(300 /, and 300(
1000 /. We continue like this until we have decomposed the si#nal to a pre(defined
certain level. Then we have a bunch of si#nals, which actually represent the same si#nal,
but all correspondin# to different frequency bands. We know which si#nal corresponds to
which frequency band, and if we put all of them to#ether and plot them on a 5(: #raph,
we will have time in one a&is, frequency in the second and amplitude in the third a&is.
This will show us which frequencies e&ist at which time ( there is an issue, called
.uncertainty principle., which states that, we cannot e&actly know what frequency exists
at what time instance , but we can only know what frequency bands e&ist at what time
intervals , more about this in the subsequent parts of this tutorial).
owever, $ still would like to e&plain it briefly7
The uncertainty principle, ori#inally found and formulated by eisenber#, states that, the
momentum and the position of a movin# particle cannot be known simultaneously. This
applies to our sub%ect as follows7
The frequency and time information of a si#nal at some certain point in the time(
frequency plane cannot be known. $n other words7 We cannot know what spectral
component e&ists at any #iven time instant. The best we can do is to investi#ate what
spectral components e&ist at any #iven interval of time. This is a problem of resolution,
and it is the main reason why researchers have switched to WT from )T+T. )T+T #ives a
fi&ed resolution at all times, whereas WT #ives a variable resolution as follows7
i#her frequencies are better resolved in time, and lower frequencies are better resolved
in frequency. This means that, a certain hi#h frequency component can be located better
in time (with less relative error) than a low frequency component. 9n the contrary, a low
frequency component can be located better in frequency compared to hi#h frequency
component.
Take a look at the followin# #rid7
f ^
|******************************************* continuous
|* * * * * * * * * * * * * * * wavelet
transform
|* * * * * * *
|* * * *
|* *
--------------------------------------------> time
Interpret the above grid as follows: The top row shows that at
higher frequencies we have more samples corresponding to smaller
intervals of time. In other words higher frequencies can be resolved
better in time. The bottom row however corresponds to low
frequencies and there are less number of points to characteri!e the
signal therefore low frequencies are not resolved well in time.
^frequenc"
|
|
|
| *******************************************************
|
|
|
| * * * * * * * * * * * * * * * * * * * discrete
time
| wavelet
transform
| * * * * * * * * * *
|
| * * * * *
| * * *
|----------------------------------------------------------> time
In discrete time case the time resolution of the signal wor#s the same
as above but now the frequenc" information has different resolutions
at ever" stage too. $ote that lower frequencies are better resolved in
frequenc" where as higher frequencies are not. $ote how the spacing
between subsequent frequenc" components increase as frequenc" increases.
%elow are some e&les of continuous wavelet transform:
'et(s ta#e a sinusoidal signal which has two different frequenc"
components at
two different times:
$ote the low frequenc" portion first and then the high frequenc".
)igure *.*+
The continuous wavelet transform of the above signal:
)igure *.**
$ote however the frequenc" a&is in these plots are labeled as
scale . The concept of the scale will be made more clear in the
subsequent
sections but it should be noted at this time that the scale is inverse
of frequenc". That is high scales correspond to low frequencies and
low scales correspond to high frequencies. ,onsequentl" the little
pea# in the plot corresponds to the high frequenc" components in the
signal and the large pea# corresponds to low frequenc" components
-which appear before the high frequenc" components in time. in the
signal.
/ou might be pu!!led from the frequenc" resolution shown in the plot
since it shows good frequenc" resolution at high frequencies. $ote
however that it is the good scale resolution that loo#s good
at high frequencies -low scales. and good scale resolution means poor
frequenc" resolution and vice versa. 0ore about this in 1art II and III.
TO BE CONTINUED...
This concludes the first part of this tutorial where I have tried to
give a brief overview of signal processing the )ourier transform and
the wavelet transform.
The Wavelet Tutorial
Part 2
FUNDAMENTALS:
THE FOURIER TRANSFORM
AND
THE SHORT TERM FOURIER TRANSFORM
FUNDAMENTALS
;et"s have a short review of the first part.
We basically need Wavelet Transform (WT) to analy/e non(stationary si#nals, i.e.,
whose frequency response varies in time. $ have written that +ourier Transform (+T) is
not suitable for non(stationary si#nals, and $ have shown e&les of it to make it more
clear. +or a quick recall, let me #ive the followin# e&le.
)uppose we have two different si#nals. >lso suppose that they both have the same
spectral components, with one ma%or difference. )ay one of the si#nals have four
frequency components at all times, and the other have the same four frequency
components at different times. The +T of both of the si#nals would be the same, as shown
in the e&le in part 1 of this tutorial. >lthou#h the two si#nals are completely
different, their (ma#nitude of) +T are the )>-< !. This, obviously tells us that we can
not use the +T for non(stationary si#nals.
But why does this happen? $n other words, how come both of the si#nals have the same
+T, HOW DOES FOURIER TRANSFORM WORK ANYWAY?
An Important Milestone in Signal Processing:
THE FOURIER TRANSFORM
$ will not #o into the details of +T for two reasons7
1. $t is too wide of a sub%ect to discuss in this tutorial.
A. $t is not our main concern anyway.
owever, $ would like to mention a couple important points a#ain for two reasons7
1. $t is a necessary back#round to understand how WT works.
A. $t has been by far the most important si#nal processin# tool for many (and $ mean
many many) years.
$n 1Eth century (1DAA*, to be e&act, but you do not need to know the e&act time. Just trust
me that it is far before than you can remember), the +rench mathematician J. +ourier,
showed that any periodic function can be e&pressed as an infinite sum of periodic
comple& e&ponential functions. -any years after he had discovered this remarkable
property of (periodic) functions, his ideas were #enerali/ed to first non(periodic
functions, and then periodic or non(periodic discrete time si#nals. $t is after this
#enerali/ation that it became a very suitable tool for computer calculations. $n 1E23, a
new al#orithm called fast +ourier Transform (++T) was developed and +T became even
more popular.
(* $ thank :r. Iedre#al for the valuable information he has provided)
4ow let us take a look at how +ourier transform works7
+T decomposes a si#nal to comple& e&ponential functions of different frequencies. The
way it does this, is defined by the followin# two equations7
+i#ure A.1
$n the above equation, t stands for time, f stands for frequency, and x denotes the si#nal at
hand. 4ote that x denotes the si#nal in time domain and the X denotes the si#nal in
frequency domain. This convention is used to distin#uish the two representations of the
si#nal. <quation (1) is called the Fourier transform of x(t), and equation (A) is called the
inverse Fourier transform of X(f), which is &(t).
+or those of you who have been usin# the +ourier transform are already familiar with
this. 1nfortunately many people use these equations without knowin# the underlyin#
principle.
Ilease take a closer look at equation (1)7
The si#nal &(t), is multiplied with an e&ponential term, at some certain frequency "f" ,
and then inte#rated over ALL TIMES !!! (The key words here are .all times. , as will
e&plained below).
4ote that the e&ponential term in <qn. (1) can also be written as7
Cos(2.pi.f.t)+j.Sin(2.pi.f.t).......(3)
The above e&pression has a real part of cosine of frequency f, and an ima#inary part of
sine of frequency f. )o what we are actually doin# is, multiplyin# the ori#inal si#nal with
a comple& e&pression which has sines and cosines of frequency f. Then we inte#rate this
product. $n other words, we add all the points in this product. $f the result of this
inte#ration (which is nothin# but some sort of infinite summation) is a lar#e value, then
we say that 7 the signal x(t), has a dominant spectral component at frequency "f".
This means that, a ma%or portion of this si#nal is composed of frequency f. $f the
inte#ration result is a small value, than this means that the si#nal does not have a ma%or
frequency component of f in it. $f this inte#ration result is /ero, then the si#nal does not
contain the frequency .f. at all.
$t is of particular interest here to see how this inte#ration works7 The si#nal is multiplied
with the sinusoidal term of frequency .f.. $f the si#nal has a hi#h amplitude component of
frequency .f., then that component and the sinusoidal term will coincide, and the product
of them will #ive a (relatively) large value. This shows that, the si#nal .&., has a ma%or
frequency component of .f..
owever, if the si#nal does not have a frequency component of .f., the product will yield
/ero, which shows that, the si#nal does not have a frequency component of .f.. $f the
frequency .f., is not a ma%or component of the si#nal .&(t)., then the product will #ive a
(relatively) small value. This shows that, the frequency component .f. in the si#nal .&.,
has a small amplitude, in other words, it is not a ma%or component of .&..
4ow, note that the inte#ration in the transformation equation (<qn. 1) is over time. The
left hand side of (1), however, is a function of frequency. Therefore, the inte#ral in (1), is
calculated for every value of f.
IMPORTANT(!) The information provided by the inte#ral, corresponds to all time
instances, since the inte#ration is from minus infinity to plus infinity over time. $t follows
that no matter where in time the component with frequency .f. appears, it will affect the
result of the inte#ration equally as well. $n other words, whether the frequency
component .f. appears at time t1 or tA , it will have the same effect on the inte#ration.
This is why Fourier transform is not suitable if the signal has time varying
frequency, i.e., the si#nal is non-stationary. $f only the si#nal has the frequency
component .f. at all times (for all .t. values), then the result obtained by the +ourier
transform makes sense.
4ote that the Fourier transform tells whether a certain frequency component exists
or not. This information is independent of where in time this component appears. $t is
therefore very important to know whether a si#nal is stationary or not, prior to processin#
it with the +T.
The e&le #iven in part one should now be clear. $ would like to #ive it here a#ain7
;ook at the followin# fi#ure, which shows the si#nal7
x(t)cos(2`pi`5`t)+cos(2`pi`10`t)+cos(2`pi`20`t)+cos(2`pi`50`t)
that is , it has four frequency components of 3, 10, A0, and 30 /., all occurrin# at all
times.
+i#ure A.A
>nd here is the +T of it. The frequency a&is has been cut here, but theoretically it e&tends
to infinity (for continuous +ourier transform (6+T). >ctually, here we calculate the
discrete +ourier transform (:+T), in which case the frequency a&is #oes up to (at least)
twice the samplin# frequency of the si#nal, and the transformed si#nal is symmetrical.
owever, this is not that important at this time.)
+i#ure A.5
4ote the four peaks in the above fi#ure, which correspond to four different frequencies.
4ow, look at the followin# fi#ure7 ere the si#nal is a#ain the cosine si#nal, and it has
the same four frequencies. owever, these components occur at different times.
+i#ure A.8
>nd here is the +ourier transform of this si#nal7
+i#ure A.3
What you are supposed to see in the above fi#ure, is it is (almost) same with the previous
+T fi#ure. Ilease look carefully and note the ma%or four peaks correspondin# to 3, 10, A0,
and 30 /. $ could have made this fi#ure look very similar to the previous one, but $ did
not do that on purpose. The reason of the noise like thin# in between peaks show that,
those frequencies also e&ist in the si#nal. Fut the reason they have a small amplitude , is
because, they are not major spectral components of the given signal, and the reason
we see those, is because of the sudden chan#e between the frequencies. <specially note
how time domain si#nal chan#es at around time A30 (ms) (With some suitable filterin#
techniques, the noise like part of the frequency domain si#nal can be cleaned, but this has
not nothin# to do with our sub%ect now. $f you need further information please send me
an e(mail).
Fy this time you should have understood the basic concepts of +ourier transform, when
we can use it and we can not. >s you can see from the above e&le, +T cannot
distin#uish the two si#nals very well. To +T, both si#nals are the same, because they
constitute of the same frequency components. Therefore, +T is not a suitable tool for
analy/in# non(stationary si#nals, i.e., si#nals with time varyin# spectra.
Ilease keep this very important property in mind. 1nfortunately, many people usin# the
+T do not think of this. They assume that the si#nal they have is stationary where it is not
in many practical cases. 9f course if you are not interested in at what times these
frequency components occur, but only interested in what frequency components e&ist,
then +T can be a suitable tool to use.
)o, now that we know that we can not use (well, we can, but we shouldn"t) +T for non(
stationary si#nals, what are we #oin# to do,
?emember that, $ have mentioned that wavelet transform is only (about) a decade old.
Hou may wonder if researchers noticed this non(stationarity business only ten years a#o
or not.
9bviously not.
>pparently they must have done somethin# about it before they fi#ured out the wavelet
transform....,
Well..., they sure did...
They have come up with ...
LINEAR TIME FREQUENCY REPRESENTATIONS
THE SHORT TERM FOURIER TRANSFORM
)o, how are we #oin# to insert this time business into our frequency plots, ;et"s look at
the problem in hand little more closer.
What was wron# with +T, $t did not work for non(stationary si#nals. ;et"s think this7
Can we assume that , some portion of a non-stationary signal is stationary?
The answer is yes.
Just look at the third fi#ure above. The si#nal is stationary every A30 time unit intervals.
Hou may ask the followin# question,
What if the part that we can consider to be stationary is very small,
Well, if it is too small, it is too small. There is nothin# we can do about that, and actually,
there is nothin# wron# with that either. We have to play this #ame with the physicists"
rules.
$f this re#ion where the si#nal can be assumed to be stationary is too small, then we look
at that si#nal from narrow windows, narrow enou#h that the portion of the si#nal seen
from these windows are indeed stationary.
This approach of researchers ended up with a revised version of the +ourier transform,
so(called 7 The )hort Time +ourier Transform ()T+T) .
There is only a minor difference between )T+T and +T. $n )T+T, the si#nal is divided
into small enou#h se#ments, where these se#ments (portions) of the si#nal can be
assumed to be stationary. +or this purpose, a window function "w" is chosen. The width
of this window must be equal to the se#ment of the si#nal where its stationarity is valid.
This window function is first located to the very be#innin# of the si#nal. That is, the
window function is located at t@0. ;et"s suppose that the width of the window is "T" s.
>t this time instant (t@0), the window function will overlap with the first T/2 seconds ($
will assume that all time units are in seconds). The window function and the si#nal are
then multiplied. Fy doin# this, only the first T0A seconds of the si#nal is bein# chosen,
with the appropriate wei#htin# of the window (if the window is a rectan#le, with
amplitude .1., then the product will be equal to the si#nal). Then this product is assumed
to be %ust another si#nal, whose +T is to be taken. $n other words, +T of this product is
taken, %ust as takin# the +T of any si#nal.
The result of this transformation is the +T of the first T/2 seconds of the si#nal. $f this
portion of the si#nal is stationary, as it is assumed, then there will be no problem and the
obtained result will be a true frequency representation of the first T0A seconds of the
si#nal.
The ne&t step, would be shiftin# this window (for some t1 seconds) to a new location,
multiplyin# with the si#nal, and takin# the +T of the product. This procedure is followed,
until the end of the si#nal is reached by shiftin# the window with .t1. seconds intervals.
The followin# definition of the )T+T summari/es all the above e&planations in one line7
+i#ure A.2
Ilease look at the above equation carefully. x(t) is the si#nal itself, w(t) is the window
function, and ` is the comple& con%u#ate. >s you can see from the equation, the )T+T of
the si#nal is nothin# but the +T of the si#nal multiplied by a window function.
+or every t' and f a new )T+T coefficient is computed (6orrection7 The .t. in the
parenthesis of )T+T should be .t".. $ will correct this soon. $ have %ust noticed that $ have
mistyped it).
The followin# fi#ure may help you to understand this a little better7
+i#ure A.C
The =aussian(like functions in color are the windowin# functions. The red one shows the
window located at t@t1", the blue shows t@tA", and the #reen one shows the window
located at t@t5". These will correspond to three different +Ts at three different times.
Therefore, we will obtain a true time-frequency representation (TFR) of the si#nal.
Irobably the best way of understandin# this would be lookin# at an e&le. +irst of all,
since our transform is a function of both time and frequency (unlike +T, which is a
function of frequency only), the transform would be two dimensional (three, if you count
the amplitude too). ;et"s take a non(stationary si#nal, such as the followin# one7
+i#ure A.D
$n this si#nal, there are four frequency components at different times. The interval 0 to
A30 ms is a simple sinusoid of 500 /, and the other A30 ms intervals are sinusoids of
A00 /, 100 /, and 30 /, respectively. >pparently, this is a non(stationary si#nal.
4ow, let"s look at its )T+T7
+i#ure A.E
>s e&pected, this is two dimensional plot (5 dimensional, if you count the amplitude too).
The .&. and .y. a&es are time and frequency, respectively. Ilease, i#nore the numbers on
the a&es, since they are normali/ed in some respect, which is not of any interest to us at
this time. Just e&amine the shape of the time(frequency representation.
+irst of all, note that the #raph is symmetric with respect to midline of the frequency a&is.
?emember that, althou#h it was not shown, +T of a real si#nal is always symmetric, since
)T+T is nothin# but a windowed version of the +T, it should come as no surprise that
)T+T is also symmetric in frequency. The symmetric part is said to be associated with
ne#ative frequencies, an odd concept which is difficult to comprehend, fortunately, it is
not important! it suffices to know that )T+T and +T are symmetric.
What is important, are the four peaks! note that there are four peaks correspondin# to four
different frequency components. >lso note that, unlike +T, these four peaks are located
at different time intervals along the time axis . ?emember that the ori#inal si#nal had
four spectral components located at different times.
4ow we have a true time(frequency representation of the si#nal. We not only know what
frequency components are present in the si#nal, but we also know where they are located
in time.
$t is #rrrreeeaaatttttt'''' ?i#ht,
Well, not really'
Hou may wonder, since )T+T #ives the T+? of the si#nal, why do we need the wavelet
transform. The implicit problem of the )T+T is not obvious in the above e&le. 9f
course, an e&le that would work nicely was chosen on purpose to demonstrate the
concept.
The problem with )T+T is the fact whose roots #o back to what is known as the
Heisenberg Uncertainty Principle . This principle ori#inally applied to the momentum
and location of movin# particles, can be applied to time(frequency information of a
si#nal. )imply, this principle states that one cannot know the e&act time(frequency
representation of a si#nal, i.e., one cannot know what spectral components e&ist at what
instances of times. What one can know are the time intervals in which certain band of
frequencies e&ist, which is a resolution problem.
The problem with the )T+T has somethin# to do with the width of the window function
that is used. To be technically correct, this width of the window function is known as the
support of the window. $f the window function is narrow, than it is known as compactly
supported . This terminolo#y is more often used in the wavelet world, as we will see
later.
ere is what happens7
?ecall that in the +T there is no resolution problem in the frequency domain, i.e., we
know e&actly what frequencies e&ist! similarly we there is no time resolution problem in
the time domain, since we know the value of the si#nal at every instant of time.
6onversely, the time resolution in the +T, and the frequency resolution in the time
domain are /ero, since we have no information about them. What #ives the perfect
frequency resolution in the +T is the fact that the window used in the +T is its kernel, the
expjwt] function, which lasts at all times from minus infinity to plus infinity. 4ow, in
)T+T, our window is of finite len#th, thus it covers only a portion of the si#nal, which
causes the frequency resolution to #et poorer. What $ mean by #ettin# poorer is that, we
no lon#er know the e&act frequency components that e&ist in the si#nal, but we only
know a band of frequencies that e&ist7
$n +T, the kernel function, allows us to obtain perfect frequency resolution, because the
kernel itself is a window of infinite len#th. $n )T+T is window is of finite len#th, and we
no lon#er have perfect frequency resolution. Hou may ask, why don"t we make the len#th
of the window in the )T+T infinite, %ust like as it is in the +T, to #et perfect frequency
resolution, Well, than you loose all the time information, you basically end up with the
+T instead of )T+T. To make a lon# story real short, we are faced with the followin#
dilemma7
$f we use a window of infinite len#th, we #et the +T, which #ives perfect frequency
resolution, but no time information. +urthermore, in order to obtain the stationarity, we
have to have a short enou#h window, in which the si#nal is stationary. The narrower we
make the window, the better the time resolution, and better the assumption of stationarity,
but poorer the frequency resolution7
4arrow window @@@K#ood time resolution, poor frequency resolution.
Wide window @@@K#ood frequency resolution, poor time resolution.
$n order to see these effects, let"s look at a couple e&les7 $ will show four windows of
different len#th, and we will use these to compute the )T+T, and see what happens7
The window function we use is simply a =aussian function in the form7
w(t)exp(-a`(t^2)/2);
where a determines the len#th of the window, and t is the time. The followin# fi#ure
shows four window functions of varyin# re#ions of support, determined by the value of a
. Ilease disre#ard the numeric values of a since the time interval where this function is
computed also determines the function. Just note the len#th of each window. The above
e&le #iven was computed with the second value, a0.001 . $ will now show the )T+T
of the same si#nal #iven above computed with the other windows.
+i#ure A.10
+irst let"s look at the first most narrow window. We e&pect the )T+T to have a very #ood
time resolution, but relatively poor frequency resolution7
+i#ure A.11
The above fi#ure shows this )T+T. The fi#ure is shown from a top bird(eye view with an
an#le for better interpretation. 4ote that the four peaks are well separated from each other
in time. >lso note that, in frequency domain, every peak covers a ran#e of frequencies,
instead of a sin#le frequency value. 4ow let"s make the window wider, and look at the
third window (the second one was already shown in the first e&le).
+i#ure A.1A
4ote that the peaks are not well separated from each other in time, unlike the previous
case, however, in frequency domain the resolution is much better. 4ow let"s further
increase the width of the window, and see what happens7
THE WAVELET TUTORIAL
PART III
MULTIRESOLUTION ANALYSIS
&
THE CONTINUOUS WAVELET TRANSFORM
by
Robi Polikar
MULTIRESOLUTION ANALYSIS
>lthou#h the time and frequency resolution problems are results of a physical
phenomenon (the eisenber# uncertainty principle) and e&ist re#ardless of the transform
used, it is possible to analy/e any si#nal by usin# an alternative approach called the
multiresolution analysis (MRA) . -?>, as implied by its name, analy/es the si#nal at
different frequencies with different resolutions. <very spectral component is not resolved
equally as was the case in the )T+T.
-?> is desi#ned to #ive #ood time resolution and poor frequency resolution at hi#h
frequencies and #ood frequency resolution and poor time resolution at low frequencies.
This approach makes sense especially when the si#nal at hand has hi#h frequency
components for short durations and low frequency components for lon# durations.
+ortunately, the si#nals that are encountered in practical applications are often of this
type. +or e&le, the followin# shows a si#nal of this type. $t has a relatively low
frequency component throu#hout the entire si#nal and relatively hi#h frequency
components for a short duration somewhere around the middle.
THE CONTINUOUS WAVELET TRANSFORM
The continuous wavelet transform was developed as an alternative approach to the short
time +ourier transform to overcome the resolution problem. The wavelet analysis is done
in a similar way to the )T+T analysis, in the sense that the si#nal is multiplied with a
function, LMit the waveletN, similar to the window function in the )T+T, and the transform
is computed separately for different se#ments of the time(domain si#nal. owever, there
are two main differences between the )T+T and the 6WT7
1. The +ourier transforms of the windowed si#nals are not taken, and therefore sin#le
peak will be seen correspondin# to a sinusoid, i.e., ne#ative frequencies are not
computed.
A. The width of the window is chan#ed as the transform is computed for every sin#le
spectral component, which is probably the most si#nificant characteristic of the wavelet
transform.
The continuous wavelet transform is defined as follows
<quation 5.1
>s seen in the above equation , the transformed si#nal is a function of two variables, tau
and s , the translation and scale parameters, respectively. psi(t) is the transformin#
function, and it is called the mother wavelet . The term mother wavelet #ets its name
due to two important properties of the wavelet analysis as e&plained below7
The term wavelet means a small wave . The smallness refers to the condition that this
(window) function is of finite len#th ( compactly supported). The wave refers to the
condition that this function is oscillatory . The term mother implies that the functions
with different re#ion of support that are used in the transformation process are derived
from one main function, or the mother wavelet. $n other words, the mother wavelet is a
prototype for #eneratin# the other window functions.
The term translation is used in the same sense as it was used in the )T+T! it is related to
the location of the window, as the window is shifted throu#h the si#nal. This term,
obviously, corresponds to time information in the transform domain. owever, we do not
have a frequency parameter, as we had before for the )T+T. $nstead, we have scale
parameter which is defined as O10frequencyO. The term frequency is reserved for the
)T+T. )cale is described in more detail in the ne&t section.
The Scale
The parameter scale in the wavelet analysis is similar to the scale used in maps. >s in the
case of maps, hi#h scales correspond to a non(detailed #lobal view (of the si#nal), and
low scales correspond to a detailed view. )imilarly, in terms of frequency, low
frequencies (hi#h scales) correspond to a #lobal information of a si#nal (that usually
spans the entire si#nal), whereas hi#h frequencies (low scales) correspond to a detailed
information of a hidden pattern in the si#nal (that usually lasts a relatively short time).
6osine si#nals correspondin# to various scales are #iven as e&les in the followin#
fi#ure .
+
i#ure 5.A
+ortunately in practical applications, low scales (hi#h frequencies) do not last for the
entire duration of the si#nal, unlike those shown in the fi#ure, but they usually appear
from time to time as short bursts, or spikes. i#h scales (low frequencies) usually last for
the entire duration of the si#nal.
)calin#, as a mathematical operation, either dilates or compresses a si#nal. ;ar#er scales
correspond to dilated (or stretched out) si#nals and small scales correspond to
compressed si#nals. >ll of the si#nals #iven in the fi#ure are derived from the same
cosine si#nal, i.e., they are dilated or compressed versions of the same function. $n the
above fi#ure, s0.05 is the smallest scale, and s1 is the lar#est scale.
$n terms of mathematical functions, if f(t) is a #iven function f(st) corresponds to a
contracted (compressed) version of f(t) if s > 1 and to an e&panded (dilated) version of
f(t) if s < 1 .
owever, in the definition of the wavelet transform, the scalin# term is used in the
denominator, and therefore, the opposite of the above statements holds, i.e., scales s > 1
dilates the si#nals whereas scales s < 1 , compresses the si#nal. This interpretation of
scale will be used throu#hout this te&t.
COMPUTATION OF THE CWT
$nterpretation of the above equation will be e&plained in this section. ;et x(t) is the si#nal
to be analy/ed. The mother wavelet is chosen to serve as a prototype for all windows in
the process. >ll the windows that are used are the dilated (or compressed) and shifted
versions of the mother wavelet. There are a number of functions that are used for this
purpose. The -orlet wavelet and the -e&ican hat function are two candidates, and they
are used for the wavelet analysis of the e&les which are presented later in this
chapter.
9nce the mother wavelet is chosen the computation starts with s1 and the continuous
wavelet transform is computed for all values of s , smaller and lar#er than PP1"". owever,
dependin# on the si#nal, a complete transform is usually not necessary. +or all practical
purposes, the si#nals are bandlimited, and therefore, computation of the transform for a
limited interval of scales is usually adequate. $n this study, some finite interval of values
for s were used, as will be described later in this chapter.
+or convenience, the procedure will be started from scale s1 and will continue for the
increasin# values of s , i.e., the analysis will start from hi#h frequencies and proceed
towards low frequencies. This first value of s will correspond to the most compressed
wavelet. >s the value of s is increased, the wavelet will dilate.
The wavelet is placed at the be#innin# of the si#nal at the point which corresponds to
time@0. The wavelet function at scale PP1"" is multiplied by the si#nal and then inte#rated
over all times. The result of the inte#ration is then multiplied by the constant number
1/sqrts] . This multiplication is for ener#y normali/ation purposes so that the
transformed si#nal will have the same ener#y at every scale. The final result is the value
of the transformation, i.e., the value of the continuous wavelet transform at time zero and
scale s1 . $n other words, it is the value that corresponds to the point tau 0 , s1 in the
time(scale plane.
The wavelet at scale s1 is then shifted towards the ri#ht by tau amount to the location
ttau , and the above equation is computed to #et the transform value at ttau , s1 in
the time(frequency plane.
This procedure is repeated until the wavelet reaches the end of the si#nal. One row of
points on the time-scale plane for the scale s1 is now completed.
Then, s is increased by a small value. 4ote that, this is a continuous transform, and
therefore, both tau and s must be incremented continuously . owever, if this transform
needs to be computed by a computer, then both parameters are increased by a
sufficiently small step size. This corresponds to samplin# the time(scale plane.
The above procedure is repeated for every value of s. <very computation for a #iven
value of s fills the correspondin# sin#le row of the time(scale plane. When the process is
completed for all desired values of s, the 6WT of the si#nal has been calculated.
The fi#ures below illustrate the entire process step by step.
+i#ure
5.5
$n +i#ure 5.5, the si#nal and the wavelet function are shown for four different values of
tau . The si#nal is a truncated version of the si#nal shown in +i#ure 5.1. The scale value
is 1 , correspondin# to the lowest scale, or hi#hest frequency. 4ote how compact it is (the
blue window). $t should be as narrow as the hi#hest frequency component that e&ists in
the si#nal. +our distinct locations of the wavelet function are shown in the fi#ure at to2 ,
to40, to90, and to140 . >t every location, it is multiplied by the si#nal. 9bviously,
the product is non/ero only where the si#nal falls in the re#ion of support of the wavelet,
and it is /ero elsewhere. Fy shiftin# the wavelet in time, the si#nal is locali/ed in time,
and by chan#in# the value of s , the si#nal is locali/ed in scale (frequency).
$f the si#nal has a spectral component that corresponds to the current value of s (which is
1 in this case), the product of the wavelet with the si#nal at the location where this
spectral component exists #ives a relatively lar#e value. $f the spectral component that
corresponds to the current value of s is not present in the si#nal, the product value will be
relatively small, or /ero. The si#nal in +i#ure 5.5 has spectral components comparable to
the window"s width at s1 around t@100 ms.
The continuous wavelet transform of the si#nal in +i#ure 5.5 will yield lar#e values for
low scales around time 100 ms, and small values elsewhere. +or hi#h scales, on the other
hand, the continuous wavelet transform will #ive lar#e values for almost the entire
duration of the si#nal, since low frequencies e&ist at all times.
+i#ure
5.8
+i#ure
5.3
+i#ures 5.8 and 5.3 illustrate the same process for the scales s@3 and s@A0, respectively.
4ote how the window width chan#es with increasin# scale (decreasin# frequency). >s
the window width increases, the transform starts pickin# up the lower frequency
components.
>s a result, for every scale and for every time (interval), one point of the time(scale plane
is computed. The computations at one scale construct the rows of the time(scale plane,
and the computations at different scales construct the columns of the time(scale plane.
4ow, let"s take a look at an e&le, and see how the wavelet transform really looks like.
6onsider the non-stationary si#nal in +i#ure 5.2. This is similar to the e&le #iven for
the )T+T, e&cept at different frequencies. >s stated on the fi#ure, the si#nal is composed
of four frequency components at 50 /, A0 /, 10 / and 3 /.
+i#ure 5.2
+i#ure 5.C is the continuous wavelet transform (6WT) of this si#nal. 4ote that the a&es
are translation and scale, not time and frequency. owever, translation is strictly related
to time, since it indicates where the mother wavelet is located. The translation of the
mother wavelet can be thou#ht of as the time elapsed since t0 . The scale, however, has
a whole different story. ?emember that the scale parameter s in equation 5.1 is actually
inverse of frequency. $n other words, whatever we said about the properties of the
wavelet transform re#ardin# the frequency resolution, inverse of it will appear on the
fi#ures showin# the WT of the time(domain si#nal.
+i#ure 5.C
4ote that in +i#ure 5.C that smaller scales correspond to hi#her frequencies, i.e.,
frequency decreases as scale increases, therefore, that portion of the #raph with scales
around /ero, actually correspond to hi#hest frequencies in the analysis, and that with hi#h
scales correspond to lowest frequencies. ?emember that the si#nal had 50 / (hi#hest
frequency) components first, and this appears at the lowest scale at a translations of 0 to
50. Then comes the A0 / component, second hi#hest frequency, and so on. The 3 /
component appears at the end of the translation a&is (as e&pected), and at hi#her scales
(lower frequencies) a#ain as e&pected.
+i#ure 5.D
4ow, recall these resolution properties7 1nlike the )T+T which has a constant resolution
at all times and frequencies, the WT has a #ood time and poor frequency resolution at
hi#h frequencies, and #ood frequency and poor time resolution at low frequencies. +i#ure
5.D shows the same WT in +i#ure 5.C from another an#le to better illustrate the resolution
properties7 $n +i#ure 5.D, lower scales (hi#her frequencies) have better scale resolution
(narrower in scale, which means that it is less ambi#uous what the e&act value of the
scale) which correspond to poorer frequency resolution . )imilarly, hi#her scales have
scale frequency resolution (wider support in scale, which means it is more ambitious
what the e&act value of the scale is) , which correspond to better frequency resolution of
lower frequencies.
The a&es in +i#ure 5.C and 5.D are normali/ed and should be evaluated accordin#ly.
?ou#hly speakin# the 100 points in the translation a&is correspond to 1000 ms, and the
130 points on the scale a&is correspond to a frequency band of 80 / (the numbers on the
translation and scale a&is do not correspond to seconds and Hz, respectively , they are
%ust the number of samples in the computation).
TIME AND FREQUENCY RESOLUTIONS
$n this section we will take a closer look at the resolution properties of the wavelet
transform. ?emember that the resolution problem was the main reason why we switched
from )T+T to WT.
The illustration in +i#ure 5.E is commonly used to e&plain how time and frequency
resolutions should be interpreted. <very bo& in +i#ure 5.E corresponds to a value of the
wavelet transform in the time(frequency plane. 4ote that bo&es have a certain non-zero
area, which implies that the value of a particular point in the time(frequency plane cannot
be known. >ll the points in the time(frequency plane that falls into a bo& is represented
by one value of the WT.
+i#ure 5.E
;et"s take a closer look at +i#ure 5.E7 +irst thin# to notice is that althou#h the widths and
hei#hts of the bo&es chan#e, the area is constant. That is each bo& represents an equal
portion of the time(frequency plane, but #ivin# different proportions to time and
frequency. 4ote that at low frequencies, the hei#ht of the bo&es are shorter (which
corresponds to better frequency resolutions, since there is less ambi#uity re#ardin# the
value of the e&act frequency), but their widths are lon#er (which correspond to poor time
resolution, since there is more ambi#uity re#ardin# the value of the e&act time). >t hi#her
frequencies the width of the bo&es decreases, i.e., the time resolution #ets better, and the
hei#hts of the bo&es increase, i.e., the frequency resolution #ets poorer.
Fefore concludin# this section, it is worthwhile to mention how the partition looks like in
the case of )T+T. ?ecall that in )T+T the time and frequency resolutions are determined
by the width of the analysis window, which is selected once for the entire analysis, i.e.,
both time and frequency resolutions are constant. Therefore the time(frequency plane
consists of squares in the )T+T case.
?e#ardless of the dimensions of the bo&es, the areas of all bo&es, both in )T+T and WT,
are the same and determined by Heisenberg's inequality . >s a summary, the area of a
bo& is fi&ed for each window function ()T+T) or mother wavelet (6WT), whereas
different windows or mother wavelets can result in different areas. owever, all areas
are lower bounded by 1/4 \pi . That is, we cannot reduce the areas of the bo&es as much
as we want due to the eisenber#"s uncertainty principle. 9n the other hand, for a #iven
mother wavelet the dimensions of the bo&es can be chan#ed, while keepin# the area the
same. This is e&actly what wavelet transform does.
THE WAVELET THEORY: A MATHEMATICAL APPROACH
This section describes the main idea of wavelet analysis theory, which can also be
considered to be the underlyin# concept of most of the si#nal analysis techniques. The +T
defined by +ourier use basis functions to analy/e and reconstruct a function. Every
vector in a vector space can be written as a linear combination of the basis vectors in
that vector space , i.e., by multiplyin# the vectors by some constant numbers, and then
by takin# the summation of the products. The analysis of the si#nal involves the
estimation of these constant numbers (transform coefficients, or +ourier coefficients,
wavelet coefficients, etc). The synthesis, or the reconstruction, corresponds to computin#
the linear combination equation.
>ll the definitions and theorems related to this sub%ect can be found in Qeiser"s book, A
Friendly Guide to Wavelets but an introductory level knowled#e of how basis functions
work is necessary to understand the underlyin# principles of the wavelet theory.
Therefore, this information will be presented in this section.
Basis Vectors
4ote7 -ost of the equations include letters of the =reek alphabet. These letters are
written out e&plicitly in the te&t with their names, such as tau, psi, phi etc. +or capital
letters, the first letter of the name has been capitali/ed, such as, Tau, Psi, Phi etc. >lso,
subscripts are shown by the underscore character _ , and superscripts are shown by the ^
character. >lso note that all letters or letter names written in bold type face represent
vectors, )ome important points are also written in bold face, but the meanin# should be
clear from the conte&t.
> basis of a vector space V is a set of linearly independent vectors, such that any vector v
in V can be written as a linear combination of these basis vectors. There may be more
than one basis for a vector space. owever, all of them have the same number of vectors,
and this number is known as the dimension of the vector space. +or e&le in two(
dimensional space, the basis will have two vectors.
<quation 5.A
<quation 5.A shows how any vector v can be written as a linear combination of the basis
vectors b_k and the correspondin# coefficients nu^k .
This concept, #iven in terms of vectors, can easily be #enerali/ed to functions, by
replacin# the basis vectors b_k with basis functions phiRk(t), and the vector v with a
function f(t). <quation 5.A then becomes
<quation 5.Aa
The comple& e&ponential (sines and cosines) functions are the basis functions for the +T.
+urthermore, they are ortho#onal functions, which provide some desirable properties for
reconstruction.
;et f(t) and #(t) be two functions in ;SA Ta,bU. ( ;SA Ta,bU denotes the set of square
inte#rable functions in the interval Ta,bU). The inner product of two functions is defined
by <quation 5.57
<quation 5.5
>ccordin# to the above definition of the inner product, the 6WT can be thou#ht of as the
inner product of the test si#nal with the basis functions psiR(tau ,s)(t)7
<quation 5.8
where,
<quation 5.3
This definition of the 6WT shows that the wavelet analysis is a measure of similarity
between the basis functions (wavelets) and the si#nal itself. ere the similarity is in the
sense of similar frequency content. The calculated 6WT coefficients refer to the
closeness of the si#nal to the wavelet at the current scale .
This further clarifies the previous discussion on the correlation of the si#nal with the
wavelet at a certain scale. $f the si#nal has a ma%or component of the frequency
correspondin# to the current scale, then the wavelet (the basis function) at the current
scale will be similar or close to the si#nal at the particular location where this frequency
component occurs. Therefore, the 6WT coefficient computed at this point in the time(
scale plane will be a relatively lar#e number.
Inner Products, Orthogonality, and Orthonormality
Two vectors v , w are said to be orthogonal if their inner product equals /ero7
<quation 5.2
)imilarly, two functions OfO and O#O are said to be ortho#onal to each other if their inner
product is /ero7
<quation 5.C
> set of vectors v_1, v_2, ....,v_n] is said to be orthonormal , if they are pairwise
ortho#onal to each other, and all have len#th PP1"". This can be e&pressed as7
<quation 5.D
)imilarly, a set of functions LphiRk(t)N, k@1,A,5,..., is said to be orthonormal if
<quation 5.E
and
<quation 5.10
or equivalently
<quation 5.11
where, deltaRLklN is the Kronecker delta function, defined as7
<quation 5.1A
>s stated above, there may be more than one set of basis functions (or vectors). >mon#
them, the orthonormal basis functions (or vectors) are of particular importance because of
the nice properties they provide in findin# these analysis coefficients. The orthonormal
bases allow computation of these coefficients in a very simple and strai#htforward way
usin# the orthonormality property.
+or orthonormal bases, the coefficients, muRk , can be calculated as
<quation 5.15
and the function f(t) can then be reconstructed by <quation 5.ARa by substitutin# the
muRk coefficients. This yields
<quation 5.18
9rthonormal bases may not be available for every type of application where a
#enerali/ed version, biorthogonal bases can be used. The term PPbiortho#onal"" refers to
two different bases which are ortho#onal to each other, but each do not form an
ortho#onal set.
$n some applications, however, biortho#onal bases also may not be available in which
case frames can be used. +rames constitute an important part of wavelet theory, and
interested readers are referred to Qaiser"s book mentioned earlier.
+ollowin# the same order as in chapter A for the )T+T, some e&les of continuous
wavelet transform are presented ne&t. The fi#ures #iven in the e&les were #enerated
by a pro#ram written to compute the 6WT.
Fefore we close this section, $ would like to include two mother wavelets commonly used
in wavelet analysis. The -e&ican at wavelet is defined as the second derivative of the
=aussian function7
<quation 5.13
which is
<quation 5.12
The -orlet wavelet is defined as
<quation 5.12a
where a is a modulation parameter, and sigma is the scalin# parameter that affects the
width of the window.
EXAMPLES
>ll of the e&les that are #iven below correspond to real(life non(stationary si#nals.
These si#nals are drawn from a database si#nals that includes event related potentials of
normal people, and patients with >l/heimer"s disease. )ince these are not test si#nals like
simple sinusoids, it is not as easy to interpret them. They are shown here only to #ive an
idea of how real(life 6WTs look like.
The followin# si#nal shown in +i#ure 5.11 belon#s to a normal person.
+i#ure 5.11 and the followin# is its 6WT. The numbers on the a&es are of no importance
to us. those numbers simply show that the 6WT was computed at 530 translation and 20
scale locations on the translation(scale plane. The important point to note here is the fact
that the computation is not a true continuous WT, as it is apparent from the computation
at finite number of locations. This is only a discreti/ed version of the 6WT, which is
e&plained later on this pa#e. 4ote, however, that this is 49T discrete wavelet transform
(:WT) which is the topic of Iart $V of this tutorial.
+i#ure 5.1A
and the +i#ure 5.15 plots the same transform from a different an#le for better
visuali/ation.
+i#ure 5.15
+i#ure 5.18 plots an event related potential of a patient dia#nosed with >l/heimer"s
disease
+i#ure 5.18 and +i#ure 5.13 illustrates its 6WT7
+i#ure 5.13 and here is another view from a different an#le
+i#ure 5.12
THE WAVELET SYNTHESIS
The continuous wavelet transform is a reversible transform, provided that <quation 5.1D
is satisfied. +ortunately, this is a very non(restrictive requirement. The continuous
wavelet transform is reversible if <quation 5.1D is satisfied, even thou#h the basis
functions are in #eneral may not be orthonormal. The reconstruction is possible by usin#
the followin# reconstruction formula7
<quation 5.1C $nverse Wavelet Transform
where 6Rpsi is a constant that depends on the wavelet used. The success of the
reconstruction depends on this constant called, the admissibility constant , to satisfy the
followin# admissibility condition 7
<quation 5.1D >dmissibility 6ondition
where psiShat(&i) is the +T of psi(t). <quation 5.1D implies that psiShat(0) @ 0, which is
<quation 5.1E
>s stated above, <quation 5.1E is not a very restrictive requirement since many wavelet
functions can be found whose inte#ral is /ero. +or <quation 5.1E to be satisfied, the
wavelet must be oscillatory.
Discretization of the Continuous Wavelet Transform: The Wavelet
Series
$n today"s world, computers are used to do most computations (well,...ok... almost all
computations). $t is apparent that neither the +T, nor the )T+T, nor the 6WT can be
practically computed by usin# analytical equations, inte#rals, etc. $t is therefore necessary
to discreti/e the transforms. >s in the +T and )T+T, the most intuitive way of doin# this
is simply samplin# the time(frequency (scale) plane. >#ain intuitively, samplin# the
plane with a uniform samplin# rate sounds like the most natural choice. owever, in the
case of WT, the scale chan#e can be used to reduce the samplin# rate.
>t hi#her scales (lower frequencies), the samplin# rate can be decreased, accordin# to
4yquist"s rule. $n other words, if the time(scale plane needs to be sampled with a
samplin# rate of N_1 at scale s_1 , the same plane can be sampled with a samplin# rate of
N_2 , at scale s_2 , where, s_1 < s_2 (correspondin# to frequencies f1>f2 ) and N_2 <
N_1 . The actual relationship between N_1 and N_2 is
<quation 5.A0
or
<quation 5.A1
$n other words, at lower frequencies the samplin# rate can be decreased which will save a
considerable amount of computation time.
$t should be noted at this time, however, that the discreti/ation can be done in any way
without any restriction as far as the analysis of the si#nal is concerned. $f synthesis is not
required, even the 4yquist criteria does not need to be satisfied. The restrictions on the
discreti/ation and the samplin# rate become important if, and only if, the si#nal
reconstruction is desired. 4yquist"s samplin# rate is the minimum samplin# rate that
allows the ori#inal continuous time si#nal to be reconstructed from its discrete samples.
The basis vectors that are mentioned earlier are of particular importance for this reason.
>s mentioned earlier, the wavelet psi(tau,s) satisfyin# <quation 5.1D, allows
reconstruction of the si#nal by <quation 5.1C. owever, this is true for the continuous
transform. The question is7 can we still reconstruct the si#nal if we discreti/e the time and
scale parameters, The answer is PPyes"", under certain conditions (as they always say in
commercials7 certain restrictions apply ''').
The scale parameter s is discreti/ed first on a lo#arithmic #rid. The time parameter is then
discreti/ed with respect to the scale parameter , i.e., a different samplin# rate is used
for every scale. $n other words, the samplin# is done on the dyadic samplin# #rid shown
in +i#ure 5.1C 7
+i#ure 5.1C
Think of the area covered by the a&es as the entire time(scale plane. The 6WT assi#ns a
value to the continuum of points on this plane. Therefore, there are an infinite number of
6WT coefficients. +irst consider the discreti/ation of the scale a&is. >mon# that infinite
number of points, only a finite number are taken, usin# a lo#arithmic rule. The base of
the lo#arithm depends on the user. The most common value is 2 because of its
convenience. $f A is chosen, only the scales A, 8, D, 12, 5A, 28,...etc. are computed. $f the
value was 5, the scales 5, E, AC, D1, A85,...etc. would have been computed. The time a&is
is then discreti/ed accordin# to the discreti/ation of the scale a&is. )ince the discrete
scale chan#es by factors of 2 , the samplin# rate is reduced for the time a&is by a factor of
2 at every scale.
4ote that at the lowest scale (s@A), only 5A points of the time a&is are sampled (for the
particular case #iven in +i#ure 5.1C). >t the ne&t scale value, s@8, the samplin# rate of
time a&is is reduced by a factor of A since the scale is increased by a factor of A, and
therefore, only 12 samples are taken. >t the ne&t step, s@D and D samples are taken in
time, and so on.
>lthou#h it is called the time(scale plane, it is more accurate to call it the translation-
scale plane, because PPtime"" in the transform domain actually corresponds to the shiftin#
of the wavelet in time. +or the wavelet series, the actual time is still continuous.
)imilar to the relationship between continuous +ourier transform, +ourier series and the
discrete +ourier transform, there is a continuous wavelet transform, a semi(discrete
wavelet transform (also known as wavelet series) and a discrete wavelet transform.
<&pressin# the above discreti/ation procedure in mathematical terms, the scale
discreti/ation is s s_0^j , and translation discreti/ation is tau k.s_0^j.tau_0 where
s_0>1 and tau_0>0 . 4ote, how the translation discreti/ation is dependent on scale
discreti/ation with s_0 .
The continuous wavelet function
<quation 5.AA
<quation 5.A5
by insertin# s s_0^j , and tau k.s_0^j.tau_0 .
$f psi_(j,k)] constitutes an orthonormal basis, the wavelet series transform becomes
<quation 5.A8
or
<quation 5.A3
> wavelet series requires that psi_(j,k)] are either orthonormal, biortho#onal, or frame.
$f psi_(j,k)] are not orthonormal, <quation 5.A8 becomes
<quation 5.A2
where hat psi_j,k]^`(t)] , is either the dual biorthogonal basis or dual frame (4ote
that ` denotes the con%u#ate).
$f psi_(j,k) ] are orthonormal or biortho#onal, the transform will be non(redundant,
where as if they form a frame, the transform will be redundant. 9n the other hand, it is
much easier to find frames than it is to find orthonormal or biortho#onal bases.
The followin# analo#y may clear this concept. 6onsider the whole process as lookin# at a
particular ob%ect. The human eyes first determine the coarse view which depends on the
distance of the eyes to the ob%ect. This corresponds to ad%ustin# the scale parameter
s_0^(-j). When lookin# at a very close ob%ect, with #reat detail, j is ne#ative and lar#e
(low scale, hi#h frequency, analyses the detail in the si#nal). -ovin# the head (or eyes)
very slowly and with very small increments (of an#le, of distance, dependin# on the
ob%ect that is bein# viewed), corresponds to small values of tau k.s_0^j.tau_0 . 4ote
that when j is ne#ative and lar#e, it corresponds to small chan#es in time, tau , (hi#h
samplin# rate) and lar#e chan#es in s_0^-j (low scale, hi#h frequencies, where the
samplin# rate is hi#h). The scale parameter can be thou#ht of as ma#nification too.
ow low can the samplin# rate be and still allow reconstruction of the si#nal, This is the
main question to be answered to optimi/e the procedure. The most convenient value (in
terms of pro#rammin#) is found to be PPA"" for sR0 and .1. for tau. 9bviously, when the
samplin# rate is forced to be as low as possible, the number of available orthonormal
wavelets is also reduced.
The continuous wavelet transform e&les that were #iven in this chapter were actually
the wavelet series of the #iven si#nals. The parameters were chosen dependin# on the
si#nal. )ince the reconstruction was not needed, the samplin# rates were sometimes far
below the critical value where sR0 varied from A to 10, and tauR0 varied from A to D, for
different e&les.
This concludes Iart $$$ of this tutorial. $ hope you now have a basic understandin# of
what the wavelet transform is all about. There is one thin# left to be discussed however.
<ven thou#h the discreti/ed wavelet transform can be computed on a computer, this
computation may take anywhere from a couple seconds to couple hours dependin# on
your si#nal si/e and the resolution you want. >n ama/in#ly fast al#orithm is actually
available to compute the wavelet transform of a si#nal. The discrete wavelet transform
(:WT) is introduced in the final chapter of this tutorial, in Iart $V.
;et"s meet at the #rand finale, shall we,
THE WAVELET TUTORIAL
PART IV
by
ROBI POLIKAR
MULTIRESOLUTION ANALYSIS: THE
DISCRETE WAVELET TRANSFORM
Why is the Discrete Wavelet Transform Needed?
>lthou#h the discreti/ed continuous wavelet transform enables the computation of the
continuous wavelet transform by computers, it is not a true discrete transform. >s a
matter of fact, the wavelet series is simply a sampled version of the 6WT, and the
information it provides is hi#hly redundant as far as the reconstruction of the si#nal is
concerned. This redundancy, on the other hand, requires a si#nificant amount of
computation time and resources. The discrete wavelet transform (:WT), on the other
hand, provides sufficient information both for analysis and synthesis of the ori#inal
si#nal, with a si#nificant reduction in the computation time.
The :WT is considerably easier to implement when compared to the 6WT. The basic
concepts of the :WT will be introduced in this section alon# with its properties and the
al#orithms used to compute it. >s in the previous chapters, e&les are provided to aid
in the interpretation of the :WT.
THE DISCRETE WAVELET TRANSFORM (DWT)
The foundations of the :WT #o back to 1EC2 when 6roiser, <steban, and =aland devised
a technique to decompose discrete time si#nals. 6rochiere, Weber, and +lana#an did a
similar work on codin# of speech si#nals in the same year. They named their analysis
scheme as subband coding. $n 1ED5, Furt defined a technique very similar to subband
codin# and named it pyramidal coding which is also known as multiresolution analysis.
;ater in 1EDE, Vetterli and ;e =all made some improvements to the subband codin#
scheme, removin# the e&istin# redundancy in the pyramidal codin# scheme. )ubband
codin# is e&plained below. > detailed covera#e of the discrete wavelet transform and
theory of multiresolution analysis can be found in a number of articles and books that are
available on this topic, and it is beyond the scope of this tutorial.
The Subband Coding and The Multiresolution Analysis
The main idea is the same as it is in the 6WT. > time(scale representation of a di#ital
si#nal is obtained usin# di#ital filterin# techniques. ?ecall that the 6WT is a correlation
between a wavelet at different scales and the si#nal with the scale (or the frequency)
bein# used as a measure of similarity. The continuous wavelet transform was computed
by chan#in# the scale of the analysis window, shiftin# the window in time, multiplyin#
by the si#nal, and inte#ratin# over all times. $n the discrete case, filters of different cutoff
frequencies are used to analy/e the si#nal at different scales. The si#nal is passed throu#h
a series of hi#h pass filters to analy/e the hi#h frequencies, and it is passed throu#h a
series of low pass filters to analy/e the low frequencies.
The resolution of the si#nal, which is a measure of the amount of detail information in the
si#nal, is chan#ed by the filterin# operations, and the scale is chan#ed by upsamplin# and
downsamplin# (subsamplin#) operations. )ubsamplin# a si#nal corresponds to reducin#
the samplin# rate, or removin# some of the samples of the si#nal. +or e&le,
subsamplin# by two refers to droppin# every other sample of the si#nal. )ubsamplin# by
a factor n reduces the number of samples in the si#nal n times.
1psamplin# a si#nal corresponds to increasin# the samplin# rate of a si#nal by addin#
new samples to the si#nal. +or e&le, upsamplin# by two refers to addin# a new
sample, usually a /ero or an interpolated value, between every two samples of the si#nal.
1psamplin# a si#nal by a factor of n increases the number of samples in the si#nal by a
factor of n.
>lthou#h it is not the only possible choice, :WT coefficients are usually sampled from
the 6WT on a dyadic #rid, i.e., s
0
@ A and
0
@ 1, yieldin# s@A
%
and @k*A
%
, as described
in Iart 5. )ince the si#nal is a discrete time function, the terms function and sequence will
be used interchan#eably in the followin# discussion. This sequence will be denoted by
&TnU, where n is an inte#er.
The procedure starts with passin# this si#nal (sequence) throu#h a half band di#ital
lowpass filter with impulse response hTnU. +ilterin# a si#nal corresponds to the
mathematical operation of convolution of the si#nal with the impulse response of the
filter. The convolution operation in discrete time is defined as follows7
> half band lowpass filter removes all frequencies that are above half of the hi#hest
frequency in the si#nal. +or e&le, if a si#nal has a ma&imum of 1000 / component,
then half band lowpass filterin# removes all the frequencies above 300 /.
The unit of frequency is of particular importance at this time. $n discrete si#nals,
frequency is e&pressed in terms of radians. >ccordin#ly, the samplin# frequency of the
si#nal is equal to A radians in terms of radial frequency. Therefore, the hi#hest
frequency component that e&ists in a si#nal will be radians, if the si#nal is sampled at
4yquistWs rate (which is twice the ma&imum frequency that e&ists in the si#nal)! that is,
the 4yquistWs rate corresponds to rad0s in the discrete frequency domain. Therefore
usin# / is not appropriate for discrete si#nals. owever, / is used whenever it is
needed to clarify a discussion, since it is very common to think of frequency in terms of
/. $t should always be remembered that the unit of frequency for discrete time si#nals is
radians.
>fter passin# the si#nal throu#h a half band lowpass filter, half of the samples can be
eliminated accordin# to the 4yquistWs rule, since the si#nal now has a hi#hest frequency
of 0A radians instead of radians. )imply discardin# every other sample will subsample
the si#nal by two, and the si#nal will then have half the number of points. The scale of
the si#nal is now doubled. 4ote that the lowpass filterin# removes the hi#h frequency
information, but leaves the scale unchan#ed. 9nly the subsamplin# process chan#es the
scale. ?esolution, on the other hand, is related to the amount of information in the si#nal,
and therefore, it is affected by the filterin# operations. alf band lowpass filterin#
removes half of the frequencies, which can be interpreted as losin# half of the
information. Therefore, the resolution is halved after the filterin# operation. 4ote,
however, the subsamplin# operation after filterin# does not affect the resolution, since
removin# half of the spectral components from the si#nal makes half the number of
samples redundant anyway. alf the samples can be discarded without any loss of
information. $n summary, the lowpass filterin# halves the resolution, but leaves the scale
unchan#ed. The si#nal is then subsampled by A since half of the number of samples are
redundant. This doubles the scale.
This procedure can mathematically be e&pressed as
avin# said that, we now look how the :WT is actually computed7 The :WT analy/es
the si#nal at different frequency bands with different resolutions by decomposin# the
si#nal into a coarse appro&imation and detail information. :WT employs two sets of
functions, called scalin# functions and wavelet functions, which are associated with low
pass and hi#hpass filters, respectively. The decomposition of the si#nal into different
frequency bands is simply obtained by successive hi#hpass and lowpass filterin# of the
time domain si#nal. The ori#inal si#nal &TnU is first passed throu#h a halfband hi#hpass
filter #TnU and a lowpass filter hTnU. >fter the filterin#, half of the samples can be
eliminated accordin# to the 4yquistWs rule, since the si#nal now has a hi#hest frequency
of 0A radians instead of . The si#nal can therefore be subsampled by A, simply by
discardin# every other sample. This constitutes one level of decomposition and can
mathematically be e&pressed as follows7
where y
hi#h
TkU and y
low
TkU are the outputs of the hi#hpass and lowpass filters, respectively,
after subsamplin# by A.
This decomposition halves the time resolution since only half the number of samples now
characteri/es the entire si#nal. owever, this operation doubles the frequency resolution,
since the frequency band of the si#nal now spans only half the previous frequency band,
effectively reducin# the uncertainty in the frequency by half. The above procedure, which
is also known as the subband codin#, can be repeated for further decomposition. >t every
level, the filterin# and subsamplin# will result in half the number of samples (and hence
half the time resolution) and half the frequency band spanned (and hence double the
frequency resolution). +i#ure 8.1 illustrates this procedure, where &TnU is the ori#inal
si#nal to be decomposed, and hTnU and #TnU are lowpass and hi#hpass filters, respectively.
The bandwidth of the si#nal at every level is marked on the fi#ure as .f..
+i#ure 8.1. The )ubband 6odin# >l#orithm >s an e&le, suppose that the ori#inal
si#nal &TnU has 31A sample points, spannin# a frequency band of /ero to rad0s. >t the
first decomposition level, the si#nal is passed throu#h the hi#hpass and lowpass filters,
followed by subsamplin# by A. The output of the hi#hpass filter has A32 points (hence
half the time resolution), but it only spans the frequencies 0A to rad0s (hence double
the frequency resolution). These A32 samples constitute the first level of :WT
coefficients. The output of the lowpass filter also has A32 samples, but it spans the other
half of the frequency band, frequencies from 0 to 0A rad0s. This si#nal is then passed
throu#h the same lowpass and hi#hpass filters for further decomposition. The output of
the second lowpass filter followed by subsamplin# has 1AD samples spannin# a frequency
band of 0 to 08 rad0s, and the output of the second hi#hpass filter followed by
subsamplin# has 1AD samples spannin# a frequency band of 08 to 0A rad0s. The second
hi#hpass filtered si#nal constitutes the second level of :WT coefficients. This si#nal has
half the time resolution, but twice the frequency resolution of the first level si#nal. $n
other words, time resolution has decreased by a factor of 8, and frequency resolution has
increased by a factor of 8 compared to the ori#inal si#nal. The lowpass filter output is
then filtered once a#ain for further decomposition. This process continues until two
samples are left. +or this specific e&le there would be D levels of decomposition, each
havin# half the number of samples of the previous level. The :WT of the ori#inal si#nal
is then obtained by concatenatin# all coefficients startin# from the last level of
decomposition (remainin# two samples, in this case). The :WT will then have the same
number of coefficients as the ori#inal si#nal.
The frequencies that are most prominent in the ori#inal si#nal will appear as hi#h
amplitudes in that re#ion of the :WT si#nal that includes those particular frequencies.
The difference of this transform from the +ourier transform is that the time locali/ation of
these frequencies will not be lost. owever, the time locali/ation will have a resolution
that depends on which level they appear. $f the main information of the si#nal lies in the
hi#h frequencies, as happens most often, the time locali/ation of these frequencies will be
more precise, since they are characteri/ed by more number of samples. $f the main
information lies only at very low frequencies, the time locali/ation will not be very
precise, since few samples are used to e&press si#nal at these frequencies. This procedure
in effect offers a #ood time resolution at hi#h frequencies, and #ood frequency resolution
at low frequencies. -ost practical si#nals encountered are of this type.
The frequency bands that are not very prominent in the ori#inal si#nal will have very low
amplitudes, and that part of the :WT si#nal can be discarded without any ma%or loss of
information, allowin# data reduction. +i#ure 8.A illustrates an e&le of how :WT
si#nals look like and how data reduction is provided. +i#ure 8.Aa shows a typical 31A(
sample si#nal that is normali/ed to unit amplitude. The hori/ontal a&is is the number of
samples, whereas the vertical a&is is the normali/ed amplitude. +i#ure 8.Ab shows the D
level :WT of the si#nal in +i#ure 8.Aa. The last A32 samples in this si#nal correspond to
the hi#hest frequency band in the si#nal, the previous 1AD samples correspond to the
second hi#hest frequency band and so on. $t should be noted that only the first 28
samples, which correspond to lower frequencies of the analysis, carry relevant
information and the rest of this si#nal has virtually no information. Therefore, all but the
first 28 samples can be discarded without any loss of information. This is how :WT
provides a very effective data reduction scheme.
+i#ure 8.A <&le of a :WT
We will revisit this e&le, since it provides important insi#ht to how :WT should be
interpreted. Fefore that, however, we need to conclude our mathematical analysis of the
:WT.
9ne important property of the discrete wavelet transform is the relationship between the
impulse responses of the hi#hpass and lowpass filters. The hi#hpass and lowpass filters
are not independent of each other, and they are related by
where #TnU is the hi#hpass, hTnU is the lowpass filter, and ; is the filter len#th (in number
of points). 4ote that the two filters are odd inde& alternated reversed versions of each
other. ;owpass to hi#hpass conversion is provided by the ((1)
n
term. +ilters satisfyin# this
condition are commonly used in si#nal processin#, and they are known as the Guadrature
-irror +ilters (G-+). The two filterin# and subsamplin# operations can be e&pressed by
The reconstruction in this case is very easy since halfband filters form orthonormal bases.
The above procedure is followed in reverse order for the reconstruction. The si#nals at
every level are upsampled by two, passed throu#h the synthesis filters #WTnU, and hWTnU
(hi#hpass and lowpass, respectively), and then added. The interestin# point here is that
the analysis and synthesis filters are identical to each other, e&cept for a time reversal.
Therefore, the reconstruction formula becomes (for each layer)
owever, if the filters are not ideal halfband, then perfect reconstruction cannot be
achieved. >lthou#h it is not possible to reali/e ideal filters, under certain conditions it is
possible to find filters that provide perfect reconstruction. The most famous ones are the
ones developed by $n#rid :aubechies, and they are known as :aubechiesW wavelets.
4ote that due to successive subsamplin# by A, the si#nal len#th must be a power of A, or
at least a multiple of power of A, in order this scheme to be efficient. The len#th of the
si#nal determines the number of levels that the si#nal can be decomposed to. +or
e&le, if the si#nal len#th is 10A8, ten levels of decomposition are possible.
$nterpretin# the :WT coefficients can sometimes be rather difficult because the way
:WT coefficients are presented is rather peculiar. To make a real lon# story real short,
:WT coefficients of each level are concatenated, startin# with the last level. >n e&le
is in order to make this concept clear7
)uppose we have a A32(sample lon# si#nal sampled at 10 -X and we wish to obtain its
:WT coefficients. )ince the si#nal is sampled at 10 -/, the hi#hest frequency
component that e&ists in the si#nal is 3 -/. >t the first level, the si#nal is passed
throu#h the lowpass filter hTnU, and the hi#hpass filter #TnU, the outputs of which are
subsampled by two. The hi#hpass filter output is the first level :WT coefficients. There
are 1AD of them, and they represent the si#nal in the TA.3 3U -/ ran#e. These 1AD
samples are the last 1AD samples plotted. The lowpass filter output, which also has 1AD
samples, but spannin# the frequency band of T0 A.3U -/, are further decomposed by
passin# them throu#h the same hTnU and #TnU. The output of the second hi#hpass filter is
the level A :WT coefficients and these 28 samples precede the 1AD level 1 coefficients in
the plot. The output of the second lowpass filter is further decomposed, once a#ain by
passin# it throu#h the filters hTnU and #TnU. The output of the third hi#hpass filter is the
level 5 :WT coefficiets. These 5A samples precede the level A :WT coefficients in the
plot.
The procedure continues until only 1 :WT coefficient can be computed at level E. This
one coefficient is the first to be plotted in the :WT plot. This is followed by A level D
coefficients, 8 level C coefficients, D level 2 coefficients, 12 level 3 coefficients, 5A level
8 coefficients, 28 level 5 coefficients, 1AD level A coefficients and finally A32 level 1
coefficients. 4ote that less and less number of samples is used at lower frequencies,
therefore, the time resolution decreases as frequency decreases, but since the frequency
interval also decreases at low frequencies, the frequency resolution increases. 9bviously,
the first few coefficients would not carry a whole lot of information, simply due to
#reatly reduced time resolution. To illustrate this richly bi/arre :WT representation let us
take a look at a real world si#nal. 9ur ori#inal si#nal is a A32(sample lon# ultrasonic
si#nal, which was sampled at A3 -/. This si#nal was ori#inally #enerated by usin# a
A.A3 -/ transducer, therefore the main spectral component of the si#nal is at A.A3 -/.
The last 1AD samples correspond to T2.A3 1A.3U -/ ran#e. >s seen from the plot, no
information is available here, hence these samples can be discarded without any loss of
information. The precedin# 28 samples represent the si#nal in the T5.1A 2.A3U -/ ran#e,
which also does not carry any si#nificant information. The little #litches probably
correspond to the hi#h frequency noise in the si#nal. The precedin# 5A samples represent
the si#nal in the T1.3 5.1U -/ ran#e. >s you can see, the ma%ority of the si#nalWs ener#y
is focused in these 5A samples, as we e&pected to see. The previous 12 samples
correspond to T0.C3 1.3U -/ and the peaks that are seen at this level probably represent
the lower frequency envelope of the si#nal. The previous samples probably do not carry
any other si#nificant information. $t is safe to say that we can #et by with the 5
rd
and 8
th
level coefficients, that is we can represent this A32 sample lon# si#nal with 12B5A@8D
samples, a si#nificant data reduction which would make your computer quite happy.
9ne area that has benefited the most from this particular property of the wavelet
transforms is ima#e processin#. >s you may well know, ima#es, particularly hi#h(
resolution ima#es, claim a lot of disk space. >s a matter of fact, if this tutorial is takin# a
lon# time to download, that is mostly because of the ima#es. :WT can be used to reduce
the ima#e si/e without losin# much of the resolution. ere is how7
+or a #iven ima#e, you can compute the :WT of, say each row, and discard all values in
the :WT that are less then a certain threshold. We then save only those :WT
coefficients that are above the threshold for each row, and when we need to reconstruct
the ori#inal ima#e, we simply pad each row with as many /eros as the number of
discarded coefficients, and use the inverse :WT to reconstruct each row of the ori#inal
ima#e. We can also analy/e the ima#e at different frequency bands, and reconstruct the
ori#inal ima#e by usin# only the coefficients that are of a particular band. $ will try to put
sample ima#es hopefully soon, to illustrate this point.
>nother issue that is receivin# more and more attention is carryin# out the decomposition
(subband codin#) not only on the lowpass side but on both sides. $n other words, /oomin#
into both low and hi#h frequency bands of the si#nal separately. This can be visuali/ed as
havin# both sides of the tree structure of +i#ure 8.1. What result is what is known as the
wavelet packages. We will not discuss wavelet packa#es in this here, since it is beyond
the scope of this tutorial. >nyone who is interested in wavelet packa#es, or more
information on :WT can find this information in any of the numerous te&ts available in
the market.
>nd this concludes our mini series of wavelet tutorial. $f $ could be of any assistance to
anyone stru##lin# to understand the wavelets, $ would consider the time and the effort
that went into this tutorial well spent. $ would like to remind that this tutorial is neither a
complete nor a throu#h covera#e of the wavelet transforms. $t is merely an overview of
the concept of wavelets and it was intended to serve as a first reference for those who
find the available te&ts on wavelets rather complicated. There mi#ht be many structural
and0or technical mistakes, and $ would appreciate if you could point those out to me.
Hour feedback is of utmost importance for the success of this tutorial.
Thank you very much for your interest in The Wavelet Tutorial .
+i#ure A.15
Well, this should be of no surprise to anyone now, since we would e&pect a terrible (and $
mean absolutely terrible) time resolution.
These e&les should have illustrated the implicit problem of resolution of the )T+T.
>nyone who would like to use )T+T is faced with this problem of resolution. What kind
of a window to use, 4arrow windows #ive #ood time resolution, but poor frequency
resolution. Wide windows #ive #ood frequency resolution, but poor time resolution!
furthermore, wide windows may violate the condition of stationarity. The problem, of
course, is a result of choosin# a window function, once and for all, and use that window
in the entire analysis. The answer, of course, is application dependent7 $f the frequency
components are well separated from each other in the ori#inal si#nal, than we may
sacrifice some frequency resolution and #o for #ood time resolution, since the spectral
components are already well separated from each other. owever, if this is not the case,
then a #ood window function, could be more difficult than findin# a #ood stock to invest
in.
Fy now, you should have reali/ed how wavelet transform comes into play. The Wavelet
transform (WT) solves the dilemma of resolution to a certain e&tent, as we will see in the
ne&t part.
This completes Iart $$ of this tutorial. The continuous wavelet transform is the sub%ect of
the Iart $$$ of this tutorial. $f you did not have much trouble in comin# this far, and what
have been written above make sense to you, you are now ready to take the ultimate
challen#e in understandin# the basic concepts of the wavelet theory.