Digital Image Warping:: Massachusetts Institute of Technology
Digital Image Warping:: Massachusetts Institute of Technology
Digital Image Warping:: Massachusetts Institute of Technology
Author ........................................
Department ot Electrical Engmeerntg anau , mputer Science
May 17, 1996
Certified by .....................
George Verghese
Thesis Supervisor
Accepted by
:. Morgenthaler
,raduate Theses
JUN 1 1 1996
LIBRARIES
Digital Image Warping:
Theory and Real Time Hardware Implementation Issues
by
Submitted to the
Department of Electrical Engineering and Computer Science
Abstract
This thesis begins with a broad introduction to digital image warping. It includes back-
ground on spatial transformations and basic signal processing theory as they are used in
the warping application. Next, an analysis of various interpolation kernels in the time
domain is performed. The results from this study lead directly to a new approach to per-
forming a generalized high-quality image warp that maps well to an efficient hardware
implementation. Finally, the design of a hardware system capable of performing image
warps in real-time is described.
First I would like to thank my 6-A internship company, the David Sarnoff Research Center
for sponsoring my work. In particular, I would like to recognize Gooitzen van der Wal and
Peter Burt for finding me such an interesting project, supervising my work, and always
having the time to listen and help.
Thanks to Professor George Verghese for being my MIT thesis supervisor and providing
great feedback throughout the process.
Thanks to my family for always being so supportive (and keeping me well supplied with
candy and cookies.)
Thanks to my brother Dan for always using my car so I had nothing else to do but work on
my thesis.
Thanks to all my friends at MIT for providing me with plenty of opportunities to relax and
take my mind off of thesis.
Finally, thanks to Benjie Chen for providing me with disk space to store all those warped
images.
Table of Contents
1 Introduction ................................................................................................................ 13
1.1 Image W arping................................................ ........................................... 13
1.2 Thesis Overview ........................................................... 14
2 Spatial Transform ations ............................................................ ........................... 17
2.1 D efinitions........................................................................................................ 17
2.2 Polynomial Transformations....................................... ...................... 18
2.3 Perspective Transformation .................................................................. 19
2.4 Flow-Field Transformation .................................................... 20
2.5 Summary Remarks ..................................................... ........................ 20
3 Resampling Theory for Image Warping .............................................. 23
3.1 Sampling of Continuous-Time Signals ....................................... .... 23
3.2 Resampling Through Interpolation in Digital Domain............................... 28
3.3 Summary Remarks..................................................... ........................ 33
4 Analysis of Interpolation Methods for Image Resampling............... ... 35
4.1 D escription ...................................................................................................... 35
4.2 Expected Error Calculation............................................ .................... 36
4.3 Sampling Rate Issues ........................................................ 41
5 High Quality, Efficient Image Warping............................................45
5.1 General Description of Method. ................................................................... 45
5.2 Comparison with Standard Techniques ...................................... ..... 53
5.3 Optimal Upsampling Kernel Design................................... ....... 61
5.4 Final R em arks ....................................................................... ..................... 73
6 H ardw are D esign ......................................................................... ........................ 75
6.1 B ackground ......................................................................... ....................... 75
6.2 Functionality Requirements .................................................... 75
6.3 C om ponents ........................................................................ ....................... 75
6.4 High-Level Design ....................................................... ..................... 77
6.5 Warp Board Functional Description ........................................ ..... 77
6.6 Warp Control Block..................................................... ...................... 78
6.7 Address Generation Block .................................................... 82
6.8 Warp Memory Block .................................................... .................... 83
6.9 Coefficient Look-Up Table .................................................. 84
6.10 Sum of Products Interpolator ................................................... 85
6.11 C onclusion ......................................................................... ........................ 86
7 C onclusions........................................................ .................................................. 87
7.1 Summary of Work..................................................... ........................ 87
7.2 Further Research ......................................................... ....................... 87
Appendix A Forward Difference Method .................................................................. 89
B ibliography .................................................................................. ......................... 93
List of Figures
Figure 2.1: General inverse mapping ............................................ ......... 17
Figure 3.1: Continuous signal to discrete signal ........................................ ..... 24
Figure 3.2: Continuous signal to discrete signal in the frequency domain .................. 27
Figure 3.3: Resampling at an arbitrary location........................................................... 30
Figure 3.4: Upsampling by U in frequency domain. ..................................... ... 32
Figure 3.5: Downsampling, with and without aliasing ........................................ 33
Figure 4.1: Reconstruction error illustration........................................ 37
Figure 4.2: Interpolation kernels and their frequency responses ................................. 38
Figure 4.3: Expected error vs. frequency for sampling rate of 2000Hz. ..................... 39
Figure 4.4: Expected error vs. frequency for sampling rate of 2000Hz. ..................... 40
Figure 4.5: Sampling rate multiplier (linear vs. cubic convolution) ............................ 42
Figure 4.6: Sampling rate multiplier (nearest neighbor vs. linear) .............................. 42
Figure 5.1: High quality warp system ...................................... ............. 46
Figure 5.2: Upsampling stage .......................................................... 48
Figure 5.3: Frequency domain comparison. ......................................... ...... 49
Figure 5.4: Downsampling as part of image warp ...................................... ...... 50
Figure 5.5: Original and bilinear rotation images................................. ...... 56
Figure 5.6: Keys-3 and Keys-4 rotation images. ....................................... ..... 56
Figure 5.7: Up2(Keys-3) and Up2(Keys-4) rotation images ...................................... 57
Figure 5.8: Diamond cycle RMS error. .................................................. 58
Figure 5.9: Diamond cycle RMS error. ........................................... ....... 59
Figure 5.10: Original and bilinear after 12 diamond cycles. ..................................... 60
Figure 5.11: Keys-3 and Keys-4 after 12 diamond cycles .......................................... 60
Figure 5.12: Up2(Keys-3) and Up2(Key-4) after 12 diamond cycles ........................... 61
Figure 5.13: General 4 tap upsampling filter ........................................ ...... 63
Figure 5.14: Frequency response of various 4-tap filters. .............................................. 63
Figure 5.15: Frequency response of overall system for integer and half-pixel translation us-
ing the Lohmeyer kernel for upsampling ...................................................................... 65
Figure 5.16: Frequency response of overall system for integer and half-pixel translations
using Pyramid kernel for upsampling ............................................ ........ 66
Figure 5.17: Up2(Loh) and Up2(Pyr) images after 16 rotations. ................................... 67
Figure 5.18: RMS error for diamond test..............................................68
Figure 5.19: RMS error for diamond test..............................................68
Figure 5.20: Up2(Loh) and Up2(Pyr) for diamond test ............................................... 69
Figure 5.21: RMS error for random translation test. ...................................... ... 70
Figure 5.22: RMS error for random translation ........................................ ..... 70
Figure 5.23: Original and bilinear image .................................................................... 71
Figure 5.24: Keys-3 and Keys-4 images ................................................. 72
Figure 5.25: Up2(Keys-3) and Up2(Keys-4) images...................................................72
Figure 5.26: Up2(Loh) and Up2(Pyr) images .............................................................. 73
Figure 6.1: Warp module block diagram. .......................................... ....... 76
Figure 6.2: Image data timing ...................................... ....... .................... 80
Figure 6.3: Image storage pattern. ...................................................... 84
List of Tables
Table 5.1: Warping Method Descriptions 54
Table 5.2; RMS Error for Rotation 55
Table 5.3: RMS Error for Rotation 66
Table 6.1: Warp Board Inputs 77
Table 6.2: Warp Board Outputs 77
Table 6.3: Control Registers 79
Table 6.4: Write Process 80
Table 6.5: Read Process 82
Table A.1: Increment Table 89
Chapter 1
Introduction
1.1 Image Warping
Digital image warping is an important and expanding branch of the field of digital image
image. Simple examples of warps include translation, rotation, and scale change. More
Image warping was first used extensively for geometric correction in remote sensing
applications. In the past twenty years, the field has grown to encompass a broad range of
applications. These include medical imaging, image stabilization, machine vision, image
synthesis and special effects generation. Every application has different speed, accuracy,
and flexibility requirements. For example, the warping for movie special effects has to
result in high-quality video, but the frames can be processed off-line and then pasted
together into a sequence afterwards. On the other hand, in certain machine vision applica-
tions, the warping must occur in real-time, but the quality of the resultant warped images
is not as critical.
In recent years, with the advances in digital logic and memory, it has become possible
is required. Moving target detection from a moving platform is one example. In this case,
it is not adequate to simply detect motion by frame differencing because the motion due to
the target will be lost in the motion from the moving camera. Instead, if the frames are first
warped to remove the motion due to the camera, and then differenced, the only motion left
will be due to the target. This motion can then be detected more readily. The desire to per-
form image warping in real-time has lead to an increased interest in algorithms that map
There are two basic components to an image warp: spatial transformation and resam-
between each point in the input image and a corresponding point in the output image.
There are many possible models for this geometric relationship. Typically, the spatial
transformation will require the value of points in the source image that do not correspond
to the integral sample positions. This is where the second component of image warping,
namely age resampling through interpolation, becomes important. Some sort of interpola-
tion is required to estimate the value of the underlying continuous source image at loca-
tions that don't correspond to the sample points. Together, these two components
transformations and basic signal processing theory are introduced to the extent required in
the warping application. Next, an analysis of various interpolation kernels in the time
domain is performed. The results from this study lead directly to an interesting approach
to performing a generalized high-quality image warp that maps well to an efficient hard-
ware implementation. Finally, the thesis describes the design of a hardware system capa-
Chapter 2 gives background on spatial transformations. There are many ways to spec-
ify the relationship between points in the source and target images. Different representa-
tions allow for varying degrees of freedom. The price for greater freedom is often higher
some detail because they are each important for certain real-time machine vision tasks.
These are: the polynomial transformation, the perspective transformation, and the flow-
field transformation.
Chapter 3 gives the theoretical framework for image resampling. First it discusses
reconstructed from its samples with a low-pass filter and then resampled to form a new
discrete sequence. Finally, this chapter reviews aliasing within the image warping frame-
describes some candidate interpolation kernels for resampling. Then it analyzes their per-
formance in the time domain. For each interpolation method, and a specified sampling
rate, the expected error between an original and reconstructed sinusoid of a certain fre-
quency is calculated. The error is then computed and graphed for a wide range of frequen-
cies to determine how the interpolation method performs as a function of the frequency of
the original signal. Next, the sampling rate is allowed to vary, while the frequency of the
analog sinusoid is fixed. In this way, it is possible to determine how much the sampling
rate must increase for an otherwise poor interpolation method to perform adequately. The
which may have numerous advantages over some previous approaches. This method is
analyzed in Chapter 5.
digital image warping through upsampling followed by warping and downsampling using
a smaller kernel for interpolation. This approach makes it particularly suitable for a low-
Finally, based on the test results and knowledge of the filter used in the warping stage, an
Chapter 6 describes the hardware design of a specific warper implementation for use
in a real-time image processing system. This includes a block diagram, summary of com-
ponents, and board level design description. The chapter also analyzes the engineering
Chapter 7 is a summary of the work completed, and a discussion of topics for further
research.
Chapter 2
Spatial Transformations
2.1 Definitions
In a digital image warp, the spatial transformation defines a geometric relationship
between each point in the input or source image and its corresponding point in the output
or target image [13]. This relationship between the points can be specified as either a for-
ward or inverse mapping. If [u,v] refers to the input image coordinates and [x,y] refers to
the output image coordinates, then a forward mapping gives the locations [x,y] as some
function of [u,v]. Likewise, an inverse mapping gives [u,v] as a function of [x,y]. This is
depicted in Figure 2.1, where U and V are the inverse mapping functions. For most hard-
The inverse mapping specifies reference locations in the source image as a function of
u x
V y
0
Source Image Target Image
each integer position the reference location in the source image is calculated. Because the
inverse mapping function can be arbitrary, the referenced source location may not have
integer coordinates. Therefore, some sort of interpolation is required to give the input
value at nonintegral input locations. This will be discussed later. The second difficulty
with the inverse mapping is that pixels in the source image may be left unaccounted for in
the target image. If the input is not appropriately bandlimited prior to this sort of a map-
ping, aliasing can occur. Methods of avoiding this aliasing will also be discussed later.
In this chapter, three general categories of inverse mappings will be presented. These
are: polynomial, perspective, and flow-field transformations. The selection of the most
appropriate model depends upon both the application and the implementation require-
N N
u = _ _a(i,j)x i
i = Oj = 0
N N
v = I b (i,j)x 'iy
i = Oj = 0
where a(i,j) and b(i,j) are constant coefficients. This transformation originated in the
remote sensing field, where it was used to correct both internal sensor distortions and
Mosaic construction is one example [5]. Typical hardware implementations use the for-
ward differencing method to compute [u,v]. This is discussed in Appendix A.
One specific polynomial transformation that is commonly used is the affine transfor-
mation, specified as
u = a+bx+cy
v = d+ex+fy
This transformation allows for image translation, rotation, scale, and shear. It has six
degrees of freedom, given by the six arbitrary constants. Therefore, only three correspond-
ing points in a source and target image are needed to infer the mapping. The affine trans-
formation can map a triangle to any other triangle. It can also map a rectangle to a
using the forward differencing method. Using the forward differencing method to calcu-
late the source image addresses requires six registers to hold the values of the constant
coefficients, four accumulators to store and update the current reference address, and four
Iff and g are taken to be zero, then the perspective transformation reduces to the affine
transformation. Iff and g are not zero, then it is a perspective or projective mapping. The
perspective mapping preserves lines in all orientations. The perspective mapping has eight
ment in hardware as the affine transformation. The numerator and denominator for each
[u,v] value can be computed using the forward differencing method, just as in the affine
case. Then a division is needed. This could be performed in a specialized division chip,
which may introduce a fairly long pipeline delay. Or, it could be done in a large look-up
table. Depending on the required accuracy of the result, both of these methods may be pro-
hibitively expensive in terms of cost or space. The perspective transformation can also be
any other spatial mapping. It can be viewed simply as a table that gives an input location
to reference for each output location. If these flow-field values are given to the warper
module from other modules, the flow-field adds very little complexity to the warper mod-
ule. The trade-off, of course, is that significant resources may be required elsewhere to
those transformations that are being considered for real-time hardware implementation at
the David Sarnoff Research Center. A more detailed description of these and other geo-
metric transformations can be found in [13]. A related problem is the estimation of the
constants of the warp equation, given a set of constraints. For example, in performing
image stabilization, interframe motion must be accurately estimated. It has been shown
way to perform this estimation. During each iteration, an optical flow-field is estimated
between the images through local cross-correlation analysis, then a motion model is fit to
the flow-field using weighted least-squares regression. The estimated transform is then
used to warp the previous image to the current image and the process is repeated between
them, [5]. This and other methods of estimating the warping parameters will not be dis-
pixel values based on a mathematical formula. This can range from something as simple
physics. If such a mathematical model is available, it is appropriate to warp the image sim-
ply by evaluating the formula at different points. In the case of sampled images, however,
no such formula is available. Usually, all that is available is the actual samples. To under-
stand what can be done under these conditions, one must understand the digital image as a
representation of a continuous image in both the spatial and frequency domains. First, the
sampling the continuous signal. If the sampling rate is T, this relationship is expressed as:
f[n] = fc (nT)
(3.1)
To derive the relationship between the Fourier transforms off[n] andfc(x), it is helpful
to analyze this conversion in two steps. In the first step, the continuous signal is sampled
by multiplication with an impulse train of period T. The second step is to then convert this
impulse train into a discrete sequence. This process is shown in the spatial domain in Fig-
ure 3.1.
fc(x) S(X) fs(X) f[n]
x 0 T 2T 3T4T5T X
S.LILLLUI
0 T 2T 3T 4T
~
5T x
~
0 1
Iv7V
2 3
s(x) = B ((x-nT)
n = -oo (3.2)
Then
Since fs(x) is the product offc(x) and s(x), the Fourier transform offs(x) is the convolution
sO(jn) =
k = -00
(3.6)
where Qs = 21i/T, which is the sampling frequency. Doing the convolution then gives:
F s (j) = Fc (j - kjs) (3.7)
Equation 3.7 gives the frequency-domain relationship between the original analog sig-
nal and the impulse-sampled analog signal. The Fourier transform of the impulse sampled
signal comprises copies of the Fourier transform of the original signal superimposed at
integer multiples of the sampling frequency. The higher the sampling rate, the further
these copies are spread apart. From this analysis, one can see the origins of the Nyquist
a high enough rate that the copies of the Fourier transform of the original signal do not
overlap in the Fourier transform of the sampled signal. If there is no overlap, then the orig-
inal signal can be exactly reconstructed with an ideal lowpass filter with the correct gain.
An ideal lowpass filter will reject all copies except the one centered at the origin, giving
the original signal back. For there to be no overlap, the sampling rate must be at least
Us > 2Mn
(3.8)
If there is overlap in the frequency domain, the signal can no longer be exactly recon-
structed from its samples. The error that occurs due to this overlap is referred to as alias-
ing.
The second step in the continuous to discrete process is the conversion from the
impulse-sampled signal in the continuous domain to a discrete sequence. The values of the
discrete time sequence at n = 0, 1, 2,... are the areas of the impulses at 0, Us, 2Ms,... But
First the results will be derived mathematically. Taking the Fourier transform of equa-
f [n] fc (nT)
(3.10)
and
1 oo
F(ejQT) = 1 - Fc (jQ-jkQs) (3.13)
k = -oo
1 .co .2·rk •
F (eJ(O) = Fc co 2-) (3.14)
k = -0W
From these equations it can be seen that the conversion from the continuous domain
the frequency domain. The frequency scaling is given by cow=T. This scaling normalizes
the continuous frequency 9=Qs to o-=2t for the Fourier transform. This makes some intu-
itive sense as well. In the spatial domain, the impulses are spaced T apart, while the dis-
crete sequence values are spaced apart by one. Thus, the spatial axis is normalized by a
factor of T. In the frequency domain then, the axis is normalized by a factor of 1/T. [10]
A graphical representation is quite useful in understanding the relationship between all
FP/ti 01
S(jQ)
i
i • I • •
Fs(jQ)
-ON ON Qs Q
F(ej )
1/T
-UNT QNT
in the spatial and frequency domain. In the spatial domain, the values of the discrete
sequence at n = 0, 1, 2, etc. are the values of the continuous-time signal at the sample
points, 0, T, 2T, etc. In the frequency domain, the Fourier transform of the discrete
sequence can be produced by normalizing the frequency axis of the continuous signal by a
factor of 1/T, and then replicating this at all integer multiples of 27t. The two-dimensional
case can be analyzed using the same methods as in the one-dimensional case here. The
only difference is now one has to worry about frequencies in two dimensions.
For the purposes of this thesis, it will be assumed that no appreciable aliasing occurs in
the continuous to discrete sampling of an image. This is indeed the case with most of the
relationships between the underlying analog image and its samples is still of extreme
importance for warping. In this situation, a discrete image is all that is given. The values
of the underlying continuous image are only known at integer locations in this discrete
probable that the values of the continuous image at other positions than the original sam-
ple points will be needed. This is the fundamental resampling problem encountered in dig-
to another. In the case of image warping, these two coordinate systems are related to each
other by the spatial transformation that defines the warp. Conceptually, resampling can be
divided into two steps. First, interpolate the discrete image into a continuous one. Second,
sample this continuous image at the desired locations to form the resampled image. In
practice, these two steps are often consolidated so the interpolated values are only calcu-
lated for those locations that will be sampled. This makes it possible to implement the res-
ampling procedure entirely in the digital domain. This section discusses the spatial and
frequency-domain interpretations of the resampling process. It also examines how this res-
ampling can be accomplished in digital hardware for an arbitrary geometric transforma-
tion function.
Conceptually, the first step in the resampling process is reconstruction of the bandlim-
ited analog signal from its samples. In Section 3.1, it was shown that if the original signal
after modulation by an impulse train was appropriately low-pass filtered, it would return
the original analog signal. This impulse train, fs(t), can be constructed from the discrete
samples as follows:
Then if Hr(jQ) is the frequency response of the low-pass filter, and h,(x) is its impulse
The perfect Hr(jI) is an ideal low-pass filter with cut-off frequency of R/T and a gain
sin( mc)
hr(x) = (3.17)
7Cx
T
So, perfect reconstruction comes from convolving the samples with a sinc function.
One important property of this sinc function is that it has a value of unity at x=O, and a
value of zero at all other integer multiples of T. This means that the reconstructed signal
has the same values at the sample points as the original continuous signal.
After the continuous signal is reconstructed, it can be resampled at different positions
to form the resampled sequence. Of course, in real implementations, this is not how the
resampling is performed. One approach to performing this resampling entirely in the digi-
In the typical resampling problem, the samples at integer locations are given. From
these, we wish to determine the value of the continuous sequence at other locations. This
f rn f c(x) "
It is obvious at this point that the sinc function can not possibly be used for reconstruc-
tion, as it is infinitely long. Instead, a finite length interpolation function must be used so
that the convolution can be performed. Of course, this means that it will not correspond to
a perfect low-pass filter, which will result in some error. The trade-offs involved in picking
an appropriate interpolation method are discussed in greater detail in Chapter 4. For prac-
approach, a kernel with a width of four will be used. This interpolation kernel will be
referred to as hr4(x).
Call the spatial location we are interested in loc. This can be written as the sum of a
was carried out using hr4(X) as the kernel, the value of the function at loc would be:
2
fr (loc) = C f[k +intloc] x hr 4 (fracloc-k) (3.19)
k= -1
This approach lends itself well to a hardware implementation. A look-up table can be
used to store the values of the interpolation kernel. The integer portion of the referenced
source location is used to determine which sample points are accessed, and the fractional
part of the source location determines the weights of the coefficients as generated by the
look-up table. The final value at the referenced location is given by a sum of products of
the sample points and the weighting coefficients. This approach can easily be used in two
dimensions for image warping resampling. For example, if the one dimensional interpola-
tion kernel is of width four, then this method will require a pixel area of 4x4, with sixteen
weighting coefficients.
Next, it will be helpful to examine the effects that upsampling and downsampling have
in the frequency domain. In general, the geometric transform of an image warp may
require upsampling in some areas of the image and downsampling in others. Or, it may
ematical form. Therefore, it will be useful to analyze the simpler cases of regular upsam-
pling and downsampling to gain intuition into the frequency-domain effects of the
that the interpolation uses an ideal low-pass filter. If this is the case, then resampling in the
digital domain corresponds exactly to sampling the original signal at different locations.
Upsampling refers to an increase in the sampling rate. For example, upsampling by
two corresponds to doubling the sampling rate of the continuous signal. From the analysis
in Section 3.1, it can easily be seen that upsampling corresponds to a compression of the
Fourier transform. If the signal in upsampled by a factor of U, then the Fourier transform
IHl(eJ0)l IH2(ej0 )
Upsampling
-2c -1_
C 27 -2n -itU iM/U 2n
Of course, the relationship above holds only if the low-pass filter used for interpolation is
Downsampling refers to a decrease in the sampling rate. If the signal is not appropri-
ately low-pass filtered before the downsampling occurs, aliasing can result. If the signal is
downsampled by a factor of D, then the Fourier transform of the signal is expanded in fre-
quency by that same factor. This can lead to an overlap in the frequency domain, which is
the cause of the aliasing. The problem is identical to that of sampling the original continu-
ous signal below the Nyquist rate. This aliasing problem can be avoiding by pre-filtering
the signal with a discrete-time low-pass filter to remove the frequencies that would over-
-27 Wn 2
2;
rt
The geometric transformations rarely requite shrinking the image to a size where aliasing
might result. Also, the frequency composition of typical images rarely contains much
energy in the higher frequencies that would overlap. For example, if the image contains no
energy in radian frequencies higher than 7c/2, then the image can be downsampled by a
factor of two in both directions, and no aliasing will occur. Also, if the geometric transfor-
mation is known beforehand, then the image can be preprocessed with a low-pass filter to
remove the frequencies that would cause problems. Methods of performing this pre-filter-
The usual culprit in poor quality image warps comes not from aliasing, but from the
method used for interpolation prior to resampling. Chapter 4 will analyze the performance
structed from its samples using practical interpolators. If the original continuous image is
bandlimited and sampled above the Nyquist frequency, sampling theory shows it can be
exactly recovered with a perfect low-pass filter. In the spatial domain, this corresponds to
convolving with a two dimensional sinc function. This is not a practical option, as the sinc
function is infinitely long and falls off only as the reciprocal of distance. Instead, practical
implementations use an interpolation kernel of finite support. In general, the larger the
support of the interpolation kernel is allowed to be, the more accurately its frequency
response can approximate the ideal. However, longer interpolation kernels require greater
tation versus accuracy of reconstruction. Much work has been done to identify methods of
interpolation that strike an appropriate balance between these two competing require-
ments.
There are numerous papers that compare various interpolation methods for their accu-
racy of reconstruction [8]. Two general approaches to evaluation are usually taken. The
first is simply to resample test images using different interpolation methods and visually
assess the quality of the resultant image. The second, more analytical method is to base the
The first method is very straightforward, and appropriate for applications where dis-
play for human viewing is the ultimate goal. It has the natural advantage that it includes
the effects of human perception. However, it is somewhat ad hoc; the error is not quanti-
fied. For comparison of interpolation methods, all that can really be said is that one
"looks" better that the other. What may be reasonable for viewing may not be adequate if
The second method, on the other hand, is quantitative. Analysis in the frequency
domain is quite powerful. Given the frequency spectrum of an image, the sampling rate,
and the interpolation method, one can easily determine how well the method approximates
the ideal. This is the approach usually taken, and this sort of analysis can be found in most
level, it will attempt to quantify the error incurred by different interpolation methods in the
spatial domain in a way that is appropriate to the image warping application. To facilitate
analysis, this will be done in one dimension. First, take a continuous sinusoid of frequency
f Hz, actual(t) = sin (27tft). Next, sample this wave at 1/T Hz to produce,
samples [n] = sin( 274). Finally, using these samples and the selected interpolation
method, reconstruct the continuous sinewave. The reconstruction error is then calculated
as follows:
b
ReconstructionError =
ReconstructionError lactual (t) - interpolated(t) Idt
J (b - a)
a
Reconstruction Error Calculation (4.1)
This can be interpreted graphically as the area between the two functions divided by the
length over which the area is being computed. Figure 4.1 shows this graphical interpreta-
tion.
Figure 4.1: Reconstruction error illustration.
The length of integration is taken to be one complete period of the sinewave. The
reconstruction error is calculated and averaged for all possible phase shifts of the sample
points relative to the sinewave to give the expected error. This gives the average vertical
separation between the two functions, and because it is computed for all phase shifts,
A few different interpolation kernels will be examined in this manner. Figure 4.2
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
Linear:
h (x) IH(f)
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
-4 -3 -2 -1 0 1 2 3 4
I .1
Hamming Windowed Sinc: I 1
IH(f)l
.75
nc)*(.54 + .46cos(nx/2)) 0<1xI<2
h(x) ==
L0inc(x) 2<1x1
.25
201101 2 3. 4
-4 -3 -2 -1 0 1 2 3 4
I I
Lanczos Windowed Sinc:
h (x) IH(f)I
.I...
i .....
i . . .
.15...!...... ..
. .. . .............
...... .-.............-
.......
..... i.....
. ........... ..
....... ......-.......
.75 .75 ...
75.
( nx )* in (i x
i
2s rx/w)/ 0<IxI<2
h(x) = 0 nc
2<Ixl 25
o .25 ....... .. ...
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
0.8
0.5
,L 0.4
S0.3
0.2
0.1
0-
0 - -. ---------
:7
-0.1
0 100 200 300 400 500 600 700 800 900 1000
Frequency of sinusoid in Hz
Figure 4.3: Expected error vs. frequency for sampling rate of 2000Hz.
Figure 4.3 shows the expected error for three basic interpolation methods. These are
nearest neighbor interpolation, linear interpolation, and third order cubic convolution. The
sampling rate is fixed at 2000Hz in this experiment. The horizontal axis is the frequency
beyond 1000Hz, then the signal is no longer Nyquist sampled and the expected error value
will include the affects of aliasing, which is not meant to be measured. The vertical axis is
tion is the next best, followed by cubic convolution interpolation. There is a marked dif-
ference between all three methods. This is directly linked to the fact that nearest neighbor
uses one sample point for interpolation, and linear and cubic use two and four sample
points, respectively. This basic result is also confirmed by frequency domain interpreta-
tion.
Lil
In
30
Frequency of sinusoid in Hz
Figure 4.4: Expected error vs. frequency for sampling rate of 2000Hz.
Figure 4.4 shows expected error for two other interpolation methods. Cubic convolu-
tion is included as a reference to Figure 4.3. The Lanczos windowed sinc kernel is the best
performer for the higher frequencies, but cubic convolution is better at lower frequencies.
Overall though, their performance is very similar because they all use four sample points.
In hardware, if the interpolation kernel values are precalculated, then all methods that use
a four-by-four interpolation kernel should produce nearly identical results, given a fixed
sampling rate.
In the preceding calculations of expected error, the sampling rate was fixed, and the
expected error was found as a function of frequency of the underlying sinusoid. An equiv-
alent analysis could be made by fixing the frequency of the sinusoid and varying the sam-
pling rate, since the expected error will be the same for two sets of sampling rates and
expected error would be identical for a 200Hz sinusoid sampled at 1000Hz as for a 400Hz
give superior results if we oversample the analog signal. For example, could linear inter-
polation with an oversampled signal give less error than cubic convolution interpolation
with a Nyquist sampled signal? The answer is clearly, yes. But, how much oversampling is
needed? The following analysis will attempt to quantify these things through analysis in
Figure 4.5 shows the "sampling rate multiplier" for linear interpolation versus cubic
follows. If we sample a sinusoid of a given frequency, the expected error under a cubic
convolution interpolation scheme is the same as the expected error using a linear interpo-
lation scheme when the sampling rate for the linear scheme exceeds that of the cubic
scheme by the sampling rate multiplier for that particular expected error. If a large
expected error is tolerable, then the sampling rate multiplier is quite small. However, as
the desired expected error approaches zero, the multiplier grows rapidly.
6
Expected Error
Figure 4.6 shows the same sort of information, except here the vertical axis is the sam-
pling rate multiplier for nearest neighbor interpolation versus linear interpolation.
42
It has the same basic shape as Figure 4.5 except that the multiplier is considerably
larger for any given expected error. This is because nearest neighbor interpolation is such
a poor overall performer. Because the multiplier is so large and results at least as good as
linear interpolation on a Nyquist sampled image are desired, this case will not be analyzed
further.
There are many interesting implications that develop from Figure 4.5. Specifically,
note that for a tolerable error of .02, doubling the sampling rate and using a linear interpo-
lation scheme is comparable to cubic convolution. This leads to many interesting possibil-
as bilinear interpolation) for resampling on a double density sampled image. The interpo-
lation kernel size would be decreased from 4x4 to 2x2. This leads to a reduction in the cost
and number of components needed. The penalty in this approach is incurred in increasing
the sampling rate by a factor of two. There are two approaches to this. First, the analog
image could be sampled at twice the previous rate. The prohibitive disadvantage of this
approach is that the warper can no longer be a modular part of a complete vision system,
since it requires a different digitizer and image size for input. The second approach, which
is the one that will be examined, is to include a digital upsampler in the warper. This takes
the Nyquist sampled digital image and upsamples it by a factor of two in both directions.
Then this image is warped, using bilinear interpolation for resampling. This is a specific
example of a more general approach to performing high quality image warping efficiently
in hardware. The general approach is to upsample using a good filter, next warp using a
lower quality filter, finally downsample to give the final image. This approach will be
examined in detail.
Chapter 5
pixels around the referenced input location is used to compute the output pixel value. The
larger the number of pixels used, the more accurate the resampling can be. However, as
the number of pixels used increases, so does the cost and complexity of the hardware
implementation. The method described in this chapter deals primarily with a way to
One standard approach to performing the interpolation required for the resampling
uses a 2x2 neighborhood of pixels around the referenced address in the source image to
calculate each output pixel value. This is commonly called bilinear interpolation, which
simply refers to linear interpolation in two dimensions. Chapter 6 describes such a hard-
ware design in detail. For a real-time implementation with reasonable clock rates, this
means that every clock cycle, four pixels values must be accessed simultaneously. These
pixel values are then multiplied by the appropriate weights and summed to give the output
pixel value. In hardware, this corresponds to four separate memory banks that can be
accessed in parallel. Also, the four weighting coefficients must be generated, based on the
sub-pixel location of the reference point. Finally, a four-term sum of products must be per-
formed. The difficulty with this model comes when bilinear interpolation is no longer ade-
quate. In applications that require repeated warping of the same image, or even just high
quality sub-pixel translations, bilinear interpolation gives poor results. The next highest
quality interpolator uses a 3x3 pixel area in the source image to compute each output pixel
value. If the same paradigm is used, it leads to an unwieldy implementation. Now, nine
separate memory banks, nine coefficients, and a nine-term sum of products are required.
In applications where size, power and cost are at a premium, this is an unacceptable solu-
tion. If an even better interpolation is required, for example 4x4 or 6x6, the problem is
even worse. There are some optimizations that can be made under this approach, but the
fundamental problem of quadratic growth will always exist. One common approach to
avoiding this problem is the use of a two-pass method. This approach works well in many
The following provides a high level description of a method for performing an arbi-
trary image warp that can achieve high-quality interpolation while bypassing some of the
The three basic steps are to first increase the sampling rate, then warp the image using
a lower quality interpolator for resampling, and finally downsample the image to its origi-
nal size. These steps and their implementation are described below.
Typical digital images to be warped are sampled at the Nyquist rate. The first step in
the new warping method is to increase the sampling rate above this point. This can be
done by either sampling the analog image at a higher rate or by digitally processing an
already Nyquist-sampled image to increase its sampling rate. The latter of these two
approaches will be taken to maintain the modularity of the warper within a larger image
processing system.
Because the input image to the system is sampled at or above the Nyquist rate, it is a
sible to obtain the exact value of that analog image at any point, even though only the
sampled points are available. To achieve an efficient implementation, we limit the oper-
done in both the vertical and horizontal directions. For example, if the input image is
tant to use a very high quality interpolation method to obtain the values of the upsampled
image.
Conceptually, the upsampling occurs in two steps. The first step is to insert the appro-
priate number of zeros into the image. For example, if the image is being upsampled by 2,
then every other value in the upsampled image is zero. The values at the other locations
are the original samples from the source image. Then this intermediate image is lowpass
filtered to give the final upsampled image. The original image is critically sampled. Insert-
ing zeros compresses the Fourier transform by a factor of two. The final lowpass filter
leaves only a single copy of the compressed transform centered around every integer mul-
tiple of 27c. The spatial and corresponding frequency domain interpretation of this process
IH1(ejm)l IH3(eju)l
IH2(ejm)l
.×.
A./,
-2s 21L -2 2x1 -27
At this point, it is clear why a high quality lowpass filter is needed for the interpola-
tion. This lowpass filter must cleanly eliminate the undesired frequency components while
The Pyramid Chip, developed at the David Sarnoff Research Center, is an excellent exam-
ple of how this operation may be performed [12]. The upsampling can be implemented as
a separable operation, which maps well to hardware. For the warping application, it is
expected that a higher quality filter than the Pyramid Chip can provide will be needed.
Still, a 4-tap or 6-tap fixed coefficient filter to perform the upsampling can be implemented
is the amount by which the image will be upsampled. We constrain these values to be 2, 4,
8, etc. The second is the quality of the lowpass filter employed to perform the interpola-
tion. It is expected that a 4-tap or 6-tap filter will be adequate. However, implementing a
filter with more taps does not lead to a huge increase of complexity. The desired quality of
the resultant warped image will determine appropriate values for these two variables.
Step two in the procedure is to warp this oversampled image using a lower quality
interpolation filter for resampling. Because the image has been upsampled, its frequency
content has been compressed. A lower quality filter used in the resampling step for the
warp can give good results, as long as it has good characteristics over this smaller region
rt~
'* - -4
49
Overall, the warp using the upsampled image does a better job of eliminating the
unwanted duplicate high frequencies while leaving the desired frequencies undisturbed.
Using a lower quality filter at this warping stage can drastically reduce the complexity, as
Finally, the warped image is downsampled to the size it was before the upsampling
occurred. In practice, this can be performed quite efficiently by modifying the geometric
transformation function. The downsampling is then inherent in the warp equation. This
Figure 5.4 shows how the downsampling is performed as part of the image warp.
In this example, the standard geometric transformation specifies that the value in the
target image at [0,0] should come from the source image location (xl = 1.7, yl = 1.2).
pixels around this location. To access the same point in the upsampled image, simply mul-
tiply the referenced address by 2. In this case, the address to reference in the upsampled
source image would be (x2 = 3.4, y2 = 2.4). If the image was upsampled by a factor of 4,
this address is used to determine the output value. Besides this modification, the warp is
performed identically to the standard method described earlier. In effect, the neighborhood
of pixels that is used in the upsampled image corresponds to sample points of the original
analog image that are closer to the desired location than an equal-sized neighborhood of
At first glance, one drawback to this approach for a hardware implementation seems to
be that the warp will require more time to complete. The upsampling stage increases the
size of the image. This data then needs to be passed to the warper memory, where it can be
accessed at random. If the image data comes to the upsampling stage at a rate of one pixel
per clock cycle and leaves the upsampling stage to be stored in the memory for the warp at
the same rate, then there is a backlog delay incurred in this process, because more pixels
However, this problem can be avoided. Basically, all that is required is to increase the
bandwidth leaving the upsampling stage relative to the bandwidth entering the upsampling
stage by the same factor as the increase in image size. There are many possible ways to do
this. For example, assume that the warping stage uses bilinear interpolation. Then, there
are four separate memory banks holding the image to be warped. If the upsampling stage
upsamples by a factor of two in both directions, then the required bandwidth leaving the
upsampling stage is four times that of the bandwidth entering it. So, if the image is enter-
ing the upsampling stage at one pixel per clock cycle, it should leave at four pixels per
clock cycle. Because there are four independent memory banks holding the image, it is
possible to write four pixels to this memory each clock cycle, thus meeting the bandwidth
requirements. Then the only delay incurred will be a minor pipeline delay.
larly for a hardware implementation) over the standard one-pass approach. It provides:
1. A method of image warping that doesn't grow quadratically in complexity with the
resampling accuracy. The complete warping process is implemented in two steps. The first
is upsampling. The second is warping with downsampling. These two steps can be tuned
together to produce the desired quality warp. The complexity of the second step can be
fixed at a reasonable setting. Then the upsampling step can be changed to give the desired
quality.
2. A modular, independent, warping function that can process arbitrary spatial trans-
formations. There are many specialized warping algorithms that achieve excellent results
for particular applications; for example, rotation [11]. The warping procedure described in
this thesis takes a Nyquist sampled image as input and produces a warped image as output.
The warp function is not constrained to a particular type; it can be an arbitrary flow field.
ture. Because the image was upsampled by a factor of 2 N, the downsampling can be per-
formed as part of the warp by multiplying the vertical and horizontal components of the
referenced source address by this factor. This multiplication is simply a bit shift. This
examine the performance of the method for some very specific cases. There are basically
three degrees of freedom in the general method. These are: the amount by which the orig-
inal image is upsampled, the interpolation method used for upsampling, and finally, the
interpolation method used for warping. In determining these factors, it must be kept in
mind that the final goal is an efficient method for real-time image warping in hardware.
Therefore, in the following analysis two of these factors will be fixed. First, the original
image will always be upsampled by a factor of two in the vertical and horizontal direc-
tions. Second, the interpolation method in the warping stage will be fixed at bilinear.
There are a number of advantages to doing this. Upsampling by a non-integer factor adds
unnecessary complexity to the hardware. And, upsampling by a factor of two (as opposed
to larger factors) can be implemented in hardware very efficiently. If the warping stage
uses bilinear interpolation, a 2x2 pixel neighborhood is used. As discussed earlier, for a
real-time hardware implementation, this leads to four separate memory banks, four
weighting coefficients, and a four-term sum of products. Better interpolation methods for
the warping stage would required an increase in all these quantities. Finally, if the image is
upsampled by a factor of two, and if the warping stage has four separate memory banks
that can be accessed in parallel, then it is possible to pipe the image data through the
upsampling stage and into the warp memory without a backlog of the data in the upsam-
pling stage. Under these constraints, the only variable left to manipulate is the upsampling
method. The proposed method of image warping under different upsampling methods will
The familiar Lena image will be used for all of the tests. Two measures will be used to
assess the quality of the warp. The first will be a measure of Root Mean Square (RMS)
error between the final warped image and the original image. The second will be a visual
comparison. A total of five methods will be examined here. These are listed in Table 5.1.
Name Description
Bilinear No upsampling.
Bilinear interpolation used
in warp.
(2x2 pixel area.)
Keys-3 No upsampling.
Third order cubic convolu-
tion (a=-.5) used in warp.
(4x4 pixel area).
Keys-4 No upsampling.
Fourth Order Cubic Con-
volution used in warp.
(6x6 Pixel Area).
Up2(Keys-3) Upsample by two using
Keys-3.
(4 tap separable)
Bilinear interpolation used
in warp.
(2x2 pixel area).
Up2(Keys-4) Upsample by two using
Keys-4.
(6 tap separable)
Bilinear interpolation used
in warp.
Table 5.1: Warping Method Descriptions
The first test will be image rotation. The Lena image is rotated by 22.5 degrees a total
of 16 times to bring it back to its original orientation. Then, the RMS error between this
image and the original is taken. The results are summarized in Table 5.2.
Warping Method RMS Error
Bilinear 11.384572
Keys-3 5.638434
Keys-4 4.430744
Up2(Keys-3) 7.712898
Up2(Keys-4) 6.915473
Table 5.2: RMS Error for Rotation
As expected, bilinear gives the largest error, and the pure fourth order cubic convolu-
tion gives the smallest error. It is worth noting that both upsampling methods do perform
better than just bilinear, but not as well as the pure cubic convolution methods. To get a
better feeling for what these numbers mean in terms of actual image quality, the images
Figure 5.5 shows the original lena image and the resultant image after being rotated 16
times using bilinear interpolation. The bilinear image is noticeable blurred. The is because
Figure 5.6 shows the resultant image after rotation under the Keys-3 and Keys-4 inter-
polation methods. Both of these look much better than bilinear, with not nearly as much
blurring. This is because these interpolation kernels have a much sharper cut-off in the fre-
Figure 5.7 shows the Up2(Keys-3) and Up2(Keys-4) rotation images. These are also
both noticeable better than the bilinear image. They are, however, slightly worse in quality
than the pure Key-3 and Keys-4 images, being slightly more blurred.
Figure 5.7: Up2(Keys-3) and Up2(Keys-4) rotation images.
All of these results, both visual and RMS error, are consistent with our expectations.
Doing the upsampling followed by bilinear warping does improve the warped image qual-
ity over just bilinear warping, but, is still slightly inferior to just using better interpolation
filters in the warping stage. However, the upsampling method does lead to a more efficient
hardware implementation. The results from just this one test seem to suggest that using
this upsampling method can greatly increase the warped image quality over pure
bilinear,
without a drastic increase in hardware complexity.
The second test is a translation test. The same methods for warping will be examined.
The image is translated in a diamond pattern. The diamond cycle is composed of four
steps. First translate the image up and right by 1/4 pixel, then down and right by 1/4 pixel,
then down and left by 1/4 pixel and finally up and left by 1/4 pixel. This brings it back to
the original location, where the RMS error relative to the original image can be measured.
This process is then repeated on the same image, each time measuring the RMS error at
the end of each diamond cycle. The purpose of this test is to determine how the different
methods perform as a function of the number of times the image is warped. The results are
Diamond Translation
20 -
18
16 I - bilinear
14
----- Keys-3
12
10 ___*-- Keys-4
8
6 _ -- Up2(Keys-3)
4 ____ Up2(Keys-4)
2
0
Number of Cycles
Figure 5.9 provides a closer view of the graph for the first few values.
Diamond Translation
IVU
9
8
7
LU 56
•- Keys-4
- 4
3 -v Up2(Keys-3)
2 Up2(Keys-4)
1 J
0 -
0 1 2 3 4
Number of Cycles
There are a few interesting things to note from these figures. First, when the image has
only undergone of few cycles, the RMS error for the images is similar to that of the rota-
tion test. However, as the number of warps performed on the same image increases, the
upsampling methods rapidly degrade the image quality, while the standard methods
(Bilinear, Keys-3 and Keys-4) do not worsen as much. In fact, after 12 cycles, bilinear is
actually better than Up2(Keys-3). This seems to suggest that the upsampling method is not
appropriate for applications where the same image is to be warped many times in a repeti-
tive fashion.
To get a feel for how the RMS error numbers correspond to the image quality, the
images after 12 diamond cycles are shown below. Cycle number 12 was selected because
this is where the RMS error for the bilinear and Up2(Keys-3) method are the same.
Figure 5.10: Original and bilinear after 12 diamond cycles.
At this point, the relative image quality rankings are the same as for the rotation test.
From best to worst: Keys-4, Keys-3, Up2(Keys-4), Up2(Keys-3), and Bilinear. It is inter-
esting that even though Bilinear and Up2(Key-3) have the same RMS error at this point,
the type of error is quite different. Both images are blurred; however, the blurring of the
bilinear image is clearly more pronounced. The error in the Up2(Keys-3) image seems to
also be coming from a reduction in the number of grey levels in the image. Over her face,
blotches of the same intensity can be seen. This is very likely due to the repetitive warping
pattern.
kernel, namely Keys-3 and Keys-4. The complete upsampling process occurs in two steps.
In the first step, the image is expanded by a factor of two in both directions, inserting zeros
in every other location. Then this image is convolved with the interpolation kernel. Since
the kernel is separable, the convolution can be done in two passes, one horizontal and one
vertical. The 1-D interpolation kernel is then [-.0625 0 .5625 1 .5625 0 -.0625] under
Keys-3 and [.010416 0 -.09375 0 .583334 1 .58334 0 -.09375 0 .010416] for Keys-4.
Keys-3 is a 4-tap kernel for upsampling by two; the intermediate values of the image
(those not corresponding to the sample points) are obtained by taking a four-term sum of
products of the four nearest data points and the four odd-location coefficients [-.0625
.5625 .5625 -.0625] of the kernel. Similarly, Keys-4 is a six-tap kernel for upsampling by
two. Both these kernels will preserve the values at the sample locations and fill in the zero
values with the interpolated values. For upsampling by two, this is achieved by fixing the
center coefficient to 1, and fixing the coefficients at even locations to 0. Another important
property that both kernels possess is unity D.C. gain. That is, when convolved with an
image of constant value, the result is also that same constant. For upsampling by two, this
Using the Keys-3 and Keys-4 kernels for the upsampling step works relatively well.
However, improvements can be made. These kernels were meant to approximate a low-
pass filter. In this respect they work well. However, for our application, it is known that
after upsampling, the image will be warped using bilinear interpolation. The frequency
spatial frequencies. Using this fact, and the other constraints placed on the upsampling
kernel, a four-tap kernel (same length as the Keys-3 upsampling kernel) will be designed
To satisfy the sample point preservation and unity D.C. gain properties, the general
form of the symmetric 1-D four-tap kernel used for upsampling by 2 is:
ala
(.5-a) (.5-a)
Note that after applying all these constraints, there is still one degree of freedom left,
namely, the value of the constant a. To get the frequency response of this kernel, take the
In general, increasing the value of a beyond 1/2 will make the cut-off from passband to
stopband sharper, but will also add ripples. If a is exactly 1/2, the four-tap filter reduces to
a two-tap filter corresponding to a linear interpolation kernel. The frequency responses for
1.5 Is
B .
0,S
-3 - -1 0 1Figure25.14:
Frequency
response
1S\. 0 1
F-ým
2 3
bilinear interpolation method. Since it is known the bilinear interpolation attenuates the
high spatial frequency components, it makes sense to pick the upsampling filter in a way
that increases these same high frequency components, so that the overall effect on them is
as small as possible. Basically, the first filter must be picked so that the two filters together
closely approximate an ideal low-pass filter. There is a complication to this selection: the
bilinear interpolation method used for warping doesn't have the same spectral characteris-
tics for every warp. For example, if the image is merely translated by an integer amount,
there is no degradation of the high-frequency components. On the other hand, for a half-
pixel translation, the high spatial frequency components will be maximally attenuated.
Between these two extremes, there are many levels of attenuation. Therefore, the upsam-
pling kernel must be selected based on the range of possibilities that can occur in the
warping stage. Finally, for a hardware implementation, the kernel coefficients should all
be sums of powers of 2. This reduces the multiplication of the data values by the coeffi-
cients to an arithmetic bit shift and sum operation. Taking these factors into account,
through examination of a range of Fourier transforms, and also through tests on many dif-
ferent warp types, a value of a that gives good overall results was determined. This is a =
.59375. With this value for a, the kernel becomes [-.09375 0.0 .59375 1.0 .59375 0.0 -
.09375]. This kernel will be affectionately referred to as the Lohmeyer kernel. Figure 5.15
shows both the frequency response of this upsampling kernel followed by the bilinear ker-
nel used for integer translation and the frequency response of this upsampling kernel fol-
lowed by the bilinear kernel used for a half pixel translation on the upsampled image.
These two graphs represent the extremes of possible total frequency response of the sys-
-3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3
Frequency Frequency
Figure 5.15: Frequency response of overall system for integer and half-pixel translation
using the Lohmeyer kernel for upsampling.
To measure the performance of this upsampling method, the same tests as before will
be used. At this point, we will also examine the performance of one other four-tap kernels
defined by [-.125 0 .625 1 .625 0 -.125]. It is called the Pyramid kernel because this
upsampling kernel can be performed by a currently available VLSI chip known as the Pyr-
amid Chip, [12]. The Pyramid Chip was designed at the David Sarnoff Research Center. It
can perform the standard filter and resampling operations required to generate a variety of
image pyramids that are used in many machine vision tasks [2]. The Pyramid kernel is the
best kernel available in the Pyramid Chip for use in the upsampling operation for the
image warping application. Figure 5.16 shows the frequency response of this kernel, and
of this kernel cascaded with the frequency response of the bilinear kernel used for 1/2
I
Z6
r-
-3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3
Frequency Frequency
Figure 5.16: Frequency response of overall system for integer and half-pixel translations
using Pyramid kernel for upsampling.
Note that the Pyramid upsampling kernel peaks the high frequency components more than
The first test is rotation. All the RMS error results are summarized in Table 5.3 below.
Up2(Loh) performs very well in this test. In fact, the RMS error is slightly lower than
Keys-3. Up2(Pyr) is better than bilinear in this test, but is outperformed by nearly all the
other methods.
Figure 5.17 shows the images after rotation under these two new methods.
between the Keys-3 and Up2(Loh) image. The Up2(Pyr) image is slightly different from
the other images because it has the high frequency components increased, rather than
attenuated.
As before, the second test is translation using the diamond pattern. The same graphs as
before are shown in Figures 5.18 and 5.19, except that now they include the two new
methods.
Diamond Translation
bilinear
20
Keys-3
15 Keys-4
10 SUp2(Keys-3)
A -Up2(Keys-4)
-- Up2(Loh)
0- Up2(Pyr)
Number of Cycles
Diamond Translation
10-
9' - -"-- bilinear
8
Keys-3
7
6 --- Keys-4
5
4 Up2(Keys-3)
3
- Up2(Keys-4)
2
1 Up2(Loh)
0
2 3 4 Up2(Pyr)
Number of Cycles
image more rapidly than other methods as the number of warps increases. To get a feel for
the type of error that is being accumulated, the images obtained after 12 cycles of the dia-
The Up2(Pyr) image has been sharpened too much. The Up2(Loh) image has the same
type of defect as the other upsampling methods have. Along with some blurring, patches
are beginning to form. This is very likely due to the repetitive warping pattern.
To determine if the errors in the upsampling methods are really partially due to the
repetitive pattern of the warp, a final test is performed using all the different methods. This
is a random translation test. The image is translated horizontally and vertically by a ran-
dom amount between zero and one, and then translated back again. The RMS error is
taken at this point, as in the diamond pattern. This process is then repeated. The same
sequence of random translations is used for each method. The RMS error results for all the
30 -
-- bilinear
25
-Keys-3
20
• _Up2(Keys-3)
S15
-Keys-4
S10
Up2(Keys-4)
5
- Up2(Loh)
0
o o ,e cOO COON ~ CO OoO co ' = cO o --- Up2(Pyr)
Number of Cycles
Random Translation
7-
SI bilinear
6
5 -L.- Keys-3
4 Up2(Keys-3)
3 •- Keys-4
2
-- Up2(Keys-4)
1
- ~ Up2(Loh)
0 0
0 1 2 3 0 Up2(Pyr)
Number of Cycles
ever, all the other upsampling methods do much better. In particular, the Up2(Loh) method
performs well no matter how many times the image is translated. In fact, it is never worse
than the Keys-3 method. This is due to the fact that the warps now have a random transla-
tion. Thus, the image experiences a wide range of the possible overall filter effects. Some-
times the high frequencies are attenuated, and sometimes they are peaked. If the
translation is close to integer, they are peaked, since the bilinear interpolation for the warp
did not attenuate them too much, and the upsampling increased them. If the translation is
closer to a 1/2 pixel, then they are attenuated because now the attenuation in the warping
stage overpowers the peaking in the upsampling stage. Overall, this has a balancing effect
when the same image is warped many times. The images after 30 warp cycles are shown
The Up2(Pyr) image quality is very poor. This is because the high frequencies have
been peaked too much. The Up2(Keys-3) and Up2(Keys-4) images are slightly blurred,
and the bilinear image is very blurred, as before. The Keys-3, Keys-4 and Up2(Loh)
images all look much better than the other images. There is very little noticeable blurring
pling can indeed give very good results. For multiple warps of the same image, as long as
the warping pattern is not repetitive, the quality of a warp using a 2x2 pixel area for inter-
polation on an upsampled image can meet and even exceed that of a warp using a 4x4
pixel area for warp interpolation. These results were achieved for an upsampling kernel
that had only four taps. Increasing the number of taps of the upsampling kernel may pro-
vide even better results. Also, this method of warping leads to a greatly simplified hard-
Hardware Design
6.1 Background
This chapter describes the design of a hardware system that is capable of warping digital
images in real-time. This means that it can warp images at a rate greater than thirty frames
per second. It is one component in a very general machine vision hardware system that
will be capable of performing a variety of tasks. These tasks include image stabilization,
image registration, moving target detection, and obstacle avoidance for autonomous vehi-
cle control. Image warping is a basic operation needed to perform most of these tasks.
polynomial transformations.
- The system must run at 20Mhz, with relatively easy upgrade path to 30Mhz.
- For real-time operation, it must produce one output pixel each clock cycle
(ignoring overhead).
6.3 Components
A block diagram of the overall system is shown in Figure 6.1 below.
WARP MODULE
18
SRAM FOR 8
DATA_IN[7:0] 48 ' TMC2246 FO]R DATA OUT[7:0] 8
I DUAL PORT 8 INTERPOLATI ON
IMAGE MEMORY 8
/ '
.8
HAIN
VA IN
Control
: :::I
l
A A
Muxed Address
L 4 Coefficients
~~ '10
HA OUT, VA OUT
XILINX CONTI ROL CHIP WR_RDY RDRDY
| i i " ::":
S::":::: .I
Control t Integer Add ress
I
| |
PD[15:0]
ADDR[6:0] ON TMC2302 Fractional Address
ADDRESS COEFFICIENTS
WEn GENERATOR LUT
The actual hardware components used to realized the blocks above are: two Xilinx
chips (XC4005H) for control, two TMC2302 chips for warping address generation, one
TMC2246 for the four-term sum-of-products needed for bilinear interpolation, eight 128K
x 8 SRAM for image storage, and five 8Kx8 EEPROMS for the interpolation coefficient
look-up table. The use of these components to perform the warp is described in the follow-
ing sections.
6.4 High-Level Design
The following two tables list and describe the inputs and outputs to the warp board.
that describe a cubic polynomial geometric transform. The motion parameters are changed
by the address generation block into motion vectors representing a pointer for each pixel
in the warped image to the location of origin on the input image. These vectors point to
is performed using the integer and fractional part of the reference source address. For
bilinear interpolation, the integer part selects the upper left pixel of the 2x2 neighborhood
of pixels to be used for the interpolation. The fractional part determines the weights to be
applied to the sum of products of these four pixels to get the final interpolated result.
The hardware may also allow for a general flow-field input to handle arbitrary warps.
This design does not include this option, but can be easily modified to do so. In this case,
all that one needs are additional inputs to the control block for the values of the flow-field
and the timing that accompanies them. Internal to the control block, these flow-field val-
ues will be used instead of the values that come from the address generation block.
vertical addresses and control, and the other handles the horizontal addresses and control.
The internal logic of both chips is very similar. If only one chip was used to handle all the
addresses, over 250 input and output pins would be required. The FPGA chip that has this
many pins has much more logic resources than required. By spliting the control task into
two chips, it is possible to use chips that are much smaller and contain logic resources that
The warp control block provides the control for all the other blocks. It takes inputs
from external controllers and uses them to program and sequence all the operations that
need to be done to perform the warp. The warp control module also takes the addresses
produced by the address generation block and produces from these the correct addresses to
access in the SRAM memory block to provide the four pixel values. Finally, this block
provides the status and control outputs to the rest of the system, including the output
Before a warp can begin, all the registers of the TMC2302 and various other control
registers need to be programmed. The program data and address buses are used for this.
The complete machine vision hardware system consists of many modules. Each of the
modules has one WE\ pulse associated with it, which is only activated when data is writ-
ten to that module. The ADDR[6:0] and the PD[15:0] signals are shared by all modules.
To load the program data into the warp module, set the data and address and hold down
WE\ for two complete clock cycles. Listed below are the data and valid ranges for each
WARP CONTROL
DESCRIPTION
REGISTER
The image timing is defined by two signals that are associated with the digital video
data paths: HA (Horizontal Active) and VA (Vertical Active). Each positive VA signal
identifies the active (valid) time of the image. The internal images are always in progres-
sive scan form (no interlace). Each positive HA signal within the positive VA signal iden-
tifies an active horizontal line of the image. The VA signal can change state any time
during the horizontal blanking time (HA = low). The HA signal has to go high at the same
clock as the first valid pixel data on a line, and has to turn to low on the clock after the last
valid pixel data on that line. See Figure 6.2 for the relationship of the timing signals, data,
CLK
HA
VA
DATA
When the warper is ready to be written to, WR_RDY will go high. After programming
all the write parameter registers, the external controller will assert WR_ENA, WR_RDY
will then go low. A write will then begin with HA and VA going high. VA may go high
first, but the write does not begin until HA goes high. This process is shown in Table 6.4.
high. After programming all the read parameters (these go to the Xilinx as well as to the
TMC2302's), the external controller will assert RD_ENA. RD_RDY will then go low.
After RD_ENA, the read operation begins a deterministic number of clock cycles after the
of calculating a third-order polynomial transformation. One chip will compute the vertical
addresses, and the other will compute the horizontal. For a third-order polynomial in two
variables, there are sixteen constants. The third-order polynomial transformation that will
be used is:
u[x,y] = a + bx + cx 2 + dx 3 + ey + fyx + gyx 2 + hyx 3 + iy 2 + jy 2 x + ky 2x2 + ly 2x3 +
3 2 py 3x 3
my3 + ny x + oy x +
3
The address generator computes a sub-pixel address (u,v) to reference in the source
image for each pixel position [x, y] in the target image. Two TMC2302's are used to gen-
erate the vertical and horizontal addresses with 24 bits of precision each. Since the largest
image size is 1K x 512 pixels, 10 bits are needed from each chip for the integer part of the
address. That allows 14 bits for the fractional part. Only 5 bits of the fractional part will be
used. These five fractional bits from both of the TMC2302 chips are fed to the interpola-
tion coefficient look-up table. The fractional portion of the reference addresses controls
the weight that each pixel is given in the sum of products computed for the interpolation.
are two complete SRAM memory buffers, each large enough to hold an entire image.
While one is being written to, the other can be read from. In this way, it is possible for the
warping system to simultaneously warp an image already stored in the warper while tak-
Each of the two memory buffers is implemented with four 128K x 8 bit SRAM chips.
Together four SRAM chips hold one entire image in a checkerboard interleaved storage
Specifically, all pixel values that correspond to even horizontal and vertical addresses
are stored in SRAM block number one; all pixel values that correspond to odd horizontal
addresses and even vertical addresses are stored in block number two; etc. The entire
image is stored in this manner so that the four pixel values required for bilinear interpola-
A write to the image memory is fairly simple. The data values are fed directly to the
image memory module. The correct addresses and the appropriate memory block enables
are provided by the control module. The image is stored sequentially, line by line, accord-
A random access read from the image memory is done as part of computing a warp.
The control block takes the integer part of the reference source address. From this, it pro-
duces four addresses for the memory module, one for each block. In this way the four
pixel values closest to the reference source address can be obtained simultaneously.
inputs to this LUT are simply the fractional part of the reference source address. The out-
put is the set of weighting coefficients for the four pixel values that come from the SRAM
module. The pixels are weighted in linear proportion to how close they are to the reference
sub-pixel address. For example, if the fractional part of the horizontal address is 1/4 and
the fractional part of the vertical address is 3/4, then the upper left pixel gets a weight of 3/
16, the upper right gets a weight of 1/16, the lower left get a weight of 9/16, and the lower
This chip computes four 11-bit by 10-bit multiplications followed by a summation of the
four resultant values. The multipliers perform signed-bit multiplications, so only 10-bit by
9-bit (unsigned) multiplications are available. Based on the 5-bit fractional source
addresses, the coefficient look-up table should generate an 11-bin coefficient for each of
the four data values. This 11-bit result consists of one integer bit and ten fractional bits.
Since the integer bit can only be 1 when both fractional parts are 0, it is sufficient to repre-
sent only the fractional bits, and to set the result to (1 - 2-10) when the output should be
1.0000. Therefore only a 10-bit result is generated from the LUT. The TMC2246 chip can
compute the results either in fractional or integer representation. For proper results, the
fractional representation should be used. In the fractional representation, the 10-bit input
is represented as a sign with 9 fractional bits. The 11-bit input is represented as a sign bit,
one integer bit, and 9 fractional bits. By setting the most significant bit and least signifi-
cant bit to zero for the data and using the ten least significant bits of the 11-bit input (zero
in the most significant bit) for the coefficient, the result will be eight fractional bits, six
integer bits and a sign bit. The 8-bit result required comprises the seven most significant
bits of the fraction and least significant bit of the integer of the result.
6.11 Conclusion
This hardware meets all the design specifications. In particular, it is capable of:
- allowing most components to operate at 30Mhz (and those that currently do not
Conclusions
7.1 Summary of Work
This thesis first gives background on digital image warping theory and applications. Next,
it analyzes the performance of various time-domain interpolation kernels that can be used
for resampling. The results of this analysis lead to a new approach to performing general
flow-field image warps in real-time using digital hardware. This approach is analyzed and
compared to other methods. Finally, the thesis describes the design of a digital hardware
of areas within the field. Some possibilities for further work based on the contents of this
1. Include low-pass filtering in the upsampling stage with cut-off frequency based on
the warp parameters, to guarantee that no aliasing will occur after the image is warped.
2. Apply the upsampling image warping method to other methods of image warping.
3. Examine other upsampling and warping kernel combinations for even higher image
quality.
Appendix A
specific applications. It can be used quite effectively for the address generation of a poly-
to demonstrate the method. The address generator needs to compute for each integer pixel
position [x,y] in the resulting image, a pixel address [u,v] that references the input image.
y changes. This requires many multiplications and additions. The forward difference
method takes advantage of the fact that x and y are integers and the target image locations
are scanned left to right and top to bottom. If this is true, it can be shown that the current
value of u can be found from the previous value of u and an increment. This is shown
below.
Table 8.1 shows the value of u as y is increased. The increment is the difference
Value Increment
u[0,0] = a
u[0,1] = a+c+e +c+e
u[0,2] = a+2c+4e +c+3e
u[0,3] = a+3c+9e +c+5e
u[0,4] = a+4c+16e +c+7e
Table A.1: Increment Table
From this table it is apparent that each time y increases by one, the value for u incre-
u[0,0] = a
acce[0] = 0
acce[1] = e+c
accf[o] = 0
beginning row value is computed a similar approach is used to update u as the horizontal
accd [0] = 0
accd[1] = b+d+accf[y]
ward. Six registers are required to hold the constant coefficient values. Five accumulators
are used to hold the current value u[O,y], acce[y], u[x,y], accf[y], and accd[x]. Three 2-1
multiplexors and one 3-1 multiplexor are used to pipe the appropriate initialization values
to the accumulator.
Higher order polynomials can also easily be implemented using this method. As the
order of the polynomial increases, more levels of accumulation are required, but the basic
[1] Andrews, H.C., C.L. Patternson, III, "Digital Interpolation of Discrete Images,"
IEEE Transactions on Computers., vol c-25, no. 2, Feb. 1976.
[2] Burt, P.J., "The Pyramid as a Structure for Efficient Computation," Electrical
Computer & Systems Engineering Department, Rensselaer Polytechnique Institute,
Troy, NY 12181.
[3] Burt, P.J., P. Anandan, "Image Stabilization be Registration to a Reference Mosaic,"
1994 Image UnderstandingWorkshop, Nov. 13-16, Monterey, CA, 1994.
[4] Burt, P.J. and R.J. Kolczynski, "Enhanced Image Capture Through Fusion," Fourth
InternationalConference on Computer Vision, Berlin, Germany, 1993.
[5] Hansen, M., P. Anandan, K. Dana, G. van der Wal, P. Burt, "Real-time Scene
Stabilization and Mosaic Construction," David Sarnoff Research Center, Princeton,
NJ, 1994.
[6] Keys, R.G., "Cubic Convolution Interpolation for Digital Image Processing," IEEE
Trans, Acoust., Speech, Signal Process., vol. ASSP-29, pp. 1153-1160, 1981.
[7] Norton, A., A.P. Rockwood, and P.T. Skolmoski, "Clamping: A Method of
Antialiasing Textured Surfaces by Bandwidth Limiting in Object Space," Computer
Graphics, (SIGGRAPH '82 Proceedings), vol. 16, no. 3, pp. 1-8, July 1982.
[8] Parker, A.J., R.V. Kenyon, and D.E. Troxel, "Comparison of Interpolation Methods
for Image Resampling," IEEE Trans. Medical Imaging, vol. MI-2, no. 1, pp. 31-39,
March 1983.
[9] Schafer, R.W. and L.R. Rabiner, "A Digital Signal Processing Approach to
Interpolation," Proc. IEEE, vol. 61, pp. 692-702, June 1973.
[10] Oppenheim, A.V., R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall,
NJ, 1989.
[11] Unser, M., P. Thevanaz, L. Yaroslavsky, "Convolution-Based Interpolation for Fast,
High-Quality Rotation of Images," IEEE Trans. on Image Processing,vol. 4, No. 10,
October 1995.
[12] van der Wal, G.S., P.J. Burt, "A VLSI Pyramid Chip for Multiresolution Image
Analysis," InternationalJournalof Computer Vision, 8:3, 177-189, 1992.
[13] Wolberg, G., Digital Image Warping, IEEE Computer Society Press, CA, 1990.