
Linear Algebra and Image Processing:

Additional Theory regarding


Computer Graphics and Image Processing
not covered by David C. Lay
Dr. D.P. Huijsmans
LIACS, Leiden University
February 2012

1 Differences in conventions between Mathematics, Computer Graphics textbooks and Programming Languages
1.1 Order of multiplication
In Lay the European convention for matrix-vector multiplication is used, i.e., the Column-After convention: the matrix is placed in front of a column vector; when matrices are concatenated, later transformations are put in front of earlier transformations.
In many (especially the early) US computer graphics textbooks a Row-Vector-in-Front convention is followed: a row vector is followed by a series of matrices; earlier transformations are closest to the row vector, later ones are appended to the right.
When converting between these systems one should take into account that both vector and matrix data have to be replaced by their transposed versions, and that the order of the matrices to be concatenated should be reversed.
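A minimal NumPy sketch (the matrices and vector here are only illustrative) showing that the two conventions give the same point once all data are transposed and the concatenation order is reversed:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # earlier transformation: rotate 90 degrees
B = np.array([[2.0, 0.0],
              [0.0, 3.0]])           # later transformation: anisotropic scaling
v = np.array([[1.0],
              [2.0]])                # column vector

col_after = B @ (A @ v)              # Column-After (Lay): later B in front, (B A) v
row_front = (v.T @ A.T) @ B.T        # Row-Vector-in-Front: v^T A^T B^T

assert np.allclose(col_after, row_front.T)   # same point, up to a transpose
print(col_after.ravel())                     # [-4.  3.]
```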

1.2 Choice of coordinate system


In mathematical texts an (x, y) coordinate system usually comes with the origin at the lower-left position, +x pointing towards the right and +y pointing upwards.

In Computer Graphics and Image Processing environments the video refresh order (row by row from top-left to bottom-right) is usually followed, with the origin at the top left of the screen, +x pointing to the right and +y pointing downwards! Matlab has an even stranger convention for image arrays: it stores them in (y, x) instead of (x, y) order. So for entry (i, j) in Matlab, i indexes the row (the y-line) and j the column (the x-line).

1.3 Choice of serialization between programming languages
Images are multidimensional arrays: 2D for grey-value images and 3D for color images (R, G, B triplet intensities per pixel). Any multidimensional array's computer representation is serialized into an internal one-dimensional range of byte addresses. If we take an image array to have rows, columns and planes, one should keep in mind, for a particular programming language, which index is the slowest and which the fastest changing one. Differences exist between programming languages (see the sketch after this list):

• The Fortran (column-by-column) serialization of a 2D array, for instance, is the opposite of the C(++) (row-by-row) serialization.

• Most programming languages index rows of size N from [1, N], but C(++) has its row indices running from [0, N − 1].

• Matlab image and coordinate convention: Matlab stores pixels in (y, x) order (line by line, top to bottom) but many coordinate transformation functions expect the pixel coordinates in (x, y) order! Matlab uses the [1, N] convention, so the origin is at the top-left position just outside the actual image!
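The following NumPy sketch (illustrative; NumPy's `order` flags stand in for the C-like and Fortran/Matlab-like behaviour) demonstrates the two serialization orders and the row-first (y, x) indexing:

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]])          # 2 rows, 3 columns

print(img.ravel(order='C'))          # [1 2 3 4 5 6]  row by row    (C/C++)
print(img.ravel(order='F'))          # [1 4 2 5 3 6]  column by column (Fortran/Matlab)

# NumPy indexes from 0, so img[0, 0] is the top-left pixel; Matlab's
# img(1, 1) is the same pixel because its indices start at 1.
# Like Matlab storage, NumPy addresses a pixel as (row, column), i.e. (y, x).
y, x = 1, 2
print(img[y, x])                     # 6: row index first, column index second
```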

2 Image Processing Convolution Filters


An image processing convolution filter uses a fixed set of position-dependent weights to combine the intensities in each local n × m pixel area into an inner product value, which is placed at the corresponding center position of an output image.
For a 3 × 3 local filter, with pixel-intensities I(i, j):

the local environment of pixel (i, j) and the weight coefficients used are

\[
\begin{pmatrix}
I(i-1, j-1) & I(i-1, j) & I(i-1, j+1) \\
I(i, j-1)   & I(i, j)   & I(i, j+1)   \\
I(i+1, j-1) & I(i+1, j) & I(i+1, j+1)
\end{pmatrix},
\qquad
\begin{pmatrix}
C_{11} & C_{12} & C_{13} \\
C_{21} & C_{22} & C_{23} \\
C_{31} & C_{32} & C_{33}
\end{pmatrix}.
\]

The 2D environments of pixel values and coefficients are serialized row-wise into:

• a row vector \((I(i-1, j-1), I(i-1, j), \ldots, I(i, j), \ldots, I(i+1, j), I(i+1, j+1))\); and

• a column vector \((C_{11}, C_{12}, \ldots, C_{22}, \ldots, C_{32}, C_{33})^T\).

Their inner product I_out is written to the corresponding middle position (i, j) of the output image. Of course different serializations are possible here; the only important thing is that the local pixel values and the position-dependent weight factors are serialized the same way.
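A minimal pure-NumPy sketch of this scheme (the function name and test data are illustrative, not from the text); as described above, the weights are applied directly, without the kernel flip of a strict mathematical convolution:

```python
import numpy as np

def convolve3x3(I, C):
    """Place the inner product of each serialized 3x3 neighbourhood of I
    with the serialized weights C at the corresponding centre of the output."""
    rows, cols = I.shape
    out = np.zeros_like(I, dtype=float)
    w = C.ravel()                                  # serialized weights (length 9)
    for i in range(1, rows - 1):                   # border pixels are left at 0
        for j in range(1, cols - 1):
            window = I[i-1:i+2, j-1:j+2].ravel()   # serialized neighbourhood
            out[i, j] = window @ w                 # inner product
    return out

I = np.arange(25, dtype=float).reshape(5, 5)
C = np.full((3, 3), 1.0 / 9.0)                     # mean filter: weights sum to 1
print(convolve3x3(I, C))
```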

2.1 Gradient Estimators


The spatial derivatives of intensity patterns are often used in Image Process-
ing. (Because the image is 2D, the derivative, or gradient as it is often called,
is direction dependent.)

2.1.1 Roberts gradient


From the definition of a derivative of I(x, y),

\[
\frac{\partial I}{\partial x} = \lim_{\Delta x \to 0} \frac{I(x + \Delta x, y) - I(x, y)}{\Delta x},
\qquad
\frac{\partial I}{\partial y} = \lim_{\Delta y \to 0} \frac{I(x, y + \Delta y) - I(x, y)}{\Delta y}.
\]

In a discrete setting with integer grid values for x and y, take \(\Delta x = \Delta y = 1\):

\[
\frac{\Delta I}{\Delta x}(i, j) = I(i+1, j) - I(i, j),
\qquad
\frac{\Delta I}{\Delta y}(i, j) = I(i, j+1) - I(i, j).
\]
In other words, these are the differences between two neighboring intensities in a row or column. These gradients are known as the Roberts x and y gradients. Rewritten as local 1 × 2 or 2 × 1 environments of intensities and weight coefficients, we have

\[
R_x(i, j) = \begin{pmatrix} I(i+1, j) & I(i, j) \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
\quad\text{and}\quad
R_y(i, j) = \begin{pmatrix} I(i, j+1) & I(i, j) \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]

The combination of two Roberts operators creates a second-derivative estimator with coefficients (−1, 2, −1) horizontally or vertically (or diagonally); these are combined into Laplace 3 × 3 operators like

\[
\begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix},
\qquad
\begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}.
\]

Theory has it that the zero crossings of this second derivative are an optimal segmentation choice.
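A small sketch (plain NumPy; the helper and step-edge test image are illustrative) applying the first Laplace operator above: the response changes sign across the edge, so the zero crossing marks the boundary:

```python
import numpy as np

laplace = np.array([[ 0, -1,  0],
                    [-1,  4, -1],
                    [ 0, -1,  0]], dtype=float)

def filter3x3(I, C):
    out = np.zeros_like(I, dtype=float)
    for i in range(1, I.shape[0] - 1):
        for j in range(1, I.shape[1] - 1):
            out[i, j] = np.sum(I[i-1:i+2, j-1:j+2] * C)
    return out

# A step edge: intensity jumps from 0 to 10 halfway along each row.
I = np.zeros((5, 6))
I[:, 3:] = 10.0
print(filter3x3(I, laplace))   # -10 on one side of the edge, +10 on the other:
                               # the zero crossing between them marks the edge
```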

2.1.2 Sobel gradient


Other famous gradient operators are those from Sobel:
   
\[
S_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix},
\qquad
S_y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}.
\]

Effectively each Sobel operator averages 4 Roberts gradient estimators, halving the noise.

All gradient-like operators can be recognized by the fact that their coefficients sum to zero (in a homogeneous local environment no gradient would be present). Smoothing filters can be recognized by the fact that their coefficients sum to one (a homogeneous local environment remains unchanged).
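A short NumPy check (illustrative) of these two recognition rules:

```python
import numpy as np

Sx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Sy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]])
laplace = np.array([[ 0, -1,  0],
                    [-1,  4, -1],
                    [ 0, -1,  0]])
mean = np.full((3, 3), 1.0 / 9.0)

print(Sx.sum(), Sy.sum(), laplace.sum())   # 0 0 0  -> gradient-like operators
print(mean.sum())                          # 1.0 (up to rounding) -> smoothing filter
```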

3 Including Translations in coordinate transformations via linear algebra
Elementary coordinate transformations in 2D and 3D using 2 × 2 or 3 × 3 transformation matrices cannot implement translations of the origin, whereas such a translation is an often-used elementary transformation in Computer Graphics and Image Processing, for instance when rotating an image around its center instead of around its origin in the top-left corner.
The following trick is applied to include translations as an elementary transformation: an extra dimension is added, the so-called homogeneous coordinate, which is chosen to be 1 and remains 1.

The 2 × 2 and 3 × 3 transformation matrices for 2D and 3D coordinate transformations are correspondingly enlarged to 3 × 3 and 4 × 4 homogeneous transformation matrices. Elementary homogeneous transformations now look like this in 2D (translation, scaling and rotation):

\[
\begin{pmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Homogeneous coordinate transformations in 2D are applied as follows:
\[
\underbrace{\begin{pmatrix} x \\ y \end{pmatrix}}_{\text{2D input}}
\;\to\;
\underbrace{\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}}_{\text{homogeneous}}
\;\to\;
\underbrace{\begin{pmatrix} x' \\ y' \\ h \end{pmatrix}}_{\text{transformed}}
\;\to\;
\underbrace{\begin{pmatrix} x'/h \\ y'/h \\ 1 \end{pmatrix}}_{\text{normalised}}
\;\to\;
\underbrace{\begin{pmatrix} x'/h \\ y'/h \end{pmatrix}}_{\text{2D result}}
\]
For instance: rotate an image 30° around its center (30, 70). To do this we have to concatenate three elementary operations: translate the origin to the center, rotate around this new origin, and translate back to the old origin position. The concatenated transformation matrix C becomes

\[
C = \begin{pmatrix} 1 & 0 & 30 \\ 0 & 1 & 70 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos 30^\circ & -\sin 30^\circ & 0 \\ \sin 30^\circ & \cos 30^\circ & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -30 \\ 0 & 1 & -70 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \cos 30^\circ & -\sin 30^\circ & -30\cos 30^\circ + 70\sin 30^\circ + 30 \\ \sin 30^\circ & \cos 30^\circ & -30\sin 30^\circ - 70\cos 30^\circ + 70 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Applied to a pixel position (x, y):
   
\[
\begin{pmatrix} x \\ y \end{pmatrix}
\;\to\;
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\;\to\;
C \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= \begin{pmatrix} x\cos 30^\circ - y\sin 30^\circ - 30\cos 30^\circ + 70\sin 30^\circ + 30 \\ x\sin 30^\circ + y\cos 30^\circ - 30\sin 30^\circ - 70\cos 30^\circ + 70 \\ 1 \end{pmatrix}.
\]
Since this result is already in normalized form, the top two positions are the resulting transformed point (x′, y′).
Note that under this transformation the rotation center (30, 70) remains invariant: if (x, y) = (30, 70),

\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} 30\cos 30^\circ - 70\sin 30^\circ - 30\cos 30^\circ + 70\sin 30^\circ + 30 \\ 30\sin 30^\circ + 70\cos 30^\circ - 30\sin 30^\circ - 70\cos 30^\circ + 70 \end{pmatrix}
= \begin{pmatrix} 30 \\ 70 \end{pmatrix}.
\]
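A NumPy sketch of this worked example: build C from the three elementary homogeneous matrices (later transformations in front) and verify the invariance of the center:

```python
import numpy as np

a = np.radians(30)
T_to   = np.array([[1, 0, 30], [0, 1, 70], [0, 0, 1]], dtype=float)
R      = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0,          0,         1]])
T_back = np.array([[1, 0, -30], [0, 1, -70], [0, 0, 1]], dtype=float)

C = T_to @ R @ T_back               # later transformations in front of earlier ones
center = np.array([30.0, 70.0, 1.0])
print(C @ center)                   # [30. 70.  1.]: the rotation centre stays put
```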

4 Why Inverse Coordinate Transformations are used with Images
If one were to use a forward coordinate transformation to copy a pixel's intensity value(s) to the new position, it may well happen that, due to rotation and/or scaling effects, not all pixels are filled in the output image area: when input positions are mapped onto non-neighboring output positions, intermediate pixels in the output image might not receive a copy of any input pixel. Thus the output image may have small holes.
To prevent this from happening, in Image Processing image coordinate transformations are carried out inversely: each pixel position in the output image is visited and, using the inverse coordinate transformation, its input location (not necessarily integer-valued) is determined. Since the input image consists of a set of integer grid positions (row and column numbers), this real-valued position may end up either:

• outside the input grid, in which case a background value is used; or

• within an input grid cell with 4 intensity values at its corners, in which case a so-called interpolation recipe is used to determine what value will be used as the output value.
The easiest interpolation recipe is to discard the fractional part of the calculated floating-point input position and use the intensity at the resulting integer corner position. One can also round the floating-point position and take the nearest corner's intensity value, or calculate a bilinearly interpolated value from the 4 nearest corner intensity values. The inverse concatenated transformation matrix can be obtained from the forward one by matrix inversion, but it is often most easily constructed from re-ordered inverse elementary transformations, which are simple variants of the forward ones. In 2D:
Inverse translation, inverse rotation and inverse scaling:

\[
\begin{pmatrix} 1 & 0 & -d_x \\ 0 & 1 & -d_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} \cos(-\alpha) & -\sin(-\alpha) & 0 \\ \sin(-\alpha) & \cos(-\alpha) & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1/s_x & 0 & 0 \\ 0 & 1/s_y & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
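A minimal inverse-mapping sketch in NumPy (the function name and the choice of the rounding recipe are illustrative): every output pixel is visited, mapped back with the inverse matrix, and filled with a background value when it lands outside the input grid, so no holes can occur:

```python
import numpy as np

def warp_inverse(src, C_inv, background=0.0):
    rows, cols = src.shape
    out = np.full((rows, cols), background)
    for i in range(rows):                           # output row    (y)
        for j in range(cols):                       # output column (x)
            x, y, h = C_inv @ np.array([j, i, 1.0])
            x, y = x / h, y / h                     # normalize homogeneous coords
            jj, ii = int(round(x)), int(round(y))   # nearest-corner recipe
            if 0 <= ii < rows and 0 <= jj < cols:
                out[i, j] = src[ii, jj]             # every output pixel gets a value
    return out

src = np.arange(16.0).reshape(4, 4)
print(warp_inverse(src, np.eye(3)))                 # identity warp reproduces the input
```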

5 3D perspective transformations
A lucky coincidence of the use of homogeneous coordinates is that not only the extra column on the right (for translations) but also the extra row at the bottom can serve a useful purpose.

When a 4 × 4 coordinate transformation matrix has non-zero entries in the (4, 1), (4, 2), (4, 3) positions, the matrix acts as a perspective transformation with convergence points in the corresponding directions. A single non-zero value at the (4, 3) position introduces a central perspective transformation along the z-axis (or x3 direction) with a central vanishing point. Vanishing points (or pairs of them) can also be introduced in the horizontal and vertical directions, for x going towards +∞ or −∞ and/or y going towards +∞ or −∞.
An example of a central perspective projection is:
      
\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & V & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= \begin{pmatrix} x \\ y \\ 0 \\ 1 + Vz \end{pmatrix}
\;\to\;
\begin{pmatrix} x/(1 + Vz) \\ y/(1 + Vz) \\ 0 \\ 1 \end{pmatrix}
\;\to\;
\begin{pmatrix} x/(1 + Vz) \\ y/(1 + Vz) \end{pmatrix}
\]

The 4 × 4 matrix is a concatenated perspective and projection matrix that models projecting 3D space onto a 2D screen, as seen by an observer located at (0, 0, −1/V). Note that the first → (the normalisation, a division by the homogeneous coordinate) is not a linear transformation, but on the other hand it only needs to be done once during the calculation. Note also that input points with z = −1/V or z < −1/V must be treated with care: these correspond to locations that are “beside” or “behind” the observer, relative to the screen.
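A small NumPy sketch of this central perspective projection (the value of V and the test point are arbitrary illustrative choices):

```python
import numpy as np

V = 0.01                                      # observer at (0, 0, -1/V) = (0, 0, -100)
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, V, 1]], dtype=float)

p = np.array([50.0, 20.0, 100.0, 1.0])        # a 3D point in front of the screen
hp = P @ p                                    # [50, 20, 0, 1 + V*z] = [50, 20, 0, 2]
screen = hp[:2] / hp[3]                       # the non-linear normalisation step
print(screen)                                 # [25. 10.]: farther points shrink
```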
In Lay Ch. 2.7, perspective projections are treated in a way that differs from the usual set-up in Computer Graphics: in fig. 6 on page 163 the model points to be perspectively projected are placed between the eye/camera and the projection screen, whereas Computer Graphics prefers to have the screen between the eye/camera and the model points. Also, the choice of which point is the origin and which directions are positive may differ between users.
Users therefore have to make very clear how eye/camera, screen and model scene are positioned with respect to each other, which point is taken as the origin and in which directions +x, +y and +z are pointing.

6 Transforming Computer Graphics Models


In computer graphics points play the main role: points can be interpolated pairwise into edges; triplets of points are interpolated into triangular patches; patches build up surfaces; textures are mapped onto surfaces. Since everything except the defining points themselves can be interpolated, the point coordinates are the only quantities that have to be transformed; everything else (ordering of vertices, edges and surface triangles) remains the same. CG therefore only has to calculate a coordinate transformation for a list of points. The data matrix D (see Lay Ch. 2.7) consists of these vertices (one column per vertex).
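A NumPy sketch in the spirit of Lay Ch. 2.7 (the unit-square model and transformation are our own toy example): only the data matrix D is multiplied; the edge list of vertex indices is untouched:

```python
import numpy as np

# Unit-square model: one homogeneous column per vertex.
D = np.array([[0, 1, 1, 0],      # x coordinates
              [0, 0, 1, 1],      # y coordinates
              [1, 1, 1, 1]])     # homogeneous coordinates
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # vertex-index pairs: need no transforming

T = np.array([[2, 0, 5],         # scale x by 2, then translate by (5, 3)
              [0, 1, 3],
              [0, 0, 1]])
print(T @ D)                     # all vertices transformed in a single product
```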

7 Continuous vs. discrete lines


Whereas continuous lines, when not parallel or coinciding, always have an intersection point, this need not be true for discretized lines! For instance, the following discrete lines have no discrete intersection:

+ ∗
+ ∗
∗ +
∗ +
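In Python (an illustrative reconstruction of the figure above): sampling the lines y = x and y = 3 − x on a 4 × 4 grid gives two discrete lines whose continuous versions cross at (1.5, 1.5), but which share no pixel:

```python
line_star = {(i, i) for i in range(4)}        # y = x      sampled on the grid
line_plus = {(i, 3 - i) for i in range(4)}    # y = 3 - x  sampled on the grid
print(line_star & line_plus)                  # set(): no discrete intersection
```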

8 In-betweening with affine and convex combinations of point vectors
In Lay (4th edition) Ch. 8, affine and convex combinations of point vectors, in space or over time (often used in CG animations), are treated for line segments and filled triangles.
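A minimal NumPy sketch of in-betweening with convex combinations (the two triangles are our own example): each intermediate frame is (1 − t)·P_start + t·P_end with t in [0, 1], so the weights are non-negative and sum to 1:

```python
import numpy as np

tri_start = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # triangle vertices (x, y)
tri_end   = np.array([[2.0, 2.0], [6.0, 2.0], [2.0, 5.0]])

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    frame = (1 - t) * tri_start + t * tri_end   # convex combination per vertex
    print(t, frame.tolist())
```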
