Image Projection
Goal: Introduce the basic concepts and mathematics for image projection.
Motivation: The mathematics of image projection allow us to answer two questions:
• Given a 3D scene, how does it project to the image plane? (“Forward” model.)
• Given an image, what 3D scenes could project to it? (“Inverse” model.) Vision is all about guessing
the scene and the story behind it. The latter is a (largely ignored) holy grail of computer vision.
Readings: Szeliski, Chapter 2.

CSC420: Image Projection © Allan Jepson, Sept. 2011 Page: 1
The Pinhole Camera
Image formation can be approximated with a simple pinhole camera,
[Figure: pinhole camera. The 3D point (X, Y, Z) projects through the projective point P to the point (x, y, f) on the image plane, Z = f.]
The image position for the 3D point (X, Y, Z) is given by the projective transformation
$$
\begin{pmatrix} x \\ y \\ f \end{pmatrix} = \frac{f}{Z} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}.
$$
The distance between the image plane and the projective point P is called the “focal length,” f . Note:
• for mathematical convenience we put the image plane in front of the nodal point (since this avoids
the need to flip the image coords about the origin);
• image coordinate x is taken to the right, and y downwards. This agrees with the standard raster
order and the convention of a right-handed coordinate frame (X, Y, Z).
• the primary approximation here is that there is no optical blur, distortion, or defocus (discussed
later).
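As a minimal sketch of the projective transformation above, the following Python function scales a 3D point by f/Z. The function name `project_pinhole` and the sample numbers are illustrative assumptions, and the point is assumed to be given in the camera's own frame.

```python
def project_pinhole(point, f):
    """Project (X, Y, Z) onto the image plane Z = f using (x, y, f) = (f / Z) (X, Y, Z)."""
    X, Y, Z = point
    if Z <= 0:
        # Assumption of this sketch: only points in front of the projective point are imaged.
        raise ValueError("point must satisfy Z > 0")
    scale = f / Z
    return (scale * X, scale * Y, f)


# Doubling the depth Z halves the image offset from the optical axis.
near = project_pinhole((1.0, 2.0, 4.0), f=0.5)
far = project_pinhole((1.0, 2.0, 8.0), f=0.5)
```

The third coordinate of the result is always f, since every projected point lies on the image plane.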
Coordinate Frames
Consider the three coordinate frames:

• World coordinate frame, $\vec{X}_w$. These are 3D coordinates fixed in the world, say with respect to
one corner of the room.
• Camera coordinate frame, $\vec{X}_c$. These are 3D coordinates fixed in the camera. The origin of the
camera coordinates is at the center of projection of the camera (say at $\vec{d}_w$ in world coords). The
z-axis is taken to be the optical axis of the camera (with points in front of the camera in the positive
z direction).
• Image coordinate frame, $\vec{p}$. The image coordinates are written as a 3-vector, $\vec{p} = (p_1, p_2, 1)^T$,
with $p_1$ and $p_2$ the pixel coordinates of the image point. Here the origin is in the top-left corner of
the image (or, in Matlab, the top-left corner has pixel coords (1,1)). The first image coordinate $p_1$
increases to the right, and $p_2$ increases downwards.
Next we express the transforms from world coordinates to camera coordinates and then to image
coordinates.

Extrinsic Calibration Matrix
The extrinsic calibration parameters specify the transformation from world to camera coordinates,
which is a standard 3D coordinate transformation,
$$
\vec{X}_c = M_{ex} \, [\vec{X}_w^T, 1]^T. \tag{1}
$$
Here the extrinsic calibration matrix $M_{ex}$ is a 3 × 4 matrix of the form
$$
M_{ex} = \begin{pmatrix} R & -R\vec{d}_w \end{pmatrix}, \tag{2}
$$
where $R$ is a 3 × 3 rotation matrix and $\vec{d}_w$ is the location, in world coordinates, of the center of projection
of the camera. The inverse of this mapping is simply
$$
\vec{X}_w = R^T \vec{X}_c + \vec{d}_w. \tag{3}
$$
The perspective transformation can now be applied to the 3D point $\vec{X}_c$ (i.e., in the camera's
coordinates),
$$
\vec{x}_c = \frac{f}{X_{3,c}} \vec{X}_c = \begin{pmatrix} x_{1,c} \\ x_{2,c} \\ f \end{pmatrix}. \tag{4}
$$
Everything here is measured in meters (say), not pixels, and f is the camera’s focal length.
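The chain of equations (1) to (4) can be sketched in plain Python as follows. The helper names (`mat_vec`, `extrinsic_matrix`, and so on), the 30 degree rotation, and the camera position are all assumptions chosen for illustration; any rotation matrix R and camera center d_w would do.

```python
import math


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def extrinsic_matrix(R, d_w):
    """Build the 3 x 4 matrix [R, -R d_w] of equation (2)."""
    Rd = mat_vec(R, d_w)
    return [list(row) + [-rd] for row, rd in zip(R, Rd)]


def camera_to_world(R, d_w, X_c):
    """Equation (3): X_w = R^T X_c + d_w."""
    return [x + d for x, d in zip(mat_vec(transpose(R), X_c), d_w)]


def perspective(X_c, f):
    """Equation (4): scale the camera-frame point by f / X_{3,c}."""
    return [f * X_c[0] / X_c[2], f * X_c[1] / X_c[2], f]


# Hypothetical camera: rotated 30 degrees about the y-axis, centered at d_w (meters).
theta = math.radians(30.0)
R = [[math.cos(theta), 0.0, -math.sin(theta)],
     [0.0, 1.0, 0.0],
     [math.sin(theta), 0.0, math.cos(theta)]]
d_w = [1.0, 0.5, -2.0]
X_w = [0.3, -0.2, 5.0]

M_ex = extrinsic_matrix(R, d_w)
X_c = mat_vec(M_ex, X_w + [1.0])      # equation (1)
X_back = camera_to_world(R, d_w, X_c) # equation (3) recovers X_w
x_c = perspective(X_c, f=0.05)        # equation (4), f = 50 mm
```

Applying equation (3) to the output of equation (1) returns the original world point, which is a quick sanity check on any hand-built extrinsic matrix.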
Intrinsic Calibration Matrix
The intrinsic calibration matrix, $M_{in}$, transforms the 3D image position $\vec{x}_c$ (measured in meters, say)
to pixel coordinates,
$$
\vec{p} = \frac{1}{f} M_{in} \vec{x}_c, \tag{5}
$$
where $M_{in}$ is a 3 × 3 matrix. The factor of $1/f$ here is conventional.
For example, a camera with rectangular pixels of size $1/s_x$ by $1/s_y$, with focal length $f$, and piercing
point $(o_x, o_y)$ (i.e., the intersection of the optical axis with the image plane provided in pixel
coordinates) has the intrinsic calibration matrix
$$
M_{in} = \begin{pmatrix} f s_x & 0 & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}. \tag{6}
$$
Note that, for a 3D point $\vec{x}_c$ on the image plane, the third coordinate of the pixel coordinate vector $\vec{p}$ is
$p_3 = 1$. As we see next, this redundancy is useful.
Equations (1), (4) and (5) define the transformation from the world coordinates of a 3D point, $\vec{X}_w$, to
the pixel coordinates of the image of that point, $\vec{p}$. The transformation is nonlinear, due to the scaling
by $X_{3,c}$ in equation (4).
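To make equations (4) to (6) concrete, here is a small sketch that maps a camera-frame point to pixel coordinates. The function names and the numbers (a 10 mm lens, 10 micron square pixels, piercing point (320, 240)) are assumptions picked for easy arithmetic, not values from the notes.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def intrinsic_matrix(f, s_x, s_y, o_x, o_y):
    """Equation (6): f in meters, s_x and s_y in pixels per meter, o_x and o_y in pixels."""
    return [[f * s_x, 0.0, o_x],
            [0.0, f * s_y, o_y],
            [0.0, 0.0, 1.0]]


def camera_point_to_pixel(X_c, f, M_in):
    # Equation (4): perspective projection onto the image plane (meters).
    x_c = [f * X_c[0] / X_c[2], f * X_c[1] / X_c[2], f]
    # Equation (5): p = (1 / f) M_in x_c, so the third coordinate becomes 1.
    return [value / f for value in mat_vec(M_in, x_c)]


f = 0.01                      # 10 mm focal length (assumed)
M_in = intrinsic_matrix(f, s_x=100000.0, s_y=100000.0, o_x=320.0, o_y=240.0)
p = camera_point_to_pixel([0.2, -0.1, 2.0], f, M_in)
# p is approximately [420, 190, 1]
```

The division by X_{3,c} inside `camera_point_to_pixel` is the only nonlinear step, which is the point made in the last sentence above.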
A Note on Units
So far we have written the focal length $f$ in meters. But note that only the terms $f s_x$ and $f s_y$ appear in
the intrinsic calibration matrix,
$$
M_{in} = \begin{pmatrix} f s_x & 0 & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{pmatrix},
$$
where $s_x$ and $s_y$ are in units of horizontal and vertical pixels per meter (and $o_x$, $o_y$ are in pixels).
Instead of meters, it is common to measure $f$ in units of pixel width, that is, to replace $f s_x$ by $f$. In that
case the intrinsic calibration matrix becomes
$$
M_{in} = \begin{pmatrix} f & 0 & o_x \\ 0 & f a & o_y \\ 0 & 0 & 1 \end{pmatrix}, \tag{7}
$$
where $a = s_y / s_x$ is the (unitless) aspect ratio of a pixel. Since a pixel is $1/s_x$ wide and $1/s_y$ high, $a$ is
the pixel width divided by the pixel height: $0 < a < 1$ if the pixels are rectangular and tall, $a = 1$ if the
pixels are square, and $a > 1$ if the pixels are rectangular and flat.
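The change of units can be checked numerically. The sketch below builds the matrix of equation (6) from metric quantities and the matrix of equation (7) from f in pixel widths and the aspect ratio a, and the two should agree. Function names and sensor numbers are assumptions for illustration.

```python
def intrinsic_from_metric(f_m, s_x, s_y, o_x, o_y):
    """Equation (6): focal length in meters, pixel densities in pixels per meter."""
    return [[f_m * s_x, 0.0, o_x],
            [0.0, f_m * s_y, o_y],
            [0.0, 0.0, 1.0]]


def intrinsic_from_pixels(f_pix, a, o_x, o_y):
    """Equation (7): focal length in units of pixel width, a = s_y / s_x."""
    return [[f_pix, 0.0, o_x],
            [0.0, f_pix * a, o_y],
            [0.0, 0.0, 1.0]]


# Hypothetical sensor: 8 mm lens, 125000 horizontal and 100000 vertical pixels per meter.
f_m, s_x, s_y = 0.008, 125000.0, 100000.0
f_pix = f_m * s_x          # about 1000 pixel widths
a = s_y / s_x              # 0.8

M6 = intrinsic_from_metric(f_m, s_x, s_y, 320.0, 240.0)
M7 = intrinsic_from_pixels(f_pix, a, 320.0, 240.0)
```

Only the products f s_x and f s_y matter, so a calibration procedure can recover f in pixels without knowing the physical pixel size.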

Homogeneous Coordinates
The projective transform becomes linear when written in the following homogeneous coordinates,
$$
\vec{X}_w^h = c \, (\vec{X}_w^T, 1)^T,
$$
$$
\vec{p}^{\,h} = d \, \vec{p} = d \, (p_1, p_2, 1)^T.
$$
Here $c$, $d$ are arbitrary nonzero constants. The last coordinate of each of these homogeneous vectors provides
the scale factor. It is therefore easy to convert back and forth between the homogeneous forms and
the standard forms.
The mapping from world to pixel coordinates can then be written as the linear transformation,
$$
\vec{p}^{\,h} = M_{in} M_{ex} \vec{X}_w^h. \tag{8}
$$
Essentially, the division operation in perspective projection is now implicit in the homogeneous vector
$\vec{p}^{\,h}$. The division is simply postponed until $\vec{p}^{\,h}$ is rescaled by its third coordinate to form the pixel
coordinate vector $\vec{p}$.
Due to its linearity, equation (8) is useful in many areas of computational vision.
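A short sketch of equation (8): one matrix product followed by a single division at the end. It also illustrates that the arbitrary scale c on the homogeneous world point has no effect on the resulting pixel. The camera (identity rotation, center at (0, 0, -2), f = 500 pixels, square pixels) and the helper names are assumptions.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def world_to_pixel(M, X_w, c=1.0):
    """Apply equation (8) to c * (X_w, 1), then rescale by the third coordinate."""
    X_wh = [c * x for x in X_w] + [c]
    p_h = mat_vec(M, X_wh)
    if p_h[2] == 0:
        raise ValueError("point projects to infinity (third coordinate is zero)")
    return [p_h[0] / p_h[2], p_h[1] / p_h[2], 1.0]


M_in = [[500.0, 0.0, 320.0],
        [0.0, 500.0, 240.0],
        [0.0, 0.0, 1.0]]
# R = I and d_w = (0, 0, -2), so the last column -R d_w is (0, 0, 2).
M_ex = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0]]
M = mat_mul(M_in, M_ex)

X_w = [0.4, -0.2, 2.0]
p_a = world_to_pixel(M, X_w, c=1.0)
p_b = world_to_pixel(M, X_w, c=-3.0)   # same pixel, different homogeneous scale
```

Both calls give a pixel near (370, 215), because the scale factor cancels in the final division.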

Example: Lines Project to Lines
As a first application of the perspective projection equation (8), consider a line in 3D written in homogeneous coordinates, say
$$
\vec{X}^h(s) = \begin{pmatrix} \vec{X}^0 \\ 1 \end{pmatrix} + s \begin{pmatrix} \vec{t} \\ 0 \end{pmatrix}.
$$
Here $\vec{X}^0$ is an arbitrary 3D point on the line expressed in world coordinates, $\vec{t}$ is a 3D vector tangent to the line, and $s$ is the free parameter for points along the line. To avoid special cases, we assume that the line does not pass through the center of projection, and the tangent direction $\vec{t}$ has a positive inner-product with the optical axis (more on this below). By equation (8), the image of the point $\vec{X}^h(s)$ is
$$
\vec{p}^{\,h}(s) = M \vec{X}^h(s) = \vec{p}^{\,h}(0) + s \, \vec{p}_t^{\,h},
$$
where $M = M_{in} M_{ex}$ is a 3 × 4 matrix, $\vec{p}^{\,h}(0) = M ((\vec{X}^0)^T, 1)^T$, and $\vec{p}_t^{\,h} = M (\vec{t}^{\,T}, 0)^T$. Note $\vec{p}_t^{\,h}$ and $\vec{p}^{\,h}(0)$ are both constant vectors, independent of $s$. Therefore the image of the 3D line, in pixel coordinates, is
$$
\vec{p}(s) \equiv \frac{1}{p_3^h(s)} \vec{p}^{\,h}(s) = \frac{1}{\alpha(s)} \vec{p}^{\,h}(0) + \frac{s}{\alpha(s)} \vec{p}_t^{\,h}, \tag{9}
$$
where $\alpha(s) = p_3^h(s)$. Using equations (1) and (7) we find
$$
\alpha(s) = p_3^h(0) + \beta s, \quad \text{for } \beta = p_{t,3}^h = \vec{e}_3^{\,T} M_{ex} (\vec{t}^{\,T}, 0)^T, \tag{10}
$$
where $\vec{e}_3^{\,T} = (0, 0, 1)$. The condition that the inner-product of $\vec{t}$ and the direction of the optical axis is positive is equivalent to $\beta > 0$.

Note that equation (9) shows that $\vec{p}(s)$ is in the plane spanned by two constant 3D vectors. It is also in the image plane, $p_3 = 1$. Therefore it is in the intersection of these two planes, which is a line in the image. That is, lines in 3D are imaged as lines in 2D. (Although, in practice, some lenses introduce “radial distortion”, which we discuss later.)

One caveat on eqn (9) is that some of these points may be behind the principal plane (and therefore behind the camera). Using equations (1) and (7) it follows that $X_{c,3}(s)$, the Z-component of the point on the line written in camera coordinates, is equal to the third component of $\vec{p}^{\,h}(s)$, which we denoted by $\alpha(s)$ above. Thus the point is in front of the principal plane if and only if $\alpha(s) > 0$ (and in front of the lens if $\alpha(s) > c$ for some constant $c > 0$).

Since $\beta > 0$ we have from (10) that $1/\alpha(s) \to 0$ and $s/\alpha(s) \to 1/\beta$ as $s \to \infty$. Therefore, from (9), the image points $\vec{p}(s) \to (1/\beta) \vec{p}_t^{\,h}$ as $s \to \infty$. Note that this limit point is a constant image point dependent only on the tangent direction $\vec{t}$.

In fact, in homogeneous world coordinates, the 4D vector $(\vec{t}^{\,T}, 0)^T$ is the point at infinity in the direction $\vec{t}$. The perspective projection of this point is simply $\vec{p}_t^{\,h} = M (\vec{t}^{\,T}, 0)^T$, which is homogeneously equivalent to the limit of the image points we derived above. The next example explores this fact further.
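Equations (9) and (10) can be checked numerically with a sketch like the one below. The 3 × 4 matrix M (a camera at the world origin with identity rotation, f = 500 pixels, piercing point (320, 240)), the line, and the helper names are illustrative assumptions.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Assumed camera: M = M_in M_ex with R = I and d_w = 0.
M = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

X0 = [-1.0, 1.0, 2.0]       # a point on the 3D line
t = [0.5, -0.25, 1.0]       # tangent direction, positive along the optical axis

p_h0 = mat_vec(M, X0 + [1.0])   # p^h(0)
p_ht = mat_vec(M, t + [0.0])    # p_t^h, image of the point at infinity
beta = p_ht[2]                  # equation (10)


def image_point(s):
    """Equation (9): pixel coordinates of the line point with parameter s."""
    alpha = p_h0[2] + beta * s
    return [(p_h0[i] + s * p_ht[i]) / alpha for i in range(2)]


def collinearity_residual(p, q, r):
    """Zero (up to rounding) when the three image points lie on one line."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


limit_point = [p_ht[0] / beta, p_ht[1] / beta]
residual = collinearity_residual(image_point(0.0), image_point(1.0), image_point(10.0))
far_point = image_point(1.0e6)
```

For this setup `residual` is zero up to rounding, and `far_point` is within a small fraction of a pixel of `limit_point`, in line with the limit derived above.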

Example: Parallel Lines Project to Intersecting Lines
Next consider a set of parallel lines in 3D, say
$$
\vec{X}_k^h(s) = \begin{pmatrix} \vec{X}_k^0 \\ 1 \end{pmatrix} + s \begin{pmatrix} \vec{t} \\ 0 \end{pmatrix}.
$$
Here all these lines have the same tangent direction $\vec{t}$, and hence are parallel in 3D (both in the world and camera coordinates).

To eliminate special cases, we again assume that none of these lines passes through the center of projection, and $\vec{t}$ has a positive inner-product with the direction of the optical axis (i.e., $\beta > 0$, with $\beta$ defined as in equation (10)).

Then from the previous example we know that, as $s \to \infty$, the perspective projections of the points $\vec{X}_k^h(s)$ all converge to the same image point, namely $\vec{p}_t^{\,h} = M (\vec{t}^{\,T}, 0)^T$.

Thus the images of the parallel 3D lines $\vec{X}_k^h(s)$ all intersect at the image point $\vec{p}_t^{\,h}$. Moreover, it can be shown from equations (9) and (10) that, under the natural condition that we only form the image of points on the 3D line which are in front of the principal plane (i.e., $X_{c,3}(s) = \alpha(s) > 0$), the projected points on the image line segments converge monotonically to $\vec{p}_t^{\,h}$. That is, in the image, the projected line segments all appear to terminate at $\vec{p}_t^{\,h}$. (For example, note the sides of the road in the left figure above. Although, as the right figure shows, we can always be surprised.)

In summary, the common termination point for the images of parallel lines in 3D is the perspective projection of the 3D tangential direction $\vec{t}$. It is referred to as the vanishing point.
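The following sketch illustrates the vanishing point for a road-like scene: three parallel lines on a plane one unit below the camera, all running along the optical axis. The matrix M, the line start points, and the sampled values of s are assumptions for illustration.

```python
import math


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Assumed camera: R = I, d_w = 0, f = 500 pixels, piercing point (320, 240).
M = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

t = [0.0, 0.0, 1.0]                       # shared tangent direction
starts = [[-2.0, 1.0, 1.0],               # X_k^0 for k = 0, 1, 2 (Y points down)
          [2.0, 1.0, 1.0],
          [0.5, 1.0, 1.0]]

p_ht = mat_vec(M, t + [0.0])
vanishing_point = [p_ht[0] / p_ht[2], p_ht[1] / p_ht[2]]


def image_point(X0, s):
    X = [x + s * d for x, d in zip(X0, t)]
    p_h = mat_vec(M, X + [1.0])
    return [p_h[0] / p_h[2], p_h[1] / p_h[2]]


def distance_to_vanishing_point(X0, s):
    p = image_point(X0, s)
    return math.hypot(p[0] - vanishing_point[0], p[1] - vanishing_point[1])


samples = [0.0, 1.0, 10.0, 100.0, 1000.0]
distances = [[distance_to_vanishing_point(X0, s) for s in samples] for X0 in starts]
```

Each row of `distances` decreases as s grows, so all three projected lines head toward the same image point, here the piercing point because the lines run along the optical axis.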

Example: The Horizon Line
As another exercise in projective geometry, we consider multiple sets of parallel lines, all of which are coplanar in 3D. We show that the images of
each parallel set of lines intersect and terminate at a point on the horizon line in the image.
Consider multiple families of parallel lines in a plane, where each family of lines has the tangent direction $\vec{t}_j$ in 3D. From the previous analysis, the $j$th family must co-intersect at the image point (in homogeneous coordinates)
$$
\vec{p}_j^{\,h} = M (\vec{t}_j^{\,T}, 0)^T.
$$
Since the tangent directions are all assumed to be coplanar in 3D, any two distinct directions provide a basis. That is, assuming the first two directions are linearly independent, we can write
$$
\vec{t}_j = a_j \vec{t}_1 + b_j \vec{t}_2,
$$
for some constants $a_j$ and $b_j$. As a result, we have
$$
\vec{p}_j^{\,h} = M ([a_j \vec{t}_1 + b_j \vec{t}_2]^T, 0)^T = a_j \vec{p}_1^{\,h} + b_j \vec{p}_2^{\,h}.
$$
Dividing through by the third coordinate, $p_{j,3}^h$, we find the point of intersection of the $j$th family of lines is at the image point
$$
\vec{p}_j = \frac{1}{p_{j,3}^h} \vec{p}_j^{\,h} = \left( \frac{a_j p_{1,3}^h}{p_{j,3}^h} \right) \vec{p}_1 + \left( \frac{b_j p_{2,3}^h}{p_{j,3}^h} \right) \vec{p}_2 = \alpha_j \vec{p}_1 + \beta_j \vec{p}_2.
$$
From this equation it follows that $\alpha_j + \beta_j = 1$. (Hint, look at the last row in this vector valued equation.) Hence the image point $\vec{p}_j$ is an affine combination of the two image points $\vec{p}_1$ and $\vec{p}_2$. Therefore the horizon must be the line in the image passing through $\vec{p}_1$ and $\vec{p}_2$, which is what we wanted to show.
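A numerical check of the affine-combination argument, under assumed values: the matrix M, the two basis directions t1 and t2 (spanning a tilted plane), and the coefficients a_j = 2, b_j = 1 are all chosen for illustration.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Assumed camera: R = I, d_w = 0, f = 500 pixels, piercing point (320, 240).
M = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]


def vanishing_point(t):
    """Return the pixel coordinates and the third homogeneous coordinate of M (t, 0)."""
    p_h = mat_vec(M, list(t) + [0.0])
    return [p_h[0] / p_h[2], p_h[1] / p_h[2]], p_h[2]


t1 = [0.0, 0.2, 1.0]
t2 = [1.0, 0.1, 1.0]
a_j, b_j = 2.0, 1.0
t_j = [a_j * u + b_j * v for u, v in zip(t1, t2)]   # coplanar with t1 and t2

p1, w1 = vanishing_point(t1)
p2, w2 = vanishing_point(t2)
p_j, w_j = vanishing_point(t_j)

alpha_j = a_j * w1 / w_j
beta_j = b_j * w2 / w_j
combination = [alpha_j * p1[i] + beta_j * p2[i] for i in range(2)]
```

Here `alpha_j + beta_j` equals 1 and `combination` matches `p_j` up to rounding, so the third vanishing point lies on the image line through the first two.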
Example: 3D Sets of Parallel Lines
Many man-made environments have a wealth of rectangular solids. The surface normals for the planes in these structures are restricted to just three orthogonal directions (ignoring signs). This means that there are three horizon lines, one for each surface normal.

It is also relatively common (with a good carpenter) to have 3D lines on these surfaces which have three mutually orthogonal tangent directions $\vec{t}_k$, $k = 1, 2, 3$. An example of such lines is shown on the right, with each family in a different colour. (But I suspect one of these sketched lines does not correspond to an edge in the scene with one of the three selected tangential directions, can you identify which one?)

Sketch the lines and the three vanishing points for the (corrected) sets of lines. You can select visible edges in the image to add further lines to these three sets. Also sketch the three horizon lines for the three sets of parallel planes. In both cases use a suitable notation for vanishing points and horizon lines that are far outside the image boundary.

It turns out that the resulting information is suitable for both determining the focal length of the camera (assuming square pixels) and reconstructing a scaled 3D model for the major planar surfaces of the porch. See single-view metrology, say Szeliski, Sec. 6.3.3.
Optical Distortion
[Figure: image with barrel distortion; barrel distortion of a square grid; pincushion distortion. Images from Wikipedia.]
Imagine printing an image on a thin rubber sheet. For many cameras, this image is a spatially distorted
version of a perfect perspective transformation of the scene (e.g., top-left). This spatial distortion can
be corrected by warping (i.e., applying a variable stretching and shrinking to) the rubber sheet.
This correction can be done algorithmically by first estimating a parametric warp from sample image
data (perhaps simply one image containing many straight lines). Often a radial distortion suffices. The
overall process is called calibrating the radial distortion. (See Wikipedia, Distortion (Optics).)
This warp can then be applied to any subsequent image acquired by that camera; effectively unwarping
it to provide a new image which is a close approximation to perfect perspective projection.
Lenses
Finally we discuss a more detailed model of lenses, namely the thin lens model.
This model replaces the pinhole camera model, and is essential for:
• relating the optical properties of a lens, such as its focal length, to the parameter f (that we also
called “focal length”) in the pinhole camera model,
• characterizing the defocus of an image as a function of the depth of an object,
• understanding the critical optical blur which is performed before the image is sampled.

Thin Lens: Cardinal Points
The thin lens model provides a more general model for a camera’s lens than a simple pinhole camera.
It allows defocus to be modelled.
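[Figure: thin lens on the optical axis, showing the focal points F and F’, the coincident nodal and principal points N, P, the focal length f on either side of the lens, the image plane, and the nodal distance (ND).]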
• A cylindrically symmetric lens can be geometrically modelled by three pairs of cardinal points on
the optical axis, namely the focal, nodal, and principal points.
• Here we consider a thin lens, with the same material (such as air) on either side of the lens.
• For this case, the nodal and principal points all agree (denoted, N,P above), and are often called
the center of projection.
• The plane perpendicular to the optical axis containing P is called the principal plane.
• The focal points F and F’ are a distance f away from N. Here f is called the focal length of the lens.

Thin Lens: Principal Rays
The cardinal points provide a geometric way to determine where a world point, $\vec{O}$, will be focussed.
[Figure: thin lens imaging the point O to the point O’, showing the focal points F and F’, the points N, P, the focal length f, the image plane, and the distances z and z’.]
The point $\vec{O}$ is focussed at $\vec{O}'$, given by the intersection of (any two of the) three principal rays:
• A ray from $\vec{O}$ passing straight through the nodal point N of the lens.
• The two rays that are parallel to the optical axis on one side of the principal plane, and pass through
the front or rear focal points (F and F’) on the opposite side of the lens.
All rays from $\vec{O}$ which pass through the lens are focussed at $\vec{O}'$ (behind the image plane shown above).
The lens equation $\frac{1}{f} = \frac{1}{z} + \frac{1}{z'}$ follows from this construction, where $z$ and $z'$ are the distances of $\vec{O}$
and $\vec{O}'$ to the principal plane.
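As a small worked example of the lens equation, the sketch below solves 1/f = 1/z + 1/z' for the image distance z'. The function name and the 50 mm lens are assumptions for illustration.

```python
def image_distance(f, z):
    """Solve 1/f = 1/z + 1/z' for z', with f and z in the same units."""
    if z <= f:
        # Assumption of this sketch: only objects beyond the focal point are handled.
        raise ValueError("object must be farther from the lens than the focal length")
    return 1.0 / (1.0 / f - 1.0 / z)


f = 0.05                                   # 50 mm lens (assumed)
z_prime_near = image_distance(f, 2.0)      # object 2 m away
z_prime_far = image_distance(f, 1.0e9)     # effectively at infinity
```

For a very distant object z' approaches f, and nearer objects focus farther behind the lens, which is why the focus distance matters most for close subjects.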
Thin Lens: Aperture and F-number
A lens aperture can be modelled using an occluder placed within the principal plane.
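[Figure: thin lens with an aperture in the principal plane, showing the point O, its focus point O’, the focal points F and F’, the points P, N, the focal length f, and the image plane. From Wikipedia.]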
The aperture itself is the hole in this occluder. Let D denote the aperture diameter.
The f -number (or f-stop) of a lens is given by the ratio f /D.
For the defocussed situation shown above, the point source O is imaged to a small region in the image
plane (i.e., the projection of the aperture plus an additional blur region due to diffraction effects). The
size of this projected region is proportional to D, and therefore inversely proportional to the f-number.
As the f-number increases (i.e., D decreases), the lens behaves more like a pinhole camera, although,
due to diffraction the blur radius never decreases to zero.
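The proportionality between blur size and aperture diameter can be sketched with similar triangles: the cone of light leaving an aperture of diameter D converges to the focus point at distance z' behind the lens, so on a sensor at distance v its cross-section has diameter D |v - z'| / z'. This geometric formula, the function names, and the numbers are assumptions of the sketch, and diffraction is ignored.

```python
def image_distance(f, z):
    """Thin lens equation 1/f = 1/z + 1/z' solved for z'."""
    return 1.0 / (1.0 / f - 1.0 / z)


def blur_diameter(f, f_number, z, v):
    """Geometric blur diameter on a sensor at distance v for an object at depth z."""
    D = f / f_number                 # aperture diameter from the f-number f / D
    z_prime = image_distance(f, z)   # where the object is actually focussed
    return D * abs(v - z_prime) / z_prime


f = 0.05                             # 50 mm lens (assumed)
v = image_distance(f, 2.0)           # sensor placed to focus objects 2 m away

blur_f5 = blur_diameter(f, 5.0, 1.0, v)     # object at 1 m, f/5
blur_f32 = blur_diameter(f, 32.0, 1.0, v)   # same object, f/32
in_focus = blur_diameter(f, 5.0, 2.0, v)    # object at the focus distance
```

In this model the blur at f/5 is 32/5 = 6.4 times the blur at f/32, and it vanishes for the object at the focus distance.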
Thin Lens: Depth of Field
The depth of field is the distance between the nearest and furthest objects in the scene that appear
acceptably in focus. That is, they are blurred by no more than a small fixed diameter.
[Figure: image pair. F-number: 5 (i.e., “f/5” on the lens); F-number: 32 (i.e., “f/32”).]
Since the size of the blurred region is inversely proportional to the f-number, a larger f-number provides
a larger depth of field. This is illustrated by the image pair above (from Wikipedia, depth of field).

Optical Blur, Sensor Elements, and Aliasing
Due to diffraction effects and the physical area of the light sensing elements (e.g., individual CCD
sensors), the incident light sensed by any camera has been spatially averaged over a small region in the
image plane. This (analogue) averaging plays a critical role in image formation.
A perspective image of an infinite checkerboard is rendered by a pinhole camera model (above left).
Due to the point sampling, the checks in the distance appear distorted. This is called “aliasing”. Given
a more appropriate model for the analogue optical blur this aliasing is eliminated (above right).

Resampling and Aliasing
Downsampling an image refers to reducing the number of pixels. E.g., downsampling by 2 uses every
second pixel in every second row. (This is also called decimation.) Before downsampling, care must
be taken that aliasing isn’t introduced in the downsampled image.
Resampling Rule of Thumb. One can safely downsample an image by K, in each direction x and y, only
if the original image is smooth enough that any point in the original image can be approximated (say
using bilinear interpolation) given only the 4 nearest downsampled neighbours.
Otherwise the image should first be blurred (next lecture), then downsampled.
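The effect of skipping the blur can be seen on a checkerboard with one-pixel checks. In the sketch below, plain decimation by 2 returns a uniformly black image, while blurring first returns the correct mid-grey average. The 2 × 2 box filter is an assumption standing in for whatever blur is chosen, and the function names are illustrative.

```python
def decimate(image, k=2):
    """Keep every k-th pixel in every k-th row."""
    return [row[::k] for row in image[::k]]


def box_blur(image, k=2):
    """Replace each pixel by the mean of the k x k block starting at it (clamped at borders)."""
    h, w = len(image), len(image[0])
    blurred = []
    for r in range(h):
        out_row = []
        for c in range(w):
            block = [image[min(r + dr, h - 1)][min(c + dc, w - 1)]
                     for dr in range(k) for dc in range(k)]
            out_row.append(sum(block) / len(block))
        blurred.append(out_row)
    return blurred


# 8 x 8 checkerboard with one-pixel checks: far too fine to downsample directly.
checker = [[(r + c) % 2 for c in range(8)] for r in range(8)]

aliased = decimate(checker)               # every kept pixel is 0
smoothed = decimate(box_blur(checker))    # every kept pixel is 0.5
```

The decimated checkerboard fails the rule of thumb, since no interpolation of the all-zero result can recover the white checks, whereas the blurred version at least preserves the local average brightness.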
Other Issues in Image Projection and Formation
Intrinsic Calibration refers to a procedure to estimate the intrinsic parameters of the camera, namely
the parameters of the intrinsic calibration matrix $M_{in}$ (as, say, given in equation (7)), along with the
radial distortion parameters for the camera.
Extrinsic Calibration refers to estimating the extrinsic calibration matrix $M_{ex}$, with respect to some
predetermined world coordinate frame. (For both types of calibration, see the Camera Calibration
Toolbox for Matlab, by Jean-Yves Bouguet.)
Radiometry, Reflection and Colour. In order to synthesize an image we also require some
understanding of the measurement of light (i.e., radiometry), and reflectance (i.e., the interaction of light
with surfaces). See the additional readings on the course homepage for more information. Here we
will largely ignore these topics since firstly, we have enough on our plate already, and secondly, these
topics overlap with other courses (i.e., CSC320 and CSC418).
Digital Image Formation. A good overview is in Szeliski, Sec. 2.3.
Image Noise arises from most of the steps of digital image formation. In this course we will restrict
ourselves to simple noise models. Noise will be a constant companion from here on.

Aside: Orthographic Projection
Scaled orthographic projection provides a linear approximation to perspective projection, which is
applicable for a small object far from the viewer and close to the optical axis.
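[Figure: orthographic projection. The 3D point (X, Y, Z) projects parallel to the Z axis to the point (X, Y, 0) on the image plane, with image coordinates x and y.]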
Given a 3D point (X, Y, Z), the corresponding image location under scaled orthographic projection is
$$
\begin{pmatrix} x \\ y \end{pmatrix} = s_0 \begin{pmatrix} X \\ Y \end{pmatrix}.
$$
Here s0 is a constant scale factor; orthographic projection uses s0 = 1.
There are several other alternative approximations to perspective projection.
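To see when the approximation holds, the sketch below compares perspective and scaled orthographic projection for the same small object placed far from and near to the camera. Choosing the scale as s0 = f / Z0, with Z0 a reference depth for the object, is an assumption of this sketch (the notes only say s0 is a constant), as are the function names and the sample points.

```python
import math


def perspective(point, f):
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def scaled_orthographic(point, s0):
    X, Y, _ = point
    return (s0 * X, s0 * Y)


def max_error(points, f, z_ref):
    """Largest image-plane distance between the two projections, using s0 = f / z_ref."""
    s0 = f / z_ref
    worst = 0.0
    for P in points:
        px, py = perspective(P, f)
        ox, oy = scaled_orthographic(P, s0)
        worst = max(worst, math.hypot(px - ox, py - oy))
    return worst


# A small object near the optical axis, described by offsets from its reference depth.
offsets = [(0.5, -0.3, 0.4), (-0.4, 0.2, -0.4), (0.1, 0.5, 0.0)]


def object_at(depth):
    return [(dx, dy, depth + dz) for dx, dy, dz in offsets]


far_error = max_error(object_at(100.0), f=1.0, z_ref=100.0)
near_error = max_error(object_at(2.0), f=1.0, z_ref=2.0)
```

With the object 100 units away the two projections differ by well under 1e-4 image units, while at 2 units the discrepancy is several hundredths, which reflects the "small object far from the viewer" condition above.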

