Dictionary of Computer Vision and
Image Processing
R. B. Fisher
University of Edinburgh
K. Dawson-Howe
Trinity College Dublin
A. Fitzgibbon
Oxford University
C. Robertson
CEO, Epipole Ltd
C. Williams
University of Edinburgh
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Dictionary of Computer Vision and Image Processing / Robert B. Fisher . . . [et al.].
***
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
From Bob to Rosemary,
Mies, Hannah, Phoebe
and Lars
Preface ix
References xiii
PREFACE
portion of the field. Some of the concepts are also quite recent and, although
commonly used in research publications, have not yet appeared in mainstream
textbooks. Thus this book is also a useful source for recent terminology and
concepts.
Certainly some concepts are still missing from the dictionary, but we have
scanned both textbooks and the research literature to find the central and
commonly used terms.
Although the dictionary was intended for beginning and intermediate students
and researchers, as we developed the dictionary it was clear that we also had some
confusions and vague understandings of the concepts. It also surprised us that
some terms had multiple usages. To improve quality and coverage, each definition
was reviewed during development by at least two people besides its author. We
hope that this has caught any errors and vagueness, as well as reproduced the
alternative meanings. Each of the co-authors is quite experienced in the topics
covered here, but it was still educational to learn more about our field in the
process of compiling the dictionary. We hope that you find using the dictionary
equally valuable.
The authors would like to thank Xiang (Lily) Li and Georgios Papadimitriou for
their help with finding citations for the content from the first edition. We also
greatly appreciate all the support from the Wiley editorial and production team!
To help the reader, terms appearing elsewhere in the dictionary are underlined.
We have tried to be reasonably thorough about this, but some terms, such as 2D,
3D, light, camera, image, pixel and color were so commonly used that we decided
to not cross-reference all of these.
We have tried to be consistent with the mathematical notation: italics for scalars
(s), arrowed italics for points and vectors (→v), and boldface letters for matrices (M).
The reference for most of the terms has two parts: AAA: BBB. The AAA
component refers to one of the items listed below. The BBB component normally
refers to a chapter/section/page in reference AAA. Wikipedia entries (WP) are
slightly different, in that the BBB term is the relevant Wikipedia page
http://en.wikipedia.org/wiki/BBB
REFERENCES
Dictionary of Computer Vision and Image Processing, First Edition. By Robert B. Fisher et al.
ISBN *** © 2012 John Wiley & Sons, Inc.
18. C. Chatfield. The Analysis of Time Series: An Introduction, Chapman and Hall,
London, 4th edition, 1989.
19. C. M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.
20. B. Croft, D. Metzler, T. Strohman. Search Engines: Information Retrieval in
Practice, Addison-Wesley Publishing Company, USA, 2009.
21. B. Cyganek, J. P. Siebert. An Introduction to 3D Computer Vision Techniques and
Algorithms, Wiley, 2009.
22. T. M. Cover, J. A. Thomas. Elements of Information Theory, John Wiley & Sons,
1991.
23. R. O. Duda, P. E. Hart. Pattern Classification and Scene Analysis, James Wiley,
1973.
24. D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms,
Cambridge University Press, Cambridge, 2003.
25. D. Marr. Vision, Freeman, 1982.
26. A. Desolneux, L. Moisan, J.-M. Morel. From Gestalt Theory to Image Analysis,
Springer, 2008.
27. E. Hecht. Optics. Addison-Wesley, 1987.
28. E. R. Davies. Machine Vision, Academic Press, 1990.
29. E. W. Weisstein. MathWorld: A Wolfram Web Resource,
http://mathworld.wolfram.com/, accessed March 1, 2012.
30. F. R. K. Chung. Spectral Graph Theory, American Mathematical Society, 1997.
31. D. Forsyth, J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003.
32. J. Flusser, T. Suk, B. Zitová. Moments and Moment Invariants in Pattern
Recognition, Wiley, 2009.
33. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin. Bayesian Data Analysis,
Chapman and Hall, London, 1995.
34. A. Gersho, R. Gray. Vector Quantization and Signal Compression, Kluwer, 1992.
35. P. Green, L. MacDonald (Eds.). Colour Engineering, Wiley, 2003.
36. G. R. Grimmett, D. R. Stirzaker. Probability and Random Processes, Clarendon
Press, Oxford, Second edition, 1992.
37. G. H. Golub, C. F. Van Loan. Matrix Computations, Johns Hopkins University
Press, Second edition, 1989.
38. S. Gong, T. Xiang. Visual Analysis of Behaviour: From Pixels to Semantics,
Springer, 2011.
39. H. Freeman (Ed). Machine Vision for Three-dimensional Scenes, Academic Press,
1990.
40. H. Freeman (Ed). Machine Vision for Measurement and Inspection, Academic Press,
1989.
41. R. M. Haralick, L. G. Shapiro. Computer and Robot Vision, Addison-Wesley
Longman Publishing, 1992.
42. H. Samet. Applications of Spatial Data Structures, Addison Wesley, 1990.
43. T. J. Hastie, R. J. Tibshirani, J. Friedman. The Elements of Statistical Learning,
Springer-Verlag, 2008.
44. R. Hartley, A. Zisserman. Multiple View Geometry, Cambridge University Press,
2000.
1D: One dimensional, usually in reference to some structure. Examples include: 1) a signal x(t) that is a function of time t, 2) the dimensionality of a single property value or 3) one degree of freedom in shape variation or motion. [EH:2.1]
primarily for the detection and location of objects (e.g., underwater or in air, as in mobile robotics, or internal to a human body, as in medical ultrasound) by reflecting and intercepting acoustic waves. It operates with acoustic waves in an analogous way to that of radar, using both the time of flight and Doppler effects, giving the radial component of relative position and velocity. [WP:Sonar]

deformable curve representation such as a snake. The term active refers to the ability of the snake to deform shape to better match the image data. See also active shape model. [SQ:8.5]

active contour tracking: A technique used in model based vision where object boundaries are tracked in a video sequence using active contour models.
The noise added at each pixel (i, j) could be different. [SEU:3.2]

adjacent: Commonly meaning next to each other, whether in a physical sense of being connected pixels in an image, image regions sharing some

affine arc length: For a parametric equation of a curve f(u) = (x(u), y(u)), arc length is not preserved under an affine transformation. The affine length

τ(u) = ∫₀ᵘ (x′y″ − x″y′)^(1/3) du
[Figure: albedo values]

[Figure: a histogram f(x) with an antimode between modes at x = a and x = b]

arc of graph: Two nodes in a graph can be connected by an arc. The dashed lines in this figure are the arcs:

[Figure: a graph whose arcs are drawn as dashed lines]
astigmatism: A refractive error of an optical system in which light rays in different planes come to a focus at different distances, as in this example:

[Figure: astigmatism in an optical system]
navigation requires the visual tasks of route detection, self-localization, landmark location and obstacle detection, as well as robotics tasks such as route planning and motor control. [WP:Driverless car]

autoregressive model: A model that uses statistical properties of past behavior of some variable to predict future behavior of that variable. A signal x_t at time t satisfies an autoregressive model if x_t = Σ_{n=1}^{p} a_n x_{t−n} + ε_t, where ε_t is noise. [WP:Autoregressive model]

autostereogram: An image similar to a random dot stereogram in which the corresponding features are combined into a single image. Stereo fusion allows the perception of a 3D shape in the 2D image. [WP:Autostereogram]

axis of elongation: 1) The line that minimizes the second moment of the data points. If {x_i} are the data points, and d(x, L) is the distance from point x to line L, then the axis of elongation A minimizes Σ_i d(x_i, A)². Let μ be the mean of {x_i}. Define the scatter matrix S = Σ_i (x_i − μ)(x_i − μ)ᵀ. Then the axis of elongation is the eigenvector of S with the largest eigenvalue. See also principal component analysis. 2) The longer midline of the bounding box with largest length-to-width ratio. A possible axis of elongation is the line in this figure [JKS:2.2.3]:
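Definition 1) above reduces, for 2D points, to a closed-form eigenvector of the 2×2 scatter matrix. A minimal Python sketch (the function name is illustrative, not from the dictionary):

```python
import math

def axis_of_elongation(points):
    """Axis of elongation of 2D points: the unit eigenvector of the
    scatter matrix S = sum_i (x_i - mu)(x_i - mu)^T with largest eigenvalue."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # entries of the symmetric 2x2 scatter matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    # largest eigenvalue via the characteristic polynomial
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    # corresponding eigenvector (axis-aligned case handled separately)
    if abs(sxy) > 1e-12:
        v = (lam - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm), (mx, my)
```

For points spread along the line y = x, the returned direction is (±1/√2, ±1/√2), as expected.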
B-rep: See surface boundary representation. [BT:8]

b-spline snake: A snake made from b-splines.
background modeling:
Segmentation or change detection
method where the scene behind the
objects of interest is modeled as a fixed
or slowly changing background , with
possible foreground occlusions . Each
pixel is modeled as a distribution which
is then used to decide if a given
observation belongs to the background
or an occluding object. [ NA:3.5.2]
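One common form of the per-pixel distribution mentioned above is a single Gaussian updated online. A hedged sketch (parameter names and the thresholding rule are illustrative, not from the dictionary entry):

```python
class PixelBackground:
    """Per-pixel running Gaussian background model: mean and variance are
    updated with learning rate alpha; an observation further than k standard
    deviations from the mean is labeled foreground."""

    def __init__(self, init, alpha=0.05, k=2.5, min_var=4.0):
        self.mean, self.var = float(init), 100.0
        self.alpha, self.k, self.min_var = alpha, k, min_var

    def update(self, value):
        d = value - self.mean
        is_foreground = d * d > (self.k ** 2) * self.var
        if not is_foreground:  # only adapt the model to background observations
            self.mean += self.alpha * d
            self.var = max((1 - self.alpha) * (self.var + self.alpha * d * d),
                           self.min_var)
        return is_foreground
```

After observing a stable gray value for a while, a sudden large change is classified as foreground while the stable value remains background.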
background normalization: Removal of the background by some image processing technique to estimate the background image and then dividing or subtracting the background from an original image. The technique is useful when the background is non-uniform. The images below illustrate this: the first shows the input image, the second is the background estimate obtained by dilation with a ball(9, 9) structuring element and the third is the (normalized) division of the input image by the background image. [JKS:3.2.1]

backlighting: A method of illuminating a scene where the background receives more illumination than the foreground. Commonly this is used to produce silhouettes of opaque objects against a lit background, for easier object detection. [LG:2.1.1]

bandpass filter: A signal processing filtering technique that allows signals between two specified frequencies to pass but cuts out signals at all other frequencies. [FP:9.2.2]
images, aerial and ground-level photographs.

bundle adjustment: An algorithm used to optimally determine the three dimensional coordinates of points and camera positions from two dimensional image measurements. This is done by minimizing some cost function that includes the model fitting error and the camera variations. The bundles are the light rays between detected 3D features and each camera center. It is these bundles that are iteratively adjusted (with respect to both camera centers and feature positions). [FP:13.4.2]

burn-in: 1) A phenomenon of early tube-based cameras and monitors where, if the same image was presented for long periods of time, it became permanently burnt into the phosphorescent layer. Since the advent of modern monitors (1980s) this no longer happens. 2) The practice of shipping only electronic components that have been tested for long periods, in the hope that any defects will manifest themselves early in the component's life (e.g., 72 hours of typical use). 3) The practice of discarding the first several samples of an MCMC process in the hope that a very low-probability starting point will converge to a high-probability point before beginning to output samples. [NA:1.4.1]

butterfly filter: A linear filter designed to respond to butterfly patterns in images. A small butterfly filter convolution kernel is

0 -2 0
1  2 1
0 -2 0

It is often used in conjunction with the Hough transform for finding peaks in the Hough feature space, particularly when searching for lines. The line parameter values of (ρ, θ) will generally give a butterfly shape with a peak at the approximate correct values.
C
the baselines are parallel to the image Hough transforms , with the output of
planes and the horizontal axes of the one transform used as input to the
image planes are parallel. This results next.
in epipolar lines that are parallel to the
horizontal axes, hence simplifying the cascading Gaussians: A term
search for correspondences. referring to the fact that the
convolution of a Gaussian with itself is
Optical Centers Optical Axes another Gaussian. [ JKS:4.5.4]
CBIR: See
Y
content based image retrieval .
cartography: The study of maps and [ WP:Content-based image retrieval]
map-building. Automated cartography
is the development of algorithms that CCD: Charge-Coupled Device. A solid
reduce the manual effort in map state device that can record the number
building. [ WP:Cartography] of photons falling on it.
character verification: A process used to confirm that printed or displayed characters are within some tolerance that guarantees that they are readable by humans. It is used in applications such as labeling.

chi-squared test: A statistical test of the hypothesis that a set of sampled values has been drawn from a given distribution. See also chi-squared distribution. [WP:Chi-square test]

[Figure: chi-squared probability density of |X|², X ∈ R⁵] [WP:Chi-square distribution]

characteristic view: An approach to object representation in which an object is encoded by a set of views of
a circle is given by C = 4πA/P², where A is the area and P the perimeter.

color based image retrieval: An example of the more general image database indexing process, where one of the main indices into the image database comes from either color samples, the color distribution from a sample image, or a set of text color terms (e.g., red), etc. [WP:Content-based image retrieval#Color]

matching is expressed as: C = R R + G G + B B
conditional replenishment: A method for coding of video signals, where only the portion of a video image that has changed since the previous frame is transmitted. Effective for sequences with largely stationary backgrounds, but more complex sequences require more sophisticated algorithms that perform motion compensation.
constrained optimization: Optimization of a function f subject to constraints on the parameters of the function. The general problem is to find the x that minimizes (or maximizes) f(x) subject to g(x) = 0 and h(x) ≥ 0, where the functions f, g, h may all take vector-valued arguments, and g and h may also be vector-valued, encoding multiple constraints to be satisfied. Optimization subject to equality constraints is achieved by the method of Lagrange multipliers. Optimization of a quadratic form subject to equality constraints results in a generalized eigensystem. Optimization of a general f subject to general g and h may be achieved by

content based image retrieval: Image database searching methods that produce matches based on the contents of the images in the database, as contrasted with using text descriptors to do the indexing. For example, one can use descriptors based on color moments to select images with similar invariants. [WP:Content-based image retrieval]

context: In vision, the elements, information, or knowledge occurring together with or accompanying some data, contributing to the data's full meaning. For example, in a video

[Figure: an object, its convex hull, and the input image]
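For a quadratic objective with one equality constraint, the Lagrange multiplier conditions mentioned in the constrained optimization entry reduce to a linear (KKT) system. A hedged Python sketch, assuming the form 0.5 xᵀQx + cᵀx subject to aᵀx = b (all names illustrative):

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def equality_constrained_qp(Q, c, a, b):
    """Minimize 0.5 x^T Q x + c^T x subject to a^T x = b by solving the
    Lagrange (KKT) system  [Q a; a^T 0] [x; lam] = [-c; b]."""
    n = len(c)
    A = [[Q[i][j] for j in range(n)] + [a[i]] for i in range(n)]
    A.append(a[:] + [0.0])
    rhs = [-ci for ci in c] + [b]
    sol = solve_linear(A, rhs)
    return sol[:n], sol[n]
```

For example, minimizing x² + y² subject to x + y = 1 gives x = y = 0.5 with multiplier λ = −1.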
convexity ratio: Also known as solidity. A measure that characterizes deviations from convexity. The ratio for shape X is defined as area(X)/area(C_X), where C_X is the convex hull of X. A convex figure has convexity factor 1, while all other figures have convexity less than 1.
co-occurrence matrix: A representation commonly used in texture analysis algorithms. It records the likelihood (usually empirical) of two features or properties being at a given position relative to each other. For example, if the center of the matrix M is position (a, b) then the likelihood that the given property is observed at an offset (i, j) from the current pixel is given by matrix value M(a + i, b + j). [WP:Co-occurrence matrix]

coordinate system transformation: A geometric transformation that maps points, vectors or other structures from one coordinate system to another. It is also used to express the relationship between two coordinate systems. Typical transformations include translation and rotation. See also Euclidean transformation. [WP:Coordinate system#Transformations]
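A gray-level co-occurrence matrix of the kind used in texture analysis can be sketched as follows (a simplified illustration: gray levels are assumed already quantized to `levels`, and a single offset is used):

```python
def cooccurrence(image, offset, levels):
    """Normalized gray-level co-occurrence matrix: M[a][b] estimates how
    often gray level a occurs with gray level b at the given (di, dj) offset."""
    di, dj = offset
    rows, cols = len(image), len(image[0])
    M = [[0.0] * levels for _ in range(levels)]
    total = 0
    for i in range(rows):
        for j in range(cols):
            i2, j2 = i + di, j + dj
            if 0 <= i2 < rows and 0 <= j2 < cols:
                M[image[i][j]][image[i2][j2]] += 1
                total += 1
    return [[v / total for v in row] for row in M]
```

For the tiny image [[0, 1], [0, 1]] with horizontal offset (0, 1), the only co-occurring pair is (0, 1), so all the probability mass lands in M[0][1].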
cooperative algorithm: An algorithm that solves a problem by a series of local interactions between adjacent structures, rather than some global process that has access to all data. The value at a structure changes iteratively in response to changing values at the adjacent structures, such as pixels, lines, regions, etc. The expectation is that the process will converge to a good solution. The algorithms are well suited for massive local parallelism (e.g., SIMD), and are sometimes proposed as models for human image processing. An early algorithm to solve the stereo correspondence problem used

coplanarity: The property of lying in the same plane. For example, three vectors a, b and c are coplanar if their scalar triple product (a × b) · c is zero. [WP:Coplanarity]

coplanarity invariant: A projective invariant that allows one to determine when five corresponding points observed in two (or more) views are coplanar in the 3D space. The five points allow the construction of a set of four collinear points whose cross ratio value can be computed. If the five points are coplanar, then the cross ratio value must be the same in the two
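The coplanarity test via the scalar triple product is a few lines of code (names illustrative):

```python
def scalar_triple_product(a, b, c):
    """(a x b) . c -- zero exactly when the vectors a, b, c are coplanar."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return cx * c[0] + cy * c[1] + cz * c[2]

def coplanar(a, b, c, tol=1e-9):
    """True if the three vectors lie (numerically) in a common plane."""
    return abs(scalar_triple_product(a, b, c)) < tol
```

Three vectors in the z = 0 plane are coplanar; the three coordinate axes are not.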
cosine diffuser: Optical correction mechanism for correcting spatial responsivity to light. Since off-angle light is treated with the same response as normal light, a cosine transfer is used to decrease the relative responsivity to it.
core line: See medial line.

corner detection: See curve segmentation. [NA:4.6]

corner feature detectors: See interest point feature detectors and curve segmentation. [NA:4.6.4]

coronary angiography: A class of image processing techniques (usually based on X-ray data) for visualizing and inspecting the blood vessels surrounding the heart (coronaries). See also angiography. [WP:Coronary catheterization]

cosine transform: Representation of a signal in terms of a basis of cosine functions. For an even 1D function f(x), the cosine transform is

F(u) = 2 ∫₀^∞ f(x) cos(2πux) dx.

For a sampled signal f_0, ..., f_{n−1}, the discrete cosine transform is the vector b_0, ..., b_{n−1} where

b_0 = √(1/n) Σ_{i=0}^{n−1} f_i

and, for k ≥ 1,

b_k = √(2/n) Σ_{i=0}^{n−1} f_i cos( π(2i + 1)k / (2n) ).
score, where the sets being matched are the local neighborhoods of each pixel. [NA:5.3.1]

cross ratio: The simplest projective invariant. It generates a scalar from four points of any 1D projective space (e.g., a projective line). The cross ratio for the four points A, B, C, D below, with gaps r = |AB|, s = |BC| and t = |CD|, is [FP:13.1]:

(r + s)(s + t) / ( s(r + s + t) )

[Figure: collinear points A, B, C, D separated by gaps r, s, t]

cross-validation: A test of how well a model generalizes to other data (i.e., using samples other than those that were used to create the model). This approach can be used to determine when to stop training/learning, before over-generalization occurs. See also leave-one-out test. [FP:16.3.5]

crossing number: The crossing number of a graph is the minimum number of arc intersections in any drawing of that graph. A planar graph has crossing number zero. This graph has a crossing number of one [ERD:6.8.1]:

[Figure: a graph drawn with one arc crossing]

[Figure: the tangent, normal and binormal directions of a 3D curve]
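The cross ratio formula can be computed from 1D coordinates and its projective invariance checked against a fixed projective map of the line (the map is an arbitrary example, not from the dictionary):

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given as 1D coordinates, using
    the gap form (r+s)(s+t) / (s(r+s+t)) with r = b-a, s = c-b, t = d-c."""
    r, s, t = b - a, c - b, d - c
    return (r + s) * (s + t) / (s * (r + s + t))

def projective_map(x):
    """An example 1D projective (Moebius) transformation."""
    return (2 * x + 1) / (0.5 * x + 3)

points = [0.0, 1.0, 2.0, 4.0]
cr_before = cross_ratio(*points)
cr_after = cross_ratio(*[projective_map(x) for x in points])
```

The two values agree, illustrating the invariance; for these points the cross ratio is 1.5.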
cut detection: The identification of the frames in film or video where the camera viewpoint suddenly changes, either to a new viewpoint within the current scene or to a new scene. [WP:Shot transition detection]

cyclopean view: A term used in stereo image analysis, based on the mythical one-eyed Cyclops. When stereo reconstruction of a scene occurs based on two cameras, one has to consider what coordinate system to use

cylindrical mosaic: A photomosaicing approach where individual 2D images are projected onto a cylinder. This is possible only when the camera rotates about a single axis or the camera center of projection remains approximately fixed with respect to the distance to the nearest scene points.

cylindrical surface region: A region of a surface that is locally cylindrical. A region in which all points have zero Gaussian curvature, and nonzero mean curvature.
D
[Figure: conjugate gradient search from a start point]

DFT: See discrete Fourier transform. [SEU:2.5.1]

diffeomorphism: A differentiable one-to-one map between manifolds. The map has a differentiable inverse. [WP:Diffeomorphism]

difference-of-Gaussians operator: A convolution operator used to locate edges in a gray-scale image using an approximation to the Laplacian of Gaussian operator. In 2D the convolution mask is:

c₁ e^(−(x² + y²)/(2σ₁²)) − c₂ e^(−(x² + y²)/(2σ₂²))
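A difference-of-Gaussians mask of this form can be sampled directly; here each Gaussian is normalized to unit sum (one common choice of c₁, c₂; names illustrative):

```python
import math

def dog_kernel(sigma1, sigma2, half):
    """2D difference-of-Gaussians mask: difference of two normalized
    Gaussians with sigma1 < sigma2, approximating a Laplacian of Gaussian."""
    size = 2 * half + 1

    def gauss2d(sigma):
        k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
              for x in range(-half, half + 1)]
             for y in range(-half, half + 1)]
        s = sum(sum(row) for row in k)
        return [[v / s for v in row] for row in k]

    g1, g2 = gauss2d(sigma1), gauss2d(sigma2)
    return [[g1[i][j] - g2[i][j] for j in range(size)] for i in range(size)]
```

With this normalization the mask sums to zero (no response on constant regions) and has a positive center lobe.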
blood vessels are made more visible by using an X-ray contrast medium. See also medical image registration. [WP:Digital subtraction angiography]

digital elevation map: A sampled and quantized map where every point represents a height above a reference ground plane (i.e., the elevation):

41  43  45  51  56  49  45  40
56  48  65  85  55  52  44  46
59  77  99  81 127  83  46  56
52 116  44  54  55 186 163 163
51 129  46  48  71 164  86  97
50  85 192 140 167  99  51  44
57  63  91 126 102  56  54  49
146 169 213 246 243 139 180 163
41  44  54  56  47  45  36  54

digital image processing: Image processing restricted to the domain of digital images. [WP:Digital image processing]

digitization: The process of making a sampled digital version of some analog signal (such as an image). [WP:Digitizing]
[ WP:Digital image processing] dihedral edge: The edge made by two
planar surfaces. A fold in a surface:
digital signal processor: A class of
co-processors designed to execute
processing operations on digitized
signals efficiently. A common
characteristic is the provision of a fast
multiply and accumulate function, e.g.,
a a + b c.
[ WP:Digital signal processor]
[ OF:6.2.5] 4
4
3
3
2
2
1
1
0
0
1
0
1
0
1
0
0
0
1
1
4 3 2 1 1 1 1 1 1 1
4 3 2 2 2 2 2 2 2 2
divisive clustering: Clustering/cluster analysis in which all items are initially considered as a single set (cluster) and subsequently divided into component subsets (clusters).

document mosaicing: Image mosaicing of documents.

downhill simplex: A method for finding a local minimum using a simplex (a geometrical figure specified by N + 1 vertices) to bound the optimal position in an N-dimensional space. See also optimization. [WP:Nelder-Mead method]

[Figure: downhill simplex search from an initial estimate]

low level vision. [BKPH:1.4]

earth mover's distance: A metric for

[Figure: motion of the observer between position A and position B]
eigenface: An eigenvector determined from a matrix A in which the columns

ellipsoid: A 3D volume in which all plane cross sections are ellipses or circles. An ellipsoid is the set of points (x, y, z) satisfying x²/a² + y²/b² + z²/c² = 1. Ellipsoids are used in computer vision as a basic shape primitive and can be combined with other primitives in order to describe a complex shape. [SQ:9.9]

elliptic snake: An active contour model of an ellipse whose parameters are estimated through energy minimization from an initial position.

elongatedness: A shape representation that measures how long a shape is with respect to its width (i.e., the ratio of the length of the bounding box to its width), as illustrated below. See also eccentricity. [WP:Elongatedness]

MPEG and JPEG image compression. [WP:Code]

endoscope: An instrument for visually examining the interior of various bodily organs. See also fiberscope. [WP:Endoscopy]

energy minimization: The problem of determining the absolute minimum of a multivariate function representing (by a potential energy-like penalty) the distance of a potential solution from the optimal solution. It is a specialization of the optimization problem. Two popular minimization algorithms in computer vision are the Levenberg-Marquardt and Newton optimization methods. [WP:Energy minimization]
−Σ_{x∈X} P(x) log P(x), with the understanding that 0 log 0 := 0. For a multivariate distribution, the joint entropy H(X, Y) of X, Y is

−Σ_{(x,y)∈X×Y} P(x, y) log P(x, y).

EM: See expectation maximization. [FP:16.1.2]

empirical evaluation: Evaluation of computer vision algorithms in order to characterize their performance by comparing the results of several algorithms on standardized test problems. Careful evaluation is a difficult research problem in its own right.
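The entropy formula, with the 0 log 0 := 0 convention handled by skipping zero probabilities, is one line of Python (base-2 logarithm chosen here so entropy is in bits; the joint entropy uses the same formula over pairs):

```python
import math

def entropy(probs):
    """H = -sum_x p(x) log2 p(x), with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A fair coin has entropy 1 bit; a deterministic outcome has entropy 0.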
pass. See also epipolar geometry. [FP:10.1.1]

[Figure: epipolar lines in an image]

error propagation: 1) The propagation of errors resulting from one computation to the next computation. 2) The estimation of the error (e.g., variance) of a process based on the estimates of the error in the input data and intermediate computations. [WP:Propagation of uncertainty]

essential matrix: In binocular stereo, a matrix E expressing a bilinear constraint between corresponding image points u, u′ in camera coordinates: u′ᵀ E u = 0. This constraint is the basis for several reconstruction algorithms. E is a function of the translation and rotation of the camera in the world reference frame. See also the fundamental matrix. [FP:10.1.2]

Euclidean transformation: A transformation that operates in Euclidean space (i.e., maintaining the Euclidean spatial arrangements). Examples include rotation and translation. Often applied to homogeneous coordinates. [FP:2.1.2] [SQ:7.3]

Euler angle: The Euler angles (α, β, γ) are a particular set of angles describing rotations in three dimensional space. [JKS:12.2.1]

Euler-Lagrange: The Euler-Lagrange equations are the basic equations in the calculus of variations, a branch of calculus concerned with maxima and minima of definite integrals. They occur, for instance, in Lagrangian mechanics and have been used in computer vision for a variety of optimizations, including for surface interpolation. See also variational approach and variational problem. [TV:9.4.2]

Euler number: The number of contiguous parts (regions) less the number of holes. Also known as the genus. [AJ:9.10]

even field: The first of the two fields in an interlaced video signal. [AJ:11.1]

even function: A function where f(−x) = f(x) for all x. [WP:Even and odd functions#Even functions]

event analysis: See event understanding. [WP:Event study]

[Figure: an image from a sequence and the movement detected in the image]

method works well even when there are missing values. [FP:16.1.2]

expectation value: The mean value of a function (i.e., the average expected value). If p(x) is the probability density function of a random variable x, the expectation of x is x̄ = ∫ p(x) x dx. [VSN:A2.2]

expert system: A system that uses available knowledge and heuristics to solve problems. See also knowledge based vision. [AL:11.2]

exponential smoothing: A method for predicting a data value (P_{t+1}) based on the previous observed value (D_t) and the previous prediction (P_t): P_{t+1} = α D_t + (1 − α) P_t, where α is a weighting value between 0 and 1. [WP:Exponential smoothing]

[Figure: observed values D_t and predictions P_t (α = 0.5) over time]
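The exponential smoothing recurrence is a one-line update; a minimal sketch (function name and initial prediction are illustrative):

```python
def exponential_smoothing(data, alpha, p0):
    """Apply P_{t+1} = alpha * D_t + (1 - alpha) * P_t, starting from
    the initial prediction p0; returns all predictions including p0."""
    preds = [p0]
    for d in data:
        preds.append(alpha * d + (1 - alpha) * preds[-1])
    return preds
```

With α = 0.5 and a constant observed value, the predictions converge geometrically toward that value: starting from 0 with observations 2, 2, 2 they are 0, 1, 1.5, 1.75.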
feature tracking: See feature based tracking. [TV:8.4.2]

feature vector: A vector formed by the values of a number of image features (properties), typically all associated with the same object or image. [SEU:2.6.1]

feedback: The use of outputs from a system to control the system's actions. [WP:Feedback]

Feret's diameter: The distance between two parallel lines at the extremities of some shape that are tangential to the boundary of the shape. Maximum, minimum and mean values of Feret's diameter are often used (where every possible pair of parallel tangent lines is considered).

fiber optics: A medium for transmitting light that consists of very thin glass or plastic fibers. It can be used to provide much higher bandwidth for signals encoded as patterns of light pulses. Alternately, it can be used to transmit images directly through rigidly connected bundles of fibers, so as to see around corners, past obstacles, etc. [EH:5.6]

fiberscope: A flexible fiber optic instrument allowing parts of an object to be viewed that would normally be inaccessible. Most often used in medical examinations. [WP:Fiberscope]

fiducial point: A reference point for a given algorithm, e.g., a fixed, known, easily detectable pattern for a calibration algorithm.

figure-ground separation: The segmentation of the area of the image representing the object of interest (the figure) from the remainder of the image (the background).
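For the maximum Feret diameter, the largest gap between parallel tangent lines equals the largest pairwise distance between boundary points, so a brute-force sketch suffices (the minimum Feret diameter needs a rotating-calipers style search instead):

```python
import math

def max_feret_diameter(points):
    """Maximum Feret diameter of a 2D point set: the largest distance
    between any two points, checked by brute force."""
    best = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = max(best, math.dist(points[i], points[j]))
    return best
```

For the corners of a unit square the maximum Feret diameter is the diagonal, √2.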
flow field: See optical flow field. [OF:9.2]

flow histogram: A histogram of the optical flow in an image sequence. This can be used, for example, to provide a qualitative description of the motion of the observer.

focal point: The point on the optical axis of a lens where light rays from an object at infinity (also placed on the optical axis) converge. [FP:1.2.2]

[Figure: focal point and focal length of a lens]

fractal representation: A representation based on self-similarity.
identification of people based on their

[Figure: original image, normal first derivative, Gaussian first derivative]

Gaussian smoothing: An image processing operation aimed to attenuate image noise, computed by convolution with a mask sampling a Gaussian distribution. [TV:3.2.2]

gaze direction tracking: Continuous gaze direction estimation (e.g., in a video sequence or a live camera feed).

gaze location: See gaze direction estimation.

generalized cone: A generalized cylinder in which the swept curve changes along the axis. [VSN:9.2.3]

[Figure: a generalized cone and its axis]

[WP:Genetic algorithm]

geodesic distance: The length of the shortest path between two points along some surface. This is different from the Euclidean distance, which takes no account of the surface. The following example shows the geodesic distance

Gestalt: German for shape. The Gestalt school of psychology, led by the German psychologists Wertheimer, Kohler and Koffka in the first half of the twentieth century, had a profound
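Gaussian smoothing of a 1D signal can be sketched as follows (kernel radius of 3σ and border replication are common choices, not prescribed by the entry):

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalized 1D Gaussian mask with radius about 3 sigma."""
    half = max(1, int(3 * sigma))
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-half, half + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    """Convolve a 1D signal with the Gaussian mask, replicating the border."""
    k = gaussian_kernel(sigma)
    half = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - half, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out
```

A constant signal passes through unchanged (the mask sums to 1), while an isolated noise spike is strongly attenuated.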
geometric representation: See geometric model. [WP:RGB color model#Geometric representation]

geometric shape: A shape that takes a relatively simple geometric form (such as a square, ellipse, cube, sphere, generalized cylinder, etc.) or that can be described as a combination of such geometric primitives. [WP:Tomahawk (geometric shape)]

gesture analysis: Basic analysis of video data representing human gestures, preceding the task of gesture recognition. [WP:Gesture recognition]

gesture recognition: The recognition of human gestures, generally for the purpose of human-computer interaction. See also hand sign recognition. [WP:Gesture recognition]
geometric transformation: A class of image processing operations that transform the spatial relationships in an image. They are used for the correction of geometric distortions and general image manipulation. A geometric transformation requires the definition of a pixel coordinate transformation together with an interpolation scheme. For example, a rotation does the following (figure not reproduced) [SEU:3.5].

Gibbs sampling: A method for probabilistic inference based on transition probabilities (between states). [WP:Gibbs sampling]
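A minimal sketch of the geometric transformation entry (not from the dictionary): a rotation about the image centre, using inverse coordinate mapping with nearest-neighbour interpolation as the (simplest possible) interpolation scheme:

```python
import math

def rotate_image(img, angle_deg):
    """Rotate about the image centre; nearest-neighbour interpolation."""
    h, w = len(img), len(img[0])
    a = math.radians(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse map: find where this output pixel came from.
            xs = math.cos(a) * (x - cx) + math.sin(a) * (y - cy) + cx
            ys = -math.sin(a) * (x - cx) + math.cos(a) * (y - cy) + cy
            xi, yi = round(xs), round(ys)
            if 0 <= xi < w and 0 <= yi < h:
                out[y][x] = img[yi][xi]
    return out

rotated = rotate_image([[1, 2],
                        [3, 4]], 90)
```

Mapping output pixels back to source coordinates (rather than forward-mapping input pixels) guarantees every output pixel gets a value.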
global: A global property of a mathematical object is one that depends on all components of the …

(Figure residue, not reproduced: an intensity profile and its gradient along an image row; a gradient filter implemented by convolution with the mask -1 0 1 / -2 0 2 / -1 0 1; and gradient space, with axes p and q showing vectors representing various surface orientations.)
gray scale moment: A moment that is based on image or region gray scales. See also binary moment.

gray scale morphology: See gray scale mathematical morphology. [SQ:7.2]

gray scale similarity: See gray scale correlation.

gray scale texture moment: A moment that describes texture in a gray scale image (e.g., the Haralick texture operator describes image homogeneity).

gray scale transformation: A general term describing a class of image processing operations that apply to gray scale images and simply manipulate the gray scale of pixels. Example operations include contrast stretching and histogram equalization.

… of a number of pixels) from the local neighborhood are used. Grid filters require a training phase where noisy data and corresponding ideal data are presented.

ground following: See ground tracking.

ground plane: The horizontal plane that corresponds to the ground (the surface on which objects stand). This concept is only really useful when the ground is roughly flat. The ground plane is highlighted in an accompanying figure (not reproduced). [WP:Ground plane]
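As a concrete instance of a gray scale transformation (a sketch, not from the dictionary), contrast stretching linearly remaps the observed gray range onto the full output range:

```python
def contrast_stretch(pixels, new_min=0, new_max=255):
    """Linearly map the observed gray scale range onto [new_min, new_max]."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:                       # flat image: nothing to stretch
        return list(pixels)
    scale = (new_max - new_min) / (hi - lo)
    return [round(new_min + (p - lo) * scale) for p in pixels]

# A low-contrast strip of pixels spread over the full 0..255 range:
stretched = contrast_stretch([100, 110, 120, 130])
```

Because the mapping depends only on each pixel's own value (plus two global statistics), it is a point operation, like all gray scale transformations.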
Hankel transform: A simplification of the Fourier transform for radially symmetric functions. [WP:Hankel transform]

hat transform: See Laplacian of Gaussian (also known as Mexican hat operator) and/or top hat operator. [WP:Top-hat transform]

Harris corner detector: A corner detector where a corner is detected if the eigenvalues of the matrix

    M = [ fi fi   fi fj ]
        [ fi fj   fj fj ]

are large and locally maximal (f(i, j) is the intensity at point (i, j), and fi, fj are its partial derivatives). To avoid explicit computation of the eigenvalues, the local maxima of det(M) - 0.004 trace(M)^2 can be used. This is also known as the Plessey corner finder. [WP:Harris affine region detector#Harris corner measure]

Hartley transform: Similar transform to the Fourier transform, … applied.

Helmholtz reciprocity: An observation by Helmholtz about the bidirectional reflectance distribution function fr(i, e) of a local surface patch, where i and e are the incoming and outgoing light rays respectively. The observation is that the reflectance is symmetric about the incoming and outgoing directions, i.e., fr(i, e) = fr(e, i). [FP:4.2.2]

Hessian: The matrix of second derivatives of a scalar function f of several variables. It can be used to design an orientation-dependent second derivative edge detector [FP:3.1.2]:

    H = [ ∂^2 f/∂i^2    ∂^2 f/∂i∂j ]
        [ ∂^2 f/∂j∂i    ∂^2 f/∂j^2 ]

heterarchical/mixed control: An approach to system control where control is shared amongst several systems.

heuristic search: A search process that employs common-sense rules (heuristics) to speed up search. [BB:4.4]

hexagonal image representation: An image representation where the pixels are hexagonal rather than …
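The Harris corner response can be sketched directly from the entry's formula (not from the dictionary; the window of gradient pairs and the constant k = 0.004 follow the entry, though k = 0.04 is also widely used):

```python
def harris_response(grads, k=0.004):
    """Corner response det(M) - k * trace(M)**2, where M is built from a
    window of (fi, fj) image gradient pairs as in the dictionary entry."""
    s_ii = sum(fi * fi for fi, fj in grads)
    s_ij = sum(fi * fj for fi, fj in grads)
    s_jj = sum(fj * fj for fi, fj in grads)
    det = s_ii * s_jj - s_ij * s_ij
    trace = s_ii + s_jj
    return det - k * trace * trace

# Gradient windows typical of a corner (two directions), a straight edge
# (one direction) and a flat region (no gradient):
corner = [(1, 0), (0, 1), (1, 0), (0, 1)]
edge = [(1, 0), (1, 0), (1, 0), (1, 0)]
flat = [(0, 0)] * 4
```

Both eigenvalues of M are large only at the corner, so only the corner window yields a positive response; edges go negative and flat regions stay near zero.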
(Histogram figure, not reproduced: frequency versus grey scale value 0-255.)

… anti-mode/trough in a bi-modal histogram for use in thresholding).

histogram equalization: An image enhancement operation that processes a single image and results in an image with a uniform distribution of …

hit and miss/hit or miss operator: A morphological operation where a new image is formed by ANDing (logical AND) together corresponding bits for every pixel of an input image and a structuring element. This operator is most appropriate for …
HK: See mean and Gaussian curvature shape classification. [WP:Gaussian curvature]

HK segmentation: See mean and Gaussian curvature shape classification. [WP:Gaussian curvature]

HMM: See hidden Markov model. [FP:23.4]

holography: The process of creating a three dimensional image (a hologram) by recording the interference pattern produced by coherent laser light that has been passed through a diffraction grating. [WP:Holography]

homogeneous, homogeneity: 1. (Homogeneous coordinates:) In projective n-dimensional geometry, a point is represented by an n + 1 element vector, with the Cartesian representation being found by dividing the first n components by the last one. Homogeneous quantities such as points are equal if they are scalar multiples of each other. For example, a 2D point is represented as (x, y) in Cartesian coordinates and in homogeneous coordinates by the point (x, y, 1) and any multiple thereof. 2. (Homogeneous texture:) A two (or higher) dimensional pattern, defined on a space S ⊂ R^2, for which some functions (e.g., mean, standard deviation) applied to a window on S have values that are independent of the position of the window. [WP:Homogeneous space]

homogeneous representation: A representation defined in projective space. [HZ:1.2.1]

homography: The relationship described by a homography transformation. [WP:Homography]

homography transformation: Any invertible linear transformation between projective spaces. It is commonly used for image transfer, which maps one planar image or region to another. The transformation can be estimated using four non-collinear point pairs. [WP:Homography]

homomorphic filtering: An image enhancement technique that simultaneously normalizes brightness and enhances contrast. It works by applying a high pass filter to the original image in the frequency domain, hence reducing intensity variation (which changes slowly) and highlighting reflection detail (which changes rapidly). [SEU:3.4.4]

homotopic transformation: A continuous deformation that preserves the connectivity of object features (e.g., skeletonization). Two objects are homotopic if they can be made the same by some series of homotopic transformations.

Hopfield network: A type of neural network mainly used in optimization problems, which has been used in object recognition. [WP:Hopfield net]
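The homogeneous coordinates and homography transformation entries can be sketched together (not from the dictionary; the translation matrix is an arbitrary example):

```python
def to_homogeneous(p):
    """Append the extra coordinate: (x, y) -> (x, y, 1)."""
    return (p[0], p[1], 1.0)

def from_homogeneous(q):
    """Recover Cartesian coordinates by dividing by the last component."""
    return (q[0] / q[2], q[1] / q[2])

def apply_homography(H, p):
    """Map a 2D point through a 3x3 homography matrix H."""
    v = to_homogeneous(p)
    q = tuple(sum(H[r][c] * v[c] for c in range(3)) for r in range(3))
    return from_homogeneous(q)

# A homography that happens to be a pure translation by (2, 3):
H = [[1, 0, 2],
     [0, 1, 3],
     [0, 0, 1]]
mapped = apply_homography(H, (1, 1))
```

Scaling H by any nonzero constant gives the same mapping, illustrating that homogeneous quantities are equal up to scale.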
… hyperspectral image. [WP:Hyperspectral imaging]

hypothesize and test: See hypothesize and verify. [JKS:15.1]

… camera. [TV:2.3]

(Figure residue, not reproduced: original image, image with Gaussian noise, image with salt and pepper noise; a lens/image plane diagram; and an interpretation tree search with legend X = consistency failure, M1-M3 = models 1-3, * = wildcard.)
Kirsch compass edge detector: A first derivative edge detector that computes the gradient in different directions according to which calculation mask is used. Edges have high gradient values, so thresholding the intensity gradient magnitude is one approach to edge detection. A Kirsch mask that detects edges at 45 degrees is [SEU:2.3.4]:

    -3  5  5
    -3  0  5
    -3 -3 -3

knowledge-based vision: A style of image interpretation that relies on multiple processing components capable of different image analysis processes, some of which may solve the same task in different ways. Linking the components together is a reasoning algorithm that knows about the capabilities of the different components, …

knowledge representation: A general term for methods of encoding knowledge in a computer. In computer vision systems, this is usually knowledge about recognizable objects and visual processing methods. A common knowledge representation scheme is the geometric model that records the 2D or 3D shape of objects. Other commonly used vision knowledge representation schemes are graph models and frames. [BT:9]

Koenderink's surface shape classification: An alternative to the more common mean curvature and Gaussian curvature 3D surface shape classification labels. Koenderink's scheme decouples the two intrinsic shape parameters into one parameter (S) that represents the local surface shape (including cylindrical, hyperbolic, spherical and planar) and a second parameter (C) that encodes the magnitude of the curvedness of the shape. The shape classes represented in Koenderink's classification scheme are illustrated in an accompanying figure (not reproduced).
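Applying the entry's 45-degree Kirsch mask can be sketched as follows (not from the dictionary; the tiny test image is an arbitrary diagonal step edge):

```python
KIRSCH_45 = [[-3,  5,  5],
             [-3,  0,  5],
             [-3, -3, -3]]

def apply_mask(img, mask, y, x):
    """Correlate a 3x3 mask with the neighbourhood centred at (y, x)."""
    return sum(mask[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

# A step edge oriented at 45 degrees (bright upper-right corner):
img = [[0, 9, 9],
       [0, 0, 9],
       [0, 0, 0]]
response = apply_mask(img, KIRSCH_45, 1, 1)
```

The mask weights sum to zero, so a flat region produces no response; thresholding the magnitude of responses like this one yields the edge map.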
… adjacent gradient values that lie across (as contrasted with along) the edge to be set to zero. [RN:6.2]

… computation for the iterative and sorting algorithms, but can be more robust to outliers than the least mean square estimator. [JKS:13.6.3]

Laws texture energy measure: A measure of the amount of image intensity variation at a pixel. The measure is based on 5 one dimensional finite difference masks convolved orthogonally to give 25 2D masks. The 25 masks are then convolved with the image. The outputs are smoothed nonlinearly and then combined to give 14 contrast and rotation invariant measures. [PGS:4.6]

least square curve fitting: A least mean square estimation process that fits a parametric curve model or a line to a collection of data points, usually 2D or 3D. Fitting often uses the Euclidean, algebraic or Mahalanobis distance to evaluate the goodness of fit. An example of least square ellipse fitting appears in an accompanying figure (not reproduced). [FP:15.2-15.3]
least mean square estimation: Also known as least square estimation or mean square estimation. Let v be the parameter vector that we are searching for and e_i(v) be the error measure associated with the ith of N data items. The error measure often used is the Euclidean, algebraic or Mahalanobis distance between the ith data item and a curve or surface being fit, that is parameterized by v. Then the mean square error is:

    (1/N) Σ_{i=1}^{N} e_i(v)^2

The desired parameter vector v minimizes this sum. [WP:Least squares]

least square estimation: See least mean square estimation.

least squares fitting: A general term for a least mean square estimation process that fits some parametric shape, such as a curve or surface, to a collection of data. Fitting often uses the Euclidean, algebraic or Mahalanobis distance to evaluate the goodness of fit. [BB:A1.9]

least median of squares estimation: Let v be the parameter vector that we are searching for and e_i(v) be the error associated with the ith of N data items. The error measure often used is the Euclidean, algebraic or Mahalanobis distance between the ith data item and a curve or surface being fit that is parameterized by v. Then the median square error is the median or middle value of the sorted set { e_i(v)^2 }. The desired parameter vector v minimizes this median value. This estimator usually requires more …

least square surface fitting: A least mean square estimation process that fits a parametric surface model to a collection of data points, usually range data. Fitting often uses the Euclidean, algebraic or Mahalanobis distance to evaluate the goodness of fit. In an accompanying figure (not reproduced), a range image has planar and cylindrical surfaces fitted to the data. [JKS:3.5]
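The two estimators above can be contrasted on a line-fitting problem (a sketch, not from the dictionary; the data set, with one gross outlier, is invented for illustration, and the median-of-squares search simply tries lines through every point pair):

```python
import statistics
from itertools import combinations

def lms_line(pts):
    """Closed-form least mean square fit of y = a*x + b."""
    n = len(pts)
    sx = sum(x for x, y in pts); sy = sum(y for x, y in pts)
    sxx = sum(x * x for x, y in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def lmeds_line(pts):
    """Least median of squares: keep the candidate line whose median
    squared residual over all points is smallest."""
    best, best_med = None, float("inf")
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        med = statistics.median((y - a * x - b) ** 2 for x, y in pts)
        if med < best_med:
            best, best_med = (a, b), med
    return best

# Points on y = x, plus one gross outlier:
pts = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 100)]
a_lms, b_lms = lms_line(pts)
a_med, b_med = lmeds_line(pts)
```

The mean square fit is dragged far from the true line by the single outlier, while the median square fit recovers y = x, illustrating the robustness claimed in the entry.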
(Figure, not reproduced: perspective imaging geometry; a scene object of height H at distance D is imaged through the lens, along the optical axis, onto the image plane at distance d with image height h.)
line spread function: The line spread function describes how an ideal infinitely thin line would be distorted after passing through an optical system. Normally, this can be computed by integrating the point spread functions of an infinite number of points along the line. [EH:11.3.5]

line thinning: See thinning. [JKS:2.5.11]

linear: 1) Having a line-like form. 2) A mathematical description for a process in which the relationship between some input variables x and some output variables y is given by y = Ax, where A is a matrix. [BKPH:6.1]

linear array sensor: A solid-state or semiconductor (e.g., CMOS) sensor in which all of the photosensitive elements are in a single 1D line. Typical linear array sensors have between 32 and 8192 elements and are used in line scan cameras.

linear discriminant analysis: See linear discriminant function. [SB:11.6]

linear discriminant function: Assume a feature vector x based on observations of some structure. (Assume that the feature vector is augmented with an extra term with value 1.) A linear discriminant function is a basic classification process that determines which of two classes or cases the structure belongs to based on the sign of the linear function l = a · x = Σ a_i x_i, for a given coefficient vector a. For example, to discriminate between unit side squares and unit diameter circles based on the area A, the feature vector is x = (A, 1) and the coefficient vector a = (1, -0.89). If l > 0, then the structure is a square; otherwise it is a circle. [SB:11.6]

linear features: A general term for features that are locally or globally straight, such as lines or straight edges.

linear filter: A filter whose output is a weighted sum of its inputs, i.e., all terms in the filter are either constants or variables. If { x_i } are the inputs (which may be pixel values from a local neighborhood or pixel values from the same position in different images of the same scene, etc.), then the linear filter output would be Σ a_i x_i + a_0, for some constants a_i. [FP:7]

linear regression: Estimation of the parameters of a linear relationship between two random variables X and Y given sets of samples x_i and y_i. The objective is to estimate the matrix A and vector a that minimize the residual r(A, a) = Σ_i ||y_i - A x_i - a||^2. In this form, the x_i are assumed to be noise-free quantities. When both variables are subject to error, orthogonal regression is preferred. [WP:Linear regression]

linear transformation: A mathematical transformation of a set of values by addition and multiplication by constants. If the set of values is a vector x, the general linear transformation produces another vector y = Ax, where y need not have the same dimension as x and A is a constant matrix (i.e., not a function of x). [SQ:2.2.1]

lip shape analysis: An application of computer vision to understanding the position and shape of human lips as part of face analysis. The goal might be face recognition or expression understanding.

lip tracking: An application of computer vision to following the …
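The square-versus-circle example in the linear discriminant function entry runs directly (a sketch, not from the dictionary; function names are invented):

```python
import math

def classify(area, coeff=(1.0, -0.89)):
    """l = a . x with x = (A, 1); positive -> square, negative -> circle."""
    x = (area, 1.0)
    l = sum(a * xi for a, xi in zip(coeff, x))
    return "square" if l > 0 else "circle"

unit_square_area = 1.0
unit_diameter_circle_area = math.pi * 0.5 ** 2   # pi/4, about 0.785
```

The threshold 0.89 sits roughly halfway between the two class areas (1.0 and about 0.785), which is why this single linear function separates them.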
… larger and smaller values of this variance. Large values of this property occur in highly textured or varying areas.

… calculus. For example, a square can be defined as:

    square(s) ⇐ polygon(s)
        & number_of_sides(s, 4)
        & ∃e1 ∃e2 (e1 ≠ e2
            & side_of(s, e1) & side_of(s, e2)
            & length(e1) = length(e2)
            & (parallel(e1, e2) | perpendicular(e1, e2))).

log-polar image: An image representation in which the pixels are not in the standard Cartesian layout but instead have a space varying layout. In the log-polar case, the image is parameterized by a polar coordinate θ and a radial coordinate r. However, unlike polar coordinates, the radial distance increases exponentially as r grows. The mapping from position (θ, r) to Cartesian coordinates is (a^r cos(θ), a^r sin(θ)), where a is some design parameter. Further, the amount of area of the image plane represented by each pixel grows exponentially with r, although the precise pixel size depends on factors like the amount of pixel overlap, etc. See also foveal image. The receptive fields of a log-polar image (courtesy of Herman Gomes) can be seen in the outer rings of an accompanying figure (not reproduced).

long baseline stereo: See wide baseline stereo.

long motion sequence: A video sequence of more than just a few frames in which there is significant camera or scene motion. The essential idea is that the 3D scene structure can be inferred by effectively a stereo vision process. Here the matched image features can be tracked through the sequence, instead of having to solve the stereo correspondence problem. If a long sequence is not available, then analysis could use optical flow or short baseline stereo.
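The log-polar mapping can be sketched as below (not from the dictionary; the design parameter a = 1.1 is an arbitrary choice):

```python
import math

def log_polar_to_cartesian(theta, r, a=1.1):
    """(theta, r) -> (a**r * cos(theta), a**r * sin(theta)); the radial
    distance grows exponentially with r."""
    rho = a ** r
    return rho * math.cos(theta), rho * math.sin(theta)

# Successive rings along theta = 0 are exponentially further apart:
x0, _ = log_polar_to_cartesian(0.0, 0)
x1, _ = log_polar_to_cartesian(0.0, 1)
x2, _ = log_polar_to_cartesian(0.0, 2)
```

The ratio between successive ring spacings is constant (equal to a), which is the exponential growth the entry describes.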
look-up table: Given a finite set of input values { x_i } and a function on these values, f(x), a look-up table records the values { (x_i, f(x_i)) } so that the value of the function f() can be looked up directly rather than recomputed each time. Look-up tables can be easily used for color remapping or standard functions of integer pixel values (e.g., the logarithm of a pixel's value). [BKPH:10.14]
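The entry's logarithm example can be sketched as follows (not from the dictionary; the 0 entry for pixel value 0 is a common convention, since log(0) is undefined):

```python
import math

# Precompute the log of every possible 8-bit pixel value once:
LOG_LUT = [0.0] + [math.log(v) for v in range(1, 256)]

def remap(pixels, lut):
    """Replace each pixel by a table look-up instead of recomputing f."""
    return [lut[p] for p in pixels]

out = remap([1, 2, 255], LOG_LUT)
```

For 8-bit images the table has only 256 entries, so the one-off cost of building it is quickly repaid over a whole image.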
… chain are the possible configurations of the problem. [WP:Markov chain Monte Carlo]

Markov random field (MRF): An image model in which the value at a pixel can be expressed as a linear weighted sum of the values of pixels in a finite neighborhood about the original pixel plus an additive random noise value. [JKS:7.4]

Marr's theory: A shortened term for Marr's theory of the human vision system. Some of the key stages in this integrated but incomplete theory are the raw primal sketch, full primal sketch, 2.5D sketch and 3D object recognition. [BT:11]

Marr-Hildreth edge detector: An edge detector based on multi-scale analysis of the zero-crossings of the Laplacian of Gaussian operator. [NA:4.3.3]

mask: A term for an m × n array of numbers or symbolic labels. A mask can be the smoothing mask used in a convolution, the target in a template matching or the kernel used in a mathematical morphology operation, etc. A simple mask for computing an approximation to the Laplacian operator appears in an accompanying figure (not reproduced). [TV:3.2]

… type style, or a particular face viewed at the right scale. It is similar to template matching except the matched filter can be tuned for spatially separated patterns. This is a signal processing term imported into image processing. [AJ:9.12]

matching function: See similarity metric. [DH:6.7]

matching method: A general term for finding the correspondences between two structures (e.g., surface matching) or sets of features (e.g., stereo correspondence). [JKS:15.5.2]

mathematical morphology operation: A class of mathematically defined image processing operations in which the result is based on the spatial pattern of the input data values rather than the values themselves. For example, a morphological line thinning algorithm would identify places in an image where a line description was represented by data more than 1 pixel wide (i.e., the pattern to match). As this is redundant, the thinning algorithm would choose one of the redundant pixels to be set to 0. Mathematical morphology operations can apply to both binary and gray scale images. An accompanying figure (not reproduced) shows a small image patch before and after a thinning operation. [SQ:7]
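As an example for the mask entry (not from the dictionary's own figure, which is not reproduced): the widely used 4-neighbour approximation to the Laplacian, applied by convolution:

```python
# The common 4-neighbour Laplacian approximation (an assumption here,
# standing in for the dictionary's unreproduced figure):
LAPLACIAN = [[0,  1, 0],
             [1, -4, 1],
             [0,  1, 0]]

def convolve_at(img, mask, y, x):
    """Apply a 3x3 mask at pixel (y, x); the mask is symmetric, so
    convolution and correlation coincide."""
    return sum(mask[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
spike = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]
flat_response = convolve_at(flat, LAPLACIAN, 1, 1)
spike_response = convolve_at(spike, LAPLACIAN, 1, 1)
```

The mask weights sum to zero, so uniform regions give zero response while intensity peaks and edges produce strong responses.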
motion discontinuity: When the smooth motion of either the camera or something in the scene changes, such as the speed or direction of motion. Another form of motion discontinuity is between two groups of adjacent pixels that have different motions.

… only linear motion along the optical axis with constant velocity. Another example might allow velocities and accelerations in any direction, but occasional discontinuities, such as for a bouncing ball. [BB:7]

motion estimation: Estimating the motion direction and speed of the camera or something in the scene. [RJS:5]

motion factorization: Given a set of tracked feature points through an image sequence, a measurement matrix can be constructed. This matrix can be factored into component matrices that represent the shape and 3D motion of the structure up to a 3D affine transform (which is removable using knowledge of the intrinsic camera parameters).

motion field: The projection of the relative motion vector for each scene …

motion representation: See motion model. [BB:7]

motion segmentation: See motion layer segmentation. [TV:8.6]

motion sequence analysis: The class of computer vision algorithms that process sequences of images captured close together in space and time, typically by a moving camera. These analyses are often characterized by assumptions on temporal coherence that simplify computation. [BB:7.3]

motion smoothness constraint: The assumption that nearby points in the image have similar motion directions and speeds, or similar optical flow.
multi-level: See multi-scale method.
multi-scale integration: 1)
Combining information extracted by
using operators with different scales . multi-scale representation: A
2) Combining information extracted representation having image features
from registered images with different or descriptions that belong to two or
scales. These two definitions could just more scales . An example might be
be two ways of considering the same zero crossings detected from
process if the difference in operator intensity images that have received
scale is only a matter of the amount of increasing amounts of
smoothing . An example of multi-scale Gaussian smoothing . A multi-scale
integration occurs combining edges model representation might represent
extracted from images with different an arm as a single generalized cylinder
amounts of smoothing to produce more at a coarse scale, two generalized
reliable edges. cylinders at an intermediate scale and
with a surface triangulation at a fine
multi-scale method: A general term scale. The representation might have
for a process that uses information results from several discrete scales or
obtained from more than one scale of from a more continuous range of scales,
image. The different scales might be as in a scale space . Below are zero
obtained by reducing the image size or crossings found at two scales of
by Gaussian smoothing of the image. Gaussian blurring.
Both methods reduce the [ WP:Scale space#Related multi-
spatial frequency of the information. scale representations#Related multi-
The main reasons for multi-scale scale representations]
M 161
G
multi-sensor geometry: The relative placement of a set of sensors, or multiple views from a single sensor but from different positions. One key consequence of the different placements is the ability to deduce the 3D structure of the scene. The sensors need not be the same type but usually are for convenience. [FP:11.4]

multi-spectral analysis: Using the observed image brightness at different wavelengths to aid in the understanding of the observed pixels. A simple version uses RGB image data. Seven or more bands, including several infrared wavelengths, are often used for satellite remote sensing analysis. Recent hyperspectral sensors can give measurements at 100-200 different wavelengths. [SQ:17.1]

multi-spectral image: An image containing data measured at more than one wavelength. The number of wavelengths may be as low as two (e.g., some medical scanners), three (e.g., RGB image data), or seven or more bands, including several infrared wavelengths (e.g., satellite remote sensing). Recent hyperspectral sensors can give measurements at 100-200 different wavelengths. The typical image representation uses a vector to record the different spectral measurements at each pixel of an image array. The following image shows the …

multi-spectral segmentation: Segmentation of a multi-spectral image. This can be addressed by segmenting the image channels individually and then combining the results, or alternatively the segmentation can be based on some combination of the information from the channels. [WP:Multispectral segmentation]

multi-spectral thresholding: A segmentation technique for multi-spectral image data. A common approach is to threshold each spectral channel independently and then logically AND together the resulting images. An alternative is to cluster pixels in a multi-spectral space and choose thresholds that select desired clusters. An example figure (not reproduced) shows a colored image first thresholded in the blue channel (0-100 accepted) and then ANDed with the thresholded green channel (0-100 accepted).
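The threshold-then-AND approach in the multi-spectral thresholding entry can be sketched as follows (not from the dictionary; the tiny two-channel image and the 0-100 acceptance ranges are arbitrary):

```python
def threshold(channel, lo, hi):
    """1 where lo <= value <= hi, else 0."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in channel]

def logical_and(a, b):
    """Pixel-wise AND of two binary images."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

blue  = [[ 40, 200],
         [ 90, 250]]
green = [[ 60,  80],
         [220,  30]]
mask = logical_and(threshold(blue, 0, 100), threshold(green, 0, 100))
```

Only pixels accepted in every channel survive, which is what makes the combined mask more selective than any single-channel threshold.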
mutual interreflection: See mutual illumination.

mutual information: The amount of information two pieces of data (such as two images A and B) share: I(A, B) = H(B) - H(B|A), where H(x) is the entropy. [CS:6.3.4]
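The formula I(A, B) = H(B) - H(B|A) can be estimated from paired samples (a sketch, not from the dictionary; the two tiny label sequences are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sample of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """I(A, B) = H(B) - H(B|A), estimated from paired samples."""
    h_b = entropy(b)
    n = len(a)
    h_b_given_a = 0.0
    for val in set(a):
        sub = [bi for ai, bi in zip(a, b) if ai == val]
        h_b_given_a += (len(sub) / n) * entropy(sub)
    return h_b - h_b_given_a

identical = [0, 0, 1, 1]
independent = [0, 1, 0, 1]
mi_same = mutual_information(identical, identical)   # B fully determined by A
mi_ind = mutual_information(identical, independent)  # B carries no info about A
```

Identical data share all their entropy, while independent data share none, which is why mutual information is popular as a registration similarity measure.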
N
… f(W; x). Typically, a neural network is trained to predict the relationship between the x's and y's of a given collection of training examples. Training means setting the weight matrices W to minimize the training error e(W) = Σ_i d(y_i, f(W; x_i)), where d measures the distance between the network output and a training example. Common choices for d(y, y′) include the 2-norm ||y - y′||^2. [FP:22.4]

Newton's optimization method: To find a local minimum of a function f : R^n → R from starting position x_0. Given the function's gradient ∇f and Hessian H evaluated at x_k, the Newton update is x_{k+1} = x_k - H^{-1}∇f. If f is a quadratic form then a single Newton step will directly yield the global minimum. For general f, repeated Newton steps will generally converge to a local optimum. [FP:3.1.2]

noise: A general term for the deviation of a signal away from its true value. In the case of images, this leads to pixel values (or other measurements) that are different from their expected values. The causes of noise can be random factors, such as thermal noise in the sensor, or minor scene events, such as dust or smoke. Noise can also represent systematic, but unmodeled, events such as short term lighting variations or quantization. Noise might be reduced or removed using a noise reduction method. Images without and with salt-and-pepper noise appear in an accompanying figure (not reproduced). [TV:3.1]

noise reduction: An image processing method that tries to …
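The quadratic-form property of Newton's method is easy to check in 2D (a sketch, not from the dictionary; the example function f(x, y) = x^2 + 2y^2 and starting point are arbitrary):

```python
def newton_step_2d(x, grad, hess):
    """One Newton update x - H^{-1} grad, inverting the 2x2 Hessian directly."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    gx, gy = grad
    return (x[0] - (inv[0][0] * gx + inv[0][1] * gy),
            x[1] - (inv[1][0] * gx + inv[1][1] * gy))

# f(x, y) = x^2 + 2*y^2: gradient (2x, 4y), constant Hessian diag(2, 4).
def grad_f(x, y):
    return (2 * x, 4 * y)

HESS = ((2, 0), (0, 4))
x1 = newton_step_2d((3.0, -5.0), grad_f(3.0, -5.0), HESS)
```

Because f is a quadratic form, the single step lands exactly on the global minimum at the origin; for general functions the step is repeated until convergence.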
non-maximal suppression: A technique for suppressing multiple responses (e.g., high values of gradient magnitude) representing a single edge or other feature. The resulting edges should be a single pixel wide. [JKS:5.6.1]

non-parametric clustering: A data clustering process, such as k-nearest neighbor, that does not assume an underlying probability distribution.

non-parametric method: A probabilistic method used when the form of the underlying probability distribution is unknown or multi-modal. Typical applications are to estimate the a posteriori probability of a classification given an observation. Parzen windows or k-nearest neighbor classifiers are often used. [WP:Non-parametric statistics]

non-rigid model representation: A model representation where the shape of the model can change, perhaps under the control of a few parameters. These models are useful for representing objects whose shape can change, such as moving humans or biological specimens. The differences in shape may occur over time or be between different instances. Changes in apparent shape due to perspective projection and observer viewpoint are not relevant here. By contrast, a rigid model would have the same actual shape irrespective of the viewpoint of the observer.

non-rigid motion: A motion of an object in the scene in which the shape of the object also changes. Examples include: 1) the position of a walking person's limbs and 2) the shape of a beating heart. Changes in apparent shape due to perspective projection and viewpoint are not relevant here.

non-rigid registration: The problem of registering, or aligning, two shapes that can take on a variety of configurations (unlike rigid shapes). For instance, a walking person, a fish, and facial features like the mouth and eyes are all non-rigid objects, the shape of which changes in time. This type of registration is frequently needed in medical imaging as many human body parts deform. Non-rigid registration is considerably more complex than rigid registration. See also alignment, registration, rigid registration.

non-rigid tracking: A tracking process that is designed to track non-rigid objects. This means that it can cope with changes in actual object shape as well as apparent shape due to perspective projection and observer viewpoint.

non-symbolic representation: A model representation in which the appearance is described by a numerical or image-based description rather than a symbolic or mathematical description. For example, non-symbolic models of a line would be a list of the coordinates of the points in the line or an image of the line. Symbolic object representations include the equation of the line or the endpoints of the line.

normal curvature: A plane that contains the surface normal n at point p to a surface intersects that surface to form a planar curve c that passes through p. The normal curvature is the curvature of c at p. The intersecting plane can be at any specified orientation about the surface normal. See [JKS:13.3.2].
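Non-maximal suppression in its simplest 1D form can be sketched as follows (not from the dictionary; the sample response profile is invented):

```python
def non_maximal_suppression(values):
    """Keep a value only where it is a strict local maximum of its 1D
    neighbourhood; suppress (zero) everything else."""
    out = [0] * len(values)
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < len(values) - 1 else float("-inf")
        if v > left and v > right:
            out[i] = v
    return out

# A smeared edge response three pixels wide becomes one pixel wide:
thinned = non_maximal_suppression([0, 3, 7, 4, 0, 1, 0])
```

In edge detection the comparison is done along the gradient direction at each pixel, which is what leaves edges a single pixel wide.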
normal distribution: See Gaussian distribution. [AJ:2.9]

normal flow: The component of optical flow in the direction of the intensity gradient. The orthogonal component is not locally observable because small motions in that direction do not change the appearance of local neighborhoods.

normalized correlation: 1) An image or signal similarity measure that scales the differences between the signals by a measure of the average signal strength:

    Σ_i (x_i - y_i)^2 / sqrt( (Σ_i x_i^2)(Σ_i y_i^2) )

This scales the difference so that it is less significant if the inputs are larger. The similarities lie in the range [0,1], where 0 is most similar. 2) A statistical cross correlation process where the correlation coefficient is normalized to lie in the range [-1,1], where 1 is most similar. In the case of two scalar variables, this means dividing by the standard deviations of the two variables. [RJS:6]

NOT operator: See invert operator. [SB:3.2.2]

novel view synthesis: A process whereby a new view of an object is synthesized by combining information from several images of the object from …

NP-complete: A concept in computational complexity covering a special set of problems. All of these problems currently can be solved, in the worst case, in time exponential O(e^N) in the number or size N of their input data. For the subset of exponential problems called NP-complete, if an algorithm for one could be found that executes in polynomial time O(N^p) for some p, then a related algorithm could be found for any other NP-complete problem. [SQ:12.5]

NTSC: National Television System Committee. A television signal recording system used for encoding video data at approximately 60 video fields per second. Used in the USA, Japan and other countries. [AJ:4.1]

nuclear magnetic resonance (NMR): An imaging technique based on the magnetic properties of atomic nuclei. Protons and neutrons within atomic nuclei generate a magnetic dipole that can respond to an external magnetic field. Several properties related to the relaxation of that magnetic dipole give rise to values that depend on the tissue type, thus allowing identification or at least visualization of the different soft tissue types. The measurement of the signal is a way of measuring the density of certain types of atoms, such as hydrogen in the case of biological NMR scanners. This technology is used for medical body …
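The two senses of normalized correlation can be sketched side by side (not from the dictionary; the sample signals are invented):

```python
import math
import statistics

def normalized_difference(x, y):
    """Sense 1: sum of squared differences scaled by the signal strengths;
    0 means identical signals."""
    num = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return num / math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))

def correlation_coefficient(x, y):
    """Sense 2: cross correlation normalized into [-1, 1] by subtracting the
    means and dividing by the standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return num / math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                           sum((yi - my) ** 2 for yi in y))

a = [1.0, 2.0, 3.0]
d_same = normalized_difference(a, a)
c_pos = correlation_coefficient(a, [2.0, 4.0, 6.0])   # same trend
c_neg = correlation_coefficient(a, [3.0, 2.0, 1.0])   # opposite trend
```

The second form is invariant to linear rescaling of either signal, which is why it is preferred for matching under illumination changes.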
plane is the 3D scene plane where all points are exactly in focus on the image plane (assuming a perfect lens and the optical axis is perpendicular to the image plane). The object plane is illustrated here:

[Figure: lens with optical axis, image plane and object plane]

object recognition: A general term for identifying which of several (or many) possible objects is observed in an image. The process may also include computing the object's image or scene position , or labeling the image pixels or image features that belong to the object. [ FP:21.4]

object representation: An encoding of an object into a form suitable for computer manipulation. The models could be geometric models , graph models or appearance models , as well as other forms. [ JKS:15.3]

object verification: A component of an object recognition process that attempts to verify a hypothesized object identity by examining evidence. Commonly, geometric object models are used to verify that object features are observed in the correct image positions. [ FP:18.5]

objective function: 1) The cost function used in an optimization process. 2) A measure of the misfit between the data and the model. [ SQ:2.3]

oblique illumination: See low angle illumination . [ WP:Microscopy#Oblique illumination]

observer: The individual (or camera) making observations. Most frequently this refers to the camera system from which images are being supplied. See also observer motion estimation . [ WP:Observer]

observer motion estimation: When an observer is moving, image data of the scene provides optical flow or trackable scene feature points . These allow an estimate of how the observer is moving relative to the scene, which is useful for navigation control and position estimation. [ BKPH:17.1]

obstacle detection: Using visual data to detect objects in front of the observer, usually for mobile robotics applications.

Occam's razor: An argument attributed to William of Occam (Ockham), an English nominalist philosopher of the early fourteenth century, stating that assumptions must not be needlessly multiplied when explaining something (entia non sunt multiplicanda praeter necessitatem). Often used simply to suggest that, other conditions being equal, the simplest solution must be preferred. Notice variant spelling Ockham. See also minimum description length . [ WP:Occam's razor]

occluding contour: The visible edge of a smooth curved surface as it bends away from an observer . The occluding contour defines a 3D space curve on the surface, such that a line of sight from the observer to a point on the space curve is perpendicular to the surface normal at that point. The 2D image of this curve may also be called the occluding contour. The contour can often be found by an edge detection process. The cylinder boundaries on both the left and right are occluding contours from our viewpoint [ FP:19.2]:
odd field: Standard interlaced video transmits all of the even scan lines in an image frame first and then all of the odd lines. The set of odd lines is the odd field. [ AJ:11.1]

open operator: A mathematical morphology operator applied to a binary image . The operator is a sequence of N erodes followed by N dilates , both using a specified structuring element . The operator is useful for separating touching objects and removing small regions. The right image was created by opening the left image with an 11-pixel disk kernel [ SB:8.15]:
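The open operator above (N erodes followed by N dilates) can be sketched in pure Python on a binary image stored as rows of 0/1 values; the square structuring element and all function names are illustrative choices, not the dictionary's:

```python
def erode(img, se=1):
    # Binary erosion with a (2*se+1)-square structuring element:
    # a pixel survives only if its whole neighborhood is set.
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-se, se + 1) for dx in range(-se, se + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, se=1):
    # Binary dilation: a pixel is set if any neighbor is set.
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-se, se + 1) for dx in range(-se, se + 1)))
             for x in range(w)] for y in range(h)]

def open_op(img, n=1, se=1):
    # N erodes followed by N dilates, as in the entry above.
    for _ in range(n):
        img = erode(img, se)
    for _ in range(n):
        img = dilate(img, se)
    return img
```

Applied to an image containing a solid 3x3 blob plus one isolated pixel, the opening removes the isolated pixel while restoring the blob, which is exactly the "removing small regions" behavior described above.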
[Figure: lens with focal point on the optical axis]
completely determine the image motion, as this has two degrees of freedom. The equation provides only one constraint, thus leading to an aperture problem . [ WP:Optical flow#Estimation of the optical flow]

optical center: See focal point . [ FP:1.2.2]

optical character recognition (OCR): A general term for extracting an alphabetic text description from an image of the text. Common specialisms include bank numerals, handwritten digits, handwritten characters, cursive text, Chinese characters, Arabic characters, etc. [ JKS:2.7]

optical flow: An instantaneous velocity measurement for the direction and speed of the image data across the visual field. This can be observed at every pixel, creating a field of velocity vectors. The set of apparent motions of the image pixel brightness values. [ FP:25.4]

optical flow boundary: The boundary between two regions where the optical flow is different in direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow field segmentation . The dashed line in this image is the boundary between optical flow moving left and right:

optical flow field: The field composed of the optical flow vector at each pixel in an image. [ FP:25.4]

optical flow field segmentation: The segmentation of an optical flow image into regions where the optical flow has a similar direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow boundary .

optical flow region: A region where the optical flow has a similar direction or magnitude. Regions can arise from objects moving in different directions, or surfaces at different depths. See also optical flow boundary .

optical flow smoothness constraint: The constraint that nearby pixels in an image usually have similar optical flow because they usually arise from projection of adjacent surface patches having similar motions relative to the observer . The constraint can be relaxed at optical flow boundaries .
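Optical flow field segmentation as defined above — grouping connected pixels whose flow directions are similar — can be sketched with a simple flood fill over the flow field; the angle threshold and names are illustrative assumptions:

```python
import math

def segment_flow(flow, angle_thresh=0.3):
    # Label 4-connected pixels whose flow directions differ by less than
    # angle_thresh radians: a toy optical flow field segmentation.
    h, w = len(flow), len(flow[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            labels[y][x] = next_label
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1:
                        a1 = math.atan2(*flow[cy][cx][::-1])
                        a2 = math.atan2(*flow[ny][nx][::-1])
                        d = abs(a1 - a2)
                        if min(d, 2 * math.pi - d) < angle_thresh:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
            next_label += 1
    return labels
```

On a field whose left half moves right and right half moves left, this yields two regions separated by an optical flow boundary, as in the figure described above.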
human body into limbs, head, and trunk. Part segmentation methods exist for both 2D and 3D data, that is, intensity images and range images , respectively. Various geometric models have been adopted for the parts, e.g., generalized cylinders , superellipses , and superquadrics . See also articulated object segmentation . [ BM:6.2.2]

partially constrained pose: A situation whereby an object is subject to a number of constraints restricting the number of admissible orientations or positions, but not fixing one univocally. For instance, cars on a road are constrained to rotate around an axis perpendicular to the road.

parameters, which is updated via a dynamical model and observation model to produce the new set representing the posterior distribution. See also condensation tracking . [ WP:Particle filter]

particle segmentation: A class of techniques for detecting individual instances of small objects (particles) like pebbles, cells, or water droplets, in images or sequences. A typical problem is severe occlusion caused by overlapping particles. This problem has been approached successfully with the watershed transform .

particle tracking: See condensation tracking .
based on only the magnitude (not the phase) of the Fourier transform . [ WP:Phase retrieval]

phase spectrum: The Fourier transform of an image can be decomposed into its phase spectrum and its power spectrum . The phase spectrum is the relative phase offset of the given spatial frequency . [ EH:11.2.1]

phase unwrapping technique: The process of reconstructing the true phase shift from phase estimates wrapped into [-π, π]. The true phase shift values may not fall in this interval but instead be mapped into the interval by addition or subtraction of multiples of 2π. The technique maximizes the smoothness of the phase image by adding or subtracting multiples of 2π at various image locations. See also Fourier transform . [ WP:Range imaging#Interferometry]

sensor, converting light to an electric signal. [ WP:Photodiode]

photogrammetry: A research area concerned with obtaining reliable and accurate measurements from noncontact imaging, e.g., a digital height map from a pair of overlapping satellite images. Consequently, accurate camera calibration is a primary concern. The techniques used overlap many typical of image processing and pattern recognition . [ FP:3.4]

photometric invariant: A feature or characteristic of an image that is insensitive to changes in illumination. See also invariant .

photometric decalibration: The correction of intensities in an image so that the same surface (at the same orientation) will give the same response regardless of the position in which it appears in the image.
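A minimal 1D sketch of the phase unwrapping technique described above: wherever consecutive wrapped estimates jump by more than π, a multiple of 2π is added or subtracted to keep the reconstructed phase smooth. The function name is an illustrative assumption:

```python
import math

def unwrap_phase(phases):
    # 1D phase unwrapping: shift each wrapped sample by the multiple of
    # 2*pi that brings it closest to the previously unwrapped sample,
    # maximizing smoothness of the reconstructed phase.
    out = [phases[0]]
    for p in phases[1:]:
        k = round((out[-1] - p) / (2 * math.pi))
        out.append(p + 2 * math.pi * k)
    return out
```

2D unwrapping over a phase image is substantially harder (the corrections must be consistent along every path), but the per-sample correction step is the same idea.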
a_1 = A(x_1), y_1 = B(a_1)
a_2 = A(x_2), y_2 = B(a_2)
a_3 = A(x_3), ...
a_i = A(x_i), y_i = B(a_i), ...
[Figure: pitch direction]
A = | a b |
    | c d |

a rotation matrix and ~t = (e, f)^T a translation vector. See also Euclidean , affine and homography transforms .

intensity values. See also intensity , intensity image , and intensity sensor .

pixel interpolation: See image interpolation . [ WP:Pixelation]
pixel counting: A simple algorithm to determine the area of an image region by counting the numbers of pixels composing the region. See also region . [ WP:Simulation cockpit#Aircraft Simpits]

pixel division operator: An operator taking as input two gray scale images, I1 and I2 , and returning an image I3 in which the value of each pixel is I3 = I1 / I2 .

pixel exponential operator: A low-level image processing operator taking as input one gray scale image, I1 , and returning an image I2 in which the value of each pixel is I2 = c * b^I1 . This operator is used to change the dynamic range of an image. The value of the basis b depends on the desired degree of compression of the dynamic range. c is a scaling factor. See also logarithmic transformation , pixel logarithm operator . The right image is 1.005 raised to the pixel values of the left image:

pixel jitter: A frame grabber must estimate the pixel sampling clock of a digital camera, i.e., the clock used to read out the pixel values, which is not included in the output signal of the camera. Pixel jitter is a form of image noise generated by time variations in the frame grabber's estimate of the camera's clock.

pixel logarithm operator: An image processing operator taking as input one gray scale image, I1 , and returning an image I2 in which the value of each pixel is I2 = c * log_b(| I1 + 1 |). This operator is used to change the dynamic range of an image (see also contrast enhancement ), such as for the enhancement of the magnitude of the Fourier transform . The base b of the logarithm function is often e, but it does not actually matter because the relationship between logarithms of any two bases is only one of scaling . See also pixel exponential operator . The right image is the scaled logarithm of the pixel values of the left image [ SB:3.3.1]:
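The pixel logarithm and pixel exponential operators above are one-line point operations; a minimal sketch, with the defaults (base e for the logarithm, base 1.005 for the exponential) taken from the examples in the entries:

```python
import math

def pixel_log(image, c=1.0, base=math.e):
    # I2 = c * log_b(|I1 + 1|); the +1 keeps the log defined at zero pixels.
    return [[c * math.log(abs(p + 1), base) for p in row] for row in image]

def pixel_exp(image, c=1.0, base=1.005):
    # I2 = c * b**I1, expanding rather than compressing the dynamic range.
    return [[c * base ** p for p in row] for row in image]
```

Applying `pixel_log` to a Fourier-transform magnitude image compresses the huge ratio between the central peak and the rest of the spectrum, which is the enhancement use mentioned above.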
and middle images (scaled by 255 for contrast here) [ SB:3.2.1.2]:

planar patch extraction: The problem of finding planar regions, or patches, most commonly in range images . Plane extraction can be useful, for instance, in 3D pose estimation , as several model-based matching techniques yield higher accuracy with planar than non-planar surfaces.
~y = A(~x − ~μ) so that ~x = A^(−1) ~y + ~μ, where ~μ is the mean vector. Usually only a subset of the components of ~y is sufficient to approximate ~x. The elements of this subset correspond to the largest eigenvalues of the covariance matrix. See also Karhunen-Loève transformation . [ FP:22.3.1]

primal sketch: A representation for early vision introduced by Marr , focusing on low-level features like edges. The full primal sketch groups the information computed in the raw primal sketch (consisting largely of edge, bar , end and blob feature information extracted from the images), for instance by forming subjective contours . See also Marr-Hildreth edge detection and raw primal sketch . [ RN:7.2]

primary color: A color coding scheme whereby a range of perceivable colors can be made by a weighted combination of primary colors. For example, color television and computer screens use red, green and blue light-emitting chemicals to produce these three primary colors. The ability to use only three colors to generate all others arises from the tri-chromacy of the human eye, which has cones that respond to three different color spectral ranges. See also additive and subtractive color. [ EH:4.4]

principal component basis space: In principal component analysis , the space generated by the basis formed by the eigenvectors, or eigendirections, of the covariance matrix. [ WP:Principal component analysis]

principal component representation: See principal component analysis . [ FP:22.3.1]

principal curvature: The maximum or minimum normal curvature at a surface point, achieved along a principal direction . The two principal curvatures and directions together completely specify the local surface shape. The principal curvatures in the two directions at the point X on the cylinder of radius r below are 0 (along axis) and 1/r (across axis). [ JKS:13.3.2]
quantization: See
spatial quantization . [ SEU:2.2.4]
quantization error: The approximation error created by the quantization of a continuous variable, typically using a regularly spaced scale of values. This figure

quasi-invariant: An approximation of an invariant . For instance, quasi-invariant parameterizations of image curves have been built by approximating the invariant arc length with lower spatial derivatives. [ WP:Quasi-invariant measure]
(that is, binary images in which each pixel is assigned to black or white at random), and a second image that is derived from the first. This figure

compression it will be hard to see the structure in the pixels with the low values. The left image shows the magnitude of a 2D Fourier transform with a single bright spot in the middle. The right image shows the logarithm of the left image, revealing more details. [ AJ:7.2]
recognition by parts: See recognition by components . [ WP:Object recognition (computer vision)#Recognition by parts]

recognition by structural decomposition: See recognition by components .

reconstruction: The problem of computing the shape of a 3D object or surface from one or more intensity or range images. Typical techniques include model acquisition and the

recursive region growing: A class of recursive algorithms for region growing . An initial pixel is chosen. Given an adjacency rule to determine the neighbors of a pixel (e.g., 8-adjacency ), the neighboring pixels are explored. If any meets the criteria for addition to the region, the growing procedure is called recursively on that pixel. The process continues until all connected image pixels have been examined. See also adjacent , image connectedness ,
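The recursive region growing scheme above can be sketched directly in Python; the intensity-difference criterion, 4-adjacency rule, and names are illustrative assumptions (the entry allows any adjacency rule and membership criterion):

```python
import sys

def region_grow(img, seed, thresh=10):
    # Recursive region growing from a seed pixel: a neighbor joins the
    # region if its value is within thresh of the seed value, and the
    # growing procedure then recurses on that neighbor.
    sys.setrecursionlimit(10000)
    h, w = len(img), len(img[0])
    region = set()
    seed_val = img[seed[0]][seed[1]]

    def grow(y, x):
        if (y, x) in region or not (0 <= y < h and 0 <= x < w):
            return
        if abs(img[y][x] - seed_val) > thresh:
            return
        region.add((y, x))
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-adjacency
            grow(y + dy, x + dx)

    grow(*seed)
    return region
```

In practice an explicit stack is often preferred over recursion to avoid exceeding the call-stack depth on large regions; the logic is otherwise identical.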
[Figure: refraction at the boundary between medium 1 and medium 2, showing the incident, reflected, and refracted rays]
region boundary extraction: The problem of computing the boundary of a region, for example, the contour of a region in an intensity image after color based segmentation .

region decomposition: A class of algorithms aiming to partition an image or region thereof into regions . See also region based segmentation . [ JKS:3.2]

region descriptor: 1) One or more properties of a region, such as compactness or moments . 2) The data structure containing all data pertaining to a region . For instance, for image regions this could include the region's position in the image (e.g., the coordinates of the center of mass ), the region's contour (e.g., a list of 2D coordinates), some indicator of the region shape (e.g., compactness or perimeter squared over area), and the value of the region's homogeneity index. [ NA:7.3]

region detection: A vast class of algorithms seeking to partition an image into regions with particular properties. See for details region identification , region labeling , region matching , region based segmentation . [ SOS:4.3.2]

region filling: A class of algorithms assigning a given value to all the pixels in the interior of a closed contour identifying a region . For instance, one may want to fill the interior of a closed contour in a binary image with zeros or ones. See also morphology , mathematical morphology , binary mathematical morphology . [ SOS:4.3.2]

region growing: A class of algorithms that construct a connected region by incrementally expanding the region, usually at the boundary . New data are merged into the region when the data are consistent with the previous region. The region is often redescribed after each new set of data is added to it. Many region growing algorithms have the form: 1) Describe the region based on the current pixels that belong to the region (e.g., fit a linear model to the intensity distribution). 2) Find all pixels adjacent to the current region. 3) Add an adjacent pixel to the region if the region description also describes this pixel (e.g., it has a similar intensity). 4) Return to step 1 as long as new pixels continue to be added. A similar algorithm exists for region growing with 3D points, giving a surface fitting . The data points could come from a regular grid (pixel or voxel) or from an unstructured list. In the latter case, it is harder to determine adjacency. [ JKS:3.5]

region identification: A class of algorithms seeking to identify regions with special properties, for instance, a human figure in a surveillance video, or road vehicles in an aerial sequence. Region identification covers a very wide area of techniques spanning many applications, including remote sensing , visual surveillance , surveillance , and agricultural and forestry surveying. See also target recognition , automatic target recognition (ATR), binary object recognition , object recognition , pattern recognition .

region invariant: 1) A property of a region that is invariant (does not change) after some transformation is applied to the region, such as translation , rotation or perspective projection . 2) A property or function which is invariant over a region .
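The four region growing steps above can be sketched as follows, using a mean-intensity region description (a simpler stand-in for the linear model the entry suggests); names and the tolerance are illustrative:

```python
def grow_region(img, seed, tol=10):
    # Region growing per the numbered steps above: describe the region by
    # its mean intensity, absorb adjacent pixels matching the description,
    # then redescribe and repeat until no pixel is added.
    h, w = len(img), len(img[0])
    region = {seed}
    changed = True
    while changed:
        mean = sum(img[y][x] for y, x in region) / len(region)   # step 1
        frontier = {(y + dy, x + dx) for y, x in region
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))}
        frontier = {(y, x) for y, x in frontier
                    if 0 <= y < h and 0 <= x < w} - region        # step 2
        new = {(y, x) for y, x in frontier
               if abs(img[y][x] - mean) <= tol}                   # step 3
        region |= new
        changed = bool(new)                                       # step 4
    return region
```

Recomputing the mean after each pass is the "redescribed after each new set of data" behavior: the description adapts as the region grows.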
region thereof, into parts (subregions) if a given homogeneity criterion is not satisfied over the region. See also region , region based segmentation , region merging . [ RJS:6]

registration: A class of techniques aiming to align , superimpose, or match two objects of the same kind (e.g., images, curves, models); more specifically, to compute the geometric transformation superimposing one to the other. For instance, image registration determines the region common to two images, thereby finding the planar transformation (rotation and translation) aligning them; similarly, curve registration determines the transformation aligning the similar (or same) part of two curves. This figure

regularization: A class of mathematical techniques to solve an ill-posed problem . In essence, to determine a single solution, one introduces the constraint that the solution must be smooth, in the intuitive sense that similar inputs must correspond to similar outputs. The problem is then cast as a variational problem , in which the variational integral depends both on the data and on the smoothness constraint. For instance, a regularization approach to the problem of estimating a function f from a set of values y1 , y2 , . . . , yn at the data points ~x1 , . . . , ~xn leads to the minimization of the functional

    H(f) = Σ_{i=1}^{N} ( f(~x_i) − y_i )² + λ Φ(f)

where Φ(f) is a smoothness (stabilizing) functional weighted by λ.
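A minimal numerical sketch of minimizing the functional above by gradient descent, with the stabilizer Φ(f) taken to be a discrete first-difference (smoothness) penalty on samples of f — an illustrative choice, not the dictionary's:

```python
def regularized_fit(y, lam=1.0, steps=2000, lr=0.1):
    # Minimize H(f) = sum_i (f_i - y_i)^2 + lam * sum_i (f_{i+1} - f_i)^2
    # over the sample values f_i by plain gradient descent. The second
    # term is a discrete smoothness stabilizer standing in for Phi(f).
    n = len(y)
    f = list(y)
    for _ in range(steps):
        g = [2 * (f[i] - y[i]) for i in range(n)]   # data-term gradient
        for i in range(n - 1):                      # smoothness gradient
            d = 2 * lam * (f[i + 1] - f[i])
            g[i] -= d
            g[i + 1] += d
        f = [f[i] - lr * g[i] for i in range(n)]
    return f
```

With `lam = 0` the fit reproduces the data exactly; increasing `lam` trades fidelity for smoothness, which is exactly the data-versus-smoothness balance the entry describes. For `y = [0, 10, 0]` and `lam = 1` the minimizer can be solved by hand as `[2.5, 5, 2.5]`.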
components, Gx and Gy , for each pixel. The gradient magnitude √(Gx² + Gy²) and orientation arctan(Gy/Gx) can then be estimated as for any 2D vector. See also edge detection , Canny edge detector , Sobel gradient operator , Sobel kernel , Deriche edge detector , Hueckel edge detector , Kirsch edge detector , Marr-Hildreth edge detector , O'Gorman edge detector , Robinson edge detector . [ JKS:5.2.1]

Roberts kernel: A pair of kernels, or masks, used to estimate perpendicular components of the image gradient within the Roberts cross gradient operator :

     0  1        1  0
    -1  0        0 -1

The masks respond maximally to edges oriented to plus or minus 45 degrees from the vertical axis of the image. [ JKS:5.2.1]

Robinson edge detector: An operator for edge detection, computing an estimate of the directional first derivatives of the image in eight directions. The image is convolved with the eight kernels, three of which are shown here:

     1  1  1      1  1  1     -1  1  1
     1 -2  1     -1 -2  1     -1 -2  1
    -1 -1 -1     -1 -1  1     -1  1  1

Two of these, typically those responding maximally to differences along the coordinate axes, can be taken as estimates of the two components of the gradient, Gx and Gy . The gradient magnitude √(Gx² + Gy²) and orientation arctan(Gy/Gx) can then be estimated as for any 2D vector. See also edge detection , Roberts cross gradient operator , Sobel gradient operator , Sobel kernel , Canny edge detector , Deriche edge detector , Hueckel edge detector , Kirsch edge detector , Marr-Hildreth edge detector , O'Gorman edge detector . [ SEU:2.3.5]

robust: A general term referring to a technique which is insensitive to noise or other perturbations. [ FP:15.5]

robust estimator: A statistical estimator which, unlike normal least square estimators , is not distracted by even significant percentages of outliers in the data. Popular robust estimators in computer vision include RANSAC , least median of squares , and M-estimators . See also outlier rejection . [ FP:15.5]

robust regression: A form of regression that does not use outlier values in computing the fitting parameters. For example, if doing a least square straight line fit to a set of data, normal regression methods use all data points, which can give distorted results if even one point is very far away from the true line. Robust processes either eliminate these outlying points or reduce their contribution to the results. The figure below shows a rejected outlying point [ JKS:6.8.3]:

[Figure: a line fit through the inliers, with one rejected outlier far from the line]
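The two 2x2 Roberts masks above reduce to a pair of diagonal differences, so the operator can be sketched without an explicit convolution routine; the function name is an illustrative assumption:

```python
def roberts_edges(img):
    # Apply the two Roberts masks [[0,1],[-1,0]] and [[1,0],[0,-1]]:
    # each output pixel is the magnitude of the two diagonal differences
    # over a 2x2 neighborhood, estimating the gradient at +/-45 degrees.
    h, w = len(img), len(img[0])
    mag = [[0.0] * (w - 1) for _ in range(h - 1)]
    for y in range(h - 1):
        for x in range(w - 1):
            g1 = img[y][x + 1] - img[y + 1][x]   # +45 degree mask
            g2 = img[y][x] - img[y + 1][x + 1]   # -45 degree mask
            mag[y][x] = (g1 * g1 + g2 * g2) ** 0.5
    return mag
```

A uniform image gives zero response everywhere; a vertical intensity step triggers both masks equally, since the step projects onto both diagonal directions.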
[Figure: roll direction]
scene coordinates: A 3D coordinate system that describes the position of scene objects relative to a given coordinate system origin. Alternative coordinate systems are camera coordinates , viewer centered coordinates or object centered coordinates . [ JKS:1.4.2]

scene labeling: The problem of identifying scene elements from image data, associating them to labels representing their nature and roles. See also labeling problem , region labeling , relaxation labeling , image interpretation , scene understanding . [ BB:12.4]

scene reconstruction: The problem of estimating the 3D geometry of a scene, for example the shape of visible surfaces or contours, from image data. See also reconstruction , shape from contour and the following shape from entries, architectural model , volumetric , surface and slice based reconstruction. [ WP:Computer vision#Scene reconstruction]

scene understanding: The problem of constructing a semantic interpretation of a scene from image data, that is, describing the scene in terms of object identities and relationships among objects. See also image interpretation , object recognition , symbolic object representation , semantic net , graph model , relational graph .

SCERPO: Spatial Correspondence, Evidential Reasoning and Perceptual Organization. A well known vision system developed by David Lowe that demonstrated recognition of complex polyhedral objects (e.g., razors) in a complex scene.

screw motion: A 3D transformation comprising a rotation about an axis ~a and translation along ~a. The general Euclidean transformation ~x -> R~x + ~t is a screw transformation if R~t = ~t. [ VSN:8.2.1]

search tree: A data structure that records the choices that could be made in a problem-solving activity, while searching through a space of alternative choices for the next action or decision. The tree could be explicitly created or be implicit in the sequence of actions. For example, a tree that records alternative model-to-data feature matching is a specialized search tree called an interpretation tree . If each non-leaf node has two children, we have a binary search tree. See also decision tree , tree classifier . [ DH:12.4.1]

SECAM: SECAM (Sequentiel Couleur avec Memoire) is the television broadcast standard in France, the Middle East, and most of Eastern Europe. SECAM broadcasts 819 lines per second. It is one of three main television standards throughout the world, the other two being PAL (see PAL camera ) and NTSC . [ AJ:4.1]

second derivative operator: A linear filter estimating the second derivative from an image at a given point and in a given direction. Numerically, a simple approximation of the second derivative of a 1D function f is the central (finite) difference, derived from the Taylor approximation of f:

    f''_i = ( f_{i+1} − 2 f_i + f_{i−1} ) / h² + O(h)

where h is the sampling step (assumed constant), and O(h) indicates that the truncation error vanishes as h does. A similar but more complicated approximation exists for estimating the
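The central difference above translates directly into code; a minimal sketch over the interior samples of a 1D signal (the function name is illustrative):

```python
def second_derivative(f, h=1.0):
    # Central-difference estimate (f[i+1] - 2*f[i] + f[i-1]) / h^2
    # at each interior sample of the 1D signal f.
    return [(f[i + 1] - 2 * f[i] + f[i - 1]) / (h * h)
            for i in range(1, len(f) - 1)]
```

Sampling f(x) = x² gives a constant estimated second derivative of 2, as expected, since the central difference is exact for polynomials up to degree three.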
SIMD: See single instruction multiple data . [ RJS:8]

similarity: The property that makes two entities (images, models, objects, features, shape, intensity values, etc.) or sets thereof similar, that is, resembling each other. A similarity transformation creates perfectly similar structures and a similarity metric quantifies the degree of similarity of two possibly non-identical structures. Examples of similar structures are 1) two polygons identical except for a change in size, and 2) two image neighborhoods whose intensity values are identical except for scaling by a multiplicative factor. The concept of similarity lies at the heart of several classic vision problems, including stereo correspondence , image matching , and geometric model matching . [ JKS:14.3]

similarity metric: A metric quantifying the similarity of two entities. For instance, cross correlation is a common similarity metric for image regions. For similarity metrics on specific objects encountered in vision, see feature similarity , graph similarity , gray scale similarity . See also point similarity measure , matching . [ DH:6.7]

similarity transformation: A transformation changing an object into a similar-looking one; formally, a conformal mapping preserving the ratio of distances (the magnification ratio). The transformation matrix, T, can be written as T = B^(-1) A B, where T and A are similar matrices, that is, representing the same transformation after a change of basis. Examples include rotation, translation, expansion and contraction (scaling). [ SQ:9.1]

simple lens: A lens composed of a single piece of refracting material, shaped in such a way to achieve the desired lens behavior. For example, a convex focusing lens. [ BKPH:2.3]

simulated annealing: A coarse-to-fine, iterative optimization algorithm. At each iteration, a smoothed version of the energy landscape is searched and a global minimum located by a statistical (e.g., random) process. The search is then performed at a finer level of smoothing, and so on. The idea is to locate the basin of the absolute minimum at coarse scales, so that fine-resolution search starts from an approximate solution close enough to the absolute minimum to avoid falling into surrounding local minima. The name derives from the homonymous procedure for tempering metal, in which temperature is lowered in stages, each time allowing the material to reach thermal equilibrium. See also coarse-to-fine processing . [ SQ:2.3.3]

single instruction multiple data (SIMD): A computer architecture allowing the same instruction to be simultaneously executed on multiple processors and thus different portions of the data set (e.g., different pixels or image neighborhoods). Useful for a variety of low-level image processing operations. See also MIMD , pipeline parallelism , data parallelism , parallel processing . [ RJS:8]

single photon emission computed tomography (SPECT): A medical imaging technique that involves the rotation of a photon detector array around the body in order to detect photons emitted by the decay of previously injected radionuclides. This technique is particularly useful for creating a volumetric image showing
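A minimal sketch of simulated annealing in its common temperature-schedule form (accept uphill moves with probability exp(-dE/T), lowering T in stages); this is a standard variant standing in for the coarse-to-fine description above, and all names, the schedule, and the test energy are illustrative assumptions:

```python
import math, random

def anneal(energy, x0, step=1.0, t0=5.0, cooling=0.99, iters=3000, seed=0):
    # Temperature-schedule simulated annealing for a 1D energy function.
    # High T lets the search cross barriers between local minima; as T
    # drops, the search settles into the best basin found so far.
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        ce = energy(cand)
        if ce < e or rng.random() < math.exp((e - ce) / t):
            x, e = cand, ce
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e

# A bumpy 1D landscape with many local minima and global minimum at x = 0.
bumpy = lambda x: x * x + 3 * math.sin(3 * x) ** 2
print(anneal(bumpy, 4.0))
```

Lowering T multiplicatively each iteration plays the role of the staged temperature reduction in the metal-tempering analogy; the `best_x` bookkeeping simply remembers the lowest-energy state ever visited.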
See also tilt , shape from texture . [ FP:9.4.1]

skew symmetry: A skew symmetric contour is a planar contour such that every straight line oriented at a fixed angle with respect to a particular axis, called the skew symmetry axis of the contour, intersects the contour at two points equidistant from the axis. An example [ BB:9.5.4]:

slant normalization: A class of algorithms used in handwritten character recognition, transforming slanted cursive characters into vertical ones. See handwritten character recognition , optical character recognition .
of optimization are all issues that must be dealt with in an effective snake. [ TV:5.4]

SNR: See signal-to-noise ratio . [ AJ:3.6]

Sobel edge detector: An edge detector based on the Sobel kernels . The edge magnitude image E is the square root of the sum of squares of the convolution of the image with horizontal and vertical Sobel kernels, given by E = √( (Kx * I)² + (Ky * I)² ). The Sobel operator applied to the left image gives the right image [ JKS:5.2.2]:

may be computed. See also fuzzy morphology .

soft morphology: See soft mathematical morphology .

soft vertex: A point on a polyline whose connecting line segments are almost collinear. Soft vertices may arise from segmentation of a smooth curve into line segments. They are called soft because they may be removed if the segments of the polyline are replaced by curve segments. [ JKS:6.6]

solid angle: Solid angle is a property of a 3D object: the amount of the unit sphere's surface that the object's projection onto the unit sphere occupies. The unit sphere's surface area is 4π, so the maximum value of a solid angle is 4π steradians [ FP:4.1.2]:
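The Sobel magnitude E = √((Kx * I)² + (Ky * I)²) above can be sketched in pure Python with the standard 3x3 Sobel kernels; the function name is an illustrative assumption:

```python
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel

def sobel_magnitude(img):
    # E = sqrt((Kx * I)^2 + (Ky * I)^2) at each interior pixel;
    # border pixels are left at zero for simplicity.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

On a vertical step from 0 to 10 the horizontal kernel responds with |gx| = 40 (weights 1+2+1 times the step height) while gy is zero, so the magnitude ridge traces the edge.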
[Figure: surface with a surface normal N at point P, and a second surface point p]
(α, β) = ( √( ||~x − ~p||² − (~n · (~x − ~p))² ), ~n · (~x − ~p) ). The spin image is the histogram of all of the (α, β) values for the surface. Each selected point ~p leads to a different spin image. Matching points compares their spin images by correlation. Key advantages of the representation are 1) it is independent of pose and 2) it avoids ambiguities of representation that can occur with nearly flat surfaces. [ FP:21.4.2]

splash: An invariant representation of the region about a 3D point. It gives a local shape representation useful for position invariant object recognition.

spline: 1) A curve ~c(t) defined as a weighted sum of control points: ~c(t) = Σ_{i=0}^{n} w_i(t) ~p_i , where the control points are ~p_{0...n} and one weighting (or blending) function w_i is defined for each control point. The curve may interpolate the control points or approximate them. The construction of the spline offers guarantees of continuity and smoothness. With uniform splines the weighting functions for each point are translated copies of each other, so w_i(t) = w_0(t − i). The form of w_0 determines the type of spline: for B-splines and Bezier curves, w_0(t) is a polynomial (typically cubic) in t. Nonuniform splines reparameterize the t axis, ~c(t) = ~c(u(t)), where u(t) maps the integers k = 0..n to knot points t_{0..n} with linear interpolation for non-integer values of t. Rational splines with n-D control points are perspective projections of normal splines with (n + 1)-D control points. 2) Tensor-product splines define a 3D surface ~x(u, v) as a product of splines in u and v. [ JKS:6.7]

predicted at that point by a spline x(t) fitted to neighboring values. [ AJ:8.7]

split and merge: A two-stage procedure for segmentation or clustering . The data is divided into subsets, with the initial division being a single set containing all the data. In the split stage, subsets are repeatedly subdivided depending on the extent to which they fail to satisfy a coherence criterion (for example, similarity of pixel colors). In the merge stage, pairs of adjacent sets are found that, when merged, will again satisfy a coherence criterion. Even if the coherence criteria are the same for both stages, the merge stage may still find subsets to merge. [ VSN:3.3.2]

SPOT: Systeme Probatoire de l'Observation de la Terre. A series of satellites launched by France that are a common source of satellite images of the earth. SPOT-5 for example was launched in May 2002 and provides complete coverage of the earth every 26 days. [ WP:SPOT (satellites)]

spot detection: An image processing operation for locating small bright or dark locations against contrasting backgrounds. The issues here are what size of spot and amount of contrast.

spur: A short segment attached to a more significant line or edge . Spurs often arise when linear structures are tracked through noisy data, such as by an edge detector . This figure shows some spurs [ SOS:5.2]:
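The uniform spline formula above, ~c(t) = Σ w_i(t)~p_i with w_i(t) = w_0(t − i), can be sketched for the cubic B-spline case, where w_0 is the standard cubic B-spline basis function (nonzero on (−2, 2)); function names are illustrative:

```python
def b3(t):
    # Uniform cubic B-spline weighting function w0(t).
    t = abs(t)
    if t < 1:
        return (4 - 6 * t * t + 3 * t ** 3) / 6
    if t < 2:
        return (2 - t) ** 3 / 6
    return 0.0

def spline_point(ctrl, t):
    # c(t) = sum_i w_i(t) p_i with translated copies w_i(t) = w0(t - i),
    # for 2D control points ctrl = [p_0, ..., p_n].
    x = sum(b3(t - i) * p[0] for i, p in enumerate(ctrl))
    y = sum(b3(t - i) * p[1] for i, p in enumerate(ctrl))
    return (x, y)
```

Because the weights sum to one at any parameter value (partition of unity), collinear control points produce points on the same line; the curve approximates rather than interpolates the control points, as the entry notes B-splines may.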
For non-steerable filters such as Gabor filters, the response must be computed at each orientation, leading to higher computational complexity.

steganography: Concealing of information in non-suspect carrier data, for example, encoding information in the low-order bits of a digital image. [ WP:Steganography]

step edge: 1) A discontinuity in image intensity (compare with fold edge ). 2) An idealized model of a step change in intensity. This plot of intensity I versus position X shows a step edge: a discontinuity in intensity I at position a. [ JKS:5]

steradian: The unit of solid angle . [ FP:4.1.2]

stereo: General term for a class of problems in which multiple images of the same scene are used to recover a 3D property such as surface shape, orientation or curvature. In binocular stereo, two images are taken from different viewpoints, allowing the computation of 3D structure. In trifocal, trinocular and multiple-view stereo, three or more images are available. In photometric stereo , the viewpoint is the same, but lighting conditions are varied in order to compute surface orientation.

…are relative orientation : the rotation and translation relating the two cameras. This can be achieved in several ways: 1) conventional calibration of each camera independently; 2) computation of the essential matrix or fundamental matrix relating the pair, from which relative orientation may be computed along with one or two intrinsic parameters ; 3) for a rigid stereo rig, moving the rig and capturing multiple image pairs. [ TV:7.1.3]

stereo convergence: The angle between the optical axes of two sensors in a stereo configuration.

stereo correspondence problem: The key to recovering depth from stereo is to identify 2D image points that are projections of the same 3D scene point. Pairs of such image points are called correspondences. The correspondence problem is to determine which pairs of image points are correspondences. Unfortunately, matching features or image neighborhoods is usually ambiguous, leading to both massive amounts of computation and many alternative solutions. To reduce the space of matches, corresponding points are usually required to satisfy constraints such as similar orientation and contrast, local smoothness, and uniqueness of match. A powerful constraint is the epipolar constraint: from a single view, an image point is constrained to lie on a 3D ray, whose projection onto the second image is an epipolar curve. For pinhole cameras, the epipolar curve is a line. This greatly reduces the space of potential matches. [ JKS:11.2]
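The epipolar constraint can be applied as a match-pruning test, assuming a known 3×3 fundamental matrix F: a candidate match x2 for a point x1 is kept only if it lies near the line F x1 in the second image. This is an illustrative numpy sketch; the helper names are mine, not a standard API:

```python
import numpy as np

def epipolar_line(F, x1):
    """Epipolar line l2 = F @ x1 in the second image, as homogeneous (a, b, c)."""
    return F @ np.append(x1, 1.0)

def point_line_distance(line, x2):
    """Perpendicular pixel distance from point x2 to homogeneous line (a, b, c)."""
    a, b, c = line
    return abs(a * x2[0] + b * x2[1] + c) / np.hypot(a, b)
```

For a rectified pair the epipolar lines are horizontal, so the test reduces to comparing row coordinates; candidates whose distance exceeds a small threshold are discarded before any appearance matching.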
…(∂z/∂x, ∂z/∂y). The tilt angle may be defined as tan⁻¹((∂z/∂y)/(∂z/∂x)). [ FP:9.4.1]
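The tilt angle is just the orientation of the surface gradient. A two-argument arctangent handles the ∂z/∂x = 0 case that the raw quotient form cannot; a minimal sketch (the function name is mine):

```python
import math

def tilt_angle(dz_dx, dz_dy):
    """Tilt angle of a surface with gradient (dz_dx, dz_dy), in radians."""
    # atan2 resolves the quadrant and tolerates dz_dx == 0.
    return math.atan2(dz_dy, dz_dx)
```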
vanishing point: The image of the point at infinity where two parallel 3D lines meet. A pair of parallel 3D lines can be represented as ~a + λ~n and ~b + λ~n. The vanishing point is the image of the 3D direction, i.e., of the homogeneous point (~n, 0). This sketch shows the vanishing points (on the vanishing line) for a road and railroad [ TV:6.2.3]:

…where the truth term measures fidelity to the data and the beauty term is a regularizer. These can be seen in a specific example: smoothing. In the conventional approach, smoothing might be considered the result of an algorithm: convolve the image with a Gaussian kernel. In the variational approach, the smoothed signal P is the signal that best trades off smoothness, measured as the square of the second derivative, ∫ (P″(t))² dt, and fidelity to the data, measured as the squared difference between the input and the output, ∫ (P(t) − I(t))² dt, with the balance chosen by a parameter λ:

E(P) = ∫ [ (P(t) − I(t))² + λ (P″(t))² ] dt
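This energy can be minimized directly once discretized: replace the integral by a sum and P″ by second differences D, giving the linear system (Id + λ DᵀD) P = I. A minimal numpy sketch (names mine, not from the text):

```python
import numpy as np

def variational_smooth(I, lam):
    """Minimise sum (P - I)^2 + lam * sum (second difference of P)^2, n >= 3."""
    n = len(I)
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]   # discrete second derivative
    A = np.eye(n) + lam * D.T @ D          # normal equations of the energy
    return np.linalg.solve(A, np.asarray(I, float))
```

Signals with zero second difference (constants, ramps) are left unchanged, since the regularizer costs them nothing; with λ = 0 the output equals the input.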
viewing space: The set of all possible locations from which an object or scene could be viewed. Typically these locations are grouped to give a set of typical or characteristic views of the object. If orthographic projection is used, then the full 3D space of views can be simplified to a viewsphere .

viewpoint: The position and orientation of the camera when an image was captured. The viewpoint may be expressed in absolute coordinates or relative to some arbitrary coordinate system, in which case the relative position of the camera and the scene (or other cameras) is the relevant quantity.

viewsphere: The set of camera positions from which an object can be observed. If the camera is orthographic, the viewsphere is parameterized by the 2D set of points on the 3D unit sphere. At the camera position corresponding to a particular point on the viewsphere, all images of the object due to camera rotation are related by a 2D-to-2D image transformation, i.e., no parallax effects occur. See aspect graph . The placement of a camera on the viewsphere is illustrated here:
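Characteristic views are often generated by sampling the viewsphere approximately uniformly. One common way to do this (a sketch under that assumption, not a method from the text) is a Fibonacci spiral on the unit sphere:

```python
import math

def viewsphere_samples(n):
    """n approximately uniform viewpoints on the unit sphere (Fibonacci spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))   # golden-angle increment
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n           # evenly spaced heights
        r = math.sqrt(1.0 - z * z)              # radius of that latitude circle
        theta = golden * i
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts
```

Each sample is a candidate camera position; rendering the object from each gives the set of characteristic views.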
…seemingly normal room) and the Ponzo illusion.

…space given one or more images of it. Solutions differ according to several factors, including the number of input images (one, as in model based pose estimation ; multiple discrete images, as in stereo vision ; or video sequences, as in motion analysis ) and the a priori knowledge assumed (i.e., camera calibration available or not, full perspective or simplified projection model, geometric model of the target available or not).
…contain the given edge, as seen here [ JKS:13.5.1] (the figure shows the current edge, its linked edges, and faces 1 and 2).

…descriptions of the surface between the edges and, in particular, does not include information for hidden line removal. This is a wire frame model of a cube [ BT:8].
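A wire frame model as just described, vertices plus the edges joining them and nothing else, can be written down directly for the cube. A minimal sketch (the names are mine):

```python
from itertools import combinations

# A wire frame model stores only vertices and the edges joining them:
# no surface description, hence no hidden-line-removal information.
CUBE_VERTICES = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Two cube corners share an edge exactly when they differ in one coordinate.
CUBE_EDGES = [
    (i, j)
    for i, j in combinations(range(8), 2)
    if sum(a != b for a, b in zip(CUBE_VERTICES[i], CUBE_VERTICES[j])) == 1
]
```

Rendering such a model draws every edge regardless of visibility, which is exactly why hidden lines appear in wire frame displays.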