Artificial Vision for Robots
Edited by
Professor I. Aleksander
Kogan Page
Chapters 2 to 10 first published in
Digital Systems for Industrial Automation
by Crane Russak & Company Inc, 3 East 44th Street
New York, NY 10017, United States of America
Copyright © Crane Russak & Company Inc 1981 and 1982
Copyright © Chapters 1, 11 and 12 Igor Aleksander 1983
Softcover reprint of the hardcover 1st edition 1983
Chapter 1 Introduction
I. Aleksander
The Front End; Image Processing; Pattern Recognition;
Applications; Structure of the Book; Future Directions for Robot Vision
Part I: Techniques
The three key words that appear in the title of this book need some
clarification.
First, how far does the word robot reach in the context of indus-
trial automation? There is an argument maintaining that this range
is not fixed, but increases with advancing technology. The most
limited definition of the robot is also the earliest. The history is
worth following because it provides a convincing backdrop to the
central point of this book: vision is likely to epitomize the technolo-
gical advance, having the greatest effect in enlarging the definition
and range of activity of robots.
In the mid 1950s it was foreseen that a purely mechanical arm-like
device could be used to move objects between two fixed locations.
This was seen to be cost-effective only if the task was to remain fixed
for some time. The need to change tasks and therefore the level of
programmability of the robot was a key issue in the broadening of
robot activities. Robots installed in industry in the early 1960s
derived their programmability from a device called a pinboard. Ver-
tical wires were energized sequentially in time, while horizontal
wires, when energized, would trigger off elementary actions in the
manipulator arm. The task of reprogramming was a huge one, as
pins had to be reinserted in the board, connecting steps in time with
robot actions. Thousands of pins might have been involved, requir-
ing many hours, if not days, of work.
In the 1960s programmability became available through connec-
tions with large mainframe computers. Despite the awkwardness of
the umbilical connections that such a methodology demanded, large
motor car producers were not deterred from developing plans for
the deployment of armies of manipulator arms in the welding of cars
on automatic production lines. The automatic Fiat car assembly
plant in Turin was the product of such thinking.
In the meantime the initiative taken by the Digital Equipment
Corporation in putting on the market small machines, brought the
take robots from the laboratory into the factory. Even Engelberger,
whose activities thrive on the sales of sightless robots, wrote [1]:
'To date, robots have largely been insensate, but roboticists are striving
to correct this deficiency. When robots do boast of sight and touch the
list of applications ... will merit a large supplement.'
the control program would know when to branch between one part
of the program and the other. Now imagine the presence of a vision
camera and a recognition system that can be pointed at conveyor
belt A. Clearly there would be no need for special jigs as the vision
system could not only identify the parts but also inform the manipu-
lator as to where they were placed. Then, armed with a paintbrush,
the robot could mark the parts appropriately without having to pick
them up. This illustrates the way in which the use of vision transfers
the development overheads from mechanical complexity and preci-
sion to visual processing which, as already stressed, benefits from
favourable microprocessing economics.
The need for good vision for robots thus becomes undeniable, but
is it technologically feasible?
size from the full image. Also, some simple image processing tasks
can be executed in the frame store itself. At this stage it may be
worth explaining some of the terms used in robot vision studies.
Image Processing
Image processing has to be distinguished from pattern recognition.
Image processing implies a transformation of an image, whilst pat-
tern recognition makes a statement regarding the content of the
image.
Much can be achieved on the strength of image processing alone.
For example, some objects could be distinguished solely by meas-
urement of parameters such as area and perimeter of a silhouette-
type shape. Both of these tasks would fall under the heading image
processing.
Another typical image processing task is noise removal. Again,
this type of processing can be carried out by the frame store using
the following technique. A very small window (ie 3 x 3 pixels) is
scanned over the entire image, and a simple detection mask can be
made to remove any parts of the image that appear isolated.
Vision systems vary greatly in terms of how much reliance is placed
on image processing, and how much of a given method could be des-
cribed as pattern recognition. Examples of systems stressing each of
these two methodologies will be found in this book. Usually the
more general methods rely more heavily on pattern recognition,
whereas the more problem-oriented systems tend to use image pro-
cessing techniques.
Pattern Recognition
The history of pattern recognition dates back to the mid 1950s and is
rich with work ranging from the highly practical to the abstrusely
theoretical. The technical details concerning the major trends of this
branch of computer science or engineering are left for the main body
of this book (see Aleksander, Stonham and Wilkie, Chapter 10).
Here, it may be worth qualifying some of the issues that make pat-
tern recognition a central necessity for most robot systems. In pure
form, a pattern recognizer receives an image as input, and produces
a statement as to which of a limited number of objects the image
belongs. The two approaches employed are adaptive and algorithmic.
Applications
One way of discussing the applications of robot vision is to list those
things that clearly cannot be achieved by the sightless robot. This,
however, may be the wrong way to approach it. If it is assumed that
a robot has vision as a matter of course, it is soon discovered that the
robot is better able to carry out tasks that even sightless robots can
do, and does so more efficiently.
trol. Bin picking, as the name implies, is the ability to pick up a part
from a jumble of equal parts lying heaped up in a bin. Depalletiza-
tion is a similar problem, but one in which the parts are arranged on
a pallet in a more ordered way. Quality control demands the separa-
tion of faulty or damaged parts from good ones, even if the nature
of the damage is very slight. These applications have one character-
istic in common, and that is the likelihood that the parts cannot be
seen by the vision system as silhouettes. It is for this reason that Part
III is devoted to a methodology aimed at solving such non-silhouette
problems, even though it has, as yet, not become industrial practice.
I. Techniques;
II. Applications;
III. Adaptive processing for vision.
III. Adaptive processing for vision: This section of the book con-
tains four papers describing one particular development at Brunel
University in England. It is an in-depth study of the WISARD sys-
tem (derived mnemonically from the names of the authors: WIlkie,
Stonham and Aleksander's Recognition Device). This is an adap-
tive, general purpose recognition device, which is taught its task by
an operator. The WISARD philosophy is somewhat different from
the pattern recognition strategies discussed in the rest of the book. It
is a special network of microcircuit memory units operating with a
frame store that derives an image from a television camera. The
system can equally well receive input from almost any imaging
device. It is shown examples of images together with their desired
classifications, and subsequently classifies unknown images into the
taught classes. Being, in theory, a totally parallel system, it can pro-
vide up to two million decisions per second. In practice, such speeds
are excessive, particularly since imaging devices can currently
deliver only 50 images per second for standard TV cameras, increas-
ing to a few hundred images for solid-state cameras. It is for this
reason that the WISARD system is organized in a semi-parallel way
so that it operates faster than conventional computing systems with-
out the expense of total parallelism.
The first paper by Stonham (see Chapter 9) discusses the funda-
mental operation of memory networks, showing the way in which
their performance depends on the size of the network and other
important parameters. Some applications, both within and outside
robotics, are discussed to demonstrate the generality of the method.
The second paper (see Chapter 10) compares the WISARD meth-
odology to some of the other standard approaches, some of which
have been described in the earlier parts of this book. It is shown that
given special purpose construction of the computing equipment, the
parallel or semi-parallel systems as employed in the WISARD sys-
muscles in natural vision. This gives the robot eye the ability to 'look
around' objects, a helpful adjunct both in object recognition and
assembly strategy planning.
References
1. Engelberger, J. F. Robotics in Practice. Kogan Page, 1980.
2. Coiffet, P. Robot Technology Volume 2: Interaction with the Environment. Kogan Page, 1983.
3. Lindsay, P. H. and Norman, D. A. Human Information Processing: An Introduction to Psychology. Academic Press, New York, 1972.
Part I:
Techniques
Chapter 2
Software or Hardware for
Robot Vision?
J. D. DESSIMOZ* and P. KAMMENOS
Laboratoire de traitement des signaux
École Polytechnique Fédérale de Lausanne
Switzerland
Introduction
Industrial automation is a field of growing interest driven by the need for
breakthroughs in productivity. Many developments are being reported
where vision is included in the control loop that drives production tools,
e.g., [1, 2]. The Signal Processing Laboratory of the Swiss Federal
Institute of Technology in Lausanne has been studying a robot vision
system for the past few years. Several results on 2-D scene analysis have
been reported [3-12]. It is well known that for the same task, different
algorithms may be optimal (in speed, accuracy, and cost) depending on
the host equipment. Specialized hardware is always faster than a
general-purpose computer (GPC); specialized hardware, however, is
best suited to simple algorithms.
The question usually arises very early for a given application whether
*J. D. Dessimoz is now with the University of Rhode Island, USA.
Figure 2. A Gaussian filter (a) applied to the picture of Figure 1 leads to smooth
results (b). The iterative use (3 iterations) of a simple operator (c) applied to the
same picture leads to similar results (d)
If there is exactly one n-connected chain that contains all the non-zero
neighbors of P, then P can be removed.
The step towards the next edge element should be of minimal size in order to
avoid leaving the edge and jumping to another contour.
The angle defined by three consecutive tracker positions, past, present,
and future, is always maximal (minimal); this ensures that the tracker does
not enter the contour, but keeps on tracking its "left" ("right") edge.
0 0 0     0 1 1
0 P 0     0 P 0
1 1 1     1 1 1
 (a)       (b)       (c)
Closed loops are found when the trackers return to their starting posi-
tions.
polygonal and spline approximation [15, 16, 17], and straight-line and
circle-arc fitting [18]. Each of these strategies has its advantages. At the
extremes, however, curvilinear filtering [9] provides the best accuracy
(Fig. 6b), and a look-up table conversion (e.g., [4]) leads to the fastest
implementation (Fig. 6c).
The former technique takes into account noise and signal statistics in
the Fourier domain, and thus is closely related to the least square filter
(Wiener). Its peculiarity lies in the fact that curves need not be sampled
and part of the processing can be recursively designed.
A curve C is conveniently described by parametrized functions x(s)
and y(s):
C: {x(s), y(s)}   (2-2)
If x(s) and y(s) are quantized, they become piecewise constant func-
tions. The curve reconstructed from these piecewise constant functions
consists of isolated picture elements (pixels). Moving along the quan-
tized curve means repeatedly jumping extremely rapidly from one curve
element to the next, and then resting there for a finite amount of "time"
(Δs).
In images, spatial variables are more meaningful than time variables.
Therefore, the parameter s is usually defined as the curve arc length:
s = ∫ (dx² + dy²)^1/2   (2-3)
Figure 6. (a) Original data. (b) Smooth curves result from a curvilinear filtering.
Filter parameters are chosen differently in each case, depending on noise power.
(c) Data are smoothed using the look-up table of Figure 7
Figure 7. The look-up table used for curve smoothing
Figure 8. Polar mapping (b) is made independently for each element (pixel) of the
original picture represented on a cartesian grid (a). The simplified polar
representation (c) is obtained by computing the mean value of R for each angle
Figure 9. In the processor described in [12], the polar mapping of the image (a) is
performed by a polar raster scan (b)
Conclusion
Hints are given for the choice of algorithms suited to implementation
either on a general-purpose (micro) computer or on specialized hard-
ware. More valuable are perhaps the examples of image analysis
Figure 10. The orientation of the object is defined modulo 180 degrees by the
direction of a symmetry axis, found as the minimum of the odd component of the
polar signature R(θ); the object is then rotated to a reference position in which
the symmetry axis is parallel to the x-axis
Figure 11. The orientation is defined by an arbitrary vector. A short curvature function is associated with vector ends
Acknowledgment
The authors wish to thank Prof. F. de Coulon for many helpful discus-
sions and stimulating remarks.
References
1. Business Week, "Robots join the labor force," June 1980, pp 62-76.
2. IEEE Computer, special issue on machine perception, May 1980, pp 11-63.
3. F. de Coulon, P. Kammenos, "Polar coding of planar objects in industrial robot vision," Neue Technik, No. 10, 1977, pp 663-671.
4. P. Kammenos, "Performances of Polar Coding for Visual Localisation of Planar Objects," Proc. 8th International Symposium on Industrial Robots, Stuttgart, W. Germany, pp 143-154, May-June 1978.
5. J. D. Dessimoz, "Visual Identification and Location in a Multi-object Environment by Contour Tracking and Curvature Description," Proc. 8th Intern. Symp. on Industr. Robots, Stuttgart, Germany, pp 764-777, May-June 1978.
6. J. D. Dessimoz, M. Kunt, G. H. Granlund, J. M. Zurcher, "Recognition and handling of overlapping industrial parts," 9th Intern. Symp. on Industr. Robots, Washington, USA, March 13-15, 1979, pp 357-366.
7. P. Kammenos, "Extraction de contours en traitement électronique des images I: Principaux opérateurs de traitement," Bull. de l'ASE, Zurich, No. 11, Juin 1979, pp 525-531.
8. J. M. Zurcher, "Extraction de contours en traitement électronique des images II: Processeur spécialisé pour signal vidéo," Bull. de l'ASE, Zurich, No. 11, Juin 1979, pp 532-536.
9. J. D. Dessimoz, "Curve smoothing for improved feature extraction from digitized pictures," Signal Processing, 1, No. 3, July 1979, pp 205-210.
10. J. D. Dessimoz, "Specialised edge-trackers for contour extraction and line-thinning," Signal Processing, 2, No. 1, Jan. 1980, pp 71-73.
11. J. D. Dessimoz, "Sampling and smoothing curves in digitized pictures," Proc. 1st European Signal Processing Conference, EUSIPCO-80, Lausanne, Sept. 16-19, 1980, pp 157-165.
12. J. M. Zurcher, "Conception d'un système de perception visuel pour robot industriel," Comptes-rendus des Journées de Microtechnique, École Polytechnique Fédérale de Lausanne, Sept. 1978.
13. H. Blum, "A transformation for extracting new descriptors of shape," Models for the Perception of Speech and Visual Form, W. Wathen-Dunn, ed., MIT Press, 1967, pp 362-380.
14. C. Arcelli, L. Cordella, S. Levialdi, "Parallel thinning of binary pictures," Electronics Letters (GB), 11(7), 1975, pp 148-149.
15. T. Pavlidis, "Polygonal Approximations by Newton's Method," IEEE Transactions on Computers, vol. C-26, no. 8, Aug. 1977.
16. J. E. Midgley, "Isotropic Four-Point Interpolation," Computer Graphics and Image Processing, vol. 9, pp 192-196, 1979.
17. H. G. Barrow and R. J. Popplestone, "Relational Descriptions in Picture Processing," Artificial Intelligence, vol. 2, pp 377-396, 1971.
18. F. L. Bookstein, "Fitting Conic Sections to Scattered Data," Computer Graphics and Image Processing, 9, pp 56-71, 1979.
19. M. Berthod, J. P. Maroy, "Learning and syntactic recognition of symbols drawn on a graphic tablet," Computer Graphics and Image Processing, vol. 9, pp 166-182, 1979.
Chapter 3
Comparison of Five Methods for the
Recognition of Industrial Parts
J. POT
Jet Propulsion Laboratory, Pasadena, USA. Currently at
Laboratoire d'Automatique de Montpellier, France
P. COIFFET
Laboratoire d'Automatique de Montpellier, France
U.S.T.L.
P. RIVES
Bell Northern Ltée, Verdun, Canada. Currently at
Laboratoire d'Automatique de Montpellier, France
Introduction
Some computer vision systems already exist (1), and their utiliza-
tion will grow. For robots that have to manipulate objects, vision is
a very powerful method of carrying information about these ob-
jects.
This information is extremely complex. An image is composed of
a great number of pixels. Much research has been done to find
methods to interpret images. Nevertheless, the vision problem has
not been mastered. Some systems are working under severe con-
straints.
In this article, we describe a system working under the following
constraints: objects are stationary and non-overlapping; the cam-
era is situated above the working plane and its optic axis is perpen-
dicular to the plane; the lights are calibrated and there is reasonable
contrast between objects and background so that we can get a bina-
ry image using a fixed threshold.
Given these conditions, which are usually found in the vision
systems working in industry and which we are trying to relax (2),
this article describes some methods of object recognition
and the learning accomplished by the system. The goal is the
achievement of a system which:
Initialization
We choose the first feature. The initial set consists of all the initial
classes.
Iterations
Step 1: For one set of classes CK1, ..., CKn, we select 2 classes
CK0 and CK1, using a criterion.
Step 3: Using a criterion, we verify that these two classes are sepa-
rated. If not, we go to step 6.
Figure 2a. The discriminant tree separating the classes PYR0, PYR1, DSK0, DSK1,
CYL0, CYL1, CUBE and SPHE
Step 4: Each class of the initial set CK1, ..., CKn is attributed to
one of the two subsets, according to its position with respect to the
separating hyperplane. If a class straddles the hyperplane, it is at-
tributed to both subsets.
Step 5: If all the subsets have only one class, the process is complet-
ed. This means that each class can be described by a set of linear
discriminant functions. If not, we go to step 6.
For step 1
For step 2
For step 3
Two classes CK0 and CK1 are separated if all the samples of one
class are on the same side of the discriminant hyperplane.
For step 4
the projections of the centers of inertia of the two classes CK0 and
CK1.
When this algorithm is completed, we have a discriminant tree
(Figure 2a, b). This tree is represented by cells. Some cells charac-
terize the nodes of the tree. Each of these cells contains five values
(i1, i2, i3, i4, i5). For the node K, i5 is the number of coefficients of
the linear discriminant function. Other cells contain the coefficients
CKi and the threshold SK of this function. Thus, for a given image,
if we compute the values of the features P1, ..., Pn, we can compute
the value of the question at the node K, that is:
QK = Σ (i = 1 to i5) CKi × Pi − SK   (3-1)
Figure 2b. The tree generated by the learning algorithm (each node uses one or
two parameters)
SK = Σ (j = 1 to J) dkj = Σ (j = 1 to J) |xj − gkj| / σkj   (3-2)
Figure 3. Estimations of the densities of probability p(x1 | C) for the feature x1
(area): trapezoidal law, Gaussian law, and histogram (30 levels)
and α and β are such that 90% of the learning samples are between
x2 and x3, and 99% between x1 and x4, if we use a Gaussian model.
The normal law is given by the formula:

p(xj | Ck) = (1 / (σkj √(2π))) exp(−(xj − gkj)² / (2 σkj²))

where gkj and σkj are the average and standard deviation of the jth
feature for the class k.
The histograms are computed using a discretization of the field of
variation of the features into L parts. The choice of L is very impor-
tant. Some tests were made using L between 10 and 30.
If Nk is the number of learning samples of the class k, and Nkjl
the number of instances in which the value of the jth feature be-
longs to the lth interval, then the probability p(xjl | Ck) is given by
Nkjl / Nk.
Thus the recognition algorithm is as follows:
Initialization:
We first compute the value of the first feature of the object.
For each class k, we compute the a posteriori probabilities p(k | x1),
using the formula:

p(k | x1) = p(x1 | Ck) p(Ck) / Σk' p(x1 | Ck') p(Ck')
3. If p(k | x1, ..., xj) is greater than the threshold for a class k0, the
object is recognized. If not, we introduce a new feature and go to
step 1.
4. If all the features have been selected, and there is still an ambigui-
ty, we have two alternatives:
4.a) We can choose the class for which p(k | x1, ..., xj) is a maxi-
mum.
4.b) We can resolve the ambiguity using other sensors.
where g' and σ' are the adapted values of g and σ, x is the value of
the feature, and a is a coefficient which allows us to choose the
speed of the adaptation.
Conclusion
There is a big difference between the methods using linear discrimi-
nant functions and the others: the coefficients of the discriminant
functions cannot be adapted easily. Nevertheless, the method can be
used for specific problems, if the set of objects is fixed.
The use of histograms is the more theoretically justifiable
References
1. For instance: Optomation II (General Electric), OCTEK 4200, Opto Sense Visual Inspection System (Copperweld), VS110 (Machine Intelligence Corporation).
2. P. Rives, Utilisation d'une caméra solidaire de l'organe terminal d'un manipulateur dans une tâche de saisie automatique. Thèse 3e cycle, Montpellier, 1981.
Chapter 4
Syntactic Techniques in Scene Analysis
S. GAGLIO
P. MORASSO
V. TAGLIASCO
Introduction
The degree of complexity of manipulation processes in human and
advanced robots is such that it is not possible to think in terms of direct
visuo-motor transformations, such as those present in low-level animals
or simple industrial robots.
As an alternative to the inextricable difficulty of the direct approach, a
linguistic approach first of all provides a reduction of complexity by
passing from the world of processes to the world of abstract symbolic
descriptions. However, abstraction alone is not sufficient to define and
justify a language. A second requirement is to provide a sufficient
representative power to preserve much of the "richness" of the vis-
uomotor world.
In a linguistic approach, this power is obtained by adding to the
abstraction capability (linked to the definition of "primitive features") a
generative capability, that is, a system of rules which allow us to
assemble, disassemble, and compare symbolic structures.
A competence theory for the verbal language [1, 2] is a model, in terms
of concepts and rules, of the mental structures which underlie speaker-
listener communication, independently of the processes which generate
specific streams of phonatory movements/sound/auditory signals.
Similarly, a competence theory for the visuo-motor language is a
model of the primitive notions and structuring rules which underlie
Figure 1. Pixels of digitized image can be distributed among receptive fields
Edges
Two edge detection algorithms were developed in our laboratory for
performing some experimentation with scene analysis.
The two algorithms were implemented for the PDP 11/34 minicom-
puter and were used to process digitized images from a TV camera by
means of a Tesak VD501 Image System (interfaced with the PDP 11/34
by means of a digital parallel I/O interface DR11C), which stores an
image with a resolution of 512 x 512 pixels, each of them quantized
with 8 bits.
Hueckel Operator
This operator has been proposed by Hueckel [6] for a continuous
distribution of intensity F (x,y) over a unit circle, which is optimally
fitted by an "edge," that is, a bidimensional step function.
In our computer implementation of the Hueckel operator, we ex-
pressed the operator by means of explicit formulas, which weighted the
Gradient Operator
Another operator was developed in our laboratory [7] which uses a 5 x 5
receptive field and is based on the computation of the intensity gradient
(by means of a very simple 5 x 5 mask). If an edge is detected, a smaller
3 x 3 mask is used to refine the position of the edge.
For the class of images studied, the computation time is about 45
seconds.
Summing up, the initial phase of feature extraction transforms the
image of Fig. 1 into the image of Fig. 2. From the computational point of
view, the transformed image is expressed as a set of LISP-like descrip-
tors, according to the following structure:
Lines
A primitive line can be defined as a collection of adjacent edges limited
between two vertices, where a "vertex" is either "the point at which
several edges intersect" or "the point at which the contour has a
significant curvature change."
It is then possible to determine a "vector" between the initial and the
final vertices (the direction of the vector can be chosen in such a way as
to guarantee a given direction, for example, clockwise, for each closed
contour of the scene) and, accordingly, to classify the lines as
"straight," "concave," or "convex."
Furthermore, curved lines can be approximated by circular arcs,
which can be identified by the following attributes: i) length, ii) initial
slope, and iii) final slope.
Then it is possible to associate with each line a LISP-like descriptor of
the following type:
Figure 2. The image of Figure 1 is transformed into the images of Figure 2
(a and b) through feature extraction
Figure 3a. The highest curve corresponds to the g (x) of the automaton which
recognizes straight lines
The * symbol (also known as Kleene operator) means that the string inside the brackets can be
repeated any number of times, including zero.
Description of Objects
Each object of the scene can be associated to a LISP-like descriptor of
the following type:
External Operators
External operators express the fact that two figures have a line in
common, completely or in part, or one contains the other. Furthermore,
if a line or part of a line is in common, it is convenient to make explicit
whether one figure is inside the other or not.
Accordingly, we have defined 5 operators which are shown in Table
I. The external operators are used to compose figures according to the
following syntax:
Table I
External Operators
Figures
After the analysis at the level of "lines," an "object" is expressed as a
set of "closed contours" related by "external operators."
The following logical step in the process of making explicit the
structure of the scene consists of describing the figures in terms of their
properties but, since the number of possible contours is infinite, it is
convenient to rely on the definition of a small set of "primitive figures"
and on a small number of operators which allow us to "segment" a
closed contour into a structure of primitive figures. We shall call these
operators "internal operators" (as opposed to the external operators
defined in the previous section) because they introduce virtual lines for
the segmentation.
Primitive Figures
Primitive figures are closed contours which exhibit certain properties,
that is, specific relations among the constituent lines. For example,
standard geometrical figures (such as squares, triangles, truncated cir-
cles, etc.) can be defined using relations of the following type:
• angles between consecutive lines
• parallelism between two lines
• equality between the attributes of two lines
In particular, the contour of a "square" has the following properties (l1,
l2, l3, l4 are the lines of the contour; a1, a2, a3, a4 are the angles between
them; "type" and "length" are attributes of each line descriptor):
where the position and orientation attributes relate the intrinsic frame to
a frame fixed in the environment.
The recognition of primitive figures can be performed by procedures
which analyze the representative string of a given contour, verify the
defining properties of the figure (for example, the procedure which tries
to recognize squares will use properties (6.1)) and, eventually, generate
descriptors of the type (6.2).
For real images it is quite clear that the verification of properties will
always be associated with a degree of uncertainty, and it seems conven-
ient to model such situations by interpreting the properties of a primi-
tive figure as "fuzzy relations." In other words, each relation will be
verified by the procedure with a "degree of fuzziness" (measured by a
value in the interval (0,1» and the global degree of fuzziness of the
recognition will be computed as some weighted sum of the individual
degrees of fuzziness [10].
Description of Figures
Each figure of the scene can be associated to a LISP-like descriptor of
the following type:
Internal Operators
Internal operators allow us to segment a figure into simpler figures.
The decomposition of a figure into two simpler figures corresponds to
tracing virtual lines between significant points of the contour of the
figure. When this is done, the relationship between the two resulting
figures can be expressed by means of operators (internal operators) quite
similar to the external operators previously defined.
Table II
Internal Operators
(in terms of explicit relations among primitive figures), which allows the
computation of any information relevant for manipulation or other robot
tasks.
The whole process can be represented, using the terminology
suggested by Lindsay & Norman [11] for describing visual perception,
as a "pandemonium," where a large set of "demons" detect the
presence of conditions or relations in the present description of the scene
and, consequently, trigger the intervention of other demons, generating
other layers of description or modifying their attributes.
We have seen that relations among contours, and then among figures,
can be described by means of external operators. Therefore, we need
demons which correspond to external operators and that are activated by
demons which verify relations among the figures, such as inclusion or
line contiguity.
The process can be sketched in the following way:
(i) flexes;
(ii) concave lines;
(iii) convex lines;
(iv) concave angles.
Feature demons receive the contours of the figures as input and output
the sought features.
A decomposition demon is activated by a combination of the outputs
of some feature demons and gives as an output a decomposition of the
figure into two new figures. The decomposition is expressed by a string
which contains the names of the two figures, together with an internal
operator, and is written in the COMPOSITION-attribute of the figure.
At the same time, a descriptor is created for each figure and a new step of
recursion is taken.
When feature demons give a negative result (no feature is found), the
figure is analyzed by primitive figure classifiers, which look for rela-
tions among contour lines. These classifier demons may succeed (and
this stops the process) or they may recognize contour parts which are
characteristic of a given primitive figure. In the case of partial recogni-
tion, the combination of the outputs of different primitive figure clas-
sifiers again activates decomposition demons, and the process goes on
until no more decomposition is found (for example, the contour of a
"house-figure" determines the partial recognition of the "roof" by a
"triangle classifier" and of the "house body" by a "rectangle clas-
sifier, " allowing a decomposition demon to segment the contour into
two new sub-figures).
When the decomposition process is terminated, the produced strings
are parsed by classification demons, each of which has knowledge of the
References
1. Chomsky, N., Aspects of a theory of syntax, Cambridge: MIT Press, 1965.
2. Parisi, D., Il linguaggio come processo cognitivo, Torino: Boringhieri, 1972.
3. Yoda, H., A new attempt of selecting objects using a hand-eye system, Hitachi Review 22, 362-365, 1973.
4. De Coulon, F., Kammenos, P., Polar coding of planar objects in industrial robot vision, N.T., 10, 663-670, 1977.
5. Marr, D., Early processing of visual information, Phil. Trans. R. Soc. London, 275, 483-524, 1976.
6. Hueckel, M.H., An operator which locates edges in digitized pictures, Stanford U. Lab. Art. Intel., AIM-105, 1969.
7. Carrosio, C., Sacchi, E., Viano, G., Robotic vision: an implementation of the Hueckel operator for edge detection, Genoa Un. E.E. Dept., Tech. Rep., 1980.
8. Gaglio, S., Marino, G., Morasso, P., Tagliasco, V., A linguistic approach to the measurement of 3-D motion of kinematic chains, Proceed. 10th ISIR-5th CIRT, Milan, March 5-7, 1980.
9. Fu, K.S., Syntactic methods in pattern recognition, London: Academic Press, 1974.
10. Zadeh, L.A., Fuzzy sets, Inf. Control, 8, 338-353, 1965.
11. Lindsay, P.H., Norman, D.A., Human information processing, London: Academic Press, 1977.
Part II:
Applications
Chapter 5
Recognition of Overlapping
Workpieces by Model-Directed
Construction of Object Contours
W. HÄTTICH
Introduction
The methods for the recognition of overlapping workpieces are
based on the recognition of the shape of an object. A direct compa-
rison of shapes, as is done, for example, by correlating pictures
pixel by pixel, is very time-consuming because of the various degrees
of freedom of the object's position. Therefore usually only the contour
lines of an object are analyzed. Visible parts of a contour
line are compared with reference contour lines which correspond to
a model, which may be regarded as an abstract description of the
contour lines of an object. The various methods being developed differ
in the comparison strategy and in the type of the models.
There are two basically different comparison strategies. In the first
strategy visible parts of given contour lines are separately interpreted as
parts of a reference contour which are in accordance with a geometrical
model of an object. The recognition is achieved when the single interpre-
tations are consistent. In the second strategy a reference contour is
constructed using visible parts of contour lines as construction elements
and using production rules as a generative model of an object. Recogni-
tion can be performed if a reference object is constructed fairly com-
pletely. Most of the methods quoted in the literature are based on the first
The Scene
The recognition system is outlined for scenes having the complexity
depicted in Fig. 1a and Fig. 1b. Scenes of comparable complexity are
to be found when workpieces are isolated by simple strip-off
mechanisms. Then, up to four or five workpieces are in the scene. Apart
from the overlapping of the workpieces, perspective deformations and
bad contrast situations occur. Perspective deformations, however, are
limited as long as the workpieces are relatively flat. For testing the
system, screws and bent metal parts have been used.
The Model
The model of a workpiece is an abstract description of its shape. Here the
workpiece is characterized by an arrangement of straight lines. The
model is an ideal arrangement of straight lines which corresponds to the
contour lines of an object.
The basic element of the model is a segment. A segment is a straight
line which is assumed to be a part of an imaginary ideal reference object.
The position of a segment is given by the coordinate values of its start
and end point or, alternatively, by the coordinate values of its start point
and its length and orientation.
A model is represented by a sequence of segments. Beginning with a
start segment a complete ideal reference object can be constructed by a
successive determination of the position of segments. In order to charac-
terize the contour of a given workpiece the relations between segments
must be specified accordingly. The consecutive determination of the
position of lines has the advantage that arrangements may be defined
which depend on relative values of length, orientation, and position
between single lines and not on absolute values. The relative position
may vary within a given range. The range of variation can be selected for
each segment separately.
Fig. 2 shows arrangements of segments which characterize the work-
pieces depicted in Fig. 1. The segments of a model are marked by thick
lines. Dashed lines show contour parts of the objects which are not taken
into consideration by the model. The number at each segment shows its
order in the sequence of segments. Obviously, adjacent segments of a
sequence need not touch each other.
Figure 4. Start and final states of a segment: each state carries a description with
ideal position data (xA, yA, xE, yE, L, θ) and real position data; operation OP1
starts a new segment, operation OP2 completes the actual segment
Figure 5. An example of the determination of reference data of a new segment:
the reference point (xref, yref) is the intersection point of S1 and S2, the
reference orientation θref is the real orientation of S1, and the reference length
Lref is the ideal length of S1; the ideal position of Si is derived from the real
position of S1
operation. The relative position data needed for the computation of the
ideal position data are also assigned to each follower state. An example
for the determination of the ideal position data of a new segment is given
in Fig. 6.
Figure 6. An example of the determination of the ideal position and search range
(the search range surrounds the ideal position of S3)
of the new segment. If there is more than one line in the search range
different realizations of a reference object can be constructed depending
on which line is used for beginning the new segment. The real position
data of a realization are taken from the position data of the line, used for
beginning the new segment. If there is no line in the search range the
ideal position data are taken as real position data storing the fact that a
gap of a whole segment has been bridged. The real and ideal position
data of the states of each realization are stored. In Fig. 7, an example
with two lines in the search range is given. Taking each line as a part of
the new segment, two new realizations are constructable.
Figure 7. New realizations of a reference object when two lines are in a search range
Figure 8. Modified search range for a realization depicted in Figure 7
Figure 9. An example for updating the real position data of a segment
Figure 10. Structure of the recognition system: comparison data from the scene
and construction data from the model drive the construction of incomplete
reference objects, followed by storage and selection
ideal length of their start states. Only those reference objects are consid-
ered and stored of which the states have a real length almost as long as
the ideal length given by a model. Reference objects with start states of
shorter length are taken into the store as long as the limiting number of
reference objects has not been reached.
Starting from the start state of all stored reference objects, new
representations are constructed. The construction of new representa-
tions is done by traversing the state transition diagram. In order to do
this the production rules described in Section 5 are applied to the lines of
the scene table.
distortions are entered into the store. The reference objects in the store
are ordered according to the value of resemblance. The size of the store
is bounded. If the store is complete already, then when a new representa-
tion is to enter, the reference object with the smallest value of re-
semblance is canceled. When applying this strategy only reference
objects are in the store which have a low degree of distortion and a large
value of resemblance.
Results
The iterative construction of the object contours has been applied to 20
scenes with bent metal parts and screws. In no scene was the position of
an object determined wrongly. In 17 scenes one or more workpieces
were correctly recognized, and only in 3 scenes was no object found. Fig.
11 shows two samples of such scenes. The objects which were recog-
nized are marked. All lines of the scene table used for constructing the
reference object are marked by a common number. The number shows
which lines belong to the same object.
The computing effort for recognizing an object depends on the scene
complexity and on the type of the model. When analyzing scenes with
the screws, 10 iterations on the average are needed for the construction
of a reference object which leads to a recognition. For scenes with the
bent parts about 60 iterations are necessary.
For further development it is planned to combine the model with
corners and lines in order to be able to enlarge the complexity both of the
scenes and the objects. As a next step, objects with curved contour lines
will be analyzed. Because of the flexibility of the model it is easy to
change the model in order to adapt the system to new objects.
Acknowledgment
The research reported in this paper was supported by the Bundesministerium für
Forschung und Technologie of the Federal Republic of Germany under contract 08 IT
5807.
References
1. W. A. Perkins: A Model-Based Vision System for Industrial Parts. IEEE Trans. on
Computers, Vol. C-27, No. 2, February 1978, pp. 126-143.
2. B. Neumann: Interpretation of Imperfect Object Contours for Identification and
Tracking, Proc. of the 4th Int. Joint Conf. on Pattern Recognition, Kyoto, Japan,
1978, pp. 691-693.
3. J. D. Dessimoz, M. Kunt, and J. M. Zurcher: Recognition and Handling of Overlap-
ping Industrial Parts, 9th Intern. Symp. on Industrial Robots, March 1979,
Washington D.C., pp. 357-366.
4. H. Tropf: Analysis-by-Synthesis Search for Semantic Segmentation, Applied to
Workpiece Recognition. Proc. of the 5th Int. Conf. on Pattern Recognition,
Miami, USA, December 1980, pp. 241-244.
5. H. Tropf: Analysis-by-Synthesis Search to Interpret Degraded Image Data, Robot
Vision and Sensory Controls Conference Proceedings, Stratford-Upon-Avon,
United Kingdom, April 1981, pp. 25-33.
6. K. S. Fu: Syntactic Methods in Pattern Recognition, Academic Press, New York,
1974.
7. I. Rechenberg: Evolutionsstrategie, Friedrich Frommann Verlag, Stuttgart-Bad
Cannstatt, 1973.
Chapter 6
Simple Assembly Under
Visual Control
P. SARAGA and B. M. JONES
Philips Research Laboratories, England
Figure 1. System structure: a priori knowledge and a world model feed the main
control, which coordinates motor control and visual processing
part of a larger body, the three storeys of the tower are painted
matt black and are mounted on a matt black base. This means that
the storeys cannot be seen from above but must be located from
the side. The system design should be such that the tower can be
positioned anywhere within an area of approximately 70 x 70mm.
The rings are assumed to lie on a horizontal surface each within an
area of approximately 40 x 40mm.
Although this is not a real industrial task, it contains the key
elements of a number of practical industrial problems. The task
was solved by an experimental system consisting of a computer
controlled manipulator equipped with TV cameras. (See Fig. 2.)
The Manipulator
The manipulator has 4 degrees of freedom of which 3 were used
for this task. Radial, rotation, and vertical (R, θ, Z) motions are
provided by hydraulic actuators with integral measurement sys-
tems. Each actuator can move 100mm in 4000 steps of 25 microns
each. Although the resolution is 25μ, the absolute accuracy of
each axis is only 0.1mm. The R and Z axes are driven directly
while the θ axis has a mechanical advantage of approximately 4,
giving an absolute accuracy of ±0.4mm. It can be seen that by
itself this manipulator is not accurate enough to perform the task.
The fourth degree of freedom, which was not needed for this
task, is an electrically powered wrist mounted just above the gripper.
The manipulator is controlled by special purpose electronics con-
nected to a small computer (a Philips P851).
System Operation
The flow chart of the system operation is shown in Figure 4.
Figure 3. System hardware: the P857 (vision software) and the P851 (manipulator
controller), with the TV-computer interface and the electronics for servo control
Once a ring has been located, and its size determined, the manipu-
lator is instructed to pick up the ring and take it to a position high
above the approximate position of the tower. This operation is
under the control of the P851, and the view of the tower is not
obscured by the manipulator during this period. Therefore the
P857 can use the two horizontal views to determine the position
in space of the appropriate storey of the tower at the same time
as the mechanical operation is taking place.
If the tower storey is successfully located then the manipulator
lowers the ring to a position immediately above the appropriate
storey of the tower, where the relative position of ring and tower
can be checked using the horizontal TV cameras. The manipulator
now moves the ring until it is exactly above the correct storey.
The ring is then placed on the tower.
While the P851 is controlling the final placement of the ring
onto the tower, the P857 is using the vertical TV camera to locate
the next ring.
Figure 4. System operation flow chart (parallel columns showing the P857 and
P851 activities: ring location, ring pick-up, tower location, and ring placement)
Picture Processing
In order that the assembly task can be carried out at a realistic
rate it is necessary that the picture processing employed be fast.
Thus very simple algorithms have to be employed. To illustrate
this the optical processing used in the task will be described in
more detail.
Ring Location
One of the three prescribed areas of the field of view (Figure 5a)
of the vertical camera is sampled at a low resolution into the P857
as a 5 bit grey-level image. The grey-level image is thresholded and
the binary image is edge traced [2], and black areas are located.
If an object corresponding to one of the possible rings is not
found, further attempts to detect a ring by changing the threshold
are made. If a ring still cannot be found the next area is examined.
Once a ring is detected its approximate center is calculated and
two further regions centered on the approximate ring position are
sampled at high resolution to determine the accurate X and Y
coordinates of the ring centers in TV units (Figures 5b, 5c). The
(c) High resolution: Determination of Y coordinate
regions are searched from both ends until a white or black edge is
found and the appropriate coordinate is taken as the mean of these
two edges.
The position of the ring is then converted to world coordinates
and the manipulator is instructed to pick up the ring and bring it
to a position approximately above the tower.
Location of Tower
The same processing is carried out on each of the two horizontal
TV channels used to locate the tower. First a large coarse resolution
scan (Figure 6a) is carried out to locate a black horizontal bar of the
correct size for the approximate storey. Then a second smaller scan
(Figure 6b) at higher resolution is taken at about the approximate
center of the tower. This is used to locate the top of the storey.
Small scans at high resolution are then taken at each side of the
tower (Figures 6c, 6d) and the black/white boundary points found.
The separation between these points is computed and only those
pairs whose separation is within 5 pixels of the correct storey
widths (which range from 50 to 110 pixels) are accepted. When 8
pairs have been found meeting these criteria, the mean of their
center position is found and taken as the center of the storey. The
mean of the error in width between the acceptable pairs and the
correct storey width is used as an error function to correct either
the expected width of the tower or the threshold for the next
cycle of the system. Thus the system is self-correcting and can
allow for drift in the TV video level. The process is repeated on
the second camera. The TV coordinates are converted to a line
in world space parallel to each optic axis and then the point of
closest approach between the two lines is found and taken as the
world coordinates of the tower. The tower may be placed in any
position where all storeys can be seen by both cameras. The field
of view of each camera is set to be approximately 100 x 100mm
which completely covers the 70 x 70mm area in which the tower
may be.
Visual Feedback
The sampling system can resolve approximately 600 x 600 points
in the TV field of 100mm x 100mm, giving a resolution of ~0.17mm.
The TV position of the tower is known to a rather higher accuracy
since it is obtained by averaging a number of edge points. The
Calibration
In order to use a visually controlled machine it is necessary to
relate the images from the various TV cameras to the movements
of the manipulator and to the positions of the various parts being
handled. The calibration process determines the relationship be-
tween "frames of reference" associated with each part of the over-
all system. Whenever the machine is modified or adjusted, the
relationships change, and recalibration is required. If effective
flexibility is to be achieved it is important that calibration is
both simple and rapid.
It is generally most convenient to relate all the manipulator and
TV frames to a single arbitrary Cartesian "world frame." Thus, if
any unit is moved, only its relationship to the world is changed.
Calibration of the manipulator involves determining the relation-
ship between the movements of the R, θ, Z actuators and the world
Distributed Processing
In the example given of positioning rings on a tower we used two
processors. The division of the task between the two processors is
such that only simple commands with parameters are passed to
the P851, which acts as a slave of the P857. This simple division
allows a degree of parallel processing and a consequent increase
in overall speed of the system. There is no problem of data division
in this system since all the data is stored in the P857.
It is possible to devise other organizations using more processors
to allow additional processes to be carried out in parallel. For
example, if a separate processor was used for each horizontal
camera then the determination of the tower and ring position in
each view could be carried out simultaneously.
It is necessary to consider other factors before adding processors
indiscriminately. For example, the division of tasks must not
require large amounts of data to be transferred between processors,
nor should it require the maintenance of the same data in two
Conclusion
The experiments described in this paper have shown that back lit
profile images using parallel light can be usefully applied to some
tasks in mechanical assembly. It has been demonstrated that the
advantages of this approach include: rapid size independent picture
analysis allowing identity, position, and orientation to be deter-
mined by direct measurements; immunity to ambient illumination;
and simple 3D object location using 2 or more views.
Although the system has only been applied to a simple example
we intend to apply the same methods to a number of practical
examples. The use of many processors including specialized high-
speed picture processors in systems of this type is now becoming
practical. It is therefore becoming important to find methods for
both constructing and using these systems to give the desired
qualities of speed, modularity, flexibility, and conceptual simplicity.
Acknowledgments
We would like to acknowledge the contribution of D. Paterson and
A.R. Turner-Smith who, with our colleagues at Philips Research
Laboratories, Eindhoven, constructed the manipulator. We would
also like to thank our Group Leader, J.A. Weaver, for his help
and encouragement.
Chapter 7
Visually Interactive Gripping
of Engineering Parts from Random
Orientation
C. J. PAGE and A. PUGH
Department of Production Engineering
Lanchester Polytechnic, England
Department of Electronic Engineering
University of Hull, England
Introduction
During the early 1970s work was initiated by a small number of research groups worldwide on visually interactive robot systems. These developments were preceded by impressive research from artificial intelligence groups in establishments such as M.I.T. [1], S.R.I. [2, 3], and Edinburgh [4]. One of the earliest groups active in industrial applications of visual feedback was based at the University of Nottingham, where the "SIRCH" robot became operational in 1972 [5, 6]. This robot manipulative device will be described later. About the same time, two Japanese groups reported similar interests: one at Hitachi, featuring a visually interactive machine conceptually similar to "SIRCH" [7]; the other at Mitsubishi, reporting experiments with an "eye-in-hand" robot applied to retrieval of motor brushes from a quasi-random presentation [8].
Over recent years the interest in visually interactive industrial robots has grown dramatically, with S.R.I. [9] and General Motors [10] reporting marketable devices that can be integrated into industrial systems.
Problem Definition
The research machine used for evaluation consists of a turret
assembly with three manipulators and an objective lens mounted
on its periphery. The turret can be moved in three-dimensional
space over a work surface by means of three mutually orthogonal,
computer-controlled linear stepping-motor tables. Figure 1 shows
a photograph of the machine configuration. The turret, Figure 2,
can be likened to the lens assembly of a multi-objective micro-
scope in that any of the three grippers can be indexed round into
the reference position used for picking up components. A small
television camera mounted over the turret assembly is used to pro-
vide the sensory feedback. A fourth station on the turret is used to
mount the objective lens, which transmits an image of the scene
below it up through an optical endoscope to the television camera
above. The center of the machine's field of view therefore corre-
sponds to the center of each manipulator when indexed into posi-
tion. Because of this, operation proceeds on a dead-reckoning
basis as the machine is effectively "blind" when manipulating a
component. The image of the scene below the "eye" of the
machine is digitized and stored in the supervisory computer,
which then moves the turret so that the center of the field of view
is over the center of the component. The television camera is geared
Figure 2. A close-up of the manipulating head showing the three gripping devices and the objective lens of the optical system
Manipulative Techniques
Fundamental Considerations
The field of view may contain one or more components, each of which is represented as a silhouette of its plan view. It must be remembered that a component may be several objects physically touching. The manipulators must be able to pick up one component at a time.
Figure 3. Preliminary processing of a simple scene: (a) before processing, (b) after processing
Figure 4. Definition of sampling area for suction-cup center. The sucker profile is shown on the left, with the component's enclosing rectangle and the sampling area for the suction-cup center as dotted and chained lines respectively
Figure 6. A test for ascertaining whether particular coordinates are within the body of a component
If the line touches the contour rather than crosses it, this fact is ignored. The number of times that the line crosses the periphery is first examined. To this quantity is added the number of contour crossings registered by the first internal hole; to this second number is added the number of crossings registered by the next hole; and so on until the final result - the total number of times that the line crosses boundaries in the line-drawing representation of the component - is obtained. An odd total indicates that the coordinates lie within the body of the component.
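This counting rule is the even-odd (crossing-number) test, and a minimal sketch over piecewise-linear contours runs as follows; the representation of each contour as a vertex list is assumed for illustration.

    def inside_component(x, y, contours):
        """Even-odd test: cast a horizontal ray leftwards from (x, y)
        and count its crossings of every contour (periphery plus
        internal holes); an odd total means the point is inside.
        contours: list of vertex lists [(x0, y0), (x1, y1), ...]."""
        crossings = 0
        for contour in contours:
            n = len(contour)
            for i in range(n):
                (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
                # Count an edge only if it properly straddles the ray;
                # the asymmetric test makes a mere touch contribute
                # consistently, as the text requires.
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross < x:
                        crossings += 1
        return crossings % 2 == 1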
[Flowchart: fitting the suction cup - scan the digital image of the scene and trace the contours of the component to be handled; if the part's enclosing rectangle is large enough to encompass the suction cup, set the scanning window limits and the first candidate gripping position and initialize the figure of merit; test whether each candidate position is inside the part, setting the next hypothetical gripping point in turn, until the figure of merit is no longer equal to its initial value; exit.]
Figure 9. The possible ways of using the pincer manipulator: (a) periphery only, (i) external (ii) internal; (b) one internal hole only, (i) external (ii) internal; (c) periphery and internal hole, external mode; (d) two internal holes, external mode
Figure 10. Situations arising when using the pincer manipulator: (a) edges longer than gripper jaws, (b) offsetting necessitated by component shape and edges shorter than jaws, (c) failure caused by component shape and edges shorter than jaws
segment, are added into the difference sums. This process con-
tinues until the sums go outside the specified ranges. The start and
end coordinates, the angle, the tolerance inherent in the angle of
the line, and the first and last vector numbers are then stored. The
center point of the next line is chosen one vector point displaced
from the end of the previous vector. Sequential lines whose angles
differ by less than the sum of the individual angle tolerances
(weighted in proportion to the length of each line) are combined
into a single line. If possible, this new line is merged with the
previous one and so on until two lines cannot be combined, at
which point line extraction from the contour chain vectors is
resumed. The line-merging process is necessary to ensure that each
edge of the image of a straight-sided shape produces one line only
when line-fitting is performed on it. Additional procedures are per-
formed to eliminate redundant lines at the start and end of the
contour chain (which are one and the same point).
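A sketch of the merging step is given below. The line records (start, end, angle, tolerance, length) and the use of a plain tolerance sum in place of the length-weighted version are illustrative simplifications of the procedure described above.

    def try_merge(l1, l2):
        """Combine two sequential fitted lines if their angles agree
        within the sum of their tolerances; returns the merged line
        or None. A line is a dict with start, end, angle, tol, length."""
        if abs(l1["angle"] - l2["angle"]) >= l1["tol"] + l2["tol"]:
            return None
        total = l1["length"] + l2["length"]
        return {
            "start": l1["start"],
            "end": l2["end"],
            "length": total,
            # New angle: length-weighted mean of the two angles.
            "angle": (l1["angle"] * l1["length"]
                      + l2["angle"] * l2["length"]) / total,
            "tol": min(l1["tol"], l2["tol"]),
        }

    def merge_chain(lines):
        """Merge each new line with the previous one until two lines
        cannot be combined, as in the text."""
        out = []
        for line in lines:
            while out:
                merged = try_merge(out[-1], line)
                if merged is None:
                    break
                out.pop()
                line = merged
            out.append(line)
        return out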
After the conversion of the constituent contours of a part into
piecewise-linear approximations, the gripper-fitting algorithms are
applied. The contour or contours for consideration are chosen
according to a simple hierarchical rule, namely that the periphery
is considered first, then individual holes, followed by the periphery
paired with each internal hole, and finally every combination of
two from the internal holes of the component (in cases where
there are two or more holes). The basic scene analysis and line-
Figure 13. Reversing the sense of internal-hole contours. The arrow on each line denotes its direction: (a) before reversal, (b) after reversal
[Flowchart: fitting the pincer gripper - for each combination of lines, check that both lines are longer than the minimum overlap, that the lines are antiparallel, and that they overlap in the spatial sense; test whether the lines are re-entrant, setting internal or external mode accordingly; check that the amount of overlap is greater than the minimum and that the line pair has not been accepted previously; check for obstructions to gripping; if the gripper can be fitted, set the gripper position as the center of the overlap region, otherwise compute the nearest possible position of the gripper to the center of the overlap region; repeat until all combinations have been examined, then exit.]
and at least one line crosses the other line, then the two lines over-
lap. When it has been established that the selected pair of lines
may be a possible choice for gripping edges, the amount of overlap
and the perpendicular distance apart of the lines must be calcu-
lated. The mode of gripping to be used, either external or internal,
must be ascertained before a decision can be made on whether or
not the manipulator can accommodate the chosen edges, for the
mode affects the maximum and minimum permissible distance
apart of the lines. This can be determined irrespective of the type
of contour being considered - whether a periphery or an internal
hole (these are the only two cases where the gripper can be used
internally) - by noting the rotational sense of the two lines under
consideration. If the sense is counterclockwise, external gripping
must be used, whereas if it is clockwise the internal mode is
required. This is illustrated in Figure 15.
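One way of computing this rotational sense is sketched below. It assumes that y increases upwards and that each edge is directed as traced along its contour, so that a positive cross product corresponds to the counterclockwise (external) case.

    def gripping_mode(edge1, edge2):
        """Decide external or internal gripping from the rotational
        sense of a pair of roughly antiparallel candidate edges.
        Each edge is ((x_start, y_start), (x_end, y_end))."""
        (ax, ay), (bx, by) = edge1
        (cx, cy), (dx, dy) = edge2
        d1 = (bx - ax, by - ay)                    # direction of edge 1
        m1 = ((ax + bx) / 2, (ay + by) / 2)        # midpoint of edge 1
        m2 = ((cx + dx) / 2, (cy + dy) / 2)        # midpoint of edge 2
        v = (m2[0] - m1[0], m2[1] - m1[1])
        cross = d1[0] * v[1] - d1[1] * v[0]
        # Counterclockwise pair sense -> external; clockwise -> internal.
        return "external" if cross > 0 else "internal"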
It has been noted that the overlapping region of the two candi-
date lines need not be as long as the gripper jaws. Nevertheless, for
practical purposes some lower limit must be set; a suitable value is
one eighth the length of the jaws. For maximum speed of opera-
tion, the processing is terminated when the first position that can
be used for manipulation is found, regardless of whether other,
more suitable features exist. To alleviate this situation to some
extent, without affecting processing time, line selection is executed
in stages according to a minimum-length criterion. In the first
stage, only lines longer than the jaws are considered. For the next
stage or pass the minimum length is halved, and this process is
repeated for each subsequent pass until the final stage (with the
minimum length set to one-eighth of the jaw length) has been
completed.
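The staged selection might be organized as in the following sketch, with the remaining acceptance tests (antiparallelism, spacing, obstruction checks) gathered into an assumed predicate fits().

    def find_gripping_edges(line_pairs, jaw_length, fits):
        """Multi-pass selection: try long edges first and halve the
        minimum length each pass, down to one eighth of the jaw
        length; the first acceptable pair is returned."""
        min_len = jaw_length
        while min_len >= jaw_length / 8:
            for pair in line_pairs:
                if min(pair[0]["length"], pair[1]["length"]) >= min_len:
                    if fits(pair, min_len):
                        return pair        # first usable position wins
            min_len /= 2
        return None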
When a pair of lines that satisfy all the conditions for manipula-
tion has been found, the surrounding areas must be examined for
obstructions by other features of the parent component itself or
by separate parts which may be in close proximity. Testing for
possible obstructions involves scrutinizing the area swept by the
closing or opening jaws. The algorithms construct imaginary "jaw
zones" which define the swept area when the manipulator is posi-
tioned centrally over the region of overlap of the selected lines.
The interior of each zone is then examined for intruding contour
segments. The length of each zone is greater than that of the grip-
per jaws to allow for offsetting the manipulator to an unobstructed position.
Figure 15. Determining gripping mode: (a) periphery, (i) external (ii) internal; (b) hole, (i) external (ii) internal. (The arrowed lines represent the candidate gripping edges, and the arrowed chained lines denote the rotational sense of the line pair. Note that for external mode the sense is counterclockwise, while for the internal mode it is clockwise.)
Figure 16. Formation of jaw zones for detection of obstructions to gripping: (a) single contour, (i) external (ii) internal; (b) double contour, external mode only. (The arrowed lines denote the candidate edges. The jaw zones are shown as chained lines.)
[Flowchart: overall handling procedure - store sets of contours and quantitative parameters for all separate components; select the component to be handled; home in on the selected part; scan the work area again and select the nearest match to the upper-level version; try to fit the sucker, and failing that the gripper; if the manipulator has grasped the part, place it; exit.]
Figure 18. Problems caused by parts partly outside the field of view: (a) with no additional processing, (b) after putting a border of white picture points around the frame
Figure 19. Tightly packed array of components. (The numbers refer to the order
in which the parts are handled. The gripping position for component 1 is shown in
heavy line.)
Figure 21. Touching components. (When an attempt at handling is made from the
position shown, the components are pushed apart.)
Concluding Remarks
The automatic manipulation system described above has been
extensively tested with many types of engineering components
and has proved to be accurate and reliable. The principal reason
for this success is the computational accuracy of the sucker and
gripper-fitting software. Even though the resolution of the imaging
system is not high (128 X 128 picture elements), it has been found
possible to handle reliably some comparatively small components
and also many with intricate shapes. A particularly important pro-
perty of the software is its speed of operation. Other existing
systems exhibiting a similar degree of sophistication are sometimes
Acknowledgments
The mechanical configuration of the handling system described in
this paper was developed jointly with Professor W. B. Heginbotham
of the Production Engineering Research Association, Melton
Mowbray, England. His collaboration with this work is greatly
appreciated.
References
1. Winston, P.H., The M.I.T. Robot, in Machine Intelligence 7, Meltzer, B. and Michie, D. (ed.), pp. 431-63 (Edinburgh University Press, 1972).
2. Duda, R.O. and Hart, P.E., Experiments in scene analysis, Proceedings of the First National Symposium on Industrial Robots, I.I.T. Research Institute, Chicago, April 1970, pp. 119-130.
3. Forsen, G.E., Processing visual data with an automaton eye, in Pictorial Pattern Recognition, Proceedings of Symposium on Automatic Photointerpretation, pp. 471-502 (Thompson, Washington, D.C., 1968).
4. Barrow, H.G. and Crawford, G.F., The Mark 1-5 Edinburgh Robot Facility, in Machine Intelligence 7, Meltzer, B. and Michie, D. (ed.), pp. 465-480 (Edinburgh University Press, 1972).
5. Pugh, A., Heginbotham, W.B. and Kitchin, P.W., Visual feedback applied to programmable assembly machines, Second International Symposium on Industrial Robots, I.I.T. Research Institute, Chicago, May 1972, pp. 77-88.
6. Heginbotham, W.B., Page, C.J. and Pugh, A., Robot research at the University of Nottingham, Fourth International Symposium on Industrial Robots, Japan Industrial Research Association, November 1974, pp. 53-64.
7. Yoda, H., Ikeda, S. and Ejiri, M., A new attempt at selecting objects using a hand-eye system, Hitachi Review, 22, Part 9, pp. 362-5, 1972.
8. Tsuboi, Y. and Inoui, T., Robot assembly using TV camera, Sixth International Symposium on Industrial Robots, University of Nottingham, March 1976, pp. (B3) 21-32.
9. Gleason, G.J. and Agin, G.J., A modular vision system for sensor-controlled manipulation and inspection, Ninth International Symposium on Industrial Robots, sponsored by the Society of Manufacturing Engineers and the Robot Institute of America, Washington, D.C., March 1979, pp. 57-70.
10. Ward, M.R., Rossol, L. and Holland, S.W., Consight: A practical vision-based robot guidance system. Ibid., pp. 195-211.
11. Kelly, R., Birk, J., Duncan, D., Martins, H. and Tella, R., A robot system which feeds workpieces directly from bins into machines. Ibid., pp. 339-355.
12. Dessimoz, J.D., Hunt, M., Zurcher, J.M. and Granlund, G.H., Recognition and handling of overlapping industrial parts. Ibid., pp. 357-366.
13. Page, C.J., Visual and tactile feedback for the automatic manipulation of engineering parts, Ph.D. thesis, University of Nottingham, U.K., 1974.
14. Heginbotham, W.B., Page, C.J. and Pugh, A., A practical visually interactive robot handling system, The Industrial Robot, Vol. 2, No. 2, 1975, pp. 61-66.
15. Freeman, H., Techniques for the digital-computer analysis of chain-encoded arbitrary plane curves, Proc. Nat. Electronics Conf., Vol. 17, 1961.
Chapter 8
An Interface Circuit for a Linear
Photodiode Array Camera
D. J. TODD
Figure 1. Printed circuit board of camera. The photodiode array is the integrated
circuit near the center
Circuit Operation
The computer commands the circuit to store a line from the camera
by sending a low pulse on one bit, the "start" bit, of the output port.
This clears flip-flop 1 in readiness for the next SCAN pulse from the
camera. When this SCAN pulse is received from the camera flip-
flop 2 is set, allowing clock pulses to be received by the clock input
of each MC14517B shift register. The 4-bit value of a pixel is transferred to the shift register on the rising edge of the signal OSC.Q2. About 1 μs after this edge, the "start conversion" signal for the next pixel to the ADC goes high. Conversion then takes place in the next 8 cycles of the 7413 clock. The ADC is a MicroNetworks MN5213, which can convert to 8 bits in 6 μs. Shifting of data into the shift register continues for the 64 pixels, until ended by the next SCAN pulse. Once flip-flop 1 has been set (i.e., Q1 is low) no more SCAN pulses will be accepted until the computer does another "start" bit output.
Figure 2. Interface circuit. Both flip-flops are 7473; all AND gates are 7408; the Schmitt trigger circuits are 7414 and 7413. The shift registers are Motorola MC14517B. The signals OSC and SCAN are generated by the camera
Figure 3b. Timing diagram. Detail showing ADC waveforms. The number of cycles in the CK burst is not important as long as it is greater than 8
[Figure 4. Flow chart of the line-reading subroutine: wait for SCAN to go high; give the "start" pulse; set the array pointer in the HL register and the pixel count in B to 64; then shift out and store each pixel in turn.]
"start" bit output. Therefore, the line of data will remain in the
register until the computer chooses to read it out. To do this it
outputs a low pulse on another bit, the "shift out" bit, of the output
port, repeating this cycle 64 times.
Programming
The flow chart of a subroutine to read a line from the camera is
shown in Fig. 4. In order to synchronize with the camera scanning
cycle, the computer must wait until the SCAN signal goes high. On
detecting this, the computer gives the "start" pulse, then waits long
enough for the camera to shift a complete line into the shift register.
It then does its 64 input cycles, storing each 4-bit value in one
element of an array.
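The subroutine's logic can be expressed in pseudocode form as below; the port-access helpers stand in for the actual input/output instructions and bit assignments, which are given only in the circuit diagram.

    import time

    def read_line(cam, n_pixels=64, line_time_s=0.001):
        """Read one line from the camera interface. `cam` is a
        hypothetical port-access object; its methods mirror the
        "start", "shift out" and SCAN bits described in the text."""
        while not cam.scan_bit():       # synchronize with the camera
            pass
        cam.pulse_start_bit()           # low pulse on the "start" bit
        time.sleep(line_time_s)         # let a whole line be shifted in
        pixels = []
        for _ in range(n_pixels):
            cam.pulse_shift_out_bit()   # clock one value out of the register
            pixels.append(cam.read_nibble())   # store the 4-bit value
        return pixels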
An example of a 64-element line image obtained using this inter-
face is shown in Fig. 5. The central peak is produced by light re-
flected from a narrow white object.
[Figure 5. A 64-element line image, plotted as 4-bit pixel values (0-15) against pixel number.]
Components Used
The photodiode array is the IPL 4064; the board it is mounted on is
the K4064. The lens is an F1.9, 25 mm television lens.
Acknowledgments
This work was supported in part by the Whitworth Foundation. I
should like to thank Dr. A. H. Bond for the use of the facilities of
Queen Mary College Artificial Intelligence Laboratory.
Part III:
Adaptive Processing for Vision
Chapter 9
Networks of Memory Elements:
A Processor for Industrial Automation
T. J. STONHAM
Introduction
Automation is, in essence, the implementation of decisions by a
machine, on data which are obtained from the environment on which the
machine is to operate. The data may take many forms, and be derived
from various sources. Visual images, discrete measurements, and proc-
essed measurements in the form of spectra are typical examples.
At the outset, automation appears to be an exercise in instrumenta-
tion; this is a valid description where a machine-be it a mechanical
device, a digital computer, or some other electronic processor-is
required to perform a monitoring and controlling role on its environ-
ment, and where the control strategy can be specified.
A pump and float switch can be employed to control automatically the
level of water in a tank. A typical arrangement is shown in Fig. 1. The
essential feature of the system is that it is deterministic. The data - the liquid being at, or lower than, a predefined level - can be adequately measured, and provide suitable information for the machine (the pump/switch system) to operate on under normal circumstances.
If, however, under extraordinary conditions, the rate of inflow to the tank is greater than the pump capacity, the system will fail. Other system
[Figure 1. Automatic control of the water level in a tank: a float switch senses the level and controls the pump; water flows in at the top and is pumped out.]
Figure 2. An example where different numerals are more similar than versions of
the same numeral. A point-by-point comparison reveals (a) and (b) to be closer
than (a) and (c) or (b) and (d)
Let the universal set of patterns that could occur on a binary matrix be U. Several sets A, B, C, etc. within that universal set need to be identified. Let a representative sample of, but not all of, the patterns within sets A, B, C, etc., be available to the designer. These patterns form training sets for the learning system. Ideally, one requires a generalization set GA which can be interpreted from the training set. However, depending on the nature of the training set, the system may undergeneralize, and not recognize all the patterns in sets A, B, C; or it
[Figure 6: undergeneralization and overgeneralization of a training set within the universal set.]

[Figure 7: (a) a reject set formed where two generalization sets overlap; (b) a reject set together with an error set formed where a generalization set overlaps a data set not fully covered.]
may overgeneralize and accept spurious patterns outside the data sets (see Fig. 6). It is important to note that the relationships between patterns within a set and between sets are not necessarily linear functions of Hamming distance [3] [4].
It is unlikely that precise correspondence between GA and A will be
achieved, and overgeneralization with respect to a given data set can be
tolerated (and is indeed desirable) provided that generalization sets of
different data categories do not overlap, causing patterns to be identified
with more than one training set (a reject condition; see Fig. 7a), or a
generalization set of one category overlaps another data set which is not
fully covered, through undergeneralization due to an inadequate train-
ing set (an error condition; see Fig. 7b). A fuller discussion of generali-
zation properties is given in [5].
A generalizing processor can be implemented with a single layer network of memory elements, as shown in Fig. 8. Each memory element, having an n-bit address field, samples an input matrix and extracts, either at random or in some predetermined way, an n-tuple of the pattern displayed there (in this case n = 3). The n-tuple subpattern provides an address for the memory, and a flag (logical 1 if the stores have been initially set to logical 0) is stored to indicate the occurrence of the particular value of subpattern being sampled on the input matrix. A single layer network of memory elements, which will be referred to as a discriminator, is exposed to a representative set of patterns from a given class of data, and a discriminant function is set up in the memory elements by flagging the appropriate locations of memory addressed by
Figure 8. A discriminator comprising three memory elements, each sampling an n-tuple from a binary input matrix under read/write control
the subpatterns obtained from the input space. The discriminant func-
tion is derived solely from the data, and requires no a priori class
description or rules. Furthermore, there is no direct storage of pattern
data, and the amount of storage in a discriminator is independent of the
size of the training data set.
Having trained a discriminator, an unknown pattern to be classified
accesses the stored data in the memory elements-again using sub-
pattern values as address data for the memories, and a decision is made
on the outputs. In the simple arrangement in Fig. 8 an AND function is
used. However, other functions including a numerical summation of the
o~tputs, can be employed.
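A minimal software model of such a discriminator, assuming a flat binary image and a random input connection, might look like this (the class and method names are illustrative):

    import random

    class Discriminator:
        def __init__(self, n, image_size, seed=0):
            rng = random.Random(seed)
            pixels = list(range(image_size))
            rng.shuffle(pixels)                      # random input mapping
            k = image_size // n                      # number of memory elements
            self.mapping = [pixels[i * n:(i + 1) * n] for i in range(k)]
            self.memory = [set() for _ in range(k)]  # flagged addresses only

        def _address(self, image, k):
            # The k-th n-tuple of pixel values forms a memory address.
            return sum(image[p] << i for i, p in enumerate(self.mapping[k]))

        def train(self, image):
            # Flag the location addressed by each n-tuple of the pattern.
            for k in range(len(self.mapping)):
                self.memory[k].add(self._address(image, k))

        def response(self, image):
            # Count elements whose addressed location has been flagged.
            return sum(self._address(image, k) in self.memory[k]
                       for k in range(len(self.mapping)))

One such object is trained per data class, and an unknown pattern is assigned to the class whose discriminator gives the largest response.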
The generalization properties of the discriminator in Fig. 8 can be
illustrated in the following example:
[Example: a training set TA of 3 patterns on a small binary matrix gives rise to a generalization set of 8 patterns accepted by the discriminator.]
\[ S = \frac{H \cdot X \cdot Y}{n} \ \text{bits} \]  (9-2)
Applications
Networks of memory elements configured as learning pattern recogniz-
ers have been applied to a wide range of problems including character
recognition, spectral identification, medical diagnostics, and automatic
fault detection in digital equipment ([5] and references cited therein). In
this section emphasis will be placed on the recognition of visual images
of piece-parts using single layer networks of memory elements.
The separability of different data categories does depend on Hamming distance, although this dependence is not linear in n-tuple recognizers. A rule of thumb is (not surprisingly): the more alike two different objects are, the more difficult it is to distinguish between them.
*RAM: Random Access Memory; ROM: Read-Only Memory.
[Figure 9. Block schematic of the WISARD pattern recognition device: camera, 512 x 512 staticiser store holding one frame, n-tuple input connection to the recognition store, store selection, teaching strategy and responses, and a man/machine interface.]
[Plot: storage requirement (log scale) against n-tuple size (2 to 12) for input resolutions from 16 x 16 up to 256 x 256.]
Recognition of Piece-parts
The silhouettes of two different keys are shown in Fig. 11. Each key was
allowed to take up any position in a frame with an aspect ratio of 2 to 1,
provided it was completely contained within the frame. The maximum
dimension of this frame was approximately equal to the greatest dimen-
sion of the key. Two discriminators were trained, one for each key, on
digitized patterns taken from the frame at a resolution of 32 x 16 pixels.
The discriminators each comprised 64 memory elements addressed by
random 8-tuple samples, and the keys were removed and reinserted into
Unconstrained Piece-parts
In the previous problem the keys, although allowed to take up any
position in the frame, were in effect constrained to less than 30° of
rotational freedom per quadrant. If parts can be observed in any orienta-
tion during training, the generalization set will be considerably larger,
though not necessarily resulting in overgeneralization.
If the discriminator shown in Fig. 8 is trained on a rotating bar pattern:

[Training patterns: a bar displayed at several orientations on the input matrix.]

Figure 12. Four bolts: (a), (b), (c), (d)
Figure 13. Responses of discriminators to rotated versions of bolt (a): x response of disc (a); • response of disc (b); + response of disc (c); ▲ response of disc (d). (Response, 0-32, plotted against time.)
store are to be monitored and an item can only be removed after having been identified and recorded by a pattern recognizer, the parts can be precisely located in a viewing jig. Variation between different images obtained from a given piece-part viewed by a camera arises mainly from electronic noise and the effects of spatial quantization, which can be reduced to acceptable, if not negligible, levels. In practice, a situation is approached whereby one pattern defines a data class and - in the case of a parts store - possibly tens of thousands of piece-parts (data classes) have to be recognized.
The assignment of a single discriminator to detect each piece-part
becomes impracticable under these conditions. The amount of storage
would be far greater than a direct library store of the patterns (the latter,
however, would be restricted to serial implementation) and the generali-
zation properties of the networks would not be exploited. The feasibility
of using a discriminator in a highly deterministic pattern environment,
to detect more than one data category, has therefore been examined.
Table 1
Training specification for a 4-discriminator "2 in N" classifier

TRAINING SPECIFICATION
Discriminator    1    2    3    4
Training         A    A    F    F
categories       B    E    B    E
                 C    D    D    C

CATEGORY/DISCRIMINATOR ASSIGNMENT
A    1 and 2
B    1 and 3
C    1 and 4
D    2 and 3
E    2 and 4
F    3 and 4
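The coding of Table 1 assigns each category to a distinct pair of discriminators, so a decision can be decoded from the pair with the largest summed response. A sketch, assuming the responses arrive as a simple list indexed by discriminator:

    from itertools import combinations

    def two_in_n_assignment(categories, n_discriminators):
        """Give each category a distinct pair of discriminators;
        C(4, 2) = 6 pairs cover categories A-F as in Table 1
        (discriminators numbered from 0 here)."""
        pairs = list(combinations(range(n_discriminators), 2))
        if len(categories) > len(pairs):
            raise ValueError("too many categories for these discriminators")
        return dict(zip(categories, pairs))

    def classify(responses, assignment):
        """Choose the category whose assigned discriminator pair
        gives the largest summed response."""
        return max(assignment, key=lambda c: sum(responses[d]
                                                 for d in assignment[c]))

Here two_in_n_assignment("ABCDEF", 4) reproduces the pairings in the lower half of the table.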
[Plot: classification performance (%) against the number of n-tuple samples per discriminator (12 to 96).]
Summary
Networks of memory elements provide means of identifying a wide
range of binary patterns. A formal description of the data does not have
to be made available, as the recognition mechanism does not rely on any
established method of analysis of the data it is classifying. The recogni-
tion functions are derived from a known representative set of patterns of
each data class. The memory elements in the discriminators do not store
patterns (they assume the role of adaptive logic circuits which, after
training, detect, for a given category, allowable n-tuple subpatterns on
the input space). Therefore the storage per discriminator and operating
time are independent of the number of training patterns per category.
The technique can be applied to pattern detection problems, hence the
opportunity for establishing hitherto unknown relationships within a
data base can be exploited.
Finally, the physical implementation of a pattern recognizer based on
networks of memory elements is very flexible. Where slow pattern
processing is acceptable (of the order of seconds) a serial simulation on a
conventional computer or microprocessor based system can be consid-
ered. Hardware versions using RAMs or ROMs will provide faster
processing, the operating speed being dependent on the degree to which
the system has been structured for parallel processing.
Acknowledgements
The author wishes to thank Mr M. Arain of Brunel University in respect of the
medical pattern recognition data.
The financial support from the United Kingdom Science Research Council
(WISARD Project) is also acknowledged.
References
1. T. J. Stonham (1976): A classification system for alphanumeric characters based on learning network techniques. Digital Processes, vol. 2, p. 321.
2. W. W. Bledsoe and I. Browning (1959): Pattern recognition and reading by machine. Proc. Eastern Joint Comp. Conf., p. 225.
3. I. Aleksander (1970): Microcircuit learning nets - Hamming distance behaviour. Electronics Letters, vol. 6, p. 134.
4. T. J. Stonham (1977): Improved Hamming distance analysis for digital learning networks. Electronics Letters, vol. 6, p. 155.
5. I. Aleksander and T. J. Stonham (1979): A guide to pattern recognition using random access memories. Computers & Digital Techniques, vol. 2, p. 29.
6. L. Gupta (1980): Orientation independent pattern recognition with networks of memory elements. M.Sc. dissertation, Brunel University, U.K.
Chapter 10
Computer Vision Systems for
Industry: Comparisons
I. ALEKSANDER
T. J. STONHAM
B. A. WILKIE
Department of Electrical Engineering & Electronics
Brunel University, England
Abstract This article is written for those who are not familiar
with the problems of automatic, computer-based pattern recogni-
tion. It surveys known methods in the light of opportunities
offered by silicon chip technology.
The article also discusses some of the design decisions made in the
creation of WISARD*, a fast pattern recognition computer built at Brunel University. Its structure has been optimized for silicon
chip implementation.
*WISARD: Mnemonic for WIlkie, Igor and Stonham's Recognition Device, a computer with a novel architecture, built with the support of the U.K. Science and Engineering Research Council, to whom thanks are due.
Although this leaves out many other trends and strands, it is felt
that the computational properties of those missed out are similar to
the ones mentioned. The thrust of the discussion is aimed to show
that (c) is best suited to novel silicon-chip architectures and for this
reason was chosen as the design method for the "brain" of
WISARD.
Discriminant Functions
This method attempts to overcome the problem of having to store
many templates for each pattern class in the following way. A nu-
merical value called a weight is associated with each pixel. So if
186
†Experience has shown that n = 8 is more than adequate for most applications.
\[ \text{storage} = 2 \times 512 \times 512 \times \frac{2^n}{n} \approx 0.5 \times 10^6 \times \frac{2^n}{n} \ \text{bits} \]
The central question in all problems solved by using the n-tuple method is "what is a suitable value of n?" Often this can only be answered by experimentation, while in this case
it may be shown (by appealing to the case where the n-
tuples are placed in either horizontal or vertical lines more
than 12 pixels apart) that n = 2 is sufficient. It may also be
shown that n = 2 will deal with recognition of many verti-
cal and horizontal lines as well as the estimation of whether
a line is more horizontal than vertical. It may further be
shown that a random connection with n = 2 will suffice.
Hence, evaluating the last formula, the resulting storage is 10^6 bits, and the corresponding processing time is 512 x 512 x 2 x 10^-6 seconds, i.e. about half a second.
Special Architectures
Looking very briefly at the three methodological categories, we see
that they all could be made more efficient by means of special
architectures. Mask matching, for example, could be based on a
parallel system of registers and comparators. However, it may be
shown that the performance of a mask matching system is equiva-
lent to an n-tuple system with n = 1, and therefore not only limited
but lacking in generality. Therefore it would appear unwise to
launch into the design of such special architectures with mask-
matching methodology in mind.
Discriminant function methods rely heavily on calculation, and
one could envisage an array system where each cell performs the
necessary weight-multiplication calculation. In fact, such architec-
tures exist (Gostick [6] and Duff [7]) and the research may be
worth doing. However, the performance uncertainties of the
scheme militate against optimism.
The n-tuple scheme therefore remains not only as a good way of
organizing a conventional architecture but also a good candidate
for special architectures.
To achieve speed, one arranges the n-tuples to be addressed in a
semi-parallel way, where the degree of parallelism and other char-
acteristics such as window size, value of n etc., are under operator
control. WISARD was constructed with these advantages in mind
and details of design decisions will be published in due course.
Fixed Implementations
So far, the main concern here has been the structure of learning
systems which improve their performance by being "shown" a suit-
able training set. However, there are many applications where a
fixed, pattern recognition task has to be carried out over and over
again. The n-tuple method lends itself to a process of reduction
which implements either as logic gates, Programmed Logic Arrays
or Read-only Memories, the essential parts of the learned logic in a
learning system. This is not the place to dwell on details of this type
of procedure, except to realize that it is purely a mechanical and
easily computable procedure. For example, if the horizontal/vertical problem had been solved with horizontal 2-tuples, with the two elements of the 2-tuple 13 pixels apart, then it can be shown that for the "horizontal" pattern class the memories could be replaced by AND gates, whereas for the "verticals" case these could be replaced by "Exclusive OR" gates. In fact, the design of such logic is based on the realization that the contents of the store associated with each n-tuple is merely a truth table derived during a learning process. For details of this see Aleksander [8].
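The reduction can be demonstrated mechanically: the flagged addresses of a trained store, read as a truth table, collapse directly into sum-of-products logic, and in the 2-tuple case into the gates mentioned above. A small sketch:

    def ram_as_logic(flagged, n):
        """Write a trained RAM's contents as a sum-of-products
        expression: each flagged n-bit address is one product term."""
        terms = []
        for addr in sorted(flagged):
            bits = [(addr >> i) & 1 for i in range(n)]
            terms.append(" & ".join(f"x{i}" if b else f"~x{i}"
                                    for i, b in enumerate(bits)))
        return " | ".join(f"({t})" for t in terms) if terms else "0"

    # A 2-tuple store that has only seen the address 11 reduces to an
    # AND gate; one that has seen 01 and 10 reduces to an exclusive OR:
    print(ram_as_logic({0b11}, 2))         # (x0 & x1)
    print(ram_as_logic({0b01, 0b10}, 2))   # (x0 & ~x1) | (~x0 & x1)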
References
1. Ullman, J. R. (1973) Pattern Recognition Techniques, Butterworth.
2. Batchelor, B. (1978) Pattern Recognition: Ideas in Practice.
3. Fu, K. S. (1976) Digital Pattern Recognition, Springer Verlag.
4. Rutovitz, D. (1966) "Pattern Recognition" J. Roy. Stat. Soc., Series B/4, p. 504.
5. Aleksander, I. & Stonham, T. J. "A Guide to Pattern Recognition Using RAM's", IEE J. Dig. Sys. & Computers, Vol. 2, No. 1 (1979).
6. Gostick, R. W. ICL Tech. Fair, 1979, Vol. 1, No. 2, pp. 116-135.
7. Duff, K. J. B. Parallel Processing Techniques, in Batchelor (1978) (see above).
8. Aleksander, I. (1978) Pattern Recognition with Memory Networks, in Batchelor (1978) (see above).
APPENDIX 1
Complete bibliography on Adaptive Pattern Recognition by the Brunel Team
ALEKSANDER, I.
Fused Adaptive Circuit which Learns by Example, Electronics Letters, August 1965.
ALEKSANDER, I.
Design of Universal Logic Circuits, Electronics Letters, August 1966.
ALEKSANDER, I., NOBLE, D. J., ALBROW, R. C.
A Universally Adaptable Monolithic Module, Electronic Communicator, July-August 1967.
ALEKSANDER, I., ALBROW, R. C.
Adaptive Logic Circuits, Computer Journal, May 1968.
ALEKSANDER, I., ALBROW, R. C.
Pattern recognition with Adaptive Logic Elements, in "Pattern Recognition," IEE 1968.
ALEKSANDER, I., ALBROW, R. C.
Microcircuit Learning Nets: Some tests with hand-written numerals, Electronics Letters, 1968, p. 408.
ALEKSANDER, I., MAMDANI, E. H.
Microcircuit Learning Nets: Improved recognition by means of pattern feedback, Electronics Letters, 1968, p. 425.
ALEKSANDER, I.
Brain cell to microcircuit, Electronics & Power, 16, 48-51, 1970.
ALEKSANDER, I.
Some psychological properties of digital learning nets, Int. J. Man-Machine Studies, 2, 189-212, 1970.
ALEKSANDER, I.
Microcircuit learning nets: Hamming distance behaviour, Electronics Letters, 6, 134, 1970.
ALEKSANDER, I., FAIRHURST, M. C.
Pattern Learning in humans and electronic learning nets, Electronics Letters, 6, 318, 1970.
ALEKSANDER, I.
Electronics for intelligent machines, New Scientist, 49, 554, 1971.
ALEKSANDER, I.
Artificial Intelligence and All That, Wireless World, October 1971.
ALEKSANDER, I.
Action-Oriented Learning Networks, Kybernetes, 4, 39-44, 1975.
ALEKSANDER, I.
Pattern Recognition with Networks of Memory Elements, in Pattern Recognition: Ideas in Practice, ed. B. Batchelor, Plenum Publications, London 1976.
ALEKSANDER, I.
Intelligent memories and the silicon chip, Electronics and Power Jour., April 1980.
WILSON, M. J. D.
Artificial Perception in Adaptive Arrays, IEEE Trans. on Systems, Man and Cybernetics, Vol. 10, No. 1, 1980, pp. 25-32.
STONHAM, T. J.
Improved Hamming Distance Analysis for Digital Learning Networks, Electronics Letters, Vol. 6, p. 155, 1977.
STONHAM, T. J.
Automatic Classification of Mass Spectra, Pattern Recognition, Vol. 7, p. 235, 1975.
Chapter 11
Memory Networks for Practical Vision
Systems: Design Calculations
I. ALEKSANDER
Department of Electrical Engineering and Electronics
Brunel University, England
Introduction
For most, neural modelling in the design of image recognition sys-
tems has a historical beginning in 1943 with the work of McCulloch
and Pitts [1] and a quasi-death in 1969 with the condemnation by
Minsky and Papert [2] of Rosenblatt's development of such a
model into a "perceptron" [3]. The argument in [2] centres on the
blindness of such systems to geometrical and topological properties
of images for which the calculational powers offered by classical
computer structures are better suited. It was also seen that the
inferential power of some artificial intelligence (AI) programs
offered more to image recognition (or "scene analysis") than the
neural modelling approach.
This paper is a report on work which might, in some sense, have
appeared to be "heretical" in the late 1960s [4], as it suggested that
the neural model could be used to design systems which would
benefit from novel solid-state architectures. Indeed, the practical
recognition system called WISARD* [5] has been based on the use
of 32,000 random access memory (RAM) silicon chip devices as
"neurons" in a single-layer "neural net". This system will be
addressed indirectly in parts of this paper, while the central aim is to
·Wilkie's, ~tonham's and ~leksander's Recognition Qevice.
\[ \sum_{j} w_j x_j \ge \theta \]  (11-1)

where xj is the jth binary synaptic input of the neuron and wj is its associated synaptic "weight", -1 < wj < 1, whereas θ is a threshold such that the neuronal axon fires if Σj wj xj ≥ θ and does not otherwise.
If the input consists of N synapses, for instance 1 ≤ j ≤ N, j = 1, 2, 3 etc, equation (11-1) can be written in the more general form:

\[ X \xrightarrow{\ W\ } \{0, 1\} \]  (11-2)

where X is the set {X1, X2, X3, ..., X2^N} of binary patterns and W is the weight vector (w1, w2, ..., wj, ..., wN)^T, whereas {0, 1} indicates firing, 1, or not, 0. T indicates a vector transposition. A RAM, on the other hand, performs the mapping:

\[ X \xrightarrow{\ M\ } \{0, 1\} \]  (11-3)
where M is a binary vector of the form (m0, m1, ..., m2^N-1), which is, in fact, the content of the memory, where the "inputs" of the memory are N "address" terminals, and the output is a single binary terminal. Thus, mj is a single-bit "word" addressed by the jth pattern from X. One notes the following differences between the model represented by equation (11-2) and that represented by equation (11-3):
1. In the former, the logical function is stored as a vector W of N continuous variables, while in the latter it is stored in a binary vector M of 2^N variables.
2. The former can achieve only "linearly separable" functions, which is a number much inferior (but hard to calculate [6]) to all the functions (2^(2^N) of them) that the latter can achieve, each being characterized by a distinct M vector in the latter case.
None of this so far describes the way that the W or M values are
built up, that is, the way the model "learns". In most cases the pro-
cedure from the point of view of a user is much the same. The system
is "given" the desired output for specific Xj patterns present at the
input (from a "training" set). Some machinery needs to be put in
motion within the system to bring about the desired mapping. In
neural models using W this can be quite a complex procedure need-
ing many iterations [6]. In the latter models the bits mj of M are set
directly, with the "desired" response being fed to the RAM through
its "data input" terminal.
When "perceptrons" were central to image recognition work
it was thought that the differences mentioned in [1] gave the sys-
tem some of its "intelligence" since fewer examples of the
X ~. ~ 0, 1~ mapping need be shown to the system in order
for it to "fix" its function. Now, is this "fixing" an arbitrary pro-
cess which is as likely to worsen performance as to improve it? As
will be seen, it is the structure of the network that allows a measure
of control over the discriminatory powers of the system, and not so
much the generalization effects within the elements of the net. Thus,
the total lack of generalization within RAMs is of little conse-
quence; what is important is the way they are deployed in networks
and the way such networks are trained.
Technologically, W calls for storage of analog or highly multi-
valued data. This has led some experimenters to build machines in
which W was implemented with roomfuls of potentiometers or
Training is the process of presenting the system (at the overall input) with examples of patterns in the ith class and "writing" into the ith discriminator logical 1s at all of its K "data in" terminals. It is assumed that all the memories are reset to 0 at the start of a training run.
\[ d = j_{\mathrm{MAX}}[Z(\bar{X}_i)] \]  (11-7)

or else d = φ.

\[ d = j_{\mathrm{MAX}}[Z(\bar{X}_i)] \]  (11-11)

\[ R = p \times b \]  (11-13)
\[ P(N, Q, R) = \frac{Q!}{N!\,(Q-N)!} \bigg/ \frac{R!}{N!\,(R-N)!} = \frac{Q!\,(R-N)!}{R!\,(Q-N)!} \]  (11-14)
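Equation (11-14) is simply a ratio of binomial coefficients - the probability that a randomly connected N-tuple falls entirely within an overlap of Q points out of R - and is easily evaluated:

    from math import comb

    def overlap_response(N, Q, R):
        """P(N, Q, R) = C(Q, N) / C(R, N), equation (11-14)."""
        return comb(Q, N) / comb(R, N)

    # For large R the value approaches (Q/R)**N, e.g.:
    print(overlap_response(4, 900, 1000))   # ~0.656, close to 0.9**4 = 0.6561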
\[ \bar{S}_{12} = \bar{X}_1 \cap \bar{X}_2 \]  (11-15)

\[ P(\bar{X}_3) = P(N, S_{13}, R) + P(N, S_{23}, R) - P(N, |\bar{S}_{13} \cap \bar{S}_{23}|, R) \]  (11-16)
Note that this leads directly to the most likely response of the discriminator (say the jth one):
rj = P(X̄3); or, for a better way of relating rj to X̄3, this may be written directly in terms of X̄3 (equations (11-18) to (11-20)).
Multi-discriminator Analysis
Again only a simple case with two discriminators is considered in order to demonstrate trends which occur in more complex situations. Assume that there are two discriminators giving responses r1 and r2 and that each is trained on one pattern, say X̄1 and X̄2 respectively. The system is tested with a third pattern X̄3. Then the responses to the third pattern are, in full notation, r1(X̄3) and r2(X̄3).
According to equations (11-7) and (11-8) the decision goes to:

\[ d = j_{\mathrm{MAX}}[r_1(\bar{X}_3), r_2(\bar{X}_3)] \]  (11-22)

\[ f = \mathrm{MAX}[r_1(\bar{X}_3), r_2(\bar{X}_3)] \]  (11-23)
Without loss of generality let d = 1, f = r1(X̄3). It is convenient to redefine the confidence on d in equation (11-10) in a relative way (equations (11-24) and (11-25)); we then have equation (11-27).
Saturation
Starting with all RAMs at zero, the total number of logical 1s in the memory of a discriminator is a function of the number of training patterns seen by that discriminator. Let the average relative number of 1s in a single RAM be:

\[ M(t) = [\text{number of 1s}] / 2^N \]  (11-28)
Then it is expected that the most likely response to an arbitrary (or noisy) input pattern X̄a is:

\[ Z(\bar{X}_a) = M(t) \]  (11-29)
This is called the degree of saturation of the discriminator. Then, it is possible to define an absolute confidence factor CON(a), as the confidence in relation to the response to X̄a. Hence:

\[ \mathrm{CON}(a) = 1 - \frac{M(t)}{\mathrm{MAX}[Z(\bar{X}_1)]} \]  (11-30)
\[ \Delta M(t) = \frac{1}{2^N} \cdot \frac{1}{K} \cdot R\,[1 - r(\bar{X}_t)] = \frac{1}{2^N} \cdot N\,[1 - r(\bar{X}_t)] \quad \text{since } K = \frac{R}{N} \]  (11-31)
\[ M(t) = \frac{N}{2^N} \sum_{i=1}^{t} [1 - r(\bar{X}_i)] \]  (11-32)
\[ \Delta M(t) = \frac{N}{2^N} \left[ 1 - \left( \frac{S_{st}}{R} \right)^{N} \right] \]  (11-33)

where Sst is the similarity between the tth pattern and X̄s, X̄s being the pattern in T most similar to X̄t. The table below indicates some rates of saturation ΔM(t).
ΔM(t):
Sst/R    N = 2    N = 4    N = 8
0.9      .095     .086     .018
0.5      .375     .234     .030
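The table entries follow directly from equation (11-33):

    def saturation_rate(N, s_over_r):
        """dM(t) = (N / 2**N) * (1 - (S_st / R)**N), equation (11-33)."""
        return (N / 2 ** N) * (1 - s_over_r ** N)

    for s in (0.9, 0.5):
        print(s, [round(saturation_rate(N, s), 3) for N in (2, 4, 8)])
    # 0.9 [0.095, 0.086, 0.018]
    # 0.5 [0.375, 0.234, 0.031]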
problems:
1. position-free recognition;
2. coping with several shapes and occlusion;
3. finding the position and orientation of simple shapes.
To simplify matters, a particular example will be followed through-
out this section and that is that there are two specific shapes related
as shown in Figure 1.
[Figure 1: a 128 x 128 square (set A) and an equal-area triangle (set B), each free to appear anywhere in a 512 x 512 frame.]
Thus, set A is the set of all 128 x 128 squares that could appear in a 512 x 512 frame and B a set of triangles with dimensions as shown. The two shapes have the same area, so that the possibility of mere area measurement is removed. Further, note that the black/white shading and the geometrically definable nature of the objects is arbitrary. The system is equally capable of dealing with textured objects which would defeat most geometrically orientated recognition techniques such as described in [13].
Position-free Recognition
For simplicity, it is assumed that the above two shapes can appear only in their "stable" positions with respect to the lower edge of the frame. This gives one orientation for square A and three for triangle
B. The first step in selecting N is deciding on the size of the training
set. This is guided by the fact that an arbitrarily placed object must
provide an area of overlap with an element of its approximate train-
ing set which is greater than the maximum overlap between the two
objects. A simple but tedious calculation shows that for this condi-
tion to hold the discriminators can be trained in a scan-like fashion
with a total of 81 patterns for discriminator A (the square) while B
requires 114 triangles with the long side horizontal and 242 with the
short side horizontal.
\[ \Delta M(t) = \frac{N}{2^N} \left[ 1 - \left( \frac{S_1}{R} \right)^{N} - \left( \frac{S_2}{R} \right)^{N} + \left( \frac{S_{12}}{R} \right)^{N} \right] \]  (11-34)

where S1 is the overlap area of the new training pattern with the one above; S2 is the overlap area of the new training pattern with the one to the left; and S12 is the common area between S1 and S2 as before.
This expression can be summed for the entire scan from a knowledge of the areas of overlap, which need to be calculated for each stable position. Taking the triangle as the more stringent case, it is interesting to tabulate M(t), the percentage saturation, as a function of N in a significant range:

N               8        9        10      11
M(t) triangle   >100%    >100%    80%     52%
Very similar results are obtained if both objects are triangles (X̄tt), with rt(X̄tt) being the highest.
The predicted confidence level is zero, and this fact alone, despite the much higher response in comparison with r(X̄st), allows detection of the occlusion or quasi-occlusion as being a specific case of the presence of two objects.
Although in practice the number of discriminators might be
increased, which then would be trained separately for multiple
object classes and occlusions, the above results are indicative of the
discrimination that is available without taking this step.
References
1. McCulloch, W. S.; Pitts, W. H. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 1943, 5, 115-133.
2. Minsky, M. L.; Papert, S. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, 1969.
3. Rosenblatt, F. Principles of Neurodynamics. Spartan, Washington, 1962.
4. Aleksander, I.; Albrow, R. C. Pattern recognition with adaptive logic circuits. Proceedings of IEE-NPL Conference on Pattern Recognition 1968, 42, 1-10.
5. Aleksander, I.; Stonham, T. J.; Wilkie, B. A. Computer vision systems for industry. Digital Systems for Industrial Automation 1982, 1 (4).
6. Widrow, B. Generalisation and information storage in networks of adaptive neurons. In Self-organising Systems, eds Yovits et al.; Spartan, New York, 1962.
7. Wilkie, B. A. Design of a High-resolution Adaptive Pattern Recogniser. Thesis, Brunel University, in preparation.
8. British Patent Application No. 8135939, November 1981.
9. Bledsoe, W. W.; Browning, I. Pattern recognition and reading by machine. Proceedings East. J.C.C. 1959, 225-232.
10. Ullman, J. R. Experiments with the N-tuple method of pattern recognition. IEEE Transactions on Computers 1969, 1135.
Chapter 12
Emergent Intelligence from Adaptive
Processing Systems
I. ALEKSANDER
Department of Electrical Engineering and Electronics
Brunel University, England
Introduction
Artificial intelligence (AI) work in the 1970s has superseded earlier
research into the mechanisms of intelligent behaviour based on pat-
tern-recognizing neural nets. This largely originated in 1943 with the
McCulloch and Pitts model [1] of the neuron. The main reasons for
this demise are understood to be the following:
1. The "perceptron" limitations of neural nets were stressed by
Minsky and Papert in 1969 [2] . These largely centred on the
inability of some nets to carry out simple object-counting
operations.
2. The writing of programs that have a surface behaviour which,
if attributed to man, would be said to require intelligence, has
become a well-developed and understood science [3] .
3. The generality of the conventional computer structure dis-
courages a study of architectures incompatible with it.
4. Neural nets are programmed through exposure to data and in this sense are "learning machines". It is argued that the age of the universe would be required to learn human-like intelligent behaviour and this is best avoided by more direct programming.
In this paper, it is argued that 1 is invalid as it applies only to a limi-
ted class of neural nets; 2 is one of many methodologies for studying
215
[Figure: a single-layer net (SLN) followed by a calculator which produces the response (R), the decision (D) and the confidence (Δ):]

Teach = {1, 2, ..., C}
R = {f1, f2, ..., fC}
MAX(R) = fM
D = M
Δ = fM - MAX(R | ≠M)

(The decision D is the index M of the largest response, and the confidence Δ is its margin over the next largest response.)
for the net. The same connection may be used for all discriminators
(coherent net) or not (incoherent net). Such coherence may be
shown to have little effect on behaviour.
Training involves one discriminator per class to be distinguished.
For a representative set of images a logical 1 is fed to all the elements
of the discriminator corresponding to the known class of the current
image. When an unknown pattern is present at the input, the number of 1s at each discriminator output (out of a total of C) is counted. The first properties, proven and discussed in Aleksander [8],
should now be stated for completeness (equations (12-1) to (12-3)).
[Figure: an SLN whose classification is fed back to part of its input image through a clocked delay, with a teach input.]
[Figure: the "speck" experiment - camera 1 views a speck image and camera 2 views a VDU displaying the WISARD response; a mixer forms the composite image presented to the SLN.]
100
80
Confidence
60
40
------ --------
20
0 -------------- Time
%
100
80
60 Left
Resoonse
40
20
0 Time
%
100
Right
80 Response
60
Time
40 -----
20
0
Speck No Speck Speck No
Left Right Speck
_______ S L N Behaviour
[Figure: an SLN with feedback acting as a short-term memory - the output training patterns persist after the input is removed.]
[Figure: the state-machine arrangement - an SLN with a teach input and a clocked delay in the feedback path.]
lows. Start with, say, a blank state b, and input i1. This, upon applying a training pulse, transfers i1 to the state. The input is then changed to i2 and so on. The end of the sequence again can be identified with a blank. This training sequence can be summarized as follows:

Step    Input    State     Taught next state
1       i1       b         i1
2       i2       i1        i2
3       i3       i2        i3
...
n       b        i(n-1)    b
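In software terms the taught behaviour amounts to a (state, input) → next-state map, as the following sketch shows (the function names are illustrative):

    def train_sequence(inputs, blank="b"):
        """Build the taught (state, input) -> next-state map for one
        sequence, starting and ending at the blank state."""
        transitions = {}
        state = blank
        for symbol in list(inputs) + [blank]:
            transitions[(state, symbol)] = symbol   # state follows input
            state = symbol
        return transitions

    def accepts(transitions, inputs, blank="b"):
        """A sequence is accepted while the state stays sympathetic to
        the input; any departure from the taught map rejects it."""
        state = blank
        for symbol in list(inputs) + [blank]:
            if (state, symbol) not in transitions:
                return False
            state = transitions[(state, symbol)]
        return True

    t = train_sequence("JANE")
    print(accepts(t, "JANE"), accepts(t, "JON"))   # True False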
diagrams are incomplete and the departure from the taught sequences remains unspecified. Acceptance in terms of this kind is seen as a state sequence that is sympathetic to the input sequence. Clearly, a departure from the stored sequence indicates non-acceptance.
the system, it will enter the JANE cycle, whereas for IM2 it will enter the JON cycle. Also, given a repeated presentation of JANE, IM1 will be recreated, as will IM2 for JON.
[Figure: attention mechanism - an SLN receives a high-resolution window and a low-resolution window of the image, and produces a decision/inner image together with the next window position via window position control.]
Conclusion
Clearly, the progressive series described does not end at level 3.
References
1. McCulloch, W. S.; Pitts, W. H. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 1943, 5, 115-133.
2. Minsky, M. L.; Papert, S. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, 1969.