Optodigital neural network classifier
Alain Bergeron
National Optics Institute
369 Franquet
Sainte-Foy, Quebec, Canada G1P 4N8
Email:
[email protected]
Henri H. Arsenault, FELLOW SPIE
Université Laval
COPL
Sainte-Foy, Quebec, Canada G1K 7P4
Abstract. A two-layer neural network architecture for carrying out optodigital classification operations is proposed. The optical neural network
implementation is suitable for pattern recognition and classification into
digital format. The neural network is based on an optical correlator, an
optoelectronic threshold, and an optodigital encoder. The module needs
only one laser light source, and the light propagation from the input to the
output is uninterrupted. The output is the class of the input pattern encoded in a digital format. Experimental results using different images and
classes are presented. The classification can be changed arbitrarily by
changing the encoding mask. © 1997 Society of Photo-Optical Instrumentation
Engineers. [S0091-3286(97)02011-4]
Michel Doucet
Luc Veilleux
Denis Gingras
National Optics Institute
369 Franquet
Sainte-Foy, Quebec, Canada G1P 4N8
Subject terms: optical neural networks; correlator; optoelectronic threshold; optodigital encoder.
Paper 25027 received Feb. 23, 1997; revised manuscript received June 16, 1997;
accepted for publication June 17, 1997.
1
Introduction
The neural network is an emerging tool for pattern recognition, target tracking, and many image-related processing tasks.
So far, many optical implementations have been oriented
towards associative memories,1,2 where the degraded input
image is reconstructed to provide better image quality.
Various results published on associative memories in the
literature are promising. However, this strategy supposes
that the prime user of the system is a human being, because
the improved output, an image, still needs to be analyzed
by a human operator. Automatic target recognition often
requires an entirely autonomous system, i.e., one where the
output can be directly used by a control system.
In order to achieve this kind of system, many optical
neural networks rely on electronics or software to process
information coming out of the optical neural network. This
approach was in great part imposed by the lack of reliable
nonlinear optical or optoelectronics operations. The result
is usually an optical system performing a large amount of
computation in a short period of time, but the whole process is slowed down by input-output electronic data transfer.
In this paper, an optodigital neural network classifier is
proposed. The system classifies an input object into a binary optical signal intended to be compatible with numerical system formats. The detection of the input object is
achieved with a correlator coupled to an optoelectronic
thresholding module,3 which allows uninterrupted-optical-path propagation. The threshold also permits one to reduce
the noise content in the output of the first neural layer. The
output signal from the first layer is then directly forwarded
to an optodigital encoder,4 which permits the optical conversion into a digital code. So all the massive processing
operations are performed with optics, and the output is a
compact digital sequence providing classification of the input. The system could also be used as an automated tracking system by an appropriate choice of masks and filters.
3134 Opt. Eng. 36(11) 3134-3139 (November 1997)
2
Classifier Architecture
The classifier architecture is composed of three main modules inserted between an input and an output layer. The
complete architecture of the optical neural network is
shown in Fig. 1. The input is displayed on a liquid-crystal
television screen illuminated with a collimated laser beam.
The first module, a correlator,5 performs the detection of
the object. The correlation peak output is processed with
the optoelectronic threshold module. This layer performs a
nonlinear operation on the correlation performed by the optical correlator. The third module is a Dammann-grating-based optodigital converter. The processed output of the
first layer, a delta-like function, is converted into a binary
optical code to be compatible with numerical systems.
Fig. 1 Architecture of the optodigital neural classifier. [Figure labels: laser, collimator, input image, Fourier lenses, filter, beamsplitter, CCD camera, polarizer, liquid crystal, output of the correlator; first layer: input, correlator, detection, optoelectronic threshold.]
The input image e(x,y) is injected into the system via a
transparent mask, although a liquid-crystal television
screen could also be used for real-time operation. The incident collimated beam is spatially modulated by the mask
and forwarded to the correlator. When a filter with impulse
response $h_1(x,y)$ is used in the Fourier plane, the output is
given by the correlation
$$ s(x,y) = e(x,y) \star h_1(x,y), \tag{1} $$
where $\star$ denotes correlation, $s(x,y)$ represents the output, and the subscript stands
for the layer number. The choice of the filter $h_1(x,y)$ is
important because it directly sets the input of the nonlinear
function module. The two-dimensional correlation plane
$s(x,y)$ is then input to the optoelectronic threshold. This
module implements the nonlinear function, and is used to
clean the output from the correlator. To clean the correlation, the image presented to the input is replicated by
means of a beamsplitter and forwarded to a CCD camera.
The resulting image is then mapped to a liquid-crystal television screen located further in the light propagation path.
The two-dimensional spatial topology of the replicated optical signal is preserved, and the correlation plane undergoes a modification corresponding to its intensity value.
The overall effect of the optoelectronic thresholding module is to attenuate small light intensities and to leave high-intensity light levels unchanged. The threshold is set by the
CCD saturation level. If the intensity of a point incoming
on the camera is lower than the threshold value, the LCTV
will attenuate the beam. If the intensity is equal to or higher
than the CCD saturation level, the LCTV is fully opened
and the light beam passes unchanged. The overall threshold
value can be set with an attenuator located in the tapping
path. The output of the nonlinear processor will correspond
to a cleaned delta-like function. Mathematically this is expressed by
$$ s(x,y) = \delta(x - x_0 + x_1,\; y - y_0 + y_1), \tag{2} $$
where $x_0$ and $y_0$ are the positions of the object in the input
scene, whereas $x_1$ and $y_1$ are the filter locations in the spatial domain.
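The effect of the threshold on a correlation plane can be illustrated numerically. The following is a minimal Python sketch, not a model of the actual hardware: it assumes that the LCTV transmittance follows the clipped CCD reading linearly and that a residual leakage of 2% remains (the extinction figure quoted in the experimental section). The function name `optoelectronic_threshold` and the toy correlation plane are hypothetical.

```python
import numpy as np

def optoelectronic_threshold(intensity, saturation, extinction=0.02):
    """Attenuate weak light and pass saturating light unchanged.

    Assumed model: the CCD reading clips at `saturation`, and the LCTV
    transmittance follows that reading linearly, never dropping below
    the residual `extinction` leakage.
    """
    intensity = np.asarray(intensity, dtype=float)
    ccd_reading = np.clip(intensity / saturation, 0.0, 1.0)
    transmittance = np.maximum(ccd_reading, extinction)
    return intensity * transmittance

# Toy correlation plane: weak cross-correlation background plus one strong peak.
plane = np.full((8, 8), 0.1)
plane[5, 2] = 1.0

cleaned = optoelectronic_threshold(plane, saturation=1.0)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(cleaned), cleaned.shape))
```

Under these assumptions the peak passes unchanged while the background is attenuated in proportion to its own intensity, so the peak-to-background ratio grows from 10:1 to 100:1 and the output approaches the delta-like function of Eq. (2).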
To process the information, one could analyze the output
with a computer. However, a new generation of spatial
light modulators promises rates of thousands of frames per
second, which would create a data flow bottleneck and
make the optical processor totally useless. In order to overcome this problem, it is possible to go one step further,
with an optodigital encoder as a third module. If the input
object is centered in the input scene, the position of the
output will be imposed by the filter position. Provided that
many filters are spatially multiplexed, the maximum-correlation-peak position will correspond to the position of
the memory object that correlates the most with the input
object. So, if many objects are encoded in the filter plane,
the position of the maximum correlation peak will identify
the object at the input. Because each memory object is
different, only one maximum correlation peak is obtained.
The processed output of the nonlinear function
will be a single delta-like function whose position identifies
the object.
A Dammann-grating-based optodigital position encoder4
converts an input luminous point into a digital code corresponding to the position of the object. It takes advantage of
the fact that a correlation peak is narrow and that it can
represent a one when it has a high value, and a zero when
it has a low value. This is especially true if an optoelectronic threshold module, which binarizes the correlation
output, is used.
The second-layer input scene is duplicated by means of
a Dammann grating. The multiple images are then projected onto binary encoding masks, each of which encodes
one bit of the bit sequence representing the position of the
peak. If a point is on the right side of a mask, a one will be
coded; on the left side, a zero. If the light transmitted by
individual masks is collected on a detector, the thresholded
signals provided by the detectors constitute a digital representation of the peak position and are compatible with numerical systems.
The output of the system can thus be detected with the
help of single detectors (one detector behind each mask
subregion) instead of using standard CCD cameras. So the
second layer avoids the use of CCD cameras and likewise
the need for processing a whole image at the output. In fact,
only a few single detectors have to be used, so, instead of
waiting for each frame of a CCD camera and processing the
information with a computer, the answer can be obtained at
very high speed, because the few single detectors can be
driven at a very high rate. To process an $M \times M$ pixel image,
only $2\log_2 M$ masks are needed, so a 512 × 512 image will
require 18 output detectors. Since single detectors can run
much faster than cameras, the data flow bottleneck is eliminated by the optics, and the overall system is in fact only
limited by the SLM speed.
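The position encoding can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the layout of the actual masks: it assumes that each mask tests one bit of the peak coordinate, most significant bit first, so that the first mask of an axis is transparent on the right (or lower) half of the plane. The function names are hypothetical, and `m` is assumed to be a power of two.

```python
import math

def detector_count(m):
    """Number of masks (and single detectors) for an m x m plane: 2 * log2(m)."""
    return 2 * int(math.log2(m))

def encode_position(x, y, m):
    """Bits read by the detectors for a peak at column x, row y.

    Assumed mask layout: mask k of an axis is transparent wherever bit k of
    the coordinate is 1, most significant bit first.
    """
    bits_per_axis = int(math.log2(m))
    x_bits = [(x >> k) & 1 for k in reversed(range(bits_per_axis))]
    y_bits = [(y >> k) & 1 for k in reversed(range(bits_per_axis))]
    return x_bits + y_bits

def decode_position(bits):
    """Recover (x, y) from the detector bits, as a downstream computer would."""
    half = len(bits) // 2
    to_int = lambda b: int("".join(str(v) for v in b), 2)
    return to_int(bits[:half]), to_int(bits[half:])

bits = encode_position(300, 5, 512)   # a peak in the right half, near the top
```

Here `detector_count(512)` evaluates to 18 and `detector_count(256)` to 16, and `decode_position(bits)` returns the original coordinates, which shows that the handful of detector bits carries the full peak position.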
3
Learning
The learning phase of the digital classifier is simple, because there is no modification of weights and it is performed off line. The filter of the first correlation is composed of spatially multiplexed images $h_{1i}(x,y)$. In order to
provide a high energy efficiency and sharp peaks, a phase-only filter is used. The filter is coded in order to give to the
correlator the following impulse response:
$$ h_1(x,y) = \sum_{i=1}^{N} h_{1i}(x - x_{1i},\; y - y_{1i}). \tag{3} $$
The filter is easily obtained by Fourier-transforming $h_1(x,y)$
and keeping only the phase of the transform. The
choice of the position of each memory template, $(x_{1i}, y_{1i})$,
sets the location of the peak. The classification can thus be
performed by changing the memory template position or by
changing the numerical encoding sequence. It should also
be noted that the encoding sequence can associate the same
coding to two different templates. So the classifier allows
one to classify similar objects in the same numerically encoded category or to join two different kinds of object into
the same class. The system has a reduced translation invariance, since the object cannot translate by more than the
corresponding dimension of an encoding section in the encoding plane. Finally, if only one template is recorded in
the correlator, the system is transformed into a real-time
tracking system provided that an SLM such as a liquid-crystal television screen is used in the input plane.
Fig. 2 Three examples of images used for the classification. The acid symbol (a) was not recorded in
the memory.
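The construction of the multiplexed phase-only filter can be mimicked digitally. The sketch below is a numerical stand-in for the optical correlator, under several assumptions: random arrays replace the recorded symbols, the template offsets are hypothetical, the discrete Fourier transform makes the correlation cyclic, and the conjugate phase is used so that the product with the scene spectrum yields a correlation. The function names are illustrative only.

```python
import numpy as np

def multiplexed_memory(templates, positions, size):
    """Place each template at its (row, col) offset in one memory plane."""
    memory = np.zeros((size, size))
    for template, (row, col) in zip(templates, positions):
        h, w = template.shape
        memory[row:row + h, col:col + w] += template
    return memory

def phase_only_filter(memory):
    """Fourier-transform the memory and keep only the phase of the transform."""
    spectrum = np.fft.fft2(memory)
    # Conjugate phase (assumption): gives a correlation rather than a convolution.
    return np.exp(-1j * np.angle(spectrum))

def correlation_plane(scene, pof):
    """Intensity in the correlation plane for a scene and a phase-only filter."""
    return np.abs(np.fft.ifft2(np.fft.fft2(scene) * pof)) ** 2

rng = np.random.default_rng(0)
templates = [rng.standard_normal((16, 16)) for _ in range(4)]   # stand-in symbols
positions = [(0, 0), (16, 0), (32, 0), (48, 0)]                 # hypothetical offsets
pof = phase_only_filter(multiplexed_memory(templates, positions, 64))

def peak_for(index, origin=(24, 24)):
    """Peak location when template `index` is shown at a fixed input position."""
    scene = multiplexed_memory([templates[index]], [origin], 64)
    plane = correlation_plane(scene, pof)
    return tuple(int(v) for v in np.unravel_index(np.argmax(plane), plane.shape))
```

With the input held at a fixed position, each template produces its peak at the input position minus its own memory offset (modulo the plane size in this cyclic model), which is the property the classifier relies on to identify the object from the peak location.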
The number of neurons can be easily obtained from the
dimensions of the support used to display the images. For
an $M \times M$ image, the number of input values is determined
by the size of the modulator used to display the image
($M^2$). A 512 × 512 modulator yields $2.62 \times 10^5$ input values. Each point of the optoelectronic threshold plane performs a nonlinear operation on the correlation plane. Each
correlation point corresponds to a set of multiplications and
integrations. So the number of neurons is given by the
number of resolvable points in the optoelectronic threshold
($M^2$). The neural classifier would show $2.62 \times 10^5$ neurons
for a 512 × 512-pixel modulator, each neuron processing
$2.62 \times 10^5$ input values.
In the second layer, there cannot be more inputs than the
number of positions to be coded. This number is dependent
on the size of the memory. For example, a 512 × 512-pixel
reference memory with reference templates of 64 × 64 pixels gives 64 inputs ($N$). These inputs are classified according to the binary mask of the second layer. As six sections
are required to encode 64 inputs ($\log_2 N$), there will be six
weights in the second layer for 64 templates in the memory.
Finally, six single detectors are used to detect the overall
results. The last nonlinear function can be considered to be
applied to the electrical signal provided by these six detectors. So the second layer really uses six neurons.
Fig. 3 Memory template used for the phase-only filter generation in
the first layer.
From this example it is to be noted that the input image
of $2.62 \times 10^5$ pixels is completely processed by the system
and the result to be forwarded to a computer only takes 6
bits of space. This shows, beyond the analysis of the number of neurons, the real capabilities of the system. The system is only limited in speed by the SLM refreshing rate.
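The counts quoted in this section follow from simple arithmetic, which the short Python sketch below reproduces. The helper name `layer_sizes` is hypothetical, and the sketch assumes square modulators and templates whose side divides the modulator side.

```python
import math

def layer_sizes(modulator_side, template_side):
    """Return (inputs, first-layer neurons, templates N, second-layer neurons)."""
    inputs = modulator_side ** 2                      # M^2 input values
    first_layer_neurons = modulator_side ** 2         # resolvable threshold points
    templates = (modulator_side // template_side) ** 2        # N memory positions
    second_layer_neurons = math.ceil(math.log2(templates))    # log2(N) detectors
    return inputs, first_layer_neurons, templates, second_layer_neurons

inputs, neurons, templates, output_bits = layer_sizes(512, 64)
```

For a 512 × 512 modulator with 64 × 64 templates this gives 262144 inputs (about $2.62 \times 10^5$), the same number of first-layer neurons, 64 memory positions, and 6 output bits.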
4
Experimental Results
The optodigital architecture of Fig. 1 was built. Both the
input image and the correlator filter were recorded on high-resolution photographic film with a laser writer. The input
images were binary with 256 × 256 pixels. The numerical
sequence was also recorded on the same type of film. For
these experiments, a 320 × 200 pixel LCTV operated at the
video frame rate (30 frames/s) was used in the optoelectronic threshold. The extinction ratio was limited to around
2% by the LCTV contrast ratio (≈50:1). The Dammann
grating used has a 20-μm pitch with a diffraction efficiency
of approximately 65%.
Four images were recorded in the first-layer memory
(wheat, biological hazard, fire, and skull). Five images were
presented at the input. Two experiments were performed. In
the first one, the objects in the memory were each assigned
to a different encoded class. The acid symbol, not included
in the memory, did not cause any response. In the second
experiment, the skull and the fire symbol
were assigned to the same class.
Figure 2 shows three input images: the acid, biological
hazard, and wheat symbols. Figure 3 shows the information
recorded in the filter. The correlations obtained are shown
in Fig. 4. The correlation of Fig. 4(a) was obtained with the
image of Fig. 2(a) as the input. In the same manner, Fig.
4(b) corresponds to Fig. 2(b) and Fig. 4(c) to Fig. 2(c).
Because the acid symbol was not included in the memory,
the correlation with the acid symbol gave rise to only small
cross-correlation values [Fig. 4(a)]. The wheat pattern produces a maximum correlation value at the first position,
whereas the biological hazard symbol produces a bright
peak at the second position. The skull and the fire symbol
produce the same kind of results. Figure 5 shows the output
for the skull and fire symbols (the third and fourth positions) after the optoelectronic threshold. In the acid-symbol
correlation plane, all the values vanish. The skull and the
fire produce clean delta-like functions at the third and
fourth positions.
Fig. 4 Correlation of the input images, (a) acid, (b) wheat, and (c) biological hazard, with the template of
Fig. 3. The maximum correlation peak depends on the input object, and its location depends on its
corresponding position in the filter template.
Fig. 5 Correlations produced with the skull (b) and the fire (c) symbols cleaned with the optoelectronic
threshold (referred to the third and fourth positions in the filters, see Fig. 2). Because the correlation
value of the acid symbol (a) is not high enough, no energy is transmitted to the second layer. The skull
and fire symbols can be assimilated to a delta function.
Fig. 6 Binary pattern used for class encoding. The cleaned correlation is imaged on each of the three vertical bands. The two bands
on the left encode the class, and one on the right encodes the presence or absence of an object. The white zones are transmissive,
and the class can be read on a horizontal band from left to right. [Figure labels, left to right: zone for the first replica, zone for the second replica, zone for the third replica.]
The image is then replicated laterally by means of the
Dammann grating to reproduce three identical images. The
replicated output of the thresholded results is multiplied by
the encoding mask of Fig. 6. The results are presented in
Fig. 7. The skull peak, located in the third position, is multiplied with a dark mask, producing a zero, and a transparent mask, producing a one. The last transparent mask indicates the presence of an object. So the whole encoded class
is 011. The fire symbol, located in the fourth position, is
encoded with three transparent masks, producing a class
111. If the encoding mask is changed for the one of Fig. 8,
the skull and the fire symbol will be encoded with the same
numerical sequence (011). The acid will still be encoded in
the class 000. These results are shown in Fig. 9. From those
results it should be clear that the system is fully invariant
under translation along the horizontal axis, whereas vertically an object can translate by only one-fourth the total
image height.
Fig. 7 Classification of the acid symbol (a) in class 000, the skull (b) in class 011, and the fire symbol
(c) in class 111.
Fig. 8 Second classification template. The codes for the third and the fourth position (vertical) are the
same and will include two different objects in the same class.
Fig. 9 Second classification for the acid (a) 000, the skull (b) 011, and the fire symbol (c) 011. Both
skull and fire symbols are now part of the same class.
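The class encoding just described can be summarized as a lookup followed by a detector decision. The Python sketch below is illustrative only. The mask rows for the third and fourth positions follow the codes reported above, whereas the rows for the first and second positions are hypothetical placeholders, since their codes are not stated. The 50% decision level is an assumption taken from the binary encoding scheme, and the names `classify`, `MASK_FIG6`, and `MASK_FIG8` are invented for this example.

```python
def classify(peak_position, encoding_mask, peak_intensity=1.0, max_intensity=1.0):
    """Return the class bits read by the three detectors.

    peak_position: 1-based memory position of the thresholded peak, or None
    when no light reaches the second layer.
    encoding_mask: maps a position to the transmittances (0 or 1) seen by
    the three replicas, read from left to right.
    """
    if peak_position is None:
        return "000"
    bits = []
    for transmittance in encoding_mask[peak_position]:
        detected = peak_intensity * transmittance
        # Assumed decision level: half of the maximum correlation value.
        bits.append("1" if detected > 0.5 * max_intensity else "0")
    return "".join(bits)

# Rows 3 and 4 match the reported codes; rows 1 and 2 are hypothetical.
MASK_FIG6 = {1: (0, 0, 1), 2: (1, 0, 1), 3: (0, 1, 1), 4: (1, 1, 1)}
MASK_FIG8 = {1: (0, 0, 1), 2: (1, 0, 1), 3: (0, 1, 1), 4: (0, 1, 1)}

skull_first, fire_first = classify(3, MASK_FIG6), classify(4, MASK_FIG6)
skull_second, fire_second = classify(3, MASK_FIG8), classify(4, MASK_FIG8)
acid = classify(None, MASK_FIG6)
```

Swapping `MASK_FIG6` for `MASK_FIG8` changes only the lookup table, which mirrors the point that the classification is changed by changing the encoding mask while the first layer stays untouched.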
The overall system performs two Fourier transformations and three image multiplications. The capacity of the
system is only limited by the LCTV frame rate. For example, with a commercially available ferroelectric SLM at
1000 frames/s, the classifier performs $4.4 \times 10^9$ operations/s
and $4.5 \times 10^{13}$ interconnections/s. This system, inspired by
neural networks, is inherently robust because cross-correlations are eliminated in the first layer. An operation
range of 50% of the maximum value is also provided by the
binary encoding scheme. An intensity lower than 50% of
the maximum correlation value is set to zero, whereas an
intensity above 50% of the maximum correlation value is
set to one. The overall path is uninterrupted because the
system output directly comes from the laser.
5
Conclusion
An optodigital neural network classifier has been implemented. The experiments performed yielded correct classification both for objects included in the memory and for objects absent from
the memory. The classifier combines the classical possibilities of the optical correlator, the nonlinear capabilities
of neural networks (via the optoelectronic thresholder), and
the compatibility of digital optics (via the optodigital encoder). Coupled with LCTVs and cameras, it can be made
versatile, and it can be modified easily to provide a real-time tracking system. This system could be used in many
applications because of its inherent compatibility with digital systems.
Acknowledgment
This research was supported by grants from the Natural
Sciences and Engineering Research Council of Canada
(NSERC), from the Fonds pour la formation des chercheurs
et l'aide à la recherche (FCAR) program of Quebec, and
from the JSTF program of the Canadian Ministry of External Affairs.
Michel Doucet received his BSc degree
in physics in 1988 from Université du Québec
à Chicoutimi, Canada, and his MSc
degree in optics in 1991 from Université
Laval, Quebec, Canada. He has been a
researcher at the National Optics Institute
since 1992, working on the development
of optical correlators, 3D laser measurement systems, sensors for plastic sorting,
and sensors related to machine vision systems. His research interests include optical information processing,
machine vision, pattern recognition, and speckle.
References
1. N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical implementation of the Hopfield model," Appl. Opt. 24, 1469-1475 (1985).
2. E. Paek and D. Psaltis, "Optical associative memory using Fourier
transform holograms," Opt. Eng. 26, 428-433 (1987).
3. A. Bergeron, H. H. Arsenault, E. Eustache, and D. Gingras, "Optoelectronic thresholding module for winner-take-all operations in optical neural networks," Appl. Opt. 33, 1463-1468 (1994).
4. A. Bergeron, H. H. Arsenault, and D. Gingras, "Dammann-grating-based optodigital position converter," Opt. Lett. 20, 1895-1897
(1995).
5. A. van der Lugt, "Signal detection by complex filtering," IEEE
Trans. Inf. Theory IT-10, 139-145 (1964).
Alain Bergeron received his BSc degree
in physics engineering at Université Laval
in 1987. He completed his MSc in computer generated holograms in 1988 at the
same university. Until 1991 he worked in
research and development at the National
Optics Institute (NOI) on graded reflectivity
mirrors and fiber optic sensors. He then
undertook his PhD studies in optical implementation of neural networks in a joint
project of NOI, Université Laval, and the
Communication Research Laboratory of Japan. Since 1994, he has
been a researcher at NOI and he is currently in charge of the processors and algorithms group in the Canadian Optical Computing
Consortium, OPCOM. His current fields of interest include pattern
recognition systems, optical computing, neural networks, and vision
systems.
Luc Veilleux received his DEC in physics
technology at the CEGEP of La Pocatière,
Quebec, in 1992. Since 1992, he has been
a technologist at the National Optics Institute (NOI). He has been working on thin
film deposition, electronic control, and
guided wave device realization. He is currently working in the Digital and Optical
System Sector on projects related to optical correlators, 3D vision, and neural networks.
Denis Gingras received his BSc and MSc degrees in electrical engineering from Laval University in 1980 and 1984, respectively, and
his DrS in 1989 from the Ruhr-Universität Bochum, Germany. His
work has been on signal and image processing. From 1989 to 1990,
he was a STA fellowship award recipient as a guest researcher at
the Communication Research Laboratory in Tokyo, Japan. He is
currently director of the Digital and Optical Systems Sector at the
National Optics Institute, in Quebec City, Canada. His current research interests include signal and image processing, neural networks, and artificial vision. Dr. Gingras is a member of IEEE, INNS,
EURASIP, and SPIE.
Henri H. Arsenault is a professor in the
Department of Physics at Laval University
in Quebec City, Canada. He is the author
of more than 100 publications in optical
and digital information processing, pattern
recognition, optical computing, and artificial intelligence. He is a fellow of the Optical Society of America and of SPIE, the
International Society for Optical Engineering. He has filled a number of functions in
optical societies. He is coeditor of the book
Optical Processing and Computing, is coauthor of the book An Introduction to Optics in Computers, and has contributed chapters to
various books.