Laminart PDF
www.elsevier.com/locate/visres
Abstract
A laminar cortical model of stereopsis and later stages of 3D surface perception is developed and simulated. The model describes
how initial stages of monocular and binocular oriented filtering interact with later stages of 3D boundary formation and surface
filling-in in the lateral geniculate nucleus and cortical areas V1, V2, and V4. In particular, it details how interactions between layers
4, 3B, and 2/3A in V1 and V2 contribute to stereopsis, and clarifies how binocular and monocular information combine to form 3D
boundary and surface representations. Along the way, the model modifies and significantly extends the disparity energy model.
Neural explanations are given for psychophysical data concerning: contrast variations of dichoptic masking and the correspondence
problem, the effect of interocular contrast differences on stereoacuity, Panum's limiting case, the Venetian blind illusion, stereopsis
with polarity-reversed stereograms, da Vinci stereopsis, and various lightness illusions. By relating physiology to psychophysics, the
model provides new functional insights and predictions about laminar cortical architecture.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Cortical model; Depth perception; Stereopsis; Surface perception; Cortical layers; Lightness perception; Monocular–binocular
interactions
0042-6989/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0042-6989(03)00011-7
802 S. Grossberg, P.D.L. Howe / Vision Research 43 (2003) 801–829
Fig. 2. (a) The same-sign hypothesis: only edges that have the same
contrast polarity can be stereoscopically fused to produce a percept of
depth. (b) As it is traversed, the boundary of the ellipse changes its
contrast polarity relative to the background, thereby illustrating the
need for object boundaries to be represented in a contrast-invariant
manner. See text for details.
3D surface capture. The present article considers only the filling-in of achromatic lightness.

How does the brain ensure that lightness fills-in at only the correct depths? Grossberg (1994) proposed properties of this boundary–surface interaction that helped to explain many data about 3D figure-ground perception. Here, one of these properties proved essential to explain 3D surface percepts that arise in stereopsis research. Namely, visible surfaces arise in cortical area V4 only if they are enclosed by connected boundaries (see Fig. 4). In particular, a rectangular connected boundary may be composed of one vertical binocular boundary, one vertical monocularly viewed boundary, and two horizontal boundaries that code no disparity information. This connected boundary can support a visible surface percept at the depth corresponding to the binocular boundary if all other constraints are satisfied. Such a boundary can contain the filling-in process. However, if the vertical binocular boundary is missing, as it would be at a different depth plane, then the total boundary is not connected, and a visible percept will not be evident at that depth because filling-in can dissipate out of the boundary gap. This example illustrates how the monocular–binocular interface problem (item (4) above), and thus the correspondence problem (item (3) above), influence visible percepts of 3D surfaces.

The present model refines aspects of the FACADE model of 3D vision and figure-ground perception (Grossberg, 1994, 1997). The FACADE model included a (non-laminar) model of stereopsis and 3D planar surface perception (Grossberg & McLoughlin, 1997; McLoughlin & Grossberg, 1998) that modified and generalized the disparity energy model of stereopsis (Ohzawa et al., 1990). This generalization incorporated rectification prior to binocular combination, absent from the original disparity energy model, which has recently received independent experimental support (Cumming, 2002; Read, Cumming, & Parker, 2002). It also proposed that positional shifts between left and right eye cortical inputs code disparities, rather than phase shifts, which has also received experimental support (Tsao & Livingstone, in press). The FACADE model also incorporated a disparity filter to help solve the correspondence problem (Howard & Rogers, 1995) as well as mechanisms for filling-in 3D surface percepts from 3D boundary representations. In particular, the FACADE model explained the fact that stereoscopic fusion is generally impossible when the left and right eye stimuli differ too much in contrast (Smallman & McKee, 1995). However, in the form developed by Grossberg and McLoughlin, the FACADE model could not explain why stereoscopic fusion is always possible in the special case where each eye sees only a single bar, regardless of the contrast difference of the two bars (McKee et al., 1994; Smallman & McKee, 1995).

The 3D LAMINART model overcomes this limitation using identified cells in laminar circuits, and resimulates all the data previously simulated by McLoughlin and Grossberg (1998), in particular the data on contrast variations of the correspondence problem and dichoptic masking. In addition, the new model can simulate still more psychophysical data than its non-laminar predecessors, including: the Venetian blind illusion, four different examples of da Vinci stereopsis (Gillam, Blackburn, & Nakayama, 1999; Nakayama & Shimojo, 1990), stereopsis with opposite-contrast stimuli, the effect of interocular contrast differences on stereoacuities and various lightness illusions. In so doing, it demonstrates more of the roles that boundary and surface representations play in depth perception. The 3D LAMINART model also makes neurophysiological predictions, including that there exist: (1) In V1, cells that obey the ratio constraint on binocular fusion. The model proposes that some binocular simple cells in layer 3B obey an obligate property whereby they can be activated only if they receive approximately equal inputs from both left and right eye monocular simple cells in layer 4. The constraints that determine cell firing depend upon the ratios of left and right monocular cell activity. This property explains the ratio constraint on stereoscopic fusion that is illustrated in Fig. 10 below. The obligate property is predicted to be caused by a balance between excitatory inputs from layer 4 monocular simple cells and inhibitory inputs from layer 3B inhibitory interneurons. The interneurons are themselves activated by layer 4 monocular simple cells and mutually inhibit each other, in addition to inhibiting the binocular simple cells. (2) In V2, cells that solve the correspondence problem using a disparity filter. (3) In V4, a filling-in mechanism that completes visible 3D surface representations within connected boundaries. These results were briefly reported in Howe and Grossberg (2001).

2. Model description

The model consists of four component networks which process: V1 binocular boundaries, V1 monocular boundaries, V2 boundaries, and V4 surfaces. For a mathematical description, the reader is referred to Appendix A. A description of the neurophysiological and anatomical evidence that supports all the model processing stages is found in Section 4.1. In order to reduce the computational load, the model currently considers only horizontal and vertical contours and five depth planes. Even so, the model includes approximately 185,000–333,000 cells depending on the simulation. Although model cells and cells in vivo will be clearly distinguished in the text, model cells will be referred to by
physiological labels because their properties so closely match those found in vivo.

2.1. V1 binocular boundaries

The network that processes the V1 binocular boundaries is located in the V1 interblob region and includes the binocular cells in layers 3B and 2/3A. It carries out stereoscopic fusion of vertical contours, but not of horizontal contours, which it assumes cannot be stereoscopically fused. This network implements the same-sign hypothesis (see Section 1, item (1)). As shown in Fig. 1, inputs to the left and right eyes activate monocular simple cells in layer 4 of the V1 interblob regions. Left and right eye monocular simple cells conjointly activate binocular simple cells in layer 3B whose depth sensitivity is determined by the relative retinal disparity of the layer 4 monocular cells that project to them. The model implements the same-sign hypothesis by assuming that only layer 4 simple cells with the same contrast polarity project to a single layer 3B simple cell. These layer 3B simple cells are therefore selective for binocular disparity and a prescribed contrast polarity. Binocularly fused vertical contours that occupy corresponding points on the two retinas are seen as a single boundary in the fixation plane, whereas vertical contours that are displaced relative to each other are seen as a single boundary either in front of or behind the fixation plane, depending on their displacement, as detailed in Appendix A (Eq. (A.10)).

There are also inhibitory cells in layer 3B. As is described in Appendix B, these cells ensure that the binocular simple cells act like the "obligate cells" of Poggio (1991): The activity of such a binocular simple cell is suppressed by these inhibitory interneurons if the magnitudes of the left and right eye inputs differ too much (see Section 1, item (2)). In particular, these obligate cells respond to binocular, but not to monocular, stimulation. These obligate cells help to solve the correspondence problem by ensuring that only similar stimuli in the left and right eye retinal images are stereoscopically fused.

The next processing stage implements contrast-invariant boundary detection (see Section 1, item (1)). Layer 3B simple cells that are sensitive to the same position and disparity, but opposite contrast polarities, pool their signals at layer 2/3A complex cells. These complex cells therefore respond to both contrast polarities and so can generate three-dimensional object boundaries even if the object's contrast polarity, with respect to the background, reverses as the boundary is traversed. In summary, the two layers 3B and 2/3A, acting together, can realize the same-sign hypothesis and also begin to compute object boundaries in front of textured backgrounds.

These proposed interactions between layers 4, 3B and 2/3A are consistent with neurophysiological data, as detailed in Section 4.1, and instantiate key operations of the disparity energy model (Ohzawa et al., 1990), which itself is strongly supported by physiological evidence; for a review, see Ohzawa (1998). As discussed in Section 1, the need for preprocessing before the site of binocular combination, such as that carried out by layer 4 of our model, has recently been demonstrated by Cumming (2002) and Read et al. (2002), who showed that preprocessing was required to explain subtleties in physiological data not captured by the original disparity energy model.

2.2. V1 monocular boundaries

The network that processes the V1 monocular boundaries comprises the monocular cells in layers 4, 3B and 2/3A of the V1 interblob region. It is similar to the binocular boundaries network, but represents both horizontal and vertical boundaries whereas the binocular boundaries network represents only vertical boundaries. Binocular boundary cells preferentially represent a particular depth plane, but this is not true of monocular boundary cells. How, then, do monocular and binocular boundaries interact? A proposed solution of this monocular–binocular boundary interface problem assumes that the outputs of the monocular boundary cells are added to all depth planes in cortical area V2 along their respective lines-of-sight (see Section 1, item (4) and Fig. 3). Appendix A, Eqs. (A.12) and (A.13), describe this process quantitatively.

As noted in Section 1 (item (4)), the V2 disparity filter helps to solve the monocular–binocular interface problem, as well as the correspondence problem, by eliminating most of the monocular representations that are not at the correct depth. This previously unexpected property of the disparity filter is crucial to understanding the monocular–binocular interactions described in this paper. It can best be understood by studying the model simulations in Section 3.

2.3. V2 boundaries

The disparity filter network that processes V2 boundaries is located in the V2 pale stripes (see Section 1, item (3)). The V1 binocular boundaries network attempts to match every vertical edge in one retinal image with every other nearby vertical edge in the other retinal image that has the same contrast polarity and approximately the same magnitude of contrast. Fig. 3 shows the resultant matches if each eye sees two bars. V1 makes four matches. Only the two in the fixation plane are correct matches. The other two are false matches between retinal images that do not correspond to the same object. Such false matches are known to occur in V1 but
less readily in V2 (Bakin, Nakayama, & Gilbert, 2000; Cumming & Parker, 2000). As they typically do not give a veridical depth perception, these false matches must be suppressed.

Fig. 3 illustrates how the disparity filter works. To encourage unique matching, the model assumes that each neuron inhibits all other neurons that share either of its monocular inputs; that is, share one of its monocular lines-of-sight. This is represented by the solid lines between neurons in Fig. 3. This rule on its own could ensure that only two of the four initial matches in Fig. 3 survive, but it could not guarantee that it is the false matches that are suppressed. A second form of inhibition ensures this. This inhibition acts across depth and within cyclopean position. It is represented by the dashed line between each neuron and every other neuron that is directly in front of or behind it. These two types of inhibition work together to ensure that the two matches in the fixation plane typically win, thereby solving the correspondence problem. It should be stressed that the disparity filter operates only on vertically oriented cells, as the model assumes that horizontal boundaries cannot be fused and therefore cannot give rise to false matches. It will be shown in Section 3.1.4 how this filter is also able to explain how, in some situations, double matching can occur, as in Panum's limiting case, an example of stereopsis that many previous models (e.g., Grimson, 1981; Marr & Poggio, 1976) could not explain.

2.4. V4 surfaces

Boundaries help give rise to 3D surface percepts in the manner summarized in Section 1 (item (5)). Although our main goal is to explain percepts of surface depth, percepts of surface lightness are also simulated to show that our development of cortical depth perception mechanisms is consistent with simulations in related modeling studies of surface brightness and lightness (e.g., Grossberg & Kelly, 1999; Kelly & Grossberg, 2000). Such a unified set of simulations supports the key FACADE prediction that the same process fills-in surface lightness, color, and depth (Grossberg, 1994).

Previous simulations of lightness often focused on computing the relative lightnesses of surface regions (but see Grossberg, Mingolla, & Williamson, 1995). Once relative lightness is estimated, then absolute lightness can be computed in many cases by assuming that the lightest surface of the group is white and calculating the absolute lightnesses of all other surfaces relative to that one (Wallach, 1976).

Grossberg and Todorovic (1988) computed the relative lightness of two surfaces by first discounting the effects of a spatially non-uniform illumination (see Section 1, item (5)). Discounting the illuminant can be achieved by neurons that obey cell membrane equations and that interact through on-center, off-surround circularly symmetric receptive fields. The present model utilizes such model neurons, which are analogous to those found in the LGN, as summarized in Section 4.1 and defined in Appendix A (Eqs. (A.1)–(A.3)). These model neurons are excited by spots of light applied to the center of their receptive fields but are inhibited by those applied outside this central region. The excitatory and inhibitory components of the receptive fields are balanced so that cell responses are attenuated to spatially uniform or slowly varying stimulation. The cells therefore respond preferentially to luminance borders. At a later processing stage, these border signals propagate throughout those surface regions that are completely enclosed by boundaries to complete the lightness representation. Propagation occurs via a filling-in process that is akin to a diffusion process, as defined in Appendix A (Eqs. (A.17)–(A.23)). Propagating signals can dissipate across space unless the region is surrounded by a connected boundary (see Fig. 4). As in Grossberg (1994), the present model proposes that the final stage of filling-in occurs in V4, where visible surface percepts are predicted to occur. Section 3 summarizes how such a filling-in process, when confined by the 3D boundaries of the present model, can explain da Vinci stereopsis, as well as many aspects of lightness perception, thereby linking the model's explanations of surface depth and lightness.

This section summarizes simulations that predict how monocular and binocular information interact in the visual cortex. We will consider, in turn, contrast variations of dichoptic masking, stereoacuity, Panum's limiting case, contrast variations of the correspondence problem, the Venetian blind illusion, stereopsis with opposite-contrast stimuli, da Vinci stereopsis, and the Craik–O'Brien–Cornsweet lightness illusion. The main aim of these simulations is to illustrate how the model's four component networks interact with each other to explain the percepts reported by human subjects. These explanations constitute testable predictions for linking psychophysical percepts to their cortical mechanisms. Like the model diagram shown in Fig. 1, the simulation figures should be read from the bottom up, with the bottom two rows representing the input and the V1 boundary representations, the next two rows representing the V2 boundary representations and the top row representing the V4 surface representations. Furthermore, for each of the top four rows, depth increases from left to right, with the middle plot representing the fixation plane, the two leftmost plots representing the two near depth planes and the two right plots representing the two far depth planes.
3.1. Dichoptic masking

3.1.1. The basic paradigm

In the basic paradigm considered by McKee et al. (1994), the contrast threshold for the detection of a low contrast bar presented to one eye was found to increase radically when a high contrast bar was presented to the other eye. Furthermore, it was not necessary for the two bars to be at retinal correspondence. The model explanation of this percept is as follows. The high contrast bar is presented to the left eye and the low contrast bar to the right, as shown by the middle two plots in the bottom row of Fig. 5. The outer two plots of the bottom row show the simulated monocular boundary representations. Since their contrasts differ greatly, these two bars cannot be stereoscopically fused in V1 due to the inhibitory circuit in layer 3B, as explained in Section 2.1. This accounts for the absence of V1 binocular boundary representations in the second row. As the monocular boundaries do not yet have a depth associated with them, they are added to all depth planes in V2 along their respective monocular lines-of-sight, as shown in the third row of this figure. In this row, each of the five plots represents a different depth, with those on the left representing depth planes nearer than the fixation plane and those on the right the converse. As we move across this row the allelotropic shifts (cf., Fig. 3) cause the left monocular boundaries to be added to locations further to the right in successive depth planes, while the right monocular boundaries are added to locations further to the left. The left and right monocular boundaries coincide in the near disparity plane represented by the second plot of this row. The vertical boundaries in this disparity plane are consequently stronger than those in the other four depth planes, which they then suppress via the line-of-sight inhibition of the V2 disparity filter (cf., Fig. 3) to give the final V2 boundary representations shown in the fourth row. Notice, in particular, that all horizontal boundaries have survived since the disparity filter only inhibits vertical boundaries. In contrast to the horizontal boundaries, only the vertical boundaries in the near disparity plane, represented by the second plot of this row, have survived. As explained in Section 2.4, the lightness signals, originating at the location of the boundaries, propagate throughout this disparity plane. Because the near disparity plane contains a connected boundary that completely encloses a bar-shaped region, these boundaries can contain the filling-in of the lightness signals to cause the bar-shaped surface percept shown in one plot of the top row. The other filling-in signals dissipate and do not give rise to a conscious surface percept (see Fig. 4). Because the bars in the left and right eye inputs are perceived to occupy the same position in 3D space, the high contrast bar masks the low contrast bar. In summary, this simulation shows how the left and right inputs can be fused to form a single percept in V4 even though their contrasts are so different that they cannot be fused by the binocular cells in V1.

The fact that the V2 disparity filter can fuse bars whose contrasts are too different to be fused in V1 has ramifications for stereoacuity. In particular, Schor and Heckmann (1989) noted that increasing the contrast of the image equally in both eyes increases stereoacuity, but increasing the contrast of the image in just one eye decreases stereoacuity. The model explanation is simply that in the first case fusion could occur in V1 but in the second only in V2. Since V1 cells in general have smaller receptive fields than V2 cells that correspond to the same region of visual space, the model is therefore able to explain why stereoacuity is greater in the first case than in the second.
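
The two steps of the dichoptic masking explanation, failure of obligate fusion in V1 followed by the addition of monocular boundaries to every depth plane along their lines-of-sight, can be sketched in a few lines. This is a toy illustration, not the model equations: the ratio threshold `max_ratio`, the edge positions, the boundary strengths and the integer disparities from -2 (nearest) to +2 (farthest) are all hypothetical values.

```python
def obligate_fuses(left, right, max_ratio=2.0):
    """Toy ratio test: fuse only if both eyes drive the cell with similar strength."""
    if left <= 0 or right <= 0:
        return False                      # monocular stimulation never fuses
    return max(left, right) / min(left, right) <= max_ratio

def add_monocular(left_edges, right_edges, disparities):
    """Add every monocular edge to every depth plane, shifted along its line of sight."""
    planes = {}
    for d in disparities:
        plane = {}
        for x, strength in left_edges.items():
            # Left-eye edges land further right in successively farther planes.
            plane[x + d] = plane.get(x + d, 0.0) + strength
        for x, strength in right_edges.items():
            # Right-eye edges land further left in successively farther planes.
            plane[x - d] = plane.get(x - d, 0.0) + strength
        planes[d] = plane
    return planes

# High contrast bar in the left eye, low contrast bar in the right eye.
left_edges = {10: 1.0, 15: 1.0}
right_edges = {8: 0.2, 13: 0.2}

fused_in_v1 = obligate_fuses(1.0, 0.2)    # contrasts differ too much
planes = add_monocular(left_edges, right_edges, range(-2, 3))
# The plane where left and right edges coincide holds the strongest boundaries.
strongest = max(planes, key=lambda d: max(planes[d].values()))
print(fused_in_v1, strongest)
```

Here the edges coincide only in the plane with disparity -1, a near plane, so that plane carries the strongest vertical boundaries even though no binocular boundary was formed in V1. In the model these boundaries would then suppress the others through the line-of-sight inhibition of the V2 disparity filter, which this sketch omits.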
is why the disparity filter of the present model encourages unique matching but does not enforce it. The model will now be shown to simulate all the data from the extensive study of the correspondence problem by Smallman and McKee (1995), even though it does not enforce the uniqueness constraint. In so doing, it clarifies the crucial role that monocular–binocular interactions play in these percepts.

3.2.1. Control experiment

Smallman and McKee (1995) initiated their study by performing a control experiment in which each eye was presented with two bars, all four bars having the same high contrast. Subjects reported seeing two identical bars, both in the far disparity plane. Fig. 8a shows the corresponding model simulation.

Since the left input is displaced leftwards relative to the right input, the vertical edges of the two bars fuse in the far disparity plane in V1, as is shown by the fourth plot of the second row. In addition to this, there is a false match in the near disparity plane of V1, shown in the second plot of this row, which is caused by the inappropriate fusion of the right bar of the left input with the left bar of the right input. As usual, the monocular boundaries are added to all depth planes in the V2 disparity filter along their respective monocular lines-of-sight, as shown in the third row of this figure. In addition, the binocular bar representations are also added to V2, coinciding with the middle bar representation in the second plot and both bar representations of the fourth plot. Those vertical boundaries that receive binocular input, being stronger, quickly inhibit via the V2 disparity filter all other vertical boundaries that share their lines-of-sight and only receive monocular input. The two sets of vertical boundaries in the fourth plot, both of which receive binocular input, cooperate via the disparity filter to inhibit the vertical boundaries of the middle bar representation of the second plot, which also receive binocular input. This happens because the middle bar boundaries receive binocular inputs that share monocular inputs with their inhibitors. The final V2 boundary representations are shown in the fourth row. The model correctly predicts that subjects see both bars in the far disparity plane. In summary, this simulation shows how the line-of-sight inhibition of the V2 disparity filter ensures that the false match that is present in V1 (second plot of the second row) is eliminated. The V2 disparity filter is therefore the reason why the model can solve the correspondence problem.

Fig. 8b shows a more complicated version of the correspondence problem. Once again the false matches are shown in the second plot of the second row and the correct matches in the fourth plot. Since there are more correct matches than false matches, the latter are again suppressed by the former via the line-of-sight inhibition of the V2 disparity filter. This simulation shows that the model can be applied to more general versions of the correspondence problem than that shown in Fig. 8a. In Section 3.3 the model is applied to a particularly complex version of the correspondence problem known as the Venetian blind illusion. These simulations of the correspondence problem, the Venetian blind illusion (Figs. 11 and 12) and da Vinci stereopsis (Figs. 14 and 15), among others, clarify how the model will generalize to natural images by showing how it deals with a variety of potentially confusing matches within the fusion range.
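
The way in which two correct matches cooperate to eliminate a false match that shares a line-of-sight with each of them can be illustrated with a small recurrent competition. The sketch below is a simplification made under stated assumptions: it works at the level of whole bars rather than edges, omits the monocular inputs and the across-depth inhibition of the disparity filter, and uses hypothetical labels (`L1`, `L2`, `R1`, `R2`), unit inputs and an arbitrary inhibitory `weight`.

```python
def disparity_filter(matches, steps=400, dt=0.1, weight=1.0):
    """Toy line-of-sight competition. matches: name -> (left_input, right_input)."""
    names = list(matches)
    # Two matches are rivals when they share a left-eye or a right-eye input.
    rivals = {
        m: [n for n in names
            if n != m and (matches[n][0] == matches[m][0]
                           or matches[n][1] == matches[m][1])]
        for m in names
    }
    act = {m: 0.0 for m in names}
    for _ in range(steps):
        new = {}
        for m in names:
            inhibition = weight * sum(act[n] for n in rivals[m])
            # Rectified Euler step: unit excitation, passive decay, rival inhibition.
            new[m] = max(0.0, act[m] + dt * (1.0 - act[m] - inhibition))
        act = new
    return act

# Two bars per eye, as in the control experiment: two matches in the far plane
# and one false match (right bar of left input with left bar of right input).
matches = {
    "far_left_bar": ("L1", "R1"),
    "far_right_bar": ("L2", "R2"),
    "near_false": ("L2", "R1"),
}
settled = disparity_filter(matches)
print({name: round(value, 3) for name, value in settled.items()})
```

Each correct match is inhibited by one rival, whereas the false match is inhibited by two, so the false match is driven to zero and the two far-plane matches recover full activity. This mirrors the statement that the false match is suppressed because it is outnumbered by correct matches along its lines-of-sight.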
Fig. 11. Simulation of the Venetian blind effect (Howard & Rogers, 1995).
Fig. 12. (a) Simulation of one component of the Venetian blind effect. (b) Simulation of the other component. See text for details.
Fig. 13. Simulation of stereopsis with a polarity-reversed stereogram.
the right edge of the white bar fuses with the left edge of the black bar to form a boundary representation in the far disparity plane of V1; see the fourth plot of the second row. Also, the left edge of the white bar fuses with the right edge of the black bar to form a boundary representation in the near disparity plane of V1; see the second plot of the second row. Unlike Fig. 13a, there are now two boundary representations in V1. The monocular boundaries are added to V2 along their respective lines-of-sight. The binocular boundaries are also added to V2, overlapping with the middle vertical boundaries in the second and fourth plots of the third row. These two boundaries, being stronger, suppress all other vertical boundaries via the recurrent inhibition of the V2 disparity filter. However, because they are equally strong, they cannot suppress each other. The final boundary representations are shown in the fourth row. No regions are completely enclosed by boundaries and so the model predicts that there will be no stable depth percepts.

This prediction is correct as far as it goes, in that subjects do not achieve any stable surface percepts, but in practice unstable surface percepts may form if subjects experience binocular rivalry. Describing binocular rivalry is beyond the scope of our simulations. However, it has been qualitatively modeled in Grossberg (1987) in a manner that is consistent with the present model simulations.

The key point here is that whether or not an anticorrelated stereogram induces a stable depth percept depends on the vergence of the subject. One vergence position enables the visual system to match the left and right inputs only in a single way. Other vergence positions lead to two binocular boundaries in V1, and consequently no stable depth percepts in V4, as demonstrated by Fig. 13b. Subjects may also be able to use attention to choose between the two possible ways of matching the left and right inputs. Section 4.4 shows how the model may be extended to incorporate attentional effects.

Regardless of whether subjects use vergence or attention to make sure their visual system can only fuse the left and right inputs in one way, the more elements are included in the left and right inputs, the harder it is to ensure unambiguous fusion. The model suggests that this is the reason why complex anticorrelated stereograms (i.e., those anticorrelated stereograms that contain many separate elements) induce little or no depth perception whereas simple anticorrelated stereograms do (Howard & Rogers, 1995; Julesz, 1971).

situations are often caused by each eye viewing the world from a slightly different position, leading to partial occlusions where part of a scene is visible to only one eye. The model clarifies how the percept of depth caused by such stimuli can be explained in terms of monocular–binocular interactions.

3.5.1. Stimuli of Nakayama and Shimojo (1990)

In this set of experiments, a thick bar was presented to both eyes and a thin bar only to the right eye, as shown in the first row of plots of Fig. 14a. Subjects reported perceiving the thin bar behind the thick bar, at a depth that was consistent with the right edge of the thin bar of the right input being fused with the right edge of the thick bar of the left input.

The model explanation is as follows. The vertical boundaries of the thick bar are registered binocularly in the near disparity plane in V1, as shown by the second plot of the second row, and the right edge of the thin bar is matched with the right edge of the thick bar to be registered binocularly in the far disparity plane in V1, as shown by the fourth plot. The left edge of the thin bar is registered only monocularly because it cannot be matched with either of the edges of the left input. As usual, the monocular boundaries are added to all depth planes
in the V2 disparity filter along their respective lines-of-sight. This is why two thin bar representations and one thick bar representation are seen in all disparity planes of the third row, with the slight complication that in all cases the thick bar representation overlaps with at least one of the two thin bar representations. The V1 binocular boundary representations are also added to the V2 disparity filter, overlapping with the leftmost vertical boundary in the second plot and the rightmost vertical boundary in the fourth plot. These vertical boundaries, being stronger, inhibit, via the recurrent line-of-sight inhibition of the disparity filter, all the other vertical boundaries that share any of their lines-of-sight. This means that they do not inhibit those vertical boundary representations originating from the two monocularly viewed edges of the right input because these vertical boundaries do not share any of their lines-of-sight. The final V2 boundary representations are shown in the fourth row. V4 fills in surfaces in those regions that are completely enclosed by boundaries, resulting in the percept of a thin near bar and a thin far bar, as reported by human subjects (Gillam et al., 1999).
In the previous display, at least one edge of each region could be binocularly fused. In contrast, in Fig. 15b the middle bar of the right eye stimulus is perceived entirely monocularly.
The model simulation is as follows. The left eye sees a single bar while the right eye sees three separate bars. The left edge of the bar of the left input again fuses with the left edge of the leftmost bar of the right input to form a binocular boundary in the second plot of the second row. Similarly, the right edge of the bar of the left input again fuses with the right edge of the rightmost bar of the right input to form a binocular boundary in the fourth plot of the second row. Again the monocular boundaries are added to V2 along their respective lines-of-sight, as shown by the third row. The binocular V1 boundaries are also added to V2. The binocular boundary in the second plot of the second row overlaps with the first vertical boundary in the second plot of the third row. Similarly, the binocular boundary of the fourth plot of the second row overlaps with the last vertical boundary in the fourth plot of the third row. The surviving V2 boundaries are shown in the fourth row.
Only those boundaries that completely enclose a region can contain the lightness signals that originate at the location of the boundaries, and so only these regions give rise to surface percepts in V4. The model therefore correctly predicts that three surfaces will be seen, each at a different depth, as reported experimentally (Gillam et al., 1999).
The filling-in mechanism utilized by the model V4 simulations is equivalent to that used by Grossberg and Todorovic (1988) to explain several lightness illusions. It is therefore claimed that the present model can explain the same large set of lightness illusions. (See Grossberg and Kelly (1999), Grossberg and Pessoa (1998), Kelly and Grossberg (2000) and Pessoa, Mingolla, and Neumann (1995) for other articles that explain additional lightness and brightness data using this filling-in mechanism.)
The Craik–O'Brien–Cornsweet effect (COCE) is simulated to illustrate this claim. In Grossberg and Todorovic (1988), the COCE was simulated using only a monocular input. The simulation herein uses inputs to both eyes and shows that the binocular model can also simulate this percept. The stimuli are shown in the middle two plots of the bottom row of Fig. 16. Both eyes see the same stimulus, which consists of two abutting regions of the same uniform lightness separated by a lightness cusp. Subjects report perceiving both regions as having uniform lightness, with the left region appearing darker than the right.

Fig. 16. Simulation of the Craik–O'Brien–Cornsweet lightness illusion. See text for more details.

The model explains the COCE as follows. The input is binocularly fused to form three vertical binocular boundaries in the nearest disparity plane of V1, represented by the leftmost plot of the second row. As always, both the V1 binocular and monocular boundaries are added to the V2 disparity filter, with the monocular boundaries being added to all depth planes along their respective lines-of-sight, as shown by the plots in the third row. The vertical boundaries in the nearest disparity plane are stronger because they receive both monocular and binocular inputs. They therefore inhibit the vertical boundaries in the other disparity planes via the recurrent line-of-sight inhibition of the disparity filter. The final V2 boundaries are shown by the plots in the fourth row. The boundaries in the nearest disparity plane confine the V4 diffusion of the lightness signals
that originate at the edges of the regions. Those lightness signals originating from the left side of the cusp are darker than those originating from the right side. This lightness difference is propagated, by the V4 filling-in mechanism, throughout the respective regions, causing the left region to appear uniformly darker than the right, as shown by the leftmost plot of the top row.

4. Discussion

4.1. Supporting physiological and anatomical data

This section shows that all the relevant physiological and anatomical data of which we are aware support the model. The model does not, however, consider cortical areas V3, V3A and MT, even though there is evidence that these areas play a role in depth perception (e.g., Backus, Fleet, Parker, & Heeger, 2001). These areas were not needed to simulate the model's targeted data. The function of area V3A appears to be particularly controversial, with studies suggesting that it is variously concerned with relative disparity (Backus et al., 2001), saccades (Nakamura & Colby, 2000a, 2000b) and prehensile hand movements (Nakamura et al., 2001). As a further complication, there is some evidence that the function of macaque V3A differs from that performed by human V3A (Tootell et al., 1997).
When the model diagram in Fig. 1 is compared to the list of data below, it can be seen that the model makes predictions concerning brain physiology and anatomy beyond what is known. One prediction is that there is an inhibitory circuit in V1 which causes the binocular cells in layers 3B and 2/3A not to respond if the inputs to the left and right eyes differ too greatly in contrast. Another is that there is a disparity filter in V2 that employs line-of-sight inhibition. A third prediction is that there is a surface filling-in mechanism that leads to visible percepts and is located in V4 (among other places; see Grossberg, 1994). This section should be read in conjunction with Fig. 1, which interprets each model stage anatomically.

4.1.1. V1 binocular boundaries

Consistent with the model, the LGN contains circularly symmetric on-center, off-surround receptive fields (Kandel, Schwartz, & Jessell, 2000, p. 529). LGN lesion studies have shown that the parvocellular, but not the magnocellular, pathway is critical for fine stereopsis (Schiller, Logothetis, & Charles, 1990a, 1990b). Just as V1 layer 4 is the major recipient of this parvocellular input in vivo (Callaway, 1998), it is also the input layer of model V1. Also, in accord with the model, layer 4 is known to output to layer 3B, but not to layer 2/3A, of V1 (Callaway, 1998), a large proportion of it is monocular (Hubel & Wiesel, 1968; Poggio, 1972), and many of its cells are simple (Hubel & Wiesel, 1968; Schiller, Finlay, & Volman, 1976).
As discussed in Section 2.1, the model assumes that polarity-specific binocular matching occurs in layer 3B. This is consistent with observations that a significant proportion of layer 3B comprises simple cells (Dow, 1974), that layer 3 contains a significant number of binocular cells (Hubel & Wiesel, 1968; Poggio, 1972), and that projections to it can be independent of ocular dominance (Katz, Gilbert, & Wiesel, 1989).
The model suggests that binocular layer 2/3A cells pool responses from layer 3B cells of both contrast polarities so that they can represent the boundaries of objects whose contrast polarity, with respect to the background, changes as the boundary is traversed. In keeping with this suggestion, it is known that layer 3B projects throughout layer 2/3A (Callaway, 1998), and that layers 2 and 3 each contain significant numbers of binocular and complex cells (Poggio, 1972).
The model further suggests that there is a group of cells in layers 2/3A and 3B that respond only to binocular, and not to monocular, stimulation. Such "obligate cells" are known to exist in macaque V1 (Poggio & Fischer, 1977; Smith, Chino, Ni, & Cheng, 1997), with about 40% of tuned excitatory neurons being obligatory (Poggio & Talbot, 1981), including almost all "tuned zero" neurons (Poggio, 1991). Obligate cells do not appear to be as prevalent in cat (Anzai, Bearse, Freeman, & Cai, 1995).
The model predicts that all these interactions occur in the V1 interblob regions, which is in keeping with observations that V1 interblobs are highly selective for orientation but relatively unselective for color (Merigan & Maunsell, 1993).

4.1.2. V1 monocular boundaries

The model suggests that the V1 monocular boundaries are formed by a process that is a simplification of that which forms the V1 binocular boundaries. Consequently, much of the above data applies equally to the monocular boundaries network. Additional support for this network comes from observations that layer 3 (Hubel & Wiesel, 1968; Poggio, 1972) and layer 2 (Poggio, 1972) of V1 each comprise a large proportion of monocular cells.

4.1.3. V2 boundaries

The model assumes that the V2 boundaries are located in the V2 pale stripes. This is consistent with observations that the V2 pale stripes receive the major projection from the V1 interblob regions, while receiving no significant projection from the V1 blob regions, and are highly orientationally selective (Roe & Ts'o, 1997), while also containing a complete map of visual space (Roe & Ts'o, 1995).
The model is further consistent with data that V2 is mainly binocular (Hubel & Livingstone, 1987; Roe & Ts'o, 1997), is mainly disparity-sensitive (Poggio & Fischer, 1977; von der Heydt, Zhou, & Friedman, 2000), contains many complex cells (Hubel & Livingstone, 1987), receives input into layer 4 (Rockland & Virga, 1990) and outputs to V4 (Xiao, Zych, & Felleman, 1999), which itself is highly selective for disparity (Merigan & Maunsell, 1993). In addition, the V2 pale stripes are disparity-selective (Peterhans, 1997).
According to the model, an important function of V2 is to suppress false matches by utilizing a disparity filter. This is consistent with observations that cells readily exhibit false matches in V1 (Cumming & Parker, 2000), but not in V2 (Bakin et al., 2000).

4.1.4. Surfaces

Surfaces are built up through interactions between the V1 blobs, the V2 thin stripes, and V4, consistent with the fact that all these regions are linked by major projections (Livingstone & Hubel, 1984; Xiao et al., 1999), and that the V2 thin stripes are the least orientationally selective area of V2 (Peterhans, 1997) and contain a complete map of visual space (Roe & Ts'o, 1995).

4.2. Comparison with other theories and models

One of the most popular explanations of monocular–binocular interactions is the ecological optics hypothesis of Nakayama and Shimojo (1990). This hypothesis suggests that visual systems attempt to interpret unpaired image points in terms of occlusion. For example, in Fig. 14, both eyes see a thick bar but only the right eye sees a thin bar. According to the ecological optics hypothesis, the visual system interprets these stimuli by assuming that the thin bar is located behind the thick bar at the exact distance that would cause the thick bar to hide it from the left, but not from the right, eye. While this hypothesis is consistent with the percepts evoked by the stimuli in Figs. 14 and 15, it cannot explain the percept evoked by the stimuli of Fig. 13, because this stimulus cannot be explained in terms of occlusion. If we wish to understand the response of the visual system to all possible stimuli, not just the ones that can be interpreted in terms of occlusion, then it is necessary to offer a mechanistic account that can deal with a broader data set in a unified way, as the present model does.
One of the most successful mechanistic models of stereopsis is the disparity energy model (Ohzawa et al., 1990). However, this model does not solve the correspondence problem in that it may match vertical contours in the two retinal images that correspond to different objects. Fleet, Wagner, and Heeger (1996) have proposed how the disparity energy model could be extended to avoid this problem. In their paper they note that, at least for certain stimuli, binocular neurons that are tuned to different spatial frequencies will respond to different false matches. Consequently they argue that false matches can be eliminated simply by pooling the responses of several binocular neurons, each tuned to a different spatial frequency. Although they demonstrated the proficiency of their model when it was presented with white noise stimuli, it is not clear how their model could be extended to other stimuli, in particular those situations where contrast affects the perceived solution of the correspondence problem (Section 3.2) or where monocular information contributes to depth perception (Section 3.5).
Another way to solve the correspondence problem is to utilize a disparity filter that implements the unique-matching rule, which states that any given feature in one retinal image is matched with at most one feature in the other retinal image (Grimson, 1981; Marr & Poggio, 1976; for a review see Howard & Rogers, 1995, pp. 42–43). As discussed in Section 1, this rule fails in Panum's limiting case (Gillam et al., 1995; McKee et al., 1995; Panum, 1858).
This failure caused Grossberg and McLoughlin (1997) and McLoughlin and Grossberg (1998) to design a disparity filter that encouraged unique matching without enforcing it. Their model forms the foundation for our own and can simulate much of the same data, including most of the dichoptic masking and the correspondence problem data. Their model also makes an incorrect psychophysical prediction: that if each eye sees a single bar, then the ratio constraint on stereoscopic fusion (Smallman & McKee, 1995) ensures that fusion will occur only if the magnitudes of the contrasts of the two bars do not differ too greatly. This is inconsistent with experimental findings which indicate that the ratio constraint does not apply to this special case (McKee et al., 1994; Smallman & McKee, 1995).
The present model refines the Grossberg and McLoughlin model to correct this shortcoming. In particular, for the purposes of the disparity filter, the Grossberg and McLoughlin model assigned all unfused boundaries to the fixation plane, whereas the present model adds unfused boundaries to all depth planes and then lets the V2 disparity filter eliminate boundary representations as necessary. As explained in Section 3.1.1, in the special case where each eye sees only a single bar, this procedure allows the two bars to be binocularly fused regardless of their contrast difference. The present model has simulated all the data considered by McLoughlin and Grossberg (1998), specifically the data on contrast variations of dichoptic masking and the correspondence problem, and has also simulated additional data including the Venetian blind illusion, four different examples of da Vinci stereopsis (Gillam et al., 1999; Nakayama & Shimojo, 1990), stereopsis with opposite-contrast stimuli, the effect of interocular contrast
differences on stereoacuity and the Craik–O'Brien–Cornsweet lightness illusion. Furthermore, unlike its predecessor, it has been mapped onto known cortical cells and laminar circuits within cortical areas V1, V2 and V4.
The model presented in this article and the Grossberg and McLoughlin model both instantiate key aspects of FACADE theory. Another model, which is also a simplified version of FACADE theory, was used to explain a series of experiments on the McCollough effect, an orientation-sensitive, long-lasting, chromatic after-effect (Grossberg, Hwang, & Mingolla, 2002). Unlike the present model, the binocular cells in the McCollough effect model did not exhibit the "obligate" property in that they responded to monocular inputs, albeit less strongly than to binocular inputs. Although the McCollough effect model did not make use of obligate cells, inserting such cells into the model would not disrupt its simulations. These obligate cells would merely be unnecessary. Similarly, the addition of non-obligate binocular cells into V1 of the present model, while unnecessary, would not reduce its explanatory power. In particular, such an addition would merely increase the number of false matches that occur in V1. These false matches would be eliminated by the V2 disparity filter, in the manner outlined in Section 2.3, and so would not contribute to the final percept. Taken together, these two models help explain the differing roles of obligate and non-obligate binocular cells in the broader context of FACADE theory and help to functionally explain why both obligate and non-obligate cells have been found experimentally to exist (Poggio, 1991).

4.3. Model robustness and complexity

The model is robust in the sense that the absolute values of the model parameters can be varied over large ranges without disrupting its explanations of data; only their values relative to each other are important. Furthermore, there is considerable scope when choosing individual parameter values, since no single parameter proves to be critical in any simulation.
The model is minimally complex in the sense that each of its four interacting networks, V1 binocular boundaries, V1 monocular boundaries, V2 boundaries, and V4 surfaces, is essential. The V1 binocular boundaries network is needed to explain stereopsis and the contrast ratio constraint observed in stereoscopic fusion (Smallman & McKee, 1995). The V1 monocular boundaries network plays a role in explaining da Vinci stereopsis (Gillam et al., 1999; Nakayama & Shimojo, 1990), dichoptic masking (McKee et al., 1994), contrast variations of the correspondence problem (Smallman & McKee, 1995) as well as some examples of stereopsis with opposite-contrast stimuli (e.g., Howe & Watanabe, in press). The V2 boundaries network is needed to solve both the correspondence problem and the monocular–binocular interface problem by utilizing its disparity filter. The correspondence problem arises because V1 sometimes incorrectly fuses contours that belong to different objects. The monocular–binocular interface problem arises because the V1 monocular boundaries, not having a definite depth association, are initially added to all depth planes. Finally, the surface network includes cells in V4, the V2 thin stripes and the V1 blobs. It is necessary because it is surface percepts, not boundary percepts, that subjects report in the experimental studies considered by this paper and also because, as illustrated by all of the simulations, not all boundaries give rise to a percept of depth.
If anything, the model in Fig. 1 is too simple to explain all data about depth perception. Fortunately, the analysis in this article has opened a clear path to generalize the model, as illustrated below.

4.4. Generalizing to natural images, 3D boundary completion and 3D attention

One of the long-term goals of this modeling work is to extend the present model so that it can be applied to natural images. The simulations already done show that the model can resolve a wide range of potentially confusing false matches. There remain, however, two impediments that the model first needs to overcome.
First, the present model can represent only 3D planes that are flat and perpendicular to the observer. To analyze natural images, the model needs to be extended to represent slanted and curved surfaces in 3D. A parallel line of research has begun to demonstrate how it can be consistently generalized to explain such data (Grossberg & Swaminathan, 2003; Swaminathan & Grossberg, 2001).
Second, the present model shows how boundaries can be formed using bottom-up inputs from the outside world. It does not, however, indicate how horizontal interactions can be used to complete these boundaries where pixels are missing either due to internal brain imperfections, such as the blind spot in the retina, or due to incomplete contours in external inputs, whether due to noise, occluding surfaces, spatially discrete texture elements, illusory contour stimuli, or even missing pixels in impressionist paintings. Nor does it clarify how these circuits can develop, be modified by learning, or be modulated by top-down attention. This omission can be overcome as follows.
A parallel line of modeling has developed quantitative explanations and simulations of how processes of perceptual development, learning, grouping, and attention may be achieved by laminar cortical circuits (Grossberg, 1999a, 1999b; Grossberg et al., 1997; Grossberg & Raizada, 2000; Grossberg & Williamson, 2001;
A.1. LGN

\[
\frac{dX^{L/R}_{ij}}{dt} = -\epsilon X^{L/R}_{ij} + \left(\alpha - X^{L/R}_{ij}\right) I^{L/R}_{ij} - X^{L/R}_{ij} \sum_{p \neq i,\, q \neq j} g_{pqij}\, I^{L/R}_{pq},
\tag{A.1}
\]
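As a worked step, the steady state of (A.1) can be written out explicitly. This is a sketch under one assumption: the minus signs of (A.1) are not legible in the source, and the form used here is that of a standard shunting on-center, off-surround equation with decay rate $\epsilon$, excitatory ceiling $\alpha$ and surround kernel $g_{pqij}$. Setting the time derivative to zero and solving for the activity gives:

```latex
\[
0 = -\epsilon X^{L/R}_{ij} + \left(\alpha - X^{L/R}_{ij}\right) I^{L/R}_{ij}
    - X^{L/R}_{ij} \sum_{p \neq i,\, q \neq j} g_{pqij}\, I^{L/R}_{pq}
\quad\Longrightarrow\quad
X^{L/R}_{ij} = \frac{\alpha\, I^{L/R}_{ij}}
                    {\epsilon + I^{L/R}_{ij} + \sum_{p \neq i,\, q \neq j} g_{pqij}\, I^{L/R}_{pq}} .
\]
```

Under this reading, and for non-negative inputs, the LGN response stays below $\alpha$ and is divisively normalized by the weighted surround input. The kernel $g_{pqij}$ and the constants are defined in parts of the appendix that are not reproduced here.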
Table 1
The allelotropic shift (s) is the amount that the left and right monocular contours must be displaced to form a single fused binocular contour. It depends on the disparity. It is zero for matches in the fixation plane because these matches are between contours at retinal correspondence

Disparity (d)             V. Near   Near   Zero   Far   V. Far
Allelotropic shift (s)      −8       −4     0     +4     +8

Fig. 3 illustrates the allelotropic shift and shows that a left monocular contour needs to be shifted more to the right for matches that are further from the observer, whereas a right monocular contour needs to be shifted in the opposite direction.
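The lookup in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the string keys for the disparity planes are hypothetical, and the pairing of left position i + s with right position i − s simply follows the subscripts (i + s)j and (i − s)j that appear in the appendix equations.

```python
# Allelotropic shift s for each disparity plane, copied from Table 1.
ALLELOTROPIC_SHIFT = {
    "very_near": -8,
    "near": -4,
    "zero": 0,
    "far": 4,
    "very_far": 8,
}


def monocular_positions(i, disparity):
    """Return the (left, right) input positions read by a binocular cell.

    The cell sits at horizontal position i in the given disparity plane and,
    following the subscripts used in the appendix, reads the left-eye signal
    at i + s and the right-eye signal at i - s.
    """
    s = ALLELOTROPIC_SHIFT[disparity]
    return i + s, i - s


def matching_cell(left_pos, right_pos):
    """Inverse lookup: which cell (i, disparity) pairs these two contours?

    Returns None when the pair does not correspond to any modelled plane.
    """
    total = left_pos + right_pos
    diff = left_pos - right_pos
    if total % 2 or diff % 2:
        return None  # no integer position and shift satisfy both equations
    s = diff // 2
    for name, shift in ALLELOTROPIC_SHIFT.items():
        if shift == s:
            return total // 2, name
    return None


if __name__ == "__main__":
    print(monocular_positions(10, "far"))
    print(matching_cell(14, 6))
```

Only five shifts are modelled, so most pairs of left and right contours have no candidate binocular cell at all; this restricts the set of possible matches before the disparity filter is applied.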
At steady-state the membrane potentials, $B^{H/V,L/R,+/-}_{ij}$, of the layer 3B monocular cells are given by:
\[
B^{H/V,L/R,+/-}_{ij} = 2\left[S^{H/V,L/R,+/-}_{ij}\right]^{+},
\tag{A.7}
\]
where the multiplicative factor of 2 compensates for the fact that the monocular simple cells receive inputs from only one eye whereas the binocular simple cells, discussed in the next section, receive input from both eyes.

A.4. Layer 3B inhibitory cells

The layer 3B inhibitory cells, all responding only to vertical boundaries, receive excitatory input from layer 4 and inhibitory input from all other inhibitory interneurons that correspond to the same position and disparity. Their cell membrane potentials, $Q^{V,L/R,+/-}_{ijd}$, are determined at equilibrium by the following equations:
\[
Q^{V,L,+/-}_{ijd} = \frac{1}{\gamma_2}\left( \left[S^{V,L,+/-}_{(i+s)j}\right]^{+} - \beta\left( \left[Q^{V,R,+/-}_{ijd}\right]^{+} + \left[Q^{V,R,-/+}_{ijd}\right]^{+} + \left[Q^{V,L,-/+}_{ijd}\right]^{+} \right) \right)
\tag{A.8}
\]
and
\[
Q^{V,R,+/-}_{ijd} = \frac{1}{\gamma_2}\left( \left[S^{V,R,+/-}_{(i-s)j}\right]^{+} - \beta\left( \left[Q^{V,L,+/-}_{ijd}\right]^{+} + \left[Q^{V,L,-/+}_{ijd}\right]^{+} + \left[Q^{V,R,-/+}_{ijd}\right]^{+} \right) \right),
\tag{A.9}
\]
where $\gamma_2$ and $\beta$ are constants (4.5, 4) representing the decay rate of the membrane potential and the strength

\[
\cdots + \left[Q^{V,R,+/-}_{ijd}\right]^{+} + \left[Q^{V,R,-/+}_{ijd}\right]^{+},
\tag{A.10}
\]
where $\gamma_1$ and $\alpha$ are constants (0.29, 6) representing the rate of decay of the membrane potential and the strength of the inhibition. Appendix B proves that the exact values of $\alpha$ and $\gamma_1$ are not critical. Under mild constraints on these parameters, the binocular cells act like the "obligate cells" of Poggio (1991), responding only when their left and right inputs are approximately equal in magnitude. Eq. (A.10) was solved at equilibrium, using the theorem described in Appendix B to speed up the simulations. Fig. 19 shows that the calculated and simulated values are essentially identical.

A.6. Layer 2/3A monocular and binocular complex cells

V1 layer 2/3A consists of both monocular and binocular complex cells. These complex cells pool the cell membrane potentials of monocular/binocular layer 3B simple cells of like orientation and both contrast polarities at each position. At steady-state their membrane potentials, $C^{H/V,L/R/B}_{ijd}$, are given by:
\[
C^{H/V,L/R/B}_{ijd} = \left[B^{H/V,L/R/B,+}_{ijd}\right]^{+} + \left[B^{H/V,L/R/B,-}_{ijd}\right]^{+}.
\tag{A.11}
\]

A.7. V2 layer 4

In V2, virtually all cells are binocularly driven (Hubel & Livingstone, 1987), consistent with the model
A.8. V2 layer 3B

Analogous to layer 4, at steady-state the cell membrane potentials, $N^{H}_{ijd}$, of the horizontally oriented layer 3B cells are given by:
\[
N^{H}_{ijd} = \left[J^{H}_{ijd}\right]^{+}.
\tag{A.14}
\]
V2 layer 3B contains the disparity filter (cf., Fig. 3) in which each vertically oriented cell is inhibited by every other vertically oriented cell that shares either of its

Table 2
The inhibition coefficients $m_{dd'}$

            V. Near   Near   Zero   Far   V. Far
V. Near        –       3      5      3      2
Near          0.4      –     2.8    1.5    0.4
Zero          0.2     1.3     –     1.3    0.2
Far           0.4     1.5    2.8     –     0.4
V. Far         2       3      5      3      –

Each neuron is inhibited by every other neuron that shares either of its inputs by an amount that depends on the disparities of the inhibited and inhibiting neurons (cf., Fig. 3). See text for further discussion of parameter choices.
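The rule stated in the caption of Table 2, that a cell is inhibited by every other cell sharing either of its inputs, can be sketched as a small Python check. This is an illustration under stated assumptions rather than the model's implementation: cells are represented as hypothetical (position, disparity) pairs, the left and right lines-of-sight are taken to be i + s and i − s with the shifts of Table 1, and the strength of each interaction (the coefficient $m_{dd'}$) is deliberately left out.

```python
# Allelotropic shifts from Table 1, reused here to locate each cell's inputs.
SHIFT = {"very_near": -8, "near": -4, "zero": 0, "far": 4, "very_far": 8}


def lines_of_sight(cell):
    """Return the (left, right) monocular positions feeding a V2 cell.

    A cell is a hypothetical (position, disparity_name) pair.
    """
    i, disparity = cell
    s = SHIFT[disparity]
    return i + s, i - s


def shares_line_of_sight(cell_a, cell_b):
    """True when two distinct cells share their left or their right input."""
    if cell_a == cell_b:
        return False
    left_a, right_a = lines_of_sight(cell_a)
    left_b, right_b = lines_of_sight(cell_b)
    return left_a == left_b or right_a == right_b


def inhibitors(cell, active_cells):
    """List the active cells that would inhibit `cell` in the disparity filter."""
    return [other for other in active_cells if shares_line_of_sight(cell, other)]


if __name__ == "__main__":
    active = [(10, "zero"), (6, "far"), (14, "far"), (10, "far")]
    print(inhibitors((10, "zero"), active))
```

In the example, the cell at position 10 in the zero-disparity plane shares its left input with (6, "far") and its right input with (14, "far"), but shares nothing with (10, "far"). Two cells in the same disparity plane can never share a line-of-sight in this scheme, so competition only occurs across planes.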
h iþ f
H=V H =V
Tijd ¼ 50 Nijd ; ðA:16Þ Ppqijd ¼ H V H V
;
1 þ hðTði0:5Þðjþ0:5Þd þ Tði0:5Þðjþ0:5Þd þ Tðiþ0:5Þðjþ0:5Þd þ Tðiþ0:5Þðjþ0:5Þd Þ
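\[
\frac{dQ^{V,R,+}_{ijd}}{dt} = -\gamma_2 Q^{V,R,+}_{ijd} + \left[S^{V,R,+}_{(i-s)j}\right]^{+} - \beta\left( \left[Q^{V,L,+}_{ijd}\right]^{+} + \left[Q^{V,L,-}_{ijd}\right]^{+} + \left[Q^{V,R,-}_{ijd}\right]^{+} \right),
\tag{B.3}
\]
\[
\frac{dQ^{V,L,-}_{ijd}}{dt} = -\gamma_2 Q^{V,L,-}_{ijd} + \left[S^{V,L,-}_{(i+s)j}\right]^{+} - \beta\left( \left[Q^{V,R,-}_{ijd}\right]^{+} + \left[Q^{V,R,+}_{ijd}\right]^{+} + \left[Q^{V,L,+}_{ijd}\right]^{+} \right),
\tag{B.4}
\]
and
\[
\frac{dQ^{V,R,-}_{ijd}}{dt} = -\gamma_2 Q^{V,R,-}_{ijd} + \left[S^{V,R,-}_{(i-s)j}\right]^{+} - \beta\left( \left[Q^{V,L,-}_{ijd}\right]^{+} + \left[Q^{V,L,+}_{ijd}\right]^{+} + \left[Q^{V,R,+}_{ijd}\right]^{+} \right),
\tag{B.5}
\]
where $S^{V,L,+}_{(i+s)j}$ and $S^{V,R,+}_{(i-s)j}$ are monocular simple cell activities that are defined by (A.4) and (A.6), $0 < \gamma_1$ and
\[
0 < \beta < \gamma_2 < \alpha < \gamma_2 + \beta.
\tag{B.6}
\]
Under these conditions, the system converges exponentially to the unique equilibria specified by (1)–(4) provided that the inputs are constant.

(1) if $0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j}$; $S^{V,L,-}_{(i+s)j},\ S^{V,R,-}_{(i-s)j} < 0$; $\beta/\gamma_2 \le S^{V,L,+}_{(i+s)j} / S^{V,R,+}_{(i-s)j}$; and $\beta/\gamma_2 \le S^{V,R,+}_{(i-s)j} / S^{V,L,+}_{(i+s)j}$, then at equilibrium
\[
B^{V,B,+}_{ijd} = \frac{1}{\gamma_1}\left(1 - \frac{\alpha}{\gamma_2 + \beta}\right)\left(S^{V,L,+}_{(i+s)j} + S^{V,R,+}_{(i-s)j}\right),
\tag{B.7}
\]
(2) if $0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j}$; $S^{V,L,-}_{(i+s)j},\ S^{V,R,-}_{(i-s)j} < 0$; and $S^{V,R,+}_{(i-s)j} / S^{V,L,+}_{(i+s)j} < \beta/\gamma_2$, then at equilibrium
\[
B^{V,B,+}_{ijd} = \frac{1}{\gamma_1}\left(S^{V,R,+}_{(i-s)j} + \left(1 - \frac{\alpha}{\gamma_2}\right) S^{V,L,+}_{(i+s)j}\right),
\tag{B.8}
\]
(3) if $0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j}$; $S^{V,L,-}_{(i+s)j},\ S^{V,R,-}_{(i-s)j} < 0$; and $S^{V,L,+}_{(i+s)j} / S^{V,R,+}_{(i-s)j} < \beta/\gamma_2$,

Proof. First note by (A.6) that, out of the four possible inputs $S^{V,L,+}_{(i+s)j}$, $S^{V,R,+}_{(i-s)j}$, $S^{V,L,-}_{(i+s)j}$, and $S^{V,R,-}_{(i-s)j}$, at most only two can be positive. This greatly simplifies the subsequent analysis.

Case 1

\[
0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j}; \quad S^{V,L,-}_{(i+s)j},\ S^{V,R,-}_{(i-s)j} < 0; \quad
\frac{\beta}{\gamma_2} \le \frac{S^{V,L,+}_{(i+s)j}}{S^{V,R,+}_{(i-s)j}}; \ \text{and} \ \frac{\beta}{\gamma_2} \le \frac{S^{V,R,+}_{(i-s)j}}{S^{V,L,+}_{(i+s)j}}.
\tag{B.11}
\]
Under these conditions, (B.4) and (B.5) imply that, for sufficiently large t,
\[
Q^{V,L,-}_{ijd},\ Q^{V,R,-}_{ijd} \le 0.
\tag{B.12}
\]
By (B.12), and recalling for this case $0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j}$, (B.2) and (B.3) can be approximated at large times:
\[
\frac{dQ^{V,L,+}_{ijd}}{dt} = -\gamma_2 Q^{V,L,+}_{ijd} + S^{V,L,+}_{(i+s)j} - \beta\left[Q^{V,R,+}_{ijd}\right]^{+}
\tag{B.13}
\]
and
\[
\frac{dQ^{V,R,+}_{ijd}}{dt} = -\gamma_2 Q^{V,R,+}_{ijd} + S^{V,R,+}_{(i-s)j} - \beta\left[Q^{V,L,+}_{ijd}\right]^{+}.
\tag{B.14}
\]
Eqs. (B.13) and (B.14) are used to draw the phase-plane plot shown in Fig. 20a. Eq. (B.11) implies:
\[
0 < \frac{S^{V,R,+}_{(i-s)j}}{\gamma_2} \le \frac{S^{V,L,+}_{(i+s)j}}{\beta}
\tag{B.15}
\]
and
\[
0 < \frac{S^{V,L,+}_{(i+s)j}}{\gamma_2} \le \frac{S^{V,R,+}_{(i-s)j}}{\beta}.
\tag{B.16}
\]
From these equations and where the nullclines intersect the axes in Fig. 20a, it follows that the nullclines must cross each other at a point where
\[
0 \le Q^{V,L,+}_{ijd},\ Q^{V,R,+}_{ijd}.
\tag{B.17}
\]
This allows us to remove the rectification in (B.13) and (B.14) which in turn allows us to perform local analysis on the linear system
\[
J = \begin{pmatrix} -\gamma_2 & -\beta \\ -\beta & -\gamma_2 \end{pmatrix}.
\tag{B.18}
\]
The equilibrium point is:
\[
Q^{V,L,+}_{ijd} + Q^{V,R,+}_{ijd} = \frac{1}{\gamma_2 + \beta}\left(S^{V,L,+}_{(i+s)j} + S^{V,R,+}_{(i-s)j}\right).
\tag{B.20}
\]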
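The equilibria stated in (B.7) and (B.8) can be checked numerically with a short Python sketch. Several things here are assumptions rather than the authors' method. The integration uses a plain forward-Euler scheme, and the function names are hypothetical. The binocular-cell equation is taken to be $dB/dt = -\gamma_1 B + [S^{L}]^{+} + [S^{R}]^{+} - \alpha \sum [Q]^{+}$, a form inferred from (B.7), (B.8) and the surviving fragment of (A.10), whose full statement is not reproduced above. The opposite-polarity inputs are taken to be the negated signals, which matches the sign conditions of Case 1. The arguments `gamma1`, `alpha`, `gamma2` and `beta` stand for the constants quoted in the text as (0.29, 6) and (4.5, 4).

```python
def rectify(x):
    """Half-wave rectification, written [x]^+ in the text."""
    return x if x > 0.0 else 0.0


def simulate_binocular_cell(s_left, s_right, gamma1=0.29, alpha=6.0,
                            gamma2=4.5, beta=4.0, dt=0.01, t_end=60.0):
    """Integrate the four inhibitory cells and one binocular cell to steady state."""
    # Assumption: opposite-polarity simple cells carry the negated signal.
    s = {"L+": s_left, "R+": s_right, "L-": -s_left, "R-": -s_right}
    q = dict.fromkeys(s, 0.0)  # inhibitory cell potentials Q
    b = 0.0                    # binocular simple cell potential B
    for _ in range(round(t_end / dt)):
        q_rect = {k: rectify(v) for k, v in q.items()}
        q_total = sum(q_rect.values())
        # Each inhibitory cell is inhibited by the other three, as in (B.3)-(B.5).
        dq = {k: -gamma2 * q[k] + rectify(s[k]) - beta * (q_total - q_rect[k])
              for k in q}
        # Assumed form of the binocular cell equation (see lead-in).
        db = (-gamma1 * b + rectify(s["L+"]) + rectify(s["R+"])
              - alpha * q_total)
        for k in q:
            q[k] += dt * dq[k]
        b += dt * db
    return b


def equilibrium_from_theorem(s_left, s_right, gamma1=0.29, alpha=6.0,
                             gamma2=4.5, beta=4.0):
    """Closed-form equilibrium for cases (1) and (2) of the theorem only."""
    if s_left <= 0 or s_right <= 0:
        raise ValueError("only cases with two positive inputs are covered")
    ratio = beta / gamma2
    if ratio <= s_left / s_right and ratio <= s_right / s_left:
        # Case (1), Eq. (B.7): inputs of similar magnitude.
        return (1 - alpha / (gamma2 + beta)) * (s_left + s_right) / gamma1
    if s_right / s_left < ratio:
        # Case (2), Eq. (B.8): right input much weaker than left.
        return (s_right + (1 - alpha / gamma2) * s_left) / gamma1
    raise ValueError("case (3) is truncated in the source and not implemented")


if __name__ == "__main__":
    for pair in [(1.0, 1.0), (1.0, 0.5)]:
        print(pair, simulate_binocular_cell(*pair), equilibrium_from_theorem(*pair))
```

Under these assumptions the simulated and closed-form values agree closely for both cases, and a purely monocular input (one argument set to zero) drives the simulated potential below zero, so its rectified output would be silent. That is the obligate behavior described in the text, although the exact contrast ratio at which the response vanishes depends on the assumed form of the binocular-cell equation.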
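and
\[
Q^{V,L,+}_{ijd} > 0.
\tag{B.25}
\]
Eqs. (B.24) and (B.25) imply that (B.13) and (B.14) can be rewritten as:
\[
\frac{dQ^{V,L,+}_{ijd}}{dt} = -\gamma_2 Q^{V,L,+}_{ijd} + S^{V,L,+}_{(i+s)j}
\tag{B.26}
\]
and
\[
\frac{dQ^{V,R,+}_{ijd}}{dt} = -\gamma_2 Q^{V,R,+}_{ijd} + S^{V,R,+}_{(i-s)j} - \beta Q^{V,L,+}_{ijd}.
\tag{B.27}
\]
Linear analysis of
\[
J = \begin{pmatrix} -\gamma_2 & 0 \\ -\beta & -\gamma_2 \end{pmatrix}
\tag{B.28}
\]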
Case 5

\[
0 < S^{V,L,+}_{(i+s)j},\ S^{V,R,-}_{(i-s)j}; \quad S^{V,L,-}_{(i+s)j},\ S^{V,R,+}_{(i-s)j} < 0; \quad
\frac{\beta}{\gamma_2} \le \frac{S^{V,L,+}_{(i+s)j}}{S^{V,R,-}_{(i-s)j}}; \ \text{and} \ \frac{\beta}{\gamma_2} \le \frac{S^{V,R,-}_{(i-s)j}}{S^{V,L,+}_{(i+s)j}}.
\tag{B.36}
\]
By analogy with Case 1, in particular (B.12) and (B.20), at equilibrium:
\[
Q^{V,L,-}_{ijd},\ Q^{V,R,+}_{ijd} \le 0
\tag{B.37}
\]

and
\[
Q^{V,L,+}_{ijd} = \frac{S^{V,L,+}_{(i+s)j}}{\gamma_2}.
\tag{B.47}
\]
Using these equations, and recalling that for this case $0 < S^{V,L,+}_{(i+s)j}$ and $S^{V,R,+}_{(i-s)j} < 0$, we see that at large times (B.1) is approximated by:
\[
\frac{dB^{V,B,+}_{ijd}}{dt} = -\gamma_1 B^{V,B,+}_{ijd} + S^{V,L,+}_{(i+s)j} - \frac{\alpha}{\gamma_2} S^{V,L,+}_{(i+s)j}.
\tag{B.48}
\]
By analogy with Cases 5–7:
\[
B^{V,B,+}_{ijd} < 0.
\tag{B.60}
\]

Case 9

\[
S^{V,L,+}_{(i+s)j},\ S^{V,R,+}_{(i-s)j},\ S^{V,L,-}_{(i+s)j},\ S^{V,R,-}_{(i-s)j} = 0.
\tag{B.61}
\]
By inspection, (B.2)–(B.5) imply that at equilibrium,
\[
Q^{V,L,+}_{ijd},\ Q^{V,R,+}_{ijd},\ Q^{V,L,-}_{ijd},\ Q^{V,R,-}_{ijd} = 0.
\tag{B.62}
\]
Together, (B.62) and (B.1) imply that at equilibrium,
\[
B^{V,B,+}_{ijd} = 0.
\tag{B.63}
\]
As we have now considered all possible cases, the theorem is proved.

References

Anzai, A., Bearse, M. A., Freeman, R. D., & Cai, D. (1995). Contrast coding by cells in the cat's striate cortex: Monocular vs. binocular detection. Visual Neuroscience, 12, 77–93.
Backus, B. T., Fleet, D. J., Parker, A. J., & Heeger, D. J. (2001). Human cortical activity correlates with stereoscopic depth perception. Journal of Neurophysiology, 86, 2054–2068.
Bakin, J. S., Nakayama, K., & Gilbert, C. D. (2000). Visual responses in monkey area V1 and V2 to three-dimensional surface configurations. The Journal of Neuroscience, 20, 8188–8198.
Callaway, E. M. (1998). Local circuits in primary visual cortex of the macaque monkey. Annual Review of Neuroscience, 21, 47–74.
Callaway, E. M., & Wiser, A. K. (1996). Contributions of individual layer 2–5 spiny neurons to local circuits in macaque primary visual cortex. Visual Neuroscience, 13, 907–922.
Cumming, B. G. (2002). Receptive field structure and disparity tuning in primate V1. Vision Sciences Society (abstracts), 288.
Cumming, B. G., & Parker, A. J. (2000). Local disparity not perceived depth is signaled by binocular neurons in cortical area V1 of the macaque. The Journal of Neuroscience, 20, 4758–4767.
Dow, B. M. (1974). Functional classes of cells and their laminar distribution in monkey visual cortex. Journal of Neurophysiology, 37, 927–946.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36, 1839–1857.
Frisby, J. P. (2001). Limited understanding of Panum's limiting case. Perception, 30, 1151–1152.
Gillam, B., Blackburn, S., & Cook, M. (1995). Panum's limiting case: Double fusion, convergence error, or 'da Vinci stereopsis'. Perception, 24, 333–346.
Gillam, B., Blackburn, S., & Nakayama, K. (1999). Stereopsis based on monocular gaps: Metrical encoding of depth and slant without matching contours. Vision Research, 39, 493–502.
Grimson, W. E. (1981). A computer implementation of a theory of human stereo vision. Philosophical Transactions of the Royal Society (B), 292, 217–253.
Grossberg, S. (1987). Cortical dynamics of three-dimensional form, color, and brightness perception: II. Binocular theory. Perception and Psychophysics, 41, 117–158.
Grossberg, S. (1994). 3D vision and figure-ground separation by visual cortex. Perception and Psychophysics, 55, 48–120.
Grossberg, S. (1997). Cortical dynamics of three-dimensional figure-
Grossberg, S. (1999a). How does the cerebral cortex work? Learning, attention and grouping by the laminar circuits of visual cortex. Spatial Vision, 12, 163–186.
Grossberg, S. (1999b). The link between brain learning, attention, and consciousness. Consciousness and Cognition, 8, 1–44.
Grossberg, S., Hwang, S., & Mingolla, E. (2002). Thalamocortical dynamics of the McCollough effect: Boundary-surface alignment through perceptual learning. Vision Research, 42, 1259–1286.
Grossberg, S., & Kelly, F. (1999). Neural dynamics of binocular brightness perception. Vision Research, 39, 3796–3816.
Grossberg, S., & McLoughlin, N. (1997). Cortical dynamics of three-dimensional surface perception: Binocular and half-occluded scenic images. Neural Networks, 10, 1583–1605.
Grossberg, S., Mingolla, E., & Ross, W. D. (1997). Visual brain and visual perception: How does the cortex do perceptual grouping? Trends in Neurosciences, 20, 106–111.
Grossberg, S., Mingolla, E., & Williamson, J. (1995). Synthetic aperture radar processing by a multiple scale neural system for boundary and surface representation. Neural Networks, 8, 1005–1028.
Grossberg, S., & Pessoa, L. (1998). Texture segregation, surface representation, and figure-ground separation. Vision Research, 38, 2657–2684.
Grossberg, S., & Raizada, R. D. (2000). Contrast-sensitive perceptual grouping and object-based attention in the laminar circuits of primary visual cortex. Vision Research, 40, 1413–1432.
Grossberg, S., & Swaminathan, G. A laminar cortical model for 3D perception of slanted and curved surfaces and of 2D images: Development, attention and bistability. Submitted for publication.
Grossberg, S., & Todorovic, D. (1988). Neural dynamics of 1-D and 2-D brightness perception: A unified model of classical and recent phenomena. Perception and Psychophysics, 43, 241–277.
Grossberg, S., & Williamson, J. R. (2001). A neural model of how horizontal and interlaminar connections of visual cortex develop into adult circuits that carry out perceptual groupings and learning. Cerebral Cortex, 11, 37–58.
Howard, I. P., & Rogers, B. J. (1995). Binocular vision and stereopsis. New York: Oxford University Press.
Howe, P. D. L., & Grossberg, S. (2001). Laminar cortical circuits for stereopsis and surface depth perception. Society for Neuroscience Abstracts, 164.17.
Howe, P. D. L., & Watanabe, T. (in press). A new test of the disparity energy model of stereopsis. Vision Research.
Hubel, D. H., & Livingstone, M. S. (1987). Segregation of form, color, and stereopsis in primate area 18. The Journal of Neuroscience, 7, 3378–3415.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215–243.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: The University of Chicago Press.
Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science (4th ed.). New York: McGraw-Hill.
Katz, L. C., Gilbert, C. D., & Wiesel, T. N. (1989). Local circuits and ocular dominance columns in monkey striate cortex. The Journal of Neuroscience, 9, 1389–1399.
Kelly, F. J., & Grossberg, S. (2000). Neural dynamics of 3-D surface perception: Figure-ground separation and lightness perception. Perception and Psychophysics, 62, 1596–1619.
Krol, J. D., & van de Grind, W. A. (1983). Depth from dichoptic edges depends on vergence tuning. Perception, 12, 425–438.
Livingstone, M. S., & Hubel, D. H. (1984). Anatomy and physiology of a color system in the primate visual cortex. The Journal of Neuroscience, 4, 309–356.
ground perception of two-dimensional figures. Psychological Re- Marr, D., & Poggio, T. (1976). Cooperative computation of stereo
view, 104, 618–658. disparity. Science, 194, 283–287.
S. Grossberg, P.D.L. Howe / Vision Research 43 (2003) 801–829 829
McKee, S. P., Bravo, M. J., Smallman, H. S., & Legge, G. E. (1995). grouping, attention, and orientation contrast. Visual Cognition, 8,
The Ôuniqueness constraintÕ and binocular masking. Perception, 24, 431–466.
49–65. Read, J. C. A., Cumming, B. C., & Parker, A. J. (2002). Simple cells
McKee, S. P., Bravo, M. J., Taylor, D. G., & Legge, G. E. (1994). can show non-linear binocular combination. Vision ScienceS
Stereo matching precedes dichoptic masking. Vision Research, 34, Society (abstract), 287.
1047–1060. Rockland, K. S., & Virga, A. (1990). Organization of individual
McLoughlin, N., & Grossberg, S. (1998). Cortical computation of cortical axons projecting from area V1 (area 17) to V2 (area 18) in
stereo disparity. Vision Research, 38, 91–99. the macaque monkey. Visual Neuroscience, 4, 11–28.
Merigan, W. H., & Maunsell, J. H. R. (1993). How parallel are the Roe, A. W., & TsÕo, D. Y. (1995). Visual topography in primate V2:
primate visual pathways. Annual Review of Neuroscience, 16, 369– Multiple representation across functional stripes. The Journal of
402. Neuroscience, 15, 3689–3715.
Nakamura, K., & Colby, C. L. (2000a). Visual, saccade-related, and Roe, A. W., & TsÕo, D. Y. (1997). The functional architecture of area
cognitive activation of single neurons in monkey extrastriate area V2 in the macaque monkey. Cerebral Cortex, 12, 295–333.
V3A. Journal of Neurophysiology, 84, 677–692. Schiller, P. H., Finlay, B. L., & Volman, S. F. (1976). Quantitative
Nakamura, K., & Colby, C. L. (2000b). Updating of the visual studies of single-cell properties in monkey striate cortex. I.
representation in monkey striate and extrastriate cortex during Spatiotemporal organization of receptive fields. Journal of Neuro-
saccades. Proceedings of the National Academy of Sciences, 99, physiology, 39, 1288–1319.
4026–4031. Schiller, P. H., Logothetis, N. K., & Charles, E. R. (1990a). Role of the
Nakamura, H., Kuroda, T., Wakita, M., Kusunoki, M., Kato, color-opponent and broad-band channels in vision. Visual Neuro-
A., Mikami, A., Sakata, H., & Itoh, K. (2001). From three- science, 5, 321–346.
dimensional space vision to prehensile hand movements: The Schiller, P. H., Logothetis, N. K., & Charles, E. R. (1990b). Functions
lateral intraparietal area links the area V3A and the anterior of the colour-opponent and broad-band channels of the visual
intraparietal area in macaque. The Journal of Neuroscience, 21, system. Nature, 343, 68–70.
8174–8187. Schor, C., & Heckmann, T. (1989). Interocular differences in contrast
Nakayama, K., & Shimojo, S. (1990). da Vinci stereopsis: Depth and and spatial frequency: Effects on stereopsis and fusion. Vision
subjective occluding contours from unpaired image points. Vision Research, 29, 837–847.
Research, 30, 1811–1825. Smallman, H. S., & McKee, S. P. (1995). A contrast ratio constraint on
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity stereo matching. Proceedings of the Royal Society of London B, 260,
energy model. Current Opinion in Neurobiology, 8, 509–515. 265–271.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic Smith, E. L., Chino, Y., Ni, J., & Cheng, H. (1997). Binocular
depth discrimination in the visual cortex: Neurons ideally suited as combination of contrast signals by striate cortical neurons in the
disparity detectors. Science, 249, 1037–1041. monkey. Journal of Neurophysiology, 78, 366–382.
Panum, P. L. (1858). Physiologische Untersuchungen ueber das Sehen Swaminathan, G., & Grossberg, S. (2001). Laminar cortical circuits for
mit zwei Augen. Kiel: Schwerssche Buchhandlung (translated by C. the perception of slanted and curved 3D surfaces. Society for
Hubscher, Hanover, NH: Dartmouth Eye Institute, 1940). Neuroscience Abstracts, 619.49.
Pessoa, L., Mingolla, E., & Neumann, H. (1995). A contrast- and Tootell, R. B. H., Mendola, J. D., Hadjikhani, N. K., Ledden,
luminance-driven multiscale network model of brightness percep- P. J., Liu, A. K., Reppas, J. B., Sereno, M. I., & Dale,
tion. Vision Research, 35, 2201–2223. A. M. (1997). Functional analysis of V3A and related areas in
Peterhans, E. (1997). Functional organization of area V2 in the awake human visual cortex. The Journal of Neuroscience, 17, 7060–
monkey. Cerebral Cortex, 12, 335–357. 7078.
Poggio, G. F. (1972). Spatial properties of neurons in striate cortex of Tsao, D. Y., & Livingstone, M. S. (in press). Spatiotemporal maps of
unanesthetized macaque monkey. Investigative Ophthalmology, 11, disparity-selective simple cells in macaque V1. Neuron.
369–377. von der Heydt, R., Zhou, H., & Friedman, H. S. (2000). Represen-
Poggio, G. F. (1991). Physiological basis of stereoscopic vision. In tation of stereoscopic edges in monkey visual cortex. Vision
Vision and visual dysfunction. Binocular vision (pp. 224–238). Research, 40, 1955–1967.
Boston, MA: CRC. Wallach, H. (1976). On perception. New York: Quadrangle/The New
Poggio, G. F., & Fischer, B. (1977). Binocular interaction and depth York Times Book Co., p. 8.
sensitivity in striate and prestriate cortex of behaving rhesus Wang, Z., Wu, X., Ni, R., & Wang, Y. (2001). Double fusion does not
monkey. Journal of Neurophysiology, 40, 1392–1405. occur in PanumÕs limiting case: Evidence from orientation dispar-
Poggio, G. F., & Talbot, W. H. (1981). Mechanisms of static and ity. Perception, 30, 1143–1149.
dynamic stereopsis in foveal cortex of the rhesus monkey. Journal Xiao, Y., Zych, A., & Felleman, D. J. (1999). Segregation and
of Physiology, 315, 469–492. convergence of functionally defined V2 thin stripe and interstripe
Raizada, R., & Grossberg, S. (2001). Context-sensitive bindings by the compartment projections to area V4 of macaques. Cerebral Cortex,
laminar circuits of V1 and V2: A unified model of perceptual 9, 792–804.