
Neural Networks 17 (2004) 809–821
www.elsevier.com/locate/neunet
2004 Special Issue
Integration of form and motion within a generative model of visual cortex

Paul Sajda*, Kyungim Baek
Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA
Received 31 March 2004; accepted 31 March 2004
Abstract
One of the challenges faced by the visual system is integrating cues within and across processing streams for inferring scene
properties and structure. This is particularly apparent in the inference of object motion, where psychophysical experiments have shown
that motion signals, distributed across space, must be integrated both with one another and with form cues. This has led several authors to conclude that
there exist mechanisms which enable form cues to ‘veto’ or completely suppress ambiguous motion signals. We describe a probabilistic
approach which uses a generative network model for integrating form and motion cues using the machinery of belief propagation and
Bayesian inference. We show, using computer simulations, that motion integration can be mediated via a local, probabilistic
representation of contour ownership, which we have previously termed ‘direction of figure’. The uncertainty of this inferred form cue is
used to modulate the covariance matrix of network nodes representing local motion estimates in the motion stream. We show with
results for two sets of stimuli that the model does not completely suppress ambiguous cues, but instead integrates them in a way that is a
function of their underlying uncertainty. The result is that the model can account for the continuum of bias seen for motion coherence
and perceived object motion in psychophysical experiments.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Form and motion integration; Generative model; Belief propagation; Aperture problem; Hypercolumn; Direction of figure; Contour ownership;
Occlusion
* Corresponding author. Tel.: +1-212-854-5279. E-mail address: [email protected] (P. Sajda).
0893-6080/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2004.03.013
Nomenclature
b          local belief
c          random variable for observed cue (network model) or normalizing/dividing constant (equations)
i, j, k    indices of the nodes
x          hidden variable
y          observed variable
z          variable for prior input
E          local interaction between hidden and observed nodes
M          message passed from a hidden node to a neighboring hidden node
N          a set of neighboring nodes
T          compatibility/transition matrix between neighboring nodes
w          weight for different form cues
f          weight function modulating local covariance in the motion stream
a          modulation variable for exponent of the weight function
rmax       maximum value of the weight function
v          velocity
Cov        covariance matrix for local velocity
m          mean vector for local velocity
S          inverse covariance matrix for local velocity
1. Introduction
The classical view of information processing in visual cortex is that of a bottom-up process in a feed-forward hierarchy (Hubel & Wiesel, 1977). However, bottom-up information that encodes physical properties of the sensory input is often insufficient, uncertain, and even ambiguous; for example, consider the classic demonstrations of the Dalmatian dog (Thurston & Carraher, 1966) and Rubin’s vase (Rubin, 1915). Psychophysical (Adelson, 1992; Driver & Spence, 1998), anatomical (Budd, 1998; Callaway, 1998) and physiological (Bullier, Hupé, James, & Girard, 1996; Martinez-Conde et al., 1999) evidence suggests that integration of bottom-up and top-down processes plays a crucial role in the processing of the sensory input. For example, top-down factors, such as attention, can result in strong modulation of neural responses as early as primary visual cortex (V1) (McAdams & Read, 2003). Information also flows laterally between populations of cortical neurons within the same level of the processing hierarchy. This is due to the ‘generalized aperture problem’1 with which the visual system is confronted. An individual neuron or local population of neurons ‘sees’ only a limited patch of the visual field. To form coherent representations of objects, non-local informational dependencies, and their uncertainties, must be integrated across space and time, as well as other stimulus and representational dimensions.
1 We use the term ‘generalized aperture problem’ to distinguish it from the term ‘aperture problem’ which is traditionally associated with local motion estimation and which we will discuss in more detail below.
One particularly striking example illustrating the nature of the visual integration problem is that of inferring object motion in a scene. The motion of a homogeneous contour (or edge) is perceptually ambiguous because of the ‘aperture problem’, i.e. a single local measurement along an object’s bounding contour cannot be used to reliably infer an object’s motion. However, this ambiguity can be potentially overcome by measuring locally unambiguous motion signals, tied to specific visual features, and then integrating these to form a global motion percept. An early study by Adelson (Adelson & Movshon, 1982) has suggested that these two stages, local motion measurement and integration, are indeed involved in visual motion perception.
There are several visual features that have been identified as being unambiguous local motion cues. Examples of such features are line terminators and junctions. Line terminators have been traditionally classified into two different types: intrinsic terminators, that are due to the natural end of a line, and extrinsic terminators, that are not created by the end of a line itself but rather a result of occlusion by another surface (Nakayama, Shimojo, & Silverman, 1989). Intrinsic terminators are claimed to provide an unambiguous signal for the true velocity of the line, while extrinsic terminators generate a locally ambiguous signal which presumably should have little influence for accurate motion estimation (Shimojo, Silverman, & Nakayama, 1989). One problem is that there is ambiguity in all local measurements and therefore it is not a simple matter of determining which of the motions are ambiguous or unambiguous but the degree of ambiguity, i.e. the degree of certainty of the cue.
In this paper we describe a generative network model for integrating form and motion cues which directly exploits the uncertainty in these cues. Generative models are probabilistic models which directly model the distribution of some set of observations and underlying hidden variables or states (Jebara, 2004). The advantages of a generative model are that (1) it uses probabilities to represent the ‘state of the world’ and therefore directly exploits uncertainties associated with noise and ambiguity, (2) through the use of Bayesian machinery, one can infer the underlying state of hidden variables, (3) it naturally enables integration via multiple sources, including those arising from bottom-up, top-down, and lateral inputs as well as those arising from other streams and (4) it results in a system capable of performing a variety of analysis functions including segmentation, classification, synthesis, compression, etc.
We describe a generative network model that accounts for interaction between form and motion in a relatively simple way, focusing on the influence of ‘direction of figure’ on local motion at junctions where line terminators are defined. A visual surface can be defined by associating an object’s boundary to a region representing the object’s surface. The basic problem in this surface assignment is to determine the contour ownership (Nakayama et al., 1989). We represent ownership using a local representation which we call the ‘direction of figure’ (DOF) (Sajda & Finkel, 1995). In our model the DOF at each point of the contour is represented by a hidden variable whose probability is inferred via integration of bottom-up, top-down and lateral inputs. The ‘belief’ in the DOF is used to estimate occlusion boundaries between surfaces that are defined by where the DOF changes, the ownership junction (Finkel & Sajda, 1994). In the model, the probability of extrinsic line terminators is a function of the probability of these ownership junctions. Thus rather than completely suppressing the motion signals at the extrinsic terminators, the degree of certainty (i.e. belief) in the DOF is considered as the strength of the evidence for surface occlusion and used to determine the strength of local motion suppression.
The remainder of the paper is organized as follows. Section 2 describes our generative network model, putting it in the context of the organizational structure of visual cortex. Though the model we describe does not use biologically realistic units (e.g. conductance based integrate-and-fire neurons) it is instructive to consider how the generative model maps to the cortical architecture. We next describe the details of the integration process between the form and motion streams, a process that exploits informational uncertainty. We describe how this uncertainty is propagated through the network using Bayesian machinery. We then present two sets of simulation results, illustrating the interaction of the form and motion streams. The first set of simulations shows how the model can generate results for motion coherence stimuli consistent with the psychophysical experiments of McDermott, Weiss, and Adelson (2001). We show how form cues change the model’s inference of perceived motion, in particular a gradual transition from incoherent to coherent motion. We then demonstrate results
for the classic barber-pole stimulus (Wallach, 1935), showing how occlusion influences the certainty in the perceived object motion through form (DOF) cues.
2. Generative network model
2.1. Hypercolumns in the visual cortex
Since the term ‘hypercolumn’ was coined by Hubel and Wiesel (1977) it has been used to describe the neural machinery necessary to process a discrete region of the visual field. Typically, a hypercolumn occupies a cortical area of ~1 mm² and contains tens of thousands of neurons. Current experimental and physiological studies have revealed substantial complexity in neuronal response to multiple, simultaneous inputs, including contextual influence, as early as V1 (Gilbert, 1992; Kapadia, Ito, Gilbert, & Westheimer, 1995; Kapadia, Westheimer, & Gilbert, 2000). Such contextual influences, considered to be mediated by modulatory influences of the extraclassical receptive field, are presumed to arise from the lateral inputs from neighboring hypercolumns and/or from feedback from extrastriate areas (Bullier et al., 1996; Martinez-Conde et al., 1999; Polat, Mizobe, Pettet, Kasamatsu, & Norcia, 1998; Polat & Norcia, 1996; Stettler, Das, Bennett, & Gilbert, 2002). It appears that in the visual system, information computed locally across the input space by individual hypercolumns is propagated within the network and integrated, perhaps in a modulatory way, to influence neural responses which ultimately correlate with perception.
The organization of columnar circuitry in the visual system has been extensively studied (Bosking, Zhang, Schofield, & Fitzpatrick, 1997; Crowley & Katz, 1999; Horton & Hocking, 1996; Hubel & Wiesel, 1977; Tsunoda, Yamane, Nishizaki, & Tanifuji, 2001). Callaway has proposed a generic model of vertical connectivity connecting layers within columns in primary visual cortex of cat and monkey (Callaway, 1998). Although the majority of connections are within columns, anatomical and physiological results indicate that there exist horizontal, long-range connections between sets of columns and that these connections give rise to complex, modulatory neuronal responses (Gilbert, 1992; Polat & Norcia, 1996; Stettler et al., 2002). In addition, a well-defined laminar structure has been identified with input and output projections to a column being a function of specific layers. Grossberg and colleagues have proposed that this laminar architecture is a key functional organizational principle of visual cortex and that particular neural circuits integrating information bottom-up, top-down and laterally may account for a wide variety of perceptual phenomena in static and dynamic vision (Grossberg & Williamson, 2001; Raizada & Grossberg, 2003).
The generative network model we construct is organized around the laminar hypercolumn structure of the visual cortex. In our model, a hypercolumn ‘module’ consists of a set of nodes representing random variables which are either observations to the column or hidden. The observations are limited by the effective aggregate field of the column, i.e. the aperture over visual space in which the hypercolumn constructs its bottom-up input. Though observations and representations are local, integration via belief propagation (described below) enables information to be passed more globally. Fig. 1 compares the generative network hypercolumn structure to that of a cortical hypercolumn.
Fig. 1. A simplified diagram of cortical connections focusing on the laminar structure of a hypercolumn (left) and the isomorphic architecture used in our
generative network model (right). For hypercolumns in primary visual cortex (V1) the LGN provides bottom-up input via strong connections to layer IV and
indirect connections passing through layer VI. The top-down, feedback signals from extrastriate cortex pass into layer VI then project into layer IV. Feed-
forward connections from layer IV project to layer II/III, which forms intrinsic lateral, long-range connections between hypercolumns. In our generative
network model, cues corresponding to bottom-up input are represented as random variables c_i in a set of nodes. Those cues are combined, together with top-down
prior knowledge, represented by the variable z, to form distributions for the observation variables y. Observations are passed to the nodes representing
hidden variables x, corresponding to layers II/III in the hypercolumn structure. Spatial integration is performed by passing probabilities between neighboring
hidden nodes.
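The spatial integration mentioned in this caption, passing probabilities between neighboring hidden nodes, corresponds to the message and belief updates given as Eqs. (2) and (3) in Section 2.2. A minimal Python sketch for discrete hidden states follows. It is an illustration under stated assumptions, not the simulation code used in the paper: the three-node chain, the two-state variables, the numerical values of T and E, and all function names are hypothetical, and the integral in Eq. (2) becomes a sum. Because the chain is singly connected, the beliefs can be checked against marginals obtained by brute-force enumeration.

```python
import itertools
import numpy as np


def run_bp(T, E, edges, n_iter=10):
    """Sum-product belief propagation for a pairwise model with discrete states.

    T[(i, j)] is a compatibility matrix indexed [x_i, x_j] for each edge (i, j).
    E[i] is the local evidence vector E_i(x_i, y_i) for the observed y_i.
    """
    n, k = len(E), len(E[0])
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def compat(i, j):
        # Return the compatibility indexed [x_i, x_j] for either edge direction.
        return T[(i, j)] if (i, j) in T else T[(j, i)].T

    # Message M[(i, j)] is a distribution over the states of x_j (uniform start).
    M = {(i, j): np.ones(k) / k for i in nbrs for j in nbrs[i]}
    for _ in range(n_iter):
        new = {}
        for (i, j) in M:
            # Local evidence times messages into i from all neighbors except j.
            prod = E[i].copy()
            for m in nbrs[i]:
                if m != j:
                    prod = prod * M[(m, i)]
            msg = compat(i, j).T @ prod    # Eq. (2): sum over the states of x_i
            new[(i, j)] = msg / msg.sum()  # normalization constant c
        M = new

    beliefs = []
    for j in range(n):
        b = E[j].copy()
        for m in nbrs[j]:
            b = b * M[(m, j)]              # Eq. (3): evidence times incoming messages
        beliefs.append(b / b.sum())
    return np.array(beliefs)


def brute_force(T, E, edges):
    """Exact marginals of the joint in Eq. (1) by enumerating all joint states."""
    n, k = len(E), len(E[0])
    marg = np.zeros((n, k))
    for states in itertools.product(range(k), repeat=n):
        p = np.prod([E[i][s] for i, s in enumerate(states)])
        for i, j in edges:
            p *= T[(i, j)][states[i], states[j]]
        for i, s in enumerate(states):
            marg[i, s] += p
    return marg / marg.sum(axis=1, keepdims=True)


# Hypothetical example: only node 0 has informative evidence.
smooth = np.array([[0.8, 0.2], [0.2, 0.8]])
edges = [(0, 1), (1, 2)]
T = {edge: smooth for edge in edges}
E = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.5, 0.5])]

beliefs = run_bp(T, E, edges)
exact = brute_force(T, E, edges)
print(beliefs)
print(np.allclose(beliefs, exact))
```

In this toy example the evidence at node 0 biases the beliefs at nodes 1 and 2 through the compatibility matrix, which is the sense in which local observations are passed more globally.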
Given a set of observations, including top-down and lateral inputs, inference is used to compute the most likely states of the hidden nodes. For example, in our model of the form stream, the hidden variables in each column are the DOF at each point along a contour. The DOF is inferred by considering the bottom-up cues, such as curvature and similarity, as well as top-down cues which can be interpreted as prior beliefs (see Baek & Sajda (2003) for details of the form generative network model). A critical feature of the network model is the use of belief propagation to transmit information regarding the states of the random variables, including their underlying uncertainties.
2.2. Belief propagation
In a generative network model a node represents a random variable and links specify the dependency relationships between variables (Jebara, 2004). The states of a random variable can be hidden in the sense that they are not directly observable. A hidden variable state is inferred from the states of other hidden variables and the available observations, through application of a local message passing algorithm called belief propagation (BP) (Pearl, 1988). In this section, a BP algorithm is described for undirected network models with pairwise potentials. It has been shown that most generative network models (i.e. graphical models) can be converted into this general form (Yedidia, Freeman, & Weiss, 2003).
Let x be a set of hidden variables and y a set of observed variables. The joint probability distribution of x given y is given by
$$P(x_1, \ldots, x_n \mid y) = c \prod_{i,j} T_{ij}(x_i, x_j) \prod_{i} E_i(x_i, y_i) \tag{1}$$
where $c$ is a normalization constant, $x_i$ represents the state of the node positioned at $i$, $T_{ij}(x_i, x_j)$ captures the compatibility between neighboring nodes $i$ and $j$, and $E_i(x_i, y_i)$ is the local interaction between the hidden and observed variables at location $i$. An approximate marginal probability of this joint probability at node $i$ over all $x_j$ other than $x_i$ is called the local belief, $b(x_i)$.
The BP algorithm iterates local message passing and belief updates (Yedidia et al., 2003). The message $M_{ij}(x_j)$ passed from a hidden node $i$ to its neighboring hidden node $j$ represents the probability distribution over the state of $x_j$. In each iteration, messages and beliefs are updated as follows:
$$M_{ij}(x_j) = c \int_{x_i} dx_i \, T_{ij}(x_i, x_j) \, E_i(x_i, y_i) \prod_{k \in N_i/j} M_{ki}(x_i) \tag{2}$$
$$b(x_j) = c \, E_j(x_j, y_j) \prod_{k \in N_j} M_{kj}(x_j) \tag{3}$$
where $N_i/j$ denotes the set of neighboring nodes of $i$ except node $j$. $M_{ij}$ is computed by combining all messages received by node $i$ from all neighbors except node $j$ in the previous iteration and marginalizing over all possible states of $x_i$ (Fig. 2). The current local belief is estimated by combining all incoming messages and the local observations.
Fig. 2. Illustration of local message passing from node $i$ to node $j$. Open circles are hidden variables, while shaded circles represent observed variables. The local belief at node $j$ is computed by combining the incoming messages from all its neighbors and the local interaction $E_j$.
It has been shown that, for singly connected graphs (networks), BP converges to exact marginal probabilities (Yedidia et al., 2003). Although its behavior on general graphs is not well understood, empirical results have demonstrated that BP can converge to correct solutions even with networks having loops (Freeman, Pasztor, & Carmichael, 2000).
2.3. Generative network model for form and motion streams
Fig. 3 shows the generative network model used for estimating perceived motion through integration of form information. The model consists of two structurally identical but functionally different processing streams for form and motion. The form and motion properties are inferred from local observations using BP for integrating local measurements across space. In the current model, the two streams interact unidirectionally so that influence flows from the form stream to the motion stream only, i.e. form constrains motion perception.
Fig. 3. Generative network model for form and motion streams. Each stream is modeled by a network of hypercolumn nodes which integrate inputs via bottom-up, top-down, and lateral connections. The DOF probability computed in the form stream integrates into the motion stream by modulating the covariances of distributions used to model the velocity at apertures located at junction points. In the current model, the interaction between form and motion streams is unidirectional.
The network model first infers DOF in the form stream. As described in detail in Baek and Sajda (2003), the local interaction between hidden variable $x_i^{dof}$ and observed variable $y_i^{dof}$ is specified by local figure convexity and similarity/proximity cues. The local convexity is determined by the local angle of the contour at a given location, while the similarity/proximity cues are estimated by considering points having similar local orientation that lie in orthogonal directions to the contour. These cues are computed separately and combined in a weighted linear fashion to form the total local interaction. When prior information is available, it is multiplied with this local interaction.2 The hidden variable $x_i^{dof}$ has two possible states, specifying the DOF relative to the direction of local convexity. When the form stream converges, both the direction of figure and the certainty (belief) of this direction are represented at each point along the contour as the state of the corresponding hidden node. The form stream has been applied to several examples, including perceptually ambiguous figures, with results that are in agreement with human perception (Baek & Sajda, 2003). Fig. 4 shows some examples of DOF estimates inferred by the form stream.
2 Note that in our previous work (Baek & Sajda, 2003), prior information was treated as an additional cue and combined as a weighted linear sum. The current mechanism of multiplying the prior with the local likelihood results in a more localized posterior which is also more consistent with a Bayesian framework. Examples shown in Fig. 4 were generated using this multiplication scheme.
A hidden node in the motion stream represents the velocity at the corresponding location along the contour. We assume a prior that specifies a preference for slower motions, modeled using a symmetric Gaussian centered at the origin. Both the pairwise compatibility $T_{ij}(x_i^{mot}, x_j^{mot})$ and the local interaction $E_i(x_i^{mot}, y_i^{mot})$ that model the velocity likelihood at the apertures are also Gaussian. Currently, $T_{ij}$ is set manually and $E_i$ is defined by the mean of the normal velocity at point $i$ and a local covariance matrix $\mathrm{Cov}_i$. Before the BP algorithm begins, the variance at junction points is modulated by a function of the DOF belief $b(x_i^{dof})$:
$$\mathrm{Cov}_i = f(b(x_i^{dof}), a, r_{max}) \cdot \mathrm{Cov}_i \tag{4}$$
$$\phantom{\mathrm{Cov}_i} = \left( c \cdot \left( e^{a \{ b(x_i^{dof}) - 0.5 \}} - 1 \right) + 1 \right) \cdot \mathrm{Cov}_i \tag{5}$$
The weighting function $f(b(x_i^{dof}), a, r_{max})$ is an exponential defined on $[0, 0.5a]$ and ranges from 1 to $r_{max}$. $c$ is defined by the maximum weight $r_{max}$ as $c = (r_{max} - 1)/(e^{0.5a} - 1)$.
Initially, the covariance matrices of hidden variables are set to represent infinite uncertainty, and mean vectors are set to zero. Local motions are then propagated and integrated across space. When the BP algorithm converges, the global motion percept is estimated by a mixture of Gaussians:
$$p(v \mid y) = c \sum_{i} p_i(v \mid y) \, p(i) \tag{6}$$
where $p_i(v \mid y)$ is the probability distribution of velocity $v$ at location $i$ resulting from the Gaussian of hidden variable $x_i^{mot}$, $p(i)$ are the mixing coefficients, and $c$ is the normalization constant. The peak location of the distribution $p(v \mid y)$ is the maximum a posteriori (MAP) estimate of velocity, which is the most probable motion given the observations. In addition, as described in the following sections, the posterior distribution provides a means for interpreting perceived motion coherence.
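The covariance modulation of Eqs. (4) and (5) and the mixture read-out of Eq. (6) can be illustrated numerically. The following Python sketch is a simplified illustration, not the authors' implementation: it skips belief propagation and applies Eq. (6) directly to two hand-picked Gaussians. The values of a, r_max, the means, the covariances, and the mixing coefficients are hypothetical rather than the Appendix values, and the function names are illustrative.

```python
import numpy as np


def dof_weight(b, a, r_max):
    """Weight f(b, a, r_max) of Eqs. (4)-(5) for a DOF belief b in [0.5, 1]."""
    c = (r_max - 1.0) / (np.exp(0.5 * a) - 1.0)
    return c * (np.exp(a * (b - 0.5)) - 1.0) + 1.0


def gaussian_pdf(v, mean, cov):
    """2-D Gaussian density evaluated at points v of shape (..., 2)."""
    d = v - mean
    quad = np.einsum("...i,ij,...j->...", d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


def mixture_map(means, covs, mix, grid):
    """Grid location of the peak of p(v|y) = sum_i p_i(v|y) p(i), as in Eq. (6)."""
    p = sum(w * gaussian_pdf(grid, m, C) for m, C, w in zip(means, covs, mix))
    idx = np.unravel_index(np.argmax(p), p.shape)
    return grid[idx]


# Hypothetical parameter values, not those listed in the paper's Appendix.
a, r_max = 10.0, 100.0
belief = 0.9  # DOF belief at a junction with a likely extrinsic terminator

# Velocity grid covering [-2, 2] x [-2, 2].
axis = np.linspace(-2.0, 2.0, 81)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1)

base_cov = 0.05 * np.eye(2)
means = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
# Eq. (4): inflate the covariance of the first component by the DOF weight.
covs = [dof_weight(belief, a, r_max) * base_cov, base_cov]
mix = [0.5, 0.5]

v_map = mixture_map(means, covs, mix, grid)
print(dof_weight(0.5, a, r_max), dof_weight(1.0, a, r_max), v_map)
```

With these values the weight grows from 1 at a belief of 0.5 to r_max at a belief of 1. Inflating the covariance of the first component flattens it without removing it, so the mixture peak falls at the second component while the first still contributes to the distribution.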
Fig. 4. Examples of DOF estimates by the form stream. ‘Dumbbell’ bars indicate the DOF, as they point to the surface locally owned by the contour. Degree of
certainty (belief) in the DOF increases as the color changes (shown in web version) from blue (P = 0.5) to red (P = 1.0). Note that since the assignment of
DOF is binary, P = 0.5 implies complete uncertainty in the assignment. (a) DOF estimate for an arbitrarily shaped figure. (b) A spiral figure in which figure-ground
discrimination is ambiguous unless one serially traces the contour. The low certainty (blue) for the central region of the spiral indicates ambiguity of DOF,
consistent with human perception. Figures (c)–(e) show the perceptual shift induced by prior information. In the form stream, prior information is multiplied
with the local likelihood computed by combining interactions from different local cues. (c) DOF estimate for Rubin’s vase without using prior information. (d)
DOF estimate biased toward face figures by the prior cue for face features (nose, chin, brow). (e) DOF estimate shifts to the vase figure via a change in prior
cues for vase figures (base, top, center cup).
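The caption notes that prior information is multiplied with the local likelihood obtained by combining interactions from different local cues. A minimal Python sketch of that combination for the two DOF states follows. The cue likelihoods, weights, and prior are hypothetical numbers chosen for illustration, the function name is illustrative, and the final normalization is an assumption made so that the result reads as a probability.

```python
import numpy as np


def local_interaction(cues, weights, prior=None):
    """Combine cue likelihoods over the two DOF states into a local interaction.

    cues:    shape (n_cues, 2), one likelihood per cue over the two DOF states.
    weights: shape (n_cues,), the weights w of the linear combination.
    prior:   optional shape (2,), top-down prior multiplied into the result.
    """
    e = np.asarray(weights, dtype=float) @ np.asarray(cues, dtype=float)
    if prior is not None:
        e = e * np.asarray(prior, dtype=float)
    return e / e.sum()  # normalization is an assumption of this sketch


# Hypothetical convexity and similarity/proximity cues, both favoring state 0.
cues = [[0.7, 0.3], [0.6, 0.4]]
weights = [0.5, 0.5]

no_prior = local_interaction(cues, weights)
with_prior = local_interaction(cues, weights, prior=[0.2, 0.8])
print(no_prior, with_prior)
```

In this toy case the bottom-up cues alone favor the first DOF state, while a sufficiently strong prior for the second state reverses the preference, analogous to the shift between panels (c), (d), and (e).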
3. Simulation results increases (McDermott et al., 2001). The colored ‘dumbbell’

bars at the junctions between the occluders and moving bars
All simulations were done with the same network represents the DOF inferred by the form stream. The belief
architecture and parameter values, except the covariance in DOF increases, with color changing from blue to red,
of the motion prior. See Appendix for parameter values used because additional form cues (convexity cues) are observed
in the simulations.3 for the more ‘complete’ occluding figures.
The second row in Fig. 5 shows the resulting velocity
3.1. Circular motion of a square modulated by occluders estimates by the network model for the four stimuli. For
stimulus (a), all line terminators are intrinsic, hence there is
no modulation of local motion signals by DOF. However,
The first row in Fig. 5 shows the four stimuli used for this
the presence of occluders in stimuli (b), (c), and (d)
experiment. They are 908 rotated versions of the diamond
classifies the line terminators as extrinsic and the local
stimuli described in McDermott et al. (2001), but the
motion signals are suppressed. The strength of the
perceptual effect is basically the same. The motion of
suppression increases and becomes highest for stimulus
moving line segments is identical in all four stimuli. A pair
(d) since the belief in DOF at the junctions becomes stronger
of vertical line segments move together sinusoidally in a
as the degree of completion of the occluders increases. In
horizontal direction while the two horizontal line segments
the first two figures, we clearly see the bimodal distribution
move simultaneously in a vertical direction. The vertical
which implies that we perceive two separate motions for
and horizontal motions are 908 out of phase. The difference
stimuli (a) and (b). The last two figures show that a single
between the stimuli is the degree of ‘completeness’ of the
peak is formed at the intersection of two distributions for
L-shaped occluders, which alters the perceived motion.
stimuli (c) and (d), which indicates a stronger perception of
When there are no occluders as in (a), we are more likely to
see two separate motions of the line segments, while the single coherent motion compared to stimuli (a) and (b).
perceived motion coherence of a square rotating behind the Fig. 6 shows the ratio of the peak value at the intersection
occluders increases as the occluders’ completeness of the horizontal and vertical motion distributions normal-
ized by the maximum value of horizontal motion for the
3
For Flash demos of the types of stimuli used in the simulations, see four stimuli in Fig. 5. The values are scaled relative to
http://web.mit.edu/persci/demos/Motion&Form/master.html the ratio for stimulus (d). This figure can be interpreted as
Fig. 5. First row shows stimuli generated from four line segments that move sinusoidally, 908 out of phase, in vertical and horizontal directions. The line
segments are presented without occluders (a) and with occluders, with an increasing degree of completeness (b)–(d). The colored dumbbell bars at junctions
show the DOF inferred by the form stream. The color of the bar (shown in web version) represents the strength of the beliefs in DOF which increases from blue
(P ¼ 0:5) to red (P ¼ 1:0). The presence of the occluding surface alters the motion perception. Velocity estimation results for the stimulus presented in the
second row were computed by combining the estimates at all locations using a mixture of Gaussians.
the model’s estimate of the degree of motion coherence, drifting grating viewed through a rectangular shaped
which in fact represents a continuum as one moves from aperture, with the grating being perceived as moving in
stimulus (a) to (d). Note that the difference in strength of the the direction of elongated dimension of the aperture. The
coherence for stimuli (c) and (d) is clearly seen in this general explanation for the barber-pole illusion is that
figure. These results are qualitatively consistent and the unambiguous local motion signals at the terminators of
quantitatively similar to the psychophysical results pre- gratings are integrated, disambiguating the local motion and
sented in McDermott et al. (2001). thereby dominating the global motion percepts (Nakayama
Fig. 7 illustrates the resulting velocity estimate for & Silverman, 1988). Since longer dimensions of the
stimuli (b) and (d) for six successive frames sampled from a aperture provide more terminators that move along the
period of sinusoidal motion. On the top row, both horizontal axis of the dimension, the perceived motion is determined
and vertical motion are almost equally likely in each frame by the direction of that aperture dimension.
for stimulus (b) and the two separate motions oscillate in the Fig. 9 shows the velocity estimates of the model for the
direction normal to the line segment orientation. On barber-pole stimulus for several different aspect ratios. The
the other hand, for stimulus (d) shown at the bottom, the velocity likelihood at terminators in the two aperture
maximum of the distribution is at the intersection of the two dimensions and along the gratings of the barber-pole
motions in each frame and it follows a circular trajectory stimulus are shown in Fig. 10. Since all terminators are
which is consistent with perceiving rotation. intrinsic, there is no modulation of the local motion signals
A prior cue shown in Fig. 8 can be added for inferring
DOF in the form stream. Psychophysically, this can
simulate priming subjects or using stereo to add depth
information so that subjects are biased to see the occlusion
more strongly. Adding priors to the form stream in the
model strengthens the belief for DOF in the indicated
direction, which increases local motion suppression, and
consequently more coherent motion would be expected.
Fig. 8 shows the results with weak and strong prior cues for
stimulus (b). As the prior is made stronger, the estimated
motion produces a single peak at the intersection, similar to
stimulus (d) shown in Fig. 5.
Fig. 6. The probability at the intersection of the distributions in velocity
3.2. Barber-pole motion modulated by occluders space, representing horizontal and vertical motions, relative to the
maximum value of the horizontal motion distribution for stimuli (a)–(d)
One of the classic examples illustrating the aperture in Fig. 5. The values are scaled relative to the ratio for stimulus (d). This
‘peak strength probability’ ratio can be seen as the model’s estimate of
problem and the importance of occlusion cues in motion motion coherence, with values near 1 having high coherence and those near
perception is the barber-pole illusion (Wallach, 1935). In its 0 having low coherence. Compare to the psychophysical results (Fig. 6) of
basic form, the barber-pole stimulus consists of a diagonal McDermott, Weiss, and Adelson (2001).
Fig. 7. The sequence of resulting velocity estimates for stimuli (b) and (d) in Fig. 5 for six successive frames sampled from a period of sinusoidal motion. First
row shows results for stimulus (b), indicating two separate motions oscillating in the direction normal to the line segment orientation. The sequence in the
second row for stimulus (d) shows a coherent motion forming a circular trajectory, indicating coherent rotating motion. ‘X’ indicates MAP location.
by the form stream. The resulting velocity estimates clearly illustrate that the dominant motion follows the direction of the elongated dimension of the rectangular aperture, consistent with the explanation described above. These results can be seen more clearly in Fig. 14, which illustrates the continuum of horizontal motion bias as a function of different aspect ratios.

Fig. 8. Left figure is the prior cue for stimulus (b) in Fig. 5 used for inferring DOF in the form stream. Velocity estimation results with weak and strong prior cues are shown in (a) and (b), respectively.

Fig. 9. The barber-pole illusion. In its classic form, shown on the far left, a diagonal drifting grating viewed through a rectangular aperture is perceived as moving in the direction of the elongated dimension of the aperture (Wallach, 1935). The prevailing explanation for this is that the unambiguous motion signals at line terminators are integrated to produce the perceived direction of motion. Since there are more line terminators along the elongated dimension of the aperture, they have stronger influence. The second through last panels show the velocity estimated by the model for different barber-pole aspect ratios.

Fig. 10. Velocity likelihood for three apertures in the barber-pole stimulus with no occluders. (a) Likelihood at a line terminator along the horizontal side, (b) likelihood at a line terminator along the vertical side, and (c) likelihood at a location on a diagonal line segment.

Variations of the classic barber-pole illusion have been developed for investigating the influence of occlusion in visual motion perception (Duncan, Albright, & Stoner, 2000; Lidén & Mingolla, 1998; Shimojo et al., 1989). Occlusion cues alter the intrinsic/extrinsic visual features and consequently change the perceived motion. Lidén and Mingolla (1998) performed a series of psychophysical experiments on a variety of barber-pole stimuli with occluding patches. They found that (1) monocular occlusion cues play a more critical role than binocular depth cues in motion perception and (2) the local motion signal at extrinsic terminators is not completely discounted.

We used stimuli similar to those used by Lidén and Mingolla (1998) to investigate the model's response to different occlusion configurations. Fig. 11 shows these configurations, with occluders placed along either the vertical or the horizontal sides of the barber-pole. They differ from the stimuli used in Lidén and Mingolla (1998) in the absence of depth cues and texture on the occluders. The arrows indicate the perceived direction of barber-pole motion. Presumably, the grating terminators formed at the occluding boundary are considered extrinsic and therefore their motion signal is suppressed. Thus the perceived motion tends to be biased in the direction orthogonal to the occlusion boundary.

Fig. 11. Barber-pole with occlusion cues. The arrows indicate the perceived direction of motion of the barber-pole. Left: Barber-pole with no occluders. Middle: Occluders placed on the top and bottom cause the perceived motion to be mostly vertical. The line terminators aligned with the occluders are considered as extrinsic terminators, therefore their motion signals tend to become more ambiguous and have less influence on the perceived motion. Right: Occluders placed on the left and right sides of the barber-pole bias the perceived motion toward horizontal (Lidén & Mingolla, 1998).

Fig. 12 shows the velocity estimates of the model for the three stimuli in Fig. 11. The stimulus with no occluders has a longer side in the horizontal direction so the resulting motion is horizontally biased, as already shown. The occluders placed along the top and bottom of the barber-pole cause horizontally moving terminators to have a high probability of being extrinsic. The strong belief in the DOF at the occlusion boundary suppresses their influence in the motion stream. As a result, the intrinsic terminators moving vertically dominate the resulting estimated motion. In the same way, the model estimates horizontal motion when occluders are placed along the vertical sides of the barber-pole, with ‘perceived’ horizontal motion being even stronger than that in the first figure.

Fig. 12. Resulting velocity estimates for the three barber-pole stimuli shown in Fig. 11. These results are obtained by performing motion integration across all locations on the barber-pole. Left figure shows that the barber-pole stimulus with no occluders is perceived as moving horizontally. Occluders placed along the top and bottom of the barber-pole bias the perceived motion toward vertical (middle). Similarly, strong horizontal motion is perceived by placing occluders on both sides of the barber-pole (right). Note that the horizontal motion becomes stronger with occluders compared to the first figure (no occluders). ‘X’ indicates the MAP estimate.

To see whether the influence of extrinsic terminators on the perceived motion is completely discounted or is just decreased, Lidén and Mingolla varied the aspect ratio of the barber-pole stimuli while keeping the size of the occluders fixed (Lidén & Mingolla, 1998). The resulting motion percepts show that the elongation of the barber-pole along the occlusion boundary still influences the perceived direction of motion. They argue that their results suggest that the influence of extrinsic terminators is attenuated rather than completely suppressed. They quantify the relative attenuation of the extrinsic terminators with respect to the intrinsic terminators based on the changes in perceived motion.

We generated similar stimuli, shown in the first and third rows of Fig. 13. The aspect ratio was varied from 3:1 to 1:3 and the occluders were placed so that altering the aspect ratio changes the number of extrinsic terminators. Below each stimulus is the velocity estimate of the model for that stimulus. Although the changes between different configurations may not be as clear as in the psychophysical results reported in Lidén and Mingolla (1998), the model estimates match the psychophysical results qualitatively (see Fig. 3 of Lidén and Mingolla (1998)). A summary of these results is shown in Fig. 14. These results indicate that the influence of extrinsic terminators is not abolished in the model; rather, it is attenuated, resulting in a continuum of horizontal motion bias, consistent with the results from human subjects. The quantitative differences between psychophysical results and our results may be due to stronger suppression by the form stream in our model.

Fig. 14. The percent horizontal velocity of the model estimates is plotted against the aspect ratio for the barber-pole stimuli without occluders (line with squares) and for the six occluded configurations shown in Fig. 13 (line with circles for horizontal occluders and line with triangles for vertical occluders). The results for the unoccluded configuration match well with the sigmoid estimation for the psychophysical response curve illustrated in Lidén and Mingolla (1998), while the degree of change in horizontal motion for occluded configurations is less dramatic. Compare to Fig. 3 of Lidén and Mingolla (1998).

4. Discussion

In this paper we describe a generative network model for integrating form and motion cues. The model can account for a number of perceptual phenomena related to how form information is used to distinguish between intrinsic and extrinsic terminators in the motion integration process. Previous neural network models on segmentation and integration of motion signals have studied the influence of motion signals at terminators and occlusion cues (Grossberg, Mingolla, & Viswanathan, 2001; Lidén & Pack, 1999). The advantage of our model over these previous models is that uncertainties in the underlying observations and inferred representations are used in characterizing terminators as intrinsic vs. extrinsic and therefore these terminators do not fall into one class or the other but instead have some smooth transition between the two classes. A second, more general advantage of our model is that it provides a natural way to integrate different data types (top-down, bottom-up), since all variables are mapped to probability space. The advantage of the previous models, however, is that they are perhaps more neural and thus more closely related to the underlying biological circuitry. In addition, our model does not address which visual cortical areas are best suited for this specific type of cortical integration of form and motion. Nonetheless, we have attempted to demonstrate that the architecture of our model does in fact exploit several of the same organizational principles seen in visual cortex. The laminar hypercolumn architecture of the model, together with the relatively local connectivity, are all reasonable constraints which make such an architecture biologically feasible. In addition, Weiss points out that the updating rules used in Bayesian networks require three building blocks: a distributed representation, weighted linear summation of probabilities, and normalization (Weiss, 2000). These processes have clear ties to cortical networks (Heeger, Simoncelli, & Movshon, 1996).

Many vision researchers have adopted a Bayesian or probabilistic approach to vision. Kersten and Schrater (2000) describe vision within the context of the Bayesian principles of ‘least commitment’ and ‘modularity’. Geisler and Diehl (2002) describe a Bayesian-based selection framework which generally applies to the evolution of all perceptual systems. Lee and Mumford (2003) begin to address network and architectural issues by considering hierarchical Bayesian inference within visual cortex, relating a general hierarchical model to neurophysiological experimental data. Zemel (2004) has described ‘cortical belief networks’ as a model for orientation tuning in V1 to motion discrimination in MT. Though our approach is consistent with these other theories and models, it differs in that we propose a specific model and architecture which attempts to account for how different occlusion relationships, represented via certainty in figure-ground representations (DOF), are integrated within the motion integration process and result in a continuum of perceived motion bias.

Since the initial studies on the statistical properties of population codes that first used Bayesian techniques to analyze them (Paradiso, 1988), researchers have suggested that it might be probability distributions instead of single values that a neural population encodes (Anderson & Van
Essen, 1994; Pouget, Dayan, & Zemel, 2000; Zemel, Dayan,
& Pouget, 1998). A computational model proposed by
Deneve, Latham, and Pouget (1999) uses a biologically
plausible recurrent network to decode population codes by
maximum likelihood estimation. Therefore the network
essentially implements optimal inference and the simulation
results suggest that cortical areas may function as ideal
observers. If it is true that information is represented as
probability distributions in cortical areas, it means that the
brain may perform Bayesian inference that effectively deals
with uncertainty commonly arising in visual tasks. A recent
modeling study by Rao (2004) has shown how integrate-
and-fire neurons can be used to build networks for carrying
out Bayesian inference. Our current work is focusing on
using such integrate-and-fire models to build more realistic
columnar networks.
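As a concrete illustration of how Gaussian inference can attenuate an uncertain cue without vetoing it, the following minimal sketch fuses two 2-D velocity estimates in proportion to their inverse covariances, in the spirit of the motion-stream combination rule in Appendix A.2. It is a sketch under stated assumptions, not the implementation used in the simulations: the function names (`fuse`, `horizontal_share`) and the scalar `inflation` factor are hypothetical, and `inflation` merely stands in for the DOF-dependent modulation of the covariance, whose functional form is not reproduced here.

```python
"""Precision-weighted fusion of 2-D Gaussian velocity estimates (illustrative only)."""


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return ((d / det, -b / det), (-c / det, a / det))


def add2(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(m, n))


def matvec2(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return tuple(row[0] * v[0] + row[1] * v[1] for row in m)


def scale2(m, s):
    """Multiply every entry of a 2x2 matrix by the scalar s."""
    return tuple(tuple(s * x for x in row) for row in m)


def fuse(estimates):
    """Fuse (mean, covariance) pairs into one Gaussian.

    Each estimate contributes in proportion to its precision (inverse
    covariance), so an uncertain cue is down-weighted rather than discarded.
    Returns (fused_mean, fused_covariance).
    """
    total_precision = ((0.0, 0.0), (0.0, 0.0))
    weighted_sum = (0.0, 0.0)
    for mean, cov in estimates:
        precision = inv2(cov)
        total_precision = add2(total_precision, precision)
        pm = matvec2(precision, mean)
        weighted_sum = (weighted_sum[0] + pm[0], weighted_sum[1] + pm[1])
    fused_cov = inv2(total_precision)
    return matvec2(fused_cov, weighted_sum), fused_cov


IDENTITY = ((1.0, 0.0), (0.0, 1.0))
HORIZONTAL = (1.0, 0.0)  # velocity signalled by a horizontally moving terminator
VERTICAL = (0.0, 1.0)    # velocity signalled by a vertically moving terminator


def horizontal_share(inflation):
    """Horizontal fraction of the fused velocity.

    `inflation` (hypothetical) multiplies the covariance of the horizontal
    terminator, mimicking a terminator that the form stream believes to be
    extrinsic with growing certainty.
    """
    mean, _ = fuse([
        (HORIZONTAL, scale2(IDENTITY, inflation)),
        (VERTICAL, IDENTITY),
    ])
    return mean[0] / (mean[0] + mean[1])
```

With equal covariances (`horizontal_share(1.0)`) the two terminators contribute equally; as `inflation` grows the horizontal contribution shrinks smoothly toward zero but never reaches it, which is the qualitative behaviour (attenuation rather than complete suppression) argued for in the text.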
Over the last several years, substantial progress has been
made in generative network theory, probabilistic learning
rules, and their application to image analysis (Portilla,
Strela, Wainwright, & Simoncelli, 2002; Romberg, Choi, &
Baraniuk, 2002; Spence, Parra, & Sajda, 2000; Wainwright
& Simoncelli, 1999). Recent work by Hinton (Hinton &
Brown, 2000) has been directed at trying to understand the
relationship between Bayesian and neural processing.
Application of this ‘neuro-Bayesian’ approach to image
processing has been significantly advanced by Weiss
(1997), who has demonstrated that when applied to difficult
image processing problems, Bayesian networks converge orders of magnitude more rapidly than current relaxation-based algorithms. Processing time is only limited by the time (iterations) required for information to propagate between all units focused on the target. This is in line with David Marr’s dictum (Marr, 1982) that visual processing should only take as long as required for all relevant information about the image to be transmitted across cortex, and no further iterations should be necessary once the information has arrived at the appropriate hypercolumn. Thus Bayesian methods offer the hope of matching the time constraints posed by human visual recognition.

Fig. 13. Occluded barber-pole stimuli with varying aspect ratios and the velocity estimates of the model for each corresponding stimulus. From the top-left, aspect ratios of the barber-pole stimuli are 3:1, 2:1, 1:1, 1:1, 1:2, and 1:3, respectively. The number of extrinsic terminators created along the occlusion boundaries changes according to the aspect ratio. The model estimates of the velocity show an increase for the motion in the direction of the occlusion boundary as more extrinsic terminators are generated along the boundary. These results suggest that the local motion signals of the extrinsic terminators are not completely suppressed in the model, but instead suppressed relative to the certainty of DOF in the form stream.

Ultimately a model’s value is in the predictions it can generate. Our model predicts that contour ownership is represented as a probabilistic, local representation, and that the DOF serves as a substrate for motion integration. Our previous work has argued for DOF serving as such a substrate for perceptual integration (Finkel & Sajda, 1994); however, the probabilistic representation is an additional prediction from this current model. Though the concept of border ownership was first put forth by Nakayama et al. (1989), the DOF is a local vector representation which is ideally suited for neural coding, for example via a population vector scheme. Interestingly, von der Heydt and colleagues (Zhou, Friedman, & von der Heydt, 2000) have discovered that neurons in V2 appear to code for ownership, in fact through a local representation of the side of the contour that represents the figure. In addition, they find that the firing rates of these neurons appear to be modulated in a continuous way, based on occlusion and other figure-ground cues, consistent with a probabilistic representation. The results of our model would predict that the strength of these neuronal firing rates in V2 would modulate the distribution of responses of motion selective neurons, perhaps in area MT, leading to a shift in the distribution of firing rates and resulting in a change in perceived motion.

Acknowledgements

This work was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research under grant N00014-01-1-0625, and a grant from the National Imagery and Mapping Agency, NMA201-02-C0012.

Appendix A

The graphical structure of the form and motion streams in the current network model is an undirected chain. The equations for updating messages and beliefs in the model are described below.

A.1. Belief propagation in the form stream

In the form stream, local figure convexity and similarity/proximity cues are combined to form initial observations. The local interaction $E_{i,\mathrm{cvx}}$ between hidden variable $x_i^{\mathrm{dof}}$ (the superscript will be dropped for notational convenience) and observed variable $y_{i,\mathrm{cvx}}$ specified by the convexity at point $i$ is determined by the local angle of the contour at the location. At the same time, the local interaction $E_{i,\mathrm{sim}}$ between $x_i$ and $y_{i,\mathrm{sim}}$ is computed from the similarity/proximity cue by identifying points having a similar local tangent angle (i.e. orientation) that lie in a direction orthogonal to the contour at point $i$. $E_{i,\mathrm{cvx}}$ prefers smaller angles and $E_{i,\mathrm{sim}}$ favors shorter distances with similar orientations. These two local interactions are combined to compute the overall local interaction:

$$E_i(x_i, y_i) = w_{\mathrm{cvx}} E_{i,\mathrm{cvx}} + w_{\mathrm{sim}} E_{i,\mathrm{sim}} \tag{A1}$$

The hidden variable $x_i$ represents a two dimensional binary DOF vector at location $i$ along the boundary. The vector $x_i = (1, 0)^T$ specifies that the DOF is in the direction of local convexity, while $x_i = (0, 1)^T$ assigns the opposite direction to the DOF.

Since the graph is a chain, every node has two neighbors from which it receives messages. Furthermore, the hidden variables are discrete and have two possible states, with incoming messages and the belief at $x_j$ in the form stream shown in Fig. 3 computed as follows:

$$M_{ij}(x_j) = c \sum_{x_i} T_{ij}(x_i, x_j)\, E_i(x_i, y_i)\, M_{hi}(x_i) \tag{A2}$$

$$M_{kj}(x_j) = c \sum_{x_k} T_{kj}(x_k, x_j)\, E_k(x_k, y_k)\, M_{lk}(x_k) \tag{A3}$$

$$b(x_j) = c\, E_j(x_j, y_j)\, M_{ij}(x_j)\, M_{kj}(x_j) \tag{A4}$$

where $T_{ij}$ specifies the pairwise compatibility as described in Section 2.2, and the sum is over two possible states of the hidden variables. The vector multiplication is performed element by element. Each element of $b(x_j)$ represents the degree of confidence in the DOF for the corresponding direction at location $j$. When the algorithm converges the states of the hidden variables are determined by taking the direction having larger certainty.

A.2. Belief propagation in the motion stream

We assume a Gaussian generating process in the motion stream with the probabilities in velocity space represented by two dimensional mean vectors and 2 × 2 covariance matrices. The update rules for the parameters in the one dimensional Gaussian case are described in Weiss (1997). Let $\mu_j$ and $\Sigma_j$ be the mean and inverse covariance matrix defining the probability distribution of hidden variable $x_j$. Also, let $\mu_j^a$ be the mean passed along one direction of the chain and $\mu_j^b$ the mean passed along the opposite direction, with $\Sigma_j^a$ and $\Sigma_j^b$ being the corresponding inverse covariance matrices. Similarly, the mean and inverse covariance matrix passed from the local observation $y_j$ to the hidden variable $x_j$ are represented by $\mu_j^l$ and $\Sigma_j^l$. The parameters in the motion stream, shown in Fig. 3, are then updated as follows:

$$\mu_j = (\Sigma_j^l + \Sigma_j^a + \Sigma_j^b)^{-1} (\Sigma_j^l \mu_j^l + \Sigma_j^a \mu_j^a + \Sigma_j^b \mu_j^b) \tag{A5}$$

$$\Sigma_j = (\Sigma_j^l + \Sigma_j^a + \Sigma_j^b)^{-1} \tag{A6}$$

$$\mu_j^a = (\Sigma_i^a + \Sigma_i^l)^{-1} (\Sigma_i^a \mu_i^a + \Sigma_i^l \mu_i^l) \tag{A7}$$
$$\Sigma_j^a = (C + (\Sigma_i^a + \Sigma_i^l)^{-1})^{-1} \tag{A8}$$

$$\mu_j^b = (\Sigma_k^b + \Sigma_k^l)^{-1} (\Sigma_k^b \mu_k^b + \Sigma_k^l \mu_k^l) \tag{A9}$$

$$\Sigma_j^b = (C + (\Sigma_k^b + \Sigma_k^l)^{-1})^{-1} \tag{A10}$$

where $C$ is the covariance matrix of a zero mean Gaussian distribution describing noise in the observations. The global motion percept is estimated by combining the resulting Gaussians across all locations.

A.3. Parameter values used in the simulations

Parameter                                Value                 Description
$a$                                      5                     Constant multiplying exponent of weight function $f(b(x_i^{\mathrm{dof}}), a, r_{\max})$
$r_{\max}$                               350                   Maximum value of the weight function $f(b(x_i^{\mathrm{dof}}), a, r_{\max})$
$w_{\mathrm{cvx}} : w_{\mathrm{sim}}$    8:2                   Weights for convexity cue and similarity cue
$T_{\mathrm{dof}}$                       diag(0.995, 0.005)    Matrix defining pairwise compatibility in form stream
$\sigma_{\mathrm{sqrs}}$                 34.64                 Standard deviation of prior for circular motion of square simulation
$\sigma_{\mathrm{bpole}}$                2.24                  Standard deviation of prior for barber-pole simulation

References

Adelson, E. H. (1992). Perceptual organization and the judgment of brightness. Science, 262, 2042–2044.
Adelson, E. H., & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523–525.
Anderson, C. H., & Van Essen, D. C. (1994). Neurobiological computational systems. In J. M. Zurada, R. J. Marks, II, & C. J. Robinson (Eds.), Computational intelligence imitating life (pp. 213–222). New York: IEEE Press.
Baek, K., & Sajda, P. (2003). A probabilistic network model for integrating visual cues and inferring intermediate-level representations. Proceedings of IEEE Workshop on Statistical and Computational Theories of Vision, Nice, France. (http://www.stat.ucla.edu/~yuille/meetings/papers/sctv03_10.pdf).
Bosking, W. H., Zhang, Y., Schofield, B., & Fitzpatrick, D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience, 17(6), 2112–2127.
Budd, J. M. L. (1998). Extrastriate feedback to primary visual cortex in primates: a quantitative analysis of connectivity. Proceedings of the Royal Society of London B, 265, 1037–1044.
Bullier, J., Hupé, J. M., James, A., & Girard, P. (1996). Functional interactions between areas V1 and V2 in the monkey. Journal of Physiology (Paris), 90, 217–220.
Callaway, E. M. (1998). Local circuits in primary visual cortex of the macaque monkey. Annual Review of Neuroscience, 21, 47–74.
Crowley, J. C., & Katz, L. C. (1999). Development of ocular dominance columns in the absence of retinal input. Nature Neuroscience, 2(12), 1125–1130.
Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: a neural implementation of ideal observers. Nature Neuroscience, 2(8), 740–745.
Driver, J., & Spence, C. (1998). Cross-modal links in spatial attention. Philosophical Transactions of the Royal Society of London B: Biological Science, 353, 1319–1331.
Duncan, R. O., Albright, T. D., & Stoner, G. R. (2000). Occlusion and the interpretation of visual motion: perceptual and neuronal effects of context. The Journal of Neuroscience, 20(15), 5885–5897.
Finkel, L. H., & Sajda, P. (1994). Constructing visual perception. American Scientist, 82, 224–237.
Freeman, W. T., Pasztor, E. C., & Carmichael, O. T. (2000). Learning low-level vision. International Journal of Computer Vision, 40(1), 25–47.
Geisler, W. S., & Diehl, R. L. (2002). Bayesian natural selection and the evolution of perceptual systems. Philosophical Transactions of the Royal Society of London B: Biological Science, 357, 419–448.
Gilbert, C. D. (1992). Horizontal integration and cortical dynamics. Neuron, 9, 1–13.
Grossberg, S., Mingolla, E., & Viswanathan, L. (2001). Neural dynamics of motion integration and segmentation within and across apertures. Vision Research, 41, 2521–2553.
Grossberg, S., & Williamson, J. R. (2001). A neural model of how horizontal and interlaminar connections of visual cortex develop into adult circuits that carry out perceptual grouping and learning. Cerebral Cortex, 11(1), 37–58.
Heeger, D. J., Simoncelli, E. P., & Movshon, J. A. (1996). Computational models of cortical visual processing. Proceedings of the National Academy of Sciences, 93, 623–627.
Hinton, G. E., & Brown, A. D. (2000). Spiking Boltzmann machines. In S.A. Solla, T.K. Leen, & K.-R. Müller (Eds.), Advances in Neural Information Processing Systems 12, Cambridge, MA: MIT Press, pp. 122–128.
Horton, J. C., & Hocking, D. R. (1996). Intrinsic variability of ocular dominance column periodicity in normal macaque monkeys. Journal of Neuroscience, 16(22), 7228–7339.
Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London B, 198, 1–59.
Jebara, T. (2004). Machine learning: Discriminative and generative. Dordrecht: Kluwer Academic Publishers.
Kapadia, M. K., Ito, M., Gilbert, C. D., & Westheimer, G. (1995). Improvement in visual sensitivity by changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron, 15, 843–856.
Kapadia, M. K., Westheimer, G., & Gilbert, C. D. (2000). Spatial distribution of contextual interactions in primary visual cortex and in visual perception. Journal of Neurophysiology, 84, 2048–2062.
Kersten, D., & Schrater, P. (2000). Pattern inference theory: A probabilistic approach to vision. In R. Mausfeld, & D. Heyers (Eds.), Perception and the physical world. Chichester: Wiley.
Lee, T. S., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America (A), 20(7), 1434–1448.
Lidén, L., & Mingolla, E. (1998). Monocular occlusion cues alter the influence of terminator motion in the barber pole phenomenon. Vision Research, 38, 3883–3898.
Lidén, L., & Pack, C. (1999). The role of terminators and occlusion cues in motion integration and segmentation: a neural network model. Vision Research, 39, 3301–3320.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman.
Martinez-Conde, S., Cudeiro, J., Grieve, K. L., Rodriguez, R., Rivadulla, C., & Acuña, C. (1999). Effect of feedback projections from area 18 layers 2/3 to area 17 layers 2/3 in the cat visual cortex. Journal of Neurophysiology, 82, 2667–2675.
McAdams, C. J., & Read, R. C. (2003). Effects of attention on the spatial and temporal structure of receptive fields in macaque primary visual cortex. Society for Neuroscience Annual Meeting, Washington, DC, 5519.
McDermott, J., Weiss, Y., & Adelson, E. H. (2001). Beyond junctions: nonlocal form constraints on motion interpretation. Perception, 30, 905–923.
Nakayama, K., Shimojo, S., & Silverman, G. H. (1989). Stereoscopic depth: its relation to image segmentation, grouping, and the recognition of occluded objects. Perception, 18, 55–68.
Nakayama, K., & Silverman, G. H. (1988). The aperture problem II: spatial integration of velocity information along contours. Vision Research, 28, 747–753.
Paradiso, M. (1988). A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Los Altos, CA: Morgan Kaufmann.
Polat, U., Mizobe, K., Pettet, M. W., Kasamatsu, T., & Norcia, A. M. (1998). Collinear stimuli regulate visual responses depending on cell’s contrast threshold. Nature, 391(5), 580–584.
Polat, U., & Norcia, A. M. (1996). Neurophysiological evidence for contrast dependent long-range facilitation and suppression in the human visual cortex. Vision Research, 36(14), 2099–2109.
Portilla, J., Strela, V., Wainwright, M. J., & Simoncelli, E. P. (2002). Image denoising using gaussian scale mixtures in the wavelet domain. Technical report TR2002-831. Computer Science Department, Courant Institute of Mathematical Sciences, New York University.
Pouget, A., Dayan, P., & Zemel, R. (2000). Information processing with population codes. Nature Reviews Neuroscience, 1, 125–132.
Raizada, R. D. S., & Grossberg, S. (2003). Towards a theory of the laminar architecture of cerebral cortex: computational clues from the visual system. Cerebral Cortex, 13(1), 100–113.
Rao, R. P. N. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38.
Romberg, J. K., Choi, H., & Baraniuk, R. G. (2002). Bayesian tree-structured image modeling using wavelet-domain hidden Markov models. IEEE Transactions on Image Processing, 10(7), 1056–1068.
Rubin, E. (1915). Visuell Wahrgenommene Figuren. Copenhagen: Gyldendalske Boghandel.
Sajda, P., & Finkel, L. H. (1995). Intermediate-level visual representations and the construction of surface perception. Journal of Cognitive Neuroscience, 7(2), 267–291.
Shimojo, S., Silverman, G. H., & Nakayama, K. (1989). Occlusion and the solution to the aperture problem for motion. Vision Research, 29, 619–626.
Spence, C., Parra, L., & Sajda, P. (2000). Hierarchical image probability (hip) models. IEEE International Conference in Image Processing, 3, 320–323.
Stettler, D. D., Das, A., Bennett, J., & Gilbert, C. D. (2002). Lateral connectivity and contextual interactions in macaque primary visual cortex. Neuron, 36, 739–750.
Thurston, J. B., & Carraher, R. G. (1966). Optical Illusions and the Visual Arts. New York: Reinhold Pub Corp.
Tsunoda, K., Yamane, Y., Nishizaki, M., & Tanifuji, M. (2001). Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nature Neuroscience, 4(8), 832–838.
Wainwright, M. J., & Simoncelli, E. P. (1999). Scale mixtures of gaussians and the statistics of natural images. In S.A. Solla, T.K. Leen, & K.-R. Müller (Eds.), Advances in Neural Information Processing Systems 12, pp. 855–861.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung, 20, 325–380.
Weiss, Y. (1997). Interpreting images by propagating Bayesian beliefs. In M.C. Mozer, M.I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9, pp. 908–915.
Weiss, Y. (2000). Correctness of local probability propagation in graphical models with loops. Neural Computation, 12, 1–42.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003). Understanding belief propagation and its generalizations. In G. Lakemeyer, & B. Nebel (Eds.), Exploring artificial intelligence in the new millennium (pp. 239–269). Amsterdam: Elsevier Science and Technology Books.
Zemel, R. (2004). Cortical belief networks. In R. Hecht-Nielsen, & T. McKenna (Eds.), Computational models for neuroscience. New York: Springer.
Zemel, R., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10, 403–430.
Zhou, H., Friedman, H. S., & von der Heydt, R. (2000). Coding of border ownership in monkey visual cortex. Journal of Neuroscience, 20(17), 6594–6611.
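
Supplementary sketch for Appendix A.1. The form-stream updates of Eqs. (A2)–(A4) can be illustrated with a short sum-product routine for binary hidden states. This is a minimal sketch under assumptions rather than the simulation code: it uses an open chain whose end nodes receive a uniform message, a single shared compatibility matrix `T`, and hypothetical names (`chain_beliefs`, `dof_direction`); the numerical values in any call are illustrative, not the parameters of Appendix A.3.

```python
"""Sum-product belief propagation for binary DOF states on an open chain."""


def normalize(v):
    """Scale a non-negative vector so that its entries sum to one."""
    total = sum(v)
    return [x / total for x in v]


def chain_beliefs(evidence, T):
    """Return the belief [b(x=0), b(x=1)] at every node of the chain.

    evidence: list of [E(x=0), E(x=1)] local interactions, one per node.
    T: 2x2 pairwise compatibility; T[a][b] couples state a of a node with
       state b of its successor along the chain (assumed shared by all pairs).
    """
    n = len(evidence)
    states = (0, 1)
    fwd = [[1.0, 1.0] for _ in range(n)]  # message reaching node j from j-1
    bwd = [[1.0, 1.0] for _ in range(n)]  # message reaching node j from j+1

    # Pass along one direction of the chain, cf. Eq. (A2).
    for j in range(1, n):
        i = j - 1
        fwd[j] = normalize([
            sum(T[xi][xj] * evidence[i][xi] * fwd[i][xi] for xi in states)
            for xj in states
        ])

    # Pass along the opposite direction, cf. Eq. (A3).
    for j in range(n - 2, -1, -1):
        k = j + 1
        bwd[j] = normalize([
            sum(T[xj][xk] * evidence[k][xk] * bwd[k][xk] for xk in states)
            for xj in states
        ])

    # Belief, cf. Eq. (A4): element-wise product of the local interaction
    # and the two incoming messages, followed by normalization.
    return [
        normalize([evidence[j][s] * fwd[j][s] * bwd[j][s] for s in states])
        for j in range(n)
    ]


def dof_direction(belief):
    """Pick the state with the larger certainty (0 or 1)."""
    return 0 if belief[0] >= belief[1] else 1
```

Because the graph is a chain, one pass in each direction suffices for the beliefs to equal the exact marginals of the assumed pairwise model; the normalized belief itself, rather than only the selected direction, is the quantity whose uncertainty the motion stream can use.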
