
Neural Networks 17 (2004) 873–897

www.elsevier.com/locate/neunet

A model of active visual search with object-based attention guiding scan paths

Linda J. Lanyon*, Susan L. Denham

Centre for Theoretical and Computational Neuroscience, University of Plymouth, Drakes Circus, Plymouth, Devon PL4 8AA, UK
Received 7 October 2003; revised 30 March 2004; accepted 30 March 2004

Abstract
When a monkey searches for a colour and orientation feature conjunction target, the scan path is guided to target coloured locations in
preference to locations containing the target orientation [Vision Res. 38 (1998b) 1805]. An active vision model, using biased competition, is
able to replicate this behaviour. As object-based attention develops in extrastriate cortex, featural information is passed to posterior parietal
cortex (LIP), enabling it to represent behaviourally relevant locations [J. Neurophysiol. 76 (1996) 2841] and guide the scan path. Attention
evolves from an early spatial effect to being object-based later in the response of the model neurons, as has been observed in monkey single
cell recordings. This is the first model to reproduce these effects with temporal precision and is reported here at the systems level allowing the
replication of psychophysical scan paths.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Visual attention; Biased competition; Active visual search; Mean field population approach

1. Introduction

1.1. Biased competition

Visual attention literature has been strongly influenced recently by the biased competition hypothesis (Desimone, 1998; Desimone & Duncan, 1995; Duncan & Humphreys, 1989; Duncan, Humphreys, & Ward, 1997). There is much evidence from monkey single cell recordings to support the hypothesis (Chelazzi, Miller, Duncan, & Desimone, 1993, 2001; Miller, Gochin, & Gross, 1993; Moran & Desimone, 1985; Motter, 1993, 1994a,b; Reynolds, Chelazzi, & Desimone, 1999) and, gradually, models have been developed to test the theory computationally. Early models tended to be small scale with only a few interacting units (Reynolds et al., 1999; Usher & Niebur, 1996). More recently, systems level models have begun to emerge (Deco & Lee, 2002; De Kamps & Van der Velde, 2001; Hamker, 1998). However, there has been no systems level modelling of the biased competition hypothesis for active visual search, where retinal inputs change as the focus of attention is shifted. This model addresses this issue, replicating attentional effects in the inferior temporal (IT) region and extrastriate visual area 4 (V4), with temporal precision, to qualitatively reproduce scan paths recorded from monkeys carrying out active search for a feature conjunction target. The guidance of scan paths is carried out by a novel object-based 'cross-stream' interaction between extrastriate areas in the ventral pathway leading to temporal cortex and the dorsal pathway leading to parietal cortex (Milner & Goodale, 1995; Ungerleider & Mishkin, 1982).

Biased competition suggests that neuronal responses are determined by competitive interactions that are subject to a number of biases, such as 'bottom-up' stimulus information and 'top-down' cognitive requirements. Important in the theory is the idea that a working memory template of the target object can bias competition between objects and features such that the target object is given a competitive advantage and other objects are suppressed. At the single cell level, studies in IT (Chelazzi et al., 1993; Miller et al., 1993) and V4 (Chelazzi et al., 2001; Moran & Desimone, 1985; Motter, 1993, 1994a,b; Reynolds et al., 1999) show that a high response to a preferred stimulus (i.e. a stimulus that causes a strong response from the cell) presented in

* Corresponding author. Tel.: +44-1752-2325-67; fax: +44-1752-2333-49. E-mail address: [email protected] (L.J. Lanyon).

0893-6080/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2004.03.012

the cell's receptive field is attenuated by the addition of a second non-preferred stimulus (i.e. a stimulus that causes only a weak response when presented alone in the receptive field). Responses are eventually determined by which of the two stimuli are attended. When the preferred stimulus is attended, the response approaches that when this stimulus is presented alone. If the non-preferred stimulus is attended, responses are severely suppressed, despite the presence of the preferred stimulus in the receptive field. Neurons from modules representing areas IT and V4 from the model presented here have been able to replicate such effects at the cellular level (Lanyon & Denham, submitted). Here, we examine the systems level behaviour of the model in more detail when replicating the nature of search scan paths observed by Motter and Belky (1998b), who found most fixations landed within 1° of stimuli (only 20% fell in blank areas of the display despite the use of very sparse displays) and these stimuli tended to be target coloured (75% of fixations landed near target coloured stimuli and only 5% near non-target coloured stimuli).

1.2. Visual attention

There has been debate over the issue of whether visual attention operates purely as a spatial 'spotlight' (Crick, 1984; Helmholtz, 1867; Treisman, 1982) or is more complex, operating in an object-based manner. The evidence for object-based attention has been convincing and growing in the psychophysical literature (Blaser, Pylyshyn, & Holcombe, 2000; Duncan, 1984), from functional magnetic resonance imaging (O'Craven, Downing, & Kanwisher, 1999), event-related potential recordings (Valdes-Sosa, Bobes, Rodriguez, & Pinilla, 1998; Valdes-Sosa, Cobo, & Pinilla, 2000) and from single cell recordings (Chelazzi et al., 1993, 2001; Roelfsema, Lamme, & Spekreijse, 1998). However, there is no doubt that attention can also produce spatially specific effects (Bricolo, Gianesini, Fanini, Bundesen, & Chelazzi, 2002; Connor, Callant, Preddie, & Van Essen, 1996). In the lateral intraparietal area (LIP), an anticipatory spatial enhancement of responses has been recorded from single cells (Colby, Duhamel, & Goldberg, 1996) and has been seen in imaging of the possible human homologue of LIP (Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000; Hopfinger, Buonocore, & Mangun, 2000; Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999). Spatial effects have been recorded in single cells in area V4 in advance of the sensory response and have then modulated the earliest stimulus-invoked response at 60 ms post-stimulus (Luck, Chelazzi, Hillyard, & Desimone, 1997). However, object-based effects have not been recorded until much later in the response, from ~150 ms in IT and V4 (Chelazzi et al., 1993, 2001).

The model presented here has been used (Lanyon & Denham, submitted) to suggest that spatial attention is available earlier than object-based attention, at least in area V4, because the latter relies on the resolution of competition between objects in IT. The model was able to combine spatial and object-based attention at both the single cell and systems level in order to reproduce attentional effects seen in single cells in V4 (Chelazzi et al., 2001; Luck et al., 1997) and IT (Chelazzi et al., 1993) with temporal accuracy. Here, we use these attentional effects to produce biologically plausible active vision behaviour.

There is evidence to suggest that an eye movement may be linked with a spatial enhancement of responses in area V4, because microstimulation of the frontal eye field (FEF), which is involved in the allocation of attention and eyes to locations in the scene (Schall, 2002), results in responses in V4 that are spatially modulated (Moore & Armstrong, 2003). In addition to anticipatory spatial increases in activity being found in LIP (Colby et al., 1996), posterior parietal cortex in general is implicated in the control of both spatial and object-based attention (Corbetta et al., 2000; Corbetta, Shulman, Miezin, & Petersen, 1995; Fink, Dolan, Halligan, Marshall, & Frith, 1997; Hopfinger et al., 2000; Martinez et al., 1999; Posner, Walker, Friedrich, & Rafal, 1984; Robinson, Bowman, & Kertzman, 1995). Therefore, the model assumes that FEF provides a spatial bias to V4 via circuitry in LIP. Thus, a spatial bias is applied directly from FEF to LIP, and LIP then biases V4. The source of the bias to LIP could also be dorsolateral prefrontal cortex, which has connections with parietal cortex (Blatt, Andersen, & Stoner, 1990), or pulvinar.

1.3. Visual search behaviour

When a visual target contains a simple feature that is absent from distractors, it tends to effortlessly 'pop out' from the scene. However, when a target is defined by a conjunction of features, the search takes longer and appears to require a serial process, which has been suggested to be the serial selection of spatial locations to which attention is allocated (Treisman, 1982; Treisman & Gelade, 1980).

Active visual search involves the movement of the eyes, and it is presumed attention (Hoffman & Subramaniam, 1995), to locations to be inspected. The resultant series of points where the eyes fixate forms a scan path. This differs from the more commonly modelled covert search, where the attentive focus is shifted but eye position and, hence, retinal input are held constant. During active search for a feature conjunction target, it seems that colour (or luminance) is more influential on the scan path than form features, such as orientation, in monkeys (Motter & Belky, 1998b) and in humans (Scialfa & Joffe, 1998; Williams & Reingold, 2001). When the numbers of each distractor type are equal, this preference for target coloured locations over locations containing the target orientation seems robust, even when the task is biased towards orientation discrimination (Motter & Belky, 1998b). Colour appears to segment the scene and guide the scan path in a manner that resembles guided search (Wolfe, 1994; Wolfe, Cave, & Franzel, 1989). Even abrupt

onsets can be ineffective in overriding a top-down


attentional set for colour and capturing attention in an
exogenous manner (Folk & Remington, 1999). However,
when distractor types are not in equal proportion in the
display, saccadic selectivity may be biased towards the
feature dimension that has the fewest distractors sharing this
feature with the target (Bacon & Egeth, 1997; Shen,
Reingold, & Pomplum, 2000, 2003). In other words, if there are fewer distractors sharing the target's orientation than its colour, the scan path may be drawn to target orientation distractors. This is known as the distractor-ratio effect. Displays used in simulations here replicate
those used by Motter and Belky (1998a,b) and have equal
numbers of each type of distractor.

1.4. Representation of behaviourally relevant locations

Posterior parietal area LIP encodes the locations of behaviourally relevant features (Colby et al., 1996), perhaps acting as a 'salience map' (Colby & Goldberg, 1999; Gottlieb, Kusunoki, & Goldberg, 1998; Kusunoki, Gottlieb, & Goldberg, 2000). Although responses in LIP are not dependent on motor response (Bushnell, Goldberg, & Robinson, 1981), it has projections to superior colliculus (Lynch, Graybiel, & Lobeck, 1985) and FEF (Blatt et al., 1990), which are thought to be involved in generating saccades, and direct electrical stimulation of LIP elicits saccades (Thier & Andersen, 1998). Thus, LIP is likely to form part of the cortical network responsible for selecting possible targets for saccades and, in this model, activity in LIP is used to decide the next location to be fixated.

In the model LIP, featural information is integrated within a spatial competitive framework in order to represent the most behaviourally relevant locations in the scene (Colby et al., 1996) and guide the scan path accordingly.

2. The model

The model focuses on intermediate stages of visual processing, with attention arising as an emergent property of the dynamics of the system within modules representing V4, IT and LIP. This dynamic portion of the model uses mean field population dynamics (Section A.3), where representation is at the level of cell populations, known as pools or assemblies. Before the dynamic portion of the model, featural 'pre-processing' is carried out by two modules representing the retina (and lateral geniculate nucleus, LGN, of the thalamus) and V1. These modules do not form part of the dynamic portion of the system, for reasons of computational simplicity and speed, and because the aim of this model is to focus on attention operating in extrastriate areas and beyond. However, attention is found in V1 (Brefczynski & DeYoe, 1999; Ghandi, Heeger, & Boynton, 1999; Motter, 1993; Roelfsema et al., 1998; Somers, Dale, Seiffert, & Tootell, 1999; but see Clark & Hillyard, 1996, and Martinez et al., 1999 for ERP evidence against an early attentional modulation in V1) and the V1 module could be added to the dynamic part of this model in order to replicate attentional effects therein.

Fig. 1. An overview of the model.

An overview of the model is given in Fig. 1 and is formally defined in Appendix A. Fig. 2 shows the inhibition and biases that influence competition in each of the dynamic modules. External biases, in the form of direct current injections, from prefrontal cortex and FEF provide information relating to the target object and a spatial bias relating to the newly fixated location, respectively. These top-down biases, in addition to the bottom-up stimulus information from the retina and V1, serve to influence the competitive interactions within V4, IT and LIP.

To detect form, the retina performs simple centre-surround processing simulating retinal ganglion broadband cells, as modelled by Grossberg and Raizada (2000). A similar approach is adopted for the concentric single opponent cells that process the colour information. Within V1, simple cells detect orientation using a difference-of-offset-Gaussian approach, also described by Grossberg and Raizada (2000). This approach was found to be more accurate for detecting oriented edges than a Gabor filter. To process colour, double-opponent V1 cells are modelled, as described in Section A.2.2.

The ventral stream (V1, V4 and IT) operates as a feature/object processing hierarchy, with receptive field sizes increasing towards IT and encoded features becoming more complex.
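As a concrete illustration of the centre-surround pre-processing described above, a difference-of-Gaussians response of the kind used for broadband retinal ganglion cells can be sketched in a few lines. The kernel size and the two standard deviations below are illustrative choices, not the values given in Appendix A:

```python
import math

def gaussian(x, y, sigma):
    """2-D isotropic Gaussian with the standard normalising factor."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)

def dog_kernel(size, sigma_centre, sigma_surround):
    """Centre-surround (difference-of-Gaussians) kernel: an excitatory
    narrow centre minus a broader inhibitory surround."""
    half = size // 2
    return [[gaussian(x, y, sigma_centre) - gaussian(x, y, sigma_surround)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def filter_at(image, kernel, row, col):
    """Apply the kernel at one image location (no padding: the caller
    keeps the kernel inside the image bounds)."""
    half = len(kernel) // 2
    return sum(kernel[i + half][j + half] * image[row + i][col + j]
               for i in range(-half, half + 1)
               for j in range(-half, half + 1))
```

A small bright spot excites the filter when it falls in the centre and suppresses it when it falls in the surround, which is the contrast signal the model's retina passes on to V1.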

Fig. 2. The inhibition and biases that influence competition in each of the dynamic modules (a) IT; (b) LIP; (c) V4.

Anterior areas in IT, such as area TE, are not retinotopic but encode objects in an invariant manner (Wallis & Rolls, 1997), and the IT module here represents such encoding. An inhibitory interneuron pool mediates competition between objects in IT. V1 and V4 are retinotopic and process both colour and orientation. A V1 cell for every feature exists at every pixel position in the image. V4 receives convergent inputs from V1 over the area of its receptive field, and V4 receptive fields overlap by one V1 neuron. V4 is arranged as a set of feature 'layers' encoding each feature in a retinotopic manner. Each feature belongs to a feature type, i.e. colour or orientation. V4 neurons are known to be functionally segregated (Ghose & Ts'O, 1997) and the area is involved in the representation of colour as well as form (Zeki, 1993).

LIP provides a retinotopic spatio-featural map that is used to control the spatial focus of attention and fixation. Locations in LIP compete with one another, and the centre of the receptive field of the assembly with the highest activity is chosen as the next fixation point. LIP is able to integrate featural information in its spatial map due to its connection with area V4. Competition between locations in LIP is mediated by an inhibitory interneuron pool.

Stimuli consist of vertical and horizontal, red and green bars. During active search for an orientation and colour conjunction target, distractors differ from the target in one feature dimension only. Stimuli of this type were chosen to mirror those used by Motter and Belky (1998b), in order that the active vision scan paths from this experiment could be simulated. The size of the V1 orientation and colour filters, described in Section A.2, determines the number of pixels that represent 1° of visual angle, since V1 receptive fields tend to cover no more than about 1° (Wallis & Rolls, 1997). This size is then used to scale the stimuli to be 1 × 0.25°, as used by Motter and Belky (1998a,b).

The model operates in an active vision manner by moving its retina around the image so that its view of the world is constantly changing. Most visual attention models (Deco, 2001; Deco & Lee, 2002; Niebur, Itti, & Koch, 2001) have a static retina. Here, cortical areas receive different bottom-up input from the retina and V1 at each fixation. The retinal image is the view of the scene entering the retina, and cortical areas, at any particular fixation. From the information within the retinal image, the system has to select its next fixation point in order to move its retina. The size of the retinal image is variable for any particular simulation but is normally set to 441 pixels, which equates to approximately 40° of visual angle. This is smaller than our natural vision, but stable performance across a range of retinal image sizes is possible (the only restriction being that a very small retina tends to lead to a higher proportion of fixations landing in blank areas of very sparse scenes due to lack of stimuli within the limited retina) due to the normalisation of inputs to IT, described in Section A.3.2.
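The active-vision cycle described above (present the current retinal view, settle the dynamics, choose the most active LIP location, move the retina there) can be sketched as follows. `compute_lip` is a hypothetical placeholder standing in for the full dynamic model, not part of the paper's code:

```python
def next_fixation(lip_activity):
    """Choose the next fixation point: here simply the grid position of
    the most active LIP assembly (in the model, the centre of that
    assembly's receptive field)."""
    best, best_pos = None, None
    for r, row in enumerate(lip_activity):
        for c, v in enumerate(row):
            if best is None or v > best:
                best, best_pos = v, (r, c)
    return best_pos

def scan_path(initial_fixation, n_fixations, compute_lip):
    """Active-vision loop: each fixation yields a new retinal input,
    LIP competition selects the next fixation, and the retina moves."""
    fixation = initial_fixation
    path = [fixation]
    for _ in range(n_fixations - 1):
        lip = compute_lip(fixation)  # settle dynamics for this retinal view
        fixation = next_fixation(lip)
        path.append(fixation)
    return path
```

The essential point is that `compute_lip` is re-run at every fixation because the bottom-up input changes whenever the retina moves, which is what distinguishes this model from covert-search models with a static retina.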

Cortical areas V4 and LIP are scaled dependent on the size of the retinal image, so that larger retinal images result in more assemblies in V4 and LIP and, thus, longer processing times during the dynamic portion of the system. For a 40° retinal window, V4 consists of 20 × 20 assemblies in each feature layer, and processing in 5 ms steps for typical fixations lasting ~240 ms (saccade onset is determined by the system) takes approximately 12 s to run in Matlab using a Pentium 4 PC with a 3.06 GHz processor and 1 GB of RAM. For monochromatic single cell simulations, the system takes 0.5 s to process a 240 ms fixation at 1 ms steps using a small retinal image (covering 23 × 23 pixels).

2.1. Spatial attention

Following fixation, an initially spatial attention window (AW) is formed. The aperture of this window is scaled according to coarse resolution information reflecting local stimulus density (Motter & Belky, 1998a), which is assumed to be conveyed rapidly by the magnocellular pathway to parietal cortex, including LIP, and other possibly involved areas, such as FEF. Alternatively, this information may be conveyed sub-cortically, and superior colliculus or pulvinar could be the source of this spatial effect in LIP and V4. All other information within the system is assumed to be derived from the parvocellular pathway. Thus, during a scan path, the size of the AW is dynamic, being scaled according to the stimulus density found around any particular fixation point (see Lanyon and Denham (2004a) for further details). Attention gradually becomes object-based over time, as competition between objects and features is resolved. Object-based attention is not constrained by the spatial AW but is facilitated within it. Thus, object-based attention responses are strongest within the AW, causing a combined attentional effect, as found by McAdams and Maunsell (2000) and Treue and Martinez Trujillo (1999).

The initial spatial AW is implemented as a spatial bias provided from FEF to LIP, which results in a spatial attention effect in LIP that is present in anticipation of stimulus information (Colby et al., 1996; Corbetta et al., 2000; Hopfinger et al., 2000; Kastner et al., 1999). A connection from LIP to V4 allows a spatial attentional effect to be present in V4, as found in many studies (Connor et al., 1996; Luck et al., 1997). Spatial attention in the model V4 assemblies appears as an increase in baseline firing in advance of the stimulus information and as a modulatory effect on the stimulus-invoked response beginning at ~60 ms, as found in single cells by Luck et al. (1997) and reported in Lanyon and Denham (submitted). Spatial attention in V4 provides a facilitatory effect to object-based attention within the AW. The excitatory connection from LIP to V4 also serves to provide feature binding at the resolution of the V4 receptive field.

2.2. Object-based attention

Object-based attention operates within the ventral stream of the model (V4 and IT) in order that features belonging to the target object are enhanced and non-target features are suppressed. A connection from the retinotopic ventral stream area (V4) to the parietal stream of the model allows the ventral object-based effect to influence the representation of behaviourally relevant locations in the LIP module.

The prefrontal object-related bias to IT provides a competitive advantage to the assembly encoding the target object such that, over time, this object wins the competition in IT. Attention appears to require working memory (De Fockert, Rees, Frith, & Lavie, 2001) and, due to its sustained activity (Miller, Erickson, & Desimone, 1996), prefrontal cortex has been suggested as the source of a working memory object-related bias to IT. Other models have implemented such a bias (Deco & Lee, 2002; Renart, Moreno, de la Rocha, Parga, & Rolls, 2001; Usher & Niebur, 1996). Here, the nature of this bias resembles the responses of so-called 'late' neurons in prefrontal cortex, whose activity builds over time and tends to be highest late in a delay period (Rainer & Miller, 2002; Romo, Brody, Hernandez, & Lemus, 1999). It is modelled with a sigmoid function, as described in Section A.3.2. This late response also reflects the time taken for prefrontal neurons to distinguish between target and non-target objects, beginning 110–120 ms after the onset of stimuli at an attended location (Everling, Tinsley, Gaffan, & Duncan, 2002).

IT provides an ongoing object-related bias to V4 that allows target features to win local competitions in V4, such that these features become the most strongly represented across V4. Each assembly in IT provides an inhibitory bias to features in V4 that do not relate to the object encoded by it. As the target object becomes most active in IT, this results in suppression of non-target features in V4. These object-based effects appear in IT and V4 later in their response, from ~150 ms post-stimulus, as was found in single cell recordings (Chelazzi et al., 1993, 2001; Motter, 1994a,b). The result of object-based attention in V4 is that target features are effectively 'highlighted' in parallel across the visual field (McAdams & Maunsell, 2000; Motter, 1994a,b).

2.3. The scan path

In order to carry out effective visual search, brain areas involved in the selection of possible target locations should be aware of object-based effects that are suppressing non-target features in the visual scene (as represented in the retinotopic visual areas). This model suggests that object-based effects occurring in area V4 are able to influence the spatial competition for the next fixation location in LIP. It is this cross-stream interaction between the ventral and dorsal visual streams (Milner & Goodale, 1995; Ungerleider & Mishkin, 1982) that allows visual search to select appropriate stimuli for examination. Thus, object-based attention, at least at the featural level, may be crucial for efficient search.
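One way this cross-stream influence on LIP might be sketched is as a weighted sum of the V4 target-feature maps, with the colour weight set marginally higher than the orientation weight and a multiplicative novelty term from the inhibition-of-return mechanism of Section 2.5. The weights and the form of the combination are illustrative simplifications, not the model's actual connection strengths:

```python
def lip_salience(v4_colour_target, v4_orient_target, novelty,
                 w_colour=1.05, w_orient=1.0):
    """Bottom-up input to each LIP assembly: a weighted sum of the V4
    activity maps for the target's colour and orientation, modulated by
    the 'novelty' of each location. w_colour is marginally larger than
    w_orient, so target-coloured locations tend to win the spatial
    competition, as in Motter and Belky's (1998b) scan paths."""
    rows, cols = len(v4_colour_target), len(v4_colour_target[0])
    return [[(w_colour * v4_colour_target[r][c]
              + w_orient * v4_orient_target[r][c]) * novelty[r][c]
             for c in range(cols)]
            for r in range(rows)]
```

With equal V4 activity at a target-coloured and a target-oriented location, the marginally larger colour weight is enough to tip the spatial competition towards the target-coloured location.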

As object-based attention becomes effective within the ventral stream, the parallel enhancement of target features across V4 results in LIP representing these locations as being salient, due to its connection from V4. The weight of the connection from the V4 colour assemblies to LIP is slightly stronger than that from the V4 orientation assemblies. This gives target coloured locations an advantage in the spatial competition in LIP, such that target coloured locations are represented as more behaviourally relevant and, hence, tend to be most influential in attracting the scan path, as found by Motter and Belky (1998b). The difference in strength of connection of the V4 features to LIP need only be marginal in order to achieve this effect. Fig. 3 shows the effect of adjusting the relative connection weights. The strength of these connections could be adapted subject to cognitive requirement or stimulus-related factors, such as distractor ratios (Bacon & Egeth, 1997; Shen, Reingold, & Pomplum, 2000, 2003). However, with the proportion of distractor types equal, Motter and Belky (1998b) found that orientation was unable to override colour even during an orientation discrimination task. This suggests that the bias towards a stronger colour connection could be learnt during development and be less malleable to task requirement.

Fig. 3. The effect on fixation position of increasing the relative weight of V4 colour feature input to LIP. When V4 colour features are marginally more strongly connected to LIP than V4 orientation features, the scan path is attracted to target coloured stimuli in preference to stimuli of the target orientation. Fixation positions were averaged over 10 scan paths, each consisting of 50 fixations, over the image shown in Fig. 6a.

2.4. Saccades

Single cell recordings (Chelazzi et al., 1993, 2001) provide evidence that saccade onset may be temporally linked to the development of significant object-based effects, with saccades taking place ~70–80 ms after a significant effect was observed in either IT or V4. In the model, saccades are linked to the development of a significant object-based effect in IT. The effect is deemed to be significant when the most active object assembly is twice as active as its nearest rival (such a quantitative difference is reasonable when compared to the recordings of Chelazzi et al., 1993), and a saccade is initiated 70 ms later, reflecting motor preparation latency.

It is unclear what information is available within cortex during a saccade, but evidence suggests that magnocellular inputs are suppressed during this time (Anand & Bridgeman, 2002; Thiele, Henning, Kubischik, & Hoffmann, 2002; and see Ross, Morrone, Goldberg, and Burr (2001) and Zhu and Lo (1996) for reviews). In the model, saccades are instantaneous, but the dynamic cortical areas (IT, LIP and V4) are randomised at the start of the subsequent fixation in order to reflect saccadic suppression of magnocellular and possibly parvocellular inputs, and cortical dynamics during the saccade.

2.5. Inhibition of return

As an integrator of spatial and featural information (Colby et al., 1996; Gottlieb et al., 1998; Toth & Assad, 2002), LIP provides the inhibition of saccade return (Hooge & Frens, 2000) mechanism required here to prevent the scan path returning to previously inspected sites. Inhibitory after-effects once attention is withdrawn from an area are demonstrated in classic inhibition of return (IOR) studies (Clohessy, Posner, Rothbart, & Vecera, 1991; Posner et al., 1984) and may be due to oculomotor processes, possibly linked with the superior colliculus (Sapir, Soroker, Berger, & Henik, 1999; Trappenberg, Dorris, Munoz, & Klein, 2001), or a suppressive after-effect of high activity at a previously attended location. In a model with a static retina, suppression of the most active location over time by specific IOR input (Itti & Koch, 2000; Niebur et al., 2001) or through self-inhibition is possible. Here, such a process within LIP could lead to colour-based IOR (Law, Pratt, & Abrams, 1995) due to the suppression of the most active locations.
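The saccade-initiation rule of Section 2.4 (the object-based effect is significant when the most active IT assembly is at least twice as active as its nearest rival, with the saccade following 70 ms later) can be sketched directly; the activity values in the test are invented for illustration:

```python
def saccade_time(it_activities, t_ms, motor_latency_ms=70.0):
    """Return the saccade onset time if the object-based effect in IT is
    significant at time t_ms (most active object assembly at least twice
    as active as its nearest rival), otherwise None to keep fixating."""
    ordered = sorted(it_activities, reverse=True)
    if len(ordered) >= 2 and ordered[0] >= 2.0 * ordered[1]:
        return t_ms + motor_latency_ms  # motor preparation latency
    return None
```

Because the criterion is evaluated at every simulation step, fixation duration emerges from how quickly the IT competition is resolved rather than being fixed in advance.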

When the retina is moving, such inhibition is inadequate because there is a need to remember previously visited locations across eye movements, and there may be a requirement for a head- or world-centred mnemonic representation. This is a debated issue, with some evidence suggesting that humans use very little memory during search (Horowitz & Wolfe, 1998; Woodman, Vogel, & Luck, 2001). However, even authors advocating 'amnesic search' (Horowitz & Wolfe, 1998) do not preclude the use of higher-level cognitive and mnemonic processes for efficient active search. Parietal damage is linked to the inability to retain a spatial working memory of searched locations across saccades, so that locations are repeatedly re-fixated (Husain et al., 2001), and computational modelling has suggested that units with properties similar to those found in LIP could contribute to visuospatial memory across saccades (Mitchell & Zipser, 2001). Thus, it is plausible for the model LIP to be involved in IOR.

Also, IOR seems to be influenced by recent event/reward associations linked with orbitofrontal cortex (Hodgson et al., 2002). In the model, the potential reward of a location is linked to its novelty, i.e. whether a location has previously been visited in the scan path and, if so, how recently. Competition in LIP is biased by the 'novelty' of each location, with the possible source of such a bias being frontal areas, such as orbitofrontal cortex. A mnemonic map of novelty values is constructed in a world- or head-centred co-ordinate frame of reference and converted into retinotopic co-ordinates when used in LIP. Initially, every location in the scene has a high novelty, but when fixation (and, thus, attention) is removed from an area, all locations that fall within the spatial AW have their novelty values reduced. At the fixation point, the novelty value is set to the lowest value. In the immediate vicinity of the fixation point (the area of highest acuity and, therefore, discrimination ability, in a biological system), the novelty is set to low values that gradually increase, in a Gaussian fashion, with distance from the fixation point (Hooge & Frens, 2000). All locations that fall within the AW, but are not in the immediate vicinity of the fixation point, have their novelty set to a neutral value. Novelty is allowed to recover linearly with time. This allows IOR to be present at multiple locations, as has been found in complex scenes (Danziger, Kingstone, & Snyder, 1998; Snyder & Kingstone, 2001; Tipper, Weaver, & Watson, 1996), where the magnitude of the effect decreases approximately linearly from its largest value at the most recently searched location, so that at least five previous locations are affected (Irwin & Zelinsky, 2002; Snyder & Kingstone, 2000).

The use of such a scene-based map (in world- or head-centred coordinates) reflects a simplification of processing that may occur within parietal areas, such as the ventral intraparietal area, where receptive fields range from retinotopic to head-centred (Colby & Goldberg, 1999), or area 7a, where neurons respond to both the retinal location of the stimulus and eye position in the orbit. Such a mapping may be used to keep track of objects across saccades, and this concept has already been explored in neural network models (Andersen & Zipser, 1988; Mazzoni, Andersen, & Jordan, 1991; Quaia, Optican, & Goldberg, 1998; Zipser & Andersen, 1988). Representations in LIP are retinotopic such that, after a saccade, the representation shifts to the new co-ordinate system based on the post-saccadic centre of gaze. Just prior to a saccade, the spatial properties of receptive fields in LIP change (Ben Hamed, Duhamel, Bremmer, & Graf, 1996), and many LIP neurons respond (~80 ms) before the saccade to salient stimuli that will enter their receptive fields after the saccade (Duhamel, Colby, & Goldberg, 1992). In common with most models of this nature (Deco & Lee, 2002), such 'predictive re-mapping' of the visual scene is not modelled here. At this time, it is left to specialised models to deal with the issue of pre- and post-saccadic spatial constancy, involving changes in representation around the time of a saccade (Ross, Morrone, & Burr, 1997), memory for targets across saccades (Findlay, Brown, & Gilchrist, 2001; McPeek, Skavenski, & Nakayama, 2000), and the possible use of visual markers for co-ordinate transform (Deubel, Bridgeman, & Schneider, 1998), along with the associated issue of suppression of magnocellular (Anand & Bridgeman, 2002; Thiele, Henning, Kubischik, & Hoffmann, 2002; and see Ross, Morrone, Goldberg, and Burr (2001) and Zhu and Lo (1996) for reviews) and possibly parvocellular cortical inputs just prior to and during a saccade.

3. Results

3.1. Dynamics of object-based attention result in representation of behaviourally relevant locations in LIP

Fig. 4 shows the activity in V4, IT and LIP at different times during the first fixation on an image. The outer box plotted on the image in Fig. 4a represents the retinal image and the inner box represents the AW. Initially, spatial attention within the AW modulates representations in LIP and V4. Object assemblies in IT are all approximately equally active. Later in the response (from ~150 ms post-fixation), object-based attention develops, and the target object becomes most active in IT, whereas distractor objects are suppressed. Features belonging to the red vertical target are enhanced at the expense of the non-target features across V4 (Motter, 1994a,b). Responses in V4 are modulated by both spatial attention and object/feature attention (Anllo-Vento & Hillyard, 1996; McAdams & Maunsell, 2000). This occurs because V4 is subject to both a spatial bias, from LIP, and an object-related bias, from IT. Both inputs are applied to V4 throughout the temporal processing at each fixation. However, the spatial bias results in an earlier
880 L.J. Lanyon, S.L. Denham / Neural Networks 17 (2004) 873–897

spatial attention effect in V4 than the object effect because the latter is subject to the development of significant object-based attention in IT resulting from the resolution of competition therein.

Once these object-based effects are present in V4, LIP is able to represent the locations that are behaviourally relevant, i.e. the red coloured locations, as possible saccade targets. Prior to the onset of object-based attention all

Fig. 4. (a) A complete scene with the retinal image shown as the outer of two boxes plotted. Within the retinal image the initial spatial AW, shown as the inner
box, is formed. This AW is scaled according to local stimulus density. (N.B. When figures containing red and green bars are viewed in greyscale print, the red
appears as a darker grey than the green.) (b) The activity within the cortical areas 45 ms after the start of the fixation. This is prior to the sensory response in V4
(at 60 ms). However, there is an anticipatory elevation in activity level within the spatial AW in LIP and V4, as seen in single cell studies of LIP (Colby et al.,
1996) and V4 (Luck et al., 1997). (c) The activity within the cortical areas 70 ms after the start of the fixation. At this time, spatial attention modulates responses
in V4 and LIP. LIP represents both red and green stimulus locations. (d) The activity 180 ms after the start of the fixation. Object-based attention has
significantly modulated responses in IT and V4. LIP represents the location of target coloured (red) stimuli more strongly than non-target coloured (green)
stimuli. Object-based effects are still subject to a spatial enhancement in V4 within the AW. (For interpretation of the references to colour in this figure legend,
the reader is referred to the web version of this article.)
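The time course described in this caption — an anticipatory elevation inside the AW before the sensory response, and a later object-based enhancement from ~150 ms — can be sketched with a single leaky-integrator rate unit. This is an illustrative Python sketch with assumed time constants and bias strengths, not the model's actual equations:

```python
import numpy as np

# Illustrative V4 unit: constant anticipatory spatial bias from LIP, sensory
# input arriving ~60 ms post-fixation, and an object-related bias from IT that
# becomes effective ~150 ms. All values are assumptions for illustration.
dt, tau = 1.0, 20.0                       # ms
t = np.arange(0.0, 300.0, dt)             # one fixation
stim = (t >= 60).astype(float)            # sensory response onset
spatial_bias = 0.3 * np.ones_like(t)      # AW bias, present throughout
object_bias = 0.6 * (t >= 150)            # object-based attention, later

rate = np.zeros_like(t)
for k in range(1, len(t)):
    drive = stim[k] + spatial_bias[k] + object_bias[k]
    rate[k] = rate[k - 1] + dt / tau * (-rate[k - 1] + drive)
```

The unit is already elevated at 45 ms (spatial bias only, as in Fig. 4b), responds to the stimulus from 60 ms, and is further enhanced once the object bias arrives.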


stimuli are approximately equally represented in LIP (Fig. 4c) and a saccade at this time would select target and non-target coloured stimuli with equal probability. Saccade onset is determined by the development of object-based effects in IT and, by the time this has occurred later in the response (Fig. 4d), the target coloured locations in LIP have become more active than non-target coloured locations. Therefore, the saccade tends to select a target coloured location. Increased fixation duration has been linked with more selective search (Hooge & Erkelens, 1999) and this would be explained, in this model, by the time taken to develop object-based effects in the ventral stream and convey this information to LIP.

3.2. Inhibition in V4

The nature of inhibition in V4 affects the representation of possible target locations in LIP. If V4 is implemented with a common inhibitory pool for all features or per feature type (i.e. one for colours and one for orientations), this

results in a normalising effect similar to a winner-take-all process, whereby high activity in any particular V4 assembly can strongly suppress other assemblies. V4 assemblies receive excitatory input from LIP, as a result of the reciprocal connection from V4 to LIP. When LIP receives input from all features in V4 as a result of two different types of stimulus (e.g. a red vertical bar and a horizontal green bar) being within a particular V4 receptive field, these feature locations in V4 will become most highly active and may suppress other locations too strongly. Thus, the common inhibitory pool tends to favour locations that contain two different stimuli within the same receptive field. Previous models that have used a common inhibitory pool (Deco, 2001; Deco & Lee, 2002) may not have encountered this problem because only one feature type was encoded. Therefore, in this model currently, the requirement is that features within a particular feature type should compete locally so, for example, an assembly selective for red stimuli competes with an assembly at the same retinotopic location but selective for green stimuli. Thus, inhibitory interneuron assemblies in V4 exist for every retinotopic location in V4 and for each feature type. This is shown in Fig. 2c.

This type of local inhibition results in the performance shown in Fig. 5, which records the V4 colour assemblies and LIP at two different time steps during a search for a red target. Within the receptive field of the V4 assemblies at (matrix coordinate) location [6,13] both a red horizontal bar and a green vertical bar are present in the retinal image and are within the spatial AW. During the initial stimulus-related response from 60 ms post-fixation, the red and the green selective assemblies at this location are approximately equally active. These are amongst the strongest V4 assemblies at this point because they are in receipt of bottom-up stimulus information and are within the AW positively biased by LIP. LIP represents the locations of all strong featural inputs from V4 and has no preference for stimulus colour at this point. However, by 70 ms post-stimulus strong competitive effects, due to there being two

Fig. 5. The dynamic effect when a V4 receptive field includes two different stimuli. This image also shows what happens when fixation is close to the edge of
the original image (a computational problem not encountered in the real world) and the retinal image extends beyond the image. In this case the cortical areas
receive no bottom-up stimulus information for the area beyond the original image but other cortical processing remains unaltered. As LIP represents the
locations of behaviourally relevant stimuli, the next fixation is never chosen as a location beyond the extent of the original image. (a) The extent of the retinal
image (outer box) and AW are shown for a fixation, forced to be at this location. (b) The activity within the cortical areas 70 ms after the start of the fixation.
Within the receptive field of V4 assemblies at position [6,13] there is both a red horizontal and a green vertical bar. Therefore, all V4 assemblies at this position
receive stimulus input. At the time of the initial sensory response assemblies at this position were as active as other assemblies in receipt of bottom-up stimulus
information. However, by 70 ms competition between the colours at this location, and between the orientations at this location, has caused the activity here to
be slightly lower than that of other assemblies that encode the features of a single stimulus within their receptive field. This is due to there being less
competition in the single stimulus case. The addition of the second stimulus drives responses down, as has been found in single cell recordings such as Chelazzi
et al. (2001). At this time all V4 assemblies are approximately equally active and there is no preference for the target object’s features. (c) The activity within
the cortical areas 200 ms after the start of the fixation when object-based attention has become significant. The red and vertical (target feature) assemblies at
[6,13] (in matrix coordinates) have become more active than the non-target features, which are suppressed, despite each receiving ‘bottom-up’ information. (d)
A plot of the activity of the V4 assemblies over time at position [6,13]. From ~150 ms object-based effects suppress non-target features. The activity of the vertical orientation and red colour selective assemblies are shown as solid lines and that of the horizontal orientation and green colour selective assemblies as dotted lines.
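The dynamics this figure describes — two stimuli in one receptive field first suppressing each other, then the non-target colour being suppressed once object-based feedback arrives — can be illustrated with two mutually inhibiting colour assemblies. All parameters below are assumptions chosen for illustration, not the values used in the model:

```python
# Illustrative sketch (assumed parameters, not the paper's equations): two V4
# colour assemblies at one retinotopic location sharing local inhibition.
def simulate(n_stimuli, target_bias_on=True, T=250, dt=1.0, tau=15.0):
    red, green = 0.0, 0.0
    trace = []
    for step in range(int(T / dt)):
        t = step * dt
        in_red = 1.0 if n_stimuli >= 1 and t >= 60 else 0.0    # red bar input
        in_green = 1.0 if n_stimuli >= 2 and t >= 60 else 0.0  # green bar input
        bias = 0.5 if (target_bias_on and t >= 150) else 0.0   # IT feedback favours red
        inhib = 0.6 * (red + green)                            # shared local inhibition
        red += dt / tau * (-red + max(in_red + bias - inhib, 0.0))
        green += dt / tau * (-green + max(in_green - inhib, 0.0))
        trace.append((t, red, green))
    return trace

pair = simulate(2)    # red and green bar both in the receptive field
single = simulate(1)  # red bar alone
```

Before the object bias arrives (~140 ms) the paired red response sits below the single-stimulus response, as in Chelazzi et al. (2001); after 150 ms the red assembly dominates the green one.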


stimuli within the receptive field, result in overall responses in each V4 feature at this location being lower than at locations that contain only one stimulus, i.e. the addition of a second stimulus lowers responses of the cell to the preferred stimulus, as found in single cell studies such as Chelazzi et al. (2001). By 200 ms, object-based attention has become effective in V4 and the green assembly in V4 has become significantly suppressed compared to the red assembly at the same location. The representation in LIP is now biased towards the representation of the locations of target coloured stimuli and it is one of these locations that will be chosen as the next fixation point.

3.3. The scan path

The system produces scan paths that are qualitatively similar to those found by Motter and Belky (1998b) where fixations normally land within 1° of orientation-colour feature conjunction stimuli, rather than in blank areas, and these stimuli tend to be target coloured. Fig. 6a and b shows

typical scan paths through relatively dense and sparse scenes, respectively.

Weaker object-related feedback within the ventral pathway (prefrontal to IT; IT to V4) reduces the object-based effect within V4 and this results in more non-target coloured stimuli being able to capture attention. Replication of single cell data (described by Lanyon & Denham, submitted) from older and more highly trained monkeys (Chelazzi et al., 2001, compared to Chelazzi et al., 1993) suggested that IT feedback to V4 is tuned by learning, i.e. the strength of the IT feedback to V4 related to the amount of the monkeys' training in the task and stimuli. The strength of this feedback also affects the scan path. When the feedback is strengthened, object-based effects in IT and V4 are increased and the scan path is more likely to select target coloured stimuli. The effect of varying the weight of IT feedback to V4 on fixation position is shown in Fig. 7. Reducing the strength of object-related feedback from IT to V4, which may reflect learning, results in the capture of more non-target coloured stimuli within the scan path. Motter and Belky (1998b) found that 5% of fixations landed within 1° of non-target coloured stimuli during active search. Hence, the model predicts that distractor stimuli (specifically, stimuli not sharing a target feature) may be more likely to capture attention when object-based effects within the ventral stream are weak (possibly due to unfamiliarity with the objects or the task) or have not yet had time to develop fully when the saccade is initiated. In particular, the tendency for the scan path to select only target coloured locations, during search for a colour and form conjunction target, may be weaker when the objects or task are less well known.

In addition, the strength of the novelty bias affects the scan path. Increasing the weight of the novelty bias to LIP slightly reduces the likelihood of fixating target coloured stimuli and, in sparse scenes, increases the number of fixations landing in blank areas of the display, as shown in Fig. 8. This possibly suggests that a search where novelty is the key factor, for example a hasty search of the entire scene, may result in more ‘wasted’ fixations in blank areas.

3.4. Saccade onset

Lanyon and Denham (submitted) replicated saccade onset times for single cell recordings in IT (Chelazzi et al., 1993) and V4 (Chelazzi et al., 2001) and found that, due to the link with the development of object-based attention, saccade onset can be delayed by two factors. Firstly, reducing the weight of IT feedback to V4 in order to replicate data from less trained monkeys resulted in a slightly longer latency to saccade. More significantly, saccade onset time was dependent on the nature and latency of object-based prefrontal feedback to IT, also suggested to be related to learning. Thus, saccade onset time may depend on the familiarity with objects in the display and the task. The same effect is found here for scan path simulations.

As saccade onset is determined by the timing of object-based attention in IT, a typical plot of activity in IT during a scan path fixation is given in Fig. 9. The effect on saccade onset time of varying the strength of IT feedback to V4 during scan path simulations is shown in Fig. 10. Stronger object-based feedback from IT to V4 (and, similarly, prefrontal feedback to IT) results in the latency to saccade being reduced because the object-based effect in IT is reinforced over time by the feedforward inputs from V4. If the weight of the IT feedback (parameter h in Eqs. (A20) and (A22)) is set to the value (5) used by Lanyon and Denham (submitted) to replicate the single cell recordings of Chelazzi et al. (2001), scan path saccades take place about 230–240 ms post-stimulus (i.e. after fixation). For the scene used in Fig. 6a, this results in saccades taking place, on average, at 238 ms post-stimulus. Chelazzi et al. (2001) found that saccades took place on average 237–240 ms post-stimulus, depending on stimulus configuration in relation to the receptive field.
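The latency effect just described can be caricatured as a race to a fixed target–distractor separation threshold: stronger feedback reinforces the target assembly faster, so the threshold is crossed sooner. The dynamics and threshold below are illustrative assumptions, not the model's Eqs. (A20) and (A22), although the gain plays the role of the parameter h:

```python
# Hypothetical caricature of saccade-onset latency vs feedback gain. A saccade
# is triggered when the target assembly exceeds the distractor assembly by a
# fixed (assumed) threshold.
def saccade_onset(h, dt=1.0, tau=20.0, threshold=0.4):
    target, distractor = 0.0, 0.0
    t = 0.0
    while t < 500:
        fb = 0.1 * h * target                   # feedback loop reinforces the target
        target += dt / tau * (-target + 1.0 + fb)
        distractor += dt / tau * (-distractor + 1.0 - 0.1 * h * target)
        t += dt
        if target - distractor >= threshold:    # object-based attention significant
            return t
    return None

weak, strong = saccade_onset(h=2.0), saccade_onset(h=5.0)
```

With the stronger gain the threshold is reached earlier, mirroring the shorter fixation durations shown in Fig. 10.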

Fig. 6. Scan paths that tend to select locations near a target coloured stimulus. Here, IT feedback to V4 is strong (parameter h in Eqs. (A20) and (A22) is set to 5). Fixations are shown as (i) magenta dots: within 1° of a target coloured stimulus; (ii) blue circles: within 1° of a non-target colour stimulus. (a) A scan path through a dense scene. The target is a red stimulus. 95.92% of fixations are within 1° of a target coloured stimulus. 4.08% of fixations are within 1° of a non-target colour stimulus. Average saccade amplitude = 7.43°. (b) A scan path through a sparse scene. The target is a green stimulus. 91.84% of fixations are within 1° of a target coloured stimulus. 8.16% of fixations are within 1° of a non-target colour stimulus. Average saccade amplitude = 12.13°. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
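The fixation percentages reported in this caption come from a simple nearest-stimulus test. The helper below is hypothetical (not from the published code) and classifies fixations by the 1° criterion used throughout:

```python
import math

# Hypothetical scan-path analysis helper: classify each fixation by whether it
# lands within 1 degree of a target-coloured or a distractor stimulus.
def classify_fixations(fixations, targets, distractors, radius_deg=1.0):
    def near(point, stimuli):
        return any(math.dist(point, s) <= radius_deg for s in stimuli)

    counts = {"target": 0, "distractor": 0, "blank": 0}
    for f in fixations:
        if near(f, targets):
            counts["target"] += 1
        elif near(f, distractors):
            counts["distractor"] += 1
        else:
            counts["blank"] += 1
    return counts

counts = classify_fixations(
    fixations=[(0.5, 0.2), (4.9, 5.3), (10.0, 10.0)],   # degrees of visual angle
    targets=[(0.0, 0.0), (5.0, 5.0)],
    distractors=[(9.5, 10.0)],
)
# counts == {"target": 2, "distractor": 1, "blank": 0}
```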

Thus, the model tentatively predicts that during search in crowded scenes, as well as in the smaller arrays used for the single cell simulations, fixation duration may be shorter amongst very familiar objects or in a familiar task. Therefore, search may be faster under these conditions. In experiments, search has been found to be facilitated, enabling faster manual reaction times, when distractors are familiar (Lubow & Kaplan, 1997; Reicher, Snyder, & Richards, 1976; Richards & Reicher, 1978; Wang, Cavanagh, & Green, 1994). However, Greene and Rayner (2001) suggest that this familiarity effect may be due to the span of effective processing (which may be likened to the size of the AW here) being wider around familiar distractors, rather than fixation duration being shorter.

Fig. 7. The effect of the weight of IT to V4 feedback on fixation position. Fixation positions were averaged over 10 scan paths, each consisting of 50 fixations, over the image shown in Fig. 6a. Fixations are considered to be near a stimulus when they are within 1° of the stimulus. As the weight of feedback is increased there is a tendency for more target coloured stimuli and fewer non-target coloured stimuli to be fixated because object-based effects in the ventral stream are stronger.

3.5. Effect of scene density on saccade amplitude

As Motter and Belky (1998b) found, saccades tend to be shorter in dense scenes compared to sparse scenes. Fig. 6 shows this effect (also, large amplitude saccades are shown in a sparse scene in Fig. 11). The same sized retinal image was used for both the simulations in Fig. 6 in order that stimuli at the same distances could attract attention. However, the spatial AW is scaled based on local stimulus density such that, when fixation is placed in an area of dense stimuli, the spatial AW is smaller than when stimuli are sparse. The AW contributes a positive bias to the competition in LIP and results in locations containing target features within the AW being favoured within LIP and being most likely to attract attention. Thus, the model predicts that saccade amplitude is dependent on the stimulus density in the local area around fixation.

Fig. 11a shows a scan path that has been unable to reach target coloured stimuli in the bottom left corner of the display. This was due to the constraint imposed by the size of the retinal image in this case (441 × 441 pixels, equivalent to approximately 40° of visual angle). When the retinal image is increased to 801 × 801 pixels (equivalent to approximately 73° of visual angle), more stimuli are available to the retina and this allows the stimuli at the extremity to attract attention, as shown in Fig. 11b. Altering the retinal window size does not affect the nature of the stimuli that attract attention (in this case, target coloured stimuli) but does alter the course of the scan path. Thus, the model predicts that field of view can have a dramatic effect on the scan path.

Fig. 8. The effect of varying the weight of the novelty bias on fixation position in a relatively sparse scene. Fixation position was averaged over 10 scan paths, each consisting of 50 fixations, over the image shown in Fig. 6b. Fixations are considered to be near a stimulus when they are within 1° of the stimulus. As the weight of the novelty bias increases, the number of fixations near target coloured stimuli decreases and fixations are more likely to occur in blank areas of the display.

3.6. Inhibition of return in the scan path

Implementation of the novelty bias to LIP successfully produces IOR within the scan path that declines over time, allowing the eventual re-visiting of sites previously inspected. Plots of the novelty values after the scan paths shown in Fig. 6 are shown in Fig. 12. These figures show novelty being present at multiple locations (Danziger et al., 1998; Snyder & Kingstone, 2001; Tipper et al., 1996), recovering over time (Irwin & Zelinsky, 2002; Snyder & Kingstone, 2000), being strongest in the immediate vicinity of the fixation point and decreasing with distance from the fixation point (Hooge & Frens, 2000). Novelty scales with the size of the AW so that a smaller area around fixation receives reduced novelty in a dense scene than in a sparse scene. In a scene containing patches of different stimulus densities, the size of the AW and, hence, the novelty update is based on the local density. This allows the scan path to move through sparser areas with larger saccades but thoroughly examine any dense areas it encounters (Lanyon & Denham, 2004a).

Fig. 9. Plot of activity of the four object assemblies in IT (RH: encodes red horizontal bar; RV: encodes red vertical bar; GH: encodes green horizontal bar; GV: encodes green vertical bar) over time during one fixation. The initial sensory response begins from 80 ms post-stimulus (after fixation is established). At that time all objects are approximately equally active. However, from about 150 ms post-stimulus the target object (a red horizontal bar, RH) becomes most active. Significant object-based attention is established and a saccade is initiated approximately 240 ms after the onset of the stimulus. The saccade is indicated by the cessation of the plot lines. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4. Discussion and conclusion

This appears to be the first model that has both spatial and object-based attention occurring over different time courses (that accord with cellular recordings: Chelazzi et al., 1993, 2001; Luck et al., 1997) and with the scan path able to locate behaviourally relevant stimuli due to the creation of a retinotopic salience-type map based on the development of object-based attention. The model allows spatial attention and elements of early spatial selection theories (Treisman, 1982; Treisman & Gelade, 1980) to co-exist with object-based attention and the ‘parallel’ biased competition
framework (Desimone, 1998; Desimone & Duncan, 1995;
Duncan & Humphreys, 1989; Duncan et al., 1997). No
specific gating of signals (Olshausen, Anderson, & Van
Essen, 1993) is required to achieve the initial spatial AW but
all attentional effects arise as a result of different biases that
influence the dynamics of the system.
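The principle stated here — all attentional effects arising from additive biases on one competition, with no explicit gating — can be shown in a minimal settling network over candidate locations. The bias values below are assumptions chosen purely for illustration:

```python
import numpy as np

# Minimal sketch of 'biases, not gating': LIP location activities settle under
# shared inhibition, with additive spatial (AW), featural (V4) and novelty
# (IOR) biases. All numbers are illustrative assumptions.
def settle(bottom_up, spatial, featural, novelty, steps=400, dt=0.05):
    a = np.zeros_like(bottom_up)
    for _ in range(steps):
        drive = bottom_up + spatial + featural + novelty
        a += dt * (-a + np.maximum(drive - 0.8 * a.sum(), 0.0))
    return a

bottom_up = np.array([1.0, 1.0, 1.0])  # three stimulus locations
spatial = np.array([0.2, 0.2, 0.0])    # first two fall inside the AW
featural = np.array([0.4, 0.0, 0.0])   # only location 0 has the target colour
novelty = np.array([0.0, 0.0, -0.5])   # location 2 was recently fixated (IOR)
a = settle(bottom_up, spatial, featural, novelty)
```

No input is gated off: the target-coloured, novel location simply wins the competition, and the recently fixated location is suppressed.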
The model concentrates on object-based effects observed
in monkey single cell studies (Chelazzi et al., 1993, 2001;
Luck et al., 1997; Moran & Desimone, 1985). Such effects
at this level may be important in leading to the object-based
effects observed psychophysically (Blaser et al., 2000;
Duncan, 1984), in imaging (O’Craven et al., 1999) and
evoked potentials (Valdes-Sosa et al., 1998, 2000). The top-
down object bias leads to feature-based attention across V4,
as observed in monkey single cell recordings in the area
(Motter, 1994a,b). Human imaging has also shown that
attending to a feature increases activity in areas that encode
that feature type (Chawla, Rees, & Friston, 1999; Saenz, Buracas, & Boynton, 2002). Target objects are selectively enhanced in IT in a bottom-up manner, due to the enhancement across its feature types, and top-down, due to

Fig. 10. The effect on saccade onset of varying the weight of IT object-related feedback to V4. Saccade onset times were averaged over 10 scan paths, each consisting of 50 fixations, over the image shown in Fig. 6a. As the strength of this feedback increases, the latency to saccade is reduced, i.e. fixation duration shortens.

Fig. 11. Scan paths on a sparse image. (a) The retina is restricted to 441 × 441 pixels (~40°). This results in target coloured stimuli in the bottom left of the image not being available to the retina from the nearby fixation point and, thus, not being reached by the scan path. Note, this problem only occurs in sparse images with the retinal image restricted to a smaller size than is normally available to humans and monkeys. (b) The retina is enlarged to be 801 × 801 pixels (~73°). All target coloured stimuli are now examined.
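The two window sizes quoted in this caption imply a roughly constant scale of about 11 pixels per degree. Assuming that scale (our inference, not a figure stated in the paper), the conversion is a one-liner:

```python
# Assumed scale inferred from the caption: 441 px ~ 40 deg and 801 px ~ 73 deg
# both give roughly 11 pixels per degree of visual angle.
def pixels_to_degrees(pixels, px_per_deg=11.0):
    return pixels / px_per_deg

deg_small = pixels_to_degrees(441)  # ~40 degrees
deg_large = pixels_to_degrees(801)  # ~73 degrees
```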

the working memory bias. Many object-based attention experimental findings are confounded by issues relating to whether the selection is object-based, feature-based or surface-based. In psychophysics, object-based attention typically refers to the performance advantage given to all features belonging to the same object (but not to those of a different object at the same location). However, Mounts and Melara (1999) have suggested that not all features bound to objects are the subject of object-based attention and that a possibly earlier feature-based attention operates because performance benefits were found only when the feature (colour or orientation) to be discriminated was the same one in which the target ‘pop-out’ occurred. Confusion results from the difficulty in designing an experimental paradigm that eliminates all but object-based attentional effects. For example, the rapid serial object transformation paradigm (introduced by Valdes-Sosa et al., 1998, 2000, and used by Mitchell, Stoner, Fallah, & Reynolds, 2003, Pinilla, Cobo, Torres, & Valdes-Sosa, 2001, and Reynolds, Alborzian, & Stoner, 2003) uses two transparent surfaces containing dots of different colours and directions of motion and has been reported in the object-based attention literature. However, it has been shown to produce surface-based effects rather than object- or feature-based effects (Mitchell et al., 2003). A number of complex neural mechanisms are likely to constitute the range of so-called object-based attention effects observed in various experimental paradigms and not all are reflected in the current model. One factor not fully addressed in the current version is the neural correlate of perceptual grouping, according to Gestalt principles, to locally group features into objects. For example, lateral connections in early stages of visual processing, such as those modelled by Grossberg and Raizada (2000), may result in collinear enhancements leading to object-based effects such as those observed by Roelfsema et al. (1998) in V1. In the current model, such local lateral interactions could be added to V1 and V4 by means of an excitatory input from similar feature selective cells in the local area. Such an addition would allow local feature grouping to be performed, which could assist the separation of overlapping objects. However, arguably, depth information may be involved in this process in order to create separable surfaces. Whilst not addressing all issues of object-based attention, this model makes a valuable contribution by unifying object-based and spatial effects at the single cell level and using the resultant activity to drive search behaviour.

Fig. 12. The values remaining in the novelty map of the scene after a scan path. Novelty is present at multiple locations (Danziger et al., 1998; Snyder & Kingstone, 2001; Tipper et al., 1996), recovers over time (Irwin & Zelinsky, 2002; Snyder & Kingstone, 2000) and is strongest in the immediate vicinity of the fixation point, decreasing with distance from the fixation point (Hooge & Frens, 2000). (a) Novelty in a dense scene after the scan path shown in Fig. 6a. (b) Novelty in a sparse scene after the scan path shown in Fig. 6b.

The model is also novel in that certain features have priority in driving the search scan path, as has been found to be the case for colour in monkeys (Motter & Belky, 1998b) and humans (Sciala & Joffee, 1998; Williams & Reingold, 2001) during certain types of search. Object-based attention, as a result of biased competition, leads to feature-based attention in V4. For example, non-target colours are suppressed such that the scene is segmented by colour as if viewed through a colour-attenuating filter (Motter, 1994a,b). The development of object-based attention across V4 was shown to influence the scan path due to the integration of featural information in the spatial competition in LIP. The nature of connections from V4 to LIP allowed colour to have priority over orientation in driving the scan path but only when object-based effects were strong enough in V4. This may be viewed as priority being given to certain feature maps (Treisman, 1988; Wolfe, 1994; Wolfe, Cave, & Franzel, 1989). Previous models (Deco & Lee, 2002; Hamker, 1998; Niebur et al., 2001) have tended not to have a particular featural bias in attracting attention; for example, when forming a ‘salience map’. The map produced here in LIP integrates the output of object-based attention in the model's ventral stream and produces the featural bias by means of a slight difference in connection strength from V4 features to LIP. The relative weight of these connections may be malleable on the basis of task requirements or stimulus-related information, such as the proportion of each distractor type present in the retinal image. Thus, the distractor-ratio effect (Bacon & Egeth, 1997; Shen et al., 2000, 2003) could be achieved through the ratio of featural connection strengths from V4 to LIP, allowing the feature type that has the lowest proportion of distractors to bias competition in LIP most strongly and, therefore, attract the scan path to these distractor locations. However, other processes may contribute to this effect. For example, normalisation of total activity within each V4 feature layer could allow features occurring rarely in the retinal image to achieve higher activity in V4 so that the locations of stimuli possessing a rarely occurring target feature become most active in LIP. The primary goal of the current work was to replicate behaviour observed by Motter and Belky (1998b), where distractor types were equal and colour tended to

influence the scan path most strongly despite task requirements.

The ‘cross-stream’ interaction between V4 and LIP is an important feature of this model and it is predicted that loss of the connection from the ventral stream to LIP, due to lesion, would result in deficits in the ability for the scan path to locate behaviourally relevant stimuli. In order for the object-based effect in V4 to bias LIP correctly in the current architecture, it was necessary for competition in V4 to be local. Given this, object-based attention in V4 allowed LIP to map the behaviourally relevant locations in the scene (Colby et al., 1996) and accurately control the search scan path. Therefore, the model predicts that ventral stream feedforward connections to parietal cortex may determine search selectivity. The model also predicts that the latency to saccade is dependent on the development of object-based attention, which relies on strong object-related feedback within the ventral stream. Tentatively, the strength of such feedback may be related to learning and familiarity with stimuli and the task. The connection from LIP to V4 serves to bind features across feature types at the resolution of the V4 receptive field. Therefore, it is predicted that the loss of this connection would cause cross-feature type binding errors (such as binding form and colour) whilst leaving intact the binding of form into object shape within the ventral stream. Such an effect has been observed in a patient with bilateral parietal lesions (Humphreys, Cinel, Wolfe, Olson, & Klempen, 2000).

Further factors affecting the scan path included the field of view (the size of the retinal image), local stimulus density (because this determines the size of the AW, which influences saccade amplitude), and the importance of novelty (in the form of inhibition of previously inspected locations).

Future work includes extending the factors influencing competition in LIP so that the model is able to represent further bottom-up stimulus-related factors and, possibly, further top-down cognitive factors in the competition for attentional capture. As such, the model may be able to replicate psychophysical data relating to attentional capture under competing exogenous and endogenous influences, and have practical application in computer vision.

A further issue that could be addressed is the constancy of the scene across saccades. It may be possible to model changes in receptive field profiles in LIP (Ben Hamed et al., 1996) and V4 (Tolias et al., 2001) immediately before

In conclusion, this is a biologically plausible model that is able to replicate a range of experimental evidence for both spatial and object-based attentional effects. Modelling visual attention in a biologically constrained manner within an active vision paradigm has raised issues, such as memory for locations already visited in the scene, not addressed by models with static retinas. The model is extendable to include other factors in the competition for the capture of attention and should be of interest in many fields including neurophysiology, neuropsychology, computer vision and robotics.

Appendix A

A.1. Retina

Colour processing in the model focuses on the red–green channel, with two colour arrays, G^red and G^green (a simplification of the output of the medium and long wavelength retinal cones), being input to the retinal ganglion cells. References to red and green throughout this paper refer to long and medium wavelengths. The greyscale image, G^grey, used for form processing, is a composite of the colour arrays and provides luminance information.

A.1.1. Form processing in the retina

At each location in the greyscale image, retinal ganglion broad-band cells perform simple centre-surround processing, according to Grossberg and Raizada (2000), as follows.

On-centre, off-surround broadband cells:

u+_ij = G^grey_ij − Σ_pq G_pq(i, j, σ1) G^grey_pq    (A1)

Off-centre, on-surround broadband cells:

u−_ij = −G^grey_ij + Σ_pq G_pq(i, j, σ1) G^grey_pq    (A2)

where G_pq(i, j, σ1) is a two-dimensional Gaussian kernel, given by:

G_pq(i, j, σ1) = (1 / (2π σ1²)) exp(−((p − i)² + (q − j)²) / (2 σ1²))
Gpq ði; j; s1 Þ ¼ exp 2 2 ððp 2 iÞ2 þ ðq 2 jÞ2 Þ ðA3Þ
saccade onset in order to provide ‘predictive re-mapping’ of 2ps21 2 s1
stimuli within LIP (Duhamel et al., 1992). This could
provide further insight into the neural basis of the novelty The Gaussian width parameter is set to: s1 ¼ 1:
bias necessary for IOR. The model has highlighted deficits These broadband cells provide luminance inputs to V1
in the knowledge of the neural correlates of inhibition of interblob simple cells that are orientation selective.
saccade return during active visual search. Further work is
needed to establish the link between scene-based inhibition
(or a scene-based memory of locations already inspected) A.1.2. Colour processing in the retina
and retinotopic areas, such as LIP, that are involved in Retinal concentric single opponent cells process colour
selecting saccadic targets. information as follows.
L.J. Lanyon, S.L. Denham / Neural Networks 17 (2004) 873–897 891

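A minimal sketch of this centre-surround stage, Eqs. (A1)-(A3), is given below. The function names and the truncation of the Gaussian surround to a 7 x 7 window are our own illustrative choices, not part of the model specification.

```python
import numpy as np

def gaussian_kernel(size, sigma=1.0):
    """Two-dimensional Gaussian of Eq. (A3), sampled on a (size x size) grid."""
    half = size // 2
    p, q = np.mgrid[-half:half + 1, -half:half + 1]
    return np.exp(-(p**2 + q**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

def broadband_cells(image, sigma=1.0, size=7):
    """On-centre (u+, Eq. (A1)) and off-centre (u-, Eq. (A2)) responses."""
    g = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    # Gaussian-weighted surround at each location (i, j): the sum over (p, q).
    surround = np.zeros(image.shape)
    for di in range(size):
        for dj in range(size):
            surround += g[di, dj] * padded[di:di + image.shape[0],
                                           dj:dj + image.shape[1]]
    u_on = image - surround    # on-centre, off-surround (A1)
    u_off = -image + surround  # off-centre, on-surround (A2)
    return u_on, u_off
```

On a uniform field the centre and surround terms nearly cancel, so neither cell class responds appreciably; a bright spot on a dark background drives the on-centre cell at the spot, which is the contrast sensitivity the later V1 stages rely on.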
A.1.2. Colour processing in the retina

Retinal concentric single-opponent cells process colour information as follows.

Red on-centre, off-surround concentric single-opponent cells:

n^{redON}_{ij} = G^{red}_{ij} - \sum_{pq} G_{pq}(i, j, \sigma_1) G^{green}_{pq}    (A4)

Red off-centre, on-surround concentric single-opponent cells:

n^{redOFF}_{ij} = -G^{red}_{ij} + \sum_{pq} G_{pq}(i, j, \sigma_1) G^{green}_{pq}    (A5)

Green on-centre, off-surround concentric single-opponent cells:

n^{greenON}_{ij} = G^{green}_{ij} - \sum_{pq} G_{pq}(i, j, \sigma_1) G^{red}_{pq}    (A6)

Green off-centre, on-surround concentric single-opponent cells:

n^{greenOFF}_{ij} = -G^{green}_{ij} + \sum_{pq} G_{pq}(i, j, \sigma_1) G^{red}_{pq}    (A7)

These concentric single-opponent cells provide colour-specific inputs to V1 double-opponent blob neurons.

A.2. V1

The V1 module consists of K + C neurons at each location in the original image, so that neurons detect K orientations and C colours. At any fixation, V1 would only process information within the current retinal image. However, in this model, the entire original image is 'pre-processed' by V1 in order to save computational time during the active vision component of the system. As V1 is not dynamically updated during active vision, this does not alter the result. Only those V1 outputs relating to the current retinal image are forwarded to V4 during the dynamic active vision processing.

A.2.1. Form processing in V1

For orientation detection, V1 simple and complex cells are modelled as described by Grossberg and Raizada (2000), with the distinction that two spatial resolutions are calculated here. Simple cells detect oriented edges using a difference-of-offset-Gaussian (DOOG) kernel.

The right- and left-hand kernels of the simple cells are given by

R^r_{ijk} = \sum_{pq} ([u^+_{pq}]_+ - [u^-_{pq}]_+)[D^{(lk)}_{pqij}]_+    (A8)

L^r_{ijk} = \sum_{pq} ([u^-_{pq}]_+ - [u^+_{pq}]_+)[-D^{(lk)}_{pqij}]_+    (A9)

where

u^+ and u^- are the outputs of the retinal broadband cells above;

[x]_+ signifies half-wave rectification, i.e. [x]_+ = x, if x >= 0; 0, otherwise;

and the oriented DOOG filter D^{(lk)}_{pqij} is given by

D^{(lk)}_{pqij} = G_{pq}(i - \delta\cos\theta, j - \delta\sin\theta, \sigma_2) - G_{pq}(i + \delta\cos\theta, j + \delta\sin\theta, \sigma_2)    (A10)

where

\delta = \sigma_2/2 and \theta = \pi(k - 1)/K, where k ranges from 1 to 2K, K being the total number of orientations (2 is used here).

\sigma_2 is the width parameter for the DOOG filter, set as below.

r is the spatial frequency octave (i.e. spatial resolution), such that

r = 1 and \sigma_2 = 1 for low resolution processing, used in the magnocellular (or sub-cortical) pathway for scaling the AW;

r = 2 and \sigma_2 = 0.5 for high resolution processing, used in the parvocellular pathway, which forms the remainder of the model.

The direction-of-contrast sensitive simple cell response is given by

S^r_{ijk} = g[R^r_{ijk} + L^r_{ijk} - |R^r_{ijk} - L^r_{ijk}|]_+    (A11)

g is set to 10.

The complex cell response is invariant to direction of contrast and is given by

I^r_{ijk} = S^r_{ijk} + S^r_{ij(k+K)}    (A12)

where k ranges from 1 to K. The value of the complex cells, I^r_{ijk}, over the area of the current retinal image, is input to V4.
A.2.2. Colour processing in V1

The outputs of LGN concentric single-opponent cells (simplified to be the retinal cells here) are combined in the cortex in the double-opponent cells concentrated in the blob zones of layers 2 and 3 of V1, which form part of the parvocellular system. The outputs of blob cells are transmitted to the thin stripes of V2 and from there to colour-specific neurons in V4. For simplicity, V2 is not included in this model.

Double-opponent cells have a centre-surround antagonism and combine inputs from different single-opponent cells as follows:

Red on-centre portion:

v^{redON}_{ij} = \sum_{pq} G_{pq}(i, j, \sigma_1) n^{redON}_{pq} + \sum_{pq} G_{pq}(i, j, \sigma_1) n^{greenOFF}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_1) n^{redOFF}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_1) n^{greenON}_{pq}    (A13)

Red off-surround portion:

v^{redOFF}_{ij} = \sum_{pq} G_{pq}(i, j, \sigma_2) n^{redOFF}_{pq} + \sum_{pq} G_{pq}(i, j, \sigma_2) n^{greenON}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_2) n^{redON}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_2) n^{greenOFF}_{pq}    (A14)

Green on-centre portion:

v^{greenON}_{ij} = \sum_{pq} G_{pq}(i, j, \sigma_1) n^{greenON}_{pq} + \sum_{pq} G_{pq}(i, j, \sigma_1) n^{redOFF}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_1) n^{greenOFF}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_1) n^{redON}_{pq}    (A15)

Green off-surround portion:

v^{greenOFF}_{ij} = \sum_{pq} G_{pq}(i, j, \sigma_2) n^{greenOFF}_{pq} + \sum_{pq} G_{pq}(i, j, \sigma_2) n^{redON}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_2) n^{greenON}_{pq} - \sum_{pq} G_{pq}(i, j, \sigma_2) n^{redOFF}_{pq}    (A16)

where \sigma_1 = 1, \sigma_2 = 2.

The complete red-selective blob cell is given by

I^{c1}_{ij} = g[v^{redON}_{ij} - v^{redOFF}_{ij}]_+    (A17)

The complete green-selective blob cell is given by

I^{c2}_{ij} = g[v^{greenON}_{ij} - v^{greenOFF}_{ij}]_+    (A18)

where

g = 0.2. This scales the output of V1 blob cells to be consistent with that of the orientation-selective cells.

c1 = K + 1. This represents the position of the first colour input to V4 (i.e. red).

c2 = K + 2. This represents the position of the second colour input to V4 (i.e. green).

The blob cell outputs over the area of the current retinal image are input to V4.

A.3. Dynamic cortical modules

The dynamic cortical modules follow a similar approach to Deco (2001) and Deco and Lee (2002), and are modelled using mean field population dynamics (Gerstner, 2000), as used by Usher and Niebur (1996), in which an average ensemble activity is used to represent populations, or assemblies, of neurons with similar encoding properties. It is assumed that neurons within the assembly receive similar external inputs and are mutually coupled. Modelling at the neuronal assembly level is inspired by observations that several brain regions contain populations of neurons that receive similar inputs and have similar properties, and it is a suitable level at which to produce a systems level model such as this (see Gerstner (2000) and Rolls and Deco (2002) for further details of this modelling approach). Population averaging does not require temporal averaging of the discharge of individual cells and, thus, the response of the population may be examined over time, subject to the size of the time step used in the differential equations within the model. The response function, which transforms current (activity within the assembly) into discharge rate, is given by the following sigmoidal function that has a logarithmic singularity

F(x) = \frac{1}{T_r - \tau \log\left(1 - \frac{1}{\tau x}\right)}  (Gerstner, 2000)    (A19)

where

T_r, the absolute refractory time, is set to 1 ms.

\tau is the membrane time constant (where 1/\tau determines the cell's firing threshold). The threshold is normally set to half the present maximum activity in the layer (in V4, this is half the maximum within the feature type in order that the object-based attention differences between features are not lost due to the normalising effect of this function).

LIP contains the same number of neurons as V4 and has reciprocal connections with both the orientation and colour layers in V4. The size of V4 and LIP is determined by the size of the retinal image, which is flexible, but is normally set to a radius of 220 pixels for simulations of the scan path. If monochromatic stimuli are used (normally only for single cell simulations), V4 contains only orientation selective assemblies.

A.3.1. V4

V4 consists of a three-dimensional matrix of pyramidal cell assemblies. The first two dimensions represent the retinotopic arrangement and the other represents the individual feature types: orientations and colours. In the latter dimension, there are K + C cell assemblies, as shown in Fig. 2c. The first K assemblies are each tuned to one of K orientations. The next C cell assemblies are each tuned to a particular colour. Two orientations (vertical and horizontal) and two colours (red and green) are normally used here. Two sets of inhibitory interneuron pools exist: one set mediates competition between orientations and the other mediates competition between colours.
V4 receives convergent input from V1 over the area of its receptive field with a latency of 60 ms to reflect normal response latencies (Luck et al., 1997). In order to simulate the normalisation of inputs occurring during retinal, LGN and V1 processing, the V1 inputs to V4 are normalised by passing the convergent inputs to each V4 assembly through the response function at Eq. (A19) with its threshold set to a value equivalent to an input activity for approximately half a stimulus within its receptive field.

A.3.1.1. Form processing in V4. The output from the V1 complex cell process, I_{ijk}, for each position (i, j) at orientation k, provides the bottom-up input to orientation selective pyramidal assemblies in V4 that evolve according to the following dynamics

\tau_1 \frac{d}{dt} W_{ijk}(t) = -W_{ijk}(t) + aF(W_{ijk}(t)) - bF(W^{IK}_{ij}(t)) + \chi \sum_{pq} I_{pqk}(t) + \gamma Y_{ij}(t) + \eta \sum_m B_{W_{ijk}X_m} \cdot F(X_m(t)) + I_0 + \nu    (A20)

where

\tau_1 is set to 20 ms.
a is the parameter for excitatory input from other cells in the pool, set to 0.95.
b is the parameter for inhibitory interneuron input, set to 10.
I_{pqk} is the input from the V1 simple cell edge detection process at all positions within the V4 receptive field area (p, q), and of preferred orientation k.
\chi is the parameter for V1 inputs, set to 4.
Y_{ij} is the input from the posterior parietal LIP module, reciprocally connected to V4.
\gamma is the parameter for LIP inputs, set to 3.
X_m is the feedback from IT cell populations via weight B_{W_{ijk}X_m}, described later.
\eta is the parameter representing the strength of object-related feedback from IT (Fig. 7); normally set to 5, but set to 2.5 for simulation of single cell recordings in IT (Chelazzi et al., 1993).
I_0 is a background current injected in the pool, set to 0.25.
\nu is additive noise, which is randomly selected from a uniform distribution on the interval (0, 0.1).

The dynamic behaviour of the associated inhibitory pool for orientation-selective cell assemblies in V4 is given by

\tau_1 \frac{d}{dt} W^{IK}_{ij}(t) = -W^{IK}_{ij}(t) + \lambda \sum_k F(W_{ijk}(t)) - \mu F(W^{IK}_{ij}(t))    (A21)

where

\lambda is the parameter for pyramidal cell assembly input, set to 1.
\mu is the parameter for inhibitory interneuron input, set to 1.

Over time, this results in local competition between different orientation selective cell assemblies.

A.3.1.2. Colour processing in V4. The output from the V1 blob cell process, I_{ijc}, for each position (i, j) and colour c, provides the bottom-up input to colour selective pyramidal assemblies in V4 that evolve according to the following dynamics

\tau_1 \frac{d}{dt} W_{ijc}(t) = -W_{ijc}(t) + aF(W_{ijc}(t)) - bF(W^{IC}_{ij}(t)) + \chi \sum_{pq} I_{pqc}(t) + \gamma Y_{ij}(t) + \eta \sum_m B_{W_{ijc}X_m} \cdot F(X_m(t)) + I_0 + \nu    (A22)

where

I_{pqc} is the input from the V1 blob cells at all positions within the V4 receptive field area (p, q), and of preferred colour c.
X_m is the feedback from IT cell populations via weight B_{W_{ijc}X_m}, described later.

The remaining terms are the same as those in Eq. (A20).

The dynamic behaviour of the associated inhibitory pool for colour-selective cell assemblies in V4 is given by:

\tau_1 \frac{d}{dt} W^{IC}_{ij}(t) = -W^{IC}_{ij}(t) + \lambda \sum_c F(W_{ijc}(t)) - \mu F(W^{IC}_{ij}(t))    (A23)

Parameters take the same values as those in Eq. (A21). Over time, this results in local competition between different colour selective cell assemblies.

A.3.2. IT

Neuronal assemblies in IT are assumed to represent anterior IT (for example, area TE), where receptive fields cover the entire retinal image and populations encode invariant representations of objects. The model IT encodes all possible objects, i.e. feature combinations, and receives feedforward feature inputs from V4 with a latency of 80 ms to reflect normal response latencies (Wallis & Rolls, 1997). V4 inputs to IT are normalised by dividing the total input to each IT assembly by the total number of active (i.e. non-zero) inputs. IT also feeds back an object bias to V4. The strength of these connections is given by the following weights, which are set by hand (to -1 or 0, as appropriate, for inhibitory feedback, although the model may also be implemented with excitatory feedback: 0, +1) to represent prior object learning. These simple matrices reflect the type of weights that would be achieved through Hebbian learning without the need for a lengthy learning procedure (such as Deco, 2001), which is not the aim of this work. The result is that the connections that are active for excitatory feedback (or inactive for inhibitory feedback) are those features relating to the object.
V4 cell assemblies to IT (feedforward): A_{X_m W_{ijz}}

IT to V4 cell assemblies (feedback): B_{W_{ijz} X_m}

where z indicates orientation, k, or colour, c.

The pyramidal cell assemblies in IT evolve according to the following dynamics

\tau_1 \frac{d}{dt} X_m(t) = -X_m(t) + aF(X_m(t)) - bF(X^I(t)) + \chi \sum_{ijk} A_{X_m W_{ijk}} \cdot F(W_{ijk}(t)) + \chi \sum_{ijc} A_{X_m W_{ijc}} \cdot F(W_{ijc}(t)) + \gamma P_{vM}(t) + I_0 + \nu    (A24)

where

b is the parameter for inhibitory interneuron input, set to 0.01.
W_{ijk} is the feedforward input from V4 relating to orientation information, via weight A_{X_m W_{ijk}}.
W_{ijc} is the feedforward input from V4 relating to colour information, via weight A_{X_m W_{ijc}}.
\chi is the parameter for V4 inputs, set to 2.5.
P_{vM} is the object-related feedback current from ventrolateral prefrontal cortex, injected directly into this pool.

This feedback is sigmoidal over time as follows:

P_{vM} = -1/(1 + \exp(t_{sig} - t))    (A25)

where t = time (in milliseconds) and t_{sig} is the point in time where the sigmoid reaches half its peak value, normally set to 150 ms for scan path simulations.

\gamma is the parameter for the object-related bias, set to 1.2.

The remaining terms and parameters are evident from previous equations.

The dynamic behaviour of the associated inhibitory pool in IT is given by

\tau_1 \frac{d}{dt} X^I(t) = -X^I(t) + \lambda \sum_m F(X_m(t)) - \mu F(X^I(t))    (A26)

where

\lambda is the parameter for pyramidal cell assembly input, set to 3.
\mu is the parameter for inhibitory interneuron input, set to 1.

A.3.3. LIP

The pyramidal cell assemblies in LIP evolve according to the following dynamics

\tau_1 \frac{d}{dt} Y_{ij}(t) = -Y_{ij}(t) + aF(Y_{ij}(t)) - bF(Y^I(t)) + \chi \sum_k F(W_{ijk}(t)) + \epsilon \sum_c F(W_{ijc}(t)) + \gamma P_{dij}(t) + \eta \sum_{pq} Z_{pq} + I_0 + \nu    (A27)

where

b is the parameter for inhibitory input, set to 1.
W_{ijk} is the orientation input from V4 for orientation k, at location (i, j).
W_{ijc} is the colour input from V4 for colour c, at location (i, j).
\epsilon > \chi so that colour-related input from V4 is stronger than orientation-related input, in order to attract the scan path to target coloured locations. Set to \chi = 0.8, \epsilon = 4.
P_{dij} is the top-down bias, i.e. spatial feedback current from FEF, dorsolateral prefrontal cortex or pulvinar, injected directly into this pool when there is a requirement to attend to this spatial location. Here, when fixation is established, this spatial bias is applied and the spatial AW is formed.
\gamma is the parameter for the spatial top-down bias, set to 2.5.
Z_{pq} is the bias from the area, pq, of the novelty map (which is the size of the original image, N) that represents the receptive field. Area pq represents the size of the LIP receptive field.
\eta is the parameter for the novelty bias, normally set to 0.0009 (Fig. 8).

The remaining terms are evident from previous equations.

The dynamic behaviour of the associated inhibitory pool in LIP is given by

\tau_1 \frac{d}{dt} Y^I(t) = -Y^I(t) + \lambda \sum_{ij} F(Y_{ij}(t)) - \mu F(Y^I(t))    (A28)

where

\lambda is the parameter for pyramidal cell assembly input, set to 1.
\mu is the parameter for inhibitory interneuron input, set to 1.
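The prefrontal feedback current of Eq. (A25) is a sigmoid in time; a direct transcription, for the inhibitory-feedback implementation with peak value -1, is:

```python
import math

def prefrontal_feedback(t, t_sig=150.0):
    """Object-related feedback current P_vM of Eq. (A25) at time t (ms);
    t_sig is where the sigmoid reaches half its peak value (normally
    150 ms for scan path simulations)."""
    return -1.0 / (1.0 + math.exp(t_sig - t))
```

The current is negligible early in the trial and approaches its full value well after t_sig, so the object bias on IT develops gradually rather than being imposed at stimulus onset, consistent with the late emergence of object-based attention in the model.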
References Crick, F. (1984). Function of the thalamic reticular complex: The


searchlight hypothesis. Proceedings of the National Academy of
Science, USA, 81, 4586–4590.
Anand, S., & Bridgeman, B. (2002). An unbiased measure of the Danziger, S., Kingstone, A., & Snyder, J. J. (1998). Inhibition of return to
contributions of chroma and luminance to saccadic suppression of successively stimulated locations in a sequential visual search
displacement. Experimental Brain Research, 142(3), 335 –341. paradigm. Journal of Experimental Psychology: Human Perception
Andersen, R. A., & Zipser, D. (1988). The role of the posterior parietal and Performance, 24(5), 1467–1475.
cortex in coordinate transformations for visual-motor integration. Deco, G. (2001). Biased competition mechanisms for visual attention in a
Canadian Journal of Physiology and Pharmacology, 66(4), 488– 501. multimodular neurodynamical system. In S. Wermter, J. Austin, & D.
Anllo-Vento, L., & Hillyard, S. A. (1996). Selective attention to the color Willshaw (Eds.), Emergent neural computational architectures based
and direction of moving stimuli: Electrophysiological correlates of on neuroscience: Towards neuroscience-inspired computing (pp.
hierarchical feature selection. Perception and Psychophysics, 58(2), 114– 126). Heidelberg: Springer.
191–206. Deco, G., & Lee, T. S. (2002). A unified model of spatial and object
Bacon, W. J., & Egeth, H. E. (1997). Goal-directed guidance of attention based on inter-cortical biased competition. Neurocomputing,
attention: Evidence form conjunctive visual search. Journal of 44– 46, 775–781.
Experimental Psychology: Human Perception and Performance, De Frockert, J. W., Rees, G., Frith, C. D., & Lavie, N. (2001). The role of
23(4), 948–961. working memory in visual selective attention. Science, 291,
Ben Hamed, S., Duhamel, J. R., Bremmer, F., & Graf, W. (1996). Dynamic 1803–1806.
changes in visual receptive field organisation in the macaque lateral De Kamps, M., & Van der Velde, F. (2001). Using a recurrent network to
intraparietal area (LIP) during saccade preparation. Society of bind form, color and position into a unified percept. Neurocomputing,
Neuroscience, Abstracts Part 2, 1619. 38– 40, 523–528.
Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object Desimone, R. (1998). Visual attention mediated by biased competition in
through feature space. Nature, 408. extrastriate visual cortex. Philosophical Transactions of the Royal
Blatt, G. L., Andersen, R. A., & Stoner, G. R. (1990). Visual receptive field Society of London B, 353, 1245–1255.
organization and cortico-cortical connections of the lateral intraparietal Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual
area (LIP) in the macaque. Journal of Computational Neurology, 299, attention. Annual Review of Neuroscience, 18, 193– 222.
421–445. Deubel, H., Bridgeman, B., & Schneider, W. X. (1998). Immediate post-
Brefczynski, J., & DeYoe, E. A. (1999). A physiological correlate of the saccadic information mediates space constancy. Vision Research,
spotlight of visual attention. Nature Neuroscience, 2, 370–374. 38(20), 3147–3159.
Bricolo, E., Gianesini, T., Fanini, A., Bundesen, C., & Chelazzi, L. (2002). Duhamel, J.-R., Colby, C. L., & Goldberg, M. E. (1992). The updating of
Serial attention mechanisms in visual search: A direct behavioural the representation of visual space in parietal cortex by intended eye
demonstration. Journal of Cognitive Neuroscience, 14(7), 980 –993. movements. Science, 255, 90– 92.
Bushnell, M. C., Goldberg, M. E., & Robinson, D. L. (1981). Behavioural Duncan, J. (1984). Selective attention and the organisation of visual
enhancement of visual responses in monkey cerebral cortex. information. Journal of Experimental Psychology: General, 113,
I. Modulation in posterior parietal cortex related to selective visual 501– 517.
attention. Journal of Neurophysiology, 46(4), 755–772. Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus
Chawla, D., Rees, G., & Friston, K. J. (1999). The physiological basis of similarity. Psychological Review, 96(3), 433–458.
attentional modulation in extrastriate visual areas. Nature Neuro- Duncan, J., Humphreys, G. W., & Ward, R. (1997). Competitive brain
science, 2(7), 671–676. activity in visual attention. Current Opinion in Neurobiology, 7,
Chelazzi, L., Miller, E. K., Duncan, J., & Desimone, R. (1993). A neural 255– 261.
basis for visual search in inferior temporal cortex. Nature, 363, Everling, S., Tinsley, C. J., Gaffan, D., & Duncan, J. (2002). Filtering of
345–347. neural signals by focused attention in the monkey prefrontal cortex.
Chelazzi, L., Miller, E. K., Duncan, J., & Desimone, R. (2001). Responses Nature Neuroscience, 5(7), 671 –676.
of neurons in macaque area V4 during memory-guided visual search. Findlay, J. M., Brown, V., & Gilchrist, I. D. (2001). Saccade target
Cerebral Cortex, 11, 761–772. selection in visual search: The effect of information from the previous
Clark, V. P., & Hillyard, S. A. (1996). Spatial selective attention affects fixation. Vision Research, 41(1), 87–95.
early extrastriate but not striate components of the visual evoked Fink, G. R., Dolan, R. J., Halligan, P. W., Marshall, J. C., & Frith, C. D.
potential. Journal of Cognitive Neuroscience, 8(5), 387– 402. (1997). Space-based and object-based visual attention: Shared and
Clohessy, A. B., Posner, M. I., Rothbart, M. K., & Vecera, S. P. (1991). The specific neural domains. Brain, 120, 2013–2028.
development of inhibition of return in early infancy. Journal of Folk, C. L., & Remington, R. (1999). Can new objects override attentional
Cognitive Neuroscience, 3, 345 –350. control settings? Perception and Psychophysics, 61(4), 727–739.
Colby, C. L., Duhamel, J. R., & Goldberg, M. E. (1996). Visual, Gerstner, W. (2000). Population dynamics of spiking neurons: Fast
presaccadic, and cognitive activation of single neurons in monkey transients, asynchronous states, and locking. Neural Computation, 12,
lateral intraparietal area. Journal of Neurophysiology, 76(5), 43– 89.
2841–2852. Ghandi, S. P., Heeger, D. J., & Boynton, G. M. (1999). Spatial attention
Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal affects brain activity in human primary visual cortex. Proceedings of the
cortex. Annual Review of Neuroscience, 22, 319– 349. National Academy of Science, USA, 96, 3314–3319.
Connor, C. E., Callant, J. L., Preddie, D. C., & Van Essen, D. C. (1996). Ghose, G. M., & Ts’O, D. Y. (1997). Form processing modules in primate
Responses in area V4 depend on the spatial relationship between area V4. The Journal of Neurophysiology, 77(4), 2191–2196.
stimulus and attention. Journal of Neurophysiology, 75, 1306–1308. Gottlieb, J. P., Kusunoki, M., & Goldberg, M. E. (1998). The representation
Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, of visual salience in monkey parietal cortex. Nature, 391, 481 –484.
G. L. (2000). Voluntary orienting is dissociated from target detection in Greene, H. H., & Rayner, K. (2001). Eye movements and familiarity effects
human posterior parietal cortex. Nature Neuroscience, 3, 292–297. in visual search. Vision Research, 41, 3763– 3773.
Corbetta, M., Shulman, G. L., Miezin, F. M., & Petersen, S. E. (1995). Grossberg, S., & Raizada, R. D. S. (2000). Contrast-sensitive perceptual
Superior parietal cortex activation during spatial attention shifts and grouping and object-based attention in the laminar circuits of primary
visual feature conjunction. Science, 270, 802–805. visual cortex. Vision Research, 40, 1413– 1432.
896 L.J. Lanyon, S.L. Denham / Neural Networks 17 (2004) 873–897

Hamker, F. H. (1998). The role of feedback connections in task-driven Mazzoni, P., Andersen, R. A., & Jordan, M. I. (1991). A more biologically
visual search. In D. Von Heinke, G. W. Humphreys, & A. Olson (Eds.), plausible learning rule than backpropagation applied to a network
Connectionist models in cognitive neuroscience: proceedings of the fifth model of cortical area 7a. Cerebral Cortex, 1(4), 293–307.
neural computation and psychology workshop (NCPW’98) (pp. McAdams, C. J., & Maunsell, J. H. R. (2000). Attention to both space and
252– 261). London: Springer. feature modulates neuronal responses in macaque area V4. Journal of
Helmholtz, H. v. (1867). Handbuch der physiologishen optik. Leipzig: Neurophysiology, 83(3), 1751–1755.
Voss. McPeek, R. M., Skavenski, A. A., & Nakayama, K. (2000). Concurrent
Hodgson, T. L., Mort, D., Chamberlain, M. M., Hutton, S. B., O’Neill, K. S., processing of saccades in visual search. Vision Research, 40(18),
& Kennard, C. (2002). Orbitofrontal cortex mediates inhibition of 2499–2516.
return. Neuropsychologia, 1431, 1 –11. Miller, E. K., Erickson, C. A., & Desimone, R. (1996). Neural mechanisms
Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in of visual working memory in prefrontal cortex of the macaque. The
saccadic eye movements. Perception and Psychophysics, 57(6), Journal of Neuroscience, 16(16), 5154–5167.
787– 795. Miller, E., Gochin, P., & Gross, C. (1993). Suppression of visual responses
Hooge, I. T., & Erkelens, C. J. (1999). Peripheral vision and oculomotor of neurons in inferior temporal cortex of the awake macaque by addition
control during visual search. Vision Research, 39(8), 1567–1575. of a second stimulus. Brain Research, 616, 25–29.
Hooge, I. T., & Frens, M. A. (2000). Inhibition of saccade return (ISR): Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford:
Spatio-temporal properties of saccade programming. Vision Research, Oxford University Press.
40(24), 3415–3426. Mitchell, J. F., Stoner, G. R., Fallah, M., & Reynolds, J. H. (2003).
Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural Attentional selection of superimposed surfaces cannot be explained by
mechanisms of top-down attentional control. Nature Neuroscience, modulation of the gain of color channels. Vision Research, 43,
3(3), 284–291. 1323–1325.
Horowitz, T. S., & Wolfe, J. M. (1998). Visual search has no memory. Mitchell, J., & Zipser, D. (2001). A model of visual-spatial memory across
Nature, 394, 575–577. saccades. Vision Research, 41, 1575–1592.
Humphreys, G. W., Cinel, C., Wolfe, J., Olson, A., & Klempen, N. (2000). Moore, T., & Armstrong, K. M. (2003). Selective gating of visual signals by
Fractionating the binding process: neuropsychological evidence microstimulation of frontal cortex. Nature, 421, 370–373.
distinguishing binding of form from binding of surface features. Vision Moran, J., & Desimone, R. (1985). Selective attention gates visual
Research, 40, 1569–1596. processing in the extrastriate cortex. Science, 229, 782– 784.
Husain, M., Mannan, S., Hodgson, T., Wojciulik, E., Driver, J., & Kennard, Motter, B. C. (1993). Focal attention produces spatially selective
C. (2001). Impaired spatial working memory across saccades processing in visual cortical areas V1, V2, and V4 in the presence of
contributes to abnormal search in parietal neglect. Brain, 124, competing stimuli. Journal of Neurophysiology, 70(3), 909 –919.
941– 952. Motter, B. C. (1994a). Neural correlates of attentive selection for color or
Irwin, D. E., & Zelinsky, G. J. (2002). Eye movements and scene luminance in extrastriate area V4. The Journal of Neuroscience, 14(4),
perception: Memory for things observed. Perception and Psycho- 2178–2189.
physics, 64(6), 882 –895. Motter, B. C. (1994b). Neural correlates of feature selective memory and
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for pop-out in extrastriate area V4. The Journal of Neuroscience, 14(4),
overt and covert shifts of visual attention. Vision Research, 20, 2190–2199.
1489–1506.
Kastner, S., Pinsk, M., De Weerd, P., Desimone, R., & Ungerleider, L. (1999). Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron, 22, 751–761.
Kusunoki, M., Gottlieb, J., & Goldberg, M. E. (2000). The lateral intraparietal area as a salience map: The representation of abrupt onset, stimulus motion, and task relevance. Vision Research, 40, 1459–1468.
Lanyon, L. J., & Denham, S. L. (2004a). A biased competition computational model of spatial and object-based attention mediating active visual search. Neurocomputing, in press.
Lanyon, L. J., & Denham, S. L. A biased competition model of spatial and object-based attention mediating active visual search. Submitted for publication.
Law, M. B., Pratt, J., & Abrams, R. A. (1995). Color-based inhibition of return. Perception and Psychophysics, 57(3), 402–408.
Lubow, R. E., & Kaplan, O. (1997). Visual search as a function of type of prior experience with targets and distractors. Journal of Experimental Psychology: Human Perception and Performance, 23, 14–24.
Luck, S. J., Chelazzi, L., Hillyard, S. A., & Desimone, R. (1997). Neural mechanisms of spatial attention in areas V1, V2 and V4 of macaque visual cortex. Journal of Neurophysiology, 77, 24–42.
Lynch, J. C., Graybiel, A. M., & Lobeck, L. J. (1985). The differential projection of two cytoarchitectonic subregions of the inferior parietal lobule of macaque upon the deep layers of the superior colliculus. Journal of Comparative Neurology, 235, 241–254.
Martinez, A., Anllo-Vento, L., Sereno, M. I., Frank, L. R., Buxton, R. B., Dubowitz, D. J., Wong, E. C., Hinrichs, H., Heinze, H. J., & Hillyard, S. A. (1999). Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature Neuroscience, 2(4), 364–369.
Motter, B. C., & Belky, E. J. (1998a). The zone of focal attention during active visual search. Vision Research, 38(7), 1007–1022.
Motter, B. C., & Belky, E. J. (1998b). The guidance of eye movements during active visual search. Vision Research, 38(12), 1805–1815.
Mounts, J. R. W., & Melara, R. D. (1999). Attentional selection of objects or features: Evidence from a modified search task. Perception and Psychophysics, 61(2), 322–341.
Niebur, E., Itti, L., & Koch, C. (2001). Controlling the focus of visual selective attention. In J. L. Van Hemmen, J. Cowan, & E. Domany (Eds.), Models of neural networks IV (pp. 247–276). New York: Springer.
O'Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). fMRI evidence for objects as the units of attentional selection. Nature, 401, 584–587.
Olshausen, B. A., Anderson, C. H., & Van Essen, D. C. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13, 4700–4719.
Pinilla, T., Cobo, A., Torres, K., & Valdes-Sosa, M. (2001). Attentional shifts between surfaces: Effects on detection and early brain potentials. Vision Research, 41, 1619–1630.
Posner, M. I., Walker, J. A., Friedrich, F. J., & Rafal, R. D. (1984). Effects of parietal injury on covert orienting of attention. Journal of Neuroscience, 4, 1863–1874.
Quaia, C., Optican, L. M., & Goldberg, M. E. (1998). The maintenance of spatial accuracy by the perisaccadic remapping of visual receptive fields. Neural Networks, 11(7/8), 1229–1240.
Rainer, G., & Miller, E. K. (2002). Time course of object-related neural activity in the primate prefrontal cortex during a short-term memory task. European Journal of Neuroscience, 15(7), 1244–1254.
Reicher, G. M., Snyder, C. R. R., & Richards, J. T. (1976). Familiarity of background characters in visual scanning. Journal of Experimental Psychology: Human Perception and Performance, 2, 522–530.
Renart, A., Moreno, R., de la Rocha, J., Parga, N., & Rolls, E. T. (2001). A model of the IT-PF network in object working memory which includes balanced persistent activity and tuned inhibition. Neurocomputing, 38–40, 1525–1531.
Reynolds, J. H., Alborzian, S., & Stoner, G. R. (2003). Exogenously cued attention triggers competitive selection of surfaces. Vision Research, 43(1), 59–66.
Reynolds, J. H., Chelazzi, L., & Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. The Journal of Neuroscience, 19(5), 1736–1753.
Richards, J. T., & Reicher, G. M. (1978). The effect of background familiarity in visual search. Perception and Psychophysics, 23, 499–505.
Robinson, D. L., Bowman, E. M., & Kertzman, C. (1995). Covert orienting of attention in macaques. II. Contributions of parietal cortex. Journal of Neurophysiology, 74(2), 698–712.
Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (1998). Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395, 376–381.
Rolls, E., & Deco, G. (2002). Computational neuroscience of vision. Oxford, UK: Oxford University Press.
Romo, R., Brody, C. D., Hernandez, A., & Lemus, L. (1999). Neuronal correlates of parametric working memory in the prefrontal cortex. Nature, 399(6735), 470–473.
Ross, J., Morrone, M. C., & Burr, D. C. (1997). Compression of visual space before saccades. Nature, 386, 598–601.
Ross, J., Morrone, M. C., Goldberg, M. E., & Burr, D. C. (2001). Changes in visual perception at the time of saccades. Trends in Neurosciences, 24(2), 113–121.
Saenz, M., Buracas, G. T., & Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nature Neuroscience, 5(7), 631–632.
Sapir, A., Soroker, N., Berger, A., & Henik, A. (1999). Inhibition of return in spatial attention: Direct evidence for collicular generation. Nature Neuroscience, 2, 1053–1054.
Schall, J. D. (2002). The neural selection and control of saccades by the frontal eye field. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 357, 1073–1082.
Scialfa, C. T., & Joffe, K. M. (1998). Response times and eye movements in feature and conjunction search as a function of target eccentricity. Perception and Psychophysics, 60(6), 1067–1082.
Shen, J., Reingold, E. M., & Pomplun, M. (2000). Distractor ratio influences patterns of eye movements during visual search. Perception, 29(2), 241–250.
Shen, J., Reingold, E. M., & Pomplun, M. (2003). Guidance of eye movements during conjunctive visual search: The distractor ratio effect. Canadian Journal of Experimental Psychology, 57(2), 76–96.
Snyder, J. J., & Kingstone, A. (2000). Inhibition of return and visual search: How many separate loci are inhibited? Perception and Psychophysics, 62(3), 452–458.
Snyder, J. J., & Kingstone, A. (2001). Inhibition of return at multiple locations in visual search: When you see it and when you don't. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 54(4), 1221–1237.
Somers, D. C., Dale, A. M., Seiffert, A. E., & Tootell, R. B. H. (1999). Functional MRI reveals spatially specific attentional modulation in human primary visual cortex. Proceedings of the National Academy of Sciences, USA, 96, 1663–1668.
Thiele, A., Henning, P., Kubischik, M., & Hoffmann, K. P. (2002). Neural mechanisms of saccadic suppression. Science, 295(5564), 2460–2462.
Thier, P., & Andersen, R. A. (1998). Electrical microstimulation distinguishes distinct saccade-related areas in the posterior parietal cortex. Journal of Neurophysiology, 80, 1713–1735.
Tipper, S. P., Weaver, B., & Watson, F. L. (1996). Inhibition of return to successively cued spatial locations: Commentary on Pratt and Abrams (1995). Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1289–1293.
Tolias, A. S., Moore, T., Smirnakis, S. M., Tehovnik, E. J., Siapas, A. G., & Schiller, P. H. (2001). Eye movements modulate visual receptive fields of V4 neurons. Neuron, 29, 757–767.
Toth, L. J., & Assad, J. A. (2002). Dynamic coding of behaviourally relevant stimuli in parietal cortex. Nature, 415, 165–168.
Trappenberg, T. P., Dorris, M. C., Munoz, D. P., & Klein, R. M. (2001). A model of saccade initiation based on the competitive integration of exogenous and endogenous signals in the superior colliculus. Journal of Cognitive Neuroscience, 13, 256–271.
Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194–214.
Treisman, A. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40, 201–237.
Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Treue, S., & Martinez Trujillo, J. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399(6736), 575–579.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. W. J. Mansfield (Eds.), Analysis of visual behaviour (pp. 549–586). Cambridge, MA: MIT Press.
Usher, M., & Niebur, E. (1996). Modeling the temporal dynamics of IT neurons in visual search: A mechanism for top-down selective attention. Journal of Cognitive Neuroscience, 8(4), 311–327.
Valdes-Sosa, M., Bobes, M. A., Rodriguez, V., & Pinilla, T. (1998). Switching attention without shifting the spotlight: Object-based attentional modulation of brain potentials. Journal of Cognitive Neuroscience, 10(1), 137–151.
Valdes-Sosa, M., Cobo, A., & Pinilla, T. (2000). Attention to object files defined by transparent motion. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 488–505.
Wallis, G., & Rolls, E. T. (1997). Invariant face and object recognition in the visual system. Progress in Neurobiology, 51, 167–194.
Wang, Q., Cavanagh, P., & Green, M. (1994). Familiarity and popout in visual search. Perception and Psychophysics, 56, 495–500.
Williams, D. E., & Reingold, E. M. (2001). Preattentive guidance of eye movements during triple conjunction search tasks: The effects of feature discriminability and saccadic amplitude. Psychonomic Bulletin and Review, 8(3), 476–488.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433.
Woodman, G. F., Vogel, E. K., & Luck, S. J. (2001). Visual search remains efficient when visual working memory is full. Psychological Science, 12(3), 219–224.
Zeki, S. (1993). A vision of the brain. Oxford, UK: Blackwell.
Zhu, J. J., & Lo, F. S. (1996). Time course of inhibition induced by a putative saccadic suppression circuit in the dorsal lateral geniculate nucleus of the rabbit. Brain Research Bulletin, 41(5), 281–291.
Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331(6158), 679–684.