A Feedback Model of Visual Attention: M. W. Spratling and M. H. Johnson

A Feedback Model of Visual Attention
M. W. Spratling and M. H. Johnson
Abstract
Feedback connections are a prominent feature of cortical anatomy and are likely to have a significant functional role in neural information processing. We present a neural network model of cortical feedback that successfully simulates neurophysiological data associated with attention. In this domain, our model can be considered a more detailed, and biologically plausible, implementation of the biased competition model of attention. However, our model is more general as it can also explain a variety of other top-down processes in vision, such as figure/ground segmentation and contextual cueing. This model thus suggests that a common mechanism, involving cortical feedback pathways, is responsible for a range of phenomena and provides a unified account of currently disparate areas of research.
INTRODUCTION
Top-down effects play an important role in sensory information processing (Siegel, Körding, & König, 2000). For example, during visual perception, information propagates through the visual processing hierarchy from primary sensory areas to higher cortical regions. In addition to this feedforward transmission of information, feedback connections convey information in the reverse direction and lateral connections integrate information across the visual field (Lamme, Supèr, & Spekreijse, 1998; Lamme & Roelfsema, 2000). Feedforward and feedback connections can be distinguished by the cortical layers from which they originate and in which they terminate, as illustrated in Figure 1 (Budd, 1998; Crick & Koch, 1998; Lamme et al., 1998; Mountcastle, 1998; Barbas & Rempel-Clower, 1997; Johnson & Burkhalter, 1997; Ebdon, 1996; Mumford, 1992; Felleman & Van Essen, 1991). Feedforward connections are provided by the axon projections of pyramidal cells in Layers II and III. These projections terminate predominantly in Layer IV of the higher region (as do inputs from the thalamus). The main targets of the feedforward projections are spiny stellate cells that, in turn, target the basal dendrites of pyramidal cells in Layers II and III. Feedback connections are provided by the axon projections from pyramidal cells in Layers V and VI and terminate mainly in Layers I and VI of the lower region (or are sent to subcortical structures). The main targets of the feedback projections terminating in Layer I are the apical dendrites of pyramidal cells with somata in Layers II, III, and V (Budd, 1998; Cauller, Clancy, & Connors, 1998;
Birkbeck College. © 2004 Massachusetts Institute of Technology
Rockland, 1998; Rolls & Treves, 1998; Cauller, 1995). In general, cortical regions tend to be reciprocally connected (Lamme et al., 1998; Mountcastle, 1998; Felleman & Van Essen, 1991; Crick & Asanuma, 1986). Attention is one top-down process that operates via the cortical feedback projections (Schroeder, Mehta, & Foxe, 2001; Treue, 2001; Mehta, Ulbert, & Schroeder, 2000; Pollen, 1999; Desimone & Duncan, 1995) targeting the apical dendrites in Layer I (Olson, Chun, & Allison, 2001; Cauller, 1995). By manipulating attention or expectation, it is possible to explore the effects of feedback on the response properties of cortical pyramidal cells. Attention modulates the sensory-driven activation of cells (Kanwisher & Wojciulik, 2000; McAdams & Maunsell, 2000; Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999; Luck, Chelazzi, Hillyard, & Desimone, 1997), such that activity in response to an attended stimulus is increased in amplitude and duration (Schroeder et al., 2001; Kastner & Ungerleider, 2000). Such top-down modulation also affects the ongoing competition between cells (Itti & Koch, 2001; Reynolds, Chelazzi, & Desimone, 1999; Luck et al., 1997). Competition can be biased by both stimulus saliency and attention (Olson, 2001; Kastner & Ungerleider, 2000; De Weerd, Peralta, Desimone, & Ungerleider, 1999). Hence, increased attention has effects similar to increasing the contrast or saliency of the stimulus (Itti & Koch, 2001; Kastner & Ungerleider, 2000; Olson, 2001; Reynolds, Pasternak, & Desimone, 2000). Feedback thus serves to amplify and focus activity (Hupé et al., 1998). All connectionist models of perception investigate how feedforward, sensory-driven, information is processed and represented. However, despite the apparent importance of top-down processes in perception, relatively few models have investigated the role of feedback. In this article we present a neural network model
Journal of Cognitive Neuroscience 16:2, pp. 219–237
Figure 1. Cortical layers and regions. A schematic showing pyramidal cells within the six layers of the cortical sheet. Pyramidal cell bodies are shown as filled triangles, dendrites as solid lines, and axons as dashed lines. All other cell types have been omitted, as has the intraregional connectivity. The cortical sheet is shown divided into two regions at different levels in an information processing hierarchy. Axon projections connecting these two regions are illustrated.
of visual processing that does incorporate feedback connections. Because feedback plays a prominent role in attention, we evaluate the performance of our model by comparison with the response properties of cortical pyramidal cells during attentional tasks. Similar effects of feedback, or recurrent, activity on neural responses can also be observed in nonattentional tasks (Lamme & Roelfsema, 2000; Lamme et al., 1998; Lee, Mumford, Romero, & Lamme, 1998; Zipser, Lamme, & Schiller, 1996). We thus propose that the same underlying mechanisms are responsible for a variety of other top-down processes in vision. We demonstrate this claim by using the same neural network model to explain empirical data associated with figure/ground segmentation, feature binding, and contextual cueing. Hence, our model suggests that a common mechanism, involving cortical feedback pathways, is responsible for a range of phenomena in visual perception and could potentially provide a unified account of currently disparate areas of research.
The Biased Competition Model
The biased competition model (Reynolds et al., 1999; Duncan, 1998; Desimone & Duncan, 1995) is a leading account of the neurophysiological data associated with attention (Frith, 2001; Kastner & Ungerleider, 2000). This theory proposes that visual stimuli compete to be represented by cortical activity. Competition may occur at each stage along the visual information processing pathway. The outcome of the competition is influenced not only by bottom-up, sensory-driven processes but also by top-down, attention-dependent biases toward the information that is most relevant to current behavioral goals. A neural network architecture to implement this model has been proposed (Reynolds et al., 1999; Reynolds & Desimone, 1999) and is shown in Figure 2a. In
this network, two input neurons, with distinct stimulus selectivities, project both excitatory and inhibitory connections to a single output neuron in a subsequent stage of the cortical hierarchy. The response of the output node is dependent on the activation of each of the input nodes and the strengths of the excitatory and inhibitory afferents. Attention acts by increasing the synaptic efficiencies of the connections originating from the input neuron selective to the attended stimulus. This neural network architecture was not intended to be an account of the actual neural circuitry underlying visual attention (Reynolds et al., 1999, p. 1752). It is therefore unsurprising that although this implementation succeeds as a descriptive model of the observed behavior, it has shortcomings as a detailed, functional account. First, the time-varying response of this model is
Figure 2. Models of attention. Nodes are shown as large circles, excitatory synapses as small open circles, and inhibitory synapses as small filled circles. The x values represent top-down biases that vary in strength depending on the attentional state. These top-down signals are assumed to arise from neural generators outside the modeled circuits. (a) The Reynolds and Desimone implementation of the biased competition model. In this model, the top-down signals modulate the strength of the feedforward connections. (b) The proposed feedback model. In this model, the top-down signals modulate the strength of activation of the nodes. Feedback connections reciprocating the feedforward connections also exist, but have been omitted from this figure for clarity.
a poor match to that of real cells (see Results). Second, attention acts by multiplicatively modulating the strengths of afferent synapses for which no biologically plausible mechanism is specified. Finally, although this network is proposed as an implementation of the biased competition model there are actually no competitive processes occurring between nodes within this neural architecture. Reynolds and Desimone (1999) suggest that there may be many other possible implementations of the biased competition model. Therefore, we shall refer to the particular neural network architecture described above as the Reynolds and Desimone model to distinguish it from other implementations and from the theory of biased competition itself. In the following section we propose an alternative implementation, which is illustrated in Figure 2b. In this neural network, nodes within a cortical region compete via lateral inhibitory connections and top-down signals modulate the activations of nodes, rather than the efficiencies of synapses. As with the Reynolds and Desimone model, the top-down signals vary in strength depending on the attentional state. These attention-dependent, top-down signals are assumed to originate in higher cortical regions that are not explicitly modeled. However, this should not be taken to imply that these signals arise from a dedicated attentional system, a point we return to in the Discussion. The nodes used by the proposed model have more complex behavior than those used in the Reynolds and Desimone model. Hence, the feedback model is capable of simulating experimental data in greater detail. More importantly, the model proposes a biologically plausible mechanism by which biased competition can operate, and can explain other top-down, perceptual processes that have not previously been thought to share common mechanisms. 
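As a rough illustration of the alternative proposed here, the following sketch lets two nodes compete through lateral inhibition while a top-down signal scales a node's activation rather than any synaptic weight. The update rule, the inhibition strength `beta`, the step size `dt`, and the bias values are illustrative assumptions only; they are not the equations or parameters of the model, which are specified in the Methods section.

```python
def compete(ff, fb, beta=1.5, dt=0.1, steps=500):
    """Two mutually inhibiting nodes (hypothetical dynamics).

    ff: feedforward drive to each node.
    fb: top-down bias to each node; it scales the feedforward drive
        multiplicatively and cannot create activity on its own.
    """
    y = [0.0, 0.0]
    for _ in range(steps):
        # Top-down signal modulates activation, not synaptic weights.
        drive = [ff[i] * (1.0 + fb[i]) for i in range(2)]
        # Each node is inhibited by the current output of the other node.
        target = [max(0.0, drive[i] - beta * y[1 - i]) for i in range(2)]
        # Relax each output toward its target value.
        y = [y[i] + dt * (target[i] - y[i]) for i in range(2)]
    return y


# Equal feedforward input, small top-down bias to the first node.
biased = compete(ff=[1.0, 1.0], fb=[0.1, 0.0])
# Top-down bias with no feedforward input at all.
no_input = compete(ff=[0.0, 0.0], fb=[1.0, 0.0])
print(biased, no_input)
```

Under these assumptions, a small bias is enough for the first node to win the competition and suppress the second, whereas a top-down signal without feedforward input produces no response.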
The Feedback Model
The Reynolds and Desimone model, in common with others (e.g., Olshausen, Anderson, & Van Essen, 1993), uses top-down signals to multiplicatively modulate the synaptic strengths of interregional connections so that attended information can be selectively routed to higher cortical regions. Equivalent results can be achieved by using top-down signals to modulate the activity of neurons rather than weights of synapses (Salinas & Abbott, 1997; Salinas & Thier, 2000). This mechanism has previously been modeled by allowing the activity generated by stimulation of the receptive field to be multiplicatively modulated by the response to a separate set of inputs applied to a gain field (Salinas & Abbott, 1996; Salinas & Thier, 2000; Salinas & Sejnowski, 2001) or a contextual field (Phillips & Singer, 1997; Phillips et al., 1995). These algorithms thus require inputs from different sources to be integrated separately and to have dissimilar effects on activity.
Feedforward and feedback connections preferentially target distinct regions of pyramidal cell dendrites (Spratling, 2002). For example, pyramidal cells in Layers II and III predominantly receive feedforward information at the basal dendrites and feedback information at the apical dendrites. The apical dendrite appears to act as a functionally distinct dendritic compartment because activation applied to the apical dendrites is integrated before transmission to the soma (Körding & König, 2000, 2001; Larkum, Zhu, & Sakmann, 1999). The distal (i.e., apical) and proximal (mainly basal) dendrites of pyramidal cells thus appear to act as separate dendritic compartments (Yuste, Gutnick, Saar, Delaney, & Tank, 1994) capable of independently integrating the feedforward and feedback information that they receive (Spratling, 2002). The axon initial segment acts as the final site of integration, as it is here that action potential initiation occurs (Stuart, Spruston, Sakmann, & Häusser, 1997). However, inputs to the different dendritic regions contribute to this output in different ways (Larkum, Zhu, & Sakmann, 2001). Stimulation of the apical dendrite causes smaller, but more protracted, excitatory postsynaptic potentials (EPSPs) at the soma than does equivalent stimulation of the basal dendrites (Budd, 1998; Rockland, 1998). Hence, apical inputs have weaker effects on output activity than basal inputs. Such findings are consistent with the suggestion that feedback acts to modulate responses that are primarily driven by feedforward inputs (Friston & Büchel, 2000; Koch & Segev, 2000; Crick & Koch, 1998; Hupé et al., 1998). We propose a model in which feedback stimulation is integrated in the apical dendrite and feedforward information is separately integrated in the basal dendrite. The total strength of the top-down activation is then used to multiplicatively modulate the total strength of feedforward activation to determine the final response of the node.
This formulation enables bottom-up, sensory-driven stimulation to drive the response of the node even in the absence of top-down activity. In contrast, feedback activation cannot drive the node's activity in the absence of feedforward activation, but it can amplify any response to feedforward stimulation. The activations of nodes in the model are also affected by lateral inhibitory connections targeting the basal dendrites, via which neurons in the same region compete to represent stimuli (Spratling & Johnson, 2001, 2002). A schematic of the model is shown in Figure 3. For comparison with the Reynolds and Desimone model, a simplified illustration of the feedback model with two input nodes and two output nodes is also shown in Figure 2b. For simplicity the model contains only one layer of cells in each region. Hence, in contrast with the cortex, feedback connections originate from the same neurons that receive feedforward connections. This simplification is justified since pyramidal cells within the same
For clarity, in each simulation we have used the simplest network that can successfully model the physiological data. Hence, appropriate numbers of nodes and synaptic weight values have been chosen for each experiment. However, it should be noted that the architecture of the model remains constant and that every node in every reported simulation operates in the same way (as described in the Methods section). Furthermore, the same parameters have been used throughout all the simulations reported here. Hence, each simulation can be considered to be a specific example of the behavior generated by a single computational model.
Spatial Attention
Attentional Selection
Figure 3. The feedback model in greater detail. A schematic of the proposed model showing two interacting cortical regions. Each region contains one layer of pyramidal cells (somata shown as filled triangles). These model neurons each have two independent, dendritic regions: the basal dendrites, which receive feedforward connections (shown as solid lines), and the apical dendrites, which receive feedback connections (shown as dashed lines). Nodes in different regions are reciprocally connected by feedforward and feedback projections (only the connections between one pair of nodes are shown) and may also be targeted by connections originating from other regions or by sensory inputs. Nodes within the same region compete via lateral inhibitory connections that target the basal dendrites (only selected connections between neighboring nodes are shown). Excitatory and inhibitory synapses are shown as open and filled circles, respectively.
cortical column are believed to have similar response properties. Furthermore, there is insufficient data about the differences between the layers to meaningfully model distinct superficial and deep layer neurons.
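The two-compartment node described in The Feedback Model section can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `node_response`, the `1 + apical` form of the multiplicative term, and the example weights are hypothetical choices made so that feedforward input alone can drive the node; the actual activation equations are those given in the Methods section, and lateral inhibition is omitted here.

```python
def node_response(ff_inputs, ff_weights, fb_inputs, fb_weights):
    """Response of one node with separate basal and apical integration."""
    # Feedforward inputs are summed in the basal dendrite.
    basal = sum(x * w for x, w in zip(ff_inputs, ff_weights))
    # Feedback inputs are summed separately in the apical dendrite.
    apical = sum(x * w for x, w in zip(fb_inputs, fb_weights))
    # Assumed form: total feedback multiplicatively modulates total feedforward.
    return basal * (1.0 + apical)


ff_only = node_response([1.0], [0.8], [0.0], [0.5])  # stimulus, no feedback
fb_only = node_response([0.0], [0.8], [1.0], [0.5])  # feedback, no stimulus
both = node_response([1.0], [0.8], [1.0], [0.5])     # stimulus with feedback
print(ff_only, fb_only, both)
```

With this form, feedforward stimulation produces a response without feedback, feedback alone produces none, and feedback combined with feedforward stimulation amplifies the response.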
RESULTS
Both electrophysiological and psychophysical experiments on attention can be classified into those that explore spatial attention (the effects of directing attention to one location rather than another) and those that explore featural, or object-based, attention (the effects of directing attention to one object, or stimulus feature, rather than another). In addition, experiments may be classified into those that address attentional selection (the role of attention in the selection of one item out of many) and those that address attentional facilitation (the enhancement of processing that occurs for an individual stimulus at the focus of attention). In the Spatial Attention and Featural Attention sections, we demonstrate that the feedback model described above can successfully simulate results from these different experimental paradigms. In addition, we demonstrate in the Familiarity and Context section that the same model can explain experimental data associated with figure/ground segmentation and contextual cueing.
For neurons in the ventral pathway, the response to a stimulus that generates strong neural activation when presented in isolation is reduced by the introduction of a second nonpreferred stimulus within the receptive field (Reynolds et al., 1999). Hence, rather than being processed independently, multiple stimuli within the same receptive field appear to compete in a mutually suppressive manner (Kastner & Ungerleider, 2000). If attention is directed toward one stimulus then the response becomes more similar to the response that would be generated by that stimulus in isolation (Reynolds et al., 1999; Luck et al., 1997; Moran & Desimone, 1985). Hence, attention appears to bias the competition in favor of the attended stimulus. These effects are illustrated in Figure 4a, which shows the response of a single cell recorded in V2 of a rhesus monkey. Similar results have been demonstrated for cells in area V4, inferior temporal cortex, in area MT of the dorsal pathway, and in the prefrontal cortex (Everling, Tinsley, Gaffan, & Duncan, 2002; Reynolds et al., 1999; Reynolds & Desimone, 1999; Treue & Martinez-Trujillo, 1999). The firing rate of the cell is shown in response to a preferred stimulus, to a poor stimulus, and to both stimuli when attention is directed to a location outside the receptive field. When attention is directed to the location occupied by the preferred stimulus in the pair, the response of the cell becomes more similar to that elicited by the preferred stimulus in isolation. The neural network architecture for the Reynolds and Desimone model, shown in Figure 5a, was used to simulate these results. For this simulation, one input node was activated by the preferred stimulus and the other input node was activated by the poor stimulus. The input nodes had different connection strengths with the output node to reflect the stimulus selectivity of the recorded neuron. This architecture generated the results shown in Figure 4b.
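The qualitative behavior of this kind of architecture can be illustrated with a short sketch. It assumes the steady-state form y = BE / (E + I + A) that is usually given for the Reynolds and Desimone model, where E and I are the summed excitatory and inhibitory inputs; the weights and the constants `B` and `A` below are hypothetical and are not the values of Figure 5a. The attentional gain of five applied to the attended input's synapses follows the description of the simulation in the text.

```python
def rd_steady_state(x, w_exc, w_inh, attn_gain=(1.0, 1.0), B=1.0, A=0.1):
    """Assumed steady-state output: y = B*E / (E + I + A)."""
    # Attention scales both the excitatory and inhibitory weights of an input.
    E = sum(g * w * xi for g, w, xi in zip(attn_gain, w_exc, x))
    I = sum(g * w * xi for g, w, xi in zip(attn_gain, w_inh, x))
    return B * E / (E + I + A)


# Hypothetical weights: input 0 is the preferred stimulus, input 1 the poor one.
w_exc = (1.0, 0.2)
w_inh = (0.2, 1.0)

pref = rd_steady_state((1.0, 0.0), w_exc, w_inh)
poor = rd_steady_state((0.0, 1.0), w_exc, w_inh)
pair = rd_steady_state((1.0, 1.0), w_exc, w_inh)
pair_attend_pref = rd_steady_state((1.0, 1.0), w_exc, w_inh, attn_gain=(5.0, 1.0))
print(pref, poor, pair, pair_attend_pref)
```

With these illustrative values, the response to the pair lies between the responses to each stimulus alone, and attending to the preferred stimulus moves the pair response toward the response to the preferred stimulus in isolation.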
It can be seen that the steady-state responses from the model are in good agreement with the experimental data. When both stimuli are present, the output node receives strong excitatory input from the preferred stimulus and strong inhibitory input from the poor
Figure 4. The effect of spatial attention on the response of a neuron. Responses are shown for different combinations of stimuli appearing within the receptive field of a single neuron. (a) The response of a cell in V2 (adapted from Reynolds et al., 1999). (b) Simulation results from the Reynolds and Desimone model. (c) Simulation results from the feedback model. For (a) the response was measured in spikes per second and time in milliseconds; for (b) and (c) both response and time are in arbitrary units and have been scaled to resemble (a).
stimulus, resulting in a response that is intermediate between the responses generated by each stimulus in isolation. When attention is directed to the preferred stimulus this has the effect of multiplying the synaptic efficiencies from the attended input by a factor of five, and causes the response of the output node to become dominated by the attended stimulus. This experiment was also simulated with the feedback model, using the network shown in Figure 5b. For this simulation, each output node had a preference to a different one of the two inputs. The input nodes received feedback connections from the output nodes, which had weights proportional to the corresponding feedforward connections. In addition, the input nodes each received a feedback connection, of weight 0.5, from different attention-dependent sources of feedback. The results of the simulation, showing the response of one output node, are shown in Figure 4c. When only
Figure 5. Details of the neural network architectures used to simulate the data shown in Figure 4. Nodes are shown as large circles, excitatory synapses as small open circles, and inhibitory synapses as small filled circles. The x values represent top-down biases that have an activation value of either zero or one depending on the attentional state. These top-down signals are assumed to arise from neural generators outside the modeled circuits. Feedforward (and lateral) connections are shown as solid lines and feedback connections are shown as dashed lines. Values for the synaptic strengths are indicated; these weights were chosen to provide the best fit between the behavior of each model and the experimental data (see the Methods section). (a) The Reynolds and Desimone model. (b) The feedback model.
one stimulus is presented to the network, the output node with the preference to this input quickly wins the competition. Hence, the recorded node is strongly active in response to its preferred stimulus, but has a weak and brief response to its nonpreferred stimulus. When both stimuli are present, both output nodes are strongly activated and there is ongoing competition between them. Hence, the response of the recorded node to the pair of stimuli is less than its response to its preferred stimulus in isolation. When attention is directed to the preferred stimulus, one of the top-down signals is active, modulating the activity of one input node so that the feedforward activation received by the output nodes is stronger for one input than the other. The output of the node with the preference for this amplified input is therefore enhanced. Note that for both models a similar effect would result if the activities in the two input nodes were made unequal, not by differences in the strength of the attention-dependent biases, but by a difference in the strength of the feedforward activation each node received, for instance, due to the stimuli having unequal contrasts. Hence, a strong bottom-up signal can bias competition in just the same way that top-down modulation can. Such effects have been observed experimentally (Kastner & Ungerleider, 2000; Vecera, 2000; De Weerd et al., 1999; Reynolds & Desimone, 1999). Furthermore, training has been observed to increase the apparent salience of a stimulus (Jagadeesh, Chelazzi, Mishkin, & Desimone, 2001). This effect is explained because training is likely to enhance the neuronal representation of the training stimulus and hence provide that stimulus with a competitive advantage (Jagadeesh et al., 2001).
Attentional Facilitation
When attention is directed to a particular location, the processing of a stimulus appearing at that location is
enhanced (Kirschfeld & Kammer, 2000; Reynolds et al., 2000; Kastner et al., 1999). Increased attention has effects similar to increasing the contrast or saliency of the stimulus (Itti & Koch, 2001; Olson, 2001; Kastner & Ungerleider, 2000; Reynolds et al., 2000). Results from one experiment illustrating this effect are shown in Figure 6a. In this experiment, only one stimulus appeared within the receptive field of the recorded cell. The response of the cell was measured when the location of this stimulus was attended and when it was not attended for varying stimulus contrasts. The data shown were measured from area V4 of the rhesus monkey. The Reynolds and Desimone model (implemented by the network architecture of Figure 7a) accounts for attentional facilitation by modulating the synaptic efficiencies of the afferents projecting to the recorded neuron. Simulation results for this model are shown in Figure 6b. The feedback model can also simulate the effects of facilitation by amplifying the activation of the attended input node. Results for the feedback model, generated using the neural network architecture shown in Figure 7b, are plotted in Figure 6c. Both neural network architectures successfully simulate the change in magnitude of attentional facilitation with stimulus contrast. Attention produces the largest change in response at intermediate stimulus contrasts and has only a weak effect at high contrast. Since suboptimal stimuli were used to generate the physiological data, it appears that weak attentional modulation at high contrast is not due to the firing rate of the recorded cell reaching saturation. The two models explain this effect in different ways. For the Reynolds and Desimone model, as the activation of the input node increases, the response of the output node tends toward the maximum response that is possible given the balance between the excitatory and inhibitory afferents.
Further increasing the effective strength of the input via attention has a diminishing effect as this upper limit is reached. In contrast, in the feedback model it is saturation of the response of the node in the lower region that results in attention having less effect at high contrast. Whereas both models account equally well for the general effects of attentional facilitation, the models can be distinguished when compared to the finer detail of the physiological data. It can be seen that for the biological data, response latencies reduce with increased stimulus contrast but are not affected by attention (in this respect, increasing attention is unlike increasing the stimulus contrast). For the Reynolds and Desimone model, response latencies correctly reduce with increased stimulus contrast but, unlike the biological data, also reduce with attention. For the feedback model, response latencies are marginally reduced by stimulus contrast but not by attention. The feedback model thus provides a better fit to the detailed experimental data.
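The saturation account given for the feedback model can be made concrete with a small sketch. It assumes a hard ceiling on the input node's response and a multiplicative top-down term of the form `1 + fb_weight`; both are illustrative simplifications rather than the model's actual activation function, and the contrast values are arbitrary.

```python
def input_node(contrast, attended, fb_weight=0.5, ceiling=1.0):
    """Response of a lower-region node with an assumed hard saturation."""
    # Top-down signal is 1 when the stimulus location is attended, else 0.
    top_down = 1.0 if attended else 0.0
    drive = contrast * (1.0 + fb_weight * top_down)
    # The response cannot exceed the ceiling, however strong the drive.
    return min(ceiling, drive)


contrasts = [0.1, 0.6, 1.0]  # low, intermediate, high (arbitrary units)
effect = [input_node(c, True) - input_node(c, False) for c in contrasts]
print(effect)
```

Under these assumptions the attentional effect is small at low contrast, largest at intermediate contrast, and absent once the node is saturated at high contrast.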
It can also be seen from the physiological data that attentional modulation occurs at longer latencies with increasing contrast. Reynolds et al. (2000) state that this effect can be explained by the biased competition model; however, the implementation they propose fails to do so. Instead, with their neural network architecture the effect of attention at high contrast is at least as strong at short latencies as it is at longer latencies. In contrast, the feedback model does successfully simulate this aspect of the experimental data. It does so due to the input node reaching saturation during the early part of its response at high contrast. The top-down activation is therefore unable to amplify the early activity of the input node. However, at longer latencies the response of the input node is attenuated (see Equation 5 in the Methods section) enabling attention to modulate activity at longer latencies. In contrast with the above experiment, other physiological data more directly illustrate that attention multiplicatively modulates neural response. A cell will generate a range of different firing rates when its response is measured across a range of different stimuli. For example, a cell with selectivity for stimulus orientation will generate its maximum response at one stimulus orientation and progressively weaker responses to stimuli at orientations that increasingly deviate from the preferred one. Attention has a purely multiplicative effect, changing the height, but not the width, of such a tuning curve, such that the percentage increase in response is approximately the same for each stimulus (McAdams & Maunsell, 1999; Treue & Martinez-Trujillo, 1999; Treue, 2001). Hence, for these experiments the largest absolute increase in response is for the stimulus that generates the strongest activation. These results initially appear to be at odds with those described above. However, both models can explain these data.
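The multiplicative scaling of a tuning curve described above can be illustrated numerically. The Gaussian tuning function, its width, and the gain of 1.3 are hypothetical values chosen only for illustration; they are not taken from the cited experiments.

```python
import math


def tuning(theta, pref=0.0, width=20.0, peak=1.0):
    """Assumed Gaussian orientation tuning curve (angles in degrees)."""
    return peak * math.exp(-0.5 * ((theta - pref) / width) ** 2)


gain = 1.3  # hypothetical attentional gain
orientations = [-40, -20, 0, 20, 40]
unattended = [tuning(t) for t in orientations]
attended = [gain * r for r in unattended]

# Relative increase is the same for every stimulus ...
ratios = [a / u for a, u in zip(attended, unattended)]
# ... so the absolute increase is largest for the preferred orientation.
increases = [a - u for a, u in zip(attended, unattended)]
print(ratios, increases)
```

Because every response is scaled by the same factor, the curve becomes taller without becoming wider, and the largest absolute change falls on the stimulus that already generates the strongest activation.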
For the Reynolds and Desimone model, afferents from strong and weak stimuli will have a different balance of excitatory and inhibitory weights. For a given stimulus contrast, attentional modulation of these weights will have the same relative effect on the response of the output node to each stimulus. Similarly, for the feedback model, the activation of the input nodes corresponding to weak and strong stimuli would be modulated equally by attention, but these activation values would drive the output neuron via differently weighted afferents and hence cause attention to have a proportionate effect on the response of the output node. Alternatively, with the feedback model, the activation of the recorded cell might be directly modulated by an attention-dependent, top-down, bias.
Featural Attention
Rather than specifying the spatial location that is to be attended, it is also possible to experiment with the effects of cueing the target object that is to be attended.
Figure 6. The effect of changing stimulus contrast on spatial attention. Responses to a single stimulus within the receptive field of the neuron are shown with and without attention. The contrast of the stimulus increases from left to right. (a) The averaged response for a population of cells in V4 (adapted from Reynolds et al., 2000). (b) Simulation results from the Reynolds and Desimone model. (c) Simulation results from the feedback model. For (a) the response was measured in spikes per second and time in milliseconds; for (b) and (c) both response and time are in arbitrary units and have been scaled to resemble (a).
In one such experiment (Chelazzi, Miller, Duncan, & Desimone, 2001), rhesus monkeys were presented with an array containing one or two objects, one of which might have previously been cued as the target for a saccade. Responses were measured from cells in area V4, with receptive fields sufficiently large to encompass the
stimulus array. Different responses were generated when the target object was the preferred stimulus of the recorded cell compared to when the target was a nonoptimal stimulus (see Figure 8a). Results are similar to those for attentional selection using spatial cues (presented above) in that when the stimulus array
Figure 7. Details of the neural network architectures used to simulate the data shown in Figure 6. Nodes are shown as large circles, excitatory synapses as small open circles, and inhibitory synapses as small filled circles. The x values represent top-down biases that have an activation value of either zero or one depending on the attentional state. These top-down signals are assumed to arise from neural generators outside the modeled circuits. Feedforward (and lateral) connections are shown as solid lines and feedback connections are shown as dashed lines. Values for the synaptic strengths are indicated; these weights were chosen to provide the best fit between the behavior of each model and the experimental data (see the Methods section). (a) The Reynolds and Desimone model. (b) The feedback model.
contains a pair of objects the response of the cell becomes more similar to the response that would be generated by the attended stimulus in isolation. Figure 8b shows the simulation of these data generated using the Reynolds and Desimone model. The details of the neural network architecture used are shown in Figure 9a. It can be seen that this architecture is identical (except for the synaptic weight values) to that previously used to simulate the effects of spatial selection (see Figure 5a). The only differences between these simulations are the combinations of sensory and top-down signals used to generate the results. However, it might be expected that the source of feedback should differ between featural and spatial attention. For spatial attention, we would expect that
feedback to ventral areas is transmitted via the dorsal stream. Differential feedback would be received by ventral regions with receptive fields at the appropriate scale to define the attended region. In contrast, for featural attention we would expect feedback signals to be transmitted via the ventral pathway. Hence, featural attention should modulate nodes in the higher region and be transmitted down to the lower region via the feedback connections between these regions. Although this cannot be simulated using the Reynolds and Desimone model, it can be simulated using the feedback model. Figure 9b shows the neural network architecture used to simulate the data using the feedback model, and the results of this simulation are shown in Figure 8c. In contrast to the network used to simulate the spatial attention experiments presented previously, in this implementation, top-down signals are used that provide biases to nodes in the higher cortical region (see Note 1). When a single stimulus is presented to the network, the node with a preference for that stimulus wins the competition and inhibits the output of the other node. When both stimuli are presented, both nodes in the higher region receive equal feedforward activation. In such a condition, even a small top-down signal to one node or the other can bias the competition in favor of that node.
Familiarity and Context
In all the tasks described above, the attended object or location was cued before stimulus onset. Top-down activity thus corresponded to an expectation about the content of the test stimulus. In other tasks the required focus of attention may not be defined a priori but may be influenced entirely by the content of the stimulus itself. Similarly, in nonattentional tasks, the content of the visual scene may influence the
Figure 8. The effect of featural attention on the response of a neuron. Responses are shown for different combinations of stimuli appearing within the receptive field of a single neuron. (a) The averaged response for a population of cells in V4 (adapted from Chelazzi et al., 2001). (b) Simulation results from the Reynolds and Desimone model. (c) Simulation results from the feedback model. For (a) the response was measured in spikes per second and time in milliseconds; for (b) and (c) both response and time are in arbitrary units and have been scaled to resemble (a).
Journal of Cognitive Neuroscience
Volume 16, Number 2
Figure 9. Details of the neural network architectures used to simulate the data shown in Figure 8. Nodes are shown as large circles, excitatory synapses as small open circles, and inhibitory synapses as small filled circles. The x values represent top-down biases that have an activation value of either zero or one depending on the attentional state. These top-down signals are assumed to arise from neural generators outside the modeled circuits. Feedforward (and lateral) connections are shown as solid lines and feedback connections are shown as dashed lines. Values for the synaptic strengths are indicated; these weights were chosen to provide the best fit between the behavior of the model and the experimental data (see the Methods section). (a) The Reynolds and Desimone model. (b) The feedback model.
processing of specific stimuli. Perception will be influenced both by bottom-up biases, such as stimulus saliency, and by top-down influences, such as familiarity (Chun, 2002; Olson et al., 2001). In this section we present various experiments on the effect of familiarity on neural activity in the feedback model. In these simulations, two regions of neurons are reciprocally connected by feedforward and feedback connections, as illustrated in Figure 10a. The lower region receives sensory input in the form of an eight-by-eight pixel image. Nodes in this region have small receptive fields and are selective to simple patterns within the input image (short horizontal and vertical bars, examples of which are shown in Figure 10c). Nodes in the higher region receive
feedforward connections from all the nodes in the lower region. These nodes thus have larger receptive fields and can learn to become selective to larger, more complex patterns within the input (stimuli composed of several individual bars, as shown in Figure 10b). The selectivity of the nodes in the upper region is determined by prior experience. Nodes learn to become selective to frequently reoccurring patterns within the input data. Hence, unlike previous simulations, where appropriate synaptic weights were set by hand, in these simulations synaptic weights are learned (see Note 2). Furthermore, in contrast to the simulations presented above, where the top-down signals were generated by sources outside the modeled circuits, in this section the simulations do not make use of external sources of feedback. Instead, the effects of feedback processes are generated by the recurrent connectivity between the two model cortical regions. Figure 11 shows the response of a single node in the lower region under different conditions. The recorded node is selective to the bar outlined in the images shown at the top of the figure. The response of this node when different patterns are presented to the network is shown. Note that in each case the recorded neuron receives an identical sensory input; however, the context within which this image feature appears changes. Before training, nodes in the upper region are equally unselective to all patterns, and hence the recorded node receives similar top-down activation in each case and the response of the recorded node is the same for each stimulus. However, after training, one node in the higher region becomes selective to the familiar pattern. The feedback from this node modulates the activation of the recorded node, such that after training the recorded node shows an enhanced response to a familiar pattern compared to a novel pattern. A similar enhancement to the response of the node would be
Figure 10. (a) The neural network architecture used for experiments on familiarity and context. Nodes in the lower region receive feedforward connections from an eight-by-eight pixel input image. Nodes in the upper region receive feedforward connections from all the nodes in the lower region. Reciprocal feedback connections also exist between nodes in the upper region and nodes in the lower region. (b) The patterns to which nodes in the upper region become selective after training. (c) Examples of image features to which nodes in the lower region are selective.
Figure 11. The effect of familiarity on the response of a simulated neuron in the feedback model. The two patterns shown at the top of the figure were presented to the network and the response of a node in the lower region, which was selective to the outlined bar, was recorded. Responses were recorded (a) before and (b) after a node in the upper region was trained to respond to the familiar pattern. Before training, the recorded node generates equal responses to both stimuli. After training, the node generates an enhanced response to the familiar stimulus compared to the novel stimulus. The insets in both figures show the activity, in response to both patterns, of the node in the upper region, which becomes selective to the familiar pattern after training. Before training, this upper region node generates a weak response to both stimuli. After training, the upper region node produces a much stronger response to the familiar stimulus compared to the novel stimulus. This difference in activity in the upper region results in the modulation observed in the response of the lower region node. Both response and time are measured in arbitrary units, but the same scale has been used in each plot.
expected in a task exploring feature-based attentional facilitation. Note that in this experiment, response enhancement is driven by the stimulus and so it could be described as resulting from a bottom-up process. However, the modulation results from feedback to the recorded node, and is determined by the response properties of neurons in a higher level region, so that, in this sense, it is a top-down process. For the recorded neuron, the sensory-driven stimulation is identical in each case, and changes in the response are due to stimulus features that occur outside its receptive field. The response of a cortical cell to a particular stimulus is known to be highly dependent on the context in which that stimulus appears (Gilbert, Ito, Kapadia, & Westheimer, 2000). Whereas stimuli outside the receptive field are incapable of generating a response when presented in isolation, they can modulate the activation generated by stimuli appearing within the receptive field. Many of these effects are due to long-range, horizontal connections intrinsic to a cortical region (Li, 1999, 2000; Lamme et al., 1998; Somers et al., 1998; Gilbert, Das, Ito, Kapadia, & Westheimer, 1996; Stemmler, Usher, & Niebur, 1995). However, similar modulatory contextual effects also result from feedback (Olson et al., 2001; Gilbert et al., 2000; Hupe et al., 1998; Zipser et al., 1996). One role for such feedback effects is figure/ground segmentation (Lamme, 2000; Hupe et al., 1998; Roelfsema, Lamme, & Spekreijse, 1998, 2000; Zipser et al., 1996). Spatially distributed image features may be grouped together into a whole and segmented from
the background by feedback modulation, which provides the neurons representing these elements with a competitive advantage over neurons representing other image features (Reynolds & Desimone, 1999). Physiological data show that the response of a node is enhanced when its receptive field is within the figure compared to when it receives identical stimulation from a location on the ground (Lee et al., 1998; Zipser et al., 1996; Lamme, 1995; Lamme et al., 1998; Lamme, Supèr, Landman, Roelfsema, & Spekreijse, 2000; Lamme & Spekreijse, 2000). The enhancement to activity occurs during the sustained response of the neuron at longer latencies. This late onset of modulation could be due either to response saturation during the initial burst of activity (as postulated to explain the results shown in Figure 6), or it could result from a delay in receiving feedback from higher regions (Treue, 2001). Behavioral results show that figure/ground segmentation is influenced by experience such that a familiar shape is more likely to be perceived as the figure (Peterson, Harvey, & Weidenbacher, 1991). We thus modified the previous task to simulate figure/ground segmentation. In this case, the responses of two different nodes in the lower region are plotted when a single stimulus is presented. The stimulus is the superposition of a familiar and a novel pattern (see Figure 12). Both recorded nodes are selective to a bar that is contained within different familiar patterns. However, the presented pattern contains only one of those familiar patterns (together with the novel pattern). Before training, nodes in the higher region are
unselective, and hence both nodes receive equally weak feedback and respond similarly to the stimulus. After training, the stimulus evokes a strong response from the node in the higher region that is selective to the embedded familiar pattern. This node provides feedback to one of the recorded nodes that enhances the response of this node in comparison to the other recorded node. The result in this case is similar to the effect that would be expected in an attentional selection task. Similar results of training on a search task have been described by Lee, Yang, Romero, and Mumford (2002). In this experiment, the responses of cells in the primary visual cortex were found to be dependent on experience and the behavioral relevance of stimuli. These cells only became sensitive to certain oddball stimuli defined by shape-from-shading after training. Perceptual popout saliency depended on experience and appeared to result from feedback from area V2. Higher order stimulus attributes were thus shown to influence lower level processing. Figure 13 shows results for an experiment identical to the one above, except that a different stimulus was used. Again, with this stimulus, both recorded neurons receive the same feedforward stimulation. However, in this case the stimulus contains nearly all the pixels that make up a familiar pattern, plus a few extra active pixels. The missing and additional pixels can be interpreted as noise in the image. The responses of two neurons in the lower region are shown. One node receives input from a bar that forms part of the
familiar pattern but in which a pixel is missing. The other node receives input from the additional pixels. Prior to training, both nodes generate the same response. After training, the node that responds to the bar that is part of the familiar pattern has an enhanced response compared to the activation evoked in the other node by the noise. Hence, higher level knowledge about previous events stored in the feedforward synaptic weights can provide top-down information to enable familiar stimuli to be represented more strongly than unfamiliar stimuli or background noise. Attention has been suggested as one possible mechanism for solving the binding problem (Roelfsema et al., 2000; Reynolds & Desimone, 1999; Luck & Ford, 1998; Treisman, 1998). One proposal is that spatial attention acts to enhance the processing of stimulus features across multiple dimensions, and hence provide a cue for grouping together all the features originating from a single object at one location (Treisman, 1998). More generally, top-down information could act to disambiguate sensory data for which there would otherwise be multiple interpretations. Such a proposal is compatible with the feedback model since the competition between nodes representing incompatible interpretations of the sensory data could be biased by top-down signals. This would enhance the responses of nodes compatible with the biased interpretation and enable these nodes to suppress the activity of nodes representing other possible interpretations. Similarly, sources of feedback, other than those generated by attentional demands, could
Figure 12. The effect of familiarity on the responses of two simulated neurons in the feedback model. The pattern shown at the top of the figure was presented to the network and the responses of two nodes in the lower region were recorded. The recorded nodes were selective to the two outlined bars indicated, each of which formed part of a different familiar pattern (shown in Figure 10b). The presented pattern provided equal sensory-driven activity to both nodes but contained only one of the familiar patterns. (a) Before training, each node generates equal responses to the stimulus. (b) After training, one node generates an enhanced response to the bar that is part of the presented familiar stimulus, compared to the other node. The insets in both figures show the responses of the two nodes in the upper region that become selective to the familiar patterns after training. Before training, both upper region nodes generate a weak response to the stimulus. After training, the upper region node that is selective to the presented familiar pattern produces a much stronger response to the stimulus compared to the other node. This difference in activity in the upper region results in the modulation observed in the responses of the lower region nodes. Both response and time are measured in arbitrary units, but the same scale has been used in each plot.
Figure 13. The effect of familiarity on noise suppression in the feedback model. The pattern shown at the top of the figure was presented to the network and the responses of two nodes in the lower region were recorded. The recorded nodes were selective to the two outlined bars indicated, each of which formed part of a different familiar pattern. The presented pattern was a corrupted version of one of the familiar patterns, such that each node received equal sensory-driven activity. (a) Before training, each node generates equal responses to the stimulus. (b) After training, one node generates an enhanced response to the bar that is part of the presented familiar stimulus, compared to the other node. Both response and time are measured in arbitrary units, but the same scale has been used in each plot.
also act to bias the interpretation of feedforward information. For example, Figure 14 illustrates how contextual information from outside a neuron's receptive field could bias the interpretation of images presented to the network described throughout this section. We introduce an ambiguous situation in the previous task by making all the pixels in one quadrant of the image active. Given the selectivities of the nodes in the lower region, this input might be interpreted either as four horizontal bars or as four vertical bars. Hence, due to the presence of multiple objects in the image, there is a need to group together features that belong to a single object, i.e., to determine whether a particular pixel should be grouped with other pixels in the same row, or with other pixels in the same column. Without any feedback, the competition between the nodes representing horizontal bars and those representing vertical bars is resolved entirely by the relative strengths of the feedforward connections, or in the event of all weights being equal, the competition is resolved randomly (due to the noise in the node activations). However, if contextual information is provided, by presenting the ambiguous features in combination with a familiar pattern, then the competition is biased in favor of one interpretation or the other (see Figure 14). Feedback from the node in the higher region, activated by the familiar pattern, modulates the activity of all nodes that are consistent with this familiar pattern. One node with a receptive field within the lower left quadrant thus receives feedback
activation. This additional, top-down information is sufficient to bias the competition in favor of either the horizontal or the vertical bars, as appropriate to the context. Similar results would be expected if feedback was provided by an attention-dependent top-down signal rather than being generated by a familiar stimulus context.
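The biasing effect described above can be illustrated with a small Python sketch built from the feedback model equations given in the Methods section (preintegration lateral inhibition followed by multiplicative apical modulation). This is a toy under stated assumptions rather than the simulation code used for Figure 14: the function names, the two-node network, the weights (0.25 and 0.26), and the feedback value (0.2) are all hypothetical; feedback is supplied as a fixed apical activation instead of arising from a trained higher region; and the attenuation, clipping, smoothing, and time lags of the full model are omitted.

```python
def inhibited_inputs(x, W, y_prev, alpha):
    """Return X[i][j]: input i as seen by node j after preintegration lateral inhibition."""
    n_inputs, n_nodes = len(W), len(W[0])
    y_max = max(y_prev)
    # Strongest weight received by each node p, used to normalise W[i][p].
    w_max = [max(W[q][p] for q in range(n_inputs)) for p in range(n_nodes)]
    X = [[0.0] * n_nodes for _ in range(n_inputs)]
    for i in range(n_inputs):
        for j in range(n_nodes):
            strongest = 0.0
            for p in range(n_nodes):
                if p == j or y_max <= 0.0 or w_max[p] <= 0.0:
                    continue  # a node does not inhibit itself; silent nodes inhibit nothing
                rival = (W[i][p] / w_max[p]) * (y_prev[p] / y_max)
                strongest = max(strongest, rival)
            # Positive half-rectification of the inhibited input.
            X[i][j] = x[i] * max(0.0, 1.0 - alpha * strongest)
    return X


def settle(x, W, apical, alpha_max=6.0, alpha_step=0.1):
    """Iterate the simplified node update while alpha is raised from 0 to alpha_max."""
    n_inputs, n_nodes = len(W), len(W[0])
    y = [0.0] * n_nodes
    n_steps = int(round(alpha_max / alpha_step))
    for t in range(n_steps + 1):
        alpha = t * alpha_step
        X = inhibited_inputs(x, W, y, alpha)
        basal = [sum(W[i][j] * X[i][j] for i in range(n_inputs)) for j in range(n_nodes)]
        # Feedback multiplicatively modulates the feedforward (basal) activation.
        y = [basal[j] * (1.0 + apical[j]) for j in range(n_nodes)]
    return y


# Two nodes compete for the same two active inputs (hypothetical values).
# Node 1 has slightly stronger feedforward weights than node 0.
x = [1.0, 1.0]
W = [[0.25, 0.26], [0.25, 0.26]]
without_context = settle(x, W, apical=[0.0, 0.0])
with_context = settle(x, W, apical=[0.2, 0.0])
print(without_context, with_context)
```

With these illustrative values, the node with the slightly stronger feedforward weights wins when no feedback is supplied, whereas a modest feedback signal to the other node is enough to reverse the outcome. This mirrors the qualitative pattern in Figure 14, although the numbers themselves carry no significance.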
DISCUSSION
We have presented a model of visual attention in which neurons compete to respond to stimuli. The outcome of this competition is influenced not only by bottom-up, sensory-driven processes, but also by top-down, attention-dependent biases. Several other models of attention are based on the same principles of competition and cooperation (e.g., Corchs & Deco, 2002; Deco, Pollatos, & Zihl, 2002; Rolls & Deco, 2002; Grossberg & Raizada, 2000; Hamker, 1999, 2002; Usher & Niebur, 1996; Phaf, Van der Heijden, & Hudson, 1990), and some of these models have been applied to simulating single-cell data of the type that we have used to evaluate our model. For example, experimental data for selective spatial attention have been modeled by Corchs and Deco (2002) and Grossberg and Raizada (2000), and data concerning object-based selective attention have been modeled by Usher and Niebur (1996) and Hamker (2002). However, we believe that our model advances work in this area in several ways. First, we have demonstrated that a single
model can account for physiological data associated with both featural and spatial attention, as well as the complex interaction between attention and stimulus contrast. Despite the relative simplicity of our model, it simulates these data in detail. Second, our model uses novel mechanisms to implement the competition between nodes and to model the modulatory effects of feedback pathways. Both these mechanisms are biologically plausible (Spratling & Johnson, 2001, 2002; Spratling, 2002), and the latter mechanism provides an account for the anatomically observed asymmetry between ascending and descending interregional cortical connections. Finally, we have demonstrated that the same model can also account for other top-down processes in visual perception. The proposed model thus suggests that each of these phenomena, which are currently considered to be distinct, results from common mechanisms. Hence, our model extends
the applicability of the biased competition theory by suggesting that this principle is involved in a wide range of different perceptual phenomena. Competition between neural representations is a common feature of cortical information processing (Keysers & Perrett, 2002; O'Reilly, 1998). Furthermore, modulatory effects on neural activity have been proposed as a common computational mechanism used throughout the cortex (Salinas & Thier, 2000; Salinas & Sejnowski, 2001; Phillips & Singer, 1997). Our model combines these two mechanisms to suggest that top-down information originating from a wide range of different sources can modulate neural activity and bias competition. Many other neural network models of visual attention have been proposed that do not employ the principles of biased competition. For example, in many models attentional selection and object representation occur in separate interacting systems (e.g., Heinke,
Figure 14. The effect of familiarity on parsing ambiguous patterns in the feedback model. The stimulus presented to the network consisted of all pixels in the lower left quadrant being active together with the remainder of one of the two familiar patterns. The responses of eight nodes in the lower region were recorded. The recorded nodes were selective to the horizontal or vertical bars within the lower left quadrant, as indicated. (a) and (c) Before training, all the nodes representing vertical bars are active, irrespective of the contextual information. This is due to the nodes representing the vertical bars having slightly stronger feedforward weights in the particular network tested. (b) After training, when the familiar pattern is consistent with there being a horizontal bar in the lower left quadrant, all the nodes representing horizontal bars are active. (d) After training, when the familiar pattern is consistent with there being a vertical bar in the lower left quadrant, all the nodes representing vertical bars are active. Both response and time are measured in arbitrary units, but the same scale has been used in each plot.
Humphreys, & di Virgilo, 2002; Heinke & Humphreys, 2003; Itti & Koch, 2001; Mozer, 1988; Koch & Ullman, 1985). These models typically use a saliency map or selection network in which competition occurs between different spatial locations. This module is responsible for selecting the focus of attention and is used to gate the perceptual input to a separate neural system that performs object recognition. Rather than postulating a separate cortical system for the selection and control of the focus of attention, the biased competition model suggests that attention is an emergent property of many neural mechanisms working together to resolve competition for visual processing (Desimone & Duncan, 1995, p. 194). Because competition can occur at each stage in the visual processing hierarchy, this model resolves the dichotomy between early (Treisman, 1969; Broadbent, 1958) and late (Shiffrin & Schneider, 1977; Deutsch & Deutsch, 1963) selection theories of attention by challenging the traditional view that there are distinct preattentive and attentive stages in perceptual processing. Rather than proposing that attention always operates to select objects at an early stage of visual processing, or that attentional selection operates at a late stage in visual processing, the biased competition model suggests that selection occurs at different stages depending on the stimuli and the task. In many of the experiments we have reported, top-down information is provided by sources that are external to the simulated neural network. The biased competition model proposes that these top-down signals originate in cortical regions that respond distinctively in each of the attentional task conditions. Physiological data suggest that such signals might originate in a number of frontal and parietal regions of the cortex (Kanwisher & Wojciulik, 2000; Kastner & Ungerleider, 2000). These regions thus seem to be responsible for the selection and control of attention in these tasks.
A very similar set of cortical regions is implicated in the provision of working memory (Kastner & Ungerleider, 2000; Luck et al., 1997). The sustained activity of cells in these regions provides working memory and generates top-down signals that can affect the responses of cells throughout the ventral hierarchy (Supèr, Spekreijse, & Lamme, 2001; Duncan, 1998). These effects on neural activity, recorded during working memory tasks, are very similar to those recorded during attention tasks in the absence of visual stimulation (Kastner & Ungerleider, 2000). Hence, top-down signals from working memory are likely to bias processing of subsequently presented stimuli (de Fockert, Rees, Frith, & Lavie, 2001; Reynolds & Desimone, 1999; Desimone, 1998). It therefore seems likely that the feedback signals in the attentional tasks we have simulated originate in working memory because all these tasks required a memory to be maintained of the cued spatial location or the cued target object.
Whereas top-down bias results from working memory in certain experiments, it results from long-term memory in the experiments on familiarity and context. In these tasks, top-down bias was generated by the selectivity of the modeled neurons rather than being provided by an external source. The reciprocal connectivity of the modeled neural network then results in the modulation of neural response properties. Training causes a node to become selective to a certain stimulus. This node subsequently generates a stronger response to this stimulus, which can bias processing of perceptual data via feedback. Such bias can modulate the responses of nodes in the previous region. It could also be transmitted via feedforward connections, to increase the strength (or saliency) of the bottom-up signal corresponding to the familiar item in subsequent processing stages. In this way, the processing of the perceptual data is biased by the selectivity of each node. Because the selectivities of cortical cells at all stages of the visual processing hierarchy can be modified by past experience (Sigala & Logothetis, 2001; Kobatake, Wang, & Tanaka, 1998; Logothetis, 1998; Desimone, 1996; Karni, 1996), sensory-driven activity is biased to be interpreted in terms of stored knowledge about previously encountered situations (Lee et al., 2002; Siegel et al., 2000). In conclusion, feedback information can originate from a variety of sources and be generated at a variety of times under different task conditions. Such top-down information has thus been given a variety of names, such as attention, expectation, context, and familiarity. We suggest that an identical feedback mechanism is responsible for all these top-down effects on perceptual processing. We have demonstrated this claim by presenting a single neural network model that can account for a number of perceptual processes that are currently considered distinct.
The model proposes that cued attention to a target location, or object, results in top-down activity that operates via cortical feedback projections to modulate sensory-driven neural activations and affect the ongoing competition between cells (Reynolds et al., 1999; Duncan, 1998; Desimone & Duncan, 1995). Similarly, our model suggests that during figure/ground segmentation, feedback may enhance the activities of neurons representing spatially distributed image features, providing these elements with a competitive advantage over neurons representing other image features (Reynolds & Desimone, 1999). Furthermore, with this model feedback may also act to resolve ambiguities in sensory data by providing bias for one possible interpretation over all others and hence serve as a mechanism to solve the binding problem (Roelfsema et al., 2000; Reynolds & Desimone, 1999; Luck & Ford, 1998; Treisman, 1998). A final phenomenon that can be accounted for by this model is contextual cueing, the process by which high-level knowledge
generates top-down biases for perceptual processing (Chun, 2002; Olson et al., 2001).
METHODS
The Reynolds and Desimone Model
The total excitatory and inhibitory activation received by the output node is calculated as:

$$y_{12,\mathrm{excit}} = \sum_{i=1}^{2} w^{+}_{i12}\, y_{i1}\, x_{i1}$$

$$y_{12,\mathrm{inhib}} = \sum_{i=1}^{2} w^{-}_{i12}\, y_{i1}\, x_{i1}$$

where $y_{i1}$ is the activation of node i in the input region, $w^{+}_{i12}$ is the strength of the excitatory weight from input i, $w^{-}_{i12}$ is the strength of the inhibitory weight from input i, and $x_{i1}$ is the strength of the attention-dependent signal directed at input i. Reynolds et al. (1999) suggest that $x_{i1}$ should take a value of 5 for attended stimuli, and 1 for unattended stimuli. The time-varying output of the node can be calculated as:

$$y^{t}_{12} = y^{t-1}_{12} + \gamma \left[ \left( \beta - y^{t-1}_{12} \right) y_{12,\mathrm{excit}} - \left( y_{12,\mathrm{inhib}} + \alpha \right) y^{t-1}_{12} \right]$$

where $\alpha$ is a parameter controlling the rate of decay of a node's activity, $\beta$ is the maximum response of a node, and $\gamma$ is a parameter that updates the activity slowly so as to dampen oscillatory behavior. Reynolds et al. (1999) specify parameter values of $\alpha = 0.2$ and $\beta = 1$, and we use a value of $\gamma = 0.1$ in our simulations. If the input activity remains constant for a sufficient time, the output reaches a steady-state value given by:

$$\lim_{t \to \infty} y^{t}_{12} = \frac{\beta\, y_{12,\mathrm{excit}}}{y_{12,\mathrm{excit}} + y_{12,\mathrm{inhib}} + \alpha}$$

The Feedback Model
For each node the activations of the apical and basal dendrites were calculated as:

$$y^{t}_{jk,\mathrm{apical}} = \sum_{i=1}^{m_a} v_{ijk}\, x^{t-1}_{ijk}$$

$$y^{t}_{jk,\mathrm{basal}} = \sum_{i=1}^{m_b} w_{ijk}\, X^{t-1}_{ijk}$$

where $y^{t}_{jk,\mathrm{apical}}$ is the activation of the apical dendrite of node j in region k at time t, $y^{t}_{jk,\mathrm{basal}}$ is the activation of the basal dendrite of the same node, $m_a$ is the total number of synapses on the apical dendrite, $m_b$ is the total number of synapses on the basal dendrite, $v_{ijk}$ is the synaptic weight from input i to the apical dendrite of node j in region k, $w_{ijk}$ is the synaptic weight from input i to the basal dendrite of node j in region k, $x^{t}_{ijk}$ is the activation of input i to the apical dendrite of node j in region k, and $X^{t}_{ijk}$ is the activation received by the basal dendrite of node j in region k from input i after preintegration lateral inhibition (see Note 3):

$$X^{t}_{ijk} = x^{t}_{ijk} \left( 1 - \alpha^{t} \max_{\substack{p=1 \\ p \neq j}}^{n} \left\{ \frac{w_{ipk}}{\max_{q=1}^{m} \{ w_{qpk} \}} \, \frac{y^{t-1}_{pk}}{\max_{q=1}^{n} \{ y^{t-1}_{qk} \}} \right\} \right)^{+}$$

where $\alpha^{t}$ is a scale factor controlling the strength of lateral inhibition, $y^{t-1}_{pk}$ is the activation of node p in region k at time t − 1 (defined in Equation 4), and $(z)^{+}$ is the positive half-rectified value of z. In the reported simulations, the value of $\alpha^{t}$ was gradually increased at each time step, from an initial value of zero to a maximum value of six in steps of 0.1. The values of the apical inputs ($x^{t}_{ijk}$) are the activations of nodes in a higher cortical region at the previous time step (i.e., $y^{t-1}_{jk+1}$), or are top-down signals that arise from neural generators outside the modeled circuits. The values of the basal inputs ($X^{t}_{ijk}$) are the inhibited activations from nodes in a lower cortical region at the previous time step (i.e., $y^{t-1}_{jk-1}$). Note that the activations of every node are determined in an identical way and that attention-dependent top-down signals are treated in the same way as any other source of feedback: they provide another activation value, x, that contributes to the apical activation via a synaptic weight, v. The activation of the apical dendrite multiplicatively modulates the activation of the basal dendrite in order to determine the output activation of each node:

$$y^{t}_{jk} = y^{t}_{jk,\mathrm{basal}} \left( 1 + y^{t}_{jk,\mathrm{apical}} \right) \tag{4}$$
where $y^{t}_{jk}$ is the activation of node j in region k at time t. This formulation enables bottom-up, sensory-driven stimulation to drive the response of the node even in the absence of top-down activity. In contrast, feedback activation cannot drive the node's activity in the absence of feedforward activation, but it can amplify any response to feedforward stimulation. The presence of reciprocal excitatory connections can lead to positive feedback effects resulting in runaway activation values. To prevent this, the activity of each node is attenuated in proportion to the cumulative strength of its previous activity ($C^{t-1}_{jk}$), and this attenuated activation is clipped to be in the range [0,1]:

$$y^{t}_{jk} = \left[ \frac{y^{t}_{jk}}{1 + C^{t-1}_{jk}} \right]^{1}_{0} \tag{5}$$
This attenuated activity has a time-varying profile that resembles that of biological neurons, having an initial burst of activity followed by a sustained response at a lower firing rate. To make this profile more realistic the change in activity of the node is smoothed by taking into account the previous activity of the node:
t t t1 yjk 1 yjk 1 1 yjk
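The update described so far can be sketched in a few lines of Python. This is an illustrative reading of the equations above, not the authors' implementation: the function names, argument layout, and list-based data structures are hypothetical, the clipping is taken to be a simple clamp to [0, 1], the timing of inputs is collapsed to a single "previous step", and the cumulative activity is passed in as an argument rather than computed here. The default smoothing constant of 0.5 is the value the text gives for the first time constant.

```python
from typing import List, Sequence


def inhibit_basal_inputs(x: Sequence[float], w: Sequence[Sequence[float]],
                         y_prev: Sequence[float], j: int,
                         alpha: float) -> List[float]:
    """Lateral inhibition applied to the basal inputs of node j.

    x[i]      -- raw basal input i (before inhibition)
    w[p][i]   -- basal weight from input i to node p of the same region
    y_prev[p] -- activation of node p at the previous time step
    alpha     -- scale factor for the strength of inhibition
    """
    y_max = max(y_prev)
    inhibited = []
    for i in range(len(x)):
        strongest = 0.0
        for p in range(len(w)):
            if p == j:
                continue  # a node does not inhibit its own inputs
            w_max = max(w[p])
            if w_max <= 0.0 or y_max <= 0.0:
                continue  # assumption: silent or unconnected nodes do not inhibit
            strongest = max(strongest, (w[p][i] / w_max) * (y_prev[p] / y_max))
        # Positive half-rectification keeps the inhibited input non-negative.
        inhibited.append(x[i] * max(0.0, 1.0 - alpha * strongest))
    return inhibited


def node_output(x_apical: Sequence[float], v: Sequence[float],
                X_basal: Sequence[float], w_j: Sequence[float],
                y_prev_j: float, c_prev_j: float, tau1: float = 0.5) -> float:
    """Output of one node: modulation, attenuation, clipping, smoothing."""
    y_apical = sum(vi * xi for vi, xi in zip(v, x_apical))
    y_basal = sum(wi * Xi for wi, Xi in zip(w_j, X_basal))
    y = y_basal * (1.0 + y_apical)                # apical input scales basal drive
    y = min(1.0, max(0.0, y / (1.0 + c_prev_j)))  # attenuate, then clamp to [0, 1]
    return tau1 * y + (1.0 - tau1) * y_prev_j     # smooth with previous activity


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    X = inhibit_basal_inputs([0.4, 0.2], w, y_prev=[0.0, 0.0], j=0, alpha=0.0)
    print(node_output([], [], X, w[0], 0.0, 0.0))         # feedforward drive only
    print(node_output([1.0], [0.5], X, w[0], 0.0, 0.0))   # same drive plus feedback
    print(node_output([1.0], [0.5], [0.0, 0.0], w[0], 0.0, 0.0))  # feedback alone
```

The three calls at the end illustrate the asymmetry the text describes: feedforward input produces a response on its own, feedback enlarges that response, and feedback without feedforward input leaves the node silent.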
The cumulative activity, $C^{t}_{jk}$, of the node is calculated as:

$$C^{t}_{jk} = \tau_2\, y^{t}_{jk} + \tau_2\, C^{t-1}_{jk}$$

where $\tau_1$ and $\tau_2$ are time constants that take the values $\tau_1 = 0.5$ and $\tau_2 = 0.25$. Finally, the activity of each node was modified by a small amount of noise, such that:

$$y^{t}_{jk} \leftarrow y^{t}_{jk}\,(1 + \rho)$$

The noise values, $\rho$, were logarithmically distributed positive real numbers in the range [0,0.01]. Since the magnitude of the noise is small, it has very little effect on neural activity except when multiple nodes have virtually identical synaptic weights. When this occurs, the noise causes one of these nodes to win the competition to be active in response to the current stimulus.

Synaptic Weight Values

In the simulations of the attentional data reported in the Spatial attention and Featural attention sections, appropriate values for the synaptic weights were set by hand. Because the model was being used to simulate neurophysiological data collected from different cells with distinct selectivities, different sets of weight values were used in different simulations. Weights were chosen to provide the best fit between the behavior of each model and the experimental data, with the following restrictions. Weight values were limited to the range [0, 1], and were adjusted in steps of 0.1. The total sum of the synaptic weights received by each basal dendrite was made equal to one (i.e., $\sum_{i=1}^{m} (w_{ijk})^{+} = 1$). Where a network was symmetrical, corresponding weights were given the same value (e.g., the weight from input one to node one was made equal to the weight from input two to node two). Feedback weights were given a value of half the corresponding feedforward weight. Given these constraints, there were 11 possible sets of weights for the interconnections between the input and output regions, and 11 possible sets of weights for the external sources of feedback. Rather than exhaustively searching through these possibilities, weights were adjusted incrementally to improve the fit with the experimental data until satisfactory results were obtained.

Acknowledgments

This work was funded by MRC Research Fellowship number G81/512. Reprint requests should be sent to Dr M. W. Spratling, Centre for Brain and Cognitive Development, Birkbeck College, 32 Torrington Square, London, WC1E 7JL, UK, or via email: [email protected].

Notes

1. Rather than considering this a change in the network architecture, we can consider that the same architecture was used in previous experiments, but that the top-down signals targeting nodes in the upper region were silent, and hence could be ignored. Similarly, in the current experiment, top-down signals targeting the lower region are irrelevant and have been ignored.
2. See Spratling and Johnson (2002) for details of the learning algorithm used. Feedback connections from the upper to the lower region were also learned. An activity-dependent learning rule was used that was equivalent to the one used for training the feedforward connections. This resulted in corresponding feedforward and feedback weights having similar strengths.
3. For full details of the implementation of preintegration lateral inhibition, and for a justification of this scheme on both biological and computational grounds, see Spratling and Johnson (2001, 2002).
REFERENCES
Barbas, H., & Rempel-Clower, N. (1997). Cortical structure predicts the pattern of cortico-cortical connections. Cerebral Cortex, 7, 635–646.
Broadbent, D. E. (1958). Perception and communication. London: Pergamon.
Budd, J. M. L. (1998). Extrastriate feedback to primary visual cortex in primates: A quantitative analysis of connectivity. Proceedings of the Royal Society of London, Series B, 265, 1037–1044.
Cauller, L. J. (1995). Layer I of primary sensory neocortex: Where top-down converges upon bottom-up. Behavioural Brain Research, 71, 163–170.
Cauller, L. J., Clancy, B., & Connors, B. W. (1998). Backward cortical projections to primary somatosensory cortex in rats extend long horizontal axons in layer I. Journal of Comparative Neurology, 390, 297–310.
Chelazzi, L., Miller, E. K., Duncan, J., & Desimone, R. (2001). Responses of neurons in macaque area V4 during memory-guided visual search. Cerebral Cortex, 11, 761–772.
Chun, M. M. (2002). Contextual cueing of visual attention. Trends in Cognitive Sciences, 4, 170–178.
Corchs, S., & Deco, G. (2002). Large-scale neural model for visual attention: Integration of experimental single cell and fMRI data. Cerebral Cortex, 12, 339–348.
Crick, F., & Asanuma, C. (1986). Certain aspects of the anatomy and physiology of the cerebral cortex. In D. E. Rumelhart, J. L. McClelland, & The PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructures of cognition. Vol. 2: Psychological and biological models (pp. 333–371). Cambridge: MIT Press.
Crick, F., & Koch, C. (1998). Constraints on cortical and thalamic projections: The no-strong-loops hypothesis. Nature, 391, 245–250.
de Fockert, J. W., Rees, G., Frith, C. D., & Lavie, N. (2001). The role of working memory in visual selective attention. Science, 291, 1803–1806.
De Weerd, P., Peralta, M. R., Desimone, R., & Ungerleider, L. G. (1999). Loss of attentional stimulus selection after extrastriate cortical lesions in macaques. Nature Neuroscience, 2, 753–758.
Deco, G., Pollatos, O., & Zihl, J. (2002). The time course of selective visual attention: Theory and experiments. Vision Research, 42, 2925–2945.
Desimone, R. (1996). Neural mechanisms for visual memory and their role in attention. Proceedings of the National Academy of Sciences, U.S.A., 93, 13494–13499.
Desimone, R. (1998). Visual attention mediated by biased competition in extrastriate visual cortex. Philosophical Transactions of the Royal Society of London, Series B, 353, 1245–1255.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80–90.
Duncan, J. (1998). Converging levels of analysis in the cognitive neuroscience of visual attention. Philosophical Transactions of the Royal Society of London, Series B, 353, 1307–1317.
Ebdon, M. (1996). Towards a general theory of cerebral neocortex. PhD thesis, University of Sussex, UK.
Everling, S., Tinsley, C. J., Gaffan, D., & Duncan, J. (2002). Filtering of neural signals by focused attention in the monkey prefrontal cortex. Nature Neuroscience, 5, 671–676.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Friston, K. J., & Büchel, C. (2000). Attentional modulations of effective connectivity from V2 to V5/MT in humans. Proceedings of the National Academy of Sciences, U.S.A., 97, 7591–7596.
Frith, C. (2001). A framework for studying the neural basis of attention. Neuropsychologia, 39, 1367–1371.
Gilbert, C., Ito, M., Kapadia, M., & Westheimer, G. (2000). Interactions between attention, context and learning in primary visual cortex. Vision Research, 40, 1217–1226.
Gilbert, C. D., Das, A., Ito, M., Kapadia, M., & Westheimer, G. (1996). Spatial integration and cortical dynamics. Proceedings of the National Academy of Sciences, U.S.A., 93, 615–622.
Grossberg, S., & Raizada, R. (2000). Contrast-sensitive perceptual grouping and object-based attention in the laminar circuits of primary visual cortex. Vision Research, 40, 1413–1432.
Hamker, F. H. (1999). The role of feedback connections in task-driven visual search. In D. Heinke, G. W. Humphreys, & A. Olson (Eds.), Connectionist models in cognitive neuroscience: Proceedings of the 5th Neural Computation and Psychology Workshop (NCPW98) (pp. 252–261). London: Springer.
Hamker, F. H. (2002). How does the ventral pathway contribute to spatial attention and the planning of eye movements? In R. P. Würtz & M. Lappe (Eds.), Proceedings of the 4th Workshop on Dynamic Perception (pp. 83–88). St. Augustin, Germany: Infix Verlag.
Heinke, D., & Humphreys, G. W. (2003). Attention, spatial representation and visual neglect: Simulating emergent attention and spatial memory in the selective attention for identification model (SAIM). Psychological Review, 110, 29–87.
Heinke, D., Humphreys, G. W., & di Virgilio, G. (2002). Modeling visual search experiments: Selective attention for identification model (SAIM). Neurocomputing, 44–46, 817–822.
Hupé, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P., & Bullier, J. (1998). Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature, 394, 784–787.
Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 194–202.
Jagadeesh, B., Chelazzi, L., Mishkin, M., & Desimone, R. (2001). Learning increases stimulus salience in anterior inferior temporal cortex of the macaque. Journal of Neurophysiology, 86, 290–303.
Johnson, R. R., & Burkhalter, A. (1997). A polysynaptic feedback circuit in rat visual cortex. Journal of Neuroscience, 17, 7129–7140.
Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1, 91–100.
Karni, A. (1996). The acquisition of perceptual and motor skills: A memory system in the adult human cortex. Cognitive Brain Research, 5, 39–48.
Kastner, S., Pinsk, M. A., De Weerd, P., Desimone, R., & Ungerleider, L. G. (1999). Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron, 22, 751–761.
Kastner, S., & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23, 315–341.
Keysers, C., & Perrett, D. I. (2002). Visual masking and RSVP reveal neural competition. Trends in Cognitive Sciences, 6, 120–125.
Kirschfeld, K., & Kammer, T. (2000). Visual attention and metacontrast modify latency to perception in opposite directions. Vision Research, 40, 1027–1033.
Kobatake, E., Wang, G., & Tanaka, K. (1998). Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. Journal of Neurophysiology, 80, 324–330.
Koch, C., & Segev, I. (2000). The role of single neurons in information processing. Nature Neuroscience, 3, 1171–1177.
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.
Körding, K. P., & König, P. (2000). Learning with two sites of synaptic integration. Network: Computation in Neural Systems, 11, 25–39.
Körding, K. P., & König, P. (2001). Supervised and unsupervised learning with two sites of synaptic integration. Journal of Computational Neuroscience, 11, 207–215.
Lamme, V. A. F. (1995). The neurophysiology of figure-ground segregation in primary visual cortex. Journal of Neuroscience, 15, 1605–1615.
Lamme, V. A. F. (2000). Neural mechanisms of visual awareness: A linking proposition. Brain and Mind, 1, 385–406.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579.
Lamme, V. A. F., & Spekreijse, H. (2000). Modulations of primary visual cortex activity representing attentive and conscious scene perception. Frontiers in Bioscience, 5, 232–243.
Lamme, V. A. F., Supèr, H., Landman, R., Roelfsema, P. R., & Spekreijse, H. (2000). The role of primary visual cortex (V1) in visual awareness. Vision Research, 40, 1507–1521.
Lamme, V. A. F., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8, 529–535.
Larkum, M. E., Zhu, J. J., & Sakmann, B. (1999). A new cellular mechanism for coupling inputs arriving at different cortical layers. Nature, 398, 338–341.
Larkum, M. E., Zhu, J. J., & Sakmann, B. (2001). Dendritic mechanisms underlying the coupling of the dendritic with the axonal action potential initiation zone of adult rat layer 5 pyramidal neurons. Journal of Physiology, 533, 447–466.
Lee, T. S., Mumford, D., Romero, R., & Lamme, V. A. F. (1998). The role of primary visual cortex in higher level vision. Vision Research, 38, 2429–2454.
Lee, T. S., Yang, C. F., Romero, R. D., & Mumford, D. (2002). Neural activity in early visual cortex reflects behavioral experience and higher-order perceptual saliency. Nature Neuroscience, 5, 589–597.
Li, Z. (1999). Visual segmentation by contextual influences via intracortical interactions in primary visual cortex. Network: Computation in Neural Systems, 10, 187–212.
Li, Z. (2000). Pre-attentive segmentation in the primary visual cortex. Spatial Vision, 13, 25–50.
Logothetis, N. (1998). Object vision and visual awareness. Current Opinion in Neurobiology, 8, 536–544.
Luck, S. J., Chelazzi, L., Hillyard, S. A., & Desimone, R. (1997). Neural mechanisms of spatial selective attention in areas V1, V2, and V4 of macaque visual cortex. Journal of Neurophysiology, 77, 24–42.
Luck, S. J., & Ford, M. A. (1998). On the role of selective attention in visual perception. Proceedings of the National Academy of Sciences, U.S.A., 95, 825–830.
McAdams, C. J., & Maunsell, J. H. R. (1999). Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. Journal of Neuroscience, 19, 431–441.
McAdams, C. J., & Maunsell, J. H. R. (2000). Attention to both space and feature modulates neuronal responses in macaque area V4. Journal of Neurophysiology, 83, 1751–1755.
Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2000). Intermodal selective attention in monkeys: II. Physiological mechanisms of modulation. Cerebral Cortex, 10, 359–370.
Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782–784.
Mountcastle, V. B. (1998). Perceptual neuroscience: The cerebral cortex. Cambridge, MA: Harvard University Press.
Mozer, M. C. (1988). A connectionist model of selective attention in visual perception. In Proceedings of the 10th Annual Conference of the Cognitive Science Society (pp. 195–201). Hillsdale, NJ: Erlbaum.
Mumford, D. (1992). On the computational architecture of the neocortex: II. The role of cortico-cortical loops. Biological Cybernetics, 66, 241–251.
Olshausen, B. A., Anderson, C. H., & Van Essen, D. C. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. Journal of Neuroscience, 13, 4700–4719.
Olson, C. R. (2001). Object-based vision and attention in primates. Current Opinion in Neurobiology, 11, 171–179.
Olson, I. R., Chun, M. M., & Allison, T. (2001). The contextual guidance of attention: Human intracranial event-related potential evidence for feedback modulation in anatomically early temporally late stages of visual processing. Brain, 124, 1417–1425.
O'Reilly, R. C. (1998). Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455–462.
Peterson, M. A., Harvey, E. M., & Weidenbacher, H. J. (1991). Shape recognition contributions to figure-ground reversal: Which route counts? Journal of Experimental Psychology: Human Perception and Performance, 17, 1075–1089.
Phaf, R. H., Van der Heijden, A. H. C., & Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273–341.
Phillips, W. A., Kay, J., & Smyth, D. (1995). The discovery of structure by multi-stream networks of local processors with contextual guidance. Network: Computation in Neural Systems, 6, 225–246.
Phillips, W. A., & Singer, W. (1997). In search of common foundations for cortical computation. Behavioural and Brain Sciences, 20, 657–722.
Pollen, D. A. (1999). On the neural correlates of visual perception. Cerebral Cortex, 9, 4–19.
Reynolds, J. H., Chelazzi, L., & Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19, 1736–1753.
Reynolds, J. H., & Desimone, R. (1999). The role of neural mechanisms of attention in solving the binding problem. Neuron, 24, 19–29.
Reynolds, J. H., Pasternak, T., & Desimone, R. (2000). Attention increases sensitivity of V4 neurons. Neuron, 26, 703–714.
Rockland, K. S. (1998). Complex microstructures of sensory cortical connections. Current Opinion in Neurobiology, 8, 545–551.
Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (1998). Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395, 376–381.
Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (2000). The implementation of visual routines. Vision Research, 40, 1385–1411.
Rolls, E. T., & Deco, G. (2002). Computational neuroscience of vision. Oxford, UK: Oxford University Press.
Rolls, E. T., & Treves, A. (1998). Neural networks and brain function. Oxford, UK: Oxford University Press.
Salinas, E., & Abbott, L. F. (1996). A model of multiplicative neural responses in parietal cortex. Proceedings of the National Academy of Sciences, U.S.A., 93, 11956–11961.
Salinas, E., & Abbott, L. F. (1997). Invariant visual perception from attentional gain fields. Journal of Neurophysiology, 77, 3267–3272.
Salinas, E., & Sejnowski, T. J. (2001). Gain modulation in the central nervous system: Where behavior, neurophysiology and computation meet. The Neuroscientist, 7, 430–440.
Salinas, E., & Thier, P. (2000). Gain modulation: A major computational principle of the central nervous system. Neuron, 27, 15–21.
Schroeder, C. E., Mehta, A. D., & Foxe, J. J. (2001). Determinants of attentional control in cortical processing: Evidence from human and monkey electrophysiologic investigations. Frontiers in Bioscience, 6, d672–684.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Siegel, M., Körding, K. P., & König, P. (2000). Integrating top-down and bottom-up sensory processing by somato-dendritic interactions. Journal of Computational Neuroscience, 8, 161–173.
Sigala, N., & Logothetis, N. K. (2001). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.
Somers, D. C., Todorov, E. V., Siapas, A. G., Toth, L. J., Kim, D. S., & Sur, M. (1998). A local circuit approach to understanding integration of long-range inputs in primary visual cortex. Cerebral Cortex, 8, 204–217.
Spratling, M. W. (2002). Cortical region interactions and the functional role of apical dendrites. Behavioral and Cognitive Neuroscience Reviews, 1, 219–228.
Spratling, M. W., & Johnson, M. H. (2001). Dendritic inhibition enhances neural coding properties. Cerebral Cortex, 11, 1144–1149.
Spratling, M. W., & Johnson, M. H. (2002). Pre-integration lateral inhibition enhances unsupervised learning. Neural Computation, 14, 2157–2179.
Stemmler, M., Usher, M., & Niebur, E. (1995). Lateral interactions in primary visual cortex: A model bridging physiology and psychophysics. Science, 269, 1877–1880.
Stuart, G., Spruston, N., Sakmann, B., & Häusser, M. (1997). Action potential initiation and backpropagation in neurons of the mammalian CNS. Trends in Neurosciences, 20, 125–131.
Supèr, H., Spekreijse, H., & Lamme, V. A. F. (2001). A neural correlate of working memory in the monkey primary visual cortex. Science, 293, 120–124.
Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions of the Royal Society of London, Series B, 353, 1295–1306.
Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76, 282–299.
Treue, S. (2001). Neural correlates of attention in primate visual cortex. Trends in Neurosciences, 24, 295–300.
Treue, S., & Martinez-Trujillo, J. C. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399, 575–579.
Usher, M., & Niebur, E. (1996). Modeling the temporal dynamics of IT neurons in visual search: A mechanism for top-down selective attention. Journal of Cognitive Neuroscience, 8, 311–327.
Vecera, S. P. (2000). Toward a biased competition account of object-based segregation and attention. Brain and Mind, 1, 353–384.
Yuste, R., Gutnick, M. J., Saar, D., Delaney, K. R., & Tank, D. W. (1994). Ca2+ accumulations in dendrites of neocortical pyramidal neurons: An apical band and evidence for two functional compartments. Neuron, 13, 23–43.
Zipser, K., Lamme, V. A. F., & Schiller, P. H. (1996). Contextual modulation in primary visual cortex. Journal of Neuroscience, 16, 7376–7389.

Birkbeck College
© 2004 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience, Volume 16, Number 2
