

Cerebral Cortex May 2009;19:1175-1185

doi:10.1093/cercor/bhn161
Advance Access publication September 26, 2008

A Dual Role for Prediction Error in Associative Learning

Hanneke E.M. den Ouden1, Karl J. Friston1, Nathaniel D. Daw2, Anthony R. McIntosh3 and Klaas E. Stephan1,4

1Wellcome Trust Centre for Neuroimaging, Institute of
Neurology, University College London, 12 Queen Square,
London WC1N 3BG, UK, 2Department of Psychology, New
York University, New York, NY 10003, USA, 3Rotman Research
Institute of Baycrest Centre, University of Toronto, Toronto,
Ontario, Canada M6A 2E1 and 4Branco-Weiss-Laboratory,
Institute for Empirical Research in Economics, University of
Zürich, Switzerland

Confronted with a rich sensory environment, the brain must learn statistical regularities across sensory domains to construct causal models of the world. Here, we used functional magnetic resonance imaging and dynamic causal modeling (DCM) to furnish neurophysiological evidence that statistical associations are learnt, even when task-irrelevant. Subjects performed an audio-visual target-detection task while being exposed to distractor stimuli. Unknown to them, auditory distractors predicted the presence or absence of subsequent visual distractors. We modeled incidental learning of these associations using a Rescorla-Wagner (RW) model. Activity in primary visual cortex and putamen reflected learning-dependent surprise: these areas responded progressively more to unpredicted, and progressively less to predicted visual stimuli. Critically, this prediction-error response was observed even when the absence of a visual stimulus was surprising. We investigated the underlying mechanism by embedding the RW model into a DCM to show that auditory to visual connectivity changed significantly over time as a function of prediction error. Thus, consistent with predictive coding models of perception, associative learning is mediated by prediction-error dependent changes in connectivity. These results posit a dual role for prediction-error in encoding surprise and driving associative plasticity.

Keywords: associative learning, cross-modal, dynamic causal modeling, effective connectivity, fMRI, Rescorla-Wagner model

Downloaded from http://cercor.oxfordjournals.org/ at Kainan University on February 15, 2015

Introduction

Among the fundaments of adaptive behavior is the ability to predict future events. This ability is crucial to functions ranging from sensory processing to decision making. In psychology and neuroscience, prediction has been studied most extensively in the context of Pavlovian and instrumental conditioning tasks, which measure how organisms anticipate (and act on) affectively significant events such as food delivery or electric shocks. A recent series of functional neuroimaging studies has investigated the neurophysiological basis of prediction and learning in humans. Using Pavlovian and instrumental conditioning tasks, these studies have identified several areas where blood oxygenation level-dependent (BOLD) signals correlate with trial-wise estimates from formal learning models like temporal difference (TD) learning (Sutton and Barto 1998) or the Rescorla-Wagner (RW) model (Rescorla and Wagner 1972). In particular, BOLD activity in areas including the striatum and the dorsolateral prefrontal cortex (DLPFC) (key dopaminergic targets) has been shown to covary with both predictions and prediction errors (Fletcher et al. 2001; McClure et al. 2003; Corlett et al. 2004; O’Doherty et al. 2004; Seymour et al. 2004; Turner et al. 2004; Gläscher and Büchel 2005; Pessiglione et al. 2006; Jensen et al. 2007).

In all of these previous studies, the learned associations had direct relevance for behavior, either because they were linked to rewarding or punishing outcomes (e.g., McClure et al. 2003; O’Doherty et al. 2004; Seymour et al. 2004) or because subjects received feedback on their performance (Fletcher et al. 2001; Aron et al. 2004; Corlett et al. 2004; Turner et al. 2004). In contrast, it is unclear whether incidental learning of stimulus-stimulus associations, i.e., learning of associations that are irrelevant for current behavioral goals, draws upon the same neuronal mechanisms. A paradigm that shows that these types of associations are learned is sensory preconditioning. Here, in a first stage, the subject is exposed to behaviorally meaningless CS1-CS2 associations and, in a second stage, to CS1-US (unconditioned stimulus) pairings. In a third and final stage, the presentation of a CS2 alone generates a conditioned response, indicating that the subject must have learned the initial CS1-CS2 association (Brogden 1939; Gewirtz and Davis 2000).

In this study we used a factorial design that extended the first stage of classical sensory preconditioning paradigms. Healthy volunteers performed an audio-visual target-detection task, while being exposed to a stream of concurrent audio-visual "distractor" stimuli (Fig. 1). These stimuli possessed statistical regularities, which enabled prediction of the visual distractor from the preceding auditory cue (Fig. 2). Critically, however, these statistical associations were completely irrelevant to the target-detection task. Any learning of these associations would therefore be of an incidental (task-unrelated) nature and, in the absence of behavioral responses to the learned associations, could only be inferred neurophysiologically. This paradigm capitalized on previous work by McIntosh et al. (1998) who used positron emission tomography (PET) to show that learning of associations between sensory stimuli was reflected by activity in early visual cortex. However, the use of PET permitted only a simple conditioning scheme and precluded a full investigation of dynamic changes in the brain’s representation of the learned association. Here, we employed a more refined conditioning scheme and used functional magnetic resonance imaging (fMRI) to study learning-dependent changes in brain activity over time. Additionally, we assessed learning-dependent changes in effective connectivity between auditory and visual cortex using dynamic causal modeling (DCM).

Using a 4-factorial design (cf. Fig. 2), this study characterized learning in terms of the temporal evolution (learning; factor 1) of both brain activity and interregional connectivity in response to a visual stimulus whose presence or absence (V+ vs. V–; factor 2) was predicted in 2 contexts, established by 2 types of auditory conditioning stimuli (CS+ vs. CS–; factor 3), each of which could be present or absent on each trial (A+ vs. A–; factor 4). In other
© 2008 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which
permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
words, in contrast to a classical sensory preconditioning paradigm, we could not only investigate differential learning, depending on CS type, but could also assess whether the consequences of an absent CS were learned. It should be noted that both the CS+ and CS– context (or blocks) were balanced in terms of stimuli; the a priori probabilities of the auditory CS and of the visual stimulus occurring on a given trial were always 50%. Critically, the task was not related to these auditory and visual stimuli; subjects performed a target-detection task on unrelated stimuli that were presented sporadically.

Figure 1. Experimental design. (A) Stimuli presented during the experiment. The "distractor" stimuli, whose associations are being learned incidentally, comprised 2 auditory CS corresponding to high- and low-frequency tones and one visual US consisting of 3 concentric squares. The target stimuli, to which the subjects responded, comprised a white noise burst and a circle. (B) Temporal sequence of a single trial. The CS and US could be either presented or omitted. The average trial duration was 2 s. The TO cue was a small central dot (100 ms); the auditory CS was presented for 500 ms, starting 400 ms after TO. The visual stimulus was presented 750 ms after TO, also for 500 ms. The intertrial interval (ITI) was jittered, ranging from 350-1350 ms, and target stimuli were inserted only in the longest ITIs, lasting for 300 ms.

Figure 2. Probabilistic relationship between auditory and visual stimuli. Contingency tables showing the proportion of each trial type occurring during CS+ and CS– blocks respectively. Below the tables are the resulting conditional probabilities of the visual stimulus being present (or absent), given the presence (or absence) of the auditory CS; these probabilities can be inferred by comparing the frequencies within each column of the table.

One of the features of our factorial paradigm is that on half the trials the auditory CS is absent. This necessitates an additional cue that marks the beginning of each trial, which was a visual trial onset (TO) cue. In other words, learning of stimulus associations in this paradigm has 2 components, one related to the auditory CS and another related to the visual TO cue. As a consequence, any model of the learning process must be able to formulate how a net prediction is computed from the associative strengths of the 2 cue components. Here we chose the RW model because it is the simplest and most generic model of associative learning that accounts for cue interactions (see Discussion for details). The RW model has been validated extensively, using behavioral data from both humans and animals and can account for many aspects of associative learning (Schultz and Dickinson 2000; Pearce and Bouton 2001). In our study, the trial-wise associative strength predicted by the RW model was used to construct regressors for a voxel-wise general linear model (GLM) of fMRI data and modulatory inputs for dynamic causal models (Friston et al. 2003) of the effective connectivity between auditory and visual areas. Specifically, we addressed the following 2 questions:

1) In the absence of any behavioral responses to the audiovisual stimulus associations, can we obtain neurophysiological evidence that the brain learns these associations? Specifically, can we find brain regions whose activity correlates with learning (throughout the paper, we will use the colloquial term "learning curve" to denote the vector of predicted associative strength over time, i.e., $\phi^{j}_{t}$ in eq. 1) predicted by a generic model of associative learning (i.e., the RW model)? Candidate areas included early visual cortex and the striatum. Furthermore, do these areas show a response profile across cue-outcome combinations that reflects a match between prediction and outcome or rather a prediction-error response?

2) Because the predictive auditory cue temporally precedes the visual outcome, learning should modify neuronal activity in early visual cortex in response to auditory cues. Can these putative learning-related changes in visual cortex activity be explained by changes in the effective connectivity from auditory to visual cortex (cf. McLaren et al. 1989; McIntosh et al. 1998)? Specifically, do these changes conform to changes in associative strength under a RW model of learning?

Before describing our experiment, 2 important issues should be highlighted. First, the goal of this fMRI study was not to pinpoint the exact mathematical form of incidental learning by comparing different models of associative learning. Instead, we used the simplest (i.e., the RW) model of associative learning that could accommodate our paradigm. In the Discussion, we argue why the RW can be considered an appropriate a priori learning model for our particular paradigm, relative to other models of associative learning. Second, it is important to note that within a given experimental condition the predicted outcomes and prediction errors are perfectly anticorrelated (see Supplementary Material for details). This means they cannot be distinguished as alternative predictors of observed brain responses. However, with our factorial design one can analyze the pattern of parameter estimates across experimental conditions, contrasting expected and unexpected cue-outcome combinations. This enabled us to distinguish, voxel by voxel, brain responses that reflected a match between predicted and actual trial outcomes from responses that encode prediction error or surprise.

Methods and Materials

Subjects

Sixteen healthy volunteers (8 female; mean age ± SD, 25.3 ± 3.3 years) participated in the study. The subjects had no history of psychiatric or neurological disorders. Written informed consent

Predictive Coding during Associative Learning · den Ouden et al.
was obtained from all volunteers prior to the study, which was approved by the National Hospital for Neurology and Neurosurgery Ethics Committee.

Experimental Design: fMRI

The central idea of this study was to present subjects with "distractor" stimuli that were linked by predictive associations: 2 auditory stimuli served as CS and differentially predicted whether or not a visual stimulus would follow. Critically, the volunteers performed an unrelated detection task on separate auditory and visual targets; for this task, the predictive relationships between the distractor stimuli were completely irrelevant. Stimuli were presented using Cogent2000 (www.vislab.ucl.ac.uk/Cogent/index.html). An initial sound matching task and the subsequent learning study (4 × 10 min) were all completed inside the scanner. Subjects were debriefed with a postscan questionnaire to assess whether they had learned the experimental contingencies.

Sound Matching

Preceding the learning experiment, subjects had to match the 2 CS (450 and 1000 Hz) and the auditory target stimulus (white noise burst) for perceived loudness. Stimuli were presented sequentially and dichotically. Subjects adapted the volume of the 1000-Hz tone to the 450-Hz tone until they perceived them to be of equal loudness. This procedure was repeated 8 times and the results averaged. Subsequently, subjects matched the perceived loudness of the white noise burst to the pure tones, each repeated 4 times. The adapted volumes, as a percentage of the volume of the low tone, were 94.0 ± 6.2% (mean ± SD) for the high tone, and 104 ± 4.9% for the white noise burst.

Differential Conditioning

During the experiment, subjects were exposed to alternating blocks of trials in which one of 2 auditory CS (high and low tone) predicted the presence (CS+) or omission (CS–) of a subsequent visual stimulus with a fixed probability of 80% (Figs 1 and 2). On each trial, a CS was presented (A+) with 50% probability. On 50% of all trials, a visual stimulus was present (V+). Every trial was preceded by a visual TO cue. Our paradigm thus used a 4-factor design with the following factors for each trial: 1) CS context (CS+ vs. CS–), 2) CS presence (A+ vs. A–), 3) visual outcome (V+ vs. V–), and 4) learning (or time). We used a mixed event and epoch design in which CS type was blocked, whereas the presentation of the CS and visual outcome were randomized (event-related) within blocks. CS+ and CS– blocks were completely balanced so that in each block of 10 trials 5 CS and 5 visual stimuli were presented. Within each subject, the auditory CS+ and CS– and their probabilistic relation to subsequent visual stimuli were fixed throughout the experiment. The assignment of tones to the 2 CS was counterbalanced across subjects, that is, in half the subjects the high tone served as CS+ (and the low tone as CS–), and vice versa in the other half of the subjects. Each of the 4 sessions consisted of 20 blocks of 10 trials, interspersed with periods of rest (12 s), in which subjects fixated on a fixation cross. Blocks and sessions were balanced across and within subjects.

Target-Detection Task

To ensure continuous attention to auditory and visual targets per se (but not their statistical associations), subjects performed a concurrent target-detection task. The target stimuli were randomly interspersed between trials and consisted of either a white noise burst or a circle. Target stimuli occurred on average once per block (at most 2 times). In total, 40 auditory and 40 visual target stimuli were presented, randomized within conditions and sessions.

fMRI Data Acquisition

A 3 Tesla Siemens Allegra MRI scanner (Siemens, Erlangen, Germany) was used to acquire T1-weighted fast-field echo structural images and multislice T2*-weighted echo-planar volumes with BOLD contrast (repetition time = 2.08 s). For each subject, functional data were acquired in 4 scanning sessions of approximately 10 min each. 306 volumes were acquired per session (1224 scans in total per subject). The first 6 volumes of each session were discarded to allow for T1 equilibrium effects. Each functional brain volume comprised 34 2-mm axial slices with a 2-mm interslice gap, and an in-plane resolution of 3 × 3 mm. The field of view covered the whole brain, except for the cerebellum and brainstem. The total duration of the experiment was approximately 60 min per subject.

Data Analysis

Functional Neuroimaging Analysis

fMRI data were analyzed using the statistical software package SPM5 (Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm). The 1200 images from each subject were realigned to correct for head movements, corrected for movement-by-distortion interactions (Anderson et al. 2001), spatially normalized to the Montreal Neurological Institute (MNI) template brain, smoothed spatially with a 3-dimensional Gaussian kernel of 8-mm full width half maximum and resampled to 3 × 3 × 3 mm voxels. The data were then modeled voxel-wise, using a GLM that included regressors for all experimental trials as well as regressors for the target-detection task. Trial-specific effects were modeled by trains of delta functions convolved with 3 hemodynamic basis functions (a canonical hemodynamic response function, and its temporal and dispersion derivatives). Additionally, the time-dependent associative strengths from the RW model ($\phi^{j}_{i,t}$; see eq. 1) and their partial derivatives with respect to learning rate (see next section) were used as parametric modulators of each trial-specific regressor. The data were high-pass filtered (cut-off 128 s) to remove low-frequency signal drifts, and a first-order autoregressive model was used to model the remaining serial correlations (Friston et al. 2002). Contrast images of parameter estimates encoding trial-specific effects were created for each subject and entered separately into voxel-wise one-sample t-tests (df = 15), to implement a second-level random effects analysis. We report regions that survive cluster-level correction for multiple comparisons (family-wise error, FWE) across the whole brain at P < 0.05. Because previous studies demonstrated the role of the striatum and the prefrontal cortex in associative learning (e.g., Fletcher et al. 2001; O’Doherty et al. 2004; Corlett et al. 2004), we performed an additional restricted search in these areas, using anatomical masks generated from the PickAtlas toolbox (Maldjian et al. 2003). Again, we only report activations that survived a small volume correction (SVC) at P < 0.05.

RW Model

We used a RW model of associative learning to generate predictors of learning-dependent changes in brain activity (as indexed by the BOLD signal) and inter-regional connectivity over time. The basic principle of this model is that the size of the trial-specific prediction error, that is, the degree of surprise incurred by an event, determines the change in associative strength. From the train of observed events a learning curve was computed and fitted to the fMRI data. Trial-specific cueing was modeled by means of 2 separate components (see Fig. 1): the visual TO cue, which was present on every trial, and the auditory CS per se, which was present on half the trials. This allowed us to model learning effects on trials where no CS was present. In the RW framework, the predicted outcome on trial t, $\phi^{j}_{t}$, is the sum of the associative strengths of each cue component:

$$\phi^{j}_{i,t+1} = \phi^{j}_{i,t} + \varepsilon_i \left(\lambda_t - \phi^{j}_{t}\right) \times u_{i,t} \qquad (1)$$

where

$$\phi^{j}_{t} = \sum_i \phi^{j}_{i,t} \times u_{i,t} \qquad (2)$$

On each trial t, equation (1) is calculated separately for each cue component, indexed by i (i.e., the auditory CS, and TO), whereas $u_{i,t}$ indexes which of the cue components is actually present on trial t (see the Supplementary Material). $\lambda_t$ indicates the actual outcome at trial t, being 1 for V+ and 0 for V–; $\varepsilon_i$ is the learning rate that determines how strongly the prediction error affects the update of the prediction. Separate components are summed in equation (2), where $\phi^{j}_{t}$ is the summed prediction of whether a visual stimulus will be presented at trial t, and j indexes whether this is a CS+ or CS– trial. (When considered for a single cue per trial, eq. 1 can also be seen as a simple model of Hebbian or associative plasticity. In this context, $\phi^{j}_{i,t}$ encodes the



associative strength, which changes according to the second term in
eq. 1. This associative term comprises a (presynaptic) input $u_{i,t}$
encoding the outcome on any trial, and a (postsynaptic) prediction
error.)
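
The compound-cue update in eqs. 1 and 2 can be illustrated with a short simulation. The following is a minimal sketch, not the authors' analysis code: the function name `rw_learning_curves`, the zero initial associative strengths, and the simulated trial sequence are assumptions made for illustration, and only a single context j (here a CS+ block structure with 80% contingency) is modeled. The default learning rates (0.075 for the auditory CS, one quarter of that for the TO cue) are the values reported in the Methods.

```python
import numpy as np

def rw_learning_curves(cs_present, outcome, eps_cs=0.075, eps_to=0.075 / 4):
    """Compound-cue Rescorla-Wagner updates (eqs. 1 and 2) for one context j.

    cs_present : 1 if the auditory CS occurred on a trial (A+), else 0 (A-).
    outcome    : lambda_t, 1 if the visual stimulus occurred (V+), else 0 (V-).
    Returns the summed prediction phi_t and the prediction error per trial.
    """
    cs_present = np.asarray(cs_present, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    n_trials = len(outcome)
    eps = np.array([eps_cs, eps_to])      # learning rates: [auditory CS, TO cue]
    phi = np.zeros(2)                     # assumption: strengths start at zero
    prediction = np.zeros(n_trials)
    error = np.zeros(n_trials)
    for t in range(n_trials):
        u = np.array([cs_present[t], 1.0])    # the TO cue is present on every trial
        prediction[t] = phi @ u               # eq. 2: sum over present cue components
        error[t] = outcome[t] - prediction[t] # lambda_t - phi_t
        phi = phi + eps * error[t] * u        # eq. 1: only present cues are updated
    return prediction, error

# Hypothetical CS+ context: P(V+ | A+) = 0.8 and P(V+ | A-) = 0.2.
rng = np.random.default_rng(0)
a_plus = rng.random(400) < 0.5
v_plus = rng.random(400) < np.where(a_plus, 0.8, 0.2)
phi_t, delta_t = rw_learning_curves(a_plus, v_plus)
```

Because the TO cue is updated on every trial while the CS is updated only when present, the summed prediction on A+ trials and the TO-only prediction on A– trials evolve at different speeds, which is the qualitative pattern described for the compound learning curves.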
A challenge when applying the RW model to our experiment was to determine an appropriate learning rate. In principle this could be done by fitting the model to behavioral data and using the resulting learning rate to construct regressors for the fMRI analysis. However, our experimental design deliberately precluded behavioral responses; instead, learning could only be assessed neurophysiologically in terms of changes in cortical activity and inter-regional connectivity. Alternative strategies are to choose the learning rate based on principled considerations (e.g., O’Doherty et al. 2004) or using model comparison (Gläscher and Büchel 2005). Because we knew from a previous study that learning should occur in the visual cortex (McIntosh et al. 1998), we adopted the approach by Gläscher and Büchel (2005) of optimizing the value of $\varepsilon_i$ to best explain putative learning-induced responses within the main area of interest, the visual cortex. Given our volunteers did not notice the statistical associations (and thus learning was presumably slow) and given that another study of perceptual association learning showed small learning rates $\varepsilon_{CS}$ below 0.1 (Gläscher and Büchel 2005), we tested the following values of $\varepsilon_{CS}$ in separate models: 0.01, 0.025, 0.05, 0.075, 0.1. We found that $\varepsilon_{CS}$ = 0.075 gave the best fit to the data in primary visual cortex for the main contrast of interest (i.e., the 4-way interaction in a random effects second-level analysis); this learning rate was then used for further analysis across the entire brain and for the connectivity analyses described below. Importantly, we used a first-order Taylor expansion around the learning rate $\varepsilon_{CS}$ = 0.075 to make the model less dependent on the particular choice of learning rate and to account for intersubject variability in the shape of the learning curves. This was implemented by including the partial derivative of the learning curve $\phi^{j}_{t}$ with respect to the learning rate $\varepsilon_i$ as an additional parametric modulator in the GLM for the fMRI data.

These analyses assumed that the optimal learning rate was identical for CS+ or CS– trials. In additional analyses suggested by our reviewers, we tested this assumption. We examined whether 1) a selective decrease of the learning rate for CS– trials improved our ability to detect learning effects during this trial type, and, more generally, whether 2) trial-type specific tests of the partial derivatives indicated a learning rate that was different from $\varepsilon_{CS}$ = 0.075. As detailed in the Supplementary Material, neither of these analyses provided any evidence for a differential learning rate over stimuli or regions.

Because of its short duration and small size, the TO cue is less salient than the CS. Because in the RW model the learning rate reflects stimulus properties including salience (Rescorla and Wagner 1972), $\varepsilon_{TO}$ can be assumed to be considerably smaller than $\varepsilon_{CS}$. In this study $\varepsilon_{TO}$ was assumed to be 4 times smaller than $\varepsilon_{CS}$. It should be noted that violations of this assumption are unlikely to have a dramatic effect because the inclusion of the derivatives enables the model to cope with deviations from the assumed learning rates (see above). The resulting learning curves are shown in Figure 3 (see Supplementary Fig. 1A for a breakdown of the learning curves with regard to the 2 cue components).

Figure 3. Compound learning curves. Learning curves were calculated separately for trials on which the auditory CS was present (dots) and absent (crosses), during CS+ (blue) and CS– (red) blocks. Note that learning is slower in the absence of an auditory CS than in its presence and faster for CS+ than for CS– trials.

Statistical Analysis of Learning Effects

In our factorial design, learning is reflected by time-evolving, context-dependent brain responses to visual stimuli. Specifically, over time, learning should change how differential brain responses to visual stimuli depend on the presence of an auditory CS and whether it is presented in a CS+ or CS– context. Furthermore, the emergence of differential responses should follow the time-course predicted by the RW model. In other words, learning is expressed as a 4-way interaction CS type × CS presence × visual outcome × RW learning. (Note that when the CS is absent on a specific trial, this trial can be assigned unambiguously to the CS+ or CS– factor because this factor was blocked.) The primary goal of our GLM analyses was therefore to test this interaction. To establish which CS was driving this interaction, we also tested the simple (3-way) interactions CS presence × visual outcome × RW learning within each CS type. Finally, to test for responses reflecting the prediction ($\phi^{j}_{t}$) entailed by the auditory CS, independently of the prediction error ($\lambda_t - \phi^{j}_{t}$) elicited by the visual outcome, we tested the simple 3-way interaction CS type × CS presence × RW learning, which is independent of visual outcome.

An important feature of our factorial design is that it enabled us to determine whether the responses of a particular brain region reflected the prediction of the visual target or the prediction error. This is important because one cannot include separate regressors based on predictions and prediction errors in the same design matrix. This is due to the form of the RW equation, in which predictions and prediction errors are perfectly correlated (within a given experimental condition), after mean-correction (see Supplementary Materials for details). However, in a factorial design like ours such a distinction can be made by analyzing the pattern of parameter estimates across conditions, contrasting conditions that correspond to expected and unexpected cue-outcome combinations. Specifically, our factorial design provided us, in a mirror-symmetric fashion, with 2 expected outcomes and 2 unexpected outcomes for each CS type. For example, on CS+ trials, A+V+ and A–V– trials represented expected cue-outcome combinations (conditional probability = 80%) whereas A+V– and A–V+ trials consisted of unexpected cue-outcome combinations (conditional probability = 20%); cf. Figure 2. This means one can effectively compare expected and unexpected trials (with low and high prediction error, respectively), with a contrast that is orthogonal to the presence or absence of the visual outcome and its prediction. This enabled us to distinguish, voxel by voxel, brain responses that reflected expected visual outcomes from those that represented unexpected or surprising outcomes. During learning, brain regions encoding prediction errors should show increasing activation on trials where the outcome was unexpected according to the learned contingencies and decreasing (or nonchanging) activation on trials where the outcome was expected. We will call such an activation pattern a "prediction-error response"; this activation pattern would be expected if surprise was the driving force for learning. In this case, surprising events, or prediction errors, signal the need for learning in order to update predictions. This idea is not only a core component of associative learning models (Shanks 1995; Schultz and Dickinson 2000), but is also central to predictive coding theories of perception (Rao and Ballard 1999; Friston 2005): that the brain should concentrate resources on representing surprising sensory events.

Note that our factorial analysis was not geared towards detecting prediction-error responses only. It was equally capable of finding opposite activation patterns, that is, increasing activation on trials where the prediction based on the learned contingencies matched the outcome, and decreasing (or nonchanging) activation on trials where

the prediction did not match the outcome (cf. Baier et al. 2006). Notably, for our particular design, both types of responses could be identified by the same statistical test, that is, the 4-way interaction CS type × CS presence × visual outcome × learning (see above). Because it is only the direction of the interaction that differs between the 2 types of responses, our factorial design enabled an analysis that simultaneously tested for these 2 aspects of associative learning.

Dynamic Causal Modeling

In DCM, the states of multiple interacting brain regions are modeled as a set of coupled bilinear differential equations (Friston et al. 2003). The neuronal states, which represent the neuronal population activity of the modeled brain regions, change in time according to the system’s connectivity and experimentally controlled inputs u. These inputs can enter the model in 2 different ways; they can either elicit responses through direct influences on specific regions ("driving inputs," e.g., sensory inputs) or they can change the strength of connections between regions ("modulatory inputs," e.g., task effects or learning).

chosen local maximum. Overall, we were able to extract time series in 14 out of 16 subjects. In 2 subjects, V1 could not be defined due to the lack of a significant interaction that met the anatomical and functional criteria described above. These 2 subjects were excluded from the DCM analysis.

DCM specification. The question addressed by DCM was whether learning effects in V1 could be explained by changes in the connectivity of a simple auditory-visual network. Our DCMs modeled the entire time series, so data from all trials or conditions, trying to explain regional activations by condition-dependent changes in connectivity. We tested 3 simple models that could potentially account for the interaction we found in V1. These models were fitted separately to each subject’s data and compared using Bayesian model selection (Penny et al. 2004). In these models, auditory and visual stimuli from all trials elicited activity directly in their respective primary sensory areas (see Fig. 4). These driving inputs were modeled as individual events. The first model only had a connection from A1 to V1, whereas the
second and third models included the reciprocal connection (see Fig. 5). The A1 → V1 connection in model 1 and 2, and the V1 → A1 connection in model 3 were modulated by the Hadamard product (point-wise multiplication) of the RW associative strength $\phi^{j}_{t}$ and a vector encoding visual outcome (1 for visual stimulus present, –1 for visual stimulus absent) during CS+ trials. In the first 2 models, this modulatory effect corresponds to the interaction of the auditory CS+ prediction with the visual outcome and models a learning-dependent contribution from CS+ responses in auditory cortex to visual cortex responses that depends on whether the visual stimulus was present or not (cf. a prediction error that rests on top-down signals from auditory areas). In the third model, which represented a control suggested by

The hidden neural dynamics (i.e., not directly observed by fMRI) are modeled by the following bilinear differential equation:

$$\frac{dz}{dt} = \left(A + \sum_{j=1}^{m} u_j B^{(j)}\right) z + Cu \qquad (3)$$

Here, z is the state vector (with each state variable representing the population activity of one region in the model, in this study the auditory and visual cortex), t is continuous time, and $u_j$ is the j-th input to the modeled system (here the stimuli and learning curve). In this state equation, the A matrix represents the fixed (endogenous) strength of connections between regions and the $B^{(1)}, \dots, B^{(m)}$ matrices represent the modulation of these connections by (exogenous) inputs
(in this case, learning), as an additive change. Finally, the C matrix
represents the influence of exogenous inputs on each area (here the
auditory and visual stimuli). Note that DCM allows one to make
inferences about changes in effective connections between areas,
which do not necessarily correspond to direct anatomical connections
but may be via intermediary regions.
In DCM, the hidden neuronal dynamics described by equation (3) are
linked to predicted BOLD responses by a hemodynamic forward model
(Friston et al. 2003). Given measured BOLD responses, maximum
a posteriori estimates of the parameters in equation (3) can be obtained
through an optimization scheme based on variational Bayes (Friston
et al. 2003).
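
As a rough illustration of equation (3) only, the following Python sketch integrates a 2-region bilinear system with a forward Euler scheme. It is not the SPM implementation and it omits the hemodynamic forward model and the variational Bayes estimation. The helper name `simulate_bilinear`, the step size, the self-decay of -1 s^-1, the event timing, and the constant modulatory input are all assumptions made for this example; the values 0.10 and -0.01 are borrowed from the connection strengths reported in the Results section purely to have plausible numbers.

```python
import numpy as np

def simulate_bilinear(A, B, C, u, dt=0.01):
    """Forward-Euler integration of dz/dt = (A + sum_j u_j B^(j)) z + C u.

    A: (n, n) fixed connectivity; B: (m, n, n) modulatory matrices;
    C: (n, m) driving-input weights; u: (T, m) input time courses.
    Returns z with shape (T, n). Hypothetical helper, illustrative only.
    """
    T, m = u.shape
    n = A.shape[0]
    z = np.zeros((T, n))
    for t in range(1, T):
        # Effective connectivity at this step: A plus input-weighted B^(j)
        J = A + np.tensordot(u[t - 1], B, axes=1)
        dz = J @ z[t - 1] + C @ u[t - 1]
        z[t] = z[t - 1] + dt * dz
    return z

# Two regions: index 0 = A1, index 1 = V1 (element [1, 0] is A1 -> V1).
A = np.array([[-1.0, 0.0],
              [0.10, -1.0]])        # assumed self-decay; 0.10 as fixed A1 -> V1
B = np.zeros((3, 2, 2))
B[2, 1, 0] = -0.01                  # input 2 modulates A1 -> V1
C = np.array([[1.0, 0.0, 0.0],      # input 0 (auditory) drives A1
              [0.0, 1.0, 0.0]])     # input 1 (visual) drives V1

T = 2000
u = np.zeros((T, 3))
u[100:200, 0] = 1.0                 # auditory event (assumed timing)
u[100:200, 1] = 1.0                 # visual event (assumed timing)
u[:, 2] = 0.8                       # modulatory input held constant (assumed)

z = simulate_bilinear(A, B, C, u)
# Overall A1 -> V1 strength under this input: A + u * B
effective_a1_to_v1 = A[1, 0] + u[0, 2] * B[2, 1, 0]
```

With these numbers the overall A1 → V1 strength is the fixed strength plus the modulatory parameter scaled by its input, which is the additive relationship that equation (3) expresses.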

Choice of areas and time series extraction. The goal of the present
DCM analysis was to explain the (3-way) simple interaction CS
presence × visual outcome × RW learning for CS+ trials in V1 (see
SPM findings in the Results section) by a simple model, in which the
strength of the A1 → V1 connection was modulated as a function of
the RW predictions, φ_t^j (i.e., learning curves; Fig. 3). Representative A1
time series were chosen by testing for the main effect of CS presence,
and V1 time series were selected by testing for the simple interaction
described above. (The goal of DCM is to explain regional effects [as
detected in a voxel-wise GLM analysis] in terms of interregional
connectivity and its experimentally induced changes. This puts
congruence constraints on the contrast used to identify a regional
time series and the mechanisms in a DCM that are proposed to model
this time series. Therefore, different contrasts are typically required for
selecting time series representing the different areas in a model; c.f.
Stephan, Harrison, et al. 2007.) We did not model the 4-way interaction
with DCM because the SPM analysis showed that the learning effect
was driven by the CS+ (see Results section).

Figure 4. Dynamic causal models of learning effects on audio-visual connectivity. For all 3 models, the primary
auditory (A1) and visual (V1) areas are both driven by their respective sensory inputs. The first model tested had
a single connection from A1 to V1 (M1). In model 2 (M2) the V1 → A1 connection was added. In both M1 and M2,
the A1 → V1 connection was allowed to change during CS+ trials as a function of the visual outcome (V+ vs. V–)
and the RW learning curve (φ). This modulatory effect corresponds to the interaction of the auditory CS+
prediction with the visual outcome and models a learning-dependent contribution to V1 responses from CS+
responses in A1; and this contribution depends on whether the visual stimulus was present or not (c.f., a
prediction error mediated by top-down signals from A1). In the third model, suggested as a control by one of the
reviewers, instead of the A1 → V1 connection, the V1 → A1 connection is modulated by the learning signal.

As the exact locations of activation maxima varied over subjects, we
ensured the comparability of our models across subjects by using
combined anatomical--functional constraints in selecting the subject-specific
time series (c.f. Stephan, Marshall, et al. 2007). Specifically, we
thresholded the subject-specific SPMs at P < 0.05 and chose the local
maximum within 8 mm of the group activation maxima in primary
auditory cortex (A1) and primary visual cortex (V1) as inferred by
a probabilistic cytoarchitectonic atlas in MNI space (Eickhoff et al.
2005). As a summary time series, we computed the first eigenvector
across all suprathreshold voxels within a radius of 4 mm around the



one of our reviewers, this modulatory effect acted on the reverse same pattern of responses bilaterally; this activation extended
connection, V1 → A1. into the insula bilaterally (see Table 1).
Because previous studies have implicated the right DLPFC in
prediction (error) processing (Fletcher et al. 2001; Corlett et al.
Results 2004), we used an anatomically defined fronto-striatal mask to
The postscan debriefing questionnaire showed that none of the test the 3-way interaction CS type × CS presence × RW
subjects had become aware of the contingencies between the learning, which characterizes responses to the prediction
auditory and visual stimuli. Prior to the fMRI data analysis we entailed by the auditory CS, independent of the visual outcome.
verified subjects’ performance on the target-detection task. On During learning, the right DLPFC became increasingly active
average, subjects responded to 93 ± 3% of the target stimuli. when a visual stimulus was predicted compared to when it was
Following Gläscher and Büchel (2005) we determined an not; activity was higher for CS+A+ and CS–A– trials compared
optimal learning rate for the RW model, evaluating the primary with CS+A– and CS–A+ trials (compare the probabilities in Fig. 2).
contrast of interest (i.e., the 4-way interaction in a random As above, we characterized the nature of the 3-way interaction
effects second-level analysis) under different learning rates in by testing the associated simple interactions, confirming it was
the primary visual cortex (as defined by a probabilistic also driven by CS+ trials (Fig. 5C). The same pattern of
cytoarchitectonic atlas; Eickhoff et al. 2005). Model fits under activation was found in the left putamen, but this activation
5 different learning rates suggested that ε_CS = 0.075 was the optimal did not survive correction for multiple comparisons.



learning rate (see Fig. 3 and Methods section for details).
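
To make the role of the learning rate concrete, here is a minimal Python sketch of a Rescorla-Wagner (delta-rule) learning curve for a single cue. It is an illustration under stated assumptions, not the authors' analysis code: the function name `rw_learning_curve`, the coding of outcomes as 0/1, the starting value of 0, the 200 trials, and the 80% contingency are all hypothetical choices; only the learning rate of 0.075 is taken from the text.

```python
import numpy as np

def rw_learning_curve(outcomes, learning_rate=0.075, v0=0.0):
    """Rescorla-Wagner predictions and prediction errors for one cue.

    outcomes: sequence of 0/1 (visual stimulus absent/present) per trial.
    Returns (predictions, errors), where predictions[t] is the associative
    strength held before trial t and errors[t] = outcomes[t] - predictions[t].
    """
    outcomes = np.asarray(outcomes, dtype=float)
    predictions = np.empty_like(outcomes)
    errors = np.empty_like(outcomes)
    v = v0
    for t, outcome in enumerate(outcomes):
        predictions[t] = v
        errors[t] = outcome - v
        v = v + learning_rate * errors[t]   # delta-rule update
    return predictions, errors

rng = np.random.default_rng(0)
outcomes = rng.random(200) < 0.8    # assumed 80% contingency, 200 trials
predictions, errors = rw_learning_curve(outcomes)
surprise = np.abs(errors)           # unsigned prediction error
```

A smaller learning rate makes the prediction approach the outcome probability more slowly and more smoothly, which is why the choice of rate changes the shape of the regressor that is entered into the interaction contrast.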
Learning-Dependent Changes in Connectivity
Statistical Parametric Mapping Because the learning effect was mainly driven under CS+
First, we examined the 4-way interaction CS type × CS presence × blocks, we focused on changes in connectivity between
visual outcome × RW learning. We found learning-dependent auditory and visual cortices during incidental learning of the
responses in the primary visual cortex and putamen predictive attributes of CS+ trials (see Fig. 6). Bayesian model
that survived whole-brain correction for multiple comparisons comparison showed that a DCM with a single connection from
(see Fig. 5A,B). To characterize the nature of this interaction, A1 to V1 (model 1) was superior to alternative models with
we tested the simple interaction (CS presence × visual outcome reciprocal connections (group Bayes factor in favor of model 1:
× RW learning) within each CS type. This showed that the 4-way 2.1 × 10^17 and 2.2 × 10^18 when compared with model 2
interaction was driven mainly by learning during the CS+ blocks and model 3, respectively). Across subjects, the A1 → V1
(see Supplementary Fig. 1B for the parameter estimates). As connection in the optimum model had an average strength of
shown in Figure 5A,B, testing the simple interaction for CS+ trials 0.10 s⁻¹ (p = 0.003, df = 13, t = 3.57). During CS+ trials, this
afforded almost identical results in the visual cortex and the connection was significantly modulated by learning, depending
putamen as the 4-way interaction (see also Table 1). In contrast, on whether the visual stimulus was present or not (i.e., CS+ ×
no evidence of learning, that is, no significant interaction of CS (V+ vs. V–) × φ in Fig. 6). Note that the modulatory variable
presence and outcome with learning, was found for CS– trials. in the DCM corresponds to the interaction of the auditory
The nature of the simple 3-way interaction was such that V1 prediction with the visual outcome during CS+ trials. It
and the putamen showed an increased response when an accounts for a learning-dependent contribution from CS+
expected visual stimulus was omitted, or when an unexpected responses in auditory cortex to visual cortex responses that
visual stimulus was presented (i.e., A+V– and A–V+ trials). depends on whether the visual stimulus was present or not
Critically, this response to surprising visual outcomes increased (c.f., a prediction error mediated by top-down signals from
over time as the association was learned, following the form of auditory areas). Quantitatively, the strength of this modulation
the RW learning curve. Conversely, V1 responses to predicted was –0.01 s⁻¹ (p = 0.028, df = 13, t = 2.49). This corresponds to
stimuli diminished during learning. The putamen showed the learning-induced changes in connectivity ranging from 2% (for

Figure 5. fMRI results. (A) Significant activations in V1 as a function of RW learning, for both the 4-way interaction (CS type × CS presence × visual outcome × RW learning;
red), and the simple (3-way) interaction (blue), which is restricted to the CS+ trials (x = 6, also showing the caudate activation) and (B) in the putamen bilaterally (y = 6),
displayed on the mean structural image across all subjects. (C) z = 12. Significant 3-way interaction CS type × CS presence × RW learning in the DLPFC and left putamen (red).
This interaction is driven by the CS+ trials, as shown by the simple interaction in blue.

Table 1
MNI coordinates and Z-values for significantly activated regions

                                   MNI coordinates
Foci of activation                  x    y    z    Z value   Cluster size
Four-way interaction: CS type × CS presence × visual outcome × RW learning
  L occipital lobe*                 6   75    9    4.25      41
  L insula and putamen*            30   18    6    4.84      84
  L putamen**                      24   12    6    3.85      20
  R insula and putamen*            36   12    3    4.72      82
  R putamen**                      27    6    3    4.48      35
  L caudate/thalamus*               9   15   15    4.70      40
  L SII cortex*                    51   27   24    4.39      93
  L middle temporal gyrus*         57   39    3    3.88      26
Simple (3-way) interaction: CS presence × visual outcome × RW learning (restricted to CS+)
  L occipital lobe*                 9   78    3    4.31      36
  L insula and putamen*            33   12    3    4.55      57
  L putamen**                      27   12    6    3.63      10
  R insula and putamen*            36   12    3    3.98      57
  R putamen**                      27    9    0    3.94      32
  L caudate/thalamus*              21    9    9    4.32      54
  L caudate**                      15    9   21    4.19      14
  R caudate**                      15   12   18    4.24       7
  L SII cortex*                    60   33   15    4.15      87
  L middle temporal gyrus*         57   36    6    4.30      34
  R posterior insula*              39   12   12    5.01      38
Three-way interaction: CS type × CS presence × RW learning
  R inferior frontal gyrus**       42   27   12    4.39      10

*Significant at P < 0.05 (FWE whole-brain cluster-level corrected).
**Significant at P < 0.05 (SVC).

CS+A– trials) to 8% (for CS+A+ trials) (Fig. 6). (As shown by eq. 3, the overall strength of a connection,
given a single modulatory parameter, is the sum of the intrinsic connection strength [A] and the modulatory
parameter [B] multiplied with its associated input [u]. In the present case, the asymptotic magnitude of
the input function is 0.8 for CS+A+ trials and 0.2 for CS+A– trials [see Fig. 5].)

Critically, the negative sign of the modulatory parameter reflects the nature of the visual responses to
auditory afferents under CS+ trials: V1 responses to predicted visual stimuli diminished during learning and
the DCM explained this through a decrease in the strength of the A1 → V1 connection. This is exactly
consistent with an increase in the ‘‘explaining away’’ of predicted visual input under predictive coding; in
other words, if top-down predictions φ_t^j (see eq. 2) from auditory cues decrease the amplitude of V1
prediction error |λ_t – φ_t^j|, a better prediction corresponds to a decrease in effective connectivity.
Conversely, V1 responses to unpredicted (i.e., absent) visual stimuli increased during learning. This was
modeled in the DCM through an increase in the A1 → V1 connection strength; again this is consistent with an
increase in V1 prediction-error amplitude |λ_t – φ_t^j|, when predictions are violated. In summary, A1 → V1
influences depended on whether the visual outcome was expected or surprising and were consistent with an
‘‘explaining away’’ role. The emergence of this effect conformed to the learning curve provided by the RW model.

Discussion
McIntosh and colleagues showed that after a predictive relationship between an auditory stimulus and a visual
stimulus had been learned, the auditory stimulus alone was able to evoke responses in the visual cortex
(McIntosh et al. 1998). The current study extended this work, pairing a visual stimulus with a predictive
auditory stimulus in a 4-factorial design, with the factors CS type (CS+, CS–), CS presence (A+, A–), visual
stimulus presence (V+, V–), and learning (over time). Both CS+ and CS– blocks were exactly balanced in terms
of sensory stimulation, so that the a priori probabilities of the auditory CS and of the visual stimulus
occurring on a given trial were always 50%. Critically, the volunteers did not make any responses to the
stimuli whose associations were being learned; instead, they performed a target-detection task on unrelated
stimuli. Our factorial design enabled us 1) to characterize changes in neurophysiological responses due to
learned associations that were incidental to behavior, and 2) to investigate whether activity in specific
brain areas, and the connection strengths amongst them, reflected a match between predictions and outcome or
prediction errors, respectively.

Our results demonstrate that during incidental learning of audio-visual associations changes in both regional
activity and underlying connectivity reflect prediction errors. Furthermore, we show that learning-dependent
responses in visual cortex can be elicited, even in the absence of visual stimuli. This finding can be
explained by changes in top-down influences from auditory regions that are consistent with predictive coding
models of perceptual inference.

RW Model: Predictions and Prediction Error
The goal of this study was not to pinpoint the exact mathematical form of learning by comparing different
models of associative learning. Instead, we focused on changes in regional activity and interregional
connectivity that could be explained by a specific learning model, namely the RW model. The RW model is
a generic and well-established model of associative learning that has been successful in modeling a wide
range of learning processes (Rescorla and Wagner 1972; Schultz and Dickinson 2000; Pearce and Bouton 2001).
We chose this model because it is the simplest learning model appropriate for our particular paradigm. In the
absence of interactions among multiple cues per trial, the RW model is mathematically equivalent to a Hebbian
model of associative learning (Montague and Berns 2002). A crucial aspect of our paradigm, however, is that
on each trial the net prediction resulting from 2 interacting cue components (the auditory CS and the visual
TO cue) must be considered (see Methods sections for details). This excludes the use of any associative
learning model that cannot accommodate cue interactions (e.g., Hebbian models). In contrast, the RW model
accommodates this aspect gracefully. Another learning model, TD learning, can also deal with multiple cues
and their temporal relationships; however, under our design with temporally overlapping cue and outcome, the
TD model is effectively equivalent to the simpler RW model. Finally, the associative learning models of Pearce
and Hall (1980) and Mackintosh (1975) assume that prediction errors affect the amount of attention that is
allocated to stimuli and that the more attention is allocated to a specific stimulus, the more strongly it
becomes associated with an outcome or reinforcer. This is not relevant to our experimental paradigm in which
attention is actively directed away from the stimuli whose associations are learned.

The RW model has one problematic limitation, however: as detailed in the supplementary materials, its equation
uses both predictions and prediction errors that are perfectly correlated under mean-correction. In situations
where mean-correction is mandatory (e.g., when using them to form interaction terms) this makes it impossible
to disambiguate/interpret their contributions to a dependent variable. However, the factorial

Figure 6. Learning effects on audio-visual connectivity. Bayesian model comparison showed that the DCM with a single connection from A1 to V1 was superior to the other
models. Across subjects, there was a significant ‘‘endogenous’’ or ‘‘fixed’’ strength of the A1 → V1 connection (0.10 s⁻¹, P = 0.003) and a significant learning-induced
modulation (magenta arrows) of this connection (P = 0.028). The insets show the parameter estimates for the main effects in both A1 and peripheral V1. The magenta arrows
indicate how the main effect in peripheral V1 is modulated by changes in connectivity from A1 to V1 during CS+ trials: over time the response to surprising visual outcomes is
upregulated, whereas the response to unsurprising visual outcomes is downregulated. Note that in this plot the magenta arrows designate the direction in which V1 responses
change due to modulation of connectivity; for quantitative information on this modulatory effect, see the main text.

design in our study allows us to circumvent this problem, as it comprises conditions that correspond to
congruent and incongruent prediction/outcome combinations, respectively.

Analyzing the 4-way interaction between our experimental factors, we found that responses in the primary
visual cortex and the putamen were sensitive to surprising events; over time, these areas became
significantly more active when presented with a surprising cue--outcome combination. Learning was stronger
for the CS+ blocks than for the CS– blocks, which is in line with previous behavioral evidence (Wasserman
et al. 1993; Fletcher et al. 2001). Previous fMRI studies in humans have demonstrated that BOLD activity in
the striatum is correlated with (signed) prediction errors during reinforcement learning (O’Doherty et al.
2003; McClure et al. 2003; O’Doherty et al. 2004; Seymour et al. 2004; Jensen et al. 2007; Menon et al.
2007) and other associative learning tasks (Corlett et al. 2004). In these studies, the learned associations,
and the sign of the resulting prediction errors, were of direct relevance for behavior. The current study
shows that the putamen is sensitive to unexpected outcomes even when the cue-stimulus association is learned
incidentally and has no relevance to behavior. However, in contrast to the previous studies, the pattern of
putamen activity does not appear to be sensitive to the direction of the prediction error, only to its
amplitude. This difference may reflect the fact that learning was perceptual as opposed to operant. In other
words, the occurrence of an unpredicted or surprising event may play the role of negative reward,
irrespective of whether the surprising event entailed the presence or absence of a stimulus. This issue will
be discussed further in the section on predictive coding below.

Role of Prediction Errors Beyond Reinforcement Learning
Our finding that learning-induced responses in primary visual cortex and the putamen reflected prediction
errors accords with a basic principle emerging from many previous studies: prediction errors, or surprise,
constitute a driving force for learning because they signal the need for learning in order to update
predictions (Shanks 1995; Schultz et al. 1997; Schultz and Dickinson 2000). Although the role of prediction
errors has been mainly explored for reinforcement learning so far, there is growing evidence that prediction
errors may be equally important for learning statistical relationships that are affectively neutral and
behaviorally irrelevant. In other words, the same mechanisms that optimize the learning of
stimulus--response links may operate during the perceptual learning of stimulus--stimulus associations (Rao
and Ballard 1999; Friston 2005).

Evidence that organisms learn predictive associations between initially neutral stimuli is seen in classical
conditioning effects such as sensory preconditioning (Brogden 1939). Some forms of sensory learning also
exhibit such features, for example, the mismatch negativity (MMN) paradigm, in which responses to sensory
stimuli decrease with predictability (Friston 2005; Baldeweg 2006), regardless of whether stimuli are
attended. A mechanism similar to predictive coding has been proposed in the motor domain for cancellation of
self-generated events (Wolpert et al. 1995; Blakemore et al. 1998; Shergill et al. 2005). Moreover, the
learning of predictive relationships that are affectively neutral and task-irrelevant may engage similar
computational and neural mechanisms as those for predicting significant events (Zink et al. 2006; Wittmann
et al. 2007).

The results of the present study support the notion that the role of prediction errors in learning
transcends the simple reinforcement of stimulus--response links and plays a more pervasive and general role
in various forms of learning. Indeed a hallmark of adaptive systems is their ability to minimize surprising
exchanges with their environment (Friston et al. 2006). This entails adjustments to their internal models of
the environment so that potentially surprising events can be predicted. Almost universally, this adjustment
involves changes

in the system’s connections; it is therefore perhaps a little surprising that most previous imaging studies
on learning and conditioning have exclusively searched for brain areas whose activity correlated with
specific variables of a particular learning model (e.g., prediction or prediction error), but have not
investigated how these variables change interactions among areas (but see McIntosh et al. 1998; Büchel et al.
1999). Functional interactions are central to the physiological implementation of learning; it has long been
suggested that plasticity in connection strengths between neurons underlies the learning of predictive
associations (Hebb 1949). Put simply, 2 neural units encoding associated entities increase their synaptic
connections to encode the learned associative strength of the stimuli. More precisely, for RW and similar
‘‘caching’’ models (Daw et al. 2005) the connection strength at time t should carry the predicted association
at time t (McLaren et al. 1989; Schultz and Dickinson 2000). This hypothesis requires models of effective
connectivity, in which connection strengths vary as a function of the associative strength predicted by the
learning model. To our knowledge, the present study has implemented this approach for the first time,
modeling how learning, as described by a RW model, modulates the effective connectivity, as assessed by
a DCM, between primary auditory and visual areas.

Changes in Connectivity between Auditory and Visual Areas
In accordance with the considerations above, we investigated whether the learning-related changes in visual
cortex responses could be explained by a simple model of effective connectivity, in which the strength of
A1 → V1 connection changed as a function of the associative strength predicted by the RW model. We modeled
observed responses in the primary visual cortex by means of a simple 2-area DCM in which activity in the
visual cortex was modeled by 2 components, 1) a direct effect of visual stimulation and 2) a modulation of
the A1 → V1 connection by the interaction of the time-evolving prediction with the visual input (in CS+
blocks; see Fig. 6). Across subjects, this DCM showed a significant change in the strength of the A1 → V1
connection congruent with the pattern of responses in V1: the A1 → V1 connection strength increased on
trials where the visual outcome did not match the auditory prediction and decreased on trials where
prediction and outcome matched. In other words, the learning-induced changes in A1 → V1 connection strength
reflected the same pattern of surprise or prediction errors as the regional activity in V1. This
demonstrated that the response of V1 to visual stimuli was modulated by learning-dependent changes in
top-down auditory influences that were consistent with the notion of predictive coding, a general framework
for perceptual inference and learning that is discussed in the next section (Friston 2005).

Although connections in models of effective connectivity do not need to correspond to monosynaptic
anatomical connections, it is of interest to note that the surprise-related response in visual cortex
appears to be in the peripheral visual field (Fig. 3A), and anatomical connections from primary auditory
cortex to peripheral visual cortex have been demonstrated in recent monkey studies (Falchier et al. 2002;
Rockland and Ojima 2003). Additionally, numerous fMRI studies have demonstrated that auditory stimulation or
auditory attention affect activity in visual cortices during simultaneous processing of visual stimuli
(e.g., McIntosh et al. 1998; Baier et al. 2006; Watkins et al. 2006).

Predictive Coding in Visual Cortex
In previous neurophysiological studies of reinforcement learning, a negative prediction error, in the form
of unexpected absence of a reinforcer (e.g., a reward), often led to a decrease in neuronal or BOLD activity
(Schultz 1998; McClure et al. 2003; Tobler et al. 2007). Such directed excursions are thought to reflect the
fact that the prediction error is a signed quantity: it signals not just that predictions need to be
updated, but in which direction. In contrast, in our study we found an increase in striatum and visual
cortex activity not only for unexpectedly presented stimuli, but also for the unexpected absence of
a stimulus. Similarly, the strength of the A1 → V1 connection decreased whenever the visual outcome was
expected, and it increased whenever the outcome was surprising.

A useful perspective that explains our 2 main findings, the implicit encoding of surprise by V1 responses
and its mediation by learning-dependent changes in input from the auditory cortex, is provided by the
framework of predictive coding. Predictive coding posits a hierarchy of connected brain areas in which each
level strives to attain a compromise between information about sensory inputs provided by the level below
and predictions (or priors) provided by the level above (Rao and Ballard 1999; Murray et al. 2002; Friston
2003; Summerfield et al. 2006). The central learning principle is to establish a good model of the world,
which is achieved by changing connection strengths such that prediction errors are minimized at all levels
of the hierarchy. The hierarchy of a predictive coding architecture is often defined anatomically (in terms
of forward and backward connections) and within one sensory modality, but it is equally possible to examine
cross-modal predictive coding relationships (c.f. von Kriegstein and Giraud 2006). In the present study,
a temporal hierarchical relation between auditory and visual areas is induced by presenting the auditory cue
prior to the visual stimulus.

Predictive coding may be a general principle of brain function in which statistical relationships in the
world are monitored, even when they are not attended and not relevant for ongoing behavior. This would allow
the brain to ignore predictable and therefore uninteresting events in the environment, thereby enhancing the
saliency of unexpected events. A good example of this notion is given by the mismatch negativity (MMN), the
difference between the event-related potential to an unexpected ‘‘deviant’’ and predictable ‘‘standard’’
stimuli (Naatanen et al. 2001). Importantly, the relationship between the MMN and learning was not
established on the basis of behavioral data; in fact, it was initially not even recognized (Naatanen et al.
1978). This relationship was only subsequently inferred from striking relationships between the probability
of deviants and neurophysiological time series (e.g., Csepe et al. 1987; Pincze et al. 2002). Current
theories of MMN, which interpret it as a paradigmatic example of learning based on predictive coding (Friston
2005; Baldeweg 2006), have recently received empirical support by DCM studies of electroencephalographic
measurements (David et al. 2006; Garrido et al. 2007). These studies demonstrated that MMN can be understood
as a prediction-error signal, which results from deviant-induced changes in inter-regional connection
strengths. A similar conclusion is offered by the present study. Here, we



found that, at least during CS+ trials, BOLD responses in area V1 neuroimaging data. This is a further step toward the long-term
increased when the prediction provided by the auditory cue did goal of constructing invertible models that unite the neuro-
not match the subsequent visual stimulus (analogous to MMN physiological and computational aspects of learning (c.f.
elicited by deviants). This surprise signal progressively increased Stephan 2004).
as the predictive properties of the auditory cue were learnt.
Moreover, in direct analogy to DCM studies of the MMN (David
et al. 2006; Garrido et al. 2007), we found a decrease in the A1 Supplementary Material
→ V1 connection strength on ‘‘standard’’ trials (where the Supplementary material can be found at: http://www.cercor.oxfordjournals.org/
prediction by the auditory cue was correct), and an increase on
‘‘deviant’’ trials where the visual outcome did not match the
prediction by the auditory cue. In the context of predictive Funding
coding, learning involves a more efficient suppression of sensory
Wellcome Trust (ref: 0856780/Z/99/B); Wellcome Trust PhD
events, which is manifest by an apparent reduction in evoked
studentship (ref: 078047/ZS/04/Z) supported H.D.O.; and
responses, mediated by top-down predictions (which explain
University Research Priority Program ‘‘Foundations of Human
away bottom-up sensory afferents). Within the framework of
Social Interactions’’ at the University of Zurich supported K.E.S.
our bilinear DCM, this is modeled as a decrease in top-down
effective connectivity for visual stimuli that match the current



prediction. Notes
We thank Quentin Huys for helpful discussions of the manuscript.
Limitations and Future Directions Conflicts of Interest: None declared.
We conclude this article by discussing a number of limitations Address correspondence to Hanneke den Ouden, Wellcome Trust
of the present study. First, because we wished to study brain Centre for Neuroimaging, Institute of Neurology, UCL, 12 Queen
Square, London, UK WC1N 3BG. Email: h.denouden@fil.ion.ucl.ac.uk.
responses to stimulus associations that were irrelevant to
behavior, we did not obtain behavioral evidence for learning.
References
Instead, as with the MMN paradigm described above, learning is
characterized neurophysiologically as a change in activity over Anderson JL, Hutton C, Ashburner J, Turner R, Friston K. 2001. Modeling
geometric deformations in EPI time series. Neuroimage. 13:
time. We are currently conducting similar experiments with
903--919.
stimuli that do require a behavioral response, providing us with Aron AR, Shohamy D, Clark J, Myers C, Gluck MA, Poldrack RA. 2004.
a behavioral assessment of the learning process. It might be Human midbrain sensitivity to cognitive feedback and uncertainty
useful to emphasize that a neurophysiological characterization during classification learning. J Neurophysiol. 92:1144--1152.
of incidental associative learning processes, only requires that Baier B, Kleinschmidt A, Muller NG. 2006. Cross-modal processing in
the statistical associations between the CS/US stimuli are early visual and auditory cortices depends on expected statistical
irrelevant for task performance. In contrast, it is not essential relationship of multisensory information. J Neurosci. 26:
12260--12265.
that the CS and US stimuli themselves are behaviorally Baldeweg T. 2006. Repetition effects to sounds: evidence for predictive
irrelevant. In fact, in our experiment these stimuli have some coding in the auditory system. Trends Cogn Sci. 10:93--94.
behavioral relevance insofar as they constitute distractors to Blakemore SJ, Wolpert DM, Frith CD. 1998. Central cancellation of self-
which responses must be suppressed. produced tickle sensation. Nat Neurosci. 1:635--640.
A second limitation is that the magnitude of the learning Brogden WJ. 1939. Sensory preconditioning. J Exp Psychol. 25:323--332.
effects (i.e., changes in A1 / V1 connection strength in the Büchel C, Coull JT, Friston KJ. 1999. The predictive value of changes in
effective connectivity for human learning. Science. 283:1538--1541.
range of 2--8%) was rather modest at the single-subject level.
Corlett PR, Aitken MR, Dickinson A, Shanks DR, Honey GD, Honey RA,
This is likely to be due to the incidental nature of the learning Robbins TW, Bullmore ET, Fletcher PC. 2004. Prediction error
in the present study, with attention being directed away from during retrospective revaluation of causal associations in humans:
stimulus associations and none of the subjects noticing the fMRI evidence in favor of an associative model of learning. Neuron.
contingencies. However, the expression of these learning 44:877--888.
effects was highly consistent across subjects. Csepe V, Karmos G, Molnar M. 1987. Evoked potential correlates of
stimulus deviance during wakefulness and sleep in cat—animal
Finally, the dynamic causal model presented here does not
model of mismatch negativity. Electroencephalogr Clin Neuro-
make any assumptions about where in the brain the predicted physiol. 66:571--578.
associative strength is calculated; that is, which brain area David O, Kiebel SJ, Harrison LM, Mattout J, Kilner JM, Friston KJ. 2006.
exerts the modulatory influence onto the A1 / V1 connec- Dynamic causal modeling of evoked responses in EEG and MEG.
tion. Given the responses that we observed in the putamen, it is Neuroimage. 30:1255--1272.
possible that the modulation of the A1 / V1 connection is Daw ND, Niv Y, Dayan P. 2005. Uncertainty-based competition between
mediated via this region. Testing this hypothesis, however, prefrontal and dorsolateral striatal systems for behavioral control.
Nat Neurosci. 8:1704--1711.
requires the inclusion of nonlinear terms in the neuronal state Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K,
equation of DCM which goes beyond its bilinear mathematical Zilles K. 2005. A new SPM toolbox for combining probabilistic
framework. However, very recently, there has been methodo- cytoarchitectonic maps and functional imaging data. Neuroimage.
logical progress in nonlinear extensions of DCM (Stephan, 25:1325--1335.
Harrison, et al. 2007), and once this approach is firmly Falchier A, Clavagnier S, Barone P, Kennedy H. 2002. Anatomical
established and accepted, it should be possible to investigate evidence of multimodal integration in primate striate cortex. J
Neurosci. 22:5749--5759.
the source of the modulatory influences we observed.
Fletcher PC, Anderson JM, Shanks DR, Honey R, Carpenter TA,
Notwithstanding this limitation, the current study has pre- Donovan T, Papadakis N, Bullmore ET. 2001. Responses of human
sented a novel combination of dynamic system models and frontal cortex to surprising events are predicted by formal
formal learning theory, which were used to model human associative learning theory. Nat Neurosci. 4:1043--1048.

Friston K. 2003. Learning and inference in the brain. Neural Netw. 16:1325--1352.
Friston K. 2005. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci. 360:815--836.
Friston K, Kilner J, Harrison L. 2006. A free energy principle for the brain. J Physiol Paris. 100:70--87.
Friston KJ, Glaser DE, Henson RN, Kiebel S, Phillips C, Ashburner J. 2002. Classical and Bayesian inference in neuroimaging: applications. Neuroimage. 16:484--512.
Friston KJ, Harrison L, Penny W. 2003. Dynamic causal modelling. Neuroimage. 19:1273--1302.
Garrido MI, Kilner JM, Kiebel SJ, Stephan KE, Friston KJ. 2007. Dynamic causal modelling of evoked potentials: a reproducibility study. Neuroimage. 36:571--580.
Gewirtz JC, Davis M. 2000. Using Pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. Learn Mem. 7:257--266.
Gläscher J, Büchel C. 2005. Formal learning theory dissociates brain regions with different temporal integration. Neuron. 47:295--306.
Hebb DO. 1949. The organisation of behaviour. New York: John Wiley.
Jensen J, Smith AJ, Willeit M, Crawley AP, Mikulis DJ, Vitcu I, Kapur S. 2007. Separate brain regions code for salience vs. valence during reward prediction in humans. Hum Brain Mapp. 28:294--302.
Mackintosh NJ. 1975. A theory of attention: variations in the associability of stimulus with reinforcement. Psychol Rev. 82:276--298.
Maldjian JA, Laurienti PJ, Kraft RA, Burdette JH. 2003. An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage. 19:1233--1239.
McClure SM, Berns GS, Montague PR. 2003. Temporal prediction errors in a passive learning task activate human striatum. Neuron. 38:339--346.
McIntosh AR, Cabeza RE, Lobaugh NJ. 1998. Analysis of neural interactions explains the activation of occipital cortex by an auditory stimulus. J Neurophysiol. 80:2790--2796.
McLaren IP, Kaye H, Mackintosh NJ. 1989. An associative theory of the representation of stimuli: applications to perceptual learning and latent inhibition. In: Morris RGM, editor. Parallel distributed processing: implications for psychology and neurobiology. Oxford: Clarendon Press. p. 102--120.
Menon M, Jensen J, Vitcu I, Graff-Guerrero A, Crawley A, Smith MA, Kapur S. 2007. Temporal difference modeling of the blood-oxygen level dependent response during aversive conditioning in humans: effects of dopaminergic modulation. Biol Psychiatry. 62:765--772.
Montague PR, Berns GS. 2002. Neural economics and the biological substrates of valuation. Neuron. 36:265--284.
Murray SO, Kersten D, Olshausen BA, Schrater P, Woods DL. 2002. Shape perception reduces activity in human primary visual cortex. Proc Natl Acad Sci USA. 99:15164--15169.
Naatanen R, Gaillard AW, Mantysalo S. 1978. Early selective-attention effect on evoked potential reinterpreted. Acta Psychol (Amst). 42:313--329.
Naatanen R, Tervaniemi M, Sussman E, Paavilainen P, Winkler I. 2001. “Primitive intelligence” in the auditory cortex. Trends Neurosci. 24:283--288.
O’Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. 2004. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 304:452--454.
O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. 2003. Temporal difference models and reward-related learning in the human brain. Neuron. 38:329--337.
Pearce JM, Bouton ME. 2001. Theories of associative learning in animals. Annu Rev Psychol. 52:111--139.
Pearce JM, Hall G. 1980. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev. 87:532--552.
Penny WD, Stephan KE, Mechelli A, Friston KJ. 2004. Comparing dynamic causal models. Neuroimage. 22:1157--1172.
Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. 2006. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 442:1042--1045.
Pincze Z, Lakatos P, Rajkai C, Ulbert I, Karmos G. 2002. Effect of deviant probability and interstimulus/interdeviant interval on the auditory N1 and mismatch negativity in the cat auditory cortex. Brain Res Cogn Brain Res. 13:249--253.
Rao RP, Ballard DH. 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. 2:79--87.
Rescorla RA, Wagner AR. 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: current research and theory. New York: Appleton Century Crofts. p. 64--99.
Rockland KS, Ojima H. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. Int J Psychophysiol. 50:19--26.
Schultz W. 1998. Predictive reward signal of dopamine neurons. J Neurophysiol. 80:1--27.
Schultz W, Dayan P, Montague PR. 1997. A neural substrate of prediction and reward. Science. 275:1593--1599.
Schultz W, Dickinson A. 2000. Neuronal coding of prediction errors. Annu Rev Neurosci. 23:473--500.
Seymour B, O’Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS. 2004. Temporal difference models describe higher-order learning in humans. Nature. 429:664--667.
Shanks DR. 1995. The psychology of associative learning. Cambridge, UK: Cambridge University Press.
Shergill SS, Samson G, Bays PM, Frith CD, Wolpert DM. 2005. Evidence for sensory prediction deficits in schizophrenia. Am J Psychiatry. 162:2384--2386.
Stephan KE. 2004. On the role of general system theory for functional neuroimaging. J Anat. 205:443--470.
Stephan KE, Harrison LM, Kiebel SJ, David O, Penny WD, Friston KJ. 2007. Dynamic causal models of neural system dynamics: current state and future extensions. J Biosci. 32:129--144.
Stephan KE, Marshall JC, Penny WD, Friston KJ, Fink GR. 2007. Interhemispheric integration of visual processing during task-driven lateralization. J Neurosci. 27:3512--3522.
Summerfield C, Egner T, Greene M, Koechlin E, Mangels J, Hirsch J. 2006. Predictive codes for forthcoming perception in the frontal cortex. Science. 314:1311--1314.
Sutton RS, Barto AG. 1998. Reinforcement learning: an introduction. Cambridge (MA): MIT Press.
Tobler PN, O’Doherty JP, Dolan RJ, Schultz W. 2007. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J Neurophysiol. 97:1621--1632.
Turner DC, Aitken MR, Shanks DR, Sahakian BJ, Robbins TW, Schwarzbauer C, Fletcher PC. 2004. The role of the lateral frontal cortex in causal associative learning: exploring preventative and super-learning. Cereb Cortex. 14:872--880.
von Kriegstein K, Giraud AL. 2006. Implicit multisensory associations influence voice recognition. PLoS Biol. 4:e326.
Wasserman EA, Elek SM, Chatlosh DL, Baker AG. 1993. Rating causal relations: Role of probability in judgments of response-outcome contingency. J Exp Psychol Learn Mem Cogn. 19:174--188.
Watkins S, Shams L, Tanaka S, Haynes JD, Rees G. 2006. Sound alters activity in human V1 in association with illusory visual perception. Neuroimage. 31:1247--1256.
Wittmann BC, Bunzeck N, Dolan RJ, Duzel E. 2007. Anticipation of novelty recruits reward system and hippocampus while promoting recollection. Neuroimage. 38:194--202.
Wolpert DM, Ghahramani Z, Jordan MI. 1995. An internal model for sensorimotor integration. Science. 269:1880--1882.
Zink CF, Pagnoni G, Chappelow J, Martin-Skurski M, Berns GS. 2006. Human striatal activation reflects degree of stimulus saliency. Neuroimage. 29:977--983.
