Vanilla PP For Philosophers: A Primer On Predictive Processing
Wiese, W. & Metzinger, T. (2017). Vanilla PP for Philosophers: A Primer on Predictive Processing. In T. Metzinger & W. Wiese (Eds.), Philosophy and Predictive Processing: 1. Frankfurt am Main: MIND Group. doi: 10.15502/9783958573024
Wenn die Anschauung sich nach der Beschaffenheit der Gegenstände richten müßte, so sehe ich
nicht ein, wie man a priori von ihr etwas wissen könne; richtet sich aber der Gegenstand (als Ob-
jekt der Sinne) nach der Beschaffenheit unseres Anschauungsvermögens, so kann ich mir diese
Möglichkeit ganz wohl vorstellen. (Kant 1998[1781/87], B XVII)2
One thing Kant emphasizes at this point in the Critique of Pure Reason is that our intuitions (An-
schauungen), which constitute the sensory material on which acts of synthesis are performed, are not
sense-data that are simply given (cf. Brook 2013, § 3.2). They are not just received, but are also partly
shaped by the faculty of intuition (Anschauungsvermögen). In contemporary parlance, the idea can be
expressed as follows:
Classical theories of sensory processing view the brain as a passive, stimulus-driven device. By
contrast, more recent approaches emphasize the constructive nature of perception, viewing it as an
active and highly selective process. Indeed, there is ample evidence that the processing of stimuli
is controlled by top-down influences that strongly shape the intrinsic dynamics of thalamocortical
networks and constantly create predictions about forthcoming sensory events. (Engel et al. 2001,
p. 704)
1 At this point, one might have expected a reference to Helmholtz’ famous idea that perception is the result of unconscious inferences — we will refer to
this passage below. Helmholtz’ view on perception was heavily influenced by Kant (although Helmholtz seems to have emphasized the role of learn-
ing and experience more than Kant, see Lenoir 2006, pp. 201 & 203): “Dass die Art unserer Wahrnehmungen ebenso sehr durch die Natur unserer
Sinne, wie durch die äusseren Dinge bedingt sei, wird durch die angeführten Thatsachen sehr augenscheinlich an das Licht gestellt, und ist für die
Theorie unseres Erkenntnissvermögens von der höchsten Wichtigkeit. Gerade dasselbe, was in neuerer Zeit die Physiologie der Sinne auf dem Wege
der Erfahrung nachgewiesen hat, suchte Kant schon früher für die Vorstellungen des menschlichen Geistes überhaupt zu thun, indem er den Antheil
darlegte, welchen die besonderen eingeborenen Gesetze des Geistes, gleichsam die Organisation des Geistes, an unseren Vorstellungen haben.” (Von
Helmholtz 1855, p. 19). (Our translation: “These facts clearly show that the nature of our perceptions is as much constrained by the nature of our sens-
es as by external objects. This is of utmost importance for a theory of our epistemic faculty. The physiology of the senses has recently demonstrated, by
way of experience, exactly the same point that Kant earlier tried to show for the ideas of the human mind in general, by expounding the contribution
made by the special innate laws of the mind — the organization of the mind, as it were — to our ideas.”) An overview of PP’s Kantian roots can be
found in Swanson 2016.
2 “If intuition has to conform to the constitution of the objects, then I do not see how we can know anything of them a priori; but if the object (as an
object of the senses) conforms to the constitution of our faculty of intuition, then I can very well represent this possibility to myself.” (Kant 1998,
B xvii)
This is what we here call the first feature of predictive processing: Top-Down Processing. As can be
seen, the idea that perception is partly driven by top-down processes is not new (which is not to deny
that dominant theories of perception have for a long time marginalized their role). The novel con-
tribution of PP is that it puts an extreme emphasis on this idea, depicting the influence of top-down
processing and prior knowledge as a pervasive feature of perception, which is not only present in cases
in which the sensory input is noisy or ambiguous, but all the time.3 According to PP, one’s brain con-
stantly forms statistical estimates, which function as representations4 of what is currently out there in
the world (feature #2, Statistical Estimation), and these estimates are hierarchically organized (track-
ing features at different spatial and temporal scales; feature #3, Hierarchical Processing).5 The brain
uses these representations to predict current (and future) sensory input and the source of it, which
is possible because estimates at different levels of the hierarchy are predictive of each other (feature
#4, Prediction). Mismatches between predictions and actual sensory input are not used passively to
form percepts, but only to inform updates of representations which have already been created (thereby
anticipating, to the extent possible, incoming sensory signals). The goal of these updates is to mini-
mize the prediction error resulting from the prediction (feature #5, Prediction Error Minimization
(PEM)), in such a way that updates conform to the norms of Bayesian Inference (feature #6; more on
this below). The computational principle of PEM is a general principle to which all processing in the
brain conforms (at all levels of the hierarchy posited by PP). From this, it is only a small step towards
describing processing in the brain as a controlled online hallucination:6
[A] fruitful way of looking at the human brain, therefore, is as a system which, even in ordinary
waking states, constantly hallucinates at the world, as a system that constantly lets its internal
autonomous simulational dynamics collide with the ongoing flow of sensory input, vigorously
dreaming at the world and thereby generating the content of phenomenal experience. (Metzinger
2004[2003], p. 52)
Note that the contents of phenomenal experience are only part of what is, according to PP, generated
through the hierarchically organized process of prediction error minimization (most contents will be
unconscious). Summing up the first six core features described above, and adding the seventh feature,
we can now give a concise definition of what is called predictive processing in this collection (we will
enrich the definition with features 8-12 below):
3 Of course, it is an interesting question to what extent Kant himself saw active (top-down) influences on intuitions (Anschauungen) as a perva-
sive feature. At least some passages in the Critique of Pure Reason suggest that Kant laid more emphasis on the (top-down) influences exerted by
our faculty of cognizing (the spontaneity of concepts):
“Unsere Erkenntnis entspringt aus zwei Grundquellen des Gemüts, deren die erste ist, die Vorstellungen zu empfangen (die Rezeptivität der Eindrücke),
die zweite das Vermögen, durch diese Vorstellungen einen Gegenstand zu erkennen (Spontaneität der Begriffe); durch die erstere wird uns ein Gegen-
stand gegeben, durch die zweite wird dieser im Verhältnis auf jene Vorstellung (als bloße Bestimmung des Gemüts) gedacht.” (Kant 1998[1781/87], B 74).
“Our cognition arises from two fundamental sources in the mind, the first of which is the reception of representations (the receptivity of impressions),
the second the faculty for cognizing an object by means of these representations (spontaneity of concepts); through the former an object is given to us,
through the latter it is thought in relation to that representation (as a mere determination of the mind).” (Kant 1998, B 74). However, a serious inves-
tigation of this question would have to focus on the influence of unconscious representations on the forming of intuitions (see Giordanetti et al. 2012).
4 The use of the word “representation” is not completely uncontroversial here. There is some debate about whether PP posits representations, and if so
how best to describe them (see Gładziejewski 2016; Downey 2017; Dołega 2017). It is at least possible, however, to treat representationalist descrip-
tions of the posits entailed by PP as a representational (or intentional) gloss (cf. Egan 2014; Anderson 2017). So, although we acknowledge that some
would disagree, we believe it is useful to describe the estimates posited by PP as representations, at least for the purposes of this primer (even if some
authors would argue that these posits are not representations in a strong sense).
5 This hierarchy of estimates entails a hierarchical generative model. A generative model is the joint distribution of a collection of random variables (see
glossary). A hierarchical generative model corresponds to a hierarchy of random variables, where variables at non-adjacent levels are conditionally
independent (this can, for instance, represent a hierarchy of causally related objects or events, see Drayson 2017). The hierarchy of estimates posited
by PP tracks the values of a hierarchy of random variables. A heuristic illustration of a generative model can be found in the introduction to (Clark
2016). We are grateful to Chris Burr for reminding us to mention generative models.
6 Horn (Horn 1980, p. 373) ascribes the idea that “vision is a controlled hallucination” to Clowes (Clowes 1971). The only published statement by
Clowes which comes near this formulation seems however to be: “People see what they expect to see” (Clowes 1969, p. 379; cf. Sloman 1984). More
recently, a similar idea has been put forward by Grush (Grush 2004, p. 395; he ascribes it to Ramesh Jain): “The role played by sensation is to constrain
the configuration and evolution of this representation. In motto form, perception is a controlled hallucination process.”
You are imprisoned in the control room of a giant robot. […] The robot inhabits a dangerous
world, with many risks and opportunities. Its future lies in your hands, and so, of course, your own
future as well depends on how successful you are in piloting your robot through the world. If it is
destroyed, the electricity in this room will go out, there will be no more food in the fridge, and you
will die. Good luck! (Dennett 2013, p. 102)
7 The first three parts of this definition correspond roughly with the definition offered by Clark (Clark 2013a, p. 202; Clark 2015, p. 5). In (Clark 2013a),
Clark also introduces the notion of action-oriented PP (which incorporates the fourth aspect of the definition offered here). These four features are
also central to Hohwy’s exposition of prediction error minimization (see the first four chapters in Hohwy 2013).
8 More on this below. Note that it is possible to develop PP accounts without invoking FEP (so in a way PP is independent of FEP), but PP can be
incorporated into FEP (see Friston and Kiebel 2009), so prediction error minimization can be construed as a way of minimizing free energy (which
would then be a special case of FEP).
9 “The psychic activities that lead us to infer that there in front of us at a certain place there is a certain object of a certain character, are generally not
conscious activities, but unconscious ones. In their result they are equivalent to a conclusion, to the extent that the observed action on our senses
enables us to form an idea as to the possible cause of this action; although, as a matter of fact, it is invariably simply the nervous stimulations that are
perceived directly, that is, the actions, but never the external objects themselves.” (Von Helmholtz 1985[1925], p. 4).
The person inside the robot has only indirect access to the world, via the robot’s sensors, and the ef-
fects of executed actions cannot be known but have to be inferred. This illustrates the feature we call
Environmental Seclusion (feature #8). Environmental Seclusion is not a computational feature but
an epistemological one, yet it appears in descriptions of the problems to which PP computations pro-
vide a solution.10 To find out what the different signals received by the robot mean, the person inside
has to form a hypothesis about their hidden causes. The problem of inferring the causes of sensory
signals is an inverse problem, because it requires inverting the mapping from (external, hidden) causes
to (sensory) effects. This is a difficult problem (to say the least), because the same effect could have
multiple causes.11 So even if the relationship between causes C and effects E could be described by a
deterministic mapping, ƒ: C → E, the inverse mapping, ƒ⁻¹: E → C, would not usually exist. How does
the brain solve this problem?
A first observation is that the cause of a sensory effect is underdetermined by the effect, so prior
information has to be used to make a good guess about the hidden cause. Furthermore, if we know
how the sensory apparatus is affected by external causes, it is easier to infer sensory effects, given in-
formation about external causes, than the other way around. So if we have some information about
hidden causes, we can form a prediction of their sensory effects. This prediction can be compared to
the actual sensory signal, and the extent to which the two differ from each other, i.e., the size of the
prediction error gives us a hint as to the quality of our estimate of the hidden cause. We can update
this estimate, compute a new prediction, again compare it with current sensory signals, and thereby
(hopefully) minimize the prediction error. Ideally, it does not even matter much if our first estimate of the
hidden cause is poor: by repeatedly computing predictions and prediction errors, and by updating our
estimate accordingly, we can become more and more confident that we have found a good representation
of the hidden cause.
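To make this iterative strategy concrete, here is a minimal sketch (our own illustration, not taken from the PP literature) in which an estimate of a single hidden cause is refined by gradient descent on the squared prediction error; the generative mapping g, the initial guess, and the learning rate are all stipulated for the example.

```python
def g(cause):
    # Stipulated generative mapping from a hidden cause to a sensory signal
    # (e.g., from an object's reflectance to a measured light intensity).
    return 2.0 * cause + 1.0

true_cause = 3.0
sensory_signal = g(true_cause)           # what the senses actually register

estimate = 0.0                           # a deliberately poor first guess
learning_rate = 0.05

for _ in range(200):
    prediction = g(estimate)             # top-down prediction of the signal
    error = sensory_signal - prediction  # prediction error
    # Reduce the squared prediction error by gradient descent on the estimate;
    # the factor 2.0 is the derivative of g with respect to the cause.
    estimate += learning_rate * error * 2.0

print(round(estimate, 3))                # converges towards the true cause, 3.0
```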
Let us illustrate this strategy with the following simple example. A teacher enters the classroom
and finds a piece of paper on his desk, with the message “The teacher is an impostor. He doesn’t even
really exist.” The message has been written with a fountain pen, in blue, which (as the teacher knows)
excludes many of his students. To find the culprit, the teacher asks all students using fountain pens
with blue ink to come to the front and, using their own pen, to write something on a piece of paper.
As it turns out, this involves only three students, A, B, & C, and all use ink of different brands (which
makes them distinguishable). The teacher can now form an educated guess about the hidden cause of
the message (“The teacher is an impostor. He doesn’t even really exist.”): he assumes that student A is
the culprit, and asks A to write down the same message. This can be seen as a prediction of the actual
message, and by comparing them the teacher evaluates his guess about its hidden cause. If the ink
looks the same there is no prediction error, and the estimate of the hidden cause does not have to be
changed — the culprit has been found. If there is a difference, he can update his estimate, by assuming
that, say, B has produced the message. By constantly forming predictions (messages written by the
suspects) and comparing them with the actual sensory signal (the message on the desk), the teacher
eventually minimizes the prediction error and finds the true culprit.
There are a lot of differences between this fictive scenario and the situation in which the brain finds
itself. One is that the example involves personal-level agency (just like Dennett’s giant robot thought
10 Here are some examples: “For example, during visual perception the brain has access to information, measured by the eyes, about the spatial dis-
tribution of the intensity and wavelength of the incident light. From this information the brain needs to infer the arrangement of objects (the
causes) that gave rise to the perceived image (the outcome of the image formation process).“ (Spratling 2016, p. 1 preprint).
“The first of these (the widespread, top-down use of probabilistic generative models for perception and action) constitutes a very substan-
tial, but admittedly quite abstract, proposal: namely, that perception and […] action both depend upon a form of ‘analysis by synthesis’
in which observed sensory data is explained by finding the set of hidden causes that are the best candidates for having generated that sensory
data in the first place.“ (Clark 2013a, p. 234; but see Clark in press, for a qualified view).
“Similarly, the starting point for the prediction error account of unity is one of indirectness: from inside the skull the brain has to infer the hidden
causes of its sensory input“ (Hohwy 2013, p. 220).
11 For this reason, the problem can also be described as an ill-posed problem (see Spratling 2016), but some authors would regard the problem of finding
out how to solve this problem as ill-posed (see Anderson 2017).
experiment): the teacher tests the hypothesis that, say, student A is the culprit by asking A to write
down a message. Furthermore, the number of possible hidden causes is finite, and the “prediction
error” tells the teacher only that a particular student is not implicated. It does not contain any further
information about the culprit; it only excludes one of the suspects. The brain cannot go through all
possible hypotheses one by one, because there are (potentially) infinitely many possible hidden causes in the
world. Furthermore, the world is changing, so representations of hidden causes have to be dynamic:
adapting to, and anticipating, all (relevant and predictable) changes in the environment. Finally, to be
more realistic, the teacher example would have to be extended such that the teacher forms predictions
about all his sensory inputs all the time. Just as he could infer the causal interactions leading up to the
note, he can infer all the causal goings-on around him all the time (including his own influence on the
sensory stream).
But why do we need PEM if we have Bayesian inference? The answer is that Bayesian inference can
be computationally complex, even intractable. In simple cases, it is possible to compute the posterior
analytically; in other cases, it has to be approximated. In yet other cases, it may be possible to compute
the posterior, but what one would really like to have is the maximizer of the posterior (for instance, a
single hypothesis that is most likely, after having taken the new evidence into account). Finding the
maximizer may again be computationally demanding and can require approximative methods. Some
approximative methods involve prediction error minimization. While the motivation for Bayesian
inference is independent of prediction error minimization, once Bayesian inference is regarded as a
solution to the problem of perception, prediction error minimization can provide a solution to the
problem of computing Bayesian updates.
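The relationship between exact Bayesian updating and prediction error minimization can be illustrated with a small sketch (a toy Gaussian model of our own; prior, likelihood, and learning rate are stipulated): the exact posterior mean and the value reached by iteratively descending on precision-weighted prediction errors coincide.

```python
# Toy Gaussian model (all numbers stipulated): hidden cause c with a Gaussian
# prior, and a noisy sensory signal s generated around c.
prior_mean, prior_var = 0.0, 4.0      # p(c)   = N(0, 4)
noise_var = 1.0                       # p(s|c) = N(c, 1)
s = 2.5                               # observed sensory signal

# Exact Bayesian update (conjugate Gaussian case).
posterior_precision = 1.0 / prior_var + 1.0 / noise_var
posterior_mean = (prior_mean / prior_var + s / noise_var) / posterior_precision

# The same value found by iteratively minimizing precision-weighted prediction
# errors (gradient ascent on the log posterior).
c = 0.0
for _ in range(1000):
    sensory_error = (s - c) / noise_var         # precision-weighted error
    prior_error = (prior_mean - c) / prior_var  # precision-weighted error
    c += 0.01 * (sensory_error + prior_error)

print(round(posterior_mean, 3), round(c, 3))    # both approximately 2.0
```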
Note that Bayesian inference also works for hierarchical models. Assuming that variables at non-ad-
jacent levels in the hierarchy are conditionally independent, estimates can be updated in parallel at
the different levels (cf. Friston 2003, p. 1342), which ideally yields a globally consistent set of estimates
(in practice, things are complicated, as Lee and Mumford 2003, p. 1437, point out). Here, the idea is
that most objects in the world do not directly influence each other causally, but they are still objects
in the same world, which means that causal interactions between two arbitrary objects are usually me-
diated by other objects. Similarly, different features of a single object are not completely independent,
because they are features of the same object, but this does not mean representations of these features
must always be jointly processed. For instance, a blue disc can be represented by representing a certain
color (blue) at a certain place, and a certain shape (a disc) at the same place. Information about the
location of the color gives me information about the location of the shape. If I have a separate repre-
sentation of the disc’s location, however, I can treat the color and the shape as (conditionally) inde-
pendent, i.e., given the disc’s location, information about the color does not give me new information
about the shape. Computationally, this allows for sparser representations, which may also be reflected
by functional segregation in the brain (cf. Friston and Buzsáki 2016, who explore this with a focus on
the temporal domain).
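The disc example can be spelled out in a small sketch (a constructed toy distribution, not taken from the literature): because the joint distribution factorizes as p(location) p(color | location) p(shape | location), the color carries no further information about the shape once the location is given.

```python
# A constructed joint distribution over (location, color, shape) in which color
# and shape are conditionally independent given location:
# p(l, c, s) = p(l) * p(c | l) * p(s | l)
p_loc = {'left': 0.5, 'right': 0.5}
p_color = {'left': {'blue': 0.9, 'red': 0.1}, 'right': {'blue': 0.2, 'red': 0.8}}
p_shape = {'left': {'disc': 0.7, 'square': 0.3}, 'right': {'disc': 0.4, 'square': 0.6}}

def joint(l, c, s):
    return p_loc[l] * p_color[l][c] * p_shape[l][s]

# Given the location, the color carries no further information about the shape:
# p(s | l, c) equals p(s | l) for every combination.
for l in p_loc:
    for c in p_color[l]:
        for s in p_shape[l]:
            p_s_given_lc = joint(l, c, s) / sum(joint(l, c, s2) for s2 in p_shape[l])
            assert abs(p_s_given_lc - p_shape[l][s]) < 1e-12

print("given the location, color and shape are conditionally independent")
```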
PP may apply more naturally to interoception (the sense of the internal physiological condition of
the body) than to exteroception (the classic senses, which carry signals that originate in the exter-
nal environment). This is because for an organism it is more important to avoid encountering un-
expected interoceptive states than to avoid encountering unexpected exteroceptive states. A level
of blood oxygenation or blood sugar that is unexpected is likely to be bad news for an organism,
whereas unexpected exteroceptive sensations (like novel visual inputs) are less likely to be harmful
and may in some cases be desirable […]. (Seth 2015, p. 9)
Clearly, the goal of interoceptive inference is not simply to infer the internal condition of the body, but
to enable predictive control of vital parameters like blood oxygenation or blood sugar (feature #7). Seth
provides the following example. When the brain detects a decline in blood sugar through interocep-
tive inference, the resulting percept (a craving for sugary things) will lead to prediction errors that can only be minimized by acting on the world (e.g., by eating something sweet), thereby bringing blood sugar back towards its expected level. As Friston puts it:
Agents can suppress free energy by changing the two things it depends on: they can change sensory
input by acting on the world or they can change their recognition density by changing their internal
states. This distinction maps nicely onto action and perception […]. (Friston 2010, p. 129)
In short, the error between sensory signals and predictions of sensory signals (derived from internal
estimates) can be minimized by changing internal estimates and by changing sensory signals (through
action). What this suggests is that the same internal representations which become active in percep-
tion can also be deployed to enable action. This means that there is not only a common data-format,
but also that at least some of the representations that underpin perception are numerically identical
with representations that underpin action.
This assumption is already present in James’ ideomotor theory (James 1890),13 the core of which is
summed up as follows by James: “[T]he idea of the movement M’s sensory effects will have become
an immediately antecedent condition to the production of the movement itself.” (James 1890, p. 586;
italics omitted). More recently, this has been picked up by common coding accounts of action repre-
sentation (see Hommel et al. 2001; Hommel 2015; Prinz 1990).14 The basic idea is always similar: The
neural representations of hidden causes in the world overlap with the neural underpinnings of action
preparation (which means parts of them are numerically identical). In other words, there are “ideo-
motor” representations, which can function both as percepts and as motor commands.15
Computationally, the Ideomotor Principle (feature #9) exploits a formal duality between action
and perception. The duality is this: If I can perceptually access a state of affairs p, this means p has
some perceivable consequences (or constituents) c; action is goal-oriented, so by performing an action
I want to bring about some state of affairs p. This means action can also be described as a process in
which the perceivable consequences c of p are brought about, and perception can be described as the
process by which the causes of a hypothetical action (which brings about p, and thereby c) are inferred
13 Another precursor of the idea can be found in the works of Herbart (Herbart 1825, pp. 464 f.) and Lotze (Lotze 1852, pp. 313 f.).
14 A review of ideomotor approaches can be found in (Badets et al. 2014). A historical overview can be found in (Stock and Stock 2004).
15 Strictly speaking, ideomotor representations are sometimes just regarded as late (high-level) contributions to perception, and as the (early) precursors
of action (in the following quotation, “TEC” denotes the theory of event coding (TEC)): “TEC does not consider the complex machinery of the ‘early’
sensory processes that lead to them. Conversely, as regards action, the focus is on ‘early’ cognitive antecedents of action that stand for, or represent,
certain features of events that are to be generated in the environment (= actions). TEC does not consider the complex machinery of the ‘late’ motor
processes that subserve their realization (i.e., the control and coordination of movements). Thus, TEC is meant to provide a framework for under-
standing linkages between (late) perception and (early) action, or action planning.” (Hommel et al. 2001, p. 849)
(for a rigorous description of this idea, see Todorov 2009). The computational benefits of this dual
perspective are reaped in the notion of active inference (developed by Friston and colleagues):
In this picture of the brain, neurons represent both cause and consequence: They encode con-
ditional expectations about hidden states in the world causing sensory data, while at the same
time causing those states vicariously through action. [...] In short, active inference induces a cir-
cular causality that destroys conventional distinctions between sensory (consequence) and motor
(cause) representations. This means that optimizing representations corresponds to perception or
intention, i.e. forming percepts or intents. (Friston et al. 2011, p. 138) 16
Active inference is often distinguished from perceptual inference. Since both are realized by minimiz-
ing prediction error, however, and since their implementations may not be neatly separable, active
inference is also used as a more generic term, especially by Friston and colleagues. In the context of
the free-energy principle (see below), it denotes the computational processes which minimize free
energy and underpin both action and perception: “Active inference — the minimisation of free ener-
gy through changing internal states (perception) and sensory states by acting on the world (action).”
(Friston et al. 2012a, p. 539).17
Common to both action and perception is (unconscious, approximatively Bayesian) inference.
Since neural structures underpinning action and perception, respectively, are assumed to overlap, ac-
tive and perceptual inference work in tandem.18 This updated and enriched version of the Ideomotor
Principle thereby provides a unifying perspective on action and perception, while its deeper implica-
tions and challenges are only beginning to be explored.19
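As a deliberately simplified sketch of this division of labor (our own toy illustration, far removed from Friston’s formalism; it assumes idealized, noiseless sensing, and all quantities and step sizes are stipulated), the same prediction error can be reduced either by revising the internal estimate or by changing the state of the world that generates the sensory signal.

```python
def prediction_error(prediction, sensory_signal):
    # The single quantity that both routes try to reduce.
    return sensory_signal - prediction

# A mismatch between what is predicted and what is sensed
# (idealized sensing: the sensory signal just is the world state).
internal_estimate, world_state = 0.0, 5.0

# Perceptual inference: change the internal estimate until it fits the signal.
for _ in range(50):
    internal_estimate += 0.2 * prediction_error(internal_estimate, world_state)

# Active inference: treat an estimate as a goal (an intent), keep it fixed,
# and act on the world until the sensed signal fits the prediction instead.
goal = 3.0
for _ in range(50):
    world_state -= 0.2 * prediction_error(goal, world_state)

print(round(internal_estimate, 2), round(world_state, 2))  # 5.0 and 3.0
```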
Following Friston et al. (Friston et al. 2011, p. 138), such an ideomotor representation N could function both as a percept and as an in-
tent, though it usually only functions as one of them. So, unless I suffer from echopraxia, perceiving a
movement will not usually cause me to move in the same way (although there are situations in which
persons do mimic each other to some extent, see Quadt 2017). This can be accounted for within the
framework of PP by noting the following: the hypothesis that I am scratching my chin will yield pro-
prioceptive and other sensory predictions (which describe, for example, the states of my muscles when
my arm is moving). Unless I am in fact scratching my chin, these predictions will be in conflict with
sensory signals, so there will be a large prediction error, which will lead to an update of the hypothesis
that I am scratching my chin. In other words, the hypothesis cannot be sustained in the presence of
such prediction errors. So to enable movement, the precision estimates associated with sensory prediction
errors must be attenuated (down-weighted) by top-down modulation. Combining this with the hypothesis that
attention increases precision estimates, one could describe this as a process of attending away from
somatosensory signals. Conversely, attending to sensory stimuli should impair normal movement (see
Limanowski 2017).
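A toy sketch of the role played by precision (our own caricature, with stipulated numbers): with high sensory precision the motor hypothesis is quickly revised away by the prediction error, whereas with attenuated precision it persists and could therefore keep issuing the predictions that drive movement.

```python
def settle(hypothesis, sensory_signal, sensory_precision, steps=100):
    # The motor hypothesis predicts a proprioceptive signal equal to its own
    # value; the precision weights how strongly the resulting prediction error
    # revises the hypothesis.
    for _ in range(steps):
        error = sensory_signal - hypothesis
        hypothesis += 0.1 * sensory_precision * error
    return hypothesis

arm_at_rest = 0.0        # current proprioceptive input: no movement yet
motor_hypothesis = 1.0   # "I am scratching my chin"

# High sensory precision: the error wins and the hypothesis is revised away.
print(round(settle(motor_hypothesis, arm_at_rest, sensory_precision=1.0), 2))   # about 0.0

# Attenuated sensory precision ("attending away"): the hypothesis persists and
# can keep issuing the proprioceptive predictions that would drive movement.
print(round(settle(motor_hypothesis, arm_at_rest, sensory_precision=0.01), 2))  # about 0.9
```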
This connection between action and attention is also exploited in accounts of self-tickling (see Van
Doorn et al. 2014; Van Doorn et al. 2015). Deviances in precision estimates have been linked to atten-
tion and motor disorders, and raised in the context of autism and schizophrenia (see Gonzalez-Gadea
et al. 2015; Palmer et al. 2015; Van de Cruys et al. 2014; Friston et al. 2014; Adams et al. 2016). This
is only one example of how the PP approach may possess great heuristic fecundity and explanatory
power for cognitive neuropsychiatry and related fields.
[W]ir beobachten unter fortdauernder eigener Thätigkeit, und gelangen dadurch zur Kenntniss
des Bestehens eines gesetzlichen Verhältnisses zwischen unseren Innervationen und dem Präsent-
werden der verschiedenen Eindrücke aus dem Kreise der zeitweiligen Präsentabilien. Jede unserer
willkührlichen Bewegungen, durch die wir die Erscheinungsweise der Objecte abändern, ist als
ein Experiment zu betrachten, durch welches wir prüfen, ob wir das gesetzliche Verhalten der
vorliegenden Erscheinung, d.h. ihr vorausgesetztes Bestehen in bestimmter Raumordnung, richtig
aufgefasst haben. (Von Helmholtz 1959[1879/1887], p. 39)20
A classical application in the present debate is saccadic eye movements, which are now conceptualized
as an embodied form of hypothesis-testing (Friston et al. 2012b). Apart from these heuristic consid-
erations, if PP is on the right track we can ask if the brain literally engages in inference. This question
is answered in the affirmative by Alex Kiefer in his contribution to this collection (see Kiefer 2017). A
20 “We observe amid our own continuous activity, and thereby achieve knowledge of the existence of a lawful relation between our innervations and the
presence of different impressions of temporary presentations [Präsentabilien]. All of our voluntary movements through which we change the appear-
ance of things should be regarded as an experiment, through which we test whether we have grasped correctly the lawful behavior of the appearance
at hand, i.e. its supposed existence in determinate spatial structures.” (Own translation)
skeptical position is maintained by Jelle Bruineberg (see Bruineberg 2017 and Bruineberg et al. 2016).
The more general issue of how folk psychology and PP are related, and to what extent the scientific
usage of folk-psychological concepts may need to be revised, is discussed by Joe Dewhurst (see De-
whurst 2017).
2. if we observe an organism which is capable of surviving for a decent amount of time, at a ran-
dom point during its lifetime, it is very likely to be found in a situation from the second list (this
echoes the tautology at the beginning of this section).
We can re-express these two observations in a slightly more technical way. Let us call the set of all
possible states in which an organism could be its state space, where a state is defined by the current
sensory signals received by the organism’s sensory system. In principle, we can now define a probabil-
ity distribution over this state space which assigns probabilities to the different regions in this space
and describes how likely it is to find the organism in the respective regions during its lifetime. Certain
regions will have a high probability (e.g., a fish is likely to be found in water); others will have a low
probability (a fish is unlikely to be found outside of water). Furthermore, most regions of state space
will have a low probability (because there are so many deadly situations). Formally, this means that the
entropy of the probability distribution is low (it would be maximal if it assigned probabilities uniform-
ly to the different regions of state space; see below for a simple formal example). With this probability
distribution in hand, we can now make a bet on where in its state space the organism will be found,
if observed at an arbitrary moment during its lifetime. Since the distribution assigns extremely low
probabilities to most regions of state space, we can make a fairly precise guess (e.g., we can guess that
a fish will be in water, that a freshwater fish will be in fresh water, and so on).
Now consider the following. Throughout the lifetime of the fish, we take repeated samples of its
states and construct an empirical distribution using these samples. An empirical distribution assigns
probabilities which reflect the frequency with which samples were (randomly) drawn from the dif-
ferent regions. As a simple example, think of a device which produces one of two numbers, 0 and 1,
whenever a button is pushed, and the two numbers are produced with certain probabilities unknown
to the agent. It could be that both numbers are produced with the same probability (0.5), or that one
is produced much more frequently than the other (say, 0 is produced with probability 0.9, and 1 is
produced with probability 0.1). Every time one presses the button, one notes which number has been
produced (this is a single sample), and by counting how often each number is produced one can con-
struct an empirical distribution using the relative frequencies. For instance, if 14 out of 100 samples
are 0, and the other 86 samples are 1, the empirical distribution could assign the probability 0.14 to 0
and 0.86 to 1. The entropy of this distribution would be approximately 0.58.
But what is entropy in the first place? It is the average surprise of (in this case) the different outputs
of the device. Here, surprise (also called “surprisal”) is a technical notion for the negative logarithm
of an event’s probability. The average surprise (the entropy) is now computed as follows (using the base-2
logarithm, which yields the value of approximately 0.58 mentioned above): H = –[0.14 * log(0.14) + 0.86 *
log(0.86)]. If this quantity is low, it is because the surprise values of the individual
outcomes are low (or at least most of them). So to have a low entropy, the surprise of states must be
low at any time (or at least most of the time).
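These calculations can be reproduced in a few lines (a sketch using the made-up counts from the device example; the base-2 logarithm matches the value of 0.58 given above):

```python
import math
from collections import Counter

# The samples from the two-outcome device (counts as in the example above).
samples = [0] * 14 + [1] * 86

# Empirical distribution: relative frequencies of the two outcomes.
counts = Counter(samples)
probability = {outcome: n / len(samples) for outcome, n in counts.items()}

# Surprisal of each outcome (negative base-2 logarithm of its probability),
# and entropy as the probability-weighted average surprisal.
surprisal = {outcome: -math.log2(p) for outcome, p in probability.items()}
entropy = sum(p * surprisal[outcome] for outcome, p in probability.items())

print({k: round(v, 2) for k, v in surprisal.items()})  # {0: 2.84, 1: 0.22}
print(round(entropy, 2))                               # 0.58
```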
We can again apply this to the fish example. Most of the time, the fish will be in unsurprising states.
Given exhaustive knowledge about the fish, we can in principle describe the regions of the state space
in which the fish is likely to be found, and construct an “armchair” probability distribution that reflects
this knowledge. Or we can observe the fish and note the relative frequencies with which it is found in
different regions of its state space. In the long run, this empirical distribution should become more
and more similar to the “armchair” distribution. (This is an informal description of the ergodicity as-
sumption, which is a formally defined feature of certain random processes, see Friston 2009, p. 293.)
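This informal convergence claim can be illustrated with a toy simulation (our own sketch; the regions and their probabilities are invented, and this is not the formally defined ergodicity assumption):

```python
import random

random.seed(0)

# "Armchair" distribution over three coarse regions of the fish's state space.
armchair = {'fresh water': 0.9, 'salt water': 0.09, 'on land': 0.01}
states, weights = list(armchair), list(armchair.values())

# Observe the fish repeatedly and compare the empirical frequencies with the
# armchair distribution as the number of samples grows.
for n in (10, 1000, 100000):
    samples = random.choices(states, weights=weights, k=n)
    empirical = {s: samples.count(s) / n for s in states}
    print(n, {s: round(p, 3) for s, p in empirical.items()})
# With more samples, the empirical frequencies approach the armchair probabilities.
```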
So far, we have observed the fish from the outside, from the observer’s perspective. What happens
if we change our point of view and observe the fish from “the animal’s perspective” (as Eliasmith 2000,
pp. 25 f., calls it)? The key difference is that we do not even know the fish’s current state. An organ-
ism gains knowledge about its own current state by sensory measurements, but these measurements
provide the organism with only partial and perhaps noisy information. What is more, the organism
does not have access to the probability distribution relative to which the surprisal of its states can be
computed. Here, the free-energy principle (FEP) provides a principled solution (feature #12).
The general strategy of FEP consists of two steps. The first is to try to match an internally coded prob-
ability distribution (a recognition distribution) to the true posterior distribution of the hidden states,
given sensory signals. The second is to try to change sensory signals in such a way that the surprise of
sensory and hidden states is low at any given time. This may seem to make matters even worse, because
now there are two problems: How can the recognition distribution be matched to an unknown poste-
rior, and how can the surprise of sensory signals be minimized, if the distribution relative to which the
surprise is defined is unknown? The ingenuity of FEP consists in solving both problems by minimizing
free energy. Here, free energy is an information-theoretic quantity, the minimization of which is possible
from the animal’s perspective (for details, see Friston 2008; Friston 2009; Friston 2010).
Explaining this requires a slightly more formal description (here, we will simplify matters; a much
more detailed, but still accessible, explanation of the free-energy principle can be found in Bogacz
2015). Firstly, “matching” the recognition distribution to an unknown posterior is just an approxi-
mation to Bayesian inference in which the recognition distribution is assumed to have a certain form
(e.g., Gaussian). This simplifies computations. Secondly, once the recognition distribution closely approximates
the true posterior, free energy constitutes a tight upper bound on the surprisal of sensory signals. Hence,
minimizing free energy by changing sensory signals will, implicitly, minimize surprisal.
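The bound itself can be made tangible with a small discrete toy model (our own illustration; the prior and likelihood values are stipulated): free energy equals the surprisal of the sensory signal plus the Kullback-Leibler divergence between the recognition distribution and the true posterior, so it can never fall below the surprisal, and it touches it exactly when the recognition distribution matches the posterior.

```python
import math

# Discrete toy generative model: two hidden causes, one observed signal s.
p_cause = {'c1': 0.7, 'c2': 0.3}               # prior p(c)
p_signal_given_cause = {'c1': 0.2, 'c2': 0.9}  # likelihood p(s|c) for the observed s

p_signal = sum(p_cause[c] * p_signal_given_cause[c] for c in p_cause)  # evidence p(s)
surprisal = -math.log(p_signal)
posterior = {c: p_cause[c] * p_signal_given_cause[c] / p_signal for c in p_cause}

def free_energy(q):
    # F = E_q[log q(c) - log p(s, c)] = surprisal + KL(q || posterior)
    return sum(q[c] * (math.log(q[c]) - math.log(p_cause[c] * p_signal_given_cause[c]))
               for c in q)

poor_recognition = {'c1': 0.9, 'c2': 0.1}
print(round(free_energy(poor_recognition), 3))  # 1.575: exceeds the surprisal
print(round(free_energy(posterior), 3))         # 0.892: equals the surprisal
print(round(surprisal, 3))                      # 0.892
```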
Note that the connection between FEP and Bayesian inference is straightforward: minimizing free
energy entails approximating the posterior distribution by the recognition distribution. If the recog-
nition distribution is assumed to be Gaussian (with the famous bell-shaped probability density func-
tion), minimizing free energy entails minimizing precision-weighted prediction errors. So at least un-
der this assumption (which is called the Laplace assumption ), there is also a connection between FEP
and prediction error minimization. In fact, FEP can be regarded as the fundamental theory, which
can combine the different features of predictive processing described above within a single, formally
rigorous framework. However, it is debatable which of these features are actually entailed by FEP.
As mentioned before, Environmental Seclusion is an example of a controversial feature (see Fabry
2017a; Clark 2017). Therefore, it could be helpful to look at specific aspects of this novel proposal not
only from an empirical, but also from a conceptual and a metatheoretical perspective. This was one
major motive behind our initiative, leading to the current collection of texts.
Figure 1: A schematic illustration of how minimizing free energy can, implicitly, minimize surprise. Initially, the recog-
nition distribution will not match the true posterior distribution (of hidden causes, given sensory signals) very well. In
order to improve the recognition distribution, it can be changed in such a way that the measured sensory signals become
more likely, given this model (this means the model evidence is increased). One way to implement this is by minimizing
prediction error. So the assumption is that sensory signals are unsurprising, and this should be reflected by the recognition
distribution (i.e., the recognition distribution is altered in such a way that, relative to this distribution, sensory signals are
unsurprising). Of course, it could be that the sensory signals are, relative to the true posterior, surprising. For this reason,
the recognition distribution has to be tested. This is done, implicitly, by bringing about changes in the world that will, if the
recognition distribution is adequate, lead to unsurprising sensory signals. This is active sampling. To some extent, sensory
signals will always be surprising, so an adjustment of the recognition distribution will always be required, followed by ac-
tive sampling, and a further adjustment of the recognition distribution, etc. So this bootstrapping process works through
a continuous trial-and-error procedure, and depends on an intimate causal connection between the agent and its environ-
ment. Although the black arrows are meant to indicate a temporal sequence, there does not have to be a neat separation
between perceptual inference and active inference, and the bootstrapping process could also start with bodily movements.
Glossary
Active inference: 1. Computational process in which prediction error is minimized by acting on the
world (“making the world more similar to the model”), as opposed to minimizing prediction error
by changing the internal model, i.e. perceptual inference (“making the model more similar to the
world”). 2. Also used as a generic term for the computational processes which underpin both ac-
tion and perception, and, in the context of FEP, for all computational processes that minimize free
energy.
Bayesian inference: Updating a model in accordance with Bayes’ rule, i.e. computing the posterior
distribution: p(c|s) = p(s|c)p(c)/p(s). For an example, see (Harkness and Keshava 2017).
Counterfactual model: A counterfactual model is a conditional probability distribution that relates
possible actions to possible future states (at least following Friston et al. 2012b).
Estimator: A statistical estimator is a function of random variables that are conceived as samples; so
an estimator specifies how to compute an estimate from observed data. An estimate is a particular
value of an estimator (which is computed when particular samples, i.e., realizations of random
variables, have been obtained).
“Explaining Away”: The notion of “explaining away” is ambiguous. 1. Some authors write that senso-
ry signals are explained away by top-down predictions (cf. Clark 2013a, p. 187). 2. Another sense
in which the term is used is that competing hypotheses or models are explained away (cf. Hohwy
2010, p. 137). 3. A third sense is as in explaining prediction error away (cf. Clark 2013a, p. 187).
Free energy: In the context of Friston’s FEP, free energy is not a thermodynamic quantity, but an infor-
mation-theoretic quantity that constitutes an upper bound on surprisal. If this bound is tight, the
surprisal of sensory signals can therefore be reduced if free energy is minimized by bringing about
changes in the world.
Gaussian distribution: The famous bell-shaped probability distribution (also called the normal dis-
tribution). Its prominence is grounded in the central limit theorem, which states, roughly, that the
distribution of a sum (or average) of many independent random variables is approximately Gaussian.
Generative model: The joint probability distribution of two or more random variables, often given in
terms of a prior and a likelihood: p(s,c) = p(s|c)p(c). (Sometimes, only the likelihood p(s|c) is called
a “generative model”.) The model is generative in the sense that it models how sensory signals s are
generated by hidden causes c. Furthermore, it can be used to generate mock sensory signals, given
an estimate of hidden causes.
Hierarchy: PP posits a hierarchy of estimators, which operate at different spatial and temporal scales
(so they track features at different scales). The hierarchy does not necessarily have a top level (but
it might have a center — think of the levels as rings on a disc or a sphere).
Inverse problem: From the point of view of predictive coding, the problem of perception requires
inverting the mapping from hidden causes to sensory signals. This problem is difficult, to say the
least, because there is not usually a unique solution, and sensory signals are typically noisy (which
means that the mapping from hidden causes to sensory signals is not deterministic).
Prediction: A prediction is a deterministic function of an estimate, which can be compared to anoth-
er estimate (the predicted estimate). Predictions are not necessarily about the future (note that a
variable can be predictive of another variable if the first carries information about the second, i.e., if
there is a correlation, cf. Anderson and Chemero 2013, p. 204). Still, many estimates in PP are also
predictive in the temporal sense (cf. Butz 2017; Clark 2013c, p. 236).
Precision: The precision of a random variable is the inverse of its variance. In other words, the greater
the average squared deviation from its mean, the lower the precision of a random variable (and vice versa).
Random variable: A random variable is a measurable function between a probability space and a
measurable space. For instance, a six-sided die can be modeled as a random variable, which maps
each of six equally likely events to one of the numbers in the set {1,2,3,4,5,6}.
Surprisal: An information-theoretic notion which specifies how unlikely an event is, given a model.
More specifically, it refers to the negative logarithm of an event’s probability (also just called “sur-
prise”). It is important not to confuse this subpersonal, information-theoretic concept with the
personal-level, phenomenological notion of “surprise”.
References
Adams, R. A., Huys, Q. J. & Roiser, J. P. (2016). Computational psychiatry: Towards a mathematically informed understanding of mental illness. J Neurol Neurosurg Psychiatry, 87 (1), 53-63. https://dx.doi.org/10.1136/jnnp-2015-310737.
Anderson, M. L. (2017). Of Bayes and bullets: An embodied, situated, targeting-based account of predictive processing. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Anderson, M. L. & Chemero, T. (2013). The problem with brain GUTs: Conflation of different senses of “prediction” threatens metaphysical disaster. Behavioral and Brain Sciences, 36 (3), 204–205.
Badets, A., Koch, I. & Philipp, A. M. (2014). A review of ideomotor approaches to perception, cognition, action, and language: Advancing a cultural recycling hypothesis. Psychological Research, 80 (1), 1–15. https://dx.doi.org/10.1007/s00426-014-0643-8.
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P. & Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron, 76 (4), 695-711. https://dx.doi.org/10.1016/j.neuron.2012.10.038.
Bogacz, R. (2015). A tutorial on the free-energy framework for modelling perception and learning. Journal of Mathematical Psychology. https://dx.doi.org/10.1016/j.jmp.2015.11.003.
Brodski, A., Paasch, G.-F., Helbling, S. & Wibral, M. (2015). The faces of predictive coding. The Journal of Neuroscience, 35 (24), 8997-9006. https://dx.doi.org/10.1523/jneurosci.1529-14.2015.
Brook, A. (2013). Kant’s view of the mind and consciousness of self. In E. N. Zalta (Ed.) The Stanford encyclopedia of philosophy.
Bruineberg, J. (2017). Active inference and the primacy of the ‘I can’. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Bruineberg, J., Kiverstein, J. & Rietveld, E. (2016). The anticipating brain is not a scientist: The free-energy principle from an ecological-enactive perspective. Synthese, 1–28. https://dx.doi.org/10.1007/s11229-016-1239-1.
Burr, C. (2017). Embodied decisions and the predictive brain. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Butz, M. V. (2017). Which structures are out there? Learning predictive compositional concepts based on social sensorimotor explorations. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Clark, A. (2013a). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36 (3), 181–204. https://dx.doi.org/10.1017/S0140525X12000477.
——— (2013b). The many faces of precision (Replies to commentaries on “Whatever next? Neural prediction, situated agents, and the future of cognitive science”). Frontiers in Psychology, 4, 270. https://dx.doi.org/10.3389/fpsyg.2013.00270.
——— (2013c). Are we predictive engines? Perils, prospects, and the puzzle of the porous perceiver. Behavioral and Brain Sciences, 36 (3), 233–253. https://dx.doi.org/10.1017/S0140525X12002440.
——— (2015). Radical predictive processing. The Southern Journal of Philosophy, 53, 3–27. https://dx.doi.org/10.1111/sjp.12120.
——— (2016). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.
——— (2017). How to knit your own Markov blanket: Resisting the second law with metamorphic minds. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
——— (in press). Busting out: Predictive brains, embodied minds, and the puzzle of the evidentiary veil. Noûs. https://dx.doi.org/10.1111/nous.12140.
Clowes, M. B. (1969). Pictorial relationships – A syntactic approach. In B. Meltzer & D. Michie (Eds.) (pp. 361–383). Edinburgh, UK: Edinburgh University Press.
Colombo, M. (2017). Social motivation in computational neuroscience: Or if brains are prediction machines then the Humean theory of motivation is false. In J. Kieverstein (Ed.) Routledge handbook of philosophy of the social mind. Abingdon, OX / New York, NY: Routledge.
Dennett, D. C. (2013). Intuition pumps and other tools for thinking. New York, N.Y., and London, UK: W.W. Norton & Company.
Dewhurst, J. (2017). Folk psychology and the Bayesian brain. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Downey, A. (2017). Radical sensorimotor enactivism & predictive processing. Providing a conceptual framework for the scientific study of conscious perception. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Dołega, K. (2017). Moderate predictive processing. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Drayson, Z. (2017). Modularity and the predictive mind. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Egan, F. (2014). How to think about mental content. Philosophical Studies, 170 (1), 115-135. https://dx.doi.org/10.1007/s11098-013-0172-0.
Eliasmith, C. (2000). How neurons mean: A neurocomputational theory of representational content. PhD dissertation, Washington University in St. Louis. Department of Philosophy.
Engel, A. K., Fries, P. & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top-down processing. Nat Rev Neurosci, 2 (10), 704–716.
Fabry, R. E. (2017a). Predictive processing and cognitive development. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
——— (2017b). Transcending the evidentiary boundary: Prediction error minimization, embodied interaction, and explanatory pluralism. Philosophical Psychology, 1–20. https://dx.doi.org/10.1080/09515089.2016.1272674.
Feldman, H. & Friston, K. J. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4. https://dx.doi.org/10.3389/fnhum.2010.00215.
Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16 (9), 1325–1352.
——— (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360 (1456), 815-836. https://dx.doi.org/10.1098/rstb.2005.1622.
——— (2008). Hierarchical models in the brain. PLoS Computational Biology, 4 (11), e1000211. https://dx.doi.org/10.1371/journal.pcbi.1000211.
——— (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13 (7), 293–301. https://dx.doi.org/10.1016/j.tics.2009.04.005.
——— (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11 (2), 127–138. https://dx.doi.org/10.1038/nrn2787.
Friston, K. & Buzsáki, G. (2016). The functional anatomy of time: What and when in the brain. Trends in Cognitive Sciences, 20 (7), 500–511. https://dx.doi.org/10.1016/j.tics.2016.05.001.
Friston, K. & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B: Biological Sciences, 364 (1521), 1211–1221. https://dx.doi.org/10.1098/rstb.2008.0300.
Friston, K. J. & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159 (3), 417-458. https://dx.doi.org/10.1007/s11229-007-9237-y.
Friston, K., Mattout, J. & Kilner, J. (2011). Action understanding and active inference. Biological Cybernetics, 104 (1-2), 137–160. https://dx.doi.org/10.1007/s00422-011-0424-z.
Friston, K., Samothrakis, S. & Montague, R. (2012a). Active inference and agency: Optimal control without cost functions. Biological Cybernetics, 106 (8), 523-541. https://dx.doi.org/10.1007/s00422-012-0512-8.
Friston, K., Adams, R., Perrinet, L. & Breakspear, M. (2012b). Perceptions as hypotheses: Saccades as experiments. Frontiers in Psychology, 3 (151). https://dx.doi.org/10.3389/fpsyg.2012.00151.
Friston, K. J., Stephan, K. E., Montague, R. & Dolan, R. J. (2014). Computational psychiatry: The brain as a phantastic organ. The Lancet Psychiatry, 1 (2), 148–158. https://dx.doi.org/10.1016/S2215-0366(14)70275-5.
Giordanetti, P., Pozzo, R. & Sgarbi, M. (2012). Kant's philosophy of the unconscious. Berlin, Boston: De Gruyter.
Gonzalez-Gadea, M. L., Chennu, S., Bekinschtein, T. A., Rattazzi, A., Beraudi, A., Tripicchio, P., Moyano, B., Soffita, Y., Steinberg, L., Adolfi, F., Sigman, M., Marino, J., Manes, F. & Ibanez, A. (2015). Predictive coding in autism spectrum disorder and attention deficit hyperactivity disorder. Journal of Neurophysiology, 114 (5), 2625–2636. https://dx.doi.org/10.1152/jn.00543.2015.
Gregory, R. L. (1980). Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 290 (1038), 181–197.
Grush, R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27 (3), 377–396.
Gładziejewski, P. (2016). Predictive coding and representationalism. Synthese, 559–582. https://dx.doi.org/10.1007/s11229-015-0762-9.
Harkness, D. L. & Keshava, A. (2017). Moving from the what to the how and where – Bayesian models and predictive processing. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Herbart, J. F. (1825). Psychologie als Wissenschaft neu gegründet auf Erfahrung, Metaphysik und Mathematik. Zweiter, analytischer Teil. Königsberg: Unzer.
Hohwy, J. (2010). The hypothesis testing brain: Some philosophical applications. In W. Christensen, E. Schier & J. Sutton (Eds.) Proceedings of the 9th conference of the Australasian society for cognitive science (pp. 135–144). Macquarie Centre for Cognitive Science. https://dx.doi.org/10.5096/ASCS200922.
——— (2012). Attention and conscious perception in the hypothesis testing brain. Frontiers in Psychology, 3. https://dx.doi.org/10.3389/fpsyg.2012.00096.
——— (2013). The predictive mind. Oxford: Oxford University Press.
——— (2016). The self-evidencing brain. Noûs, 50 (2), 259–285. https://dx.doi.org/10.1111/nous.12062.
——— (2017). How to entrain your evil demon. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Hommel, B. (2015). The theory of event coding (TEC) as embodied-cognition framework. Frontiers in Psychology, 6. https://dx.doi.org/10.3389/fpsyg.2015.01318.
Hommel, B., Müsseler, J., Aschersleben, G. & Prinz, W. (2001). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24, 849–878. https://dx.doi.org/10.1017/S0140525X01000103.
Horn, B. K. P. (1980). Derivation of invariant scene characteristics from images (pp. 371–376). https://dx.doi.org/10.1145/1500518.1500579.
James, W. (1890). The principles of psychology. New York: Henry Holt.
Kant, I. (1998). Critique of pure reason. Cambridge, MA: Cambridge University Press.
——— (1998[1781/87]). Kritik der reinen Vernunft. Hamburg: Meiner.
Kiefer, A. (2017). Literal perceptual inference. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350 (6266), 1332–1338. https://dx.doi.org/10.1126/science.aab3050.
Lee, T. S. & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20 (7), 1434–1448. https://dx.doi.org/10.1364/JOSAA.20.001434.
Lenoir, T. (2006). Operationalizing Kant: Manifolds, models, and mathematics in Helmholtz's theories of perception. In M. Friedman & A. Nordmann (Eds.) The Kantian legacy in nineteenth-century science (pp. 141–210). Cambridge, MA: MIT Press.
Limanowski, J. (2017). (Dis-)attending to the body. Action and self-experience in the active inference framework. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Lotze, R. H. (1852). Medicinische Psychologie oder Physiologie der Seele. Leipzig: Weidmann'sche Buchhandlung.
Metzinger, T. (2004[2003]). Being no one: The self-model theory of subjectivity. Cambridge, MA: MIT Press.
——— (2017). The problem of mental action. Predictive control without sensory sheets. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Palmer, C. J., Paton, B., Kirkovski, M., Enticott, P. G. & Hohwy, J. (2015). Context sensitivity in action decreases along the autism spectrum: A predictive processing perspective. Proceedings of the Royal Society of London B: Biological Sciences, 282 (1802). https://dx.doi.org/10.1098/rspb.2014.1557.
Prinz, W. (1990). A common coding approach to perception and action. In O. Neumann & W. Prinz (Eds.) Relationships between perception and action (pp. 167–201). Berlin; Heidelberg: Springer.
Quadt, L. (2017). Action-oriented predictive processing and social cognition. In T. Metzinger & W. Wiese (Eds.) Philosophy and predictive processing. Frankfurt am Main: MIND Group.
Seth, A. K. (2015). The cybernetic Bayesian brain: From interoceptive inference to sensorimotor contingencies. In T. Metzinger & J. M. Windt (Eds.) Open MIND. Frankfurt am Main: MIND Group. https://dx.doi.org/10.15502/9783958570108.
Shi, Y. Q. & Sun, H. (1999). Image and video compression for multimedia engineering: Fundamentals, algorithms, and standards. Boca Raton, FL: CRC Press.
Sloman, A. (1984). Experiencing computation: A tribute to Max Clowes. In M. Yazdani (Ed.) New horizons in educational computing (pp. 207–219). Chichester: John Wiley & Sons.
Snowdon, P. (1992). How to interpret ‘direct perception’. In T. Crane (Ed.) The contents of experience (pp. 48–78). Cambridge: Cambridge University Press.
Spratling, M. W. (2016). A review of predictive coding algorithms. Brain and Cognition. https://dx.doi.org/10.1016/j.bandc.2015.11.003.
Stock, A. & Stock, C. (2004). A short history of ideo-motor action. Psychological Research, 68, 176–188. https://dx.doi.org/10.1007/s00426-003-0154-5.
Swanson, L. R. (2016). The predictive processing paradigm has roots in Kant. Frontiers in Systems Neuroscience, 10, 79. https://dx.doi.org/10.3389/fnsys.2016.00079.
Todorov, E. (2009). Parallels between sensory and motor information processing. In M. S. Gazzaniga (Ed.) The cognitive neurosciences. 4th edition (pp. 613–623). Cambridge, MA / London, UK: MIT Press.
Van de Cruys, S., Evers, K., Van der Hallen, R., van Eylen, L., Boets, B., de-Wit, L. & Wagemans, J. (2014). Precise minds in uncertain worlds: Predictive coding in autism. Psychological Review, 121 (4), 649–675. https://dx.doi.org/10.1037/a0037665.
Van Doorn, G., Hohwy, J. & Symmons, M. (2014). Can you tickle yourself if you swap bodies with someone else? Consciousness and Cognition, 23, 1–11. https://dx.doi.org/10.1016/j.concog.2013.10.009.
Van Doorn, G., Paton, B., Howell, J. & Hohwy, J. (2015). Attenuated self-tickle sensation even under trajectory perturbation. Consciousness and Cognition, 36, 147–153. https://dx.doi.org/10.1016/j.concog.2015.06.016.
Von Helmholtz, H. (1855). Ueber das Sehen des Menschen. Leipzig: Leopold Voss.
——— (1867). Handbuch der physiologischen Optik. Leipzig: Leopold Voss.
——— (1959[1879/1887]). Die Tatsachen in der Wahrnehmung. Zählen und Messen. Darmstadt: Wissenschaftliche Buchgesellschaft.
——— (1985[1925]). Helmholtz's treatise on physiological optics. Birmingham, AL: Gryphon Editions.
Von Holst, E. & Mittelstaedt, H. (1950). Das Reafferenzprinzip. Die Naturwissenschaften, 37 (20), 464–476.
Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L. & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108 (51), 20754–20759. https://dx.doi.org/10.1073/pnas.1117807108.
Wiese, W. (2016). Action is enabled by systematic misrepresentations. Erkenntnis. https://dx.doi.org/10.1007/s10670-016-9867-x.
Zellner, A. (1988). Optimal information processing and Bayes's theorem. The American Statistician, 42 (4), 278–280. https://dx.doi.org/10.2307/2685143.