Computational Neuroscience - A First Course - 2013
Hanspeter A. Mallot
Computational
Neuroscience
A First Course
Springer Series in Bio-/Neuroinformatics
Volume 2
Series Editor
N. Kasabov
Hanspeter A. Mallot
Department of Biology
University of Tübingen
Tübingen
Germany
population coding or the geometric and parametric mappings that facilitate compu-
tations in two-dimensional representations. The application-oriented claim of cyber-
netics is treated in a chapter on artificial neural nets. The author, who has also
published important contributions to the theoretical description of neural systems,
has the rare talent of putting facts and their relations into mathematical terms in a
way that is easily understood. Also, the treatment of problems with continuous as
well as with discrete methods, and the introduction of the necessary non-linearities
into the description, can be followed without special training.
Recent progress in experimental methods has provided exciting new insights into
the structure of the brain on all levels of complexity and elucidated the efficiency
and the variability of the systems. Theoretical concepts have become more profound
and more adequate for dealing with biological constraints. Self-organization and
life-long adaptation of the brain proved to determine both its anatomical structures
and the way data and knowledge are represented. The underlying processes—and
with that the decomposition of tasks—seem to be based on a relatively uniform
set of learning procedures and anatomical architectures (such as cortices), both fa-
voring the flexibility of the entire systems. Thus, brains and the information they
contain are in a permanent process of adaptation to their current goal. This process
of self-organization seems to be the one driving force of both brain development
and learning.
The self-organization of neural systems is of course of substantial interest for
technical systems with their ever-increasing complexity and diversity. Self-organiza-
tion of neuron-like structures based on their innate flexibility may point a way to
reduced design efforts. In this sense, the original quest of cybernetics is still alive,
albeit with an extended focus on the whole, behaving organism: How do neural
systems succeed in autonomously overcoming their ignorance and in exploiting the
predictabilities of their environment for optimal behavior and, indeed, for evolu-
tionary success? Evolution and ecology prove, of course, that solutions to these
questions exist.
Not surprisingly, the computational methods used in the neural and behavioral
sciences cover the major part of scientific computing at large. The book focuses on
the most important approaches and lays the foundations for extended questions and
specifications. It shows that it is not only the detailed mathematical formalisms that
are crucial but also the broader concepts which allow appropriate mathematizations
of neuroscientific phenomena and concepts. This book delivers a clear and under-
standable introduction to the structure of the problems of a scientific domain that
still ranks among the most interesting fields science has to offer.
2.3.3   Non-linearity as Interaction: Volterra Kernels . . . . . . . . . . . 44
2.3.4   Energy-Type Non-linearity . . . . . . . . . . . . . . . . . . . . . . 45
2.3.5   Summary: Receptive Fields in the Primary Visual Pathway . . . . . . . 47
2.4 Motion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.1 Motion and Flicker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.2 Coincidence Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.3 Correlation Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.4 Motion as Orientation in Space-Time . . . . . . . . . . . . . . . . . . . 53
2.5 Suggested Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Chapter 1
Excitable Membranes and Neural Conduction
Table 1.1 Ion concentrations and Nernst potentials of three ion types (from Aidley, 1998)

              Ion   External (mM)   Internal (mM)   Nernst potential (mV)
Frog muscle   K+         2.25            124                −101
              Na+       109               10.4               +59
              Cl−        77.5              1.5               −99
Squid axon    K+         20              400                 −75
              Na+       440               50                 +55
              Cl−       560               40                 −66
Fig. 1.1 Electrical circuit representing the membrane of a neuron. Redrawn from Hodgkin &
Huxley (1952)
the resting potential of nerve cells, i.e. the voltage difference of about −70 mV (in-
side negative) across a neuron’s cellular membrane. This voltage difference is due
to an unequal distribution of various ions, including the cations of sodium (Na+ )1 ,
potassium (K+ )2 , and calcium (Ca2+ ), as well as the anion chloride (Cl− ) and the
anions formed by acidic proteins (i.e. X-COO− ). While the latter cannot permeate
the cell membrane, the small inorganic ions can do so to various degrees. In fact, the
membrane contains special protein complexes, the channels, which allow the ions to
pass. These channels are generally specific for particular ion types and their open-
ing may depend on the presence or absence of specific chemical “ligands” (such as
synaptic transmitters), or on the membrane potential. Indeed, voltage dependence
of ion channel opening is the key property of neuron membranes from which the
excitability of neurons results. It is thus the basis of neural information processing.
In the steady state, i.e. if the neuron is not active, the permeability of the mem-
brane for each ion sort is constant. In this situation the ion distribution across the
membrane and the membrane potential are related by the Nernst and Goldman
equations. If only one ion sort is able to permeate the membrane, the Nernst equi-
librium is obtained if the osmotic force, resulting from unequal distribution of ions
across the membrane, just cancels the electro-motor force, resulting from the un-
equal distribution of electric charges. Consider a hypothetical membrane separating
1 The chemical symbol Na refers to the German “Natrium”, named after Lake Natron in Tanzania,
which is known for its high concentration of Na+ ions.
2 The chemical symbol K refers to the German “Kalium”. It is derived from the Arab “al-qalya”,
two compartments, one of which is filled with pure water and the other with the
solution of a potassium salt of an acidic protein. If the membrane is permeable to
potassium cations, but not to the large anions, potassium cations will cross the mem-
brane following osmotic force, i.e. by diffusion. Since the anions cannot follow, this
leads to a depletion of positive charges, i.e. a surplus of negative charges which
drives the cations back into their original compartment. The eventual equilibrium
where both forces cancel out is given by Nernst’s equation. In a realistic neuron,
the potassium equilibrium potential is on the order of −100 mV. Similarly, a Nernst
potential of +60 mV can be calculated for the Na+ ions; see Table 1.1. Note that the
membrane potential is measured as the difference inside minus outside; −100 mV
therefore means that the inside is negative. The Nernst potential marks a thermody-
namical equilibrium, i.e. a steady state which is stable without requiring energy for
its maintenance. If the membrane potential is changed (e.g. by putting electrodes
into the compartments and applying an external voltage), the electro-motor force
changes and ions will move until a new equilibrium is reached.
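As a quick numerical illustration, the Nernst equation E = (RT/zF) ln(c_out/c_in) can be checked directly against Table 1.1. In the sketch below, the temperature T = 293 K (about 20 °C) is an assumption; the concentrations are the frog-muscle values from the table.

```python
import math

# Nernst potential E = (R T / z F) ln(c_out / c_in). Concentrations are the
# frog-muscle values of Table 1.1; the temperature T = 293 K (about 20 C)
# is an assumption, chosen to match the tabulated potentials.
R, F, T = 8.314, 96485.0, 293.0   # J/(mol K), C/mol, K

def nernst(c_out, c_in, z=1):
    """Nernst potential in mV for an ion of valence z."""
    return 1000.0 * (R * T) / (z * F) * math.log(c_out / c_in)

print(nernst(2.25, 124.0))        # K+ : close to the tabulated -101 mV
print(nernst(109.0, 10.4))        # Na+: close to the tabulated  +59 mV
print(nernst(77.5, 1.5, z=-1))    # Cl-: close to the tabulated  -99 mV
```

Note the sign of the valence z: for the anion Cl−, z = −1 flips the sign of the logarithm, which is why external-over-internal still yields a negative potential.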
If ions of more than one type can cross the membrane, the equilibrium is more
complex, since the electro-motor forces are the same for all ion sorts with the same
charge, whereas the osmotic forces depend on the concentrations of each ion sort
separately. In effect, a complex mixture of the individual Nernst equilibria is ob-
tained, described by the Goldman equation (see textbooks of physiology for details).
The resting potential of a typical neuron is determined by the concentrations of K+ ,
Na+ , and Cl− ions and their respective equilibrium (Nernst) potentials, according
to Goldman’s equation. It is not at a thermodynamical equilibrium; for its mainte-
nance sodium ions are constantly pumped outwards and potassium ions inwards by
an active (i.e. energy consuming) process, the sodium-potassium pump located in
the cell membrane.
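A sketch of the Goldman equation with the squid-axon concentrations of Table 1.1. The permeability ratios PK : PNa : PCl = 1 : 0.04 : 0.45 and the factor RT/F = 25.7 mV (about 25 °C) are typical textbook assumptions for a resting membrane, not values taken from this text.

```python
import math

# Goldman equation for the resting potential, using the squid-axon
# concentrations of Table 1.1. The permeability ratios
# P_K : P_Na : P_Cl = 1 : 0.04 : 0.45 and RT/F = 25.7 mV are assumptions.
RT_F = 25.7                      # mV

K_o, K_i = 20.0, 400.0
Na_o, Na_i = 440.0, 50.0
Cl_o, Cl_i = 560.0, 40.0
P_K, P_Na, P_Cl = 1.0, 0.04, 0.45

# chloride is an anion, so its inside/outside concentrations swap places
V_rest = RT_F * math.log((P_K * K_o + P_Na * Na_o + P_Cl * Cl_i)
                         / (P_K * K_i + P_Na * Na_i + P_Cl * Cl_o))
print(V_rest)                    # about -63 mV, a plausible resting potential
```

Because PK dominates, the result lies close to the potassium Nernst potential but is pulled toward zero by the small sodium permeability, as described in the text.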
The electrical properties of a small patch of membrane can be modeled by the
simple electrical circuit shown in Figure 1.1. The distributions of the sodium, potas-
sium and chloride ions act as batteries generating an ion current across the mem-
brane. The conductivity of the membrane for the two ions is denoted by gNa and
gK , respectively. The arrows drawn across the resistors indicate that conductivity
can change. This change of sodium and potassium conductivity as a function of the
membrane potential is the basis of the excitability of neural membranes. The third
channel is not voltage dependent and is called “leakage”; its current is mostly due
to chloride ions. Finally, the membrane has a capacitance denoted by Cm .
The action potential is generated by the different dynamics of the changes in
membrane conductivity for the different ions. An initial depolarization causes the
sodium channel to open and sodium ions will come in, leading to a still stronger
depolarization of the membrane. After about 1 ms, the sodium channel inactivates
(i.e. closes) in a process depending on intrinsic dynamics, but not on membrane
potential. As an effect of depolarization, the potassium channels will open as well,
however, with a slower time course. When they open, potassium will move out and
the membrane is repolarized. As an effect of repolarization, the potassium chan-
nels close again. When the action potential is over, the ion distributions inside and
outside the cell will differ slightly less than before, due to the ion currents involved.
Fig. 1.2 Changes in conductivity of an excitable membrane during the course of an action
potential. Initially (left side), the depolarization traveling passively along the axon will lead
to an opening of the Na+ -channels (•). This opening is transient and finishes after about 1
ms independent of membrane potential. As a result, the membrane potential will rise quickly
(Depolarization). With a certain delay, the potassium channels (K+ ; ◦) will open. Due to the
reverse distribution patterns of the two ions, this will lead to an outflow of positive charge,
i.e. to repolarization. This in turn leads to the closure of the potassium channels. The length
of the arrows symbolizes the conductivities gNa and gK (cf. Figure 1.8).
The number of ions passing during each action potential can be calculated from
the membrane capacity, the voltage change, and the elementary charge. Roughly,
about 0.1% of the ions are exchanged during one action potential. In the long run,
these differences are compensated for by the sodium-potassium pump. A schematic
picture of the opening and closing events is given in Fig. 1.2.
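The estimate of the number of ions crossing per action potential can be sketched as follows; the membrane capacitance of 1 µF/cm² and the voltage swing of about 100 mV are assumed textbook values, not taken from this text.

```python
# Charge moved per action potential: Q = C_m * dV; number of univalent
# ions: Q / e. The capacitance (1 uF/cm^2) and voltage swing (100 mV)
# are assumed textbook values.
C_m = 1e-6          # F/cm^2, membrane capacitance per area (assumed)
dV = 0.1            # V, approximate voltage swing of a spike (assumed)
e = 1.602e-19       # C, elementary charge

n_ions = C_m * dV / e
print(n_ions)       # about 6e11 univalent ions per cm^2 of membrane per spike
```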
5 Alan Lloyd Hodgkin (1914 – 1998). English physiologist. Nobel Prize in Physiology or Medicine
1963.
6 Andrew Fielding Huxley (1917 – 2012). English physiologist. Nobel Prize in Physiology or
Medicine 1963.
1.2 The Hodgkin-Huxley Theory
I = Ic + INa + IK + Il   (1.1)
  = CM dE/dt + gNa (E − ENa) + gK (E − EK) + gl (E − El)   (1.2)
For the capacitive current Ic , we have used Coulomb’s law of capacitance: the cur-
rent I flowing into a capacitor is proportional to the change of voltage E across the
capacitor, I = C dE/dt (see Figure 1.11b). The factor C is called capacitance and is
measured in F (Farad) = Coulomb/Volt. Intuitively, capacitance describes the fact
that ions of opposite polarity attach to the membrane by mutual electric attraction.
They thus form a charge reservoir that has to be recharged if the membrane potential
changes. Recharging takes longer the larger the capacitance is, i.e. the more charges
are attached to the membrane.
For the ion currents, Equation 1.2 uses Ohm’s law (see Figure 1.11a). Instead
of using resistances R, i.e. the ratio of current and voltage measured in Ω (Ohm),
the Hodgkin-Huxley theory is generally formulated using the conductivities g,
which are simply the inverses of resistance, g = 1/R. The unit of conductivity is
S (Siemens) = 1/Ω. ENa , EK and El denote the Nernst potentials listed in Table 1.1.
The ion currents are proportional to the difference between the membrane poten-
tial and the ion’s Nernst potential. If both potentials are equal, the ion type is at
equilibrium.
Instead of the membrane potential E, standard formulations of the Hodgkin-
Huxley theory generally use the depolarization V which is the difference between
the membrane potential and the resting potential,
V = E − Eresting . (1.3)
Depolarization is zero if the membrane is at rest and becomes positive for excited
states. We will use this notation from here on.
In the sequel, we will discuss the excitable sodium and potassium channels
and present phenomenological models for their dynamics. These phenomenologi-
cal models will then be combined using Equation 1.2 to generate a prediction for
the time course of the action potential.
Fig. 1.3 a. Time course of an outer “driving force”, expressed as the steady state g∞
eventually obtained by the system. It is switched at discrete times. b. Exponential
relaxation of the dynamic variable g to the respective steady states as expressed in
Equation 1.4. The value g reached at the time of switching of the driving force acts
as initial value for the next relaxation. c. Vector field illustrating the differential
equation 1.5. The continuous lines show three solutions for different initial values
taken at t = 0.
Let us assume that the membrane potential changes in a step-like fashion at time
t = 1 from a value V1 to V2 and again at time t = 2 from V2 to V3 (Figure 1.3). Such
controlled membrane potentials and membrane potential changes are studied in the
“voltage clamp preparation”, where physiological fluctuations of the membrane po-
tential are compensated for by an electronic circuit. The steady-state conductivities
g∞ (Vi ) are shown in Figure 1.3a. The actual conductivities g(t) will differ from g∞
in that they need some time to change between the steady state levels. A plausible
time course of conductivity showing the non-zero switching times (finite switching
speed) appears in Figure 1.3b. The curve shows an exponential relaxation which
between two stepping times might be formally described by
g(t) = g∞ + (g0 − g∞) exp{−(t − t0)/τ}.   (1.4)
Here, g0 is the initial value of g at time t0 , g∞ is the steady state of the voltage
applied during the current interval, and τ is a time constant determining the rate of
approach towards the steady state. Note that the larger the deviation from the steady
state, the faster the approach (and the steeper the curve).
A disadvantage of this equation is the fact that it can deal only with step changes
of membrane potential occurring at discrete points in time, but not with continuous
changes as they occur during the action potential. A formal description that holds for
both step changes and continuous changes is derived from the observation that the
rate of change, i.e. the slope of g(t), should be proportional to the current deviation
from the steady state given by g(t) − g∞:

g′(t) = −k (g(t) − g∞).   (1.5)
Here we write g∞ for the steady state, keeping in mind that it depends on V . What-
ever changes V (t)—and therefore g∞ —undergoes over time, Equation 1.5 keeps
valid. An illustration of Equation 1.5 is given in Figure 1.3c. Each short line marks
a local slope g′(t) calculated from Equation 1.5. Each curve drawn in such a way that the
local lines in Figure 1.3c are tangents to the curve is a solution of Equation 1.5.
Equation 1.5 is an example of a differential equation, relating the value of a
function to its derivative. The solutions of differential equations are functions (or
families of functions), not numbers. By taking the derivative of g in Equation 1.4
and comparing it to g itself, it is easy to show that this function solves Equation 1.5
if k is set to 1/τ . We start by taking the derivative of Equation 1.4
g′(t) = −(1/τ) (g0 − g∞) exp{−(t − t0)/τ}.   (1.6)
Inserting g′ and g (again from Equation 1.4) into Equation 1.5 we obtain

−(1/τ) (g0 − g∞) exp{−(t − t0)/τ} = −k (g∞ + (g0 − g∞) exp{−(t − t0)/τ} − g∞),

which is satisfied if we set k = 1/τ.
Equation 1.4 is called an analytical solution of the differential equation 1.5 be-
cause it is formulated as a named mathematical function, i.e. the exponential. The
sketched procedure, i.e., guessing a solution and proving that it satisfies the differen-
tial equation if appropriate choices of some variables are made, is a standard way of
finding analytical solutions. In contrast, numerical solutions are sequences of func-
tional values (i.e. numbers) obtained at discrete time-steps. We will see below that
for the full Hodgkin-Huxley system, only numerical solutions exist.
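The relation between the analytical solution (Equation 1.4) and a numerical solution of Equation 1.5 can be illustrated with a simple Euler scheme; all parameter values below are chosen for illustration only.

```python
import math

# Euler integration of the relaxation equation g'(t) = -k (g(t) - g_inf)
# (Equation 1.5) compared with the analytical solution, Equation 1.4.
# All numerical values are illustrative.
k, g_inf, g0 = 2.0, 1.0, 0.2
dt, t_end = 0.001, 2.0

g_num, t = g0, 0.0
while t < t_end:
    g_num += dt * (-k * (g_num - g_inf))   # one Euler step
    t += dt

tau = 1.0 / k                              # k = 1/tau (see text)
g_ana = g_inf + (g0 - g_inf) * math.exp(-t_end / tau)
print(abs(g_num - g_ana))                  # small discretization error
```

Shrinking the step size dt shrinks the discrepancy, which is the sense in which the numerical sequence approximates the analytical solution.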
Fig. 1.4 Potassium conductance change (gK ) in the voltage clamp experiment (in
milli-Siemens per square centimeter). Membrane potential is stepped from resting
potential by 109 mV (depolarization) at time t = 0 ms and stepped back to resting
potential at time t = 10 ms.
Fig. 1.5 Kinetics modeled by Equations 1.7 and 1.8 (potassium channel). Left: pool of inac-
tivated subunits. Middle: activated subunits, portion of subunits in activated state is n. Right:
Channel formed of four subunits. Probability of four subunits joining in one place is n4 .
Figure 1.4 shows the measured response. After the step at t = 0 ms, conductance rises to a plateau, with the highest speed of change occur-
ring not at the beginning (as would be the case for the simple relaxation equation
studied above), but at some intermediate value. This behavior implies that the solu-
tion should have an “inflection point”, i.e. the point of highest speed after the onset
of the “relaxation”. This inflection point will be important for the modeling of the
channel switching behavior. At time t = 10 ms, the membrane is repolarized and the
conductance relaxes to zero, this time without an inflection point. If other depo-
larizations are chosen, the shape of the curve changes slightly. This is to say that the
response is non-linear.
Hodgkin and Huxley (1952) suggested to model this behavior by a two-step pro-
cess including (i) a simple relaxation process described by an auxiliary variable n and
(ii) a fourth order interaction term needed to model the inflection point found in the
voltage clamp response:

gK(t) = ḡK n(t)^4   (1.7)
dn/dt (t) = αn(V) (1 − n(t)) − βn(V) n(t).   (1.8)
Equation 1.7 means that four channel subunits have to meet in order to form a chan-
nel. The probability of finding four subunits in one spot is equal to the probability
of finding one, raised to the fourth power (as long as the subunits move indepen-
dently in the membrane). Equation 1.8 assumes that the channels can be in one of
two states, one favorable of forming a channel and one unfavorable. The dimen-
sionless number n is the fraction of channel subunits being in the favorable state;
it varies between 0 and 1. αn and βn are called “rate constants” which depend on
voltage, but not on time. They specify the likelihood (the rate) at which the channel
subunit switches from the unfavorable state into the favorable state (αn (V )), or back
(βn (V )). These rate constants thus incorporate the voltage dependence of channel
formation. αn (V ) can be expected to grow with V whereas βn (V ) should decrease.
The equation is illustrated as a chemical reaction kinetic in Figure 1.5.
The exponent four in Equation 1.7 has been chosen to reproduce a subtle feature
of the conductivity step response shown in Figure 1.4. While n(t) will show simple
exponential relaxation, n4 (t) nicely fits the inflection point occurring in the rising
branch but not in the falling one. It is remarkable that molecular biology has since
confirmed the involvement of just four subunits in the formation of the potassium
channel, as predicted from the voltage clamp measurements by Hodgkin & Huxley
(1952).
Observe that Equation 1.8 has the same structure as Equation 1.5 studied above,
i.e., n′ = −k(n − n∞), if we rearrange the terms and replace k by αn + βn and n∞ by
αn /(αn + βn):

dn/dt (t) = −(αn + βn) (n(t) − αn /(αn + βn)).   (1.9)
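That n⁴(t) has an inflection point on the rising branch while n(t) itself does not can be checked numerically; the step parameters below are illustrative, not fitted values.

```python
import math

# n(t) relaxes exponentially (Equation 1.8 under voltage clamp), while the
# conductance is proportional to n(t)^4 (Equation 1.7). The fourth power
# produces an inflection point on the rising branch: the slope is NOT
# maximal at t = 0. Step parameters are illustrative.
tau, n_inf, n0 = 1.0, 0.9, 0.1   # rising step

def n(t):
    return n_inf + (n0 - n_inf) * math.exp(-t / tau)

def g(t):
    return n(t) ** 4             # four independent subunits

dt = 1e-4
slope_start = (g(dt) - g(0.0)) / dt
slope_later = (g(0.5 + dt) - g(0.5)) / dt
print(slope_start < slope_later)  # rising branch starts out flat
```

For a falling step (n0 > n∞) the factor n³ is largest at the start, so the decay begins at maximal speed and shows no inflection, matching Figure 1.4.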
(Kinetic scheme: the activation variable m switches between states with rates αm(V)
and βm(V), the inactivation variable h with rates αh(V) and βh(V); the open-channel
concentration is proportional to m³h.)
Fig. 1.7 Kinetics modeled by Equation 1.12 (sodium channel). Circles: inactivated “parti-
cles”; boxes: activated particles. Filled symbols: activation particles, open symbol: inhibition
particles.
The phenomenological model presented for this dynamics by Hodgkin & Huxley
(1952) uses two auxiliary variables, one (m) modeling the rise (activation gate) and
one (h) modeling the drop (inactivation gate) of conductance:

gNa(t) = ḡNa m(t)^3 h(t)   (1.10)
dm/dt (t) = αm(V) (1 − m(t)) − βm(V) m(t)   (1.11)
dh/dt (t) = αh(V) (1 − h(t)) − βh(V) h(t).   (1.12)
This two-gate model is more complicated than that of the potassium channel. Indeed, more complex models of sodium channel kinetics have
been suggested. For a recent discussion, see Milescu et al. (2008).
(Plots: top, depolarization in mV versus time in milliseconds; bottom, conductances
gNa and gK in mS/cm² versus time in milliseconds.)
Fig. 1.8 Solutions of the space-clamp equations for external stimuli V0 occurring at time
t = 0. Top: Three solutions for the depolarization resulting from external stimuli V0 = 6 mV,
7 mV, and 30 mV above resting potential. The threshold for spike generation is between 6
and 7 mV. Spikes generated by small and large super-threshold stimuli look the same (“all or
nothing” rule of spike generation). Bottom: conductance changes resulting from a stimulation
with 7 mV.
conducting copper wire into an axon which will equilibrate all differences in mem-
brane potential occurring along the axon. The whole patch will thus exhibit the
action potential in synchrony.
In the space-clamp situation, no current will flow in axial direction since poten-
tials are constant over space. Therefore, there can also be no current loops involving
an axial component, which in turn excludes net currents across the membrane. We
may thus assume I = 0 in Equation 1.2 and obtain the following set of four coupled
ordinary differential equations:
−CM dV/dt (t) = ḡK n(t)^4 (V(t) − VK) +
                ḡNa m(t)^3 h(t) (V(t) − VNa) + ḡl (V(t) − Vl)   (1.13)
dn/dt (t) = αn(V(t)) (1 − n(t)) − βn(V(t)) n(t)   (1.14)
dm/dt (t) = αm(V(t)) (1 − m(t)) − βm(V(t)) m(t)   (1.15)
dh/dt (t) = αh(V(t)) (1 − h(t)) − βh(V(t)) h(t)   (1.16)
The coupled system of non-linear differential equations 1.13 – 1.16 does not have
analytical solutions, i.e. we cannot derive a closed formula for the time-course of
the action potential. However, for the overall argument, it suffices to compute nu-
merical solutions, i.e. sampled functions approximating the solution, which can
then be compared to the measurements. Numerical solutions are obtained by first
specifying “initial values” for each of the four variables V, n, m, and h. For V , the
initial value is simply the external stimulus. For the auxiliary n, we observe that
dn/dt should be zero if the membrane is at rest. Therefore, we obtain from equa-
tion 1.14: nrest = αn(0)/(αn(0) + βn(0)), where the value 0 in the arguments refers
to the resting depolarization V = 0. Analogous results are obtained for m and h.
The initial values are inserted in the right side of the equations. Each equation then
gives a value for the slope of the respective state variable at time 0. With this slope
and the initial value, the values at a small time step Δt are estimated by linear
extrapolation. The resulting new estimates are again inserted in the equations,
leading to new slope values and in turn to new estimates of the state variables at
time 2Δt. This procedure is iterated until the time interval considered is exhausted.
A more elaborate version of this scheme, known as Runge-Kutta integration, was
used to produce the plots in this text.
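The Euler scheme just described can be sketched for the space-clamped system 1.13 – 1.16. The rate constants αn , βn , αm , βm , αh , βh and the parameter values below are the classic ones from Hodgkin & Huxley (1952), quoted from memory here rather than from this text, so treat them as an assumption.

```python
import math

# Rate constants of the classic Hodgkin-Huxley (1952) model; V is the
# depolarization from rest in mV. Values quoted from memory (assumed).
# The conditionals guard the removable singularities of alpha_n, alpha_m.
def alpha_n(V): return 0.1 if V == 10 else 0.01 * (10 - V) / (math.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * math.exp(-V / 80)
def alpha_m(V): return 1.0 if V == 25 else 0.1 * (25 - V) / (math.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4.0 * math.exp(-V / 18)
def alpha_h(V): return 0.07 * math.exp(-V / 20)
def beta_h(V):  return 1.0 / (math.exp((30 - V) / 10) + 1)

g_Na, g_K, g_l = 120.0, 36.0, 0.3    # peak conductances, mS/cm^2 (assumed)
V_Na, V_K, V_l = 115.0, -12.0, 10.6  # potentials relative to rest, mV (assumed)
C_M = 1.0                            # membrane capacitance, uF/cm^2

def simulate(V0, dt=0.01, t_end=10.0):
    """Euler integration of Equations 1.13 - 1.16; returns the V trace."""
    V = V0
    # resting values of the gating variables: dn/dt = 0 etc. at V = 0
    n = alpha_n(0) / (alpha_n(0) + beta_n(0))
    m = alpha_m(0) / (alpha_m(0) + beta_m(0))
    h = alpha_h(0) / (alpha_h(0) + beta_h(0))
    trace = [V]
    for _ in range(int(t_end / dt)):
        I_ion = (g_K * n ** 4 * (V - V_K)
                 + g_Na * m ** 3 * h * (V - V_Na)
                 + g_l * (V - V_l))
        V += dt * (-I_ion / C_M)                        # Eq. 1.13 with I = 0
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        trace.append(V)
    return trace
```

With these assumed values, an initial depolarization of 15 mV produces a full action potential while 2 mV decays back to rest, consistent with the all-or-nothing behavior discussed around Figure 1.8.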
Figure 1.8 shows a numerical solution of the system 1.13 – 1.16 for an external
stimulus of V (0) = 7 mV. The depolarization V is plotted in the upper part. The
shape of the action potential is in very good agreement with the shape measured in
space-clamp experiments using squid axons. The corresponding time course of the
conductances is shown in the lower part of Figure 1.8.
If different external stimuli V (0) are chosen, two different types of behavior are
observed (Figure 1.8). If V (0) is 6 mV or less, the effect is very small; the stimulus
is sub-threshold. For stronger stimuli, an action potential is generated which looks
more or less equal for all super-threshold stimuli. Stronger stimuli result in earlier
action potentials, not in bigger ones. The Hodgkin-Huxley equations thus capture
the all-or-nothing behavior of neural excitation.

1.3 An Analytical Approximation

Fig. 1.9 Solutions of the FitzHugh-Nagumo system with Ia = 0, a = 0.3, b = 0.8, γ = 1.0.
Left: start at v = 0.8, right: start at v = 0.2.
Fig. 1.10 Phase portraits of the FitzHugh-Nagumo system. a. Non-oscillating case. Param-
eters as in Figure 1.9. The thin lines are the “null isoclines” dw/dt = 0 and dv/dt = 0. The
arrows give the initial direction of solutions starting from the respective points. The heavy
lines show the two solutions from Figure 1.9. They converge to the intersection of the iso-
clines which forms a stable fixed point. b. Oscillator. Parameters a = 0.2, b = 0.2, γ = 0.2,
Ia = 0.352. Note that the v-isocline has moved upwards. The intersection of the isoclines
is no longer a stable point. Two solutions are shown approaching a stable limit cycle from
within and from outside.
idea is that each point in the plane represents a state (or phase) of the system and
the dynamics is represented by movements in that plane. For example, the solutions
shown in Figure 1.9 are represented in Figure 1.10a as trajectories, (v(t), w(t)).
We can now look for states (i.e. pairs of numbers (v, w)) for which the system of
equations 1.17 – 1.19 stops moving. Such points in the phase plane are called fixed
points or steady states. From Equation 1.17 we obtain:
dv/dt = 0 ⇐⇒ w = f(v) + Ia   (1.20)
dw/dt = 0 ⇐⇒ w = (b/γ) v   (1.21)
1.4 Passive Conduction
These curves are called null-isoclines and are also plotted in Figure 1.10a. They
intersect at the only fixed point (0, 0). This fixed point is stable in the sense that
all arrows in a vicinity of it have a component in the direction of the fixed point.
Therefore, the system will approach the fixed point and rest there.
If we add an applied current Ia , the curve dv/dt = 0 is shifted upwards. Depend-
ing on the slope of both curves (dv/dt = 0 and dw/dt = 0), there can be either one
or three intersection points, i.e. points where both derivatives vanish. Figure 1.10b
shows a case where only one such intersection point exists. The parameters are
chosen such that it is the inflection point of the curve dv/dt = 0. For the parame-
ters chosen, the fixed point is not stable. Trajectories starting in a neighborhood of
the fixed point move away from it and enter a limit cycle orbiting the fixed point.
Trajectories starting further out approach the limit cycle from outside. This result
shows that the FitzHugh-Nagumo system can produce oscillating responses when
activated by an applied current. The transition from a stable fixed point to an unsta-
ble fixed point with limit cycle, effected by the change of a parameter, is called a
Hopf bifurcation.
The qualitative behavior described here has been shown to occur for the Hodgkin-
Huxley system as well, using numerical simulations. Oscillatory responses of ex-
citable cells are known for example from the pacemaker cells in the sinoatrial and
other nodes of the heart; here, the constant incoming current Ia is realized by strong
leakage channels carrying a constant sodium current.
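The convergence to the stable fixed point (Figure 1.10a) can be illustrated numerically. The exact equations 1.17 – 1.19 are not reproduced in this excerpt, so the cubic f(v) = −v(v − a)(v − 1) and the recovery law dw/dt = b v − γ w used below are assumptions, chosen only to be consistent with the null-isoclines w = f(v) + Ia and w = (b/γ) v given above.

```python
# Phase-plane sketch of a FitzHugh-Nagumo-type system. The cubic f(v)
# and the linear recovery equation are ASSUMED forms consistent with the
# null-isoclines of Equations 1.20 and 1.21, not taken from this text.
a, b, gamma, Ia = 0.3, 0.8, 1.0, 0.0   # non-oscillating case (Fig. 1.10a)

def f(v):
    return -v * (v - a) * (v - 1.0)

v, w = 0.8, 0.0          # start as in Figure 1.9 (left)
dt = 0.01
for _ in range(20000):   # Euler integration up to t = 200
    dv = f(v) - w + Ia
    dw = b * v - gamma * w
    v, w = v + dt * dv, w + dt * dw

# the trajectory spirals into the stable fixed point at the origin
print(v, w)
```

With Ia = 0 the isoclines intersect only at (0, 0), and the trajectory makes one excursion before settling there, which is exactly the excitable, non-oscillating regime of Figure 1.10a.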
membrane potential and axial currents are independent of fiber length. A summary
of the quantities involved and their dimensions is given in Table 1.2.
The basic relations can be inferred from inspection of Figure 1.12. We start by
noting that the membrane potential will be a function of axon length and time, as
are the specific currents involved. We therefore denote them as V (x,t), ia (x,t), and
im (x,t), respectively. In Figure 1.12, the length variable is discretized, but we will
use a continuous representation involving spatial and temporal derivatives. Follow-
ing Ohm’s law (Figure 1.11a), the axial current ia(x,t) is proportional to the potential
change in axial direction, i.e.
ia(x,t) = −(1/ra) ∂V/∂x (x,t),   (1.22)
where ra is the axial resistance measured in Ω/cm. The curly d (∂ ) denotes partial
derivatives which are needed here since V depends on space and time, whereas the
derivative is taken only with respect to space. We will not elaborate on partial deriva-
tives here; for a mathematical definition see Figure 4.8a and Equations 4.20, 4.21.
The axial current will decrease for larger x since part of it will leak across the
membrane, thus forming a membrane current im . From Kirchhoff’s node rule (Fig-
ure 1.11c), we have:
−im(x,t) = ∂ia/∂x (x,t).   (1.23)
Substituting for ia from Equation 1.22, we obtain
im(x,t) = (1/ra) ∂²V/∂x² (x,t).   (1.24)
The membrane current im is composed of two components, a resistive, or leak
current (following Ohm’s law, Figure 1.11a) and a capacitive current (following
Coulomb’s law, Figure 1.11b). We can write
Table 1.2 Quantities involved in the cable equation. Only cylindrical membrane tubes are
considered and all lengths are axial. See also Figure 1.12.
Fig. 1.11 Rules for electric networks. a. Ohm’s law: The current I through a resistor
is proportional to the voltage V across the resistor, V = R I. The ratio is called
resistance, R, measured in Ω (Ohm) = V (Volt) / A (Ampere). b. Coulomb’s law:
the change of voltage is proportional to the current flowing into the capacitor,
C dV/dt = I. The ratio C is the capacitance, measured in F (Farad) = C (Coulomb) /
V (Volt). c. Kirchhoff’s node rule: The sum of all currents flowing into one node is
zero, ∑k Ik = 0. If outbound currents are considered, the signs have to be changed
accordingly. d. Kirchhoff’s mesh rule: the sum of all voltages in a mesh adds to zero,
∑k Vk = 0, if all voltages are measured either in a clockwise or in a counter-clockwise
sense.
im(x,t) = (1/rm) V(x,t) + cm ∂V/∂t (x,t).   (1.25)
We can now equate the above two expressions for im and obtain
V(x,t) = (rm/ra) ∂²V/∂x² (x,t) − rm cm ∂V/∂t (x,t).   (1.26)
This is a partial differential equation known as the cable equation. If we assume
that the potential is clamped to Vo at location x = 0, and consider the steady-state
solution (i.e. ∂ V /∂ t = 0), we obtain the simpler, ordinary differential equation8
8 “Ordinary” is here the opposite of “partial”. An ordinary differential equation contains only
ordinary derivatives, as opposed to partial ones. The differential equations studied so far were all
of the ordinary type.
18 1 Excitable Membranes and Neural Conduction
V(x) = (rm/ra) d²V/dx² (x) (1.27)
which has the solution
V(x) = Vo exp(−x/√(rm/ra)). (1.28)
Fig. 1.12 Electrical circuit for modeling the passive propagation of activity along an axon. ia axial current, im membrane current, ra axial resistance, rm membrane resistance, cm membrane capacitance.
by Equation 1.28. As it turns out, the approach to this steady state is slower the further away a given membrane patch is from the stimulation site. Overall, however, the speed is governed by the time constant τ = rm cm.
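The space constant √(rm/ra) of Equation 1.28 and the time constant τ = rm cm can be explored numerically. In the sketch below, the values chosen for rm, ra and cm are illustrative assumptions, not measured data:

```python
import numpy as np

# passive cable, steady state: V(x) = Vo * exp(-x / lam),
# with space constant lam = sqrt(rm / ra); parameter values are assumed
rm = 4.0e4    # membrane resistance, Ohm * cm
ra = 1.0e2    # axial resistance, Ohm / cm
cm = 1.0e-6   # membrane capacitance, F / cm

lam = np.sqrt(rm / ra)    # space constant (20 cm for these values)
tau = rm * cm             # time constant governing the approach to steady state

Vo = 10.0                              # clamped potential at x = 0
x = np.linspace(0.0, 5.0 * lam, 101)   # grid contains x = lam exactly
V = Vo * np.exp(-x / lam)

# one space constant away from the clamp, the potential has decayed to Vo / e
assert np.isclose(V[20], Vo * np.exp(-1.0))
```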
Inserting the Hodgkin-Huxley membrane currents into the cable equation yields
(1/ra) ∂²V/∂x² (x,t) = cm ∂V/∂t (x,t) + ḡK n(t)⁴ (V(t) − VK) + ḡNa m(t)³ h(t) (V(t) − VNa) + ḡl (V(t) − Vl). (1.29)
Equations 1.14 – 1.16 remain unchanged, except for the fact that the variables V , n,
m, and h now depend additionally on spatial position x. Equation 1.29 is a partial
differential equation which would be very hard to solve even numerically. One sim-
plification used by Hodgkin and Huxley (1952) uses the fact that action potentials
do not change their shape as they travel along an axon. I.e., if V (x,t) is an action
potential traveling with velocity θ , its shifted version after time Δ t and distance θ Δ t
must look the same: V (x,t) = V (x − θ Δ t,t − Δ t). Thus, V (x,t) can be written as a
one-dimensional function f (x − θ t). Taking the second derivatives of f with respect
to x and t, it is easy to see that the wave equation
∂²V/∂x² = (1/θ²) ∂²V/∂t² (1.30)
must hold. We can therefore reduce Equation 1.29 to the ordinary differential
equation
(1/(ra θ²)) d²V/dt² (t) = CM dV/dt (t) + ḡK n(t)⁴ (V(t) − VK) + ḡNa m(t)³ h(t) (V(t) − VNa) + ḡl (V(t) − Vl). (1.31)
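The traveling-wave argument can be checked numerically: for an arbitrary smooth pulse shape f and an assumed velocity θ, V(x,t) = f(x − θt) satisfies Equation 1.30 up to discretization error of the finite differences:

```python
import numpy as np

theta = 2.0                        # assumed conduction velocity
f = lambda u: np.exp(-u**2)        # arbitrary smooth pulse shape
V = lambda x, t: f(x - theta * t)  # traveling wave

x0, t0, h = 0.7, 0.3, 1e-3
# central-difference approximations of the second derivatives
Vxx = (V(x0 + h, t0) - 2 * V(x0, t0) + V(x0 - h, t0)) / h**2
Vtt = (V(x0, t0 + h) - 2 * V(x0, t0) + V(x0, t0 - h)) / h**2

# wave equation (1.30): d2V/dx2 = (1/theta^2) d2V/dt2
assert abs(Vxx - Vtt / theta**2) < 1e-4
```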
Original Papers
Hines, M. L. and Carnevale, N. T. (1997). The NEURON simulation environment. Neural Computation 9, 1179 – 1209.
Important review paper on compartmental modeling, introducing both the basic
concepts and the simulation tools.
Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane
current and its application to conduction and excitation in nerve. Journal of
Physiology (London), 117:500 – 544.
Classical account of the mechanism and theory of the action potential, which by
and large describes the state of the art to this day. The present chapter closely
follows this masterpiece of computational neuroscience.
Milescu, L. S., Yamanishi, T., Ptak, K., Mogri, M. Z., and Smith, J. C. (2008).
Real time kinetic modeling of voltage-gated ion channels using dynamic clamp.
Biophysical Journal 95:66 – 87.
This paper combines the “dynamic clamp” (replacement of specific channels
by real-time computed currents) and Markovian modeling of channel switching
dynamics.
Naundorf, B., Wolf, F., and Volgushev, M. (2006). Unique features of action potential initiation in cortical neurons. Nature 440:1060 – 1063.
Action potentials in cortical neurons are shown to rise even more steeply than predicted by the Hodgkin-Huxley theory (initially developed for the squid). The authors suggest an additional interaction between adjacent sodium channels not considered in the original theory.
Rall, W. (1962). Electrophysiology of a dendritic neuron model. Biophysical Jour-
nal, 2:145 – 167.
Early and influential paper on the electrodynamics of (passive) conduction and
their dependence on dendritic geometry. Essential for deeper understanding of
dendritic summation and compartmental modeling.
Wilders, R. (2007). Computer modelling of the sinoatrial node. Medical & Biological Engineering & Computing, 45:189 – 207.
Review of models of neural activity in pacemaker neurons of the heart. Rhyth-
mic activity can occur in Hodgkin-Huxley systems with high leak currents. The
paper summarizes and compares specific models of the sinoatrial pacemaker
oscillations.
Chapter 2
Receptive Fields and the Specificity of Neuronal
Firing
Abstract. Neuronal firing does not occur at random. In the sensory parts of the
brain, firing is triggered by properties of various input stimuli, such as the position
of a light stimulus in the visual field, the pitch of a tone, or the appearance of a famil-
iar face. In “associative” areas of the brain, specificities for more abstract concepts
have been found including cells representing place (e.g., in rodent hippocampus),
or numerosity (e.g., in primate prefrontal cortex). In the motor parts of the brain,
neurons have been found that fire preferentially prior to pointing movements of the arm in a certain direction. That is to say, these neurons are specific for particular motor actions. In the sensory domain, specificities are quantified in terms of the
receptive field, which can be defined as the totality of all stimuli driving a given
neuron. The receptive field is measured by correlating the activity of a single neu-
ron with externally measurable parameters of the stimulus. This approach is known
as reverse correlation, since stimuli will always precede the neuronal activity. The
concept of correlation between neuronal activity and external measurables, however,
generalizes easily to the motor system, leading to the concept of the motor field of
a neuron. In this sense, visual receptive fields can be considered as an example for
neuronal specificity at large. In this chapter, we discuss the basic theory of visual
receptive fields which can be extended to similar concepts in other sensory, motor,
or associative areas. The theory is closely related to linear systems theory applied
to spatio-temporal signals, i.e. image sequences. Mathematically, it rests on integral
equations of the convolution type which will be introduced in due course.
simplest case, a little spot of light is moved across the retina covering just a few
receptors at any one time. Usually, rather than shining the light directly onto the
receptor cells, one will have the animal watch a projection screen onto which the
stimulus is presented. The receptive field is then defined as the portion of the visual
field, from which the firing activity can be modulated by means of the visual stim-
ulus (see Figure 2.1). Mathematically, this is described by a receptive field function
φ (x, y) which for each position of the stimulating spot of light (x,y) specifies the
elicited response of the neuron.
What happens if we use larger spots of light, or, more interestingly, patterned
stimuli? If we just use two spots of light, the simplest idea is that the activities
that are elicited by each of the lights in isolation, are added together. This scheme
is called “linear spatial summation” or “linear superposition”. Let us describe the
stimulus image, i.e. the spatial distribution of light intensity on the screen, by the
function I(x, y) which takes values between 0 (no light, black) and 1 (brightest pos-
sible light, white). A square light spot with intensity 1 covering the interval from
x = 0 to x = 1 and y = 0 to y = 1 can be described by the function
d(x, y) := 1 if 0 ≤ x < 1 and 0 ≤ y < 1, and d(x, y) := 0 otherwise. (2.1)
It is worthwhile verifying this equation with some numerical examples. Indeed, the
idea that functions can be shifted by subtracting appropriate constants in their argu-
ment is important for the understanding of this chapter.
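As a minimal sketch, the shifting idea can be tried out on the pixel function of Equation 2.1; the shift amounts 2 and 3 below are arbitrary examples:

```python
def d(x, y):
    # pixel function of Equation 2.1
    return 1.0 if (0 <= x < 1 and 0 <= y < 1) else 0.0

# d(x - 2, y - 3) describes a light spot covering 2 <= x < 3 and 3 <= y < 4:
assert d(2.5 - 2, 3.5 - 3) == 1.0   # inside the shifted square
assert d(0.5 - 2, 0.5 - 3) == 0.0   # the origin is now dark
```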
Let us now assume that a light stimulus delivered at location (x1, y1) elicits a response φ1 of the neuron, and that a light stimulus delivered at location (x2, y2) elicits a response φ2:
e = φ1 and e = φ2, respectively,
where e denotes the excitation of the neuron, measured in spikes per second. In the case of linear spatial summation, presenting both spots together will elicit the sum of the individual responses:
e = φ1 + φ2.
Similarly, if we increase the light intensity of the spot by a factor of λ ∈ IR, linear summation predicts a λ-fold output:
e = λ φ1.
Fig. 2.1 Visual receptive fields defined as an experimental protocol. a. The visual field is
scanned line by line with a small spot of light. b. Simultaneously, neural activity is recorded
from a retinal ganglion cell. c., d. Each time the neuron fires, a point is plotted at the location
of the light spot at the time of firing. c. A neuron which is active if the light falls within
a central disc and is inhibited if the light falls in the outer annulus (on-center off-surround
organization). d. A neuron which is inhibited if the light falls within a central disc and is
activated if the light falls in the outer annulus (off center on surround organization). The
receptive field is the total area, from which the neuron’s activity can be modulated. The areas
shown in gray are rough estimates of the excitatory and inhibitory parts of the receptive fields.
e., f. Center-surround receptive field function φ (x, y) for the two neurons depicted in Figures
c, d. The gray areas c, d are delimited by contour lines of φ .
Linear spatial summation is a theoretical concept defined by the above two equa-
tions. It is useful to describe the behavior of neurons as long as the stimulus inten-
sities are not too large. I.e., if one spot of light already suffices to drive the cell into
saturation, additivity will not obtain. Also, since neither light intensities nor spike
rates can take negative values, Equation 2.6 becomes meaningless for negative λ .
However, even in clearly non-linear cases, as will be discussed in later sections of this chapter, linear descriptions are used as first approximations or components of a more comprehensive model.
An arbitrary image can be approximated as a sum of shifted pixel functions,
I(x, y) ≈ ∑_{i=1}^{I} ∑_{j=1}^{J} I(xi, yj) d(x − xi, y − yj), (2.7)
where I and J are the total numbers of pixels in the x- and y-direction.
Assume now that we have measured the neuron’s response to an isolated light
spot delivered at each of the pixel locations xi , y j and denote the result by φ (xi , y j ).
Then, by the linear superposition principle, we can construct the response as the
sum of all responses to the individual pixels and obtain
e = ∑_{i=1}^{I} ∑_{j=1}^{J} I(xi, yj) φ(xi, yj). (2.8)
Finally, if we assume a very large number of very small pixels, we may write
e = ∫_{−∞}^{∞} ∫_{−∞}^{∞} I(x′, y′) φ(x′, y′) dx′ dy′ (2.9)
where the integral is taken over the entire visual field.
Equation 2.9 is called the correlation of I with φ (or vice versa). The function
φ(x, y) is called the receptive field function or response profile of the neuron. For
each point (x, y) in the visual field, φ(x, y) is the response of the neuron elicited
from this location. Equation 2.9 is a so-called improper integral which evaluates to
a number (e ∈ IR). It can be thought of as the volume under the two-dimensional
"surface" I(x′, y′)φ(x′, y′) where x′, y′ represent the visual field. The visual field is
thus modeled as an infinite plane over which the integral is taken. Since the receptive
1 The sum sign ∑ (capital Greek letter sigma) is an abbreviation defined as ∑_{i=1}^{I} zi := z1 + z2 + . . . + zI where I ∈ IN is the number of components of the sum. The double sum is needed here to cover all pixels of a rectangular grid.
2.1 Spatial Summation 27
field function can be assumed to be zero outside a finite "support" region (the receptive field), the existence of the integral is not a problem.2 The name "correlation"
for Equation 2.9 is based on the idea that for each location (x′, y′) the two numbers
I(x′, y′) and φ(x′, y′) form a data pair. If we omit the subtraction of the means and
the normalization with the standard deviations, Equation 2.9 indeed describes the
statistical correlation between the variables I and φ.
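The correlation of Equation 2.9 can be sketched on a discrete grid, assuming a center-surround profile φ of the type shown in Figure 2.1e; all parameter values below are illustrative:

```python
import numpy as np

# discrete correlation (Equation 2.8) with an assumed
# center-surround (difference-of-Gaussians) receptive field profile
xs = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(xs, xs)
R2 = X**2 + Y**2
phi = np.exp(-R2 / 2.0) - 0.5 * np.exp(-R2 / 8.0)   # center minus surround

I_uniform = np.ones_like(phi)          # homogeneous illumination
I_spot = (R2 < 1.0).astype(float)      # small spot on the field center

e_uniform = np.sum(I_uniform * phi)    # net response is inhibitory
e_spot = np.sum(I_spot * phi)          # the spot drives the excitatory center

assert e_spot > 0 > e_uniform
```

The inequality illustrates why such a cell responds better to local contrast than to uniform light: over the whole field, the surround outweighs the center.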
For the receptive field function, further names are in use, including kernel and op-
erator. “Kernel” is a general term for a fixed factor (φ ) in an integral equation where
a second factor (I) is considered a variable input function. An operator is a mapping
from a set of input functions (images) to a set of output functions (pattern of neu-
ral activity as are studied below). Since all linear operators can be expressed as an
integral equation with a suitable kernel (Riesz representation theorem of functional
analysis), the term operator is sometimes also used for the kernel.
The transition from the spatially discrete formulation in Equation 2.8 to the con-
tinuous formulation in Equation 2.9 can also be applied to Equation 2.7. In this case,
the pixel-function d(x, y) has to be replaced by the so-called δ - or Dirac3 impulse
which can be thought of as the limit of a sequence of pixel functions with decreasing
pixel size and increasing amplitude, such that the volume remains constantly one. It
is mathematically defined by the relation
∫_{−∞}^{∞} δ(x) f(x) dx = f(0) for all functions f. (2.10)
A simple substitution yields
∫_{−∞}^{∞} δ(x′ − x) f(x′) dx′ = f(x), (2.11)
i.e. correlation with δ(x − x′) "cuts out" the function value at position x. In the
next section, we will introduce the notion of convolution. Eq. 2.11 then says that
convolution with the δ-pulse does not change the input function f; i.e., it is the
neutral element of the convolution operation.
Strictly speaking, δ (x) is not a proper function, because it does not take a finite
value at x = 0. Mathematically rigorous definitions are provided in the theory of
functionals (functional analysis).
Figure 2.1e,f shows the function φ for typical retinal ganglion cells. It has cir-
cular symmetry and is divided into a center and a surround in which the function
φ takes different signs. Negative values of φ mean that the cell is inhibited when a
stimulus is delivered at that location. In measurements, inhibitory regions show up
by a reduction of the spontaneous activity (cf. Figure 2.1) or by interactions with
2 In functional analysis, all functions f(x) for which ∫_{−∞}^{∞} f²(x) dx exists (i.e., evaluates to a number) are contained in the Hilbert space L². Since the visual field is actually not the infinite plane but rather a finite subset of the sphere, all continuous functions defined on the visual field satisfy this condition.
3 Paul A. M. Dirac (1902 – 1984). English physicist. Nobel Prize in Physics (with E. Schrödinger) 1933.
(Figure 2.2, graphical content: input I(x) = 1, 1, 0, 0, 0, 1, 0, 0; network with excitatory weights +1 and inhibitory weights −0.5; output e(x) = 0.0, 0.5, −0.5, 0.0, −0.5, 1.0, −0.5, 0.0; panels b and c sketch the receptive field function φ(x) and the point-spread function ψ(x).)
Fig. 2.2 Lateral inhibition. a. Input-output relations for a simple example. Links ending in an arrow are excitatory (weight +1); links ending in a dash symbolize inhibitory connections (weight −0.5). The input shows a step edge (left) and a contour (right) which are enhanced in the output. Constant input is attenuated. b. Convergence of activity to one output neuron. The activity of this neuron is characterized by its receptive field function φ(x). c. Divergence of activity from one input site. The resulting distribution of activity over the output layer is the point-spread function ψ(x). The system is shift-invariant, resulting in a close relation between point-spread function and receptive field function, i.e. ψ(x) = φ(−x).
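The input-output numbers of Figure 2.2a can be reproduced by correlating the input with the weight pattern (−0.5, 1, −0.5); replicating the border values of the input is an assumption that matches the figure's numbers:

```python
import numpy as np

I = np.array([1, 1, 0, 0, 0, 1, 0, 0], dtype=float)   # input of Figure 2.2a
w = np.array([-0.5, 1.0, -0.5])        # inhibition / excitation / inhibition

padded = np.pad(I, 1, mode="edge")     # replicate border values
e = np.array([padded[i:i + 3] @ w for i in range(len(I))])

# output of Figure 2.2a: edges and the isolated contour are enhanced,
# constant regions are attenuated
assert np.allclose(e, [0.0, 0.5, -0.5, 0.0, -0.5, 1.0, -0.5, 0.0])
```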
from the input site and negative for its neighbors which receive the lateral, inhibitory
connections. In general, we will denote such distributions as e(x, y) where (x, y) is
the location of a neuron and e(x, y) its activity. The particular distribution resulting
from a point-stimulus is called the point spread function or point image; we will de-
note it by ψ (x, y). If we have a homogeneous layout of neurons each with the same
local connectivity pattern, as is the case in Figure 2.2, the point spread functions
for different stimulation sites will be identical up to a shift in position (“translation
invariance"). As in the previous section, for a point stimulus delivered at (x′, y′) we may write:
e(x, y) = ψ(x − x′, y − y′), (2.14)
where ψ is the point spread function for a point stimulus delivered at position
(x, y) = (0, 0).
For general images composed of many point stimuli, we obtain:
e(x, y) = ∑_{i=1}^{I} ∑_{j=1}^{J} I(xi, yj) ψ(x − xi, y − yj). (2.15)
In the continuous formulation, this becomes the convolution integral
e(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} I(x′, y′) ψ(x − x′, y − y′) dx′ dy′. (2.16)
An equivalent formulation is
e(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} I(x − x″, y − y″) ψ(x″, y″) dx″ dy″. (2.17)
This result is achieved by the substitution x − x′ =: x″ and y − y′ =: y″. Now, the intuition is slightly different: while (x, y) is still the position in the output layer, (x″, y″) is the stimulus offset relative to the current receptive field center and the integral is taken over the entire receptive field. The input function I has to be evaluated at
5 The German term "Faltung" is sometimes also found in English texts.
6 Haldan K. Hartline (1903 – 1983). American biologist. Nobel Prize in Physiology or Medicine 1967.
(x − x″, y − y″), i.e. the receptive field center in absolute coordinates (x, y) minus the offset. In the end, this is just a difference in the parameterization of the lateral influences, not a different operation. Generally, convolution operations are characterized by the occurrence of the integration variable in both components, but with opposite signs.
Lateral inhibition was first studied by Ernst Mach7 in relation to a perceptual phe-
nomenon now known as Mach bands. Continuous intensity changes are perceived
as roughly homogeneous areas whereas boundaries between constant intensity areas
and intensity ramps are perceived as bright or dark bands, depending on the direc-
tion of the kink between the plateau and the ramp. Mach suggested that the retina
calculates local second derivatives with respect to space, resulting in an excitation
pattern of the form
e(x, y) = c (∂²I/∂x² + ∂²I/∂y²). (2.18)
The term in brackets, i.e. the sum of second partial derivatives, is known as the
“Laplacian”8 of the function I(x, y) and closely related to the convolution with a cen-
ter surround system. Indeed, the network depicted in Figure 2.2 can be considered
as calculating a spatially discretized approximation of a (one-dimensional) second
derivative. If the intensity function is equidistantly sampled to values I1, I2, I3, ..., the local derivatives can be approximated as I′1 = I2 − I1 and I′2 = I3 − I2. The second derivative is then approximated as
I″ = I′2 − I′1 = I1 − 2 I2 + I3,
which is exactly the operation of Figure 2.2 up to a factor −1/2. The analogy between the formulations of lateral inhibition by the convolution integral or the partial differential equation (2.18) is reflected by modeling center-surround receptive fields as Laplacians of Gaussians.
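The discrete second derivative can be tried on a sampled plateau-ramp-plateau profile; responses appear only at the two kinks, in line with Mach's explanation:

```python
import numpy as np

I = np.array([0, 0, 0, 1, 2, 3, 3, 3], dtype=float)   # plateau, ramp, plateau
second = I[:-2] - 2 * I[1:-1] + I[2:]                 # discrete d2I/dx2

# nonzero only at the two kinks between plateau and ramp
assert np.allclose(second, [0.0, 1.0, 0.0, 0.0, -1.0, 0.0])
```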
By comparing Equation 2.16 with Equation 2.20 for all possible images I, we find
that
φ(x′, y′) = ψ(−x′, −y′), (2.21)
i.e. the point-spread function and the receptive field function are mirrored versions
of each other. Note that this result holds only for translation-invariant systems. In
this case, the point-spread function describes the divergence in a network and the
receptive field function the convergence in the same network. Clearly, divergence
and convergence have the same origin, i.e. lateral connectivity (cf. Figure 2.2b,c).
If translation-invariance does not obtain, i.e., if receptive fields of neighboring
neurons differ in systematic ways, point-spread function and receptive field function
will also differ systematically (see Mallot et al. 1990). Figure 2.3 shows a projec-
tion between two retinotopically organized neural layers with varying magnification
factor. If an input neuron is connected to many output neurons, point-spread func-
tions will be large and receptive fields will be small (upper part of Figure 2.3b).
Vice versa, if an input neuron projects only to a few output neurons, point-spread
functions will be small and receptive fields large. Similarly, large visual areas such
as V1 tend to have small receptive fields and large point-spread functions, whereas
small areas such as MT tend to have large receptive fields and small point-spread
functions.
Fig. 2.3 Receptive fields (shaded) and point images (hatched) in space-invariant (left) and
space-variant (right) projections between a sensory surface S and a central representation C.
The receptive field of a unit yo ∈ C is a part of the sensory surface. The point image of a point
xo ∈ S is a part of the central representation. In the space-invariant case (left; convolution),
receptive field and point image do not change over space. In the space-variant case (right)
magnified regions (upper part) have large point images and small receptive fields, whereas
reduced regions (lower part) have small point images and large receptive fields.
For spatio-temporal stimuli I(x, y, t), the neuron's time-dependent excitation can be modeled as
e(t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ( ∫_{0}^{∞} I(x, y, t − t′) w(x, y, t′) dt′ ) dx dy. (2.22)
The function w(x, y, t) is called the spatio-temporal receptive field profile.
Note that space and time are treated differently in this equation. Temporally, we
have a convolution, reflecting the fact that the neuron will be influenced by the tem-
poral evolution of the stimulus. In space, we simply sum over the entire receptive
field. A spatial convolution would be required only if a uniform layer of neurons
with identical, but space-shifted receptive fields were to be modeled. The inner integral in Equation 2.22 is taken over positive values of t′ only. This is in contrast to spatial convolution, where x′ and y′ varied between +∞ and −∞. The reason for this restriction is of course that only stimuli occurring before the time of measurement can influence the excitation. This constraint is called "causality". It can also be accounted for by setting the values of w(x, y, t′) to 0 for all values t′ < 0, meaning that an input event cannot have effects in the past. With this convention, the integral may be taken from −∞ to ∞.
An important special case of Equation 2.22 is obtained if we consider a purely
spatial summation, followed by a temporal summation process applied only to the
result of the spatial process. In this case, the spatio-temporal receptive field function
can be split up into two parts, a spatial profile φ (x, y) and a temporal weighting
function g(t):
w(x, y,t) = φ (x, y)g(t). (2.23)
The contribution of a stimulus delivered at time t − t′ to the neuron's activity at time t is given by a function g(t′) where t′ specifies how much time has passed between the delivery of the stimulus and the instant t at which the response is measured. Generally, g(t′) will be small for large delays t′ and maximal for small or intermediate values of t′. Spatio-temporal kernels that can be split up in this way into a spatial and a temporal factor are called "separable".
The PSTH can be calculated as the pointwise average of this function over all m
stimulus presentation cycles,
PSTH(t) = (1/m) ∑_{i=1}^{m} pi(t). (2.25)
Assume now that the neuron is stimulated by a spatio-temporal pattern I(x, y,t) last-
ing for some duration T . After each presentation, an inter-stimulus-interval (ISI)
TISI is included during which the presentation screen is left blank. We need to as-
sume that the ISI is long enough for the response to decay to zero, i.e. w(x, y,t) = 0
2.2 Functional Descriptions of Receptive Fields 35
for t > TISI . In this situation, the time course of the neuronal response e(t) as cal-
culated from Equation 2.22 is the linear prediction of the PSTH.
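The averaging of Equation 2.25 can be sketched with simulated data; the Poisson spike generation and the exponential rate profile below are modeling assumptions, not part of the measuring protocol itself:

```python
import numpy as np

rng = np.random.default_rng(1)
m, nbins, dt = 2000, 40, 0.005        # cycles, time bins, bin width (s)
t = np.arange(nbins) * dt
rate = 40.0 * np.exp(-t / 0.05)       # assumed underlying firing rate (Hz)

p = rng.poisson(rate * dt, size=(m, nbins))   # p_i(t): spike counts per bin
psth = p.mean(axis=0) / dt                    # Equation 2.25, in spikes/s

# with many presentation cycles, the PSTH approximates the underlying rate
assert np.allclose(psth, rate, atol=15.0)
```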
The PSTH of a given neuron will of course be different for different stimuli. In
many experiments, the stimuli used can be thought of as a parameterized family of functions, with a parameter specifying the orientation of a bar or grating, the speed of a moving bar, the length of a bar, etc. For each value of the parameter, a
PSTH e p (t) results where p denotes the parameter. From this PSTH, we calculate
a characteristic number such as the total number of spikes, the maximal spike rate,
etc. The dependence of this number on the parameter is known as the tuning curve
of the neuron for the stimulus parameter considered. For the maximal spike rate, we
obtain
ρ(p) := max_t e_p(t), (2.26)
where the Greek letter ρ (rho) denotes the tuning curve. Alternatively, tuning may
be defined as the total number of spikes elicited with each parameter setting,
ρ(p) := ∫_{0}^{tmax} e_p(t) dt. (2.27)
Clearly, the peri-stimulus time histogram and the tuning curve are measuring protocols which remain useful and well-defined even if the linear model of the receptive field presented here fails.
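As a sketch, a tuning curve in the sense of Equation 2.26 can be computed from a family of model PSTHs; the Gaussian orientation dependence and all constants are assumptions:

```python
import numpy as np

orientations = np.linspace(0.0, 180.0, 13)   # stimulus parameter p (degrees)
t = np.linspace(0.0, 1.0, 200)               # time within one trial (s)

def psth_model(p):
    # assumed PSTH: temporal envelope scaled by Gaussian orientation tuning
    gain = np.exp(-(p - 90.0) ** 2 / (2 * 20.0 ** 2))   # preferred: 90 deg
    return 50.0 * gain * t * np.exp(-t / 0.2)

# tuning curve: maximal spike rate per parameter value (Equation 2.26)
rho = np.array([psth_model(p).max() for p in orientations])
assert orientations[np.argmax(rho)] == 90.0
```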
A convenient choice for the spatial receptive field profile is the Gaussian
φ(x, y) = 1/(2πσ²) exp(−(x² + y²)/(2σ²)). (2.28)
The factor 1/(2πσ²) has been chosen such that the integral over the entire function equals unity. The only free variable is the scale σ which determines the width of the bell-shaped surface (see Figure 2.7b). Receptive fields differing in scale will react to stimulus features of different size or granularity. Contour lines of the Gaussian are curves satisfying the relation (x² + y²)/2σ² = const, i.e. they are circles. In particular, the contour line at elevation e^{−1/2} φ(0, 0) has radius σ. In one-dimensional versions, i.e. sections along the coordinate axes, the arguments ±σ mark the inflection points of the bell-shaped curve.
Adding a temporal component to the receptive field function, we choose the func-
tion g(t) from Equation 2.22 to be
(Figure: temporal weighting function g(t) for 0 ≤ t ≤ 100, and spatial receptive field profiles at times to = 0, 3, 6, 12, 24, 48.)
g(t) = (t/τ) exp(−t/τ). (2.29)
Here, τ is called a time constant. The function is 0 for t = 0, rises to its maximum
at t = τ and declines asymptotically to zero for larger t. If the neuron reacts fast,
it will have a short time constant, while slow responses may be modeled by larger
time constants.
The standard model of the center-surround type receptive field of retinal ganglion
cells can now be formulated:
w(x, y, t) = c1 t e^{−t/τ1} exp(−(x² + y²)/(2σ1²)) − c2 t e^{−t/τ2} exp(−(x² + y²)/(2σ2²)). (2.30)
It is a difference of two Gaussians with unequal width, where each Gaussian is mul-
tiplied with a temporal summation function. For an on-center, off-surround ganglion
cell, we may choose an excitatory mechanism with small width σ1 and small time
constant τ1 and an inhibitory mechanism with larger width σ2 > σ1 and larger time
constant τ2 > τ1 . A function with these specifications is illustrated in Figure 2.5.
Leaving the temporal component out, Equation 2.30 reduces to a so-called “differ-
ence of Gaussians” (DoG) or “Mexican hat” function plotted in Figure 2.1e,f.
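The temporal behavior of Equation 2.30 at the field center can be probed directly; the constants below are assumed values satisfying σ2 > σ1 and τ2 > τ1:

```python
import numpy as np

# standard center-surround model of Equation 2.30 with assumed parameters
def w(x, y, t, c1=1.0, c2=0.5, s1=1.0, s2=2.0, tau1=10.0, tau2=20.0):
    g1 = c1 * t * np.exp(-t / tau1) * np.exp(-(x**2 + y**2) / (2 * s1**2))
    g2 = c2 * t * np.exp(-t / tau2) * np.exp(-(x**2 + y**2) / (2 * s2**2))
    return g1 - g2

# at the field center, the response is first excitatory (fast, narrow center),
# then inhibitory (slow, wide surround)
early = w(0.0, 0.0, 5.0)
late = w(0.0, 0.0, 80.0)
assert early > 0 and late < 0
```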
(Plots: cosinusoidal and sinusoidal Gabor functions for σ = 0.5 and σ = 1.0, each at ω = π and ω = 2π.)
Fig. 2.6 One-dimensional Gabor functions (Equations 2.31, 2.32) for various choices of scale σ and frequency ω. In each plot, the even (left-right mirror symmetric) heavy line is the cosinusoidal Gabor function, the odd (point-symmetric) heavy line is the sinusoidal Gabor function. The dashed lines show the enveloping Gaussians. Note that the top right and bottom left plots are identical up to a horizontal compression. This reflects the fact that in these cases the product σω is the same. It is proportional to the number of "side lobes" included in the function.
Fig. 2.7 a. Plane wave f(x, y) = cos(ωx x + ωy y). The vectors on top show the coordinate axes (x, y) and the wave direction ω = (ωx, ωy). The unlabeled vector is the orthogonal direction (−ωy, ωx) along which f is constant. b. Gaussian function ϕ(x, y) = exp{−(x² + y²)/2σ²}. c. Cosine Gabor function obtained by multiplying f and ϕ. d. Same as c, shown as contour plot. e. Sine Gabor function obtained by phase-shifting f by a quarter cycle and multiplying it by ϕ. f. Same as e, shown as contour plot.
gc(x) := cos(ωx) exp(−x²/(2σ²)) (2.31)
gs(x) := sin(ωx) exp(−x²/(2σ²)). (2.32)
The cosinusoidal Gabor function is “even”, i.e., it satisfies the condition that gc (x) ≡
gc (−x); the sinusoidal Gabor function is odd, gs (x) ≡ −gs (−x) (cf. Figure 2.6). As
before, σ determines the width of the Gaussian window while ω is the frequency
of the underlying sinusoidal; since gc and gs are functions of a spatial variable, ω
is called a spatial frequency. Gabor functions are also sometimes called wavelets,
because of their local wave-shaped look.
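Equations 2.31, 2.32 and the symmetries just stated can be checked directly:

```python
import numpy as np

# 1-D Gabor functions of Equations 2.31, 2.32
def gabor_cos(x, omega, sigma):
    return np.cos(omega * x) * np.exp(-x**2 / (2 * sigma**2))

def gabor_sin(x, omega, sigma):
    return np.sin(omega * x) * np.exp(-x**2 / (2 * sigma**2))

x = np.linspace(-3, 3, 601)          # symmetric grid, as in Figure 2.6
gc = gabor_cos(x, np.pi, 0.5)
gs = gabor_sin(x, np.pi, 0.5)

assert np.allclose(gc, gc[::-1])     # even: gc(x) = gc(-x)
assert np.allclose(gs, -gs[::-1])    # odd:  gs(x) = -gs(-x)
```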
Orientation is added when the Gabor function is extended to two dimensions. To see this, we consider first the "plane wave"
f(x, y) := cos(ωx x + ωy y) (2.33)
shown in Figure 2.7a. The function is constant along the direction (−ωy, ωx); the direction with the highest frequency is (ωx, ωy), i.e. orthogonal to the wave-front. The spatial frequency of the two-dimensional function f(x, y) is the vector (ωx, ωy). As in the
one-dimensional case, the Gabor function is obtained by multiplying the sinusoidal
with a Gaussian, see Figure 2.7b. If the Gaussian is centered on a wave peak or trough,
the result will be a symmetric, cosine Gabor function (Figure 2.7c,d); if the Gaussian
is centered on a flank, an odd, or sine Gabor function will result (Figure 2.7e,f). In
principle, Gabor functions with arbitrary phase relation can be generated by shifting
the Gaussian with respect to the plane wave. The equations read:
gc(x, y) := cos(ωx x + ωy y) exp(−(x² + y²)/(2σ²)) (2.34)
gs(x, y) := sin(ωx x + ωy y) exp(−(x² + y²)/(2σ²)). (2.35)
Gabor functions are characterized by the following parameters which model the
major specificities found in neurons in the primary visual cortex:
1. In Equations 2.34, 2.35, the Gabor functions are localized at position (x, y) = (0, 0). Shift terms (xo, yo) can be added to shift the function to arbitrary positions.
2. The scale σ determines the overall width of the receptive field.
3. The preferred orientation is given by the ratio ωy/ωx. If expressed as an angle from the x-axis, orientation becomes tan⁻¹(ωx/ωy).
4. The preferred spatial frequency is the frequency of the grating driving the cell most strongly. It is determined by √(ωx² + ωy²). Spatial frequency preference is sometimes referred to as "localization in the spatial frequency domain".
5. The number of side lobes is determined by the product of the overall spatial frequency and scale.
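A minimal sketch of the two-dimensional Gabor function of Equation 2.34, with an assumed wave vector (ωx, ωy) = (2, 1) and scale σ = 1.5:

```python
import numpy as np

# 2-D cosine Gabor of Equation 2.34
def gabor2d(X, Y, wx, wy, sigma):
    return np.cos(wx * X + wy * Y) * np.exp(-(X**2 + Y**2) / (2 * sigma**2))

wx, wy, sigma = 2.0, 1.0, 1.5
preferred_freq = np.hypot(wx, wy)    # sqrt(wx**2 + wy**2), item 4 above

xs = np.linspace(-4.0, 4.0, 81)
X, Y = np.meshgrid(xs, xs)
g = gabor2d(X, Y, wx, wy, sigma)

# the function peaks at the origin, where cosine and Gaussian are both 1
assert np.isclose(g[40, 40], 1.0)
```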
We call the new variables ξ, ζ, θ (read xi, zeta, and theta) and obtain (Koenderink 1988):
w(ξ, ζ, θ) = exp( −[(ξ − ξo)² + (ζ − ζo)² + (ln(−θ) − ln(−θo))²] / (2σ²) ) for θ, θo < 0. (2.37)
θo < 0 is the moment in the past, onto which the temporal window is centered. The
temporal window will be asymmetric, compressed in the forward direction (between
θo and 0) and expanded in the backward direction. Its overall size will shrink the
closer θo is to 0, i.e., the present moment.
The drifting grating of Equation 2.36 is then multiplied with the spatio-temporal
window function to generate a spatio-temporal Gabor function.
e = Φ (I), (2.38)
where I = I(x, y) is the stimulus (an image) and Φ is a general mapping from the
set of all images to the set of possible excitations e of the neuron. In mathematics,
a mapping from a function set into a number set is called a functional, the corre-
sponding discipline of mathematics is called functional analysis10. A functional Φ
is said to be linear if the following two requirements are satisfied (see also Equa-
tion 2.5, 2.6):
1. The response to the sum of two stimuli (I(x, y), J(x, y)) is the sum of the individual responses:
Φ(I + J) = Φ(I) + Φ(J). (2.39)
Inserting from Equation 2.9, we obtain
∫∫ (I(x′, y′) + J(x′, y′)) φ(x′, y′) dx′ dy′ = ∫∫ I(x′, y′) φ(x′, y′) dx′ dy′ + ∫∫ J(x′, y′) φ(x′, y′) dx′ dy′,
i.e. this condition is obtained from the distributive law and the linearity of the integral.
2. The response to a multiple of the stimulus is the multiple of the response:
Φ(λ I) = λ Φ(I),
i.e. this condition holds due to the associative law of multiplication and the linearity of the integral.
These two properties taken together are known as the superposition principle. In
words, it states that in linear systems, responses to individual stimuli superimpose,
or add up, if the stimuli are presented together.
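Both requirements can be verified numerically for the discrete version of the correlation functional of Equation 2.9 (random test images on a 16 × 16 grid):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(16, 16))       # arbitrary receptive field function
I = rng.normal(size=(16, 16))         # two test images
J = rng.normal(size=(16, 16))
lam = 2.5

Phi = lambda img: np.sum(img * phi)   # discrete correlation functional

# superposition principle: additivity and homogeneity
assert np.isclose(Phi(I + J), Phi(I) + Phi(J))
assert np.isclose(Phi(lam * I), lam * Phi(I))
```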
If I is taken to be the image intensity, i.e. the physical power (irradiance) im-
pinging per unit area on the retinal receptors, the response of a neuron can never be
strictly linear. Consider for example a stimulus leading to some excitation e. Linear-
ity requires that the neuron would react to the same stimulus, amplified by a factor of
10 If e is also considered a function, as was the case in the discussion of lateral inhibition, Φ in
Equation 2.38 becomes a mapping from a set of functions (input images) into another set of
functions (patterns of neural excitation). Such mappings are called operators. The definition of
linearity is extended to operators in a straightforward way.
42 2 Receptive Fields and the Specificity of Neuronal Firing
Fig. 2.8 An important class of non-linear systems can be described as a cascade of a linear
system, followed by a static non-linearity. In neuroscience, L–NL-cascades are considered as
a coarse model of dendritic summation followed by spike initiation at the axon hillock.
λ , with a λ -fold activity λ e. Clearly, this will work only for small positive values of
λ , since the activity of the neuron is limited and cannot become negative. Linearity,
therefore, is often a good model as long as small stimuli (or small deviations from
some standard stimulus) are considered but usually fails for large signal amplitudes.
Even then, however, the linear case is always considered as a first approximation of
the problem.
Note that linearity and translation-invariance (see Figure 2.3) are independent
concepts. A system may be linear but not translation invariant, or vice versa. In
either case, however, it cannot be described by convolution. Indeed, it can be proven
that all linear, translation-invariant systems are convolution systems.
u(t) = ∫∫∫ I(x, y, t − t′) w(x, y, t′) dx dy dt′  (2.44)
Fig. 2.9 Static non-linearity applied to a sinusoidal. a. Sinusoidal used as an input in all
cases. b. Half-wave rectification. The boxed plot shows the static non-linearity function f(u),
the right part the effect on the sinusoidal input, i.e. f(sin t). Half-wave rectification deletes
all negative values and passes the positive values unchanged. c. Binary switching function
(Heaviside function) with threshold θ. d. Saturation function with half-saturation value uo.
e. Sigmoidal with slope λ. The value f(0) = 1/2 is the spontaneous activity of a neuron.
f. Squaring non-linearity doubles the frequency of a sinusoidal input (since sin²t = 0.5 −
0.5 cos 2t). It plays an important role in so-called energy models of cortical complex cells
(see below).
f1(u) := { 0 if u ≤ 0;  u if u > 0 },  (2.45)
The binary switching function is the Heaviside function, f2(u) := { 0 if u ≤ 0; 1 if u > 0 }.
Figure 2.9c shows the function f2(u − θ) where θ is the threshold. Note that the
threshold is modeled by a subtraction in the argument of f. This can be done with
all static non-linearities. In the case of a sinusoidal input, the threshold modulates
the ’duty-cycle’ of the binary output, i.e. the relative length of zero and non-zero
sections.
Saturation is often modeled by the equation

f3(u) := { 0 if u ≤ 0;  u/(u + uo) if u > 0 }  (2.47)
(Figure 2.9d). The constant uo determines the slope of the function; it can be thought
of as the input value where f3 generates the output value 0.5. Equation 2.47 has
been used to model the non-linearity of photoreceptors. In this case, uo can be used
to adjust the curve to the state of light adaptation (“Naka-Rushton non-linearity”).
The static non-linearity used most frequently in neural network modeling is the
sigmoidal (Figure 2.9e). It can be formalized in various ways, for example:
f4(u) = e^{λu} / (e^{λu} + e^{−λu}),  λ > 0.  (2.48)
The parameter λ is the slope of the sigmoidal at u = 0. The value f4 (0) models the
spontaneous activity of the neuron. In the formulation given, f4 (0) = 1/2 for all
λ . If other spontaneous activities are desired, the function can be shifted along the
u-axis.
At first glance, the squaring non-linearity
f5 (u) = u2 (2.49)
does not seem particularly plausible as a model of neural transmission. It turns out,
however, to be very useful in the so-called energy models of complex cells which
will be discussed below. There, it can be thought of as a rectifying square (i.e.,
f (u) = 0 for u < 0 and f (u) = u2 for u ≥ 0) applied separately to an on and an off
channel processing the same signal. The sum of the two outputs can be calculated
faster by applying the two-sided squaring function to the linearly filtered input.
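The static non-linearities of Figure 2.9 are easily written down. The following Python sketch (all parameter values are arbitrary illustrative choices) also checks the on/off decomposition of the squaring non-linearity numerically:

```python
import numpy as np

def f1(u):                      # half-wave rectification, Equation 2.45
    return np.maximum(u, 0.0)

def f2(u, theta=0.0):           # binary switching with threshold theta
    return np.where(u - theta > 0, 1.0, 0.0)

def f3(u, u0=1.0):              # saturation (Naka-Rushton), Equation 2.47
    up = np.maximum(u, 0.0)
    return up / (up + u0)

def f4(u, lam=1.0):             # sigmoidal, Equation 2.48
    return np.exp(lam * u) / (np.exp(lam * u) + np.exp(-lam * u))

def f5(u):                      # squaring non-linearity, Equation 2.49
    return u ** 2

u = np.linspace(-2.0, 2.0, 401)
assert np.isclose(f4(0.0), 0.5)           # spontaneous activity f4(0) = 1/2
assert np.isclose(f3(1.0, u0=1.0), 0.5)   # output 0.5 at u = u0
assert f2(1.0) == 1.0 and f2(-1.0) == 0.0

# Rectifying squares of an on and an off channel sum to the
# two-sided square applied once to the same signal:
assert np.allclose(f1(u)**2 + f1(-u)**2, f5(u))
```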
Fig. 2.10 Different types of edges and receptive fields. In each frame, four receptive fields of
the simple type are shown. Black ellipses indicate inhibitory regions, white ellipses excitatory
regions. The two fields to the left have odd symmetry, the two to the right even symmetry. For
each symmetry type, the two possible polarities are also shown. The receptive field responsive
in each case is highlighted by a white circle. a., b.: Step edges. Only odd symmetric fields
of the appropriate polarity will respond. c., d.: Contour edges. Only even symmetric fields of
the appropriate polarity will respond.
where ecpx is the excitation of the complex neuron, gc and gs are the even and
odd Gabor functions, and ⊗ denotes the correlation (Equation 2.9). The static non-
linearity f could also be modeled as simple half-wave rectification, but the squaring
gives better results. Instead of writing down the on- and off-Gabor fields ±gc and
±gs separately, we observe I ⊗ (−gs,c ) = −(I ⊗ gs,c) and obtain
ecpx = (I ⊗ gs )2 + (I ⊗ gc )2 . (2.52)
Equation 2.52 is called the energy model of the cortical complex cell (Adelson &
Bergen, 1985); for a schematic depiction see Figure 2.12e. The term “energy” is not
to be taken literally, but simply reflects an analogy from the theory of alternating cur-
rent, where the sum of squared sine and cosine components is an electrical energy. It
is also related to the notion of a “signal power” introduced in the following chapter.
Although in Equation 2.52, only two receptive field functions are considered,
the formulation of Equation 2.51 is biologically more plausible, since it takes into
account the non-negativity of neuronal spike rates. In the final model, on- and off-
profiles which individually are subject to a rectifying non-linearity are combined into
a channel which behaves completely linearly.
Figure 2.11 shows the energy model in more detail. The top row shows the four
edge types (two phases, two polarities). These edges are meant to appear on top of
some constant background, such that negative values are no problem. As before, we
assume that no responses to constant inputs occur. The left column shows the recep-
tive field profiles of the four types of simple cells, ±gc (x), ±gs (x). Each bar plot in
the center shows the simple cell activity ±gs,c ⊗ Δ I, where the center bar refers to a
neuron exactly centered at the edge, while the flanking bars refer to receptive fields
somewhat beside the edge. The bottom row shows the activity of an energy neuron.
Note that the response is the same for all four edge types.
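The phase invariance of the energy model can be checked numerically. The sketch below uses a one-dimensional Gabor quadrature pair; envelope width, frequency, and grid are arbitrary illustrative choices:

```python
import numpy as np

# Quadrature pair of Gabor receptive fields (even and odd phase)
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
sigma, omega = 2.0, 2 * np.pi
envelope = np.exp(-x**2 / (2 * sigma**2))
gc = envelope * np.cos(omega * x)    # even ("cosine") simple cell
gs = envelope * np.sin(omega * x)    # odd ("sine") simple cell

def energy(stimulus):
    """Energy model: sum of squared even and odd simple-cell responses."""
    return (np.sum(stimulus * gs) * dx) ** 2 + (np.sum(stimulus * gc) * dx) ** 2

# Gratings at the preferred frequency, presented at different phases
responses = [energy(np.sin(omega * x + phase))
             for phase in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
# The complex cell's response is (nearly) independent of stimulus phase
assert np.allclose(responses, responses[0], rtol=1e-4)
```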
The removal of phase and polarity information does not depend on the use of
Gabor functions but can be obtained with any so-called quadrature pair of receptive
field functions. A general quadrature pair consists of an odd function fo and an even
function fe and satisfies the condition that the complex function fe + i fo, where
i = √−1 is the complex unit, must be analytical, i.e. differentiable in the complex
plane. Or, to state this another way, the Hilbert transform must convert the even
function into the odd one and vice versa. In fact, the odd/even Gabor functions only
approximately satisfy this condition.
Fig. 2.11 Energy model of visual neurons of the “complex” type. The top row shows from
left to right four local intensity profiles corresponding to a bright contour on dark ground,
a dark contour on bright ground, a step edge from a dark to a bright image region, and a
step edge from a bright to a darker image region. The leftmost column shows receptive field
functions of four simple cells modeled as Gabor function with even or odd phase (sine or
cosine) and positive or negative polarity. The histograms above the bar show the responses of
three such cells with receptive fields centered at x−1 , xo , and x1 , respectively. The row below
the bar shows a complex cell summing up the squared outputs of each of the simple cells.
Note that the histograms for all four input profiles are identical, indicating the complex cell’s
invariance to edge polarity and phase.
• Cortical neurons of the “complex” type are invariant to phase and polarity, but
share with simple cells the specificities for orientation, spatial frequency, and
scale. The invariance requires a non-linearity which is modeled by the energy
model (Figure 2.12 e).
2.4 Motion Detection 49
Fig. 2.12 Visual receptive fields. a, b. Isotropic (rotationally symmetric) types found in
retina and LGN. a. On-center off-surround. b. Off-center on-surround. c, d. Oriented simple cells
of the visual cortex. c. Even (cosine) Gabor function. d. Odd (sine) Gabor function. e. Cortical
complex neuron responsive to “contrast energy”.
Fig. 2.13 a. Simple motion detector consisting of two input lines of which one is delayed by
the time lag τ , and a threshold. The delay is shown as a temporal convolution with a displaced
δ -pulse. The stimulus is moving from left to right with a velocity v = Δ x/Δ t, leading to the
inputs I(t) and I(t − Δ t) at the two receptors. If the delay τ matches Δ t, the activations in both
lines will reach the threshold unit simultaneously. The threshold has to be adjusted such that
it is passed only by coincident inputs on both lines. The detector is specific for direction and
speed. b. In the correlation, or Reichardt detector, the delay is replaced by a temporal lowpass
filter and the threshold is replaced by a multiplication and subsequent averaging over time.
The latter two steps realize a cross-correlation between the direct and the low-passed input
line with offset zero (symbol ⊗). The complete Reichardt detector consists of two mirror
symmetric circuits of the type shown, whose outputs are subtracted to give signed motion
output.
and if the time of travel matches the built-in delay, both inputs will arrive simul-
taneously and the threshold may be passed. If, however, the stimulus moves in the
opposite direction or with the wrong speed, both inputs arrive at different times and
the threshold will not be passed. This principle has been suggested as a model of
motion selectivity in rabbit retinal ganglion cells (Barlow & Levick 1965).
In order to prevent the circuit from responding to constant white input on both
lines, an appropriate preprocessing has to be added, such as a differentiation in time
and maybe a Difference-of-Gaussians operator in space.
Note that the delay operation is a linear, shift-invariant operation which can be
expressed as a convolution. Its impulse response is a delayed impulse, δ(t − Δt).
Using the definition of the δ-pulse, Equation 2.10, we may write

∫ I(t′) δ(t − t′ − Δt) dt′ = I(t − Δt).
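In discrete time, the delay-as-convolution property can be checked directly; the signal values below are an arbitrary example:

```python
import numpy as np

# A delay is a linear, shift-invariant operation: convolution with a
# shifted unit impulse, the discrete stand-in for delta(t - Delta_t).
signal = np.array([0.0, 1.0, 4.0, 2.0, 1.0, 0.0, 0.0, 0.0])  # arbitrary I(t)
delay = 2                              # Delta_t, in samples
impulse = np.zeros(len(signal))
impulse[delay] = 1.0                   # delta(t - Delta_t)

delayed = np.convolve(signal, impulse)[:len(signal)]
assert np.allclose(delayed[delay:], signal[:-delay])   # I(t - Delta_t)
assert np.allclose(delayed[:delay], 0.0)
```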
where x̄ and ȳ are the average values of x and y, and var(x) and var(y) are the respec-
tive variances. In signal theory, it is customary to neglect the averaged values (i.e.,
Fig. 2.14 Left: The auto-correlation for a function f and an offset τ is calculated by evaluat-
ing the function at all pairs of points with separation τ. For the resulting pairs (f(t), f(t + τ))
at all possible positions t, “correlation” is calculated by multiplying the values in each pair
and summing the results. Since this procedure can be repeated for each interval τ, a so-called
auto-correlation function results which is denoted Φff(τ). A typical auto-correlation func-
tion is shown in the lower part of the picture. Right: The same procedure can be applied to
two different functions, in which case the result is called cross-correlation, Φfg(τ).
assume x̄ = ȳ = 0) and omit the division by the variances.14 For a time dependent
signal f (t), we can then consider the correlation of a value occurring at time t with
a value occurring at time t + τ for some fixed “offset” τ . For various times t, the
data pairs ( f (t), f (t + τ )) form a set of paired variables, for which correlation can
be calculated. Since we have an infinite set of data pairs indexed by the variable t,
the sum in Equation 2.56 is to be replaced by an integral and we obtain
Φff(τ) = ∫_{−∞}^{∞} f(t) f(t + τ) dt.  (2.57)
14 Without the normalization, the operation is actually more like a covariance, defined as
cov(x, y) = (1/n) ∑(xi − x̄)(yi − ȳ). In some texts, the auto- and cross-correlation functions are
therefore called auto- and cross-covariance, respectively. We will not adopt this terminology
here.
Φfg(τ) = ∫_{−∞}^{∞} f(t) g(t + τ) dt.  (2.58)
This is to say, the output of the motion detector is given by convolving the auto-
correlation function of the input with the impulse response h and evaluating the
result at τ . If the image is a white noise pattern, i.e. a noise pattern where adjacent
pixels are uncorrelated, its auto-correlation function is a δ -pulse, ΦII (τ ) = δ (τ ) and
the tuning curve of the motion detector will be ρ (v) = h(Δ x/v).
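That the auto-correlation function of white noise approximates a δ-pulse can be verified numerically; the sample size below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=100_000)       # white noise: uncorrelated samples

def autocorr(f, tau):
    """Discrete, normalized version of Phi_ff(tau) = integral f(t) f(t+tau) dt."""
    if tau == 0:
        return float(np.mean(f * f))
    return float(np.mean(f[:-tau] * f[tau:]))

phi0 = autocorr(noise, 0)                          # variance, close to 1
phi_rest = [autocorr(noise, tau) for tau in range(1, 20)]
assert phi0 > 0.9                                  # large peak at tau = 0
assert max(abs(v) for v in phi_rest) < 0.05        # ~ 0 elsewhere: a delta pulse
```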
Fig. 2.15 Motion selective receptive fields. a. A hypothetical neuron with three equally
spaced input sites, each with a different temporal delay (rectangular boxes). b. (x,t)-diagram
of a stimulus flashing subsequently at the three input positions of the neuron. With the shown
timing and the delays shown in a., all activities arrive simultaneously at the unit. c. (x,t)-
diagram of a spatio-temporal Gabor function (Equation 2.35) modeling the receptive field
of a motion selective simple cell. The orientation in the space-time plot corresponds to the
preferred velocity.
a receptive field function, it describes the response behavior of a simple cell tuned
for motion (rightwards), scale, spatial frequency, location, phase (cosinusoidal), and
polarity. Spatial orientation cannot be shown in the figure due to the restriction to
two dimensions, but is also included in the mathematical model.
For motion selective complex cells, the energy model (Section 2.3.4) is applied
analogously. It is built on two simple cells, one with sinusoidal and one with cos-
inusoidal phase, but with identical values for all other parameters. The outputs are
squared and summed up as shown in Figure 2.12e above.
Original Papers
Adelson, E.H. and Bergen, J.R. (1985). Spatiotemporal energy models for the de-
tection of motion. Journal of the Optical Society of America A 2:284-299
Fundamental account of visual motion as spatio-temporal orientation. Derives
suitable spatio-temporal filters for motion detection and suggests biologically
plausible version constructed from spatio-temporally separable filters.
Itti, L. and Koch, C. (2000) A saliency-based search mechanism for overt and covert
shifts of visual attention. Vision Research 40:1489 – 1506
Saliency of visual field locations is defined using the overall output of a large
battery of visual filters corresponding to the so-called perceptual dimensions
orientation, granularity, motion, and color. The model is thus an application of
the receptive field theory explained in this chapter. It is still one of the standard
approaches to focal attention.
Mach, E. (1865) Über die Wirkung der räumlichen Vertheilung des Lichtreizes
auf die Netzhaut. Sitzungsberichte der mathematisch-naturwissenschaftlichen
Classe der kaiserlichen Akademie der Wissenschaften Wien 52/2, p.303 – 322.
An English translation (On the effect of the spatial distribution of the light stim-
ulus on the retina) appears in F. Ratliff, Mach Bands: Quantitative Studies on
Neural Networks in the Retina, Holden-Day, San Francisco, 1965.
This century-old paper is still recommended reading for its clear and clever ar-
gument as well as for its general discussion of psychophysics and the relation
between neural mechanisms and experienced perceptions. It introduces the ef-
fect now known as Mach bands.
Marr, D. and Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal
Society (London) B, 207:187 – 217.
This paper intuitively shows how center-surround mechanisms support early
visual processes. At the same time it gives a comprehensive account of the
Difference-of-Gaussians and Laplacian-of-Gaussians operators. Together with a
series of similar treatments of other visual processing steps covered in D. Marr’s
book (see above), it belongs to the foundations of the field of computational vision.
Olshausen, B. and Field, D. (1996a). Emergence of simple–cell receptive field prop-
erties by learning a sparse code for natural images. Nature, 381:607 – 609.
This paper shows how receptive field organization in the visual cortex can be
derived from optimality criteria based on statistical considerations such as
sparseness of coding, redundancy reduction, etc.
Srinivasan, M. V., Laughlin, S. B., and Dubs, A. (1982) Predictive coding: a fresh
view of inhibition in the retina. Proceedings of the Royal Society London B,
216:427 – 459.
Center-surround receptive field organization is shown to be optimal in the sense
that redundancies in visual coding are removed. Diameters of center and sur-
round parts are derived from the average auto-correlation function of the visual
input.
Chapter 3
Fourier Analysis for Neuroscientists
Fig. 3.1 a. Distribution of electric field strength in two primary colors (blue and red, dashed
lines) and in the resulting mixture (magenta, solid line). b. Spectra of the pure colors red and
blue. The spectrum of the mixture includes both lines.
3.1 Examples
3.1.1 Light Spectra
The notion of a spectrum in the physics of light is closely related to the idea of the
Fourier1 transform. Light is an electromagnetic wave which can be described by the
electric and magnetic fields as a function of space and time. Consider a beam of
coherent light in direction x. At a given instant in time, the corresponding fields are
smooth functions of spatial position x. In pure, or “spectral” colors, both the electric
and the magnetic field strength oscillate according to a sinusoidal function of x, gen-
erally specified by its wavelength λ . Figure 3.1a shows the electric field distribution
of a blue (short wavelength, high frequency) and a red (long wavelength, low fre-
quency) light (dashed lines). If we superimpose both lights, we will experience the
color magenta, or purple. The electric field distribution is the sum of the according
distributions of the two basic colors red and blue (Figure 3.1a, solid line). Note that
this distribution is no longer a sinusoidal; it need not be periodic at all. Still, it can
be thought of as being composed of two sinusoidal components with different fre-
quencies. Figure 3.1b shows the spectral representation of the same lights. Here, for
each wavelength λ the amplitude of the sinusoidal with this particular wavelength
is shown. For the primaries red and blue, these are line-spectra, i.e. the spectrum is
zero except for the wavelengths λr = 650 nm and λb = 370 nm. The spectrum of the
purple light is the sum of the two line spectra.
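The decomposition of such a mixture into its two sinusoidal components can be illustrated with a discrete Fourier transform; the cycle counts below merely stand in for the red and blue wavelengths and are arbitrary:

```python
import numpy as np

n = 4096
x = np.arange(n)
# Stand-ins for the two primaries: 4 and 16 cycles across the window
red = np.sin(2 * np.pi * 4 * x / n)       # long wavelength, low frequency
blue = np.sin(2 * np.pi * 16 * x / n)     # short wavelength, high frequency
magenta = red + blue                      # superposition; no longer sinusoidal

spectrum = np.abs(np.fft.rfft(magenta)) / (n / 2)
peaks = np.flatnonzero(spectrum > 0.5)
assert list(peaks) == [4, 16]             # line spectrum contains both components
```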
3.1.2 Acoustics
In acoustics, time-dependent sound pressure is measured by a microphone and can
be visualized with an oscilloscope. For example, when playing the tone e’ on a treble
recorder, the oscilloscope will show a fairly clean sinusoidal wave with frequency
1 Jean Baptiste Joseph Fourier (1768 – 1830). French mathematician and physicist.
3.1 Examples 59
Fig. 3.2 Time- and frequency signals in music. a. Sound pressure of a pure tone at ωo =
330 Hz, i.e. e’ (tonic), the minor third g’ (ω/ωo = 6/5), the major third g♯’ (ω/ωo = 5/4),
and the fifth b’ (ω/ωo = 3/2). The frequency ratios correspond to the just, or pure intonation.
The phases of the signals have been randomized. b. Sound pressure for the e-minor and e-
major triads e’-g’-b’ and e’-g♯’-b’. c. Musical notation for the same two triads. The staff and
clef define a reference frame for pitch which is akin to a logarithmic frequency scale.
330 Hz. Figure 3.2a shows the time-dependent signal for the tone e’ together with
the signals for the tones g’, g♯’, and b’, all of which are sinusoidal waves differ-
ing in frequency. If these tones are played together, the sound pressure signals add
up to the patterns shown in Figure 3.2b. In music, these patterns correspond to two
well-known chords, called the e-minor and e-major triads. When such chords are
played and reach the ear of a listener, the cochlea of the inner ear will decompose
the complex time signals of Figure 3.2b into the individual tones they are composed
of. Mathematically, this operation corresponds to a Fourier decomposition of the
compound signal resulting in an acoustic spectrum with peaks at the respective fre-
quencies. The musical notation shown in Figure 3.2c represents this spectrum by
the individual note symbols.
When playing the same tone e’ on a piano, the microphone signal will again be a
periodic function with frequency 330 Hz, but the shape of the wave will be less sinu-
soidal. Representing the sound pressure function as a frequency spectrum will now
result in a peak at 330 Hz plus additional peaks at some or all integer multiples of
330 Hz. These multiples are known as harmonics. The pattern of harmonics gener-
ates the “timbre” of the sound, i.e. the differences between productions of the same
tone by various musical instruments or the voices of different singers. Complex
patterns of harmonics known as “formants” make the differences between vowels such
as an /a/ or an /ee/ sung with the same pitch. The pitch of the tone corresponds to its
fundamental frequency, of which the harmonics are multiples.
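A sketch of this: synthesizing a tone from a fundamental and two harmonics (the amplitudes are arbitrary choices), the spectrum recovers exactly these frequencies:

```python
import numpy as np

fs = 8000                                  # sampling rate [Hz]
t = np.arange(fs) / fs                     # one second of signal
f0 = 330                                   # fundamental frequency of e'

# A piano-like tone: fundamental plus harmonics with arbitrary amplitudes
tone = (1.00 * np.sin(2 * np.pi * f0 * t)
        + 0.50 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

spectrum = np.abs(np.fft.rfft(tone)) / (len(t) / 2)
peaks = np.flatnonzero(spectrum > 0.1)
assert list(peaks) == [330, 660, 990]      # fundamental and integer multiples
```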
Complex acoustical events such as uttered speech or bird song are not easily
decomposed into tones, but may be represented as spectrograms (or sonograms).
Consider first a tune, i.e. a sequence of tones over time. In the spectrogram, the
signal will be decomposed into temporal sections (with some overlap defined by
a window function). Next, the Fourier transform within each temporal window is
computed and the result is plotted as a column of gray-values in a coordinate system
spanned by time (the center time of each window) and frequency. As a result, a line
will appear in the sonogram going up for high pitches and down as pitch is lowered.
In speech or bird song, many frequencies will be present simultaneously, resulting
in more complicated sonograms.
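A minimal spectrogram can be computed with windowed Fourier transforms. In the sketch below, the two tone frequencies are chosen so that they fall exactly on frequency bins of the 50 ms analysis windows:

```python
import numpy as np

fs = 4000                                  # sampling rate [Hz]
t = np.arange(fs) / fs                     # one second of signal
# A two-note "tune": 320 Hz, then 440 Hz (both exact window bins)
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 320 * t),
                  np.sin(2 * np.pi * 440 * t))

win = 200                                  # 50 ms windows, bin width 20 Hz
frames = signal.reshape(-1, win) * np.hanning(win)
spectrogram = np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per window

dominant = spectrogram.argmax(axis=1) * fs / win    # peak frequency per window
assert np.all(dominant[:10] == 320) and np.all(dominant[10:] == 440)
```

A real spectrogram would use overlapping windows as described in the text; the non-overlapping version merely keeps the sketch short.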
Fig. 3.3 Sinusoidal grating of the form I(x, y) = Ī + 0.5ΔI sin(2πx/T), where Ī = 0.5(Imax +
Imin). Such gratings with various period lengths T and contrasts are used for measuring the
acuity of vision.
3.1.3 Vision
The representation of signals by sinusoidals and frequencies is rather intuitive in
the cases of colors and tones. In images, i.e. two-dimensional distributions of inten-
sity values, frequency representations seem to be much less intuitive at first glance.
However, properties such as resolution, acuity, or granularity of an image are closely
related to a frequency variable. Indeed, the well-known JPEG-encoding of images
rests on the discrete cosine transform (DCT), a close relative of the Fourier trans-
form. In this representation, homogeneous image regions with little intensity varia-
tion are represented by a coarse grating and thus need less storage space than image
regions with fine-grain contrasts.
Figure 3.3 shows a sinusoidal intensity variation over an image coordinate x.
Gratings are characterized by a wavelength, or alternatively by a (spatial) frequency.
Since image intensities cannot be negative, gratings will always be offset from zero
on the intensity axis. Instead of characterizing mean and amplitude, the strength of
modulation is often defined as the so-called Michelson2 contrast,
contrast := (Imax − Imin) / (Imax + Imin).  (3.1)
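In code, the Michelson contrast of a sampled grating (mean 0.5 and amplitude 0.25 are arbitrary choices):

```python
import numpy as np

def michelson_contrast(image):
    """Michelson contrast (Equation 3.1): (Imax - Imin) / (Imax + Imin)."""
    i_max, i_min = float(image.max()), float(image.min())
    return (i_max - i_min) / (i_max + i_min)

x = np.linspace(0.0, 1.0, 1000)
# Grating with mean 0.5 and modulation amplitude 0.25 (ten periods)
grating = 0.5 + 0.25 * np.sin(2 * np.pi * x / 0.1)
assert np.isclose(michelson_contrast(grating), 0.5, atol=1e-3)
```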
In the two-dimensional case, the sinusoidals become plane waves, i.e. functions
with a sinusoidal modulation in one direction and zero variation in the orthogonal
direction (cf. Figure 2.7a). Images may then be decomposed into two-dimensional
gratings of various frequencies. Table 3.1 shows icons of two-dimensional gratings
with various combinations of spatial frequencies. Superposition of gratings amounts
to a pixelwise summation of intensities. Fourier theory posits that any image can be
generated by superimposing gratings with variable amplitude and phase (i.e., posi-
tional shift). Since the set of gratings is a two-dimensional manifold, the spectrum
becomes a two-dimensional function specifying the amplitude for each component
grating.
Table 3.1 Two-dimensional gratings of the form I(x, y) = sin(2π(ωx x + ωy y)) for some in-
teger values of ωx and ωy. (Columns: ωx = −1, 0, 1, 2, 3; rows: ωy = −1, 0, 1, 2, 3; each
cell shows the corresponding grating.)
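The decomposition of an image into two-dimensional gratings can be checked with the two-dimensional discrete Fourier transform; image size and frequency pairs below are arbitrary:

```python
import numpy as np

n = 64
y, x = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')

def grating(wx, wy):
    """Plane wave I(x, y) = sin(2 pi (wx x + wy y) / n), integer frequencies."""
    return np.sin(2 * np.pi * (wx * x + wy * y) / n)

image = grating(3, 0) + grating(1, 2)     # pixelwise superposition

spectrum = np.abs(np.fft.fft2(image)) / (n * n / 2)
# Each component grating appears at its frequency coordinates
assert np.isclose(spectrum[0, 3], 1.0)    # wx = 3, wy = 0
assert np.isclose(spectrum[2, 1], 1.0)    # wx = 1, wy = 2
```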
only the protons at a particular z-location will satisfy the resonance condition. These
protons fill a plane perpendicular to the axial direction. After the activation, another
DC gradient field is applied, say along the x-coordinate. This field component is
called a read-out gradient. The resonance signals emitted by the activated protons
have frequencies proportional to the total locally “perceived” field strength. Signals
emitted from voxels with different x-coordinate will therefore have different frequencies.
Since all signals are overlaid in the recording, the different frequency
components will have to be isolated in order to find the contributions from voxels at
a particular x-position. This is done with the Fourier transform. The remaining prob-
lem, then, is that all voxels with a given x-coordinate, corresponding to one Fourier
component of the signal, form a line in the slice, extending along the y-coordinate.
To localize signals also in the y-direction, measurements with other read-out gradi-
ents must be performed and combined. Also in this step, the Fourier transform can
be useful.
3.2 Why Are Sinusoidals Special? 63
3 The term “eigenfunction” is based on the German word “eigen” which means “own”. It expresses
the fact that eigenfunctions and eigenvalues characterize the operator.
(f ∗ sinω)(x) = ∫_{−∞}^{∞} f(x′) sin(ω(x − x′)) dx′  (3.4)
where x′ is an auxiliary variable which cancels out as the integral is computed (cf.
Equation 2.16). Although in Equation 3.4 we use the spatial variable x, the argument
works just as well in the temporal domain if causality is taken care of by setting
f(t′) = 0 for t′ < 0.
It is important for the following argument that the difference x − x′ appears in
the sine-term, even though the integral remains the same if we exchange the roles of
f and the sine (commutativity of convolution).
Applying the addition theorem sin(α − β) = sin α cos β − cos α sin β, we obtain:

(f ∗ sinω)(x) = ∫ f(x′) (sin ωx cos ωx′ − cos ωx sin ωx′) dx′.  (3.5)

We may now split up the integral in two (distributive law) and move the terms not
depending on x′ out of the integrals, which are taken over x′:

(f ∗ sinω)(x) = sin ωx ∫ f(x′) cos ωx′ dx′ − cos ωx ∫ f(x′) sin ωx′ dx′
= f̃c sin ωx − f̃s cos ωx.  (3.6)
After moving the variable x out of the integrals, the remaining integrals evaluate to
constants, for which we introduced the names

f̃c := ∫ f(x′) cos ωx′ dx′,  f̃s := ∫ f(x′) sin ωx′ dx′.
Thus, we have shown that the convolution of a sine with frequency ω with any
mask is a weighted sum of a sine and a cosine with the same frequency ω . A linear,
translation-invariant system cannot change the frequency of a signal.
Equation 3.6 is not yet the full answer to the eigenfunction problem, since the
response to a sine is a weighted sum of a sine and a cosine. We can get one step
closer if we observe that such sums of a sine and a cosine can be written as a sine
with a phase shift. This is in fact a special case of the addition theorem stated above.
We introduce the new variables A and φ , called amplitude and phase:
A := √(f̃c² + f̃s²),  φ := tan⁻¹(f̃s / f̃c).  (3.9)
A geometrical interpretation of the quantities f̃c, f̃s, A, and φ is given in Figure 3.5.
With these variables, we obtain

(f ∗ sinω)(x) = √(f̃c² + f̃s²) (cos φ sin ωx − sin φ cos ωx)  (3.10)
= A sin(ωx − φ).  (3.11)
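Equations 3.6 to 3.11 can be verified numerically for an arbitrary kernel (the asymmetric kernel below is an arbitrary choice): direct convolution with a sine reproduces A sin(ωx − φ) with the same frequency:

```python
import numpy as np

omega = 2 * np.pi * 5                 # 5 cycles per unit length
dx = 1e-3
xs = (np.arange(100) - 50) * dx       # kernel support
# Arbitrary, asymmetric kernel f (a Gaussian plus a one-sided offset)
f = np.exp(-xs**2 / (2 * 0.01**2)) + 0.3 * (xs > 0)

# f_c and f_s as in Equation 3.6, A and phi as in Equation 3.9
fc = np.sum(f * np.cos(omega * xs)) * dx
fs_ = np.sum(f * np.sin(omega * xs)) * dx
A, phi = np.hypot(fc, fs_), np.arctan2(fs_, fc)

x = np.linspace(0.0, 1.0, 2001)
predicted = A * np.sin(omega * x - phi)        # Equation 3.11

# Direct numerical convolution (f * sin_omega)(x), Equation 3.4
direct = np.array([np.sum(f * np.sin(omega * (xi - xs))) * dx for xi in x])
assert np.allclose(direct, predicted, atol=1e-6)
```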
Fig. 3.5 Geometrical interpretation of f̃c, f̃s, A, and φ (see Equation 3.9).

Fig. 3.6 Complex numbers in the complex plane. a. A number z = a + ib with its conjugate
z∗, modulus |z|, and argument arg z. b. Sum z1 + z2 and product of two complex numbers.
c. Polar (“phasor”) representation exp{iφo} with real part cos φo and imaginary part sin φo.
d. As the phase φ increases, the real and imaginary parts of the phasor trace out cosine and
sine waves.
z := a + ib (3.13)
a complex number with real part a and imaginary part b (cf Figure 3.6a). It can
be shown that every algebraic equation (i.e. equation of the form ∑nk=0 ak xk = 0)
has exactly n solutions in the set of complex numbers, where multiple solutions are
counted accordingly. In the set of real numbers, it may have anything between 0 and
n solutions.
We consider a few basic properties of complex numbers (see Figure 3.6). For a
complex number z = a + ib, the real number
|z| := √(a² + b²) = √((a + ib)(a − ib))  (3.14)
is called the absolute value or modulus of z. For a complex number z = a + ib, the
number z∗ := a − ib is called its complex conjugate. With this notation, we may
write Equation 3.14 as |z|2 = zz∗ .
The counterclockwise angle between the real axis and the “vector” z is called the
argument, or phase, of z:

arg z := tan⁻¹(b/a) for a > 0;  π + tan⁻¹(b/a) for a < 0, b ≥ 0;  −π + tan⁻¹(b/a) for a < 0, b < 0.  (3.15)
With some computations, it can be shown that arg(z1 z2 ) = arg z1 +argz2 and |z1 z2 | =
|z1 ||z2 | (cf Figure 3.6b). This is to say, multiplication with a number z amounts to a
rotation by arg z and a scaling by |z|.
This latter result, i.e. expressing multiplication as an addition of the arg-values
and a multiplication of the moduli, is reminiscent of a property of the exponential
function, i.e. e^x e^y = e^{x+y}. Indeed, a complex exponential function can be defined
based on the famous Euler4 formula:

exp{iφ} = cos φ + i sin φ.
Two illustrations of Euler’s formula are given in Figure 3.6c, d. Figure 3.6c is called
the polar (or “phasor”) representation of a complex number, i.e. its representation
by modulus and phase:
z = |z| exp{i arg z} (3.20)
Figure 3.6d illustrates the relation of the complex exponentials to sinusoidals. As the
phasor rotates about the origin in the complex plane (i.e. as the phase φ increases),
the real and imaginary parts of the complex number describe a cosine and a sine
wave.
As a final remark on complex numbers, we note two inverted versions of Euler’s
formula that allow us to go back from the complex to the real notation:

cos ϕ = (e^{iϕ} + e^{−iϕ}) / 2  (3.21)
sin ϕ = (e^{iϕ} − e^{−iϕ}) / 2i.  (3.22)
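Euler's formula and its two inversions can be checked numerically with Python's complex arithmetic (the phase values are arbitrary test points):

```python
import cmath
import math

# Euler's formula exp(i*phi) = cos(phi) + i*sin(phi) and its inversions
for phi in (0.0, 0.7, math.pi / 3, 2.5):
    z = cmath.exp(1j * phi)
    assert math.isclose(z.real, math.cos(phi), abs_tol=1e-12)
    assert math.isclose(z.imag, math.sin(phi), abs_tol=1e-12)
    # Equations 3.21 and 3.22: cos and sin from complex exponentials
    assert cmath.isclose((cmath.exp(1j * phi) + cmath.exp(-1j * phi)) / 2,
                         math.cos(phi), abs_tol=1e-12)
    assert cmath.isclose((cmath.exp(1j * phi) - cmath.exp(-1j * phi)) / 2j,
                         math.sin(phi), abs_tol=1e-12)
```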
The eigenvalue associated with the eigenfunction exp{iωx} is, then, f̃(ω). The val-
ues f̃c and f̃s from Section 3.2.1 are the real and imaginary parts of f̃,

f̃(ω) = f̃c − i f̃s = A exp{−iφ},

where A = |f̃|.
The function f̃(ω) describes for each frequency ω the action that the system
exerts on an input signal exp{iωx}. In the output, the frequency itself is not changed,
but the amplitude of each frequency component is multiplied by |f̃(ω)| and its phase
is shifted by arg f̃(ω). The function f̃(ω) is therefore called the modulation transfer
function (MTF) of the convolution system. As we shall see later, its relation to the
point-spread function f in Equation 3.24 is already the definition of the Fourier
transform.
As an example, consider a convolution system whose point-spread function is the Gaussian

f(x) = exp{−x²/2σ²}. (3.26)
The modulation transfer function f̃ for this system is:

f̃(ω) = ∫ exp{−x²/2σ²} exp{−iωx} dx. (3.27)
Next, we collect all exponents in one exponential function and add the term
−σ 4 ω 2 + σ 4 ω 2 = 0 in the exponent. We can then move a term not depending on x
outside the integral and are left inside with a completed square expression to which
the binomial rule can be applied:
f̃(ω) = ∫ exp{−(1/2σ²)(x² − 2iσ²ωx − σ⁴ω² + σ⁴ω²)} dx (3.28)

     = exp{−(1/2)σ²ω²} ∫ exp{−(1/2σ²)(x − iσ²ω)²} dx. (3.29)
3.2 Why Are Sinusoidals Special? 69
Next, we substitute y = x − iσ²ω in the integral and note that ∫ exp{−y²} dy = √π
even for complex arguments5. We thus obtain

f̃(ω) = exp{−(1/2)σ²ω²} ∫ exp{−y²/2σ²} dy (3.30)

     = √(2π) σ exp{−(1/2)σ²ω²}. (3.31)
This is a Gaussian with width 1/σ . This result is also known as “uncertainty rela-
tion”: the product of the width values of the point spread function and the MTF, σ
and 1/σ , is a constant.
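The Fourier pair in Equations 3.27 and 3.31 is easy to check numerically. The following sketch (numpy assumed; the grid spacing and range are arbitrary choices) evaluates the integral in Equation 3.27 by direct summation and compares the result with the closed form √(2π) σ exp{−σ²ω²/2}:

```python
import numpy as np

sigma = 0.5
x = np.linspace(-20, 20, 4001)            # fine grid, wide enough for the tails
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2))        # Gaussian point-spread function, Eq. 3.26

def mtf(omega):
    """Riemann-sum approximation of the integral in Eq. 3.27."""
    return np.sum(f * np.exp(-1j * omega * x)) * dx

for omega in (0.0, 1.0, 2.0, 4.0):
    closed_form = np.sqrt(2 * np.pi) * sigma * np.exp(-0.5 * sigma**2 * omega**2)
    assert abs(mtf(omega) - closed_form) < 1e-6   # Eq. 3.31; imaginary part vanishes
```

Repeating the check with a larger σ shows the uncertainty relation at work: doubling the width of the point-spread function halves the width of the resulting MTF.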
We now use Euler’s formula in the form of Equation 3.21 to write the result in
real notation. Note that this is the standard way to interpret complex number results:
(f ∗ cos_ω)(x) = (1/2)[(f ∗ exp_ω)(x) + (f ∗ exp_{−ω})(x)] (3.32)

             = (1/2)(f̃(ω) e^{iωx} + f̃(−ω) e^{−iωx}) (3.33)

             = exp{−(1/2)σ²ω²} cos ωx. (3.34)
2
The latter equality is due to the fact that the MTF is real and symmetric. Thus,
convolution with a Gaussian does not change the phase of the signal. This result is
true for all real, symmetric MTFs. For the sine-function, we obtain analogously
(f ∗ sin_ω)(x) = exp{−(1/2)σ²ω²} sin ωx. (3.35)
Equations 3.31 to 3.35 show three things. First, the value of f˜(ω ) for a Gaussian
kernel is real and symmetric. Therefore, convolving with a Gaussian does not in-
volve a phase shift. In our blurred projection example, this means that the blur leads
to a symmetric smear of the point input, but not to a displacement. Second, the
amplification factor decreases as the frequency ω of the filtered pattern increases.
That is to say, the system will leave low spatial frequencies (coarse pattern) largely
unchanged, but will remove high spatial frequencies (fine detail). This behavior is
called low-pass. Of course, this behavior is very intuitive for the blurred projection
example given above. If spatial gratings of the type shown in Table 3.1 are used,
blur will leave the coarse pattern unchanged, but wash out the fine pattern. Third,
the width of the MTF, i.e. the “speed” with which f˜(ω ) decreases for higher fre-
quencies, depends on the width of the filter, σ. The wider the filter mask, the faster
the amplification factor decreases, i.e. the more pronounced the low-pass property.
Fig. 3.7 Approximation of the "box function" (Equation 3.36 with ω = 1) by a Fourier sine
series (Equation 3.39). The box function is shown as a heavy gray line. In each panel, the two
thin lines show the current approximation g_k(x) and the next correction term b_{k+2} sin((k+2)νx) for k = 1, 3, 5, 7. For further explanations see text.
Figure 3.7 shows this box function (heavy gray line) together with two sinusoidals.
The first one has the same frequency as the box function itself; we call this frequency
the fundamental frequency ω f = 1/T . In the sequel, we will use the “wave number”
ν = 2π/T for the sake of simplicity. The sinusoidal thus has the form

g_1(x) = a_o + b_1 sin νx,

where the amplitude b_1 is chosen such that the quadratic difference between the box
function and the approximation is minimized. The factor a_o = 1/2 is called the DC,
or direct current part; it is the average of the box function, a_o = (1/T) ∫_0^T b(x) dx. If
we consider the difference between b(x) and g1 (x), we note that it changes sign
three times as often as the fitting sine function. We can therefore arrive at a better
approximation of the box function by adding a second sine with frequency 3ν; it
appears in the lower part of the figure. (The fact that the frequency 2ν does not contribute to a better approximation is a special property of the box function.) Figure 3.7b shows the least-squares approximation of the box function by the function
g_3(x) = g_1(x) + b_3 sin(3νx). The approximation is better, but again deviates from
the desired shape by an oscillatory function, this time with frequency 5ν . We can
repeat this procedure of stepwise improvements and eventually obtain

g_n(x) = 1/2 + Σ_{k=1,3,5,...}^{n} (2/kπ) sin(kνx); (3.39)

a procedure for determining the coefficients for arbitrary functions will be presented
below.
We call series of the type shown in Equation 3.39 (including also cosine val-
ues) “trigonometric polynomials” or Fourier series (see also Equation 3.46 below).
Figure 3.7 shows the first steps of that series, i.e. the functions g1 (x)
72 3 Fourier Analysis for Neuroscientists
(Figure 3.7a) through g_7(x) (Figure 3.7d). For each x satisfying mod(x, T) ≠ 0.5
and mod(x, T) ≠ 1, i.e. for each x where b(x) is continuous, the series converges6
towards the true functional value.
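The stepwise approximation can be reproduced in a few lines. The sketch below (numpy assumed) uses the odd-harmonic coefficients b_k = 2/(kπ), which apply under the assumption that the box function takes the values 1 and 0 on alternating half-periods with T = 1; at a continuity point such as x = 0.25 the partial sums converge to the true value 1:

```python
import numpy as np

T = 1.0
nu = 2 * np.pi / T                         # wave number as defined in the text

def box(x):
    """Box function: 1 on the first half of each period, 0 on the second."""
    return (np.mod(x, T) < T / 2).astype(float)

def g(n, x):
    """Partial sum: DC part 1/2 plus the odd harmonics up to frequency n*nu."""
    result = 0.5 * np.ones_like(x)
    for k in range(1, n + 1, 2):           # even harmonics do not contribute
        result += 2.0 / (k * np.pi) * np.sin(k * nu * x)
    return result

x = np.array([0.25])                       # a point where b(x) is continuous
for n in (1, 3, 7, 31, 301):
    print(n, g(n, x)[0])                   # approaches box(0.25) = 1
```

At the discontinuities the partial sums instead converge to the midpoint 1/2, and the overshoot next to the jumps does not vanish (the Gibbs phenomenon).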
Figure 3.8 shows a more general case where the sinusoidals needed to reconstruct
the signal have different phases. This can be seen by checking the value at x = 1 of
the correction term (lower sinusoid in each panel); while this is zero for all frequen-
cies in Figure 3.7, the value now changes from panel to panel. In the equation, the
phase shift is accounted for by adding a sine and a cosine term for each frequency,
each with its own coefficient ak and bk , respectively:
g_n(x) = a_o + Σ_{k=1}^{n} a_k cos(kνx) + Σ_{k=1}^{n} b_k sin(kνx). (3.42)
Fig. 3.9 Illustration of the convolution theorem. a. Box function (Equation 3.36 with ω = 1).
b. Coefficients of the corresponding Fourier series (Equation 3.39). c. Low-pass filtered version of
the box function, using a Gaussian filter kernel (b ∗ f). d. Fourier coefficients for the filtered
signal. The amplification factor f̃(ω) (Fourier transform of the Gaussian kernel) is shown
in the background (not to scale). The coefficients are obtained by multiplying the original
coefficients (panel b.) with the Gaussian. e. High-pass filtered version of the box function
obtained by convolution with g := δ − f. f. Fourier coefficients of the high-pass filtered signal.
The Fourier transform of the filter kernel (g̃ = 1 − f̃) is shown in gray (not to scale).
This is a new Fourier series with the unchanged DC component 1/2 and the new
coefficients b*_k = b_k f̃(kν). This equation also illustrates why f̃(ω) is called a modulation transfer function: for each "modulation" (i.e. sinusoidal), it specifies a factor with which this sinusoidal is transferred through the linear translation-invariant
3.3 Fourier Decomposition: Basic Theory 75
system. Figure 3.9d illustrates this result for the box function with T = 1 and a
Gaussian blur with σ = 0.2.
Figure 3.9 shows that by convolving the box function with a Gaussian, its high
frequency components are strongly attenuated while the low frequency components
are passed largely unchanged. This behavior is known as low-pass. It amounts to
a smoothing, or blur, of the input function. An ideal low-pass is a step function
(Heaviside function) in the frequency domain, which however is hard to realize as a
convolution in the space or time domain.
If the smoothed box function is subtracted from the original one, the high spatial
frequency components that were removed in smoothing, stand out in the difference
(Figure 3.9e). Since the box function itself can be thought of as the convolution of
the box function with the Dirac δ impulse (Equation 2.11), we can formally write
this difference as a convolution with a new kernel g defined as g := δ − f. It will pass
the high frequencies and remove the low ones, and is therefore called a high-pass. In
the frequency domain, this particular high-pass is characterized by the coefficients
g̃(kν ) = 1 − f˜(kν ) (Figure 3.9f).
This example is an instance of the convolution theorem, the central result of this
chapter. It states that the convolution of a signal with a point-spread function can
be replaced in the Fourier domain by the multiplication of the Fourier transform of
the signal and the modulation transfer function, and that this MTF is the Fourier
transform of the point-spread function (see also Figure 3.11 below).
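For discretely sampled signals, the convolution theorem can be verified directly with the fast Fourier transform. The following sketch (numpy assumed; signal length and kernel width are arbitrary choices) compares a circular convolution computed by direct summation with the inverse transform of the product of the two spectra:

```python
import numpy as np

n = 256
x = np.arange(n)
signal = (np.mod(x, 64) < 32).astype(float)        # discrete box function
kernel = np.exp(-0.5 * ((x - n // 2) / 4.0) ** 2)  # Gaussian mask ...
kernel = np.roll(kernel / kernel.sum(), -n // 2)   # ... normalized, centered at index 0

# space domain: circular convolution by direct summation
conv = np.zeros(n)
for i in range(n):
    for j in range(n):
        conv[i] += signal[j] * kernel[(i - j) % n]

# frequency domain: multiplication of the transforms
conv_ft = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

assert np.max(np.abs(conv - conv_ft)) < 1e-10      # both routes agree
```

The frequency-domain route needs O(n log n) operations instead of O(n²), which is why convolution is routinely implemented this way in practice.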
Note that the DC-component has been incorporated into the first sum, keeping in
mind that cos(0ν x) = 1 for all ν and x. In Equation 3.46 ν is again the fundamental
frequency of the signal, i.e. pn repeats itself with a wave-length of T = 2π /ν . Note
that this is true even if a1 = b1 = 0, i.e. if the fundamental frequency itself is missing
in the signal. An example of such a “missing fundamental” stimulus is obtained from
the box function by subtracting its first (fundamental) Fourier component.
How can we find the coefficients ak , bk ? If we assume that every continuous peri-
odic function can in fact be written as a trigonometric polynomial,7 we can find the
coefficients by exploiting the so-called orthogonality relations of sinusoids which
hold for all k, l > 0:
7 The proof of this assumption is the mathematically hard part of Fourier theory. We do not prove
it here.
∫_0^{2π} sin kx sin lx dx = { π if k = l, 0 if k ≠ l } (3.47)

∫_0^{2π} cos kx cos lx dx = { π if k = l, 0 if k ≠ l } (3.48)

∫_0^{2π} sin kx cos lx dx = 0. (3.49)
Geometrical motivations for these relations can be found by plotting the involved
functions for selected values of k and l 8 . With these relations, we obtain:
a_k = (2/T) ∫_0^T p(x) cos(kνx) dx;  k ∈ {0, 1, 2, . . .} (3.50)

b_k = (2/T) ∫_0^T p(x) sin(kνx) dx;  k ∈ {1, 2, 3, . . .}. (3.51)
These latter results are proven by substituting for p(x) from Equation 3.46 and ap-
plying the distributive law until the orthogonality terms appear for each element of
the sum. All of these vanish except for the one where k = l which evaluates to π .
We call a_k the Fourier cosine coefficient for the frequency kν and b_k the Fourier
sine coefficient for the frequency kν.
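Equations 3.50 and 3.51 can be evaluated by simple numerical quadrature. As a sketch (numpy assumed), we recover the coefficients of the box function discussed above, again assuming the values 1 and 0 on alternating half-periods: the cosine coefficients vanish and the sine coefficients come out as 2/(kπ) for odd k and 0 for even k:

```python
import numpy as np

T = 1.0
nu = 2 * np.pi / T
x = np.linspace(0, T, 10000, endpoint=False)   # one period, left Riemann sum
dx = T / len(x)
p = (x < T / 2).astype(float)                  # box function on [0, T)

def a(k):
    """Fourier cosine coefficient, Eq. 3.50."""
    return 2.0 / T * np.sum(p * np.cos(k * nu * x)) * dx

def b(k):
    """Fourier sine coefficient, Eq. 3.51."""
    return 2.0 / T * np.sum(p * np.sin(k * nu * x)) * dx

for k in range(1, 6):
    print(k, round(a(k), 4), round(b(k), 4))   # b_k close to 2/(k*pi) for odd k
```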
In the complex notation, we proceed in the same way, using the orthogonality
relation
∫_0^{2π} exp{ikx} exp{−ilx} dx = { 2π if k = l, 0 if k ≠ l }. (3.52)
The coefficients are obtained from

c_k = (1/T) ∫_0^T g(x) exp{−ikνx} dx;  k ∈ {. . . , −2, −1, 0, 1, 2, . . .}. (3.53)
Note that this time, we have to consider coefficients with negative index number.
The resulting series reads
p_n(x) := Σ_{k=−n}^{n} c_k exp{ikνx}. (3.54)
Clearly, the coefficients of the real and complex notations are related. The relation
can be calculated from Euler's formula. The coefficients read

        ⎧ (1/2)(a_j − ib_j)        for j > 0
c_j  =  ⎨ (1/2)(a_{−j} + ib_{−j})  for j < 0 . (3.55)
        ⎩ a_o                      for j = 0
For a signal with period T = 2π, we have ν = 1 and the frequencies occurring in the
series are

ω = kν = 2kπ/T ∈ {1, 2, 3, 4, 5, . . .}. (3.56)

If the period is twice as long, T = 4π, we obtain ν = 1/2 and

ω = kν = 2kπ/T ∈ {1/2, 1, 3/2, 2, 5/2, . . .}. (3.57)
In the end, if the period is infinite (i.e. if the function is no more periodic at all), we
get a “coefficient” for every value of ω , i.e. a function of frequency. Switching back
to the complex notation, we thus obtain the Fourier transform:
g̃(ω) := ∫_{−∞}^{∞} g(x) exp{−iωx} dx. (3.58)
Equation 3.58 is called the Fourier forward transform and Equation 3.59 the Fourier
backward transform. Applying both in sequence reconstructs the original function,
as long as this function is continuous and its square is integrable.
Note that the equations for Fourier forward and backward transform are almost
identical, up to a sign in the exponential and a normalizing factor, which can be split
symmetrically between the two transform equations. Applying the forward trans-
form twice results in a mirror image of the original function, applying it four times
reproduces the original.
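The mirror-image property can be illustrated with the discrete Fourier transform, where each application of the transform contributes a factor of n that we divide out (numpy assumed):

```python
import numpy as np

x = np.arange(8, dtype=float)                # an arbitrary, non-symmetric signal
n = len(x)

y = np.fft.fft(np.fft.fft(x)) / n            # forward transform applied twice
mirror = np.concatenate(([x[0]], x[:0:-1]))  # x evaluated at -k (mod n)
assert np.allclose(y, mirror)                # two applications: mirror image

z = np.fft.fft(np.fft.fft(y)) / n            # four applications in total
assert np.allclose(z, x)                     # back to the original
```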
The Fourier transform of a periodic function does not produce the discrete coef-
ficients of the Fourier series, but rather a sum of δ -functions at the locations of the
multiples of the fundamental frequencies, weighted by the coefficients. In particular, the Fourier transform of a sine of frequency ω_o is π(δ(ω − ω_o) − δ(ω + ω_o))/i.
Thus, Fourier series and Fourier transform are two different operations. In practi-
cal applications, where numerical versions of the Fourier transform are used, this
difference is of minor relevance.
The function g̃(ω ) in Equation 3.58 is a complex function of the real variable ω
(cf. Figure 3.10). By Euler’s formula (Equation 3.19) the complex number g̃(ωo ) =
g̃c (ωo ) + ig̃s (ωo ) for each ωo gives the amplitude and phase of the component with
spatial frequency ωo . If only the spatial frequencies present in a pattern are to be
considered, one often uses the so-called power spectrum of g, i.e. the square of
the absolute value (modulus) of ω̃ . A famous theorem in Fourier theory (Wiener9-
Khinchin10 theorem) states that the power spectrum equals the Fourier transform of
the auto-correlation function introduced in Equation 2.57. In formal notation, we
may write
Φ̃gg (ω ) = |g̃(ω )|2 = g̃g̃∗ . (3.60)
[Figure 3.10: real part Re f̃(ω) and imaginary part Im f̃(ω), plotted as functions of ω; panels a and b.]

sin(ω_x x + ω_y y) = sin(ω · x) (3.61)
as are shown in Table 3.1. The argument of the sine function is an inner product (see
below, Section 4.1.3) of a frequency vector (ωx , ωy , ...) and a vector of the original
coordinates (x, y, ...). Time and temporal frequency may be treated as just another
component of these vectors. In two dimensions, these plane waves look like corrugated surfaces or wash-boards whose contour lines form a set of parallel straight
lines. The orientation of these contour lines is orthogonal to the vector (ω_x, ω_y);
the separation of wave peaks (wave length) is 2π/√(ω_x² + ω_y²) (cf. Figure 2.7a).
The Fourier transform then becomes a complex function of two or more real
frequency variables:
g̃(ω_x, ω_y) := ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) exp{−i(ω_x x + ω_y y)} dx dy. (3.62)
Each point in the frequency plane (ωx , ωy ) corresponds to one plane wave. An intu-
ition for this frequency plane may be obtained from Table 3.1: each cell in this table
represents a two-dimensional frequency vector for which the associated grating is
shown.
backward:  g(x) := (1/2π) ∫_{−∞}^{∞} g̃(ω) exp{iωx} dω. (3.64)
1. g̃(ω) is a complex number which can be decomposed into a sine and a cosine component via Euler's formula. These components are called the Fourier sine and Fourier
cosine components, respectively. Intuitively, Equation 3.63 therefore means that
every continuous function can be represented as a sum of sine and cosine functions.
2. In the set of functions, an infinite-dimensional coordinate system can be introduced by the basis "functions" δ(x − y) for each value of y. If the function is
sampled and the list of values is treated as a vector, the canonical basis (i.e.,
basis vectors (0, ..., 0, 1, 0, ..., 0)ᵀ) is an approximation of this basis. The Fourier
transform can then be considered a coordinate transform with the new basis functions exp{iωx}. The orthogonality relations guarantee that this new basis is orthonormal. Since the length of a vector does not depend on the coordinate system
used, the relation ∫ f²(x) dx = (1/2π) ∫ | f̃(ω)|² dω holds (Parseval's identity).
3. Convolution theorem: If the original function is replaced by its Fourier transform,
then convolution is replaced by multiplication:
space domain:      f(x) —[∗ g, convolution with point-spread function]→ h(x)
                        ↓ Fourier transform ↓
frequency domain:  f̃(ω) —[× g̃(ω), multiplication with modulation transfer function]→ h̃(ω)

Fig. 3.11 Summary of the relation of Fourier transform and linear systems theory, i.e. the
convolution theorem.
Original Papers
Daugman, J. (1980) Two-dimensional spectral analysis of cortical receptive field
profiles. Vision Research 20:847 – 856
Extends spatial frequency models of the receptive field to the two-dimensional
case and gives clean definitions of orientation and frequency specificity.
Henriksson, L., Nurminen, L., Hyvärinen, L., and Vanni, S. (2008) Spatial frequency tuning in human retinotopic visual areas. Journal of Vision 8(10):5, 1–13
Spatial frequency tuning in human visual cortex is studied based on fMRI
data. The paper shows differences between different areas and gives a useful
overview of earlier findings on spatial frequency tuning.
Jones, J. P. and Palmer, L. A. (1987) An evaluation of the two-dimensional Gabor
filter model of simple receptive fields in cat striate cortex. Journal of Neuro-
physiology 58:1233 – 1258
Describes detailed fits of cortical receptive field profiles by Gabor function with
suitable parameters. Additionally, spatial frequency specificities are measured
with grating stimuli (see next chapter) and the results are compared to predic-
tions derived from the Gabor fits.
Chapter 4
Artificial Neural Networks
3. The topology of the network is the pattern of connectivity between the neurons
involved. It is generally described by a weight matrix, but higher level descrip-
tions such as feed-forward vs. feed-back or layered vs. completely connected
are also used. We will assume the overall topology fixed, but dynamic changes
such as the creation of new neurons are also studied in the literature (“growing
networks”).
If e_i ∈ IR, the set of all possible activity vectors forms a vector space with dimension
I, IR^I. Note that I may be much larger than three, rendering geometric interpretations
of this vector space difficult. In the sequel, we will occasionally give geometric in-
terpretations for I = 2 or I = 3. For the general case, e should be thought of as an
ordered list without trying to imagine I-dimensional spaces. Note also that, follow-
ing standard conventions, vectors are always considered to be columns of numbers.
If we need to consider rows, we use the "transposition" symbol (eᵀ), which
works in both directions, i.e., (eᵀ)ᵀ = e.
The vector e is also called a state-vector of the neural network since it contains
the information about the current activity state. In time-discrete models, an upper
index t is attached to the state vector, which is then written as e^t.
If the neurons have spatial coordinates, as in the cases studied in previous chapters, e becomes a function e of space and time, and each "neuron" i is identified with
a point (x_i, y_i), leading to the interpretation

e_i^t = e(x_i, y_i ; t). (4.2)
Equation 4.2 relates discrete neural network theory to layers of continuous activity
functions as were studied in convolution systems, Equation 2.16.
4.1 Elements of Neural Networks 85
which takes the complete activity vector of the network at time t as its input and
the activity of neuron i at time t + 1 as its output. The activation function is usually
described by synaptic weights wi j and a static non-linearity f :
α_i(e) = f( Σ_{j=1}^{J} w_{ij} e_j ). (4.4)
Examples of static non-linearities which are also used in neural networks have been
given in Section 2.3.2. The weighted sum forming the argument of the static non-
linearity, ∑ wi j e j , is called the “potential” of cell i and denoted by ui .
Here, we use the convention that the weights w_{ij} are indexed with the number
of the post-synaptic cell (i) and the pre-synaptic cell (j). In more detail, w_{ij} reads
"weight of the input received by i from j." If a unit k does not receive input from
unit l, the weight wkl is set to zero. Thus, the set of all weights also determines the
connectivity pattern or topology of the network.
Note that except for the non-linearity f , Equation 4.4 is analogous to the correla-
tion Equation 2.9 if we discretize the input plane (x j , y j ) and consider the image in-
tensities I(x j , y j ) as the presynaptic inputs e j (e.g., activities of receptor cells). The
corresponding values of the receptive field function φ(x_j, y_j) become the weights
w_{oj} of the sole output unit e_o. The differences between the two equations are (i)
that in Equation 2.9 we have identified neurons with their spatial coordinates (x, y),
which implies a dense, continuous layout and hence an integration operation over
space (cf. Equation 4.2), and (ii) that the receptive field function φ in Equation 2.9
Fig. 4.1 a. A simple neural network consisting of four units ("neurons") with activities
e_1, . . . , e_4 and links with transmission weights w_{ij}. For explanations see text. b. Simple model
neuron for artificial neural networks. e_j: input activities, w_{ij}: synaptic weights, Σ: summation,
u_i: potential (u_i = Σ_{j=1}^{J} w_{ij} e_j), f: static non-linearity, e_i: output activity (e_i = f(u_i)).
86 4 Artificial Neural Networks
is not modeling just one synaptic layer, but describes the entire preprocessing from
the retina to the neuron in question.
The dot product is also known as the inner product, in which case it is often written
as wᵀe, or as the scalar product, to indicate that the result is a scalar (i.e. a number as
opposed to a vector). It has various geometric interpretations which are helpful for
understanding neural network theory.
Norm of a Vector
First, the length, or "norm", of a vector can be defined as

‖x‖ := √( Σ_i x_i² ) = √(x · x). (4.6)
Thus, if the dot product of two vectors is zero, the angle included by the vectors
is ±90◦. The relation of the dot product to correlation will be explained later (see
Equation 4.16).
Projection
Finally, the dot product is related to the notion of a projection. If an arbitrary vector
(x1 , x2 , ...xn ) is multiplied with a coordinate vector (0, . . . , 0, 1, 0, . . . 0), where the
only 1 appears at position j, the result is x j , i.e., the component of the vector in
direction of the j-th coordinate vector. In general, the dot product of two vectors
can be considered as the length of the component of one vector in the direction of
the other (see Fig. 4.2). Clearly, for orthogonal vectors, this length is 0.
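These geometric interpretations can be summarized in a short numerical sketch (numpy assumed; the vectors are arbitrary examples):

```python
import numpy as np

w = np.array([3.0, 4.0])
s = np.array([2.0, 1.0])

# norm, Eq. 4.6: ||w|| = sqrt(w . w)
norm_w = np.sqrt(np.dot(w, w))
assert norm_w == 5.0

# projection: length of the component of s in the direction of w
proj_length = np.dot(s, w) / norm_w
assert proj_length == 2.0                     # (3*2 + 4*1) / 5

# orthogonality: dot product zero means an included angle of +-90 degrees
assert np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])) == 0.0
```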
Note that each component e_j^{t+1} of the output vector e^{t+1} is a dot product of the j-th row
of the matrix and the input vector e^t. Weights such as w_{12} which are missing in the
figure are simply set to zero. The matrix of all weights is a square matrix and will
be denoted by W. We may then write Equation 4.9 more compactly as:
e t+1 = W e t . (4.10)
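In code, the state update of Equation 4.10 is a single matrix-vector product per time step. A minimal sketch (numpy assumed; the three-unit chain and its weights are hypothetical, and the static non-linearity f is omitted here, as in Equation 4.10):

```python
import numpy as np

# W[i, j] is the weight of the input received by unit i from unit j.
W = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

e = np.array([1.0, 0.0, 0.0])   # initial state vector e^0
for t in range(3):
    e = W @ e                    # Eq. 4.10: e^{t+1} = W e^t
print(e)                         # state after three update steps
```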
Fig. 4.3 Classical conditioning and the Hebb synapse. CS: conditioned stimulus, US: unconditioned stimulus, R: response, w_0, w_1: synaptic weights. Before learning, a response R is
elicited only by the unconditioned stimulus US. If CS and US occur together, w_1 will be reinforced since CS and R are both active. Eventually, CS can drive unit R on its own.
1 Donald O. Hebb (1904 – 1985). Canadian psychologist.
Fig. 4.4 Simple perceptrons with only two input lines. a. Feature space. b. Perceptron de-
tecting patterns with the characteristic “brightness increasing to the right”. c. Perceptron de-
tecting patterns whose average intensity is above 0.5.
4.2 Classification
4.2.1 The Perceptron
As already pointed out in Section 2.1.1, the function of neurons is often seen in
recognizing or detecting things, or "patterns". Since all inputs to a neuron arrive
through its synapses, such patterns are ultimately vectors of input activities. The idea
is that neurons implement logical predicates (i.e. statements that are either true or
false) about this stimulus vector. The value "TRUE" is signaled by the neuron taking
the activity 1, the value "FALSE" by the activity 0. A simple example is shown in
Fig. 4.4. Consider a neuron with just two inputs, let’s say from light sensitive recep-
tors. How can we build a neuron that responds if the right receptor receives more
light than the left receptor? Simply enough, we can use an excitatory connection
from the right receptor and an inhibitory connection from the left (Fig. 4.4b). If we
apply a step non-linearity to the weighted input sum, the cell will be active if and
only if the right sensor receives more light.
For the understanding of the perceptron, and indeed for pattern recognition in
general, the notion of a feature space is crucial. In our example, the feature space
consists of all possible pairs of light intensities (s1 , s2 ) that might be delivered to the
two sensors. This is equivalent to the positive quadrant of the real two-dimensional
plane (Fig. 4.4a). Each possible pattern corresponds to a point in feature space and
the perceptron will react with activity 1 to patterns taken from a certain subset of the
feature space; other patterns will elicit no response. In the example of Figure 4.4b,
all light intensity pairs where the right pixel is brighter fall in a triangular area
above and left of the diagonal s1 = s2 . Figure 4.4c shows a second example of
a perceptron with two inputs calculating the predicate "average brightness above
0.5." All patterns satisfying this condition lie above and to the right of the line s_1 + s_2 = 1,
also shown in Figure 4.4a.
In general, feature spaces have a separate dimension for each input line of the
perceptron. If one considers the retina as the input layer, the number of dimensions
becomes the number of ganglion cells which in humans is in the order of 106 .
In terms of our neural network models, we can model a perceptron as a single
unit with inputs s1 , ..., sN , forming an input vector s, and with weights w1 , ..., wN ,
forming a weight vector w. The weighted sum of the inputs is passed through a step
non-linearity with threshold θ
u = Σ_{j=1}^{J} w_j s_j = (w · s) (4.11)

e = f(u) := { 0 if u ≤ θ, 1 if u > θ }. (4.12)
The perceptrons in Fig. 4.4b, c have w = (−1, 1)ᵀ with θ = 0, and w = (0.5, 0.5)ᵀ
with θ = 0.5, respectively. Note that the weight vector w has the same dimension as
the stimulus vector and can therefore be interpreted as a vector in feature space.
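Both example perceptrons from Fig. 4.4 fit in a few lines of code (numpy assumed; the test stimuli are arbitrary):

```python
import numpy as np

def perceptron(w, theta, s):
    """Threshold unit of Eqs. 4.11 and 4.12: returns 1 iff (w . s) > theta."""
    return 1 if np.dot(w, s) > theta else 0

# Fig. 4.4b: "brightness increasing to the right", w = (-1, 1)^T, theta = 0
assert perceptron([-1.0, 1.0], 0.0, [0.2, 0.9]) == 1   # right pixel brighter
assert perceptron([-1.0, 1.0], 0.0, [0.9, 0.2]) == 0

# Fig. 4.4c: "average brightness above 0.5", w = (0.5, 0.5)^T, theta = 0.5
assert perceptron([0.5, 0.5], 0.5, [0.8, 0.7]) == 1    # mean 0.75
assert perceptron([0.5, 0.5], 0.5, [0.3, 0.4]) == 0    # mean 0.35
```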
Fig. 4.5 The perceptron classifies feature vectors to one side of the linear hyperplane (dotted
line) (w · s) = θ in feature space. The distance at which this hyperplane passes the origin is
θ/‖w‖. An input vector s is projected orthogonally onto the weight vector direction, yielding
the potential u.
u = 0-point on the weight vector line. Mathematically, this plane is still defined by
the equation (w ·s) = θ which is also known as the normal form of a plane equation.
Indeed, this logic generalizes to an arbitrary number of dimensions. The decision
boundary in a J-dimensional feature space will always be a linear subspace with
dimension J − 1, cutting the feature space into two parts. Such subspaces are called
“hyperplanes”.
Optimal Stimulus
On inspection of Fig. 4.4b, it can be seen that the perceptron detecting a brightness
increase to the right has a weight vector w = (−1, 1)ᵀ which, when interpreted
as an input image, also corresponds to a rightwards increase in brightness. This
as an input image, also corresponds to a rightwards increase in brightness. This
observation reflects a general principle, i.e. that perceptrons are “matched filters”
detecting patterns similar to their weight vector. In the same logic, we discussed
center-surround receptive fields with an inhibitory center and excitatory surround as
“bug detector” in Section 2.1.1 above.
Mathematically, we can ask which pattern will lead to the strongest potential in a
perceptron. Since the calculation of the potential is a linear operation and therefore
is doubled by doubling the input intensity, this question is only meaningful if we normalize the input pattern, i.e. consider only stimuli s satisfying ‖s‖ = 1. In this case,
we call

s* := argmax_{‖s‖=1} (w · s) (4.13)

the "optimal stimulus" of the perceptron. From the interpretation of the dot product
as a projection, it is clear that the maximum is reached if s and w point in the same
direction,

s* = w/‖w‖. (4.14)
Formally, this result is proven by the so-called Cauchy-Schwarz inequality, (a · b) ≤
‖a‖‖b‖. The situation is completely analogous to the discussion of optimal stimuli
and matched filters in Section 2.1.1. In fact, if we consider continuous layers of
neurons (Equation 4.2), the resulting feature space will have an infinite number
of dimensions, one for every point of the continuous plane. In this situation, the
dot-product is no longer a discrete sum, but an integral, and in fact the correlation
integral defined in Equation 2.9.
The idea of optimal stimuli can also be related to the statistical notion of covariance. Formally, we can calculate the covariance of s and w by considering the value
pairs (s_1, w_1), (s_2, w_2), . . . , (s_J, w_J). If we denote the means as

s̄ := (1/J) Σ_{j=1}^{J} s_j   and   w̄ := (1/J) Σ_{j=1}^{J} w_j, (4.15)

we obtain

cov(s, w) = (1/J) Σ_{j=1}^{J} (s_j − s̄)(w_j − w̄) = (1/J)(s · w) − s̄w̄. (4.16)
That is to say, the dot product equals the covariance up to a constant scaling factor
1/J and an additive constant which vanishes if the means are zero. Like the dot
product, covariance is maximal if s and w are aligned.
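Equation 4.16 is a plain algebraic identity and can be checked on random data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(10)
w = rng.random(10)
J = len(s)

# left-hand side of Eq. 4.16: covariance of the value pairs (s_j, w_j)
cov = np.mean((s - s.mean()) * (w - w.mean()))

# right-hand side: scaled dot product minus the product of the means
assert np.isclose(cov, np.dot(s, w) / J - s.mean() * w.mean())
```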
4.2.3 Limitations
If the perceptron is a good model of neural computation, it should be able to do
more interesting things than the examples given in Fig. 4.4. This is indeed the case
if the dimension of feature space is large. However, in the development of neural
network theory, two limitations of the perceptron have received much attention, the
so-called “XOR-problem” and the locality of the perceptron.
Suppose we want to construct a two-pixel perceptron that responds with e = 1
if one and only one of its inputs is active. In logic, the related predicate is called
“exclusive or” or XOR. If we plot this in two-dimensional feature space, the four
stimulus vectors,s = (0, 0), (0, 1), (1, 0), and (1, 1), form a square. Clearly, there can
be no line (hyperplane) such that the points (0, 1) and (1, 0) fall on one side of the
plane and (0, 0) and (1, 1) fall on the other side. If classification boundaries are linear
hyperplanes, this problem can only be solved by using a cascade of perceptrons
(multi-layer perceptron, MLP), where the outputs of three initial perceptrons form
the input of a fourth, higher level perceptron (Fig. 4.6). The output units of the initial
perceptrons are called “hidden” because their activities do not explicitly show up in
the classification procedure, i.e. are neither input nor output. Here, we denote the
activity of the hidden units by the letter h. The response of the resulting three-layer
perceptron, using the weights shown in Figure 4.6, is illustrated by the table in
Fig. 4.6b.
If we allow for multiple layers, perceptrons can thus do more than just linear
classification. By means of the hidden units, the classification problem is embedded
in a higher dimensional space. Figure 4.6c shows the position of the resulting h
vectors for the XOR-problem as corners of a unit cube in three-dimensional space.
In this embedding, it is possible to separate the “true” and “false”-cases by a linear
decision boundary, i.e. the plane −h1 + 2h12 − h2 = 0 also shown in the figure. The
[Figure 4.6a/c: network diagram and 3D feature space (not reproduced); see caption below.]

b. Truth table:

    In        Hidden      Out
  s1  s2    h1  h2  h12    e
   0   0     0   0   0     0
   1   0     1   0   1     1
   0   1     0   1   1     1
   1   1     1   1   1     0

Fig. 4.6 Three-layer perceptron implementing an exclusive or. a. Network topology with
two input lines and three hidden units. The thresholds are set to θ = 0. b. Truth table. c. 3D
feature space for the final unit, taking the hidden units as input lines. Possible input vectors
are marked by black dots on the cube. In the 3D feature space, linear separation of “true” and
“false” cases with a plane is possible. w = (−1, −1, 2)⊤ denotes the weight vector, which
is normal to the decision plane. The intersection between the decision plane and the unit cube
in h-space is indicated by a bold line.
“true” cases (1, 0, 1) and (0, 1, 1) fall above this plane and yield the output 1; the
“false” cases (0, 0, 0) and (1, 1, 1) lie in the decision plane and yield the output 0.
When projected back into the s1 , s2 -plane, the decision boundary is no longer linear
(i.e., a line), but a polygon with the desired separation properties.
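The network of Fig. 4.6 is small enough to be traced in a few lines of code. A sketch in plain Python (the function names are ours; f is the threshold non-linearity with θ = 0):

```python
# Three-layer XOR perceptron with the weights of Fig. 4.6.
def f(u, theta=0.0):
    """Threshold non-linearity: 1 if u > theta, else 0."""
    return 1 if u > theta else 0

def xor_mlp(s1, s2):
    h1 = f(1 * s1)                    # hidden unit copying s1
    h2 = f(1 * s2)                    # hidden unit copying s2
    h12 = f(1 * s1 + 1 * s2)          # hidden unit responding to s1 OR s2
    return f(-1 * h1 - 1 * h2 + 2 * h12)

for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s, "->", xor_mlp(*s))       # reproduces the truth table of Fig. 4.6b
```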
A second limitation becomes apparent if a generalization of the XOR-problem
is considered which is known as the parity problem: is it possible to build a per-
ceptron that responds if and only if the number of active inputs is odd? For just two
inputs, this is the XOR-problem. Again, such perceptrons are possible if three layers
are used. One can show, however, that each such perceptron will need at least one
hidden unit looking at all inputs simultaneously. Similarly, if the “connectedness”
of complex geometrical line patterns such as intertwined spirals is considered, one
hidden unit is required receiving input from the entire input layer. This is in conflict
with the idea of breaking down vision into local bits and pieces, each processed by
a unit with a small receptive field. The famous book by Minsky & Papert (1988)
discusses this problem in detail. It has led many researchers to believe that neural
networks cannot explain perception. More recently, however, it has been recognized
that the parity and connectedness problems, interesting as they are from a compu-
tational point of view, may not be the central problems in visual perception and
visually guided behavior. In fact, judging the connectedness of intertwined spirals,
say, is not an easy task for human observers either.
[Figure 4.7: network diagram (not reproduced). Output units e1, ..., e4 with ei = f(ui), ui = ∑_j wij sj; weights wij; inputs s1, s2, s3.]

Fig. 4.7 Set of four two-layer perceptrons e1, ..., e4 with common input vector s1, ..., s3.
The topology is a complete feed-forward connectivity between the input and output layer,
just as in the hetero-associator shown in Fig. 4.11.
performance error of the network by comparing actual and desired outputs. This is
formulated in terms of an error function depending on the network weights. Learn-
ing then becomes a minimization problem, i.e. it amounts to finding a set of weights
such that the error is minimized.
We start by rewriting the activity of a simple perceptron as in Equation 4.12:
e(s) = f ( ∑_{j=1}^{J} s_j w_j ) = f ( (s · w) )   (4.18)
Gradient Descent
Minimization of functions of several variables is a general and important topic of
scientific computing. If the error function (also known as cost function or objective
function) is differentiable, the following logic is generally used:
The error function represents a “surface” over the space spanned by its variables,
in our case w1 , ..., wJ (see Fig. 4.8a). Note that this space can be identified with the
feature space. Every error value is an “elevation” in that landscape and the optimum
(minimum) corresponds to the deepest “depression” or trough in that landscape.
For finding this minimum we need to consider the partial derivatives of E with
respect to each coordinate w j . Fig. 4.8a shows a point (a, b) in the (w1 , w2 )-plane
and its surface elevation E(w1 , w2 ). We now consider a vertical section through the
landscape passing through (a, b) and running in parallel to the coordinate axis w1 .
Within this section, the error function reduces to E(w1 , b), i.e. a function of just one
variable w1 since b is constant. The derivative of this function at w1 = a is called the
partial derivative of E at (a, b) in direction w1 :
E_{w1}(a, b) = ∂E/∂w1 (a, b) := lim_{h→0} [ E(a + h, b) − E(a, b) ] / h .   (4.20)
[Figure 4.8: panels a (error surface E(w1, w2) over the weight plane, with partial-derivative tangents and gradient at (a, b)) and b (contour plot with descent path); see caption below.]
Fig. 4.8 Learning as error minimization for a two-dimensional example. a. The error function
is shown as a surface over the weight space. At a point (a, b) in weight space, the partial
derivatives are defined via tangents to the surface in the coordinate directions. The gradient is
the direction of steepest ascent. b. The error function is shown as a contour plot. The optimum
corresponds to the deepest trough in the error-surface. Numerical optimization searches this
trough by starting at a random location w1 , and calculating the local gradient, i.e. the steepest
upward direction. It then moves the current weight estimate by a certain step in the negative
gradient direction, i.e. downhill, and iterates. If a minimum is reached, the gradient evaluates
to zero and the procedure stops. Note that this approach can be caught by “local minima”,
since it cannot take into account distant troughs.
Figure 4.8a shows for the surface point (a, b, E(a, b)) the tangent line in the sectional
plane together with a unit vector in the w1 direction. The difference at the tip of this
vector is the partial derivative. The same logic can be applied to the w2 direction,
E_{w2}(a, b) = ∂E/∂w2 (a, b) := lim_{h→0} [ E(a, b + h) − E(a, b) ] / h ,   (4.21)
and indeed for each individual dimension in the multi-dimensional case.
For a differentiable function of J variables, J such partial derivatives exist. They
are combined into a vector called the “gradient” of the function, gradE or ∇E (read
nabla E):
grad E (a, b) = ( ∂E/∂w1 (a, b) , ∂E/∂w2 (a, b) ) .   (4.22)
The gradient direction is the direction of steepest ascent on the surface; i.e. it is
always orthogonal to the surface contour lines, pointing uphill. The length of the
gradient is the local slope. The gradient is defined at every point in the domain of the
function, i.e. it establishes a vector field.
In gradient descent, minimization starts at a random initial position, marked as
w1 in Fig. 4.8b. The gradient is calculated at this position, and the current weight
estimate is moved by a small step in the negative gradient direction; the procedure
then iterates until the gradient vanishes.
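The descent procedure can be sketched numerically; the toy error function, the step size, and the finite-difference gradient below are our choices (NumPy assumed):

```python
import numpy as np

def E(w):
    # Toy error surface with its minimum at (1, -0.5).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2

def grad(E, w, h=1e-6):
    # Finite-difference approximation of the partial derivatives (Eqs. 4.20/4.21).
    g = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (E(w + step) - E(w)) / h
    return g

w = np.array([3.0, 2.0])          # some initial weight vector
for t in range(200):
    w = w - 0.1 * grad(E, w)      # move downhill, against the gradient
print(np.round(w, 3))             # converges to [ 1.  -0.5]
```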
The δ -Rule
In neural networks, as in biological learning, training is usually sequential. That is
to say, one stimulus is presented and the weight vector is adjusted. Then another
stimulus is presented and again a small learning step is carried out. For one such
step, we can therefore treat s in Equation 4.19 as a constant and reduce E by an
appropriate shift of w. We therefore calculate the partial derivatives of E(w) with
respect to the weights, using the chain rule:
∂E/∂wk (w1, ..., wJ) = 2 [ f ( ∑_{j=1}^{J} s_j w_j ) − T(s) ] · ∂/∂wk f ( ∑_{j=1}^{J} s_j w_j ) .   (4.23)
Next, we substitute from Equation 4.18 and apply the chain rule a second time:
∂E/∂wk (w1, ..., wJ) = 2 [ e(s) − T(s) ] f′( ∑_{j=1}^{J} s_j w_j ) · ∂/∂wk ∑_{j=1}^{J} s_j w_j .   (4.24)

∂E/∂wk (w1, ..., wJ) = 2 [ e(s) − T(s) ] f′(u) s_k .   (4.25)
Equation 4.25 gives the gradient direction in the error landscape E. By comparison
with Equation 4.17, we see that this rule, which was based on a simple heuristic,
does indeed move the weight vector in the negative gradient direction, i.e. downhill.
We now introduce the notation

Δwk = λ δ sk ,   where δ := T(s) − e(s) (cf. Eq. 4.28).   (4.27)
In our iterative learning process, we start with some (maybe random) weight vector
w1 where the upper index denotes the current classification trial (Fig. 4.8b). We now
present a stimulus and obtain a response from which we calculate δ and apply the
learning rule. The new weight set will generate a reduced error if the last stimulus
is presented again. However, we now proceed to present the next stimulus and again
calculate δ . Strictly speaking, the optimization in each step therefore refers to a dif-
ferent error function, namely the one for the current stimulus. In perceptron learning,
this somewhat surprising approach works better than a more systematic approach,
where each weight vector would be tested with all stimuli before performing the
next descent step. This is at least partially due to the problem of overlearning which
occurs for fixed training sets of stimuli. Optimizing to changing stimuli introduces
an element of randomness that renders the eventual result more robust.
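A minimal sketch of sequential δ-rule training (not from the text; the data set, the learning rate, and the cyclic presentation order are our choices; the unit learns the logical AND, which is linearly separable, and the constant third input acts as a threshold):

```python
import numpy as np

def f(u):
    # Static threshold non-linearity.
    return 1.0 if u > 0 else 0.0

# Toy training set: logical AND, with a constant third input as bias.
S = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0.0, 0.0, 0.0, 1.0])

w, lam = np.zeros(3), 0.5
for t in range(400):                  # one stimulus presentation per step
    j = t % len(S)
    delta = T[j] - f(S[j] @ w)        # δ = T(s) − e(s)
    w += lam * delta * S[j]           # Δw = λ δ s   (Eq. 4.27)

print([f(s @ w) for s in S])          # [0.0, 0.0, 0.0, 1.0]
```

Convergence here is guaranteed by the perceptron convergence theorem, since the AND problem is linearly separable.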
For sets of perceptrons with shared input, i.e. a layer of perceptrons as depicted
in Fig. 4.7, we can consider each output neuron individually and obtain
Δ wi j := λ δi s j = λ (Ti − ei )s j . (4.28)
[Figure 4.9: network diagram (not reproduced). Output layer: ei = f(ui), ui = ∑_j wij hj (2nd-layer weights wij); hidden layer: hj = f(rj), rj = ∑_k vjk sk (1st-layer weights vjk); input sk.]

Fig. 4.9 Multi-layer perceptron with K = 3 input units, J = 4 hidden units and I = 2 output
units. Note that the input and hidden layers are identical to the two-layer perceptrons
depicted in Fig. 4.7. sk input unit activities, vjk input-to-hidden weights, rj potentials of
hidden units, hj hidden unit activities, wij hidden-to-output weights, ui potentials of output
units, ei output unit activities, f static non-linearity.
Fig. 4.10 Support vector machine. a. Between the two data clusters, the linear decision
boundary (heavy line) is chosen such that the margin between the two clusters is maxi-
mized. In the two-dimensional case depicted here, this is obtained by choosing three points
(two from one cluster and one from the other) defining two parallel lines between which no
data points fall (thin lines). The midline between these parallels is the decision boundary.
b. Solution of the XOR-problem using the kernel (s1 , s2 , s1 s2 ) (Eq. 4.30) and the weights
(w1 , w2 , w3 ) = (−1, −1, 2) in Eq. 4.31.
ε_j = f′(r_j) ∑_i δ_i w_ij ,   (4.29)
from which the hidden layer weights are updated as Δ v jk = λ ε j sk . Eq. 4.29 is called
“back-propagation” because the summation is over the first index of the weights, as
if the correction signals would run backwards from the output units to the hidden
units and were summed there according to the hidden-to-output weights. Of course,
this is just an elegant mathematical result of the gradient calculation in the multi-
layer case, not an actual neurobiological process.
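The two update rules can be sketched for a single stimulus–target pair (a sketch only; the sizes, data, seed, and learning rate are made up; f is the logistic function, so f′ = f(1 − f)):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda u: 1.0 / (1.0 + np.exp(-u))      # logistic non-linearity
df = lambda u: f(u) * (1.0 - f(u))          # its derivative f'

K, J, I = 3, 4, 2                           # input, hidden, output sizes
V = rng.normal(scale=0.5, size=(J, K))      # input-to-hidden weights v_jk
W = rng.normal(scale=0.5, size=(I, J))      # hidden-to-output weights w_ij
s = np.array([0.2, 0.9, 0.5])               # stimulus
T = np.array([1.0, 0.0])                    # desired output

def forward(V, W, s):
    r = V @ s                               # hidden potentials r_j
    h = f(r)                                # hidden activities h_j
    u = W @ h                               # output potentials u_i
    return r, h, u, f(u)

e0 = forward(V, W, s)[3]                    # output before learning
lam = 0.5
for step in range(100):
    r, h, u, e = forward(V, W, s)
    delta = (T - e) * df(u)                 # output error signals δ_i
    eps = df(r) * (W.T @ delta)             # ε_j = f'(r_j) Σ_i δ_i w_ij  (Eq. 4.29)
    W += lam * np.outer(delta, h)           # Δw_ij = λ δ_i h_j
    V += lam * np.outer(eps, s)             # Δv_jk = λ ε_j s_k

e = forward(V, W, s)[3]
print(np.round(e, 2))                       # output now much closer to T than e0
```

Note that the summation W.T @ delta runs over the first index of the weights, which is the "backwards" flow of correction signals described above.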
been named. In the n-dimensional case, n + 1 such vectors will be needed to define an
(n − 1)-dimensional hyperplane and the margin. Algorithms for calculating the sup-
port vectors and decision boundaries from a set of data can be found in textbooks
of machine learning, e.g., Schölkopf & Smola (2002), Haykin (2008). These algo-
rithms can be relaxed to allow for a small number of data points to fall within the
margin, which is necessary if the two data clusters overlap.
So far, the support vector machine is yet another example of linear classification,
since the boundaries are again hyperplanes. Curved decision boundaries (non-linear
classification) can be achieved by the so-called kernel-approach in which the orig-
inal data vector, in the two-dimensional case s = (s1, s2)⊤, is replaced by a higher-
dimensional function Φ(s) such as
where w is the normal vector of the hyperplane and θ specifies its position. In the
example of Equation 4.30, Equation 4.31 can be rewritten as
w1 s1 + w2 s2 + w3 s1 s2 − θ = 0. (4.32)
Solving this equation for s2 yields a functional expression for the curved decision
surface in the original (s1, s2) feature space:
s2 = (θ − w1 s1) / (w2 + w3 s1) .   (4.33)
This equation describes a hyperbola with a pole at s1 = −w2 /w3 and the asymptote
s2 = −w1 /w3 approached as s1 goes to ±∞. This decision surface could for example
be used to solve the XOR-problem (see Fig. 4.10b). If other maps Φ are used, other
classes of decision functions can be obtained.
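With the kernel of Eq. 4.30 and the weights of Fig. 4.10b, the XOR classification can be checked directly; the threshold θ = −1/2 and the sign convention (“true” for negative g) are our choices for illustration:

```python
import numpy as np

# Kernel map Phi(s) = (s1, s2, s1*s2) and the weights from Fig. 4.10b.
Phi = lambda s: np.array([s[0], s[1], s[0] * s[1]])
w = np.array([-1.0, -1.0, 2.0])
theta = -0.5                       # illustrative threshold choice

def classify(s1, s2):
    g = w @ Phi(np.array([s1, s2], dtype=float)) - theta
    return int(g < 0)              # 1 = "true" (exactly one input active)

for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s, "->", classify(*s))   # 0, 1, 1, 0: the XOR truth table
```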
some image recorded in the retina may be associated with a motor output vector to
the larynx and vocal tract, causing the utterance of the name of a person depicted in
this image. In the vestibulo-ocular reflex, input data obtained from the semicircular
canals may be associated with (i.e. transformed to) a motor command for stabilizing
posture. In technical applications, mappings from multi-electrode recordings of the
motor cortex to the joint angles of the arm have been learned and used to control
the movement of arm prostheses with neural commands. The important question for
neural network theory is to find mechanisms, by which such association rules can
be learned.
The distinction between classification and association is not completely clear-
cut, especially if “perceptrons” with multiple output neurons are considered as in
Figs. 4.7 and 4.9. In this section, we will mostly consider linear neurons, i.e. identify
potential and excitation, and study the correlations between the various inputs and
outputs.
since
              ⎛ 1 0 ⎞ ⎛ 1 ⎞   ⎛ 1 ⎞
e1 = C1 s1 =  ⎜ 0 0 ⎟ ⎝ 0 ⎠ = ⎜ 0 ⎟ .   (4.37)
              ⎝ 1 0 ⎠         ⎝ 1 ⎠
The weight matrix C1 was found by setting to the value of 1 all weights with indices
i, j for which s j = 1 and ei = 1. In our example, where all activities are either 0 or
1, this is equivalent to saying
ci j = s j ei . (4.38)
The activities s j and ei are the pre- and postsynaptic activities with respect to the
connection ci j . The product of these activities is a standard component in formaliza-
tions with the Hebb learning rule mentioned above. It can be thought of as some kind
of “one-shot” learning where the weight is set and fixed after just one presentation
of stimulus and desired output. In matrix notation, we obtain
       ⎛ c11 c12 ⎞   ⎛ e1 ⎞
C :=   ⎜ c21 c22 ⎟ = ⎜ e2 ⎟ (s1 , s2 ) = e s⊤ .   (4.39)
       ⎝ c31 c32 ⎠   ⎝ e3 ⎠
[Figure 4.11: network diagram (not reproduced); input lines s1, s2, weights c11, ..., c32, output units e1, e2, e3.]

Fig. 4.11 A 2 × 3 associator. Each unit in the input layer (s1, s2) is connected to each
neuron in the output layer (e1, e2, e3). By suitable choices of the connecting weights,
the network can be made to store “associations”, i.e. transform known input patterns
into arbitrary output patterns.
It turns out that, at least in this example, the two connectivity matrices C1 and C2 can
simply be added up to obtain an associator that works for both pairs simultaneously.

              ⎛ 1 0 ⎞
C = C1 + C2 = ⎜ 0 1 ⎟ .   (4.41)
              ⎝ 1 1 ⎠
Easy calculation proves the result: e1 = Cs1 and e2 = Cs2. We will see below that this
is always true if the input vectors are orthogonal to each other.
2 The outer product is also known as the tensor product. Its result is a matrix. Recall for comparison
that the dot product discussed in Section 4.1.3, c = x⊤y, evaluates to a number. In contrast to the
outer product, the dot product is also called the inner product.
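The worked example can be replayed in a few lines of NumPy (a sketch of Eqs. 4.37–4.41):

```python
import numpy as np

# The two pattern pairs of the 2 x 3 associator example.
s1, e1 = np.array([1, 0]), np.array([1, 0, 1])
s2, e2 = np.array([0, 1]), np.array([0, 1, 1])

C1 = np.outer(e1, s1)      # c_ij = e_i s_j  (Eqs. 4.38, 4.39)
C2 = np.outer(e2, s2)
C = C1 + C2                # superposition of both associations (Eq. 4.41)

print(C)                   # [[1 0] [0 1] [1 1]]
print(C @ s1)              # [1 0 1] = e1
print(C @ s2)              # [0 1 1] = e2
```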
This is the desired output, up to a factor ‖s‖². We can incorporate this factor in the
weight matrix by choosing the coefficients ci j = ei s j ‖s‖⁻².
In general, an associative network should store not just one association, but many.
We denote (as before) the various input and output vectors by raised indices, i.e. s p
is the p-th input vector. Here, p is mnemonic for presentation. For each presentation
(or association pair), we can calculate the connectivity matrix by the outer product
rule. As in the example, we now simply add up these outer products and obtain:
                     ⎛ ∑_q e_1^q s_1^q   ∑_q e_1^q s_2^q   . . .   ∑_q e_1^q s_J^q ⎞
C = ∑_q e^q · s^q⊤ = ⎜ ∑_q e_2^q s_1^q   ∑_q e_2^q s_2^q   . . .   ∑_q e_2^q s_J^q ⎟ .   (4.45)
                     ⎜        ...               ...                      ...       ⎟
                     ⎝ ∑_q e_I^q s_1^q   ∑_q e_I^q s_2^q   . . .   ∑_q e_I^q s_J^q ⎠
This matrix is essentially the matrix of the covariances that can be computed
between all input units and all output units, up to their respective means (cf. Equa-
tion 4.16). Note that in Equation 4.16 the covariance was calculated over the com-
ponents of a stimulus vector, whereas here, the sum runs over the presentations.
Mathematically, this is no problem, but the interpretation is not the same.
C in Equation 4.45 is a “mixed” covariance matrix of e and s. Later we will also
encounter the (standard) covariance matrix of s, (1/P) ∑_{p=1}^{P} s^p s^p⊤.
In order to test whether the matrix C has the desired property of realizing the
complete set of associations, we apply it to the p-th input pattern:
Cs^p = ( ∑_q e^q · s^q⊤ ) s^p = ∑_q e^q ( s^q⊤ s^p ) .   (4.46)
The term (s^q⊤ s^p) is the dot product of the input vectors for the p-th and q-th as-
sociation pair. In order to produce the correct associations, this expression has to
evaluate to zero if p ≠ q and to one if p = q. This is to say that the input vectors
must be pairwise orthogonal and of unit length. Therefore, if the associator has J
input lines, only up to J association pairs can be stored since no more than J pair-
wise orthogonal vectors can be placed in the J-dimensional feature space. For larger
numbers of P in Eq. 4.45, mixtures of outputs will occur.
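The orthogonality condition can be illustrated with random data (the sizes and seed are our choices; NumPy's QR factorization supplies orthonormal input vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
J, I, P = 8, 5, 4                              # inputs, outputs, presentations

S, _ = np.linalg.qr(rng.normal(size=(J, P)))   # P orthonormal input columns s^q
E = rng.normal(size=(I, P))                    # arbitrary desired outputs e^q

# Summed outer products, Eq. 4.45:
C = sum(np.outer(E[:, q], S[:, q]) for q in range(P))

print(np.allclose(C @ S, E))                   # True: recall is exact

# A probe correlated with two stored inputs yields a mixture of outputs:
probe = S[:, 0] + 0.3 * S[:, 1]
print(np.allclose(C @ probe, E[:, 0]))         # False: cross-talk
```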
and
                                  ⎛ e_1^1  e_1^2  . . .  e_1^P ⎞
E := [e^1 ; e^2 ; . . . ; e^P ] = ⎜ e_2^1  e_2^2  . . .  e_2^P ⎟ .   (4.48)
                                  ⎜   ...    ...           ... ⎟
                                  ⎝ e_I^1  e_I^2  . . .  e_I^P ⎠
These are simply obtained by writing the column vectors s^p and e^p one after the
other. Instead of Eq. 4.42, we may now write the relation for all presentations p
jointly as
E = CS,   (4.49)
where E is an I × P, C an I × J, and S a J × P matrix.
If there are only J input patterns which are pairwise orthogonal, the matrix S will
be square and unitary, i.e. S⁻¹ = S⊤. We therefore obtain C simply from C = ES⊤.
In the general case, i.e. if the input vectors are not orthogonal, one can show that the
best possible weight matrix for the desired associations is

C∗ = ES⊤(SS⊤)⁻¹ .   (4.50)

This matrix minimizes the squared error between the desired outputs E and the ac-
tual outputs C∗S. The matrix S⊤(SS⊤)⁻¹ is called the Moore-Penrose pseudo inverse
of S. It also occurs in regression analysis, in the solution of general linear problems.
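A sketch with non-orthogonal inputs (random data of our choosing), comparing the explicit formula with NumPy's built-in pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(3)
J, I, P = 4, 3, 6                     # more presentations than input lines

S = rng.normal(size=(J, P))           # non-orthogonal input columns
E = rng.normal(size=(I, P))           # desired outputs

# Least-squares weight matrix C* = E S^T (S S^T)^(-1):
C_star = E @ S.T @ np.linalg.inv(S @ S.T)

# The same matrix from the library pseudoinverse of S:
print(np.allclose(C_star, E @ np.linalg.pinv(S)))   # True
```

At the optimum, the residual E − C∗S is orthogonal to the input patterns, which is the normal-equation condition of linear regression.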
4.3.5 Applications
In this section, we have discussed very general ideas of associative memory, which—
on the level of our presentation—amounts basically to linear regression. This idea is
simple, but powerful, especially if large numbers of neurons, i.e. high dimensional
feature vectors are considered. We briefly discuss three cases.
Memory
Associative networks store the associations of an input pattern, or stimulus, and
an output pattern, or recall. The memory is content-addressable in the sense that
it retrieves items not just by reference number, but by some sort of meaning. For
example, one could code words (strings) of n characters into n groups of 26 input
neurons (one for each letter of the alphabet), and code k × l-pixel images into the
activities of kl output neurons. In a training session, each image is associated with
a written name and the network learns the covariances between the input neuron
activities and the output neuron activities. If a string is presented at the input, the
associated image will thus form at the output.
Associative memory is also distributed in the sense that deleting a number of in-
ternal connections will not delete specific memory items, since the output is stored
in all covariances. Therefore, the result of partial deletion will be an unspecific de-
terioration of the output image, not the loss of individual images or memory items.
By the same token, misspellings of the input string can be corrected. Memories
with this property are also called “holographic”.
Both properties, content-addressability and distributedness, are highly plausible
for models of biological memories. For a classical reference, see Kohonen et al.
(1976).
Neuroprostheses
Robotic arms have been controlled by signals recorded from large numbers of neu-
rons in the motor and parietal cortices of a monkey. Simultaneously with these
recordings, movements of the monkey’s arm are also recorded and encoded in terms
of the angles and angular velocities occurring at the shoulder and elbow joints. The
“association” between the neural activities and the arm movement can be learned
along the lines discussed above; indeed, this amounts to a linear regression analysis
of the arm movement as a function of neural activity. Once the covariance matrix is
known, new neural activities can be used to predict the “intended” arm movement
and drive a robotic arm with this signal. The results show substantial agreement be-
tween the monkey’s arm movements and the movements of the robot arm (Wessberg
et al. 2000).
w_ij^{t+1} = w_ij^t + λ e_i s_j      or      W^{t+1} = W^t + λ e s⊤      (4.51)
where λ ≥ 0 is again a learning rate and t and t + 1 are time steps. The left equation
applies to an individual synaptic weight while the right is formulated for the entire
weight matrix. In the unsupervised case considered here, ei is simply the result of
the network activity, i.e. e = f (Ws) where f is the static non-linearity introduced in
Equation 4.4. If we substitute this activation dynamics into Equation 4.51 we obtain
a difference equation for W depending on the input pattern:
If, as a first approximation, the activation function is assumed linear, we can omit the
function f and obtain ΔW = λ W s s⊤. Since the sum of the outer products s s⊤
taken over
time is the covariance matrix of the input set, we can expect that the weight matrix
eventually learned by the system will reflect the statistical properties of the stimulus
set. Indeed, in competitive learning, it can be shown that neural networks have the
ability to form statistically optimal representations by means of self-organization.
[Figure 4.12: sketch (not reproduced) of the weight vectors w^t and w^{t+1} and the stimulus s^t on the unit circle in the (s1, s2) plane.]

Fig. 4.12 Geometric interpretation of Oja’s learning rule, Eq. 4.54. At time t, the weight
vector is w^t. Assume now that a stimulus s^t is delivered. The new weight vector w^{t+1} is
obtained by adding a certain multiple of the stimulus vector to the old weight vector and
normalizing the sum. The tip of the weight vector moves on a circle (due to the normalization)
towards the side where the stimulus occurred. If the system reaches a steady state, the weight
vector will be pointing to the center of the cloud of input vectors delivered to the system during
training.
e^t := ∑_{j=1}^{J} w_j s_j^t = w⊤ s^t   (4.53)

and a set of input vectors presented as a temporal sequence s^t := (s_1^t , ..., s_J^t )⊤. We
now introduce the normalizing Hebbian rule (also known as Oja’s learning rule):
Next, we substitute from Equation 4.53, keeping in mind that w⊤s = s⊤w due to
the commutativity of the dot product. This yields

Δw = λ [ s (s⊤w) − (w⊤s)(s⊤w) w ] ,   (4.56)

where each of the factors s⊤w = w⊤s equals the output e.
Due to the associativity of matrix multiplication, we may bracket the outer prod-
ucts s s⊤. In the temporal average, i.e. for many stimulus presentations, these outer
products become the covariance matrix of the training set, C := (1/t_max) ∑_t s^t s^t⊤. We
obtain:

Δw = λ ( Cw − (w⊤Cw) w ) .   (4.57)
The weight vector will converge when Δw = 0. For the resulting weight set, we
obtain

Cw = (w⊤Cw) w .   (4.58)

The term in parentheses, w⊤Cw, is a number, let’s call it μ. Equation 4.58 therefore
becomes the eigenvalue3 equation Cw = μw, i.e. the weight vector will converge to
an eigenvector of C. Moreover, Oja (1982) has shown that w will converge to the
eigenvector with the largest eigenvalue, i.e. the perceptron will evaluate the loading
(coefficient) of the first principal component of the data set.
This result means that by the normalizing Hebb rule, a perceptron will automat-
ically adjust its weight vector to the first principal component of the data set. If we
think of the set of input data as a cloud of dots in feature space, the first princi-
pal component is the longest axis of this cloud (Figure 4.13). This result was to be
expected from our graphical consideration in Fig. 4.12 since the weight vector is at-
tracted by all stimulus vectors in the cloud and thus ends up in the center of that cloud.
Since the activation function of the perceptron performs a projection onto the weight
vector (Section 4.2.2), this result therefore means that the Oja rule automatically
finds the axis representing the largest possible variability of the data set in just one
number. Thus, the information conveyed is maximized.
Note that learning only occurs if the output e is positive. This means that the
perceptron needs to start with a weight vector not too distant from the stimulus.
Also, if the data set consists of two distinct and sufficiently distant clouds of dots,
the weight vector will pick up on the one which is closer to its start position.
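A simulation of this convergence (the elongated data cloud, learning rate, and seed are our choices; the expanded linear form Δw = λ e (s − e w) of the rule is used):

```python
import numpy as np

rng = np.random.default_rng(4)

# Gaussian cloud elongated along (1, 1)/sqrt(2): the first principal axis.
stretch = np.array([[2.0, 0.0], [0.0, 0.3]])
rotate = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
stimuli = (rotate @ stretch @ rng.normal(size=(2, 5000))).T

w = np.array([1.0, 0.0])              # initial weight vector
lam = 0.01
for s in stimuli:
    e = w @ s                         # linear unit output
    w += lam * e * (s - e * w)        # Oja's rule

print(np.round(np.abs(w), 2))         # ≈ [0.71 0.71]: first principal component
```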
Here, ci = (ci1, ci2, ..., ciJ)⊤ is again the vector of all input weights of unit i, i.e. the
receptive field of the unit. The so-called Kohonen map uses the following competi-
tive weight dynamics. Let

k := argmax_{1≤i≤I} { ei }   (4.60)
be the index number of the output unit generating the strongest excitation after a
given stimulus presentation. This unit will be the one whose weight vector most
closely resembles the input vector; it is usually called the “winner” unit. After each
stimulus presentation, the winner unit is selected and subjected to a learning step which
will make it react even stronger when the same stimulus occurs the next time. Learn-
ing is also applied to the units in the neighborhood of the winner, i.e. these units as
well will react stronger if the last stimulus re-occurs:
for all i ∈ Nk :   c_i·^{t+1} := ( c_i·^t + λ^t s^t ) / ‖ c_i·^t + λ^t s^t ‖ ;   λ^t := λo / t ∈ ℝ.   (4.61)
The learning rule itself is again a normalizing Hebb rule. Thus, there are two points
where competition occurs, first in the selection of a “winner neuron” and its neigh-
borhood which are eligible for learning, and second among the various input weights
of these neurons since weight increase at one input line results in unspecific reduc-
tion of all other weights. The time-dependence of the learning rate is introduced to
enforce convergence of the procedure. This type of enforced convergence is also
known as “annealing”.
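A one-dimensional toy version of such a map (a sketch only: our sizes, a learning-rate schedule with a small floor, and nearest-weight winner selection instead of the full normalized rule of Eq. 4.61):

```python
import numpy as np

rng = np.random.default_rng(5)
n_units = 10
c = rng.uniform(size=n_units)            # one scalar weight per map unit

for t in range(1, 2001):
    s = rng.uniform()                    # stimulus from the region [0, 1)
    k = int(np.argmin(np.abs(c - s)))    # winner: unit with closest weight
    lam = max(1.0 / t, 0.01)             # decaying learning rate with a floor
    for i in range(max(0, k - 1), min(n_units, k + 2)):
        c[i] += lam * (s - c[i])         # winner and its grid neighbors move toward s

print(np.round(np.sort(c), 2))           # weights spread over the stimulus region
```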
Fig. 4.14b,c illustrate the learning process in a self-organizing feature map. The
coordinate system plotted represents the unit-hypersphere on which the weight vec-
tors ci move. In the three-dimensional case, the coordinates σ1,2 on this hypersphere
can be thought of as elevation and azimuth. The shaded area shows the region
in feature space where stimuli are likely to occur. The network starts with a ran-
dom initialization (Fig.4.14b) where the (σ1 , σ2 )-coordinates of each output unit
are independent of the unit’s respective position in the neighborhood grid, the map
therefore looks scrambled. During learning, the weight vectors move into the gray
area because at each learning step, the winner and its neighbors are attracted by a
[Figure 4.14: panels a–c (not reproduced); see caption below.]
Fig. 4.14 Self-organizing feature map. a. Network topology with input layer (s j ), input
weights (ci j ), map layer (ei ) and map layer adjacencies, in this example with the topology
of a 2 × 3 grid. b. Unit-hypersphere in the input space, with coordinates σ1 , σ2 . For each
map unit ei, its input weight vector (ci1, ci2)⊤ is marked as ci. The heavy black lines show
the map layer adjacencies. The weights are initialized at random, leading to an unordered
arrangement of weight vectors. The gray area marks the region in feature space where input
probability is high. c. After learning, the weight vector grid has “crawled” into the interesting
area of feature space. It also has “unfolded” such that adjacent units now have similar weight
vectors (i.e., receptive fields).
stimulus vector from within that range (Fig. 4.14c). At the same time, the map un-
folds and ends up in a “neighborhood-preserving” (or continuous) fashion where
adjacent output neurons have similar weight vectors. This continuity is a result of
the shared learning in the neighborhoods. Input patterns with high frequency of oc-
currence will be represented by more units (larger grid regions) than rare patterns.
Kohonen maps or similar forms of competitive learning underlie a large class of
models for self-organization of representations, for example for orientation columns
in the visual cortex or for representational plasticity in the somatosensory cortex. For
examples, see von der Malsburg (1973) or Antolik & Bednar (2011).
4.5 Suggested Reading

Books
Arbib, M. (ed.), (2002). The Handbook of Brain Theory and Neural Networks. 2nd
edition, Cambridge, MA: The MIT Press
Haykin, S. (2008). Neural Networks and Learning Machines. 3rd edition, Upper
Saddle River, NJ: Pearson Prentice Hall.
Minsky, M. L. and Papert, S. A. (1988). Perceptrons, expanded edition. Cambridge,
MA: The MIT Press.
Shepherd, G., and Grillner, S. (eds.), (2010) Handbook of Brain Microcircuits. Ox-
ford University Press
Original Papers
Antolik, J., Bednar, J.A. (2011) Development of maps of simple and complex cells
in the primary visual cortex. Frontiers in Computational Neuroscience 5:17,
1 – 19
Recent example of a self-organization model for cortical orientation selectivity,
taking into account differences between cortical layers and cell-types.
Buchsbaum, G., Gottschalk, A. (1983) Trichromacy, opponent colours coding and
optimum colour information transmission in the retina. Proceedings of the
Royal Society (London) B 220:89 – 113.
Shows that color opponency can be interpreted as a recoding of the cone re-
ceptor signals via principal component analysis. One of the most compelling
examples of an application of this theory in neuroscience.
Carreira-Perpiñán, A. Á., Lister, R. J., Goodhill, G. J. (2005) A computational model
for the development of multiple maps in primary visual cortex. Cerebral Cortex
15:1222 – 1233
This paper uses a slightly different approach to self-organization to model the
interlaced structure of cortical maps for various neuronal specificities. The
results are in good agreement with neurophysiological data.
von der Malsburg, C. (1973). Self–organization of orientation sensitive cells in the
striate cortex. Kybernetik, 14:85 – 100.
One of the first papers demonstrating that the formation of cortical orientation
columns might be a result of competitive learning.
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T. (2007) Robust object
recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 29:411 – 426.
Modern classifier model based on the multi-layer perceptron. The layers are
interpreted as areas of the visual and temporal cortices. Pattern classification
abilities are tested with real images.
Wessberg, J., Stambaugh, C. R., Kralik, J. D., Beck, P. D., Laubach, M., Chapin,
J. K., Kim, J., Biggs, S. J., Srinivasan, M. A., Nicolelis, M. A. L. (2000). Real-
time prediction of hand trajectory by ensembles of cortical neurons in primates.
Nature 408:361 – 365.
Large numbers of neurons are recorded from the cortex of a monkey and the
activities are correlated to joint angles and joint velocities of the monkey’s
forearm. By statistical methods akin to those described in this chapter, arm
movements can be predicted from the neural activities. The quality of these predictions is assessed by using them to control a robot arm and comparing the movements of the robot and monkey arms.
Chapter 5
Coding and Representation
Abstract. Neuronal activity does not, in itself, generally contain information about the stimulus. Spikes elicited by different stimuli look much the same, but generally occur in different cells. The information is contained in the specificity, or tuning curve, of the cell generating the spike. This may be considered a modern account of the notion of the “specific sense energies” formulated by Johannes Müller as early as 1826. In effect, any stimulation of the eye (or visual pathways) is perceived as visual and any stimulation of the ear (or auditory pathways) as auditory. The information encoded by each neuron is described by its tuning curve. Often, neurons are tuned simultaneously to different parameters such as position in visual space, edge orientation, and color, albeit to various extents (i.e., with sharper or coarser tuning curves). Tuning curves of different neurons overlap, leading to population coding, where each stimulus is represented by the activities of a group, or population, of neurons. The first part of this chapter explores the consequences of population coding for neural information processing. In the second section, we study the
fact that neighboring neurons in the cortical surface tend to have similar receptive
fields and tuning curves. This similarity is defined for a combination of many pa-
rameters including position in the visual field as well as the well-known “perceptual
dimensions” orientation, spatial frequency, color, motion, and depth. It is particu-
larly evident for the tuning to visual field position, i.e. in retinotopic mapping of the
visual field onto the cortical surface in the centimeter range.
with our excitation variable e(t). In this section, we assume that spike rate carries
the major part of neural information, as is generally also assumed in single-cell
neurophysiology. A related variable is the local field potential measured by low-resistance extracellular electrodes, which is approximately a local spatial average of spike rate. In addition to spike rates and local field potentials, higher-order patterns in spike trains have been studied, which are revealed by auto- and cross-correlation (cf. Equation 2.57) or other techniques from time series analysis.
Consider a continuous variable taking values in the interval (0, 1). In principle,
there are at least three ways of coding such variables in neural activity, or spike rate:
1. Intensity code: The activity of the neuron is proportional to (or a monotonic
function of) the coded parameter. Image contrast is coded in this way, leading to
stronger activities if contrast is increased. One problem of this coding scheme is
that the dynamic range of the neuron (0 to < 500 spikes per second) determines
the resolution of the code.
2. Channel coding or labeled line code without overlap (Fig. 5.2a): The set of all possible stimulus values—in our example the interval (0, 1)—is partitioned into n parts, each of length 1/n. For each part i, a detector neuron exists which is active if and only if the stimulus falls within the interval ((i − 1)/n, i/n). The tuning curve of each detector neuron is one on this interval and zero elsewhere.
3. Population code or labeled line code with overlap (Fig. 5.2b): Each stimulus is
coded by a set or population of neurons each of which is tuned to a section of
the coded stimulus. The tuning curves may overlap. This is the standard way of
neural coding.
Population codes need more than one neuron to encode a parameter value, whereas
in simple channel coding, one neuron may suffice. Despite this apparent disadvan-
tage, it can be shown that population codes are superior to non-overlapping channels
in many respects. We will now turn to the question of information content.
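The three schemes can be sketched in a few lines of code (a minimal illustration; the channel count n = 4, the maximal rate, and the tested value of q are arbitrary choices):

```python
import numpy as np

def intensity_code(q, r_max=500.0):
    """Intensity code: the firing rate is a monotonic function of q."""
    return q * r_max

def channel_code(q, n=4):
    """Non-overlapping channels (cf. Eq. 5.4): only the detector of the
    interval ((i-1)/n, i/n] containing q is active."""
    rho = np.zeros(n)
    rho[min(max(int(np.ceil(q * n)) - 1, 0), n - 1)] = 1.0
    return rho

def population_code(q, n=4):
    """Overlapping channels (cf. Eq. 5.6): tuning curves of width 1/2,
    shifted by 1/(2n), so several channels respond to the same q."""
    return np.array([1.0 if i / (2 * n) < q < i / (2 * n) + 0.5 else 0.0
                     for i in range(n)])

q = 0.2
print(intensity_code(q))    # 100.0 spikes per second
print(channel_code(q))      # [1. 0. 0. 0.]
print(population_code(q))   # [1. 1. 0. 0.] -- two channels share the work
```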
Q1: is it “00”?   yes → i = 1, pi = 9/16, i · pi = 9/16
no ↓
Q2: is it “01”?   yes → i = 2, pi = 3/16, i · pi = 6/16
no ↓
Q3: is it “10”?   yes or no → i = 3, pi = 4/16, i · pi = 12/16
Average number of questions asked: 9/16 + 6/16 + 12/16 = 27/16
Fig. 5.1 Questioning scheme for recovering a sequence of the characters “0” and “1” appear-
ing with probability 3/4 and 1/4 respectively. i, number of questions asked; pi , probability of
finding the answer after the i-th question. Each cycle recovers two digits by asking on average
27/16 = 1.6875 questions, i.e. roughly 0.84 binary questions per digit.
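The arithmetic of Fig. 5.1 can be verified with exact fractions:

```python
from fractions import Fraction

p0, p1 = Fraction(3, 4), Fraction(1, 4)    # P("0") and P("1")
prob = {                                   # probability of each digit pair
    "00": p0 * p0,                         # answered after question 1
    "01": p0 * p1,                         # answered after question 2
    "10": p1 * p0,                         # answered after question 3
    "11": p1 * p1,                         # also settled by question 3
}
questions = {"00": 1, "01": 2, "10": 3, "11": 3}
expected = sum(questions[s] * prob[s] for s in prob)
print(expected)              # 27/16 questions per digit pair
print(float(expected) / 2)   # 0.84375 questions per digit
```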
p3 = 0.25 × 0.75 + 0.25 × 0.25 = 0.25. With this question answered, we have reconstructed the next two digits. The average number of questions to be asked in this example is 1 · (9/16) + 2 · (3/16) + 3 · (4/16) = 27/16 ≈ 1.69.
[Fig. 5.2: tuning curves ρ1, ..., ρ4 plotted over q ∈ (0, 1). a. Non-overlapping channels; the four activity vectors (ρ1, ρ2, ρ3, ρ4) are (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1). b. Overlapping channels; the eight activity vectors are (1,0,0,0), (1,1,0,0), (1,1,1,0), (1,1,1,1), (0,1,1,1), (0,0,1,1), (0,0,0,1), (0,0,0,0).]
Fig. 5.2 Information transmission in channel-coded systems without overlap (a.) and with
overlap (b.) With overlap, more information can be transmitted. The vertically printed vectors
are the channel activities (ρ1 (q), ρ2 (q), ρ3 (q), ρ4 (q)) for each q-Interval. For further expla-
nation see text.
ρi(q) = 1 if (i − 1)/n < q ≤ i/n, and 0 otherwise.   (5.4)
By coding the signal in this scheme, it is digitized to steps of 1/n, each encoded by
one of the activity patterns (1, 0, .., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1). Smaller differences
cannot be resolved.
The probability of each of these n activity patterns to occur is 1/n, since we assumed that the parameter q is uniformly distributed in the interval (0, 1). With n equally likely activity patterns, we may calculate the entropy
H = − ∑_{i=1}^{n} (1/n) log2(1/n) = log2 n.   (5.5)
Information content increases with the number of channels, but only very slowly;
information content per channel decreases as the number of channels increases.
Consider now a coding scheme with overlapping channels as shown in Fig. 5.2b.
The width of each tuning curve is 1/2, and the tuning curves of the n channels are
shifted by 1/(2n):
ρi(q) = 1 if (i − 1)/(2n) < q < (i − 1)/(2n) + 1/2, and 0 otherwise.   (5.6)
As can be seen from Figure 5.2b, there are 2n different activity distributions (or code words) in this case. Therefore, the entropy is:
5.1 Population Code 117
H = − ∑_{i=1}^{2n} (1/(2n)) log2(1/(2n)) = 1 + log2 n.   (5.7)
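The entropy gain from overlap can be checked by enumerating the code words that actually occur (a sketch with n = 4 channels; the sampling grid is an arbitrary choice):

```python
import numpy as np

n = 4
# Non-overlapping channels (Eq. 5.4): n one-hot code words.
no_overlap = {tuple(int(i == j) for j in range(n)) for i in range(n)}
# Overlapping channels (Eq. 5.6): sample q densely in (0, 1) and collect
# the distinct activity patterns that occur.
qs = np.linspace(0.001, 0.999, 2000)
overlap = {tuple(int(i / (2 * n) < q < i / (2 * n) + 0.5) for i in range(n))
           for q in qs}

print(np.log2(len(no_overlap)))   # 2.0 bits = log2(n), Eq. 5.5
print(np.log2(len(overlap)))      # 3.0 bits = 1 + log2(n), Eq. 5.7
```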
Fig. 5.3 a. An array of equidistant triangular tuning curves (Equation 5.9). Also shown is
a stimulus value q which excites all four channels according to their tuning curves. b. The
activities generated in these four channels. Their center of gravity equals the original stimulus.
Fig. 5.3a shows a set of four such tuning curves where the preferred stimuli of each
neuron have been identified with the neuron’s number, i.e. p̂i = i.
Now, consider a stimulus with parameter value q; in Figure 5.3a it appears be-
tween the preferred stimuli of neurons 2 and 3. From Equation 5.9 it is easy to
calculate the activities in the four channels. They are shown in Figure 5.3b as gray
columns. As explained before, we consider these columns to be loads placed on the
beam of a balance. Since we now have more than two channels, each load is placed
at the position corresponding to the channel’s preferred stimulus. The definition of
the center-of-gravity as support-point of the beam remains the same. In the case
of triangular tuning curves, it is easy to show that the center-of-gravity estimator
actually recovers the encoded parameter value, q = q∗ in Equation 5.8.
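This recovery is easy to check numerically. The sketch below assumes triangular tuning curves with half-width equal to the channel spacing and preferred stimuli p̂i = i; Equation 5.9 itself is not reproduced in this excerpt, so the exact shape is an assumption:

```python
import numpy as np

p_hat = np.array([1.0, 2.0, 3.0, 4.0])      # preferred stimuli

def activities(q, width=1.0):
    """Triangular tuning curves of half-width `width` (stand-in for Eq. 5.9)."""
    return np.maximum(0.0, 1.0 - np.abs(q - p_hat) / width)

def center_of_gravity(e):
    """Center-of-gravity estimator (Equation 5.8)."""
    return np.dot(p_hat, e) / np.sum(e)

q = 2.4
e = activities(q)
print(e)                      # approx. [0, 0.6, 0.4, 0]
print(center_of_gravity(e))   # approx. 2.4: the encoded value is recovered
```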
If the equality q = q∗ does not strictly hold, the center-of-gravity estimator is
still useful as an approximation; for an interesting example, see Kay et al. (2008).
In addition, more sophisticated methods of estimation theory have been applied to
the problem, most notably the idea of maximum likelihood estimation. It interprets
tuning curves as conditional probabilities for neuronal firing given that a particular
stimulus was presented. For a recorded pattern of population activity, one can then
calculate for each possible stimulus the probability of the recorded pattern to occur.
This probability is called the likelihood3 of the stimulus. Finally, one picks the one
stimulus generating the largest likelihood.
3 Note that the likelihoods of all stimuli need not sum to unity. This is why the term “likelihood”
is introduced, to distinguish the quantity from a true probability.
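A minimal sketch of maximum-likelihood decoding; the binary spiking model and the Gaussian tuning curves are illustrative assumptions, since the text leaves the noise model open:

```python
import numpy as np

stimuli = np.linspace(0.0, 1.0, 101)       # candidate stimulus values
p_hat = np.array([0.2, 0.4, 0.6, 0.8])     # hypothetical preferred stimuli

def tuning(q):
    """Gaussian tuning curves, read as spike probabilities given stimulus q."""
    return 0.05 + 0.9 * np.exp(-(q - p_hat) ** 2 / (2 * 0.15 ** 2))

def log_likelihood(spikes, q):
    """Log-probability of the observed pattern if the stimulus were q."""
    p = tuning(q)
    return np.sum(spikes * np.log(p) + (1 - spikes) * np.log(1 - p))

observed = np.array([0, 1, 1, 0])          # recorded population pattern
ll = [log_likelihood(observed, q) for q in stimuli]
print(stimuli[int(np.argmax(ll))])         # 0.5: the most likely stimulus
```

Note that the likelihoods over all candidate stimuli need not sum to one, which is exactly the point of the footnote above.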
Fig. 5.4 Hyperacuity (or sub-pixel resolution) in a population code for visual position. a. The
arrows mark visual positions for two ideal point lights. By the imaging properties of the eye,
they are washed out into two blurring disks on the retina (shown as Gaussians). The width
of blurring under ideal conditions is about 0.5 minutes of arc. These patterns excite the cone photoreceptors, which in the fovea have a diameter of again 0.5 minutes of arc. Each blurring disc will therefore excite a small population of adjacent cone receptors. The activity pattern over these groups differs even if the light positions are less than 0.5 minutes of arc apart.
b. Test patterns for hyperacuity: left, vernier, middle, arc, right, row of dots. Well trained
subjects correctly judge offset directions for offsets below 10 seconds of arc, i.e. less than
one third of the cone diameter.
Fig. 5.4b shows some patterns which may be used to measure visual resolution.
The versions of the patterns depicted, or their mirror images, are shown to the test
subject. Then the subject is asked whether the upper line is to the left or the right
of the lower one (example at the left); in which direction the arc is bent (middle
example), or whether the middle of the three points is to the left or the right of the
line connecting the two other points (right example). All of these experiments reveal
perceptual thresholds on the order of 10 seconds of arc or less, i.e. well below the
resolution of the cone mosaic.
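The argument of Fig. 5.4a can be illustrated numerically. Cone spacing and blur width follow the caption (0.5 arcmin); the Gaussian blur model and all other numbers are simplifying assumptions:

```python
import numpy as np

cone_pos = np.arange(-3.0, 3.01, 0.5)      # cone centers, in arcmin
sigma = 0.5                                # blur width, in arcmin

def cone_activities(light_pos):
    """Gaussian blurring disc sampled by the cone mosaic."""
    return np.exp(-(cone_pos - light_pos) ** 2 / (2 * sigma ** 2))

for light_pos in (0.0, 0.1):               # positions 6 arcsec apart
    e = cone_activities(light_pos)
    print(np.dot(cone_pos, e) / np.sum(e)) # center of gravity of the pattern
```

Although the cones are 0.5 arcmin apart, the two recovered positions differ by the full 0.1 arcmin offset: the population activity resolves offsets far below the sampling grid.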
Fig. 5.6 After-effects are a common phenomenon in the perception of motion, color, texture
granularity, etc. A standard explanation assumes that perception is based on the population
activity of channels with overlapping tuning curves. a. A stimulus qo is presented intermedi-
ate between the preferred stimuli p̂1 and p̂2 of two channels. The resulting population activity
shows two equal excitations and the center-of-gravity estimator q∗ truly reproduces the original stimulus qo. b. The system now adapts to a new stimulus, qad. Adaptation is modeled as a reduction of the sensitivity of active channels. c. If the original stimulus is again presented to the adapted system, the activity of the previously active channel will be reduced. The center-of-gravity estimator is therefore displaced away from the adapting stimulus, i.e. to the left.
color, or depth, are represented in a population code. In this coding scheme, adapta-
tion is modeled as an overall decrease of sensitivity caused by strong and sustained
activity of a given channel in the adaptation phase. This decrease affects the chan-
nels most closely tuned to the adapting stimulus and may be due to fatigue, learning
processes or other mechanisms. In any case, the sensitivity of the channels tuned to
the adapting stimulus will be reduced in the test phase of the experiment. Thus, the
population code will be biased away from the adapting stimulus (Fig. 5.6). In on-
going perception, channel adaptation will lead to an improved detection of changes,
or of deviations from the average. The perception of constant stimuli will be sup-
pressed while more sensitivity will be assigned to parameter ranges with higher
variability.
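The shift predicted in Fig. 5.6 can be reproduced with two overlapping triangular channels; the adaptation factor of 0.6 is an arbitrary illustrative choice:

```python
import numpy as np

p_hat = np.array([1.0, 2.0])               # preferred stimuli p1, p2

def activities(q, gain):
    """Triangular tuning curves scaled by per-channel sensitivities."""
    return gain * np.maximum(0.0, 1.0 - np.abs(q - p_hat))

def cog(e):
    """Center-of-gravity estimator (Equation 5.8)."""
    return np.dot(p_hat, e) / np.sum(e)

q0 = 1.5                                   # stimulus between p1 and p2
print(cog(activities(q0, np.array([1.0, 1.0]))))  # 1.5: veridical percept
adapted = np.array([1.0, 0.6])             # channel 2 fatigued by q_ad > q0
print(cog(activities(q0, adapted)))        # < 1.5: displaced away from q_ad
```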
pattern of activity is given on the motor cell population, one may calculate the center
of gravity estimator (Equation 5.8) where it is understood that the scalar “preferred
stimuli” p̂i of each neuron are to be replaced by vectorial “preferred movements” p̂i .
Note that Equation 5.8 works just as well with vectorial quantities, since the activ-
ities ei always remain scalars. Georgopoulos et al. (1993) use the term “population
vector” for the center-of-gravity estimator of vectorial quantities. It turns out that by
means of this population vector, the actual arm movement is nicely predicted. If the
population vector is monitored over time while the monkey is planning a movement,
it can be seen anticipating this movement (“mental rotation”).
Fig. 5.7 shows the general situation for a two-dimensional parameter space. The
tuning curves (or motor fields) are bell-shaped curves in that space, centered at
the cell’s preferred parameter value p̂i ∈ IR2 . The interpolation between two such
preferred parameter vectors is realized by decreasing activity in one channel and
increasing activity in the other. By means of the center-of-gravity calculation, this
amounts to an interpolation of the population estimate. If the two preferred stimuli
are different pointing directions, the described shift of activity from one channel to another amounts to a rotation of the population vector.
Two- or higher-dimensional tuning curves are generally modeled as “radial basis functions” (RBF):

ρi(p) := f(‖p − p̂i‖)   (5.10)
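A sketch of the population vector with Gaussian radial basis functions; the choice f(d) = exp(−d²/(2σ²)) and the four preferred directions are illustrative:

```python
import numpy as np

angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])
p_hat = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # preferred vectors

def rbf_activities(p, sigma=1.0):
    d = np.linalg.norm(p - p_hat, axis=1)    # rho_i(p) = f(||p - p_hat_i||)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def population_vector(e):
    """Center-of-gravity estimator with vectorial preferred movements."""
    return (p_hat * e[:, None]).sum(axis=0) / e.sum()

p = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])   # 45-degree movement
pv = population_vector(rbf_activities(p))
print(np.rad2deg(np.arctan2(pv[1], pv[0])))  # approx. 45: interpolated direction
```

Shifting activity from the 0-degree channel to the 90-degree channel rotates the population vector continuously between the two preferred directions, as described in the text.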
5.1.5 Summary
Population coding is a ubiquitous scheme of neural coding with a number of characteristic properties that also play a role in neural information processing. Some of these properties are:
• Information content: Population codes can convey more information than non-
overlapping labeled lines. The most compelling example is color vision where
three channels can code for some two million colors, differentiated in hue, satu-
ration and brightness.
• Hyperacuity (sub-pixel resolution): Population codes inherently allow for a res-
olution better than the channel spacing. Hyperacuity in visual line localization is
a clear example.
• After-effects and the adjustment of working-ranges: Since individual parameter
values are coded by the balance of activity in overlapping channels, adaptation
of individual channels leads to distorted percepts. Overall, aftereffects tend to
emphasize contrasts, rather than static situations. If the average value of a sensory
parameter changes, the system can adjust its working-range.
• Interpolation and mental rotation: Reading population codes by the center of gravity estimator implies an interpolation operation. Examples include the coding of arm movements in cortical motor neurons, eye-movements to multiple
targets, both in the fronto-parallel plane and in depth, and the hippocampal place-
field code to location. The different phenomena described as mental rotation (cor-
relation of reaction time and turning angle in same-different judgments, neural
representation of arm movements) can be interpreted as direct consequences of
the interpolation property of population codes.
The retinal and cortical coordinates are given by the two-dimensional vectors (x, y) and (u, v), respectively.
124 5 Coding and Representation
Fig. 5.8 The area of a parallelogram. A parallelogram may be described by the vectors a = (a1, a2) and b = (b1, b2). After dividing it into two triangles along the diagonal a + b, the surface area may be calculated as the base times the height. The length of the base is ‖a + b‖. The normal vector in the direction of the height is (−a2 − b2, a1 + b1)/‖a + b‖. The height may be obtained by projecting a onto this normal. Then the surface area may be calculated: A = |a1 b2 − a2 b1|.
can be determined by examining the square defined by the vectors a = (1, 0) and b = (0, 1) in the domain of the mapping. The area of this square is 1. Its image is the parallelogram defined by the vectors L(a) = (a11, a21) and L(b) = (a12, a22).
According to the rule discussed above, its area is |a11 a22 − a12 a21 |. Since the area of
the original square was 1, the new area is also the areal magnification of the linear
mapping. For the linear mapping, this magnification is the same everywhere.
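Both the parallelogram rule of Fig. 5.8 and its determinant form can be checked with an arbitrary example matrix:

```python
import numpy as np

L = np.array([[2.0, 1.0],
              [0.5, 3.0]])                  # arbitrary linear map
La = L @ np.array([1.0, 0.0])               # image of a = (1, 0): (a11, a21)
Lb = L @ np.array([0.0, 1.0])               # image of b = (0, 1): (a12, a22)
area = abs(La[0] * Lb[1] - La[1] * Lb[0])   # |a11 a22 - a12 a21|, Fig. 5.8
print(area)                                 # 5.5
print(abs(np.linalg.det(L)))                # approx. 5.5: same magnification
```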
For a general (differentiable) mapping M : IR2 → IR2, areal magnification can be determined by first finding a local linear approximation and then calculating the magnification of this linear mapping (cf. Fig. 5.9). A linear approximation (gray grid in Fig. 5.9) can be obtained from the matrix of the partial derivatives of the two components of the mapping M, that is, from the Jacobian matrix of M (see, for example, Rudin 1976):

JM(x, y) := ( ∂M1/∂x   ∂M1/∂y
              ∂M2/∂x   ∂M2/∂y ).   (5.13)
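Numerically, the local areal magnification of a nonlinear map is the absolute determinant of a finite-difference approximation to the Jacobian (5.13); the mapping M below is an arbitrary example:

```python
import numpy as np

def M(x, y):
    """An arbitrary differentiable mapping IR^2 -> IR^2."""
    return np.array([x * np.exp(y), x ** 2 + y])

def jacobian(M, x, y, h=1e-6):
    """Central-difference approximation to the Jacobian matrix (Eq. 5.13)."""
    dMdx = (M(x + h, y) - M(x - h, y)) / (2 * h)
    dMdy = (M(x, y + h) - M(x, y - h)) / (2 * h)
    return np.column_stack([dMdx, dMdy])

J = jacobian(M, 1.0, 0.0)
print(J)                        # approx. [[1, 1], [2, 1]]
print(abs(np.linalg.det(J)))    # approx. 1: areal magnification at (1, 0)
```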
5.2 Retinotopic Mapping 125
Fig. 5.9 The definition of areal magnification of a general transformation T . Left: the domain
in which T is defined. Right: The image of the grid as transformed by T (black) and the local
approximation of T in the vicinity of the center of the grid.
The reason for having different areal magnifications in the cortical representation
of the visual field is that the density of retinal ganglion cells is not constant. We
can assume that, at least approximately, each retinal ganglion cell maps to the same
number of cortical neurons covering a constant area of the cortical surface. In the
primate retina, the distribution of ganglion cells has a peak around the fovea and
declines towards the periphery in a more or less isotropic way. In order to obtain
equal representation in the cortex, the mapping must therefore be distorted, with an
expansion (high areal magnification) of the central retina and a compression (low
areal magnification) of the periphery. If dg (x, y) denotes the density of ganglion cells
and R represents the imaging function describing the retinotopic map, equal cortical
representation of all ganglion cells amounts to
∂C1/∂x = ∂C2/∂y   and   ∂C1/∂y = −∂C2/∂x.   (5.16)
It follows immediately that the Jacobian of a conformal function describes a pure
rotation combined with a scaling, but no shear, in accordance with the mapping
properties described above.
JC(x, y) = (  A   B
             −B   A )   (5.17)

with

A = ∂C1/∂x = ∂C2/∂y   and   B = ∂C1/∂y = −∂C2/∂x.   (5.18)
For cortical magnification we obtain

1/r² = c (R′(r))²   (5.20)

for some constant c. From this, we immediately compute (see Fischer 1973):

R(r) = (1/√c) log r.   (5.21)
This function will be discussed in the following section.
Fig. 5.10 Log-polar mapping of the visual hemifield with offset. The inset on the left shows
the contralateral (right) visual hemifield with f = (0, 0) marking the center. The big figure
shows the image of the polar grid under the log-polar map (u, v) = Pc (x, y) where f =
Pc ( f ) is the foveal representation. The representation of the vertical meridian delimiting
the visual hemifield to the left, is the ⊂-shaped curve. The semicircular perimeter maps to the
straight line to the right. Close to the fovea, the radial structure of the retinal grid is preserved.
Further to the periphery, the visual field radii become parallel lines and the iso-eccentricity
circles approximate straight verticals. This effect is increasingly pronounced if the value of c
is reduced. The straight line in the map, running from top left to bottom right is the image of
the logarithmic spiral marked in the visual field.
The retinotopic map Pc is shown in Fig. 5.10 by means of a polar grid in the right
visual field and the image of this grid in the left visual cortex, area V1. The plot
nicely models neurophysiological measurements of visual retinotopic maps in mon-
keys. Also shown is a straight line in the cortical representation together with a curve
in the visual field that will be mapped to this straight line under the log-polar map-
ping. Easy calculation proves that this curve is a logarithmic spiral. This is thought
to be the reason for the spiral structure of visual hallucination patterns known for
example from migraine “aura” events. In migraine aura events, a wave front known
as spreading depression is moving across the visual cortex, inducing a temporary
scotoma in the visual field. While the depression wave front is approximately a
straight line, the subjects perceive a spirally shaped scotoma, since they (as always)
interpret the visual field position of cortical activity via the inverse of the retinotopic
map (see Bressloff et al. 2001).
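The spiral-to-line correspondence can be checked with the pure logarithmic map; the offset version Pc of the text adds a constant inside the logarithm, which is omitted here for simplicity:

```python
import numpy as np

# Visual-field position as a complex number z = r * exp(i*phi); the
# log-polar map is then u + i*v = log z.  A logarithmic spiral
# r = exp(m*phi) maps onto the straight cortical line u = m*v.
m = 0.5
phi = np.linspace(0.1, 3.0, 50)             # keep phi within one log branch
z = np.exp(m * phi) * np.exp(1j * phi)      # logarithmic spiral in the field
w = np.log(z)                               # cortical image of the spiral
u, v = w.real, w.imag
print(np.allclose(u, m * v))                # True: the spiral becomes a line
```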
References
Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion.
Journal of the Optical Society of America A 2, 284–299 (1985)
Antolik, J., Bednar, J.A.: Development of maps of simple and complex cells in the primary
visual cortex. Frontiers in Computational Neuroscience 5(17) (2011)
Aidley, D.J.: The Physiology of Excitable Cells, 4th edn. Cambridge University Press, Cam-
bridge (1998)
Bressloff, P.C., Cowan, J.D., Golubitsky, M., Thomas, P.J., Wiener, M.C.: Geometric visual hallucinations, Euclidean symmetry and the functional architecture of striate cortex. Philosophical Transactions of the Royal Society (London) B 356, 299–330 (2001)
Barlow, H.B., Levick, R.W.: The mechanism of directional selectivity in the rabbit’s retina.
Journal of Physiology 173, 477–504 (1965)
Blakemore, C., Sutton, P.: Size adaptation: a new aftereffect. Science 166, 245–247 (1969)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons, New York
(1991)
Eichner, H., Joesch, M., Schnell, B., Reiff, D.F., Borst, A.: Internal structure of the fly ele-
mentary motion detector. Neuron 70, 1155–1164 (2011)
Fischer, B.: Overlap of receptive field centers and representation of the visual field in the cat’s
optic tract. Vision Research 13, 2113–2120 (1973)
FitzHugh, R.: Impulses and physiological states in theoretical models of nerve membrane.
Biophysical Journal 1, 445–466 (1961)
Georgopoulos, A.P., Taira, M., Lukashin, A.: Cognitive neurophysiology of the motor cortex.
Science 260, 47–52 (1993)
Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Pearson Prentice Hall, New
York (2008)
Hines, M.L., Carnevale, N.T.: The NEURON simulation environment. Neural Computation 9,
1179–1209 (1997)
Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its appli-
cation to conduction and excitation in nerve. Journal of Physiology 117, 500–544 (1952)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational
abilities. Proceedings of the National Academy of Sciences 79, 2554–2558 (1982)
Hartline, H.K., Ratliff, F.: Spatial summation of inhibitory influences in the eye of Limulus, and mutual interaction of receptor units. Journal of General Physiology 41, 1049–1066 (1958)
Itti, L., Koch, C.: A saliency-based search mechanism for overt and covert shifts of visual
attention. Vision Research 40, 1489–1506 (2000)
Jack, J.J.B., Noble, D., Tsien, R.W.: Electric current flow in excitable cells. Clarendon Press,
Oxford (1975)
Jones, J.P., Palmer, L.A.: An evaluation of the two-dimensional Gabor filter model of simple
receptive-fields in the cat striate cortex. Journal of Neurophysiology 58, 1233–1258 (1987)
Kay, K.N., Naselaris, T., Prenger, R.J., Gallant, J.L.: Identifying natural images from human
brain activity. Nature 452, 352–355 (2008)
Koenderink, J.J.: Scale-time. Biological Cybernetics 58, 159–162 (1988)
Kohonen, T., Reuhkala, E., Mäkisara, K., Vainio, L.: Associative recall of images. Biological
Cybernetics 22, 159–168 (1976)
Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., Pitts, W.H.: What the frog’s eye tells the
frog’s brain. Proceedings of the Institute of Radio Engineers 47, 1950–1961 (1959)
von der Malsburg, C.: Self–organization of orientation sensitive cells in the striate cortex.
Kybernetik 14, 85–100 (1973)
Mallot, H.A.: An overall description of retinotopic mapping in the cat’s visual cortex areas
17, 18, and 19. Biological Cybernetics (1985)
Minsky, M.L., Papert, S.A.: Perceptrons, expanded edition. The MIT Press, Cambridge
(1988)
Murray, J.D.: Mathematical Biology. I. An introduction, 3rd edn. Springer, Berlin (2002)
Mallot, H.A., von Seelen, W., Giannakopoulos, F.: Neural mapping and space–variant image
processing. Neural Networks 3, 245–263 (1990)
Milescu, L.S., Yamanishi, T., Ptak, K., Smith, J.C., Mogri, M.Z.: Real time kinetic modeling
of voltage-gated ion channels using dynamic clamp. Biophysical Journal 95, 66–87 (2008)
Oja, E.: A simplified neuron model as a principal component analyzer. Journal of Mathemat-
ical Biology 15, 267–273 (1982)
Poggio, T., Girosi, F.: Regularization algorithms for learning that are equivalent to multilayer
networks. Science 247, 978–982 (1990)
Pollen, D.A., Ronner, S.F.: Visual cortical neurons as localized spatial frequency filters. IEEE
Transactions on Systems, Man, and Cybernetics 13, 907–916 (1983)
Rall, W.: Electrophysiology of a dendritic neuron model. Biophysical Journal 2, 145–167
(1962)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back–propagating
errors. Nature 323, 533–536 (1986)
Rudin, W.: Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York (1976)
Schölkopf, B., Smola, A.: Learning with Kernels. Support Vector Machines, Regularization,
Optimization and Beyond. The MIT Press, Cambridge (2002)
Tuckwell, H.: Introduction to theoretical neurobiology (2 Vols.). Cambridge University Press,
Cambridge (1988)