Unconventional Computation
Bruce MacLennan
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville
Copyright © 2018, B. J. MacLennan. Version of November 17, 2018.
Contents

I Introduction
  A What is unconventional computation?
  B Post-Moore's Law computing
  C Super-Turing vs. non-Turing
    C.1 The limits of Turing computation
    C.2 New computational models
    C.3 Computation in general
    C.4 Expanding the range of computing technologies
    C.5 Conclusions

II Physics of Computation
  A Energy dissipation
  B Thermodynamics of computation
    B.1 Von Neumann–Landauer Principle
    B.2 Mechanical and thermal modes
  C Reversible computing
    C.1 Reversible computing as a solution
    C.2 Foundations of Conservative Computation
    C.3 Conservative logic circuits
    C.4 Universality
    C.5 Constants and garbage
    C.6 Garbageless conservative logic
    C.7 Ballistic computation
  D Exercises
Chapter I

Introduction
These lecture notes are exclusively for the use of students in Prof. MacLen-
nan’s Unconventional Computation course.
© 2018, B. J. MacLennan, EECS, University of Tennessee, Knoxville. Version of November 17, 2018.
Figure I.1: Hierarchy of spatial scales in conventional computing. (Images from Wikipedia.)
X := Y / Z
P[0] := N
i := 0
while i < n do
if P[i] >= 0 then
q[n-(i+1)] := 1
P[i+1] := 2*P[i] - D
else
q[n-(i+1)] := -1
P[i+1] := 2*P[i] + D
end if
i := i + 1
end while
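This pseudocode can be made executable directly; the following Python rendering is a minimal sketch assuming, as the variable names suggest, the standard non-restoring division recurrence with dividend N, divisor D, and n quotient digits of ±1 (the names N, D, and n are taken from the pseudocode):

    def nonrestoring_divide(N, D, n):
        # Non-restoring division: P[i] is the partial remainder; q holds
        # quotient digits +1/-1, the first (most significant) at q[n-1].
        # Assumes N and D are scaled so that |N| < D (fractional quotient).
        P = [0] * (n + 1)
        q = [0] * n
        P[0] = N
        for i in range(n):
            if P[i] >= 0:
                q[n - (i + 1)] = 1
                P[i + 1] = 2 * P[i] - D
            else:
                q[n - (i + 1)] = -1
                P[i + 1] = 2 * P[i] + D
        return q, P[n]

    # e.g. nonrestoring_divide(1, 3, 4) yields digits 1, -1, 1, -1
    # (most significant first), i.e. 1/2 - 1/4 + 1/8 - 1/16 = 0.3125 ~ 1/3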
The continued miniaturization and acceleration of computation will force it to take place on a scale (spatial and temporal)
near that of the underlying physical processes. With fewer hierarchical levels
between computations and their physical realizations, and less time for im-
plementing computational processes, computation will have to become more
like the underlying physical processes. That is, post-Moore’s Law computing
will depend on a greater assimilation of computation to physics.
In discussing the role of physical embodiment in the “grand challenge” of
unconventional computing, Stepney (2004, p 29) writes,
Computation is physical; it is necessarily embodied in a device
whose behaviour is guided by the laws of physics and cannot be
completely captured by a closed mathematical model. This fact
of embodiment is becoming ever more apparent as we push the
bounds of those physical laws.
Traditionally, a sort of Cartesian dualism has reigned in computer science;
programs and algorithms have been conceived as idealized mathematical ob-
jects; software has been developed, explained, and analyzed independently
of hardware; the focus has been on the formal rather than the material.
Post-Moore’s Law computing, in contrast, because of its greater assimilation
to physics, will be less idealized, less independent of its physical realization.
On one hand, this will increase the difficulty of programming since it will
be dependent on (or, some might say, contaminated by) physical concerns.
On the other hand, as will be explored in this book, it also presents many
opportunities that will contribute to our understanding and application of
information processing in the future.
The information processing that occurs in the brain is perhaps the most familiar example of computa-
tion in nature, but there are many others, such as the distributed and self-
organized computation by which social insects solve complicated optimiza-
tion problems and construct sophisticated, highly structured nests. Also,
the DNA of multicellular organisms defines a developmental program that
creates the detailed and complex structure of the adult organism. For exam-
ples of computation inspired by that in nature, we may cite artificial neural
networks, genetic algorithms, artificial immune systems, and ant swarm op-
timization, to name just a few. Next I will consider a few of the issues that
are important in natural computation, but outside the frame of relevance of
the CT model.
One of the most obvious issues is that, because computation in nature
serves an adaptive purpose, it must satisfy stringent real-time constraints.
For example, an animal’s nervous system must respond to a stimulus —
fight or flight, for example — in a fraction of a second. Also, in order to
control coordinated sensorimotor behavior, the nervous system has to be
able to process sensory and proprioceptive inputs quickly enough to generate
effector control signals at a rate appropriate to the behavior. And an ant
colony must be able to allocate workers appropriately to various tasks in real
time in order to maintain the health of the colony.
In nature, asymptotic complexity is generally irrelevant; the constants
matter and input size is generally fixed or varies over a relatively limited range
(e.g., numbers of sensory receptors, colony size). Whether the algorithm
is linear, quadratic, or exponential is not so important as whether it can
deliver useful results in required real-time bounds in the cases that actually
occur. The same applies to other computational resources. For example, it is
not so important whether the number of neurons required varies linearly or
quadratically with the number of inputs to the network; what matters is the
absolute number of neurons required for the actual number of inputs, and
how well the system will perform with the number of inputs and neurons it
actually has.
Therefore, in natural computation, what does matter is how the real-time
response rate of the system is related to the real-time rates of its components
(e.g., neurons, ants) and to the actual number of components. This means
that it is not adequate to treat basic computational processes as having an
indeterminate duration or speed, as is commonly done in the CT model.
In the natural-computation frame of relevance, knowing that a computation
will eventually produce a correct result using finite but unbounded resources is not sufficient.
C.2.d Nanocomputation
Nanocomputation is another domain of computation that seems to be out-
side the frame of relevance of the CT model. By nanocomputation I mean
any computational process involving sub-micron devices and arrangements of
information; it includes quantum computation (Ch. III) and molecular com-
putation (e.g., DNA computation), in which computation proceeds through
molecular interactions and conformational changes (Ch. IV).
Due to thermal noise, quantum effects, etc., error and instability are un-
avoidable characteristics of nanostructures. Therefore they must be taken
as givens in nanocomputational devices and in their interrelationships (e.g.,
interconnections), and also in the structures constructed by nanocomputa-
tional processes (e.g., in algorithmic self-assembly: Winfree, 1998). There-
fore, a “perfect” structure is an over-idealized assumption in the context
of nanocomputation; defects are unavoidable. In many cases structures are
not fixed, but are stationary states occurring in a system in constant flux.
Similarly, unlike in the CT model, nanocomputational operations cannot
be assumed to proceed correctly, for the probability of error is always non-
negligible. Error cannot be considered a second-order detail added to an
assumed perfect computational system, but should be built into a model
of nanocomputation from the beginning. Indeed, operation cannot even be
assumed to proceed uniformly forward. For example, chemical reactions always have some probability of proceeding backward as well as forward.
C.3.e Purpose
First, these definitions make reference to the function or purpose of a system,
but philosophers and scientists are justifiably wary of appeals to purpose, es-
pecially in a biological context. However, the use of purpose in the definition
of computation is unproblematic, for in most cases of practical interest, pur-
pose is easy to establish. (There are, of course, borderline cases, but that
fact does not invalidate the definition.) On the one hand, in a technological
context, we can look to the stated purpose for which an artificial system was
designed. On the other, in a biological context, scientists routinely inves-
tigate the purposes (functions) of biological systems, such as the digestive
system and immune system, and make empirically testable hypotheses about
their purposes. Ultimately such claims of biological purpose may be reduced
to a system’s selective advantage to a particular species in that species’ en-
vironment of evolutionary adaptedness, but in most cases we can appeal to
more immediate ideas of purpose.
C.3.f Transduction
The purpose of computation is the abstract transformation of abstract ob-
jects, but obviously these formal operations will be pointless unless the com-
putational system interfaces with its environment in some way. Certainly
our computers need input and output interfaces in order to be useful. So
also computational systems in the brain must interface with sensory recep-
tors, muscles, and many other noncomputational systems to fulfill their func-
tions. In addition to these practical issues, the computational interface to the
physical world is relevant to the symbol grounding problem, the philosophical question of how abstract symbols can have real-world content (Harnad,
1990, 1993; MacLennan, 1993). Therefore we need to consider the interface
between a computational system and its environment, which comprises input
and output transducers.
The relation of transduction to computation is easiest to see in the case
of analog computers. The inputs and outputs of the computational system
have some physical dimensions (light intensity, air pressure, mechanical force,
etc.), because they must have a specific physical realization for the system
to accomplish its purpose. On the other hand, the computation itself is
essentially dimensionless, since it manipulates pure numbers. Of course,
these internal numbers must be represented by some physical quantities, but
they can be represented in any appropriate physical medium. In other words,
computation is generically realized, that is, realized by any physical system
with an appropriate formal structure, whereas the inputs and outputs are
specifically realized, that is, constrained by the environment with which they
interface to accomplish the computational system’s purpose.
Therefore we can think of (pure) transduction as changing matter (or
energy) while leaving form unchanged, and of computation as transforming
form independently of matter (or energy). In fact, most transduction is not
pure, for it modifies the form as well as the material substrate, for example, by
filtering. Likewise, transductions between digital and analog representations
transform the signal between discrete and continuous spaces.
A powerful feedback loop has amplified the success of digital VLSI technol-
ogy to the exclusion of all other computational technologies. The success
of digital VLSI encourages and finances investment in improved tools, tech-
nologies, and manufacturing methods, which further promote the success of
digital VLSI. Unfortunately this feedback loop threatens to become a vi-
cious cycle. We know that there are limits to digital VLSI technology, and,
although estimates differ, we will reach them soon (see Ch. II). We have as-
sumed there will always be more bits and more MIPS, but that assumption
is false. Unfortunately, alternative technologies and models of computation
remain undeveloped and largely uninvestigated, because the rapid advance
of digital VLSI has surpassed them before they could be adequately refined.
Investigation of alternative computational technologies is further constrained
by the assumption that they must support binary logic, because that is the
only way we know how to compute, or because our investment in this model
of computation is so large. Nevertheless, we must break out of this vicious
cycle or we will be technologically unprepared when digital VLSI finally, and
inevitably, reaches its limits.
Therefore, as a means of breaking out of this vicious cycle, let us step back
and look at computation and computational technologies in the broadest
sense. What sorts of physical processes can we reasonably expect to use for
computation? Based on the preceding discussion, we can see that any math-
ematical process, that is, any abstract transformation of abstract objects, is
a potential computation. Therefore, in principle, any reasonably controllable,
mathematically described, physical process can be used for computation. Of
course, there are practical limitations on the physical processes usable for
computation, but the range of possible technologies is much broader than
might be suggested by a narrow conception of computation. Considering
some of the requirements for computational technologies will reveal some of
the possibilities as well as the limitations.
One obvious issue is speed. The rate of the physical process may be either
too slow or too fast for a particular computational application. That it might
be too slow is obvious, for the development of conventional computing tech-
nology has been driven by speed. Nevertheless, there are many applications
that have limited speed requirements, for example, if they are interacting
with an environment with its own limited rates. Conversely, these applica-
tions may benefit from other characteristics of a slower technology, such as
energy efficiency; insensitivity to uncertainty, error, and damage; and the
ability to be reconfigured or to adapt or repair itself. Sometimes we simply
want to slow a simulated process down so we can observe it. Another con-
sideration that may supersede speed is whether the computational medium
is suited to the application: Is it organic or inorganic? Living or nonliving?
Chemical, optical, or electrical?
A second requirement is the ability to implement the transducers required
for the application. Although computation is theoretically independent of its
physical embodiment, its inputs and outputs are not, and some conversions
to and from a computational medium may be easier than others. For exam-
ple, if the inputs and outputs to a computation are chemical, then chemical
or molecular computation may permit simpler transducers than electronic
computation. Also, if the system to be controlled is biological, then some
form of biological computation may suit it best.
Finally, a physical realization should have the accuracy, stability, control-
lability, etc. required for the application. Fortunately, natural computation
provides many examples of useful computations that are accomplished by
realizations that are not very accurate, for example, neuronal signals have at
most about one digit of precision. Also, nature shows us how systems that
are subject to many sources of noise and error may be stabilized and thereby
accomplish their purposes.
Understanding these natural computations abstractly may allow us to realize them in alternative physical systems with the same abstract structure. For
example, neural computation or insect colony-like self-organization might be
realized in an optical system.
C.5 Conclusions
The historical roots of Church-Turing computation remind us that the the-
ory exists in a frame of relevance, which is not well suited to post-Moore's Law computing.
Chapter II

Physics of Computation
These lecture notes are exclusively for the use of students in Prof. MacLen-
nan’s Unconventional Computation course.
© 2018, B. J. MacLennan, EECS, University of Tennessee, Knoxville. Version of November 17, 2018.
A Energy dissipation
As an introduction to the physics of computation, and further motivation
for unconventional computation, we will discuss Michael P. Frank’s analysis
of energy dissipation in conventional computing technologies (Frank, 2005b).
The performance R of a computer system can be measured by the number of computational operations executed per unit time. This rate is the product of the number of operations per unit of dissipated energy and the energy dissipated per unit time:

R = FE × Pdiss,    (II.1)

where FE is the energy efficiency (operations per unit of dissipated energy) and Pdiss is the power dissipation.
In CMOS technology, the signal energy is the energy stored on a capacitance. The voltage V between a capacitor's plates is the energy per unit charge, so the work to move an infinitesimal charge dq from one plate to the other is V dq. But V is proportional to the charge already on the capacitor, V = q/C. So the change in energy is dE = V dq = (q/C) dq. Hence the energy to reach a charge Q is

E = ∫₀^Q (q/C) dq = Q²/(2C).

Therefore, since Q = CV, E = (CV)²/(2C) = ½CV², and FE ≈ (1 op)/(½CV²).
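As a quick check of these formulas, here is a small sketch with illustrative (assumed) values C = 1 fF and V = 1 V:

    C = 1e-15                # illustrative gate capacitance: 1 fF (assumed)
    V = 1.0                  # illustrative logic voltage: 1 V
    E_sig = 0.5 * C * V**2   # signal energy per switching event: 5e-16 J
    F_E = 1 / E_sig          # at most one op per E_sig: 2e15 op/J
    print(E_sig, F_E)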
Frank observes that Moore’s law in the 1985–2005 period was a result of
an exponential decrease in C resulting from decreasing feature sizes (since
capacitance is proportional to area) and a decrease in logic voltage V from
5V to about 1V (further improving E by a factor of 25). The clock rate also
went up with smaller feature sizes. (See Fig. II.1.)
Unfortunately, neither the transistor lengths nor the voltage can be re-
duced much more, for if the signal is too small in comparison with thermal
energy, then thermal noise will lead to unreliable operation, because the
thermal fluctuations will be of the same order as the signals (Fig. II.2). The
thermal energy is ET = kB T , where kB is Boltzmann’s constant and T is the
absolute temperature. Since kB ≈ 8.6 × 10⁻⁵ eV/K = 1.38 × 10⁻²³ J/K, and room temperature is T ≈ 300 K, the room-temperature thermal energy is

ET = kB T ≈ 26 meV ≈ 4.14 × 10⁻²¹ J ≈ 4 zJ.
(Fig. II.1 shows ET .)
We have seen that Esig = ½CV², but for reliable operation, how big should it be in comparison to ET? Frank estimates Esig ≥ kB T ln R, where the reliability R = 1/perr for a desired probability of error perr.¹ For example, for a reasonable reliability R = 2 × 10¹⁷, Esig ≥ 40 kB T ≈ 1 eV, which is the energy to move one electron with 1 V logic levels. This implies a maximum energy efficiency of

FE = 1 op/eV ≈ (1 op)/(1.6 × 10⁻¹⁹ J) = 6.25 × 10¹⁸ op/J.    (II.2)

A round 100 kB T corresponds to an error probability of perr = e⁻¹⁰⁰ ≈ 3.72 × 10⁻⁴⁴ (at room temperature). Therefore, a reasonable target for reliable operation is

Esig ≳ 100 kB T ≈ 2.6 eV = 414 zJ.
¹Frank (2005b, slide 7).
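These estimates are easy to reproduce; a minimal sketch using the standard values of Boltzmann's constant and the electron charge:

    import math

    kB = 1.380649e-23          # Boltzmann's constant, J/K
    eV = 1.602176634e-19       # joules per electron volt
    T = 300.0                  # room temperature, K

    kT = kB * T                # thermal energy: ~4.14e-21 J ~ 26 meV
    R = 2e17                   # reliability 1/p_err
    E_sig = kT * math.log(R)   # Frank's estimate: ~40 kT ~ 1 eV
    print(kT / eV, E_sig / kT, E_sig / eV)
    print(100 * kT / eV, 100 * kT * 1e21)   # 100 kT: ~2.6 eV, ~414 zJ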
Figure II.1: Historical and extrapolated CMOS gate switching energy (CV²/2), based on the ITRS '97–'03 roadmaps, with reference levels at 1 eV, 100 k(300 K), k(300 K), and ln(2) k(300 K), and the practical limit for CMOS. Figure from Frank (2005b, slide 9).
Figure II.2: Signal levels perturbed by thermal fluctuations.
Very few applications can use a supercomputer's capacity for any extended period; most of the processors are idling.⁸ So with those 10.6 × 10⁶ cores, most of the time about ten million of them are idle! There has to be a better way.

⁸IEEE Spectrum (Feb. 2011), spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0 (accessed 2012-08-20).
B Thermodynamics of computation
“Computers may be thought of as engines for transforming free energy into
waste heat and mathematical work.” — Bennett (1982)
Or more briefly, H = −Σk pk log pk. For example, if P{1} = 1/16 and P{0} = 15/16, we will receive, on the average, H ≈ 0.3 bits of information.
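The definition is direct to compute; a minimal sketch:

    import math

    def entropy(ps):
        # Shannon entropy in bits: H = -sum_k p_k lg p_k
        # (terms with p = 0 contribute nothing)
        return -sum(p * math.log2(p) for p in ps if p > 0)

    print(entropy([1/16, 15/16]))   # ~0.337 bits, the example above
    print(entropy([1/8] * 8))       # 3.0 bits: eight equally likely signals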
According to a well-known story, Shannon was trying to decide what to
call this quantity and had considered both “information” and “uncertainty.”
Because it has the same mathematical form as statistical entropy in physics,
von Neumann suggested he call it “entropy,” because “nobody knows what
entropy really is, so in a debate you will always have the advantage.”9 (This
is one version of the quote.) I will call it information entropy when I need to
distinguish it from the thermodynamical concept.
An important special case is the entropy of a uniform distribution. If
there are N signals that are all equally likely, then H = log N . Therefore, if
we have eight equally likely possibilities, the entropy is H = lg 8 = 3 bits.10
⁹https://en.wikipedia.org/wiki/History_of_entropy (accessed 2012-08-24). Ralph Hartley laid the foundations of information theory in 1928, on which Claude Shannon built his information theory in 1948.
¹⁰I use the notations lg x = log₂ x and ln x = logₑ x.
Figure II.4: Thermodynamics of erasing a bit. On the left is the initial state
(time t), which may be logical 0 or logical 1; on the right (time t + 1) the
binary device has been set to logical 0. In each case there are N microstates
representing each prior state, so a total of 2N microstates. However, at time
t + 1 the system must be in one of N posterior microstates. Therefore N
of the microstate trajectories must exit the defined region of phase space
by expanding into additional, uncontrolled degrees of freedom. Therefore
entropy of the environment must increase by at least ∆S = k log(2N ) −
k log N = k log 2. We lose track of this information because it passes into
uncontrolled degrees of freedom.
The information lost, or dissipated into NIBDF (typically as heat), is
∆S = k log(2N) − k log N = k log 2. (In physical units this is ∆S = kB ln 2 ≈ 10⁻²³ J/K.) Therefore the increase of energy in the device's environment is ∆Q = ∆S × Tenv = kB Tenv ln 2 ≈ 0.7 kB Tenv. At Tenv = 300 K, kB Tenv ln 2 ≈ 18 meV ≈ 3 × 10⁻⁹ pJ = 3 zJ. We will see that this is the minimum energy
dissipation for any irreversible operation (such as erasing a bit); it is called the
von Neumann–Landauer (VNL) bound (or sometimes simply the Landauer
bound). Von Neumann suggested the idea in 1949, but it was published first
by Rolf Landauer (IBM) in 1961.13 Recall that for reliable operation we need
minimum logic levels around 40kB Tenv to 100kB Tenv , which is two orders of
magnitude above the von Neumann–Landauer limit of 0.7kB Tenv . “From a
technological perspective, energy dissipation per logic operation in present-
day silicon-based digital circuits is about a factor of 1,000 greater than the
ultimate Landauer limit, but is predicted to quickly attain it within the next
couple of decades” (Berut et al., 2012). That is, current circuits have signal
levels of about 18 eV, which we may compare to the VNL, 18 meV.
In research reported in March 2012 Berut et al. (2012) confirmed ex-
perimentally the Landauer Principle and showed that it is the erasure that
dissipates energy. They trapped a 2 µm silica ball in either of two laser traps,
representing logical 0 and logical 1. For storage, the potential barrier was
greater than 8kB T , and for erasure, the barrier was lowered to 2.2kB T by
decreasing the power of the lasers and by tilting the device to put it into
the logical 0 state (see Fig. II.5). At these small sizes, heat is a stochastic
property, so the dissipated heat was computed by averaging the trajectory
of the particle over multiple trials:
⟨Q⟩ = ⟨ −∫₀^τ ẋ(t) (∂U(x, t)/∂x) dt ⟩.
(The angle brackets denote the average value over trials.) Complete erasure results in the ball being in the logical 0 state; incomplete erasure results in it being in the logical 0 state with probability p. They established a generalized Landauer bound in which the dissipated heat depends on the completeness of erasure:

⟨Q⟩ ≥ kB T [ln 2 + p ln p + (1 − p) ln(1 − p)],

which reduces to kB T ln 2 for complete erasure (p = 1) and to 0 for p = 1/2 (no erasure).
Figure II.5: Erasing a bit by changing potential barrier. (Figure from Berut
et al. (2012).)
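The generalized bound above is easily evaluated; a minimal sketch:

    import math

    kB, T = 1.380649e-23, 300.0

    def q_min(p):
        # Minimum average heat (J) to reset a bit to logical 0
        # with success probability p (Berut et al., 2012)
        h = lambda x: x * math.log(x) if x > 0 else 0.0
        return kB * T * (math.log(2) + h(p) + h(1 - p))

    print(q_min(1.0))   # ~2.87e-21 J = kT ln 2: the VNL bound
    print(q_min(0.5))   # 0.0: a 50/50 "erasure" erases nothing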
We can separate this into the macroscopic entropy associated with the macrostates (IBDF) and the microscopic entropy associated with the microstates (NIBDF). Now let Pj = Σi pij (summing i = 1, …, Nj) be the probability of being in macrostate j. Then Eq. II.3 can be rearranged (Exer. II.1):

S = −k Σj Pj log Pj − k Σj Pj Σi (pij/Pj) log(pij/Pj) = Si + Sh.    (II.4)
Note that the ratios in the inner summation are essentially conditional prob-
abilities, and that the inner summation is the conditional entropy given that
you are in macrostate j.
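A small numerical check of this decomposition (cf. Exer. II.1), using base-2 logarithms and k = 1 for convenience; the microstate probabilities are arbitrary illustrative values:

    import math

    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    p = [[0.2, 0.3], [0.1, 0.4]]              # p[j][i]: microstate i of macrostate j
    P = [sum(row) for row in p]               # macrostate probabilities P_j
    S = H([pij for row in p for pij in row])  # total entropy (Eq. II.3)
    S_i = H(P)                                # macrostate (IBDF) entropy
    S_h = sum(Pj * H([pij / Pj for pij in row])   # conditional (NIBDF) entropy
              for Pj, row in zip(P, p))
    print(S, S_i + S_h)                       # both ~1.846: S = S_i + S_h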
When we erase a bit, we go from a maximum Si of 1 bit (if 0 and 1 are
equally likely), to 0 bits (since there is no uncertainty). Thus we lose one bit
of information, and the macrostate entropy decreases ∆Si = −k log 2. (The
actual entropy decrease can be less than 1 bit if the 0 and 1 are not equally
likely initial states.) Since according to the Second Law of Thermodynamics total entropy cannot decrease, the entropy of the microstates (NIBDF) must increase by a compensating amount, which appears in the environment as heat.
For each gate, we can express Ho in terms of the probabilities of the inputs
and compute the decrease from Hi (exercise). If the inputs are not equally
likely, then the input entropy will be less than 2 bits, but we will still have
Hi > Ho and energy will be dissipated. (Except in a trivial, uninteresting
case. What is it?)
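For example, for an AND gate with independent, equally likely inputs, the input entropy is Hi = 2 bits and the output entropy is Ho = H(3/4, 1/4) ≈ 0.811 bits:

    import math

    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    Hi = H([1/4] * 4)        # four equally likely input pairs: 2 bits
    Ho = H([3/4, 1/4])       # AND output: P{0} = 3/4, P{1} = 1/4
    print(Hi, Ho, Hi - Ho)   # ~1.19 bits of information lost per operation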
More generally, any irreversible operation (non-invertible function) will
lose information, which has to be dissipated into the environment. That is,
it is irreversible because it loses information, and every time we lose informa-
tion, we lose energy. If the function is not one-to-one (injective), then at least
two inputs map to the same output, and so information about the inputs is
lost. For example, changing a bit, that is, overwriting a bit with another bit,
is a fundamental irreversible operation, subject to the VNL limit. Therefore,
the assignment operation is bound by it. Also, when two control paths join,
we forget where we came from, and so again we must dissipate at least a
bit’s worth of entropy (Bennett, 2003). These considerations suggest that
reversible operations might not be subject to the VNL limit, and this is in
fact the case, as we will see.
The preceding observations have important connections with the problem
of Maxwell’s Demon and its resolution. Briefly, the demon has to reset its
mechanism after each measurement in preparation for the next measurement,
and this dissipates at least kT log 2 energy into the heat bath for each decision
that it makes. That is, the demon must “pay” for any information that it
acquires. Therefore, the demon cannot do useful work. Further discussion
is outside the scope of this book, so if you are interested, please see Leff & Rex (2003) and Leff & Rex (1990) (which have a large intersection), in which
many of the papers on the topic are collected.
If the mechanical modes are decoupled from the thermal modes, then they don't dissipate any energy. (This is what we have with elastic collisions.)
This is the approach of reversible computing.
Suppose we want irreversible mechanical modes, e.g., for implementing
irreversible logic. The underlying physics is reversible, and so the informa-
tion lost by the mechanical modes cannot simply disappear; it must be trans-
ferred to the thermal modes. This is damping: Information in the mechanical
modes, where it is accessible and usable, is transferred to the thermal modes,
where it is inaccessible and unusable. This is the thermalization of informa-
tion, the transfer of physical information from accessible DoF to inaccessible
DoF. But the interaction is bidirectional, so noise (uncontrolled DoF) will
flow from the thermal modes back to the mechanical modes, making the
system nondeterministic.
As Feynman said, "If we know where the damping comes from, it turns out that that is also the source of the fluctuations" (Feynman, 1963). Think
of a bullet ricocheting off a flexible wall filled with sand. It dissipates energy
into the sand and also acquires noise in its trajectory (see Fig. II.6). To avoid
nondeterminacy, the information may be encoded redundantly so that the
noise can be filtered out. For example, the signal can be encoded in multiple
mechanical modes, over which we take a majority vote or an average. Or
the signal can be encoded with energy much greater than any one of the
thermal modes, E ≫ kT, to bias the energy flow from mechanical to ther-
mal (preferring dissipation over noise). Free energy must be used to refresh
the mechanical modes, and heat must be flushed from the thermal modes.
"[I]mperfect knowledge of the dynamical laws leads to uncertainties in the …"
C Reversible computing
C.1 Reversible computing as a solution
Notice that the key quantity FE in Eqn. II.1 depends on the energy dissipated
as heat.15 The 100kB T limit depends on the energy in the signal (necessary
to resist thermal fluctuation causing a bit flip). Although most common
computing operations must dissipate a minimum amount of energy (as given
by VNL), there is nothing that says that information processing has to be
done this way. In principle, no energy need be dissipated, and an arbitrarily large fraction of the signal energy can be recovered for future operations ("arbitrary" in the sense that there is no inherent physical lower bound on the energy that must be dissipated and cannot be recovered). Accomplishing this becomes a mat-
ter of precise energy management: moving it around in different patterns,
with as little dissipation as possible. Indeed, Esig can be increased to im-
prove reliability, provided we minimize dissipation of energy. This goal can
be accomplished by making the computation logically reversible (i.e., each
successor state has only one predecessor state).
All fundamental physical theories are Hamiltonian dynamical systems,
and all such systems are time-reversible. That is, if ψ(t) is a solution, then
so is ψ(−t). That is, in general, physics is reversible. Therefore, physical
information cannot be lost, but we can lose track of it. This is entropy:
“unknown information residing in the physical state.” Note how this is fun-
damentally a matter of information and knowledge: processes are irreversible
because information becomes inaccessible. Entropy is ignorance.
To avoid dissipation, don’t erase information. The problem is to keep
track of information that would otherwise be dissipated, to avoid squeezing
information out of logical space (IBDF) into thermal space (NIBDF). This is
accomplished by making computation logically reversible (it is already phys-
ically reversible). In effect, computational information is rearranged and
recombined in place. (We will see lots of examples of how to do this.)
In the 1970s, Ed Fredkin, Tommaso Toffoli, and others at MIT formed the Information Mechanics group to study the physics of information. As we will
¹⁵This section is based on Frank (2005b).
see, Fredkin and Toffoli described computation with idealized, perfectly elas-
tic balls reflecting off barriers. The balls have minimum dissipation and are
propelled by (conserved) momentum. The model is unrealistic but illustrates
many ideas of reversible computing. Later we will look at it briefly (Sec. C.7).
They also suggested a more realistic implementation involving “charge pack-
ets bouncing around along inductive paths between capacitors.” Richard
Feynman (Caltech) had been interacting with the Information Mechanics group,
and developed “a full quantum model of a serial reversible computer” (Feyn-
man, 1986).
Charles Bennett (1973) (IBM) first showed how any computation could be
embedded in an equivalent reversible computation. Rather than discarding
information (and hence dissipating energy), the reversible computation keeps it around so that it can later be "decomputed" back to the initial state. This was a theoretical proof based on
Turing machines, and did not address the issue of physical implementation.
Bennett (1982) suggested Brownian computers (or Brownian motion ma-
chines) as a possible physical implementation of reversible computation. The
idea is that rather than trying to avoid randomization of kinetic energy
(transfer from mechanical modes to thermal modes), perhaps it can be ex-
ploited. This is an example of respecting the medium in embodied computa-
tion. A Brownian computer makes logical transitions as a result of thermal
agitation. However, because it is operating at thermal equilibrium, it is
about as likely to go backward as forward; it is essentially conducting a random walk, and therefore can be expected to take Θ(n²) time to advance n steps from its initial state. A small energy input — a very weak external driving force — biases the process in the forward direction, so that it proceeds linearly, but still very slowly.
relation between energy and computation speed (Sec. C.1.b). DNA polymer-
ization provides an example. We can compare its energy dissipation, about
40kB T (∼ 1 eV) per nucleotide, with its rate, about a thousand nucleotides
per second.
Bennett (1973) also described a chemical Turing machine, in which the
tape is a large macromolecule analogous to RNA. An added group encodes
the state and head location, and for each transition rule there is a hypothet-
ical enzyme that catalyzes the state transition. We will look at molecular
computation in much more detail later in the class.
As in ballistic computing, Brownian computing needs logical reversibility.
With no driving force, it is equally likely to move forward or backward, but
any driving force will ensure forward movement. Brownian computing can thus trade energy dissipation for computation speed (Frank, 2005b). Let Ediss be the energy dissipated per
operation and fop be the frequency of operations; then the energy coefficient
is defined:
cE ≝ Ediss/fop.
Since the 1980s, and especially in the 1990s, there has been work on adiabatic circuits. An adiabatic process takes place without gain or loss of heat, and adiabatic circuits minimize energy dissipation by obeying certain circuit
design rules. For example: (1) Never turn on a transistor when there is a
voltage potential between the source and drain. (2) Never turn off a transistor
when current is flowing through it. “[A]rbitrary, pipelined, sequential logic
could be implemented in a fully-reversible fashion, limited only by the energy
coefficients and leakage currents of the underlying transistors.” As of 2004,
about cE = 3 meV/kHz was achievable, which is about 250× less than DNA.
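To see what this energy coefficient means in practice, we can compute the dissipation per operation at various speeds (a sketch; the 3 meV/kHz figure is the one quoted above):

    eV = 1.602176634e-19
    c_E = 3e-3 * eV / 1e3          # 3 meV/kHz expressed in J/Hz
    for f_op in (1e3, 1e6, 1e9):   # operation frequency, Hz
        E_diss = c_E * f_op        # dissipation per op grows with speed
        print(f_op, E_diss)        # at 1 kHz: ~4.8e-22 J, below kT ln 2

Note that at sufficiently low speeds the dissipation per operation falls below kB T ln 2; this is possible because adiabatic circuits are logically reversible and so are not subject to the VNL bound.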
In conservative logic, "all signal processing is ultimately reduced to conditional routing of signals. Roughly speaking, signals are treated as unalterable objects that can be moved around in the course of a computation but never created or destroyed. For the physical significance of this approach, see Section 6" (Fredkin & Toffoli, 1982). Among the physical principles motivating the model is P8: "The topology of space-time is locally Euclidean." "Intuitively, the amount of 'room' available as one moves away from a certain point in space increases as a power (rather than as an exponential) of the distance from that point, thus severely limiting the connectivity of a circuit." In their Section 2.5 (Conservative-Logic Circuits), Fredkin & Toffoli "introduce a scheme for connecting signals, represented by unit wires, with events, represented by conservative-logic gates."
We will see that two primitive operations are sufficient for conservative logic:
the unit wire and the Fredkin gate.
Figure II.9: Fredkin gate or CSWAP (conditional swap): (a) symbol and (b) operation. The control c passes through unchanged; for c = 0 the data pass straight through (a′ = a, b′ = b), while for c = 1 they are swapped (a′ = b, b′ = a).
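The gate's behavior is trivial to model; a minimal sketch in Python:

    def fredkin(c, a, b):
        # Conditional swap: control c passes through; data swap iff c == 1
        return (c, b, a) if c else (c, a, b)

    for c in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                print((c, a, b), '->', fredkin(c, a, b))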
Figure II.10: Alternative notations for the Fredkin gate.

"A conservative-logic circuit is a directed graph whose nodes are conservative-logic gates and whose arcs are wires of any length (cf. Figure 3)." (Fredkin & Toffoli, 1982)
Figure II.12: (a) The Fredkin gate computes a′ = ūa + ub and b′ = ua + ūb, where u is the control input. (b) Setting the second data input to 0 realizes AND: the inputs (u, a, 0) yield the outputs (u, ūa, ua).
C.4 Universality
Fig. II.12(b) illustrates how the Fredkin gate can be used to implement AND; other conventional gates, such as NOT, OR, and FAN-OUT, can be implemented similarly. (You will show this in Exercises II.4 to II.5.) Notice that the implementation of AND requires that we provide a constant 0 on the second data line, and the Fredkin gate produces two "garbage bits" whose values (āb and a) we might not need. Fig. II.14 shows a more complicated example, a 1-line to 4-line demultiplexer. Depending on the value of the address bits A1 A0, it will direct the input X to Y0, Y1, Y2, or Y3. The circuit requires three constant 0 inputs and produces two garbage bits in addition to the desired outputs.
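Using the gate model above, the AND realization of Fig. II.12(b) and a demultiplexer in the spirit of Fig. II.14 can be sketched as follows (the demultiplexer wiring here is a plausible reconstruction, not necessarily the figure's exact layout):

    def fredkin(c, a, b):
        return (c, b, a) if c else (c, a, b)

    def AND(a, b):
        # Fig. II.12(b): control a, data (b, 0); outputs (a, (1-a)b, ab)
        _, g, y = fredkin(a, b, 0)
        return y                     # y = ab; garbage: g = (1-a)b and a

    def demux(x, a1, a0):
        # Route x to Y[a1 a0]; three Fredkin gates, three constant 0s
        _, x0, x1 = fredkin(a0, x, 0)   # x0 = x if a0 == 0, x1 = x if a0 == 1
        _, y0, y2 = fredkin(a1, x0, 0)
        _, y1, y3 = fredkin(a1, x1, 0)
        return (y0, y1, y2, y3)

    print(AND(1, 1), AND(1, 0))    # 1 0
    print(demux(1, 1, 0))          # (0, 0, 1, 0): X routed to Y2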
As a consequence, you can convert conventional logic circuits (constructed
from AND, OR, NOT, etc.) into conservative circuits, but the process is not
very efficient, because of the need for many ancillary (constant input) and
garbage bits. It's better to design the conservative circuit from scratch.
Figure II.13: "Realization of f by φ using source and sink. The function φ : (c, x) ↦ (y, g) is chosen so that, for a particular value of c, y = f(x)." (Fredkin & Toffoli, 1982)
"Terminology: source, sink, constants, garbage. Given any finite function φ, one obtains a new function f 'embedded' in it by assigning specified values to certain distinguished input lines (collectively called the source) and disregarding certain distinguished output lines (collectively called the sink). The remaining input lines will constitute the argument, and the remaining output lines, the result. This construction (Figure 5) is called a realization of f by means of φ using source and sink. In realizing f by means of φ, the source lines will be fed with constant values, i.e., with values that do not depend on the argument. On the other hand, the sink lines in general will yield values that depend on the argument, and thus cannot be used as input constants for a new computation. Such values will be termed garbage. (Much as in ordinary life, this …)" (Fredkin & Toffoli, 1982)
Figure II.14: A 1-line to 4-line demultiplexer built from Fredkin gates: the address bits A1 A0 route the input X to Y0, Y1, Y2, or Y3, using three constant 0 inputs; the address bits pass through as outputs.
Nevertheless, this shows that any conventional sequential circuit can be converted
into a conservative logic circuit, provided there is a source for constants and
a sink for garbage. As a consequence, the unit wire and Fredkin gate are a
universal set of conservative operators, since they can be used to implement
(for example) AND, NOT, and FAN-OUT, which are universal.
Figure II.15: Composition of a combinational conservative-logic network with its inverse to consume the garbage (Fredkin & Toffoli, 1982). "(a) Computation of y = f(x) by means of a combinational conservative-logic network φ. (b) This computation is 'undone' by the inverse network φ⁻¹. The network obtained by combining φ and φ⁻¹ looks from the outside like a bundle of parallel wires. The value y (= f(x)) is buried in the middle."
Figure II.16: The "spy circuit" for tapping into the output (Fredkin & Toffoli, 1982). "The value a carried by an arbitrary line (a) can be inspected in a nondestructive way by the 'spy' device in (b)."
We use this circuit on the bits of y between the computation φ and the decomputation φ⁻¹. Notice that the spy circuit requires two ancillary bits for each bit that it extracts; it outputs the desired value and its complement (presumably garbage).
We can use these ideas to design a general approach to garbageless compu-
tation (Fig. II.17). The desired computation has m input bits, x1 , x2 , . . . , xm ,
and n output bits y1 , . . . , yn . To do it reversibly requires (we suppose) h con-
stants c1 , . . . , ch and generates (necessarily) h + m − n garbage bits (for the
number of outputs of a reversible computation has to equal the number of
inputs). Extracting the output requires the provision of 2n new constants
and generates the n output bits and their n complements (which can be con-
sidered garbage). Initializing the machine for a new computation requires
putting in the new input, which will dissipate energy, and restoring the out-
put registers y ȳ to their 00 · · · 0011 · · · 11 state, which also dissipates energy.
Therefore the energy dissipated will be proportional to the size of the input
and output (specifically, m + n). The ancillary constants are automatically
restored by decomputation.
Figure II.17: Garbageless circuit (Fredkin & Toffoli, 1982). "A 'garbageless' circuit for computing the function y = f(x). Inputs c₁, …, c_h and x₁, …, x_m are returned unchanged, while the constants 0, …, 0 and 1, …, 1 in the lower part of the circuit are replaced by the result, y₁, …, y_n, and its complement, ¬y₁, …, ¬y_n."

Figure II.18: "The conservative-logic scheme for garbageless computation. Three data registers are 'shot' through a conservative-logic black-box F. The register with the argument, x, is returned unchanged; the clean register on top of the figure, representing an appropriate supply of input constants, is used as a scratchpad during the computation (cf. the c and g lines in Figure 23 [Fig. II.17]) but is returned clean at the end of the computation. Finally, the tokens on the register at the bottom of the figure are rearranged so as to encode the result y and its complement ¬y." (Fredkin & Toffoli, 1982)

Fredkin & Toffoli (1982) explain the scheme as follows: "The remarkable achievements of this construction are discussed below with the help of the schematic representation of Figure 24 [Fig. II.18]. In this figure, it will be convenient to visualize the input registers as 'magnetic bulletin boards,' in which identical, undestroyable magnetic tokens can be moved on the board surface. A token at a given position on the board represents a 1, while the absence of a token at that position represents a 0. The capacity of a board is the maximum number of tokens that can be placed on it. Three such registers are sent through a 'black box' F, which represents the conservative-logic circuit of Figure 23 [Fig. II.17], and when they reappear some of the tokens may have been moved, but none taken away or added. Let us follow this process, register by register.

"(a) The 'argument' register, containing a given arrangement of tokens x, is returned unchanged. The capacity of this register is m, i.e., the number of bits in x.

"(b) A clean 'scratchpad register' with a capacity of h tokens is supplied, and will be returned clean. (This is the main supply of constants, namely, c₁, …, c_h, in Figure 23.) Note that a clean register means one with all 0's (i.e., no tokens), while we used both 0's and 1's as constants, as needed, in the construction of Figure 10. However, a proof due to N. Margolus shows that all 0's can be used in this register without loss of generality. In other words, the essential function of this register is to provide the computation with spare room rather than tokens.

"(c) Finally, we supply a clean 'result' register of capacity 2n (where n is the number of bits in y). For this register, clean means that the top half is empty and the bottom half completely filled with tokens."

Consider the more schematic diagram in Fig. II.18. Think of arranging tokens (representing 1-bits) in the input registers, both to represent the input
x, but also to provide a supply of n of them in the black lower square. Next,
run the computation (including both the forward and backward passes). The
backward pass restores the input argument tokens to their initial positions.
The 2n-bit string 00 · · · 0011 · · · 11 in the lower register has been rearranged
to yield the result and its complement, y ȳ. Restoring the 0 · · · 01 · · · 1 inputs
for another computation dissipates energy, but the amount of energy depends
on the size of the output (number of bits), not the amount of computation.
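The whole compute/copy/decompute pattern can be illustrated at the scale of a single gate; this is a sketch in which the spy circuit copies the result into fresh constants, and the Fredkin gate, being its own inverse, serves as both φ and φ⁻¹:

    def fredkin(c, a, b):
        return (c, b, a) if c else (c, a, b)

    def garbageless_and(a, b):
        # forward computation phi: Fredkin gate with constant 0 gives ab
        c1, g, y = fredkin(a, b, 0)
        # spy circuit: copy y into fresh constants (0, 1): y and not-y
        y1, y_out, y_bar = fredkin(y, 0, 1)
        # decomputation phi^-1: restores the inputs and the constant 0
        a0, b0, z0 = fredkin(c1, g, y1)
        assert (a0, b0, z0) == (a, b, 0)   # argument and constant recovered
        return y_out, y_bar                # the result ab and its complement

    print(garbageless_and(1, 1), garbageless_and(1, 0))   # (1, 0) (0, 1)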
Figure II.20: "Billiard ball model realization of the interaction gate." (Fredkin & Toffoli, 1982)
"(a) Balls of radius 1/√2 traveling on a unit grid. (b) Right-angle elastic collision between two balls." (Fredkin & Toffoli, 1982, Figure 12)
Figure II.21: "(a) The interaction gate and (b) its inverse." (Fredkin & Toffoli, 1982)
"6.2. The Interaction Gate. The interaction gate is the conservative-logic primitive defined by Figure 13a [Fig. II.21], which also assigns its graphical representation.⁷ In the billiard ball model, the interaction gate is realized simply as the potential locus of collision of two balls. With reference to Figure 14 [Fig. II.20], let p, q be the values at a certain instant of the binary variables associated with the two points P, Q, and consider the values, four time steps later in this particular example, of the variables associated with the four points A, B, C, D. It is clear that these values are, in the order shown in the figure, pq, ¬pq, p¬q, and pq. In other words, there will be a ball at A if and only if there was a ball at P and one at Q; similarly, there will be a ball at B if and only if there was a ball at Q and none at P; etc." (Fredkin & Toffoli, 1982)

⁷"Note that the interaction gate has four output lines but only four (rather than 2⁴) output states; in other words, the output variables are constrained. When one considers its inverse (Figure 13b), the same constraints appear on the input variables. In composing functions of this kind, one must exercise due care that the constraints are satisfied." (Fredkin & Toffoli, 1982)

There are four logical possibilities, pq, p̄q, pq̄, and p̄q̄ (the latter represented by no balls emerging from the gate). (Of course, the gate is conservative: the number of balls entering it has to equal the number exiting.) Notice that the interaction gate is invertible, because if we put in on the right one of the four possible outputs (11, 10, 01, 00), we will get the corresponding input on the left (pq, p̄q, pq̄, p̄q̄, respectively). Check to make sure you see this (Ex. II.10). Fig. II.21 is a more abstract symbol for the interaction gate and its inverse.

"6.3. Interconnection; Timing and Crossover; The Mirror. Owing to its AND and NOT capabilities, the interaction gate is clearly a universal logic primitive (as explained in Section 5, we assume the availability of input constants). To verify that these capabilities are retained in the billiard ball model, one must make sure that one can realize the appropriate interconnections, i.e., that one can suitably route balls from one collision locus to another and maintain proper timing. In particular, since we are considering a planar grid, one must provide a way of performing signal crossover." (Fredkin & Toffoli, 1982)

The interaction gate is universal because it can compute both AND and NOT. However, we must make provisions for arbitrary interconnections in a planar grid. Therefore, we need to implement signal crossover and to control timing so that balls arrive at the correct time in order to interact. In fact, it's only necessary to deal with non-trivial crossover, for trivial crossover is when two balls cannot possibly be at the same place at the same time. Fig. II.22 shows mechanisms for realizing nontrivial crossover, delays, and direction changes. Notice that the "wires" in this computer are virtual, represented by the possible trajectories of the balls, and not physical objects. For reversible computing, the Fredkin gate is more relevant, and Fig. II.23 shows its realization in terms of multiple interaction gates. (The "bridge" indicates non-trivial crossover.) Since the Fredkin gate is universal, any reversible computation can be implemented on the billiard ball computer.
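The interaction gate's logical behavior (though not its timing and routing constraints) is captured by a few lines of Python:

    def interaction(p, q):
        # Interaction gate (Fig. II.21): balls at P, Q produce balls at
        # A = pq, B = (1-p)q, C = p(1-q), D = pq
        return (p * q, (1 - p) * q, p * (1 - q), p * q)

    for p in (0, 1):
        for q in (0, 1):
            print((p, q), '->', interaction(p, q))

    # With constant q = 1, output A computes p and output B computes NOT p;
    # A computes AND in general, which is why the gate is a universal
    # primitive given constant inputs.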
Of course, the billiard ball computer is an idealized model of computation,
and like other abstract models, such as the Turing machine, it has practical
limitations (Bennett, 1982). For example, minuscule errors of any sort (po-
sition, velocity, alignment) will accumulate rapidly (by about a factor of 2 at each collision).

"All of the above requirements are met by introducing, in addition to collisions between two balls, collisions between a ball and a fixed plane mirror. In this way, one can easily deflect the trajectory of a ball (Figure 15a), shift it sideways (Figure 15b), introduce a delay of an arbitrary number of time steps (Figure 15c), and guarantee correct signal crossover (Figure 15d). Of course, no special precautions need be taken for trivial crossover, where the logic or the timing are such that two balls cannot possibly be present at the same moment at the crossover point (cf. Figure 18 or 12a). Thus, in the billiard ball model a conservative-logic wire is realized as a potential ball path, as determined by the mirrors.

"Note that, since balls have finite diameter, both gates and wires require a certain clearance in order to function properly. As a consequence, the metric of the space in which the circuit is embedded (here, we are considering the Euclidean plane) is reflected in certain circuit-layout constraints (cf. P8, Section 2). Essentially, with polynomial packing (corresponding to the Abelian-group connectivity of Euclidean space) some wires may have to be made longer than with exponential packing (corresponding to an abstract space with free-group connectivity) (Toffoli, 1977)." (Fredkin & Toffoli, 1982)
Figure II.22: "The mirror (indicated by a solid dash) can be used to deflect a ball's path (a), introduce a sideways shift (b), introduce a delay (c), and realize nontrivial crossover (d)." (Fredkin & Toffoli, 1982)
"The switch gate and its inverse. Input signal x is routed to one of two output paths depending on the value of the control signal, C." (Fredkin & Toffoli, 1982, Figure 16)
Figure II.23: Realization of the Fredkin gate in terms of multiple interaction gates (Nielsen & Chuang, 2010, Fig. 3.14). "A simple billiard ball computer, with three input bits and three output bits, shown entering on the left and leaving on the right, respectively. The presence or absence of a billiard ball indicates a 1 or a 0, respectively. Empty circles illustrate potential paths due to collisions. This particular computer implements the Fredkin classical reversible logic gate, discussed in the text."
"[W]e will ignore the effects of noise on the billiard ball computer, and concentrate on understanding the essential elements of reversible computation. The billiard ball computer provides an elegant means for implementing a reversible universal logic gate known as the Fredkin gate. Indeed, the properties of the Fredkin gate provide an informative overview of the general principles of reversible logic gates and circuits. The Fredkin gate has three input bits and three output bits, which we refer to as a, b, c and a′, b′, c′, respectively. The bit c is a control bit, whose value is not changed by the action of the Fredkin gate, that is, c′ = c. The reason c is called the control bit is because it controls what happens to the other two bits, a and b. If c is set to 0 then a and b are left alone, a′ = a, b′ = b. If c is set to 1, a and b are swapped, a′ = b, b′ = a." (Nielsen & Chuang, 2010)
D Exercises
Exercise II.1 Show that Eq. II.3 can be rearranged to Eq. II.4.
Exercise II.3 Show that the Fredkin gate implements (a, 0, 1) 7→ (a, a, ā)
(the “spy circuit”).
Exercise II.4 Show how to use a single Fredkin gate to implement each of
the NOT, OR, and FAN-OUT gates. (FAN-OUT copies its input value to
two output wires.) Note! You cannot use FAN-IN or FAN-OUT in your
implementations, only Fredkin gates.
Exercise II.5 Use the Fredkin gate to implement XOR. Minimize the num-
ber of Fredkin gates you use.
Exercise II.6 Give a truth table for a reversible gate that is not conservative
(does not preserve the total number of 1s and 0s). It should have the same
number of outputs as inputs.
Exercise II.7 Give a truth table for a 2-input / 2-output gate that is con-
servative but not reversible.
Exercise II.8 Show for the eight possible inputs that Fig. II.14 is a correct
implementation of a 1-line to 4-line demultiplexer. That is, show in each
of the four cases A1 A0 = 00, 01, 10, 11 the bit X = 0 or 1 gets routed to
Y0 , Y1 , Y2 , Y3 , respectively. You can use a Boolean algebra proof, if you prefer.
Exercise II.9 Complete the implementation of the J-K̄ flip-flop in Fig. II.24
(i.e., replace the “?”), given that its required behavior is:

J K̄ | behavior
0 0 | reset, Q → 0
0 1 | hold, Q doesn’t change
1 0 | toggle, Q → Q̄
1 1 | set, Q → 1
Figure II.24: Implementation of the J-K̄ flip-flop. [adapted from Fredkin & Toffoli (1982)]
Exercise II.10 Show that the inverse of the interaction gate (Fig. II.20)
works correctly. Hint: It only needs to work correctly for outputs that actu-
ally occur. Therefore, to invert a pq output, balls must be shot into outputs
A and D simultaneously.
Exercise II.11 Use interaction gates (and constant inputs and garbage out-
puts, as necessary) to implement NOT, OR (inclusive or), and XOR (exclu-
sive or).
Exercise II.12 Show that the realization of the Fredkin gate in terms of
interaction gates (Fig. II.23) is correct, by labeling the inputs and outputs
of the interaction gates with Boolean expressions of a, b, and c.
Chapter III
Quantum Computation
A Mathematical preliminaries
“[I]nformation is physical, and surprising physical theories such as quantum
mechanics may predict surprising information processing abilities.” (Nielsen
& Chuang, 2010, p. 98)
We will be using the Dirac bra-ket notation; it is elegant and powerful, as
are all good notations. Think of it like a new programming language.
First, the notation |ψ⟩ represents an n-dimensional vector, which we can
write in a particular basis as an n × 1 complex column vector, |ψ⟩ =
(v₁, …, vₙ)ᵀ. We pronounce |ψ⟩ “ket psi” or “psi ket.” Normally the vectors
are finite-dimensional, but they can be infinite-dimensional if the vectors have
a finite magnitude (their components are square-summable): $\sum_k |v_k|^2 < \infty$.
The Dirac notation has the advantage that we can use arbitrary names
for vectors, for example:
|excited⟩, |zero⟩, |one⟩, |↑⟩, |↗⟩, |1⟩, |101⟩, |5⟩, |f(x)⟩, |1 ⊗ g(1)⟩.
It may be easier to remember if you notice that a ket looks kind of like an arrow;
compare |v⟩ and $\vec v$.
The notation ⟨φ| represents a 1 × n complex row vector, ⟨φ| = (u₁, …, uₙ)
in a particular basis. We pronounce ⟨ψ| “bra psi” or “psi bra.” If |ψ⟩ =
(v₁, …, vₙ)ᵀ, then ⟨ψ| = ($\overline{v_1}$, …, $\overline{v_n}$), where $\overline{v_k}$ is the complex conjugate of
vₖ. Recall that
$$\overline{x + iy} = x - iy \quad\text{and}\quad \overline{r e^{i\phi}} = r e^{-i\phi}.$$
We define the adjoint (conjugate transpose, Hermitian transpose) M† of
a matrix M by
$$(M^\dagger)_{jk} = \overline{M_{kj}}.$$
We pronounce it “M dagger,” “M adjoint,” etc. Note that corresponding
bras and kets are adjoints: ⟨ψ| = |ψ⟩† and |ψ⟩ = ⟨ψ|†.
1. It is positive definite:
⟨ψ|ψ⟩ > 0, if |ψ⟩ ≠ 0,
⟨ψ|ψ⟩ = 0, if |ψ⟩ = 0.
Note that conjugate symmetry and linearity in the second argument together
imply that ⟨cφ|ψ⟩ = $\overline{c}$⟨φ|ψ⟩ (antilinearity in the first argument). The
complex inner product is called sesquilinear, which means “one-and-a-half
linear” (in contrast to the inner product of real vectors, which is linear in
both arguments, i.e., bilinear).
The norm or magnitude of a vector is defined by $\| |\psi\rangle \| = \sqrt{\langle\psi\,|\,\psi\rangle}$. That
is, $\| |\psi\rangle \|^2 = |v_1|^2 + \cdots + |v_n|^2$. A vector is normalized if ‖|ψ⟩‖ = 1. Note that
normalized vectors fall on the surface of an n-dimensional hypersphere.
Given an orthonormal basis {|k⟩}, any vector can be expanded as $|\psi\rangle = \sum_k |k\rangle\langle k\,|\,\psi\rangle$; this is just the vector’s representation in a particular basis. (Note that this
equation implies the completeness relation $I = \sum_k |k\rangle\langle k|$.)
A linear operator L : H → Ĥ satisfies L(c|φ⟩ + d|ψ⟩) = cL(|φ⟩) + dL(|ψ⟩)
for all |φ⟩, |ψ⟩ ∈ H and c, d ∈ ℂ. For example, differentiation is a linear
operator.
A linear operator L : H → Ĥ can be represented by a (possibly infinite-
dimensional) matrix relative to bases for H and Ĥ. To see this, suppose |1⟩,
|2⟩, … is a basis for H and |1̂⟩, |2̂⟩, … is a basis for Ĥ. Consider |φ⟩ = L|ψ⟩
and represent the vectors in these bases by their Fourier coefficients: $b_j = \langle\hat{\jmath}\,|\,\varphi\rangle$
and $c_k = \langle k\,|\,\psi\rangle$. Hence |φ⟩ is represented by the vector b = (b₁, b₂, …)ᵀ
and |ψ⟩ by the vector c = (c₁, c₂, …)ᵀ. Apply the linearity of L:
$$b_j = \langle\hat{\jmath}\,|\,\varphi\rangle = \langle\hat{\jmath}\,|\,L\,|\,\psi\rangle
= \langle\hat{\jmath}|\,L\left(\sum_k c_k |k\rangle\right)
= \langle\hat{\jmath}|\sum_k c_k L|k\rangle
= \sum_k \langle\hat{\jmath}\,|\,L\,|\,k\rangle\, c_k.$$
Therefore, define the matrix $M_{jk} \stackrel{\text{def}}{=} \langle\hat{\jmath}\,|\,L\,|\,k\rangle$, and we see that b = Mc. For
this reason, an expression of the form ⟨ĵ|L|k⟩ is sometimes called a matrix
element of the operator L. Note that the matrix depends on the bases we
choose.
That is, |φ⟩⟨ψ| is the operator that returns |φ⟩ scaled by the inner product of
|ψ⟩ and its argument. To the extent that the inner product ⟨ψ|χ⟩ measures
the similarity of |ψ⟩ and |χ⟩, the result |φ⟩ is weighted by this similarity. The
product of a ket and a bra, |φ⟩ ⟨ψ|, can be pronounced “φ-ket bra-ψ” or “φ
ket-bra ψ,” and abbreviated |φ⟩⟨ψ|. This product is also called a dyad.
The special case of |φ⟩⟨φ| in which |φ⟩ is normalized is called a projector
onto |φ⟩. This is because |φ⟩⟨φ| |ψ⟩ = |φ⟩ ⟨φ|ψ⟩, that is, |φ⟩ scaled by the
projection of |ψ⟩ on |φ⟩. More generally, if |η₁⟩, …, |ηₘ⟩ are orthonormal,
then $\sum_{k=1}^{m} |\eta_k\rangle\langle\eta_k|$ projects into the m-dimensional subspace spanned by these
vectors.
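A minimal numpy sketch of a projector (my own example, with |φ⟩ = |+⟩ chosen arbitrarily):

import numpy as np

phi = np.array([1, 1]) / np.sqrt(2)    # normalized |phi>
P = np.outer(phi, phi.conj())          # the projector |phi><phi|

psi = np.array([1, 0])                 # |0>
print(P @ psi)                         # |phi> scaled by <phi|0> = 1/sqrt(2): [0.5, 0.5]
print(np.allclose(P @ P, P))           # projectors are idempotent: True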
Any linear operator can be represented as a weighted sum of outer products.
To see this, suppose L : H → Ĥ, {|ĵ⟩} is a basis for Ĥ, and {|k⟩} is a basis
for H. Consider |φ⟩ = L|ψ⟩. We know from Sec. A.2.c that
$$\langle\hat{\jmath}\,|\,\varphi\rangle = \sum_k M_{jk} c_k, \quad\text{where } M_{jk} = \langle\hat{\jmath}\,|\,L\,|\,k\rangle \text{ and } c_k = \langle k\,|\,\psi\rangle.$$
Hence,
$$|\varphi\rangle = \sum_j |\hat{\jmath}\rangle \langle\hat{\jmath}\,|\,\varphi\rangle
= \sum_j |\hat{\jmath}\rangle \left(\sum_k M_{jk} \langle k\,|\,\psi\rangle\right)
= \left(\sum_{jk} M_{jk}\, |\hat{\jmath}\rangle\langle k|\right) |\psi\rangle,$$
and therefore $L = \sum_{jk} M_{jk}\, |\hat{\jmath}\rangle\langle k|$, a weighted sum of outer products.
by all linear combinations of the basis vectors |ηⱼ⟩ ⊗ |η′ₖ⟩. Therefore each
element of H ⊗ H′ is represented by a unique sum $\sum_{jk} c_{jk} |\eta_j\rangle \otimes |\eta'_k\rangle$. In order
to make H ⊗ H′ a Hilbert space, we need an inner product:
$$(\langle\varphi_1| \otimes \langle\psi_1|)(|\varphi_2\rangle \otimes |\psi_2\rangle) = \langle\varphi_1\,|\,\varphi_2\rangle\, \langle\psi_1\,|\,\psi_2\rangle.$$
That is, we multiply the inner products of the corresponding elements of the
tensor product pairs.
Usually, we are dealing with finite-dimensional spaces, in which case the
tensor products can be defined less abstractly in terms of matrices. Suppose
in a given basis |φ⟩ = (u₁, …, uₘ)ᵀ and |ψ⟩ = (v₁, …, vₙ)ᵀ; then their tensor
product in that basis can be defined by the Kronecker product:
$$|\varphi\rangle \otimes |\psi\rangle = \begin{pmatrix} u_1 |\psi\rangle \\ \vdots \\ u_m |\psi\rangle \end{pmatrix}
= \left(u_1 |\psi\rangle^T, \ldots, u_m |\psi\rangle^T\right)^T
= (u_1 v_1, \ldots, u_1 v_n, \ldots, u_m v_1, \ldots, u_m v_n)^T.$$
In general, $(|\varphi\rangle \otimes |\psi\rangle)_{(j-1)n+k} = u_j v_k$.
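In numpy the Kronecker product is np.kron; a quick sketch (my own, with arbitrary unnormalized vectors just to show the indexing):

import numpy as np

phi = np.array([1, 2])        # (u1, u2)^T
psi = np.array([3, 4, 5])     # (v1, v2, v3)^T

tp = np.kron(phi, psi)        # (u1 v1, u1 v2, u1 v3, u2 v1, u2 v2, u2 v3)^T
print(tp)                     # [ 3  4  5  6  8 10]

# Component (j-1)n + k (1-based) equals u_j v_k; e.g. j=2, k=1, n=3:
j, k, n = 2, 1, 3
print(tp[(j - 1) * n + (k - 1)] == phi[j - 1] * psi[k - 1])   # True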
Hence, the inner product of U|φ⟩ and U|ψ⟩ is the same as the inner product
of |φ⟩ and |ψ⟩. Therefore, unitary operators are isometric, that is, they
preserve norms:
$$\|U|\psi\rangle\|^2 = \langle\psi\,|\,U^\dagger U\,|\,\psi\rangle = \langle\psi\,|\,\psi\rangle = \| |\psi\rangle \|^2.$$
with respect to the |E₀⟩, …, |Eₙ⟩ basis, we will get the result |Eⱼ⟩ with
probability |cⱼ|², and the state will “collapse” into state |Eⱼ⟩. Since the prob-
abilities must add to 1, |c₀|² + |c₁|² + ⋯ + |cₙ|² = 1, we know ‖|ψ⟩‖ = 1, that
is, the vector is normalized.
Figure III.1: Probability density of first six hydrogen orbitals. The main
quantum number (n = 1, 2, 3) and the angular momentum quantum number
(` = 0, 1, 2 = s, p, d) are shown. (The magnetic quantum number m = 0 in
these plots.) [fig. from wikipedia commons]
B.2 Postulates of QM
In this section you will learn the four fundamental postulates of quantum
mechanics.¹
¹ Quotes are from Nielsen & Chuang (2010) unless otherwise specified.
Figure III.2: Relative phase vs. global phase. What matters in quantum
mechanics is the relative phase between state vectors (e.g., θ in the figure).
Global phase “has no physical meaning”; i.e., we can choose to put the 0°
point anywhere we like.
Figure III.3: Relative phase vs. global phase of sine waves. There is no
privileged point from which to start measuring absolute phase, but there is
a definite relative phase between the two waves.
This is another way of expressing the fact that the form is significant, but
not the size. However, it is more convenient to use normalized vectors in
ordinary Hilbert spaces and to ignore global phase.
Therefore define $U(t) \stackrel{\text{def}}{=} \exp(-iHt/\hbar)$; then |ψ(t + s)⟩ = U(t)|ψ(s)⟩. It turns
out that U is unitary (Exer. III.6). Hence the evolution of a closed quantum
mechanical system from a state |ψ⟩ at time t to a state |ψ′⟩ at time t′ can
be described by a unitary operator, |ψ′⟩ = U|ψ⟩. Conversely, for any unitary
operator U there is a Hermitian K such that U = exp(iK) (Exer. III.7).
light won’t go on. In this case, it will collapse to the state $\frac{1}{\sqrt 2}|1\rangle + \frac{1}{\sqrt 2}|2\rangle$, which
we get by renormalizing the measured state:
$$\frac{\frac12 |1\rangle + \frac12 |2\rangle}{\sqrt{\left(\frac12\right)^2 + \left(\frac12\right)^2}} = \frac{1}{\sqrt 2}|1\rangle + \frac{1}{\sqrt 2}|2\rangle.$$
In other words, we zero out the coefficients of the states we didn’t measure
and renormalize (because quantum states are always normalized). Now we
develop these ideas more formally.
A measurement can be characterized by a set of projectors Pₘ, one for each
possible measurement outcome m. In the first example above, the measure-
ment operators are P₁ = |0⟩⟨0| and P₂ = |1⟩⟨1|. In the second example, the
operators are P₁ = |0⟩⟨0| and P₂ = |1⟩⟨1| + |2⟩⟨2|. In the latter case, P₁ projects
the quantum state into the subspace spanned by {|0⟩}, and P₂ projects the
quantum state into the subspace spanned by {|1⟩, |2⟩}. These are orthogonal
subspaces of the original space (spanned by {|0⟩, |1⟩, |2⟩}).
Since a measurement must measure some definite state, a projective mea-
surement is a set of projectors P₁, …, P_N satisfying: (1) They project into
orthogonal subspaces, so for m ≠ n we have PₘPₙ = 0, the identically zero
operator. (2) They are complete, that is, $I = \sum_{m=1}^{N} P_m$, so measurement
always produces a result. Projectors are also idempotent, PₘPₘ = Pₘ, since
if a vector is already projected into the m subspace, projecting it again has
no effect. Finally, projectors are Hermitian (self-adjoint), as we can see:
$$P_m^\dagger = \left(\sum_j |\eta_j\rangle\langle\eta_j|\right)^{\!\dagger} = \sum_j (|\eta_j\rangle\langle\eta_j|)^\dagger = \sum_j |\eta_j\rangle\langle\eta_j| = P_m.$$
For an outcome m corresponding to a one-dimensional projector Pₘ = |m⟩⟨m|, the probability is
$$p(m) = \langle\psi\,|\,P_m\,|\,\psi\rangle
= \langle\psi|(|m\rangle\langle m|)|\psi\rangle
= \langle\psi\,|\,m\rangle\langle m\,|\,\psi\rangle
= \overline{\langle m\,|\,\psi\rangle}\,\langle m\,|\,\psi\rangle
= |\langle m\,|\,\psi\rangle|^2
= |c_m|^2.$$
More generally, the same holds if Pₘ projects into a subspace, $P_m = \sum_k |k\rangle\langle k|$;
the probability is $p(m) = \sum_k |c_k|^2$. Alternatively, we can “zero out” the cⱼ
for the orthogonal subspace, that is, for the |j⟩⟨j| omitted by Pₘ. To maintain
a total probability of 1, the normalized state vector after measurement is
$$\frac{P_m|\psi\rangle}{\sqrt{p(m)}} = \frac{P_m|\psi\rangle}{\|P_m|\psi\rangle\|}.$$
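Here is a rough numpy sketch (assumptions mine) of a projective measurement using the projectors P₁ and P₂ from the example above:

import numpy as np

rng = np.random.default_rng()
psi = np.array([1 / np.sqrt(2), 0.5, 0.5])   # amplitudes c0, c1, c2

P1 = np.diag([1, 0, 0])                      # projects onto span{|0>}
P2 = np.diag([0, 1, 1])                      # projects onto span{|1>, |2>}

probs = [np.real(psi.conj() @ P @ psi) for P in (P1, P2)]  # p(m) = <psi|Pm|psi>
m = rng.choice(2, p=probs)                                  # random outcome
post = (P1, P2)[m] @ psi
post = post / np.linalg.norm(post)           # renormalize: Pm|psi> / ||Pm|psi>||
print(probs, post)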
$$P_{12} = |A_1 + A_2|^2 = \overline{A_1}A_1 + \overline{A_1}A_2 + \overline{A_2}A_1 + \overline{A_2}A_2
= P_1 + \overline{A_1}A_2 + \overline{A_2}A_1 + P_2.$$
Figure III.4: The experiment with three polarizing filters. [fig. from Rieffel & Polak (2000)]

B.4 Superposition

A simple experiment demonstrates quantum effects that cannot be explained
by classical physics (see Fig. III.4). Suppose we have three polarizing filters,
A, B, and C, polarized horizontally, at 45°, and vertically, respectively. Place
the horizontal filter A between a strong light source, such as a laser, and
a screen. The light intensity is reduced by one half and the light is hori-
zontally polarized. (Note: Since the light source is unpolarized, i.e., it has
all polarizations, the resulting intensity would be much less than one half if
the filter allowed only exactly horizontally polarized light through, as would
be implied by a sieve model of polarization.) Next insert filter C, polarized
vertically, and the intensity drops to zero. This is not surprising, since the
filters are cross-polarized. Finally, insert filter B, polarized diagonally, be-
tween A and C, and surprisingly some light (about 1/8 intensity) will return!
This can’t be explained by the sieve model. How can putting in more filters
increase the light intensity?

Quantum mechanics provides a simple explanation of this effect; in
fact, it’s exactly what we should expect. A photon’s polarization state can
be represented by a unit vector pointing in the appropriate direction. Therefore,
an arbitrary polarization can be expressed by a|0⟩ + b|1⟩ for any two basis vectors
|0⟩ and |1⟩, where |a|² + |b|² = 1.

A polarizing filter measures a state with respect to a basis that includes
a vector parallel to its polarization and one orthogonal to it. The effect of
filter A is the projector $P_A \stackrel{\text{def}}{=} |{\rightarrow}\rangle\langle{\rightarrow}|$. To get the probability amplitude,
apply it to |ψ⟩ ≝ a|→⟩ + b|↑⟩:
$$p(A) = |\langle{\rightarrow}\,|\,\psi\rangle|^2 = |\langle{\rightarrow}|(a|{\rightarrow}\rangle + b|{\uparrow}\rangle)|^2
= |a\langle{\rightarrow}|{\rightarrow}\rangle + b\langle{\rightarrow}|{\uparrow}\rangle|^2 = |a|^2.$$
So with probability |a|² we get |→⟩ (recall Eqn. III.1, p. 84). So if the
polarizations are randomly distributed from the source, half will get through,
and all of them will be in state |→⟩. Why one half? Note that a = cos θ,
where θ is the angle between |ψ⟩ and |→⟩, and that
$$\langle a^2 \rangle = \frac{1}{2\pi}\int_0^{2\pi} \cos^2\theta \, d\theta = \frac12.$$
Figure III.5: The diagonal basis {|↗⟩ (diag up), |↘⟩ (diag down)} relative to
the rectilinear basis {|→⟩, |↑⟩}.
If filter C is now inserted after A, it measures with the projector |↑⟩⟨↑|, and the probability of transmission is p(AC) = |⟨↑|→⟩|² = 0.
Now insert the diagonal filter B between the horizontal and vertical filters
A and C. Filter B measures with respect to the diagonal {|↗⟩, |↘⟩} basis
(see Fig. III.5). Transmitted light is given by the projector P_B = |↗⟩⟨↗|.
To find the result of applying filter B to the horizontally polarized light
emerging from filter A, we must express |→⟩ in the diagonal basis:
$$|{\rightarrow}\rangle = \frac{1}{\sqrt 2}(|{\nearrow}\rangle + |{\searrow}\rangle).$$
Therefore the light passes filter B with probability 1/2, emerging in state |↗⟩; filter C then transmits half of that, so 1/8 of the original intensity emerges (and it is vertically polarized).
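The whole three-filter calculation can be checked numerically; a sketch (mine, with |→⟩ and |↑⟩ as the standard basis):

import numpy as np

right = np.array([1, 0])                 # |->  (horizontal)
up = np.array([0, 1])                    # |up> (vertical)
diag = (right + up) / np.sqrt(2)         # |diag up>, the 45-degree state

PB = np.outer(diag, diag)                # filter B (diagonal)
PC = np.outer(up, up)                    # filter C (vertical)

# After A: intensity 1/2, state |right>. Without B, nothing passes C:
print(np.linalg.norm(PC @ right) ** 2)   # 0.0
# With B between A and C: 1/2 * 1/2 * 1/2 = 1/8 of the original intensity.
after_B = PB @ right
pB = np.linalg.norm(after_B) ** 2                    # 0.5
after_C = PC @ (after_B / np.linalg.norm(after_B))
pC = np.linalg.norm(after_C) ** 2                    # 0.5
print(0.5 * pB * pC)                                 # 0.125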
Suppose cloning could be accomplished by a unitary U with U|ψc⟩ = |ψψ⟩ for a fixed ancilla state |c⟩. Then, on the one hand,
$$U|\psi c\rangle = |\psi\psi\rangle = (a|0\rangle + b|1\rangle) \otimes (a|0\rangle + b|1\rangle)
= a^2|00\rangle + ba|10\rangle + ab|01\rangle + b^2|11\rangle;$$
on the other hand, if U clones the basis states, then by linearity U|ψc⟩ = aU|0c⟩ + bU|1c⟩ = a|00⟩ + b|11⟩.
These two expansions cannot be made equal in general, so no such
unitary transformation exists. Cloning is possible only in the special cases
a = 0, b = 1 or a = 1, b = 0, that is, only where we know that we are
cloning a determinate (classical) basis state. The inability to simply copy
a quantum state is one of the characteristics of quantum computation that
makes it significantly different from classical computation.
B.6 Entanglement
B.6.a Entangled and decomposable states
The possibility of entangled quantum states is one of the most remarkable
characteristics distinguishing quantum from classical systems. Suppose that
H′ and H″ are the state spaces of two quantum systems. Then H = H′ ⊗ H″
is the state space of the composite system (Postulate 4). For simplicity,
suppose that both spaces have the basis {|0⟩, |1⟩}. Then H′ ⊗ H″ has the
basis {|00⟩, |01⟩, |10⟩, |11⟩}. (Recall that |01⟩ = |0⟩ ⊗ |1⟩, etc.) Arbitrary
elements of H′ ⊗ H″ can be written in the form
$$\sum_{j,k=0,1} c_{jk}|jk\rangle = \sum_{j,k=0,1} c_{jk}\,|j'\rangle \otimes |k''\rangle.$$
Sometimes the state of the composite system can be written as the tensor
product of the states of the subsystems, |ψ⟩ = |ψ′⟩ ⊗ |ψ″⟩. Such a state is
called a separable, decomposable, or product state. In other cases the state
cannot be decomposed, in which case it is called an entangled state.
For an example of an entangled state, consider the Bell state |β₀₁⟩, which
might arise from a process that produced two particles with opposite spin
(but without determining which is which):
$$|\beta_{01}\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|01\rangle + |10\rangle) \stackrel{\text{def}}{=} |\Phi^+\rangle. \tag{III.2}$$
(The notations |β₀₁⟩ and |Φ⁺⟩ are both used.) Note that the states |01⟩ and
|10⟩ both have probability 1/2. Such a state might arise, for example, from
a process that emits two particles with opposite spin angular momentum in
order to conserve spin angular momentum.
To show that |β₀₁⟩ is entangled, we need to show that it cannot be de-
composed, that is, that we cannot write |β₀₁⟩ = |ψ′⟩ ⊗ |ψ″⟩ for two state
vectors |ψ′⟩ = a₀|0⟩ + a₁|1⟩ and |ψ″⟩ = b₀|0⟩ + b₁|1⟩. Let’s try a separation
or decomposition:
$$|\beta_{01}\rangle \stackrel{?}{=} (a_0|0\rangle + a_1|1\rangle) \otimes (b_0|0\rangle + b_1|1\rangle)
= a_0 b_0|00\rangle + a_0 b_1|01\rangle + a_1 b_0|10\rangle + a_1 b_1|11\rangle.$$
Matching coefficients would require a₀b₁ = a₁b₀ = 1/√2 while a₀b₀ = a₁b₁ = 0, which is impossible: the latter forces one factor in each of the former products to be zero.
In addition to Eq. III.2, the other three Bell states are defined:
$$|\beta_{00}\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|00\rangle + |11\rangle) \stackrel{\text{def}}{=} |\Psi^+\rangle, \tag{III.3}$$
$$|\beta_{10}\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|00\rangle - |11\rangle) \stackrel{\text{def}}{=} |\Psi^-\rangle, \tag{III.4}$$
$$|\beta_{11}\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|01\rangle - |10\rangle) \stackrel{\text{def}}{=} |\Phi^-\rangle. \tag{III.5}$$
The Ψ states have two identical qubits, the Φ states have opposite qubits.
The + superscript indicates they are added, the − that they are subtracted.
The general definition is:
$$|\beta_{xy}\rangle = \frac{1}{\sqrt 2}\left(|0, y\rangle + (-1)^x |1, \neg y\rangle\right).$$
Remember this useful formula! The Bell states are orthogonal and in fact
constitute a basis for H′ ⊗ H″ (exercise).
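A small numpy check (my own sketch) that the general formula generates the four Bell states and that they form an orthonormal basis:

import numpy as np

def basis(b):                        # |b> for b in {0, 1}
    v = np.zeros(2); v[b] = 1; return v

def bell(x, y):                      # (|0,y> + (-1)^x |1,~y>) / sqrt(2)
    return (np.kron(basis(0), basis(y)) +
            (-1) ** x * np.kron(basis(1), basis(1 - y))) / np.sqrt(2)

B = np.column_stack([bell(x, y) for x in (0, 1) for y in (0, 1)])
print(np.allclose(B.T @ B, np.eye(4)))    # orthonormal basis: True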
B.7.a Informally
The uncertainty principle states a lower bound on the precision with which
certain pairs of variables, called conjugate variables, can be measured. These
are such pairs as position and momentum, and energy and time. For example,
the same state can be represented by the wave function ψ(x) as a function
of space and by φ(p) as a function of momentum. The most familiar version
of the Heisenberg principle limits the precision with which location and mo-
mentum can be measured simultaneously: Δx Δp ≥ ℏ/2, where the reduced
Planck constant is ℏ = h/2π and h is Planck’s constant.
It is often supposed that the uncertainty principle is a manifestation of
the observer effect, the inevitable effect that measuring a system has on it,
but this is not the case. “While it is true that measurements in quantum
mechanics cause disturbance to the system being measured, this is most em-
phatically not the content of the uncertainty principle.” (Nielsen & Chuang,
2010, p. 89)
Often the uncertainty principle is a result of the variables representing
measurements in two bases that are Fourier transforms of each other. Con-
sider an audio signal ψ(t) and its Fourier transform Ψ(ω) (its spectrum).
Note that ψ is a function of time, with dimension t, and its spectrum Ψ is a
function of frequency, with dimension t−1 . They are reciprocals of each other,
and that is always the case with Fourier transforms. Simultaneous mea-
surement in the time and frequency domains obeys the uncertainty relation
∆t∆ω ≥ 1/2. (For more details on this, including an intuitive explanation,
see MacLennan (prep, ch. 6).)
Time and energy are also conjugate, as a result of the de Broglie relation,
according to which energy is proportional to frequency: E = hν (ν in Hertz,
or cycles per second) or E = ~ω (ω in radians per second). Therefore simul-
taneous measurement in the time and energy domains obeys the uncertainty
principle ∆t∆E ≥ ~/2.
In general, for conjugate observables P and Q, $2\,\Delta P\,\Delta Q \geq |\langle[P, Q]\rangle|$, as we show next.
B.7.b Formally
In this section we’ll derive the uncertainty principle more formally. Since
it deals with the variances of measurements, we begin with their definition.
To understand the motivation for these definitions, suppose we have a quan-
tum system (such as an atom) that can be in three distinct states |ground⟩,
|first excited⟩, |second excited⟩ with energies e₀, e₁, e₂, respectively. Then the
energy observable is the operator
$$E = e_0|\text{ground}\rangle\langle\text{ground}| + e_1|\text{first excited}\rangle\langle\text{first excited}| + e_2|\text{second excited}\rangle\langle\text{second excited}|.$$
In general, a measurement operator has the form $M = \sum_m e_m P_m$,
where the Pₘ are projectors onto the eigenspaces of M, and the eigenvalues
eₘ are the corresponding measurement results. The projector Pₘ projects
into the eigenspace corresponding to the result eₘ.
Note that E², the matrix E multiplied by itself, is also the operator that
measures the square of the energy, $E^2 = \sum_m e_m^2\,|m\rangle\langle m|$. (This is because E
is diagonal in this basis; alternatively, E² can be interpreted as an operator
function.)
We now proceed to the derivation of the uncertainty principle. The commutator of operators L and M is defined:
$$[L, M] \stackrel{\text{def}}{=} LM - ML. \tag{III.6}$$
Remark B.1 In effect, [L, M] distills out the non-commutative part of the
product of L and M. If the operators commute, then [L, M] = 0, the iden-
tically zero operator. Constant-valued operators always commute (cL = Lc),
and so [c, L] = 0.
The anticommutator of L and M is defined:
$$\{L, M\} \stackrel{\text{def}}{=} LM + ML. \tag{III.7}$$
See B.2.c (p. 82) for the justification of the following definitions. The mean of a measurement M in state |ψ⟩ is ⟨M⟩ = ⟨ψ|M|ψ⟩, and its variance is
$$\mathrm{Var}\{M\} = \langle (M - \langle M\rangle)^2 \rangle = \langle M^2\rangle - \langle M\rangle^2.$$
Let L and M be Hermitian and write ⟨ψ|LM|ψ⟩ = x + iy. Then
$$2x = \langle\psi\,|\,LM\,|\,\psi\rangle + (\langle\psi\,|\,LM\,|\,\psi\rangle)^*
= \langle\psi\,|\,LM\,|\,\psi\rangle + \langle\psi\,|\,M^\dagger L^\dagger\,|\,\psi\rangle
= \langle\psi\,|\,LM\,|\,\psi\rangle + \langle\psi\,|\,ML\,|\,\psi\rangle
= \langle\psi\,|\,\{L, M\}\,|\,\psi\rangle,$$
where we used the fact that L and M are Hermitian. Likewise,
$$2iy = \langle\psi\,|\,LM\,|\,\psi\rangle - \langle\psi\,|\,ML\,|\,\psi\rangle = \langle\psi\,|\,[L, M]\,|\,\psi\rangle.$$
Hence,
$$|\langle\psi\,|\,LM\,|\,\psi\rangle|^2 = x^2 + y^2 = \tfrac14 |\langle\psi\,|\,\{L, M\}\,|\,\psi\rangle|^2 + \tfrac14 |\langle\psi\,|\,[L, M]\,|\,\psi\rangle|^2.$$
Let |λ⟩ = L|ψ⟩ and |µ⟩ = M|ψ⟩. By the Cauchy-Schwarz inequality, ‖|λ⟩‖ ‖|µ⟩‖ ≥
|⟨λ|µ⟩|, and so ⟨λ|λ⟩ ⟨µ|µ⟩ ≥ |⟨λ|µ⟩|². Hence (Prop. B.1),
$$\langle\psi\,|\,L^2\,|\,\psi\rangle\, \langle\psi\,|\,M^2\,|\,\psi\rangle \geq |\langle\psi\,|\,LM\,|\,\psi\rangle|^2.$$
Proposition B.2 Prop. B.1 can be weakened into a more useful form, the
generalized uncertainty principle:
$$2\,\Delta P\,\Delta Q \geq |\langle[P, Q]\rangle|.$$
C Quantum information
C.1 Qubits
C.1.a Single qubits
Just as the bits 0 and 1 are represented by distinct physical states in a conven-
tional computer, so the quantum bits (or qubits) |0⟩ and |1⟩ are represented
by distinct quantum states. We call |0⟩ and |1⟩ the computational or stan-
dard measurement basis. What distinguishes qubits from classical bits is that
they can be in a superposition of states, a₀|0⟩ + a₁|1⟩, for a₀, a₁ ∈ ℂ, where
|a₀|² + |a₁|² = 1. If we measure this state in the computational basis, we will
observe |0⟩ with probability |a₀|², and likewise for |1⟩; after measurement the
qubit is in the observed state. This applies, of course, to measurement in
any basis. I will depict the measurement possibilities this way:
$$a_0|0\rangle + a_1|1\rangle \xrightarrow{\;|a_0|^2\;} |0\rangle, \qquad
a_0|0\rangle + a_1|1\rangle \xrightarrow{\;|a_1|^2\;} |1\rangle.$$
An important alternative basis is the sign basis:
$$|+\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|0\rangle + |1\rangle), \tag{III.8}$$
$$|-\rangle \stackrel{\text{def}}{=} \frac{1}{\sqrt 2}(|0\rangle - |1\rangle). \tag{III.9}$$
Notice that |+⟩ is “halfway” between |0⟩ and |1⟩, and likewise |−⟩ is halfway
between |0⟩ and −|1⟩. Draw them to be sure you see this. As a consequence
(Exer. III.33):
$$|0\rangle = \frac{1}{\sqrt 2}(|+\rangle + |-\rangle), \qquad
|1\rangle = \frac{1}{\sqrt 2}(|+\rangle - |-\rangle).$$
To remember this, think (+x) + (−x) = 0 and (+x) − (−x) = (+2x), which
is nonzero (this is just a mnemonic).
Figure III.6: Quantum key distribution. [from Rieffel & Polak (2000)]

To begin the process of establishing a secret key, Alice sends a sequence of bits to Bob,
encoding each bit in the quantum state of a photon as follows. For each bit, Alice
randomly uses one of two bases for encoding it: the rectilinear basis, with
0 → |↑⟩ and 1 → |→⟩, or the diagonal basis. (Bob likewise measures each photon
in a randomly chosen basis, and Alice and Bob later compare their basis choices over
the classical channel.)
That is, about 50% of the time Eve picks the
same basis as Alice, so she reads the bit correctly and transmits it to Bob
correctly. About 50% of the time Eve guesses the wrong basis. She will
know this, if she is listening in on the classical channel, but she has already
transmitted it to Bob in the wrong basis. If this is a case in which Alice and
Bob used the same basis (and so Bob should get it correct), he will get it
incorrect 50% of the time, since Eve transmitted it in the other basis. So
25% of the bits that should be correct will be wrong. This high error rate
will be apparent to Alice and Bob if they have been using an error-detecting
code for the key. (In effect Eve is introducing significant, detectable noise
into the channel.) Furthermore, Eve’s version of the key will be about 25%
incorrect. Therefore Bob knows that the key was not transmitted securely
and Eve gets an incorrect key.
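The 25% error rate can be illustrated with a rough intercept-resend simulation (a sketch under my own simplifying assumptions: ideal devices, and losses only from basis mismatch):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
bits  = rng.integers(0, 2, n)        # Alice's key bits
a_bas = rng.integers(0, 2, n)        # Alice's bases (0 = rectilinear, 1 = diagonal)
e_bas = rng.integers(0, 2, n)        # Eve's guessed bases
b_bas = rng.integers(0, 2, n)        # Bob's bases

# If Eve's basis differs from Alice's, her result is random; she resends in
# her own basis, so Bob's result is random whenever his basis differs from Eve's.
eve = np.where(e_bas == a_bas, bits, rng.integers(0, 2, n))
bob = np.where(b_bas == e_bas, eve, rng.integers(0, 2, n))

sift = a_bas == b_bas                # Alice and Bob keep only matching-basis bits
err = np.mean(bits[sift] != bob[sift])
print(f"error rate in sifted key: {err:.3f}")   # about 0.25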
This is only the most basic technique, and it has some vulnerabilities, and
so other techniques have been proposed, but they are outside the scope of this
book. “The highest bit rate system currently demonstrated exchanges secure
keys at 1 Mbit/s (over 20 km of optical fibre) and 10 kbit/s (over 100 km
of fibre)”4 “As of March 2007 the longest distance over which quantum key
distribution has been demonstrated using optic fibre is 148.7 km, achieved
by Los Alamos National Laboratory/NIST using the BB84 protocol.” In
Aug. 2015 keys were distributed over a 307 km optical cable, with 12.7 kbps
key generation rate. “The distance record for free space QCD [quantum
key distribution] is 144 km between two of the Canary Islands, achieved
by a European collaboration using entangled photons (the Ekert scheme)
in 2006,[7] and using BB84 enhanced with decoy states[8] in 2007.[9] The
experiments suggest transmission to satellites is possible, due to the lower
atmospheric density at higher altitudes.” At least three companies offer
commercial QKD. “Quantum encryption technology provided by the Swiss
company Id Quantique was used in the Swiss canton (state) of Geneva to
transmit ballot results to the capitol in the national election occurring on
October 21, 2007.” Four QKD networks have been in operation since mid-
late 2000s. Among them,
[t]he world’s first computer network protected by quantum key
distribution was implemented in October 2008, at a scientific
conference in Vienna. The name of this network is SECOQC
(Secure Communication Based on Quantum Cryptography) and
⁴ https://en.wikipedia.org/wiki/Quantum_key_distribution (accessed 12-09-18).
Figure III.9: Left: classical gates. Right: controlled-NOT gate. [from Nielsen
& Chuang (2010, Fig. 1.6): “On the left are some standard single and multiple
bit gates, while on the right is the prototypical multiple qubit gate, the
controlled-NOT. The matrix representation of the controlled-NOT, U_CN, is
written with respect to the amplitudes for |00⟩, |01⟩, |10⟩, and |11⟩, in that
order.”]
C.2 Quantum gates

Quantum gates are analogous to ordinary logic gates (the fundamental build-
ing blocks of circuits), but they must be unitary transformations (see Fig.
III.9, left, for ordinary logic gates). Fortunately, Bennett, Fredkin, and
Toffoli have already shown how all the usual logic operations can be done
reversibly. In this section you will learn the most important quantum gates.

C.2.a Single-qubit gates

The NOT gate is simple because it is reversible: NOT|0⟩ = |1⟩, NOT|1⟩ =
|0⟩. Its desired behavior can be represented:
NOT : |0⟩ ↦ |1⟩,
      |1⟩ ↦ |0⟩.
Note that defining it on a basis defines it on all quantum states. Therefore
it can be written as a sum of dyads (outer products):
NOT = |1⟩⟨0| + |0⟩⟨1|.
You can read this, “return |1⟩ if the input is |0⟩, and return |0⟩ if the input
is |1⟩.” Recall that in the standard basis |0⟩ = (1 0)ᵀ and |1⟩ = (0 1)ᵀ. Hence
the matrix is
$$\mathrm{NOT} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The first column represents the result for |0⟩, which is |1⟩, and the second
represents the result for |1⟩, which is |0⟩.
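A minimal numpy sketch (mine) of NOT as a sum of dyads:

import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

NOT = np.outer(ket1, ket0) + np.outer(ket0, ket1)   # |1><0| + |0><1|
print(NOT)                                          # [[0 1], [1 0]]
print(NOT @ ket0, NOT @ ket1)                       # |1>, |0>

a, b = 0.6, 0.8                                     # |a|^2 + |b|^2 = 1
print(NOT @ (a * ket0 + b * ket1))                  # a|1> + b|0> = [0.8, 0.6]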
Although NOT is defined in terms of the computational basis vectors, it
applies to any qubit, in particular to superpositions of |0⟩ and |1⟩:
NOT(a|0⟩ + b|1⟩) = a|1⟩ + b|0⟩.
The four Pauli matrices are defined:
$$I = \begin{pmatrix}1&0\\0&1\end{pmatrix},\quad
X = \begin{pmatrix}0&1\\1&0\end{pmatrix},\quad
Y = \begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad
Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
We have seen that X is NOT, and I is obviously the identity gate. Z leaves |0⟩
unchanged and maps |1⟩ to −|1⟩. It is called the phase-flip operator because
it flips the phase of the |1⟩ component by π relative to the |0⟩ component.
(Recall that global/absolute phase doesn’t matter.) The Pauli matrices span
the space of 2 × 2 complex matrices (Exer. III.20).
Note that Z|+⟩ = |−⟩ and Z|−⟩ = |+⟩. It is thus the analog in the sign
basis of X (NOT) in the computational basis. What is the effect of Y on the
computational basis vectors? (Exer. III.14)
Note that there is an alternative definition of Y that differs only in global
phase:
$$Y \stackrel{\text{def}}{=} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$
The controlled-NOT gate is defined CNOT|x, y⟩ = |x, y ⊕ x⟩. Its first argument is called the control and its second is called the target,
controlled, or data qubit. It is a simple example of conditional quantum
computation. CNOT can be translated into a sum-of-dyads representation
(Sec. A.2.d), which can be written in matrix form (Ex. III.23, p. 195):
$$\mathrm{CNOT} = |00\rangle\langle00| + |01\rangle\langle01| + |11\rangle\langle10| + |10\rangle\langle11|.$$
Figure III.10: Diagram for CCNOT or Toffoli gate. [fig. from Nielsen &
Chuang (2010)] Sometimes the × is replaced by ⊕ because CCNOT|xyz⟩ =
|x, y, xy ⊕ z⟩.
See the right panel of Fig. III.9 (p. 104) for the matrix of CNOT and note
its diagram notation.
CNOT can be used to produce an entangled state:
$$\mathrm{CNOT}\left[\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)\right]|0\rangle
= \mathrm{CNOT}\,\frac{1}{\sqrt 2}(|00\rangle + |10\rangle)
= \frac{1}{\sqrt 2}(|00\rangle + |11\rangle) = |\beta_{00}\rangle.$$
Note also that CNOT|x, 0⟩ = |x, x⟩, that is, FAN-OUT, which would seem
to violate the No-cloning Theorem, but it works as expected only for x ∈ 2.
In general CNOT|ψ⟩|0⟩ ≠ |ψ⟩|ψ⟩ (Exer. III.24).
Another useful gate is the three-input/output Toffoli gate or controlled-
controlled-NOT. It negates the third qubit if and only if the first two qubits
are both 1. For x, y, z ∈ 2,
$$\mathrm{CCNOT}|1, 1, z\rangle \stackrel{\text{def}}{=} |1, 1, \neg z\rangle, \qquad
\mathrm{CCNOT}|x, y, z\rangle \stackrel{\text{def}}{=} |x, y, z\rangle \text{ otherwise.}$$
That is, CCNOT|x, y, z⟩ = |x, y, xy ⊕ z⟩. All the Boolean operations can be
implemented (reversibly!) by using Toffoli gates (Exer. III.29). For example,
CCNOT|x, y, 0⟩ = |x, y, x ∧ y⟩. Thus it is a universal gate for quantum logic.
In Jan. 2009 CCNOT was implemented successfully using trapped ions.5
⁵ Monz, T.; Kim, K.; Hänsel, W.; Riebe, M.; Villar, A. S.; Schindler, P.; Chwalla, M.;
Hennrich, M. et al. (Jan 2009). “Realization of the Quantum Toffoli Gate with Trapped
Ions.” Phys. Rev. Lett. 102 (4): 040501. arXiv:0804.0082.
Note that “2ⁿ − 1” represents a string of n 1-bits, and that 2 = {0, 1}. Hence,
H^{⊗n}|0⟩^{⊗n} generates an equal superposition of all the 2ⁿ possible values of the
n-qubit register. We often write Wₙ = H^{⊗n} for the Walsh transformation.
A linear operation applied to such a superposition state in effect applies
the operation simultaneously to all 2ⁿ possible input values. This is expo-
nential quantum parallelism and suggests that quantum computation might
be able to solve exponential problems much more efficiently than classical
computers. To see this, suppose U|x⟩ = |f(x)⟩. Then:
" n −1
2X
# n −1
2X n −1
2X
1 1 1
U (H ⊗n |0i⊗n ) = U √ |xi = √ U |xi = √ |f (x)i
2n x=0 2n x=0 2n x=0
This is a superposition of the function values f (x) for all of the 2n possible
values of x; it is computed by one pass through the operator U .
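A small numpy sketch (mine) of the Walsh-Hadamard superposition for n = 3:

import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

n = 3
Wn = reduce(np.kron, [H] * n)             # H tensored with itself n times
zero = np.zeros(2 ** n); zero[0] = 1      # |0...0>
print(Wn @ zero)                          # all 2^n amplitudes equal 1/sqrt(8)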
This sequence of gates has the following sequence of effects on a computational
basis state |a, b⟩:
$$|a, b\rangle \rightarrow |a, a \oplus b\rangle
\rightarrow |a \oplus (a \oplus b), a \oplus b\rangle = |b, a \oplus b\rangle
\rightarrow |b, (a \oplus b) \oplus b\rangle = |b, a\rangle,$$
where all additions are done modulo 2. The effect of the circuit, therefore, is
to interchange the states of the two qubits. (Nielsen & Chuang, 2010, Eq. 1.20)

Figure III.11: Diagram for swap. [from Nielsen & Chuang (2010, Fig. 1.7):
“Circuit swapping two qubits, and an equivalent schematic symbol notation
for this common and useful circuit.”]
C.3 Quantum circuits

A quantum circuit is a sequential series of quantum transformations on a
quantum register. The inputs are usually computational basis (all |0⟩) states
unless stated otherwise. Quantum circuit diagrams are drawn with time go-
ing from left to right, with the quantum gates crossing one or more “wires”
(qubits) as appropriate. The circuit represents a sequence of unitary opera-
tions on a quantum register rather than physical wires.

These “circuits” are different in several respects from ordinary sequential
logic circuits. First, loops (feedback) are not allowed, but you can apply
transforms repeatedly. Second, Fan-In (equivalent to OR) is not allowed,
since it is not reversible or unitary. Fan-Out is also not allowed, because
it would violate the No-cloning Theorem. (N.B.: This does not contradict
the universality of the Toffoli or Fredkin gates, which are universal only with
respect to logical or classical states.)

Fig. III.9 (right) on page 104 shows the symbol for CNOT and its effect.
The swap operation is defined |xy⟩ ↦ |yx⟩, or explicitly
$$\mathrm{SWAP} = \sum_{x, y \in 2} |yx\rangle\langle xy|.$$
We can put three CNOTs in series to swap two qubits (Exer. III.39). Swap
has a special symbol as shown in Fig. III.11.
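A numpy check (my own sketch) that three CNOTs implement SWAP:

import numpy as np

CNOT12 = np.array([[1, 0, 0, 0],    # control = qubit 1, target = qubit 2
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]])
CNOT21 = np.array([[1, 0, 0, 0],    # control = qubit 2, target = qubit 1
                   [0, 0, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]])
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
print(np.allclose(CNOT12 @ CNOT21 @ CNOT12, SWAP))   # True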
In general, any unitary operator U (on any number of qubits) can be
conditionally controlled (see Fig. III.12); this is the quantum analogue of
an if-then statement. If the control bit is 0, this operation does nothing;
otherwise it does U. This is implemented by |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ U. Effectively,
the operators are entangled.
Suppose the control bit is in a superposition, |χ⟩ = a|0⟩ + b|1⟩. By linearity,
the effect is a superposition of doing nothing and applying U:
$$(|0\rangle\langle0| \otimes I + |1\rangle\langle1| \otimes U)(|\chi\rangle \otimes |\psi\rangle)
= a|0\rangle \otimes |\psi\rangle + b|1\rangle \otimes U|\psi\rangle.$$

Figure III.12: Diagram for controlled-U. [from Nielsen & Chuang (2010)]

Figure III.13: Quantum circuit to create Bell states, and its input–output
quantum “truth table.” [from Nielsen & Chuang (2010, Fig. 1.12)]

In | Out
|00⟩ | (|00⟩ + |11⟩)/√2 ≡ |β₀₀⟩
|01⟩ | (|01⟩ + |10⟩)/√2 ≡ |β₀₁⟩
|10⟩ | (|00⟩ − |11⟩)/√2 ≡ |β₁₀⟩
|11⟩ | (|01⟩ − |10⟩)/√2 ≡ |β₁₁⟩

Another important operation is measurement, which is represented by a
‘meter’ symbol. It converts a single qubit state |ψ⟩ = α|0⟩ + β|1⟩ into a
probabilistic classical bit M (distinguished from a qubit by drawing it as a
double-line wire), which is 0 with probability |α|² or 1 with probability |β|².
Figure III.15: Quantum circuit for a 1-bit full adder. [from Rieffel & Polak
(2000)] “x and y are the data bits, s is their sum (modulo 2), c is the
incoming carry bit, and c′ is the new carry bit.” Vedral, Barenco and Ekert
(1996) define more complex circuits that include in-place addition and
modular addition.
The Fredkin gate is a “controlled swap” and can be defined as
$$F = |0\rangle\langle0| \otimes I \otimes I + |1\rangle\langle1| \otimes S,$$
where S is the swap operation
$$S = |00\rangle\langle00| + |01\rangle\langle10| + |10\rangle\langle01| + |11\rangle\langle11|.$$
The reader can verify that F, like T, is complete for combinatorial circuits.
[Figure: garbage-free computation of f. The gate array Φ computes f(x) and
garbage g(x) from x and the workspace register; CNOTs copy f(x) into the
target register y; Φ⁻¹ then uncomputes f(x) and g(x), restoring the workspace
and result registers.]
The result f (x) is in the result register and the garbage g(x) is in the
workspace register. Notice that x and y (data and target) are passed through.
Now use CNOTs between corresponding places in the result and target reg-
isters to compute y ⊕ f (x), where ⊕ represents bitwise exclusive-or, in the
target register. Thus we have computed (x, 0, 0, y) ↦ (x, g(x), f(x), y ⊕ f(x)).
Now we uncompute with Φ−1 , but since the data and target registers are
passed through, we get (x, 0, 0, y ⊕ f (x)) in the registers. We have restored
the data, workspace, and result registers to their initial values and have
y ⊕ f (x) in the target register. Ignoring the result and workspace registers,
we write
(x, y) ↦ (x, y ⊕ f(x)).
This is the standard approach we will use for embedding a classical compu-
tation in a quantum computation.
Therefore, for any computable f : 2^m → 2^n, there is a reversible quantum
gate array U_f : H^{m+n} → H^{m+n} such that for x ∈ 2^m and y ∈ 2^n,
$$U_f|x, y\rangle = |x, y \oplus f(x)\rangle.$$
See Fig. III.17. In particular, U_f|x, 0⟩ = |x, f(x)⟩. The first m qubits are
called the data register and the last n are called the target register.
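Here is a sketch (assumptions mine: a toy f and small registers) of how such a U_f can be constructed classically as a permutation matrix and applied to a superposition of inputs:

import numpy as np
from functools import reduce

def U_f(f, m, n):
    """Permutation matrix on m+n qubits sending |x,y> to |x, y XOR f(x)>."""
    d = 2 ** (m + n)
    U = np.zeros((d, d))
    for x in range(2 ** m):
        for y in range(2 ** n):
            U[(x << n) | (y ^ f(x)), (x << n) | y] = 1
    return U

f = lambda x: x % 2              # an arbitrary example function, m = 2, n = 1
U = U_f(f, 2, 1)
print(np.allclose(U @ U, np.eye(8)))     # U_f is its own inverse: True

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
W = reduce(np.kron, [H, H, np.eye(2)])   # Hadamards on the data register only
psi = W @ np.eye(8)[0]                   # equal superposition of |x, 0>
print(U @ psi)                           # amplitude 1/2 on each |x, f(x)>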
Figure III.17: Computation of a function by a quantum gate array. (Rieffel
& Polak, 2000)

As Rieffel & Polak explain, quantum gate arrays exist for any classically
computable function; indeed, it is possible to conceive of a universal quantum
Turing machine (Bernstein & Vazirani 1997). U_f is an (m + k)-bit transfor-
mation of the form U_f : |x, y⟩ → |x, y ⊕ f(x)⟩, where ⊕ denotes bitwise
exclusive-OR; gate arrays U_f defined in this way are unitary for any function
f. To compute f(x) we apply U_f to |x⟩ tensored with k zeros, |x, 0⟩. Since
f(x) ⊕ f(x) = 0, we have U_f U_f = I. While the Toffoli and Fredkin gates are
complete for combinatorial circuits, they cannot achieve arbitrary quantum
state transformations. In order to realize arbitrary unitary transformations
(up to a constant phase factor, which has no physical significance), single-
qubit rotations of the forms
$$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} e^{i\alpha} & 0 \\ 0 & e^{-i\alpha} \end{pmatrix},
\qquad 0 \le \alpha \le 2\pi,$$
must be added: Barenco et al. (1995) show that CNOT together with all
one-qubit quantum gates is a universal gate set.

C.5 Quantum parallelism

Since U_f is linear, if it is applied to a superposition of bit strings, it will
produce a superposition of the results of applying f to them in parallel (i.e.,
in the same time it takes to compute it on one input):
$$U_f(c_1|x_1\rangle + c_2|x_2\rangle + \cdots + c_k|x_k\rangle)
= c_1 U_f|x_1\rangle + c_2 U_f|x_2\rangle + \cdots + c_k U_f|x_k\rangle.$$
For example, if we have a superposition of the inputs x₁ and x₂:
$$U_f\left[\left(\frac{\sqrt 3}{2}|x_1\rangle + \frac12 |x_2\rangle\right) \otimes |0\rangle\right]
= \frac{\sqrt 3}{2}|x_1, f(x_1)\rangle + \frac12 |x_2, f(x_2)\rangle.$$
The amplitude of a result y will be the sum of the amplitudes of all x such
that y = f(x).
If we apply U_f to a superposition of all possible 2^m inputs, it will compute
a superposition of all the corresponding outputs in parallel (i.e., in the same
time as required for one function evaluation)! The Walsh-Hadamard trans-
formation can be used to produce this superposition of all possible inputs:
$$W_m|00\ldots0\rangle
= \frac{1}{\sqrt{2^m}}\big(|00\ldots0\rangle + |00\ldots1\rangle + \cdots + |11\ldots1\rangle\big)
= \frac{1}{\sqrt{2^m}}\sum_{x \in 2^m}|x\rangle
= \frac{1}{\sqrt{2^m}}\sum_{x=0}^{2^m-1}|x\rangle.$$
In the last line we are obviously interpreting the bit strings as natural num-
bers. Hence,
$$U_f W_m|0\rangle
= U_f\left[\frac{1}{\sqrt{2^m}}\sum_{x=0}^{2^m-1}|x, 0\rangle\right]
= \frac{1}{\sqrt{2^m}}\sum_{x=0}^{2^m-1}U_f|x, 0\rangle
= \frac{1}{\sqrt{2^m}}\sum_{x=0}^{2^m-1}|x, f(x)\rangle.$$
C.6 Applications

C.6.a Superdense coding

We will consider a couple of simple applications of these ideas. The first is called
superdense coding or (more modestly) dense coding, since it is a method by
which one quantum particle can be used to transmit two classical bits of
information. It was described by Bennett and Wiesner in 1992, and was
partially validated experimentally by 1998.

Figure III.18: Superdense coding. (Rieffel & Polak, 2000) Alice receives two
classical bits, encoding the numbers 0 through 3. Depending on this number,
Alice performs one of the transformations {I, X, Y, Z} on her qubit of the
entangled pair ψ₀. (Transforming just one qubit of an entangled pair means
performing the identity transformation on the other qubit.) The resulting
state is shown in the table. Alice then sends her qubit to Bob, and Bob
applies a controlled-NOT to the two qubits of the entangled pair.

Value | Transformation | New state
0 | ψ₀ = (I ⊗ I)ψ₀ | (1/√2)(|00⟩ + |11⟩)
1 | ψ₁ = (X ⊗ I)ψ₀ | (1/√2)(|10⟩ + |01⟩)
2 | ψ₂ = (Y ⊗ I)ψ₀ | (1/√2)(−|10⟩ + |01⟩)
3 | ψ₃ = (Z ⊗ I)ψ₀ | (1/√2)(|00⟩ − |11⟩)
Here is the idea. Alice and Bob share an entangled pair of qubits. To
transmit two bits of information, Alice applies one of four transformations
to her qubit. She then sends her qubit to Bob, who can apply an operation
to the entangled pair to determine which of the four transformations she
applied, and hence recover the two bits of information.
Now let’s work it through more carefully. Suppose Alice and Bob share
the entangled pair |β₀₀⟩ = (1/√2)(|00⟩ + |11⟩). Since the four Bell states are a
basis for the quantum state of the pair of qubits, Alice’s two bits of infor-
mation can be encoded as one of the four Bell states. For example, Alice
can use the state |β_zx⟩ to encode the bits z, x (the correspondence is ar-
bitrary so long as we are consistent, but this one is easy to remember).
Recall the circuit for generating Bell states (Fig. III.13, p. 112). Its effect
is CNOT(H ⊗ I)|zx⟩ = |β_zx⟩. This cannot be used by Alice for generating
the Bell states, because she doesn’t have access to Bob’s qubit. However,
the Bell states differ from each other only in the relative parity and phase
of their component qubits (i.e., whether they have the same or opposite bit
118 CHAPTER III. QUANTUM COMPUTATION
values and the same or opposite signs). Therefore, Alice can alter the par-
ity and phase of just her qubit to transform the entangled pair into any of
the Bell states. In particular, if she uses zx to select I, X, Z, or ZX = Y
(corresponding to zx = 00, 01, 10, 11 respectively) and applies it to just her
qubit, she can generate the corresponding Bell state |β_zx⟩. I’ve picked this
correspondence because of the simple relation between the bits z, x and the
application of the operators Z, X, but this is not necessary; any other 1-1
correspondence between the two bits and the four operators could be used.
When Alice applies this transformation to her qubit, Bob’s qubit is unaf-
fected, and so the transformation on the entangled pair is I ⊗ I, X ⊗ I,
Z ⊗ I, or ZX ⊗ I. We can check the results as follows:
$$(H \otimes I)\,\mathrm{CNOT}\,|\beta_{zx}\rangle = |zx\rangle.$$
This translates the Bell basis into the computational basis, so Bob can mea-
sure the bits exactly.
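A numpy verification (my own sketch) of the encoding and decoding for all four bit pairs:

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

beta00 = np.array([1, 0, 0, 1]) / np.sqrt(2)
ops = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): Z @ X}

for (z, x), A in ops.items():
    encoded = np.kron(A, I) @ beta00             # Alice transforms her qubit
    decoded = np.kron(H, I) @ CNOT @ encoded     # Bob decodes
    print((z, x), np.round(decoded, 3))          # the basis vector |zx>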
Finally, Bob measures the resulting bit which allows him to distinguish between 0 and
3, and 1 and 2.
C.6.b Quantum teleportation

The objective of teleportation is to transmit the quantum state of a particle
using classical bits and reconstruct the exact quantum state at the receiver.
Since a quantum state cannot be copied, the quantum state of the given
particle will necessarily be destroyed. Single-bit teleportation has been real-
ized experimentally (Bouwmeester et al. 1997; Nielsen et al. 1998; Boschi et
al. 1998).

Figure III.19: Quantum teleportation. (Rieffel & Polak, 2000) Alice has a
qubit whose state φ = a|0⟩ + b|1⟩ she doesn’t know. She wants to send the
state of this qubit to Bob through classical channels. As with dense coding,
Alice and Bob each possess one qubit of an entangled pair ψ₀ = (1/√2)(|00⟩ + |11⟩).
This is because H|0⟩ = |+⟩ = (1/√2)(|0⟩ + |1⟩) and H|1⟩ = |−⟩ = (1/√2)(|0⟩ − |1⟩).
Rearranging and factoring to separate Alice’s qubits from Bob’s, we have:
$$|\psi_2\rangle = \frac12\big[|00\rangle(a|0\rangle + b|1\rangle) + |01\rangle(a|1\rangle + b|0\rangle)
+ |10\rangle(a|0\rangle - b|1\rangle) + |11\rangle(a|1\rangle - b|0\rangle)\big].$$
Thus the unknown amplitudes have been transferred from the first qubit
(Alice’s) to the third (Bob’s), which now incorporates the amplitudes a and
b, but in different ways depending on the first two qubits. In fact you can
see that the amplitudes are transformed by the Pauli matrices, and Bob can
restore the quantum state by applying the correct Pauli matrix. Therefore
Alice measures the first two bits (completing measurement in the Bell basis)
and sends them to Bob over the classical channel. This measurement par-
tially collapses the state, which includes Bob’s qubit, but in a way that is
determined by the first two qubits.
When Bob receives the two classical bits from Alice, he uses them to
select a transformation for his qubit, which restores the amplitudes to the
correct basis vectors. These transformations are the Pauli matrices (which
are their own inverses); for measured bits 00, 01, 10, 11, Bob applies I, X, Z, ZX, respectively.
Figure III.21: Circuit for quantum teleportation. [from Nielsen & Chuang
(2010, Fig. 1.13): “The two top lines represent Alice’s system, while the
bottom line is Bob’s system. The meters represent measurement, and the
double lines coming out of them carry classical bits (recall that single lines
denote qubits).”]

Note that teleportation does not allow faster-than-light communication,
since Alice has to transmit her two classical bits to Bob. Entangled states
can be teleported in a similar way. Free-space quantum teleportation has
been demonstrated over 143 km between two of the Canary Islands (Nature,
13 Sept. 2012).⁶ In Sept. 2015 teleportation was achieved over 101 km through
supercooled nanowire. For teleporting material systems, the current record
is 21 m.
C.7 Universal quantum gates

We have seen several interesting examples of quantum computing using gates
such as CNOT and the Hadamard and Pauli operators.⁷ Since the imple-
mentation of each of these is a technical challenge, it raises the important
question: What gates are sufficient for implementing any quantum compu-
tation?

Both the Fredkin (controlled swap) and Toffoli (controlled-controlled-
NOT) gates are sufficient for classical logic circuits. In fact, they can op-
erate as well on qubits in superposition. But what about other quantum
operators?

It can be proved that single-qubit unitary operators can be approximated
arbitrarily closely by the Hadamard gate and the T (π/8) gate, which is

⁶ http://www.nature.com/nature/journal/v489/n7415/full/nature11472.html (accessed 12-09-18).
⁷ This lecture follows Nielsen & Chuang (2010, §4.5).
defined:
$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}
\cong \begin{pmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{pmatrix} \tag{III.19}$$
(ignoring global phase). To approximate an arbitrary single-qubit unitary
operation to within ε, you need O(log^c(1/ε)) gates, where c ≈ 2. For an m-gate circuit
(of CNOTs and single-qubit unitaries) and an accuracy of ε, O(m log^c(m/ε))
gates are needed, where c ≈ 2 (Solovay-Kitaev theorem).
A two-level operation is a unitary operator on a d-dimensional Hilbert
space (where d = 2ⁿ for n qubits) that non-trivially affects only two basis
states out of the d. It can be proved that any two-level unitary operation can
be computed by a combination of CNOTs and single-qubit operations; this
requires O(n²) single-qubit and CNOT gates.
It also can be proved that an arbitrary d-dimensional unitary matrix
can be decomposed into a product of two-level unitary matrices. At most
d(d − 1)/2 of them are required. Therefore a unitary operator on an n-qubit
system requires at most 2^{n−1}(2ⁿ − 1) two-level matrices.
In conclusion, the H (Hadamard), CNOT, and π/8 gates are sufficient
for quantum computation. For fault-tolerance, either the standard set —
H (Hadamard), CNOT, π/8, and S (phase) — can be used, or H, CNOT,
Toffoli, and S. The latter phase gate is defined:
$$S = T^2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}. \tag{III.20}$$
Recall that U_f|x⟩|0⟩ = |x⟩|f(x)⟩ and U_f|x⟩|1⟩ = |x⟩|¬f(x)⟩. Usually we set y = 0 to get the result
|f(x)⟩, but here you will see an application in which we want y = 1.
Now consider the result of applying U_f to |x⟩ in the data register and to
the superposition |−⟩ = (1/√2)(|0⟩ − |1⟩) in the target register:
$$U_f|x\rangle|-\rangle = \frac{1}{\sqrt 2}|x\rangle|f(x)\rangle - \frac{1}{\sqrt 2}|x\rangle|\neg f(x)\rangle
= \frac{1}{\sqrt 2}|x\rangle\big[|f(x)\rangle - |\neg f(x)\rangle\big].$$
Now the rightmost square bracket is |0⟩ − |1⟩ if f(x) = 0, or |1⟩ − |0⟩ if
f(x) = 1. Therefore, we can write
$$U_f|x\rangle|-\rangle = \frac{1}{\sqrt 2}|x\rangle(-)^{f(x)}(|0\rangle - |1\rangle)
= (-)^{f(x)}|x\rangle|-\rangle. \tag{III.21}$$
[Here, (−)^x is an abbreviation for (−1)^x when we want to emphasize that
the sign is all that matters.] Since U_f|x⟩|−⟩ = (−)^{f(x)}|x⟩|−⟩, the result of
applying it to an equal superposition of x = 0 and x = 1 is:
$$\frac{1}{\sqrt 2}\sum_{x \in 2} U_f|x\rangle|-\rangle = \frac{1}{\sqrt 2}\sum_{x \in 2} (-)^{f(x)}|x\rangle|-\rangle.$$
If f is a constant function, then f(0) = f(1), and the summation is ±(1/√2)(|0⟩ +
|1⟩)|−⟩ = ±|+⟩|−⟩, because both components have the same sign. On the
other hand, if f(0) ≠ f(1), then the summation is ±(1/√2)(|0⟩ − |1⟩)|−⟩ =
±|−⟩|−⟩, because the components have opposite signs. That is, a constant
function gives the |0⟩ and |1⟩ components of the data qubit the same phase,
and otherwise gives them opposite phases. Therefore, we can determine
whether the function is constant or not by measuring the first qubit in the sign
basis; we get |+⟩ if f(0) = f(1) and |−⟩ otherwise. With this background,
we can state Deutsch’s algorithm.
algorithm Deutsch:
Initial state: Begin with the qubits |ψ₀⟩ ≝ |01⟩.
Superposition: Transform the state to |ψ₁⟩ ≝ |+−⟩ by a pair of Hadamard
gates. Recall that H|0⟩ = (1/√2)(|0⟩ + |1⟩) = |+⟩ and H|1⟩ = (1/√2)(|0⟩ − |1⟩) = |−⟩.
$$\begin{aligned}
|\psi_2\rangle \stackrel{\text{def}}{=} U_f|\psi_1\rangle
&= U_f\left[\frac{1}{\sqrt 2}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt 2}(|0\rangle - |1\rangle)\right] \\
&= \frac12\big[U_f|00\rangle - U_f|01\rangle + U_f|10\rangle - U_f|11\rangle\big] \\
&= \frac12\big[|0, f(0)\rangle - |0, \neg f(0)\rangle + |1, f(1)\rangle - |1, \neg f(1)\rangle\big].
\end{aligned}$$
There are two cases: f(0) = f(1) and f(0) ≠ f(1).
If f(0) = f(1):
$$\begin{aligned}
|\psi_2\rangle
&= \frac12\big[|0, f(0)\rangle - |0, \neg f(0)\rangle + |1, f(0)\rangle - |1, \neg f(0)\rangle\big] \\
&= \frac12\big[|0\rangle(|f(0)\rangle - |\neg f(0)\rangle) + |1\rangle(|f(0)\rangle - |\neg f(0)\rangle)\big] \\
&= \frac12(|0\rangle + |1\rangle)(|f(0)\rangle - |\neg f(0)\rangle) \\
&= \pm\frac12(|0\rangle + |1\rangle)(|0\rangle - |1\rangle) \\
&= \pm\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)|-\rangle \\
&= |{+}{-}\rangle.
\end{aligned}$$
The last line applies because global phase (including ±) doesn’t matter.
If f(0) ≠ f(1), then ¬f(0) = f(1), and:
$$\begin{aligned}
|\psi_2\rangle
&= \frac12\big[|0, f(0)\rangle - |0, \neg f(0)\rangle + |1, \neg f(0)\rangle - |1, f(0)\rangle\big] \\
&= \frac12\big[|0\rangle(|f(0)\rangle - |\neg f(0)\rangle) + |1\rangle(|\neg f(0)\rangle - |f(0)\rangle)\big] \\
&= \frac12\big[|0\rangle(|f(0)\rangle - |\neg f(0)\rangle) - |1\rangle(|f(0)\rangle - |\neg f(0)\rangle)\big] \\
&= \frac12(|0\rangle - |1\rangle)(|f(0)\rangle - |\neg f(0)\rangle) \\
&= \pm\frac12(|0\rangle - |1\rangle)(|0\rangle - |1\rangle) \\
&= \pm\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)|-\rangle \\
&= |{-}{-}\rangle.
\end{aligned}$$
Clearly we can discriminate between the two cases by measuring the first
qubit in the sign basis. In particular, note that in the unequal case, the |1i
component has the opposite phase from the |0i component.
Notice that the information we need is in the data register, not the target
register. This technique is called phase kick-back (i.e., kicked back into the
phase of the data register).
In conclusion, we can determine whether or not f (0) = f (1) with a single
evaluation of f , which is quite remarkable. In effect, we are evaluating f on
a superposition of |0i and |1i and determining how the results interfere with
each other. As a result we get a definite (not probabilistic) determination of
a global property with a single evaluation.
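The algorithm is small enough to simulate directly. Here is a Python/numpy sketch (the helper names are mine): it builds U_f for each of the four functions f : 2 → 2, prepares |01⟩, applies H ⊗ H and then U_f, and finally a Hadamard on the data qubit (equivalent to measuring it in the sign basis). The probability of reading 1 is 0 for the constant functions and 1 for the balanced ones:

import numpy as np
from itertools import product

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def U_f(f):
    """Oracle |x,y> -> |x, y XOR f(x)> as a 4x4 permutation matrix."""
    U = np.zeros((4, 4))
    for x, y in product((0, 1), repeat=2):
        U[2*x + (y ^ f(x)), 2*x + y] = 1
    return U

for table in product((0, 1), repeat=2):     # the four functions f: 2 -> 2
    f = lambda x, t=table: t[x]
    psi = np.zeros(4); psi[0b01] = 1        # |psi0> = |01>
    psi = np.kron(H, H) @ psi               # |psi1> = |+->
    psi = U_f(f) @ psi                      # |psi2>: phase kick-back
    psi = np.kron(H, I2) @ psi              # sign-basis measurement of data qubit
    p1 = np.linalg.norm(psi[2:])**2         # P(data qubit reads 1)
    kind = "constant" if table[0] == table[1] else "balanced"
    print(table, kind, "P(1) =", round(p1, 6))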
This is a clear example where a quantum computer can do something faster than a classical computer. However, note that U_f has to uncompute f, which takes as much time as computing f itself, so the speedup here is only about 2×; we will see other cases (Deutsch-Jozsa) where the speedup is much greater.

Figure III.23: Quantum circuit for the Deutsch-Jozsa algorithm; the wire with a "/" through it represents a register of n qubits. [fig. from NC]
D.1.b The Deutsch-Jozsa algorithm

The Deutsch-Jozsa algorithm is a generalization of the Deutsch algorithm to n bits, which Deutsch and Jozsa published in 1992; here we present the improved version of Nielsen & Chuang (2010, p. 59).

This is the problem: Suppose we are given an unknown function f : 2^n → 2 in the form of a unitary transform U_f ∈ L(H^{n+1}) (Fig. III.23). We are told only that f is either constant or balanced, which means that it is 0 on half its domain and 1 on the other half. Our task is to determine into which class the given f falls.

Consider first the classical situation. We can try different input bit strings x. We might (if we're lucky) discover after the second query of f that it is not constant. But we might require as many as 2^n/2 + 1 queries to answer the question. So we're facing O(2^{n−1}) function evaluations.
algorithm Deutsch-Jozsa:

Initial state: As in the Deutsch algorithm, prepare the initial state |ψ_0⟩ ≝ |0⟩^{⊗n}|1⟩.

Superposition: Apply Hadamard gates to all n + 1 qubits: |ψ_1⟩ ≝ H^{⊗(n+1)}|ψ_0⟩ = (1/√(2^n)) Σ_{x∈2^n} |x⟩|−⟩.

Function application: As in the Deutsch algorithm, U_f kicks the value of f back into the phase of the data register: |ψ_2⟩ ≝ U_f|ψ_1⟩ = (1/√(2^n)) Σ_{x∈2^n} (−)^{f(x)}|x⟩|−⟩.

Walsh-Hadamard transform: Recall the very useful identity
$$ H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{z\in 2^n}(-)^{x\cdot z}|z\rangle, \quad\text{(III.24)} $$
where x · z is the bitwise inner product of x and z, modulo 2. Applying H^{⊗n} to the data register yields
$$ |\psi_3\rangle \stackrel{\text{def}}{=} (H^{\otimes n}\otimes I)\,|\psi_2\rangle = \sum_{z\in 2^n}\sum_{x\in 2^n}\frac{(-)^{x\cdot z + f(x)}}{2^n}\,|z\rangle|-\rangle. $$
Measurement: Consider the first n qubits and the amplitude of one particular basis state, z = |0⟩ = |0⟩^{⊗n}, which we separate from the rest of the summation:
$$ |\psi_3\rangle = \sum_{x\in 2^n}\frac{(-)^{f(x)}}{2^n}\,|0\rangle|-\rangle \;+\; \sum_{z\in 2^n\setminus\{0\}}\;\sum_{x\in 2^n}\frac{(-)^{x\cdot z + f(x)}}{2^n}\,|z\rangle|-\rangle. $$
Hence, the amplitude of |0⟩^{⊗n}, the all-|0⟩ qubit string, is Σ_{x∈2^n} (−)^{f(x)}/2^n.
Recall how in the basic Deutsch algorithm we got a sum of signs (either all
the same or not) for all the function evaluations. The result is similar here,
but we have 2^n values rather than just two. We now have two cases: If f is constant, the amplitude of |0⟩^{⊗n} is ±1, so all the other amplitudes must be zero, and measuring the data register yields all 0s with certainty. If f is balanced, the positive and negative contributions cancel, so the amplitude of |0⟩^{⊗n} is zero, and a measurement must yield a 1 on at least one qubit of the data register.
The good news about the Deutsch-Jozsa algorithm is that with one quantum function evaluation we get a result that would require between 2 and O(2^{n−1}) classical function evaluations (exponential speedup!). The bad news is that the algorithm has no known applications! Even if it were useful, however, the problem could be solved probabilistically on a classical computer with only a few evaluations of f: for an error probability of ε, it takes only O(log ε^{−1}) function evaluations. Nevertheless, the algorithm illustrates principles of quantum computing that are used in more useful algorithms.
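The n-bit case can be checked the same way. In the sketch below (an assumption of the sketch: only the data register is kept, since the |−⟩ target factors out once the phase kick-back (−1)^{f(x)} has been applied), the probability of measuring all zeros comes out 1 for a constant f and 0 for a balanced f:

import numpy as np

n, N = 4, 2 ** 4
rng = np.random.default_rng(1)

def walsh(n):
    """W = H^{(x)n}, built as an n-fold Kronecker product."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    W = np.array([[1.0]])
    for _ in range(n):
        W = np.kron(W, H)
    return W

def psi3(f_table):
    # Data register after phase kick-back: (-1)^f(x) / sqrt(2^n), then H^{(x)n}
    psi2 = np.array([(-1) ** f for f in f_table]) / np.sqrt(N)
    return walsh(n) @ psi2

f_const = [1] * N
f_bal = [0] * (N // 2) + [1] * (N // 2)
rng.shuffle(f_bal)

for name, f in (("constant", f_const), ("balanced", f_bal)):
    print(name, "P(all zeros) =", round(abs(psi3(f)[0]) ** 2, 6))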
algorithm Simon: ^9

9 The following presentation follows Mermin's Quantum Computer Science (Mermin, 2007, §2.5, pp. 55–8).
(This is the general expression for the Walsh transform of a bit string: the phase depends on the number of common 1-bits.) Therefore,
$$\begin{aligned}
H^{\otimes n}|\psi_3\rangle
&= \frac{1}{\sqrt 2}\left[\frac{1}{2^{n/2}}\sum_{y\in 2^n}(-1)^{x_0\cdot y}|y\rangle + \frac{1}{2^{n/2}}\sum_{y\in 2^n}(-1)^{(x_0+r)\cdot y}|y\rangle\right] \\
&= \frac{1}{2^{(n+1)/2}}\sum_{y\in 2^n}\left[(-1)^{x_0\cdot y} + (-1)^{(x_0+r)\cdot y}\right]|y\rangle.
\end{aligned}$$
Note that
$$ (-1)^{(x_0+r)\cdot y} = (-1)^{x_0\cdot y}\,(-1)^{r\cdot y}. $$
Therefore, if r · y = 1, then the bracketed expression is 0 (since the terms have opposite sign and cancel). However, if r · y = 0, then the bracketed expression is 2(−1)^{x_0·y} (since they don't cancel). Hence the result of the Walsh-Hadamard transform is
$$ |\psi_4\rangle = H^{\otimes n}|\psi_3\rangle = \frac{1}{2^{(n-1)/2}}\sum_{y \;\text{s.t.}\; r\cdot y = 0}(-1)^{x_0\cdot y}\,|y\rangle. $$
First equation: Since we know y^{(1)}, this gives us some information about r, expressed in the equation
$$ y_1^{(1)} r_1 + y_2^{(1)} r_2 + \cdots + y_n^{(1)} r_n = 0 \pmod 2. $$
(Mermin, 2007, p. 57) Note that the “extra” evaluations are independent of
n. Therefore Simon’s problem can be solved in linear time on a quantum
computer, but requires exponential time on a classical computer.
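The classical post-processing — recovering r from a collection of measured strings y with r · y = 0 — is ordinary linear algebra over GF(2). A minimal Python sketch (the measured strings and the hidden r = 110 here are hypothetical):

import numpy as np

def solve_simon(ys, n):
    """Find a nonzero r with y . r = 0 (mod 2) for all measured rows y,
    by Gaussian elimination over GF(2)."""
    A = np.array(ys, dtype=np.uint8) % 2
    pivots, row = [], 0
    for col in range(n):                  # reduced row-echelon form mod 2
        hit = next((r for r in range(row, len(A)) if A[r, col]), None)
        if hit is None:
            continue
        A[[row, hit]] = A[[hit, row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]
        pivots.append(col); row += 1
    free = [c for c in range(n) if c not in pivots]  # nonempty: rank <= n-1
    r_vec = np.zeros(n, dtype=np.uint8)
    r_vec[free[0]] = 1                    # set one free variable
    for i, col in enumerate(pivots):      # back-substitute the pivot variables
        r_vec[col] = A[i, free].dot(r_vec[free]) % 2
    return r_vec

ys = [[0, 0, 1], [1, 1, 1], [1, 1, 0]]    # each satisfies y . (1,1,0) = 0 mod 2
print(solve_simon(ys, 3))                 # -> [1 1 0]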
The widely used RSA public-key cryptography system is based on the difficulty of factoring large numbers.^{10} The best classical algorithms are nearly exponential in the size of the input, m = ln M. Specifically, the best current (2006) algorithm (the number field sieve) runs in time e^{O(m^{1/3} ln^{2/3} m)}. This is subexponential but still very inefficient. Shor's quantum algorithm is in bounded-error quantum polynomial time (BQP); specifically, it is O(m³). Shor's algorithm was invented in 1994, inspired by Simon's algorithm.
Shor’s algorithm reduces factoring to finding the period of a function.
The connection between factoring and period finding can be understood as
follows. Assume you are trying to factor M. Suppose you can find x such that x² = 1 (mod M). Then x² − 1 = 0 (mod M), and therefore (x + 1)(x − 1) = 0 (mod M). Therefore both x + 1 and x − 1 have common factors with M (except in the trivial case x = 1, and so long as neither is a multiple of M).

Now pick an a that is coprime (relatively prime) to M. If a^r = 1 (mod M) and r happens to be even, we're done, since we can take x = a^{r/2} and find a factor of M as explained above. (The smallest such r is called the order of a.) This r is the period of a^x (mod M), since a^{x+r} = a^x a^r = a^x (mod M).
In summary, if we can find the order of an appropriate a and it is even,
then we can probably factor the number. To accomplish this, we need to
find the period of ax (mod M ), which can be determined through a Fourier
transform.
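For numbers this small the whole reduction can be run classically; the sketch below uses brute force in place of the quantum period finder and recovers 21 = 3 × 7 from the a = 11, M = 21 example used below:

from math import gcd

def order(a, M):
    """Smallest r > 0 with a^r = 1 (mod M); brute force stands in here
    for the quantum period-finding step."""
    x, r = a % M, 1
    while x != 1:
        x, r = (x * a) % M, r + 1
    return r

M, a = 21, 11
assert gcd(a, M) == 1
r = order(a, M)            # r = 6, which is even
x = pow(a, r // 2, M)      # x = 11^3 mod 21 = 8, and x^2 = 1 (mod 21)
print(r, gcd(x - 1, M), gcd(x + 1, M))    # 6 7 3, i.e., 21 = 7 * 3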
Like the classical Fourier transform, the quantum Fourier transform puts
all the amplitude of the function into multiples of the frequency (reciprocal
period). Therefore, measuring the state yields the period with high proba-
bility.
10 This section is based primarily on Rieffel & Polak (2000).
Therefore let
$$ F \;\stackrel{\text{def}}{=}\; \frac{1}{\sqrt N}\begin{pmatrix} u_0^\dagger \\ u_1^\dagger \\ u_2^\dagger \\ \vdots \\ u_{N-1}^\dagger \end{pmatrix}
= \frac{1}{\sqrt N}\begin{pmatrix}
\omega^{0\cdot 0} & \omega^{0\cdot 1} & \cdots & \omega^{0\cdot(N-1)} \\
\omega^{1\cdot 0} & \omega^{1\cdot 1} & \cdots & \omega^{1\cdot(N-1)} \\
\omega^{2\cdot 0} & \omega^{2\cdot 1} & \cdots & \omega^{2\cdot(N-1)} \\
\vdots & \vdots & \ddots & \vdots \\
\omega^{(N-1)\cdot 0} & \omega^{(N-1)\cdot 1} & \cdots & \omega^{(N-1)\cdot(N-1)}
\end{pmatrix}. \quad\text{(III.25)} $$
That is, F_{kj} = u_{kj}/√N = ω^{kj}/√N for k, j ∈ N. Note that the "−" signs in the complex exponentials were eliminated by the conjugate transpose. F is a unitary transformation (exercise).
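The unitarity exercise is easy to check numerically (a sketch; the function name is mine):

import numpy as np

def qft_matrix(N):
    """F_{kj} = omega^{kj} / sqrt(N), with omega = exp(2 pi i / N) (Eq. III.25)."""
    k, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * k * j / N) / np.sqrt(N)

F = qft_matrix(8)
print(np.allclose(F.conj().T @ F, np.eye(8)))   # True: F is unitary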
The fast Fourier transform (FFT) reduces the number of operations required from O(N²) to O(N log N).^{11} It does this with a recursive algorithm that avoids recomputing values. However, it is restricted to powers of two, N = 2^n.
The quantum Fourier transform (QFT) is even faster: O(log² N), that is, O(n²). However, because the spectrum is encoded in the amplitudes of the quantum state, we cannot get them all. Like the FFT, the QFT is restricted to powers of two, N = 2^n. The QFT transforms the amplitudes of a quantum state:
$$ U_{\text{QFT}}\sum_{j\in N} f_j\,|j\rangle = \sum_{k\in N}\hat f_k\,|k\rangle, $$
11 The FFT is O(N log N), but N > M² = e^{2m}. Therefore the FFT is O(M² log M²) = O(M² log M) = O(m e^{2m}).
where f̂ ≝ F f.
Suppose f has period r, and suppose that the period evenly divides the number of samples, r | N. Then all the amplitude of f̂ will be at multiples of its fundamental frequency, N/r. If r ∤ N, then the amplitude will be concentrated near multiples of N/r. The approximation is improved by using larger n.
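This spectral picture can be previewed classically. The sketch below uses numpy's FFT as a stand-in for the QFT (its exponent has the opposite sign from Eq. III.25, which does not affect the magnitudes), with the period r = 6 and N = 512 of the example developed below; the largest components sit near multiples of N/r ≈ 85.3:

import numpy as np

N, r = 512, 6
f = np.zeros(N)
f[3::r] = 1                        # period-6 spike train (x = 3, 9, 15, ...)
f /= np.linalg.norm(f)

fhat = np.fft.fft(f) / np.sqrt(N)  # spectrum; |fhat|^2 is the plot in Fig. III.25
peaks = np.argsort(np.abs(fhat))[-6:]
print(sorted(int(k) for k in peaks))   # near 0, 85, 171, 256, 341, 427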
The FFT (and QFT) are implemented in terms of additions and multiplications by various roots of unity (powers of ω). In the QFT, these are phase shifts. In fact, the QFT can be implemented with n(n + 1)/2 gates of two types: (1) one is H_j, the Hadamard transformation of the jth qubit; (2) the other is a controlled phase shift S_{j,k}, which uses qubit x_j to control whether a particular phase shift is applied to the |1⟩ component of qubit x_k. That is, S_{j,k}|x_j x_k⟩ = |x_j x_k′⟩ is defined by
$$ S_{j,k} \;\stackrel{\text{def}}{=}\; |00\rangle\langle 00| + |01\rangle\langle 01| + |10\rangle\langle 10| + e^{i\theta_{k-j}}|11\rangle\langle 11|, $$
where θ_{k−j} = π/2^{k−j}. That is, the phase shift depends on the distance k − j between the indices.
It can be shown that the QFT can be defined:^{12}
$$ U_{\text{QFT}} = \prod_{j=0}^{n-1} H_j \prod_{k=j+1}^{n-1} S_{j,k}. $$
algorithm Shor:
12 See Rieffel & Polak (2000) for this, with a detailed explanation in Nielsen & Chuang (2010, §5.1, pp. 517–21).
Step 1: Pick a random number a < M. If a and M are not coprime (relatively prime), we are done. (Euclid's algorithm is O(m²) = O(log² M).) For our example, suppose we pick a = 11, which is relatively prime to 21.

Modular exponentiation: Let g(x) ≝ a^x (mod M), for x ∈ M ≝ {0, 1, …, M − 1}. This takes O(m³) gates and is the most complex part of the algorithm! (Reversible circuits typically use m³ gates for m qubits.)
In our example case, g(x) = 11^x (mod 21), so
$$ g(x) = \underbrace{1, 11, 16, 8, 4, 2}_{\text{period}},\; 1, 11, 16, 8, 4, \ldots $$
Input size: In order to get a good QFT approximation, pick n such that M² ≤ 2^n < 2M², and let N ≝ 2^n. Equivalently, pick n such that 2 lg M ≤ n < 2 lg M + 1, that is, roughly twice as many bits as in M. Note that although the number of samples is N = 2^n, we need only n qubits (thanks to the tensor product). This is where quantum computation gets its speedup over classical computation: M is very large, so N > M² is extremely large, and the QFT computes all these in parallel. For our example M = 21, we pick n = 9, so that N = 512, since 441 ≤ 512 < 882. Therefore m = 5.
Superposition and function evaluation: Put the data register into a superposition of all x ∈ N, compute g into the result register, and measure the result register, obtaining some value g*. The data register then collapses into a superposition of just those x for which g(x) = g*:
$$ |\psi_2\rangle = \frac{1}{Z}\sum_{x\in N} f_x\,|x, g^*\rangle,
\quad\text{where}\quad
f_x \stackrel{\text{def}}{=} \begin{cases} 1, & \text{if } g(x) = g^*, \\ 0, & \text{otherwise}, \end{cases} $$
and Z ≝ √|{x | g(x) = g*}| is a normalization factor. For example,
$$ |\psi_2\rangle = \frac{1}{Z}(|3, 8\rangle + |9, 8\rangle + |15, 8\rangle + \cdots) = \frac{1}{Z}(|3\rangle + |9\rangle + |15\rangle + \cdots)\,|8\rangle. $$
Note that the values x for which f_x ≠ 0 differ from each other by the period; this produces a function f of very strong periodicity. As in Simon's algorithm, if we could measure two such x, we would have useful information, but we can't. Suppose the measurement gave g* = 8; Fig. III.24 shows the corresponding f = (0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …).
Figure III.24: Example probability distribution for measuring x in Step 2, i.e., the probabilities |f_x|² for the state Z^{−1} Σ_{x∈N} f_x|x, 8⟩, where f_x ≠ 0 only for x such that 11^x mod 21 = 8. In this example the period is r = 6 (e.g., at x = 3, 9, 15, …). [fig. source: Rieffel & Polak (2000)]

Figure III.25: Example probability distribution |f̂_x̂|² of the quantum Fourier transform of f. The spectrum is concentrated near multiples of N/6 = 512/6 = 85⅓, that is, at 85⅓, 170⅔, 256, etc. [fig. source: Rieffel & Polak (2000)]
Period a power of 2: If r | N, then the measurement will yield a state |v⟩ with v = kN/r for some k ∈ N. Therefore k/r = v/N. If k and r are relatively prime, as is likely, then reducing the fraction v/N to lowest terms will produce r in the denominator. In this case the period is discovered.
Period not a power of 2: In the case that r does not divide N, it's often possible to guess the period from a continued fraction expansion of v/N.^{14} If v/N is sufficiently close to p/q, then the continued fraction expansion of v/N will contain the continued fraction expansion of p/q. For example, suppose the measurement returns v = 427 (which is not an exact multiple of N/r). Here is the continued fraction expansion of v/N (in this case, 427/512) (see IQC):
i   a_i   p_i   q_i   ε_i
0     0     0     1   0.8339844
1     1     1     1   0.1990632
2     5     5     6   0.02352941
3    42   211   253   0.5

The convergent p_2/q_2 = 5/6 approximates v/N well with a small denominator, so our guess for the period is q = 6.
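The convergents can be generated mechanically with the standard recurrence p_i = a_i p_{i−1} + p_{i−2}, q_i = a_i q_{i−1} + q_{i−2}; a sketch:

def convergents(v, N):
    """Convergents p/q of the continued fraction expansion of v/N."""
    out, (p0, q0), (p1, q1) = [], (1, 0), (0, 1)
    while N:
        a, rem = divmod(v, N)
        p0, q0, p1, q1 = a * p0 + p1, a * q0 + q1, p0, q0
        out.append((p0, q0))
        v, N = N, rem
    return out

print(convergents(427, 512))
# [(0, 1), (1, 1), (5, 6), (211, 253), (427, 512)] -- 5/6 gives the guess q = 6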
Finding a factor: Suppose we have got the period guess q in Step 5. If q is even, then a^{q/2} + 1 and a^{q/2} − 1 are likely to have common factors with M; use the Euclidean algorithm to check this. The reason is as follows. If q is the period of g(x) = a^x (mod M), then a^q = 1 (mod M). This is because, if q is the period, then for all x, g(x + q) = g(x), that is, a^{q+x} = a^q a^x = a^x (mod M) for all x. Therefore a^q − 1 = 0 (mod M). Hence, (a^{q/2} + 1)(a^{q/2} − 1) = 0 (mod M), and each factor is likely to share a nontrivial divisor with M.
Iteration: There are several reasons that the preceding steps might not have succeeded: (1) The value v projected from the spectrum might not be close enough to a multiple of N/r (D.3.b). (2) In D.3.b, k and r might not be relatively prime, so that the denominator is only a factor of the period, but not the period itself. (3) In D.3.b, one of the two factors turns out to be a multiple of M. (4) In D.3.b, q was odd. In these cases, a few repetitions of the preceding steps usually yield a factor of M.
15 This section is based primarily on Erik Lucero, R. Barends, Y. Chen, J. Kelly, M. Mariantoni, A. Megrant, P. O'Malley, D. Sank, A. Vainsencher, J. Wenner, T. White, Y. Yin, A. N. Cleland & John M. Martinis, "Computing prime factors with a Josephson phase qubit quantum processor," Nature Physics 8, 719–723 (2012), doi:10.1038/nphys2385 [CPF].

16 See Martín-López et al., "Experimental realization of Shor's quantum factoring algorithm using qubit recycling," Nature Photonics 6, 773–6 (2012).
algorithm Grover: ^{17}

Input: Let M be the size of the solution space and pick n such that 2^n ≥ M.

17 This section is based primarily on Rieffel & Polak (2000).
Oracle: Let N ≝ 2^n and let N ≝ {0, 1, …, N − 1}, the set of n-bit strings. We are given a computable predicate P : N → 2 in the form of a quantum gate array U_P (an oracle) that computes it:
$$ U_P|x, y\rangle = |x, y \oplus P(x)\rangle. $$

Rieffel & Polak (2000) summarize Grover's algorithm in the following steps:

(1) Prepare a register containing a superposition of all possible values x_i ∈ [0 … 2^n − 1].

(2) Compute P(x_i) on this register.

(3) Change amplitude a_j to −a_j for x_j such that P(x_j) = 1. (An efficient algorithm for changing selected signs is described in their §7.1.2.) A plot of the amplitudes after this step is shown in Fig. III.28.

(4) Apply inversion about the average to increase the amplitude of x_j with P(x_j) = 1. (The quantum algorithm to efficiently perform inversion about the average is given in their §7.1.1.) The resulting amplitudes look as shown in Fig. III.29, where the amplitudes of all the x_i with P(x_i) = 0 have been diminished imperceptibly.

(5) Repeat steps 2 through 4 (π/4)√(2^n) times.

(6) Read the result.

Figure III.28: Depiction of the result of phase rotation (changing the sign) of solutions in Grover's algorithm. [source: Rieffel & Polak (2000)]

Figure III.29: Depiction of the result of inversion about the mean in Grover's algorithm. [source: Rieffel & Polak (2000)]

Boyer et al. (1996) provide a detailed analysis of the performance of Grover's algorithm. They prove that Grover's algorithm is optimal up to a constant factor: no quantum algorithm can perform an unstructured search faster. They also show that if there is only a single x_0 such that P(x_0) is true, then after (π/8)√(2^n) iterations of steps 2 through 4 the failure rate is 0.5; after (π/4)√(2^n) iterations it drops to 2^{−n}. Interestingly, additional iterations will increase the failure rate; for example, after (π/2)√(2^n) iterations the failure rate is close to 1. There are many classical algorithms in which a procedure is repeated over and over again for ever better results; repeating quantum procedures may improve results for a while, but eventually makes them worse.
Notice that the components we want, |x, 1⟩, and the components we don't want, |x, 0⟩, all have the same amplitude, 1/√N. So if we measure the state, the chances of getting a hit are very small, O(2^{−n}). The trick, therefore, is to amplify the components that we want at the expense of the ones we don't want; this is what Grover's algorithm accomplishes.
Inversion about mean: Next, we invert all the components around their
mean amplitude (which is a little less than the amplitudes of the non-
solutions); the result is shown in Fig. III.29. As a result of this operation,
amplitudes of non-solutions go from a little above the mean to a little below
it, but amplitudes of solutions go from far below the mean to far above it.
This amplifies the solutions.
Iteration: This Grover iteration (the sign change and inversion about the mean) is repeated (π/4)√N times. Thus the algorithm is O(√N).
In the following geometric analysis, I will suppose that there is just one answer α such that P(α) = 1; then |α⟩ is the desired answer vector. Let |ω⟩ be a uniform superposition of all the other (non-answer) states, and observe that |α⟩ and |ω⟩ are orthonormal. Therefore, initially the state is
$$ |\psi_0\rangle = \frac{1}{\sqrt N}\,|\alpha\rangle + \sqrt{\frac{N-1}{N}}\,|\omega\rangle. $$
In general, after k iterations the state is |ψ_k⟩ = sin[(2k + 1)θ]|α⟩ + cos[(2k + 1)θ]|ω⟩, where θ is the angle defined below.
Figure III.31: Reflection across an arbitrary vector: reflection of |ψ⟩ across |φ⟩ in the plane spanned with |φ⊥⟩. |ψ⟩ = a|φ⊥⟩ + b|φ⟩ becomes |ψ′⟩ = −a|φ⊥⟩ + b|φ⟩.
Figure III.32: Geometry of the first Grover iteration.
The angle θ between |ψ_0⟩ and |ω⟩ is given by the inner product cos θ = ⟨ψ_0 | ω⟩ = √((N − 1)/N). The sign change reflects |ψ_0⟩ from θ above |ω⟩ into |ψ_0′⟩, which is θ below it. Inversion about the mean then reflects |ψ_0′⟩, which is 2θ below |ψ_0⟩, into a state we call |ψ_1⟩, which is 2θ above it. Therefore, in going from |ψ_0⟩ to |ψ_1⟩ the state vector has rotated 2θ closer to |α⟩.
You can see that after k iterations, the state vector |ψ_k⟩ will be (2k + 1)θ above |ω⟩. We can solve (2k + 1)θ = π/2 to get the required number of iterations to bring |ψ_k⟩ to |α⟩. Note that for small θ, θ ≈ sin θ = 1/√N (which is certainly small). Hence, we want (2k + 1)/√N ≈ π/2, or 2k + 1 ≈ π√N/2. That is, k ≈ π√N/4 is the required number of iterations. Note that after π√N/8 iterations we are about halfway there (i.e., at angle π/4), so the probability of success is 50%. In general, the probability of success is about sin²[(2k + 1)/√N].
Now for the techniques for changing the sign and inversion about the
mean. Let |ψk i be the state after k iterations (k ≥ 0). To change the sign,
simply apply UP to |ψk i|−i. To see the result, let X0 = {x | P (x) = 0} and
X1 = {x | P (x) = 1}, the solution set. Then:
$$\begin{aligned}
U_P|\psi_k\rangle|-\rangle
&= U_P\left[\sum_{x\in N} a_x|x,-\rangle\right] \\
&= U_P\left[\frac{1}{\sqrt 2}\sum_{x\in N} \big(a_x|x,0\rangle - a_x|x,1\rangle\big)\right] \\
&= \frac{1}{\sqrt 2}\,U_P\left[\sum_{x\in X_0} a_x|x,0\rangle + \sum_{x\in X_1} a_x|x,0\rangle - \sum_{x\in X_0} a_x|x,1\rangle - \sum_{x\in X_1} a_x|x,1\rangle\right] \\
&= \frac{1}{\sqrt 2}\left[\sum_{x\in X_0} a_x U_P|x,0\rangle + \sum_{x\in X_1} a_x U_P|x,0\rangle - \sum_{x\in X_0} a_x U_P|x,1\rangle - \sum_{x\in X_1} a_x U_P|x,1\rangle\right] \\
&= \frac{1}{\sqrt 2}\left[\sum_{x\in X_0} a_x|x,0\rangle + \sum_{x\in X_1} a_x|x,1\rangle - \sum_{x\in X_0} a_x|x,1\oplus 0\rangle - \sum_{x\in X_1} a_x|x,1\oplus 1\rangle\right] \\
&= \frac{1}{\sqrt 2}\left[\sum_{x\in X_0} a_x|x\rangle|0\rangle + \sum_{x\in X_1} a_x|x\rangle|1\rangle - \sum_{x\in X_0} a_x|x\rangle|1\rangle - \sum_{x\in X_1} a_x|x\rangle|0\rangle\right] \\
&= \frac{1}{\sqrt 2}\left(\sum_{x\in X_0} a_x|x\rangle - \sum_{x\in X_1} a_x|x\rangle\right)(|0\rangle - |1\rangle) \\
&= \left(\sum_{x\in X_0} a_x|x\rangle - \sum_{x\in X_1} a_x|x\rangle\right)|-\rangle.
\end{aligned}$$
Therefore the signs of the solutions have been reversed (they have been ro-
tated by π). Notice how |−i in the target register can be used to separate
the 0 and 1 results by rotation. This is a useful idea!
It remains to show the connection between inversion about the mean and reflection across |ψ_0⟩. This reflection is given by R_{ψ_0} = 2|ψ_0⟩⟨ψ_0| − I. Note that
$$ |\psi_0\rangle\langle\psi_0| = \left(\frac{1}{\sqrt N}\sum_{x\in N}|x\rangle\right)\left(\frac{1}{\sqrt N}\sum_{y\in N}\langle y|\right) = \frac{1}{N}\sum_{x\in N}\sum_{y\in N}|x\rangle\langle y|. $$
Inversion about the mean transforms each amplitude a_j into 2ā − a_j, where ā is the mean amplitude. To see this, write a_j = ā ± δ_j, that is, as a difference from the mean. Then 2ā − a_j = 2ā − (ā ± δ_j) = ā ∓ δ_j. Therefore an amplitude δ_j below the mean will be transformed to δ_j above, and vice versa. But an amplitude that is negative, and thus very far below the mean, will be transformed to an amplitude much above the mean. This is exactly what we want in order to amplify the negative components, which correspond to solutions.
This matrix has 2/N − 1 on the main diagonal and 2/N in the off-diagonal elements:
$$ D = \begin{pmatrix}
\frac{2}{N}-1 & \frac{2}{N} & \cdots & \frac{2}{N} \\
\frac{2}{N} & \frac{2}{N}-1 & \cdots & \frac{2}{N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N}-1
\end{pmatrix}. $$
Note that D = 2|ψ0 ihψ0 |−I = Rψ0 . It is easy to confirm that DD† = I (Exer.
III.49), so the matrix is unitary and therefore a possible quantum operation,
but it remains to be seen if it can be implemented efficiently.
We claim D = W RW , where W = H ⊗n is the n-qubit Walsh-Hadamard
transform and R is the phase rotation matrix
$$ R \;\stackrel{\text{def}}{=}\; \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -1
\end{pmatrix}. $$
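The claim D = W R W is easy to check numerically for small n (a sketch):

import numpy as np

n, N = 3, 2 ** 3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
W = np.array([[1.0]])
for _ in range(n):
    W = np.kron(W, H)                     # W = H^{(x)n}

R = -np.eye(N); R[0, 0] = 1               # flips the phase of all but |0...0>
psi0 = np.full(N, 1 / np.sqrt(N))
D = 2 * np.outer(psi0, psi0) - np.eye(N)  # D = 2|psi0><psi0| - I
print(np.allclose(D, W @ R @ W))          # True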
Figure III.33: Circuit for Grover's algorithm. The Grover iteration in the dashed box is repeated (π/4)√N times.
[Figure: the lattice of 4-bit strings from |0000⟩ up to |1111⟩, ordered by bitwise (set) inclusion.]
We want to transform, for example, |000⟩ → (1/√3)(|001⟩ + |010⟩ + |100⟩). Recall the Walsh-Hadamard transform,
$$ W|x\rangle = \frac{1}{\sqrt{2^{mn}}}\sum_{z\in MN} (-)^{x\cdot z}\,|z\rangle. $$
As shown in Sec. A.2.c (p. 70), we can derive a matrix representation for W:
$$\begin{aligned}
W_{jk} = \langle j \mid W \mid k\rangle
&= \langle j|\;\frac{1}{\sqrt{2^{mn}}}\sum_{z\in MN}(-)^{k\cdot z}|z\rangle \\
&= \frac{1}{\sqrt{2^{mn}}}\sum_{z\in MN}(-)^{k\cdot z}\,\langle j \mid z\rangle \\
&= \frac{1}{\sqrt{2^{mn}}}\,(-1)^{k\cdot j}.
\end{aligned}$$
Note that k · j = |k ∩ j|, where on the right-hand side we interpret the bit strings as sets.
The general approach is to try to steer amplitude away from sets that
violate the constraints, but the best technique depends on the particular
problem. One technique is to invert the phase on bad subsets so that they
tend to cancel the contribution of good subsets to supersets. This could be
done by a process like Grover’s algorithm using a predicate that tests for
violation of constraints. Another approach is to assign random phases to
bad sets.
It is difficult to analyze the probability that an iteration of a heuristic
algorithm will produce a solution, and so its efficiency is usually evaluated
empirically, but empirical tests will be difficult to apply to quantum heuristic
search until larger quantum computers are available, since classical computers
require exponential time to simulate quantum systems. Small simulations,
however, indicate that Hogg’s algorithms may provide polynomial speedup
over Grover’s algorithm.
Figure III.36: Effects of decoherence on a qubit. On the left is a qubit |y⟩ that is mostly isolated from its environment |Ω⟩. On the right, a weak interaction between the qubit and the environment has led to a possibly altered qubit |x⟩ and a correspondingly (slightly) altered environment |Ω_{xy}⟩.
Quantum coherence is very difficult to maintain for long.18 Even weak inter-
actions with the environment can affect the quantum state, and we’ve seen
that the amplitudes of the quantum state are critical to quantum algorithms.
On classical computers, bits are represented by very large numbers of parti-
cles (but that is changing). On quantum computers, qubits are represented
by atomic-scale states or objects (photons, nuclear spins, electrons, trapped
ions, etc.). They are very likely to become entangled with computationally
irrelevant states of the computer and its environment, which are out of our
control. Quantum error correction is similar to classical error correction in
that additional bits are introduced, creating redundancy that can be used to
correct errors. It is different from classical error correction in that: (a) We
want to restore the entire quantum state (i.e., the continuous amplitudes),
not just 0s and 1s. Further, errors are continuous and can accumulate. (b)
It must obey the no-cloning theorem. (c) Measurement destroys quantum
information.
18 This section follows Rieffel & Polak (2000).
In this notation the state vectors |Ω_{xy}⟩ are not normalized, but incorporate the amplitudes of the various outcomes. In the case of no error, |Ω_{00}⟩ = |Ω_{11}⟩ = |Ω⟩ and |Ω_{01}⟩ = |Ω_{10}⟩ = 0. If the entanglement with the environment is small, then ‖Ω_{01}‖, ‖Ω_{10}‖ ≪ 1 (small exchange of amplitude).
Define decoherence operators D_{xy}|Ω⟩ ≝ |Ω_{xy}⟩, for x, y ∈ 2, which describe the effect of the decoherence on the environment. (These are not unitary, but are the products of scalar amplitudes and unitary operators for the various outcomes.) Then the evolution of the joint system is defined by the operator
$$\begin{aligned}
D &= \tfrac{1}{2}\,[\,D_{00}\otimes(I+Z) + D_{01}\otimes(X-Y) + D_{10}\otimes(X+Y) + D_{11}\otimes(I-Z)\,] \\
&= \tfrac{1}{2}\,[\,(D_{00}+D_{11})\otimes I + (D_{10}+D_{01})\otimes X + (D_{10}-D_{01})\otimes Y + (D_{00}-D_{11})\otimes Z\,].
\end{aligned}$$
Therefore the bit flip error can be described as a linear combination of the Pauli matrices. It is generally the case that the effect of decoherence on a single qubit can be described by a linear combination of the Pauli matrices, which is important, since qubits are subject to various errors besides bit flips. This is a distinctive feature of quantum errors: they have a finite basis, and because the basis errors are unitary, they are invertible. In other words, single-qubit errors can be characterized in terms of a linear combination of the Pauli matrices (which span the space of 2 × 2 self-adjoint unitary matrices: C.2.a, p. 105): I (no error), X (bit flip error), Z (phase error), and Y = ZX (combined bit flip and phase error). Therefore a single-qubit error is represented by e_0σ_0 + e_1σ_1 + e_2σ_2 + e_3σ_3 = Σ_{j=0}^{3} e_jσ_j, where the σ_j are the Pauli matrices (Sec. C.2.a, p. 105).
Correction: Since the errors are unitary, and the syndrome is known, we can invert the error and thereby correct it: y = E_{S(ỹ)}^{−1}(ỹ).
Figure III.37: Circuit for quantum error correction. |ψ⟩ is the n-qubit quantum state to be encoded by C, which adds m error-correction qubits to yield the encoded state |φ⟩. E is a unitary superposition of error operators E_j, which alter the quantum state to |φ̃⟩. S is the syndrome extraction operator, which computes a superposition of codes for the errors E. The syndrome register is measured, to yield a particular syndrome code j*, which is used to select a corresponding inverse error transformation E_{j*}^{−1} to correct the error.
Quantum case: Now consider the quantum case, in which the state |ψ⟩ is a superposition of basis vectors, and the error is a superposition of error types, E = Σ_j e_j E_j. This is an orthogonal decomposition of E (see Fig. III.37).
Encoding: The encoded state is |φ⟩ ≝ C|ψ⟩|0⟩. There are several requirements for a useful quantum error-correcting code. Obviously, the codes for orthogonal inputs must be orthogonal; that is, if ⟨ψ | ψ′⟩ = 0, then C|ψ, 0⟩ and C|ψ′, 0⟩ are orthogonal: ⟨ψ, 0|C†C|ψ′, 0⟩ = 0. Next, if |φ⟩ and |φ′⟩ are the codes of distinct inputs, we do not want them to be confused by the error processes, so ⟨φ|E_j†E_k|φ′⟩ = 0 for all j, k. Finally, we require that for each pair of error indices j, k, there is a number m_{jk} such that ⟨φ|E_j†E_k|φ⟩ = m_{jk} for every code |φ⟩. This means that the error syndromes are independent of the codes, and therefore the syndromes can be measured without collapsing superpositions in the codes, which would make them useless for quantum computation.
Error process: Let |φ̃⟩ ≝ E|φ⟩ be the code corrupted by error.
Figure III.38: Quantum encoding circuit for the triple repetition code. [source: NC]
Syndrome extraction: Apply the syndrome extraction operator to the encoded state, augmented with enough extra qubits to represent the set of syndromes. This yields a superposition of syndromes:
$$ S|\tilde\phi, 0\rangle = S\Big[\sum_j e_j E_j|\phi\rangle \otimes |0\rangle\Big] = \sum_j e_j\,(S E_j|\phi\rangle|0\rangle) = \sum_j e_j\,(E_j|\phi\rangle|j\rangle). $$

Measurement: Measure the syndrome register to obtain some j* and the collapsed state E_{j*}|φ⟩|j*⟩.^{19}

Correction: Apply E_{j*}^{−1} to correct the error.

Note the remarkable fact that although there was a superposition of errors, we only have to correct one of them to get the original state back. This is because measurement of the error syndrome collapses the state into one affected by just that single error.

D.5.d Example

We'll work through an example to illustrate the error correction process. Suppose we use a simple triple redundancy code that assigns

19 As we mentioned in the discussion of Shor's algorithm (p. 140), it is not necessary to actually perform the measurement; the same effect can be obtained by unitary operations.
|0⟩ → |000⟩ and |1⟩ → |111⟩. This is accomplished by a simple quantum gate array:
$$ C|0\rangle|00\rangle = |000\rangle, \qquad C|1\rangle|00\rangle = |111\rangle. $$
This is not a sophisticated code! It's called a repetition code. The three-qubit codes are called logical zero and logical one (see Fig. III.38). This code can correct single bit flips (by majority voting); the errors are represented by the operators:
$$\begin{aligned}
E_0 &= I \otimes I \otimes I, \\
E_1 &= I \otimes I \otimes X, \\
E_2 &= I \otimes X \otimes I, \\
E_3 &= X \otimes I \otimes I.
\end{aligned}$$
The ⊕s compare each pair of bits, and so the ⊕ will be zero if the two bits are the same (the majority). The following table shows the bit flipped (if any), the corresponding syndrome, and the operator to correct it (which is the same as the operator that caused the error):

bit flipped   syndrome   correction
none          |000⟩      I ⊗ I ⊗ I
1             |110⟩      I ⊗ I ⊗ X
2             |101⟩      I ⊗ X ⊗ I
3             |011⟩      X ⊗ I ⊗ I
For example, suppose the encoded state |φ⟩ = (1/√2)(|000⟩ − |111⟩) is corrupted by the error superposition E = (4/5) X⊗I⊗I + (3/5) I⊗X⊗I:
$$\begin{aligned}
|\tilde\phi\rangle = E|\phi\rangle
&= \left[\frac{4}{5}\,X\otimes I\otimes I + \frac{3}{5}\,I\otimes X\otimes I\right]\frac{1}{\sqrt 2}(|000\rangle - |111\rangle) \\
&= \frac{4}{5\sqrt 2}\,X\otimes I\otimes I\,(|000\rangle - |111\rangle) + \frac{3}{5\sqrt 2}\,I\otimes X\otimes I\,(|000\rangle - |111\rangle) \\
&= \frac{4}{5\sqrt 2}(|100\rangle - |011\rangle) + \frac{3}{5\sqrt 2}(|010\rangle - |101\rangle).
\end{aligned}$$
Applying the syndrome extraction operator yields:
$$\begin{aligned}
S|\tilde\phi, 000\rangle
&= \frac{4}{5\sqrt 2}\,S(|100\,000\rangle - |011\,000\rangle) + \frac{3}{5\sqrt 2}\,S(|010\,000\rangle - |101\,000\rangle) \\
&= \frac{4}{5\sqrt 2}(|100\,011\rangle - |011\,011\rangle) + \frac{3}{5\sqrt 2}(|010\,101\rangle - |101\,101\rangle) \\
&= \frac{4}{5\sqrt 2}(|100\rangle - |011\rangle)\otimes|011\rangle + \frac{3}{5\sqrt 2}(|010\rangle - |101\rangle)\otimes|101\rangle.
\end{aligned}$$
Measuring the syndrome register yields either |011⟩ (representing an error in bit 3) or |101⟩ (representing an error in bit 2). Suppose we get |011⟩. The state collapses into
$$ \frac{1}{\sqrt 2}(|100\rangle - |011\rangle)\otimes|011\rangle. $$
Note that we have projected into a subspace for just one of the two bit-flip
errors that occurred (the flip in bit 3). The measured syndrome |011i tells
us to apply X ⊗ I ⊗ I to the first three bits, which restores |φi:
$$ (X\otimes I\otimes I)\,\frac{1}{\sqrt 2}(|100\rangle - |011\rangle) = \frac{1}{\sqrt 2}(|000\rangle - |111\rangle) = |\phi\rangle. $$
We can do something similar to correct single phase flip (Z) errors by using the encoding |0⟩ → |+++⟩, |1⟩ → |−−−⟩ (Exer. III.54). To see this, recall that Z in the sign basis is the analog of X in the computational basis.
D.5.e Discussion

There is a nine-qubit code, called the Shor code, that can correct arbitrary errors on a single qubit, even replacement of the entire qubit by garbage (Nielsen & Chuang, 2010, §10.2). The essence of this code is that triple redundancy is used to correct X errors, and triple redundancy again to correct Z errors, thus requiring nine code qubits for each logical qubit. Since Y = ZX and the Pauli matrices are a basis, this code is able to correct all errors.
Quantum error correction is remarkable in that an entire continuum of errors can be corrected by correcting only a discrete set of errors.^{20}

20 See Nielsen & Chuang (2010, §10.6.4).
E Abrams-Lloyd theorem
E.1 Overview
All experiments to date confirm the linearity of QM, but what would be
the consequences of slight nonlinearities?21 We will see that nonlinearities
can be exploited to solve NP problems (and in fact harder problems) in
polynomial time. This fact demonstrates clearly that computability and
complexity are not purely mathematical matters. Because computation is
inherently physical, fundamental physics is intertwined with fundamental
computation theory. It also exposes the fact that there are hidden physical
assumptions in the traditional theory of computation.
How could nonlinearities be exploited in quantum computation? Recall
that quantum state vectors lie on the unit sphere and unitary transformations
preserve “angles” (inner products) between vectors. Nonunitary transforma-
tions would, in effect, stretch the sphere, so the angles between vectors could
change. Unitary transformations could be used to position the vectors to the
correct position for subsequent nonunitary transformations. The following
algorithm exploits a nonlinear operator to separate vectors that are initially
close together.
Figure III.39: Quantum circuit for Abrams-Lloyd algorithm. The first mea-
surement produces 0 with probability greater than 1/4, but if it yields a
nonzero state, we try again. The N parallelogram represents a hypothetical
nonlinear quantum state transformation, which may be repeated to yield a
macroscopically observable separation of the solution and no-solution vectors.
algorithm Abrams-Lloyd:
Walsh transform: Applying the Walsh-Hadamard transform to the data register yields
$$ |\psi_2\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in 2^n}(W_n|x\rangle)\,|P(x)\rangle
= \frac{1}{\sqrt{2^n}}\sum_{x\in 2^n}\left[\frac{1}{\sqrt{2^n}}\sum_{z\in 2^n}(-)^{x\cdot z}|z\rangle\right]|P(x)\rangle. $$
The last step applies by Eq. III.24 (p. 129). That is,
$$ |\psi_2\rangle = \sum_{x\in 2^n}\sum_{z\in 2^n}\frac{1}{2^n}(-)^{x\cdot z}\,|z\rangle|P(x)\rangle. $$
At least half of the 2^n vectors x must have the same value a = P(x) (since P(x) ∈ 2). Therefore the amplitude of |0, a⟩ is at least 1/2, and the probability of observing |0, a⟩ is at least 1/4. We get a non-zero data register with probability ≤ 3/4. (If we happen to observe |x_0, 1⟩, then of course we have our answer.)
In the case in which we get a zero data register, measurement of the data register yields the state
$$ |\psi_2\rangle \;\xrightarrow{\;\geq 1/4\;}\;
Z^{-1}\left[\frac{s}{2^n}\,|0\rangle|1\rangle + \frac{2^n - s}{2^n}\,|0\rangle|0\rangle\right]
= Z^{-1}|0\rangle\left[\frac{s}{2^n}\,|1\rangle + \frac{2^n - s}{2^n}\,|0\rangle\right], $$
where s is the number of solutions and Z is a normalization factor. Repeated application of the nonlinear operator N then drives the s ≠ 0 and s = 0 cases apart until they are "macroscopically distinguishable."
Step 5 (measure result register): Measure the result qubit. If the vec-
tors have been sufficiently separated, there will be a significant probability
of observing |1i in the s 6= 0 case.
Figure III.40: Simple adder using reversible logic. [fig. from Feynman (1986)]
Figure III.41: Full adder using reversible logic. [fig. from Feynman (1986)]
Simulating quantum mechanics with classical probabilities would require what amount to negative probabilities, and we don't know how to do this classically. We've seen how in quantum mechanics, probabilities can in effect cancel by destructive interference of the wavefunctions. The conclusion is that no conventional computer can efficiently simulate a quantum computer. Therefore, if we want to (efficiently) simulate any physical system, we need a quantum computer.

F.1.b Universal quantum computer

In 1985 Feynman described several possible designs for a universal quantum computer.^{23} He observes that NOT, CNOT, and CCNOT are sufficient for any logic gate, as well as for COPY and EXCHANGE, and therefore for universal computation. He exhibits circuits for a simple adder (Fig. III.40) and a full adder (Fig. III.41).

The goal is to construct a Hamiltonian to govern the operation of a quantum computer. Feynman describes quantum logic gates in terms of two primitive operations, which change the state of an "atom" (two-state system or "wire"). Letters near the beginning of the alphabet (a, b, c, …) are used for data or register atoms, and those toward the end (p, q, r, …) for program atoms (which are used for sequencing operations). In this simple sequential computer, only one program atom is set at a time.

23 This section is based primarily on Feynman (1986).
For a single line a, the annihilation operator is defined:
$$ a = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = |0\rangle\langle 1|. $$
The annihilator changes the state |1⟩ to |0⟩; typically it lowers the energy of a quantum system. Applied to |0⟩, it returns the zero vector 0 (which is not a meaningful quantum state). It matches |1⟩ and resets it to |0⟩. The operation is not unitary (because it is not norm-preserving). It is a "partial NOT" operation.
Its conjugate transpose is the creation operator:^{24}
$$ a^* = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = |1\rangle\langle 0|. $$
The creator transforms |0⟩ to |1⟩, but applied to |1⟩ it returns 0; typically it raises the energy of a quantum system. It matches |0⟩ and sets it to |1⟩. This is the other half of NOT = a + a*.
Feynman also defines a number operation or 1-test: consider^{25}
$$ N_a = a^* a = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = |1\rangle\langle 1|. $$
This has the effect of returning |1⟩ for input |1⟩, but 0 for |0⟩.
Figure III.42: The switch: if c = 1 the cursor at p moves to q and c is set to 0; if c = 0 it moves to r and c is set to 1. [fig. from Feynman (1986)]
The complementary operator a a* = |0⟩⟨0| is a test for |0⟩: it returns |0⟩ for input |0⟩, but 0 for |1⟩. (Feynman writes this 1 − N_a, because he writes 1 = I; it is the rest of the identity operation.)
The two operations a and a* are sufficient for creating all 2 × 2 matrices, and therefore all transformations on a single qubit. Note that
$$ \begin{pmatrix} w & x \\ y & z \end{pmatrix} = w\,a a^* + x\,a + y\,a^* + z\,a^* a. $$
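This decomposition is easy to verify numerically (a sketch):

import numpy as np

a  = np.array([[0, 1], [0, 0]])   # annihilator |0><1|
ad = np.array([[0, 0], [1, 0]])   # creator     |1><0|

w, x, y, z = np.random.default_rng(2).normal(size=4)
M = w * (a @ ad) + x * a + y * ad + z * (ad @ a)
print(np.allclose(M, [[w, x], [y, z]]))   # True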
Figure III.43: CNOT implemented by switches. 0/1 annotations on the wires show the a values. [fig. from Feynman (1986)]
If c = |1⟩, the cursor atom at p moves to q, but if c = |0⟩ it moves to r. It also negates c in the process, and it's also reversible (see Fig. III.42). The switch is a tensor product on |c, p, q, r⟩:
$$ q^* c\,p + r^* c^* p + [\,p^* c^* q + p^* c\,r\,]. $$
(The bracketed expression is just the complex conjugate of the first part, required for reversibility.) Read the factors in each term from right to left: q*cp says that if p and c are set, then unset them and set q; r*c*p says that if p is set and c is not set, then unset p and set c and r.

Fig. III.43 shows CNOT implemented by switches. This is the controlled-NOT applied to data a, b and sequenced by cursor atoms s, t (s = start, t = terminate). If a = 1 the cursor moves along the top line, and if a = 0 along the bottom. If it moves along the top, then it applies b + b* to negate b (otherwise leaving it alone). In either case, the cursor arrives at the reversed switch, which sets the next cursor atom t. We can write it
$$ H_{a,b}(s, t) = s_M^*\,a\,s + t^* a^* t_M + t_M^*(b + b^*)\,s_M + s_N^*\,a^* s + t^*\,a\,t_N + t_N^* s_N + \text{c.c.}, $$
where "c.c" means to add the complex conjugates of the preceding terms. Read the factors in each term from right to left:
(1) sM as: if s and a are set, then unset them and set sM .
registers. Only sM and t M are sites that may be coupled externally.
(4) s∗N a∗ s: if s is set and a in unset, then unset s and set sN and a.
(6) t∗N sN : if sN is set, then unset it and set tN .
(3) t∗M (b + b∗ )sM : if sM is set, then unset it, negate b and set tM .
The first thing we do is to copy (using C O N T R O L L E D NOT's) our
external input into M. Then M operates, and the cursor goes on the top
line in our drawing. It copies the output out of M into the external output
register. M now contains garbage. Next it changes f to N O T f, comes down
on the other line of the switch, backs out through M clearing the garbage,
and uncopies the input again.
When you copy data and do it again, you reduce one of the registers
to 0, the register into which you coied the first time. After the coying, it
F. UNIVERSAL
goes out (sinceQUANTUM COMPUTERS
f is now changed) on the other line where we restore f to 0 175
f f
coPY I
i ~ NOT f IJ
(5) t∗ atN : if tN and a are set (as a must be to get here), then unset them
and set t.
(2) t∗ a∗ tM : if tM is set and a is unset (as it must be to get here), then reverse
their states and set t.
(The t∗N sN term can be eliminated by setting tN = sN .)
The computation ends when the cursor reaches the final program atom t. At the end of the process, everything is reset to the initial conditions, except that we have the result in the Out register. Feynman also discusses how to do subroutines and other useful programming constructs, but we won't go into those details.
$$ |\psi(nT)\rangle = U^n|\psi(0)\rangle. $$
$$ \langle x'; n'; m' \mid U \mid x; n; m\rangle
= \left[\delta_{x'}^{x+1}\,U^+(n', m'_x \mid n, m_x) + \delta_{x'}^{x-1}\,U^-(n', m'_x \mid n, m_x)\right]\prod_{y\neq x}\delta_{m'_y}^{m_y}. $$
y6=x
U + and U − represent moves to the right and left, respectively. The first
two δ functions ensure that the tape position cannot move by more than one
position in a step. The final product of deltas ensures that all the other tape
positions are unchanged; it’s equivalent to: ∀y 6= x : m0y = my . The U +
and U − functions define the actual transitions of the machine in terms of the
processor state and the symbol under the tape head. Each choice defines a
quantum computer Q[U + , U − ].
The machine cannot be observed before it has halted, since that would
generally alter its state. Therefore one of the processor’s bits is chosen as a
halt indicator. It can be observed from time to time without affecting the
computation.
Q can simulate TMs, but also any other quantum computer to arbitrary
precision. In fact, it can simulate any finitely realizable physical system to
arbitrary precision, and it can simulate some physical systems that go beyond
the power of TMs (hypercomputation).
(Kahneman received the 2002 Nobel Memorial Prize in economics for this work; Tversky had already died.) Since then many other psychologists have confirmed and extended their findings. They concluded that everyday human reasoning follows the laws of neither classical logic nor classical probability theory. "Many of these findings relate to order/context effects, violations of the law of total probability (which is fundamental to Bayesian modeling), and failures of compositionality." (Pothos & Busemeyer, 2013)
(iii) Quantum probability (QP). The mathematics of quantum me-
chanics provides an alternative system (axiomatization) of probability which
has the potential to account for these violations of CP, as we will see. Note
that QP is just a probability theory; there is no presumption that physical
quantum phenomena are significant in the brain (although they might be).
In this sense, our brains appear to be using a kind of quantum computation.
G.2 Framework
G.2.a Questions & outcomes
Just as CP begins by defining a sample space, QP begins by defining a Hilbert
space, which defines all possible answers that could be produced for all possi-
ble questions (addressed by the model). Corresponding to the quantum state
is the cognitive state, which you can think of as the indeterminate state of the
brain before there is a decision or determination to act in some way (such as
answering a question). Corresponding to observables in quantum mechanics,
we have questions in QP. More generally, we might refer to quandaries, that is, unsettled dispositions to act. Corresponding to projectors into subspaces
we have decisions. Often the subspaces are one-dimensional, that is, rays.
It is not just that we do not know whether the person is happy or not; rather
the person “is in an indefinite state regarding happiness, simultaneously en-
tertaining both possibilities, but being uncommitted to either” (Pothos &
Busemeyer, 2013). More realistically “happy” and “unhappy” are likely to
Figure III.45: "An illustration of basic processes in QP theory. In Figure [b], all vectors are coplanar, and the figure is a two-dimensional one. In Figure [c], the three vectors 'Happy, employed,' 'Happy, unemployed,' and 'Unhappy, employed' are all orthogonal to each other, so that the figure is a three-dimensional one. (The fourth dimension, 'unhappy, unemployed,' is not shown.)" (Pothos & Busemeyer, 2013)
be complex subspaces, not rays, but for the sake of the example, we use a
2D outcome space.
Asking the question is equivalent to measuring the state in the "happiness basis," which comprises two projectors P_happy and P_unhappy:
$$ P_{\text{happy}} = |\text{happy}\rangle\langle\text{happy}|, \qquad
P_{\text{unhappy}} = |\text{unhappy}\rangle\langle\text{unhappy}|. $$
G.2.c Incompatibility
As in quantum mechanics, questions can be compatible or incompatible. In fact, Niels Bohr borrowed the notion of incompatible questions from the psychologist William James. Compatible questions can be asked in any order; they commute. Incompatible questions do not commute.
In CP it is always possible to specify a joint probability distribution over
the four possible pairs of answers (the unicity principle). In QP you can do
this for compatible questions, but not for incompatible ones. Psychologically,
in the incompatible case the person cannot form a single thought for all
combinations of possible outcomes (because they are linearly dependent).
In the incompatible case, asking the first question alters the context of the
second question, and thus affects its answer. Therefore, in applying QP in
psychology, we can ask whether one decision is likely to affect the other.
Suppose we are going to ask a person two questions, whether they are
happy or not and whether they are employed or not. It is plausible that
happiness and employment are related, so we postulate a single 2D space
spanned by both bases (Fig. III.45(b)). The angle between the two bases
reflects the fact that happiness is likely to be correlated to employment.
Notice that once we get an answer regarding happiness, we will be in an
indefinite state regarding employment, and vice versa.
Suppose we ask if the subject is employed and then ask if they are happy.
The probability that they answer “yes” to both is given by:29
$$P\{\text{employed} \mathbin{\&\&} \text{happy}\} = \|P_{\rm happy} P_{\rm employed} |\psi\rangle\|^2.$$
From this example, we can see that the law for conditional probability in
QP, called Lüder’s Law, is:
$$P\{A \mid B\} = \frac{\|P_A P_B |\psi\rangle\|^2}{\|P_B |\psi\rangle\|^2}
= \frac{P\{B \mathbin{\&\&} A\}}{P\{B\}}.$$
Look at Fig. III.45(b); you can see that the answer depends on the order
in which the questions are asked. This is because the subject was more
uncertain about their happiness than their employment, and therefore the
state vector lost a lot of its amplitude via its projection first onto |happy⟩.
“The size of such angles and the relative dimensionality of the subspaces
are the cornerstones of QP cognitive models and are determined by the
known psychology of the problem. These angles (and the initial state vector)
have a role in QP theory analogous to that of prior and conditional distri-
butions in Bayesian modeling.” (Pothos & Busemeyer, 2013)

29
As in C++, “&&” should be read “and then” (sequential “and”).
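As a concrete illustration, here is a minimal numerical sketch (mine, not
from Pothos & Busemeyer; the angle theta and the initial state are arbitrary
assumptions) of sequential projection and Lüder’s law:

import numpy as np

# A minimal sketch of Lueder's law and order effects in a 2D outcome
# space. The angle between the "happiness" and "employment" bases and
# the initial state are illustrative assumptions, not fitted values.
theta = np.pi / 5
happy      = np.array([1.0, 0.0])
employed   = np.array([np.cos(theta), np.sin(theta)])

def projector(v):
    """Rank-1 projector |v><v| onto the ray spanned by unit vector v."""
    return np.outer(v, v)

P_H, P_E = projector(happy), projector(employed)
psi = np.array([0.6, 0.8])                          # assumed cognitive state

# Sequential "and then" probabilities: project, then project again.
p_E_then_H = np.linalg.norm(P_H @ P_E @ psi) ** 2   # employed, then happy
p_H_then_E = np.linalg.norm(P_E @ P_H @ psi) ** 2   # happy, then employed
print(p_E_then_H, p_H_then_E)                       # differ: incompatible

# Lueder's law: P{H | E} = ||P_H P_E psi||^2 / ||P_E psi||^2
print(p_E_then_H / np.linalg.norm(P_E @ psi) ** 2)

The two sequential probabilities differ precisely because the projectors do
not commute, which is the QP signature of incompatible questions.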
Figure III.46: An illustration of the QP explanation for the conjunction
fallacy, with rays for ‘bank teller’, ‘∼bank teller’, ‘feminist’, and ‘∼feminist’.
[fig. from Pothos & Busemeyer (2013)]
G.2.f Time evolution

Time evolution in CP is defined by “a transition matrix (the solution to Kol-
mogorov’s forward equation)” (Pothos & Busemeyer, 2013). It transforms
the probabilities without violating the law of total probability. In QP, am-
plitudes change by a unitary transformation.

Figure III.47: Example of order effects in Gallup polls: asked “Is Clinton
honest?” and then “Is Gore honest?”, respondents answered yes with proba-
bilities 50% and 68%, respectively; asked in the reverse order, the correspond-
ing probabilities were 57% and 60% (Moore 2002). [fig. from Pothos &
Busemeyer (2013)]
a bank teller; these priors are depicted in Fig. III.46. However, notice that
being a feminist is largely independent of being a bank teller. For a state
that is an equal superposition of $|\psi_C\rangle$ and $|\psi_D\rangle$, the probability of decision
$D$ is $\|P_D(|\psi_C\rangle + |\psi_D\rangle)/\sqrt{2}\|^2$, which expands as follows:
\begin{align*}
&= \tfrac{1}{2}(\langle\psi_C| + \langle\psi_D|)\, P_D^\dagger P_D\, (|\psi_C\rangle + |\psi_D\rangle) \\
&= \tfrac{1}{2}\|P_D |\psi_C\rangle\|^2 + \tfrac{1}{2}\|P_D |\psi_D\rangle\|^2
   + \langle\psi_D|\, P_D^\dagger P_D\, |\psi_C\rangle.
\end{align*}
The interference term $\langle\psi_D| P_D^\dagger P_D |\psi_C\rangle$ could be positive or negative, in the
latter case decreasing the probability below unity.
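A quick numerical sketch (illustrative values, mine, not from the notes)
shows how the interference term can turn negative:

import numpy as np

# Interference in the probability of decision D from an equal
# superposition of |psi_C> and |psi_D>. The angle a is an assumption.
a = 0.6
d = np.array([1.0, 0.0])
P_D = np.outer(d, d)                          # projector onto the D ray
psi_C = np.array([np.cos(a),  np.sin(a)])     # orthonormal component states
psi_D = np.array([-np.sin(a), np.cos(a)])

superpos = (psi_C + psi_D) / np.sqrt(2)       # normalized, since <C|D> = 0
direct = np.linalg.norm(P_D @ superpos) ** 2
classical = 0.5 * np.linalg.norm(P_D @ psi_C) ** 2 \
          + 0.5 * np.linalg.norm(P_D @ psi_D) ** 2
print(direct, classical, direct - classical)  # interference = -sin(a)cos(a)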
If one subspace has greater dimensionality than another, on average the
transition from the lower-dimensionality subspace to the higher-dimension-
ality one would retain more amplitude than the converse transition (it has
not been proved that this is always the case).
[Figure: rays for ‘Korea’ and ‘China’ in the QP account of asymmetric
similarity judgments; fig. from Pothos & Busemeyer (2013)]
31
Material in this section is adapted from MacLennan (2013), my commentary on Pothos
& Busemeyer (2013).
G.4.a QM premises
To this end, application of QP in cognitive science would be aided by im-
porting two premises from quantum mechanics:
The first premise is that the fundamental reality is the wavefunction. In
cognitive science this corresponds to postulating a spatially-distributed pat-
tern of neural activity as the elements of the cognitive state space. Therefore
the basis vectors used in QP are in fact basis functions for an infinite (or
very high) dimensional Hilbert space.
The second premise is that the wavefunction is complex-valued
and that wavefunctions combine with complex coefficients. This is the main
reason for interference and other non-classical properties. The authors ac-
knowledge this, but do not make explicit use of complex numbers in the
target article.
G.4.c Projection
Possible neural mechanisms: Pothos & Busemeyer (2013) specify that a
judgment or decision corresponds to measurement of a quantum state, which
projects it into a corresponding subspace, but it is informative to consider
possible mechanisms. For example, the need to act definitely (such as coming
to a conclusion in order to answer a question) can lead to mutually compet-
itive mechanisms, such as among the minicolumns in a macrocolumn, which
create dynamical attractors corresponding to measurement eigenspaces. Ap-
proach to the attractor amplifies certain patterns of activity at the expense
of others. Orthogonal projectors filter the neural activity and win the com-
petition with a probability proportional to the squared amplitude of the
patterns to which they are matched. (In the case where the phases of neu-
ral impulses encode complex phases, matching occurs when the phases are
delayed in such a way that the impulses reinforce.) The winner positively
reinforces its matched signal and the loser negatively reinforces the signal to
which it is matched. Regardless of mechanism, during collapse the observed
eigenstate of the decision (measurement) operator receives the energy of
the orthogonal eigenstates (this is the effect of renormalization).
The projection switches a jumble of frequencies and phases into a smaller,
more coherent collection, corresponding to the answer (observed) eigenspace.
No inherent bases: The target article suggests that a QP model of a
process begins by postulating basis vectors and qualitative angles between
alternative decision bases (significantly, only real rotations are discussed).
As a consequence, a QP model is treated as a low-dimensional vector space.
This is a reasonable, top-down strategy for defining a QP cognitive model,
but it can be misleading. There is no reason to suppose that particular
decision bases are inherent to a cognitive Hilbert space. There may be a
small number of “hard-wired” decisions, such as fight-or-flight, but the vast
majority are learned. Certainly this is the case for decisions corresponding
to lexical items such as (un-)happy and (un-)employed.
Creation/modification of observables: Investigation of the dynamics
of cognitive wavefunction collapse would illuminate not only the mechanisms
of decision making but also the processes by which observables form. This would
allow modeling changes in the decision bases, either temporary through con-
text effects or longer lasting through learning. Many decision bases are ad
hoc, as when we ask, “Do you admire Telemachus in the Odyssey?” How such
ad hoc projectors are organized requires looking beneath a priori state basis
vectors to the underlying neural wavefunctions and the processes shaping
them.
G.4.e Suggestions
Pothos & Busemeyer (2013) do an admirable job of defending QP as a fruitful
top-down model of decision making, but I believe the approach would be
more valuable if it paid greater attention to the complex-valued wavefunction
that under-
lies QP in both quantum mechanics and cognition. This would allow a more
detailed account of the origin of interference effects and of the structure of
both learned and ad hoc decision operators. Finally, the treatment of incom-
patible decisions can be made more rigorous by treating them quantitatively
as noncommuting operators.
G.5 Conclusions
You might wonder why it is so important to understand the less-than-perfect
inferential abilities of humans. There are at least two reasons, scientific and
technological. First, it is important to understand human inference as both
pure and applied science. It reveals much about our human nature, and
specifically provides hints as to how the brain works. From a more applied
H Exercises

Exercise III.1 Compute the probability of measuring |0⟩ and |1⟩ for each
of the following quantum states:

1. $0.6|0\rangle + 0.8|1\rangle$.

2. $\frac{1}{\sqrt{3}}|0\rangle + \sqrt{2/3}\,|1\rangle$.

3. $\frac{\sqrt{3}}{2}|0\rangle - \frac{1}{2}|1\rangle$.

4. $-\frac{1}{25}(24|0\rangle - 7|1\rangle)$.

5. $-\frac{1}{\sqrt{2}}|0\rangle + \frac{e^{i\pi/6}}{\sqrt{2}}|1\rangle$.

Exercise III.2 Compute the probability of the four states if the following
are measured in the computational basis:

1. $(e^{i}|00\rangle + \sqrt{2}\,|01\rangle + \sqrt{3}\,|10\rangle + 2e^{2i}|11\rangle)/\sqrt{10}$.

2. $\frac{1}{2}(-|0\rangle + |1\rangle) \otimes (e^{\pi i}|0\rangle + e^{-\pi i}|1\rangle)$.

3. $\left(\sqrt{1/3}\,|0\rangle - \sqrt{2/3}\,|1\rangle\right) \otimes
   \left(\frac{e^{\pi i/4}}{\sqrt{2}}|0\rangle + \frac{e^{\pi i/2}}{\sqrt{2}}|1\rangle\right)$.
2. Do the same, but supposing instead that we measure just the second
qubit.
Exercise III.5 Prove that a normal matrix is Hermitian if and only if it has
real eigenvalues.
Exercise III.6 Prove that $U(t) \stackrel{\rm def}{=} \exp(-iHt/\hbar)$ is unitary.
Exercise III.8 Show that the commutator $[L, M]$ and anticommutator
$\{L, M\}$ are bilinear (linear in both of their arguments).
Exercise III.11 Show that the four Bell states are orthonormal (i.e., both
orthogonal and normalized).
Exercise III.15 Prove that I, X, Y , and Z are unitary. Use either the imag-
inary or real definition of Y (C.2.a, p. 105).
Exercise III.17 Show that the X, Y, Z and H gates are Hermitian (their
own inverses) and prove your answers. Use either the imaginary or real
definition of Y (C.2.a, p. 105).
Exercise III.20 Prove that the Pauli matrices span the space of 2 × 2 ma-
trices.
Exercise III.21 Prove $|\beta_{xy}\rangle = (P \otimes I)|\beta_{00}\rangle$, where $xy = 00, 01, 11, 10$ for
$P = I, X, Y, Z$, respectively.
Exercise III.22 Suppose that P is one of the Pauli operators, but you don’t
know which one. However, you are able to pick a 2-qubit state |ψ₀⟩ and
operate on it, |ψ₁⟩ = (P ⊗ I)|ψ₀⟩. Further, you are able to select a unitary
operation U to apply to |ψ₁⟩, and to measure the 2-qubit result, |ψ₂⟩ = U|ψ₁⟩,
in the computational basis. Select |ψ₀⟩ and U so that you can determine with
certainty the unknown Pauli operator P.
Exercise III.23 What is the matrix for CNOT in the standard basis? Prove
your answer.
Exercise III.24 Show that CNOT does not violate the No-cloning Theorem
by showing that, in general, CNOT|ψ⟩|0⟩ ≠ |ψ⟩|ψ⟩. Under what conditions
does the equality hold?
Exercise III.27
2. Give the probabilities and resulting states for measuring the first two
qubits in the computational basis.
Exercise III.28 What is the matrix for CCNOT in the standard basis?
Prove your answer.
Exercise III.29 Use a single Toffoli gate to implement each of NOT, NAND,
and XOR.
Exercise III.31 Design a quantum circuit to transform |000⟩ into the en-
tangled state $\frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$.
Exercise III.40 Recall the conditional selection between two operators (C.3,
p. 111): |0⟩⟨0| ⊗ U₀ + |1⟩⟨1| ⊗ U₁. Suppose the control bit is a superposition
|χ⟩ = a|0⟩ + b|1⟩. Show that:
$$(|0\rangle\langle 0| \otimes U_0 + |1\rangle\langle 1| \otimes U_1)(|\chi\rangle \otimes |\psi\rangle)
= a|0\rangle \otimes U_0|\psi\rangle + b|1\rangle \otimes U_1|\psi\rangle.$$
Exercise III.41 Show that the 1-bit full adder (Fig. III.15, p. 113) is cor-
rect.
Exercise III.44 Verify the remaining decoding cases for quantum telepor-
tation (Sec. C.6.b, p. 121).
Exercise III.46 Complete the following step from the derivation of the
Deutsch-Jozsa algorithm (Sec. D.1, p. 129):
$$H|x\rangle = \frac{1}{\sqrt{2}} \sum_{z \in \mathbf{2}} (-1)^{xz} |z\rangle.$$
Exercise III.48 Show that the Fourier transform matrix (Eq. III.25, p. 137,
Sec. D.3.a) is unitary.
Exercise III.49 Prove the claim on page 153 (Sec. D.4.b) that D is unitary.
Exercise III.50 Prove the claim on page 153 (Sec. D.4.b) that
$$W R_0 W = \begin{pmatrix}
\frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N} \\
\frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N}
\end{pmatrix}.$$
Exercise III.51 Show that if there are s solutions x such that P(x) = 1,
then $\frac{\pi}{4}\sqrt{N/s}$ is the optimal number of iterations in Grover’s algorithm.
Exercise III.52 Design a quantum gate array for the following syndrome
extraction operator (Sec. D.5.d, p. 163):
$$S|x_3, x_2, x_1, 0, 0, 0\rangle \stackrel{\rm def}{=}
|x_3, x_2, x_1,\ x_1 \oplus x_2,\ x_1 \oplus x_3,\ x_2 \oplus x_3\rangle.$$
Exercise III.53 Design a quantum gate array to apply the appropriate error
correction for the extracted syndrome, as given in Sec. D.5.d (p. 163).
Exercise III.56 Prove that $A_{ab,c} = 1 + a^\dagger a\, b^\dagger b\, (c + c^\dagger - 1) = 1 + N_a N_b (A_c - 1)$
is a correct definition of CCNOT by showing how it transforms the quantum
register |a, b, c⟩ (Sec. F.1.b).
$$q^\dagger c p + r^\dagger c^\dagger p + p^\dagger c^\dagger q + p^\dagger c r.$$
Chapter IV
Molecular Computation
These lecture notes are exclusively for the use of students in Prof. MacLen-
nan’s Unconventional Computation course.
© 2018, B. J. MacLennan, EECS, University of Tennessee, Knoxville. Version of November 17, 2018.
A Basic concepts
A.1 Introduction to molecular computation
Molecular computation uses molecules to represent information and molec-
ular processes to implement information processing. DNA computation is
the most important (and most explored) variety of molecular computation,
and it is the principal topic of this chapter. The principal motivation is that
molecular computation is massively parallel (molar or Avogadro-scale par-
allelism, that is, on the order of 10²⁶). Data representations are also very
dense (on the order of nanometers), and molecular computation can be very
efficient in terms of energy.
Fig. IV.4 gives some idea what the actual double helix looks like. It is
worth keeping in mind that it is a very complex organic molecule, not a
string of the letters A, C, T, G. For example, in addition to the double helix
formed by the backbones of the two strands, there are two grooves between the
strands (Fig. IV.5). They are adjacent to the bases and therefore provide
binding sites for proteins. Because the backbones are not symmetrically
placed, the grooves have two sizes: minor (1.2 nm) and major (2.2 nm). The
latter provides the better binding site.
To give some sense of the scale, here are some numbers. Adjacent nu-
cleotides are spaced about 0.33 nm apart. The chain is 2.2 to 2.6 nm wide.
The radius of the coil is about 1 nm. The longest human chromosome (chro-
mosome 1) is about 220 × 10⁶ bp (base pairs) long.
DNA data storage: In Aug. 2012 a team at Harvard reported convert-
ing a 53,000-word book to DNA and then reading it out by DNA sequenc-
ing.2 This is the densest consolidation of data in any medium (including
flash memory, but also compared to experimental media, such as quantum
holography).
2
Church, George M., Gao, Yuan, and Kosuri, Sriram. “Next-generation digital
information storage in DNA.” Science 337 (28 Sept. 2012): 1628. See also Sci-
ence (Aug. 16, 2012), DOI: 10.1126/science.1226355 (accessed 2012-08-24). See also
http://spectrum.ieee.org/biomedical/imaging/reading-and-writing-a-book-with-dna/
(accessed 2012-08-24).
A.3.a Denaturation

Denaturation causes a DNA double strand to unwind and separate into its
two single strands. This occurs by breaking the H-bonds between the bases,
which are weak compared to covalent bonds (e.g., those in the backbone).
Thermal denaturing (heating to about 95°C) is the most common procedure,
but there are also chemical denaturing agents.
A.3.b Hybridization

Hybridization is the process by which two DNA single strands combine in
a sequence-specific way through the formation of non-covalent H-bonds be-
tween the bases. Annealing is used to achieve an energetically favorable result
(a minimal number of mismatches).
3
Church, op. cit., Supplementary Material
A.3.d Ligation

Hybridization links two strands together by H-bonds, but there is a nick in the
DNA backbone resulting from a missing phosphodiester bond. Certain en-
zymes will ligate the strands by filling in the missing covalent bonds, making
a more stable strand.
A.3.f Synthesis

Synthesis is the process of assembling single-stranded oligonucleotides (“oli-
gos”), which are short strands with a desired sequence. Assembly occurs in
the 3′ to 5′ direction. The process is completely automated nowadays, but due
to the accumulation of errors, the current limit is about 200 nucleotides.
The sample is put in a well at one end of a layer of gel, and a positive
electric field is applied at the other end of the gel. Since DNA is negatively
charged, it migrates through the gel, but smaller fragments migrate faster.
Later the DNA can be stained and the gel imaged to determine the relative
concentration of fragments of different sizes in the sample. The resulting
pattern is called a DNA ladder; see Fig. IV.6 for several “lanes” of DNA
ladders.
A.3.i Filtering
There are several ways to filter DNA samples for specific sequences. One way
to select for an oligonucleotide o in a sample S is to attach the complementary
sequence ō to a filter. Then when S goes through the filter, the o in it will
hybridize with the ō in the filter, and stay there; what passes through is
S − o. If desired, the oō on the filter can be denatured to separate the
strands containing o from the ō on the filter.
A.3.j Modification
Certain enzymes can be used to change subsequences of a DNA sequence.
A.3.k Sequencing
Sequencing is the process of reading out the sequence of a DNA molecule.
This is traditionally accomplished by extension of a primed template se-
quence, but a variety of other techniques have been developed since the 1970s.
The latest techniques do massively parallel sequencing of many fragments.
B Filtering models
Filtering models, an important class of DNA algorithms, operate by filtering
out of the solution molecules that are not part of the desired result. A
chemical solution can be treated mathematically as a finite bag or multi-set
of molecules, and filtering operations can be treated as operations to produce
multi-sets from multi-sets. Typically, for a problem of size n, strings of size
O(n) are required. The chemical solution should contain enough strings
to include many copies of all possible answers. Therefore, for an exponential
problem, we will have O(kⁿ) strings. Filtering is essentially a brute-force
method of solving problems.
[Figure: the seven-vertex directed graph used in Adleman’s experiment,
with vertices numbered 1–7.]
algorithm Adleman:
Step 2: Amplify the concentration of paths beginning with vin and ending
with vout. This is done by PCR using vin and vout as primers. Remember
that denaturation separates the sense and antisense strands. PCR extends
the sense strand in the 3′ direction from vin, and extends the antisense strand
in the 3′ direction from vout. At the end of this step we have paths of all
sorts from vin to vout.
Step 3: Only paths with the correct length are retained; for n = 7 this
is 140bp. This operation is accomplished by gel electrophoresis. The band
corresponding to 140bp is determined by comparison with a marker lane,
and the DNA is extracted from this band and amplified by PCR. Gel elec-
trophoresis and PCR are repeated to get a sufficient quantity of the DNA.
We now have paths from vin to vout , but they might not be Hamiltonian.
Step 4 (affinity purification): Select for paths that contain all the ver-
tices (and are thus necessarily Hamiltonian). This is done by first selecting
all those paths that contain v1 , and then, of those, all that contain v2 , and
so forth. To select for vi , first heat the solution to separate the strands.
Then add the vi bound to a magnetic bead. Rehybridize (so the beads are
bound to strands containing vi ), and use a magnet to extract all the paths
containing vi . Repeat this process for each vertex.
Step 5: If there are any paths left, they are Hamiltonian. Therefore amplify
them by PCR and inspect the result by gel electrophoresis to see if there are any
strands of the correct length. If there are, then there is a Hamiltonian path;
if there aren’t, then there is not. If desired, the precise HP can be determined
by a graduated PCR procedure: Run n − 1 parallel PCR reactions. In the
ith lane, vin is the left primer and vi is the right primer. This will produce
bands with lengths 40, 60, 80, 100, 120, and 140 bp. The lane that has a
band at 40 corresponds to the first vertex after vin in the path, the lane with
a band at 60 corresponds to the next vertex, etc. This final readout process
depends on there being only one Hamiltonian path, and it is error-prone due
to its dependence on PCR.
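Because the filtering steps are just operations on multisets of strands, their
logic (though not the chemistry or its error behavior) can be sketched in a
few lines of Python. The edge set below is an illustrative assumption, not
Adleman’s actual graph:

import random

# A sketch of the *logic* of Adleman's filtering steps, not the chemistry.
edges = {(1,2),(1,3),(2,3),(2,5),(3,4),(3,6),
         (4,1),(4,5),(5,2),(5,6),(6,2),(6,7)}   # assumed example graph
n, v_in, v_out = 7, 1, 7

def random_paths(trials=100_000):
    """Step 1: hybridization/ligation forms random paths in the graph."""
    soup = []
    for _ in range(trials):
        path = [v_in]
        while len(path) < n and random.random() < 0.9:
            nxt = [w for (v, w) in edges if v == path[-1]]
            if not nxt:
                break
            path.append(random.choice(nxt))
        soup.append(tuple(path))
    return soup

soup = random_paths()
# Step 2 (PCR): keep paths that begin at v_in and end at v_out.
soup = [p for p in soup if p[0] == v_in and p[-1] == v_out]
# Step 3 (gel electrophoresis): keep paths of exactly n vertices (140 bp).
soup = [p for p in soup if len(p) == n]
# Step 4 (affinity purification): keep paths containing every vertex.
for v in range(1, n + 1):
    soup = [p for p in soup if v in p]
# Step 5 (detect): any survivors are Hamiltonian paths.
print("Hamiltonian path!" if soup else "no path", soup[:1])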
B.1.d Discussion
Adleman’s algorithm is linear in the number of nodes, since the only iteration
is Step 4, which is repeated for each vertex. Step 5 is also linear if the path
is read out. Thanks to the massive parallelism of molecular computation,
it solves this NP-complete problem in linear time. Adleman’s experiment
took about a week, but with a more automated approach it could be done
in a few hours. On the other hand, the PCR process cannot be significantly
shortened.
In addition to time, we need to consider the molecular resources required.
The number of different oligos required is proportional to n, but the number
of strands is much larger, since there must be multiple instances of each
possible path. If d is the average degree of the graph, then there are about
dⁿ possible paths (exponential in n). For example, if d = 10 and n = 80, then
the required 10⁸⁰ DNA molecules would exceed the estimated number of atoms
in the universe. Hartmanis calculated that for n = 200 the weight of the DNA
would exceed the weight of the earth. So this brute-force approach is still
defeated by exponential explosion. Lipton (1995) estimates that Adleman’s
algorithm is feasible for n ≤ 70, based on an upper limit of 10²¹ ≈ 2⁷⁰ DNA
strands (Boneh, Dunworth, Lipton & Sgall, 1996), but this is also feasible on
conventional computers.
Nevertheless, Adleman’s algorithm illustrates the massive parallelism of
molecular computation. Step 1 (generation of all possible paths) took about
an hour for n = 7. Adleman estimates that about 10¹⁴ ligation operations
were performed, and that it could be scaled up to 10²⁰ operations. There-
fore, speeds of about 10¹⁵ to 10¹⁶ ops/sec (1–10 peta-operations/s) should be
achievable, which is in the digital supercomputer range. Adleman also esti-
mates that 2 × 10¹⁹ ligation operations were performed per joule of energy.
Contemporary supercomputers perform only 10⁹ operations per joule, so
molecular computation is about 10¹⁰ times more energy-efficient. It is near
the thermodynamic limit of 3.4 × 10²⁰ operations per joule. Recall (Ch. II,
Sec. B.1) that kT ln 2 ≈ 3 × 10⁻⁹ pJ = 3 × 10⁻²¹ J, so there can be about
3.3 × 10²⁰ bit changes/J.5
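These figures are easy to check numerically (a quick sketch; room tempera-
ture is assumed):

import math

# A quick numeric check of the energy figures quoted above.
k, T = 1.380649e-23, 300.0          # Boltzmann constant (J/K), assumed temp (K)
e_bit = k * T * math.log(2)         # von Neumann-Landauer energy per bit
print(e_bit)                        # ~2.9e-21 J, i.e. ~3e-9 pJ
print(1 / e_bit)                    # ~3.5e20 bit changes per joule
print((1 / e_bit) / 2e19)           # ligation is within ~20x of the limit
print(2e19 / 1e9)                   # ~1e10 advantage over 1e9 ops/J digital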
A more pervasive problem is the inherent error in the filtering processes
(due to incorrect hybridization). Some strands we don’t want get through,
and some that we do want don’t. With many filtering stages the errors
accumulate to the extent that the algorithms fail. There are some approaches
to error-resistant DNA computing, but this is an open problem.
5
DNA is of course space efficient. One bit of information occupies about 1 cubic nm,
whereas contemporary disks store a bit in about 1010 cubic nm. That is, DNA is a 1010
improvement in density.
Figure IV.9: Graph G2 for Lipton’s algorithm (with two variables, x and y).
[source: Lipton (1995)]
algorithm Lipton:
Let ℓ = |Ck+1| (the number of literals in Ck+1), and suppose Ck+1 has the
form v1 ∨ · · · ∨ vℓ, where the vi are literals (positive or negative variables).
Our goal is to find all strings that satisfy at least one of these literals. To
Positive literal: Suppose vi = xj (some positive literal). Let $T_k^i = E(\overline{T}_k^{i-1}, j, 1)$
and let a = 1 (used below). These are the paths that satisfy this positive
literal, since they have 1 in position j.

In either case, $T_k^i$ are the strings that satisfy literal i of the clause. Let
$\overline{T}_k^i = E(\overline{T}_k^{i-1}, j, \neg a)$ be the remaining strings (which do not satisfy this
literal). Continue the process above until all the literals in the clause are
processed. At the end, for each $i = 1, \ldots, \ell$, $T_k^i$ will contain the strings that
satisfy literal i of clause k.

Combine $T_k^1, \ldots, T_k^\ell$ into $T_{k+1}$. (Combining the test tubes effectively does
OR.) These will be the strings that satisfy at least one of the literals in clause
k + 1.
Step 3 (detection): At this point, the strings in Tm (if any) are those that
satisfy C1 , . . . , Cm , so do a detect operation (for example, with PCR and gel
electrophoresis) to see if there are any strings left. If there are, the formula
is satisfiable; if there aren’t, then it is not.
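The extract-and-merge logic of the clause loop can be sketched in Python
(a toy simulation; the formula and the E and merge functions are my as-
sumed analogues of the operations described above):

from itertools import product

def E(t, j, a):
    """Extract strings in tube t whose bit at position j equals a."""
    return [s for s in t if s[j] == a]

def merge(*tubes):
    return [s for t in tubes for s in t]

# Assumed example formula: (x0 or ~x1) and (x1 or x2),
# as lists of (variable index, satisfying value) pairs.
clauses = [[(0, 1), (1, 0)], [(1, 1), (2, 1)]]

tube = list(product((0, 1), repeat=3))       # all 2^n candidate strings
for clause in clauses:
    remaining, satisfied = tube, []
    for (j, a) in clause:
        satisfied += E(remaining, j, a)      # strings satisfying this literal
        remaining = E(remaining, j, 1 - a)   # pass the rest to the next one
    tube = merge(satisfied)                  # combining tubes does OR
print("satisfiable" if tube else "unsatisfiable", tube)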
If the number of literals per clause is fixed (as in the 3-SAT problem),
then performance is linear in m. The main problem with this algorithm is the
effect of errors, but imperfections in extraction are not fatal, so long as there
are enough copies of the desired sequence. In 2002, Adleman’s group solved
a 20-variable 3-SAT problem with 24 clauses, finding the unique satisfying
string.7 In this case the number of possible solutions is 2²⁰ ≈ 10⁶. Since the
degree of the specialized graph used for this problem is small, the number
of possible paths is not excessive (as it might be in the Hamiltonian Path
Problem). They stated, “This computational problem may be the largest
yet solved by nonelectronic means,” and they conjectured that their method
might be extended to 30 variables.
7
Ravinderjit S. Braich, Nickolas Chelyapov, Cliff Johnson, Paul W. K. Rothemund,
Leonard Adleman, “Solution of a 20-Variable 3-SAT Problem on a DNA Computer,”
Science 296 (19 Apr. 2002), 499–502.
$$-(t, w) \stackrel{\rm def}{=} t - +(t, w) \quad \text{(multi-set difference)}.$$

Merge: The merge operation combines several test tubes into one test
tube:
$$\cup(t_1, t_2, \ldots, t_n) \stackrel{\rm def}{=} t_1 \cup t_2 \cup \cdots \cup t_n.$$

Detect: The detect operation determines if any DNA strings remain in
a test tube:
$$\mathrm{detect}(t) \stackrel{\rm def}{=}
\begin{cases} \mathbf{true}, & \text{if } t \neq \emptyset, \\
\mathbf{false}, & \text{otherwise.} \end{cases}$$

Amplify: Given a test tube t, the amplify operation produces two copies
of it: t′, t″ ← amplify(t). Amplification is a problematic operation, which
depends on the special properties of DNA and RNA, and it may be error-
prone. Therefore it is useful to consider a restricted model of DNA computing
that avoids or minimizes the use of amplification.

The following additional operations have been proposed:

Length-separate: This operation produces a test tube containing all
the strands of at most a specified length:
$$(t, \le n) \stackrel{\rm def}{=} \{s \in t \mid |s| \le n\}.$$
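As a sketch, these operations map naturally onto Python Counter multi-
sets (the Python analogues below are mine, not part of the formal model):

from collections import Counter

def extract(t, w):                 # +(t, w): strands containing substring w
    return Counter({s: c for s, c in t.items() if w in s})

def minus(t, w):                   # -(t, w) = t - +(t, w)
    return t - extract(t, w)

def merge(*tubes):                 # union of test tubes
    total = Counter()
    for tube in tubes:
        total += tube
    return total

def detect(t):                     # any strands left?
    return sum(t.values()) > 0

def length_separate(t, n):         # (t, <= n)
    return Counter({s: c for s, c in t.items() if len(s) <= n})

tube = Counter({"ACGT": 5, "CCCC": 3, "ACGTACGT": 2})
print(detect(minus(tube, "CG")), length_separate(tube, 4))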
B.3.b Examples

AllC: The following example algorithm detects if there are any sequences
that contain only C. The extraction operation E (as in Lipton’s algorithm)
is defined in terms of the separation operator:
$$E(t, j, 1) = +(t, x_j^1), \qquad E(t, j, 0) = +(t, x_j^0).$$
[Figure: panels (a)–(d), showing a primer block and polymerase extension.]
B.4.b Permutations
Amos et al. describe a PFM algorithm for generating all possible permuta-
tions of a set of integers.
algorithm Permutations:
9
Amos, p. 51.
Iteration:
for j = 1 to n − 1 do
    copy(U, {U1, U2, . . . , Un})
    for i = 1, 2, . . . , n and all k > j in parallel do
        remove(Ui, {pj i′ : i′ ≠ i} ∪ {pk i})
        // Ui contains i in the jth position and no other i
    end for
    union({U1, U2, . . . , Un}, U)
end for
Pn ← U
C Formal models
C.1 Sticker systems
C.1.a Basic operations
The sticker model was developed by Roweis et al. in the mid-1990s. It
depends primarily on separation by means of hybridization and makes no
use of strand extension and enzymes. It implements a sort of random-access
binary memory. Each bit position is represented by a substrand of length
m. A memory strand comprises k contiguous substrands, and so has length
n = km and can store k bits. Sticker strands or stickers are strands that are
complementary to substrands representing bits. When a sticker is bound to
a bit, it represents 1, and if no sticker is bound, the bit is 0. Such a strand,
which is partly double and partly single, is called a complex strand.
Computations begin with a prepared library of strings. A (k, l) library
uses the first l ≤ k bits as inputs to the algorithm, and the remaining k −l for
output and working storage. Therefore, the last k − l are initially 0. There
are four basic operations, which act on multi-sets of binary strings:
Merge: Creates the union of two tubes (multi-sets).
Separate: The operation separate(N, i) separates a tube N into two
tubes: +(N, i) contains all strings in which bit i is 1, and −(N, i) contains
all strings in which bit i is 0.
Set: The operation set(N, i) produces a tube in which every string from
N has had its ith bit set to 1.
Clear: The operation clear(N, i) produces a tube in which every string
from N has had its ith bit cleared to 0.
Step 1 (initialization): For all strands, if the i-th subset bit is set, then set
the bits for all the elements of that subset. (The first q bits of a strand
encode a choice among subsets C1, . . . , Cq of a p-element set; c_i^j denotes
the jth element of Ci.) Call the result tube N0. This is accomplished by
the following code:

Initialize (p + q, q) library in N0
for i = 1 to q do
    separate(+(N0, i), −(N0, i))    // separate those with subset i
    for j = 1 to |Ci| do
        set(+(N0, i), q + c_i^j)    // set bit for jth element of set i
    end for
    N0 ← merge(+(N0, i), −(N0, i))  // recombine
end for
Step 2 (retain covers): Retain only the strands that represent collections
that cover the set. To do this, retain in N0 only the strands whose last p bits
are set.
for i = q + 1 to q + p do
N0 ← +(N0 , i) //retain those with element i
end for
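Here is a minimal simulation of Steps 1 and 2 on an assumed instance (the
final minimal-cover selection is done directly in Python rather than by the
model’s further separations):

from itertools import product

# Assumed set cover instance: subsets C1..Cq over elements 1..p.
C = {1: [1, 3], 2: [2, 3], 3: [3, 4], 4: [1, 2, 3, 4]}
q, p = len(C), 4

# (p+q, q) library: first q bits choose subsets; last p bits start at 0.
N0 = [list(bits) + [0] * p for bits in product((0, 1), repeat=q)]

for i in range(1, q + 1):
    plus  = [s for s in N0 if s[i - 1] == 1]   # separate on subset bit i
    minus = [s for s in N0 if s[i - 1] == 0]
    for s in plus:                             # set bits of Ci's elements
        for e in C[i]:
            s[q + e - 1] = 1
    N0 = plus + minus                          # merge (recombine)

# Step 2: retain strands whose last p bits are all set (covers).
covers = [s for s in N0 if all(s[q + e] == 1 for e in range(p))]
best = min(covers, key=lambda s: sum(s[:q]))   # smallest cover, done directly
print(best[:q])                                # subset-choice bits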
[Figure: example trace of successive separate operations on bits 1–4, sorting
the strands representing the collections (3,4), (2,3), (2,3,4), (1,3), (1,3,4),
(1,2,3), and (1,2,3,4) into tubes N0–N4.]
$Rsplice(S, T) = S_1 T_2$, if $S_2 = T_1$,
Figure IV.12: The fokI restriction enzyme bound to DNA. [source: wikipedia]
D Enzymatic computation
The molecular computation processes that we have seen so far are externally
controlled by a person or conventional automatic controller sequencing the
chemical operations.11 In autonomous molecular computation the chemical
processes sequence themselves so that they do not require external control.
This is also called “one-pot” molecular computation; that is, you put all the
reactants in one pot, and the molecular processes do the rest. Autonomous
molecular computation is essential, for example, for controlling drug delivery in
the body.
Shapiro and his colleagues have demonstrated how to implement finite
state machines (FSMs) by autonomous molecular computation. In addition
to DNA, it uses a restriction enzyme, ligase, and ATP (for fuel).
The implementation is based on the fokI restriction enzyme.12
11
This section is based primarily on: (1) Yaakov Benenson, Tamar Paz-Elizur, Rivka
Adar, Ehud Keinan, Zvi Livneh, and Ehud Shapiro. Programmable and autonomous
computing machine made of biomolecules. Nature, 414:430–434, 2001.
(2) Yaakov Benenson, Rivka Adar, Tamar Paz-Elizur, Zvi Livneh, and Ehud Shapiro. DNA molecule
provides a computing machine with both data and fuel. Proceedings of the National
Academies of Science, 100(5):2191–2196, 2003.
fokI cuts at a fixed offset from its recognition site (GGATG): nine bases
downstream on the recognition strand and thirteen on the complementary
strand, leaving a four-base sticky end:

GGATGNNNNNNNNN     NNNNNNNNNN
CCTACNNNNNNNNNNNNN NNNNNN
where [q, s] represents a DNA sequence encoding both state q and symbol
s, and t is a terminator for the string. The fokI enzyme cleaves off [q, s1 ] in
such a way that a transition molecule can bind to the sticky end in a way
that encodes [q 0 , s2 ]. A special terminator symbol marks the end of the input
string.
As an example we will consider a two-state FSM on {a, b} that accepts
strings with an even number of ‘b’s. Ignoring the terminator, DNA codes are
assigned to the two symbols ‘a’ and ‘b’ as follows:

a ↦ AAααaa
b ↦ BBββbb

12
wikipedia, s.v. fokI.
where A, α, a, B, β, b are unspecified (by me) bases.13 The bases are selected
in such a way that either the first four bases (AAαα, BBββ) or the last four
bases (ααaa, ββbb) encode the symbol. These alternatives represent the two
machine states.
The transition molecules are constructed so that the distance between
the recognition site (for fokI) and the next symbol depends on new state. As
a consequence, when fokI operates it cleaves the next symbol code at a place
that depends on the state. Therefore the sticky end encodes the state in the
way that it represents the next symbol:
[q0, a] ↦ ααaa
[q1, a] ↦ AAαα
[q0, b] ↦ ββbb
[q1, b] ↦ BBββ

The transition molecules are:

(q0, a) → q0:  GGATGNNN
               CCTACNNNααaa

(q1, a) → q1:  GGATGNNN
               CCTACNNNAAαα

(q0, b) → q1:  GGATGNNNNN
               CCTACNNNNNββbb

(q1, b) → q0:  GGATGN
               CCTACNBBββ
The Ns represent any bases as before. They are used as spacers to adjust
the restriction site to represent the new state.
After the transition to the new state, the sense strand again has the same
general form, with the code for the next symbol (for convenience, assume it
is ‘a’) in place for the next cleavage.
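Abstracting away the chemistry, the control logic just described (the state
is carried by where fokI cleaves; a transition molecule binds the exposed
sticky end) can be sketched as follows. The encoding strings follow the table
above, but the simulation itself is illustrative:

# A sketch of the logic of the enzymatic FSM: the current state is
# encoded by *which four* of the six bases of the next symbol's code
# are exposed as the sticky end after fokI cleavage.
code = {"a": "AAααaa", "b": "BBββbb"}

def sticky(state, symbol):
    """Sticky end exposed by cleavage: q0 exposes the last four bases,
    q1 the first four, per the table above."""
    s = code[symbol]
    end = s[2:] if state == "q0" else s[:4]
    assert len(end) == 4
    return end

# Transition molecules: which sticky end they bind and the next state.
transition = {("q0", "a"): "q0", ("q1", "a"): "q1",
              ("q0", "b"): "q1", ("q1", "b"): "q0"}

def run(inp):
    state = "q0"
    for sym in inp:
        end = sticky(state, sym)          # fokI exposes this sticky end
        state = transition[(state, sym)]  # matching molecule binds
    return state == "q0"                  # accept: even number of b's

print(run("abba"), run("ab"))             # True, False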
The longest strings processed in the PNAS experiments were 12 symbols
long.14 Operation required about 20 seconds per step. However, the parallel speed was
about 6.6 × 1010 ops/s/µl. Energy consumption was about 34kT per tran-
sition, which is only about 50× the von Neumann-Landauer limit (kT ln 2).
The authors note, “Reaction rates were surprisingly insensitive to tempera-
ture and remained similar over the range of 2–20°C.” This implementation
also handles nondeterministic FSMs (just put in all the transition molecules),
but the yield decreases exponentially (due to following out all the nondeter-
ministic paths, breadth-first). Therefore it doesn’t seem to be practical for
nondeterministic machines.
14
Benenson et al., PNAS 100 (5), March 4, 2003.
Chapter V
Analog Computation
These lecture notes are exclusively for the use of students in Prof. MacLen-
nan’s Unconventional Computation course.
© 2018, B. J. MacLennan, EECS, University of Tennessee, Knoxville. Version of November 17, 2018.
A Definition
Although analog computation was eclipsed by digital computation in the
second half of the twentieth century, it is returning as an important alterna-
tive computing technology. Indeed, as explained in this chapter, theoretical
results imply that analog computation can escape from the limitations of
digital computation. Furthermore, analog computation has emerged as an
important theoretical framework for discussing computation in the brain and
other natural systems.
Analog computation gets its name from an analogy, or systematic rela-
tionship, between the physical processes in the computer and those in the
system it is intended to model or simulate (the primary system). For exam-
ple, the electrical quantities voltage, current, and conductance might be used
as analogs of the fluid pressure, flow rate, and pipe diameter of a hydraulic sys-
tem. More specifically, in traditional analog computation, physical quantities
in the computation obey the same mathematical laws as physical quantities
in the primary system. Thus the computational quantities are proportional
1
This chapter is based on an unedited draft for an article that appeared in the Ency-
clopedia of Complexity and System Science (Springer, 2008).
B Introduction
B.1 History
B.1.a Pre-electronic analog computation
Just like digital calculation, analog computation was originally performed by
hand. Thus we find several analog computational procedures in the “con-
structions” of Euclidean geometry (Euclid, fl. 300 BCE), which derive from
techniques used in ancient surveying and architecture. For example, Problem
II.11 is “to divide a given straight line into two parts, so that the rectangle
contained by the whole and one of the parts shall be equal to the square
of the other part.” Also, Problem VI.13 is “to find a mean proportional
between two given straight lines” (Fig. V.1), and VI.30 is “to cut a given
straight line in extreme and mean ratio.” These procedures do not make
use of measurements in terms of any fixed unit or of digital calculation; the
lengths and other continuous quantities are manipulated directly (via com-
pass and straightedge). On the other hand, the techniques involve discrete,
precise operational steps, and so they can be considered algorithms, but over
continuous magnitudes rather than discrete numbers.

Figure V.1: Euclid Problem VI.13: To find a mean proportional between two
given straight lines. Solution: Take on any line AC parts AB, BC respectively
equal to X, Y. On AC describe a semicircle ADC. Erect BD at right angles to
AC, meeting the semicircle in D. BD will be the mean proportional required.
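Why BD is the mean proportional (a standard verification, not in the notes),
in the notation of Fig. V.1: the angle ADC is inscribed in a semicircle and
hence is a right angle (Thales), so the altitude BD divides triangle ADC into
similar right triangles ABD and DBC. Therefore

$$\frac{AB}{BD} = \frac{BD}{BC}
\quad\Longrightarrow\quad BD^2 = AB \cdot BC = XY
\quad\Longrightarrow\quad BD = \sqrt{XY}.$$

In effect the construction computes a geometric mean (and, taking Y to be
a unit length, a square root) by direct manipulation of continuous
magnitudes.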
It is interesting to note that the ancient Greeks distinguished continuous
magnitudes (Grk., megethoi), which have physical dimensions (e.g., length,
area, rate), from discrete numbers (Grk., arithmoi), which do not (Maziarz &
Greenwood, 1968). Euclid axiomatizes them separately (magnitudes in Book
V, numbers in Book VII), and a mathematical system comprising both dis-
crete and continuous quantities was not achieved until the nineteenth century
in the work of Weierstrass and Dedekind.
The earliest known mechanical analog computer is the “Antikythera mech-
anism,” which was found in 1900 in a shipwreck under the sea near the Greek
island of Antikythera (between Kythera and Crete) (Figs. V.3, V.4). It dates
to the second century BCE and appears to be intended for astronomical cal-
culations. The device is sophisticated (at least 70 gears) and well engineered,
suggesting that it was not the first of its type, and therefore that other analog
computing devices may have been used in the ancient Mediterranean world
(Freeth et al., 2006). Indeed, according to Cicero (Rep. 22) and other au-
thors, Archimedes (c. 287–c. 212 BCE) and other ancient scientists also built
analog computers, such as armillary spheres, for astronomical simulation and
computation. Other antique mechanical analog computers include the astro-
labe, which is used for the determination of longitude and a variety of other
astronomical purposes, and the torquetum, which converts astronomical mea-
surements between equatorial, ecliptic, and horizontal coordinates.
A class of special-purpose analog computer, which is simple in conception
but may be used for a wide range of purposes, is the nomograph (also, nomo-
gram, alignment chart) (Fig. V.5). In its most common form, it permits the
solution of quite arbitrary equations in three real variables, f (u, v, w) = 0.
The nomograph is a chart or graph with scales for each of the variables;
typically these scales are curved and have non-uniform numerical markings.
Given values for any two of the variables, a straightedge is laid across their
positions on their scales, and the value of the third variable is read off where
the straightedge crosses the third scale. Nomographs were used to solve many
problems in engineering and applied mathematics. They improve intuitive
understanding by allowing the relationships among the variables to be visu-
alized, and facilitate exploring their variation by moving the straightedge.
Lipka (1918) is an example of a course in graphical and mechanical methods
of analog computation, including nomographs and slide rules.
Until the introduction of portable electronic calculators in the early 1970s,
the slide rule was the most familiar analog computing device. Slide rules use
logarithms for multiplication and division, and they were invented in the early
seventeenth century shortly after John Napier’s description of logarithms.
The mid-nineteenth century saw the development of the field analogy
method by G. Kirchhoff (1824–87) and others (Kirchhoff, 1845). In this ap-
proach an electrical field in an electrolytic tank or conductive paper was
used to solve two-dimensional boundary problems for temperature distribu-
tions and magnetic fields (Small, 2001, p. 34). It is an early example of
analog field computation, which operates on continuous spatial distributions
of quantity (i.e., fields).
In the nineteenth century a number of mechanical analog computers were
developed for integration and differentiation (e.g., Lipka 1918, pp. 246–56,
Clymer 1993). For example, the planimeter measures the area under a curve
or within a closed boundary (Fig. V.6). While the operator moves a pointer
along the curve, a rotating wheel accumulates the area. Similarly, the inte-
graph is able to draw the integral of a given function as its shape is traced
(Fig. V.7). Other mechanical devices can draw the derivative of a curve or
compute a tangent line at a given point.

Figure V.6: Planimeter for measuring the area inside an arbitrary curve
(1908). [source: wikipedia]

Figure V.7: Integraph for drawing the integral of an arbitrary curve (Lipka,
1918).
data were entered in the form of continuous curves, and the machine auto-
matically plotted the output curves continuously as the equations were inte-
grated. Similar differential analyzers were constructed at other laboratories
in the US and the UK.
Setting up a problem on the MIT differential analyzer took a long time;
gears and rods had to be arranged to define the required dependencies among
the variables. Bush later designed a much more sophisticated machine, the
Rockefeller Differential Analyzer, which became operational in 1947. With
18 integrators (out of a planned 30), it provided programmatic control of ma-
chine setup, and permitted several jobs to be run simultaneously. Mechanical
differential analyzers were rapidly supplanted by electronic analog comput-
ers in the mid-1950s, and most were disassembled in the 1960s (Bowles 1996,
Owens 1986, Small 2001, pp. 50–5).
During World War II, and even later wars, an important application
of optical and mechanical analog computation was in “gun directors” and
“bomb sights,” which performed ballistic computations to accurately target
artillery and dropped ordnance.
Although it is commonly believed that analog computers quickly disap-
peared after digital computers became available, this is inaccurate, for both
general-purpose and special-purpose analog computers have continued to be
used in specialized applications to the present time. For example, a general-
purpose electrical (vs. electronic) analog computer, the Anacom, was still
in use in 1991. This is not technological atavism, for “there is no doubt
considerable truth in the fact that Anacom continued to be used because it
effectively met a need in a historically neglected but nevertheless important
computer application area” (Aspray, 1993). As mentioned, the reasons for
the eclipse of analog computing were not simply the technological superiority
of digital computation; the conditions were much more complex. Therefore
a change in conditions has necessitated a reevaluation of analog technology.
Figure V.12: RASP 3.0 FPAA. “RASP 3.0 integrates divergent concepts from
multiple previous FPAA designs . . . along with low-power digital computa-
tion, including a 16-bit microprocessor (µP), interface circuitry, and DACs
+ ADCs. The FPAA SoC die measures 12 mm × 7 mm, fabricated in a
350-nm standard CMOS process. The die photo identifies the µP, SRAM
memory, DACs, and programming (DACs + ADC) infrastructure; the mixed
array of the FPAA fabric is composed of interdigitated analog (A) and digital
(D) configurable blocks on a single routing grid. DACs and programming in-
frastructure are accessed through memory-mapped registers.” (George et al.,
2016)
We have considered the continuous state space, which is the basis for analog
computing, but there are a variety of ways in which analog computers can
operate on the state. In particular, the state can change continuously in time
or be updated at distinct instants (as in digital computation).
Since the laws of physics on which analog computing is based are differential
equations, many analog computations proceed in continuous real time. Also,
as we have seen, an important application of analog computers in the late
19th and early 20th centuries was the integration of ODEs in which time
is the independent variable. A common technique in analog simulation of
physical systems is time scaling, in which the differential equations are altered
systematically so the simulation proceeds either more slowly or more quickly
than the primary system (see Sec. C.4 for more on time scaling). On the
other hand, because analog computations are close to the physical processes
that realize them, analog computing is rapid, which makes it very suitable
for real-time control applications.
In principle, any mathematically describable physical process operating
on time-varying physical quantities can be used for analog computation. In
practice, however, analog computers typically provide familiar operations
that scientists and engineers use in differential equations (Rogers & Con-
nolly, 1960; Truitt & Rogers, 1960). These include basic arithmetic opera-
tions, such as algebraic sum and difference (u(t) = v(t) ± w(t)), constant
multiplication or scaling (u(t) = cv(t)), variable multiplication and division
(u(t) = v(t)w(t), u(t) = v(t)/w(t)), and inversion (u(t) = −v(t)). Transcen-
dental functions may be provided, such as the exponential (u(t) = exp v(t)),
logarithm (u(t) = ln v(t)), trigonometric functions (u(t) = sin v(t), etc.), and
resolvers for converting between polar and rectangular coordinates. Most
important, of course, is definite integration, $u(t) = v_0 + \int_0^t v(\tau)\,d\tau$, but
differentiation may also be provided ($u(t) = \dot v(t)$). Generally, however, direct
differentiation is avoided, since noise tends to have a higher frequency than
the signal, and therefore differentiation amplifies noise; typically problems
are reformulated to avoid direct differentiation (Weyrick, 1969, pp. 26–7).
As previously mentioned, many GPACs include (arbitrary) function genera-
tors, which allow the use of functions defined only by a graph and for which
no mathematical definition might be available; in this way empirically defined
functions can be used (Rogers & Connolly, 1960, pp. 32–42). Thus, given a
graph (x, f (x)), or a sufficient set of samples, (xk , f (xk )), the function gen-
erator approximates u(t) = f (v(t)). Rather less common are generators for
arbitrary functions of two variables, u(t) = f (v(t), w(t)), in which the func-
tion may be defined by a surface, (x, y, f (x, y)), or by sufficient samples from
it.
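As a digital sketch of these two primitives (illustrative only; the sample
points and signals are assumptions), an integrator and a sample-defined
function generator might be composed like this:

import numpy as np

# A sketch of two analog primitives: a definite integrator and a
# function generator defined only by empirical samples.
dt = 1e-3
t = np.arange(0.0, 2.0, dt)

def integrate(v, v0=0.0):
    """u(t) = v0 + integral_0^t v(tau) dtau, by cumulative summation."""
    return v0 + np.cumsum(v) * dt

xk = np.linspace(-1.0, 1.0, 11)     # assumed sample points (x_k, f(x_k))
fk = xk ** 2                        # stands in for an empirically drawn curve

def fgen(v):
    """Function generator: u(t) = f(v(t)) by interpolating the samples."""
    return np.interp(v, xk, fk)

v = np.sin(2 * np.pi * t)           # input signal
u = integrate(fgen(v))              # compose generator and integrator
print(u[-1])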
Continuous-time recurrent neural networks obey equations of the general
additive form
$$\tau_i\, \dot u_i = -a_i u_i + b_i + \sum_j w_{ij}\, f(u_j),$$
where τi is a time constant, ai is the decay rate, bi is the bias, and wij is the
connection weight to neuron i from neuron j. In a Hopfield network every
neuron is symmetrically connected to every other (wij = wji) but not to itself
(wii = 0).
synchronously (that is, signals propagate through the layers, or back to earlier
layers, in lockstep).
Many artificial neural-net learning algorithms are also sequential-time
analog computations. For example, the back-propagation algorithm updates
a network’s weights, moving sequentially backward through the layers.
In summary, the correctness of sequential time computation (analog or
digital) depends on the order of operations, not on their duration, and sim-
ilarly the efficiency of sequential computations is evaluated in terms of the
number of operations, not on their total duration.
(see Sec. F.3 below), from a practical standpoint these constants could be
set with at most about four digits of precision (Rogers & Connolly, 1960,
p. 11). Indeed, automatic potentiometer-setting devices were constructed
that read a series of decimal numerals from punched paper tape and used
them to set the potentiometers for the constants (Truitt & Rogers, 1960,
pp. 3-58–60). Nevertheless it is worth observing that analog computers do
allow continuous inputs that need not be expressed in digital notation, for
example, when the parameters of a simulation are continuously varied by
the operator. In principle, therefore, an analog program can incorporate
constants that are represented by a real-valued physical quantity (e.g., an
angle or a distance), which need not be expressed digitally. Further, as we
have seen (Sec. B.1.b), some electronic analog computers could compute a
function by means of an arbitrarily drawn curve, that is, not represented by
an equation or a finite set of digitized points. Therefore, in the context of
analog computing it is natural to expand the concept of a program beyond
discrete symbols to include continuous representations (scalar magnitudes,
vectors, curves, shapes, surfaces, etc.).
Typically such continuous representations would be used as adjuncts to
conventional discrete representations of the analog computational process,
such as equations or diagrams. However, in some cases the most natural static
representation of the process is itself continuous, in which case it is more like
a “guiding image” than a textual prescription (MacLennan, 1995). A simple
example is a potential surface, which defines a continuum of trajectories from
initial states (possible inputs) to fixed-point attractors (the results of the
computations). Such a “program” may define a deterministic computation
(e.g., if the computation proceeds by gradient descent), or it may constrain
a nondeterministic computation (e.g., if the computation may proceed by
any potential-decreasing trajectory). Thus analog computation suggests a
broadened notion of programs and programming.
C.4.b Scaling
An important aspect of analog computing is scaling, which is used to adjust a
problem to an analog computer. First is time scaling, which adjusts a problem
to the characteristic time scale at which a computer operates, which is a
consequence of its design and the physical processes by which it is realized
(Peterson 1967, pp. 37–44, Rogers & Connolly 1960, pp. 262–3, Weyrick
1969, pp. 241–3). For example, we might want a simulation to proceed on
a very different time scale from the primary system. Thus a weather or
economic simulation should proceed faster than real time in order to get
useful predictions. Conversely, we might want to slow down a simulation of
protein folding so that we can observe the stages in the process. Also, for
accurate results it is necessary to avoid exceeding the maximum response rate
of the analog devices, which might dictate a slower simulation speed. On the
degrades, as is generally the case, then the system will be adaptive, in effect
continually searching out the shortest paths, so long as source continues to
function (Camazine et al., 2001). Simulated diffusion has been applied to
robot path planning (Khatib, 1986; Rimon & Koditschek, 1989).
It might seem that any continuous physical process could be viewed as analog
computation, which would make the term almost meaningless. As the ques-
tion has been put, is it meaningful (or useful) to say that the solar system
is computing Kepler’s laws? In fact, it is possible and worthwhile to make a
distinction between computation and other physical processes that happen
to be described by mathematical laws (MacLennan, 1994a,c, 2001, 2004).
equivalent models, such as the lambda calculus) are models of discrete com-
putation, and so it is natural to wonder how analog computing compares in
power, and in particular whether it can compute beyond the “Turing limit.”
Superficial answers are easy to obtain, but the issue is subtle because it de-
pends upon choices among definitions, none of which is obviously correct;
it involves the foundations of mathematics and its philosophy; and it raises
epistemological issues about the role of models in scientific theories. This is
an active research area, but many of the results are apparently inconsistent
due to the differing assumptions on which they are based. Therefore this
section will be limited to a mention of a few of the interesting results, but
without attempting a comprehensive, systematic, or detailed survey; Siegel-
mann (1999) can serve as an introduction to the literature.
We will mention a few of the results that have been obtained concerning the
power of sequential-time analog computation.
Although the BSS model has been investigated extensively, its power
has not been completely determined (Blum et al., 1998, 1988). It is known
to depend on whether just rational numbers or arbitrary real numbers are
allowed in its programs (Siegelmann, 1999, p. 148).
A coupled map lattice (CML) is a cellular automaton with real-valued
states; it is a sequential-time analog computer, which can be considered a
discrete-space approximation to a simple sequential-time field computer. Or-
ponen & Matamala (1996) showed that a finite CML can simulate a universal
Turing machine. However, since a CML can simulate a BSS program or a
recurrent neural network (see Sec. F.2.c below), it actually has super-Turing
power (Siegelmann, 1999, p. 149).
Recurrent neural networks are some of the most important examples of
sequential analog computers, and so the following section is devoted to them.
With the renewed interest in neural networks in the mid-1980s, many in-
vestigators wondered if recurrent neural nets have super-Turing power. M.
Garzon and S. Franklin showed that a sequential-time net with a countable
infinity of neurons could exceed Turing power (Franklin & Garzon, 1990; Gar-
zon & Franklin, 1989, 1990). Indeed, Siegelmann & Sontag (1994b) showed
that finite neural nets with real-valued weights have super-Turing power, but
Maass & Sontag (1999b) showed that recurrent nets with Gaussian or sim-
ilar noise had sub-Turing power, illustrating again the dependence of these
results on assumptions about what is a reasonable mathematical model of
analog computing.
For recent results on recurrent neural networks, we will restrict our at-
tention to the work of Siegelmann (1999), who addresses the computational
power of these networks in terms of the classes of languages they can rec-
ognize. Without loss of generality the languages are restricted to sets of
binary strings. A string to be tested is fed to the network one bit at a time,
along with an input that indicates when the end of the input string has been
reached. The network is said to decide whether the string is in the language if
it correctly indicates whether it is in the set or not, after some finite number
of sequential steps since input began.
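The protocol is easy to state in code. The toy Python recognizer below is
only a sketch of the input convention, not Siegelmann's construction: using
saturated-linear units, it decides the (regular, hence Turing-decidable) lan-
guage of strings with an odd number of 1s, consuming one bit per step and
answering once the end of input is reached.

def sat(x):
    # saturated-linear activation: 0 below 0, identity on [0, 1], 1 above 1
    return min(1.0, max(0.0, x))

def decide_odd_parity(bits):
    s = 0.0  # state neuron: parity of the bits seen so far
    for b in bits:  # one input bit per sequential step
        h = sat(s + b - 1.0)      # h = AND(s, b)
        s = sat(s + b - 2.0 * h)  # s = XOR(s, b)
    # here the end-of-input line would be raised; the network then
    # indicates membership: accept iff s == 1
    return s == 1.0

assert decide_odd_parity([1, 0, 1, 1]) is True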
Siegelmann shows that, if exponential time is allowed for recognition,
finite recurrent neural networks with real-valued weights (and saturated-
linear activation functions) can compute all languages, and thus they are
more powerful than Turing machines. Similarly, stochastic networks with
rational weights also have super-Turing power, although less power than the
deterministic nets with real weights. (Specifically, they compute P/POLY
and BPP/log∗ respectively; see Siegelmann 1999, chs. 4, 9 for details.) She
further argues that these neural networks serve as a “standard model” of
(sequential) analog computation (comparable to Turing machines in Church-
Turing computation), and therefore that the limits and capabilities of these
nets apply to sequential analog computation generally.
Siegelmann (1999, p. 156) observes that the super-Turing power of recur-
rent neural networks is a consequence of their use of non-rational real-valued
weights. In effect, a real number can contain an infinite number of bits of
information. This raises the question of how the non-rational weights of a net-
work can ever be set, since it is not possible to define a physical quantity with
infinite precision. However, although non-rational weights may not be
settable from outside the network, they can be computed within the network
by learning algorithms, which are analog computations. Thus, Siegelmann
suggests, the fundamental distinction may be between static computational
models, such as the Turing machine and its equivalents, and dynamically
evolving computational models, which can tune continuously variable param-
eters and thereby achieve super-Turing power.
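A small sketch makes the point about infinite information concrete. In the
spirit of Siegelmann's construction (the particular base-4 encoding below is
an illustrative choice), a single real number can carry an unbounded bit
stream, and each bit can be recovered exactly using only the operations
available to a saturated-linear net:

def sat(x):
    return min(1.0, max(0.0, x))

def encode(bits):
    # pack a bit string into one real: q = sum of (2*b_i + 1) / 4^i
    q = 0.0
    for i, b in enumerate(bits, start=1):
        q += (2 * b + 1) / 4.0**i
    return q

def pop_bit(q):
    # recover the leading bit and the encoding of the remaining tail
    b = sat(4.0 * q - 2.0)  # 0 if q <= 1/2, 1 if q >= 5/6
    return b, sat(4.0 * q - 2.0 * b - 1.0)

q = encode([1, 0, 1])  # a float stands in for the ideal real number
for _ in range(3):
    b, q = pop_bit(q)
    print(int(b))  # prints 1, then 0, then 1

In a physical float, of course, only finitely many bits are available; the
super-Turing results assume ideal reals of unbounded precision.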
On the one hand, then, these theoretical results attribute super-Turing power
to certain analog models of computation. On the other hand, we have the
obvious fact that neural nets are routinely simulated
on ordinary digital computers, which have at most the power of Turing ma-
chines. Furthermore, it is reasonable to suppose that any physical process
that might be used to realize analog computation—and certainly the known
processes—could be simulated on a digital computer, as is done routinely in
computational science. This would seem to be incontrovertible proof that
analog computation is no more powerful than Turing machines. The crux
of the paradox lies, of course, in the non-Turing-computable reals. These
numbers are a familiar, accepted, and necessary part of standard mathe-
matics, in which physical theory is formulated, but from the standpoint of
Church-Turing (CT) computation they do not exist. This suggests that
the paradox is not a contradiction, but reflects a divergence between the
goals and assumptions of the two models of computation.
4 See MacLennan (2003, 2004) for a more detailed discussion of the frames of relevance
of natural computation and control.
G Analog thinking
It will be worthwhile to say a few words about the cognitive implications of
analog computing, which are a largely forgotten aspect of the analog vs. dig-
ital debates of the late 20th century. For example, it was argued that analog
computing provides a deeper intuitive understanding of a system than the
alternatives do (Bissell, 2004; Small, 2001, ch. 8). On the one hand, analog
computers afforded a means of understanding analytically intractable sys-
tems by means of “dynamic models.” By setting up an analog simulation, it
was possible to vary the parameters and explore interactively the behavior
of a dynamical system that could not be analyzed mathematically. Digital
simulations, in contrast, were orders of magnitude slower and did not permit
this kind of interactive investigation. (Performance has improved sufficiently
in contemporary digital computers so that in many cases digital simulations
can be used as dynamic models, sometimes with an interface that mimics an
analog computer; see Bissell 2004.)
Analog computing is also relevant to the cognitive distinction between
knowing how (procedural knowledge) and knowing that (declarative knowl-
edge) (Small, 2001, ch. 8). The latter (“know-that”) is more characteristic of
scientific culture, which strives for generality and exactness, often by design-
ing experiments that allow phenomena to be studied in isolation, whereas the
former (“know-how”) is more characteristic of engineering culture; at least
it was so through the first half of the twentieth century, before the develop-
ment of “engineering science” and the widespread use of analytic techniques
in engineering education and practice. Engineers were faced with analyt-
ically intractable systems, with inexact measurements, and with empirical
relationships (characteristic curves, etc.), all of which made analog comput-
ers attractive for solving engineering problems. Furthermore, because ana-
log computing made use of physical phenomena that were mathematically
analogous to those in the primary system, the engineer’s intuition and un-
derstanding of one system could be transferred to the other. Some commen-
tators have mourned the loss of hands-on intuitive understanding resulting
from the increasingly scientific orientation of engineering education and the
disappearance of analog computers (Bissell, 2004; Lang, 2000; Owens, 1986;
Puchta, 1996).
I will mention one last cognitive issue relevant to the differences between
analog and digital computing. As already discussed in Sec. C.4, it is generally
agreed that it is less expensive to achieve high precision with digital tech-
nology than with analog technology. Of course, high precision may not be
important, for example when the available data are inexact or in natural
computation. Further, some advocates of analog computing argue that high
precision digital results are often misleading (Small, 2001, p. 261). Precision
does not imply accuracy, and the fact that an answer is displayed with 10
digits does not guarantee that it is accurate to 10 digits; in particular, engi-
neering data may be known to only a few significant figures, and the accuracy
of digital calculation may be limited by numerical problems. Therefore, on
the one hand, users of digital computers might fall into the trap of trusting
their apparently exact results; on the other, users of modest-precision analog
computers were more inclined to a healthy skepticism about their results. Or
so it was claimed.
H Future directions
H.1 Post-Moore’s Law computing
Certainly there are many purposes that are best served by digital technology;
indeed there is a tendency nowadays to think that everything is done better
digitally. Therefore it will be worthwhile to consider whether analog com-
putation should have a role in future computing technologies. I will argue
that the approaching end of Moore’s Law (Moore, 1965), which has predicted
exponential growth in digital logic densities, will encourage the development
of new analog computing technologies.
Two avenues present themselves as ways toward greater computing power:
faster individual computing elements and greater densities of computing el-
ements. Greater density increases power by facilitating parallel computing,
and by enabling greater computing power to be put into smaller packages.
Other things being equal, the fewer the layers of implementation between the
computational operations and the physical processes that realize them, that
is to say, the more directly the physical processes implement the computa-
tions, the more quickly they will be able to proceed. Since most physical pro-
cesses are continuous (defined by differential equations), analog computation
is generally faster than digital. For example, we may compare analog addi-
tion, implemented directly by the additive combination of physical quantities,
with the sequential process of digital addition. Similarly, other things being
equal, the fewer physical devices required to implement a computational ele-
ment, the greater will be the density of these elements. Therefore, in general,
the closer the computational process is to the physical processes that realize
it, the fewer devices will be required, and so the continuity of physical law
suggests that analog computation has the potential for greater density than
digital. For example, four transistors can realize analog addition, whereas
many more are required for digital addition. Both considerations argue for
an increasing role of analog computation in post-Moore’s Law computing.
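The speed comparison can be made concrete. In digital ripple-carry addition
the carry must propagate sequentially through n full-adder stages, whereas
analog addition is a single physical event (for instance, two currents merging
at a circuit node sum by Kirchhoff's current law). A minimal Python sketch
of the digital side:

def ripple_add(a_bits, b_bits):
    # n-bit ripple-carry addition; bit lists are least-significant first.
    # Each loop iteration is one full-adder stage that must wait for the
    # carry from the previous stage: n sequential steps in all.
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    out.append(carry)
    return out

assert ripple_add([1, 1], [1, 0]) == [0, 0, 1]  # 3 + 1 = 4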
From this broad perspective, there are many physical phenomena that are
potentially usable for future analog computing technologies. We seek phe-
nomena that can be described by well-known and useful mathematical func-
tions (e.g., addition, multiplication, exponential, logarithm, convolution).
These descriptions do not need to be exact for the phenomena to be useful
in many applications, for which limited range and precision are adequate.
Furthermore, in some applications speed is not an important criterion; for
example, in some control applications, small size, low power, robustness,
etc. may be more important than speed, so long as the computer responds
quickly enough to accomplish the control task. Of course there are many
other considerations in determining whether given physical phenomena can
be used for practical analog computation in a given application (MacLen-
nan, 2009). These include stability, controllability, manufacturability, and
the ease of interfacing with input and output transducers and other devices.
Nevertheless, in the post-Moore’s Law world, we will have to be willing to
consider all physical phenomena as potential computing technologies, and in
many cases we will find that analog computing is the most effective way to
utilize them.
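As a simple example of physics "computing" a useful function, a discharging
RC node realizes the exponential: v(t) = v(0) e^(-t/RC). The rough numerical
sketch below (component values are arbitrary, chosen only for illustration)
checks this against the analytic result:

import math

R, C = 1.0e4, 1.0e-6  # a hypothetical 10 kOhm resistor and 1 uF capacitor
v, dt = 1.0, 1.0e-5   # initial voltage and simulation time step (seconds)

for step in range(100):       # simulate 1 ms of discharge
    v += dt * (-v / (R * C))  # dv/dt = -v / RC

print(v, math.exp(-1.0e-3 / (R * C)))  # both approximately e^(-0.1)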
Natural computation provides many examples of effective analog com-
putation realized by relatively slow, low-precision operations, often through
massive parallelism. Therefore, post-Moore’s Law computing has much to
learn from the natural world.
Bibliography
Berut, A., Arakelyan, A., Petrosyan, A., Ciliberto, S., Dillenschneider, R., &
Lutz, E. (2012). Experimental verification of Landauer’s principle linking
information and thermodynamics. Nature, 483, 187–189.
Blum, L., Cucker, F., Shub, M., & Smale, S. (1998). Complexity and Real
Computation. Berlin: Springer-Verlag.
Blum, L., Shub, M., & Smale, S. (1988). On a theory of computation and
complexity over the real numbers: NP-completeness, recursive functions
and universal machines. The Bulletin of the American Mathematical So-
ciety, 21, 1–46.
Bournez, O., Campagnolo, M., Graça, D., & Hainry, E. (2006). The General
Purpose Analog Computer and computable analysis are two equivalent
paradigms of analog computation. In Theory and Applications of Models
of Computation (TAMC 2006), volume 3959 of Lecture Notes in Computer
Science (pp. 631–43). Berlin: Springer-Verlag.
Calude, C., Casti, J., & Dinneen, M., Eds. (1998). Unconventional Models
of Computation. Singapore & New York: Springer.
Calude, C. & Paun, G. (2001). Computing with Cells and Atoms. London &
New York: Taylor & Francis.
Camazine, S., Deneubourg, J., Franks, N. R., Sneyd, J., Theraulaz, G., &
Bonabeau, E. (2001). Self-organization in Biological Systems. Princeton:
Princeton University Press.
Eberbach, E., Goldin, D., & Wegner, P. (2003). Turing’s ideas and models
of computation. In C. Teuscher (Ed.), Alan Turing: Life and Legacy of a
Great Thinker. Berlin, Heidelberg & New York: Springer-Verlag.
Maass, W. & Sontag, E. (1999). Analog neural nets with Gaussian or other
common noise distributions cannot recognize arbitrary regular languages.
Neural Computation, 11(3), 771–782.
Maini, P. K. & Othmer, H. G., Eds. (2001). Mathematical Models for Bio-
logical Pattern Formation. Springer-Verlag.
McClelland, J., Rumelhart, D., & the PDP Research Group (1986). Parallel
Distributed Processing: Explorations in the Microstructure of Cognition,
Volume 2: Psychological and Biological Models. Cambridge, MA: MIT
Press.
Mead, C. (1989). Analog VLSI and Neural Systems. Reading, MA: Addison-
Wesley.
Mills, J. W., Himebaugh, B., Kopecky, B., Parker, M., Shue, C., & Weile-
mann, C. (2006). “Empty space” computes: The evolution of an unconven-
tional supercomputer. In Proceedings of the 3rd Conference on Computing
Frontiers (pp. 115–26). New York: ACM Press.
Milner, R., Parrow, J., & Walker, D. (1992). A calculus of mobile processes,
I & II. Information and Computation, 100, 1–77.
Owens, L. (1986). Vannevar Bush and the differential analyzer: The text and
context of an early computer. Technology and Culture, 27(1), 63–95.
Rumelhart, D., McClelland, J., & the PDP Research Group (1986). Parallel
Distributed Processing: Explorations in the Microstructure of Cognition,
Volume 1: Foundations. Cambridge, MA: MIT Press.
Small, J. S. (2001). The Analogue Alternative. London & New York: Rout-
ledge.
Thomson, W. (1938). The tides. In The Harvard Classics, volume 30: Sci-
entific Papers (Lord Kelvin) (pp. 274–307). New York: Collier.