
PHY 256: Introduction To Quantum Physics

Summer 2020 Lecture Notes

Barak Shoshany

Department of Physics, University of Toronto


60 St. George St., Toronto, Ontario, M5S 1A7, Canada

[email protected]
http://baraksh.com/PHY256/

June 25, 2020

Contents
1 Introduction
1.1 Course Outline
1.2 Exercises and Problems
2 Non-Technical Overview
2.1 The Failures of Classical Physics
2.1.1 Black-Body Radiation and the Ultraviolet Catastrophe
2.1.2 The Photoelectric Effect
2.1.3 The Double-Slit Experiment
2.1.4 The Stern-Gerlach Experiment
2.2 Quantum vs. Classical Mechanics
3 Mathematical Background
3.1 Complex Numbers
3.1.1 Motivation
3.1.2 Operations on Complex Numbers
3.1.3 The Complex Plane and Real 2-Vectors
3.1.4 Polar Coordinates and Complex Phases
3.2 Linear Algebra
3.2.1 Complex Vector Spaces
3.2.2 Dual Vectors, Inner Products, Norms, and Hilbert Spaces
3.2.3 Orthonormal Bases
3.2.4 Matrices and the Adjoint
3.2.5 The Outer Product
3.2.6 The Completeness Relation
3.2.7 Representing Vectors in Different Bases
3.2.8 Change of Basis
3.2.9 Multiplication and Inverse of Matrices
3.2.10 Matrices Inside Inner Products
3.2.11 Eigenvalues and Eigenvectors
3.2.12 Hermitian Matrices
3.2.13 Unitary Matrices
3.2.14 Normal Matrices
3.2.15 Representing Matrices in Different Bases
3.2.16 Diagonalizable Matrices
3.2.17 The Cauchy-Schwarz Inequality
3.3 Probability Theory
3.3.1 Random Variables and Probability Distributions
3.3.2 Conditional Probability
3.3.3 Expected Values
3.3.4 Standard Deviation
3.3.5 Normal (Gaussian) Distributions
4 The Foundations of Quantum Theory
4.1 Axiomatic Definition
4.1.1 Dimensionless and Dimensionful Constants
4.1.2 Hilbert Spaces, States, and Operators
4.1.3 Hermitian Operators and Observables
4.1.4 Probability Amplitudes
4.1.5 Superposition
4.1.6 Inner Products with Matrices, and the Expectation Value
4.1.7 Summary For Discrete Systems
4.2 Two-State Systems, Spin 1/2, and Qubits
4.2.1 The Pauli Matrices
4.2.2 Spin 1/2
4.2.3 Qubits
4.2.4 The Meaning of Superposition
4.3 Composite Systems and Quantum Entanglement
4.3.1 The Tensor Product
4.3.2 Vectors and Matrices in the Composite Hilbert Space
4.3.3 Quantum Entanglement
4.3.4 The Bell States
4.3.5 Entanglement Does Not Transmit Information
4.3.6 Bell’s Theorem and Bell’s Inequality
4.4 Non-Commuting Observables and the Uncertainty Principle
4.4.1 Commuting and Non-Commuting Observables
4.4.2 The Uncertainty Principle
4.4.3 Simultaneous Diagonalization
4.5 Dynamics, Transformations, and Measurements
4.5.1 Unitary Transformations and Evolution
4.5.2 Quantum Logic Gates
4.5.3 The Measurement Axiom (Projective)
4.5.4 Applications of the Measurement Axiom
4.5.5 The Measurement Axiom (Simplified)
4.5.6 Interpretations of Quantum Mechanics and the Measurement Problem
4.5.7 Superposition Once Again: Schrödinger’s Cat
4.6 The No-Cloning Theorem and Quantum Teleportation
4.6.1 The No-Cloning Theorem
4.6.2 Quantum Teleportation
4.7 The Foundations of Quantum Theory: Summary
5 Continuous Quantum Systems
5.1 Mathematical Preliminaries
5.1.1 Exponentials and Logarithms
5.1.2 Matrix and Operator Exponentials
5.2 Continuous Time Evolution, Hamiltonians, and the Schrödinger Equation
5.2.1 The Schrödinger Equation and Hamiltonians: Preface
5.2.2 Derivation of the Schrödinger Equation
5.2.3 Time-Independent Hamiltonians
5.2.4 Hamiltonians and Energy
5.3 Hamiltonian Mechanics and Canonical Quantization
5.3.1 A Quick Review of Classical Hamiltonian Mechanics
5.3.2 Canonical Quantization
5.4 The Harmonic Oscillator
5.4.1 The Classical Harmonic Oscillator
5.4.2 Quantizing the Harmonic Oscillator
5.4.3 The Energy Eigenstates of the Harmonic Oscillator
5.5 Wavefunctions, Position, and Momentum
5.5.1 The Position Operator
5.5.2 Wavefunctions in the Position Basis
5.5.3 The Momentum Operator
5.5.4 Quantum Interference
5.6 Solutions of the Schrödinger Equation
5.6.1 The Schrödinger Equation for a Particle
5.6.2 Separation of Variables

List of Figures
2.1 The electromagnetic spectrum of a black body
2.2 The photoelectric effect
2.3 Light waves in the double-slit experiment
2.4 Interference of two light waves
2.5 Electron interference pattern in the double-slit experiment
2.6 The Stern-Gerlach experiment
2.7 The uncertainty principle
3.1 The complex plane
3.2 The normal distribution
3.3 Probability distribution for one roll of a 6-sided die
3.4 Probability distribution for the sum of two rolls of a 6-sided die
3.5 Probability distribution for the sum of three rolls of a 6-sided die
4.1 A qubit in a superposition of |0⟩ and |1⟩
4.2 Schrödinger’s Cat

1 Introduction

1.1 Course Outline

This course will serve as a comprehensive introduction to the foundations of quantum mechanics, from the modern point of view of 21st century theoretical physics. It will be somewhat different from a traditional first course in quantum mechanics, in that we will develop the theory from scratch in an axiomatic and mathematically rigorous(ish) way. There will be less emphasis on doing calculations, and more on a deep conceptual understanding of the theory.
First, a short non-technical overview of quantum mechanics will be provided. We will discuss the failures of classical mechanics that prompted the development of the quantum theory, and list the major differences between classical and quantum mechanics.
Next, we will learn the necessary mathematical background, including complex numbers, linear algebra, and probability. Even if you took courses on these subjects before, you should still pay careful attention, since we will learn the material from the quantum point of view and introduce important notation that is unique to quantum mechanics.
Once we have a firm grasp of the mathematical background, we will use it to define quantum mechanics axiomatically. We will learn about fundamental concepts such as Hilbert spaces, states, operators, observables, superposition, probability amplitudes, and expectation values.
Then, we will begin studying simple discrete quantum systems known as qubits,
which are the quantum analogue of bits, and are used in quantum computers. We
will learn about Schrödinger’s cat, quantum entanglement, Bell’s theorem, the
uncertainty principle, unitary evolution, quantum measurements, and quantum
teleportation.
In the remainder of the course we will study continuous quantum systems and related concepts, including Hamiltonians, the Schrödinger equation, canonical quantization, the quantum harmonic oscillator, wavefunctions, quantum interference, and solutions to the time-independent Schrödinger equation, including scattering and tunneling in one dimension.
By the end of the course, the students should expect to have a fairly good understanding of quantum mechanics, and to develop an intuition for this very strange and unintuitive theory. They will also be adequately prepared to dive deeper into the subject, whether by taking more advanced courses or by doing research.

1.2 Exercises and Problems

Throughout these notes, you will find many exercises and problems.

• Exercises are usually just calculations. They are meant to verify that you understand how to calculate things, and they are usually simple and straightforward.

• Problems are usually proof-based. They are meant to verify that you understand the more abstract relations between the concepts we will introduce, and they often require some thought.

2 Non-Technical Overview

In this chapter, I will provide a non-technical overview of quantum physics, and how it compares to classical physics. I won’t go into exactly who discovered what and in which year, because this is not a history course; this is a course about how the universe works. However, if you are interested in the history of quantum mechanics, there are many excellent websites and textbooks on the subject, and you are encouraged to look them up.
Instead, I will focus on two main goals in this chapter:

1. Introducing some of the fundamental experiments which illustrate why classical mechanics needs to be replaced with a more fundamental theory. This should also convince you that your classical intuition must be replaced with quantum intuition, which is what we will try to develop in this course.

2. Summarizing the fundamental properties of quantum mechanics and the differences between it and classical mechanics in non-technical terms, without going into the math. This should give you some idea of what we will study throughout this course in much more detail and with the full, uncensored mathematical framework.

2.1 The Failures of Classical Physics

2.1.1 Black-Body Radiation and the Ultraviolet Catastrophe

Figure 2.1: The electromagnetic spectrum of a black body: spectral radiance (kW · sr⁻¹ · m⁻² · nm⁻¹) as a function of wavelength (μm), for black bodies at 3000 K, 4000 K, and 5000 K, together with the classical theory prediction at 5000 K. The UV, visible, and infrared regions are marked. Source: Wikipedia.

A black body is an object that absorbs all incoming light at all frequencies. It
absorbs it and does not reflect it – therefore, it is black. More generally, it absorbs
not just light, but all electromagnetic radiation. Black bodies also emit radiation,
due to their heat. Electromagnetic radiation has a spectrum of wavelengths of different lengths. We are interested in predicting the amount of radiation emitted
by the black body at each wavelength, which we will refer to as the black body’s
spectrum.
One can try to use classical physics to calculate this spectrum. It turns out that the
amount of the radiation is inversely proportional to the wavelength¹. This means
that as the wavelength approaches zero, the amount of radiation approaches
infinity! This is illustrated by the black curve in figure 2.1. This result is called
the ultraviolet catastrophe, since ultraviolet light has shorter wavelengths than
visible light. Obviously, this does not fit well with experimental data, since when
we measure the total radiation emitted from a black body, we most definitely do
not measure it to be infinity!
To solve this problem, we must use quantum physics. If we assume that radiation
can only be emitted in discrete “packets” of energy called quanta, we get the
correct spectrum of radiation, which is compatible with experiment. The law
describing the amount of radiation at each wavelength is called Planck’s law. In
figure 2.1, we can see three different curves, calculated using Planck’s law, giving
the radiation spectrum at different temperatures (in Kelvin). You can see that the
total amount of radiation is no longer infinite. The quanta of electromagnetic
radiation are called photons.

2.1.2 The Photoelectric Effect

Figure 2.2: The photoelectric effect. Source: Khan Academy.

When light hits a material, it causes the material to emit electrons. This phenomenon is called the photoelectric effect. Using classical physics, and the assumption that light is a wave, we can make the following predictions:

• Brighter light should have more energy, so it should cause the emitted electrons to have more kinetic energy, and thus move faster.

• Light with higher frequency should hit the material more often, so it should cause a higher rate of electron emission, resulting in a larger electric current.
¹ More precisely, the power emitted per unit area per unit solid angle per unit wavelength is proportional to 1/λ⁴, where λ is the wavelength... But fortunately, we don’t need to be very precise here!

• Assuming there is a certain minimum energy needed to dislodge an electron from the material, sufficiently bright light of any frequency should cause electron emission.

However, what actually happens is the exact opposite:

• The kinetic energy of the emitted electrons increases with frequency, not
brightness.

• The electric current increases with brightness, not frequency.

• Electrons are emitted only when the frequency of the light exceeds a certain
threshold, regardless of how bright it is.

This is illustrated in figure 2.2, where the red light does not cause any electrons
to be emitted, but the green and blue lights do, since they have higher frequency.
Furthermore, since the blue light has higher frequency than the green light, the
kinetic energy of the emitted electrons is larger.
To explain this, we must again use quantum physics. Einstein proposed to use
the same model that Planck suggested to solve the ultraviolet catastrophe, where
light is made of discrete photons. Each photon has energy proportional to the
frequency of the light, and brighter light of the same frequency simply has more
photons, each photon still with the same amount of energy. This model fits the experimental observations perfectly.
So in figure 2.2, making the red light brighter will increase the number of photons, but no matter how bright it is, the individual photons it’s made of still do not have enough energy to dislodge an electron on their own. On the other hand, each individual photon of the green and blue lights has, on its own, enough energy to dislodge an electron, and even if the light is very dim, the electrons will still be emitted.

2.1.3 The Double-Slit Experiment

The previous two experiments may have convinced you that light is not a wave, but a particle. But is that really the case? The double-slit experiment shows that things are actually more complicated. In this experiment, a light beam hits a plate with two parallel slits. Most of the light is blocked by the plate, but some of it passes through the slits and hits a screen, creating a pattern of bright and dark bands.
This can be most naturally explained by assuming that light is not a particle, but a wave. Each of the slits becomes the origin of a new wave, as illustrated in figure 2.3. Each of the two waves has crests and troughs. When a crest of one wave is at the same place as a crest of the other wave, they add up to create a crest with double the magnitude. This is called constructive interference. On the other hand, if a crest of one wave is at the same place as a trough of the other wave, they cancel each other. This is called destructive interference. See figure 2.4 for an illustration. The pattern on the screen, as seen in figure 2.3, is a consequence of this interference.

Figure 2.3: Light waves in the double-slit experiment. Source: Wikipedia.

Figure 2.4: Constructive (left) and destructive (right) interference of two light waves. Source: Wikipedia.

Figure 2.5: An interference pattern created by electrons in the double-slit experiment. Each image (from top to bottom) corresponds to a later point in time, after more electrons have accumulated. Source: Wikipedia.
So the double-slit experiment seems to prove that light is a wave, in contradiction with black-body radiation and the photoelectric effect, which seem to prove that light is a particle. It turns out that, in fact, both are correct; the quantum nature of light has the consequence that it sometimes behaves like a classical wave, and other times like a classical particle. This is called wave-particle duality. Contrary to common misconception, this doesn’t mean that light is “both a wave and a particle”; it simply demonstrates that the classical concepts of “wave” and “particle” are not the proper way to describe reality.
Okay, so light exhibits wave-particle duality. Maybe this makes sense. But matter, which is a tangible thing you can touch, is definitely made of particles, right? To check that, we can replace the beam of light with a beam of electrons. Since we think electrons are particles, not waves, we expect to find on the screen not an interference pattern, but just individual dots corresponding to the individual electron particles. And this is indeed what happens, except... If we run the experiment for some time, and let the electrons build up, then after a while we see that an interference pattern emerges nonetheless! This is shown in figure 2.5.
What does this mean? It means that, in quantum physics, both light and matter exhibit wave-particle duality. In classical physics, the measurement of the position of the electron on the screen is deterministic; if we know the initial position and velocity of the electron, then we can predict exactly where the electron lands. In quantum physics, we instead have a probability distribution, which gives us the probability for the electron to be measured at each particular point on the screen. This probability distribution turns out to propagate in space like a wave, and interfere with itself constructively and destructively on the way as a wave does, which is what causes the interference pattern on the screen – it is actually a pattern of probabilities! In the end, the probability will be enlarged on some points of the screen and reduced on other points.
To clarify how the measurement of the positions of the electrons on the screen yields a probability distribution, consider instead a 6-sided die. If you roll the die just once or twice, you won’t have much information about the probabilities to roll each number on the die. This is analogous to sending just a couple of electrons through the slits. What you need to do is to roll the die a large number of times, let’s say 6,000 times. Then you count how many times the die rolled on each number. For example, if it rolled around 1,000 times on each number, then you know the die is fair; but if it rolled around 2,000 times on 6 and around 800 times on every other number, then you know the die is loaded. Similarly, we need to send a large number of electrons through the slits in order to determine the probability distribution for their positions on the screen. It turns out that the position of the electron is “loaded”!
As an aside, in 21st century terms, the precise answer to the question “is light a wave or a particle?” turns out to be that both of them are different aspects of the same fundamental entity called the quantum electromagnetic field. This field propagates from place to place like a wave, but on the other hand, if you put enough energy into it, you can cause a quantum excitation in the field. It is this excitation that behaves like a particle.
Moreover, it turns out that all elementary particles are excitations of quantum fields, and thus all of them exhibit these two aspects. This is called quantum field theory. It neatly unites quantum mechanics with special relativity, and explains elementary particle physics with amazing accuracy – it is actually the most accurate theory in all of science! In this course we will focus on non-relativistic quantum mechanics, which is to quantum field theory as Newtonian physics is to special relativity. Quantum field theory is much more complicated, and is usually only taught at the graduate-school level.

2.1.4 The Stern-Gerlach Experiment

In the Stern-Gerlach experiment, electrically neutral particles, such as silver atoms, are sent through an inhomogeneous magnetic field and into a screen. For reasons we won’t go into (since they require some knowledge of electrodynamics), the magnetic field will deflect the particle up or down by an amount proportional to its angular momentum. According to classical physics, this angular momentum can have any value, and so we would expect to see the particles hit every possible point along a continuous line on the screen. This is item (4) in figure 2.6.
However, what actually happens when we perform the experiment is that the particles are deflected either up or down by the exact same amount each time, and hit only two specific discrete points on the screen. This is item (5) in figure 2.6.
To explain this, we must again use quantum physics. Quantum particles are not seen as classically spinning objects; instead they are said to have an intrinsic form of angular momentum called spin. For particles like electrons or silver atoms, a measurement of spin can only yield one of two options: “spin up” or “spin down”.
The previous experiments we discussed showed us that something that is classically continuous – light, or more generally, electromagnetic radiation – is quantized in the quantum theory into discrete packets or quanta of energy called photons. Similarly, the Stern-Gerlach experiment tells us that another classically continuous thing, angular momentum, is also quantized in the quantum theory – into discrete spin. This seems to be a general property of most, but not all, quantum systems: something that in classical physics was continuous turns out to actually be discrete in quantum physics.

Figure 2.6: The Stern-Gerlach experiment. Source: Wikipedia.
Finally, let me just mention that one can use spin to create qubits, or “quantum bits”, where “spin up” represents a value of 0 and “spin down” represents a value of 1. Because spin is a quantum quantity, it satisfies all of the weird properties of quantum mechanics that we will discuss later. By taking advantage of these quantum properties, we can potentially do calculations faster with a quantum computer that uses qubits compared to a classical computer that uses classical bits.

2.2 Quantum vs. Classical Mechanics

Let us now summarize, in a non-technical way, the most important features of quantum mechanics and how they differ from their classical-mechanical counterparts.

Figure 2.7: The uncertainty principle.

1. Quantum mechanics is, as far as we know, the exact and fundamental theory of reality. Classical mechanics turns out to be just an approximation to this theory. This means that, in general, all modern theories of physics must be quantum theories if they intend to be fundamental. One important exception to that rule is general relativity, which we do not yet know how to describe as a quantum theory; if we did, we would call that theory quantum gravity. However, this is usually not a problem, since general relativity is mostly needed only when describing huge things like planets, stars, galaxies, and so on, in which case we do not need quantum mechanics since we are within the realm of validity of the classical approximation. In fact, this leads us to the next property:

2. Quantum mechanics is the theory of the smallest things. This includes elementary particles, atoms, and molecules. Since all big things are made of small things, quantum mechanics also describes humans, planets, galaxies, and the whole universe. However, this is exactly where the classical limit comes in; when many small quantum systems make up one big system, classical mechanics generally turns out to be a good enough description for all practical purposes. This is similar to how relativity is always the correct way to describe physics, but at low velocities, much smaller than the speed of light, Newtonian physics is a good enough approximation.

3. Quantum mechanics usually involves discrete things. This is in contrast with classical mechanics, which usually involves continuous things. In fact, continuous classical things generally turn out to be made of discrete quantum things. We saw an example of this when we discussed how light – a continuous electromagnetic field – is actually made of discrete photons. Similarly, we saw that angular momentum, which is continuous in the classical theory, is replaced by discrete spin in the quantum theory.

4. Quantum mechanics is a probabilistic theory. Classical mechanics, on the other hand, is a deterministic theory. For example, in classical mechanics, given a particle’s exact position and momentum at any one time, we can (in principle) predict its position and momentum at any other time – with absolute certainty. However, in quantum mechanics, the most we can ever hope to know is the probability distribution to find the particle at a certain position or with a certain momentum. This is illustrated in figure 2.7.

5. Quantum mechanics allows for superposition of states. In classical mechanics, the state of a particle is simply given by the exact values of its position and momentum. In contrast, in quantum mechanics the particle can – in fact, usually must – be in a superposition of possible positions and momenta. Each one of the possibilities in the superposition has a probability assigned to it, and this is where the probability distribution in figure 2.7 comes from.

6. Quantum mechanics features uncertainty in measurements. This is called the uncertainty principle. In classical mechanics, at least theoretically, we can precisely know both the position and momentum of the particle. However, in quantum mechanics, the more we know about the position, the less we know about the momentum – and vice versa. If the position probability distribution is narrow and concentrated at a certain region, meaning that there is low uncertainty in the position, then one can prove that the momentum probability distribution must be wide, meaning that there is high uncertainty in the momentum. The opposite is also true. This is again illustrated in figure 2.7.

7. Quantum mechanics has a stronger type of correlation called entanglement. Classical mechanics also allows for correlation. For example, let’s say I have two sealed envelopes with notes inside them, one with the number 0 and the other with the number 1. I give one to Alice and one to Bob. If Alice opens her envelope and sees the number 0, she can be sure that Bob has the envelope with the number 1, and vice versa. The results are clearly correlated. However, if we replace the notes with qubits – quantum bits which are in a superposition of 0 and 1 – then the envelopes are now correlated more strongly via quantum entanglement. We will discuss later in exactly what way quantum entanglement is stronger than classical correlation, but right now we will note that this fact is what gives quantum computers their power.

3 Mathematical Background

Quantum theory is the theoretical framework believed to describe all aspects of our universe at the most fundamental level. Mathematically, as we will see, it is relatively simple, although much more abstract than classical physics. However, conceptually, it is very hard to understand using the classical intuition we have from our daily lives. In these lectures we will learn to develop quantum intuition.
In this chapter we shall learn some basic mathematical concepts, focusing on complex numbers, linear algebra, and probability theory, which will be used extensively throughout the course. Even if the student is already familiar with these concepts, it is still a good idea to go over this chapter, since the unique notation commonly used in quantum mechanics is different from the notation used elsewhere in mathematics and physics.

3.1 Complex Numbers

Complex numbers are at the very core of the mathematical formulation of quantum theory. In this section we will give a review of complex numbers and present some definitions and results that will be used throughout the course.

3.1.1 Motivation

In real life, we only encounter real numbers. These numbers form a field, that is, a set of elements with well-defined operations of addition, subtraction, multiplication, and division. This field is denoted ℝ. Geometrically, we can imagine ℝ as a 1-dimensional line, stretching from −∞ to +∞.
Unfortunately, it turns out that the field of real numbers has a serious flaw. One can write down completely reasonable-looking quadratic equations, with only real coefficients, which nonetheless have no solutions in ℝ. Consider the most general quadratic equation:

    ax² + bx + c = 0,    a, b, c ∈ ℝ.    (3.1)

One can easily prove (by completing the square) that there are two potential solutions, given by

    x± ≡ (−b ± √(b² − 4ac)) / 2a.    (3.2)

Here, one solution corresponds to the choice + and the other one to −. However, the square root √(b² − 4ac) poses a problem, because the square of a real number is always non-negative²:

    x² ≥ 0,    ∀x ∈ ℝ.    (3.3)

² Here, ∀ means “for all”.

The number (and existence) of real solutions is thus determined by the sign of the expression inside the square root, called the discriminant Δ ≡ b² − 4ac:

    Δ > 0:  two real roots x± = (−b ± √Δ) / 2a,
    Δ = 0:  one real root x = −b / 2a,
    Δ < 0:  no real roots.    (3.4)

It would be very convenient (not to mention more elegant) to have a field of numbers that is algebraically closed, meaning that every non-constant polynomial (and in particular, a quadratic polynomial) with coefficients in the field has a root in the field.
Since the problem stems from the fact that no real number can square to a negative number, let us simply extend our field with just one number, the imaginary unit, denoted³ i, whose sole purpose is to square to a negative number. The most natural choice is for i to square to −1:

    i² ≡ −1.    (3.5)

The new field created by extending ℝ with i is the field of complex numbers, denoted ℂ. A general complex number is written

    z = a + i b,    z ∈ ℂ,    a, b ∈ ℝ,    (3.6)

where a is called the real part and b is called the imaginary part, both real numbers.

Now, in the quadratic equation, having √Δ with a negative Δ is no longer a problem, since the number i √(−Δ) squares to Δ:

    (i √(−Δ))² = i² (−Δ) = (−1) (−Δ) = Δ.    (3.7)

Therefore, we conclude that every quadratic equation has a solution in the field of complex numbers⁴:

    Δ > 0:  two real roots x± = (−b ± √Δ) / 2a,
    Δ = 0:  one real root x = −b / 2a,
    Δ < 0:  two complex roots x± = −b / 2a ± i √(−Δ) / 2a.    (3.8)

As a matter of fact, this is a special case of the fundamental theorem of algebra: any polynomial of degree n with complex coefficients⁵ has at least one, and at most n, unique complex roots⁶. The quadratic equation corresponds to the case n = 2.

³ We use non-italic font exclusively for i in order to distinguish it from the italic 𝑖, which will be used for labels and variables. Of course, it is usually a wise idea not to have both i and 𝑖 in the same equation in the first place, but sometimes that is unavoidable.
⁴ Note that real numbers are a special case of complex numbers, so the two real roots are also two complex roots.
⁵ Again, real numbers are a special case of complex numbers, so the coefficients can be all real.
⁶ Or equivalently, it has exactly n not necessarily unique complex roots, accounting for possible degeneracy/multiplicity. For example, for Δ = 0 the quadratic equation has two degenerate roots, or one root of multiplicity 2.
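To make this concrete, here is a minimal Python sketch (an illustration added to these notes; the original contains no code) that computes the roots of ax² + bx + c = 0 as in equation (3.8), using Python’s built-in complex numbers. Note that Python writes the imaginary unit i as j, an engineering convention.

    import cmath  # complex math: cmath.sqrt handles negative arguments

    def quadratic_roots(a, b, c):
        """Return the two (possibly complex, possibly degenerate) roots of a*x^2 + b*x + c = 0."""
        sqrt_disc = cmath.sqrt(b**2 - 4*a*c)  # complex square root of the discriminant
        return (-b + sqrt_disc) / (2*a), (-b - sqrt_disc) / (2*a)

    print(quadratic_roots(1, -3, 2))  # Δ > 0: two real roots, 2 and 1
    print(quadratic_roots(1, -2, 1))  # Δ = 0: one degenerate root, 1
    print(quadratic_roots(1, -2, 5))  # Δ < 0: two complex conjugate roots, 1 ± 2i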
Exercise 3.1.
A. Solve the quadratic equation

    x² − 6x + 25 = 0.    (3.9)

B. Find the quadratic equation whose solutions are z = 7 ± 2 i.

Problem 3.2. Above we saw that the equation ax² + bx + c = 0 with a, b, c ∈ ℝ can either have two real solutions, one real solution, or two complex solutions that are conjugates of each other.
A. Imaginary numbers⁷ are numbers of the form i b for b ∈ ℝ. What kind of equation has two imaginary solutions that are complex conjugates of each other?
B. What kind of equation has two imaginary solutions that are in general not complex conjugates of each other?
C. What kind of equation has two arbitrary complex solutions that are in general not complex conjugates of each other?
Note: In all of the above, don’t just find a specific equation that has this property – find a family of equations with arbitrary parameters of certain types.

⁷ Sometimes also called purely imaginary numbers.

3.1.2 Operations on Complex Numbers

Complex numbers can be added and multiplied with other complex numbers. There is really nothing special about these operations, except that it is customary to group the imaginary parts (i.e. anything that is a multiple of i) together and turn i² into −1 in the final result:

    (a + i b) + (c + i d) = (a + c) + i (b + d),    (3.10)

    (a + i b) (c + i d) = (ac − bd) + i (ad + bc).    (3.11)

Next, note that the two solutions to a quadratic equation with Δ < 0 are the same, up to the sign of i. That is, if we replace i with − i in one of the solutions, we get the other solution. Such numbers are called complex conjugates, and the process of replacing i with − i is called complex conjugation. The complex conjugate of z is denoted z*:

    z = a + i b  ⟹  z* = a − i b.    (3.12)

Of course, the conjugate of the conjugate is the original number:

    (z*)* = z.    (3.13)

This means that the complex conjugation operation is an involution, that is, its
own inverse.
Complex conjugation allows us to write a general formula for the real or imaginary parts of a complex number, denoted Re z and Im z respectively:

    Re z ≡ (z + z*) / 2,    Im z ≡ (z − z*) / 2i.    (3.14)

You can check that if z = a + i b then we get Re z = a and Im z = b, as expected.
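As a quick numerical illustration (again an addition to these notes, using an arbitrary sample number), Python’s built-in complex type can verify equation (3.14):

    z = 3 + 2j                    # an arbitrary sample; Python writes the imaginary unit i as j
    z_conj = z.conjugate()        # complex conjugate z* = 3 - 2i

    re_z = (z + z_conj) / 2       # (z + z*)/2
    im_z = (z - z_conj) / (2*1j)  # (z - z*)/(2i)

    print(re_z, z.real)           # both give the real part: 3
    print(im_z, z.imag)           # both give the imaginary part: 2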

Exercise 3.3. What are the real and imaginary parts of 4−7 i? What is its complex
conjugate?

Problem 3.4. If a number is the complex conjugate of itself, can you say anything
interesting about that number? What about if a number is minus the complex
conjugate of itself?

3.1.3 The Complex Plane and Real 2-Vectors

Recall that the field of real numbers ℝ is geometrically a line. The space ℝⁿ is an n-dimensional space which is home to real n-vectors, that is, ordered lists of n real numbers of the form (v₁, …, vₙ). In particular, ℝ² is geometrically a plane, with vectors of the form (x, y).
The complex plane ℂ is similar to ℝ², except that instead of the x and y axes we have the real and imaginary axes respectively. The real unit 1, which squares to +1, defines the positive direction of the real axis, while the imaginary unit i, which squares to −1, defines the positive direction of the imaginary axis. This is illustrated in figure 3.1.
Figure 3.1: The complex plane, with real axis Re and imaginary axis Im, showing a complex number z = a + i b = r e^(iϕ) and its conjugate z* = a − i b = r e^(−iϕ), both with magnitude r = |z|. Also shown is the polar representation of both numbers (see section 3.1.4).
Since ℂ is a plane, we can define vectors on it, just like on ℝ². A real 2-vector (a, b) is an arrow in ℝ² which points from the origin (0, 0) to the point that is a steps in the direction of the x axis and b steps in the direction of the y axis. A complex number z = a + i b is similarly an arrow in ℂ which points from the origin 0 to the point that is a steps along the real axis and b steps along the imaginary axis.
The complex conjugate z* = a − i b is obtained by replacing i with − i. Since i defines the direction of the imaginary axis, this is equivalent to flipping the imaginary axis. In other words, z* is the reflection of z along the real axis, as shown in figure 3.1.

From the Pythagorean theorem, we know that the magnitude (or length) of the real 2-vector (a, b) is √(a² + b²). The magnitude or absolute value |z| of the complex number z = a + i b is also √(a² + b²). (Inspect figure 3.1 to see how the Pythagorean theorem fits in.) Furthermore, since z* is just a reflection of z, they both have the same magnitude. A convenient way to calculate the magnitude of either z or z* is to multiply them with each other:

    |z|² = |z*|² ≡ z* z = (a + i b) (a − i b) = a² − (i b)² = a² + b²,    (3.15)

so

    |z| = |z*| = √(a² + b²).    (3.16)

For an abstract complex number (where we don’t necessarily know the explicit values of the real and imaginary parts) one can also write

    |z| = |z*| = √((Re z)² + (Im z)²).    (3.17)

We note that there is an isomorphism between complex numbers and real 2-vectors. An isomorphism between two spaces is a mapping between the spaces that can be taken in either direction (i.e. is invertible), and preserves the structure of each space. The isomorphism between ℂ and ℝ² is given by:

    a + i b ⟷ (a, b).    (3.18)

We have already seen that the norm operation is preserved. Similarly, addition of complex numbers

    (a + i b) + (c + i d) = (a + c) + i (b + d)    (3.19)

maps into addition of 2-vectors

    (a, b) + (c, d) = (a + c, b + d).    (3.20)

Exercise 3.5. Let z = 5 + 6 i and w = 7 + 8 i.
A. Calculate z*, w*, |z|, |w|, z + w, z − w, |z + w|, |z − w|, and zw.
B. Find the 2-vectors isomorphic to z and w.

Problem 3.6. Show that multiplication of a vector by a real number and reflection of a vector with respect to the x and y axes map to equivalent operations on the corresponding complex numbers.

3.1.4 Polar Coordinates and Complex Phases

A vector in ℝ² can be converted from Cartesian coordinates (x, y) to polar coordinates (r, ϕ). The r coordinate is the magnitude of the vector, and the ϕ coordinate is the angle that the vector makes with respect to the x axis. The relation between the coordinate systems is given by

    x = r cos ϕ,    y = r sin ϕ,    (3.21)

    r = √(x² + y²),    ϕ = arctan(y/x).    (3.22)

This simply follows from the definitions of cos ϕ and sin ϕ, since the vector creates a right triangle with the x axis (see figure 3.1). For example, the vector (x, y) = (1, √3) in Cartesian coordinates corresponds to r = 2 and ϕ = π/3.
x and y can be any real numbers, but r must be non-negative and ϕ must be in the range (−π, π] (in radians), where ϕ = 0 corresponds to the x axis. However, there is a subtlety here: the range of the arctan function is (−π/2, π/2), so ϕ needs to be further adjusted according to the quadrant. One can instead use a more complicated definition that automatically takes the quadrant into account:

    ϕ = arctan(y/x)        if x > 0,
        arctan(y/x) + π    if x < 0 and y ≥ 0,
        arctan(y/x) − π    if x < 0 and y < 0,
        +π/2               if x = 0 and y > 0,    (3.23)
        −π/2               if x = 0 and y < 0,
        undefined          if x = 0 and y = 0.

This function is sometimes called atan2(y, x), and it is implemented in most programming languages. Note that ϕ is undefined at the origin since a vector of length zero does not point in any direction.
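For illustration (an addition to these notes, using the worked example (1, √3) from above), here is the Cartesian-to-polar conversion in Python, whose math.atan2 implements exactly the case analysis of equation (3.23):

    import math, cmath

    z = 1 + math.sqrt(3) * 1j         # the complex number 1 + √3 i, i.e. the vector (1, √3)

    r = abs(z)                        # magnitude: r = 2
    phi = math.atan2(z.imag, z.real)  # angle in (−π, π], quadrant handled automatically: π/3
    print(r, phi)

    print(cmath.polar(z))             # (r, ϕ) in one call
    print(cmath.rect(r, phi))         # back to Cartesian form, approximately 1 + 1.732i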
Given that complex numbers are isomorphic to real 2-vectors, we should be able to write complex numbers in polar coordinates as well. Looking at equation (3.21), and replacing x and y with a and b, we see that

    z = a + i b = r (cos ϕ + i sin ϕ).    (3.24)

We can write this more compactly using Euler’s formula:

    e^(i ϕ) = cos ϕ + i sin ϕ  ⟹  z = r e^(i ϕ).    (3.25)

This is illustrated in figure 3.1. In this context, the angle ϕ is called the complex phase. It is of extreme importance in quantum mechanics, as we shall see.
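Euler’s formula is also easy to check numerically (another illustrative addition, with an arbitrary sample angle):

    import cmath, math

    phi = 0.7                                    # an arbitrary angle in radians
    lhs = cmath.exp(1j * phi)                    # e^(i ϕ)
    rhs = complex(math.cos(phi), math.sin(phi))  # cos ϕ + i sin ϕ
    print(abs(lhs - rhs) < 1e-12)                # True: the two sides agree
    print(abs(lhs))                              # 1.0, as Problem 3.8 asks you to prove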

Exercise 3.7. Write 2 i − 3 in polar coordinates.
Problem 3.8. Prove, using Euler’s formula, that |e^(i ϕ)| = 1, that is, the magnitude of the complex number e^(i ϕ) is 1. If z = r e^(i ϕ), what is |z|?
Problem 3.9. Prove Euler’s formula. (You may need to use some calculus.)

3.2 Linear Algebra

The most important and fundamental mathematical structure in quantum theory is the Hilbert space, a type of complex vector space. In this section we will define Hilbert spaces and learn about many important concepts and results from linear algebra that apply to them.

3.2.1 Complex Vector Spaces

A real n-vector is an ordered list of n real numbers. Analogously, a complex n-vector is an ordered list of n complex numbers. For example, a complex 2-vector with two complex components Ψ₁ and Ψ₂ is written as:

    |Ψ⟩ ≡ ( Ψ₁ )
          ( Ψ₂ ).    (3.26)

The notation |Ψ⟩ is unique to quantum mechanics, and it is called bra-ket notation or sometimes Dirac notation. In this notation, we write a straight line | and an angle bracket ⟩, and between them, a label. We will usually denote a general vector with the label Ψ; this label, and its lowercase counterpart ψ, are very commonly used in quantum mechanics. However, we can use whatever label we want to describe our vector – including letters, numbers, symbols, or even whole words and sentences, for example:

    |A⟩, |β⟩, |3⟩, |♣⟩, |Bob⟩, |Schrödinger’s Cat Is Alive⟩, …    (3.27)

This is a great advantage of the bra-ket notation, as it allows us to be very descriptive in the labels we choose for our vectors – which we can’t do with the notation v or v⃗ commonly used for vectors in mathematics and physics.
notation v or 𝑣 ⃗ commonly used for vectors in mathematics and physics.
A vector space 𝒱 over a field⁸ 𝔽 is a set of vectors equipped with two operations: addition of vectors and multiplication of a vector by a scalar, where a scalar is any number from the field 𝔽. Vector addition must satisfy the following conditions:

1. Closed – the sum of two vectors is another vector in the same space:

    ∀ |Ψ⟩, |Φ⟩ ∈ 𝒱:  |Ψ⟩ + |Φ⟩ ∈ 𝒱.    (3.28)

⁸ The field is usually taken to be ℝ or ℂ. Naturally, for a complex vector space, it will be ℂ.

2. Commutative – the order of vectors doesn’t matter:

∀ |Ψ⟩ , |Φ⟩ ∈ 𝒱 ∶ |Ψ⟩ + |Φ⟩ = |Φ⟩ + |Ψ⟩ . (3.29)

3. Associative – if three vectors are added, it doesn’t matter which two are
added first:

∀ |Ψ⟩ , |Φ⟩ , |Θ⟩ ∈ 𝒱 ∶ ( |Ψ⟩ + |Φ⟩) + |Θ⟩ = |Ψ⟩ + ( |Φ⟩ + |Θ⟩) . (3.30)

4. Identity vector or zero vector – there is a (unique) vector⁹ 0 which, when added to any vector, does not change it:

    ∃ 0 ∈ 𝒱:  ∀ |Ψ⟩ ∈ 𝒱:  |Ψ⟩ + 0 = |Ψ⟩.    (3.31)

5. Inverse vector – for every vector there exists another (unique) vector such
that the two vectors sum to the zero vector:

∀ |Ψ⟩ ∈ 𝒱 ∶ ∃ ( − |Ψ⟩) ∈ 𝒱 ∶ |Ψ⟩ + ( − |Ψ⟩) = 0. (3.32)

Furthermore, multiplication by a scalar must satisfy the following conditions:

1. Closed – the product of a vector and a scalar is a vector in the same space:

∀𝛼 ∈ 𝔽, ∀ |Ψ⟩ ∈ 𝒱 ∶ 𝛼 |Ψ⟩ ∈ 𝒱. (3.33)

2. Associative – if two scalars are multiplied by a vector, it doesn’t matter whether we first multiply the two scalars or we first multiply one of the scalars with the vector:

    ∀α, β ∈ 𝔽, ∀ |Ψ⟩ ∈ 𝒱:  (αβ) |Ψ⟩ = α (β |Ψ⟩).    (3.34)

3. Distributive over addition of scalars:

∀𝛼, 𝛽 ∈ 𝔽, ∀ |Ψ⟩ ∈ 𝒱 ∶ (𝛼 + 𝛽) |Ψ⟩ = 𝛼 |Ψ⟩ + 𝛽 |Ψ⟩ . (3.35)

4. Distributive over addition of vectors:

∀𝛼 ∈ 𝔽, ∀ |Ψ⟩ , |Φ⟩ ∈ 𝒱 ∶ 𝛼 (|Ψ⟩ + |Φ⟩) = 𝛼 |Ψ⟩ + 𝛼 |Φ⟩ . (3.36)


⁹ Note that here we are using a slight abuse of notation by denoting the zero vector as the number 0, instead of using bra-ket notation. The reason is that |0⟩ already has a special common meaning in quantum mechanics, as we will see later; in the context of that special meaning, |0⟩ is not the zero vector.

5. Identity scalar or unit scalar – there is a (unique) scalar 1 which, when
multiplied by any vector, does not change it:

∃1 ∈ 𝔽 ∶ ∀ |Ψ⟩ ∈ 𝒱 ∶ 1 |Ψ⟩ = |Ψ⟩ . (3.37)

We now define a 2-dimensional complex vector space, which we denote ℂ², as the space of complex 2-vectors over ℂ, with addition of vectors given by

    |Ψ⟩ ≡ ( Ψ₁ ) ∈ ℂ²,  |Φ⟩ ≡ ( Φ₁ ) ∈ ℂ²  ⟹  |Ψ⟩ + |Φ⟩ = ( Ψ₁ + Φ₁ ),    (3.38)
          ( Ψ₂ )              ( Φ₂ )                      ( Ψ₂ + Φ₂ )

and multiplication of a vector by a scalar given by

    |Ψ⟩ ≡ ( Ψ₁ ) ∈ ℂ²,  λ ∈ ℂ  ⟹  λ |Ψ⟩ = ( λΨ₁ ).    (3.39)
          ( Ψ₂ )                           ( λΨ₂ )

The n-dimensional complex vector space ℂⁿ is defined analogously. In this course, we will mostly focus on ℂ² for simplicity, in particular when giving explicit examples.
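For concreteness, here is a small Python/NumPy sketch (an illustrative addition to these notes; the vectors and scalar below are arbitrary samples) of the operations (3.38) and (3.39) in ℂ²:

    import numpy as np

    psi = np.array([1 + 2j, 3 - 1j])  # a sample vector in C^2
    phi = np.array([0 + 0j, 4 + 4j])  # another sample vector

    print(psi + phi)                  # vector addition, equation (3.38)
    print((2 - 3j) * psi)             # multiplication by a complex scalar, equation (3.39)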
Exercise 3.10. Let

    |Ψ⟩ ≡ ( 3 + i ),  |Φ⟩ ≡ ( i − 1  ),  α = 7 i − 2,  β = −4 − 8 i.    (3.40)
          ( −9   )          ( −10 i )

Calculate α |Ψ⟩ + β |Φ⟩.


Problem 3.11. Check that the addition and multiplication as defined above indeed satisfy all of the required conditions for a vector space. You can do this just for ℂ², for simplicity.

3.2.2 Dual Vectors, Inner Products, Norms, and Hilbert Spaces

A dual vector is defined by writing the vector as a row instead of a column, and replacing each component with its complex conjugate. We denote the dual vector of |Ψ⟩ as follows:

    ⟨Ψ| = ( Ψ₁*  Ψ₂* ).    (3.41)

In terms of notation, there is now an opposite angle bracket ⟨ on the left of the label, and the straight line | is on the right. Addition and multiplication by a scalar are defined as for vectors, simply replacing columns with rows. However, you may not add vectors and dual vectors together – adding a row to a column is undefined!
If we are given a dual vector, we can take its dual to get a “normal” (column) vector. In this case, the operation of taking the dual involves writing the vector as a column instead of a row and taking the complex conjugates of the components. This means that the operation of taking the dual is an involution – taking the dual of a vector twice gives back the same vector, since (z*)* = z.
Using dual vectors, we may define the inner product. This product allows us to take a vector and a dual vector and produce a (complex) number out of them, similarly to the dot product of real vectors¹⁰. Importantly, the inner product only works for one vector and one dual vector, not for two vectors or two dual vectors. To calculate it, we multiply the components of both vectors one by one and add them up:

    ⟨Ψ|Φ⟩ = ( Ψ₁*  Ψ₂* ) ( Φ₁ ) = Ψ₁* Φ₁ + Ψ₂* Φ₂.    (3.42)
                         ( Φ₂ )

In bra-ket notation, vectors |Ψ⟩ are called “kets” and dual vectors ⟨Ψ| are called “bras”. Then the notation for ⟨Ψ|Φ⟩ is called a “bra(c)ket”.
We define the norm-squared of a vector by taking its inner product with its dual (“squaring” it):

    ‖Ψ‖² ≡ ⟨Ψ|Ψ⟩ = ( Ψ₁*  Ψ₂* ) ( Ψ₁ ) = |Ψ₁|² + |Ψ₂|²,    (3.43)
                                ( Ψ₂ )

where the magnitude-squared of a complex number z was defined in section 3.1.3 as |z|² ≡ z* z. Then we can define the norm as the square root of the norm-squared:

    ‖Ψ‖ ≡ √(‖Ψ‖²) = √⟨Ψ|Ψ⟩.    (3.44)

Observe how taking the dual of a vector generalizes taking the complex conjugate of a number, and taking the norm of a vector generalizes taking the magnitude of a number; indeed, for 1-dimensional vectors, these operations are the same!
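The following Python/NumPy sketch (an illustrative addition, with arbitrary sample vectors) computes bras, inner products, and norms in ℂ²; note that np.vdot conjugates its first argument, exactly as in equation (3.42):

    import numpy as np

    psi = np.array([1 + 2j, 3 - 1j])    # a sample ket in C^2
    phi = np.array([2j, 4 + 4j])

    bra_psi = psi.conj()                # the bra: complex-conjugated components, read as a row
    inner = np.vdot(psi, phi)           # <psi|phi> = conj(psi_1)*phi_1 + conj(psi_2)*phi_2
    norm_sq = np.vdot(psi, psi).real    # ||psi||^2 = <psi|psi>, always real and non-negative
    norm = np.sqrt(norm_sq)             # the norm, equation (3.44)

    print(inner, norm_sq, norm)
    print(np.vdot(phi, psi) == inner.conjugate())  # <phi|psi> = <psi|phi>*, cf. problem 3.14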
A vector space with an inner product is called a Hilbert space, provided it is also a complete metric space¹¹ and that the inner product satisfies the same properties (which you will derive in problems 3.13, 3.14, and 3.15) as the standard inner product on ℂⁿ. In particular, ℂⁿ itself is a Hilbert space, but there are many other Hilbert spaces, some of them much more abstract. The usual notation for a general Hilbert space is ℋ.

¹⁰ The dot product of the real vectors v ≡ (v₁, v₂) and w ≡ (w₁, w₂) in ℝ² is defined as v · w ≡ v₁w₁ + v₂w₂. In principle, this definition does secretly involve a dual (row) vector and a (column) vector, but since we do not need to take the complex conjugate, we don’t really need to worry about dual vectors. However, it is important to note that in real vector spaces with curvature, such as those used in general relativity, the dot product must be replaced with a more complicated inner product which involves the metric, and it again becomes crucial to distinguish vectors from dual vectors – which in this context are also called contravariant and covariant vectors respectively.
¹¹ A vector space is a complete metric space if whenever an infinite series of vectors |Ψᵢ⟩ converges absolutely, that is, the series of the norms of the vectors converges:

    ∑_{i=0}^{∞} ‖Ψᵢ‖ < ∞,    (3.45)

then the series of the vectors themselves converges as well, to some vector |Ψ⟩ in the Hilbert space:

    ∑_{i=0}^{∞} |Ψᵢ⟩ = |Ψ⟩.    (3.46)

Exercise 3.12. Let

    |Ψ⟩ ≡ ( 7 + 7 i  ),  |Φ⟩ ≡ ( −2 − 7 i ).    (3.47)
          ( −7 − 2 i )         ( i        )

Calculate ⟨Ψ|, ⟨Φ|, ‖Ψ‖, ‖Φ‖, ⟨Ψ|Φ⟩, and ⟨Φ|Ψ⟩.


Problem 3.13. Prove that the norm-squared ‖Ψ‖² is always non-negative, and it is zero if and only if |Ψ⟩ is the zero vector, that is, the vector whose components are all zero. In other words, the inner product is positive-definite. As a corollary, explain why we must take the complex conjugate of the components when we convert a vector to a dual vector. (What would have happened if we didn’t?)

Problem 3.14. Prove that ⟨Φ|Ψ⟩ = ⟨Ψ|Φ⟩*, that is, if we swap the order of vectors in the inner product we get the complex conjugate of the original product. Thus, unlike the dot product, the inner product on ℂⁿ is not symmetric. However, it is conjugate-symmetric, and in particular, the magnitude of the inner product remains the same, since |z| = |z*|.

Problem 3.15. Prove that if α, β ∈ ℂ and |Ψ⟩, |Φ⟩, |Θ⟩ ∈ ℂⁿ then

⟨Ψ| (𝛼 |Φ⟩ + 𝛽 |Θ⟩) = 𝛼⟨Ψ|Φ⟩ + 𝛽⟨Ψ|Θ⟩, (3.48)

that is, the inner product is linear in its second argument.

3.2.3 Orthonormal Bases

An orthonormal basis of ℂⁿ is a set of n non-zero vectors {|B₁⟩, …, |Bₙ⟩} – which we will usually denote |Bᵢ⟩ for short, with the implication that i ∈ {1, …, n} – such that:

1. They span ℂⁿ, which means that any vector |Ψ⟩ ∈ ℂⁿ can be written uniquely as a linear combination of the basis vectors, that is, a sum of the vectors |Bᵢ⟩ multiplied by some complex numbers λᵢ ∈ ℂ:

    |Ψ⟩ = ∑_{i=1}^{n} λᵢ |Bᵢ⟩.    (3.49)
This property ensures that the basis can be used to define any single vector in the space ℂⁿ, not just part of that space.
As a simple example, in ℝ³ the vector x̂ ≡ (1, 0, 0) pointing along the x axis and the vector ŷ ≡ (0, 1, 0) pointing along the y axis span the xy plane, but not all of ℝ³. To get a basis for all of ℝ³, we must add an appropriate third vector, such as the vector ẑ ≡ (0, 0, 1) pointing along the z axis. (But other vectors, such as (1, 2, 3), would work as well.)

2. They are linearly independent, in that if the zero vector is a linear combination of the basis vectors, then the coefficients in the linear combination must all be zero:

    ∑_{i=1}^{n} λᵢ |Bᵢ⟩ = 0  ⟹  λᵢ = 0, ∀i.    (3.50)

Linear independence means (as you will show in problem 3.17) that no vector in the set can be written as a linear combination of the other vectors in the set. If we could have done so, then that vector would have been redundant, and we would have needed to remove it in order to obtain a basis.
As a simple example, the set composed of x̂, ŷ, and (1, 2, 0) is linearly dependent, since (1, 2, 0) = x̂ + 2ŷ, but the set {x̂, ŷ, ẑ} is linearly independent.

3. They are all orthogonal to each other, that is, the inner product of any two different vectors evaluates to zero:

    ⟨Bᵢ|Bⱼ⟩ = 0,    ∀i ≠ j.    (3.51)

4. They are all unit vectors, that is, they have a norm (and norm-squared) of 1:

    ‖Bᵢ‖² = ⟨Bᵢ|Bᵢ⟩ = 1,    ∀i.    (3.52)

In fact, properties 3 and 4 may be expressed more compactly as:

    ⟨Bᵢ|Bⱼ⟩ = δᵢⱼ = { 0 if i ≠ j,
                     { 1 if i = j,    (3.53)

where δᵢⱼ is called the Kronecker delta. If this combined property is satisfied, we say that the vectors are orthonormal¹².
These requirements become much simpler in n = 2 dimensions. An orthonormal basis for ℂ² is a set of 2 non-zero vectors |B₁⟩, |B₂⟩ such that:

1. They span ℂ², which means that any vector |Ψ⟩ ∈ ℂ² can be written as a linear combination of the basis vectors:

    |Ψ⟩ = λ₁ |B₁⟩ + λ₂ |B₂⟩,    (3.54)

for a unique choice of λ₁, λ₂ ∈ ℂ.

¹² Actually, bases don’t have to be orthonormal in general, but in quantum mechanics they always are, for reasons that will become clear later.

2. They are linearly independent, which means that we cannot write one in terms of a scalar times the other, i.e.:

    |B₁⟩ ≠ λ |B₂⟩,    λ ∈ ℂ.    (3.55)

3. They are orthonormal to each other, that is, the inner product between them evaluates to zero and both of them have unit norm:

    ⟨B₁|B₂⟩ = 0,    (3.56)

    ‖B₁‖² = ⟨B₁|B₁⟩ = 1,    ‖B₂‖² = ⟨B₂|B₂⟩ = 1.    (3.57)

A very important basis, the standard basis of ℂ², is defined as:

    |1₁⟩ ≡ ( 1 ),  |1₂⟩ ≡ ( 0 ).    (3.58)
           ( 0 )         ( 1 )

We similarly define the standard basis of ℂⁿ for any n in the obvious way.
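As an illustrative Python/NumPy sketch (an addition to the notes), we can verify the orthonormality condition (3.53) for the standard basis, and decompose an arbitrary sample vector in it; the fact that the coefficients of an orthonormal basis are given by λᵢ = ⟨Bᵢ|Ψ⟩ is derived later, in section 3.2.7:

    import numpy as np

    basis = [np.array([1, 0], dtype=complex),  # the standard basis of C^2
             np.array([0, 1], dtype=complex)]

    # Orthonormality: <B_i|B_j> should equal the Kronecker delta, equation (3.53).
    for i, b_i in enumerate(basis):
        for j, b_j in enumerate(basis):
            print(i + 1, j + 1, np.vdot(b_i, b_j))  # 1 when i = j, 0 otherwise

    # Decomposition |psi> = lambda_1 |B_1> + lambda_2 |B_2>, equation (3.54).
    psi = np.array([2 - 1j, 3j])                 # an arbitrary sample vector
    coefficients = [np.vdot(b, psi) for b in basis]
    print(coefficients)                          # [(2-1j), 3j]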

Problem 3.16. Show that the standard basis vectors satisfy the properties above.

Problem 3.17. Show that linear independence means that no vector in the basis
can be written as a linear combination of the other vectors in the basis.

Problem 3.18. Any basis which is orthogonal but not orthonormal, that is, does not satisfy property 4, can be made orthonormal by normalizing each basis vector, that is, dividing it by its norm:

    |Bᵢ⟩ ↦ |Bᵢ⟩ / ‖Bᵢ‖.    (3.59)

Show that if an orthogonal but not orthonormal basis satisfies properties 1-3, then it still satisfies them after normalizing it in this way.

Exercise 3.19. Consider the complex vector

    |Ψ⟩ ≡ ( 1 + i   ).    (3.60)
          ( 2 + 2 i )

Normalize |Ψ⟩ and find another complex vector |Φ⟩ such that the set {|Ψ⟩, |Φ⟩} is a basis of ℂ² (i.e. satisfies all of the properties above).

Problem 3.20. Find an orthonormal basis of ℂ³ which is not the standard basis or a scalar multiple of the standard basis. Show that it is indeed an orthonormal basis.

3.2.4 Matrices and the Adjoint

A matrix in n dimensions is an n × n array¹³ of (complex) numbers. In n = 2 dimensions we have

    A = ( A₁₁  A₁₂ ),    A₁₁, A₁₂, A₂₁, A₂₂ ∈ ℂ.    (3.61)
        ( A₂₁  A₂₂ )

A matrix can act on a vector to produce another vector. If it acts on a ket (a vertical/column vector), the result is another ket. If it acts on a bra (a horizontal/row dual vector), the result is another bra.
If the matrix acts on a ket, then it must act from the left, and the element at
row 𝑖 of the resulting ket is obtained by taking the inner product of row 𝑖 of the
matrix with the ket:

𝐴 |Ψ⟩ = ( 𝐴11 𝐴12 ) ( Ψ1 ) = ( 𝐴11 Ψ1 + 𝐴12 Ψ2 ).        (3.62)
        ( 𝐴21 𝐴22 ) ( Ψ2 )   ( 𝐴21 Ψ1 + 𝐴22 Ψ2 )

If the matrix acts on a bra, then it must act from the right, and the element at
column 𝑖 of the resulting bra is obtained by taking the inner product of column
𝑖 of the matrix with the bra:

⟨Ψ| 𝐴 = ( Ψ∗1  Ψ∗2 ) ( 𝐴11 𝐴12 ) = ( Ψ∗1 𝐴11 + Ψ∗2 𝐴21    Ψ∗1 𝐴12 + Ψ∗2 𝐴22 ).        (3.63)
                     ( 𝐴21 𝐴22 )

Note that the dual vector ⟨Ψ| 𝐴 is not the dual of the vector 𝐴 |Ψ⟩, as you can see
by taking the dual of equation (3.62). However, we can define the adjoint of a
matrix by transposing rows into columns and then taking the complex conjugate
of all the components:
𝐴† = ( 𝐴∗11 𝐴∗21 ),        (3.64)
     ( 𝐴∗12 𝐴∗22 )

where the notation † for the adjoint is called dagger. Then the vector dual to 𝐴 |Ψ⟩ is ⟨Ψ| 𝐴†, as you will check in problem 3.22. Actually, taking the adjoint of a matrix is exactly the same operation as taking the dual of a vector! The only difference is that for a matrix we have 𝑛 columns to transpose into rows, while for a vector we only have one. Therefore, we have
|Ψ⟩† = ⟨Ψ| ,    ⟨Ψ|† = |Ψ⟩ ,        (3.65)

and we get the following nice relation:

(𝐴 |Ψ⟩)† = ⟨Ψ| 𝐴†.        (3.66)

The identity matrix, which we will write simply as 1, is:

1 = ( 1 0 ).        (3.67)
    ( 0 1 )

Acting with it on any vector or dual vector does not change it: 1 |Ψ⟩ = |Ψ⟩.
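As a quick numerical sanity check of (𝐴 |Ψ⟩)† = ⟨Ψ| 𝐴†, here is a minimal Python/NumPy sketch (the matrix and vector values are arbitrary examples, not from the notes). Since NumPy's 1D arrays don't distinguish rows and columns, the bra is simply the conjugated array:

    import numpy as np

    A = np.array([[1 + 2j, 3], [0, 4 - 1j]])  # an arbitrary 2x2 complex matrix
    psi = np.array([1j, 2 - 1j])              # an arbitrary ket

    ket = A @ psi                   # A|Psi>, a ket
    bra = psi.conj() @ A.conj().T   # <Psi|A^dagger, a bra
    print(np.allclose(ket.conj(), bra))  # True: the bra is the dual of the ket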

Problem 3.21. To rotate (real) vectors in ℝ² by an angle 𝜃, we take their product with the (real) rotation matrix:

𝑅 (𝜃) ≡ ( cos 𝜃  − sin 𝜃 ).        (3.68)
        ( sin 𝜃    cos 𝜃 )

A. Calculate the matrix 𝑅 (𝜋/3).


B. Write down the vector resulting from rotating (−5, 9) by 𝜋/3 radians, in both
Cartesian and polar coordinates.
C. Repeat (B) for rotating a general 2­vector (𝑥, 𝑦) by a general angle 𝜃.
D. Find the mapping between rotations of 2­vectors in ℝ2 and rotations of complex
numbers in ℂ, and explain what is the analogue of the rotation matrix in terms of
complex numbers.

Problem 3.22. Show that the vector dual to 𝐴 |Ψ⟩ is indeed ⟨Ψ| 𝐴† .

Exercise 3.23. Let

𝐴 ≡ ( 1 + 5i     2    ),    ⟨Ψ| ≡ ( i − 2    i − 3 ).        (3.69)
    ( 3 − 7i  4 + 8i )

Calculate 𝐴 |Ψ⟩ and ⟨Ψ| 𝐴† separately, and then check that they are the dual of
each other.

Problem 3.24. Show that (𝐴†)† = 𝐴. This means that the adjoint operation is an involution, exactly like complex conjugation and taking the dual of a vector. In fact, all three are the exact same operation. By choosing an appropriate matrix, explain how taking the complex conjugate of a number is a special case of taking the adjoint of a matrix.

Problem 3.25. Show that the action of a matrix on a vector is linear, that is,

𝐴 (𝛼 |Ψ⟩ + 𝛽 |Φ⟩) = 𝛼𝐴 |Ψ⟩ + 𝛽𝐴 |Φ⟩ . (3.70)

3.2.5 The Outer Product

We have seen that vectors and dual vectors may be combined to generate a
complex number using the inner product. We can similarly combine a vector and
a dual vector to generate a matrix, using the outer product. Given

⟨Ψ| ≡ ( Ψ∗1  Ψ∗2 ) ,    |Φ⟩ ≡ ( Φ1 ),        (3.71)
                             ( Φ2 )

we define the outer product as the matrix whose component at row 𝑖, column 𝑗 is
given by multiplying the component at row 𝑖 of |Φ⟩ with the component at column
𝑗 of ⟨Ψ|:
|Φ⟩⟨Ψ| = ( Φ1 ) ( Ψ∗1  Ψ∗2 ) = ( Φ1 Ψ∗1   Φ1 Ψ∗2 ).        (3.72)
         ( Φ2 )                ( Φ2 Ψ∗1   Φ2 Ψ∗2 )

Note how when taking an inner product the straight lines | face each other: ⟨Ψ|Φ⟩,
while when taking an outer product the angle brackets ⟩⟨ face each other. This
shows some of the elegance of the Dirac notation! A bra­ket is an inner product,
while a ket­bra is an outer product.
We can assign a rank to scalars, vectors, and matrices:

• Scalars have rank 0 since they have 𝑛⁰ = 1 component,

• Vectors have rank 1 since they have 𝑛¹ = 𝑛 components,

• Matrices have rank 2 since they have 𝑛² components.

Then the inner product reduces the rank of the vectors from 1 to 0, while the
outer product increases the rank from 1 to 2.
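In Python, the outer product can be sketched as follows (the example vectors are arbitrary, not from the notes); note that the components of the bra must be complex conjugated:

    import numpy as np

    phi = np.array([2, 1j])        # |Phi>
    psi = np.array([1 - 1j, 3])    # |Psi>

    ket_bra = np.outer(phi, psi.conj())  # |Phi><Psi|: a rank-2 object (matrix)
    bra_ket = np.vdot(psi, phi)          # <Psi|Phi>: a rank-0 object (scalar)
    print(ket_bra.shape, bra_ket)        # (2, 2) and a single complex number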

Exercise 3.26. Calculate the outer product |Ψ⟩⟨Φ| for

|Ψ⟩ = (   1   ),    |Φ⟩ = ( 3 − i ).        (3.73)
      ( 2 + i )           (  4i   )

Remember that when writing the dual vector, the components are complex con­
jugated!

3.2.6 The Completeness Relation

Let us write the vector |Ψ⟩ as a linear combination of basis vectors:


𝑛
|Ψ⟩ = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ . (3.74)
𝑖=1

Taking the inner product of the above equation with ⟨𝐵𝑗 | and using the fact that
the basis vectors are orthonormal,

⟨𝐵𝑖|𝐵𝑗⟩ = 𝛿𝑖𝑗 = { 0 if 𝑖 ≠ 𝑗,
                { 1 if 𝑖 = 𝑗,        (3.75)

we get:
𝑛 𝑛
⟨𝐵𝑗 |Ψ⟩ = ∑ 𝜆𝑖 ⟨𝐵𝑗 |𝐵𝑖 ⟩ = ∑ 𝜆𝑖 𝛿𝑖𝑗 = 𝜆𝑗 , (3.76)
𝑖=1 𝑖=1

since all of the terms in the sum vanish except the one with 𝑖 = 𝑗. Therefore, the
coefficients 𝜆𝑖 in equation (3.74) are given, for any vector |Ψ⟩ and for any basis
|𝐵𝑖 ⟩, by
𝜆𝑖 = ⟨𝐵𝑖 |Ψ⟩. (3.77)

Now, since 𝜆𝑖 is a scalar, and multiplication by a scalar is commutative (unlike the


inner and outer products!), we can move it to the right in equation (3.74):
𝑛
|Ψ⟩ = ∑ |𝐵𝑖 ⟩ 𝜆𝑖 . (3.78)
𝑖=1

We haven’t actually done anything here; where to write the scalar, on the left or
right of the vector, is completely arbitrary – it’s just conventional to write it on
the left. Then, replacing 𝜆𝑖 with ⟨𝐵𝑖 |Ψ⟩ as per equation (3.77), we get
𝑛
|Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩. (3.79)
𝑖=1

To make this even more suggestive, let us add parentheses:


𝑛
|Ψ⟩ = (∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |) |Ψ⟩. (3.80)
𝑖=1

Note that what we did here is go from a vector |𝐵𝑖 ⟩ times a complex number
⟨𝐵𝑖 |Ψ⟩ to a matrix |𝐵𝑖 ⟩⟨𝐵𝑖 | times a vector |Ψ⟩, for each 𝑖. The fact that these two
different products are actually equal to one another (as you will prove in problem
3.28) is not at all trivial, but it is one of the main reasons we like to use bra­ket

notation! The notation now suggests (see problem 3.29) that
𝑛
∑ |𝐵𝑖 ⟩⟨𝐵𝑖 | = 1, (3.81)
𝑖=1

where |𝐵𝑖 ⟩⟨𝐵𝑖 | is the outer product defined above, and the 1 on the right­hand
side is the identity matrix. This extremely useful result is called the completeness
relation.
In ℂ2 , we simply have
|𝐵1 ⟩⟨𝐵1 | + |𝐵2 ⟩⟨𝐵2 | = 1. (3.82)

Exercise 3.27. Given the basis

|𝐵1⟩ = (1/√2) ( 1 ),    |𝐵2⟩ = (1/√2) (  1 ),        (3.83)
              ( 1 )                   ( −1 )

first show that it is indeed an orthonormal basis, and then show that it satisfies
the completeness relation given by equation (3.82).
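You can also check your hand calculation for this kind of exercise numerically. A minimal sketch (assuming NumPy) that sums the outer products |𝐵𝑖⟩⟨𝐵𝑖| for this basis and compares with the identity matrix:

    import numpy as np

    B1 = np.array([1, 1]) / np.sqrt(2)
    B2 = np.array([1, -1]) / np.sqrt(2)

    total = np.outer(B1, B1.conj()) + np.outer(B2, B2.conj())
    print(np.allclose(total, np.eye(2)))  # True: the completeness relation holds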

Problem 3.28. Provide a rigorous proof that


𝑛 𝑛
∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩ = (∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |) |Ψ⟩. (3.84)
𝑖=1 𝑖=1

This means that the product is associative.

Problem 3.29. Importantly, we didn’t “divide equation (3.80) by |Ψ⟩” to get


equation (3.81)! You can’t do that with matrices and vectors. Instead, equa­
tion (3.81) follows from the fact that any matrix 𝐴 which satisfies |Ψ⟩ = 𝐴 |Ψ⟩ for
every vector |Ψ⟩ must necessarily be the identity matrix. Prove this.

3.2.7 Representing Vectors in Different Bases

Let us consider a complex 𝑛­vector defined as follows:

|Ψ⟩ ≡ ( Ψ1 )
      (  ⋮ ),    Ψ𝑖 ∈ ℂ.        (3.85)
      ( Ψ𝑛 )

Given an orthonormal basis |𝐵𝑖 ⟩, we have seen that we can write |Ψ⟩ as a linear
combination of the basis vectors:
𝑛
|Ψ⟩ = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ . (3.86)
𝑖=1

The coefficients 𝜆𝑖 ∈ ℂ depend on |Ψ⟩ and on the basis vectors, as we showed in
equation (3.77):
𝑛
𝜆𝑖 ≡ ⟨𝐵𝑖 |Ψ⟩ ⟹ |Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩ (3.87)
𝑖=1

With these coefficients, we can represent the vector |Ψ⟩ in the basis |𝐵𝑖 ⟩. This
representation will be a vector of the same dimension 𝑛, with the components
being the coefficients 𝜆𝑖 = ⟨𝐵𝑖 |Ψ⟩, and will be denoted as follows:

|Ψ⟩∣𝐵 ≡ ( ⟨𝐵1|Ψ⟩ )   ( 𝜆1 )
        (   ⋮    ) = ( ⋮  ).        (3.88)
        ( ⟨𝐵𝑛|Ψ⟩ )   ( 𝜆𝑛 )

We say that 𝜆𝑖 are the coordinates of |Ψ⟩ with respect to the basis |𝐵𝑖 ⟩.
The correct way to understand the meaning of a vector is as an abstract entity,
like an arrow in space, which does not depend on any particular basis – it is just
there. However, if we want to do concrete calculations with a vector, we must
somehow represent it numerically. This is done by choosing a basis and writing
down the coordinates of the vector in that basis.
Therefore, whenever we define a vector using its components – as we have been
doing throughout this chapter – there is always a specific basis in which the
vector is represented, with the components being the coordinates in this basis. If
no particular basis is explicitly specified, it is implied that it is the standard basis.
But no representation is better than any other; we usually choose whatever basis is most convenient to work with. In quantum mechanics, we often choose a basis defined by some physical observable, as we will see below.

Exercise 3.30. Let a vector |Ψ⟩ be represented in the standard basis as

|Ψ⟩ ≡ ( 1 − 9i ).        (3.89)
      ( 7i − 2 )

Find its representation |Ψ⟩ ∣𝐵 in terms of the orthonormal basis

|𝐵1⟩ = (1/√2) ( 1 ),    |𝐵2⟩ = (1/√2) (  1 ).        (3.90)
              ( 1 )                   ( −1 )

Problem 3.31. Prove that the inner product (and thus also the norm) is indepen­
dent of the choice of basis. That is, for any two vectors |Ψ⟩ and |Φ⟩ and any two
bases |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩,

⟨Ψ|Φ⟩∣𝐵 = ⟨Ψ|Φ⟩∣𝐶 .        (3.91)

3.2.8 Change of Basis

Let the representation of a vector |Ψ⟩ in the basis |𝐵𝑖 ⟩ be

|Ψ⟩∣𝐵 = ( ⟨𝐵1|Ψ⟩ )
        (   ⋮    ) = ∑ⁿᵢ₌₁ |𝐵𝑖⟩⟨𝐵𝑖|Ψ⟩.        (3.92)
        ( ⟨𝐵𝑛|Ψ⟩ )

Given a different basis |𝐶𝑖 ⟩, we have a different representation

|Ψ⟩∣𝐶 = ( ⟨𝐶1|Ψ⟩ )
        (   ⋮    ) = ∑ⁿᵢ₌₁ |𝐶𝑖⟩⟨𝐶𝑖|Ψ⟩.        (3.93)
        ( ⟨𝐶𝑛|Ψ⟩ )

To find a relation between the two representations, we use the completeness relation, equation (3.81):
𝑛
∑ |𝐵𝑗 ⟩⟨𝐵𝑗 | = 1. (3.94)
𝑗=1

Inserting it in the middle of the inner product representing the coordinates ⟨𝐶𝑖 |Ψ⟩,
we get that for all 𝑖
𝑛 𝑛
⟨𝐶𝑖 |Ψ⟩ = ⟨𝐶𝑖 | (∑ |𝐵𝑗 ⟩⟨𝐵𝑗 |) |Ψ⟩ = ∑⟨𝐶𝑖 |𝐵𝑗 ⟩⟨𝐵𝑗 |Ψ⟩. (3.95)
𝑗=1 𝑗=1

Again, the Dirac notation proves to be pretty convenient! This relation can be
expressed in matrix form as follows:

( ⟨𝐶1|Ψ⟩ )   ( ⟨𝐶1|𝐵1⟩ ⋯ ⟨𝐶1|𝐵𝑛⟩ ) ( ⟨𝐵1|Ψ⟩ )
(   ⋮    ) = (    ⋮    ⋱    ⋮    ) (   ⋮    ),        (3.96)
( ⟨𝐶𝑛|Ψ⟩ )   ( ⟨𝐶𝑛|𝐵1⟩ ⋯ ⟨𝐶𝑛|𝐵𝑛⟩ ) ( ⟨𝐵𝑛|Ψ⟩ )

or in other words,

|Ψ⟩∣𝐶 = 𝑃𝐶←𝐵 |Ψ⟩∣𝐵 ,        (3.97)

where the change­of­basis matrix from |𝐵𝑖 ⟩ to |𝐶𝑖 ⟩, denoted 𝑃𝐶←𝐵 , is defined as

𝑃𝐶←𝐵 ≡ ( ⟨𝐶1|𝐵1⟩ ⋯ ⟨𝐶1|𝐵𝑛⟩ )
       (    ⋮    ⋱    ⋮    ).        (3.98)
       ( ⟨𝐶𝑛|𝐵1⟩ ⋯ ⟨𝐶𝑛|𝐵𝑛⟩ )

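As an illustration of how this works in practice, here is a Python/NumPy sketch (the bases and vector below are hypothetical examples, chosen so as not to give away exercise 3.32) that builds 𝑃𝐶←𝐵 from the inner products ⟨𝐶𝑖|𝐵𝑗⟩ and verifies that it maps coordinates in |𝐵𝑖⟩ to coordinates in |𝐶𝑖⟩:

    import numpy as np

    B = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
    C = [np.array([1, 1j]) / np.sqrt(2), np.array([1, -1j]) / np.sqrt(2)]

    # P_{C<-B} has entries <C_i|B_j>; np.vdot conjugates its first argument.
    P = np.array([[np.vdot(c, b) for b in B] for c in C])

    psi = np.array([2, 1j])                         # standard-basis components
    psi_B = np.array([np.vdot(b, psi) for b in B])  # coordinates in basis B
    psi_C = np.array([np.vdot(c, psi) for c in C])  # coordinates in basis C
    print(np.allclose(P @ psi_B, psi_C))            # True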
Exercise 3.32. Consider the two bases

|𝐵1⟩ = (1/√2) ( 1 ),    |𝐵2⟩ = (1/√2) (  1 ),        (3.99)
              ( 1 )                   ( −1 )

|𝐶1⟩ = (1/√2) ( 1 ),    |𝐶2⟩ = (1/√2) ( −i ).        (3.100)
              ( i )                   ( −1 )

A. The vector |Ψ⟩ is represented in the standard basis as

|Ψ⟩ = (  −3   ).        (3.101)
      ( 2 + i )

Find its representations in the bases |𝐵𝑖⟩ and |𝐶𝑖⟩.

B. Find the change-of-basis matrix 𝑃𝐶←𝐵 . Calculate 𝑃𝐶←𝐵 |Ψ⟩∣𝐵 and verify that the result is equal to the expression you obtained in (A) for |Ψ⟩∣𝐶 .

3.2.9 Multiplication and Inverse of Matrices

The matrix product of two matrices is another matrix. The element of that matrix
at row 𝑖, column 𝑗 is calculated by taking the inner product of row 𝑖 of the left
matrix with column 𝑗 of the right matrix:

𝐴𝐵 = ( 𝐴11 𝐴12 ) ( 𝐵11 𝐵12 ) = ( 𝐴11 𝐵11 + 𝐴12 𝐵21    𝐴11 𝐵12 + 𝐴12 𝐵22 ).        (3.102)
     ( 𝐴21 𝐴22 ) ( 𝐵21 𝐵22 )   ( 𝐴21 𝐵11 + 𝐴22 𝐵21    𝐴21 𝐵12 + 𝐴22 𝐵22 )
Observe that the action of a matrix on a vector, and the inner and outer products
of vectors, are all just special cases of matrix multiplication – where a ket is a
matrix with only one column, and a bra is a matrix with only one row!
Given a matrix 𝐴, if there exists another matrix 𝐴−1 such that

𝐴−1 𝐴 = 𝐴𝐴−1 = 1, (3.103)

then the matrix 𝐴 is called invertible and 𝐴⁻¹ is called its inverse matrix. Note that (𝐴⁻¹)⁻¹ = 𝐴, so the operation of taking the inverse is an involution. Sometimes matrices do not have an inverse; such matrices are called singular.
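A short Python sketch (with an arbitrarily chosen example matrix) of matrix multiplication and inversion:

    import numpy as np

    A = np.array([[2, 1j], [0, 1]])           # an invertible matrix (det = 2)
    A_inv = np.linalg.inv(A)                   # raises LinAlgError if A is singular
    print(np.allclose(A_inv @ A, np.eye(2)))   # True
    print(np.allclose(A @ A_inv, np.eye(2)))   # True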

Exercise 3.33. Calculate the products 𝐴𝐵 and 𝐵𝐴 where:

𝐴 ≡ ( −1     3    ),    𝐵 ≡ ( 9 − 8i   7  ).        (3.104)
    ( −6i  2i − 1 )          (   4i   −2i )

Problem 3.34. Find a general formula for the inverse of a 2 × 2 matrix by taking

𝐴 ≡ ( 𝑎 𝑏 ),    𝐴⁻¹ ≡ ( 𝑒 𝑓 ),        (3.105)
    ( 𝑐 𝑑 )           ( 𝑔 ℎ )

and solving for 𝑒, 𝑓, 𝑔, ℎ in terms of 𝑎, 𝑏, 𝑐, 𝑑.

Exercise 3.35. Find the inverse of the matrix

𝐴 ≡ (  1   2 − 4i ).        (3.106)
    ( −i     −2   )

You can use the formula you found in problem 3.34.


Problem 3.36. Show that (𝐴𝐵)† = 𝐵†𝐴† and (𝐴𝐵)⁻¹ = 𝐵⁻¹𝐴⁻¹ for any two matrices 𝐴 and 𝐵.

Problem 3.37. Matrix multiplication is not commutative in general. That is, for
two arbitrary matrices 𝐴 and 𝐵, it is not in general true that 𝐴𝐵 = 𝐵𝐴. Find
an example of two matrices which commute, and an example of two matrices
which do not commute. In each case, show that they indeed commute or don’t
commute.

Problem 3.38. Show that multiplying by a scalar 𝜆 ∈ ℂ is the same as multiplying by a matrix with all of its elements equal to zero except for the elements on the diagonal, which are all equal to 𝜆:

𝜆𝐴 = ( 𝜆 0 ) 𝐴.        (3.107)
     ( 0 𝜆 )

This is also known as a scalar matrix.

Problem 3.39. Given two bases |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩, show that the change­of­basis
matrix 𝑃𝐵←𝐶 is the inverse of the change­of­basis matrix in the other direction,
𝑃𝐶←𝐵 .

3.2.10 Matrices Inside Inner Products

Since 𝐴 |Φ⟩ is itself a vector, we may calculate the inner product of that vector
with the dual vector ⟨Ψ|, which as usual gives us a complex number:

⟨Ψ|𝐴|Φ⟩ = ( Ψ∗1  Ψ∗2 ) ( 𝐴11 𝐴12 ) ( Φ1 )
                       ( 𝐴21 𝐴22 ) ( Φ2 )
        = Ψ∗1 𝐴11 Φ1 + Ψ∗2 𝐴21 Φ1 + Ψ∗1 𝐴12 Φ2 + Ψ∗2 𝐴22 Φ2 .

If we take the dual of 𝐴 |Φ⟩ we get ⟨Φ| 𝐴† , as you proved in problem 3.22. Thus,
inverting the order of the inner product, we get

⟨Φ|𝐴†|Ψ⟩ = ( Φ∗1  Φ∗2 ) ( 𝐴∗11 𝐴∗21 ) ( Ψ1 )
                        ( 𝐴∗12 𝐴∗22 ) ( Ψ2 )
         = Ψ1 𝐴∗11 Φ∗1 + Ψ2 𝐴∗21 Φ∗1 + Ψ1 𝐴∗12 Φ∗2 + Ψ2 𝐴∗22 Φ∗2 .

This is, of course, the complex conjugate of ⟨Ψ|𝐴|Φ⟩, since inverting the order of
the inner product results in the complex conjugate. In other words, we have the
relation
⟨Ψ|𝐴|Φ⟩∗ = ⟨Φ|𝐴† |Ψ⟩. (3.108)

Taking the complex conjugate reverses the order of the inner product, and also
replaces the matrix with its adjoint.

Exercise 3.40. Calculate the inner product ⟨Ψ|𝐴|Φ⟩ where

|Ψ⟩ = ( 5 + 2i ),    𝐴 = ( 9     8i   ),    |Φ⟩ = ( 3 + 4i ).        (3.109)
      (  −3i   )         ( 6i  5 − 4i )           (   2    )

3.2.11 Eigenvalues and Eigenvectors

If the matrix 𝐴, acting on the (non­zero) vector |Ψ⟩, results in a scalar multiple
of |Ψ⟩:
𝐴 |Ψ⟩ = 𝜆 |Ψ⟩ , 𝜆 ∈ ℂ, (3.110)

then we call |Ψ⟩ an eigenvector of 𝐴 and 𝜆 its eigenvalue. Note that |Ψ⟩ cannot
be the zero vector, but 𝜆 can be zero.
For example, if
𝐴 = ( 1   0 ),        (3.111)
    ( 0  −1 )

then it's easy to see that

|Ψ⟩ = ( 1 )        (3.112)
      ( 0 )

is an eigenvector with eigenvalue +1 and

|Φ⟩ = ( 0 )        (3.113)
      ( 1 )

is an eigenvector with eigenvalue −1:

𝐴 |Ψ⟩ = |Ψ⟩ , 𝐴 |Φ⟩ = − |Φ⟩ . (3.114)
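Eigenvalues and eigenvectors can also be found numerically. A minimal sketch using np.linalg.eig, applied to the matrix of equation (3.111):

    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, -1.0]])
    vals, vecs = np.linalg.eig(A)   # eigenvectors are the columns of `vecs`
    for k in range(len(vals)):
        v = vecs[:, k]
        print(np.allclose(A @ v, vals[k] * v))  # True: A|v> = lambda|v>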

Exercise 3.41. The matrix
𝐴 ≡ ( 1 2 )        (3.115)
    ( 2 1 )

has two eigenvectors. Find them and their corresponding eigenvalues.

Problem 3.42. Prove that, if |Ψ⟩ is an eigenvector of a matrix 𝐴, then 𝛼 |Ψ⟩ is also an eigenvector of 𝐴 for any non-zero 𝛼 ∈ ℂ, and it has the same eigenvalue.

3.2.12 Hermitian Matrices

A matrix 𝐴 is called Hermitian if it’s equal to its adjoint:

𝐴 = 𝐴† . (3.116)

Thus, it is sometimes also referred to as a self-adjoint matrix. For such a matrix, we have that
⟨Ψ|𝐴|Φ⟩∗ = ⟨Φ|𝐴|Ψ⟩. (3.117)

A Hermitian matrix is analogous to a real number, since 𝑧 = 𝑧∗ implies that 𝑧 is real.
The eigenvalues of a Hermitian matrix must all be real. To see this, let 𝜆 be an
eigenvalue of the Hermitian matrix 𝐴 with the eigenvector |Ψ⟩:

𝐴 |Ψ⟩ = 𝜆 |Ψ⟩ . (3.118)

Then we can take the inner product of both sides with ⟨Ψ|:
⟨Ψ|𝐴|Ψ⟩ = ⟨Ψ|𝜆|Ψ⟩ = 𝜆⟨Ψ|Ψ⟩ = 𝜆 ‖Ψ‖²,        (3.119)

where we were able to move 𝜆 out of the inner product because it's just a number.
From equation (3.117), we have:

⟨Ψ|𝐴|Ψ⟩ = ⟨Ψ|𝐴|Ψ⟩∗ , (3.120)


so ⟨Ψ|𝐴|Ψ⟩ is real. Since ‖Ψ‖² is also real – and non-zero, since |Ψ⟩ is an eigenvector,
so by definition it cannot be the zero vector – we conclude that 𝜆 must be real.
Now, let |Ψ⟩ and |Φ⟩ be two eigenvectors of 𝐴 corresponding to different eigenval­
ues 𝜆 and 𝜇 respectively:

𝐴 |Ψ⟩ = 𝜆 |Ψ⟩ , 𝐴 |Φ⟩ = 𝜇 |Φ⟩ , 𝜆 ≠ 𝜇. (3.121)

Let us take the inner product of the first equation with ⟨Φ| and of the second
equation with ⟨Ψ|:
⟨Φ|𝐴|Ψ⟩ = ⟨Φ|𝜆|Ψ⟩ = 𝜆⟨Φ|Ψ⟩, (3.122)

⟨Ψ|𝐴|Φ⟩ = ⟨Ψ|𝜇|Φ⟩ = 𝜇⟨Ψ|Φ⟩. (3.123)

From equation (3.117), the first equation is the complex conjugate of the second
equation. Since 𝜆 must be real – as we just proved – we get

𝜇⟨Ψ|Φ⟩ = (𝜆⟨Φ|Ψ⟩)∗ = 𝜆⟨Ψ|Φ⟩.        (3.124)

Seeing that 𝜆 ≠ 𝜇 by our assumption, this equation can only be true if

⟨Ψ|Φ⟩ = 0. (3.125)

In other words, eigenvectors of a Hermitian matrix corresponding to different eigenvalues are orthogonal. Now, since an eigenvector multiplied by a scalar is still an eigenvector, the eigenvectors |Ψ⟩ and |Φ⟩ can be divided by their norms, so that they are not only orthogonal but also orthonormal.
Moreover, one can prove that for any Hermitian matrix 𝐴 in ℂ𝑛 , there is an or­
thonormal basis of ℂ𝑛 consisting of eigenvectors of 𝐴. Such a basis is called an
orthonormal eigenbasis. The proof requires some slightly more advanced tools
from linear algebra, so we won’t write it here. However, this theorem is extremely
important in quantum theory. As we will see, Hermitian matrices represent phys­
ical observables in quantum theory, and their eigenvalues correspond to the pos­
sible values obtained by performing measurements on these observables. The
fact that there is an orthonormal basis of eigenvectors will prove very useful for
studying observables in quantum theory.
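These properties can be spot-checked numerically with np.linalg.eigh, which is designed for Hermitian matrices (the example matrix below is an arbitrary Hermitian one, not from the notes):

    import numpy as np

    A = np.array([[2, 1 - 1j], [1 + 1j, 3]])
    print(np.allclose(A, A.conj().T))   # True: A is Hermitian

    vals, vecs = np.linalg.eigh(A)      # eigenvalues are returned as real numbers
    print(vals)                         # two real eigenvalues
    print(np.allclose(vecs.conj().T @ vecs, np.eye(2)))  # True: orthonormal eigenbasis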

Problem 3.43. Let 𝐴 and 𝐵 be Hermitian matrices. Under what conditions is the
product 𝐴𝐵 Hermitian?

Exercise 3.44. Consider the matrix

𝐴 ≡ ( 0 2i ).        (3.126)
    ( 𝑐  0 )

A. Find the value of 𝑐 for which 𝐴 is a Hermitian matrix.


B. Find the eigenvalues of 𝐴 with the value of 𝑐 that you found in (A).
C. Find an orthonormal eigenbasis of 𝐴 with the value of 𝑐 that you found in (A).
Show that it is indeed orthonormal.

Problem 3.45. Find the most general 2 × 2 Hermitian matrix by demanding that
𝐴 = 𝐴† and finding conditions on the components of 𝐴.

3.2.13 Unitary Matrices

A matrix 𝑈 is called unitary if its adjoint is also its inverse:

𝑈⁻¹ = 𝑈† ⟹ 𝑈𝑈† = 𝑈†𝑈 = 1.        (3.127)

A unitary matrix is analogous to a complex number with norm 1, since such a number satisfies 𝑧∗𝑧 = 𝑧𝑧∗ = |𝑧|² = 1.
Acting with a unitary matrix on two vectors preserves their inner product. To see
this, consider a unitary matrix 𝑈 and two vectors |Ψ⟩ and |Φ⟩. If we act with 𝑈 on
both vectors, we get 𝑈 |Ψ⟩ and 𝑈 |Φ⟩. Taking the bra of the ket 𝑈 |Ψ⟩, we obtain

(𝑈 |Ψ⟩)† = ⟨Ψ| 𝑈†.        (3.128)

If we take the inner product of 𝑈 |Ψ⟩ and 𝑈 |Φ⟩, we get

⟨Ψ|𝑈 † 𝑈 |Φ⟩ = ⟨Ψ|Φ⟩, (3.129)

since 𝑈 † 𝑈 = 1. Therefore, the inner product of these two vectors is the same
before and after acting on them with 𝑈 .
Now, let 𝜆 be an eigenvalue of the unitary matrix 𝑈 with the eigenvector |Ψ⟩:

𝑈 |Ψ⟩ = 𝜆 |Ψ⟩ . (3.130)

Taking the adjoint of both sides, we get

⟨Ψ| 𝑈 † = ⟨Ψ| 𝜆∗ . (3.131)

Multiplying both equations together, we have

⟨Ψ| 𝑈 † 𝑈 |Ψ⟩ = ⟨Ψ| 𝜆∗ 𝜆 |Ψ⟩ . (3.132)


On the left-hand side 𝑈†𝑈 = 1, and on the right-hand side 𝜆∗𝜆 = |𝜆|², so

⟨Ψ|Ψ⟩ = |𝜆|² ⟨Ψ|Ψ⟩ ⟹ ‖Ψ‖² = |𝜆|² ‖Ψ‖².        (3.133)

Since |Ψ⟩ is an eigenvector, it is not the zero vector, and ‖Ψ‖² ≠ 0. Hence, we
conclude that the eigenvalues of a unitary matrix must all have magnitude 1.
This means that they lie on the unit circle of the complex plane, and are of the form 𝑧 = e^{i𝜙} for some 𝜙 ∈ ℝ.
As with Hermitian matrices, eigenvectors of a unitary matrix corresponding to
different eigenvalues are orthogonal (you will prove this in problem 3.48), and
you can always find an orthonormal eigenbasis of eigenvectors of a unitary matrix.
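Here is a small numerical sketch of these facts for an example unitary matrix (a rotation matrix, which happens to be real):

    import numpy as np

    theta = 0.7
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]], dtype=complex)

    print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U^dagger U = 1
    vals = np.linalg.eigvals(U)
    print(np.allclose(np.abs(vals), 1))            # True: eigenvalues have magnitude 1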
Problem 3.46. Find the most general 2 × 2 unitary matrix by demanding that 𝑈⁻¹ = 𝑈† or 𝑈𝑈† = 𝑈†𝑈 = 1 and finding conditions on the components of 𝑈.

Problem 3.47. Find three 2 × 2 matrices that are both Hermitian and unitary
(other than the identity matrix).

Problem 3.48. Prove that eigenvectors of a unitary matrix corresponding to dif­


ferent eigenvalues are orthogonal.

Problem 3.49. Prove that the columns of a unitary matrix, treated as kets, form
an orthonormal basis on ℂ𝑛 . Then prove that the same is true for the rows of a
unitary matrix, treated as bras.

3.2.14 Normal Matrices

A normal matrix is a matrix 𝐴 which satisfies 𝐴†𝐴 = 𝐴𝐴†. Observe that a normal matrix is analogous to a complex number 𝑧, since such a number trivially satisfies 𝑧∗𝑧 = 𝑧𝑧∗. It is easy to see that Hermitian matrices, which satisfy 𝐴† = 𝐴, and unitary matrices, which satisfy 𝐴† = 𝐴⁻¹, are both special cases of normal matrices.
If 𝐴 is normal and all of its eigenvalues are real, then it is Hermitian. If 𝐴 is normal and all of its eigenvalues have unit magnitude, then it is unitary. Furthermore, it
turns out that the condition that the matrix has an orthonormal eigenbasis applies
not just to Hermitian and unitary matrices, but in general to any normal matrix;
in fact, it is true if and only if the matrix is normal.

Problem 3.50. Let 𝐴 and 𝐵 be normal matrices. Under which condition are 𝐴𝐵
and 𝐴 + 𝐵 also normal?

3.2.15 Representing Matrices in Different Bases

In section 3.2.7 we saw that vectors are abstract entities which can have different
representations in different bases. The same is true for matrices. Consider a
matrix 𝐴 and a basis |𝐵𝑖 ⟩. Inserting the completeness relation (3.81) twice, one
time on each side of 𝐴, we get:
𝑛 𝑛
𝐴 = (∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |) 𝐴 (∑ |𝐵𝑗 ⟩⟨𝐵𝑗 |)
𝑖=1 𝑗=1
𝑛 𝑛
= ∑ ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |𝐴|𝐵𝑗 ⟩⟨𝐵𝑗 |
𝑖=1 𝑗=1
𝑛 𝑛
= ∑ ∑ (𝐴𝑖𝑗)𝐵 |𝐵𝑖⟩⟨𝐵𝑗 |,
𝑖=1 𝑗=1

where¹⁴

(𝐴𝑖𝑗)𝐵 ≡ ⟨𝐵𝑖|𝐴|𝐵𝑗⟩ ∈ ℂ,    𝑖, 𝑗 ∈ {1, … , 𝑛}        (3.134)

are the coordinates of 𝐴 in the basis |𝐵𝑖⟩.

¹⁴ Note that we could move (𝐴𝑖𝑗)𝐵 to the left since it is a scalar.


We have obtained a sum over outer products of the form |𝐵𝑖 ⟩⟨𝐵𝑗 |. Recall that
the outer product of two vectors is a matrix; thus |𝐵𝑖 ⟩⟨𝐵𝑗 | can be thought of
as “basis matrices”, in analogy with basis vectors. The representation of 𝐴 in
terms of a linear combination of these “basis matrices” is called the outer product
representation of 𝐴, and it is very useful in quantum theory.
We can also write this representation in matrix form as

(𝐴)𝐵 = ( ⟨𝐵1|𝐴|𝐵1⟩ ⋯ ⟨𝐵1|𝐴|𝐵𝑛⟩ )
       (     ⋮     ⋱     ⋮     ).        (3.135)
       ( ⟨𝐵𝑛|𝐴|𝐵1⟩ ⋯ ⟨𝐵𝑛|𝐴|𝐵𝑛⟩ )

In another basis |𝐶𝑖 ⟩, the matrix 𝐴 will have the representation

(𝐴)𝐶 = ∑ⁿᵢ₌₁ ∑ⁿⱼ₌₁ (𝐴𝑖𝑗)𝐶 |𝐶𝑖⟩⟨𝐶𝑗| = ( ⟨𝐶1|𝐴|𝐶1⟩ ⋯ ⟨𝐶1|𝐴|𝐶𝑛⟩ )
                                     (     ⋮     ⋱     ⋮     ),        (3.136)
                                     ( ⟨𝐶𝑛|𝐴|𝐶1⟩ ⋯ ⟨𝐶𝑛|𝐴|𝐶𝑛⟩ )

where now the coordinates (𝐴𝑖𝑗)𝐶 are given by

(𝐴𝑖𝑗)𝐶 ≡ ⟨𝐶𝑖|𝐴|𝐶𝑗⟩ ∈ ℂ,    𝑖, 𝑗 ∈ {1, … , 𝑛} .        (3.137)

Inserting the completeness relation (3.81) into the coordinates twice, similarly to
what we did above, we get
𝑛 𝑛
⟨𝐶𝑖 |𝐴|𝐶𝑗 ⟩ = ⟨𝐶𝑖 | (∑ |𝐵𝑘 ⟩⟨𝐵𝑘 |) 𝐴 (∑ |𝐵ℓ ⟩⟨𝐵ℓ |) |𝐶𝑗 ⟩
𝑘=1 ℓ=1
𝑛 𝑛
= ∑ ∑⟨𝐶𝑖 |𝐵𝑘 ⟩⟨𝐵𝑘 |𝐴|𝐵ℓ ⟩⟨𝐵ℓ |𝐶𝑗 ⟩.
𝑘=1 ℓ=1

Problem 3.51. Show that this relation can be written in matrix form as follows:

( ⟨𝐶1|𝐴|𝐶1⟩ ⋯ ⟨𝐶1|𝐴|𝐶𝑛⟩ )
(     ⋮     ⋱     ⋮     ) =
( ⟨𝐶𝑛|𝐴|𝐶1⟩ ⋯ ⟨𝐶𝑛|𝐴|𝐶𝑛⟩ )

  ( ⟨𝐶1|𝐵1⟩ ⋯ ⟨𝐶1|𝐵𝑛⟩ ) ( ⟨𝐵1|𝐴|𝐵1⟩ ⋯ ⟨𝐵1|𝐴|𝐵𝑛⟩ ) ( ⟨𝐵1|𝐶1⟩ ⋯ ⟨𝐵1|𝐶𝑛⟩ )
= (    ⋮    ⋱    ⋮    ) (     ⋮     ⋱     ⋮     ) (    ⋮    ⋱    ⋮    ),
  ( ⟨𝐶𝑛|𝐵1⟩ ⋯ ⟨𝐶𝑛|𝐵𝑛⟩ ) ( ⟨𝐵𝑛|𝐴|𝐵1⟩ ⋯ ⟨𝐵𝑛|𝐴|𝐵𝑛⟩ ) ( ⟨𝐵𝑛|𝐶1⟩ ⋯ ⟨𝐵𝑛|𝐶𝑛⟩ )
and thus the relation between the representations of 𝐴 in different bases is given
by
(𝐴)𝐶 = 𝑃𝐶←𝐵 (𝐴)𝐵 𝑃𝐵←𝐶 , (3.138)

where
𝑃𝐶←𝐵 ≡ ( ⟨𝐶1|𝐵1⟩ ⋯ ⟨𝐶1|𝐵𝑛⟩ )
       (    ⋮    ⋱    ⋮    )        (3.139)
       ( ⟨𝐶𝑛|𝐵1⟩ ⋯ ⟨𝐶𝑛|𝐵𝑛⟩ )
is the change-of-basis matrix (3.98), and 𝑃𝐵←𝐶 = (𝑃𝐶←𝐵)⁻¹. This is analogous to the relation between vectors in different bases, |Ψ⟩∣𝐶 = 𝑃𝐶←𝐵 |Ψ⟩∣𝐵 .

Problem 3.52. Let 𝑈 be a unitary matrix and let |𝐵𝑖 ⟩ be an orthonormal basis.
A. Prove that |𝐶𝑖 ⟩ ≡ 𝑈 |𝐵𝑖 ⟩ is also an orthonormal basis.
B. Prove that 𝑈 has the outer product representation
𝑛
𝑈 = ∑ |𝐶𝑖 ⟩ ⟨𝐵𝑖 | . (3.140)
𝑖=1

C. Conversely, prove that if |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩ are two arbitrary orthonormal bases,
then the matrix 𝑈 defined by equation (3.140) is unitary.

3.2.16 Diagonalizable Matrices

A diagonal matrix is a matrix with all of its elements equal to zero except for the
elements on the diagonal, for example:

𝐷 = ( 𝐷1  0 ).        (3.141)
    ( 0  𝐷2 )

A matrix 𝐴 is called diagonalizable if there exists an invertible matrix 𝑃 such that the matrix 𝑃⁻¹𝐴𝑃 is diagonal. In quantum theory, we are mostly concerned with the case where 𝑃 is also a unitary matrix, such that 𝑃†𝐴𝑃 is diagonal. It turns out that a matrix 𝐴 is diagonalizable by a unitary matrix 𝑃 if and only if 𝐴 is normal. This means, in particular, that both Hermitian and unitary matrices are diagonalizable in such a way.
Let 𝐴 be a normal matrix with an orthonormal eigenbasis |𝐵𝑖 ⟩ with corresponding
eigenvalues 𝜆𝑖 :
𝐴 |𝐵𝑖 ⟩ = 𝜆𝑖 |𝐵𝑖 ⟩ , ∀𝑖. (3.142)

Now, consider the change-of-basis matrix (3.98), this time from the eigenbasis |𝐵𝑖⟩ (which is orthonormal) to the standard basis |1𝑖⟩:

𝑃1←𝐵 ≡ ( ⟨11|𝐵1⟩ ⋯ ⟨11|𝐵𝑛⟩ )
       (    ⋮    ⋱    ⋮    ).        (3.143)
       ( ⟨1𝑛|𝐵1⟩ ⋯ ⟨1𝑛|𝐵𝑛⟩ )

Note that each eigenvector |𝐵𝑖 ⟩ is represented in the standard basis |1𝑖 ⟩ as follows:

|𝐵𝑖⟩∣₁ = ( ⟨11|𝐵𝑖⟩ )
         (    ⋮   ),    ∀𝑖.        (3.144)
         ( ⟨1𝑛|𝐵𝑖⟩ )

Hence, the columns of 𝑃1←𝐵 are in fact the eigenvectors |𝐵𝑖 ⟩ themselves, as ex­
pressed in the standard basis:

𝑃1←𝐵 = ( |𝐵1 ⟩ ⋯ |𝐵𝑛 ⟩ ) . (3.145)

Let us denote 𝑃 ≡ 𝑃1←𝐵 for short. Then

𝐴𝑃 = 𝐴 ( |𝐵1 ⟩ ⋯ |𝐵𝑛 ⟩ )

= ( 𝐴 |𝐵1 ⟩ ⋯ 𝐴 |𝐵𝑛 ⟩ )

= ( 𝜆1 |𝐵1 ⟩ ⋯ 𝜆𝑛 |𝐵𝑛 ⟩ ) .

How did we get this? Remember that in section 3.2.9 we said that the element
of the matrix 𝐴𝑃 at row 𝑖, column 𝑗 is calculated by taking the inner product of
row 𝑖 of 𝐴 with column 𝑗 of 𝑃 . But column 𝑗 of 𝑃 is just ∣𝐵𝑗 ⟩. The product 𝐴 |𝐵𝑖 ⟩
is another ket, whose rows are obtained by taking the inner product of each row
of 𝐴 with |𝐵𝑖 ⟩ respectively. The last equality follows from equation (3.142).
Next, we write the full matrix and decompose it into two matrices:

𝐴𝑃 = ( 𝜆1 ⟨11|𝐵1⟩ ⋯ 𝜆𝑛 ⟨11|𝐵𝑛⟩ )   ( ⟨11|𝐵1⟩ ⋯ ⟨11|𝐵𝑛⟩ ) ( 𝜆1  0   0 )
     (      ⋮     ⋱      ⋮     ) = (    ⋮    ⋱    ⋮    ) ( 0   ⋱   0 ).        (3.146)
     ( 𝜆1 ⟨1𝑛|𝐵1⟩ ⋯ 𝜆𝑛 ⟨1𝑛|𝐵𝑛⟩ )   ( ⟨1𝑛|𝐵1⟩ ⋯ ⟨1𝑛|𝐵𝑛⟩ ) ( 0   0  𝜆𝑛 )
You should calculate the product on the right­hand side (even just for 𝑛 = 2 or
𝑛 = 3) to convince yourself that this decomposition is indeed correct. Now, if we

46
define a new diagonal matrix, with the eigenvalues on the diagonal:

𝐷 ≡ ( 𝜆1  0   0 )
    ( 0   ⋱   0 ),        (3.147)
    ( 0   0  𝜆𝑛 )

then we can write equation (3.146) as

𝐴𝑃 = 𝑃 𝐷. (3.148)

Finally, we multiply by 𝑃⁻¹ from the left to get

𝑃⁻¹𝐴𝑃 = 𝐷.        (3.149)
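A numerical sketch of this procedure for an arbitrary Hermitian (hence normal) example matrix, where 𝑃 is unitary so that 𝑃⁻¹ = 𝑃†:

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 2.0]])   # a Hermitian example matrix
    vals, P = np.linalg.eigh(A)              # columns of P are the eigenvectors
    D = P.conj().T @ A @ P                   # P^dagger A P
    print(np.allclose(D, np.diag(vals)))     # True: the result is diagonal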

Exercise 3.53. Diagonalize the following matrix:

𝐴 = ( 1 3 ).        (3.150)
    ( 3 1 )

Problem 3.54. Prove that the change-of-basis matrix 𝑃 ≡ 𝑃1←𝐵 as defined above, with |𝐵𝑖⟩ an orthonormal eigenbasis, is unitary. This means that we can also write 𝑃†𝐴𝑃 = 𝐷, since 𝑃⁻¹ = 𝑃† for unitary matrices.

Problem 3.55. Show that if 𝐴 is a normal matrix then it has the outer product
representation
𝑛
𝐴 = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ ⟨𝐵𝑖 | , (3.151)
𝑖=1

where |𝐵𝑖 ⟩ is an orthonormal eigenbasis and 𝜆𝑖 are the eigenvalues of the eigen­
vectors |𝐵𝑖 ⟩.

3.2.17 The Cauchy­Schwarz Inequality

The Cauchy­Schwarz inequality states that for any two vectors |Ψ⟩ and |Φ⟩, we
have
|⟨Ψ|Φ⟩| ≤ ‖Ψ‖ ‖Φ‖ . (3.152)

To prove it, consider an orthonormal basis |𝐵𝑖⟩ such that¹⁵

|𝐵1⟩ ≡ |Φ⟩ / ‖Φ‖.        (3.153)
¹⁵ Such a basis can always be generated using a method called the Gram-Schmidt process, which we will not describe here.
Then, using the completeness relation (3.81), we find:
‖Ψ‖² ‖Φ‖² = ⟨Ψ|Ψ⟩ ‖Φ‖²
          = ⟨Ψ| (∑ⁿᵢ₌₁ |𝐵𝑖⟩⟨𝐵𝑖|) |Ψ⟩ ‖Φ‖²
          = ⟨Ψ| (|𝐵1⟩⟨𝐵1| + ∑ⁿᵢ₌₂ |𝐵𝑖⟩⟨𝐵𝑖|) |Ψ⟩ ‖Φ‖²
          = ⟨Ψ| ((1/‖Φ‖²) |Φ⟩⟨Φ| + ∑ⁿᵢ₌₂ |𝐵𝑖⟩⟨𝐵𝑖|) |Ψ⟩ ‖Φ‖²
          = ((1/‖Φ‖²) ⟨Ψ|Φ⟩⟨Φ|Ψ⟩ + ∑ⁿᵢ₌₂ ⟨Ψ|𝐵𝑖⟩⟨𝐵𝑖|Ψ⟩) ‖Φ‖²
          = ((1/‖Φ‖²) |⟨Ψ|Φ⟩|² + ∑ⁿᵢ₌₂ |⟨Ψ|𝐵𝑖⟩|²) ‖Φ‖²
          = |⟨Ψ|Φ⟩|² + (∑ⁿᵢ₌₂ |⟨Ψ|𝐵𝑖⟩|²) ‖Φ‖²
          ≥ |⟨Ψ|Φ⟩|².

Taking the square root, we obtain equation (3.152).


Problem 3.56. Explain each step in the proof above.
Exercise 3.57. Check this inequality explicitly for three pairs of vectors of your
choice.
Problem 3.58. Find a condition that is equivalent to an equality in the Cauchy­
Schwarz inequality. That is, find an “if and only if” statement for |⟨Ψ|Φ⟩| = ‖Ψ‖ ‖Φ‖
involving properties of |Ψ⟩ and |Φ⟩.
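If you want to spot-check the inequality numerically before proving anything, here is a short Python sketch that tests it on a few random complex vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(3):
        psi = rng.normal(size=4) + 1j * rng.normal(size=4)
        phi = rng.normal(size=4) + 1j * rng.normal(size=4)
        lhs = abs(np.vdot(psi, phi))
        rhs = np.linalg.norm(psi) * np.linalg.norm(phi)
        print(lhs <= rhs)  # True every time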

3.3 Probability Theory

3.3.1 Random Variables and Probability Distributions

A random variable 𝑋 is a function which assigns a real value to each possible outcome of an experiment or process. Sometimes these values will be the actual
measured value in some way: for example, the value of the random variable 𝑋 for
rolling a 6­sided die will simply be the number on the die. Other times, the value
of the random variable will be just a numerical label assigned to each outcome:
for example, for a coin toss we can assign 1 to heads and 0 to tails (but we can
also assign any other numbers, if we want).
These examples were of discrete random variables, but we can also have contin­
uous random variables, such as the position of a particle along a line, which in

48
principle can take any real value. For simplicity, we will focus on discrete random
variables here.
A (discrete) probability distribution assigns a probability to each value of a random
variable. We denote by 𝑃 (𝑋 = 𝑥) the probability that the random variable 𝑋 will
have the value 𝑥. A probability is a number between 0 and 1, which denotes how
likely it is (in percentage) for the value to occur, so 0 means this value never
occurs and 1 (= 100%) means this value always occurs.
The probabilities for all the possible values must sum to 1, because if for example
they only sum to 0.9, this means that in 10% of the cases the random variable has
no value, which doesn’t really make sense. Also, if 𝑃 (𝑋 = 𝑥) = 0 then there must
be at least one other possible value that 𝑋 can take, since it will never evaluate
to 𝑥, and if 𝑃 (𝑋 = 𝑥) = 1 then there cannot be any other possible values that 𝑋
can take, since it always evaluates to 𝑥.
For example, for the coin toss we have

𝑃 (𝑋 = 0) = 1/2,    𝑃 (𝑋 = 1) = 1/2,        (3.154)
and for the 6­sided die roll we have
𝑃 (𝑋 = 1) = 1/6,    𝑃 (𝑋 = 2) = 1/6,    𝑃 (𝑋 = 3) = 1/6,        (3.155)
𝑃 (𝑋 = 4) = 1/6,    𝑃 (𝑋 = 5) = 1/6,    𝑃 (𝑋 = 6) = 1/6.        (3.156)
Note how the probabilities sum to 1 in each case. Of course, we could also say that
maybe the coin toss results in heads only 49.9% of the time, and tails another
49.9% of the time, and the remaining 0.2% is the probability for the coin to
balance perfectly on its edge... But usually we ignore subtleties like this and
assume we have idealized coins. Similarly, we could also have a loaded coin
which lands on heads more or less frequently than it lands on tails, but usually
we assume that the coins are fair unless stated otherwise. The same discussion
applies for dice, with any number of sides: they are, by default, assumed to be
idealized and fair.
These probability distributions are uniform, since they assign the same probability
to each value of 𝑋. However, probability distributions need not be uniform. A
simple example is a loaded coin, which perhaps has

𝑃 (𝑋 = 0) = 1/3,    𝑃 (𝑋 = 1) = 2/3.        (3.157)
As a more interesting example, if we toss two fair coins 𝑋1 and 𝑋2 and define a
random variable to be the sum of the results, 𝑋 ≡ 𝑋1 + 𝑋2, then we can get any of the following 4 outcomes:

0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 2. (3.158)

The probability for each outcome is

(1/2) ⋅ (1/2) = 1/4,        (3.159)
but the outcome 1 appears twice; thus

𝑃 (𝑋 = 0) = 1/4,    𝑃 (𝑋 = 1) = 1/2,    𝑃 (𝑋 = 2) = 1/4.        (3.160)
Of course, the probabilities still sum to 1.
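This kind of enumeration is easy to do in Python. A minimal sketch that reproduces the two-coin-sum distribution above by listing all four equally likely outcomes:

    from collections import Counter
    from fractions import Fraction

    counts = Counter(a + b for a in (0, 1) for b in (0, 1))
    dist = {s: Fraction(c, 4) for s, c in sorted(counts.items())}
    print(dist)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}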

Exercise 3.59. Calculate the probability distribution for the sum of two rolls of a
6­sided die. This is known to players of role­playing games (such as Dungeons &
Dragons) as a “2d6”, where we define 𝑛d𝑁 to be the sum of 𝑛 rolls of an 𝑁 ­sided
die.

3.3.2 Conditional Probability

Consider two random variables, 𝑋 and 𝑌. Let 𝑋 have 𝑁 possible values 𝑥𝑖, 𝑖 ∈ {1, … , 𝑁} and let 𝑌 have 𝑀 possible values 𝑦𝑖, 𝑖 ∈ {1, … , 𝑀}. Then the joint
probability to get 𝑋 = 𝑥𝑖 and 𝑌 = 𝑦𝑗 at the same time, for some specific choice
of 𝑖 and 𝑗, is denoted
𝑃 (𝑋 = 𝑥𝑖 ∩ 𝑌 = 𝑦𝑗 ) , (3.161)

where ∩ means¹⁶ “and”. Furthermore, we have


𝑀
∑ 𝑃 (𝑋 = 𝑥𝑖 ∩ 𝑌 = 𝑦𝑗 ) = 𝑃 (𝑋 = 𝑥𝑖 ) , (3.162)
𝑗=1

because the total probability to get 𝑋 = 𝑥𝑖 is the sum of all the different prob­
abilities that involve 𝑋 = 𝑥𝑖 plus something else. To illustrate this, consider the
following random variables:

𝑋 = whether you pass or fail this course, (3.163)

𝑌 = whether you did or did not do all the homework. (3.164)


¹⁶ More precisely, ∩ means the intersection of two sets, where one is the set of events for which 𝑋 = 𝑥𝑖 and the other is the set of events for which 𝑌 = 𝑦𝑗.
There are in total 4 different combinations, and their probabilities must sum to 1.
Maybe the probabilities are as follows:

𝑃 (pass ∩ did homework) = 40%, (3.165)

𝑃 (pass ∩ didn’t do homework) = 20%, (3.166)

𝑃 (didn’t pass ∩ did homework) = 10%, (3.167)

𝑃 (didn’t pass ∩ didn’t do homework) = 30%. (3.168)

Then clearly the total probability that you pass (whether or not you did the
homework) is 40% + 20% = 60%, and the total probability that you do not pass
is 10% + 30% = 40%. This is exactly what equation (3.162) means.
However, what you really want to know is the probability that you pass given that
you did the homework vs. the probability that you pass given that you did not do
the homework. This is called conditional probability. The probability for outcome
𝑋 given outcome 𝑌 is denoted 𝑃 (𝑋|𝑌 ), where | is read as “given that”. It is
related to 𝑃 (𝑋 ∩ 𝑌 ) as follows:

𝑃 (𝑋|𝑌 ) = 𝑃 (𝑋 ∩ 𝑌 ) / 𝑃 (𝑌 ).        (3.169)

In other words, it is the probability that both 𝑋 and 𝑌 happened, divided by the
probability for 𝑌 to happen. Let us calculate:

𝑃 (pass | did homework) = 40% / (40% + 10%) = 80%,        (3.170)

𝑃 (pass | didn’t do homework) = 20% / (20% + 30%) = 40%.        (3.171)
So you better do all the homework, because that doubles your chances of passing
the course!
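A small sketch of the same computation in Python, using the joint probabilities above (the dictionary labels are hypothetical shorthands):

    # P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)
    joint = {("pass", "hw"): 0.40, ("pass", "no_hw"): 0.20,
             ("fail", "hw"): 0.10, ("fail", "no_hw"): 0.30}

    def conditional(x, y):
        p_y = sum(p for (_, yy), p in joint.items() if yy == y)
        return joint[(x, y)] / p_y

    print(conditional("pass", "hw"))     # 0.8
    print(conditional("pass", "no_hw"))  # 0.4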

Exercise 3.60. There are six more conditional probabilities that we did not cal­
culate here. Calculate them. What do you learn from the results?

Exercise 3.61. A test for COVID-19 has¹⁷ a 1% chance of false positive, i.e.
the result is positive but the patient isn’t actually sick, and a 1% chance of false
negative, i.e. the result is negative but the patient is actually sick. Assume that
0.1% of the population is actually sick.
¹⁷ FYI: This exercise is not based on any real data!
A. Fill in the blanks in the following table:

                 Sick      Healthy    Total
Actual status    ____      99.9%      100%
Positive test    0.099%    ____       1.098%        (3.172)
Negative test    0.001%    98.901%    ____

B. Given that you tested positive, what is the conditional probability that you
actually have COVID­19?
C. Given that you tested negative, what is the conditional probability that you
actually don’t have COVID­19?
D. Which result should you trust, a positive one or a negative one?

3.3.3 Expected Values

The expected value (or expectation value or mean) ⟨𝑋⟩ of a random variable 𝑋 is
the average over all the possible values 𝑋 can take, weighted by their assigned
probabilities:
𝑁
⟨𝑋⟩ ≡ ∑ 𝑃 (𝑋 = 𝑥𝑖 ) 𝑥𝑖 , (3.173)
𝑖=1

where 𝑁 is the total number of possible outcomes, 𝑥𝑖 is the value of outcome number 𝑖, and 𝑃 (𝑋 = 𝑥𝑖) is the probability to get 𝑥𝑖. In the example of the coin toss, we have:

⟨𝑋⟩ = (1/2) ⋅ 0 + (1/2) ⋅ 1 = 1/2 = 0.5,        (3.174)
and for the 6-sided die roll, we have:

⟨𝑋⟩ = (1/6) ⋅ 1 + (1/6) ⋅ 2 + (1/6) ⋅ 3 + (1/6) ⋅ 4 + (1/6) ⋅ 5 + (1/6) ⋅ 6 = 7/2 = 3.5.        (3.175)
Observe that the expected value in both cases is not an actual value the random
variable can take! This is often the case with discrete random variables.
We will now prove that the expected value is linear:

⟨𝛼𝑋 + 𝛽𝑌 ⟩ = 𝛼 ⟨𝑋⟩ + 𝛽 ⟨𝑌 ⟩ , 𝛼, 𝛽 ∈ ℝ. (3.176)

This can be broken down into two rules:

⟨𝛼𝑋⟩ = 𝛼 ⟨𝑋⟩ , ⟨𝑋 + 𝑌 ⟩ = ⟨𝑋⟩ + ⟨𝑌 ⟩ . (3.177)

The first rule is easy to prove:
𝑁
⟨𝛼𝑋⟩ = ∑ 𝑃 (𝛼𝑋 = 𝛼𝑥𝑖 ) (𝛼𝑥𝑖 )
𝑖=1
𝑁
= 𝛼 ∑ 𝑃 (𝑋 = 𝑥𝑖 ) 𝑥𝑖 = 𝛼 ⟨𝑋⟩ .
𝑖=1

To prove the second part, let 𝑋 have 𝑁 possible values 𝑥𝑖 and let 𝑌 have 𝑀
possible values 𝑦𝑖 , as in the previous section. Then in calculating ⟨𝑋 + 𝑌 ⟩ we need
to sum over both 𝑁 and 𝑀 , to ensure we take all possible combinations of 𝑋 and
𝑌 into account. Using equation (3.162), we get:

𝑁 𝑀
⟨𝑋 + 𝑌 ⟩ = ∑ ∑ 𝑃 (𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗 ) (𝑥𝑖 + 𝑦𝑗 )
𝑖=1 𝑗=1
𝑁 𝑀 𝑀 𝑁
= ∑ (∑ 𝑃 (𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗 )) 𝑥𝑖 + ∑ (∑ 𝑃 (𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗 )) 𝑦𝑗
𝑖=1 𝑗=1 𝑗=1 𝑖=1
𝑁 𝑀
= ∑ 𝑃 (𝑋 = 𝑥𝑖 ) 𝑥𝑖 + ∑ 𝑃 (𝑌 = 𝑦𝑗 ) 𝑦𝑗
𝑖=1 𝑗=1

= ⟨𝑋⟩ + ⟨𝑌 ⟩ ,

as we wanted to prove.
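As a sketch, the definition of the expected value translates directly into Python; this reproduces ⟨𝑋⟩ = 3.5 for a single 6-sided die (using one die so as not to give away exercise 3.62):

    from fractions import Fraction

    dist = {x: Fraction(1, 6) for x in range(1, 7)}   # a fair 6-sided die
    mean = sum(p * x for x, p in dist.items())        # probability-weighted average
    print(mean)  # 7/2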

Exercise 3.62. Calculate the expected value for the sum of two coin tosses and
for a 2d6 roll (the sum of two 6-sided dice). First, do it by defining one random
variable for the sum, calculating the probabilities, and then using the definition
of the expected value. Then, do it by considering just one coin or one 6­sided die
respectively, and use equation (3.176). Compare your results.

3.3.4 Standard Deviation

The standard deviation¹⁸ measures how far the outcomes are expected to be from the expected value. To calculate the standard deviation, we take the expected value of (𝑋 − ⟨𝑋⟩)², that is, the square of the difference between the actual value of 𝑋 and its expected value ⟨𝑋⟩. Then, we take the square root of the result to obtain the standard deviation Δ𝑋:

Δ𝑋 ≡ √⟨(𝑋 − ⟨𝑋⟩)²⟩.        (3.178)
¹⁸ By the way, the square of the standard deviation is called the variance, but it will not interest us in this course.
To simplify this, first we note that
(𝑋 − ⟨𝑋⟩)² = 𝑋² − 2𝑋 ⟨𝑋⟩ + ⟨𝑋⟩².        (3.179)

In this formula, 𝑋² is a random variable (whose values are the squares of the
values of 𝑋), but ⟨𝑋⟩ is just a number, not a random variable. Since it is a
number, we can treat it as a random variable that only returns one value with
100% probability, which means that

⟨⟨𝑋⟩⟩ = ⟨𝑋⟩ . (3.180)

So, by equation (3.176), we have:


⟨(𝑋 − ⟨𝑋⟩)²⟩ = ⟨𝑋²⟩ − 2 ⟨𝑋⟩ ⟨𝑋⟩ + ⟨𝑋⟩²
            = ⟨𝑋²⟩ − ⟨𝑋⟩².

Therefore, the standard deviation can be written as follows:

Δ𝑋 = √(⟨𝑋²⟩ − ⟨𝑋⟩²).        (3.181)

This form is easier to do calculations with. For example, for the coin toss we have
from before
⟨𝑋⟩ = 1/2,        (3.182)
and we also calculate:
⟨𝑋²⟩ = (1/2) ⋅ 0² + (1/2) ⋅ 1² = 1/2,        (3.183)
which gives us
Δ𝑋 = √(1/2 − 1/4) = 1/2.        (3.184)
This makes sense, as the two actual values of the outcomes, 0 and 1, lie exactly
1/2 away from the expected value ⟨𝑋⟩ = 1/2 in each direction. So they each
“deviate” from it by 1/2.
For the die roll, we have from before

⟨𝑋⟩ = 7/2,        (3.185)
and we also calculate:
⟨𝑋²⟩ = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 91/6,        (3.186)
which gives us
Δ𝑋 = √(91/6 − 49/4) = √(35/12) ≈ 1.7.        (3.187)
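The same die example in Python, as a sketch of the formula Δ𝑋 = √(⟨𝑋²⟩ − ⟨𝑋⟩²):

    import math
    from fractions import Fraction

    dist = {x: Fraction(1, 6) for x in range(1, 7)}
    mean = sum(p * x for x, p in dist.items())        # <X> = 7/2
    mean_sq = sum(p * x**2 for x, p in dist.items())  # <X^2> = 91/6
    print(math.sqrt(mean_sq - mean**2))               # ~1.7078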

Exercise 3.63. Calculate the standard deviation for the sum of two coin tosses
and for a 2d6 roll.

3.3.5 Normal (Gaussian) Distributions

The normal (or Gaussian) distribution is depicted in figure 3.2. Unlike the distri­
butions we have considered so far, it is continuous; but we won’t worry about
that right now. The shape of the distribution is a “bell curve”, centered on some
mean (or expected) value 𝜇 (equal to 0 in the plot) and with a standard deviation
𝜎. The values of 𝜇 and 𝜎 can be any real numbers.

Figure 3.2: The normal (or Gaussian) distribution.

The “68­95­99.7 rule” tells us the fraction of outcomes which lie within 1, 2 and
3 standard deviations of the mean:

• Roughly 68% of the outcomes lie between −1𝜎 and +1𝜎,

• Roughly 95% of outcomes lie between −2𝜎 and +2𝜎,

• Roughly 99.7% of outcomes lie between −3𝜎 and +3𝜎.

The normal distribution is the most common probability distribution you will en­
counter in this course, and in physics and math in general. The reason for that
is that there is a theorem, the central limit theorem, which states that whenever
we take the sum of independent random variables, the probability distribution of
the sum will gradually start to look like a normal distribution. As we add more
and more variables, the sum will get closer and closer to a normal distribution.
This can already be seen in the case of the die rolls. For a 1d6 roll we have a
uniform distribution, as depicted in figure 3.3. For a 2d6 roll, we get a triangular distribution centered at the mean value of 7, as depicted in figure 3.4. When
solving exercise 3.59, you found that the probability for each possible combination
of die rolls is (1/6) ⋅ (1/6) = 1/36, but as for the sum of the rolls, the outcomes 2 and 12
appear only once (corresponding to 1+1 and 6+6 respectively), while the outcome
7 appears six times (corresponding to 1+6, 2+5, 3+4, 4+3, 5+2, and 6+1) and
thus has a probability of 6/36 = 1/6, and so on.
For a 3d6 roll, the sum of three rolls of a 6­sided die, as depicted in figure 3.5,
we see that the probability distribution is starting to obtain the signature “bell”
shape of the normal distribution. Its mean value is 10.5, as you can calculate
(3 × 3.5). We will get closer and closer to a normal distribution as we increase the
number of dice, that is, the 𝑛 in 𝑛d6. In the limit 𝑛 → ∞, we will precisely obtain
a normal distribution, but even for small values of 𝑛, the approximation is already
close enough for most practical purposes.

Figure 3.3: The distribution of results for one roll of a 6-sided die, also known as 1d6. It is a uniform distribution.

Exercise 3.64. Plot the probability distributions of the sum of 𝑛 coin tosses, from
𝑛 = 1 and up to a value of 𝑛 large enough for the distribution to start looking like
a normal distribution.
Problem 3.65. Write a computer program (I recommend using either Mathemat­
ica or Python) that will generate a plot of the probability distribution for an 𝑛d𝑠
roll with an arbitrary number of rolls 𝑛 and an arbitrary number of sides 𝑠 (where
𝑠 = 2 corresponds to a coin). It should also plot the continuous normal distribution
(with the correct mean and standard deviation) over the discrete distribution, to
check how closely they match. Generate some plots using your program, and use
them to demonstrate the central limit theorem for different values of 𝑛 and 𝑠.

Figure 3.4: The distribution of results for the sum of two rolls of a 6-sided die, also known as 2d6. It is triangular.

Figure 3.5: The distribution of results for the sum of three rolls of a 6-sided die, also known as 3d6. It is starting to obtain the “bell” shape of a normal distribution.

4 The Foundations of Quantum Theory

Now that we have obtained the required mathematical tools, we can finally present
quantum theory! This theory provides the correct fundamental framework for vir­
tually all of known physics. We will see that its fundamental ingredients are Hilbert
spaces with states and operators. These universal ingredients are then used to
create particular models describing specific physical systems.
In this chapter, we will work exclusively with discrete quantum systems, which
are based on finite­dimensional Hilbert spaces. These are much simpler than
continuous quantum systems, which are based on infinite­dimensional Hilbert
spaces. In particular, the math is much simpler – just linear algebra, without any

calculus. However, it turns out that finite­dimensional Hilbert spaces are sufficient
to define all of the fundamental concepts in quantum theory, and derive almost
all of the most important results.

4.1 Axiomatic Definition

4.1.1 Dimensionless and Dimensionful Constants

Consider the fine­structure constant, which represents the strength of the elec­
tromagnetic interaction:
𝛼 ≈ 0.0073. (4.1)

This constant is not specified in any particular units, such as meters or seconds; it
is a pure number. We call such a constant dimensionless.
In contrast, some constants in physics are dimensionful. This means that their
numerical value depends on the system of units we use. For example, the speed
of light 𝑐 has the following values in different systems of units:

𝑐 ≈ 3.0 × 10⁸ meters/second
  ≈ 1.1 × 10⁷ miles/minute
  ≈ 170 astronomical units/day
  ≈ 3.5 × 10⁻⁵ parsecs/hour
  ≈ 1 light year/year.

What this means is that the numerical value of the speed of light does not have
any physical meaning whatsoever¹⁹! It is merely a consequence of choosing to
work with one system of units and not the other. But units are human constructs;
the universe could not care less what units humans choose to measure things
with. Therefore, none of the numbers written above have any actual meaning.
The numerical values of dimensionless constants are the only numbers that
have a physical meaning, as they do not depend on the system of units. However,
keep in mind that they are still, unavoidably, just parameters that are defined by
humans in a certain way. There’s nothing special about the number 𝛼 ≈ 0.0073
itself; we could also define another parameter 𝛽 ≡ 2𝛼 and use that in our equations
instead. So don’t try to do numerology²⁰ with the specific value of 𝛼! What is
important here is not the numerical value itself, but the fact that it is independent
of the choice of units.
¹⁹ Indeed, in modern SI units the speed of light is defined to be 299,792,458 meters per second, and this definition is used to measure the length of a meter – not the other way around.
²⁰ Interestingly, in the past some physicists tried to claim that 𝛼 equals exactly 1/137, but more precise measurements revealed that this is not actually the case. Still, you will often see it written as 1/137 for that historical reason.
For this reason, it is most natural to work in Planck units, where:

𝑐 = 𝐺 = ℏ = 1/(4𝜋𝜀₀) = 𝑘𝐵 = 1.        (4.2)

Here 𝑐 is the speed of light, 𝐺 is the gravitational constant used in Newtonian grav­
ity, ℏ is the (reduced) Planck constant used in quantum mechanics, 1/4𝜋𝜀0 is the
Coulomb constant used in electromagnetism, and 𝑘𝐵 is the Boltzmann constant
used in statistical mechanics.
All of these are dimensionful constants, which means we don’t really care about
their numerical values – so we might as well just set them to 1. This allows us to
simply remove them from our equations. For example, instead of writing √(ℏ𝐺/𝑐³) – also known as the Planck length – we just write 1, and this allows us to write the equation 𝐴 = (ℏ𝐺𝛾/𝑐³)√(𝑗 (𝑗 + 1)) as 𝐴 = 𝛾√(𝑗 (𝑗 + 1)). Much simpler, right²¹?
Planck units are commonly used when doing research in theoretical physics, be­
cause they make equations simpler, more elegant, and less cluttered. However,
sometimes we get numerical results that we wish to convert to real­world units
such as kilograms and meters. To do this, all we need to do is to find the combi­
nation of the constants in equation (4.2) that has the desired units. For example,
if we know that our pure number represents length, then we can multiply it by the Planck length √(ℏ𝐺/𝑐³) ≈ 1.6 × 10⁻³⁵ meters to find its value in meters.
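As a sketch of such a conversion in Python (the constant values are standard CODATA-style numbers, and the pure number being converted is a hypothetical example):

    import math

    hbar = 1.054571817e-34   # J s
    G = 6.67430e-11          # m^3 kg^-1 s^-2
    c = 299792458.0          # m / s

    planck_length = math.sqrt(hbar * G / c**3)   # ~1.6e-35 m
    length_in_planck_units = 2.0                 # a hypothetical result
    print(length_in_planck_units * planck_length, "meters")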
Since this course is taught by a theorist, we will use Planck units exclusively. This
means that unlike in a traditional quantum mechanics course, ℏ will not appear
in any of our equations!

Exercise 4.1. Calculate your age, height, mass, and body temperature in Planck
units. For this, you will have to find combinations of the dimensionful constants
we set to 1 in equation (4.2) that give you the desired units, as we did for the
Planck length.

4.1.2 Hilbert Spaces, States, and Operators

Recall that in section 3.2.2 we defined a Hilbert space as a vector space with an
inner product that is also a complete metric space with respect to that inner prod­
uct. Quantum theory can be defined axiomatically using the theory of Hilbert
spaces. In this chapter we will list a total of seven fundamental axioms, plus an
eighth axiom that may or may not be fundamental.
The System Axiom: A system in quantum theory is the mathematical represen­
tation of a physical system (such as a particle) as a Hilbert space. The type and
²¹ This is the equation for the eigenvalues of the area operator in loop quantum gravity. We will learn about operators in the next section.
dimension of the Hilbert space will depend on the particular system. Note that
the dimension of the Hilbert space is unrelated to the dimension of spacetime.
In the finite­dimensional case, for example when the system involves spin, the
Hilbert space will usually be ℂ𝑛 for some 𝑛, such as ℂ2 , which was used in
most of the examples above and will continue to be used below. In the infinite­
dimensional case, for example when the system involves position and momentum
(which are, in general, continuous and not discrete) the Hilbert space will usually
be a space of functions, which is much more complicated.
The State Axiom: A state of a quantum system is a vector with unit norm in the
system’s Hilbert space, that is, a vector |Ψ⟩ which satisfies

‖Ψ‖ = √⟨Ψ|Ψ⟩ = 1. (4.3)

States represent the different configurations the system can have. It is important
to stress that only unit vectors can represent states. If for some reason we have
a vector with non­unit norm, we must normalize it (divide it by its norm) to obtain
a unit vector, which can then represent a state.
Another important aspect of states is that they are only defined up to a complex
phase. This means that, if the vector |Ψ⟩ represents a state, then all vectors of
the form e^{i𝜙} |Ψ⟩ for 𝜙 ∈ ℝ represent the same²² state as |Ψ⟩. Note that adding a
phase to a vector does not change the norm, since


∥e^{i𝜙} Ψ∥ = √((e^{i𝜙} |Ψ⟩)† e^{i𝜙} |Ψ⟩) = √(⟨Ψ| e^{−i𝜙} e^{i𝜙} |Ψ⟩) = √⟨Ψ|Ψ⟩ = ‖Ψ‖ .        (4.4)
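A minimal Python sketch (with an arbitrary example vector) of normalizing a vector to obtain a valid state, and checking that a global phase does not change the norm:

    import numpy as np

    v = np.array([3.0, 4.0j])                # an arbitrary non-unit vector
    psi = v / np.linalg.norm(v)              # a valid state (unit norm)
    psi_phase = np.exp(1j * 0.42) * psi      # same state, different phase

    print(np.isclose(np.linalg.norm(psi), 1))        # True
    print(np.isclose(np.linalg.norm(psi_phase), 1))  # True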

The Operator Axiom: An operator on a Hilbert space is a linear transformation which takes a vector and outputs another vector. By “linear” we mean that the operator 𝐴 satisfies

𝐴 (|Ψ⟩ + |Φ⟩) = 𝐴 |Ψ⟩ + 𝐴 |Φ⟩ , 𝐴 (𝜆 |Ψ⟩) = 𝜆𝐴 |Ψ⟩ , (4.5)

where |Ψ⟩ and |Φ⟩ are vectors and 𝜆 is a scalar. In the discrete case, operators
are just matrices on ℂ𝑛 . In the continuous case, where the vectors are actually
functions, the operators will be derivatives acting on the functions²³. In quantum theory, operators transform states into other states, and they represent an action performed on the system, such as a measurement, a transformation, or an evolution in time.

²² Actually, the more precise definition is that a state is a ray in a Hilbert space. Rays are defined as equivalence classes of vectors such that a vector |Ψ⟩ is equivalent to 𝜆 |Ψ⟩ for any scalar 𝜆 ∈ ℂ. The scalar can be separated into a polar representation, 𝜆 = 𝑟 e^{i𝜙}, as we discussed in section 3.1.4. The 𝑟 part stretches the magnitude of the vector by a factor of 𝑟, and the e^{i𝜙} part (the phase) rotates it by 𝜙 radians. Any vector in the same equivalence class represents the same state, so multiplying the vector by a scalar will not change the state it represents, whatever the magnitude and phase are. However, it is conventional to choose states to be represented specifically by a unit vector from the equivalence class, since otherwise we would have to normalize vectors to 1 all the time. This is also the reason we only use orthonormal bases in quantum theory.
²³ It is interesting to note that there is often still a sense in which operators and states in a continuous Hilbert space have the equivalent of indices and elements. You will learn about this in more advanced courses on quantum mechanics, especially quantum field theory.

4.1.3 Hermitian Operators and Observables

An operator corresponding to a Hermitian matrix²⁴ is a Hermitian operator. Above you have proved some interesting properties of these operators. In particular, their eigenvalues are real, and there is an orthonormal basis consisting of their eigenvectors (a.k.a. an eigenbasis).

²⁴ In an infinite-dimensional Hilbert space, where we don't necessarily have a matrix representation, a Hermitian operator is defined using the property that it is self-adjoint.
The Observable Axiom: In quantum theory, Hermitian operators correspond to
observables, that is, properties of the system that can be measured. The eigen­
values of these operators, which are real as for all Hermitian operators, exactly
correspond to all the different possible outcomes of the measurement. The map­
ping is one­to­one and onto (a.k.a. a bijection), meaning that each eigenvalue
corresponds to exactly one measurement outcome and vice versa. This makes
sense because we always measure real numbers; there are no measurement
devices that measure complex numbers!
Examples of observables are position, momentum, angular momentum, energy,
and spin (which is intrinsic angular momentum, as we learned in section 2.1.4).
All of these may be represented as Hermitian operators on an appropriate Hilbert
space.

4.1.4 Probability Amplitudes

Let the state of a quantum system be |Ψ⟩. Once we have chosen a Hermitian op­
erator to represent our observable, we may obtain an orthonormal basis of states
|𝐵𝑖 ⟩ corresponding to the eigenvectors of that operator. In quantum mechanics,
these eigenvectors are called eigenstates.
The Probability Axiom: The inner product ⟨𝐵𝑖 |Ψ⟩ is called the probability am­
plitude to measure the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖 ⟩, given
the state |Ψ⟩. When we take the magnitude­squared of a probability amplitude,
we get the corresponding probability. Thus
|⟨𝐵𝑖|Ψ⟩|²        (4.6)

is the probability to measure the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖⟩, given the state |Ψ⟩. This is also known as the Born rule.
The first four axioms that we presented here simply defined the meaning of sys­
tems, states, operators, and observables in mathematical terms. The Probability
Axiom, on the other hand, has to do with the relations between these mathemat­
ical structures. One can thus justifiably ask: why would this be a probability in
the first place?
Unfortunately, since this is an axiom, it cannot be derived from anything more
fundamental, such as other axioms. However, at the very least, we can verify
that it indeed behaves exactly like a probability is expected to. This follows from
the fact that
∑ⁿᵢ₌₁ |⟨𝐵𝑖|Ψ⟩|² = ∑ⁿᵢ₌₁ ⟨𝐵𝑖|Ψ⟩∗ ⟨𝐵𝑖|Ψ⟩
              = ∑ⁿᵢ₌₁ ⟨Ψ|𝐵𝑖⟩⟨𝐵𝑖|Ψ⟩
              = ⟨Ψ| (∑ⁿᵢ₌₁ |𝐵𝑖⟩⟨𝐵𝑖|) |Ψ⟩
              = ⟨Ψ|Ψ⟩
              = 1,

where we used the following:

• Taking the complex conjugate of an inner product switches the order of the
vectors,

• The completeness relation (3.81),

• All quantum states have a norm (and thus also norm­squared) of 1.


What does this mean? The number |⟨𝐵𝑖|Ψ⟩|² for each value of 𝑖 from 1 to 𝑛 corresponds to each of the 𝑛 possible outcomes of a measurement. We know that
it must be non­negative, because it is a magnitude of a complex number. Also,
when taking the sum of all such numbers, we get 1. In other words, the numbers |⟨𝐵𝑖|Ψ⟩|² behave like probabilities: they are real numbers between 0 and 1, which
always sum to 1. Why they actually represent probabilities is a question that has
no good answer except that this is just how quantum theory works, and it can be
verified experimentally.
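As a numerical sketch of the Born rule (using 𝜎𝑧 rather than the 𝜎𝑥 of exercise 4.2, so as not to give away the answer; the state below is an arbitrary normalized example):

    import numpy as np

    sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    vals, vecs = np.linalg.eigh(sigma_z)     # eigenvalues and eigenstates

    psi = np.array([1, 3]) / np.sqrt(10)     # a normalized state
    amps = vecs.conj().T @ psi               # probability amplitudes <B_i|Psi>
    probs = np.abs(amps) ** 2                # Born rule
    print(vals, probs, probs.sum())          # the probabilities sum to 1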
One might wonder why we bothered mentioning the probability amplitudes ⟨𝐵𝑖 |Ψ⟩,
which are complex numbers, instead of just directly calculating the probabilities
|⟨𝐵𝑖|Ψ⟩|². It turns out that the fact that probability amplitudes are complex num­
bers is an essential part of what makes quantum mechanics different from classi­
cal mechanics. In fact, you can even think of quantum mechanics as a generaliza­
tion of classical probability theory where probabilities are allowed to be complex
numbers.

We could write down a classical theory which assigns probabilities to each mea­
surement outcome; but since probabilities must be real non­negative numbers,
when they are added the result is always a higher probability. Therefore, classi­
cal probabilities interfere only constructively. In quantum theory, on the other
hand, one does not add probabilities, but probability amplitudes; and as we will
see, they can interfere with one another both constructively and destructively,
just as we discussed in section 2.1.3.
If the probability amplitudes for two events have opposite complex phases (for
example, one is positive and one is negative) they can even cancel each other out
completely – so that neither event happens, since their total probability amplitude
(and thus also probability) is zero! This, of course, can never happen with classical
probability.
Exercise 4.2. In the Hilbert space ℂ2 , consider the Hermitian operator

𝜎𝑥 ≡ ( 0 1 ).        (4.7)
     ( 1 0 )

Find its eigenstates (make sure they are normalized to 1!) and eigenvalues. Then,
calculate the probability to measure each of the eigenvalues given that the system
is in the state
|Ψ⟩ ≡ (1/√10) ( 1 ).        (4.8)
              ( 3 )

Verify that the probabilities sum to 1.

4.1.5 Superposition

Consider an observable represented by a Hermitian operator with an orthonormal basis of eigenstates |𝐵𝑖⟩. As with any basis, we may write the state vector |Ψ⟩ as a linear combination of the basis eigenstates |𝐵𝑖⟩:
𝑛
|Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩. (4.9)
𝑖=1

Remember that each coefficient ⟨𝐵𝑖 |Ψ⟩ is the probability amplitude to measure
the eigenvalue corresponding to the eigenstate |𝐵𝑖 ⟩ given that the system is in
the state |Ψ⟩. So this is a sum over the basis states |𝐵𝑖 ⟩, corresponding to the
possible measurement outcomes, with a probability amplitude attached to each
of these outcomes, which depends on the state |Ψ⟩. Such a linear combination of
states25 is called a superposition.
25 More generally, a superposition is any linear combination of states. The states don’t have to
be basis eigenstates and the coefficients don’t have to be probability amplitudes – but they usually
are.

The concept of superposition is responsible for many of the weird properties of
quantum mechanics, as we will soon see. Importantly, superposition is not an
axiom, but simply an (almost trivial) mathematical property of vectors in Hilbert
spaces. This means that superposition follows automatically from the previous
axioms; it is not something that needs to be introduced separately.
You will often hear people (including physicists, if they are being sloppy) say that
superposition means that “the system is in multiple states at the same time”. For
example, it is frequently said about particles – which can be in a superposition
of eigenstates corresponding to different outcomes for the measurement of their
position – that “the particle is in multiple places at the same time”. However, this
is a common misconception – or at the very least, an overly literal interpretation
of the math.
The fact that a state |Ψ⟩ can be written in a superposition of eigenstates |𝐵𝑖 ⟩
doesn’t mean that the system is actually “in” all of these different states at once.
The system is, in fact, in only one state: the state |Ψ⟩. This state can be repre­
sented in the eigenbasis |𝐵𝑖 ⟩, and doing this reveals the probability to measure
each of the eigenvalues. However, one can always find26 an orthonormal basis
where |Ψ⟩ itself is one of the basis states – and often, this can be an eigenbasis
corresponding to another observable of the system. In that basis, the system
will not be in a superposition – it will just be in the state |Ψ⟩, with a probability
amplitude of ⟨Ψ|Ψ⟩ = 1!
So instead of saying that “the system is in all of the states |𝐵1 ⟩ , … , |𝐵𝑛 ⟩ at once”, it
is more precise to say that the system is currently in the state |Ψ⟩, and a measure­
ment of the observable with the eigenbasis |𝐵𝑖 ⟩ could yield different outcomes,
with the probability amplitude for outcome number 𝑖 given by the projection27 of
|Ψ⟩ on |𝐵𝑖 ⟩, calculated by taking the inner product ⟨𝐵𝑖 |Ψ⟩. It sounds less cool and
mysterious, but it is more accurate and less prone to confusion and misinterpre­
tation.
Of course, this description is too technical for the average person, which is why
physicists usually choose to just say, incorrectly, that “the system is in multiple
states at the same time”. But now that you actually know the math of quantum
theory, you should be able to understand the correct definition of superposition!
I will let you digest all of this for now, and in section 4.2.4 we will discuss an
analogy, using a concrete quantum system, that should help you understand this
better.
26 Using the Gram­Schmidt process mentioned in footnote (15).
27 In ℝⁿ, the projection of v on w (or w on v) is given by the dot product v ⋅ w. Projections in ℂⁿ
generalize this concept, with the inner product replacing the dot product.

Exercise 4.3. Consider again the Hermitian operator from exercise 4.2,

$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{4.10}$$

Write down an example of a state |Ψ⟩ which corresponds to a probability of 1/3
to measure the eigenvalue +1 and a probability of 2/3 to measure the eigenvalue
−1. Make sure your state is normalized to 1!

Exercise 4.4. A quantum system described by the Hilbert space ℂ³ has an ob­
servable corresponding to a Hermitian operator 𝐴 with the matrix representation

$$A \equiv \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \tag{4.11}$$

A. Find its eigenvalues and their corresponding eigenstates. Make sure the states
are normalized to 1.
B. Find three different states such that a measurement of the observable 𝐴
will produce the lowest eigenvalue with probability 1/7, the highest eigenvalue
with probability 2/7, and the middle eigenvalue with probability 4/7. When we
say different states, we mean that the vectors that represent them cannot be
scalar multiples of each other; recall from footnote (22) that such vectors are in
the same equivalence class, and thus represent the same state. Make sure the
states are normalized to 1.
C. Write the state

$$|\Psi\rangle \equiv \frac{1}{\sqrt{15}} \begin{pmatrix} 1 \\ -2 \\ 3 - \mathrm{i} \end{pmatrix} \tag{4.12}$$

as a superposition of eigenstates of 𝐴, and calculate the probabilities to measure
each eigenvalue of 𝐴 given that the system is in the state |Ψ⟩. Verify that the
probabilities sum to 1.

4.1.6 Inner Products with Matrices, and the Expectation Value

Consider a Hermitian operator 𝐴 with an orthonormal basis of 𝑛 eigenstates |𝐵𝑖 ⟩
and 𝑛 eigenvalues 𝜆𝑖 . To remind you, this means that

$$A|B_i\rangle = \lambda_i |B_i\rangle, \qquad \langle B_i|B_j\rangle = \delta_{ij}, \tag{4.13}$$

where 𝛿𝑖𝑗 is the Kronecker delta, which we defined in equation (3.53):

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases} \tag{4.14}$$

Then we have, for any 𝑖, 𝑗 ∈ {1, … , 𝑛}:

$$\langle B_i|A|B_j\rangle = \langle B_i| \left( A|B_j\rangle \right) = \langle B_i|\lambda_j|B_j\rangle = \lambda_j \langle B_i|B_j\rangle = \lambda_j \delta_{ij}.$$

Let us also recall the completeness relation (3.81):

$$\sum_{i=1}^{n} |B_i\rangle\langle B_i| = 1. \tag{4.15}$$

Now, let |Ψ⟩ be the state of the system. Then:

$$\begin{aligned}
\langle\Psi|A|\Psi\rangle &= \langle\Psi| \left( \sum_{i=1}^{n} |B_i\rangle\langle B_i| \right) A \left( \sum_{j=1}^{n} |B_j\rangle\langle B_j| \right) |\Psi\rangle \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \langle\Psi|B_i\rangle \langle B_i|A|B_j\rangle \langle B_j|\Psi\rangle \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_j \delta_{ij} \langle\Psi|B_i\rangle \langle B_j|\Psi\rangle.
\end{aligned}$$

When taking the sum over 𝑗, the Kronecker delta 𝛿𝑖𝑗 is always 0 except when 𝑗 = 𝑖.
Therefore the sum over 𝑗 always reduces to just one element, the one where 𝑗 = 𝑖.
We get:
$$\begin{aligned}
\langle\Psi|A|\Psi\rangle &= \sum_{i=1}^{n} \lambda_i \langle\Psi|B_i\rangle\langle B_i|\Psi\rangle \\
&= \sum_{i=1}^{n} \lambda_i \langle\Psi|B_i\rangle\langle\Psi|B_i\rangle^* \\
&= \sum_{i=1}^{n} \lambda_i \left| \langle\Psi|B_i\rangle \right|^2,
\end{aligned}$$

where in the second line we used the fact that switching the order of the vectors
in the inner product is equivalent to taking the complex conjugate.
Recall that |⟨Ψ|𝐵𝑖 ⟩|² is the probability to measure the eigenvalue 𝜆𝑖 associated with
the eigenstate |𝐵𝑖 ⟩ given the state |Ψ⟩. Therefore, this is a sum of the possible
values of the measurement of 𝐴, weighted by their probabilities. But this is exactly
the expected value for the measurement of 𝐴, as we defined in equation (3.173).
For this reason, we sometimes simply write ⟨𝐴⟩ (the usual notation for the ex­
pected value) instead of ⟨Ψ|𝐴|Ψ⟩, as long as it is clear that the expected value is
taken with respect to the state |Ψ⟩. If we want to specify the state explicitly, we
can also use the notation
⟨𝐴⟩Ψ ≡ ⟨Ψ|𝐴|Ψ⟩. (4.16)

Note that the terms “expected value” and “expectation value” are often used
interchangeably, but the former seems to be more popular in classical probability
theory while the latter is more popular in quantum theory.
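The equivalence just derived is easy to check numerically. Here is a sketch, assuming NumPy; the operator and state are arbitrary examples of my own choosing, not specific ones from the notes.

```python
# A sketch: compute <Psi|A|Psi> in two equivalent ways -- directly as an
# inner-product "sandwich", and as a probability-weighted sum of eigenvalues.
import numpy as np

A = np.array([[0, 1],
              [1, 0]], dtype=complex)               # sigma_x, as an example
psi = np.array([1, 2], dtype=complex) / np.sqrt(5)  # an example state

# Direct sandwich <Psi|A|Psi>; vdot conjugates its first argument.
direct = np.vdot(psi, A @ psi).real

# Weighted sum: sum_i lambda_i |<B_i|Psi>|^2.
eigenvalues, eigenvectors = np.linalg.eigh(A)
weighted = sum(lam * abs(np.vdot(eigenvectors[:, i], psi)) ** 2
               for i, lam in enumerate(eigenvalues))

print(direct, weighted)  # both should print the same number
```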

Exercise 4.5. Calculate ⟨𝐴⟩Ψ where

$$A \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{4.17}$$

for the following three states:

$$|\Psi_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \tag{4.18}$$
$$|\Psi_2\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \tag{4.19}$$
$$|\Psi_3\rangle = \frac{1}{\sqrt{13}} \begin{pmatrix} 3 \\ 2 \end{pmatrix}. \tag{4.20}$$

Exercise 4.6. Calculate ⟨𝐴⟩Ψ for 𝐴 and |Ψ⟩ as defined in exercise 4.4:

$$A \equiv \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad |\Psi\rangle \equiv \frac{1}{\sqrt{15}} \begin{pmatrix} 1 \\ -2 \\ 3 - \mathrm{i} \end{pmatrix}. \tag{4.21}$$

Then, calculate the expected value explicitly as defined in equation (3.173), using
the probabilities you calculated in part (C) of exercise 4.4, and verify that you get
the same result.

4.1.7 Summary For Discrete Systems

To summarize, here are the axioms of quantum theory we formulated so far. Here
we formulate them specifically for discrete systems with finite­dimensional Hilbert
spaces:

1. The System Axiom: Discrete physical systems are represented by complex
𝑛­dimensional Hilbert spaces ℂ𝑛 , where 𝑛 depends on the specific system.

2. The State Axiom: The states of the system are represented by unit 𝑛­
vectors in the system’s Hilbert space, up to a complex phase.

3. The Operator Axiom: The operators on the system, which act on states
to produce other states, are represented by 𝑛 × 𝑛 matrices in the system’s
Hilbert space.

4. The Observable Axiom: Physical observables in the system are repre­
sented by Hermitian operators on the system’s Hilbert space. The eigenval­
ues of the observable (which are always real, since it’s Hermitian) represent
its possible measured values. The eigenstates of the observable can be used
to form an orthonormal eigenbasis of the Hilbert space.

5. The Probability Axiom: For any observable, the probability amplitude to
measure the eigenvalue corresponding to the eigenstate |𝐵𝑖 ⟩, given that the
system is in the state |Ψ⟩, is the inner product ⟨𝐵𝑖 |Ψ⟩. The probability is given
by the magnitude squared of the amplitude, |⟨𝐵𝑖 |Ψ⟩|².

We also discussed two important consequences of these axioms:

• Superposition: Any state |Ψ⟩ can be written as a linear combination of the
eigenstates |𝐵𝑖 ⟩ of an observable, with the probability amplitudes ⟨𝐵𝑖 |Ψ⟩ as
coefficients:

$$|\Psi\rangle = \sum_{i=1}^{n} |B_i\rangle\langle B_i|\Psi\rangle. \tag{4.22}$$

• Expectation Value: If the system is in the state |Ψ⟩, the expectation value
for the measurement of the observable 𝐴 is given by ⟨Ψ|𝐴|Ψ⟩.

There are some more axioms that we will add later, but first let us discuss a
concrete example of a physical quantum system and see these axioms in action.

Problem 4.7. Are these axioms enough to actually do physics? If not, what do
you think is missing and why?

4.2 Two­State Systems, Spin 1/2, and Qubits

So far in this chapter, we discussed quantum theory in an abstract way. How­
ever, a theory of physics is useless without a concrete mapping between the
theory and reality. The simplest non­trivial28 quantum system is described by
a 2­dimensional Hilbert space, and is thus called a two­state system. All such
systems can also be used as qubits, or quantum bits – where one state (it doesn’t
matter which one) corresponds to 0 and the other state corresponds to 1. Let us
now describe such systems in detail.
28 1­dimensional Hilbert spaces are of course simpler, but they are trivial, since there is only one
state the system can be in, with probability 1.

4.2.1 The Pauli Matrices

Let us introduce the Pauli matrices:

$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y \equiv \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \qquad \sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{4.23}$$

As the notation suggests, each matrix is associated with a spatial axis: 𝑥, 𝑦, and
𝑧. These three matrices have the following properties (here 𝑖 stands for 𝑥, 𝑦, or
𝑧):

• They are Hermitian: 𝜎𝑖† = 𝜎𝑖 . This means they can represent observables.

• They are unitary: 𝜎𝑖† = 𝜎𝑖−1 . This means they can represent transformations.

– Since they are both Hermitian and unitary, they are their own inverse:
𝜎𝑖 = 𝜎𝑖† = 𝜎𝑖−1 . This means that 𝜎𝑖2 = 1. A matrix which is its own inverse
is called involutory.

• They have two eigenvalues: +1 and −1.

– The eigenstates of 𝜎𝑥 are:

$$|{+x}\rangle \equiv |+\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |{-x}\rangle \equiv |-\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{4.24}$$

– The eigenstates of 𝜎𝑦 are:

$$|{+y}\rangle \equiv |{+\mathrm{i}}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix}, \qquad |{-y}\rangle \equiv |{-\mathrm{i}}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -\mathrm{i} \end{pmatrix}. \tag{4.25}$$

– The eigenstates of 𝜎𝑧 are:

$$|{+z}\rangle \equiv |0\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |{-z}\rangle \equiv |1\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \tag{4.26}$$

where, confusingly, |0⟩ corresponds to the eigenvalue +1 and |1⟩ corre­
sponds to the eigenvalue −1 (but that is the standard convention).

• Since the Pauli matrices are normal, the eigenstates of each matrix form an
orthonormal eigenbasis of ℂ2 . As you can see, the eigenstates of 𝜎𝑧 are just
the standard basis.

• The eigenstates of 𝜎𝑥 and 𝜎𝑧 are related to each other as follows:

$$|+\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right), \qquad |-\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle - |1\rangle \right), \tag{4.27}$$
$$|0\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle + |-\rangle \right), \qquad |1\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle - |-\rangle \right). \tag{4.28}$$
Problem 4.8. Prove that 𝜎𝑥 , 𝜎𝑦 and 𝜎𝑧 are Hermitian.

Problem 4.9. Prove that 𝜎𝑥 , 𝜎𝑦 and 𝜎𝑧 are unitary.

Problem 4.10. Consider the real vector space of 2 × 2 Hermitian matrices. This is
a vector space where the vectors are Hermitian matrices and the scalars are
real numbers. Don’t get confused: in an abstract vector space, anything can be
a “vector” – including numbers, matrices, tensors of higher rank, functions, and
even weirder stuff.
A. Show that the real vector space of 2 × 2 Hermitian matrices satisfies all of the
conditions in our definition of a vector space in section 3.2.1.
B. Show that the set {1, 𝜎𝑥 , 𝜎𝑦 , 𝜎𝑧 }, composed of the identity matrix 1 and the
three Pauli matrices, is a basis of the real vector space of 2 × 2 Hermitian matrices.
(Since we haven’t defined an inner product on this space, you don’t need to show
that the basis is orthonormal.)
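The Pauli matrix properties listed above can also be verified numerically. This is a sanity check, not a substitute for the proofs asked for in problems 4.8 and 4.9; it assumes NumPy.

```python
# A sketch: numerically check that each Pauli matrix is Hermitian, unitary,
# involutory, and has eigenvalues -1 and +1.
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]]),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

for name, s in sigma.items():
    assert np.allclose(s, s.conj().T)                   # Hermitian
    assert np.allclose(s @ s.conj().T, I2)              # unitary
    assert np.allclose(s @ s, I2)                       # involutory: sigma^2 = 1
    assert np.allclose(np.linalg.eigvalsh(s), [-1, 1])  # eigenvalues -1, +1
    print(f"sigma_{name}: all properties verified")
```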

4.2.2 Spin 1/2

Recall that in section 2.1.4 we saw that, in the Stern­Gerlach experiment, the
measurement of angular momentum of a particle had only one of two discrete
results: “spin up” (if the particle is deflected up) or “spin down” (if the particle is
deflected down).
More generally, in quantum theory, every particle has a property called spin, which
is a half­integer 𝑠:

$$s \in \left\{ 0, \frac{1}{2}, 1, \frac{3}{2}, 2, \ldots \right\}. \tag{4.29}$$
The measurement of intrinsic angular momentum of a particle of spin 𝑠, in any
direction, always returns one of the results in the set

{−𝑠, −𝑠 + 1, … , 𝑠 − 1, 𝑠} . (4.30)

Note that this set always contains 2𝑠 + 1 values. Thus:

• A particle of spin 0 always has intrinsic angular momentum 0;

• A particle of spin 1/2 has intrinsic angular momentum −1/2 or +1/2;

• A particle of spin 1 has intrinsic angular momentum −1, 0, or +1;

• A particle of spin 3/2 has intrinsic angular momentum −3/2, −1/2, +1/2, or
+3/2;

• and so on.

The particles in the Stern­Gerlach experiment have spin 1/2, where “spin up”
corresponds to intrinsic angular momentum +1/2 and “spin down” corresponds to
−1/2. Since these particles have exactly two possible states, spin up and down,
they can be represented as a two­state quantum system.
The Pauli matrix 𝜎𝑖 is a Hermitian operator, and thus it should correspond to an
observable. That observable is twice the spin in the 𝑖 direction, since the Pauli
matrices have eigenvalues ±1, but the spin should be ±1/2. It is thus customary
to define

$$S_x \equiv \frac{1}{2}\sigma_x, \qquad S_y \equiv \frac{1}{2}\sigma_y, \qquad S_z \equiv \frac{1}{2}\sigma_z, \tag{4.31}$$

such that 𝑆𝑖 is a Hermitian operator corresponding to spin ±1/2 along the 𝑖 direc­
tion. You can check that the 𝑆𝑖 have the same eigenstates as the 𝜎𝑖 , but they
correspond to the eigenvalues ±1/2 instead of ±1.
In problem 4.10 you proved that the set {1, 𝜎𝑥 , 𝜎𝑦 , 𝜎𝑧 } forms a basis for the real
vector space of 2 × 2 Hermitian matrices. This means that any Hermitian oper­
ator on the Hilbert space ℂ2 can be written as a linear combination of these 4
matrices. Since Hermitian operators correspond to observables, this means that
every possible observable in ℂ2 can be written in terms of the Pauli matrices
and the identity matrix.
In particular, given a unit vector v ∈ ℝ³ pointing in an arbitrary direction in space
(the real space, not the Hilbert space!)

$$\mathbf{v} \equiv (x, y, z), \qquad \sqrt{x^2 + y^2 + z^2} = 1, \tag{4.32}$$

we can represent the measurement of intrinsic angular momentum along that
direction as the Hermitian operator (on the Hilbert space, ℂ²)

$$S_{\mathbf{v}} \equiv x S_x + y S_y + z S_z = \frac{1}{2} \begin{pmatrix} z & x - \mathrm{i} y \\ x + \mathrm{i} y & -z \end{pmatrix}, \tag{4.33}$$

which has the spin up and spin down eigenstates

$$|{\uparrow}\rangle \equiv \frac{1}{\sqrt{2(1+z)}} \begin{pmatrix} 1 + z \\ x + \mathrm{i} y \end{pmatrix}, \qquad |{\downarrow}\rangle \equiv \frac{1}{\sqrt{2(1-z)}} \begin{pmatrix} 1 - z \\ -x - \mathrm{i} y \end{pmatrix}. \tag{4.34}$$

So we learn that, for a spin 1/2 particle, the measurement of intrinsic angular
momentum along any direction in space always yields one of exactly two possible
results – spin up, +1/2, or spin down, −1/2 – with the probability amplitudes
calculated using the Hermitian operator 𝑆v .
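Here is a small numerical sketch of equations (4.33) and (4.34), assuming NumPy: for a randomly chosen unit vector, 𝑆v should have eigenvalues ±1/2, and the spin up state from (4.34) should be the +1/2 eigenstate.

```python
# A sketch: build S_v for a random direction and check its eigenvalues
# and the spin-up eigenstate from equation (4.34).
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=3)
x, y, z = v / np.linalg.norm(v)                  # random unit vector in R^3

S_v = 0.5 * np.array([[z, x - 1j * y],
                      [x + 1j * y, -z]])

assert np.allclose(np.linalg.eigvalsh(S_v), [-0.5, 0.5])

up = np.array([1 + z, x + 1j * y]) / np.sqrt(2 * (1 + z))
assert np.allclose(S_v @ up, 0.5 * up)           # S_v |up> = +1/2 |up>
print("S_v has eigenvalues ±1/2, and |up> is the +1/2 eigenstate")
```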

Exercise 4.11. Show that the eigenstates |↑⟩ and |↓⟩ indeed correspond to the
eigenstates of 𝑆𝑥 , 𝑆𝑦 , and 𝑆𝑧 – except the state |1⟩ (the −1/2 eigenstate of 𝑆𝑧 ),
which results in a division by zero in the bottom component.

Exercise 4.12. A spin­1/2 particle is in the state

$$|\Psi\rangle \equiv \frac{1}{\sqrt{10}} \begin{pmatrix} 1 \\ 3 \end{pmatrix}. \tag{4.35}$$

A. What are the probabilities to measure spin up or down in the 𝑥 direction?
B. What are the probabilities to measure spin up or down in the 𝑦 direction?
C. What are the probabilities to measure spin up or down in the 𝑧 direction?
D. What are the probabilities to measure spin up or down in the direction of the
unit vector (1/3, 2/3, 2/3)?
E. What are the expectation values for a measurement of spin in each of the
directions specified in (A)–(D)? (Make sure you are using 𝑆𝑖 and not 𝜎𝑖 for this
calculation!)

Problem 4.13.
A. Let us define the matrix commutator (or operator commutator):

[𝐴, 𝐵] ≡ 𝐴𝐵 − 𝐵𝐴. (4.36)

Show that the spin­1/2 operators 𝑆𝑖 have the commutation relations:

[𝑆𝑥 , 𝑆𝑦 ] = i 𝑆𝑧 , [𝑆𝑦 , 𝑆𝑧 ] = i 𝑆𝑥 , [𝑆𝑧 , 𝑆𝑥 ] = i 𝑆𝑦 . (4.37)

B. Show that the commutation relations (4.37) can be written compactly as

$$[S_i, S_j] = \mathrm{i} \sum_{k=1}^{3} \epsilon_{ij}{}^{k} S_k, \tag{4.38}$$

where the indices 𝑖, 𝑗, 𝑘 take the values {1, 2, 3} corresponding to {𝑥, 𝑦, 𝑧}, and 𝜖𝑖𝑗 𝑘
is the Levi­Civita symbol, defined as

$$\epsilon_{ij}{}^{k} \equiv \begin{cases} +1 & \text{if } (i, j, k) \text{ is an even permutation of } (1, 2, 3), \\ -1 & \text{if } (i, j, k) \text{ is an odd permutation of } (1, 2, 3), \\ 0 & \text{otherwise.} \end{cases} \tag{4.39}$$

By even permutation or odd permutation we mean that the permutation involves
exchanging elements an even or odd number of times. For example, (1, 3, 2) is an
odd permutation, because we exchanged elements once: 2 ↔ 3. However, (3, 1, 2)
is an even permutation, because we exchanged elements twice: 2 ↔ 3 and then
1 ↔ 3.
C. The matrix anti­commutator (or operator anti­commutator) is defined as fol­
lows:
{𝐴, 𝐵} ≡ 𝐴𝐵 + 𝐵𝐴. (4.40)

Show that the spin­1/2 operators 𝑆𝑖 have the anti­commutation relation

$$\{S_i, S_j\} = \frac{1}{2}\delta_{ij}, \tag{4.41}$$

where 𝛿𝑖𝑗 is the Kronecker delta (times the identity matrix 1).
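The relations in problem 4.13 are also easy to check numerically; the following sketch (assuming NumPy) is a sanity check, not a substitute for the proofs.

```python
# A sketch: check the spin-1/2 commutation and anti-commutation relations.
import numpy as np

sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]])
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

def comm(a, b):
    return a @ b - b @ a

def anticomm(a, b):
    return a @ b + b @ a

assert np.allclose(comm(sx, sy), 1j * sz)               # [S_x, S_y] = i S_z
assert np.allclose(comm(sy, sz), 1j * sx)               # [S_y, S_z] = i S_x
assert np.allclose(comm(sz, sx), 1j * sy)               # [S_z, S_x] = i S_y
assert np.allclose(anticomm(sx, sy), 0)                 # {S_i, S_j} = 0, i != j
assert np.allclose(anticomm(sx, sx), 0.5 * np.eye(2))   # {S_i, S_i} = (1/2) 1
print("commutation and anti-commutation relations verified")
```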

4.2.3 Qubits

A classical bit can be in one of two states: 0 or 1. A quantum bit, or qubit for
short, is instead in a superposition of two states, denoted |0⟩ and |1⟩:

$$|\Psi\rangle = a|0\rangle + b|1\rangle, \qquad |a|^2 + |b|^2 = 1, \tag{4.42}$$

where 𝑎, 𝑏 ∈ ℂ are the probability amplitudes:

$$a \equiv \langle 0|\Psi\rangle, \qquad b \equiv \langle 1|\Psi\rangle. \tag{4.43}$$

Since the system has two states, it can be represented by the Hilbert space ℂ²,
and it is conventional to choose |0⟩ and |1⟩ to be the vectors in the standard basis,
which in this case is called the computational basis:

$$|0\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{4.44}$$

Any two­state quantum system can serve as a qubit. In fact, even systems with
more than two states can be used, as long as two of these states can be decoupled
(separated) from the rest. Some examples include:

• Any spin 1/2 particle, such as an electron, where |0⟩ and |1⟩ are the eigen­
states of the spin operator along the 𝑧 direction, 𝑆𝑧 , so they represent spin
up and spin down, respectively, along that direction.

• The number of particles (it doesn’t matter what kind of particles) in a sys­
tem, where |0⟩ corresponds to a state with no particles (a vacuum) and |1⟩
corresponds to a state with exactly one particle.

• The polarization of a photon, where |0⟩ is horizontal and |1⟩ is vertical polariza­
tion. (In classical electromagnetism, an electromagnetic wave is composed
of oscillating electric and magnetic fields, and the polarization is the direction
of the electric field.)

Qubits are used in quantum computers as the basic units of computation, just
like bits in classical computers. Since so many different systems can be rep­
resented mathematically in the same way, we can build quantum computers in
many different ways. We will discuss quantum computers (from the theoretical
point of view) in more detail later.

4.2.4 The Meaning of Superposition

In section 4.1.5 we discussed the concept of superposition, and we emphasized
that it is inaccurate to describe a system in a superposition of two states as
being “in both states at once”. Similarly, it is a common misconception that
quantum computers are powerful because qubits, which are in a superposition of
|0⟩ and |1⟩, are in some way “both 0 and 1 at the same time”, and that this allows
the quantum computer to “calculate all the possibilities at once”. That would have
been awesome, but unfortunately that is not how quantum computers work! We
will see how they really work later in this course.
Now that we are familiar with a concrete quantum system, we can use it to illus­
trate further the meaning of superposition. Let us consider, as a simple example,
a qubit in the state |+⟩ (the eigenstate of spin +1/2 in the 𝑥 direction):

$$|+\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right) = \frac{1}{\sqrt{2}} \left( \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{4.45}$$

For simplicity, let us forget for a second that we are dealing with complex vectors,
and imagine that they are vectors in ℝ², since that is much easier to visualize; see
figure 4.1. As vectors in ℝ², the state |0⟩ = (1, 0) points east and the state
|1⟩ = (0, 1) points north. This does not mean that |+⟩ is “pointing both north and
east at the same time”. It does not, in fact, point in either of these directions;
instead, it points in a third direction, namely north­east.
In other words, if we just look at the vector represented by |+⟩, without consid­
ering any particular basis, it is just a vector pointing in one particular direction,
and in this direction only. The superposition only exists if we insist on representing
|+⟩ in a particular eigenbasis, but there can be another eigenbasis, e.g. the
eigenbasis composed of |+⟩ itself along with |−⟩, in which |+⟩ is not in a superpo­
sition.
What this means is that a state only appears to be in a superposition when we
choose a particular observable and represent that state as a superposition of

[Figure 4.1: a diagram of the plane ℝ², showing the eigenbasis {|0⟩ = (1, 0), |1⟩ = (0, 1)}
along the axes and the eigenbasis {|+⟩ = (1, 1)/√2, |−⟩ = (1, −1)/√2} along the
diagonals, with the probability amplitudes ⟨0|+⟩ = ⟨1|+⟩ = 1/√2, ⟨+|+⟩ = 1,
⟨−|+⟩ = 0, and ⟨1|−⟩ = −1/√2 indicated.]

Figure 4.1: The eigenbasis {|0⟩ , |1⟩}, in red, and the eigenbasis {|+⟩ , |−⟩},
in blue. A qubit in the state |+⟩ is in a superposition of |0⟩ and |1⟩, but
this does not mean it is in the states |0⟩ and |1⟩ “at the same time” – it
is only in one state, |+⟩.

eigenstates with respect to that observable. But the system itself is still in the
same state, regardless of which eigenbasis we choose. The projections of the
state of the system on the basis eigenstates give us the probability amplitudes
relevant to that measurement; for example, in figure 4.1 we see that the probabil­
ity amplitudes to measure |0⟩ or |1⟩ are both 1/√2. However, in the basis consisting
of |+⟩ and |−⟩, we instead have that the probability amplitude to measure |+⟩ is 1
and the probability amplitude to measure |−⟩ is 0.
In the specific case where the qubit is the spin of a spin­1/2 particle, we know
that if the qubit is in the state |+⟩, this means that a measurement of spin along

the 𝑥 axis will yield spin up with probability 1. We can say, if we want, that the
system is in a state of spin up along the 𝑥 axis, and this defines the state uniquely.
We also see that, in this basis, the system is not in a superposition; it is just one
state.
However, in the basis corresponding to measurement of spin along the 𝑧 axis, we
may write the state as a superposition, |+⟩ = (|0⟩ + |1⟩)/√2. This doesn’t mean
that the qubit is in both the states |0⟩ and |1⟩ “at the same time”; it means that
it is in a state where a measurement of spin along the 𝑧 axis will yield spin up or
spin down with equal probability.
If being in the superposition (|0⟩ + |1⟩)/√2 doesn’t mean that the qubit is both |0⟩
and |1⟩ at the same time, perhaps it could mean that the qubit is either |0⟩ or
|1⟩, but we just don’t know which one it is, and when we perform a measurement
we will discover which state it was in all along? Unfortunately, that interpretation
doesn’t work either. Theories where the system is in only one particular unknown
(“hidden”) state, but we only discover which one after we measure it, are called
hidden variable theories. They are mostly thought to be incorrect, since they
violate a theorem called Bell’s theorem, which we will learn about in section 4.3.6.
Some theories of hidden variables that are compatible with Bell’s theorem do
exist, but most physicists don’t believe they could replace quantum mechanics,
because they are complicated, contrived, and non­local; the latter means that
they allow faster­than­light or instantaneous communication29 . Indeed, some
non­local hidden variable theories, such as de Broglie–Bohm theory, require all
of the particles in the universe to be able to instantaneously communicate with
each other at all times!
So in conclusion, being in a superposition of two states doesn’t mean being in
both the first state and the second state, but also doesn’t mean being in either
the first state or the second state. Instead, we must conclude that the terms
“and” and “or” are classical terms that can only be used in a classical theory;
superposition is a new quantum term, which simply does not have any classical
analogue.
Compare this with our discussion of wave­particle duality in section 2.1.3. This
duality doesn’t mean that light is “both a wave and a particle”, and it also doesn’t
mean that light is “either a wave or a particle”. What it really means is that the
classical concepts of “wave” and “particle” are not the proper way to describe re­
ality. Similarly, it turns out that the classical terms “and” and “or” cannot be used
to describe reality at the deepest level; for that, we need to introduce quantum
superposition.
29 This doesn’t necessarily mean the theory allows us to send information faster than light. The
components of the system can communicate with each other faster than light, but not necessarily in
a way that we can actually control or make use of. We will discuss this in more detail in section 4.3.6.

4.3 Composite Systems and Quantum Entanglement

4.3.1 The Tensor Product

So far, we have only considered single, isolated physical systems, described by a
single Hilbert space. What if we have more than one system, such as a collection
of particles? This calls for a new axiom:
The Composite System Axiom: Given two physical quantum systems repre­
sented by two Hilbert spaces ℋ𝐴 and ℋ𝐵 respectively, the tensor product of the
two spaces, denoted
ℋ𝐴 ⊗ ℋ 𝐵 , (4.46)

is another Hilbert space, representing the composite system which combines the
two original systems. The dimension of the composite Hilbert space is the product
of the dimensions of the individual spaces:

dim (ℋ𝐴 ⊗ ℋ𝐵 ) = dim ℋ𝐴 ⋅ dim ℋ𝐵 . (4.47)

For example, the dimension of ℂ𝑚 ⊗ ℂ𝑛 is 𝑚𝑛.


Given a state |Ψ𝐴 ⟩ in ℋ𝐴 and a state |Ψ𝐵 ⟩ in ℋ𝐵 , we can use the tensor product
to form a new state in ℋ𝐴 ⊗ ℋ𝐵 :

|Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩ ∈ ℋ𝐴 ⊗ ℋ𝐵 . (4.48)

However, not all states in ℋ𝐴 ⊗ ℋ𝐵 are necessarily of this form; this fact will
prove essential soon, when we discuss entanglement. Furthermore, if |𝐴𝑖 ⟩, 𝑖 ∈
{1, … , 𝑚} is an orthonormal basis of ℋ𝐴 and ∣𝐵𝑗 ⟩, 𝑗 ∈ {1, … , 𝑛} is an orthonormal
basis of ℋ𝐵 , then

|𝐴𝑖 ⟩ ⊗ ∣𝐵𝑗 ⟩ , 𝑖 ∈ {1, … , 𝑚} , 𝑗 ∈ {1, … , 𝑛} , (4.49)

is an orthonormal basis of ℋ𝐴 ⊗ ℋ𝐵 . Note that there are 𝑚𝑛 basis states in total,


since the dimension of the composite Hilbert space is 𝑚𝑛.
The tensor product is linear. This means that for 𝜆 ∈ ℂ, |Ψ𝐴 ⟩ ∈ ℋ𝐴 , and |Ψ𝐵 ⟩ ∈ ℋ𝐵
we have
𝜆 (|Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩) = (𝜆 |Ψ𝐴 ⟩) ⊗ |Ψ𝐵 ⟩ = |Ψ𝐴 ⟩ ⊗ (𝜆 |Ψ𝐵 ⟩) , (4.50)

for |Ψ𝐴 ⟩ , |Φ𝐴 ⟩ ∈ ℋ𝐴 and |Θ𝐵 ⟩ ∈ ℋ𝐵 we have

( |Ψ𝐴 ⟩ + |Φ𝐴 ⟩) ⊗ |Θ𝐵 ⟩ = |Ψ𝐴 ⟩ ⊗ |Θ𝐵 ⟩ + |Φ𝐴 ⟩ ⊗ |Θ𝐵 ⟩ , (4.51)

and for |Ψ𝐴 ⟩ ∈ ℋ𝐴 and |Θ𝐵 ⟩ , |Ω𝐵 ⟩ ∈ ℋ𝐵 we have

|Ψ𝐴 ⟩ ⊗ ( |Θ𝐵 ⟩ + |Ω𝐵 ⟩) = |Ψ𝐴 ⟩ ⊗ |Θ𝐵 ⟩ + |Ψ𝐴 ⟩ ⊗ |Ω𝐵 ⟩ . (4.52)

In particular, notice from equation (4.50) that scalars commute with the tensor
product, so we can move them in or out of the product as we see fit – just as,
until now, we have been moving scalars in and out of inner and outer products.
Importantly, the tensor product itself is not commutative:

|Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩ ≠ |Ψ𝐵 ⟩ ⊗ |Ψ𝐴 ⟩ . (4.53)

The order matters, since the first state must come from the first Hilbert space,
and the second state must come from the second Hilbert space – which may
be a completely different space with completely different states. For example,
in the tensor product ℂ2 ⊗ ℂ3 the first state must be represented by a 2­vector
while the second state must be represented by a 3­vector – so they cannot be
interchanged.
Now, if 𝑂𝐴 is an operator on ℋ𝐴 and 𝑂𝐵 is an operator on ℋ𝐵 , then 𝑂𝐴 ⊗ 𝑂𝐵 is
an operator on ℋ𝐴 ⊗ ℋ𝐵 , which is defined such that each operator acts only on
the state coming from the same space as that operator:

(𝑂𝐴 ⊗ 𝑂𝐵 ) ( |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩) = (𝑂𝐴 |Ψ𝐴 ⟩) ⊗ (𝑂𝐵 |Ψ𝐵 ⟩) . (4.54)

In other words, the first operator in the product 𝑂𝐴 ⊗ 𝑂𝐵 acts only on the first
state in the product |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩, and the second operator in the product 𝑂𝐴 ⊗ 𝑂𝐵
acts only on the second state in the product |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩. This has to be the case,
since e.g. in the tensor product ℂ2 ⊗ ℂ3 the first operator must be represented by
a 2×2 matrix and act on 2­vectors while the second operator must be represented
by a 3 × 3 matrix and act on 3­vectors. Note that, as for the tensor product of
states, not all operators in ℋ𝐴 ⊗ ℋ𝐵 are necessarily of this form.
If we have two bras ⟨Ψ𝐴 | ∈ ℋ𝐴 and ⟨Ψ𝐵 | ∈ ℋ𝐵 , their tensor product ⟨Ψ𝐴 | ⊗ ⟨Ψ𝐵 | is a
bra in ℋ𝐴 ⊗ ℋ𝐵 , and the inner product of this bra with a ket of the form |Φ𝐴 ⟩ ⊗ |Φ𝐵 ⟩
in ℋ𝐴 ⊗ ℋ𝐵 is defined by taking the inner products of each bra with the ket from
the same space:

( ⟨Ψ𝐴 | ⊗ ⟨Ψ𝐵 |) ( |Φ𝐴 ⟩ ⊗ |Φ𝐵 ⟩) = ⟨Ψ𝐴 |Φ𝐴 ⟩⟨Ψ𝐵 |Φ𝐵 ⟩. (4.55)

The first bra acts only on the first ket and the second bra acts only on the
second ket. Once again, the inner product must work this way, since for example
in ℂ2 ⊗ ℂ3 we can only take the inner product of 2­vectors with 2­vectors and
3­vectors with 3­vectors – the inner product of a 2­vector with a 3­vector is

undefined.
Similarly, if we have two operators 𝑂𝐴 , 𝑃𝐴 ∈ ℋ𝐴 and two operators 𝑂𝐵 , 𝑃𝐵 ∈ ℋ𝐵 ,
then the composite operator 𝑂𝐴 ⊗ 𝑂𝐵 ∈ ℋ𝐴 ⊗ ℋ𝐵 acts on the composite operator
𝑃𝐴 ⊗ 𝑃𝐵 ∈ ℋ𝐴 ⊗ ℋ𝐵 in the only way that makes sense, with each operator acting
on the operator from the same space:

(𝑂𝐴 ⊗ 𝑂𝐵 ) (𝑃𝐴 ⊗ 𝑃𝐵 ) = 𝑂𝐴 𝑃𝐴 ⊗ 𝑂𝐵 𝑃𝐵 . (4.56)

Finally, above we stated the Composite System Axiom for two quantum systems,
but we can use it recursively to define the composite Hilbert space of any number
of systems: just take the tensor product of all the Hilbert spaces together,

ℋ𝐴 ⊗ ℋ 𝐵 ⊗ ℋ 𝐶 ⊗ … (4.57)

Everything we defined above still applies, with the obvious generalizations.

Problem 4.14. Let ℋ𝐴 and ℋ𝐵 be two Hilbert spaces. Find an isomorphism
between the composite Hilbert spaces ℋ𝐴 ⊗ ℋ𝐵 and ℋ𝐵 ⊗ ℋ𝐴 .

Problem 4.15. Let ℋ𝐴 and ℋ𝐵 be two Hilbert spaces, let 𝐴 be an operator in
ℋ𝐴 , and let 𝐵 be an operator in ℋ𝐵 .
A. Construct an operator in ℋ𝐴 ⊗ ℋ𝐵 which acts as 𝐴 does on the states of ℋ𝐴 ,
but leaves the states of ℋ𝐵 unchanged.
B. Construct an operator in ℋ𝐴 ⊗ ℋ𝐵 which acts as 𝐵 does on the states of ℋ𝐵 ,
but leaves the states of ℋ𝐴 unchanged.
C. Show that the two operators you constructed commute by calculating their
commutator as defined in equation (4.36).

4.3.2 Vectors and Matrices in the Composite Hilbert Space

Consider the tensor product ℂ𝑚 ⊗ ℂ𝑛 . Since the dimension of this Hilbert space is
𝑚𝑛, and since in any finite­dimensional Hilbert space we know how to represent
states as vectors and operators as matrices of the same dimension as the Hilbert
space (as discussed in section 3.2.7 and section 3.2.15 respectively), we conclude
that states in ℂ𝑚 ⊗ ℂ𝑛 can be represented as 𝑚𝑛­vectors and operators in ℂ𝑚 ⊗ ℂ𝑛
can be represented as 𝑚𝑛 × 𝑚𝑛 matrices. In other words, ℂ𝑚 ⊗ ℂ𝑛 is isomorphic
to ℂ𝑚𝑛 .

Explicitly, for two states represented by the vectors30

$$|\Psi\rangle \equiv \begin{pmatrix} \Psi_1 \\ \vdots \\ \Psi_m \end{pmatrix} \in \mathbb{C}^m, \qquad |\Phi\rangle \equiv \begin{pmatrix} \Phi_1 \\ \vdots \\ \Phi_n \end{pmatrix} \in \mathbb{C}^n, \tag{4.58}$$

we define the tensor product as follows:

$$|\Psi\rangle \otimes |\Phi\rangle \equiv \begin{pmatrix} \Psi_1 |\Phi\rangle \\ \vdots \\ \Psi_m |\Phi\rangle \end{pmatrix} = \begin{pmatrix} \Psi_1 \Phi_1 \\ \vdots \\ \Psi_1 \Phi_n \\ \vdots \\ \Psi_m \Phi_1 \\ \vdots \\ \Psi_m \Phi_n \end{pmatrix} \in \mathbb{C}^{mn}. \tag{4.59}$$

For example:

$$\begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \\ 2 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 1 \cdot 3 \\ 1 \cdot 4 \\ 2 \cdot 3 \\ 2 \cdot 4 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \\ 6 \\ 8 \end{pmatrix}. \tag{4.60}$$

Similarly, for two operators represented by the matrices31

$$A \equiv \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mm} \end{pmatrix} \in \mathbb{C}^{m \times m}, \qquad B \equiv \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix} \in \mathbb{C}^{n \times n}, \tag{4.61}$$

30 Note that before we used the subscript to indicate which space the state belongs to, but now
the subscript is instead a vector index.
31 Here, ℂⁿ×ⁿ denotes the space of 𝑛 × 𝑛 complex matrices.

we define the tensor product as follows32:

$$A \otimes B \equiv \begin{pmatrix} A_{11} B & \cdots & A_{1m} B \\ \vdots & \ddots & \vdots \\ A_{m1} B & \cdots & A_{mm} B \end{pmatrix} = \begin{pmatrix} A_{11} B_{11} & \cdots & A_{11} B_{1n} & \cdots & A_{1m} B_{11} & \cdots & A_{1m} B_{1n} \\ \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\ A_{11} B_{n1} & \cdots & A_{11} B_{nn} & \cdots & A_{1m} B_{n1} & \cdots & A_{1m} B_{nn} \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ A_{m1} B_{11} & \cdots & A_{m1} B_{1n} & \cdots & A_{mm} B_{11} & \cdots & A_{mm} B_{1n} \\ \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\ A_{m1} B_{n1} & \cdots & A_{m1} B_{nn} & \cdots & A_{mm} B_{n1} & \cdots & A_{mm} B_{nn} \end{pmatrix} \in \mathbb{C}^{mn \times mn}.$$

For example:

$$\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix} \otimes \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} 0 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} & 1 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} \\ 2 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} & 0 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \\ 6 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \end{pmatrix}.$$

32 Note that the tensor product of vectors is a special case of the tensor product of matrices, with
the vectors treated as single­column matrices.
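As a side note, this tensor (Kronecker) product is implemented in NumPy as np.kron, so the worked examples above can be reproduced in one line each – a convenient sanity check, sketched here with the same vectors and matrices:

```python
# A sketch: np.kron computes exactly the tensor products defined above.
import numpy as np

psi = np.array([1, 2])
phi = np.array([3, 4])
print(np.kron(psi, phi))        # [3 4 6 8], matching equation (4.60)

A = np.array([[0, 1], [2, 0]])
B = np.array([[3, 0], [0, 4]])
print(np.kron(A, B))            # the 4x4 matrix computed above
```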

Exercise 4.16. For the specific |Ψ⟩, |Φ⟩, 𝐴, and 𝐵 we used above:

$$|\Psi\rangle \equiv \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad |\Phi\rangle \equiv \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \tag{4.62}$$
$$A \equiv \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}, \qquad B \equiv \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}, \tag{4.63}$$

calculate

$$(A \otimes B)\left( |\Psi\rangle \otimes |\Phi\rangle \right). \tag{4.64}$$

Do so in two ways:

1. Directly in the composite Hilbert space33 ℂ² ⊗ ℂ² ≃ ℂ⁴ using the 4 × 4 matrix
and 4­vector found above.

2. Separately in each of the component spaces using the two 2 × 2 matrices
and the two 2­vectors (acting with the first matrix on the first vector and
the second matrix on the second vector), and then calculating the tensor
product of the results.

Then compare your results and verify that they are the same.

Problem 4.17.
A. Prove that the tensor product preserves the adjoint operation on both vectors
and matrices. That is,

$$\left( |\Psi\rangle \otimes |\Phi\rangle \right)^\dagger = \langle\Psi| \otimes \langle\Phi|, \qquad (A \otimes B)^\dagger = A^\dagger \otimes B^\dagger. \tag{4.65}$$

B. Prove that the tensor product of two Hermitian operators is Hermitian, and the
tensor product of two unitary operators is unitary.

Problem 4.18. Consider the tensor product ℂ𝑚 ⊗ ℂ𝑛 for arbitrary 𝑚 and 𝑛. Show
that the standard basis of ℂ𝑚 ⊗ ℂ𝑛 is obtained by taking the tensor products of
the standard basis states of ℂ𝑚 and ℂ𝑛 .

Exercise 4.19. Calculate the tensor product

|+⟩ ⊗ |−⟩ ⊗ |0⟩ , (4.66)

where |+⟩ and |−⟩ are the +1 and −1 eigenstates of 𝜎𝑥 respectively, and |0⟩ is the
+1 eigenstate of 𝜎𝑧 (see section 4.2.1).

Exercise 4.20.
A. Calculate the tensor product operator

𝐴 ≡ 𝑆 𝑥 ⊗ 𝑆𝑧 , (4.67)

where 𝑆𝑥 and 𝑆𝑧 were defined in equation (4.31).


33 Here, ≃ means “isomorphic to”.

B. Calculate the tensor product state

|Ψ⟩ ≡ |+⟩ ⊗ |1⟩ , (4.68)

where |+⟩ and |1⟩ were defined in section 4.2.1.


C. Calculate the expectation value ⟨𝐴⟩Ψ .

4.3.3 Quantum Entanglement

Consider a composite system of two qubits. In the computational (standard)
basis, each of the qubits is a superposition of the two basis eigenstates |0⟩ and |1⟩.
Let us name the first qubit 𝐴 and the second qubit 𝐵. In the composite Hilbert
space ℋ𝐴 ⊗ ℋ𝐵 , the computational basis has four eigenstates:

|0⟩ ⊗ |0⟩ , |0⟩ ⊗ |1⟩ , |1⟩ ⊗ |0⟩ , |1⟩ ⊗ |1⟩ , (4.69)

where in each of these, the first state is the state of qubit 𝐴 and the second is
the state of qubit 𝐵. Thus |0⟩ ⊗ |0⟩ corresponds to |0⟩ for both qubits, |0⟩ ⊗ |1⟩
corresponds to |0⟩ for qubit 𝐴 and |1⟩ for qubit 𝐵, |1⟩ ⊗ |0⟩ corresponds to |1⟩ for
qubit 𝐴 and |0⟩ for qubit 𝐵, and |1⟩ ⊗ |1⟩ corresponds to |1⟩ for both qubits.
These four eigenstates have the following representations in terms of vectors in
ℂ⁴:

$$|0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad |0\rangle \otimes |1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \tag{4.70}$$

$$|1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad |1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{4.71}$$

So we see that they are, in fact, just the standard basis of ℂ⁴.
The most general state of both qubits is described as a superposition of all possible
combinations:

$$|\Psi\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle = \begin{pmatrix} \alpha_{00} \\ \alpha_{01} \\ \alpha_{10} \\ \alpha_{11} \end{pmatrix}, \tag{4.72}$$

where 𝛼00 , 𝛼01 , 𝛼10 , 𝛼11 ∈ ℂ and, of course, the coefficients should be chosen such
that the state is normalized to 1:

$$|\alpha_{00}|^2 + |\alpha_{01}|^2 + |\alpha_{10}|^2 + |\alpha_{11}|^2 = 1. \tag{4.73}$$

We would now like to ask: when do the two qubits depend on each other? More
precisely, under what conditions can qubit 𝐴 be |0⟩ or |1⟩ independently of the
state of qubit 𝐵, and vice versa? As we will now see, this depends on the coeffi­
cients 𝛼𝑖𝑗 .
A separable state is a state which can be written as just one tensor product
instead of a sum of tensor products, that is, a state of the form

|Ψ⟩ = |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩ , (4.74)

where |Ψ𝐴 ⟩ is the state of qubit 𝐴 and |Ψ𝐵 ⟩ is the state of qubit 𝐵. If we can write
the state in this way, then we have separated the states from one another, in
the sense that whatever value |Ψ𝐴 ⟩ has is completely independent of the value of
|Ψ𝐵 ⟩ (and vice versa). In other words, the overall state of the composite system
is just the tensor product of the independent states of the individual systems.
A simple example of a separable state would be:

|Ψ⟩ = |0⟩ ⊗ |0⟩ . (4.75)

This just means that both qubits are, with 100% probability, in the state |0⟩:

|Ψ𝐴 ⟩ = |0⟩ , |Ψ𝐵 ⟩ = |0⟩ . (4.76)

A more interesting separable state is:

$$|\Psi\rangle = \frac{1}{2} \left( |0\rangle \otimes |0\rangle + |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle \right). \tag{4.77}$$
To see that it is separable, we simplify it using the distributive property, and get:

$$|\Psi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right) \otimes \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right). \tag{4.78}$$

In other words, both qubits are in a state where either 0 or 1 is possible with 50%
probability, that is:

$$|\Psi_A\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right), \qquad |\Psi_B\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right). \tag{4.79}$$

A state which is not separable is called an entangled state. Here is an example
of an entangled state:

$$|\Psi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle \right). \tag{4.80}$$

No matter how much we try, we can never write it as just one tensor product; it
is always going to be the sum of two tensor products! This means that the state
of each qubit is no longer independent of the state of the other qubit. Indeed,
if qubit 𝐴 is in the state |0⟩ then qubit 𝐵 must be in the state |1⟩ (due to the
first term), and if qubit 𝐴 is in the state |1⟩ then qubit 𝐵 must be in the state |0⟩
(due to the second term). This is precisely what it means for two systems to be
entangled.
More generally, consider again a composite system in the state

$$|\Psi\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle = \begin{pmatrix} \alpha_{00} \\ \alpha_{01} \\ \alpha_{10} \\ \alpha_{11} \end{pmatrix}, \tag{4.81}$$

where 𝛼00 , 𝛼01 , 𝛼10 , 𝛼11 ∈ ℂ. If it is separable, then we should be able to write it in
the form
|Ψ⟩ = (𝛽0 |0⟩ + 𝛽1 |1⟩) ⊗ (𝛾0 |0⟩ + 𝛾1 |1⟩) , (4.82)

where 𝛽0 , 𝛽1 , 𝛾0 , 𝛾1 ∈ ℂ. Expanding the last equation, we get

|Ψ⟩ = 𝛽0 𝛾0 |0⟩ ⊗ |0⟩ + 𝛽0 𝛾1 |0⟩ ⊗ |1⟩ + 𝛽1 𝛾0 |1⟩ ⊗ |0⟩ + 𝛽1 𝛾1 |1⟩ ⊗ |1⟩ . (4.83)

So we should have:
𝛼𝑖𝑗 = 𝛽𝑖 𝛾𝑗 , 𝑖, 𝑗 ∈ {0, 1} , (4.84)

or explicitly:

𝛼00 = 𝛽0 𝛾0 , 𝛼01 = 𝛽0 𝛾1 , 𝛼10 = 𝛽1 𝛾0 , 𝛼11 = 𝛽1 𝛾1 . (4.85)

If this is true, then in particular

$$\alpha_{00}\alpha_{11} - \alpha_{01}\alpha_{10} = (\beta_0\gamma_0)(\beta_1\gamma_1) - (\beta_0\gamma_1)(\beta_1\gamma_0) = \beta_0\beta_1\gamma_0\gamma_1 - \beta_0\beta_1\gamma_0\gamma_1 = 0.$$

Now, if 𝛼𝑖𝑗 are the components of a matrix34,

$$\alpha = \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix}, \tag{4.88}$$

then the quantity 𝛼00 𝛼11 − 𝛼01 𝛼10 is called the determinant of the matrix, denoted
det 𝛼:
det 𝛼 ≡ 𝛼00 𝛼11 − 𝛼01 𝛼10 . (4.89)

We have proven that, if the composite state is separable (not entangled), then
the matrix of the coefficients has vanishing determinant. Below you will prove
that this also works in the opposite direction; thus, a composite state of two
qubits is separable if and only if det 𝛼 = 0.
Let us check this. The state in equation (4.75) is separable, since it has

det 𝛼 = 1 ⋅ 0 − 0 ⋅ 0 = 0. (4.90)

The state in equation (4.77) is also separable, since it has

$$\det \alpha = \frac{1}{2} \cdot \frac{1}{2} - \frac{1}{2} \cdot \frac{1}{2} = 0. \tag{4.91}$$

However, the state in equation (4.80) is entangled, since it has

$$\det \alpha = 0 \cdot 0 - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = -\frac{1}{2} \neq 0. \tag{4.92}$$

Unfortunately, this simple rule only works for a composite system of 2 qubits.
The problem of finding whether a given state of a composite system is separable
or entangled is called the separability problem, and it is, for general states, a
difficult problem to solve!
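For two qubits, though, the determinant test is mechanical enough that it can be sketched in a few lines, assuming NumPy; the three matrices below encode the example states from equations (4.75), (4.77), and (4.80).

```python
# A sketch of the two-qubit separability test: arrange the coefficients
# alpha_ij as a 2x2 matrix and check whether its determinant vanishes.
import numpy as np

def is_separable(alpha):
    """alpha: 2x2 array with alpha[i, j] the coefficient of |i> tensor |j>."""
    return np.isclose(np.linalg.det(alpha), 0)

s = 1 / np.sqrt(2)
print(is_separable(np.array([[1, 0], [0, 0]])))          # (4.75): True
print(is_separable(np.array([[0.5, 0.5], [0.5, 0.5]])))  # (4.77): True
print(is_separable(np.array([[0, s], [s, 0]])))          # (4.80): False
```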
Problem 4.21. Prove that, for a composite state of two qubits given by

|Ψ⟩ = 𝛼00 |0⟩ ⊗ |0⟩ + 𝛼01 |0⟩ ⊗ |1⟩ + 𝛼10 |1⟩ ⊗ |0⟩ + 𝛼11 |1⟩ ⊗ |1⟩ , (4.93)
34 This is actually the matrix that would be obtained if, instead of writing the composite state of
two qubits as a vector in ℂ⁴, we wrote it as the outer products of the qubits, which would be a 2 × 2
matrix. Explicitly, you can check that:

$$|0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad |0\rangle\langle 1| = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad |1\rangle\langle 0| = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \tag{4.86}$$

so in this representation, we would get

$$|\Psi\rangle = \alpha_{00} |0\rangle\langle 0| + \alpha_{01} |0\rangle\langle 1| + \alpha_{10} |1\rangle\langle 0| + \alpha_{11} |1\rangle\langle 1| = \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix}. \tag{4.87}$$

The reason we do not use the outer product representation for two­qubit states is that writing them
as vectors in ℂ⁴ allows us to act on them with operators given by 4 × 4 matrices, just as we would
act on a single qubit with operators given by 2 × 2 matrices.

the state is separable if

$$\det \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} = \alpha_{00}\alpha_{11} - \alpha_{01}\alpha_{10} = 0. \tag{4.94}$$

This is the opposite direction to what we proved above, which is that if the state
is separable, then the determinant is zero.

Problem 4.22. Find two separable states and two entangled states of three
qubits, and prove that they are separable/entangled.

4.3.4 The Bell States

Let us define the Bell states, also known as35 EPR states:

$$|\beta_{xy}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |y\rangle + (-1)^x |1\rangle \otimes |1-y\rangle \right), \qquad x, y \in \{0, 1\}. \tag{4.95}$$

Explicitly, the four choices for 𝑥 and 𝑦 give:

$$|\beta_{00}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle \right), \tag{4.96}$$
$$|\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle \right), \tag{4.97}$$
$$|\beta_{10}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |0\rangle - |1\rangle \otimes |1\rangle \right), \tag{4.98}$$
$$|\beta_{11}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle - |1\rangle \otimes |0\rangle \right). \tag{4.99}$$
It is useful to adopt a shorthand notation where we write

|𝑥𝑦⟩ ≡ |𝑥⟩ ⊗ |𝑦⟩ , (4.100)

so

|00⟩ ≡ |0⟩ ⊗ |0⟩ , |01⟩ ≡ |0⟩ ⊗ |1⟩ , |10⟩ ≡ |1⟩ ⊗ |0⟩ , |11⟩ ≡ |1⟩ ⊗ |1⟩ . (4.101)

In this notation, the Bell states are

$$|\beta_{00}\rangle \equiv \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right), \tag{4.102}$$
$$|\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle + |10\rangle \right), \tag{4.103}$$
$$|\beta_{10}\rangle \equiv \frac{1}{\sqrt{2}} \left( |00\rangle - |11\rangle \right), \tag{4.104}$$
$$|\beta_{11}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.105}$$

35 EPR stands for Einstein, Podolsky, and Rosen.
The Bell states have important applications in quantum information and compu­
tation, as we will see below.
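The Bell states are also easy to build and probe numerically. The sketch below (assuming NumPy) checks orthonormality and entanglement; it is a complement to problems 4.24 and 4.25, not a substitute for the proofs.

```python
# A sketch: construct the four Bell states with np.kron, then check that
# they are orthonormal and entangled.
import numpy as np

zero = np.array([1, 0])
one = np.array([0, 1])
s = 1 / np.sqrt(2)

bell = [
    s * (np.kron(zero, zero) + np.kron(one, one)),   # |beta_00>
    s * (np.kron(zero, one) + np.kron(one, zero)),   # |beta_01>
    s * (np.kron(zero, zero) - np.kron(one, one)),   # |beta_10>
    s * (np.kron(zero, one) - np.kron(one, zero)),   # |beta_11>
]

# Orthonormality: <beta_a|beta_b> = delta_ab.
gram = np.array([[np.vdot(a, b) for b in bell] for a in bell])
assert np.allclose(gram, np.eye(4))

# Entanglement: reshape the coefficients into the 2x2 matrix alpha and
# check det(alpha) != 0 (the criterion from section 4.3.3).
for b in bell:
    assert not np.isclose(np.linalg.det(b.reshape(2, 2)), 0)
print("all four Bell states are orthonormal and entangled")
```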

Exercise 4.23. Write down the representations of the four Bell states as 4­
vectors in ℂ2 ⊗ ℂ2 ≃ ℂ4 .

Problem 4.24. Prove that the four Bell states form an orthonormal basis for the
composite Hilbert space of two qubits, by showing that they span that space, are
linearly independent, are orthogonal, and are normalized to 1.

Problem 4.25. Prove that each of the four Bell states is entangled.

Exercise 4.26. Write down the four Bell states in terms of |+⟩ and |−⟩, the eigen­
states of 𝜎𝑥 . You may wish to use the shorthand notation |±±⟩ ≡ |±⟩ ⊗ |±⟩.

4.3.5 Entanglement Does Not Transmit Information

Now that we have rigorously defined quantum entanglement, let us debunk the
most common misconception associated with it: that quantum entanglement al­
lows us to transmit information, and in particular, that it allows us to do so faster
than the speed of light (or even instantaneously), in violation of relativity. This
is, in fact, not true.
To illustrate this, imagine the following scenario. Alice and Bob create an entan­
gled pair of qubits, for example in the Bell state

$$|\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle + |10\rangle \right). \tag{4.106}$$

Alice takes the first qubit in the pair, and Bob takes the second qubit. Alice then
stays on Earth, while Bob embarks on a long journey to Alpha Centauri, about
4.4 light years away. When Bob gets there, he measures his qubit. He has a 50%
chance to observe 0 and a 50% chance to observe 1. However, if he observes 0
he knows that Alice will surely observe 1 whenever she measures her qubit, and
if he observes 1 he knows that Alice will surely observe 0, since the qubits must
have opposite values.
So it seems that Bob now knows something about Alice’s qubit that he did not
know before. Furthermore, he knows that instantly – even though Alice is 4.4
light years away, and thus according to relativity, that information should have
taken at least 4.4 years to travel between them. But has any information actually
been transferred between them?

The answer is no! All Bob did was observe a random event. Bob cannot control
which value he observes when he measures the qubit, 0 or 1; he can only observe
it, and randomly get whatever he gets. He gains information about Alice’s qubit,
which is completely random, but he does not receive any specific message from
Alice, nor can he transmit any specific information to Alice by observing his qubit.
In fact, there is a theorem called the no­communication theorem which rigor­
ously proves that no information can be transmitted using quantum entanglement,
whether faster than light or otherwise. Whatever you measure, it must be com­
pletely random. (Unfortunately, the proof of this theorem uses some advanced
tools that we will not learn in this course, so we will not present it here.)
The fact that a measurement of one qubit determines the measurement of another
qubit might seem like it indicates that some information must be transmitted
between the qubits themselves, so that they “know” about each other’s states.
However, there isn’t any actual need to transmit information between the two
entangled qubits in order for them to match their measurements! After all, the
entangled state does not depend on the distance between the qubits, whether in
time or in space; it is simply the combined state of the two qubits, wherever or
whenever they might be.
Consider now the following completely classical scenario. Let’s say I write 0 on
one piece of paper and 1 on another piece of paper. I then put each piece of
paper in a separate sealed envelope, and randomly give one envelope to Alice
and the other to Bob. When Bob gets to Alpha Centauri, he opens his envelope.
If he sees 0 he knows that Alice’s envelope says 1, and if he sees 1 he knows that
Alice’s envelope says 0.
Obviously, this does not allow any information to be transmitted between Alice
and Bob, nor does each envelope need to “know” what’s inside the other envelope
in order for the measurements to match. If Bob sees 0, then the piece of paper
saying 0 was inside the envelope all along, and the piece of paper saying 1 was
inside Alice’s envelope all along – and vice versa. The envelopes are classically
correlated, and nothing weird is going on. What, then, is the difference between
this classical correlation and quantum entanglement? The answer to this question
can be made precise using Bell’s theorem, which we will now formulate.

4.3.6 Bell’s Theorem and Bell’s Inequality

Bell’s theorem proves that the predictions of quantum theory cannot be explained
by theories of local hidden variables, which we first mentioned in section 4.2.4.
These are deterministic theories, where measurements of quantum systems such
as qubits have pre­existing values. For example, if we measured 0, then the
qubit always had the value 0; we could have, in fact, predicted the exact value
0, and not just the probability to measure it (which is what quantum theory can

predict), if we knew the value of a “hidden variable” that quantum theory does
not take into account.
Local hidden variable theories are essentially no different than the envelope sce­
nario described above; the envelope always had the number 0 inside it, and if we
were able to look inside the envelope (at the “hidden variable”) without opening it,
we would have been able to make a deterministic prediction. In this sense, local
hidden variable theories have classical correlation, and Bell’s theorem proves that
quantum entanglement is different, and in a precise sense we will discuss below,
stronger than classical correlation.
Consider the following experiment. I prepare two qubits, and give one to Alice
and another to Bob. Alice can measure one of two different physical observables36
of her qubit, 𝑄 or 𝑅, both having two possible outcomes, +1 or −1. Similarly, Bob
can measure one of two different physical observables of his qubit, 𝑆 or 𝑇 , both
having two possible outcomes, +1 or −1.
We now make two crucial assumptions:

1. Locality: Both Alice and Bob measure their qubits at the same time in dif­
ferent places, so that their measurements cannot possibly disturb or influ­
ence each other without sending information faster than light. This ensures
that the predicted probabilities for Alice’s and Bob’s measurements are com­
pletely independent of each other. This condition puts the “local” in “local
hidden variable theory”.

2. Realism: The values of the physical observables 𝑄, 𝑅, 𝑆, 𝑇 exist indepen­


dently of observation, that is, they have certain definite values 𝑞, 𝑟, 𝑠, 𝑡 which
are already determined before any measurements took place, as in the en­
velope scenario. This condition puts the “hidden variable” in “local hidden
variable theory”.

Together, these two assumptions form the principle of local realism. Classical rela­
tivity definitely satisfies this principle; there are no faster­than­light interactions,
and everything is deterministic. Local hidden variable theories also satisfy this
principle. Non­local hidden variable theories satisfy realism, but not locality.
Now, whatever the values of 𝑞, 𝑟, 𝑠, 𝑡 are, we must always have

𝑟𝑠 + 𝑞𝑠 + 𝑟𝑡 − 𝑞𝑡 = (𝑟 + 𝑞) 𝑠 + (𝑟 − 𝑞) 𝑡 = ±2. (4.107)
36 Alice could take, for example, 𝑄 = 𝜎𝑧 and 𝑅 = 𝜎𝑥 – which is indeed what we will take below.
However, for our purposes, it doesn’t matter what the physical observables being measured actually
are. For that matter, the physical systems don’t need to be qubits, either; it’s just easier to talk
about qubits since they are the simplest non­trivial quantum systems. This scenario is very general,
and does not depend on any specific systems or observables, which is good since we are trying to
capture a general property of quantum theory.

To see that, note that since 𝑟 = ±1 and 𝑞 = ±1, we must either have 𝑟 + 𝑞 = 0 if
they have opposite signs, or 𝑟 − 𝑞 = 0 if they have the same sign. So one of the
terms must always vanish. In the first case we have (𝑟 − 𝑞) 𝑡 = ±2 because 𝑡 = ±1
and in the second case we have (𝑟 + 𝑞) 𝑠 = ±2 because 𝑠 = ±1.
Using this information, we can calculate the expectation value of this expression.
To do that, we assign a probability 𝑝 (𝑞, 𝑟, 𝑠, 𝑡) to each outcome of 𝑞, 𝑟, 𝑠, 𝑡. For
example, we could simply assign a uniform probability distribution, where all
probabilities are equal:
$$p(q, r, s, t) = \frac{1}{16}, \tag{4.108}$$
for any values of 𝑞, 𝑟, 𝑠, 𝑡. However, the probability distribution can be arbitrary.
Even though we don’t know the probabilities in advance, we can nonetheless
still calculate an upper bound on the expectation value:

$$\begin{aligned}
\langle RS + QS + RT - QT \rangle &= \sum_{q,r,s,t \in \{-1,+1\}} p(q,r,s,t)\,(rs + qs + rt - qt) \\
&\leq 2 \sum_{q,r,s,t \in \{-1,+1\}} p(q,r,s,t) \\
&= 2.
\end{aligned}$$

To go to the second line we used the fact that 𝑟𝑠 + 𝑞𝑠 + 𝑟𝑡 − 𝑞𝑡 = ±2, as we proved
in equation (4.107), and thus it is always less than or equal to 2 for any values
of 𝑞, 𝑟, 𝑠, 𝑡. To go to the third line we used the fact that the sum of all possible
probabilities must be 1. Also, since the expectation value function is linear, we
have
⟨𝑅𝑆 + 𝑄𝑆 + 𝑅𝑇 − 𝑄𝑇 ⟩ = ⟨𝑅𝑆⟩ + ⟨𝑄𝑆⟩ + ⟨𝑅𝑇 ⟩ − ⟨𝑄𝑇 ⟩ . (4.109)

We thus obtain the Bell inequality37 :

⟨𝑅𝑆⟩ + ⟨𝑄𝑆⟩ + ⟨𝑅𝑇 ⟩ − ⟨𝑄𝑇 ⟩ ≤ 2. (4.110)

To summarize, we have proven that in any locally realistic theory, the expectation
value considered here must be less than or equal to 2.
Now, let us assume that I prepared the two qubits in the following Bell state:

$$|\beta_{11}\rangle = \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.111}$$

Alice gets the first qubit, and Bob gets the second qubit. We define the observ­
ables 𝑄, 𝑅, 𝑆, 𝑇 in terms of the Pauli matrices. Alice measures the observables

𝑄 = 𝜎𝑧 , 𝑅 = 𝜎𝑥 , (4.112)
37 More precisely, there are many different Bell inequalities, and this specific one is called the
CHSH (Clauser­Horne­Shimony­Holt) inequality.

while Bob measures the observables

$$S = -\frac{1}{\sqrt{2}} \left( \sigma_x + \sigma_z \right), \qquad T = -\frac{1}{\sqrt{2}} \left( \sigma_x - \sigma_z \right). \tag{4.113}$$

In exercise 4.27 you will prove that

$$\langle RS \rangle = \langle QS \rangle = \langle RT \rangle = \frac{1}{\sqrt{2}}, \qquad \langle QT \rangle = -\frac{1}{\sqrt{2}}, \tag{4.114}$$

where we used the shorthand notation 𝑅𝑆 ≡ 𝑅 ⊗ 𝑆 and so on, and the expectation
values are calculated with respect to the state |𝛽11 ⟩. We thus get:

$$\langle RS \rangle + \langle QS \rangle + \langle RT \rangle - \langle QT \rangle = 2\sqrt{2} \approx 2.8, \tag{4.115}$$

which violates the Bell inequality (4.110)!


Importantly, this is not just a theoretical result; many different experiments have
verified that the Bell inequality is indeed violated in nature. This means that our
assumptions, either locality or realism (or both), must be incorrect. In particular,
it also means that quantum entanglement is stronger than classical correlation,
which is locally realistic – since with classical correlation, the best you can do for
the expectation value considered here is 2, but quantum entanglement allows you
to get a larger expectation value of 2√2.
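The CHSH violation can be reproduced numerically in a few lines. Here is a sketch, assuming NumPy, using the state and observables defined above:

```python
# A sketch: build |beta_11>, the four CHSH observables, and the expectation
# values from equation (4.114).
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
s = 1 / np.sqrt(2)

beta11 = s * np.array([0, 1, -1, 0], dtype=complex)   # (|01> - |10>)/sqrt(2)

Q, R = sz, sx
S = -s * (sx + sz)
T = -s * (sx - sz)

def ev(a, b):
    """Expectation value <beta_11| a tensor b |beta_11>."""
    return np.vdot(beta11, np.kron(a, b) @ beta11).real

chsh = ev(R, S) + ev(Q, S) + ev(R, T) - ev(Q, T)
print(chsh)  # approximately 2.828 = 2*sqrt(2), violating the bound of 2
```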
This pretty much rules out any local hidden variable theory. Instead, we should
consider the following options:

1. Locality is an incorrect assumption38, but realism is correct. This is the
essence of non­local hidden variable theories, such as de Broglie–Bohm the­
ory, which we briefly discussed in section 4.2.4 – where the state of each
particle depends on the states of every other particle in the universe! How­
ever, most physicists don’t like these theories, since they are complicated
and contrived, and lack the simplicity, elegance, and universality of quantum
theory.

2. Realism is an incorrect assumption, but locality is correct. This is the option
that most physicists prefer, even though it is less intuitive and contradicts
our experience with the classical world. Surely, if you open the fridge to get
an apple, the apple has always been there, even before you observed it; but
the same does not have to be true for observing a qubit.
38 That would be the “spooky action at a distance” you hear about all the time. However, note that
even if locality is violated, this still does not necessarily mean faster­than­light communication is
possible. As we discussed in the previous section, communication between two people requires a
form of non­locality that is controllable, so that Bob can choose which state he measures, and
by doing that, send a message to Alice, which she will discover when she measures her qubit. Thus
a theory can be non­local while still violating neither the no­communication theorem nor relativity.

Another important lesson of Bell’s theorem is that there is something fundamen­
tally profound and powerful about quantum entanglement, which classical corre­
lation does not have. This property of quantum entanglement is exactly what
makes quantum computers more powerful than classical computers, as we will
see below. It also has some other interesting applications, such as quantum tele­
portation, which we will discuss later, and quantum cryptography, which we will
not discuss here (unless we have time at the end of the course).

Exercise 4.27. Prove equation (4.114) by explicitly calculating the expectation
values of the given operators with respect to the state |𝛽11 ⟩.

Problem 4.28. Consider two qubits in the composite state

$$|\beta_{11}\rangle = \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.116}$$

Since |0⟩ and |1⟩ are the eigenstates of the observable 𝑆𝑧 corresponding to positive
and negative spin respectively along the 𝑧 direction (recall section 4.2.2), it is easy
to see that a measurement of spin along the 𝑧 direction will always yield opposite
spins for the qubits: if one qubit has positive spin in the 𝑧 direction (i.e. |0⟩), then
the other qubit must have negative spin in the 𝑧 direction (i.e. |1⟩). This state is
historically known as a spin singlet.
Now, let v ∈ ℝ³ be a unit vector pointing in some direction in space (the real space,
not the Hilbert space!). Then the observable 𝑆v defined in equation (4.33) cor­
responds to a measurement of spin along the direction of v. Prove that if the
system is in the state |𝛽11 ⟩, then the measurement of spin along any direction
v will always yield opposite spins for the qubits: if one qubit has positive spin
along the direction v, then the other must have negative spin along the same
direction v.
This is remarkable, since it means if Alice measures her qubit on Earth and Bob
measures his qubit on Alpha Centauri at the same time, and both of them mea­
sure spin along the same direction, then somehow both qubits must “know” to
have opposite spins along this direction, no matter which direction Alice and Bob
choose!
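The perfect anti-correlation in problem 4.28 can also be illustrated numerically. The sketch below (assuming NumPy) checks that ⟨𝑆v ⊗ 𝑆v⟩ = −1/4 in the singlet state for a randomly chosen direction, which is the value obtained when the two measured spins (each ±1/2) are always opposite; this is a check, not the proof the problem asks for.

```python
# A sketch: for the singlet state, spins along any common direction v are
# perfectly anti-correlated, so <S_v tensor S_v> = -1/4.
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=3)
x, y, z = v / np.linalg.norm(v)                  # random unit vector

S_v = 0.5 * np.array([[z, x - 1j * y],
                      [x + 1j * y, -z]])

beta11 = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
corr = np.vdot(beta11, np.kron(S_v, S_v) @ beta11).real
print(corr)  # -0.25, independently of the direction v
```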

4.4 Non­Commuting Observables and the Uncertainty Principle

4.4.1 Commuting and Non­Commuting Observables

In problem 4.13 we defined the commutator of two operators:

[𝐴, 𝐵] ≡ 𝐴𝐵 − 𝐵𝐴. (4.117)

If the operators commute, then 𝐴𝐵 = 𝐵𝐴 and thus the commutator vanishes:
[𝐴, 𝐵] = 0. Otherwise, 𝐴𝐵 ≠ 𝐵𝐴 and the commutator is non­zero: [𝐴, 𝐵] ≠ 0. The
commutator thus tells us if the operators commute or not. Note that any operator
commutes with itself: [𝐴, 𝐴] = 0 for any 𝐴.
Problem 4.29. Prove that the commutator is anti­symmetric:

[𝐵, 𝐴] = − [𝐴, 𝐵] . (4.118)

Problem 4.30. Prove that the commutator is linear:

[𝐴 + 𝐵, 𝐶] = [𝐴, 𝐶] + [𝐵, 𝐶] , (4.119)

[𝐴, 𝐵 + 𝐶] = [𝐴, 𝐵] + [𝐴, 𝐶] . (4.120)


Problem 4.31. Prove that

[𝐴, 𝐵]† = [𝐵†, 𝐴†] . (4.121)
Problem 4.32. Prove the useful identities

[𝐴, 𝐵𝐶] = [𝐴, 𝐵] 𝐶 + 𝐵 [𝐴, 𝐶] , (4.122)

[𝐴𝐵, 𝐶] = 𝐴 [𝐵, 𝐶] + [𝐴, 𝐶] 𝐵. (4.123)


Problem 4.33. Prove the Jacobi identity:

[𝐴, [𝐵, 𝐶]] + [𝐵, [𝐶, 𝐴]] + [𝐶, [𝐴, 𝐵]] = 0. (4.124)

4.4.2 The Uncertainty Principle

When two quantum observables do not commute, we get an uncertainty relation. Uncertainty is just another name for standard deviation, which we defined in section 3.3.4. The most well-known such relation is the position-momentum uncertainty relation39 :

Δ𝑥Δ𝑝 ≥ 1/2 . (4.125)
Here, 𝑥 and 𝑝 are two Hermitian operators, corresponding to the observables
position and momentum respectively. This inequality means that the product of the uncertainty in position Δ𝑥 and the uncertainty in momentum Δ𝑝 cannot go below 1/2; it follows that neither Δ𝑥 nor Δ𝑝 can be zero, and the smaller one of them is, the larger the other must be, so we can never know both the position and the momentum with arbitrarily high certainty.
Let us prove this relation for the general case of any two observables represented
by Hermitian operators, 𝐴 and 𝐵, which do not commute:

[𝐴, 𝐵] ≠ 0. (4.126)
39 Recall that we are using units where ℏ = 1!

Recall that the (square of the) standard deviation Δ𝐴 of 𝐴 is given by
(Δ𝐴)² = ⟨(𝐴 − ⟨𝐴⟩)²⟩ . (4.127)

We have seen that expectation values in quantum theory are calculated using the
inner product “sandwich”
⟨𝐴⟩ = ⟨Ψ|𝐴|Ψ⟩, (4.128)

where |Ψ⟩ is the state with respect to which the expectation value is calculated.
The (square of the) standard deviation is thus
(Δ𝐴)² = ⟨Ψ| (𝐴 − ⟨𝐴⟩)² |Ψ⟩
      = ⟨Ψ| (𝐴 − ⟨𝐴⟩) (𝐴 − ⟨𝐴⟩) |Ψ⟩.

Let us now define a new vector:

|𝑎⟩ = (𝐴 − ⟨𝐴⟩) |Ψ⟩. (4.129)

Then we simply have40

(Δ𝐴)² = ⟨𝑎|𝑎⟩ = ‖𝑎‖² . (4.130)

Similarly, for 𝐵 we define


|𝑏⟩ = (𝐵 − ⟨𝐵⟩) |Ψ⟩, (4.131)

and get

(Δ𝐵)² = ⟨𝑏|𝑏⟩ = ‖𝑏‖² . (4.132)

The product of the (squares of the) standard deviations in 𝐴 and 𝐵 is thus

(Δ𝐴)²(Δ𝐵)² = ‖𝑎‖²‖𝑏‖² . (4.133)

Using the Cauchy­Schwarz inequality (3.152), we have


(Δ𝐴)²(Δ𝐵)² = ‖𝑎‖²‖𝑏‖²
           ≥ |⟨𝑎|𝑏⟩|²
       (∗) = (Re⟨𝑎|𝑏⟩)² + (Im⟨𝑎|𝑏⟩)²
      (∗∗) ≥ (Im⟨𝑎|𝑏⟩)²
     (∗∗∗) = ((⟨𝑎|𝑏⟩ − ⟨𝑏|𝑎⟩) / 2𝑖)² ,

where in (∗) we used equation (3.17), in (∗∗) we used the fact that (Re⟨𝑎|𝑏⟩)² ≥ 0, since it is the square of a real number, and in (∗∗∗) we used equation (3.14) and the fact that ⟨𝑏|𝑎⟩ = ⟨𝑎|𝑏⟩∗.

40 𝐴 − ⟨𝐴⟩ is the operator 𝐴 minus the real number ⟨𝐴⟩ times the identity operator 1 (the identity operator is implied). Thus 𝐴 − ⟨𝐴⟩ is Hermitian, and the bra of (𝐴 − ⟨𝐴⟩) |Ψ⟩ is ⟨Ψ| (𝐴 − ⟨𝐴⟩).
Next, we note that

⟨𝑎|𝑏⟩ = ⟨Ψ| (𝐴 − ⟨𝐴⟩) (𝐵 − ⟨𝐵⟩) |Ψ⟩


= ⟨(𝐴 − ⟨𝐴⟩) (𝐵 − ⟨𝐵⟩)⟩
= ⟨𝐴𝐵 − 𝐴⟨𝐵⟩ − ⟨𝐴⟩𝐵 + ⟨𝐴⟩⟨𝐵⟩⟩
= ⟨𝐴𝐵⟩ − ⟨𝐴⟨𝐵⟩⟩ − ⟨⟨𝐴⟩𝐵⟩ + ⟨⟨𝐴⟩⟨𝐵⟩⟩
= ⟨𝐴𝐵⟩ − ⟨𝐴⟩⟨𝐵⟩ − ⟨𝐴⟩⟨𝐵⟩ + ⟨𝐴⟩⟨𝐵⟩
= ⟨𝐴𝐵⟩ − ⟨𝐴⟩⟨𝐵⟩,

where we used the linearity of the expected value, equation (3.176). Similarly,

⟨𝑏|𝑎⟩ = ⟨𝐵𝐴⟩ − ⟨𝐴⟩⟨𝐵⟩. (4.134)

Thus

⟨𝑎|𝑏⟩ − ⟨𝑏|𝑎⟩ = (⟨𝐴𝐵⟩ − ⟨𝐴⟩⟨𝐵⟩) − (⟨𝐵𝐴⟩ − ⟨𝐴⟩⟨𝐵⟩)


= ⟨𝐴𝐵⟩ − ⟨𝐵𝐴⟩
= ⟨[𝐴, 𝐵]⟩,

and so we get

(Δ𝐴)²(Δ𝐵)² ≥ ⟨(1/2𝑖) [𝐴, 𝐵]⟩² . (4.135)
Now, by definition, Δ𝐴 and Δ𝐵 are real and non-negative. If ⟨(1/2𝑖) [𝐴, 𝐵]⟩ is also real, we can take the square root of both sides (but we must add an absolute value, because the expectation value itself could be negative):

Δ𝐴Δ𝐵 ≥ (1/2) |⟨[𝐴, 𝐵]⟩| . (4.136)
You will show in problem 4.34 that it is indeed always real. Note that the un­
certainty relation we found still depends on the choice of state |Ψ⟩ with which to
calculate the expected values and standard deviations, but sometimes, as in the
position­momentum uncertainty relation, the same relation applies to all states.
As we will explain in more detail later, when we discuss continuous systems, the
operators 𝑥 and 𝑝 have the commutator

[𝑥, 𝑝] = i . (4.137)

By plugging this commutator into the uncertainty relation (4.136), we indeed get
the familiar result

Δ𝑥Δ𝑝 ≥ 1/2 . (4.138)
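For a concrete feel for the general relation (4.136), here is a small numerical sanity check sketched in numpy; the dimension, seed, and helper names are arbitrary choices of mine. For any random Hermitian 𝐴 and 𝐵 and any random normalized state, the inequality should always hold.

import numpy as np

rng = np.random.default_rng(42)
n = 4

def random_hermitian(n):
    # (M + M^dagger)/2 is Hermitian for any square M
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(n), random_hermitian(n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)               # normalize the state

def expval(O, psi):
    # <Psi|O|Psi>, real for Hermitian O
    return (psi.conj() @ O @ psi).real

def std(O, psi):
    # Delta O = sqrt(<O^2> - <O>^2)
    return np.sqrt(expval(O @ O, psi) - expval(O, psi) ** 2)

comm_ev = psi.conj() @ (A @ B - B @ A) @ psi          # <[A, B]>, purely imaginary
print(std(A, psi) * std(B, psi) >= abs(comm_ev) / 2)  # prints True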

Problem 4.34. Inequalities are only defined for real numbers, not complex numbers. Let us prove that if 𝐴 and 𝐵 are Hermitian, then ⟨[𝐴, 𝐵]⟩ must always be an imaginary number, and thus ⟨(1/2𝑖) [𝐴, 𝐵]⟩ is always real, so the inequality we found is well-defined.
An anti­Hermitian operator 𝑂 is an operator which satisfies

𝑂† = −𝑂. (4.139)

Just as a Hermitian operator is the matrix analogue of a real number, an anti-Hermitian operator is the matrix analogue of an imaginary number.
A. Prove that the eigenvalues of an anti­Hermitian operator are all purely imagi­
nary, as defined in problem 3.2.
B. Prove that an anti­Hermitian operator is normal, and thus it has an orthonormal
eigenbasis (see section 3.2.14).
C. Prove that if 𝐴 and 𝐵 are Hermitian, then [𝐴, 𝐵] must be anti­Hermitian.
D. Prove that if [𝐴, 𝐵] is anti­Hermitian, then the expectation value ⟨[𝐴, 𝐵]⟩Ψ is
imaginary for any state |Ψ⟩.
Exercise 4.35. Calculate the uncertainty relation for 𝜎𝑥 and 𝜎𝑦 given an arbitrary
qubit:
|Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |𝑎|² + |𝑏|² = 1. (4.140)

That is, find the right­hand side of

Δ𝜎𝑥 Δ𝜎𝑦 ≥ (?) . (4.141)

Comment on the consequences of the relation you found for choices of different
states, that is, different values of 𝑎 and 𝑏.

4.4.3 Simultaneous Diagonalization

Why is there uncertainty when two observables don't commute? Some insight may be gained from the fact that two Hermitian operators are simultaneously diagonalizable if and only if they commute41 .
Recall that in section 3.2.16 we proved that for any Hermitian matrix42 𝐴 there
exists a unitary matrix 𝑃 such that

𝑃 † 𝐴𝑃 = 𝐷, (4.142)
41 This is a special case of a more general theorem: a set of diagonalizable matrices commute if and only if they are simultaneously diagonalizable. Of course, here we are dealing specifically with Hermitian matrices, and such matrices are always diagonalizable; furthermore, for our purposes it is enough to talk about two matrices rather than a larger set.
42 Or more generally, for any normal matrix, which satisfies 𝐴†𝐴 = 𝐴𝐴†. As we mentioned before, both Hermitian and unitary matrices are special cases of normal matrices.

where 𝐷 is a diagonal matrix. Furthermore, the elements on the diagonal are
none other than the eigenvalues of 𝐴. This is called diagonalizing the matrix 𝐴.
Now, let 𝐴1 and 𝐴2 be two Hermitian matrices. We say that 𝐴1 and 𝐴2 are si­
multaneously diagonalizable if both matrices are diagonalizable using the same
unitary matrix 𝑃 :
𝑃 † 𝐴1 𝑃 = 𝐷 1 , 𝑃 † 𝐴2 𝑃 = 𝐷 2 , (4.143)

where 𝐷1 and 𝐷2 are two diagonal matrices.


If 𝐴1 and 𝐴2 are simultaneously diagonalizable, we can invert equation (4.143)
(by multiplying both sides by 𝑃 from the left and 𝑃 † from the right) to find:

𝐴1 = 𝑃 𝐷 1 𝑃 † , 𝐴2 = 𝑃 𝐷2 𝑃 † . (4.144)

Then the commutator of the two matrices is

[𝐴1 , 𝐴2 ] ≡ 𝐴1 𝐴2 − 𝐴2 𝐴1
= (𝑃 𝐷1 𝑃 † ) (𝑃 𝐷2 𝑃 † ) − (𝑃 𝐷2 𝑃 † ) (𝑃 𝐷1 𝑃 † )
= 𝑃 𝐷1 (𝑃 † 𝑃 ) 𝐷2 𝑃 † − 𝑃 𝐷2 (𝑃 † 𝑃 ) 𝐷1 𝑃 †
= 𝑃 𝐷1 𝐷2 𝑃 † − 𝑃 𝐷2 𝐷1 𝑃 †
= 𝑃 (𝐷1 𝐷2 − 𝐷2 𝐷1 ) 𝑃 †
= 𝑃 [𝐷1 , 𝐷2 ] 𝑃 † .

However, any two diagonal matrices commute with each other. Indeed, if

𝐷1 ≡ ( 𝜆1  0   0  ) ,   𝐷2 ≡ ( 𝜇1  0   0  ) ,    (4.145)
     ( 0   ⋱   0  )          ( 0   ⋱   0  )
     ( 0   0   𝜆𝑛 )          ( 0   0   𝜇𝑛 )

then it is easy to see that

𝐷1 𝐷2 = 𝐷2 𝐷1 = ( 𝜆1 𝜇1  0   0     ) .    (4.146)
                ( 0      ⋱   0     )
                ( 0      0   𝜆𝑛 𝜇𝑛 )

Therefore [𝐷1 , 𝐷2 ] = 0, and we conclude that 𝐴1 and 𝐴2 commute:

[𝐴1 , 𝐴2 ] = 0. (4.147)

It is possible to prove the opposite direction as well: if 𝐴1 and 𝐴2 commute, then they are simultaneously diagonalizable. However, we won't do this here.
So what does this mean? Let 𝐴1 and 𝐴2 be two commuting observables, rep­
resented by Hermitian operators. Then they are simultaneously diagonalizable.
Now, remember that in section 3.2.16 we said that the unitary matrix 𝑃 , which

in this case diagonalizes both matrices, has for its columns an orthonormal eigen­
basis |𝐵𝑖 ⟩:
𝑃 = ( |𝐵1 ⟩ ⋯ |𝐵𝑛 ⟩ ) . (4.148)

By inspecting equation (4.143) and equation (4.145), we see that the basis states
|𝐵𝑖 ⟩ are eigenstates of both 𝐴1 and 𝐴2 , with the eigenvalues:

𝐴1 |𝐵𝑖 ⟩ = 𝜆𝑖 |𝐵𝑖 ⟩ , 𝐴2 |𝐵𝑖 ⟩ = 𝜇𝑖 |𝐵𝑖 ⟩ . (4.149)

This means that the eigenstates |𝐵𝑖 ⟩ are states where the system simultane­
ously has the exact value 𝜆𝑖 for the observable 𝐴1 and the exact value 𝜇𝑖 for the
observable 𝐴2 .
Conversely, since this is an if­and­only­if relationship, if 𝐴1 and 𝐴2 don’t commute,
then one cannot find a basis of eigenstates of both observables simultaneously
(since if we found such a basis, then they would be simultaneously diagonalizable,
in contradiction). This is essentially where the uncertainty principle comes from:
if 𝐴1 and 𝐴2 don’t commute and the system is in an eigenstate of 𝐴1 , then in
general it can’t also be in an eigenstate of 𝐴2 . This means it must instead be in a
superposition of eigenstates of 𝐴2 , so there are many different possible values
for the measurement of 𝐴2 with different probabilities. So being certain of the
value of 𝐴1 means being necessarily uncertain of the exact value of 𝐴2 .
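Here is a short numerical illustration of the "commuting" direction, sketched in numpy with an arbitrary dimension and seed: building two Hermitian matrices from the same orthonormal eigenbasis (the columns of a random unitary 𝑃) but different real eigenvalues guarantees that they commute, and the same 𝑃 diagonalizes both.

import numpy as np

rng = np.random.default_rng(1)
n = 3

# A random unitary via QR decomposition; its columns form an orthonormal basis.
P, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

D1 = np.diag(rng.normal(size=n))    # diagonal matrix of eigenvalues lambda_i
D2 = np.diag(rng.normal(size=n))    # diagonal matrix of eigenvalues mu_i

A1 = P @ D1 @ P.conj().T            # Hermitian by construction
A2 = P @ D2 @ P.conj().T

print(np.allclose(A1 @ A2, A2 @ A1))         # True: [A1, A2] = 0
print(np.allclose(P.conj().T @ A1 @ P, D1))  # True: P diagonalizes A1
print(np.allclose(P.conj().T @ A2 @ P, D2))  # True: the same P diagonalizes A2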

Exercise 4.36.
A. Show that the following Hermitian operator commutes with the Pauli operator
𝜎𝑥 :
𝐴 ≡ ( 1   −3 ) ,   𝜎𝑥 ≡ ( 0  1 ) .    (4.150)
    ( −3  1  )          ( 1  0 )

Therefore, they are simultaneously diagonalizable.


B. Show that the eigenstates of 𝜎𝑥 (see section 4.2.1) are also eigenstates of 𝐴,
and find their eigenvalues.
C. Find a unitary matrix 𝑃 which diagonalizes both 𝐴 and 𝜎𝑥 , and find the resulting
diagonal matrices.

4.5 Dynamics, Transformations, and Measurements

4.5.1 Unitary Transformations and Evolution

We have covered almost all of the basic properties of quantum theory. How­
ever, notice that so far we only talked about quantum systems that are in one
given state, and never change. In real life, physical systems change all the time,
whether it’s because some transformation was explicitly done to the system, or

simply because time has passed. To account for that in the mathematical frame­
work of quantum theory, let us introduce a new axiom:
The Evolution Axiom: If the system is in the state |Ψ1 ⟩ at some point in time,
and in another state |Ψ2 ⟩ at another point in time, then the two states must be
related by the action of some unitary operator 𝑈 :

|Ψ2 ⟩ = 𝑈 |Ψ1 ⟩ . (4.151)

This is called unitary evolution or unitary transformation.


The exact form of 𝑈 is determined by the specific quantum system in question and
the specific transformation performed, for example rotating the system, moving
it in space, or letting it “move itself” in time (i.e. just waiting for time to pass).
All the Evolution Axiom tells us is that 𝑈 must be a unitary operator – just like the
Observable Axiom tells us that an observable must be represented by a Hermitian
operator, but the exact form of the Hermitian operator depends on the specific
system and the specific observable.
Now, in section 3.2.13 we proved that unitary operators preserve the inner prod­
uct between two states. This means that if we have two states |Ψ1 ⟩ and |Φ1 ⟩ at one
time, and they evolve to |Ψ2 ⟩ = 𝑈 |Ψ1 ⟩ and |Φ2 ⟩ = 𝑈 |Φ1 ⟩ at another time, then the
inner product of the new states ⟨Ψ2 |Φ2 ⟩ is equal to the inner product of the old
states ⟨Ψ1 |Φ1 ⟩, because 𝑈 † 𝑈 = 1:

⟨Ψ2 |Φ2 ⟩ = (⟨Ψ1 |𝑈 † ) (𝑈 |Φ1 ⟩) = ⟨Ψ1 |𝑈 † 𝑈 |Φ1 ⟩ = ⟨Ψ1 |Φ1 ⟩. (4.152)

Therefore, probability amplitudes are preserved by unitary evolution.


As a corollary, unitary evolution also preserves the norm of a vector – that is, |Ψ⟩
and 𝑈 |Ψ⟩ have the same norm:

‖𝑈 Ψ‖ = √⟨Ψ|𝑈 † 𝑈 |Ψ⟩ = √⟨Ψ|Ψ⟩ = ‖Ψ‖ . (4.153)

This has to be the case, since quantum states must have norm 1! So if we
start with a properly normalized quantum state, we end up with another properly
normalized quantum state.
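These properties are easy to check numerically; here is a sketch in numpy (the random unitary comes from a QR decomposition, an arbitrary but convenient choice) confirming that a unitary preserves both inner products and norms.

import numpy as np

rng = np.random.default_rng(7)
n = 4

# Random unitary: the Q factor of a QR decomposition satisfies Q^dagger Q = 1
U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
phi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
phi /= np.linalg.norm(phi)

# <Psi_2|Phi_2> = <Psi_1|Phi_1> since U^dagger U = 1
print(np.allclose((U @ psi).conj() @ (U @ phi), psi.conj() @ phi))  # True
# ||U Psi|| = ||Psi|| = 1
print(np.isclose(np.linalg.norm(U @ psi), 1.0))                     # True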
Furthermore, recall that probabilities must sum to one. This means that, for an
orthonormal eigenbasis |𝐵𝑖 ⟩, we must have
∑_{𝑖=1}^{𝑛} |⟨𝐵𝑖|Ψ⟩|² = 1, (4.154)

as we indeed proved in section 4.1.4. Again, since each of the probability am­
plitudes ⟨𝐵𝑖 |Ψ⟩ is preserved by unitary evolution, we are guaranteed that the
probabilities still sum to 1 after the states have evolved.

Lastly, observe that since any unitary operator is invertible (with the inverse of
𝑈 being 𝑈 −1 = 𝑈 † ), any unitary transformation has an inverse transformation.
This means that unitary evolution is always reversible, and therefore quantum
mechanics has time­reversal symmetry: it works exactly the same forwards in
time and backwards in time.
If at time 𝑡1 the system is in the state |Ψ1 ⟩ and at time 𝑡2 > 𝑡1 the system is in
the state |Ψ2 ⟩, then they are either related by |Ψ2 ⟩ = 𝑈 |Ψ1 ⟩, evolving forward in
time, or |Ψ1 ⟩ = 𝑈 † |Ψ2 ⟩ for the same 𝑈 , evolving backwards in time. As far as
quantum mechanics is concerned, there is no distinction between the future and
the past, and everything works the same if we take 𝑡 ↦ −𝑡 so that 𝑡2 < 𝑡1 , as long
as we also replace every unitary evolution operator by its adjoint.

Exercise 4.37. The system was previously in the state

|Ψ1⟩ = (1/√5) ( 1  ) .    (4.155)
              ( 2i )

Now, it is in the state

|Ψ2⟩ = (1/√5) ( 2i ) .    (4.156)
              ( −1 )

Which unitary operator 𝑈 was responsible for this evolution (such that |Ψ2 ⟩ =
𝑈 |Ψ1 ⟩)? What will be the state of the system after the same amount of time has
passed again (i.e. after another evolution with 𝑈 )?

4.5.2 Quantum Logic Gates

In a classical computer, bits are manipulated using logic gates. In logic terms,
these gates treat 0 as “false” and 1 as “true”. Let us list some examples of logic
gates.
NOT gets a single bit as input, and outputs 1 minus that bit. In logic terms, it
outputs “true” if it gets “false” and vice versa, so the output is the negation of
the input:

Input NOT
0 1
1 0

AND gets two bits as input, and outputs 1 if both bits are 1, otherwise it outputs
0. In logic terms, it outputs “true” only if both bit A and bit B are “true”:

Input A Input B AND
0 0 0
0 1 0
1 0 0
1 1 1

OR gets two bits as input, and outputs 1 if at least one of the bits is 1, otherwise
it outputs 0. In logic terms, it outputs “true” if either bit A or bit B or both are
“true”:
Input A Input B OR
0 0 0
0 1 1
1 0 1
1 1 1

XOR (eXclusive OR, pronounced “ex or”) gets two bits as input, and outputs 1
if exactly one of the bits is 1, otherwise it outputs 0. In logic terms, it outputs
“true” if either bit A or bit B, but not both, are “true”:

Input A Input B XOR
0 0 0
0 1 1
1 0 1
1 1 0

In quantum computers we have qubits instead of classical bits, and thus we must
use quantum logic gates, or quantum gates for short. Since they transform qubits
from one state to the other, quantum gates must take the form of unitary opera­
tors, by the Evolution Axiom.
As a simple example, let us define the quantum NOT gate, which flips |0⟩ ↔ |1⟩,
just like a classical NOT gate flips 0 ↔ 1. This gate is none other than the Pauli
matrix 𝜎𝑥 , which is of course unitary:

NOT ≡ 𝑋 ≡ 𝜎𝑥 = ( 0  1 ) .    (4.157)
               ( 1  0 )

(The notation 𝑋 for the NOT gate is common in quantum computing.) Indeed, we
have
NOT |0⟩ = ( 0  1 ) ( 1 ) = ( 0 ) = |1⟩ ,    (4.158)
          ( 1  0 ) ( 0 )   ( 1 )

NOT |1⟩ = ( 0  1 ) ( 0 ) = ( 1 ) = |0⟩ .    (4.159)
          ( 1  0 ) ( 1 )   ( 0 )

Since unitary transformations are linear, this means that for a general qubit state
we have
NOT (𝑎 |0⟩ + 𝑏 |1⟩) = 𝑎 |1⟩ + 𝑏 |0⟩ , (4.160)
where of course |𝑎|² + |𝑏|² = 1.
In classical computers there is only one non­trivial single­bit gate, the NOT gate;
the two other options would be the gate 0 ↦ 0, 1 ↦ 0 and the gate 0 ↦ 1, 1 ↦ 1,
which are trivial gates since their output is fixed and does not depend on the
input. However, in quantum computers, since qubits are in a superposition of |0⟩
and |1⟩, we have more options; in fact, we have an infinite number of possible
single­qubit gates, since any unitary operator can be a single­qubit gate.
One example of a useful quantum gate is the 𝑍 gate, which is just the Pauli matrix
𝜎𝑧 :
𝑍 ≡ 𝜎𝑧 = ( 1  0  ) ,    (4.161)
         ( 0  −1 )

and has the action

𝑍 |0⟩ = ( 1  0  ) ( 1 ) = ( 1 ) = |0⟩ ,    (4.162)
        ( 0  −1 ) ( 0 )   ( 0 )

𝑍 |1⟩ = ( 1  0  ) ( 0 ) = ( 0  ) = − |1⟩ ,    (4.163)
        ( 0  −1 ) ( 1 )   ( −1 )

so it leaves |0⟩ unchanged but flips the phase of |1⟩.


Another example is the Hadamard gate:

𝐻 ≡ (1/√2) ( 1  1  ) ,    (4.164)
           ( 1  −1 )

which turns |0⟩ and |1⟩ (the eigenstates of 𝜎𝑧 ) into |+⟩ and |−⟩ respectively (the
eigenstates of 𝜎𝑥 ):

𝐻 |0⟩ = (1/√2) ( 1  1  ) ( 1 ) = (1/√2) (|0⟩ + |1⟩) = |+⟩ ,    (4.165)
               ( 1  −1 ) ( 0 )

𝐻 |1⟩ = (1/√2) ( 1  1  ) ( 0 ) = (1/√2) (|0⟩ − |1⟩) = |−⟩ .    (4.166)
               ( 1  −1 ) ( 1 )
What about two­qubit gates? Notice that classical two­bit gates such as AND, OR,
and XOR are irreversible, since if we are given the single output bit of any of
these gates, we cannot in general reconstruct the two input bits. For example,
if AND outputs 0, then the inputs could have been any of 00, 01, or 10. In
contrast, quantum gates must be represented by unitary operators, and as we

saw in section 4.5.1, unitary transformations are reversible. Thus we cannot
use AND, OR, XOR, and other irreversible logic gates in quantum computing.
We can, however, define other two­qubit quantum gates. A very useful example
is the controlled­NOT or CNOT gate. Here, the first qubit controls whether the
second qubit gets flipped or not. If the first qubit is |0⟩, then the second qubit is
unchanged; if the first qubit is |1⟩, then the second qubit is flipped |0⟩ ↔ |1⟩. So,
given an input state of two qubits, we have:

CNOT |0⟩ ⊗ |0⟩ = |0⟩ ⊗ |0⟩ , (4.167)

CNOT |0⟩ ⊗ |1⟩ = |0⟩ ⊗ |1⟩ , (4.168)

CNOT |1⟩ ⊗ |0⟩ = |1⟩ ⊗ |1⟩ , (4.169)

CNOT |1⟩ ⊗ |1⟩ = |1⟩ ⊗ |0⟩ . (4.170)

As you will verify in exercise 4.38, the CNOT gate can be represented by the
unitary matrix
       ( 1  0  0  0 )
CNOT = ( 0  1  0  0 ) .    (4.171)
       ( 0  0  0  1 )
       ( 0  0  1  0 )
Alternatively, as you will verify in exercise 4.39, the CNOT gate can be represented
by a tensor product of outer products:

CNOT = |0⟩ ⟨0| ⊗ ( |0⟩ ⟨0| + |1⟩ ⟨1|) + |1⟩ ⟨1| ⊗ ( |0⟩ ⟨1| + |1⟩ ⟨0|) . (4.172)
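The following numpy sketch checks numerically that the two representations agree; note that it overlaps with exercises 4.38 and 4.39 below, so you may want to do those by hand first.

import numpy as np

ket0 = np.array([[1], [0]], dtype=complex)        # column vectors, so that
ket1 = np.array([[0], [1]], dtype=complex)        # ket @ bra gives an outer product
P0 = ket0 @ ket0.conj().T                         # |0><0|
P1 = ket1 @ ket1.conj().T                         # |1><1|
flip = np.array([[0, 1], [1, 0]], dtype=complex)  # |0><1| + |1><0|

CNOT_outer = np.kron(P0, np.eye(2)) + np.kron(P1, flip)  # equation (4.172)

CNOT_matrix = np.array([[1, 0, 0, 0],
                        [0, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 1, 0]], dtype=complex)    # equation (4.171)

print(np.allclose(CNOT_outer, CNOT_matrix))  # True: the two representations agree

# Sample action: CNOT (|1> tensor |0>) = |1> tensor |1>
in_state = np.kron(ket1, ket0).ravel()
out_state = np.kron(ket1, ket1).ravel()
print(np.allclose(CNOT_matrix @ in_state, out_state))  # True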

Exercise 4.38. Verify that the matrix definition of the CNOT operator given in
equation (4.171) indeed has the action described in equations (4.167), (4.168),
(4.169), and (4.170).
Exercise 4.39. Verify that the CNOT operator has the outer product representa­
tion
CNOT = |0⟩ ⟨0| ⊗ ( |0⟩ ⟨0| + |1⟩ ⟨1|) + |1⟩ ⟨1| ⊗ ( |0⟩ ⟨1| + |1⟩ ⟨0|) . (4.173)

You can either do so by explicitly calculating the matrix representations of the outer products and tensor products and adding them up to get the matrix in
equation (4.171), or by showing that this operator has the required action on the
two­qubit basis states.
Exercise 4.40. Show that the Hadamard gate turns |+⟩ back into |0⟩ and |−⟩ back
into |1⟩. Note: You don’t actually have to do an explicit calculation, you can simply
use a certain property of the matrix 𝐻 itself.
Problem 4.41. Find an outer product representation for the Hadamard operator
(4.164).

Problem 4.42. Show how you can generate each of the four entangled Bell states
by acting on the separable state |0⟩ ⊗ |0⟩ with various quantum gates. This means
that quantum gates can be used to generate entanglement if it’s not already
there.

4.5.3 The Measurement Axiom (Projective)

In section 4.1.4, we formulated the Probability Axiom: if the system is in the state |Ψ⟩, then the probability to measure the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖⟩ of an observable is given by |⟨𝐵𝑖|Ψ⟩|². This axiom was good enough at the time, but after all that we have learned in the previous sections, we can now see that this axiom is missing two important things:

1. It doesn’t tell us what happens if we measure just one part of a composite system,

2. It doesn’t tell us about dynamics: what happens to the system after we perform the measurement.

To correct that, we now replace the Probability Axiom with a new and improved
axiom, which we call the Measurement Axiom. In order to formulate it, let us
recall that in problem 3.55 you proved that if 𝐴 is normal (so in particular, if it is
Hermitian and thus an observable), then it has the outer product representation
𝐴 = ∑_{𝑖=1}^{𝑛} 𝜆𝑖 |𝐵𝑖⟩ ⟨𝐵𝑖| , (4.174)

where |𝐵𝑖 ⟩ is an orthonormal eigenbasis and 𝜆𝑖 are the eigenvalues of the eigen­
states |𝐵𝑖 ⟩. More generally, for any observable we can write
𝐴 = ∑_{𝑖=1}^{𝑛} 𝜆𝑖 𝑃𝑖 , (4.175)

where 𝑃𝑖 is the projector onto the vector space of the eigenvectors correspond­
ing to the eigenvalue 𝜆𝑖 , called the eigenspace of 𝜆𝑖 (see problem 4.43). Using
projectors allows us to:

1. Deal with the case of degenerate eigenvectors, where two eigenvectors have
the same eigenvalue; so far we have always implicitly assumed that observ­
ables do not have any degenerate eigenvectors. A trivial example of an
operator with degenerate eigenvalues is the identity matrix 1, which has
only one eigenvalue – namely, 1 – for which every vector in the space is an
eigenvector.

2. Measure only part of a composite Hilbert space, for example one qubit in a
composite system of two qubits, as we will see below.

In the simple case where there is no degeneracy of eigenvectors and the mea­
surement is performed on the entire Hilbert space, the projector can take the
simple form
𝑃𝑖 ≡ |𝐵𝑖 ⟩ ⟨𝐵𝑖 | , (4.176)

and we recover equation (4.174). Using projectors, we can now define a very
general Measurement Axiom, which employs so­called projective measurements.
The Measurement Axiom (Projective): Consider an observable 𝐴 of the form
𝐴 = ∑_{𝑖=1}^{𝑛} 𝜆𝑖 𝑃𝑖 . (4.177)

If the system is in the state |Ψ⟩, then the probability to measure the eigenvalue
𝜆𝑖 is given by
⟨Ψ|𝑃𝑖 |Ψ⟩. (4.178)

The measurement yields exactly one of the eigenvalues 𝜆𝑖 , and after the mea­
surement, the system collapses to the state43

|Ψ⟩ ↦ 𝑃𝑖 |Ψ⟩ / √⟨Ψ|𝑃𝑖|Ψ⟩ , (4.179)

where 𝑃𝑖 is the projector corresponding to the specific eigenvalue 𝜆𝑖 that was measured.

Problem 4.43. Let 𝐴 be a normal operator with eigenvalues 𝜆𝑖. Each eigenvalue has a corresponding eigenspace, which is the set of all vectors which have the eigenvalue 𝜆𝑖. Prove that each eigenspace is a vector space by showing that it satisfies the properties of a vector space as defined in section 3.2.1.

Exercise 4.44. Find the eigenvalues of the CNOT operator (4.171) and their
corresponding eigenvectors and eigenspaces.
43 Notice that the square root of the probability is not necessarily the probability amplitude. For example, if the amplitude is i/2 then the probability is 1/4, but the square root of that is 1/2, which is not the amplitude we started with! However, recall that the two vectors |Ψ⟩ and e^{i𝜙} |Ψ⟩, which differ by an overall complex phase e^{i𝜙}, represent the same state. Since the square root of the probability is the same as the amplitude up to a complex phase, dividing by i/2 or 1/2 both result in the same state.

4.5.4 Applications of the Measurement Axiom

Let us now see some examples of the Measurement Axiom in action. First of all,
consider a qubit in the general state
|Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |𝑎|² + |𝑏|² = 1. (4.180)

The observable corresponding to the eigenbasis |0⟩ , |1⟩ is the Pauli matrix 𝜎𝑧 , which
has the outer product representation

𝜎𝑧 = ( 1  0  ) = |0⟩ ⟨0| − |1⟩ ⟨1| .    (4.181)
     ( 0  −1 )

This means that we have44

𝜆0 = +1, 𝑃0 = |0⟩ ⟨0| , (4.182)

𝜆1 = −1, 𝑃1 = |1⟩ ⟨1| . (4.183)

The probability to measure the eigenvalue +1 (corresponding to a value of 0 for the qubit) is thus

⟨Ψ|𝑃0|Ψ⟩ = ⟨Ψ| (|0⟩⟨0|) |Ψ⟩ = ⟨Ψ|0⟩⟨0|Ψ⟩ = |⟨0|Ψ⟩|² = |𝑎|² , (4.184)

and the probability to measure the eigenvalue −1 (corresponding to a value of 1 for the qubit) is

⟨Ψ|𝑃1|Ψ⟩ = ⟨Ψ| (|1⟩⟨1|) |Ψ⟩ = ⟨Ψ|1⟩⟨1|Ψ⟩ = |⟨1|Ψ⟩|² = |𝑏|² . (4.185)

This indeed matches the old Probability Axiom. The new part is that after the
measurement, if we measured 0, then the system will collapse to the state

|Ψ⟩ ↦ 𝑃0 |Ψ⟩ / √⟨Ψ|𝑃0|Ψ⟩ = |0⟩⟨0|Ψ⟩ / |𝑎| = (𝑎/|𝑎|) |0⟩ ≃ |0⟩ , (4.186)

where by ≃ we mean that (𝑎/|𝑎|) |0⟩ and |0⟩ are the same state, since they only differ by a complex phase (see footnote (43); in polar coordinates we have 𝑎 = |𝑎| e^{i𝜙} where e^{i𝜙} is the phase of 𝑎, so if we divide 𝑎 by |𝑎| we are left with just the phase).
44 Note that I decided to start counting 𝑖 from 0 to 1 instead of from 1 to 2, so that the subscript of 𝜆𝑖 will correspond to the value of the qubit. Also, recall that the eigenvalue of |0⟩ is not 0, it's +1, and the eigenvalue of |1⟩ is not 1, it's −1; this is confusing, but unfortunately it's standard notation, since qubits are analogous to classical bits which have the values 0 and 1.

Similarly, if we measured 1 then the system will collapse to the state

|Ψ⟩ ↦ 𝑃1 |Ψ⟩ / √⟨Ψ|𝑃1|Ψ⟩ = |1⟩⟨1|Ψ⟩ / |𝑏| = (𝑏/|𝑏|) |1⟩ ≃ |1⟩ . (4.187)

Consider now the general composite state of two qubits given in equation (4.72):

|Ψ⟩ = 𝛼00 |0⟩ ⊗ |0⟩ + 𝛼01 |0⟩ ⊗ |1⟩ + 𝛼10 |1⟩ ⊗ |0⟩ + 𝛼11 |1⟩ ⊗ |1⟩ , (4.188)

which is of course normalized such that

‖Ψ‖² = |𝛼00|² + |𝛼01|² + |𝛼10|² + |𝛼11|² = 1. (4.189)

We can define an observable corresponding to a measurement of only the first qubit as follows:
𝜎𝑧 ⊗ 1 = ( |0⟩ ⟨0| − |1⟩ ⟨1|) ⊗ 1, (4.190)

where 1 is the identity operator. So we have

𝑃0 = |0⟩ ⟨0| ⊗ 1, 𝑃1 = |1⟩ ⟨1| ⊗ 1. (4.191)

Then the probability to measure 0 for the first qubit is

⟨Ψ|𝑃0 |Ψ⟩ = ⟨Ψ| (|0⟩⟨0| ⊗ 1) |Ψ⟩. (4.192)

Let us first calculate45 the action of the operator 𝑃0 = |0⟩⟨0| ⊗ 1 on the ket |Ψ⟩:

𝑃0 |Ψ⟩ = (|0⟩⟨0| ⊗ 1) |Ψ⟩

= (|0⟩⟨0| ⊗ 1) (𝛼00 |0⟩ ⊗ |0⟩ + 𝛼01 |0⟩ ⊗ |1⟩ + 𝛼10 |1⟩ ⊗ |0⟩ + 𝛼11 |1⟩ ⊗ |1⟩)

= |0⟩ ⊗ (𝛼00 ⟨0|0⟩ |0⟩ + 𝛼01 ⟨0|0⟩ |1⟩ + 𝛼10 ⟨0|1⟩ |0⟩ + 𝛼11 ⟨0|1⟩ |1⟩)

= |0⟩ ⊗ (𝛼00 |0⟩ + 𝛼01 |1⟩) ,


45 To understand this calculation, you might want to review how tensor products work, which we discussed in section 4.3.1.

since |0⟩ and |1⟩ form an orthonormal basis, so ⟨0|0⟩ = ⟨1|1⟩ = 1 and ⟨0|1⟩ = ⟨1|0⟩ = 0.
Then we act with the bra ⟨Ψ| from the left:

⟨Ψ|𝑃0 |Ψ⟩ = ⟨Ψ| (|0⟩⟨0| ⊗ 1) |Ψ⟩

= ⟨Ψ| (|0⟩ ⊗ (𝛼00 |0⟩ + 𝛼01 |1⟩))

= (𝛼∗00 ⟨0| ⊗ ⟨0| + 𝛼∗01 ⟨0| ⊗ ⟨1| + 𝛼∗10 ⟨1| ⊗ ⟨0| + 𝛼∗11 ⟨1| ⊗ ⟨1|) (|0⟩ ⊗ (𝛼00 |0⟩ + 𝛼01 |1⟩))

= 𝛼∗00 ⟨0| (𝛼00 |0⟩ + 𝛼01 |1⟩) + 𝛼∗01 ⟨1| (𝛼00 |0⟩ + 𝛼01 |1⟩)
= |𝛼00|² + |𝛼01|² .

Similarly, we also find that the probability to measure 1 for the first qubit is

⟨Ψ|𝑃1|Ψ⟩ = ⟨Ψ| (|1⟩⟨1| ⊗ 1) |Ψ⟩ = |𝛼10|² + |𝛼11|² . (4.193)

These very complicated calculations tell us what we could have just guessed from
common sense: the total probability to measure |0⟩ is the sum of the probabilities
to measure all the composite states which have |0⟩ as the state of the first qubit,
and similarly for |1⟩.
What about collapse? If we measured 0, then the system will collapse to the state

𝑃0 |Ψ⟩ / √⟨Ψ|𝑃0|Ψ⟩ = (𝛼00 |0⟩ ⊗ |0⟩ + 𝛼01 |0⟩ ⊗ |1⟩) / √(|𝛼00|² + |𝛼01|²) , (4.194)

and if we measured 1, it will collapse to the state

𝑃1 |Ψ⟩ / √⟨Ψ|𝑃1|Ψ⟩ = (𝛼10 |1⟩ ⊗ |0⟩ + 𝛼11 |1⟩ ⊗ |1⟩) / √(|𝛼10|² + |𝛼11|²) . (4.195)

Again, we could have just guessed the result: the qubit that we measured col­
lapses into either |0⟩ or |1⟩, while the other qubit stays in a superposition. The
denominator is there simply to normalize the vector so it has norm 1, and can
thus represent a state.
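Here is a numpy sketch of this partial measurement for a randomly chosen two-qubit state (the seed is an arbitrary choice of mine): we build the projector 𝑃0 = |0⟩⟨0| ⊗ 1, compute the probability of measuring 0 on the first qubit, and normalize the collapsed state, exactly as in equations (4.192) and (4.194).

import numpy as np

rng = np.random.default_rng(3)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)   # components alpha_00, alpha_01, alpha_10, alpha_11

ket0 = np.array([[1], [0]], dtype=complex)
P0 = np.kron(ket0 @ ket0.conj().T, np.eye(2))  # projector |0><0| tensor 1

prob0 = (psi.conj() @ P0 @ psi).real           # <Psi|P_0|Psi>
print(np.isclose(prob0, abs(psi[0])**2 + abs(psi[1])**2))  # True: |alpha_00|^2 + |alpha_01|^2

collapsed = (P0 @ psi) / np.sqrt(prob0)        # state after measuring 0
print(np.isclose(np.linalg.norm(collapsed), 1.0))          # True: properly normalized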
Problem 4.45. Consider a composite system of three qubits. Which projectors
will you use to measure only the state of the middle qubit in the |+⟩ , |−⟩ eigen­
basis? Which projectors will you use to measure only the state of the first two
qubits in the |0⟩ , |1⟩ eigenbasis?

4.5.5 The Measurement Axiom (Simplified)

Now that we understand how projective measurements work, we can formulate a simpler version of the Measurement Axiom, which does not require projective measurements, and will be sufficient for our purposes in the rest of this course.
The Measurement Axiom (Simplified):

• Consider an observable with an eigenbasis of non-degenerate eigenstates |𝐵𝑖⟩ corresponding to eigenvalues 𝜆𝑖. If the system is in the state |Ψ⟩, then the probability to measure the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖⟩ is given by

|⟨𝐵𝑖|Ψ⟩|² . (4.196)

After the measurement, if the eigenvalue 𝜆𝑖 was measured, then the system
will collapse to the eigenstate |𝐵𝑖 ⟩:

|Ψ⟩ ↦ |𝐵𝑖 ⟩ . (4.197)

This works the same whether the system in question is composite or not,
provided that the measurement is performed on the entire system at once.

• Now consider a composite system and an observable defined only on part of that system, with non-degenerate eigenstates |𝐵𝑖⟩ corresponding to eigenvalues 𝜆𝑖. The total probability to measure the eigenvalue 𝜆𝑖 is the sum of
the probabilities for all the possible ways in which this eigenvalue can be
measured – that is, the sum of the magnitude­squared of the probability
amplitudes of all the composite states where the part being measured is in
the eigenstate |𝐵𝑖 ⟩. After the measurement, if the eigenvalue 𝜆𝑖 was mea­
sured, then only the system we measured will collapse to the eigenstate |𝐵𝑖 ⟩,
while the other systems will stay in a superposition.

The process described by the Measurement Axiom, where the state of the sys­
tem changes after a measurement, is what people mean when they talk about
wavefunction collapse. However, we haven’t yet defined what a “wavefunction” is.
This is because in the modern abstract formulation of quantum mechanics, which
is what we have been studying so far, states are the fundamental entities, not
wavefunctions. We will explain this in more detail when we define wavefunctions
in section 5.5.

Problem 4.46. A composite system of two qubits is in the state

|Ψ⟩ = (1/√14) (2 |00⟩ − i |10⟩ + 3 |11⟩) . (4.198)

A measurement is performed on only the first qubit, in the |+⟩ , |−⟩ eigenbasis.
For each of the two possible outcomes, what is the probability to measure that
outcome and what will be the state of the system after the measurement?

Problem 4.47. A composite system of three qubits is in the state

|Ψ⟩ = (1/√35) (|000⟩ + 2 |010⟩ − 3i |011⟩ − 4 |101⟩ + i |110⟩ + 2i |111⟩) , (4.199)

where |000⟩ ≡ |0⟩⊗|0⟩⊗|0⟩ and so on. A measurement is performed on only the first
two qubits in the |0⟩ , |1⟩ basis. For each of the four possible outcomes, what is
the probability to measure that outcome and what will be the state of the system
after the measurement? You can either solve this problem by inspection using
the simplified axiom, or by explicit calculation using the projectors you found in
problem 4.45.

4.5.6 Interpretations of Quantum Mechanics and the Measurement Problem

If you consider the collapse process carefully, you will realize that it is actually
incompatible with the Evolution Axiom. This is because the collapse is a type of
time evolution: the system was in the state |Ψ⟩ before the measurement, and will
be in one of the eigenstates |𝐵𝑖 ⟩ after the measurement. However, this evolution
is not unitary, because it is not invertible.
Given the probabilistic nature of the measurement, the information that the sys­
tem is currently in the eigenstate |𝐵𝑖 ⟩ is not enough to reconstruct the state |Ψ⟩ of
the system before the measurement, which was a superposition of all the eigen­
states |𝐵1 ⟩ , |𝐵2 ⟩ , … , |𝐵𝑛 ⟩. The information about the coefficients of each eigenstate
in the superposition is lost forever.
This incompatibility, and more generally our failure to understand the exact nature
of measurement and collapse in quantum mechanics, is called the measurement
problem. Many physicists believe that quantum theory will remain fundamentally
incomplete until we manage to solve the measurement problem, and this is an
area of active research. The current approaches towards solving this problem
largely fall into several distinct groups, which more or less coincide with specific
interpretations of quantum mechanics. Let us list some of them.
“Shut up and calculate”: This approach simply ignores the measurement prob­
lem. It is not necessarily associated with any particular interpretation, since it
doesn’t care about trying to interpret the theory in the first place. However, one
could associate it with the Copenhagen interpretation, the earliest interpretation
of quantum mechanics, which essentially just accepts the Measurement Axiom
at face value, without attempting to explain why there is a collapse. This inter­
pretation regards quantum states as merely a tool to calculate probabilities, and
ignores questions like “what was the spin of the particle before I measured it”.
This approach is, by far, the most popular one among physicists, with a recent
survey indicating that around a third of physicists subscribe to the Copenhagen

interpretation and another third don’t have any preferred interpretation. How­
ever, this definitely doesn’t mean it is the “best” approach. It is popular simply
because in practice, as long as quantum mechanics enables us to make accurate
predictions, it doesn’t matter how (or even if) the collapse happens.
The applications of quantum mechanics to theoretical, experimental, and applied
physics, as well as to other fields of science and technology, do not require us
to solve the measurement problem. However, as practical as this approach is,
adopting it means ignoring deep and fundamental questions about the nature of
reality which, if answered, could have far­reaching consequences.
There is no collapse: This approach claims that collapse does not actually hap­
pen. The most well­known example of this approach is the Everett or “many­
worlds” interpretation, which gets rid of the collapse by considering the state of
every system to be part of a huge composite state which describes the entire
universe. Measurements then simply correspond to entangling two parts of that
composite state – the system being measured, and the observer. Instead of a col­
lapse, the observer is now in a superposition of having measured each eigenvalue.
For example, if I measured a qubit, I will then be in a superposition of “I mea­
sured 0” and “I measured 1”. This process is completely unitary (and invertible),
thus there is no collapse and no incompatibility with the Evolution Axiom.
It is a common misconception that the name “many worlds” means measurements
somehow “create” new “parallel universes”, one for each measurement outcome.
What really happens is that there is just one universe, but that universe is in
a superposition of many different possibilities – the sum total of every single
superposition of every individual system since the Big Bang. For example, a toy
universe made of 𝑛 qubits will be in a superposition of 2^𝑛 different possibilities or
“parallel universes”. However, it’s important to stress that the defining property
of this interpretation is not the “many worlds” part – it is the “no collapse” part!
Let’s see how exactly this works. Say Alice is measuring a qubit. The individual
states of the qubit and Alice before the measurement are

|qubit⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |Alice⟩ = |Alice hasn’t measured yet⟩ . (4.200)

The composite state of both of them together before the measurement is thus

|Ψ1 ⟩ ≡ |qubit⟩ ⊗ |Alice⟩ = (𝑎 |0⟩ + 𝑏 |1⟩) ⊗ |Alice hasn’t measured yet⟩ . (4.201)

Notice that |Ψ1 ⟩ is separable – it is just a tensor product of the state of the qubit
with the state of Alice, and those states are independent of each other.
After the qubit is measured, the system undergoes evolution with a unitary oper­
ator 𝑈 into:
|Ψ2 ⟩ ≡ 𝑈 |Ψ1 ⟩ , (4.202)

|Ψ2 ⟩ = 𝑎 |0⟩ ⊗ |Alice measured 0⟩ + 𝑏 |1⟩ ⊗ |Alice measured 1⟩ . (4.203)

Intuitively, we can see that this evolution is unitary because it works similarly to a
CNOT gate; 𝑈 essentially checks the state of the qubit, and changes Alice’s state
accordingly. In problem 4.49 you will find the exact form of this unitary operator.
We can see that the new state |Ψ2 ⟩ is entangled – the states of the qubit and
Alice are now correlated.
We can think of each term in the superposition as a different “parallel universe”
or “world”, but this isn’t quite the same as the typical (incorrect) science­fiction
treatment of the many­worlds interpretation, since the two versions of Alice, the
Alice who measured 0 and the Alice who measured 1, can never communicate
with each other, and there is no sense in which you can “travel” from one “parallel
universe” to another – since you can’t change which term in the superposition you
are in!
Crucially, notice that in the calculation we did above, there is no collapse. It
looks like there is a collapse from the point of view of each of the Alices, since
the Alice who measured 0 can only access the qubit in the state |0⟩ (with which
she is entangled) and the Alice who measured 1 can only access the qubit in the
state |1⟩. However, the overall state of the qubit and Alice (and more broadly, of
the entire universe) in fact evolves in a way that is perfectly compatible with the
Evolution Axiom, and at no point does it reduce to a single eigenstate.
This interpretation is probably the most popular among the approaches which are
not Copenhagen or “shut up and calculate”. This is perhaps due to its simplicity
– it does not introduce any new assumptions, as most other interpretations do,
and in fact it even gets rid of an assumption, namely the collapse part of the
Measurement Axiom, so it arguably makes quantum theory even simpler.
However, it has several unresolved issues. One of its main problems is that it is
unclear where exactly probabilities come from. If I split into several observers
after the measurement, and the different versions of me collectively measured
every single possible outcome of the measurement, then why is the probability for
me to find myself as one observer different from the probability to find myself as
another observer? And what does this probability have to do with the coefficients
of the superposition?
Hidden variables: This approach is associated with interpretations such as
De Broglie–Bohm theory, which we already mentioned in section 4.2.4 and sec­
tion 4.3.6 in the context of non­locality.
To remind you, theories of hidden variables involve adding supplemental variables
which make the theory deterministic “behind the scenes”, but we can’t actually
know the values of these variables and use them to make deterministic predic­
tions, since they’re “hidden”. As the system is deterministic, there is no collapse.
One serious problem with this approach is, as we discussed earlier, that theories of

hidden variables tend to be complicated, and many physicists find them contrived and ad hoc. Therefore, if we subscribe to the principle of Occam's razor, which states that theories with fewer assumptions should be preferred, we should discard hidden variables in favor of simpler interpretations.
Collapse models: This approach modifies quantum mechanics by adding an ac­
tual physical mechanism for collapse. This can be done by assuming that there
is a more general type of evolution, which is compatible with both unitary evo­
lution and collapse. Collapse models have the same problem as hidden variable
theories; they require additional assumptions and more complicated equations,
which are not necessarily justified except in that they give the desired results.
For example, one collapse model, the GRW model, assumes that quantum sys­
tems collapse spontaneously – at random, without any relation to measurements.
This happens very rarely, but when you have a big enough composite system
with a very large number of subsystems, it happens frequently enough to explain
collapse.

Problem 4.48. There are many other interpretations of quantum mechanics, each attempting to solve the measurement problem in a different way. We will not discuss them here, but you are encouraged to look them up and discuss them with your classmates. Which interpretation is your favorite?

Problem 4.49. Find the unitary operator 𝑈 in equation (4.202). Treat Alice as a
3­state system with an orthonormal basis

|𝐴⟩ ≡ |Alice hasn’t measured yet⟩ , (4.204)

|𝐴0 ⟩ ≡ |Alice measured 0⟩ , (4.205)

|𝐴1 ⟩ ≡ |Alice measured 1⟩ . (4.206)

You can either write 𝑈 as an outer product representation, or as a matrix represented in the basis constructed from tensor products of the bases of each system, namely |0⟩ , |1⟩ and |𝐴⟩ , |𝐴0⟩ , |𝐴1⟩. Hint: |𝐴⟩ , |𝐴0⟩ , |𝐴1⟩, represented in their own basis, are just the standard basis vectors of ℂ³. You may have to do some guesswork regarding the precise form of 𝑈. Prove that the operator 𝑈 that you found is unitary and that it transforms |Ψ1⟩ into |Ψ2⟩.

4.5.7 Superposition Once Again: Schrödinger’s Cat

Suppose that, inside a box, there is a cat and a qubit in the state |+⟩:

|+⟩ = (1/√2) (|0⟩ + |1⟩) . (4.207)

Figure 4.2: Schrödinger’s Cat. Source: Found via Google Image Search, original source unknown.

A measurement apparatus measures the qubit. If it measures 0 (with 50% probability), the cat dies46. If it measures 1 (with 50% probability), the cat stays
alive. Therefore, the state of the cat is now a superposition of dead and alive
(see figure 4.2):
|cat⟩ = (1/√2) (|dead⟩ + |alive⟩) . (4.208)
Before we open the box and measure the state of the cat, is it “actually” dead,
or alive? A qubit being in a superposition of 0 and 1, compared to a classical bit
which can only be either 0 or 1, might not be intuitive, but it is nevertheless an
experimental fact. The thought of an animal being in a superposition of dead and
alive, on the other hand, seems absurd.
This thought experiment was suggested by Schrödinger in the early days of quan­
tum mechanics to illustrate this discrepancy between the quantum world (of ele­
mentary particles, atoms, qubits, and so on) and the classical world (of cats and
everything else we know from our daily life).
So what exactly is the difference between a qubit and a cat? Well, the qubit has
an infinite number of eigenbases, corresponding to measurements of spin up or
down along every possible direction – as we saw in section 4.2.2. All of these
eigenbases are completely equivalent; there is no preferred basis. So being in
an eigenstate of 𝜎𝑧 (|0⟩ or |1⟩) isn’t any more “natural” for the qubit than being in
an eigenstate of 𝜎𝑥 (|+⟩ or |−⟩, which can both be written as a superposition of |0⟩
and |1⟩).
However, the cat definitely has a preferred eigenbasis: the one composed of
eigenstates of the “is the cat alive” operator, namely |dead⟩ and |alive⟩. There is no operator that has (|dead⟩ + |alive⟩) /√2 as one of its eigenstates (like 𝜎𝑥 is
to 𝜎𝑧 ). This is because the cat is not a two­state system; it is composed of
a huge number of entangled quantum particles that interact with each other in
complicated ways, and the Hilbert space required to describe the states of the
46 For example, poison is released into the box. This is just a thought experiment, please do not attempt it at home!

system has many orders of magnitude more than two dimensions.
Now, even a qubit, which is described by a 2­dimensional Hilbert space, is al­
ready extremely fragile. As soon as it interacts with the environment, it gets
entangled with it, and loses its superposition and other quantum properties in a
process called quantum decoherence. This is one of the reasons it is so hard to
build quantum computers: qubits will inevitably interact with the environment,
since they cannot be completely isolated. There is a certain time, called the
decoherence time, after which different physical realizations of qubits undergo
decoherence; the time it takes the quantum gate to operate must be shorter
than the decoherence time.
It should therefore not be a surprise that the cat, which is incredibly more com­
plicated, is also incredibly harder to keep in a superposition. The cat is still a
quantum system, just like anything else in the universe, but it is so complicated,
that it can’t be in arbitrary states. Instead, with almost certain probability, it will
be in one of the states |dead⟩ or |alive⟩.
Finally, let us address two common misconceptions about Schrödinger’s cat. The
first one (which is also a misconception about quantum mechanics in general) is
that a conscious observer is needed to collapse the cat into being alive or dead.
In fact, consciousness plays no role whatsoever in quantum mechanics! There is
nothing special about conscious observers that unconscious measurement devices
do not have. In both cases, the interaction of the quantum system with a larger
system – whether it’s a human or a particle detector – causes it to undergo
decoherence and appear classical.
The second misconception occurs when Schrödinger’s cat is invoked in any situ­
ation where the state of something is unknown until it is measured. Usually this
takes the form of “Schrödinger’s X” for some X. For example, I heard the term
“Schrödinger’s millionaire” being used to describe someone who has a lottery
ticket which they have not yet checked to see if it’s the winning ticket; there­
fore, that person is “both a millionaire and not a millionaire until the ticket is
checked”. However, the fact that you don’t know the state of something until
you measure it is completely trivial, and has nothing to do with Schrödinger’s cat,
or even with quantum mechanics in general. The purpose of the Schrödinger’s
cat thought experiment is to illustrate the difference between the classical and
quantum worlds.

4.6 The No­Cloning Theorem and Quantum Teleportation

4.6.1 The No­Cloning Theorem

The no-cloning theorem states that it is impossible to make a copy of an unknown quantum state. Note that it is possible, in principle, to generate a known quantum state as many times as we want; all we need to do is repeat whatever process
is known to generate that state. However, if someone gives you an unknown
quantum state |Ψ⟩ and doesn’t tell you anything about it, the no­cloning theorem
states that you will never be able to make another copy of |Ψ⟩.
To prove the theorem, let us assume that we have a “copying operator” 𝑈 which
gets a tensor product of two states as input, and copies the state from the first
slot into the second slot:

𝑈 ( |Ψ⟩ ⊗ |?⟩) = |Ψ⟩ ⊗ |Ψ⟩ . (4.209)

The second state |?⟩ in the input can be anything – it doesn’t matter what it was
originally, since it will be overwritten with the state |Ψ⟩ that we are copying.
We are looking for a universal copying operator, which can copy any state |Ψ⟩,
even if we don’t know in advance what the state is. If this operator only works for
a specific state |Ψ⟩, that means we must know what |Ψ⟩ is in advance, in order
to choose the specific 𝑈 that copies it. Let us use 𝑈 to copy two states, |Ψ1 ⟩ and
|Ψ2 ⟩:
𝑈 ( |Ψ1 ⟩ ⊗ |?⟩) = |Ψ1 ⟩ ⊗ |Ψ1 ⟩ , (4.210)

𝑈 ( |Ψ2 ⟩ ⊗ |?⟩) = |Ψ2 ⟩ ⊗ |Ψ2 ⟩ . (4.211)

We can take the inner product of the last two equations by turning the second
equation into a bra:

( ⟨Ψ2 | ⊗ ⟨?|) 𝑈 † 𝑈 ( |Ψ1 ⟩ ⊗ |?⟩) = ( ⟨Ψ2 | ⊗ ⟨Ψ2 |) ( |Ψ1 ⟩ ⊗ |Ψ1 ⟩) . (4.212)

By the Evolution Axiom, 𝑈 must be a unitary operator, so we have 𝑈 † 𝑈 = 1:

( ⟨Ψ2 | ⊗ ⟨?|) ( |Ψ1 ⟩ ⊗ |?⟩) = ( ⟨Ψ2 | ⊗ ⟨Ψ2 |) ( |Ψ1 ⟩ ⊗ |Ψ1 ⟩) . (4.213)

The inner product can be calculated using equation (4.55):

⟨Ψ2 |Ψ1 ⟩⟨?|?⟩ = ⟨Ψ2 |Ψ1 ⟩⟨Ψ2 |Ψ1 ⟩. (4.214)

On the right-hand side, we have ⟨Ψ2|Ψ1⟩⟨Ψ2|Ψ1⟩ = ⟨Ψ2|Ψ1⟩²:

⟨Ψ2|Ψ1⟩⟨?|?⟩ = ⟨Ψ2|Ψ1⟩² . (4.215)

Finally, even though we haven’t specified the state |?⟩ (since we don’t care what
it is), we still know it must be normalized such that ⟨?|?⟩ = 1, since otherwise it
won’t be a proper state. Therefore, we obtain:

⟨Ψ2|Ψ1⟩ = ⟨Ψ2|Ψ1⟩² . (4.216)

This is a quadratic equation, so it has two solutions:

• The first solution is ⟨Ψ2 |Ψ1 ⟩ = 1, in which case the states must be the same
state: |Ψ1 ⟩ = |Ψ2 ⟩. So 𝑈 is a copying operator that can only copy one specific
state, in contradiction with our requirement above that 𝑈 is universal.

• The second solution is ⟨Ψ2 |Ψ1 ⟩ = 0, in which case |Ψ1 ⟩ and |Ψ2 ⟩ must be
orthogonal. Again, this means that 𝑈 cannot be universal, since it can only
copy states that are orthogonal to a specific state, and thus we cannot clone
an unknown quantum state.

In conclusion, we have proven that it is impossible to find a unitary operator 𝑈


that can clone any arbitrary state |Ψ⟩.
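The heart of the argument is simply that unitaries preserve inner products while cloning does not, unless the states are identical or orthogonal. Here is a tiny numerical restatement of this, sketched with the concrete choice |Ψ1⟩ = |0⟩ and |Ψ2⟩ = |+⟩ (any non-orthogonal, non-identical pair would do):

import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+> = (|0> + |1>)/sqrt(2)

blank = ket0                                           # the |?> slot; any fixed state works
in1, in2 = np.kron(ket0, blank), np.kron(plus, blank)  # inputs to the would-be cloner
out1, out2 = np.kron(ket0, ket0), np.kron(plus, plus)  # desired cloned outputs

# np.vdot conjugates its first argument, so it computes the inner product
print(np.vdot(in2, in1))    # <Psi_2|Psi_1><?|?> = 1/sqrt(2), about 0.707
print(np.vdot(out2, out1))  # <Psi_2|Psi_1>^2    = 0.5, so no unitary can map one pair to the other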
By the way, this is one of the reasons quantum computers are so hard to build. In
a classical computer, we can just make several copies of each bit, and use that for
error correction in case the bit gets corrupted. In a quantum computer, we cannot
do that, since we cannot make copies of a qubit, due to the no­cloning theorem.
Still, quantum error correction is possible – but it is much more complicated.
Problem 4.50. The opposite of the no­cloning theorem is the no­deleting theo­
rem, which states that given two identical copies47 of the same unknown quan­
tum state, one can never delete one of them and end up with just one copy. Prove
the no­deleting theorem.
Problem 4.51. Remarkably, if cloning a quantum state was possible, it would
have allowed faster­than­light communication! Assuming Alice and Bob each
have one qubit of an entangled Bell state, and Bob can make as many copies
of his qubit as he wants, show that it is possible for Alice to send Bob a message
instantaneously, regardless of the distance between them.
This result is especially noteworthy due to the fact that we can freely copy clas­
sical bits, and yet classical correlation definitely does not allow instantaneous
communication. This demonstrates something very special about quantum en­
tanglement, which does not apply to classical correlation.

4.6.2 Quantum Teleportation

We discovered that it is impossible to copy a quantum state, which is quite surprising. Quantum teleportation is another surprising discovery, which also serves
to illustrate the powerful consequences of entanglement. We begin with the Bell
state (4.102):
|𝛽00⟩ ≡ (1/√2) (|00⟩ + |11⟩) , (4.217)
47 Of course, deleting just one copy of a quantum state is trivial – all you need to do is measure it!

where we again used the shorthand notation |𝑥𝑦⟩ ≡ |𝑥⟩ ⊗ |𝑦⟩. Alice takes the first
qubit, Bob takes the second, and they go their separate ways. In this entangled
state, if Alice measures 0, Bob will also measure 0, and if Alice measures 1, Bob
will also measure 1.
Later, Alice receives an arbitrary qubit
|Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |𝑎|² + |𝑏|² = 1, (4.218)

but she does not know the state of the qubit, that is, the coefficients 𝑎 and 𝑏.
Alice needs to transfer this unknown qubit in its entirety to Bob using only two
classical bits. This seems impossible, for two different reasons:

1. The exact state of the qubit is determined by the two arbitrary complex
numbers 𝑎 and 𝑏. Even if Alice did know the values of these numbers, trans­
ferring that information requires much more than two classical bits – in fact,
to transmit the precise value of an arbitrary complex (or even real) number,
an infinite number of bits are required.

2. Even if Alice was somehow able to magically transmit two complex numbers
using only two classical bits, there is no way she could determine the values
of 𝑎 and 𝑏 in the first place. Any measurement that Alice makes on her qubit
will simply result in either 0 or 1; it does not tell Alice anything about the
probabilities, not to mention the probability amplitudes. To get information
about the probabilities, Alice must make a large number of measurements
(in fact, an infinite number of them, if she wants to know the precise values of
the probabilities). However, this is impossible due to the no­cloning theorem;
Alice can only measure the qubit once, and that’s it.

To make the impossible possible, Alice can use the fact that her half of the Bell
state is entangled with Bob’s half. All three qubits can be represented together
by the composite state

|𝛾⟩ ≡ |Ψ⟩ ⊗ |𝛽00⟩
    = (1/√2) (𝑎 |0⟩ + 𝑏 |1⟩) ⊗ (|00⟩ + |11⟩)
    = (1/√2) (𝑎 (|000⟩ + |011⟩) + 𝑏 (|100⟩ + |111⟩)) ,

where we used the shorthand notation

|𝑥𝑦𝑧⟩ ≡ |𝑥⟩ ⊗ |𝑦⟩ ⊗ |𝑧⟩ . (4.219)

Here the first qubit is the one that is to be teleported from Alice to Bob, the second
is Alice’s half of the Bell state, and the third is Bob’s half.

First, Alice sends the first qubit (the unknown qubit |Ψ⟩) and the second qubit (her
half of the Bell state) through a CNOT gate, which as you recall, flips the second
qubit only if the first qubit is |1⟩:

CNOT1,2 |𝛾⟩ = (1/√2) (𝑎 (|000⟩ + |011⟩) + 𝑏 (|110⟩ + |101⟩))
            = (1/√2) (𝑎 |0⟩ ⊗ (|00⟩ + |11⟩) + 𝑏 |1⟩ ⊗ (|10⟩ + |01⟩)) .

Here we used the notation CNOT1,2 to indicate that the gate only acts on qubits 1
and 2 out of the three qubits. Explicitly, this would be the tensor product of the
CNOT gate on the left with the 2 × 2 identity matrix on the right:

                     ( 1  0  0  0 )
CNOT1,2 ≡ CNOT ⊗ 1 = ( 0  1  0  0 ) ⊗ ( 1  0 ) .    (4.220)
                     ( 0  0  0  1 )   ( 0  1 )
                     ( 0  0  1  0 )

Next, she sends the first qubit through the Hadamard gate, which as you recall, takes |0⟩ to |+⟩ ≡ (|0⟩ + |1⟩) /√2 and |1⟩ to |−⟩ ≡ (|0⟩ − |1⟩) /√2:

𝐻1 ⋅ CNOT1,2 |𝛾⟩ = (1/√2) (𝑎 |+⟩ ⊗ (|00⟩ + |11⟩) + 𝑏 |−⟩ ⊗ (|10⟩ + |01⟩))
= (1/2) (𝑎 (|0⟩ + |1⟩) ⊗ (|00⟩ + |11⟩) + 𝑏 (|0⟩ − |1⟩) ⊗ (|10⟩ + |01⟩))
= (1/2) (𝑎 (|000⟩ + |011⟩ + |100⟩ + |111⟩) + 𝑏 (|010⟩ + |001⟩ − |110⟩ − |101⟩))
= (1/2) 𝑎 (|00⟩ ⊗ |0⟩ + |01⟩ ⊗ |1⟩ + |10⟩ ⊗ |0⟩ + |11⟩ ⊗ |1⟩) +
+ (1/2) 𝑏 (|01⟩ ⊗ |0⟩ + |00⟩ ⊗ |1⟩ − |11⟩ ⊗ |0⟩ − |10⟩ ⊗ |1⟩) .
Again, the notation 𝐻1 means we act with the Hadamard gate only on the first qubit:

𝐻1 ≡ 𝐻 ⊗ 1 ⊗ 1 = (1/√2) ( 1  1  ) ⊗ ( 1  0 ) ⊗ ( 1  0 ) .    (4.221)
                        ( 1  −1 )   ( 0  1 )   ( 0  1 )

We can rearrange the transformed state as follows:

𝐻1 ⋅ CNOT1,2 |𝛾⟩ = (1/2) |00⟩ ⊗ (𝑎 |0⟩ + 𝑏 |1⟩) +
                 + (1/2) |01⟩ ⊗ (𝑎 |1⟩ + 𝑏 |0⟩) +
                 + (1/2) |10⟩ ⊗ (𝑎 |0⟩ − 𝑏 |1⟩) +
                 + (1/2) |11⟩ ⊗ (𝑎 |1⟩ − 𝑏 |0⟩) .
Finally, Alice performs a measurement on the first two qubits (the one to be
teleported, and her half of the Bell state), and obtains one of four results: 00, 01,
10, or 11. These are two classical bits, which she can then send to Bob. With
this information, Bob can read from the last equation exactly which operations
he has to perform on his qubit (which you will determine in problem 4.53) in
order to obtain the original qubit |Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩. The qubit has been successfully
teleported from Alice to Bob!
Note that since Alice measured the original qubit, it collapsed and its quantum
state has been destroyed. Therefore, quantum teleportation does not violate the
no­cloning theorem; the state of the qubit was not cloned or copied, it was just
moved from one qubit to another. Also, since Alice had to send two classical bits
to Bob – for example through a cable or radio waves – the speed of teleportation
is limited by the speed of light, and there is no violation of relativity.
Finally, since quantum teleportation requires Alice and Bob to already have one
half of an entangled pair each, and the entanglement is destroyed in the process
due to Alice’s measurement, the number of qubits they can teleport is limited by
the number of entangled pairs they have. Once they run out of entangled pairs,
they can no longer teleport any qubits until they physically exchange more en­
tangled pairs. This means that you can’t just establish two teleportation stations
on, say, two planets, and teleport qubits between them forever; you will have
to actually send a spaceship from one planet to the other with a fresh supply of
entangled particles every once in a while.
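To tie everything together, here is a minimal state-vector simulation of the protocol, sketched in numpy; it is essentially a tiny instance of problem 4.54 below, so treat it as a starting point rather than a full solution. For brevity it only follows the branch where Alice measures 00, in which case (as you will show in problem 4.53) Bob needs no correction at all.

import numpy as np

rng = np.random.default_rng(5)
a, b = rng.normal(size=2) + 1j * rng.normal(size=2)
norm = np.sqrt(abs(a)**2 + abs(b)**2)
a, b = a / norm, b / norm                      # the unknown qubit a|0> + b|1>

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)  # |beta_00>
gamma = np.kron(a * ket0 + b * ket1, bell)     # |gamma> = |Psi> tensor |beta_00>

I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Apply CNOT_{1,2} = CNOT tensor 1, then H_1 = H tensor 1 tensor 1
state = np.kron(H, np.eye(4)) @ np.kron(CNOT, I2) @ gamma

# Suppose Alice measures 00 (probability 1/4): project with |00><00| tensor 1
P00 = np.kron(np.diag([1, 0, 0, 0]).astype(complex), I2)
bob = (P00 @ state) / np.linalg.norm(P00 @ state)

# Bob's qubit is now a|0> + b|1>: the state has been teleported
print(np.allclose(bob, np.kron(np.kron(ket0, ket0), a * ket0 + b * ket1)))  # True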
Problem 4.52. Quantum teleportation has been demonstrated experimentally in
many different experiments, over distances of up to 1400 km, and not just with
qubits but even with more complicated systems. Whenever a new quantum tele­
portation experiment happens, articles appear in the media with sensationalist
headlines such as “scientists demonstrate teleportation is possible!” or “is tele­
portation closer than we think?”, where by “teleportation” they actually mean the
science­fiction concept of “teleportation”, where a macroscopic object is sent
from one place to another without going through the space in between. Is the
word “teleportation” in “quantum teleportation” indeed justified? In what ways
is quantum teleportation the same as science­fiction teleportation, and in what

ways is it different? Think about this, and discuss with your classmates.
Problem 4.53. For each of the four results of Alice’s measurement, 00, 01, 10,
and 11, determine which unitary transformations Bob must perform on his qubit
in order to obtain the original |Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩.
Problem 4.54. Write a computer program48 that gets an arbitrary composite
state of 𝑁 qubits49 as input and allows the user to perform the following actions:

• Analyze whether or not two of the qubits are entangled.

• Act on one or more of the qubits with a quantum gate; for example, act with
Hadamard on one qubit or with CNOT on two qubits.

• Simulate a measurement of one or more of the qubits as dictated by the Pro­


jective Measurement Axiom, with the result determined randomly according
to the appropriate probability distribution, and the state collapsing after the
measurement according to the value that was measured.

Use your program to simulate quantum teleportation, and show that it indeed
works.

4.7 The Foundations of Quantum Theory: Summary

Quantum theory is a fundamental mathematical framework for describing physical systems in our universe. For discrete systems, which have finite-dimensional Hilbert spaces, we defined this framework using a set of seven axioms:
Hilbert spaces, we defined this framework using a set of seven axioms:

1. The System Axiom: Discrete physical systems are represented by complex 𝑛-dimensional Hilbert spaces ℂ^𝑛, where 𝑛 depends on the specific system.

2. The State Axiom: The states of the system are represented by unit 𝑛­
vectors in the system’s Hilbert space, up to a complex phase.

3. The Operator Axiom: The operators on the system, which act on states
to produce other states, are represented by 𝑛 × 𝑛 matrices in the system’s
Hilbert space.

4. The Observable Axiom: Physical observables in the system are represented
by Hermitian operators on the system’s Hilbert space. The eigenvalues
of the observable (which are always real, since it’s Hermitian) represent
its possible measured values. The eigenstates of the observable can be used
to form an orthonormal eigenbasis of the Hilbert space.
[48] As in problem 3.65, I recommend either Mathematica or Python, but feel free to use whatever
language you like.
[49] If the general case of 𝑁 qubits proves to be too hard, you may write code that only works for
the specific case of 𝑁 = 3.

• Superposition: Any state |Ψ⟩ can be written as a linear combination of
the eigenstates |𝐵𝑖 ⟩ of an observable:

|Ψ⟩ = ∑_{𝑖=1}^{𝑛} |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩.   (4.222)

5. The Composite System Axiom: The Hilbert space of a composite system
is represented by the tensor product of the Hilbert spaces of the individual
systems.

• Entanglement: A state of a composite system that cannot be written as
a single tensor product of states of the individual systems is entangled.
Quantum entanglement is a form of correlation between systems, and
by Bell’s theorem, it is stronger than classical correlation.

6. The Evolution Axiom: If the system is in the state |Ψ1 ⟩ at some point in
time, and in another state |Ψ2 ⟩ at another point in time, then the two states
must be related by the action of some unitary operator 𝑈 :

|Ψ2 ⟩ = 𝑈 |Ψ1 ⟩ . (4.223)

7. The Measurement Axiom: Consider an observable 𝐴 of the form

𝐴 = ∑_{𝑖=1}^{𝑛} 𝜆𝑖 𝑃𝑖 .   (4.224)

If the system is in the state |Ψ⟩, then the probability to measure the eigen­
value 𝜆𝑖 is given by
⟨Ψ|𝑃𝑖 |Ψ⟩. (4.225)

After the measurement, if the eigenvalue 𝜆𝑖 was measured, then the system
will collapse to the state
|Ψ⟩ ↦ 𝑃𝑖 |Ψ⟩ / √⟨Ψ|𝑃𝑖 |Ψ⟩.   (4.226)

• The Simplified Measurement Axiom: Consider an observable with an
eigenbasis of non­degenerate eigenstates |𝐵𝑖 ⟩ corresponding to eigenvalues
𝜆𝑖 . If the system is in the state |Ψ⟩, then the probability to measure
the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖 ⟩ is given by

|⟨𝐵𝑖 |Ψ⟩|² .   (4.227)

After the measurement, if the eigenvalue 𝜆𝑖 was measured, then the
system will collapse to the eigenstate |𝐵𝑖 ⟩:

|Ψ⟩ ↦ |𝐵𝑖 ⟩ . (4.228)

If a measurement is performed only on part of a composite system,
the total probability to measure the eigenvalue 𝜆𝑖 is the sum of the
probabilities for all the possible ways in which this eigenvalue can be
measured. After the measurement, if the eigenvalue 𝜆𝑖 was measured,
then only the system we measured will collapse to the eigenstate |𝐵𝑖 ⟩,
while the other systems will stay in a superposition.
• Expectation Value: If the system is in the state |Ψ⟩, the expectation
value for the measurement of the observable 𝐴 is given by ⟨Ψ|𝐴|Ψ⟩.
• Uncertainty Principle: If two observables 𝐴 and 𝐵 don’t commute,
the standard deviations of their measurements satisfy the uncertainty
relation
Δ𝐴 Δ𝐵 ≥ ½ |⟨[𝐴, 𝐵]⟩| .   (4.229)
The mathematical framework we have defined here is not enough on its own; one
must use the framework to define different models, which map the framework to
specific physical systems. A model is a specific choice of the following ingredients:

• A Hilbert space describing a specific physical system,

• Hermitian operators corresponding to specific physical observables that may
be measured for the system,

• Unitary operators corresponding to the time evolution and other possible
transformations of the system,

• The states on which these operators act, which correspond to different con­
figurations of the system.

In the simple case of a qubit, we saw that the Hilbert space is ℂ2 , the Hermitian
operators corresponding to observables are linear combinations of the Pauli ma­
trices, the unitary operators corresponding to transformations are the quantum
gates, and the states are the two basis states |0⟩ and |1⟩ (and superpositions
thereof).
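As a concrete illustration of this model – a small sketch, assuming Python with
NumPy – one can check numerically that an arbitrary 2 × 2 Hermitian matrix is a
real linear combination of the identity and the three Pauli matrices, with the
coefficients extracted via traces:

    import numpy as np

    I = np.eye(2)
    sx = np.array([[0, 1], [1, 0]])
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]])

    # An arbitrary Hermitian matrix; its coefficients are (1/2) tr(A * sigma).
    A = np.array([[2.0, 1 - 1j], [1 + 1j, -1.0]])
    coeffs = [np.trace(A @ s).real / 2 for s in (I, sx, sy, sz)]
    print(coeffs)  # four real numbers, since A is Hermitian

    # Reconstruct A from the coefficients and verify the decomposition
    A_rec = sum(c * s for c, s in zip(coeffs, (I, sx, sy, sz)))
    print(np.allclose(A, A_rec))  # True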
Of course, not every possible model we can make will actually correspond to a
physical system that we can find in nature. However, amazingly, the opposite
statement does seem to be true: every physical system that we find in nature [50]
can be precisely described by a model built using the ingredients of quantum
theory.
We can think of quantum theory as a sort of language. Just like English is
a language with rules such as grammar and spelling, so is quantum theory a
[50] Except perhaps general relativity, but we are pretty sure that there is a quantum theory of
general relativity, we just don’t have a consistent formulation of it yet. If time permits, we will
discuss this theory – quantum gravity – at the end of this course.

language with its own rules: observables must be Hermitian operators, possible
measurement results are given by the eigenvalues of these operators, and so on.
And just like we can use English to make any sentence we want, both true and
false, we can use quantum theory to make any model we want, both models that
correspond to real physical systems and those that do not.

5 Continuous Quantum Systems

Quantum mechanics is a confusing and unintuitive theory, and requires the in­
troduction of many new concepts. One of the main goals of this course is to
introduce quantum mechanics to students in a way that is mathematically as sim­
ple as possible, so that they won’t have to struggle with complicated math on top
of trying to understand new physical concepts.
It is quite remarkable that we have managed to describe all of the axioms of
quantum theory, and almost all of its important aspects such as superposition,
entanglement, and the uncertainty principle, using only linear algebra – without
any calculus. Moreover, by focusing on discrete two­state systems, or qubits, we
actually managed to do everything almost exclusively in ℂ2 , the simplest non­
trivial complex vector space.
Unfortunately, in real life not all systems are discrete, and the time has finally
come to start introducing some calculus and talking about continuous quantum
systems, which are described by infinite­dimensional Hilbert spaces. However,
the student may take comfort in the fact that this is going to be merely a straight­
forward generalization of what we’ve already learned. The only real difference is
that now states are going to be functions instead of vectors, and operators are
going to be derivatives instead of matrices.

5.1 Mathematical Preliminaries

5.1.1 Exponentials and Logarithms

The exponential function is defined on arbitrary complex numbers 𝑧 ∈ ℂ using a
power series as follows:

e^{𝑧} ≡ ∑_{𝑛=0}^{∞} 𝑧^{𝑛}/𝑛! = 1 + 𝑧 + 𝑧²/2 + 𝑧³/3! + ⋯ .   (5.1)

The complex number 𝑧 is called the exponent. If the exponent is zero, then all the
terms in the series vanish except the first one, and we get e0 = 1. If the exponent
is a natural number 𝑛 ∈ ℕ, then (5.1) turns out to be the same as taking the real
number [51] e ≈ 2.718 to the power of 𝑛, that is, multiplying it by itself 𝑛 times. This
can then be expanded to negative integers using the formula

e^{−𝑛} ≡ 1/e^{𝑛},   (5.2)

and to rational numbers using

𝑎/𝑏 ∈ ℚ ⟹ e^{𝑎/𝑏} ≡ ᵇ√(e^{𝑎}).   (5.3)
However, for arbitrary real or complex numbers, we generally use the power
series definition (5.1) directly, or an equivalent definition such as the ones you
will prove in problems 5.2 and 5.3 below.
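For instance, the partial sums of the power series (5.1) can be compared numerically
against a built-in exponential; a small sketch in Python (the input 𝑧 and the number
of terms are arbitrary choices):

    import cmath
    from math import factorial

    def exp_series(z, terms=30):
        # Partial sum of the power series definition (5.1)
        return sum(z ** n / factorial(n) for n in range(terms))

    z = 1 + 2j
    print(exp_series(z))  # partial sum of the series
    print(cmath.exp(z))   # built-in exponential, for comparison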
By taking the complex conjugate of the series (5.1), we get:
(e^{𝑧})^{∗} = e^{𝑧^{∗}},   (5.4)

so the conjugate operation commutes with taking the exponential. In particular,
given a complex number in the polar representation (see section 3.1.4), we have

𝑧 = 𝑟 ei 𝜙 ⟹ 𝑧 ∗ = 𝑟 e− i 𝜙 , 𝑟, 𝜙 ∈ ℝ. (5.5)

This indeed makes sense, as taking the complex conjugate means reflecting 𝑧
across the real line, and thus turns the angle 𝜙, which is the angle with respect
to the real line, into its negative – see 3.1.
One can also prove from the definition (5.1) that e𝑧+𝑤 = e𝑧 e𝑤 , so

𝑒i 𝜙 𝑒− i 𝜙 = 𝑒i 𝜙−i 𝜙 = e0 = 1. (5.6)

Therefore the magnitude of 𝑧 is

|𝑧| = √(𝑧𝑧∗ ) = √(𝑟 e^{i 𝜙} ⋅ 𝑟 e^{− i 𝜙}) = √(𝑟²) = 𝑟,   (5.7)

as expected.
The exponential function is its own derivative:

(d/d𝑧) e^{𝑧} = e^{𝑧}.   (5.8)
In fact, it can be defined using this property, as you will prove in problem 5.1.
Using the chain rule, we get the more general result

(d/d𝑧) e^{𝜆𝑧} = ((d/d𝑧)(𝜆𝑧)) e^{𝜆𝑧} = 𝜆 e^{𝜆𝑧},   (5.9)
[51] But of course, in order to know the value of the number e in the first place, we need to calculate
the power series (5.1) for 𝑧 = 1!

where 𝜆 is any constant complex number (i.e. independent of 𝑧).
The inverse function of the exponential is the logarithm:

𝑤 = e𝑧 ⟺ 𝑧 = log 𝑤, elog 𝑧 = log e𝑧 = 𝑧. (5.10)

This is also called the natural logarithm, since it is taken with respect to the
“natural” base e ≈ 2.718. More generally, a logarithm with respect to the base 𝑏
satisfies
𝑤 = 𝑏𝑧 ⟺ 𝑧 = log𝑏 𝑤, 𝑏log𝑏 𝑧 = log𝑏 𝑏𝑧 = 𝑧. (5.11)

For a general base 𝑏 we have

(d/d𝑧) 𝑏^{𝑧} = 𝑏^{𝑧} log_e 𝑏,   (5.12)
and the extra term vanishes when 𝑏 = e, since loge e = 1. This explains why the
base e ≈ 2.718 is “natural”; it is the unique base for which the function 𝑏𝑧 is its
own derivative, without the extra term. Sometimes the notation ln is also used
for the natural logarithm: ln ≡ log𝑒 . Since 𝑏 = eln 𝑏 , the power series definition
(5.1) can be used to define the exponential of any base 𝑏 with respect to arbitrary
complex numbers 𝑧 using the formula
𝑏^{𝑧} = (e^{ln 𝑏})^{𝑧} = e^{𝑧 ln 𝑏}.   (5.13)

Problem 5.1. By assuming a generic power series expansion

𝑓 (𝑧) = ∑_{𝑛=0}^{∞} 𝑎𝑛 𝑧^{𝑛},   (5.14)

prove that if 𝑓 (𝑧) is its own derivative, then it must be the exponential function,
i.e. 𝑎𝑛 = 1/𝑛!.

Problem 5.2. The power series expansions of the trigonometric functions cos 𝑥
and sin 𝑥 are

cos 𝑥 ≡ ∑_{𝑛=0}^{∞} ((−1)^{𝑛}/(2𝑛)!) 𝑥^{2𝑛} = 1 − 𝑥²/2 + 𝑥⁴/4! + ⋯ ,   (5.15)

sin 𝑥 ≡ ∑_{𝑛=0}^{∞} ((−1)^{𝑛}/(2𝑛 + 1)!) 𝑥^{2𝑛+1} = 𝑥 − 𝑥³/3! + 𝑥⁵/5! + ⋯ .   (5.16)

Use them to prove Euler’s formula

ei 𝑥 = cos 𝑥 + i sin 𝑥. (5.17)

As a corollary, show that

cos 𝑥 = Re (e^{i 𝑥}) = (e^{i 𝑥} + e^{− i 𝑥})/2,   (5.18)

sin 𝑥 = Im (e^{i 𝑥}) = (e^{i 𝑥} − e^{− i 𝑥})/(2 i).   (5.19)
Problem 5.3. The binomial theorem states that for 𝑥, 𝑦 ∈ ℂ and 𝑛 ∈ ℕ:

(𝑥 + 𝑦)^{𝑛} = ∑_{𝑘=0}^{𝑛} (𝑛 𝑘) 𝑥^{𝑛−𝑘} 𝑦^{𝑘},   (5.20)

where the binomial coefficients are defined as

(𝑛 𝑘) ≡ 𝑛!/(𝑘! (𝑛 − 𝑘)!).   (5.21)

So, explicitly, we have

(𝑥 + 𝑦)^{𝑛} = 𝑥^{𝑛} + 𝑛𝑥^{𝑛−1} 𝑦 + ½ 𝑛 (𝑛 − 1) 𝑥^{𝑛−2} 𝑦² + ⋯ .   (5.22)
Using the binomial theorem and the power series definition of the exponential
(5.1), prove the equivalent definition

e^{𝑧} = lim_{𝑛→∞} (1 + 𝑧/𝑛)^{𝑛}.   (5.23)

5.1.2 Matrix and Operator Exponentials

To generalize the exponential to complex matrices 𝐴, we define the matrix
exponential:

e^{𝐴} ≡ ∑_{𝑛=0}^{∞} 𝐴^{𝑛}/𝑛! = 1 + 𝐴 + 𝐴²/2 + 𝐴³/3! + ⋯ ,   (5.24)

where 1 is now the identity matrix, and 𝐴𝑛 means the product of the matrix 𝐴
with itself 𝑛 times. Note that it satisfies

e0 = 1, (5.25)

that is, the exponential of the zero matrix is the identity matrix, in analogy with
the fact that the exponential of zero is one. It also satisfies, as you will prove in
problem 5.4,
(e^{𝐴})^{†} = e^{𝐴^{†}},   (5.26)

in analogy with equation (5.4).


Things start to become more complicated when we consider the product of two
exponentials, e^{𝐴} e^{𝐵}. For numbers (which can be considered 1 × 1 matrices) we
have e𝑧 e𝑤 = e𝑧+𝑤 , but to prove that, we used the fact that numbers commute.
For arbitrary 𝑛 × 𝑛 matrices 𝐴 and 𝐵, it is in general not true that e𝐴 e𝐵 = e𝐴+𝐵 .
However, in problem 5.5 you will prove that this identity is true if [𝐴, 𝐵] = 0, that
is, if 𝐴 and 𝐵 commute.
The matrix logarithm is the inverse function of the matrix exponential:

𝐵 = e𝐴 ⟺ 𝐴 = log 𝐵, elog 𝐴 = log e𝐴 = 𝐴. (5.27)

This is analogous to equation (5.10). However, although every complex number
has a logarithm [52], a matrix has a logarithm if and only if it is invertible. One of
the directions of this proof is easy: if a matrix 𝐵 has a logarithm 𝐴 = log 𝐵, then
we can write 𝐵 = e𝐴 , and then the inverse will be 𝐵−1 = e−𝐴 . We won’t prove the
other direction here.
Recall that we said that Hermitian matrices are analogous to real numbers, and
unitary matrices are analogous to complex numbers with unit norm. Since for real
𝜙 we have ∣ei 𝜙 ∣ = 1, we should expect that the exponential of i times a Hermitian
matrix will be a unitary matrix. Let 𝐻 be a Hermitian matrix and let 𝑡 ∈ ℝ be a
real number, and let us define [53]

𝑈 ≡ e− i 𝐻𝑡 . (5.29)

To prove that 𝑈 is unitary, let us take its adjoint:

𝑈^{†} = (e^{− i 𝐻𝑡})^{†} = e^{− i^{∗} 𝐻^{†} 𝑡^{∗}} = e^{i 𝐻𝑡},   (5.30)

since 𝐻^{†} = 𝐻 due to 𝐻 being Hermitian, 𝑡^{∗} = 𝑡 due to 𝑡 being real, and i^{∗} = − i.
Thus:
𝑈 𝑈 † = e− i 𝐻𝑡 ei 𝐻𝑡 = e0 = 1, (5.31)

so e− i 𝐻𝑡 is indeed unitary. Note that here we used the fact that 𝐻 commutes with
itself, and therefore the product of the exponentials is the exponential of the sum,
as we discussed above. In fact, since all unitary matrices are invertible, and all
invertible matrices have a logarithm, any unitary matrix 𝑈 can be written as e− i 𝐻
[52] In fact, every complex number has an infinite number of logarithms. The arbitrary complex
number 𝑧 = 𝑟 e^{i 𝜙} can also be written as 𝑧 = 𝑟 e^{i(𝜙+2𝜋𝑛)} for all integer 𝑛, since adding a multiple of
2𝜋 to the angle 𝜙 results in the same angle. Thus we have

log 𝑧 = log (𝑟 e^{i(𝜙+2𝜋𝑛)}) = log 𝑟 + log e^{i(𝜙+2𝜋𝑛)} = log 𝑟 + i (𝜙 + 2𝜋𝑛) ,   (5.28)

where 𝑛 can be any integer. Here we used the identity log (𝑎𝑏) = log 𝑎 + log 𝑏, which follows from
the identity e^{𝑧} e^{𝑤} = e^{𝑧+𝑤}.
[53] The minus sign here is a convention; the inverse matrix, e^{i 𝐻𝑡}, is of course unitary as well.

for some Hermitian matrix 𝐻, where

𝐻 = i log 𝑈 ⟹ 𝑈 = e− i 𝐻 . (5.32)

Finally, equation (5.9) generalizes to matrices as well:

(d/d𝑡) e^{𝐴𝑡} = 𝐴 e^{𝐴𝑡},   (5.33)
where 𝐴 is any constant complex matrix (i.e. independent of 𝑡).
Everything we described here was defined for matrices; however, it actually also
applies to general operators on any Hilbert space – and in the infinite­dimensional
case it is less convenient to think about operators as matrices, since those matri­
ces would be infinite­dimensional as well. The operator exponential is defined in
exactly the same way as the matrix exponential, with the identity matrix replaced
by the identity operator (which does not change the state it acts on), the power
𝐴^{𝑛} meaning the operator 𝐴 applied 𝑛 times, and so on.
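As a numerical illustration – a sketch assuming Python with NumPy, using the
diagonalization recipe you will prove in problem 5.6 below – we can exponentiate a
random Hermitian matrix and verify that e^{− i 𝐻𝑡} is indeed unitary:

    import numpy as np

    def expm_hermitian(H, t):
        # Compute e^{-iHt} for Hermitian H via its eigendecomposition:
        # H = V diag(evals) V^dagger, so e^{-iHt} = V diag(e^{-i evals t}) V^dagger.
        evals, V = np.linalg.eigh(H)
        return V @ np.diag(np.exp(-1j * evals * t)) @ V.conj().T

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    H = (A + A.conj().T) / 2                       # a random Hermitian matrix

    U = expm_hermitian(H, t=1.7)
    print(np.allclose(U @ U.conj().T, np.eye(3)))  # True: U is unitary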
Problem 5.4. Prove that (e^{𝐴})^{†} = e^{𝐴^{†}}.
Problem 5.5. Prove that if two matrices 𝐴 and 𝐵 commute, that is, [𝐴, 𝐵] = 0,
then
e𝐴 e𝐵 = e𝐴+𝐵 . (5.34)

One way to do this is using the power series definition (5.24).


Problem 5.6.
A. Prove that the exponential of a diagonal matrix with diagonal elements 𝜆𝑖 is a
diagonal matrix with diagonal elements e^{𝜆𝑖}:

exp (diag (𝜆1 , … , 𝜆𝑛 )) = diag (e^{𝜆1 }, … , e^{𝜆𝑛 }).   (5.35)

B. Prove that if 𝐵 is an invertible matrix, then for any matrix 𝐴:

e^{𝐵𝐴𝐵^{−1}} = 𝐵 e^{𝐴} 𝐵^{−1}.   (5.36)

C. Using (A) and (B), prove that if 𝐴 is diagonalizable, that is, 𝐴 = 𝑃 𝐷𝑃 −1 for
some matrix 𝑃 and a diagonal matrix 𝐷 (recall section 3.2.16), then

e^{𝐴} = 𝑃 e^{𝐷} 𝑃^{−1}.   (5.37)

This gives us a straightforward way to calculate the exponential of any
diagonalizable matrix (and in particular every normal matrix, since they are
always diagonalizable).

Problem 5.7. Find the matrix exponential e^{− i 𝜃𝜎𝑦 }, where 𝜎𝑦 is the Pauli matrix

𝜎𝑦 ≡ ⎛ 0   − i ⎞
     ⎝ i    0  ⎠ .   (5.38)

Does the result look familiar?

5.2 Continuous Time Evolution, Hamiltonians, and the Schrödinger Equation

When we described the Evolution Axiom, we only talked about evolution from one
discrete point in time to another. As a first step towards quantum mechanics of
continuous systems, let us discuss time evolution with a continuous time variable.

5.2.1 The Schrödinger Equation and Hamiltonians: Preface

Usually, in introductory quantum mechanics courses, the Schrödinger equation
is introduced at the very beginning – as a fundamental postulate, without any
explanation or motivation. The student is simply told that this is the equation
they are going to be working with, and typically, much of the rest of the course
consists of solving the Schrödinger equation for different systems.
In this course, I chose to do the exact opposite, and introduce the Schrödinger
equation only at the very end of the course. The reason is that the Schrödinger
equation is actually not a fundamental component of the modern 21st­century
formulation of quantum theory [54]. What is truly fundamental about quantum the­
ory is what we spent the majority of this course studying: the abstract formulation
of the theory in terms of Hilbert spaces, states, and operators, using the axioms
we have presented above. The Schrödinger equation turns out to be merely a
special case of the Evolution Axiom – which, as you recall, simply says that
quantum states evolve by the action of unitary operators.
The Evolution Axiom applies to any kind of evolution, whether in time or due
to some transformation performed on the system; and with regards to evolution
in time, the time variable can be either discrete or continuous. Quantum gates,
which are the main type of evolution we have seen so far, correspond to discrete
time evolution – the qubit is in some state now, and will be in another state after
it passes through the gate, but these are two discrete points in time, and nothing
of interest is happening in the gap between them.
In the specific case when the evolution is the system’s natural evolution in time
(so not a result of some explicit transformation, like a rotation) and with respect
[54] In fact, entire books have been written about fields such as quantum computation and quantum
gravity without mentioning the Schrödinger equation even once!

to a continuous time variable, it is useful in practice to replace the Evolution
Axiom, which is very abstract, with the Schrödinger equation, which is a concrete
differential equation that can be solved, either exactly or approximately, for a
variety of different systems. The focus then shifts from the unitary evolution
operator of the Evolution Axiom to a Hermitian operator called the Hamiltonian.
We will see below precisely how these two operators are related to each other.
To further illustrate the fact that the Evolution Axiom is more fundamental than
the Schrödinger equation, consider the fact that the Evolution Axiom is an almost
inevitable result of the mathematical framework of quantum theory – indeed, if
quantum states evolved with non­unitary operators, then probabilities would no
longer sum to 1, and the theory wouldn’t make any sense. While the Schrödinger
equation also preserves probabilities (as it must), this fact is not immediately
obvious from the form of the equation.

5.2.2 Derivation of the Schrödinger Equation

Let us recall the Evolution Axiom from section 4.5.1, with slightly different nota­
tion. If the system is in the state |Ψ (𝑡1 )⟩ at time 𝑡1 , and in another state |Ψ (𝑡2 )⟩
at time 𝑡2 , then the two states must be related by the action of some unitary
operator 𝑈 (𝑡2 ← 𝑡1 ):
|Ψ (𝑡2 )⟩ = 𝑈 (𝑡2 ← 𝑡1 ) |Ψ (𝑡1 )⟩ . (5.39)

The main difference between this formulation and the one we had for discrete
systems is that now we are letting 𝑈 be a continuous function of 𝑡1 and 𝑡2 , so
that we can encode the unitary evolution of the system from any point in time
to any other point in time. This is very different than what we discussed in the
discrete case, where for example, a quantum gate is not a function of time – it is
the same quantum gate at all times.
However, this is still just a special case of the Evolution Axiom; the axiom simply
states that evolution between any two points in time must be encoded in some
unitary operator, but it will in general be a different operator for different start
and end times, so here we are explicitly encoding the different operators as one
universal function 𝑈 (𝑡2 ← 𝑡1 ).
In equation (5.39), if we assume that 𝑡2 = 𝑡1 (that is, no time has passed) then
we get
|Ψ (𝑡1 )⟩ = 𝑈 (𝑡1 ← 𝑡1 ) |Ψ (𝑡1 )⟩ . (5.40)

Since this must be true for every state |Ψ (𝑡1 )⟩ and for every time 𝑡1 , we see [55]
that if no time has passed, 𝑈 (𝑡1 ← 𝑡1 ) must be the identity operator:

𝑈 (𝑡1 ← 𝑡1 ) = 1, ∀𝑡1 ∈ ℝ. (5.41)


[55] Recall problem 3.29!

Let us now assume that the system is in the state |Ψ (𝑡3 )⟩ at time 𝑡3 . Then from
equation (5.39) we must have on the one hand

|Ψ (𝑡3 )⟩ = 𝑈 (𝑡3 ← 𝑡1 ) |Ψ (𝑡1 )⟩ , (5.42)

but on the other hand

|Ψ (𝑡3 )⟩ = 𝑈 (𝑡3 ← 𝑡2 ) |Ψ (𝑡2 )⟩ = 𝑈 (𝑡3 ← 𝑡2 ) 𝑈 (𝑡2 ← 𝑡1 ) |Ψ (𝑡1 )⟩ . (5.43)

Therefore, 𝑈 must satisfy the composition property [56]:

𝑈 (𝑡3 ← 𝑡1 ) = 𝑈 (𝑡3 ← 𝑡2 ) 𝑈 (𝑡2 ← 𝑡1 ) , ∀𝑡1 , 𝑡2 , 𝑡3 ∈ ℝ. (5.44)

In particular, if 𝑡3 = 𝑡1 we get

1 = 𝑈 (𝑡1 ← 𝑡1 ) = 𝑈 (𝑡1 ← 𝑡2 ) 𝑈 (𝑡2 ← 𝑡1 ) . (5.45)

Therefore we must have

𝑈 (𝑡1 ← 𝑡2 ) = 𝑈 −1 (𝑡2 ← 𝑡1 ) = 𝑈 † (𝑡2 ← 𝑡1 ) , ∀𝑡1 , 𝑡2 ∈ ℝ, (5.46)

or in other words, evolution to the past is given by the adjoint (or inverse) of the
evolution to the future, as we discussed in section 4.5.1.
We now change notation slightly by taking 𝑡1 ↦ 𝑡0 and 𝑡2 ↦ 𝑡 in equation (5.39):

|Ψ (𝑡)⟩ = 𝑈 (𝑡 ← 𝑡0 ) |Ψ (𝑡0 )⟩ . (5.47)

For any arbitrary time 𝑡, the evolution of the system from a fixed time 𝑡0 is given
by this equation. Let us take the time derivative of the equation:

(d/d𝑡) |Ψ (𝑡)⟩ = (d𝑈 (𝑡 ← 𝑡0 )/d𝑡) |Ψ (𝑡0 )⟩ ,   (5.48)
where we consider |Ψ (𝑡0 )⟩ to be independent of 𝑡 since 𝑡0 is a fixed time. From
equation (5.47) we find, by multiplying both sides by 𝑈 † (𝑡 ← 𝑡0 ) from the left, that

|Ψ (𝑡0 )⟩ = 𝑈 † (𝑡 ← 𝑡0 ) |Ψ (𝑡)⟩ . (5.49)


[56] This property is the reason we used the notation 𝑈 (𝑡2 ← 𝑡1 ): we wanted the 𝑡2 in 𝑈 (𝑡2 ← 𝑡1 )
and the 𝑡2 in 𝑈 (𝑡3 ← 𝑡2 ) to be adjacent. If the times were arranged from left to right, we would
have had 𝑈 (𝑡2 → 𝑡3 ) 𝑈 (𝑡1 → 𝑡2 ) which does not make it clear that the operator on the left starts
when the operator on the right ends. Note that when applying operators to a ket, the operators
always act from right to left. So in 𝑈 (𝑡3 ← 𝑡2 ) 𝑈 (𝑡2 ← 𝑡1 ) |Ψ (𝑡1 )⟩ the operator 𝑈 (𝑡2 ← 𝑡1 ) acts on
the state first, to take it to 𝑡2 , and then 𝑈 (𝑡3 ← 𝑡2 ) acts on the result, to take it to 𝑡3 .

We plug that into equation (5.48) and find

(d/d𝑡) |Ψ (𝑡)⟩ = (d𝑈 (𝑡 ← 𝑡0 )/d𝑡) 𝑈^{†} (𝑡 ← 𝑡0 ) |Ψ (𝑡)⟩ ,   (5.50)

where the time derivative only acts on 𝑈 and not on 𝑈 † . Now, let us define a new
operator 𝐻 called the Hamiltonian as follows:

𝐻 (𝑡) ≡ i (d𝑈 (𝑡 ← 𝑡0 )/d𝑡) 𝑈^{†} (𝑡 ← 𝑡0 ) .   (5.51)
Note that 𝐻 can in general be a function of 𝑡, but it is independent of 𝑡0 , which
is why we called it 𝐻 (𝑡) and not 𝐻 (𝑡, 𝑡0 ) or 𝐻 (𝑡 ← 𝑡0 ). Also, the Hamiltonian is
Hermitian. You will prove both of these facts in problem 5.8.
In terms of the Hamiltonian, equation (5.50) becomes

i (d/d𝑡) |Ψ (𝑡)⟩ = 𝐻 (𝑡) |Ψ (𝑡)⟩ .   (5.52)

This equation is called the Schrödinger equation [57,58].

Problem 5.8.
A. Prove that 𝐻 (𝑡) as defined in equation (5.51) is independent of 𝑡0 , thus jus­
tifying the notation 𝐻 (𝑡), as well as its use in the Schrödinger equation (5.52),
where 𝑡 is the only variable.
B. Prove that 𝐻 (𝑡) is a Hermitian operator.
[57] In non­natural units, this equation features the reduced Planck constant ℏ:

i ℏ (d/d𝑡) |Ψ (𝑡)⟩ = 𝐻 (𝑡) |Ψ (𝑡)⟩ .   (5.53)
Of course, as we discussed in section 4.1.1, ℏ is dimensionful and therefore its numerical value
doesn’t matter, so we can just choose units such as the Planck units, where it simply has the value
ℏ ≡ 1.
[58] In the Schrödinger equation, a time derivative d/d𝑡 is acting on the state |Ψ (𝑡)⟩. Therefore,
one might wonder whether d/d𝑡 is an operator on the Hilbert space. However, the answer is no.
This is because here we are dealing with non­relativistic quantum mechanics, and non­relativistic
theories – both classical and quantum – treat space and time differently: while 𝑥 is an operator
(as we will see below), 𝑡 is just a label. See also footnote (68).
In this section we defined a function |Ψ(𝑡)⟩, which takes some real number 𝑡 as input, and returns
some state in the Hilbert space as output. The derivative d/d𝑡 doesn’t act on the vectors in the
Hilbert space, which is what operators do; instead, it acts on this function. Therefore, d/d𝑡 is not
an operator on the Hilbert space.
To illustrate this further, consider a system with a finite Hilbert space, such as a qubit. We can
define a function |Ψ(𝑡)⟩ which returns a particular state of the qubit given a particular point 𝑡 in
time. Then d/d𝑡 would be the derivative of that function with respect to time. But as we have seen,
operators on finite Hilbert spaces take the form of matrices acting on vectors in the space. d/d𝑡 is
not a matrix, so it is not an operator on the Hilbert space – it’s just a derivative with respect to a
label.

5.2.3 Time­Independent Hamiltonians

Let us now assume that the Hamiltonian is constant, that is, time­independent.
Although in some quantum systems the Hamiltonian does depend on time, this is
not very common; most quantum systems have time­independent Hamiltonians.
We can rewrite equation (5.51) as follows:

d𝑈 (𝑡 ← 𝑡0 )/d𝑡 = − i 𝐻 𝑈 (𝑡 ← 𝑡0 ) .   (5.54)
Compare this with equation (5.33):

(d/d𝑡) e^{𝐴𝑡} = 𝐴 e^{𝐴𝑡},   (5.55)
which we derived assuming that 𝐴 is constant. If the Hamiltonian 𝐻 is constant,
then 𝐴 ≡ − i 𝐻 is also constant. In addition, we can replace 𝑡 with 𝑡 − 𝑡0 in the
exponent, since that does not change the derivative (because 𝑡0 is constant).
Hence, we see that the solution [59] to the differential equation (5.54) is

𝑈 (𝑡 ← 𝑡0 ) ≡ e− i 𝐻(𝑡−𝑡0 ) . (5.56)

In other words, if we take 𝑈 (𝑡 ← 𝑡0 ) ≡ e− i 𝐻(𝑡−𝑡0 ) , then its time derivative will be
− i 𝐻𝑈 (𝑡 ← 𝑡0 ), and thus it will satisfy equation (5.54). Don’t get confused by the
notation: 𝐻 (𝑡 − 𝑡0 ) in the exponent is the constant 𝐻 times the number 𝑡 − 𝑡0 ,
not the function 𝐻 evaluated at 𝑡 − 𝑡0 !
The reason we wrote 𝑡 − 𝑡0 instead of 𝑡 in the exponential is that from equa­
tion (5.41) we have the initial condition

𝑈 (𝑡0 ← 𝑡0 ) = 1. (5.57)

This condition is indeed satisfied for 𝑈 as defined in equation (5.56), since

𝑈 (𝑡0 ← 𝑡0 ) = e− i 𝐻(𝑡0 −𝑡0 ) = e0 = 1, (5.58)

by equation (5.25). However, it would not be satisfied if we just wrote e− i 𝐻𝑡 , since


then we would have 𝑈 (𝑡0 ← 𝑡0 ) = e− i 𝐻𝑡0 ≠ 1. In general, when solving differential
equations, the solution always depends on the initial (or boundary) conditions.
We can rewrite equation (5.56) to match the notation in the beginning of sec­
tion 5.2.2 as follows:
𝑈 (𝑡2 ← 𝑡1 ) ≡ e− i 𝐻(𝑡2 −𝑡1 ) . (5.59)
[59] Even if the Hamiltonian is time­dependent, it is still possible to solve the differential equation
(5.54); however, the solution is then much more complicated and involves time­ordered exponen­
tials, which we will not cover in this course.

The evolution operator between any two arbitrary points in time, 𝑡1 and 𝑡2 , is given
by equation (5.59).
It is interesting that, since 𝐻 is constant, the unitary evolution operator is not a
function of both 𝑡1 and 𝑡2 , but only the difference between them, 𝑡2 − 𝑡1 . So for
example, the evolution from time 𝑡1 = 3 to time 𝑡2 = 4 and from time 𝑡1 = 4 to time
𝑡2 = 5 will be given by the same unitary operator, e− i 𝐻 , since in both cases the
time difference is 𝑡2 − 𝑡1 = 1.
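A quick numerical sketch of this – assuming Python with NumPy; the choice 𝐻 = 𝜎𝑥
is arbitrary – checks that the evolution operator (5.59) depends only on the time
difference, preserves the norm of the state, and satisfies the composition
property (5.44):

    import numpy as np

    sx = np.array([[0, 1], [1, 0]])  # take H = sigma_x as an example Hamiltonian

    def U(t2, t1, H=sx):
        # e^{-iH(t2-t1)} via the eigendecomposition of the Hermitian matrix H
        evals, V = np.linalg.eigh(H)
        return V @ np.diag(np.exp(-1j * evals * (t2 - t1))) @ V.conj().T

    psi = np.array([1, 0], dtype=complex)           # start in |0>
    print(np.allclose(U(4, 3), U(5, 4)))            # True: same time difference
    print(np.linalg.norm(U(4, 3) @ psi))            # 1.0: norm is preserved
    print(np.allclose(U(5, 1), U(5, 3) @ U(3, 1)))  # True: composition property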

Problem 5.9. For the unitary operator defined in equation (5.59):


A. Verify that it satisfies the composition property (5.44).
B. Verify that it is invariant under the time shift transformation

𝑡1 ↦ 𝑡1 + 𝑡, 𝑡2 ↦ 𝑡2 + 𝑡, (5.60)

where 𝑡 ∈ ℝ.
C. Verify that under a time­reversal transformation

𝑡1 ↦ −𝑡1 , 𝑡2 ↦ −𝑡2 , (5.61)

the evolution operator is replaced with its adjoint (or inverse). Thus the evolution
equation (5.39) is invariant under time reversal if we also replace 𝑈 by its adjoint.
This is an explicit example of the time­reversal symmetry of quantum mechanics,
which we discussed in section 4.5.1.

Problem 5.10. In section 4.5.2 we discussed several unitary operators which act
on qubits. For example, the quantum 𝑍 gate is given by the Pauli matrix 𝜎𝑧

𝑍 ≡ 𝜎𝑧 = ⎛ 1    0 ⎞
         ⎝ 0   −1 ⎠ ,   (5.62)

and its action is to leave |0⟩ unchanged but flip the phase of |1⟩. Find the Hamil­
tonian corresponding to this unitary evolution operator. Since this is a discrete
evolution, the time coordinate is discrete and not continuous, and we can take
the time interval to be 1. In other words, you need to find the 𝐻 in the equation
𝑍 = e− i 𝐻 .

5.2.4 Hamiltonians and Energy

In problem 5.8, you proved that the Hamiltonian is a Hermitian operator. There­
fore, it should correspond to an observable. Indeed, it does; this observable is
the energy of the system. Its (real) eigenvalues 𝐸𝑖 correspond to energy eigenstates
|𝐸𝑖 ⟩ which, as usual, make up an orthonormal basis [60]:

𝐻 |𝐸𝑖 ⟩ = 𝐸𝑖 |𝐸𝑖 ⟩ . (5.63)

This is often referred to as the time­independent Schrödinger equation, but it’s
really just an eigenvalue equation!
The basis eigenstate |𝐸𝑖 ⟩ corresponds to a measurement of 𝐸𝑖 for the energy.
There will always be a state of lowest energy, that is, a state |𝐸0 ⟩ for which the
eigenvalue 𝐸0 is the lowest among all the eigenvalues:

𝐸0 < 𝐸 𝑖 , ∀𝑖 > 0. (5.64)

Such a state is called the ground state.


As we have seen, the Hamiltonian is used to evolve continuous systems in time.
What does energy have to do with time, you ask? Well, from relativity we know
that momentum in spacetime is described by a 4­vector called the 4­momentum,
whose components along the (𝑡, 𝑥, 𝑦, 𝑧) directions are defined as follows:

p ≡ (𝐸, 𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ) .   (5.65)

Here, 𝑝𝑥 , 𝑝𝑦 , and 𝑝𝑧 are the momenta in the 𝑥, 𝑦, and 𝑧 directions respectively. In
the first component, which is the one in the time direction, we have the energy 𝐸.
Thus energy is actually “momentum in the time direction”! Indeed, in relativity
we will often write 𝑝0 for the energy. Just like momentum moves you in space, so
does energy move you in time. This is exactly why the Hamiltonian is responsible
for evolution in time. It is also why Hamiltonians are usually time­independent –
if they are not, then energy is not conserved!

5.3 Hamiltonian Mechanics and Canonical Quantization

We have seen that in order to create a model for a specific physical system in
quantum theory, we must choose a specific Hilbert space with specific states
and specific operators. But how do we know which Hilbert space, states, and
operators to use for a given physical system? This is often a hard question to
answer. For example, we currently do not have a consistent and experimentally
verified quantum model for general relativity; the problem of finding such a model
is known as quantum gravity, and it is one of the hardest problems in physics.
[60] Here we used slightly different notation than usual, with the basis eigenstates being |𝐸𝑖 ⟩ instead
of |𝐵𝑖 ⟩ and the eigenvalues being 𝐸𝑖 instead of 𝜆𝑖 – compare equation (3.142).

Luckily, it turns out that there is a certain prescription that allows us to take a
classical theory and turn it into a quantum theory in a straightforward way. The
properties of the classical theory will dictate the type of Hilbert space, states, and
operators we should use in the corresponding quantum theory. This process is
known as quantization. It doesn’t work for every classical theory; for example,
it doesn’t work for general relativity, which is why quantizing gravity is so hard.
However, it does work, in an experimentally verifiable way, for most classical
theories of interest.

5.3.1 A Quick Review of Classical Hamiltonian Mechanics

Classical mechanics can be reformulated using a quantity called the (classical)
Hamiltonian. This is basically the total energy of the system, usually written as
kinetic energy plus potential energy and in terms of the canonical coordinates
𝑞 and 𝑝. Here we will consider the case where 𝑞 and 𝑝 represent position and
momentum respectively, and therefore we will label them 𝑥 and 𝑝 instead.
The phase space of the system consists of all the possible values of the canonical
coordinates; for a particle, the phase space includes both “actual” space (all the
values of 𝑥) and momentum space (all the values of 𝑝).
Since we have limited time, and we are interested in quantum mechanics and
not classical mechanics, we will not go over the Hamiltonian formulation in detail.
Instead, we will just review certain important definitions and results.
The Hamiltonian is generally of the form

𝐻 = 𝑇 (𝑝) + 𝑉 (𝑥) , (5.66)

where 𝑇 is the kinetic energy, which depends only on the momentum 𝑝, and 𝑉 is
the potential energy, which depends only on the position 𝑥.
Let us consider the specific case of a single particle of mass 𝑚. In Newtonian
mechanics, the particle’s momentum is defined as

𝑝 ≡ 𝑚𝑣,   where 𝑣 ≡ 𝑥̇ ≡ d𝑥/d𝑡.   (5.67)

The kinetic energy is defined as ½ 𝑚𝑣², and we can write it in terms of the
momentum as follows:

𝑇 = ½ 𝑚𝑣² = (𝑚𝑣)²/2𝑚 = 𝑝²/2𝑚.   (5.68)
We conclude that for a particle of mass 𝑚, the Hamiltonian will generally be of
the form
𝐻 = 𝑝²/2𝑚 + 𝑉 (𝑥) .   (5.69)

The kinetic energy of a particle will always be 𝑝2 /2𝑚, but the potential energy 𝑉 (𝑥)
depends on the forces acting on the particle, such as gravity or electromagnetism.
Now, let us define the Poisson brackets of two functions 𝑓, 𝑔 of position 𝑥 and
momentum 𝑝 as follows:
{𝑓, 𝑔} ≡ (𝜕𝑓/𝜕𝑥)(𝜕𝑔/𝜕𝑝) − (𝜕𝑔/𝜕𝑥)(𝜕𝑓/𝜕𝑝) .   (5.70)
In problem 5.11 you will prove some properties of these brackets; in particular
they are anti­symmetric, {𝑔, 𝑓} = − {𝑓, 𝑔} which means that {𝑓, 𝑓} = 0 for any 𝑓.
For 𝑥 and 𝑝 themselves we have

{𝑥, 𝑥} = {𝑝, 𝑝} = 0, (5.71)

and
{𝑥, 𝑝} = (𝜕𝑥/𝜕𝑥)(𝜕𝑝/𝜕𝑝) − (𝜕𝑝/𝜕𝑥)(𝜕𝑥/𝜕𝑝) = 1,   (5.72)
since 𝑥 and 𝑝 are assumed to be independent variables, so their derivatives with
respect to each other vanish. Even though in Newtonian mechanics we define
the momentum to be 𝑝 ≡ 𝑚𝑥,̇ in Hamiltonian mechanics we “forget” about this
relation and just assume that 𝑥 and 𝑝 are two completely independent degrees of
freedom of the system, thus generalizing the concept of momentum to any kind
of system.
The dynamics of the system in Hamiltonian mechanics are determined as follows.
If 𝐴 is any function of 𝑥 and 𝑝, then its time derivative is given by [61]

𝐴̇ ≡ d𝐴/d𝑡 = {𝐴, 𝐻} .   (5.74)
For 𝑥 and 𝑝 themselves, we get

𝑥̇ ≡ d𝑥/d𝑡 = {𝑥, 𝐻} = (𝜕𝑥/𝜕𝑥)(𝜕𝐻/𝜕𝑝) − (𝜕𝐻/𝜕𝑥)(𝜕𝑥/𝜕𝑝) = 𝜕𝐻/𝜕𝑝,   (5.75)

𝑝̇ ≡ d𝑝/d𝑡 = {𝑝, 𝐻} = (𝜕𝑝/𝜕𝑥)(𝜕𝐻/𝜕𝑝) − (𝜕𝐻/𝜕𝑥)(𝜕𝑝/𝜕𝑝) = − 𝜕𝐻/𝜕𝑥.   (5.76)
In other words, the evolution of each parameter depends on the derivative of the
Hamiltonian with respect to the other parameter. Equations (5.75) and (5.76)
are called Hamilton’s equations.
[61] Here we are assuming that 𝐴 does not depend on 𝑡 explicitly, but only implicitly via its
dependence on 𝑥 and 𝑝. If 𝐴 does have explicit dependence on 𝑡, then this equation becomes

d𝐴/d𝑡 = {𝐴, 𝐻} + 𝜕𝐴/𝜕𝑡.   (5.73)

For a point particle with the Hamiltonian (5.69), we get

𝑥̇ ≡ d𝑥/d𝑡 = (𝜕/𝜕𝑝) (𝑝²/2𝑚 + 𝑉 (𝑥)) = 𝑝/𝑚,   (5.77)

𝑝̇ ≡ d𝑝/d𝑡 = − (𝜕/𝜕𝑥) (𝑝²/2𝑚 + 𝑉 (𝑥)) = − 𝜕𝑉 (𝑥)/𝜕𝑥.   (5.78)
The first equation relates the two independent variables 𝑥 and 𝑝 to each other:
𝑝 = 𝑚𝑥.̇ Of course, this is just the definition of the momentum of a particle in
Newtonian mechanics, but Hamiltonian mechanics allows us to consider more
general systems and define a generalized momentum for any kind of system. For
example, in a rotating system 𝑝 will be the angular momentum, and so on.
The second equation is Newton’s second law: the time derivative of momentum
is the force, and the force is given by minus the derivative of the potential [62]. We
can take the derivative of (5.77) and plug (5.78) into it to get

𝑥̈ ≡ d²𝑥/d𝑡² = d𝑥̇/d𝑡 = (1/𝑚) d𝑝/d𝑡 = − (1/𝑚) 𝜕𝑉 (𝑥)/𝜕𝑥.   (5.80)
Multiplying by 𝑚, we get the familiar form of Newton’s law:

𝐹 = 𝑚𝑎 = 𝑚𝑥̈ ≡ 𝑚 d²𝑥/d𝑡² = − 𝜕𝑉 (𝑥)/𝜕𝑥,   (5.81)
where 𝑎 is the acceleration.
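These definitions are easy to experiment with symbolically; here is a small sketch
assuming Python with SymPy, checking equation (5.72) and reproducing Hamilton’s
equations (5.77) and (5.78) for the Hamiltonian (5.69):

    import sympy as sp

    x, p, m = sp.symbols('x p m', real=True)
    V = sp.Function('V')(x)          # an arbitrary potential V(x)

    def poisson(f, g):
        # The Poisson bracket (5.70): df/dx dg/dp - dg/dx df/dp
        return sp.diff(f, x) * sp.diff(g, p) - sp.diff(g, x) * sp.diff(f, p)

    H = p**2 / (2 * m) + V           # the Hamiltonian (5.69)
    print(poisson(x, p))             # 1, as in equation (5.72)
    print(poisson(x, H))             # p/m, as in equation (5.77)
    print(poisson(p, H))             # -Derivative(V(x), x), as in equation (5.78)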

Problem 5.11. Prove the following properties of the Poisson brackets:

• Anti­symmetry: For all functions 𝑓, 𝑔

{𝑓, 𝑔} = − {𝑔, 𝑓} . (5.82)

• Linearity: For all functions 𝑓, 𝑔, ℎ and numbers 𝛼, 𝛽

{𝛼𝑓 + 𝛽𝑔, ℎ} = 𝛼 {𝑓, ℎ} + 𝛽 {𝑔, ℎ} . (5.83)

• Leibniz rule: For all functions 𝑓, 𝑔, ℎ

{𝑓𝑔, ℎ} = 𝑓 {𝑔, ℎ} + {𝑓, ℎ} 𝑔. (5.84)


[62] Here we are working in one spatial dimension, for simplicity. In the 3­dimensional case, the
force is minus the gradient of the potential:

F = −∇𝑉 = − (𝜕𝑉/𝜕𝑥, 𝜕𝑉/𝜕𝑦, 𝜕𝑉/𝜕𝑧) .   (5.79)

• Jacobi identity:

{𝑓, {𝑔, ℎ}} + {𝑔, {ℎ, 𝑓}} + {ℎ, {𝑓, 𝑔}} = 0. (5.85)

5.3.2 Canonical Quantization

Recall the definition of the expectation value for the measurement of an observ­
able 𝐴 when the system is in the state |Ψ⟩:

⟨𝐴⟩ ≡ ⟨Ψ|𝐴|Ψ⟩. (5.86)

Let us take the time derivative of this, assuming that the state |Ψ⟩ depends on
time but the observable 𝐴 doesn’t (which is usually the case):

d⟨𝐴⟩/d𝑡 = ((d/d𝑡) ⟨Ψ|) 𝐴|Ψ⟩ + ⟨Ψ|𝐴 ((d/d𝑡) |Ψ⟩) .   (5.87)

By the Schrödinger equation (5.52), we have

(d/d𝑡) |Ψ⟩ = − i 𝐻 |Ψ⟩ .   (5.88)
We can take the adjoint of this equation to get (remember that 𝐻 is Hermitian so
𝐻 = 𝐻 †)
(d/d𝑡) ⟨Ψ| = i ⟨Ψ| 𝐻.   (5.89)
Plugging into equation (5.87), we get

d⟨𝐴⟩/d𝑡 = i⟨Ψ|𝐻𝐴|Ψ⟩ − i⟨Ψ|𝐴𝐻|Ψ⟩
= − i⟨Ψ| (𝐴𝐻 − 𝐻𝐴) |Ψ⟩
= − i⟨Ψ| [𝐴, 𝐻] |Ψ⟩
= − i ⟨[𝐴, 𝐻]⟩ .

Comparing this with equation (5.74),

d𝐴/d𝑡 = {𝐴, 𝐻} ,   (5.90)
we find a very interesting result: the quantum expectation value of the observ­
able 𝐴 evolves in time just as classical Hamiltonian mechanics predicts, provided
we relate the Poisson brackets of functions and the commutator of operators as
follows:
[𝐴, 𝐻] ≡ i {𝐴, 𝐻} , (5.91)

or more generally for any two observables 𝐴 and 𝐵,

[𝐴, 𝐵] ≡ i {𝐴, 𝐵} . (5.92)

This is called the canonical commutation relation.


Equation (5.92) makes sense, because in problem 5.11 you proved some proper­
ties of Poisson brackets, and these properties also happen to be satisfied by the
commutator, as you proved in problems 4.29, 4.30, 4.32, and 4.33!
In particular, for 𝑥 and 𝑝 themselves, according to equation (5.72) we have {𝑥, 𝑝} =
1, so in the quantum theory we will have [63]

[𝑥, 𝑝] = i . (5.93)

What we have derived (or at least, motivated) here is called canonical quantiza­
tion. Given a classical system described by a Hamiltonian, we can turn it into a
quantum system – quantize it – by “promoting” classical functions on the phase
space, including the variables 𝑥 and 𝑝 themselves, to Hermitian operators. We
are not provided with any specific information about these operators, except that
they are Hermitian (which they must be, since in classical physics all variables
are real!) and that the quantum commutators should be related to the classical
Poisson brackets according to the prescription in equation (5.92).
These Hermitian operators now represent observables in the quantum theory;
they have eigenstates and eigenvalues which represent possible measurement
outcomes. This means that the values of 𝑥 and 𝑝 are no longer uniquely deter­
mined from some initial conditions, as in the classical theory; they become prob­
abilistic. In addition, the time evolution of the system is no longer described by
Hamilton’s equations, but rather, by the Schrödinger equation.
Note that what we did here does not constitute a proof that all classical theories
are related to quantum theories in this way. Canonical quantization merely en­
sures that expectation values of the observables in the quantum theory evolve in
time in the same way as the observables in the classical theory, which is some­
thing that we expect to be true, but it is not by itself a sufficient condition for cre­
ating a sensible quantum theory. Indeed, there are known cases where canonical
quantization doesn’t quite work, or is at least ambiguous, because two Poisson
brackets which in the classical theory are equal to each other will have different
values in the quantum theory, generating an inconsistency [64].
Nevertheless, canonical quantization works incredibly well in the vast majority of
[63] With ℏ, this equation will take the form [𝑥, 𝑝] = i ℏ.
[64] There are better ways than canonical quantization to turn classical theories into quantum the­
ories, the most popular being path integral quantization. However, these alternative quantization
methods generally require much more advanced math, so we will only discuss canonical quantiza­
tion in this course.

cases – and indeed, most classical theories, from a single point particle to very
complicated systems with many different particles and forces, can be quantized
in this way, and the results have been verified experimentally to high precision!
Just as in the case of the Schrödinger equation, in introductory quantum mechan­
ics courses canonical quantization is usually just presented as an arbitrary axiom.
I hope I managed to motivate it and give you some intuition as to why classical
and quantum theories are related in this way.

5.4 The Harmonic Oscillator

5.4.1 The Classical Harmonic Oscillator

The quantum harmonic oscillator is the quantization of the classical harmonic
oscillator, and just like the qubit, it is a reasonably simple quantum system which
turns out to describe many different realistic physical systems, either exactly or
approximately. This makes sense, because the classical harmonic oscillator itself
describes or approximates many different classical systems!
In particular, quantum harmonic oscillators form the basis of quantum field the­
ory, which is the theory describing all of the known elementary particles, including
matter particles (such as electrons and quarks), particles which mediate funda­
mental interactions (such as photons, which mediate the electromagnetic force,
and gluons, which mediate the strong nuclear force), and others (such as the
Higgs boson).
The (simple) classical harmonic oscillator has the Hamiltonian

𝐻 = 𝑝²/2𝑚 + ½ 𝑚𝜔²𝑥².   (5.94)

We have the standard kinetic energy term 𝑇 (𝑝) = 𝑝2 /2𝑚, where 𝑚 is the mass of
the particle, and the potential energy

𝑉 (𝑥) ≡ ½ 𝑚𝜔²𝑥²,   (5.95)
where 𝜔 is a numerical constant called the frequency or angular frequency, because
it represents the frequency at which the oscillator oscillates.
It is easy to find the equations of motion using Hamilton’s equations (5.75) and
(5.76). Alternatively, since this is a particle with a Hamiltonian of the standard
form (5.69), we can just use Newton’s second law (5.81) directly:

d²𝑥/d𝑡² = − (1/𝑚) 𝜕𝑉 (𝑥)/𝜕𝑥 = − (1/𝑚) (𝜕/𝜕𝑥) (½ 𝑚𝜔²𝑥²) = −𝜔²𝑥.   (5.96)

To solve this differential equation, we can use the fact that

(d/d𝑡) cos 𝑡 = − sin 𝑡,   (d/d𝑡) sin 𝑡 = cos 𝑡,   (5.97)

which means that

(d²/d𝑡²) cos 𝑡 = − (d/d𝑡) sin 𝑡 = − cos 𝑡.   (5.98)
If we replace 𝑡 by 𝜔𝑡 + 𝜙, where both 𝜔 and 𝜙 are constant (independent of 𝑡), then
since
(d/d𝑡) (𝜔𝑡 + 𝜙) = 𝜔,   (5.99)
we get, by the chain rule, that each derivative generates a factor of 𝜔, so

(d²/d𝑡²) cos (𝜔𝑡 + 𝜙) = −𝜔 ((d/d𝑡) sin (𝜔𝑡 + 𝜙)) = −𝜔² cos (𝜔𝑡 + 𝜙) .   (5.100)
Therefore, this differential equation has the solution:

𝑥 (𝑡) = 𝐴 cos (𝜔𝑡 + 𝜙) , (5.101)

where the integration constants 𝐴 and 𝜙 are real numbers determined by the initial
conditions. Now we see why this is called a harmonic oscillator: the position of
the particle oscillates repeatedly between +𝐴 and −𝐴 over time.
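As a sanity check – a minimal numerical sketch, assuming Python with NumPy – we
can integrate Hamilton’s equations (5.77)–(5.78) for this potential with small time
steps and compare against the analytic solution (5.101):

    import numpy as np

    m, omega = 1.0, 2.0
    x, p = 1.0, 0.0              # initial conditions A = 1, phi = 0
    dt, steps = 1e-5, 100_000    # integrate up to t = 1

    for _ in range(steps):
        p -= dt * m * omega**2 * x   # p_dot = -dV/dx (semi-implicit Euler step)
        x += dt * p / m              # x_dot = p/m

    print(x)                         # approximately cos(2.0)
    print(np.cos(omega * dt * steps))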

Problem 5.12. Prove that the most general solution for the classical harmonic
oscillator can also be written as

𝑥 (𝑡) = 𝐵 cos (𝜔𝑡) + 𝐶 sin (𝜔𝑡) , (5.102)

where 𝐵 and 𝐶 are integration constants, or as

𝑥 (𝑡) = 𝐷 ei 𝜔𝑡 +𝐸 e− i 𝜔𝑡 , (5.103)

where 𝐷 and 𝐸 are integration constants. All of these solutions are equivalent;
find the relationships between the integration constants {𝐴, 𝜙}, {𝐵, 𝐶}, and {𝐷, 𝐸}
– that is, write each pair in terms of another pair.

Problem 5.13. As an example of solving the equation of motion for specific initial
conditions, if the particle starts at time 𝑡 = 0 at position 𝑥 (0) = 1 with velocity
𝑥̇ (0) = 0, then we have

𝑥̇ (0) = −𝜔𝐴 sin 𝜙 = 0 ⟹ 𝜙 = 0, (5.104)

𝑥 (0) = 𝐴 = 1 ⟹ 𝐴 = 1,   (5.105)

and thus the solution is
𝑥 (𝑡) = cos (𝜔𝑡) . (5.106)

Similarly, find a solution for the classical harmonic oscillator with the initial con­
ditions 𝑥 (0) = 0 and 𝑥̇ (0) = 𝜔.

Problem 5.14. By plugging the general solution (5.101) into the Hamiltonian
(5.94), show that the total energy of the system is

𝐻 = ½ 𝑚𝜔²𝐴².   (5.107)
Thus the Hamiltonian is time­independent, and energy is conserved.

5.4.2 Quantizing the Harmonic Oscillator

Let us now quantize the simple harmonic oscillator by promoting 𝑥 and 𝑝 to opera­
tors. We are interested in finding the energy eigenstates of this quantum system.
Instead of finding them by solving a differential equation, we will use an easier
and more intuitive method. We define the ladder operators:

𝑎 = √(𝑚𝜔/2) (𝑥 + i 𝑝/𝑚𝜔) ,   𝑎† = √(𝑚𝜔/2) (𝑥 − i 𝑝/𝑚𝜔) ,   (5.108)

where 𝑎† is called the creation operator and 𝑎 is called the annihilation operator.
Notice that 𝑎† is indeed the adjoint of 𝑎, since the numbers 𝑚, 𝜔 are real and the
operators 𝑥, 𝑝 are Hermitian. These definitions may be inverted to get the position
and momentum operators in terms of the ladder operators:

𝑥 = √(1/2𝑚𝜔) (𝑎† + 𝑎) ,   𝑝 = i √(𝑚𝜔/2) (𝑎† − 𝑎) .   (5.109)

Now, notice that

𝜔𝑎† 𝑎 = 𝜔 √(𝑚𝜔/2) (𝑥 − i 𝑝/𝑚𝜔) ⋅ √(𝑚𝜔/2) (𝑥 + i 𝑝/𝑚𝜔)
= ½ 𝑚𝜔² (𝑥 − i 𝑝/𝑚𝜔) (𝑥 + i 𝑝/𝑚𝜔)
= ½ 𝑚𝜔² (𝑥² + (i/𝑚𝜔) 𝑥𝑝 − (i/𝑚𝜔) 𝑝𝑥 − (i 𝑝/𝑚𝜔)²)
= ½ 𝑚𝜔² (𝑝²/𝑚²𝜔² + 𝑥² + (i/𝑚𝜔) [𝑥, 𝑝])
= 𝑝²/2𝑚 + ½ 𝑚𝜔²𝑥² + ½ i 𝜔 [𝑥, 𝑝] .

Recall that in the classical theory we have {𝑥, 𝑝} = 1, so in the quantum theory we
have [𝑥, 𝑝] = i. Therefore:

𝜔𝑎† 𝑎 = 𝑝²/2𝑚 + ½ 𝑚𝜔²𝑥² − ½ 𝜔.   (5.110)
Comparing this to the Hamiltonian operator (5.94):

𝐻 = 𝑝²/2𝑚 + ½ 𝑚𝜔²𝑥²,   (5.111)
we see that we can write
𝐻 = 𝜔 (𝑎† 𝑎 + ½) .   (5.112)
Finally, we define a new operator called the number operator:

𝑁 = 𝑎† 𝑎. (5.113)

Now the Hamiltonian may be written as

𝐻 = 𝜔 (𝑁 + ½) .   (5.114)

The Hamiltonian has been simplified considerably! Since both 𝜔 and 1/2 are just
numbers, the problem of finding the eigenvalues and eigenstates of 𝐻 now re­
duces to finding the eigenvalues and eigenstates of 𝑁 .

Problem 5.15.
A. Show that 𝑁 is Hermitian.
B. Show that if |𝑛⟩ is an eigenstate of 𝑁 with the eigenvalue 𝑛, that is,

𝑁 |𝑛⟩ = 𝑛 |𝑛⟩ , (5.115)

then |𝑛⟩ is also an eigenstate of 𝐻 with the eigenvalue 𝜔 (𝑛 + ½).


C. Calculate, using the canonical commutation relation [𝑥, 𝑝] = i, the following
commutators:
[𝑎, 𝑎† ] = 1, [𝑁 , 𝑎† ] = 𝑎† , [𝑁 , 𝑎] = −𝑎. (5.116)

5.4.3 The Energy Eigenstates of the Harmonic Oscillator

Let |𝑛⟩ be an eigenstate of 𝑁 with eigenvalue 𝑛:

𝑁 |𝑛⟩ = 𝑛 |𝑛⟩ . (5.117)

Since 𝑁 is Hermitian, we know that 𝑛 must be a real number. Let us calculate the
expectation value of the observable 𝑁 with respect to the eigenstate |𝑛⟩:
⟨𝑁 ⟩𝑛 = ⟨𝑛|𝑁 |𝑛⟩ = ⟨𝑛|𝑎† 𝑎|𝑛⟩ = ‖𝑎𝑛‖² ,   (5.118)

where we used the fact that ⟨𝑛| 𝑎† is the bra of 𝑎 |𝑛⟩. On the other hand, we have

⟨𝑁 ⟩𝑛 = ⟨𝑛|𝑁 |𝑛⟩ = 𝑛⟨𝑛|𝑛⟩ = 𝑛, (5.119)

where we used equation (5.117) and the fact that the state |𝑛⟩ is normalized to
1. By comparing the two equations, we see that
𝑛 = ‖𝑎𝑛‖² ≥ 0,   (5.120)

that is, 𝑛 is not only real but also non­negative.


Next, we act with 𝑁 𝑎 and 𝑁 𝑎† on |𝑛⟩. In problem 5.15 you showed that

𝑁 𝑎 − 𝑎𝑁 = [𝑁 , 𝑎] = −𝑎, (5.121)

𝑁 𝑎† − 𝑎† 𝑁 = [𝑁 , 𝑎† ] = 𝑎† , (5.122)

so we have

𝑁 𝑎 = 𝑎𝑁 − 𝑎 = 𝑎 (𝑁 − 1) , 𝑁 𝑎† = 𝑎† 𝑁 + 𝑎† = 𝑎† (𝑁 + 1) , (5.123)

and thus
𝑁 𝑎 |𝑛⟩ = 𝑎 (𝑁 − 1) |𝑛⟩ = (𝑛 − 1) 𝑎 |𝑛⟩ , (5.124)

𝑁 𝑎† |𝑛⟩ = 𝑎† (𝑁 + 1) |𝑛⟩ = (𝑛 + 1) 𝑎† |𝑛⟩ , (5.125)

where we used equation (5.117) and the fact that since 𝑛 ± 1 is a number, it
commutes with operators and can be moved to the left. Writing this result in a
different way, we see that

𝑁 (𝑎 |𝑛⟩) = (𝑛 − 1) (𝑎 |𝑛⟩) , (5.126)

𝑁 (𝑎† |𝑛⟩) = (𝑛 + 1) (𝑎† |𝑛⟩) , (5.127)

or in other words, 𝑎 |𝑛⟩ is an eigenstate of 𝑁 with eigenvalue 𝑛 − 1, and 𝑎† |𝑛⟩ is


an eigenstate of 𝑁 with eigenvalue 𝑛 + 1. However, by definition, the normalized
eigenstates of 𝑁 with eigenvalues 𝑛−1 and 𝑛+1 are |𝑛 − 1⟩ and |𝑛 + 1⟩ respectively.
Thus, we conclude that 𝑎 |𝑛⟩ is proportional to |𝑛 − 1⟩ and 𝑎† |𝑛⟩ is proportional to
|𝑛 + 1⟩. The proportionality factors must be chosen so that the states are normal­
ized. Let us therefore calculate the norms. The norm ‖𝑎𝑛‖² was already calculated

above:
‖𝑎𝑛‖² = ⟨𝑛|𝑎† 𝑎|𝑛⟩ = ⟨𝑛|𝑁 |𝑛⟩ = 𝑛.   (5.128)

To calculate ‖𝑎† 𝑛‖2 , we recall from problem 5.15 that

𝑎𝑎† − 𝑎† 𝑎 = [𝑎, 𝑎† ] = 1, (5.129)

and thus
𝑎𝑎† = 𝑎† 𝑎 + 1 = 𝑁 + 1. (5.130)

We therefore get

‖𝑎† 𝑛‖2 = ⟨𝑛|𝑎𝑎† |𝑛⟩ = ⟨𝑛| (𝑁 + 1) |𝑛⟩ = ⟨𝑛|𝑁 |𝑛⟩ + ⟨𝑛|𝑛⟩ = 𝑛 + 1. (5.131)

To summarize, the norms are

‖𝑎𝑛‖ = √𝑛,   ‖𝑎† 𝑛‖ = √(𝑛 + 1).   (5.132)

The normalized eigenstates are now obtained, as usual, by dividing by the norm:

|𝑛 − 1⟩ = (1/√𝑛) 𝑎 |𝑛⟩ ,   |𝑛 + 1⟩ = (1/√(𝑛 + 1)) 𝑎† |𝑛⟩ .   (5.133)

Another way to write this, from a different point of view, is as the action of the
operators 𝑎 and 𝑎† on the state |𝑛⟩:
𝑎 |𝑛⟩ = √𝑛 |𝑛 − 1⟩ ,   𝑎† |𝑛⟩ = √(𝑛 + 1) |𝑛 + 1⟩ .   (5.134)

We see that 𝑎 reduces the energy eigenvalue by 1, while 𝑎† increases the energy
eigenvalue by 1. In other words, 𝑎† gets us to the state of next higher energy
(it “creates one quantum of energy”) while 𝑎 gets us to the state of next lower
energy (it “annihilates one quantum of energy”). This is the reason we called 𝑎†
the creation operator and 𝑎 the annihilation operator. We call them the ladder
operators because they let us “climb the ladder” of energy eigenstates.
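A numerical sketch (assuming Python with NumPy) makes the ladder structure
concrete: in a basis truncated to the first 𝑑 eigenstates, equation (5.134) says
that 𝑎 is the matrix with √𝑛 just above the diagonal, and 𝐻 = 𝜔 (𝑁 + ½) comes out
diagonal with the expected energies. The truncation is an approximation: the
commutator [𝑎, 𝑎†] = 1 fails in the last diagonal entry.

    import numpy as np

    d, omega = 8, 1.0
    n = np.arange(1, d)
    a = np.diag(np.sqrt(n), k=1)   # annihilation operator: a|n> = sqrt(n)|n-1>
    adag = a.conj().T              # creation operator
    N = adag @ a                   # number operator

    print(np.allclose(N, np.diag(np.arange(d))))  # True: N|n> = n|n>
    H = omega * (N + 0.5 * np.eye(d))
    print(np.diag(H))              # energies omega*(n + 1/2) for n = 0, 1, 2, ...
    comm = a @ adag - adag @ a
    print(np.diag(comm))           # 1 everywhere except the truncated corner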
Going back to the definition of the Hamiltonian in terms of the number operator,
we see that
𝐻 |𝑛⟩ = 𝜔 (𝑛 + ½) |𝑛⟩ ,   (5.135)
and thus, as you proved in problem 5.15, |𝑛⟩ is an energy eigenstate with eigen­
value
𝐸𝑛 ≡ 𝜔 (𝑛 + ½) .   (5.136)
In particular, since we showed above that 𝑛 must be non­negative, and since we
now also see that it has to be an integer (as it can only be increased or decreased

by 1!), the possible eigenstates are found to be

|0⟩ , |1⟩ , |2⟩ , |3⟩ , … . (5.137)

We found that the energy of the quantum harmonic oscillator is discrete, or quan­
tized, and the system can only have energy which differs from 𝜔/2 by equal steps
of 𝜔. The state of lowest energy, also called the ground state, is |0⟩. It has the
energy eigenvalue
𝐸0 = ½ 𝜔.   (5.138)
If we act on the ground state with the annihilation operator, we get

𝑎 |0⟩ = 0, (5.139)

which is not a state, because it has norm 0 and cannot be normalized. This means
that we cannot generate states with energy lower than that of the ground state.
If we act on |0⟩ with the creation operator, we get

𝑎† |0⟩ = |1⟩ . (5.140)

We say that 𝑎† , which takes us from |0⟩ to |1⟩, excites the harmonic oscillator from
the ground state to the first excited state, which has exactly one “quantum” of
energy. The state |𝑛⟩ has exactly 𝑛 quanta, while the ground state |0⟩ has no
quanta.
As we mentioned above, the quantum harmonic oscillator may be used to de­
scribe many different physical systems. In quantum field theory, the operator 𝑁
corresponds to the number of particles excited from the field. So |0⟩ is the vacuum
state, or a state with no particles [65]; |1⟩ is a state where one particle has been ex­
cited from the field (e.g. one photon has been excited from the electromagnetic
field); |2⟩ is a state with two particles; and so on.

Problem 5.16. Prove that

|𝑛⟩ = ((𝑎† )^{𝑛}/√(𝑛!)) |0⟩ .   (5.141)
This means that, once we know the ground state, we can create any energy
eigenstate by simply applying 𝑛 times the operator 𝑎† and normalizing.

Problem 5.17. Find ⟨𝑉 ⟩ for the harmonic oscillator given that the system is in
the energy eigenstate |𝑛⟩. How is the potential energy related to the total energy?
[65] Notice that the vacuum state, despite having no particles, still has non­zero energy 𝜔/2! This
is called zero­point energy, and it is simply the energy of the field itself.

5.5 Wavefunctions, Position, and Momentum

5.5.1 The Position Operator

When canonically quantizing a particle, the position function 𝑥 is promoted to
a Hermitian position operator. We usually denote this operator with a hat, 𝑥,̂
to distinguish it from its eigenvalues, which are confusingly also written as 𝑥.
Even more confusingly, we denote the position eigenstate corresponding to a
measurement of position 𝑥 as |𝑥⟩:

𝑥̂ |𝑥⟩ = 𝑥 |𝑥⟩ , 𝑥 ∈ ℝ. (5.142)

As usual, since 𝑥̂ is a Hermitian operator, its eigenstates |𝑥⟩ form an orthonormal
basis [66]. Recall that for an orthonormal basis |𝐵𝑖 ⟩, 𝑖 ∈ {1, … , 𝑛} in a finite­
dimensional Hilbert space, the orthonormality condition is given by equation (3.53):

⟨𝐵𝑖 |𝐵𝑗 ⟩ = 𝛿𝑖𝑗 = { 0 if 𝑖 ≠ 𝑗,
                  1 if 𝑖 = 𝑗.   (5.143)

The Kronecker delta 𝛿𝑖𝑗 has the property that, when evaluated inside a sum over
an index 𝑖, it “chooses” the term in the sum with index 𝑗:
∑_{𝑖=1}^{𝑛} 𝑓𝑖 𝛿𝑖𝑗 = 𝑓𝑗 ,   (5.144)

where 𝑓𝑖 represents the terms to be summed over. You don’t actually need to
evaluate the sum, since all of the terms with 𝑖 ≠ 𝑗 vanish, and you are left with
just one term, the one with 𝑖 = 𝑗.
The infinite­dimensional version of this is that for two basis states |𝑥⟩ and |𝑥′ ⟩,
where 𝑥, 𝑥′ ∈ ℝ, we have
⟨𝑥|𝑥′ ⟩ = 𝛿 (𝑥 − 𝑥′ ) , (5.145)

where 𝛿 (𝑥 − 𝑥′ ) is the Dirac delta function. This function is zero everywhere except
when 𝑥 = 𝑥′ , in which case it is divergent. More precisely, the Dirac delta isn’t
actually a function, it is a distribution, which basically means it is only well­defined
when used inside an integral. For any function 𝑓, the Dirac delta satisfies the
condition
∫_{−∞}^{+∞} 𝑓 (𝑥) 𝛿 (𝑥 − 𝑥′ ) d𝑥 = 𝑓 (𝑥′ ) .   (5.146)

In other words, when evaluated inside an integral over a variable 𝑥, the delta
function 𝛿 (𝑥 − 𝑥′ ) simply “chooses” the value of the integrand for which 𝑥 = 𝑥′ . This
[66] Our Hilbert space is now infinite­dimensional, and a rigorous discussion of such a space requires
dealing with many mathematical subtleties, but we will mostly ignore them in this course due to
lack of time.

is simply a generalization of the property of the Kronecker delta in equation (5.144).
You don’t need to evaluate the integral, since all of the terms with 𝑥 ≠ 𝑥′ vanish,
and you are left with just one term, the one with 𝑥 = 𝑥′ .
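Numerically, one can approximate the delta function by a narrow normalized
Gaussian and watch the integral pick out 𝑓 (𝑥′) – a sketch assuming Python with
NumPy; the test function and widths are arbitrary choices:

    import numpy as np

    f = np.cos
    xp = 0.7                                   # the point x'
    x = np.linspace(-10, 10, 200001)

    for eps in (1.0, 0.1, 0.01):
        # A normalized Gaussian of width eps, approximating delta(x - x')
        delta = np.exp(-(x - xp)**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))
        print(np.trapz(f(x) * delta, x))       # -> f(0.7) = cos(0.7) ~ 0.7648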
Problem 5.18. Prove the following properties of the Dirac delta function:

A.
∫_{−∞}^{+∞} 𝑓 (𝑥) 𝛿 (𝑥) d𝑥 = 𝑓 (0) .   (5.147)

B.
∫_{−∞}^{+∞} 𝛿 (𝑥) d𝑥 = 1.   (5.148)

C.
𝛿 (𝑥) = 𝛿 (−𝑥) .   (5.149)

D.
𝛿 (𝜆𝑥) = (1/|𝜆|) 𝛿 (𝑥) ,   𝜆 ∈ ℝ.   (5.150)
Problem 5.19. Let us define the Heaviside step function:

Θ (𝑥) ≡ { 0 if 𝑥 < 0,
          ½ if 𝑥 = 0,   (5.151)
          1 if 𝑥 > 0.

Prove that

dΘ (𝑥)/d𝑥 = 𝛿 (𝑥) ,   (5.152)
where 𝛿 (𝑥) is the Dirac delta function.

5.5.2 Wavefunctions in the Position Basis

Since |𝑥⟩ is an orthonormal eigenbasis, we should be able to write down any state
|Ψ⟩ as a linear combination – or superposition – of the basis eigenstates. Let us
recall that in the finite­dimensional case, with a finite basis |𝐵𝑖 ⟩, we have
|Ψ⟩ = ∑_{𝑖=1}^{𝑛} |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩.   (5.153)

In section 3.2.7 we said that ⟨𝐵𝑖 |Ψ⟩ – the probability amplitudes – are the coor­
dinates of the representation of the vector |Ψ⟩ with respect to the basis |𝐵𝑖 ⟩, and
they can be collected into an 𝑛­dimensional vector:

|Ψ⟩∣𝐵 ≡ ⎛ ⟨𝐵1 |Ψ⟩ ⎞
        ⎜    ⋮    ⎟ .   (5.154)
        ⎝ ⟨𝐵𝑛 |Ψ⟩ ⎠

In the infinite­dimensional case, we simply replace the sum with an integral (and
optionally add time dependence, since we now have a continuous time coordi­
nate):
|Ψ (𝑡)⟩ = ∫_{−∞}^{+∞} |𝑥⟩ ⟨𝑥 | Ψ (𝑡)⟩ d𝑥.   (5.155)

In this case, ⟨𝑥|Ψ (𝑡)⟩ are the coordinates of the representation of the vector |Ψ (𝑡)⟩
with respect to the basis |𝑥⟩. Since there is one coordinate for each real number
𝑥, we cannot collect them into a vector; instead, we define a function:

𝜓 (𝑡, 𝑥) ≡ ⟨𝑥 | Ψ (𝑡)⟩ . (5.156)

The complex­valued function 𝜓 (𝑡, 𝑥), which returns the probability amplitude to
measure the particle at position 𝑥 at time 𝑡, is called the wavefunction.
Given a wavefunction 𝜓 (𝑡, 𝑥), the probability density to find the particle at position
𝑥 at time 𝑡 is given by the magnitude squared of the probability amplitude:
|𝜓 (𝑡, 𝑥)|² = |⟨𝑥 | Ψ (𝑡)⟩|² .   (5.157)

The reason this is a probability density, and not a probability, is that continuous
probability distributions behave a bit differently than discrete ones. The probabil­
ity to find the particle somewhere in the real interval [𝑎, 𝑏] ⊂ ℝ at time 𝑡 is given
by the integral
∫_{𝑎}^{𝑏} |𝜓 (𝑡, 𝑥)|² d𝑥.   (5.158)

If 𝑎 = 𝑏, then the integral evaluates to zero. This means that the probability to
find a particle at any one specific point 𝑥 is actually zero! A set containing just
one point, or even a countable number of discrete points, is a set of Lebesgue
measure zero, which means it has no length. It only makes sense to talk about
finding a particle inside an interval such as [𝑎, 𝑏] with 𝑎 ≠ 𝑏, which has non­zero
Lebesgue measure and thus non­zero length.
Also, instead of the probabilities summing to 1, we must demand that the integral
of the probability densities over the entire real line evaluates to 1:
∫_{−∞}^{+∞} |𝜓 (𝑡, 𝑥)|² d𝑥 = 1.   (5.159)

This makes sense, because there is 100% probability to find the particle some­
where on the real line, that is, inside the interval (−∞, +∞).
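As a concrete sketch (assuming Python with NumPy), take a normalized Gaussian
wavefunction and check the normalization condition (5.159), along with the
probability (5.158) of finding the particle in a specific interval:

    import numpy as np

    s = 2.0                                            # width of the Gaussian
    x = np.linspace(-50, 50, 100001)
    psi = (np.pi * s**2) ** (-0.25) * np.exp(-x**2 / (2 * s**2))

    print(np.trapz(np.abs(psi)**2, x))                 # ~1: total probability
    a, b = 0.0, s
    mask = (x >= a) & (x <= b)
    print(np.trapz(np.abs(psi[mask])**2, x[mask]))     # probability to be in [0, s]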
Using the wavefunction 𝜓 (𝑡, 𝑥) = ⟨𝑥 | Ψ (𝑡)⟩, we can rewrite equation (5.155) as
follows:
|Ψ (𝑡)⟩ = ∫_{−∞}^{+∞} 𝜓 (𝑡, 𝑥) |𝑥⟩ d𝑥.   (5.160)

If we are given a state |Ψ (𝑡)⟩, we can use equation (5.156) to convert it to a
wavefunction, and conversely, if we are given a wavefunction 𝜓 (𝑡, 𝑥), we can use
equation (5.160) to convert it to a state. This is, of course, a consequence of
the wavefunction being a representation of the state in a specific basis. For this
reason, you will sometimes hear the term “wavefunction” used as a synonym for
“state”; for systems where a wavefunction description exists, such as a quantized
particle, these two descriptions are equivalent.
However, it should be noted that wavefunctions are not fundamental entities in
modern quantum theory. The fundamental entities are the states, since any quan­
tum system has states, but only some systems have wavefunctions. For example,
there is no wavefunction for a qubit, since there are no continuous variables with
respect to which the wavefunction can be defined⁶⁷. Even for systems that do
have wavefunctions, the description using states is more general, since a state is
independent of a basis, while a wavefunction is only defined in a particular basis.
Next, recall the completeness relation (3.81):
∑_{i=1}^{n} |B_i⟩⟨B_i| = 1. (5.161)

We can use equation (5.155) to derive an infinite-dimensional analogue. We simply note that |Ψ(t)⟩ does not explicitly depend on the variable x, so it can actually be taken out of the integral, and we get:

|Ψ(t)⟩ = (∫_{−∞}^{+∞} |x⟩⟨x| dx) |Ψ(t)⟩, (5.162)

from which we get the infinite-dimensional completeness relation

∫_{−∞}^{+∞} |x⟩⟨x| dx = 1. (5.163)

This relation allows us to define an explicit inner product between states on our infinite-dimensional Hilbert space as follows:

⟨Ψ(t)|Φ(t′)⟩ = ⟨Ψ(t)| (∫_{−∞}^{+∞} |x⟩⟨x| dx) |Φ(t′)⟩
            = ∫_{−∞}^{+∞} ⟨Ψ(t)|x⟩⟨x|Φ(t′)⟩ dx
            = ∫_{−∞}^{+∞} ψ*(t,x) φ(t′,x) dx,

where ψ*(t,x) ≡ ⟨Ψ(t)|x⟩ is the complex conjugate of the wavefunction for |Ψ(t)⟩ defined in equation (5.156) (since as usual, switching the order of states in the inner product turns it into its complex conjugate), and φ(t′,x) ≡ ⟨x|Φ(t′)⟩ is the wavefunction for the state |Φ(t′)⟩.

⁶⁷ This is why, in the discussion of the Measurement Axiom, I used the term “collapse” rather than the more popular “wavefunction collapse”. Qubits also collapse, but they do not have wavefunctions!
This is really nothing more than the familiar inner product we defined all the way back in section 3.2.2, except instead of summing over the components of a vector, we are integrating over the values of a function! The vector in the discrete case was the representation of the state in a particular basis (such as the standard basis), while the function in the continuous case is also the representation of the state in a particular basis, in this case the position basis.
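For instance (a sketch, not from the notes; the two Gaussians are arbitrary choices), the continuous inner product can be approximated on a grid by replacing the integral with a Riemann sum:

```python
import numpy as np

# The inner product of two states as an integral over their wavefunctions,
# discretized as sum(conj(psi) * phi) * dx.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

psi = np.pi**-0.25 * np.exp(-x**2 / 2)         # a normalized Gaussian
phi = np.pi**-0.25 * np.exp(-(x - 1)**2 / 2)   # the same Gaussian, shifted by 1

inner = np.sum(np.conj(psi) * phi) * dx
print(inner)  # analytically exp(-1/4) ~ 0.7788 for these two wavefunctions
```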
Now we can see that the normalization condition in equation (5.159) simply says
that the norm of a state has to be 1, as usual:

‖Ψ(t)‖ ≡ √⟨Ψ(t)|Ψ(t)⟩ = √(∫_{−∞}^{+∞} |ψ(t,x)|² dx) = 1. (5.164)

Problem 5.20. The expectation value of the position, given that the state of the system is |Ψ(t)⟩, is defined as usual by

⟨x⟩ ≡ ⟨Ψ(t)|x̂|Ψ(t)⟩. (5.165)

By inserting the completeness relation (5.163), show that, in terms of the wavefunction ψ(t,x), the expectation value of x is

⟨x⟩ = ∫_{−∞}^{+∞} x |ψ(t,x)|² dx. (5.166)

Problem 5.21. Let V(x) be an arbitrary smooth function of x. When we promote x into an operator, V(x) becomes the operator V(x̂). (For example, if V(x) = x², then V(x̂) is the operator x̂².) By expanding V(x̂) in a Taylor series, show that |x⟩ is an eigenstate of V(x̂) with eigenvalue V(x):

V(x̂) |x⟩ = V(x) |x⟩. (5.167)

As a corollary, show that

⟨x| V(x̂) |Ψ(t)⟩ = V(x) ψ(t,x). (5.168)

Exercise 5.22. A wavefunction is given by

ψ(t,x) = A e^{−x²}, A ∈ ℂ. (5.169)

Find a value of A for which the wavefunction is properly normalized, that is, equation (5.159) is satisfied. Then, calculate the expectation value ⟨x⟩ for this wavefunction.

5.5.3 The Momentum Operator

When we canonically quantize a particle, in addition to the position operator, we also promote the momentum function to a Hermitian momentum operator p̂. This operator has momentum eigenstates |p⟩, which correspond to measurements of momentum p:

p̂ |p⟩ = p |p⟩. (5.170)

Everything that we discussed in the previous two sections also applies to the
momentum operator and its eigenstates – simply replace 𝑥 with 𝑝. This also
includes the wavefunction, which can be represented in the momentum basis as

𝜓 (𝑡, 𝑝) ≡ ⟨𝑝 | Ψ (𝑡)⟩ . (5.171)

Now, let us recall that in section 5.2 we found out that the unitary operator responsible for shifts in time can be written as the exponential of the Hamiltonian. This can be written in slightly different notation as

e^{−iHt₀} |Ψ(t)⟩ = |Ψ(t + t₀)⟩. (5.172)

From this relation, we derived the Schrödinger equation (5.52), which tells us that the Hamiltonian – the Hermitian operator corresponding to energy – acts on states as a time derivative:

H |Ψ(t)⟩ = i (d/dt) |Ψ(t)⟩. (5.173)
Since the energy is just the momentum in the time direction, we expect, in anal­
ogy, that the momentum operator will act on states as a derivative with respect to
position, and that its exponential will translate states in space. However, here
we encounter a complication: in non­relativistic quantum mechanics, time is con­
sidered to be just a label on the states |Ψ (𝑡)⟩, while position is the eigenvalue

of the position operator⁶⁸. Due to this complication, we won’t give the derivation here, but simply state the result:

⟨x|p̂|Ψ(t)⟩ = −i (∂/∂x) ⟨x|Ψ(t)⟩ = −i (∂/∂x) ψ(t,x). (5.174)
This means that the representation of the momentum operator in the position basis is given by the derivative with respect to position (times −i, which is a convention). This result will be very useful in section 5.6, when we discuss solutions to the Schrödinger equation. Equation (5.174) is often written simply as

p̂ = −i ∂/∂x, (5.175)

but this is actually incorrect (or at the very least, a serious abuse of notation), since the momentum operator is an abstract operator, and only becomes a derivative when represented in the position basis!
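As a quick numerical illustration of equation (5.174) (a sketch, not from the notes; the wavenumber k and the grid are arbitrary choices), acting with −i ∂/∂x on a plane wave e^{ikx} returns k times the same wave, exactly as expected for a momentum eigenstate:

```python
import numpy as np

# In the position basis, p acts as -i d/dx; on a plane wave exp(i k x)
# this gives k times the wave, i.e. a momentum eigenstate with eigenvalue k.
k = 3.0
x = np.linspace(-np.pi, np.pi, 10001)
psi = np.exp(1j * k * x)

p_psi = -1j * np.gradient(psi, x)  # central finite-difference derivative
print(np.allclose(p_psi[1:-1], k * psi[1:-1], atol=1e-4))  # True
```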
By exponentiating the momentum operator, we get the translation operator e^{−ip̂a}, a unitary operator (as it has to be, since it must preserve norms) which translates position eigenstates a distance a in space:

e^{−ip̂a} |x⟩ = |x + a⟩. (5.176)

By taking the adjoint of this expression and acting on a state |Ψ(t)⟩, we get

⟨x| e^{ip̂a} |Ψ(t)⟩ = ⟨x + a|Ψ(t)⟩ = ψ(t, x + a). (5.177)

Therefore, the translation operator translates not only position eigenstates but
also wavefunctions.
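This can be checked numerically with a discrete Fourier transform (a sketch, not from the notes; the grid parameters and the shift a are arbitrary, and the FFT imposes periodic boundary conditions): multiplying each momentum component of ψ by e^{ipa} and transforming back shifts the wavefunction by a.

```python
import numpy as np

# Apply the translation operator in the momentum basis: multiply each
# Fourier mode by exp(i p a), which turns psi(x) into psi(x + a).
n = 4096
x = np.linspace(-20, 20, n, endpoint=False)
dx = x[1] - x[0]
p = 2 * np.pi * np.fft.fftfreq(n, d=dx)  # momentum grid conjugate to x

psi = np.exp(-x**2)                      # a Gaussian centered at x = 0
a = 2.0
psi_shifted = np.fft.ifft(np.exp(1j * p * a) * np.fft.fft(psi))

# psi(x + a) is a Gaussian centered at x = -a:
print(x[np.argmax(np.abs(psi_shifted))])  # ~ -2.0
```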

Problem 5.23. Calculate the expectation value of the momentum, ⟨𝑝⟩, given that
the state of the system is |Ψ (𝑡)⟩, in terms of the wavefunction 𝜓 (𝑡, 𝑥).

5.5.4 Quantum Interference

Let us consider the double-slit experiment, which we discussed all the way back in section 2.1.3. Schematically, the particle’s state can be described as a superposition of passing through slit A and passing through slit B:

|Ψ⟩ = a |Ψ_A⟩ + b |Ψ_B⟩, |a|² + |b|² = 1. (5.178)

⁶⁸ This is, in fact, a big problem when trying to combine quantum mechanics with special relativity, since relativity merges space and time into a 4-dimensional spacetime, and this means space and time must be treated on equal footing. However, we won’t go into that here. See also footnote (58).

We suppress the time dependence here, for brevity. The probability amplitude to
measure the particle at the position 𝑥 is given by

𝜓 (𝑥) ≡ ⟨𝑥|Ψ⟩ = 𝑎⟨𝑥|Ψ𝐴 ⟩ + 𝑏⟨𝑥|Ψ𝐵 ⟩ ≡ 𝑎𝜓𝐴 (𝑥) + 𝑏𝜓𝐵 (𝑥) . (5.179)

The probability density is then, as usual, the magnitude squared of the amplitude:
|ψ(x)|² = |a ψ_A(x) + b ψ_B(x)|²
        = (a* ψ_A*(x) + b* ψ_B*(x)) (a ψ_A(x) + b ψ_B(x))
        = a*a ψ_A*(x) ψ_A(x) + b*b ψ_B*(x) ψ_B(x) + a*b ψ_A*(x) ψ_B(x) + b*a ψ_B*(x) ψ_A(x)
        = |a|² |ψ_A(x)|² + |b|² |ψ_B(x)|² + 2 Re(a*b ψ_A*(x) ψ_B(x)).

The terms |a|² |ψ_A(x)|² and |b|² |ψ_B(x)|² are always non-negative, for any x. However, the third term 2 Re(a*b ψ_A*(x) ψ_B(x)), called the interference term or sometimes the cross term (because it “crosses” ψ_A and ψ_B), is a real number which can be either positive or negative, depending on the specific values of a and b, as well as the specific position x at which ψ_A(x) and ψ_B(x) are evaluated.
The interference term will either increase or decrease the probability to find the particle at x. If it increases the probability, this is constructive interference, and if it decreases the probability, this is destructive interference. This is precisely what is responsible for the interference pattern in the double-slit experiment, illustrated in figure 2.5; for different values of x, there will be different amounts of constructive and destructive interference.
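The following sketch (illustrative parameters only, not from the notes) builds ψ_A and ψ_B as two Gaussian wave packets and compares |aψ_A + bψ_B|² with |a|²|ψ_A|² + |b|²|ψ_B|²; their difference is the interference term, which indeed takes both signs as x varies:

```python
import numpy as np

# Two (un-normalized) Gaussian packets, one per slit, with opposite momenta.
x = np.linspace(-10, 10, 2001)
a, b = 1 / np.sqrt(2), 1 / np.sqrt(2)   # equal amplitudes, |a|^2 + |b|^2 = 1
k = 5.0
psi_A = np.exp(-(x - 2)**2) * np.exp(+1j * k * x)
psi_B = np.exp(-(x + 2)**2) * np.exp(-1j * k * x)

density = np.abs(a * psi_A + b * psi_B)**2
no_cross = np.abs(a)**2 * np.abs(psi_A)**2 + np.abs(b)**2 * np.abs(psi_B)**2
cross = density - no_cross               # the term 2 Re(a* b psi_A* psi_B)
print(cross.min() < 0 < cross.max())     # True: constructive and destructive
```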

5.6 Solutions of the Schrödinger Equation

5.6.1 The Schrödinger Equation for a Particle

Recall the Schrödinger equation (5.52):

i (d/dt) |Ψ(t)⟩ = H |Ψ(t)⟩. (5.180)

For a particle, we have the Hamiltonian (5.69):

H = p²/2m + V(x). (5.181)

Therefore, the Schrödinger equation becomes

i (d/dt) |Ψ(t)⟩ = (p̂²/2m + V(x̂)) |Ψ(t)⟩, (5.182)

where we promoted the position and momentum to operators. To find the representation of this equation in the position basis, we multiply by ⟨x| from the left:

⟨x| i (d/dt) |Ψ(t)⟩ = ⟨x| (p̂²/2m + V(x̂)) |Ψ(t)⟩. (5.183)
On the left-hand side, since the position eigenstate |x⟩ is independent of time, we can move the time derivative out of the inner product:

⟨x| i (d/dt) |Ψ(t)⟩ = i (d/dt) ⟨x|Ψ(t)⟩ = i (d/dt) ψ(t,x). (5.184)
On the right-hand side, since in the position representation we have

p̂ = −i ∂/∂x, (5.185)

the first term will be

⟨x| (p̂²/2m) |Ψ(t)⟩ = (1/2m) (−i ∂/∂x)² ψ(t,x)
                  = (1/2m) (−i ∂/∂x) (−i ∂/∂x) ψ(t,x)
                  = −(1/2m) (∂²/∂x²) ψ(t,x).
As for the second term, in problem 5.21 you showed that

⟨x| V(x̂) |Ψ(t)⟩ = V(x) ψ(t,x). (5.186)

In total, we get:

i (d/dt) ψ(t,x) = (−(1/2m) ∂²/∂x² + V(x)) ψ(t,x). (5.187)
This is the Schrödinger equation in the position basis. It is a concrete differential
equation that one can solve for a variety of different potentials 𝑉 (𝑥).
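As a taste of what solving it looks like in practice, here is a sketch (not part of the notes; units with ħ = m = 1, and the harmonic potential is just an arbitrary test case) that integrates equation (5.187) directly with the split-step Fourier method, alternating kinetic steps in momentum space with potential steps in position space:

```python
import numpy as np

# Split-step Fourier integration of i d(psi)/dt = (-1/2 d^2/dx^2 + V) psi.
n = 2048
x = np.linspace(-20, 20, n, endpoint=False)
dx = x[1] - x[0]
p = 2 * np.pi * np.fft.fftfreq(n, d=dx)

V = 0.5 * x**2                                   # harmonic test potential
psi = np.exp(-(x - 2)**2 / 2).astype(complex)    # displaced Gaussian packet
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)      # normalize, eq. (5.159)

dt = 0.001
steps = int(np.pi / dt)                          # evolve to t = pi
half_V = np.exp(-0.5j * V * dt)                  # half potential step
kinetic = np.exp(-0.5j * p**2 * dt)              # full kinetic step
for _ in range(steps):
    psi = half_V * np.fft.ifft(kinetic * np.fft.fft(half_V * psi))

print(x[np.argmax(np.abs(psi))])  # ~ -2.0: the packet has swung across the well
```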

Problem 5.24. In problem 5.20, you showed that

⟨x⟩ = ∫_{−∞}^{+∞} x |ψ(t,x)|² dx, (5.188)

and in problem 5.23, you calculated ⟨p⟩. Using equation (5.187), show that

⟨p⟩ = m d⟨x⟩/dt. (5.189)

You will have to use integration by parts, and assume⁶⁹ that ψ(t,x) → 0 as x → ±∞.

This shows that the expectation values of the position and momentum in the quantum theory satisfy the same relation as the position and momentum in the classical theory. Similarly, show that

d⟨p⟩/dt = −⟨∂V(x)/∂x⟩, (5.190)

which is Newton’s second law (5.78) in terms of expectation values.

⁶⁹ This is pretty much always assumed to be true about wavefunctions in quantum mechanics. It can be justified in two ways. First, according to equation (5.159), |ψ(t,x)|² has to integrate to 1 so that the state is normalized. Therefore, it makes sense that ψ(t,x) should vanish at infinity – although, if you look hard enough (you are encouraged to try!), you can find normalized wavefunctions which nonetheless do not vanish at infinity. Second, if we create a particle in the lab, we would expect the probability to find this particle a trillion light years away to be very close to zero...

Problem 5.25. Recall the time-independent Schrödinger equation (5.63), which is just the eigenvalue equation for the Hamiltonian:

H |E_i⟩ = E_i |E_i⟩. (5.191)

Let us denote the wavefunctions corresponding to the energy eigenstates as follows:

ψ_i(x) ≡ ⟨x|E_i⟩. (5.192)

They don’t depend on t, since we are assuming the Hamiltonian doesn’t depend on t either, and energy is constant. Show that (for a point particle with mass m) these wavefunctions satisfy the equation

(−(1/2m) ∂²/∂x² + V(x)) ψ_i(x) = E_i ψ_i(x). (5.193)
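Equation (5.193) can also be attacked numerically (a sketch, not from the notes; units with ħ = m = 1, and the harmonic potential V(x) = x²/2 is just a convenient test case whose exact levels E_n = n + 1/2 are known): discretize the second derivative as a finite-difference matrix, add the potential on the diagonal, and diagonalize.

```python
import numpy as np

# Discretize H = -1/(2m) d^2/dx^2 + V(x) on a grid and diagonalize it.
n = 1000
x = np.linspace(-10, 10, n)
dx = x[1] - x[0]

# Second derivative as a tridiagonal finite-difference matrix:
d2 = (np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1)) / dx**2

H = -0.5 * d2 + np.diag(0.5 * x**2)      # Hamiltonian with V(x) = x^2/2

energies, states = np.linalg.eigh(H)     # eigenvalues in ascending order
print(energies[:4])                      # ~ [0.5, 1.5, 2.5, 3.5]
```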

5.6.2 Separation of Variables

Let us assume that the wavefunction can be separated into a part which depends only on x and a part which depends only on t:

ψ(t,x) = ψ_i(x) ψ_t(t). (5.194)

By plugging this into the Schrödinger equation (5.187) and dividing by ψ, we obtain the equation

(i/ψ_t) dψ_t/dt = −(1/2m)(1/ψ_i) ∂²ψ_i/∂x² + V(x). (5.195)
Since the left-hand side only depends on t and the right-hand side only depends on x, we conclude that they must in fact both be constant, that is, independent of both t and x – otherwise, if for example the left-hand side was a function of t, then the right-hand side would have to be a function of t also, in contradiction with our assumption that it only depends on x. This is called separation of variables.
Let 𝐸𝑖 be the constant that both sides are equal to. Then we get two equations.
The first equation will just be the eigenvalue equation (5.193), which therefore
implies that 𝐸𝑖 is the energy (and thus must be real). The other equation will be

dψ_t/dt = −i E_i ψ_t. (5.196)
Recalling equation (5.9), we see that the solution to equation (5.196) is simply

ψ_t = e^{−iE_i t}. (5.197)

Therefore, any separable solution to the Schrödinger equation is given by a wavefunction of the form

ψ(t,x) = ψ_i(x) e^{−iE_i t}. (5.198)

These are called stationary states. Since these states are energy eigenstates,
they have a well­defined energy 𝐸𝑖 .
As it turns out, since the Schrödinger equation is linear, the most general solution to the equation is a linear combination of stationary states:

ψ(t,x) = ∑_i α_i ψ_i(x) e^{−iE_i t}, (5.199)

where α_i ∈ ℂ are constant coefficients and the E_i are all the possible energy eigenvalues, of which there can be infinitely many. Of course, this is nothing other than a superposition of energy eigenstates, represented in the position basis, and therefore the coefficients α_i are none other than the probability amplitudes to measure each energy E_i given the state |Ψ(t)⟩.
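To make this concrete, the following sketch (again illustrative; it reuses x, dx, energies, and states from the diagonalization sketch above) expands an initial wavefunction in the energy eigenbasis and attaches the phases e^{−iE_i t} of equation (5.199):

```python
import numpy as np

# Evolve psi0 by expanding it in the energy eigenbasis (the columns of
# `states`) and attaching the phases exp(-i E_i t) from eq. (5.199).
def evolve(psi0, t, energies, states):
    alpha = states.conj().T @ psi0       # overlaps with each eigenvector
    return states @ (alpha * np.exp(-1j * energies * t))

psi0 = np.exp(-(x - 2)**2 / 2).astype(complex)   # displaced Gaussian
psi0 /= np.sqrt(np.sum(np.abs(psi0)**2) * dx)    # normalize, eq. (5.159)

psi_t = evolve(psi0, t=np.pi, energies=energies, states=states)
print(x[np.argmax(np.abs(psi_t))])               # ~ -2.0, as with split-step
```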
In other words, the general solution to the Schrödinger equation simply amounts
to writing the state of the system as a superposition with respect to the eigenbasis
of a particular observable – the Hamiltonian. With the time dependence out of
the way, all that remains is to solve the time­independent Schrödinger equation
(5.193) for 𝜓𝑖 , and find the coefficients 𝛼𝑖 . The solution will depend on the explicit
form of the potential 𝑉 (𝑥). However, this is, of course, the hard part! Thousands
upon thousands of pages have been written in the last 100 years or so about
solutions (or even just approximations of solutions) to the Schrödinger equation
for all kinds of different potentials.
Unfortunately, our course has come to an end, and we won’t have time to work
out any specific solutions. The focus of this course has been on developing deep
intuition and conceptual understanding of quantum theory, as it is formulated in
modern 21st­century theoretical physics. For this reason, we spent the vast ma­
jority of the course developing the entire mathematical framework of the theory

from scratch, highlighting and debunking common misconceptions, focusing on
concepts and their meaning rather than calculations, and giving examples from
discrete systems, where the math is simple, so we could concentrate our efforts
on understanding the physics without being bogged down by the math.
Still, solving the Schrödinger equation is something every physicist should know
how to do, and in the final project of the course, presented in problem 5.28, you
will find the solutions corresponding to two simple potentials, related to scattering
and tunneling of particles in one dimension.

Problem 5.26. Show that the probability density of a stationary state, as well
as the expectation value of any observable 𝐴 with respect to that state, are inde­
pendent of 𝑡.

Exercise 5.27. A wavefunction is given at time 𝑡 = 0 by

𝜓 (0, 𝑥) = 𝛼1 𝜓1 (𝑥) + 𝛼2 𝜓2 (𝑥) . (5.200)

What is the wavefunction 𝜓 (𝑡, 𝑥) at some other time 𝑡, and what is the correspond­
ing probability density?

Problem 5.28. (Final project) You should now have all the tools needed to solve the Schrödinger equation for particular potentials. Solve it for the following two simple potentials:

• Finite square well – scattering:

V(x) = { 0 for x < −a; −V₀ for −a < x < a; 0 for x > a }. (5.201)

• Finite square barrier – tunneling:

V(x) = { 0 for x < −a; +V₀ for −a < x < a; 0 for x > a }. (5.202)

In both cases, a and V₀ are two positive numbers. Make nice plots of the potentials and the wavefunctions. You are allowed, and even encouraged, to make use of textbooks and online resources; however, you should write the solutions in your own words and summarize what you learned from the results. You are also encouraged to collaborate with classmates on this project.

Index
4­momentum, 137 Bra­ket notation, 23
68­95­99.7 rule, 55
Canonical commutation relation, 142
Absolute value, 21 Canonical coordinates, 138
Addition of vectors, 23 Canonical quantization, 142
Adjoint, 30 Cauchy­Schwarz inequality, 47
Algebraically closed field, 17 Central limit theorem, 55
AND gate, 101 Change­of­basis matrix, 36
Angular frequency, 143 CHSH inequality, 91
Annihilation operator, 145, 148 Classical bit, 73
Anti­commutation relation, 73 Classical gate
Anti­commutator, 73 AND, 101
Anti­Hermitian operator, 97 NOT, 101
Associative operation, 24 OR, 102
Axioms of quantum mechanics XOR, 102
Composite System Axiom, 77 Classical harmonic oscillator, 143
Evolution Axiom, 100 Classical limit, 14
Measurement Axiom (Projective), 106 Classical logic gate, 101
Measurement Axiom (Simplified), 110 Closed operation, 23, 24
Observable Axiom, 61 CNOT gate, 104
Operator Axiom, 60 Collapse, 106
Probability Axiom, 61 Collapse models, 114
State Axiom, 60 Commutation relation
System Axiom, 59 Canonical, 142
Of spin matrices, 72
Base of a logarithm, 127 Commutative operation, 24
Bell inequality, 91 Commutator, 72, 93
Bell states, 87 Complete metric space, 26
Bell’s theorem, 89 Completeness relation, 34
Bijection, 61 Infinite­dimensional case, 153
Binomial coefficients, 128 Complex 𝑛­vector, 23
Binomial theorem, 128 Complex conjugation, 19
Bit Complex numbers, 17
Classical, 73 Complex phase, 22
Quantum, 73 Complex plane, 19
Black body, 6 Complex vector space, 25
Boltzmann constant, 59 Composite system, 77
Born rule, 61 Composite System Axiom, 77
Bra, 26 Composition property, 133

Computational basis, 73 Energy, 136
Conditional probability, 51 Kinetic, 138
Conjugate­symmetric inner product, 27 Potential, 138
Constructive interference, 11, 157 Energy eigenstates, 137
Controlled­NOT gate, 104 Entangled state, 84
Coordinates EPR states, 87
Canonical, 138 Equivalence class of vectors, 60
Coordinates of a matrix in a basis, 44 Euler’s formula, 22, 127
Coordinates of a vector in a basis, 35 Even permutation, 73
Infinite­dimensional case, 152 Everett interpretation, 112
Copenhagen interpretation, 111 Evolution Axiom, 100
Copying operator, 117 Excited state, 149
Coulomb constant, 59 Exclusive OR, 102
COVID­19, 51 Expected (or expectation) value, 52
Creation operator, 145, 148 Of a quantum observable, 67
Cross terms, 157 Exponent, 125
Exponential
Dagger, 30 Of a matrix, 128
De Broglie–Bohm theory, 76, 92 Time­ordered, 135
And the measurement problem, 113 Exponential function, 125
Decoherence time, 116
Degenerate eigenvectors, 105 Fair coin or die, 49
Destructive interference, 11, 157 Field (algebra), 16
Determinant, 86 Fine­structure constant, 58
Determinism, 89 Finite square barrier, 161
Diagonal matrix, 45 Finite square well, 161
Diagonalizable matrix, 45 First excited state, 149
Dimensionful constants, 58 Frequency, 143
Dimensionless constants, 58 Fundamental theorem of algebra, 18
Dirac delta function, 150
Dirac notation, 23 Gate
Discriminant, 17 Classical AND, 101
Distribution (generalized function), 150 Classical NOT, 101
Distributive operation, 24 Classical OR, 102
Double­slit experiment, 8, 156 Classical XOR, 102
Dual vector, 25 Quantum CNOT, 104
Quantum Hadamard, 103
Eigenspace, 105 Quantum NOT (X), 102
Eigenstates, 61 Quantum Z, 103
Eigenvalue, 39 Gaussian distribution, 55
Eigenvector, 39 Generalized momentum, 140
Degenerate, 105 Gluon, 143

Gram–Schmidt process, 47 Ket, 26
Gravitational constant, 59 Kinetic energy, 138
Ground state, 137, 149 Kronecker delta, 28, 150
GRW model, 114
Ladder operators, 145, 148
Hadamard gate, 103 Lebesgue measure, 152
Hamilton’s equations, 139 Levi­Civita symbol, 72
Hamiltonian Linear combination, 27
Classical, 138 Linear inner product, 27
Quantum, 132, 134 Linearly independent, 28
Time­independent, 135 Loaded coin or die, 49
Hat notation for operators, 150 Local hidden variable theories, 89
Heaviside step function, 151 Local realism, 90
Hermitian matrix, 40 Locality, 90
Hermitian operator, 61 Logarithm, 127
Hidden variable theories, 76 Of a matrix, 129
And the measurement problem, 113 Logic gate
Local, 89 Classical, 101
Non­local, 76, 92 Quantum, 102
Higgs boson, 143 Loop quantum gravity, 59
Hilbert space, 26
Magnitude of a complex number, 21
Identity matrix, 31 Many­worlds interpretation, 112
Identity scalar, 25 Matrices inside inner products, 38
Identity vector, 24 Matrix, 30
Imaginary number, 18 Matrix anti­commutator, 73
Imaginary unit, 17 Matrix commutator, 72, 93
Inner product, 26 Matrix determinant, 86
Infinite­dimensional case, 153 Matrix exponential, 128
Integration constants, 144 Matrix logarithm, 129
Interference term, 157 Matrix product, 37
Interpretations of quantum mechanics, Mean, 52
111 Measurement Axiom (Projective), 106
Inverse matrix, 37 Measurement Axiom (Simplified), 110
Inverse vector, 24 Measurement problem, 111
Invertible matrix, 37 Model, 124
Involution, 19 Momentum eigenstates, 155
Involutory matrix, 69 Momentum operator, 155
Isomorphism, 21 Momentum space, 138
Multiplication of vector by scalar, 23
Jacobi identity, 94, 141
Joint probability, 50 Natural logarithm, 127
Newton’s second law, 140

No collapse, 112 Planck units, 59
No­cloning theorem, 116 Planck’s law, 7
No­communication theorem, 89 Poisson brackets, 139
No­deleting theorem, 118 Polar coordinates, 22
Non­local hidden variable theories, 76, Polarization, 74
92 Position eigenstate, 150
Norm, 26 Position operator, 150
Infinite­dimensional case, 154 Position­momentum uncertainty relation,
Normal distribution, 55 94
Normal matrix, 43 Positive­definite inner product, 27
Normalizing a vector, 29 Potential energy, 138
NOT (X) gate, 102 Power series, 125
NOT gate Probability, 49
Classical, 101 Probability amplitude, 61
Quantum, 102 Probability Axiom, 61
Number operator, 146 Probability density, 152
Probability distribution, 49
Observable, 61 Projection, 64
Observable Axiom, 61 Projective measurements, 106
Occam’s razor, 114 Projector, 105
Odd permutation, 73 Promotion of operators, 142
Operator, 60
Operator anti­commutator, 73 Quanta, 7
Operator Axiom, 60 Quantization, 12, 138, 142
Operator commutator, 72, 93 Canonical, 142
Operator exponential, 130 Path integral, 142
OR gate, 102 Quantum bit, 73
Orthogonal, 28 Quantum computer, 13
Orthonormal basis, 27 Quantum decoherence, 116
Orthonormal eigenbasis, 41 Quantum electromagnetic field, 12
Orthonormal vectors, 28 Quantum entanglement, 15
Outer product, 32 Quantum excitation, 12
Outer product representation, 44 Quantum field theory, 12, 143, 149
Quantum gate
Path integral quantization, 142 CNOT, 104
Pauli matrices, 69 Hadamard, 103
Permutation, 73 NOT (X), 102
Phase space, 138 Z, 103
Photoelectric effect, 7 Quantum gravity, 14, 124, 137
Photons, 7 Quantum harmonic oscillator, 143
Planck constant, 59 Quantum logic gate, 102
Planck length, 59 Quantum observable, 61

Quantum operator, 60 Stationary states, 160
Quantum state, 60 Stern­Gerlach experiment, 12
Quantum system, 59 Strong nuclear force, 143
Quantum teleportation, 118 Superposition, 15, 63
Quark, 143 Meaning of, 74
Qubit, 73 System, 59
Qubits, 13 System Axiom, 59

Random variable, 48 Tensor product, 77


Rank, 32 atan2 (𝑥, 𝑦), 22
Ray in a Hilbert space, 60 Time reversal, 136
Real 𝑛­vectors, 19 Time reversal symmetry, 101, 136
Real and imaginary parts of a complex Time shift, 136
number, 17, 19 Time­independent Hamiltonian, 135
Real interval, 152 Time­independent Schrödinger equation,
Real numbers, 16 137
Realism, 90 Time­ordered exponential, 135
Representing a matrix in a basis, 43 Translation operator, 156
Representing a vector in a basis, 35 Tunneling, 161
Infinite­dimensional case, 152 Two­state system, 69
Rotation matrix, 31
Ultraviolet catastrophe, 7
Scalar, 23 Uncertainty, 94
Scalar matrix, 38 Uncertainty principle, 15, 94
Scattering, 161 Uniform probability distribution, 49
Schrödinger equation, 131, 134 Unit scalar, 25
Separability problem, 86 Unit vector, 28
Separable state, 84 Unitary evolution, 100
Separation of variables, 160 Unitary matrix, 42
Set of measure zero, 152 Unitary transformation, 100
Shut up and calculate, 111
Simultaneous diagonalization, 97 Vacuum, 73
Singular matrix, 37 Vacuum state, 149
Span, 27 Variance, 53
Spin, 12, 70 Vector space, 23
Spin singlet, 93 Wave­particle duality, 11
Spooky action at a distance, 92 Wavefunction, 152
Standard basis, 29 Wavefunction collapse, 110
Standard deviation, 53
State, 60 X (NOT) gate, 102
State Axiom, 60 XOR gate, 102
State collapse, 106
Z gate, 103

Zero vector, 24
Zero­point energy, 149

