Relativity: A Modern Primer
Kevin Han
August 30, 2021

Contents
Introduction
1 Special relativity and the nature of time
1.1 Postulates of special relativity
1.2 Locality
1.3 Natural units
1.4 Galilean transformations
1.5 Lorentz transformations
1.6 Time dilation
1.7 Length contraction
1.8 Proper time
1.9 Velocity addition
1.10 Causality
1.11 Four-vectors
1.12 Energy and momentum
1.13 Lightcones
1.14 Wick rotation
2 The action principle
2.1 The action
2.2 The Lagrangian
2.3 Multiple particles
2.4 Euler-Lagrange equations
2.5 Noether’s theorem
2.6 Relativistic particles
2.7 From particle to field
2.8 Gauge invariance
2.9 Fields in motion
2.10 The Maxwell Lagrangian
2.11 Charges and currents
3 The geometry of spacetime
3.1 Submanifolds of flat space
3.2 The metric
3.3 A Euclidean analogy
3.4 General tensors
3.5 Parallel transport
3.6 Covariant derivative
3.7 Curvature: Riemann and Ricci
4 General relativity
4.1 The geodesic equation
4.2 The equivalence principle
4.3 Fermi normal coordinates
4.4 Local measurements
4.5 Static spacetimes
4.6 Gravitational redshift
4.7 Field Lagrangians
4.8 Einstein-Hilbert action
4.9 The Schwarzschild solution
4.10 Black holes
4.11 The energy-momentum tensor
4.12 Energy-momentum conservation
4.13 $T^{\mu\nu}$ for particles
4.14 $T^{\mu\nu}$ for mass densities
4.15 $T^{\mu\nu}$ for ideal fluids
5 Cosmology and the expanding universe
5.1 Matter, radiation, and dark energy
5.2 Shape of the universe
5.3 Time evolution and fate of the universe
Conclusion
A Linear algebra
A.1 Vector spaces
A.2 Linear functions and matrices
A.3 Eigenvectors and eigenvalues
A.4 Determinants and volumes
B Lorentz transformation from moving clocks
C Vector calculus
C.1 Identities
C.2 Chain rule
D Riemann tensor components


Introduction
This is the textbook I wish I had when self-studying relativity. I aim to com-
bine the best aspects of my favorite textbooks, from the clarity of Dirac’s
General Theory of Relativity to the elegance of Landau and Lifshitz’s Course
of Theoretical Physics. Some features include:
• Concise: Only about 100 pages long.
• Deep: Unlike a popular physics book, we will dive into the math.
Keep a pencil and paper handy.
• Broad: In addition to special and general relativity, it covers
Lagrangian mechanics, a bit of electromagnetism, and an introduction
to cosmology.
• Pedagogical: Clear explanations with many figures and exercises.
Exercises are marked by difficulty with 1-3 stars (*). I have tried to
find the most intuitive way to understand each concept, and I
anticipate common points of confusion based on my own experience.
Prerequisites: vector calculus and classical mechanics, that is, a standard
freshman or sophomore physics curriculum in the US. Some knowledge of
linear algebra and electromagnetism will also help, although Appendix A
develops the necessary linear algebra.
Relativity in a nutshell
Relativity says that physics happens on a spacetime manifold, a 4D surface
analogous to a sphere or a disk, but in four dimensions instead of two. Just
as the earth looks like a 2D plane as you stand at a point, this manifold
looks like a 4D “plane” locally around each point. At each point on the
earth (except the poles), one direction is associated with changing latitude
and one with changing longitude. Similarly, in spacetime, one direction
is associated with time and three with space. The curvature of the mani-
fold affects how matter (including light) propagates on it. In turn, matter
itself curves spacetime, causing nearby matter to become attracted. This
attraction is interpreted as the force of gravity.
First, let’s discuss spacetime in the absence of curvature, known as flat
space or Minkowski space. This theory is called special relativity.
Chapter 1
Special relativity and the nature of time
Special relativity deals with events, things that occur at a specific position
and time. Position is measured with physical rulers of a standard length.
Time is defined as what clocks measure.
First of all, what is a clock? Clocks are all around us: watches, smart-
phones, wall clocks, etc. In general, a clock is any physical system that
undergoes change. When we say that a clock somewhere runs faster or
slower, we mean that any physical process there runs faster or slower. Also,
the ideal clock considered here is point-like, meaning it is much smaller
than the standard rulers used to measure distance.
A system of clocks with rulers separating them is called a reference frame
(Fig. 1.1)*. Such a system defines coordinate axes (t, x) so that t is the
time read by the clock at position x. We will consider only one spatial
dimension x for now. When the clocks and rulers are freely moving (no
forces acting on them), the system is called an inertial reference frame (IRF).
By definition, any two IRFs move with constant relative velocity, since they
cannot be accelerating (F = ma = 0).
* In relativity, time is typically drawn on the vertical axis.
Figure 1.1: A reference frame that defines the coordinate axes t and x.
From now on, we will only draw IRFs as coordinate axes, instead of
drawing all the clocks and rulers as in Fig. 1.1.
1.1 Postulates of special relativity

The fundamental postulates of special relativity are:
1. The laws of physics are the same in all IRFs.

Consider some physical equation that uses the coordinates $(t, x)$
defined by an IRF. It must remain the same when written using the
coordinates $(t', x')$ defined by a different IRF. This property is called
Lorentz covariance. Don’t worry if this seems vague; we will see
many examples later.
2. The speed of light is a constant ($c \approx 3 \times 10^8$ m/s) in all IRFs.

What is light? Light can be thought of as either an electromagnetic
wave or a particle (the photon). The particle description is better
for our current purpose. Then the second postulate simply says there
is some particle that always travels at speed c in all IRFs. We will
sometimes call this particle a light ray, borrowing the classical optics
term.
1.2 Locality
So far, IRFs seem clunky and useless. What do we need so many clocks for?
Conceptually, we need clocks at every position because relativity is based on
local measurements: we can only measure time at position $x_1$ using a clock
at $x_1$, not a faraway clock at position $x_2$. In classical (non-relativistic)*
physics, we can use any clock in any IRF to measure time, since time is
globally shared among all objects.
When two clocks are at the same location, we can set them to the same
time, and they remain synchronized. However, in order to set up an IRF,
we must then move a clock from one location to another. How do we
guarantee they remain synchronized? More precisely: how do we define
synchronization between clocks in different locations? The constant speed
of light comes to the rescue here. We can synchronize clocks at different
locations by sending light rays between them and using ∆t = ∆x/c, where
∆x is the known distance between clocks. For example, the clock at the
origin (t = 0, x = 0) could send light rays in both directions. When another
clock at x receives this, it could adjust its time to |x|/c. This proceeds until
all clocks are synchronized and the IRF is completely “formed”.
All interactions between particles and fields must also be local in space-
time. More on this in Sec. 2.6.
1.3 Natural units

Given the constant speed of light, we will set $c \approx 3 \times 10^8$ m/s $= 1$ from now
on (so-called natural units). This allows us to omit factors of c from all
formulas, so we avoid having to keep track of it. If this makes you uneasy,
you can think of c as merely a conversion factor between distance and time,
which can be restored at the end of the calculation to obtain the right units.
For example, a time calculation may give $3 \times 10^8$ m; dividing by c then gives
1 s. There is nothing special about this factor other than convenience. We
could also set 1 m/s = 1, for example. Then $c \approx 3 \times 10^8$, which would be
an ugly number to include in our formulas.
* We will use “classical” to mean “non-relativistic”. Elsewhere, “classical” often means
“non-quantum”. Since we do not cover quantum physics here, we do not need to make
this distinction.
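The role of c as a conversion factor can be sketched in a few lines of Python. This is only an illustration: the function names `time_metres_to_seconds` and `length_seconds_to_metres` are made up for this example, and the exact SI value of c is used in place of the rounded $3 \times 10^8$ m/s.

```python
# Exact SI value of the speed of light (the metre is defined from it).
C_SI = 299_792_458.0  # m/s


def time_metres_to_seconds(t_metres: float) -> float:
    """A time computed with c = 1 comes out in metres; dividing by c gives seconds."""
    return t_metres / C_SI


def length_seconds_to_metres(x_seconds: float) -> float:
    """A length computed with c = 1 may come out in seconds; multiplying by c gives metres."""
    return x_seconds * C_SI


# The example from the text: a time of 3e8 m is roughly one second.
t_seconds = time_metres_to_seconds(3e8)
print(f"3e8 m of time is about {t_seconds:.4f} s")
```

The calculation itself never needs c; it only appears in the final conversion step.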
* Exercise 1.1
As we will see later, the relativistic energy $E$ and momentum $\vec{p}$ of a
particle are related to its mass $m$ as:
$$E^2 - \vec{p}^{\,2} = m^2. \tag{1.1}$$
Restore the factors of c in this equation. Recall that energy in S.I. units
is measured in [J] = [kg · m²/s²], momentum in [kg · m/s], and mass in
[kg].
In general, for a system of $n$ independent units, we can set $n - 1$ independent
constants to 1 and still convert a quantity to the right unit at the end.
For example, if we add mass to our unit system, we have mass, length, and
time. Then, in quantum physics, we typically also set the reduced Planck
constant $\hbar \approx 1.05 \times 10^{-34}$ J · s $= 1$. (We will not do so here.)
1.4 Galilean transformations

Consider an IRF $I$ and another IRF $I'$ moving with velocity $v > 0$ relative
to $I$. $I$ has coordinates $(t, x)$, and $I'$ has coordinates $(t', x')$. Since all the
clocks and rulers in an IRF are identical, we are free to choose the origin, so
we take $I$ and $I'$ to share a common origin $t = t' = 0$, $x = x' = 0$. Given an
event $(t, x)$ measured in $I$, what are the time and position $(t', x')$ measured
in $I'$?
Let us use the notation $X \equiv (t, x)^T$ for the column vector of coordinates.
Write $X = f(X')$ for the function relating coordinates $X$ to $X'$. Since $f(X')$
must take straight lines into straight lines, it must be linear:
$$\begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} \begin{pmatrix} t' \\ x' \end{pmatrix}, \tag{1.2}$$
or in matrix-vector notation:
$$X = \Lambda(v) X'. \tag{1.3}$$
We are trying to find the matrix $\Lambda(v) = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}$, which only depends
on the relative velocity $v$.
First, let’s answer this in classical mechanics. The clock with constant
$x' = 0$ moves along the path $x = vt$. We simply have: $t = t'$, $x = x' + vt'$. In
matrix form:
$$\begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v & 1 \end{pmatrix} \begin{pmatrix} t' \\ x' \end{pmatrix}. \tag{1.4}$$
This is called a Galilean transformation (Fig. 1.2).
In relativity, it turns out that $t$ will also depend on $x'$ and $v$, so time is
no longer a globally shared coordinate among IRFs.
Figure 1.2: Relation between coordinates $(t, x)$ and $(t', x')$, in classical
mechanics. The grey shaded region is the square $\{|t| < a, |x| < a\}$, for some
constant $a$. The blue shaded region is $\{|t'| < a, |x'| < a\}$.
1.5 Lorentz transformations

In relativity, the formula relating $(t, x)$ to $(t', x')$ is called a Lorentz
transformation, or a boost. It is shown graphically in Fig. 1.3. We will derive
it by finding the eigenvectors and eigenvalues (Sec. A.3) of $\Lambda(v)$, using the
constant speed of light. Appendix B contains an alternative derivation that
may be easier to follow for some readers.
A light ray sent from the origin follows $t = \pm x$ in IRF $I$. In $I'$, it must also
follow $t' = \pm x'$ since the speed of light is constant. Thus, the eigenvectors
of $\Lambda(v)$ are
$$\hat{w}_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \hat{w}_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{1.5}$$
They are normalized so that $\hat{w}_1^T \hat{w}_1 = \hat{w}_2^T \hat{w}_2 = 1$. $\hat{w}_1$ is the “forward-going”
ray going the same direction as the moving IRF $I'$. Call its eigenvalue $\lambda_f$.
Likewise, $\hat{w}_2$ is the “backward-going” ray with eigenvalue $\lambda_b$ (Fig. 1.3*).
Then we have:
$$\Lambda(v) = \lambda_f \hat{w}_1 \hat{w}_1^T + \lambda_b \hat{w}_2 \hat{w}_2^T, \tag{1.6}$$
which is easily verified by finding $\Lambda(v)\hat{w}_1$ or $\Lambda(v)\hat{w}_2$, and noting that $\hat{w}_1$ and
$\hat{w}_2$ are orthonormal (see also (A.9)).
Figure 1.3: Relation between coordinates $(t, x)$ and $(t', x')$, in relativity. The
grey shaded region is the square $\{|t| < a, |x| < a\}$, for some constant $a$. The
blue shaded region is $\{|t'| < a, |x'| < a\}$. Red lines show light rays in the $\pm x$
direction.
Now invert Eq. (1.3):
$$X' = \Lambda^{-1}(v) X. \tag{1.7}$$
IRF $I'$ is moving with velocity $v$ relative to $I$, so $I$ is moving with velocity
$-v$ relative to $I'$. This gives:
$$\Lambda^{-1}(v) = \Lambda(-v). \tag{1.8}$$

* We will always scale spacetime diagrams so that ct and x have the same length on
the page. Light rays then propagate at 45-degree angles.
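$\Lambda(-v)$ can also represent the boost for IRF $I'$ going in the $-x$ direction
instead of the $+x$ direction in the original scenario. This swaps the
forward-going and backward-going eigenvectors:
$$\Lambda(-v) = \lambda_b \hat{w}_1 \hat{w}_1^T + \lambda_f \hat{w}_2 \hat{w}_2^T. \tag{1.9}$$
On the other hand, inverting a transformation simply inverts its eigenvalues:
$$\Lambda^{-1}(v) = \frac{1}{\lambda_f} \hat{w}_1 \hat{w}_1^T + \frac{1}{\lambda_b} \hat{w}_2 \hat{w}_2^T, \tag{1.10}$$
which you may verify by finding $\Lambda(v)\Lambda^{-1}(v) = I$.
Equating (1.9) and (1.10), we obtain:
$$\lambda_f = \frac{1}{\lambda_b}. \tag{1.11}$$
Then evaluating (1.6) explicitly gives:
$$\Lambda(v) = \frac{1}{2} \begin{pmatrix} \lambda + \frac{1}{\lambda} & \lambda - \frac{1}{\lambda} \\ \lambda - \frac{1}{\lambda} & \lambda + \frac{1}{\lambda} \end{pmatrix}, \tag{1.12}$$
where we define $\lambda \equiv \lambda_f$ for brevity.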

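To relate $\lambda$ to $v$, note that the clock at $x' = 0$ in IRF $I'$ moves along the
path $x = vt$ in IRF $I$ (Fig. 1.3). Plugging $X' = (t', 0)^T$ into (1.3) and using
$x/t = v$ gives:
$$v = \frac{\lambda - 1/\lambda}{\lambda + 1/\lambda}. \tag{1.13}$$
Solving for $\lambda$:
$$\lambda = \sqrt{\frac{1 + v}{1 - v}}. \tag{1.14}$$
Finally, plugging into (1.12) gives the Lorentz transformation:
$$\begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} \gamma & \gamma v \\ \gamma v & \gamma \end{pmatrix} \begin{pmatrix} t' \\ x' \end{pmatrix}, \tag{1.15}$$
where we define
$$\gamma(v) = \frac{1}{\sqrt{1 - v^2}}. \tag{1.16}$$
As $v \to 1$, $\gamma \to \infty$, so the speed of light $c = 1$ is the speed limit for
relative motion of IRFs.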
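The chain from eigenvalues to the Lorentz matrix can be checked numerically. The sketch below, in plain Python with hypothetical helper names (`boost_from_eigen`, `boost`, `matmul`), builds $\Lambda(v)$ once from (1.12) with (1.14) and once from (1.15), and multiplies $\Lambda(v)\Lambda(-v)$ to test (1.8). It is a sanity check in natural units ($c = 1$), not part of the derivation.

```python
import math
from typing import List

Matrix = List[List[float]]


def gamma(v: float) -> float:
    """Lorentz factor, Eq. (1.16), with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def boost_from_eigen(v: float) -> Matrix:
    """Lambda(v) from the eigenvalue form (1.12), with lambda taken from (1.14)."""
    lam = math.sqrt((1.0 + v) / (1.0 - v))
    plus = 0.5 * (lam + 1.0 / lam)
    minus = 0.5 * (lam - 1.0 / lam)
    return [[plus, minus], [minus, plus]]


def boost(v: float) -> Matrix:
    """Lambda(v) in the final form (1.15)."""
    g = gamma(v)
    return [[g, g * v], [g * v, g]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


v = 0.6
print(boost_from_eigen(v))           # should agree with boost(v)
print(boost(v))
print(matmul(boost(v), boost(-v)))   # should be close to the identity matrix
```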
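Let us restore the factors of c to compare with classical physics:
$$\begin{aligned} t &= \gamma t' + \frac{\gamma v x'}{c^2} \\ x &= \gamma v t' + \gamma x'. \end{aligned} \tag{1.17}$$
For low velocities $v \ll c$, we have $\gamma \approx 1$ and the $x'$ term in $t$ is suppressed
by $v/c^2$, so this reduces to the Galilean transformation
$$\begin{aligned} t &\approx t' \\ x &\approx v t' + x', \end{aligned} \tag{1.18}$$
as expected.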
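To see how good the Galilean approximation is at everyday speeds, the following sketch evaluates (1.17) in SI units and compares it with (1.18). The function names and the sample event ($t' = 1$ s, $x' = 100$ m) are arbitrary choices for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_si(t_prime: float, x_prime: float, v: float):
    """Eq. (1.17): Lorentz transformation with factors of c restored."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    t = g * t_prime + g * v * x_prime / C**2
    x = g * v * t_prime + g * x_prime
    return t, x


def galilean(t_prime: float, x_prime: float, v: float):
    """Eq. (1.18): the low-velocity limit."""
    return t_prime, v * t_prime + x_prime


# A car at 30 m/s versus a frame moving at 0.6 c, same event (t', x').
for v in (30.0, 0.6 * C):
    t_l, x_l = lorentz_si(1.0, 100.0, v)
    t_g, x_g = galilean(1.0, 100.0, v)
    print(f"v = {v:.3e} m/s: t differs by {t_l - t_g:.3e} s, "
          f"x differs by {x_l - x_g:.3e} m")
```

At 30 m/s the two transformations agree to far better than any practical clock or ruler could resolve, while at 0.6 c they differ at order one.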
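Finally, because the Lorentz transformation is linear, it also applies to
time and length differences:
$$\begin{pmatrix} \Delta t \\ \Delta x \end{pmatrix} = \begin{pmatrix} \gamma & \gamma v \\ \gamma v & \gamma \end{pmatrix} \begin{pmatrix} \Delta t' \\ \Delta x' \end{pmatrix}. \tag{1.19}$$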
1.6 Time dilation

Fig. 1.4 shows the clock $C_v$ at $x' = 0$ in IRF $I'$. Plugging $X' = (t', 0)^T$
into (1.15) gives $t = \gamma t'$. Since $\gamma > 1$ for $v \neq 0$, we have $t > t'$ (for $t' > 0$):
a clock moving relative to an IRF runs slower than the clocks at rest in that
IRF. This is called time dilation. For any stationary clock in IRF $I'$, we can
plug $\Delta X' = (\Delta t', 0)^T$ into (1.19) to get $\Delta t = \gamma \Delta t'$.
Figure 1.4: Same as Fig. 1.3, but showing the clock at $x' = 0$. It displays a
lower time $t' = t/\gamma$ than the clocks in $I$ (red point).
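A short numerical illustration of $\Delta t = \gamma \Delta t'$, in natural units. The function name `dilated_interval` and the sample velocities are choices made for this sketch only.

```python
import math


def gamma(v: float) -> float:
    """Lorentz factor for relative velocity v (c = 1)."""
    return 1.0 / math.sqrt(1.0 - v * v)


def dilated_interval(dt_prime: float, v: float) -> float:
    """Time elapsed in I while a clock at rest in I' ticks dt_prime (Eq. 1.19 with dx' = 0)."""
    return gamma(v) * dt_prime


# One tick of the moving clock takes longer and longer in I as v grows.
for v in (0.1, 0.6, 0.9, 0.99):
    print(f"v = {v}: 1 unit on the moving clock = {dilated_interval(1.0, v):.4f} units in I")
```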
** Exercise 1.2
Twin paradox. Alice travels to the moon with constant velocity v, then
travels back to Earth with constant velocity −v. Her twin Eve stays on
Earth. From Eve’s perspective, Alice is always moving with speed |v|,
so Alice’s clock is slower. However, from Alice’s perspective, Eve is also
moving with speed |v|, so Eve’s clock is slower. Whose clock is behind
when Alice returns to Earth? (Ignore the rotation of the moon around
the Earth, and the rotation of the Earth around the sun, etc.)
Hint 1: draw a spacetime diagram showing their paths, from an IRF
where Eve is at rest (called her rest frame). Note that there is no single
IRF where Alice is always at rest, since she must accelerate to get from
velocity +v to −v.
Hint 2: it may help to read Sec. 1.8.
1.7 Length contraction

Just as moving clocks tick slower, moving rulers also appear shorter. This
phenomenon is called length contraction. First, what does it mean to mea-
sure a length? Length is measured from two ends of a standard ruler at the
same time, just like a time interval is measured at two times by a standard
clock at the same position.
Consider the ruler of length $a$ in IRF $I'$ between $x' = 0$ and $x' = a$ (Fig.
1.5). From (1.15), the endpoint of the ruler $X' = (0, a)^T$ corresponds to
the point $X = (\gamma v a, \gamma a)^T$: the green point in Fig. 1.5. The path of this
endpoint is:
$$x = \gamma a + v(t - \gamma v a). \tag{1.20}$$
Plugging in $t = 0$, the two ends of the ruler at $t = 0$ are at $x = 0$ and
$x = \gamma a - \gamma v^2 a = a/\gamma$: the red point in Fig. 1.5. Thus, the observed length
in $I$ is $a/\gamma < a$.
Figure 1.5: Length contraction. Red shaded region is the path of the ruler
between $x' = 0$ and $x' = a$. As measured in IRF $I$, its length is $a/\gamma < a$.
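The two-step argument above (boost the far endpoint, then slide along its worldline back to $t = 0$) can be traced in code. This is a minimal sketch in natural units; `boost_point` and `observed_length` are names invented for the example.

```python
import math


def gamma(v: float) -> float:
    return 1.0 / math.sqrt(1.0 - v * v)


def boost_point(t_prime: float, x_prime: float, v: float):
    """Map an event from I' to I using Eq. (1.15)."""
    g = gamma(v)
    return g * (t_prime + v * x_prime), g * (v * t_prime + x_prime)


def observed_length(a: float, v: float) -> float:
    """Length in I of a ruler that has rest length a in I'."""
    # Far end of the ruler at t' = 0 (the green point in Fig. 1.5).
    t_end, x_end = boost_point(0.0, a, v)
    # The far end moves with velocity v; follow Eq. (1.20) back to t = 0.
    x_far_at_t0 = x_end + v * (0.0 - t_end)
    # The near end passes through the common origin, so it sits at x = 0 at t = 0.
    return x_far_at_t0 - 0.0


a, v = 1.0, 0.6
print(observed_length(a, v), a / gamma(v))  # the two numbers should agree
```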
Although length contraction is “merely” the spatial version of time
dilation, it is conceptually somewhat different. Time dilation involves two
events along the same path of a moving clock. On the other hand, length
contraction involves two events at different locations, so a point-like
observer cannot measure it directly. Of course, we may translate between
space and time by arranging light to be sent from the faraway position (Fig.
1.6). Then we may either measure $\Delta t$ and use $\Delta x = c\,\Delta t$, or use the spatial
information from the light signal, e.g. see the object with our eyes.
Figure 1.6: Translating a length measurement into a time measurement at
$x = 0$, with a light ray (red line).
** Exercise 1.3
Ladder paradox. This apparent paradox is similar to the twin paradox,
but for length contraction instead of time dilation. Consider a ladder
passing through a barn with open front and back doors (Fig. 1.7). The
ladder has length L at rest, but is moving with velocity v with respect
to the barn, so appears contracted to length L/γ(v). The barn at rest is
size L/γ(v), so it is able to close its front and back doors exactly when
the ladder is fully inside. The doors then open and the ladder exits.
Now from the ladder’s rest frame, the barn is moving with velocity
$-v$ and appears contracted to length $L/\gamma(v)^2$, so it is far too small to fit
the ladder of length L: the doors cannot close!
Which scenario is correct?
Hint: the two events “front door closes” and “back door closes”
occur at the same time in the barn’s rest frame. Do they occur at the
same time in the ladder’s rest frame?
Figure 1.7: From top to bottom: (1) ladder with contracted length L/γ
enters barn of size L/γ, (2) doors close, (3) doors open and ladder exits.
Fig. 1.5 and the ladder paradox illustrate the principle of relativity of
simultaneity: the notion of two events being simultaneous depends on the
IRF. The green point in Fig. 1.5 has coordinates $X' = (0, a)^T$ in IRF $I'$, so it
occurs simultaneously with the origin $X' = (0, 0)^T$. However, in IRF $I$, the
green point clearly has $t \neq 0$, so it is not simultaneous with the origin.
1.8 Proper time

Consider a clock moving on an arbitrary path x(t) in spacetime. As before,
x and t refer to position and time measured in an IRF I. The path of an
object is also called its worldline. It can be broken into an infinite number of
small segments. For each segment (dt, dx), we can set the origin of the IRF
to the start of the segment, and form another IRF $I'$ with the $t'$ axis along
the segment (Fig. 1.8). Then $dt$ is the time difference measured in the IRF $I$,
and $d\tau \equiv dt'$ is the difference in the clock’s reading over the segment. (We
use the symbol $d\tau$ instead of $dt'$ since the $t'$ axis changes for each segment.)
We may relate $d\tau$ and $dt$ by plugging $X' = (d\tau, 0)^T$ into (1.15):
$$\begin{aligned} d\tau &= \frac{dt}{\gamma(\dot{x})} \\ &= dt \sqrt{1 - \dot{x}^2} \\ &= \sqrt{dt^2 - dx^2}, \end{aligned} \tag{1.21}$$
where $\dot{x} = dx/dt$ is the instantaneous velocity of the segment, and
$\gamma(\dot{x}) = 1/\sqrt{1 - \dot{x}^2}$.
Figure 1.8: Worldline of a clock broken up into infinitesimal segments
(gray). We set up IRFs $I$ and $I'$ around one segment $(dt, dx)$.
We may integrate Eq. (1.21) to obtain a finite difference in clock
reading $\tau$:
$$\tau(t_1, t_2) \equiv \int_{t_1}^{t_2} dt \, \sqrt{1 - \dot{x}(t)^2}. \tag{1.22}$$
$\tau$ is known as the proper time, from the French propre, meaning own. It
measures the time difference displayed on a moving clock as it moves from
$t_1$ to $t_2$: its “own” time.
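The integral (1.22) is easy to evaluate numerically for any velocity profile. The sketch below uses a simple midpoint rule; the function name `proper_time`, the step count, and the two sample velocity profiles are assumptions made for illustration (natural units, $|\dot{x}| < 1$ throughout).

```python
import math
from typing import Callable


def proper_time(velocity: Callable[[float], float],
                t1: float, t2: float, steps: int = 10_000) -> float:
    """Approximate Eq. (1.22) with the midpoint rule on `steps` equal segments."""
    dt = (t2 - t1) / steps
    total = 0.0
    for k in range(steps):
        t_mid = t1 + (k + 0.5) * dt
        v = velocity(t_mid)          # instantaneous velocity on this segment
        total += math.sqrt(1.0 - v * v) * dt
    return total


# Constant velocity 0.6: the result should be (t2 - t1) / gamma = 10 * 0.8.
tau_const = proper_time(lambda t: 0.6, 0.0, 10.0)

# A clock whose velocity grows linearly in time, reaching 0.8 at t = 10.
tau_accel = proper_time(lambda t: 0.08 * t, 0.0, 10.0)

print(tau_const, tau_accel)
```

In both cases the proper time is smaller than the coordinate time $t_2 - t_1 = 10$, as expected for a moving clock.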
* Exercise 1.4
Consider a clock attached to an oscillating spring or pendulum, so that
it moves on the path $x = x_0 \sin(\omega t)$, with the maximum velocity
$v_{\max} = x_0 \omega \ll 1$. What is the ratio of the time $\Delta\tau$ displayed on the clock to the
time $\Delta t$ displayed on a stationary clock, for $\Delta t \gg 1/\omega$? Find it to order
$v_{\max}^2$. Hint: use the Taylor expansion $(1 + \epsilon)^p \approx 1 + p\epsilon$, for $\epsilon \ll 1$.
1.9 Velocity addition

How does a particle’s velocity $\dot{x} = dx/dt$ transform under Lorentz
transformations? It is a ratio between differentials $dx$ and $dt$. Using (1.19), we
have:
$$\begin{aligned} \dot{x} = \frac{dx}{dt} &= \frac{\gamma v \, dt' + \gamma \, dx'}{\gamma \, dt' + \gamma v \, dx'} \\ &= \frac{v + \dot{x}'}{1 + \dot{x}' v}. \end{aligned} \tag{1.23}$$
On the second line, we divide top and bottom by $\gamma \, dt'$. This is the velocity
addition rule in special relativity. As a quick check, an object moving with
speed $\dot{x} = v$ would appear stationary in IRF $I'$, so $\dot{x}' = 0$. As another quick
check, as $\dot{x}' \to 1$, $\dot{x} \to 1$ for any $v$, so an object moving slower than light in
one IRF cannot reach the speed of light in any other IRF. In non-relativistic
physics, velocities add as $\dot{x} = v + \dot{x}'$, so objects can move at arbitrarily high
speed.
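The velocity addition rule (1.23) is a one-line function. The sketch below, with the hypothetical name `add_velocities`, reproduces the two quick checks from the text and contrasts the result with the non-relativistic sum (natural units).

```python
def add_velocities(v: float, u_prime: float) -> float:
    """Velocity in I of an object moving at u_prime in I', where I' moves at v (Eq. 1.23)."""
    return (v + u_prime) / (1.0 + u_prime * v)


# Check 1: an object at rest in I' moves at v in I.
print(add_velocities(0.6, 0.0))

# Check 2: light stays light, whatever the relative velocity.
print(add_velocities(0.6, 1.0))

# Two large velocities never add up to more than 1.
u = add_velocities(0.9, 0.9)
print(u, "versus the non-relativistic sum", 0.9 + 0.9)
```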
1.10 Causality
While physical objects are limited to |ẋ| < 1, some phenomena can travel
faster than light (called superluminal propagation). For example, a moving
flashlight shining on a distant screen produces a moving spot (Fig. 1.9).
This spot can travel faster than c across the screen if the flashlight is rotated
fast enough or the screen is far enough away.
Figure 1.9: A moving flashlight illuminating a distant screen.
However, this spot cannot be used to send information: an observer on
one end of the screen cannot control the spot to send a signal to another
observer across the screen. Thus, it is really information that cannot travel
faster than light, since this would violate causality.
We will define causality as the property that all inertial observers agree
on the direction of cause and effect. Specifically, an event A can only cause
another event B if all inertial observers agree on the time ordering between
them: $t_A < t_B$. As usual, consider IRFs $I$ and $I'$ with coordinates $X$ and
$X'$. Let A have coordinates $X = X' = (0, 0)^T$, and B have coordinates
$X' = (a, ua)^T$ in IRF $I'$, with $u, a > 0$. $u = \Delta x'/\Delta t' > 1$ corresponds to
superluminal propagation. From (1.15), we have:
$$t_B = \gamma a (1 + vu). \tag{1.24}$$
We see that $t_B < t_A$ for sufficiently negative $v < -1/u$. Since $v$ is limited to
$-1 < v$, observers may disagree on the time ordering only for $u > 1$. Thus,
A can only cause B by sending a signal traveling at the speed of light or
slower.
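Equation (1.24) can be scanned over all allowed relative velocities to see when the time ordering of A and B can flip. The sketch below uses an invented function name `t_B` and sample values $a = 1$, $u = 0.5$ and $u = 2$, in natural units.

```python
import math


def t_B(a: float, u: float, v: float) -> float:
    """Time of event B in IRF I, Eq. (1.24); event A is at t = 0."""
    return a * (1.0 + v * u) / math.sqrt(1.0 - v * v)


# Relative velocities sampled across the allowed range -1 < v < 1.
velocities = [k / 100.0 for k in range(-99, 100)]

# Subluminal signal (u = 0.5): B is after A in every sampled IRF.
print(all(t_B(1.0, 0.5, v) > 0.0 for v in velocities))

# Superluminal "signal" (u = 2): the ordering flips once v < -1/u = -0.5.
flipped = [v for v in velocities if t_B(1.0, 2.0, v) < 0.0]
print(max(flipped))
```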
Causality is a fundamental property of physical theories. Note that clas-
sical mechanics also satisfies causality while allowing superluminal signal
propagation: time is a globally shared coordinate, so observers always
agree on time ordering.
1.11 Four-vectors
Now let us finally move to four spacetime dimensions, and introduce some
new notation. Call the time coordinate $x^0$ and the spatial coordinates
$x^1, x^2, x^3$. Greek indices $\mu, \nu, \cdots$ are used for spacetime coordinates, and
Latin indices $i, j, \cdots$ for spatial coordinates. We also use $\vec{x}$ for the spatial
position vector, and simply $x$ for the full spacetime vector (instead of $X$).
The Lorentz transformation for a boost in the $x^1$ direction becomes*:
$$x^\mu = \Lambda^\mu{}_\nu(v) \, x'^\nu, \tag{1.25}$$
where
$$\Lambda(v) = \begin{pmatrix} \gamma & \gamma v & 0 & 0 \\ \gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.26}$$
Here, we have introduced the Einstein summation convention, where all
repeated indices in an expression are implicitly summed over, or contracted.
In this case, the index $\nu$ is summed from 0 to 3. The upper index of a matrix
($\mu$ here) is always the row index, and the lower index ($\nu$ here) is the column
index (not that it matters because $\Lambda$ is symmetric).
The proper time formula (1.21) becomes:
$$d\tau = \sqrt{dt^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2} = \sqrt{dt^2 - d\vec{x}^{\,2}}. \tag{1.27}$$
Since we have unified time and space coordinates into $x^\mu$, let’s try to
define a 4-component velocity $u^\mu_{\text{bad}}$ as
$$u^\mu_{\text{bad}} \equiv \frac{dx^\mu}{dt} = \begin{pmatrix} 1 \\ \dot{\vec{x}} \end{pmatrix}. \tag{1.28}$$
As its name implies, this is a bad definition. From (1.23), it transforms
nonlinearly under Lorentz transformations due to the $\dot{x}'$ in the denominator.
Ideally, it would transform in the same way as $x^\mu$:
$$u^\mu = \Lambda^\mu{}_\nu(v) \, u'^\nu. \tag{1.29}$$
Fortunately, we know of a differential that is invariant under Lorentz
transformations: the proper time $d\tau$. It is the difference in clock reading over
a path segment, so it is only a property of the path segment and is
independent of the IRF used to measure time and distance $dx^\mu$. Indeed, you
may verify that the proper time (1.21) does not change under a Lorentz
transformation (1.19)*. Such a quantity is called a Lorentz invariant.
* We can always rotate our spatial axes so that a boost is in the $x^1$ direction. We will
not need the general formula for a boost in an arbitrary direction.
Thus, let us define the four-velocity:
$$u^\mu \equiv \frac{dx^\mu}{d\tau}. \tag{1.30}$$
It evidently satisfies the correct transformation law (1.29).
The path of a particle can be parametrized by $t$ or $\tau$. They are related
by:
$$d\tau = \sqrt{dt^2 - d\vec{x}^{\,2}} = dt \sqrt{1 - \dot{\vec{x}}^{\,2}} = \frac{dt}{\gamma}, \tag{1.31}$$
where $\gamma = 1/\sqrt{1 - \dot{\vec{x}}^{\,2}}$. Thus, we have:
$$\begin{aligned} u^\mu &= \frac{dx^\mu}{dt} \frac{dt}{d\tau} \\ &= \gamma \begin{pmatrix} 1 \\ \dot{\vec{x}} \end{pmatrix}, \end{aligned} \tag{1.32}$$
using the chain rule (Sec. C.2). We may take further derivatives $d/d\tau$ to
obtain the four-acceleration, etc., which all transform linearly under boosts.
In general, any 4-component quantity $V^\mu$ that transforms as
$$V^\mu = \Lambda^\mu{}_\nu(v) \, V'^\nu \tag{1.33}$$
under a boost is called a four-vector.
Four-vectors are important because they allow us to form other Lorentz
invariants. Define the quantity $ds^2$ as
$$\begin{aligned} ds^2 \equiv -d\tau^2 &= d\vec{x}^{\,2} - dt^2 \\ &= \eta_{\mu\nu} \, dx^\mu dx^\nu, \end{aligned} \tag{1.34}$$
where $\eta$ is a 4 × 4 matrix
$$\eta = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.35}$$
* This can be seen from the eigenvectors $\hat{w}_1 = (1, 1)^T/\sqrt{2}$ and $\hat{w}_2 = (1, -1)^T/\sqrt{2}$ (1.5).
Using these as a basis, a displacement vector $dv$ has coordinates $dv^1 = (dt + dx)/\sqrt{2}$ and
$dv^2 = (dt - dx)/\sqrt{2}$. Since the eigenvalues of $\hat{w}_1$ and $\hat{w}_2$ are $\lambda$ and $1/\lambda$ respectively, the
product $dv^1 dv^2 = (dt^2 - dx^2)/2$ is constant under a boost.
$ds^2$ is called the interval, and $\eta_{\mu\nu}$ is called the Minkowski metric. More
terminology: two spacetime points separated by $dx^\mu$ are timelike-separated
when $ds^2 < 0$, null-separated when $ds^2 = 0$, or spacelike-separated when
$ds^2 > 0$. Relating to the previous section, spacelike-separated events are
causally disconnected: they cannot cause each other.
Clearly, we may replace $dx^\mu$ and/or $dx^\nu$ with any object that transforms
the same way, and the result will also be Lorentz invariant. For example,
the four-velocity squared is simply a constant, since we are dividing $ds^2$ by
$d\tau^2$:
$$u^2 \equiv \eta_{\mu\nu} u^\mu u^\nu = \eta_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = -1. \tag{1.36}$$
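These invariance statements can be spot-checked numerically. The sketch below represents four-vectors as plain tuples and contracts them with the diagonal of $\eta$ from (1.35); the helper names (`minkowski_dot`, `four_velocity`, `boost_x`) and the sample numbers are assumptions for illustration, in natural units.

```python
import math

# Diagonal entries of the Minkowski metric, Eq. (1.35).
ETA = (-1.0, 1.0, 1.0, 1.0)


def minkowski_dot(a, b) -> float:
    """eta_{mu nu} a^mu b^nu for two four-vectors given as 4-tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))


def four_velocity(vx: float, vy: float, vz: float):
    """u^mu = gamma (1, v), Eq. (1.32)."""
    g = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    return (g, g * vx, g * vy, g * vz)


def boost_x(vec, v: float):
    """Apply the boost matrix (1.26) along x^1 to a four-vector."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t, x, y, z = vec
    return (g * (t + v * x), g * (v * t + x), y, z)


u = four_velocity(0.3, 0.4, 0.5)
print(minkowski_dot(u, u))                    # close to -1, Eq. (1.36)
print(minkowski_dot(boost_x(u, 0.7), boost_x(u, 0.7)))  # still close to -1

# The contraction of any two four-vectors is unchanged by the boost.
a, b = (2.0, 1.0, -3.0, 0.5), (1.0, 4.0, 0.0, -2.0)
print(minkowski_dot(a, b), minkowski_dot(boost_x(a, -0.4), boost_x(b, -0.4)))
```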
Finally, you may have noticed that we always sum over an upper and
a lower index, and never upper/upper or lower/lower. This is because we
would like upper and lower indices to transform in different ways. Define
$$V_\mu \equiv \eta_{\mu\nu} V^\nu \tag{1.37}$$
for any four-vector $V^\nu$. In components: $V_0 = -V^0$, $V_i = V^i$. From the
requirement that $V^\mu V_\mu$ is Lorentz invariant, $V_\mu$ transforms as:
$$V_\mu = (\Lambda^{-1}(v))^\nu{}_\mu \, V'_\nu \tag{1.38}$$
under a boost.
Eq. (1.37) is just a matrix-vector multiplication. We may invert the
matrix $\eta$ and use it to raise indices:
$$V^\mu = \eta^{\mu\nu} V_\nu. \tag{1.39}$$
$\eta^{\mu\nu}$ is called the inverse metric, denoted by the same symbol but with upper
indices. Of course, it is the same matrix as the metric in this case, but we
will later replace the metric with a more general matrix.
For now, upper and lower indices are just a convenient notation. We
will see later that the two types of indices have a geometric interpretation
in general relativity.
* Exercise 1.5
How do the velocities in the $x^2$ and $x^3$ directions $\dot{x}^2 = dx^2/dt$ and
$\dot{x}^3 = dx^3/dt$ transform under a boost in the $x^1$ direction?
*** Exercise 1.6
1. Derive the formula for the four-acceleration $a^\mu \equiv du^\mu/d\tau$ in terms
of the ordinary velocity $\dot{\vec{x}}$ and acceleration $\ddot{\vec{x}}$.
2. Calculate the Lorentz invariant $a^\mu a_\mu$.
3. Consider an observer uniformly accelerating in the $x^1$ direction,
so that $a^\mu a_\mu = a_0^2$, where $a_0$ is constant. Assume it starts at rest at
$t = 0$, $\vec{x} = \vec{0}$. Find its path $x^1(t)$.
1.12 Energy and momentum

Notation alert: since we are running out of nice symbols for velocity, and
$\dot{\vec{x}}$ is rather clumsy, from now on $\vec{v} = \dot{\vec{x}}$ denotes a particle’s velocity rather
than the relative velocity between IRFs, unless otherwise indicated.
We can multiply $u^\mu$ by the mass $m$ to get the four-momentum $p^\mu$:
$$p^\mu = m u^\mu = m\gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix}. \tag{1.40}$$
The corresponding Lorentz invariant is $p^2 = -m^2$. At small velocity $v \ll 1$,
the spatial components become $\vec{p}_{\text{nr}} = m\vec{v}$, the usual non-relativistic
momentum. What about the time component? Taylor expand $\gamma(v)$ around
$v = 0$*:
$$\gamma(v) = 1 + \frac{1}{2} v^2 + O(v^4). \tag{1.41}$$
We see that
$$E \equiv p^0 = m + \frac{1}{2} m \vec{v}^{\,2} + O(\vec{v}^{\,4}). \tag{1.42}$$
$E$ is the energy. It includes the familiar kinetic energy $m\vec{v}^{\,2}/2$ plus the rest
energy $m$. This is the famous mass-energy equivalence $E_{\text{rest}} = mc^2$, upon
restoring $c$.
* The notation $O(x^n)$ means terms of order $x^n$ or higher ($x^{n+1}$, $x^{n+2}$, etc.).
What about particles that travel at the speed of light, like light itself
(the photon)? Their change in proper time $d\tau$ is always zero, so they do
not have a four-velocity. However, consider Eq. (1.40) as $|\vec{v}| \to 1$. We
have $\gamma(v) \to \infty$, but the energy $E = m\gamma(v)$ can remain finite if $m \to 0$ at
the same time. Thus, massless particles move at the speed of light. Their
four-momentum is:
$$p^\mu_{\text{massless}} = E \begin{pmatrix} 1 \\ \hat{n} \end{pmatrix}, \tag{1.43}$$
where $\hat{n}$ is a unit vector.
Just as energy and momentum are conserved in non-relativistic physics,
four-momentum is conserved* in all interactions. For example, consider
a mass $M$ particle decaying into two mass $m$ particles 1 and 2. We have
$p_M^\mu = p_1^\mu + p_2^\mu$, $(p_M)^2 = -M^2$, and $(p_1)^2 = (p_2)^2 = -m^2$. In the rest frame of
$M$, we get:
$$\begin{aligned} p_M &= \left( M, \vec{0} \right)^T \\ p_1 &= \left( \frac{M}{2}, \; \hat{n} \sqrt{\frac{M^2}{4} - m^2} \right)^T \\ p_2 &= \left( \frac{M}{2}, \; -\hat{n} \sqrt{\frac{M^2}{4} - m^2} \right)^T, \end{aligned} \tag{1.44}$$
where $\hat{n}$ is a unit vector. Note that this decay is allowed if $0 \le m \le M/2$.
The extra mass $M - 2m$ is converted to kinetic energy of the products.
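The equal-mass decay (1.44) can be checked against four-momentum conservation and the mass-shell condition. The sketch below is illustrative only: `two_body_decay` and `mass_squared` are invented names, the direction $\hat{n}$ defaults to the $x^3$ axis, and the sample masses $M = 10$, $m = 3$ are arbitrary (natural units).

```python
import math


def two_body_decay(M: float, m: float, n_hat=(0.0, 0.0, 1.0)):
    """Four-momenta (E, px, py, pz) of the two products in the rest frame of M, Eq. (1.44)."""
    if not 0.0 <= m <= M / 2.0:
        raise ValueError("decay is only allowed for 0 <= m <= M/2")
    p = math.sqrt(M * M / 4.0 - m * m)   # magnitude of each product's momentum
    p1 = (M / 2.0, *(p * c for c in n_hat))
    p2 = (M / 2.0, *(-p * c for c in n_hat))
    return p1, p2


def mass_squared(p) -> float:
    """m^2 = E^2 - |p|^2, i.e. minus the Lorentz invariant p^2."""
    E, px, py, pz = p
    return E * E - (px * px + py * py + pz * pz)


p1, p2 = two_body_decay(10.0, 3.0)
total = tuple(x + y for x, y in zip(p1, p2))
print(p1, p2)
print(total)                                # should equal (M, 0, 0, 0)
print(mass_squared(p1), mass_squared(p2))   # should both equal m^2 = 9
```

The kinetic energy of each product is $M/2 - m$, so the total kinetic energy released is $M - 2m$, matching the statement above.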
** Exercise 1.7
1. Find the final momenta $p_1$ and $p_2$ for the decay of $M$ into two
unequal masses $m_1$ and $m_2$, in the rest frame of $M$. Use coordinates
where $m_2$ moves in the $+x^3$ direction.
2. Show that this decay is only allowed for $m_1 + m_2 \le M$.
1.13 Lightcones
In the remainder of this chapter, we introduce some useful but nonessential
concepts.
* Some students get confused about conserved versus invariant quantities, since they
both involve the notion of staying the same. A conserved quantity stays the same over
time, while an invariant quantity is the same under some transformation. (You could say
that a conserved quantity is invariant under time translation.) Something can be both,
one, or neither. $p^\mu$ is conserved but transforms as a four-vector under boosts, so it is not
invariant. $\vec{x}^{\,2}(\tau)$ is obviously not conserved but is invariant under spatial rotations.
All the light rays that intersect a given point $x_a$ form a region called the
lightcone at $x_a$, since this region can be visualized as a cone in 3D spacetime
(Fig. 1.10). The upper cone with $x^0 > x_a^0$ is called the future lightcone, and
the lower cone with $x^0 < x_a^0$ is called the past lightcone. The worldline
of any object intersecting $x_a$ always lies within the lightcone at $x_a$. Under
a boost, any event within the lightcone at $x_a$ stays within the lightcone
at $x'_a$, since it remains timelike-separated from $x_a$ ($d\tau^2 = dt^2 - d\vec{x}^2 > 0$).
Likewise, any event outside of the lightcone stays outside, since it remains
spacelike-separated.
Figure 1.10: Lightcone at the origin for a 3D spacetime.
1.14 Wick rotation

Finally, let’s discuss another way to think of Lorentz transformations that
may be more intuitive. The interval (1.34) looks just like the formula for
distance squared in Euclidean space $d\vec{x}^2$, but with the time dimension $-dt^2$
added on with the wrong sign. We can make it look exactly like Euclidean
space using a trick called Wick rotation. Define an imaginary time variable
as*:
$$t_E \equiv it. \tag{1.45}$$
Then $t_E$ is just another spatial dimension and (1.34) becomes:
$$ds^2 = d\vec{x}^2 + dt_E^2 \tag{1.46}$$
* $t_E \equiv -it$ would work just as well here. In quantum field theory, Wick rotation is
associated with rotating a contour integral in the complex plane. There, one choice is
better than the other.
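In Euclidean space, we know that rotations leave distances invariant. For
example, the rotation around the $x^3$ axis
$$\begin{pmatrix} x^1 \\ x^2 \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} x'^1 \\ x'^2 \end{pmatrix} \tag{1.47}$$
leaves $\vec{x}^2$ the same. Since $t_E$ is another spatial dimension, the “spacetime”
rotation
$$\begin{pmatrix} it \\ x^1 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} it' \\ x'^1 \end{pmatrix} \tag{1.48}$$
clearly leaves $ds^2$ invariant. However, we see that the angle $\theta$ cannot be
real, or $t$ will be complex when $t'$ and $x'^1$ are real. Using the identities
$$\begin{aligned}
\cos\theta &= \frac{e^{i\theta} + e^{-i\theta}}{2} \\
\sin\theta &= \frac{e^{i\theta} - e^{-i\theta}}{2i},
\end{aligned} \tag{1.49}$$
we see that if $\theta$ is imaginary, $\cos\theta$ is real and $\sin\theta$ is imaginary, which is just
what we need so that $t$ and $x$ stay real. Thus, we define
$$\theta = i\beta \tag{1.50}$$
where the real number $\beta$ is called the rapidity. We have
$$\begin{aligned}
\cos\theta &= \frac{e^{\beta} + e^{-\beta}}{2} = \cosh\beta \\
\sin\theta &= i\,\frac{e^{\beta} - e^{-\beta}}{2} = i\sinh\beta,
\end{aligned} \tag{1.51}$$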
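so that
$$\begin{pmatrix} t \\ x^1 \end{pmatrix} = \begin{pmatrix} \cosh\beta & \sinh\beta \\ \sinh\beta & \cosh\beta \end{pmatrix} \begin{pmatrix} t' \\ x'^1 \end{pmatrix}. \tag{1.52}$$
Comparing with (1.15), we see that $\beta$ is related to $v$ as
$$\tanh\beta = \frac{\sinh\beta}{\cosh\beta} = v. \tag{1.53}$$
Note that $\tanh\beta$ is bounded by $\pm 1$, as expected. Thus, Lorentz transformations
are just rotations between space and imaginary time (by an imaginary
angle).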
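The rapidity form of the boost, Eq. (1.52), is easy to experiment with numerically. The sketch below assumes natural units and uses illustrative velocities $0.6$ and $0.7$; the helper names are hypothetical. It checks that the interval $-t^2 + (x^1)^2$ is preserved, and that composing two boosts adds their rapidities, which reproduces the usual relativistic velocity-addition formula through the identity for $\tanh(\beta_1 + \beta_2)$.

```python
import math

def boost(beta, t_prime, x_prime):
    """Apply Eq. (1.52): map (t', x'^1) to (t, x^1) for rapidity beta."""
    t = math.cosh(beta) * t_prime + math.sinh(beta) * x_prime
    x = math.sinh(beta) * t_prime + math.cosh(beta) * x_prime
    return t, x

def interval(t, x):
    """Interval -t^2 + x^2 in 1+1 dimensions (c = 1)."""
    return -t ** 2 + x ** 2

v1, v2 = 0.6, 0.7                          # illustrative boost velocities
b1, b2 = math.atanh(v1), math.atanh(v2)    # rapidities from Eq. (1.53)

event = (2.0, 0.5)                         # arbitrary (t', x'^1)
boosted = boost(b1, *event)

# Two successive boosts versus one boost with the summed rapidity
twice = boost(b1, *boost(b2, *event))
once = boost(b1 + b2, *event)

combined_v = math.tanh(b1 + b2)
addition_formula = (v1 + v2) / (1 + v1 * v2)

print(interval(*event), interval(*boosted))
print(twice, once)
print(combined_v, addition_formula)
```

Rapidities add linearly, like ordinary rotation angles, while the combined velocity stays below 1.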
Chapter 2
The action principle
We have set the stage for physics in flat space. Now let us discuss how
particles and fields propagate in this spacetime more systematically. For
example, particles in non-relativistic physics follow Newton’s second law
$\vec{F} = m\vec{a}$, and electromagnetic fields $\vec{E}(t, \vec{x})$ and $\vec{B}(t, \vec{x})$ follow Maxwell’s
equations. These are both examples of equations of motion that are derived
from the principle of stationary action, or the action principle.
The action principle takes slightly different forms for particles and fields
(Table 2.1). For simplicity, we consider particles in classical mechanics first.
Those familiar with Lagrangians in classical mechanics can skip to Sec. 2.6.
2.1 The action

Consider a single particle in empty space. At any time $t$, it has a position
$\vec{x}(t)$ and velocity $\vec{v}(t) = d\vec{x}/dt$. Define a real-valued quantity $S_{if}\{\vec{x}(t)\}$,
called the action, that depends on the path of the particle $\vec{x}(t)$ from time $t_i$
to $t_f$. The action principle states that the path the particle actually takes is
one where the action is stationary under small perturbations of the path,
$\vec{x}(t) \to \vec{x}(t) + \delta\vec{x}(t)$.
To elaborate, consider dividing the time interval from $t_i$ to $t_f$ into $N$
segments, and take $N \to \infty$ in the end. You may think of $S_{if}$ as a function
of all the positions and times of each segment:
$$S_{if}\{\vec{x}(t_i), t_i, \vec{x}(t_i + \Delta t), t_i + \Delta t, \cdots, \vec{x}(t_f), t_f\}, \tag{2.1}$$
where $\Delta t = (t_f - t_i)/N$. (Note that the velocity $\vec{v}(t) = \frac{\vec{x}(t + \Delta t) - \vec{x}(t)}{\Delta t}$, so it
is not an independent variable here.) Such a function of infinitely many
variables, or “function of a function”, is called a functional. The principle
of stationary action is then
$$\frac{\delta S_{if}}{\delta x^i(t)} = 0, \tag{2.2}$$
i.e. the partial derivative of $S_{if}$ with respect to any component of the
position $x^i$ at any time $t$ is zero. The $\delta$ symbol is used instead of $\partial$ for functional
derivatives.
Instead of the derivative notation $\frac{\delta S_{if}}{\delta x^i(t)}$, we will mainly think of $\delta S$ as a
small change in $S$ coming from a small change in $\vec{x}(t)$:
$$\delta S \equiv S\{\vec{x}(t) + \delta\vec{x}(t)\} - S\{\vec{x}(t)\} = \int dt\, \frac{\delta S_{if}}{\delta x^i(t)}\, \delta x^i(t) = 0, \tag{2.3}$$
where we expand to first order in $\delta x^i(t)$. This is just the functional version
of the first-order expansion
$$\Delta f(\vec{x}) \equiv f(\vec{x} + \Delta\vec{x}) - f(\vec{x}) = \frac{\partial f}{\partial x^i}\, \Delta x^i, \tag{2.4}$$
where $f(\vec{x})$ is some function of multiple variables $x^i$.
Eqs. (2.2) and (2.3) are equivalent. If we require $\frac{\delta S_{if}}{\delta x^i(t)} = 0$, then $\delta S = 0$.
Conversely, if we require $\delta S = 0$ for any $\delta x^i(t)$, then $\frac{\delta S_{if}}{\delta x^i(t)} = 0$.
Finally, the action principle only applies to perturbations that are zero
at the boundaries: δ~x(ti ) = δ~x(tf ) = 0. This will become important later.
2.2 The Lagrangian

Consider the action $S_{12}$ for time $t_1$ to $t_2$, and the action $S_{23}$ for time $t_2$ to $t_3$,
with $t_1 < t_2 < t_3$. We require locality in time, meaning that a perturbation
in the first interval only affects $S_{12}$ and not $S_{23}$. Also, we require additivity
of the action: $S_{12} + S_{23} = S_{13}$. These conditions imply that $S_{if}$ can be
written as an integral from $t_i$ to $t_f$ of some quantity:
$$S_{if} = \int_{t_i}^{t_f} L(\vec{x}(t), \vec{v}(t), t)\, dt. \tag{2.5}$$
$L(\vec{x}(t), \vec{v}(t), t)$ is known as the Lagrangian. In general, it may depend on
the position and velocity at time $t$, as well as the time $t$ itself*.
* It also cannot depend on higher time derivatives due to the Ostrogradsky instability.
Note that we may add a total time derivative $\frac{df}{dt}(\vec{x}, t)$ to the Lagrangian
without affecting the action principle. Such a term produces the action:
$$\int_{t_i}^{t_f} dt\, \frac{df}{dt}(\vec{x}, t) = f(\vec{x}(t_f), t_f) - f(\vec{x}(t_i), t_i) \tag{2.6}$$
by the fundamental theorem of calculus. This is irrelevant for the action
principle since the perturbation $\delta\vec{x}(t)$ is zero at the boundaries by definition.
Let us now derive the form of the Lagrangian based on some other
fundamental principles:
• Homogeneity of space and time. No point in space or time is any

different from any other, so the Lagrangian cannot depend on ~x or t
explicitly.
• Isotropy of space. No direction in space is different from any other,
so the Lagrangian can only depend on the magnitude (squared) of
the velocity ~v (t)2 .
• Galilean invariance. The theory should be invariant under shifts by
a constant velocity, $\vec{x} \to \vec{x} + \vec{v}_0 t$ (a Galilean transformation). This is
the non-relativistic version of the Lorentz transformation*. Taking the
time derivative, this is $\vec{v} \to \vec{v} + \vec{v}_0$. To first order in $\vec{v}_0$, the Lagrangian
changes as
$$L(\vec{v}^2) \to L(\vec{v}^2 + 2\vec{v}\cdot\vec{v}_0) = L(\vec{v}^2) + 2\frac{\delta L}{\delta\vec{v}^2}(\vec{v}^2)\,\vec{v}\cdot\vec{v}_0 + O(\vec{v}_0^2) \tag{2.7}$$
The term $2\frac{\delta L}{\delta\vec{v}^2}(\vec{v}^2)\,\vec{v}\cdot\vec{v}_0$ will not affect the physics if it is a total time
derivative of the form above. This only occurs if $\frac{\delta L}{\delta\vec{v}^2}(\vec{v}^2)$ is a constant. Call
this constant $\frac{1}{2}m$. Thus, the Lagrangian for a single particle in free space is:
$$L = \frac{1}{2} m\vec{v}^2. \tag{2.8}$$
The constant m is, of course, the mass.
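As a consistency check, one can verify directly that the free-particle Lagrangian (2.8) changes only by a total time derivative under a Galilean transformation, this time keeping all orders in $\vec{v}_0$. The short calculation below assumes only that $m$ and $\vec{v}_0$ are constants:

```latex
\begin{aligned}
L \;\to\; \frac{1}{2} m (\vec{v} + \vec{v}_0)^2
  &= \frac{1}{2} m \vec{v}^2 + m\, \vec{v}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2 \\
  &= L + \frac{d}{dt}\left( m\, \vec{x}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2\, t \right).
\end{aligned}
```

The extra term has the form $df/dt$ with $f(\vec{x}, t) = m\,\vec{x}\cdot\vec{v}_0 + \frac{1}{2} m \vec{v}_0^2\, t$, so by (2.6) it does not affect the action principle.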
To summarize, we derived the unique action and Lagrangian (up to a
total time derivative) for a single particle from the following postulates:
* It implies there is no universal stationary frame of reference, even in non-relativistic
mechanics. The difference in relativity is that the speed of light is constant in all inertial
frames.
• Locality in time
• Additivity of the action
• Homogeneity of space and time
• Isotropy of space
• Galilean invariance
2.3 Multiple particles

Now consider the $n$-particle case. The Lagrangian may generally depend on
all the positions and velocities $\{\vec{x}_1, \vec{v}_1, \cdots, \vec{x}_n, \vec{v}_n\}$. Following the postulates
above, it must take the form*:
$$L = \sum_{a=1}^{n} \left(\frac{1}{2} m_a \vec{v}_a^2\right) - U(\Delta\vec{x}_{ab}) \tag{2.9}$$
where the function $U(\Delta\vec{x}_{ab})$ depends on all the separations between the
particles $\{\Delta\vec{x}_{12} = \vec{x}_1 - \vec{x}_2, \Delta\vec{x}_{13} = \vec{x}_1 - \vec{x}_3, \cdots\}$.
For example, the Coulomb interaction between two charges $q_1$ and $q_2$ is:
$$U(\vec{x}_1 - \vec{x}_2) = \frac{q_1 q_2}{4\pi\epsilon_0 |\vec{x}_1 - \vec{x}_2|}. \tag{2.10}$$
2.4 Euler-Lagrange equations

Let us now apply the principle of stationary action to the action (2.5),
repeated here:
$$S_{if} = \int_{t_i}^{t_f} L(\vec{x}(t), \vec{v}(t), t)\, dt. \tag{2.11}$$
* A term like $\vec{v}_i \cdot \vec{v}_j$ with $i \neq j$ is possible, but would imply that particles infinitely far
away can affect each other, violating common sense.
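Now take $\vec{x}(t) \to \vec{x}(t) + \delta\vec{x}(t)$. The time derivative gives* $\vec{v}(t) \to \vec{v}(t) + \frac{d\delta\vec{x}}{dt}(t)$.
The change in the action is:
$$\begin{aligned}
\delta S_{if} &= \int_{t_i}^{t_f} \left[\frac{\delta L}{\delta x^i(t)}\,\delta x^i(t) + \frac{\delta L}{\delta v^i(t)}\,\delta v^i(t)\right] dt \\
&= \int_{t_i}^{t_f} \left[\frac{\delta L}{\delta x^i(t)} - \frac{d}{dt}\frac{\delta L}{\delta v^i(t)}\right] \delta x^i(t)\, dt.
\end{aligned} \tag{2.12}$$
We use the Einstein summation convention of the previous chapter, where
the index $i$ is summed over. On the second line, we use $\delta\vec{v} = \frac{d\delta\vec{x}}{dt}$ and
integrate by parts. Note that we can discard the boundary term $\frac{\delta L}{\delta v^i(t)}\,\delta x^i(t)$
since $\delta\vec{x} = 0$ at the boundaries.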
Since this must equal zero for any variation $\delta\vec{x}(t)$, we obtain the
Euler-Lagrange equations:
$$\frac{\delta L}{\delta x^i(t)} = \frac{d}{dt}\frac{\delta L}{\delta v^i(t)}. \tag{2.13}$$
For multiple particles, this becomes:
$$\frac{\delta L}{\delta x_a^i(t)} = \frac{d}{dt}\frac{\delta L}{\delta v_a^i(t)} \tag{2.14}$$
for each particle $a$.
Applying this to the multi-particle Lagrangian (2.9) gives Newton’s law
for a conservative potential:
$$\vec{F} \equiv -\nabla_a U = m_a \vec{a}_a \tag{2.15}$$
for each particle $a$, where $\vec{a}_a = d\vec{v}_a/dt$ and $\nabla_a$ is the gradient with respect
to $\vec{x}_a$.
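The link between stationary action and Newton's law can be checked numerically using the time discretization of Sec. 2.1. The sketch below is an illustration under stated assumptions: a single particle in one dimension with the potential $U = m g x$ (uniform gravity, chosen here for convenience and not taken from the text), a midpoint rule for the potential term, and hypothetical function names. It evaluates the derivative of the discretized action with respect to each interior point, holding the endpoints fixed.

```python
import math

def action(path, dt, m=1.0, g=9.8):
    """Discretized action for L = m v^2 / 2 - m g x (1D, uniform gravity)."""
    S = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        v = (x1 - x0) / dt                       # velocity on this segment
        S += (0.5 * m * v ** 2 - m * g * 0.5 * (x0 + x1)) * dt
    return S

def action_gradient(path, dt, eps=1e-3, **kw):
    """Central-difference dS/dx_k for interior points (endpoints held fixed)."""
    grad = []
    for k in range(1, len(path) - 1):
        up = list(path); up[k] += eps
        dn = list(path); dn[k] -= eps
        grad.append((action(up, dt, **kw) - action(dn, dt, **kw)) / (2 * eps))
    return grad

N, T, g, v0 = 50, 1.0, 9.8, 5.0                  # illustrative parameters
dt = T / N
times = [k * dt for k in range(N + 1)]

# Solution of Newton's law m a = -m g, and a path with the same endpoints
newton_path = [v0 * t - 0.5 * g * t ** 2 for t in times]
bumped_path = [x + 0.1 * math.sin(math.pi * t / T)
               for x, t in zip(newton_path, times)]

max_newton = max(abs(d) for d in action_gradient(newton_path, dt, g=g))
max_bumped = max(abs(d) for d in action_gradient(bumped_path, dt, g=g))
print("max |dS/dx_k| on Newton path:", max_newton)
print("max |dS/dx_k| on bumped path:", max_bumped)
```

The derivatives vanish (up to rounding) on the Newtonian path and not on the perturbed one, which is the discrete counterpart of Eq. (2.2).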
2.5 Noether’s theorem

Noether’s theorem is a simple but profound result that relates symmetries
of the Lagrangian to conserved quantities. Consider a perturbation $\delta\vec{x}(t)$
that only changes the Lagrangian by a total time derivative $df/dt$:
$$\delta L = \frac{df}{dt}. \tag{2.16}$$
* Thus, the variation “operator” $\delta$ commutes with the derivative $d/dt$: $\delta\frac{d\vec{x}}{dt} = \frac{d\delta\vec{x}}{dt}$.
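On the other hand, the Lagrangian depends on $\vec{x}(t)$ and $\vec{v}(t)$, so we have
$$\begin{aligned}
\delta L &= \frac{\delta L}{\delta x^i}\,\delta x^i + \frac{\delta L}{\delta v^i}\,\delta v^i \\
&= \left(\frac{d}{dt}\frac{\delta L}{\delta v^i}\right)\delta x^i + \frac{\delta L}{\delta v^i}\,\delta v^i \\
&= \frac{d}{dt}\left(\frac{\delta L}{\delta v^i}\,\delta x^i\right)
\end{aligned} \tag{2.17}$$
where we have used the Euler-Lagrange equations (2.13) on the second
line. Equating these two, we see that
$$\frac{d}{dt}\left(\frac{\delta L}{\delta v^i}\,\delta x^i - f\right) = 0. \tag{2.18}$$
Thus, the quantity
$$j \equiv \frac{\delta L}{\delta v^i}\,\delta x^i - f \tag{2.19}$$
is conserved. $j$ is called the Noether charge.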
This may feel like circular reasoning, since the Euler-Lagrange equations
were derived by considering perturbations δ~x(t). The difference is that
δ~x(t) here does not necessarily vanish at the boundaries.
The extension to multiple particles is again straightforward; simply index
by $a$ as well as $i$:
$$j \equiv \frac{\delta L}{\delta v_a^i}\,\delta x_a^i - f. \tag{2.20}$$
Let’s test this out. The multi-particle Lagrangian (2.9) does not change
under a global translation $\vec{x}_a \to \vec{x}_a + \vec{b}$. We have $\delta\vec{x}_a = \vec{b}$ and $\delta L = 0$ (so
$f = 0$). Plugging into (2.20):
$$\begin{aligned}
j_{\vec{b}} &= \frac{\delta L}{\delta v_a^i}\,\delta x_a^i \\
&= m_a v_a^i b^i
\end{aligned} \tag{2.21}$$
Since $dj_{\vec{b}}/dt = 0$ for any $\vec{b}$, this implies that
$$\vec{p}_{tot} \equiv m_a \vec{v}_a \tag{2.22}$$
is conserved. This is, of course, the total momentum.

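Now consider time translation $t \to t + \delta t$. We have $\delta\vec{x}_a = \vec{v}_a \delta t$. The
Lagrangian depends on time implicitly through $\vec{x}(t)$ and $\vec{v}(t)$, so changes by
$$\delta L = \frac{dL}{dt}\,\delta t. \tag{2.23}$$
Note the distinction between the total derivative $d/dt$ and the partial
derivative $\partial/\partial t$. $\partial L/\partial t = 0$ since $L$ does not depend on time explicitly, but
$dL/dt \neq 0$. Thus, $f(t) = L\,\delta t$. Plugging into (2.20):
$$\begin{aligned}
j_{\delta t} &= \frac{\delta L}{\delta v_a^i}\,\delta x_a^i - f \\
&= m_a \vec{v}_a^2\,\delta t - L\,\delta t \\
&= \left(\frac{1}{2} m_a \vec{v}_a^2 + U(\Delta\vec{x}_{ab})\right)\delta t.
\end{aligned} \tag{2.24}$$
Again, because this is conserved for all $\delta t$, the energy
$$E \equiv \frac{1}{2} m_a \vec{v}_a^2 + U(\Delta\vec{x}_{ab}) \tag{2.25}$$
is conserved.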
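Both conservation laws can be observed in a simple simulation. The sketch below is illustrative only: it assumes two particles in one dimension coupled by the hypothetical potential $U = \frac{1}{2} k (x_1 - x_2)^2$, which depends only on the separation as required by (2.9), and integrates Newton's law (2.15) with the velocity Verlet scheme (a standard integrator, not something introduced in the text). All names and parameter values are assumptions.

```python
def total_momentum(m, v):
    """p_tot = sum_a m_a v_a, Eq. (2.22)."""
    return sum(ma * va for ma, va in zip(m, v))

def total_energy(m, x, v, k):
    """E = sum_a m_a v_a^2 / 2 + U, Eq. (2.25), with U = k (x1 - x2)^2 / 2."""
    kinetic = sum(0.5 * ma * va ** 2 for ma, va in zip(m, v))
    return kinetic + 0.5 * k * (x[0] - x[1]) ** 2

def forces(x, k):
    """F_a = -dU/dx_a; the two forces are equal and opposite."""
    f = -k * (x[0] - x[1])
    return [f, -f]

def evolve(m, x, v, k, dt, steps):
    """Velocity Verlet integration of m_a a_a = F_a."""
    x, v = list(x), list(v)
    f = forces(x, k)
    for _ in range(steps):
        v = [va + 0.5 * dt * fa / ma for va, fa, ma in zip(v, f, m)]
        x = [xa + dt * va for xa, va in zip(x, v)]
        f = forces(x, k)
        v = [va + 0.5 * dt * fa / ma for va, fa, ma in zip(v, f, m)]
    return x, v

m, k = [1.0, 3.0], 4.0                     # illustrative masses and coupling
x_start, v_start = [0.0, 1.5], [0.3, -0.2]
x_end, v_end = evolve(m, x_start, v_start, k, dt=1e-3, steps=10000)

dp = total_momentum(m, v_end) - total_momentum(m, v_start)
dE = total_energy(m, x_end, v_end, k) - total_energy(m, x_start, v_start, k)
print("change in total momentum:", dp)
print("change in total energy:  ", dE)
```

The momentum change is zero up to rounding because the forces cancel pairwise, reflecting translation invariance. The energy is conserved only up to the small discretization error of the integrator.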
** Exercise 2.1
If the potential $U(\Delta\vec{x}_{ab})$ in (2.9) only depends on the magnitudes
$|\vec{x}_a - \vec{x}_b|$, the Lagrangian is invariant under rotations. The change in $x_a^i$ for
an infinitesimal rotation around the axis $\vec{\theta}$ by an angle $|\vec{\theta}|$ is:
$$\delta x_a^i = \left(\vec{\theta} \times \vec{x}_a\right)^i \tag{2.26}$$
as you may verify using a diagram and the right-hand rule. Show that
the total angular momentum
$$\vec{L}_{tot} \equiv \vec{x}_a \times \vec{p}_a \tag{2.27}$$
is conserved. You may find Eq. (C.5) useful.

2.6 Relativistic particles

Let’s go back to relativity and start with a single (massive) particle again.
Since its worldline can still be parametrized by $t$, the Euler-Lagrange
equations (2.13) still hold. However, the Lagrangian is different. Because the
laws of physics should take the same form in all IRFs, the action must be
an integral of a Lorentz invariant. The only differential Lorentz invariant
that characterizes the path is the proper time $d\tau$. Thus, the point-particle
action is:
$$S_{pp} = -mc^2 \int d\tau \tag{2.28}$$
where we have temporarily restored $c$ and introduced a constant $m$ with
units of mass, to match the units of the non-relativistic action (mass ×
length$^2$/time)*. Using the proper time formula (1.27):
$$S_{pp} = -m \int \sqrt{1 - \vec{v}(t)^2}\, dt \tag{2.29}$$
so the Lagrangian is
$$L_{pp} = -m\sqrt{1 - \vec{v}(t)^2}. \tag{2.30}$$
Plugging into the Euler-Lagrange equations gives:
$$\begin{aligned}
0 &= \frac{d}{dt}\left(m\frac{\vec{v}}{\sqrt{1 - \vec{v}^2}}\right) \\
&= \frac{d}{dt}\left(m\gamma(v)\vec{v}\right) \\
&= \frac{d\vec{p}}{dt}
\end{aligned} \tag{2.31}$$
where p~ is the spatial part of the four-momentum (1.40). Thus, the ve-
locity is a constant and particles propagate in straight lines, as expected.
This also holds for massless particles, although we started with a massive
Lagrangian.
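The intermediate step behind (2.31) can be written out explicitly. The following short derivation uses the Lagrangian (2.30) in natural units and the notation of (2.13); it also shows how the non-relativistic Lagrangian (2.8) is recovered at small velocity, using the same Taylor expansion as in (1.41):

```latex
\begin{aligned}
\frac{\delta L_{pp}}{\delta x^i} &= 0, \\
\frac{\delta L_{pp}}{\delta v^i}
  &= -m \cdot \frac{1}{2}\left(1 - \vec{v}^2\right)^{-1/2} \cdot \left(-2 v^i\right)
   = \frac{m v^i}{\sqrt{1 - \vec{v}^2}} = m\gamma(v)\, v^i, \\
L_{pp} &= -m\sqrt{1 - \vec{v}^2}
   = -m + \frac{1}{2} m \vec{v}^2 + O(\vec{v}^4).
\end{aligned}
```

The constant $-m$ does not affect the equations of motion, so for $v \ll 1$ the relativistic Lagrangian reduces to $\frac{1}{2} m\vec{v}^2$.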
* The negative sign is so that the action is minimized when the proper time is
maximized. It is always possible to connect two timelike-separated points with multiple null
vectors so that the proper time is minimized at zero, but this is not a stationary path and
obviously not the path the particle takes.
Unlike non-relativistic mechanics, it is difficult to couple multiple particles
through a direct interaction as in (2.9). This is because non-relativistic
physics allows action-at-a-distance: particles far away can affect each other
instantaneously in time. However, relativistic interactions must be local in
spacetime while preserving Lorentz invariance. This only permits
delta-function terms like
$$\int d\tau_a \int d\tau_b\, \delta^4\left(x_a(\tau_a) - x_b(\tau_b)\right) \tag{2.32}$$
in the action, which do not correspond to any known interaction. Here,
$\delta^4(x) \equiv \delta(x^0)\delta(x^1)\delta(x^2)\delta(x^3)$. Indeed, this term can be integrated
explicitly, giving a term $\propto \delta^2(\cdots)$ which is either infinite or zero depending on
whether the paths of the two particles intersect. The problem is that there
are only two variables of integration $d\tau_a$ and $d\tau_b$ but we must use a
four-dimensional delta function for Lorentz invariance*.
2.7 From particle to field

Although direct particle interaction has issues, particles can be coupled to
fields quite easily. A field is simply a quantity that varies in spacetime. For
example, a scalar field $\phi(x)$ assigns a number to each point in spacetime.
Under a boost (1.25), it transforms as:
$$\phi(x) = \phi'(x'(x)) \tag{2.33}$$
where $x'^\mu = (\Lambda^{-1}(v))^\mu{}_\nu x^\nu$. Note that although the function $\phi$ is defined in
terms of the function $\phi'$, they are different functions of their argument:
$\phi(x) \neq \phi'(x)$.
A four-vector field $A_\mu(x)$ assigns a four-vector to each point in
spacetime. Under a boost, it transforms as:
$$A_\mu(x) = (\Lambda^{-1}(v))^\nu{}_\mu A'_\nu(x'(x)). \tag{2.34}$$
Since $A_\mu$ is a four-vector, it transforms according to Eq. (1.38) in addition
to the transformation of the argument $x$. This is shown in Fig. 2.1 for a
rotation instead of a boost.
* We can show $\delta^4(x)$ is Lorentz invariant using the identity $\int d^D x\, \delta^D(\vec{x}) = 1$ for general
dimension $D$. Here, $d^D x \equiv dx^0 dx^1 \cdots dx^{D-1}$. Under a linear change of variables $\vec{x} = A\vec{x}'$,
$d^D x = d^D x'\, |\det A|$ (A.21), so $\delta^D(\vec{x}) = \delta^D(\vec{x}')/|\det A|$. For the Lorentz transformation
(1.15), $\det\Lambda = 1$.
Figure 2.1: A vector’s components V i and position (x, y) look different un-
der a change of coordinates.
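Let’s try to couple a field $A_\mu(x)$ to a particle. The simplest Lorentz
invariant action is
$$\begin{aligned}
S_A &= q\int A_\mu(t, \vec{x})\, dx^\mu \\
&= q\int \left(A_0(t, \vec{x}(t)) + v^i A_i(t, \vec{x}(t))\right) dt,
\end{aligned} \tag{2.35}$$
where we have explicitly indicated the dependence of $A_\mu$ on $t$ and $\vec{x}(t)$. The
total Lagrangian is
$$\begin{aligned}
L &= L_{pp} + L_A \\
&= -m\sqrt{1 - v^2} + q\left(A_0 + v^i A_i\right).
\end{aligned} \tag{2.36}$$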
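The Euler-Lagrange equations are:
$$\begin{aligned}
q\frac{\partial A_0}{\partial x^i} + qv^j\frac{\partial A_j}{\partial x^i} &= \frac{d}{dt}\left(p_i + qA_i\right) \\
&= \frac{dp_i}{dt} + q\frac{\partial A_i}{\partial t} + q\frac{\partial A_i}{\partial x^j}v^j,
\end{aligned} \tag{2.37}$$
where on the second line we use the chain rule on $\frac{dA_i(t, \vec{x}(t))}{dt}$, since $A_\mu$
depends on $t$ through $\vec{x}(t)$ as well as $t$ explicitly (Sec. C.2). Rearranging, we
get:
$$\frac{dp_i}{dt} = q\left(\partial_i A_0 - \partial_0 A_i\right) + qv^j\left(\partial_i A_j - \partial_j A_i\right), \tag{2.38}$$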
using an abbreviated notation $\partial_\mu \equiv \partial/\partial x^\mu$. Now use the vector calculus
identity (C.6) to write the second term in a more familiar form:
$$\frac{dp_i}{dt} = q\left(\partial_i A_0 - \partial_0 A_i\right) + q\left(\vec{v} \times \left(\nabla \times \vec{A}\right)\right)_i. \tag{2.39}$$
This is the Lorentz force law* for a charge $q$:
$$\frac{d\vec{p}}{dt} = q\vec{E} + q\vec{v} \times \vec{B}, \tag{2.40}$$
upon defining the electric and magnetic fields
$$\begin{aligned}
E_i &\equiv \partial_i A_0 - \partial_0 A_i \\
B_i &\equiv \left(\nabla \times \vec{A}\right)_i = \epsilon_{ijk}\partial_j A_k,
\end{aligned} \tag{2.41}$$
using (C.3). You may recognize $\vec{A}$ as the vector potential and $A_0 = -A^0 = -V$
as the electric potential of electromagnetism. We have “discovered”
electromagnetism by simply postulating a four-vector field $A_\mu$ and writing
a Lorentz-invariant coupling to a particle!
** Exercise 2.2
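Show that equations (2.41) imply two of Maxwell’s equations:
$$\begin{aligned}
\nabla \cdot \vec{B} &= 0 \\
\nabla \times \vec{E} &= -\frac{\partial\vec{B}}{\partial t}.
\end{aligned} \tag{2.42}$$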
You may find the identities in Appendix C useful.
2.8 Gauge invariance

Let us discuss an additional symmetry principle that constrains the
Lagrangian. Examine the action $S_A$ again (2.35). The transformation
$$A_\mu(x) \to A_\mu(x) + \partial_\mu\phi(x) \tag{2.43}$$
for some scalar field $\phi(x)$ produces a total time derivative in the Lagrangian:
$$\begin{aligned}
S_A &\to S_A + q\int \partial_\mu\phi\, dx^\mu \\
&= S_A + q\int \left(\partial_t\phi + v^i\partial_i\phi\right) dt \\
&= S_A + q\int \frac{d\phi}{dt}\, dt.
\end{aligned} \tag{2.44}$$
* Note that $\vec{p}$ here is still the spatial part of the four-momentum, which only reduces to
the non-relativistic momentum $m\vec{v}$ for $v \ll 1$.
As we have repeated many times, a total time derivative does not affect the
physics. Indeed, you can verify that $\vec{E}$ and $\vec{B}$ (2.41) are left invariant by
this transformation, as you may recall from your electromagnetism courses.
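The invariance of the fields can also be checked numerically. The sketch below is only an illustration: the potential `A`, the gauge function $\phi = \sin(t)\, x^1 x^2 + (x^3)^2$, the sample point, and the finite-difference step are all arbitrary assumptions made for this example. It evaluates $E_i$ and $B_i$ from the definitions (2.41) using central differences, before and after the transformation (2.43).

```python
import math

H = 1e-5  # finite-difference step (assumed small enough for these smooth fields)

def A(x):
    """Illustrative potential A_mu(t, x1, x2, x3) with lower index, chosen arbitrarily."""
    t, x1, x2, x3 = x
    return [x1 * x2 - t ** 2, math.sin(t) * x3, x1 * t, math.cos(x2) + x1 * x3]

def grad_phi(x):
    """Analytic d_mu phi for the assumed gauge function phi = sin(t) x1 x2 + x3^2."""
    t, x1, x2, x3 = x
    return [math.cos(t) * x1 * x2, math.sin(t) * x2, math.sin(t) * x1, 2 * x3]

def A_gauged(x):
    """Gauge-transformed potential A_mu + d_mu phi, Eq. (2.43)."""
    return [a + g for a, g in zip(A(x), grad_phi(x))]

def d(field, mu, nu, x):
    """Central difference approximating d_mu field_nu at the point x."""
    up = list(x); up[mu] += H
    dn = list(x); dn[mu] -= H
    return (field(up)[nu] - field(dn)[nu]) / (2 * H)

def E_and_B(field, x):
    """Eq. (2.41): E_i = d_i A_0 - d_0 A_i and B_i = eps_ijk d_j A_k."""
    E = [d(field, i, 0, x) - d(field, 0, i, x) for i in (1, 2, 3)]
    B = [d(field, 2, 3, x) - d(field, 3, 2, x),
         d(field, 3, 1, x) - d(field, 1, 3, x),
         d(field, 1, 2, x) - d(field, 2, 1, x)]
    return E, B

point = [0.4, 0.1, -0.7, 1.3]  # arbitrary spacetime point (t, x1, x2, x3)
E0, B0 = E_and_B(A, point)
E1, B1 = E_and_B(A_gauged, point)
max_diff = max(abs(a - b) for a, b in zip(E0 + B0, E1 + B1))
print("E, B before:", E0, B0)
print("largest change after gauge transformation:", max_diff)
```

The change is at the level of the finite-difference error, since the added terms cancel through $\partial_\mu\partial_\nu\phi = \partial_\nu\partial_\mu\phi$.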
Also, note that $\partial_\mu\phi(x)$ transforms as a four-vector with lower index
(1.38) under a boost:
$$\begin{aligned}
\partial_\mu\phi(x) &= \frac{\partial x'^\nu}{\partial x^\mu}\,\partial'_\nu\phi'(x') \\
&= (\Lambda^{-1}(v))^\nu{}_\mu\,\partial'_\nu\phi'(x')
\end{aligned} \tag{2.45}$$
using the chain rule. Thus, the new $A_\mu(x)$ is still a four-vector.
The transformation (2.43) is called a gauge transformation. We will
require all our Lagrangians to be gauge-invariant (up to a total time
derivative). This eliminates terms like
$$\int A^2(x)\, d\tau \tag{2.46}$$
that we could have added to the particle Lagrangian.

Gauge invariance (in a more general form) is a fundamental principle
in physics. It is the basis of all forces in the Standard Model of particle
physics: electromagnetic, strong, and weak. Even gravity can be thought
of as a type of gauge theory, although we will not explore that here.
2.9 Fields in motion

We have been treating the field $A_\mu(x)$ as fixed and deriving equations of
motion for particles propagating on this background field. We will now
treat the field as dynamic and derive its equations of motion. The dynamical
variable $x^\mu(t)$ becomes a free parameter, and $A_\mu(x)$ is now the dynamical
variable. The change is summarized in the simple translation table:
                      Particle   Field
Dynamical variable    x          A
Free parameter        t          x
Table 2.1: Objects in particle and field theory.
The action is given by an integral over space and time:
$$S = \int \mathcal{L}(A_\mu, \partial_\nu A_\mu)\, d^4x \tag{2.47}$$
where $d^4x \equiv dx^0 dx^1 dx^2 dx^3$. We use $\mathcal{L}$ for field Lagrangians instead of $L$.
Since Eq. (2.47) looks like Eq. (2.5) if we define $L \equiv \int dx^1 dx^2 dx^3\, \mathcal{L}$, $\mathcal{L}$
is sometimes called the Lagrangian density, but we will not do so. $\mathcal{L}$ now
depends on the field $A_\mu$ and all its partial derivatives. It cannot depend
on $x$ explicitly because of spacetime homogeneity. Note that $d^4x$ is already
Lorentz invariant; see the footnote on p. 33. Thus, $\mathcal{L}$ must be Lorentz
invariant.
The action integral is now over all of spacetime. Just as the particle
action was defined between times $t_i$ and $t_f$, we will pretend the spacetime
$V$ has a boundary $\partial V$. The action principle only applies to field variations
$\delta A_\mu(x)$ away from the boundary. A term in the Lagrangian $\partial_\mu f^\mu(x)$ for some
vector field $f^\mu(x)$ becomes a boundary term:
$$\int_V \partial_\mu f^\mu\, d^4x = \int_{\partial V} f^\mu n_\mu\, d^3s, \tag{2.48}$$
where $n_\mu$ is an outward normal vector. This is the divergence theorem from
vector calculus*, written in familiar vector notation as:
$$\int_V \nabla \cdot \vec{f}\, d^4x = \int_{\partial V} \vec{f} \cdot d^3\vec{s}. \tag{2.49}$$
This boundary term does not affect the physics, just as a total time
derivative did not affect the particle Lagrangian.
* Unlike Stokes’ theorem, the divergence theorem holds in any number of dimensions.
In this case, a 4D spacetime integral becomes a 3D boundary integral.
Following a similar derivation as Sec. 2.4, the Euler-Lagrange equations
(2.13) become:
$$\frac{\delta\mathcal{L}}{\delta A_\mu(x)} = \partial_\nu\frac{\delta\mathcal{L}}{\delta\partial_\nu A_\mu(x)}. \tag{2.50}$$
Note that when $\partial_\mu$ is applied to expressions involving Lagrangians, it
actually means the total derivative, since Lagrangians do not depend on $x^\mu$
explicitly. For example, if the Lagrangian for a scalar field is $\mathcal{L}(\phi) = \phi^2$,
then $\partial_\mu\mathcal{L} = \partial_\mu(\phi^2) = 2\phi\,\partial_\mu\phi$.
** Exercise 2.3
Noether’s theorem for fields. Consider a Lagrangian $\mathcal{L}(\phi, \partial_\mu\phi)$ for the
scalar field $\phi(x)$. Assume it changes as $\mathcal{L} \to \mathcal{L} + \partial_\mu f^\mu(x)$ under a field
transformation $\phi(x) \to \phi(x) + \delta\phi(x)$. Following Sec. 2.5, show that the
Noether current
$$j^\mu = \frac{\delta\mathcal{L}}{\delta\partial_\mu\phi}\,\delta\phi - f^\mu \tag{2.51}$$
is conserved:
$$\partial_\mu j^\mu = 0. \tag{2.52}$$
In vector notation, this is the continuity equation:
$$\frac{dj^0}{dt} = -\nabla \cdot \vec{j}. \tag{2.53}$$
$j^0$ is the charge density, and $\vec{j}$ is the current density. Integrating over
volume $V$:
$$\begin{aligned}
\int_V \frac{dj^0}{dt}\, d^3x = \frac{dQ}{dt} &= -\int_V \nabla \cdot \vec{j}\, d^3x \\
&= -\int_{\partial V} \vec{j} \cdot d\vec{S} \\
&= 0.
\end{aligned} \tag{2.54}$$
$Q \equiv \int_V j^0\, d^3x$ is the total charge. On the second line, we use the
divergence theorem. We take $V$ large enough so that the current density
vanishes at the boundary $\partial V$. Thus, the total charge is conserved.
2.10 The Maxwell Lagrangian

Let us try to construct a Lagrangian $\mathcal{L}_A$ for $A_\mu(x)$. Due to gauge
invariance, it must be constructed from the $\vec{E}$ and $\vec{B}$ fields, but it is hard to see
how these transform under Lorentz transformations using the traditional
3D vector notation. Looking back to (2.41), note how both $\vec{E}$ and $\vec{B}$ are
related to the two-index quantity
$$F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu \tag{2.55}$$
as
$$\begin{aligned}
E_i &= F_{i0} \\
B_i &= \frac{1}{2}\epsilon_{ijk}F_{jk}.
\end{aligned} \tag{2.56}$$
$F_{\mu\nu}$ is sometimes called the field strength. It can be written as a matrix,
where $\mu$ is the row index and $\nu$ is the column index:
$$F = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix}. \tag{2.57}$$
Because the derivative $\partial_\mu$ transforms as a four-vector (2.45), both indices
of $F_{\mu\nu}$ transform as lower indices:
$$F_{\mu\nu}(x) = (\Lambda^{-1}(v))^\rho{}_\mu (\Lambda^{-1}(v))^\sigma{}_\nu F'_{\rho\sigma}(x') \tag{2.58}$$
as you may verify. Thus, we may form the Lorentz-invariant quantity
$$\mathcal{L}_A = -\frac{\epsilon_0}{4}F_{\mu\nu}F^{\mu\nu} = \frac{\epsilon_0}{2}\left(\vec{E}^2 - \vec{B}^2\right) \tag{2.59}$$
where indices are raised using the inverse metric (1.39): $F^{\mu\nu} = \eta^{\mu\sigma}\eta^{\nu\rho}F_{\sigma\rho}$.
This is the Maxwell Lagrangian. $\epsilon_0 \approx 8.854 \times 10^{-12}$ F/m is the vacuum
permittivity*. Note that the simpler term $F^\mu{}_\mu = \eta^{\mu\nu}F_{\mu\nu}$ vanishes since $F_{\mu\nu}$ is
antisymmetric†.
* Since we have a new unit (electric charge), we can add to our system of natural
units by setting another constant to one. The usual choice is $4\pi\epsilon_0 = 1$. The Lagrangian
becomes: $\mathcal{L}_A = -\frac{1}{16\pi}F_{\mu\nu}F^{\mu\nu}$. Since we will only briefly discuss electromagnetism, we do
not do this here.
† We can also add the term $\epsilon^{\mu\nu\sigma\lambda}F_{\mu\nu}F_{\sigma\lambda}$ to the Lagrangian, where $\epsilon^{\mu\nu\sigma\lambda}$ is the totally
antisymmetric symbol, with $\epsilon^{0123} = 1$. You may show that this term is Lorentz-invariant
using $\det(\Lambda(v)) = 1$ and the definition of the determinant (A.11). This gives the equation
of motion $\epsilon^{\mu\nu\sigma\lambda}\partial_\nu F_{\sigma\lambda} = 0$. This is trivially satisfied since $F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu$ and
$\partial_\mu\partial_\nu = \partial_\nu\partial_\mu$.
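The Euler-Lagrange equations (2.50) give:
$$\begin{aligned}
0 &= \partial_\nu\frac{\delta\mathcal{L}_A}{\delta\partial_\nu A_\mu(x)} \\
&= -\frac{\epsilon_0}{4}\partial_\nu\frac{\delta}{\delta\partial_\nu A_\mu}\left(F_{\sigma\lambda}F_{\alpha\beta}\,\eta^{\sigma\alpha}\eta^{\lambda\beta}\right) \\
&= -\frac{\epsilon_0}{4}\partial_\nu\frac{\delta}{\delta\partial_\nu A_\mu}\left((\partial_\sigma A_\lambda - \partial_\lambda A_\sigma)(\partial_\alpha A_\beta - \partial_\beta A_\alpha)\,\eta^{\sigma\alpha}\eta^{\lambda\beta}\right) \\
&= \epsilon_0\,\partial_\nu\left(\partial^\mu A^\nu - \partial^\nu A^\mu\right) \\
&= \epsilon_0\,\partial_\nu F^{\mu\nu}.
\end{aligned} \tag{2.60}$$
The third line is evaluated using:
$$\begin{aligned}
\frac{\delta}{\delta\partial_\nu A_\mu}\left(\partial_\sigma A_\lambda\,\partial_\alpha A_\beta\right) &= \frac{\delta\partial_\sigma A_\lambda}{\delta\partial_\nu A_\mu}\,\partial_\alpha A_\beta + \partial_\sigma A_\lambda\,\frac{\delta\partial_\alpha A_\beta}{\delta\partial_\nu A_\mu} \\
&= \delta^\nu_\sigma\delta^\mu_\lambda\,\partial_\alpha A_\beta + \delta^\nu_\alpha\delta^\mu_\beta\,\partial_\sigma A_\lambda.
\end{aligned} \tag{2.61}$$
Using (2.56), you may show that (2.60) is equivalent to the other two
Maxwell equations:
$$\begin{aligned}
\nabla \cdot \vec{E} &= 0 \\
\nabla \times \vec{B} &= \frac{\partial\vec{E}}{\partial t}
\end{aligned} \tag{2.62}$$
in the absence of sources.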
Maxwell’s equations imply the speed of light is a constant. To see this,
take a plane wave:
$$A_\mu = A_{0\mu}\sin(k_\mu x^\mu) \tag{2.63}$$
for some constant $k_\mu$ and $A_{0\mu}$. $k^0 = \omega$ is the frequency and $\vec{k}$ is the
wavevector. This $A_\mu$ satisfies the equation of motion (2.60) if
$$k^2 = k_\mu k^\mu = -\omega^2 + \vec{k}^2 = 0, \tag{2.64}$$
$$k_\mu A_0^\mu = 0. \tag{2.65}$$
In the wave description of light, the speed of light is the phase velocity: how
fast the peaks and troughs of the wave propagate. For a plane wave, this is
given by $v_p = \omega/|\vec{k}|$. Eq. (2.64) implies the phase velocity is constant: $v_p = 1$.
This holds in all IRFs since Maxwell’s equations come from a Lorentz
invariant Lagrangian.
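To see where conditions (2.64) and (2.65) come from, one can substitute the plane wave (2.63) into the equation of motion (2.60). The short calculation below assumes constant $k_\mu$ and constant amplitude $A_0^\mu$, with indices raised by the Minkowski metric:

```latex
\begin{aligned}
\partial^\mu A^\nu &= k^\mu A_0^\nu \cos(k_\rho x^\rho), \\
\partial_\nu \partial^\mu A^\nu &= -k^\mu \left(k_\nu A_0^\nu\right) \sin(k_\rho x^\rho), \\
\partial_\nu \partial^\nu A^\mu &= -k^2 A_0^\mu \sin(k_\rho x^\rho), \\
\partial_\nu F^{\mu\nu}
  &= \partial_\nu\left(\partial^\mu A^\nu - \partial^\nu A^\mu\right)
   = \left[k^2 A_0^\mu - k^\mu \left(k_\nu A_0^\nu\right)\right] \sin(k_\rho x^\rho).
\end{aligned}
```

Both terms in the bracket vanish when $k^2 = 0$ and $k_\nu A_0^\nu = 0$, so these two conditions are sufficient for the plane wave to solve (2.60). The first gives $\omega = |\vec{k}|$, hence $v_p = 1$.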
Finally, we may write the Lorentz force law (2.38) in a clearly Lorentz
covariant way using $F_{\mu\nu}$:
$$\frac{dp^\mu}{d\tau} = qF^{\mu\nu}u_\nu, \tag{2.66}$$
as you may verify.
2.11 Charges and currents

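We may easily incorporate a four-vector source $J^\mu(x)$ by adding a source
term $J^\mu(x)A_\mu(x)$ to the Lagrangian:
$$\mathcal{L}_{EM} = \mathcal{L}_A + J^\mu A_\mu = -\frac{\epsilon_0}{4}F_{\mu\nu}F^{\mu\nu} + J^\mu A_\mu. \tag{2.67}$$
$\rho = J^0$ is the charge density and $\vec{J}$ is the current density. For example,
$$J^\mu_{pp} = q\,\delta^3(\vec{x} - \vec{x}_p(t))\begin{pmatrix} 1 \\ \frac{d\vec{x}_p}{dt}(t) \end{pmatrix} \tag{2.68}$$
for a point charge $q$ moving along the path $\vec{x}_p(t)$. Indeed, you can check
that $\int J^\mu_{pp}A_\mu\, d^4x$ gives the action for a point charge $S_A\{\vec{x}_p\}$ (2.35), upon
doing the spatial integral over $d^3x \equiv dx^1 dx^2 dx^3$.
The Euler-Lagrange equations give:
$$\partial_\nu F^{\mu\nu} = \frac{J^\mu}{\epsilon_0}. \tag{2.69}$$
This is equivalent to Maxwell’s equations with sources:
$$\begin{aligned}
\nabla \cdot \vec{E} &= \frac{\rho}{\epsilon_0} \\
\nabla \times \vec{B} &= \frac{\vec{J}}{\epsilon_0} + \frac{\partial\vec{E}}{\partial t}
\end{aligned} \tag{2.70}$$
Any source that obeys the equation of motion (2.69) also satisfies:
$$\partial_\mu J^\mu = \epsilon_0\,\partial_\mu\partial_\nu F^{\mu\nu} = 0 \tag{2.71}$$
due to the antisymmetry of $F^{\mu\nu}$. This means the electric charge is conserved
(2.54).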
Chapter 3
The geometry of spacetime
We now move from flat space to curved space. This involves first de-
veloping the machinery of differential geometry on manifolds. Unfortu-
nately, the usual treatment using abstract manifolds is quite unintuitive.
We will instead pretend d-dimensional curved spacetime is embedded in
D-dimensional flat space, called the ambient space. Here, d < D, and we
are most interested in d = 4, not caring much what D is* .
3.1 Submanifolds of flat space

We use $y^I$ for the coordinates in the ambient space, $0 \le I \le D - 1$,
and $x^\mu$ for the coordinates in spacetime, $0 \le \mu \le d - 1$. As before, $y^0$
is the time coordinate and $y^1, \cdots, y^{D-1}$ are the spatial coordinates. Also,
$x = (x^0, \cdots, x^{d-1})^T$ is the coordinate vector, and $\partial_\mu \equiv \partial/\partial x^\mu$ is the partial
derivative.
Spacetime is defined by the $D$ functions $f^I$:
$$y^I = f^I(x). \tag{3.1}$$
This is called a submanifold of the ambient Minkowski space.

* Every curved spacetime can in fact be embedded in higher-dimensional flat space, so
we lose no generality here. For some interesting discussion on the dimension D needed,
see here. There is a more abstract definition of spacetime that does not involve embedding
it into an ambient space. Such an object is called a manifold. We will not discuss this, and
instead will use “submanifold” and “manifold” interchangeably.
Note that the $x^\mu$ here do not necessarily have an interpretation as space
or time, but are simply used to parametrize the manifold, just like how
spherical coordinates $(\theta, \phi)$ parametrize the sphere embedded in 3D space
(Ex. 3.2). This is unlike the $x^\mu$ in the previous chapter, which correspond
to physical times and lengths measured in an IRF.
At each point $x$ in spacetime, there is a vector space spanned by the $d$
tangent vectors
$$e^I_{(\mu)} \equiv \frac{\partial f^I}{\partial x^\mu}, \tag{3.2}$$
called the tangent space at $x$ (Fig. 3.1).
Figure 3.1: Tangent space at x for a 2D submanifold.
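Now reparametrize spacetime using new coordinates $x'^\mu$, so that
$y^I = f'^I(x') \equiv f^I(x(x'))$. The basis vectors become:
$$\begin{aligned}
e'^I_{(\mu)} &= \frac{\partial f'^I}{\partial x'^\mu}(x') \\
&= \frac{\partial f^I}{\partial x^\nu}(x(x'))\,\frac{\partial x^\nu}{\partial x'^\mu}(x') \\
&= e^I_{(\nu)}\,\frac{\partial x^\nu}{\partial x'^\mu}(x')
\end{aligned} \tag{3.3}$$
using the chain rule. We show the function arguments for clarity. Since
what we call “new” and “old” coordinates is arbitrary, we also have:
$$e^I_{(\mu)} = e'^I_{(\nu)}\,\frac{\partial x'^\nu}{\partial x^\mu}(x). \tag{3.4}$$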
A given tangent vector $V$ can be written as a linear combination of basis
vectors:
$$V^I = v^\mu e^I_{(\mu)} \tag{3.5}$$
where $v^\mu$ are the components of the vector. Because this tangent vector
exists in the ambient space, it does not depend on the parametrization of
the submanifold:
$$\begin{aligned}
V^I = v^\mu e^I_{(\mu)} &= v'^\mu e'^I_{(\mu)} \\
&= v'^\mu\,\frac{\partial x^\nu}{\partial x'^\mu}\,e^I_{(\nu)},
\end{aligned} \tag{3.6}$$
using (3.3). Comparing the components on both sides, we obtain:
$$v^\mu = v'^\nu\,\frac{\partial x^\mu}{\partial x'^\nu} \tag{3.7}$$
upon relabeling indices. Any object with one index that transforms as (3.7)
under a reparametrization is called a contravariant vector, or simply
vector*. It is called contravariant because it transforms oppositely to the basis
vectors (3.4). An example of a vector is a coordinate displacement $dx^\mu$.
The corresponding tangent vector is simply a displacement in the ambient
space:
$$dy^I = dx^\mu\,\frac{\partial f^I}{\partial x^\mu}. \tag{3.8}$$
Conversely, any object that transforms as
$$v_\mu = v'_\nu\,\frac{\partial x'^\nu}{\partial x^\mu} \tag{3.9}$$
is called a covariant vector, or covector, since it transforms in the same way
as the basis vectors. An easy way to remember the transformation
properties (3.7) and (3.9) is that indices are always summed top with bottom,
and primed with primed. (A $\partial x^\mu$ in the denominator acts as a bottom
index.) An example of a covector is the gradient $\partial_\mu\phi(x)$ of any function $\phi(x)$
defined on the submanifold†. It transforms as:
$$\partial_\mu\phi(x) = \frac{\partial x'^\nu}{\partial x^\mu}\,\partial'_\nu\phi(x'), \tag{3.10}$$
using the chain rule.
* The tangent vector $V$ in the ambient space is also called a “vector”. We will always
capitalize such vectors to avoid confusion.
† Such as the embedding functions themselves $f^I(x)$. However, we do not call the basis
vectors themselves covectors, hence the parentheses around the index $e^I_{(\mu)}$. Also, some
texts define the basis vectors more abstractly as the partial derivative operators $\partial/\partial x^\mu$.
There is no particular advantage to doing so here.
The transformation laws for vectors and covectors (3.7) and (3.9) gen-
eralize those of four-vectors (1.33) and (1.37) under boosts. However, they
mean slightly different things. As mentioned above, the coordinates xµ in
Chapter 1 correspond to physically measured times and distances in an IRF,
so Eqs. (1.33) and (1.37) relate physical coordinates. On the other hand,
the xµ here in general have no physical significance, so Eqs. (3.7) and (3.9)
are simply mathematical statements of how vector components transform
under a change of coordinates.
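A concrete change of coordinates makes these transformation laws easier to see. The sketch below is an illustration under assumptions not taken from the text: it uses the flat plane with Cartesian coordinates $x^\mu = (x, y)$ and polar coordinates $x'^\mu = (r, \theta)$, arbitrary component values, and hypothetical function names. It transforms a vector with (3.7) and a covector with (3.9), then checks that the contraction $v^\mu w_\mu$ is the same in both coordinate systems.

```python
import math

def jacobian_old_wrt_new(r, theta):
    """dx^mu/dx'^nu for x = (x, y), x' = (r, theta); rows mu, columns nu."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def jacobian_new_wrt_old(r, theta):
    """dx'^nu/dx^mu; rows nu, columns mu."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

def transform_vector(v_new, r, theta):
    """Eq. (3.7): v^mu = v'^nu dx^mu/dx'^nu."""
    J = jacobian_old_wrt_new(r, theta)
    return [sum(J[mu][nu] * v_new[nu] for nu in range(2)) for mu in range(2)]

def transform_covector(w_new, r, theta):
    """Eq. (3.9): w_mu = w'_nu dx'^nu/dx^mu."""
    J = jacobian_new_wrt_old(r, theta)
    return [sum(J[nu][mu] * w_new[nu] for nu in range(2)) for mu in range(2)]

r, theta = 2.0, 0.6                         # point where the components live
v_new, w_new = [0.5, -1.2], [3.0, 0.4]      # arbitrary polar components

v_old = transform_vector(v_new, r, theta)
w_old = transform_covector(w_new, r, theta)

contraction_new = sum(a * b for a, b in zip(v_new, w_new))
contraction_old = sum(a * b for a, b in zip(v_old, w_old))
print(v_old, w_old)
print(contraction_new, contraction_old)
```

The two Jacobians are matrix inverses of each other, which is why summing an upper index against a lower index gives a coordinate-independent number.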
Finally, let us emphasize that each point $x$ has its own tangent space. If
we add vectors at two different points $x$ and $y$, the result will not transform
as a vector:
$$v^\mu(x) + w^\mu(y) = v'^\nu(x')\,\frac{\partial x^\mu}{\partial x'^\nu}(x') + w'^\nu(y')\,\frac{\partial x^\mu}{\partial x'^\nu}(y'). \tag{3.11}$$
We use (3.7) with the arguments restored. Since $\frac{\partial x^\mu}{\partial x'^\nu}(x') \neq \frac{\partial x^\mu}{\partial x'^\nu}(y')$, we
cannot factor it out.
3.2 The metric

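We may write the interval $ds^2$ (1.34) in terms of displacements $dx^\mu$ on the
spacetime manifold:
$$\begin{aligned}
ds^2 &= \eta_{IJ}\,dy^I dy^J \\
&= \eta_{IJ}\,\frac{\partial f^I}{\partial x^\mu}\frac{\partial f^J}{\partial x^\nu}\,dx^\mu dx^\nu \\
&= g_{\mu\nu}\,dx^\mu dx^\nu,
\end{aligned} \tag{3.12}$$
using (3.8). We have defined the metric tensor, or simply metric:
$$g_{\mu\nu}(x) \equiv \eta_{IJ}\,\frac{\partial f^I}{\partial x^\mu}(x)\,\frac{\partial f^J}{\partial x^\nu}(x). \tag{3.13}$$
It is a symmetric $d \times d$ matrix. Since it is made of two basis vectors, both
indices transform covariantly under reparametrization:
$$g_{\mu\nu} = \frac{\partial x'^\rho}{\partial x^\mu}\frac{\partial x'^\sigma}{\partial x^\nu}\,g'_{\rho\sigma}. \tag{3.14}$$
In matrix notation:
$$\bar{g} = J^T\bar{g}'J, \tag{3.15}$$
where $J^\sigma{}_\nu = \partial x'^\sigma/\partial x^\nu$ is the Jacobian. We use $\bar{g}$ for the matrix since $g$ is
typically used for the determinant of $\bar{g}$:
$$g \equiv \det\bar{g}. \tag{3.16}$$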
We assume that the manifold is embedded such that one eigenvalue of
$\bar{g}$ is negative and three are positive, just like the Minkowski metric $\eta_{\mu\nu}$.
The respective eigenvectors correspond to timelike ($ds^2 < 0$) and spacelike
($ds^2 > 0$) directions. For an eigenvector $dx$ with eigenvalue $\lambda$, Eq. (3.12)
in matrix form reads: $ds^2 = dx^T\bar{g}\,dx = \lambda\,dx^T dx$, which has the same sign as
$\lambda$.
The number of positive and negative eigenvalues is called the signature
of the metric. For a manifold in Minkowski space, it is denoted $(-, +, +, +)$.
It is invariant under a change of coordinates. To see this, fix $\bar{g}'$ and start
from $J = I$. As $J$ is continuously deformed, the eigenvalues of $\bar{g}$ change
continuously. The metric must stay full rank ($g \neq 0$), so the eigenvalues
cannot cross zero*.
We can always choose coordinates to make the metric Minkowski at any
given point x (so-called inertial coordinates at x). From (A.7), we can write:
ḡ(x) = X(x)Λ(x)X −1 (x), (3.17)
where X(x) is the matrix whose columns are the eigenvectors of ḡ(x) (start-
ing with the timelike one), and Λ(x) is the diagonal matrix of eigenvalues.
We will omit the argument x from now on. Since ḡ is symmetric, we have:
X T = X −1 , (3.18)
so $X$ defines a coordinate change with Jacobian $J = X^T$ and $\bar g' = \Lambda$. Then we can simply apply the scaling† $K = \mathrm{diag}\left(|\lambda_0|^{1/2}, \cdots, |\lambda_{d-1}|^{1/2}\right)$:
$$\Lambda = K^T \eta K, \tag{3.19}$$
so the overall transformation is:
$$\bar g = J^T \eta J, \tag{3.20}$$
where the Jacobian $J = K X^T$.
* Parity-reversing transformations like $x^1 = -x'^1$ cannot be obtained by a continuous deformation of the coordinates, but they also do not change the signature.
† Note that the first transformation is a change of basis, while the second is not, since $K^{-1} \neq K^T$.

In fact, it is possible to choose coordinates so that the metric is Minkowski
within the entire neighborhood of a worldline, as explained in Sec. 4.3.
Similar to the Minkowski metric in flat space, we can lower indices using
the metric. Given a vector v µ , the quantity
vµ ≡ gµν v ν (3.21)
transforms as a covector. Define the inverse metric g µν as the matrix inverse

of the metric, so that
g µν gνρ = δρµ . (3.22)
This must stay the matrix inverse under reparametrization, so both indices
transform contravariantly:
$$g^{\mu\nu} = \frac{\partial x^\mu}{\partial x'^\rho}\frac{\partial x^\nu}{\partial x'^\sigma}\,g'^{\rho\sigma}. \tag{3.23}$$
The inverse metric can be used to raise indices. Given a covector vµ , the
quantity
v µ ≡ g µν vν (3.24)
transforms as a vector.
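* Exercise 3.1
Derive (3.23) using $g^{\mu\nu}g_{\nu\rho} = g'^{\mu\nu}g'_{\nu\rho} = \delta^\mu_\rho$. Use the identity
$$\frac{\partial x^\mu}{\partial x'^\rho}\frac{\partial x'^\rho}{\partial x^\nu} = \delta^\mu_\nu \tag{3.25}$$
coming from the chain rule.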
You can think of the metric gµν as the dynamical field of spacetime, like
the vector field Aµ is the field of electromagnetism* . Electric charges and
currents produce electromagnetic fields, while mass and energy produce a
curved metric.
* Actually, when gravity is developed as a gauge theory, $A_\mu$ is analogous to the Christoffel symbols $\Gamma^\mu_{\nu\sigma}$, and $F_{\mu\nu}$ is analogous to the curvature tensor $R^\mu{}_{\nu\sigma\lambda}$. However, for most purposes, $g_{\mu\nu}$ is more similar to $A_\mu$ since we vary $g_{\mu\nu}$ in the field Lagrangian.
3.3 A Euclidean analogy

To understand vectors and covectors more intuitively, let’s pretend we are
embedding spacetime in Euclidean space instead of Minkowski space. As
mentioned in Sec. 1.14, Minkowski space is just like Euclidean space with
an imaginary coordinate. The Minkowski metric ηIJ becomes simply δIJ
(1.46), and the interval ds2 = gµν dxµ dxν measures distance squared on the
manifold. The metric gµν is then the dot product of two basis vectors:
gµν = δIJ eI(µ) eJ(ν) = e(µ) · e(ν) . (3.26)
It has signature (+, +, +, +). The covariant components of a vector vµ are:
vµ = gµν v ν = e(µ) · e(ν) v ν = e(µ) · V (3.27)
using (3.5). Thus, covariant components are dot products with the basis
vectors, while contravariant components are the weights of the basis vec-
tors (3.5). The two types of components are identical in orthonormal bases,
but they differ in skew bases (Fig. 3.2) or with basis vectors of non-unit
length.
Figure 3.2: Contravariant (left) and covariant (right) components of vector

V , where |e(0) | = |e(1) | = 1.
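The difference between the two kinds of components can be checked numerically. The following is a minimal Python sketch, not part of the original text: the skew basis (two unit vectors 60 degrees apart) and the components `v_up` are illustrative assumptions rather than the values drawn in Fig. 3.2. It builds the metric from (3.26) and compares lowering the index with the metric against taking dot products with the basis vectors, as in (3.27).

```python
import math

# Hypothetical skew basis in the Euclidean plane: unit vectors 60 degrees apart.
e0 = (1.0, 0.0)
e1 = (math.cos(math.pi / 3), math.sin(math.pi / 3))
basis = [e0, e1]

def dot(a, b):
    """Ordinary Euclidean dot product in the ambient plane."""
    return a[0] * b[0] + a[1] * b[1]

# Metric g_{mu nu} = e_(mu) . e_(nu), as in (3.26).
g = [[dot(basis[m], basis[n]) for n in range(2)] for m in range(2)]

# Pick contravariant components (weights of the basis vectors) and build V.
v_up = [2.0, 1.0]
V = tuple(sum(v_up[m] * basis[m][k] for m in range(2)) for k in range(2))

# Covariant components computed two ways, as in (3.27).
v_down_metric = [sum(g[m][n] * v_up[n] for n in range(2)) for m in range(2)]
v_down_dot = [dot(basis[m], V) for m in range(2)]
```

With these choices the metric has off-diagonal entries of 1/2, so the covariant components (2.5, 2.0) differ from the contravariant ones (2, 1), even though both basis vectors have unit length.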
Note that covariant components naturally exist even for vectors V out-
side of the tangent space, using the same formula (3.27). On the other
hand, contravariant components depend on the choice of basis vectors
{e(d) , · · · , e(D−1) } outside of the tangent space. For example, in Fig. 3.2,
let’s say e(0) is the basis vector of a 1D tangent space with 2D ambient
space. Then v0 does not depend on e(1) , but v 0 does. Going back to ambient
Minkowski space, (3.27) becomes
$$v_\mu = \eta_{IJ}\frac{\partial f^I}{\partial x^\mu}(x)\,V^J \tag{3.28}$$
for any vector V at the point x. This will be crucial when we discuss parallel
transport and the covariant derivative.
* Exercise 3.2
3D Euclidean space with coordinates (x0 , x1 , x2 ) can be reparametrized
with spherical coordinates (r, θ, φ) as:
x0 = f 0 (r, θ, φ) = r sin θ cos φ

x1 = f 1 (r, θ, φ) = r sin θ sin φ (3.29)
x2 = f 2 (r, θ, φ) = r cos θ.
Show that the metric is:
ds2 = (dx0 )2 + (dx1 )2 + (dx2 )2 = dr2 + r2 (dθ2 + sin2 θ dφ2 ). (3.30)
By fixing r so that dr = 0, we also get the metric for a sphere of radius

r parametrized by (θ, φ):
ds2sphere = r2 (dθ2 + sin2 θ dφ2 ). (3.31)
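A numerical sanity check can be useful alongside the analytic derivation. The sketch below is an illustration, not the intended solution of the exercise: it approximates the derivatives of the embedding (3.29) by central differences and forms the induced metric as a sum of products of those derivatives, i.e. (3.13) with the Euclidean $\delta_{IJ}$ in place of $\eta_{IJ}$. The sample point `q` and the step size `h` are arbitrary assumptions.

```python
import math

def embed(r, theta, phi):
    """Embedding functions f^I(r, theta, phi) from (3.29)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def basis_vectors(q, h=1e-6):
    """Central-difference estimate; result[mu][I] approximates df^I/dq^mu."""
    vectors = []
    for mu in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[mu] += h
        q_minus[mu] -= h
        f_plus, f_minus = embed(*q_plus), embed(*q_minus)
        vectors.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return vectors

def induced_metric(q):
    """g_{mu nu} = delta_{IJ} (df^I/dq^mu)(df^J/dq^nu)."""
    e = basis_vectors(q)
    return [[sum(e[m][i] * e[n][i] for i in range(3)) for n in range(3)]
            for m in range(3)]

q = (2.0, 0.7, 1.3)  # sample point (r, theta, phi)
g = induced_metric(q)
r, theta, _ = q
expected = [[1.0, 0.0, 0.0],
            [0.0, r**2, 0.0],
            [0.0, 0.0, (r * math.sin(theta))**2]]
max_error = max(abs(g[m][n] - expected[m][n])
                for m in range(3) for n in range(3))
```

The value of `max_error` is limited only by the finite-difference step, which is consistent with the diagonal form in (3.30).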
3.4 General tensors

Now that we have seen 1-index objects (vectors and covectors) and 2-index
objects (the metric and inverse metric), it is natural to construct objects
with any number of indices, where each upper index transforms as a vector,
and each lower index transforms as a covector. These are called tensors.
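Specifically, a $(p, q)$-tensor $T^{\mu\nu\cdots}{}_{\sigma\rho\cdots}$ is an object with $p$ upper indices and $q$ lower indices that transforms as:
$$T^{\mu\nu\cdots}{}_{\sigma\rho\cdots} = \frac{\partial x^\mu}{\partial x'^\alpha}\frac{\partial x^\nu}{\partial x'^\beta}\cdots\frac{\partial x'^\gamma}{\partial x^\sigma}\frac{\partial x'^\delta}{\partial x^\rho}\cdots\,T'^{\alpha\beta\cdots}{}_{\gamma\delta\cdots} \tag{3.32}$$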
under reparametrizations. The rank r of the tensor is p + q. It is called

contravariant if it has only upper indices, covariant if it has only lower
indices, and mixed otherwise. A rank-0 tensor has no indices and is called a
scalar. An example is the interval ds2 . Scalars are like the Lorentz invariants
of Chapter 1, but invariant under any reparametrizations.
Note that the order of the indices matters. $U_\mu{}^\nu$ has a first index that transforms covariantly and a second index that transforms contravariantly, and vice versa for $U^\mu{}_\nu$.
Given a tensor with upper and lower indices, we can form a new tensor by summing over one pair, e.g. $T_\mu \equiv U_{\mu\nu}{}^\nu$. This is called index contraction. Since upper and lower indices transform oppositely, the result of contraction still transforms as a tensor with two fewer indices*, e.g.
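$$\begin{aligned} U_{\mu\nu}{}^\nu &= \frac{\partial x'^\sigma}{\partial x^\mu}\frac{\partial x'^\rho}{\partial x^\nu}\frac{\partial x^\nu}{\partial x'^\lambda}\,U'_{\sigma\rho}{}^\lambda \\ &= \frac{\partial x'^\sigma}{\partial x^\mu}\,\delta^\rho_\lambda\,U'_{\sigma\rho}{}^\lambda \\ &= \frac{\partial x'^\sigma}{\partial x^\mu}\,U'_{\sigma\rho}{}^\rho, \end{aligned} \tag{3.33}$$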
using (3.25) on the second line.
We can also form new tensors by raising and lowering indices with the metric, e.g. $T^\mu{}_\nu \equiv g^{\mu\rho}T_{\rho\nu}$. Of course, we could do this with any rank-2 tensor in place of $g^{\mu\rho}$, but when it is the metric, the new tensor is denoted by the same symbol $T$.
Since the coordinate system has no physical significance, any laws of
physics must hold in all coordinate systems. This property is called general
covariance. It is a simple but important aspect of relativity. Any tensor equa-
tion is generally covariant as long as the upper and lower indices match on
both sides, since both sides transform in the same way under a change of
coordinates.
Tensors can be confusing since they are not as easily visualized as vec-
tors, although rank-2 tensors are often helpfully written as matrices. The
best way to understand tensors is to work with various examples, which we
will do.
* If at any time it feels like you are drowning in indices, just relax and let it happen. It is
a rite of passage for any relativist. You will get used to it after some practice. Using Latin
instead of Greek indices would probably smooth the process, but unfortunately Greek
indices are mostly standard.
3.5 Parallel transport

When moving a tangent vector V from x to x + δx, it may not stay in the
tangent space of x + δx, due to the curvature of the manifold (Fig. 3.3). We
would like to define a notion of transporting a vector from point to point
that makes it stay as “parallel” as possible while remaining in the tangent
space. This is done by simply projecting the vector V at x + δx onto the
tangent space so it becomes V̄ , with covariant components v̄µ .
Figure 3.3: The tangent vector V at x does not stay in the tangent space
when transported to x + δx.
We have:
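$$\begin{aligned} \bar v_\mu &= \eta_{IJ}\frac{\partial f^I}{\partial x^\mu}(x+\delta x)\,V^J \\ &= \eta_{IJ}\frac{\partial f^I}{\partial x^\mu}(x+\delta x)\,v^\nu\frac{\partial f^J}{\partial x^\nu}(x) \\ &= \eta_{IJ}\left[\frac{\partial f^I}{\partial x^\mu}(x) + \frac{\partial^2 f^I}{\partial x^\mu\partial x^\sigma}(x)\,\delta x^\sigma\right]v^\nu\frac{\partial f^J}{\partial x^\nu}(x) \\ &= v_\mu + \Gamma_{\nu,\mu\sigma}(x)\,v^\nu\delta x^\sigma. \end{aligned} \tag{3.34}$$
On the first line, we use (3.28). On the third line, we expand $\frac{\partial f^I}{\partial x^\mu}(x+\delta x)$ to first order in $\delta x$. On the last line, we define the Christoffel symbols of the first kind:
$$\Gamma_{\nu,\mu\sigma}(x) \equiv \eta_{IJ}\frac{\partial^2 f^I}{\partial x^\mu\partial x^\sigma}(x)\frac{\partial f^J}{\partial x^\nu}(x). \tag{3.35}$$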
Note that it is symmetric in the last two indices.

Thus, the change in the covariant vector vµ as it is parallel transported
from x to x + δx is:
δvµ = v̄µ − vµ = Γν,µσ (x)v ν δxσ . (3.36)
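We may lower the index of $v^\nu$ so the formula is purely in terms of covariant components:
$$\delta v_\mu = \Gamma^\nu_{\mu\sigma}(x)\,v_\nu\,\delta x^\sigma \tag{3.37}$$
where the Christoffel symbols of the second kind are:
$$\Gamma^\nu_{\mu\sigma}(x) \equiv g^{\nu\rho}\Gamma_{\rho,\mu\sigma} = g^{\nu\rho}\eta_{IJ}\frac{\partial^2 f^I}{\partial x^\mu\partial x^\sigma}(x)\frac{\partial f^J}{\partial x^\rho}(x). \tag{3.38}$$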
The Christoffel symbols are related to the partial derivatives of the metric:
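$$\begin{aligned}\partial_\sigma g_{\mu\nu} &= \eta_{IJ}\left(\frac{\partial^2 f^I}{\partial x^\sigma\partial x^\mu}\frac{\partial f^J}{\partial x^\nu} + \frac{\partial f^I}{\partial x^\mu}\frac{\partial^2 f^J}{\partial x^\sigma\partial x^\nu}\right) \\ &= \Gamma_{\nu,\mu\sigma} + \Gamma_{\mu,\nu\sigma}.\end{aligned} \tag{3.39}$$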
We would like to invert this formula to express $\Gamma_{\mu,\nu\sigma}$ in terms of the metric. Take all permutations of the indices $(\mu\nu\sigma)$ to get three equations total (since $g_{\mu\nu}$ is symmetric). Now add two of these equations and subtract the other, to get*:
$$\Gamma_{\mu,\nu\sigma} = \frac{1}{2}\left(\partial_\sigma g_{\mu\nu} + \partial_\nu g_{\mu\sigma} - \partial_\mu g_{\nu\sigma}\right). \tag{3.40}$$
* This equation comes up often and is worth memorizing. I remember it as adding all permutations $(\mu\nu\sigma)$ of $\partial_\sigma g_{\mu\nu}$, with the $\nu \leftrightarrow \sigma$ symmetric term $\partial_\mu g_{\nu\sigma}$ having a minus sign.
** Exercise 3.3
1. Derive the parallel transport equation for contravariant vectors:
$$\delta v^\mu = -\Gamma^\mu_{\nu\sigma}v^\nu\delta x^\sigma. \tag{3.41}$$
Hint: expand $\bar v^\nu$ to first order in $\delta x$:
$$\bar v^\nu = v^\nu + C^\nu_{\mu\sigma}v^\mu\delta x^\sigma \tag{3.42}$$
for some quantity $C^\nu_{\mu\sigma}$. Then use $\bar v_\mu = g_{\mu\nu}(x+\delta x)\,\bar v^\nu$ with (3.37) and (3.39), and expand to first order in $\delta x$, to show that $C^\nu_{\mu\sigma} = -\Gamma^\nu_{\mu\sigma}$.
2. Show that this implies the product $a^\mu(x)b_\mu(x)$ is constant as vectors $a$ and $b$ are parallel transported.
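Formula (3.40) lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: it uses the sphere metric (3.31) with coordinates ordered as (θ, φ), a radius `RADIUS` chosen arbitrarily, and central differences with an arbitrary step `h` for the metric derivatives. It evaluates (3.40), raises the first index with the inverse metric as in (3.38), and the results can be compared with the standard sphere values $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$.

```python
import math

RADIUS = 1.5  # illustrative sphere radius

def metric(x):
    """Sphere metric (3.31) in coordinates x = (theta, phi)."""
    theta = x[0]
    return [[RADIUS**2, 0.0],
            [0.0, (RADIUS * math.sin(theta))**2]]

def d_metric(x, s, h=1e-5):
    """Central-difference estimate of d g_{mu nu} / d x^s."""
    x_plus, x_minus = list(x), list(x)
    x_plus[s] += h
    x_minus[s] -= h
    g_plus, g_minus = metric(x_plus), metric(x_minus)
    return [[(g_plus[m][n] - g_minus[m][n]) / (2 * h) for n in range(2)]
            for m in range(2)]

def christoffel_first(x):
    """Gamma_{mu,nu sigma} from (3.40), stored as G[mu][nu][sigma]."""
    dg = [d_metric(x, s) for s in range(2)]  # dg[s][m][n] = d_s g_{mn}
    return [[[0.5 * (dg[s][m][n] + dg[n][m][s] - dg[m][n][s])
              for s in range(2)] for n in range(2)] for m in range(2)]

def christoffel_second(x):
    """Gamma^lam_{nu sigma} = g^{lam rho} Gamma_{rho,nu sigma}, as in (3.38)."""
    g = metric(x)
    g_inv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]  # metric is diagonal
    first = christoffel_first(x)
    return [[[sum(g_inv[l][r] * first[r][n][s] for r in range(2))
              for s in range(2)] for n in range(2)] for l in range(2)]

point = (0.9, 0.4)
Gamma = christoffel_second(point)
```

Note that the radius drops out of the symbols of the second kind, since it scales the metric and its inverse oppositely.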
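Finally, we can also parallel transport tensors, since they just act like products of vectors. For example, the tensor $T_{\mu\nu} \equiv v_\mu w_\nu$ becomes:
$$\begin{aligned}\bar T_{\mu\nu} &= \bar v_\mu\bar w_\nu \\ &= v_\mu w_\nu + \Gamma^\rho_{\mu\sigma}v_\rho w_\nu\delta x^\sigma + \Gamma^\rho_{\nu\sigma}v_\mu w_\rho\delta x^\sigma \\ &= T_{\mu\nu} + \Gamma^\rho_{\mu\sigma}T_{\rho\nu}\delta x^\sigma + \Gamma^\rho_{\nu\sigma}T_{\mu\rho}\delta x^\sigma,\end{aligned} \tag{3.43}$$
to first order in $\delta x$. Any rank-2 tensor $T_{\mu\nu}$ can be written as a sum of such vector products, so (3.43) holds for any rank-2 tensor since it is linear in $T$. The generalization to contravariant indices and tensors of any rank is obvious: add a term like (3.37) for each lower index and (3.41) for each upper index.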
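As an illustration of (3.41), the following Python sketch transports a vector once around a circle of constant latitude on the unit sphere. It is a sketch under assumptions not made in the text: the Christoffel symbols are the standard sphere values in coordinates (θ, φ), the infinitesimal steps of (3.41) are accumulated with a fourth-order Runge-Kutta integrator, and the latitude `theta0` and starting vector are arbitrary choices.

```python
import math

def christoffel(theta):
    """Gamma^m_{n s} on the unit 2-sphere; index 0 = theta, 1 = phi."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)  # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / math.tan(theta)  # Gamma^phi_{theta phi}
    return G

def rate(v, theta):
    """dv^m/dphi = -Gamma^m_{n phi} v^n: (3.41) with delta x = (0, dphi)."""
    G = christoffel(theta)
    return [-sum(G[m][n][1] * v[n] for n in range(2)) for m in range(2)]

def transport_around_latitude(v0, theta, steps=2000):
    """Integrate (3.41) once around the circle theta = const using RK4."""
    h = 2.0 * math.pi / steps
    v = list(v0)
    for _ in range(steps):
        k1 = rate(v, theta)
        k2 = rate([v[i] + 0.5 * h * k1[i] for i in range(2)], theta)
        k3 = rate([v[i] + 0.5 * h * k2[i] for i in range(2)], theta)
        k4 = rate([v[i] + h * k3[i] for i in range(2)], theta)
        v = [v[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(2)]
    return v

def norm_squared(v, theta):
    """g_{mu nu} v^mu v^nu for the unit sphere."""
    return v[0]**2 + (math.sin(theta) * v[1])**2

theta0 = math.pi / 3      # latitude circle at 60 degrees from the pole
v_start = [1.0, 0.0]      # initially pointing along the theta direction
v_end = transport_around_latitude(v_start, theta0)
```

For this latitude the vector returns rotated by the angle $2\pi\cos\theta_0 = \pi$, so it comes back reversed, while its length is unchanged. The second point matches part 2 of Ex. 3.3; the first is a direct sign of curvature.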
3.6 Covariant derivative

Note that δvµ = v̄µ (x + δx) − vµ (x) does not transform as a covector, since
it involves subtracting covectors at two different points. Indeed, you can
verify that the Christoffel symbols Γµ,νσ do not transform as a tensor, using
(3.40) and (3.14).
Now instead of a covector at a single point x, consider a covector field
vµ (x). Expand it at a point x + δx as:
vµ (x + δx) = vµ (x) + ∂σ vµ (x)δxσ . (3.44)
The directional derivative ∂σ vµ (x) also does not transform as a tensor, since
∂σ vµ (x)δxσ = vµ (x + δx) − vµ (x) involves subtracting covectors at two dif-
ferent points. This inspires us to write:
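$$\begin{aligned} v_\mu(x+\delta x) &= \bar v_\mu(x+\delta x) - \Gamma^\nu_{\mu\sigma}(x)\,v_\nu(x)\,\delta x^\sigma + \partial_\sigma v_\mu(x)\,\delta x^\sigma \\ &= \bar v_\mu(x+\delta x) + (\nabla_\sigma v_\mu)(x)\,\delta x^\sigma \end{aligned} \tag{3.45}$$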
using (3.37). On the last line, we define the covariant derivative
∇σ vµ ≡ ∂σ vµ − Γνσµ vν . (3.46)
Since ∇σ vµ δxσ involves subtracting two covectors at the same point x + δx,
∇σ vµ is manifestly a rank-2 tensor, which you should verify explicitly.
Going through the same procedure for a vector field $v^\mu(x)$, we have
$$\nabla_\sigma v^\mu \equiv \partial_\sigma v^\mu + \Gamma^\mu_{\sigma\nu}v^\nu \tag{3.47}$$
as the covariant derivative of a vector field*.
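Another way to think of the covariant derivative is as the projection of the directional derivative $\partial_\nu V^I(x)$ onto the tangent space. We have:
$$\begin{aligned}\partial_\nu V^I &= \partial_\nu\left(v^\mu\frac{\partial f^I}{\partial x^\mu}\right) \\ &= \partial_\nu v^\mu\,\frac{\partial f^I}{\partial x^\mu} + v^\mu\frac{\partial^2 f^I}{\partial x^\mu\partial x^\nu}.\end{aligned} \tag{3.48}$$
The covariant components of this (non-tangent) vector are (3.28):
$$\begin{aligned}(\nabla_\nu v)_\sigma &\equiv \eta_{IJ}\frac{\partial f^I}{\partial x^\sigma}\left(\partial_\nu v^\mu\,\frac{\partial f^J}{\partial x^\mu} + v^\mu\frac{\partial^2 f^J}{\partial x^\mu\partial x^\nu}\right) \\ &= g_{\sigma\mu}\,\partial_\nu v^\mu + \Gamma_{\sigma,\mu\nu}v^\mu.\end{aligned} \tag{3.49}$$
Raising the index $\sigma$ gives:
$$(\nabla_\nu v)^\sigma = \partial_\nu v^\sigma + \Gamma^\sigma_{\mu\nu}v^\mu, \tag{3.50}$$
which is the same as (3.47).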
The covariant derivative of a scalar field φ(x) is defined as the ordinary
directional derivative:
∇µ φ ≡ ∂µ φ, (3.51)
since this is already a covector.
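Finally, just as we can parallel transport a tensor, we can take the covariant derivative of a general tensor by contracting $\Gamma^\mu_{\nu\sigma}$ with each index as appropriate. For example,
$$\nabla_\sigma T^\mu{}_\nu = \partial_\sigma T^\mu{}_\nu + \Gamma^\mu_{\sigma\rho}T^\rho{}_\nu - \Gamma^\rho_{\sigma\nu}T^\mu{}_\rho. \tag{3.52}$$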
This implies that the covariant derivative follows the same product rule as
the ordinary derivative:
∇(T U ) = (∇T )U + T (∇U ), (3.53)
for tensors T , U , suppressing the indices.
* I remember these equations as: the Christoffel symbol $\Gamma^\mu_{\sigma\nu}$ “steals” the index from the (co)vector and contracts with it. It comes in with a positive sign for “usual” vectors and a negative sign for “unusual” covectors.
* Exercise 3.4
Show that the metric is covariantly constant:
$$\nabla_\sigma g_{\mu\nu} = 0. \tag{3.54}$$
This is because $g_{\mu\nu}$ can be simply viewed as the constant Minkowski metric $\eta_{IJ}$ projected onto the tangent space.
3.7 Curvature: Riemann and Ricci

How do we measure the curvature of a manifold? We know that parallel
transport is affected by curvature, since a transported vector can lie outside
the tangent space when the manifold is curved. However, as we saw above,
the Christoffel symbols are not tensors. We would like some tensor quantity
that measures curvature and only depends on the metric and its derivatives.
One idea is to parallel transport a vector along two different paths to the
same destination. The final vector depends on the path taken, when the
manifold is curved (Fig. 3.4). We can take the difference of the two final
vectors as a measure of curvature.
Figure 3.4: Parallel transport of a vector (black) along two different paths
can give two different vectors (blue, green) when the manifold is curved,
like a sphere.
Mathematically, it is easiest to work with infinitesimal paths. Let v µ be

the vector at point x, and transport it along the two paths x → x + δx1 →
x + δx1 + δx2 and x → x + δx2 → x + δx2 + δx1 . Call the first vector v12 and
the second v21 (Fig. 3.5).
Figure 3.5: Vector v µ at x transported along two infinitesimal paths to x +

δx1 + δx2 .
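It turns out that this difference is given by the commutator of two covariant derivatives:
$$v^\mu_{21} - v^\mu_{12} = [\nabla_\nu,\nabla_\sigma]v^\mu(x)\,\delta x_1^\nu\delta x_2^\sigma \equiv (\nabla_\nu\nabla_\sigma - \nabla_\sigma\nabla_\nu)v^\mu(x)\,\delta x_1^\nu\delta x_2^\sigma. \tag{3.55}$$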
We have:
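$$\begin{aligned}\nabla_\nu\nabla_\sigma v^\mu &= \nabla_\nu(\partial_\sigma v^\mu + \Gamma^\mu_{\sigma\lambda}v^\lambda) \\ &= \partial_\nu\partial_\sigma v^\mu + \partial_\nu\Gamma^\mu_{\sigma\lambda}v^\lambda + \Gamma^\mu_{\sigma\lambda}\partial_\nu v^\lambda \\ &\quad - \Gamma^\lambda_{\nu\sigma}\partial_\lambda v^\mu - \Gamma^\rho_{\nu\sigma}\Gamma^\mu_{\rho\lambda}v^\lambda + \Gamma^\mu_{\nu\lambda}\partial_\sigma v^\lambda + \Gamma^\mu_{\nu\rho}\Gamma^\rho_{\sigma\lambda}v^\lambda,\end{aligned} \tag{3.56}$$
using the definition of covariant derivative for vectors (3.47) and 2-tensors (3.52). Now exchange the indices $\nu \leftrightarrow \sigma$ and subtract, to get:
$$[\nabla_\nu,\nabla_\sigma]v^\mu = R^\mu{}_{\lambda\nu\sigma}v^\lambda, \tag{3.57}$$
where we define the Riemann curvature tensor
$$R^\mu{}_{\lambda\nu\sigma} \equiv \partial_\nu\Gamma^\mu_{\lambda\sigma} - \partial_\sigma\Gamma^\mu_{\lambda\nu} + \Gamma^\mu_{\rho\nu}\Gamma^\rho_{\lambda\sigma} - \Gamma^\mu_{\rho\sigma}\Gamma^\rho_{\lambda\nu}. \tag{3.58}$$
Note that it is clearly antisymmetric in its last two indices. It also satisfies the first Bianchi identity:
$$R^\mu{}_{\rho\lambda\nu} + R^\mu{}_{\lambda\nu\rho} + R^\mu{}_{\nu\rho\lambda} = 0, \tag{3.59}$$
which is easily verified from (3.58).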

Let us show that this commutator indeed gives the difference between the two transported vectors. Recall that the covariant derivative at $x$ involves the difference of a vector field at $x+\delta x$ and a vector transported from $x$ to $x+\delta x$ (3.45). Write this as:
$$(\nabla_\nu w^\mu)_x\,\delta x_1^\nu = w^\mu_{x+\delta x_1} - w^\mu_{x;\delta x_1} \tag{3.60}$$
for a contravariant vector $w^\mu$. The first subscript on a vector denotes where it starts, and the following subscripts denote parallel transports, e.g. $w^\mu_{x;\delta x_1}$ means $w^\mu(x)$ transported to $x+\delta x_1$. Now let $w^\mu$ itself be a covariant derivative contracted with a displacement $\delta x_2$:
$$w^\mu_x \equiv (\nabla_\sigma v^\mu)_x\,\delta x_2^\sigma = v^\mu_{x+\delta x_2} - v^\mu_{x;\delta x_2}. \tag{3.61}$$
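We have:
$$\begin{aligned} w^\mu_{x+\delta x_1} &= v^\mu_{x+\delta x_1+\delta x_2} - v^\mu_{x+\delta x_1;\delta x_2} \\ w^\mu_{x;\delta x_1} &= v^\mu_{x+\delta x_2;\delta x_1} - v^\mu_{x;\delta x_2;\delta x_1}.\end{aligned} \tag{3.62}$$
Plug this into (3.60) to get:
$$(\nabla_\nu\nabla_\sigma v^\mu)_x\,\delta x_1^\nu\delta x_2^\sigma = v^\mu_{x+\delta x_1+\delta x_2} - v^\mu_{x+\delta x_1;\delta x_2} - v^\mu_{x+\delta x_2;\delta x_1} + v^\mu_{x;\delta x_2;\delta x_1}. \tag{3.63}$$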
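Exchanging the indices $\nu \leftrightarrow \sigma$ is the same as exchanging $\delta x_1 \leftrightarrow \delta x_2$. Exchanging and subtracting, we get:
$$\begin{aligned} {}[\nabla_\nu,\nabla_\sigma]v^\mu(x)\,\delta x_1^\nu\delta x_2^\sigma &= v^\mu_{x;\delta x_2;\delta x_1} - v^\mu_{x;\delta x_1;\delta x_2} \\ &= v^\mu_{21} - v^\mu_{12}.\end{aligned} \tag{3.64}$$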
** Exercise 3.5
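Show that a covariant vector $v_\mu$ satisfies:
$$v_{21\mu} - v_{12\mu} = [\nabla_\nu,\nabla_\sigma]v_\mu\,\delta x_1^\nu\delta x_2^\sigma = -R^\lambda{}_{\mu\nu\sigma}v_\lambda\,\delta x_1^\nu\delta x_2^\sigma \tag{3.65}$$
upon parallel transport along two different paths.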

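Since the metric is covariantly constant (3.54), we also have:
$$\begin{aligned} {}[\nabla_\nu,\nabla_\sigma]v_\mu &= g_{\mu\lambda}[\nabla_\nu,\nabla_\sigma]v^\lambda \\ &= g_{\mu\lambda}R^\lambda{}_{\rho\nu\sigma}v^\rho \\ &= R_{\mu\rho\nu\sigma}v^\rho,\end{aligned} \tag{3.66}$$
using (3.57). Since this equals $-R_{\rho\mu\nu\sigma}v^\rho$ (3.65), $R_{\mu\rho\nu\sigma}$ is antisymmetric in its first two indices:
$$R_{\mu\rho\nu\sigma} = -R_{\rho\mu\nu\sigma}. \tag{3.67}$$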
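As a rank-4 tensor, $R_{\rho\nu\sigma\alpha}$ has $d^4 = 256$ elements for $d = 4$. However, most of these elements are not independent due to the symmetry conditions found above:
$$\begin{aligned} R_{\rho\nu\sigma\alpha} &= -R_{\rho\nu\alpha\sigma} = -R_{\nu\rho\sigma\alpha} \\ R_{\rho\nu\sigma\alpha} + R_{\rho\sigma\alpha\nu} + R_{\rho\alpha\nu\sigma} &= 0.\end{aligned} \tag{3.68}$$
After applying these conditions, there are only $d^2(d^2-1)/12$ independent components (Appendix D). For $d = 4$, this is 20.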
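We can contract indices of the Riemann tensor to form a simpler rank-2 tensor. Due to the antisymmetries of $R_{\rho\nu\sigma\alpha}$, the only unique contraction is:
$$R_{\mu\nu} \equiv R^\rho{}_{\mu\rho\nu} = \partial_\rho\Gamma^\rho_{\mu\nu} - \partial_\nu\Gamma^\rho_{\mu\rho} + \Gamma^\rho_{\alpha\rho}\Gamma^\alpha_{\mu\nu} - \Gamma^\rho_{\alpha\nu}\Gamma^\alpha_{\mu\rho}, \tag{3.69}$$
known as the Ricci tensor. From the above symmetry conditions, it is symmetric: $R_{\mu\nu} = R_{\nu\mu}$. Finally, we can also contract the indices of the Ricci tensor to form the Ricci scalar, or the scalar curvature:
$$R \equiv R^\mu{}_\mu = g^{\mu\nu}R_{\mu\nu}. \tag{3.70}$$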
As we will see later, the Ricci scalar is essentially the Lagrangian of general
relativity.
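The chain from Christoffel symbols to Riemann tensor to Ricci scalar can be traced numerically. The sketch below is a hedged illustration, not the author's method: it assumes the standard Christoffel symbols of a sphere in coordinates (θ, φ), an arbitrary radius `RADIUS`, and central differences for the derivatives in (3.58). It then contracts according to (3.69) and (3.70). For a sphere of radius $r$ the expected scalar curvature is $2/r^2$ with these sign conventions.

```python
import math

RADIUS = 2.0  # illustrative sphere radius

def christoffel(x):
    """Gamma^m_{l s} on a sphere; index 0 = theta, 1 = phi."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def d_christoffel(x, n, h=1e-5):
    """Central-difference estimate of d_n Gamma^m_{l s}."""
    x_plus, x_minus = list(x), list(x)
    x_plus[n] += h
    x_minus[n] -= h
    Gp, Gm = christoffel(x_plus), christoffel(x_minus)
    return [[[(Gp[m][l][s] - Gm[m][l][s]) / (2 * h) for s in range(2)]
             for l in range(2)] for m in range(2)]

def riemann(x):
    """R^m_{l n s} from (3.58), stored as R[m][l][n][s]."""
    G = christoffel(x)
    dG = [d_christoffel(x, n) for n in range(2)]  # dG[n][m][l][s]
    R = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    for m in range(2):
        for l in range(2):
            for n in range(2):
                for s in range(2):
                    quad = sum(G[m][r][n] * G[r][l][s] - G[m][r][s] * G[r][l][n]
                               for r in range(2))
                    R[m][l][n][s] = dG[n][m][l][s] - dG[s][m][l][n] + quad
    return R

def ricci_tensor(x):
    """R_{mu nu} = R^rho_{mu rho nu}, as in (3.69)."""
    R = riemann(x)
    return [[sum(R[r][m][r][n] for r in range(2)) for n in range(2)]
            for m in range(2)]

def ricci_scalar(x):
    """R = g^{mu nu} R_{mu nu}, as in (3.70); the sphere metric is diagonal."""
    theta = x[0]
    g_inv_diag = [1.0 / RADIUS**2, 1.0 / (RADIUS * math.sin(theta))**2]
    ric = ricci_tensor(x)
    return sum(g_inv_diag[m] * ric[m][m] for m in range(2))

point = (0.8, 0.3)
scalar_curvature = ricci_scalar(point)
```

The result does not depend on the chosen point, as expected for a space of constant curvature.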
* Exercise 3.6
Show that $[\nabla_\mu,\nabla_\nu]\phi = 0$, for a scalar field $\phi(x)$.
** Exercise 3.7
1. Recall the product rule for the covariant derivative (3.53). Show that the commutator $[\nabla_\mu,\nabla_\rho]$ also satisfies this:
$$[\nabla_\mu,\nabla_\rho](v_\nu w_\sigma) = ([\nabla_\mu,\nabla_\rho]v_\nu)w_\sigma + v_\nu([\nabla_\mu,\nabla_\rho]w_\sigma). \tag{3.71}$$
Since a 2-tensor $T_{\nu\sigma}$ acts like a product of vectors $v_\nu w_\sigma$, this implies that
$$[\nabla_\mu,\nabla_\rho]T_{\nu\sigma} = -R^\lambda{}_{\nu\mu\rho}T_{\lambda\sigma} - R^\lambda{}_{\sigma\mu\rho}T_{\nu\lambda}, \tag{3.72}$$
using (3.65).
2. By direct computation, prove the Jacobi identity
$$[A,[B,C]] + [B,[C,A]] + [C,[A,B]] = 0 \tag{3.73}$$
for any operators $A, B, C$. As above, $[A, B] \equiv AB - BA$.
3. Calculate $[\nabla_\mu,[\nabla_\nu,\nabla_\sigma]]v_\rho$ using (3.65) and (3.72), then use the Jacobi identity to derive the second Bianchi identity:
$$\nabla_\mu R^\lambda{}_{\rho\nu\sigma} + \nabla_\nu R^\lambda{}_{\rho\sigma\mu} + \nabla_\sigma R^\lambda{}_{\rho\mu\nu} = 0. \tag{3.74}$$
* Exercise 3.8
1. Using the metric for a 2D sphere you derived in Ex. 3.2, find the Christoffel symbols $\Gamma^\mu_{\nu\rho}$. In two dimensions, there are only six independent components: $\Gamma^1_{11}, \Gamma^1_{12}, \Gamma^1_{22}, \Gamma^2_{11}, \Gamma^2_{12}, \Gamma^2_{22}$.
2. Find the Riemann curvature tensor $R_{\rho\nu\sigma\alpha}$. In two dimensions, there is only one independent component: $R_{1212}$.
3. Find the Ricci tensor $R_{\mu\nu}$ and scalar $R$.
* Exercise 3.9
Consider 3D Minkowski space with coordinates $(t, x, y)$. Let $H^2$ be the 2D manifold defined by all points a constant proper time $ds^2 = -R^2$ away from the origin:
$$t^2 - x^2 - y^2 = R^2. \tag{3.75}$$
This is known as 2D hyperbolic space (Fig. 3.6).
Figure 3.6: Left: hyperbolic space H 2 embedded in 3D Minkowski

space. Right: cross section at y = 0. Red lines are t = ±x.
From the figure, we see that any tangent vector is spacelike, so the
signature is (+, +).
1. We may parametrize H 2 using coordinates (x, y). Find the metric

gµν . Remember that the interval is
ds2 = dx2 + dy 2 − dt2 (3.76)
since we are in ambient Minkowski space.
2. Find the Riemann tensor Rρνσα , Ricci tensor Rµν , and Ricci scalar
R.
Chapter 4
General relativity
We have covered the mathematical description of curved spacetime. Now

we get to the physics. This essentially follows the development of particle
and field dynamics in Chapter 2, but with general covariance instead of
Lorentz invariance as a guiding principle. As before, let us start with the
equation of motion for a massive particle in curved spacetime.
4.1 The geodesic equation

The particle action in general relativity has the same form as in special relativity (2.28):
$$S_{pp} = -m\int d\tau = -m\int\sqrt{-g_{\mu\nu}(x)\,dx^\mu dx^\nu} \tag{4.1}$$
but using (3.12) for $d\tau^2 = -ds^2$. Now that $x^0$ is not necessarily a time coordinate, it cannot be used to parametrize the path. Instead, we parametrize the path by some quantity* $\lambda$. Given a path in spacetime, we can always associate a value of $\lambda$ to each point (Fig. 4.1).
* We cannot use the proper time τ to parametrize the path in the action, since we
cannot independently vary xµ (τ0 ) at a given τ0 without violating dτ 2 = −gµν dxµ dxν .
Indeed, this would give a trivial Lagrangian Lpp = −m.
Figure 4.1: Path of an object parametrized by λ.
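The action is:
$$S_{pp} = -m\int\sqrt{-g_{\mu\nu}(x(\lambda))\,U^\mu(\lambda)U^\nu(\lambda)}\;d\lambda \tag{4.2}$$
where $U^\mu \equiv dx^\mu/d\lambda$, and we show the arguments explicitly. It is invariant under reparametrizations $\lambda'(\lambda)$, as it must be. The Euler-Lagrange equations (2.13) now read:
$$\frac{\delta L}{\delta x^\mu(\lambda)} = \frac{d}{d\lambda}\frac{\delta L}{\delta U^\mu(\lambda)}. \tag{4.3}$$
The Lagrangian is:
$$L_{pp} = -m\sqrt{-g_{\mu\nu}U^\mu U^\nu}. \tag{4.4}$$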
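First, $L_{pp}$ only depends on $x$ through $g_{\mu\nu}(x)$, so we have:
$$\frac{\delta L}{\delta x^\mu} = \frac{m}{2\sqrt{-U^2}}\,\partial_\mu g_{\nu\sigma}U^\nu U^\sigma, \tag{4.5}$$
where $U^2 = g_{\mu\nu}U^\mu U^\nu$. The right-hand side of (4.3) involves a long calculation, and we do not show all the details. We have:
$$\frac{\delta L}{\delta U^\mu} = \frac{m}{\sqrt{-U^2}}\,g_{\mu\nu}U^\nu. \tag{4.6}$$
This depends on $\lambda$ through $g_{\mu\nu}(x(\lambda))$ and $U^\mu(\lambda)$. We get:
$$\frac{d}{d\lambda}\frac{\delta L}{\delta U^\mu} = m(-U^2)^{-3/2}\left[\frac{1}{2}U_\mu\left(\partial_\sigma g_{\rho\lambda}U^\sigma U^\rho U^\lambda + 2U_\rho\frac{dU^\rho}{d\lambda}\right) - U^2\left(\partial_\sigma g_{\mu\nu}U^\sigma U^\nu + g_{\mu\nu}\frac{dU^\nu}{d\lambda}\right)\right]. \tag{4.7}$$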
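Equating this to (4.5) and collecting $U^2$ terms, we get:
$$\begin{aligned} 0 &= U^2\left(\frac{1}{2}\partial_\mu g_{\nu\sigma}U^\nu U^\sigma - \partial_\sigma g_{\mu\nu}U^\nu U^\sigma - g_{\mu\nu}\frac{dU^\nu}{d\lambda}\right) + U_\mu\left(\frac{1}{2}\partial_\sigma g_{\rho\lambda}U^\sigma U^\rho U^\lambda + U_\rho\frac{dU^\rho}{d\lambda}\right) \\ &= U^2\left(-\Gamma_{\mu,\nu\sigma}U^\nu U^\sigma - g_{\mu\nu}\frac{dU^\nu}{d\lambda}\right) + U_\mu\left(\Gamma_{\rho,\sigma\lambda}U^\rho U^\sigma U^\lambda + U_\rho\frac{dU^\rho}{d\lambda}\right) \\ &= \left(\frac{dU^\nu}{d\lambda} + \Gamma^\nu_{\rho\sigma}U^\rho U^\sigma\right)\left(U_\mu U_\nu - U^2 g_{\mu\nu}\right).\end{aligned} \tag{4.8}$$
On the second line, we use (3.39) and (3.40), along with the identity:
$$\partial_\sigma g_{\mu\nu}U^\nu U^\sigma = \frac{1}{2}\left(\partial_\sigma g_{\mu\nu} + \partial_\nu g_{\mu\sigma}\right)U^\nu U^\sigma, \tag{4.9}$$
since $U^\nu U^\sigma$ is a symmetric tensor. This “symmetrization trick” is worth remembering. On the last line, we relabel indices and factorize.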
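Eq. (4.8) is a complicated equation of motion for general $\lambda$. We may choose the parametrization $\lambda = \tau$ to simplify it. Then we have $U^\mu = u^\mu = dx^\mu/d\tau$ and $u^2 = g_{\mu\nu}u^\mu u^\nu = -1$. Taking the derivative:
$$\begin{aligned}\frac{d}{d\tau}(g_{\mu\nu}u^\mu u^\nu) = 0 &= \partial_\sigma g_{\mu\nu}u^\sigma u^\mu u^\nu + 2g_{\mu\nu}\frac{du^\mu}{d\tau}u^\nu \\ &= 2\Gamma_{\sigma,\mu\nu}u^\sigma u^\mu u^\nu + 2g_{\mu\nu}\frac{du^\mu}{d\tau}u^\nu \\ &= 2u_\mu\left(\frac{du^\mu}{d\tau} + \Gamma^\mu_{\nu\sigma}u^\nu u^\sigma\right),\end{aligned} \tag{4.10}$$
using (3.39) on the second line. We can then eliminate the $U_\mu U_\nu$ term in (4.8), so that the term in the left parentheses must be zero:
$$\frac{d^2x^\nu}{d\tau^2} + \Gamma^\nu_{\rho\sigma}\frac{dx^\rho}{d\tau}\frac{dx^\sigma}{d\tau} = 0. \tag{4.11}$$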
This is called the geodesic equation. The resulting worldline is called a
timelike geodesic.
There is an easier way to remember the geodesic equation: it is the
result of parallel transporting the velocity vector uµ = dxµ /dτ along the
velocity vector itself. Indeed, take the parallel transport equation (3.41)
with v µ = uµ , and divide by δτ , to directly obtain (4.11). Intuitively, a

particle wants to take the straightest path where its velocity changes the
least, which is accomplished by parallel transport.
For massless particles, $\tau$ cannot be used as a parameter since $d\tau = 0$. Instead, the geodesic equation still holds for a particular parametrization $\bar\lambda$:
$$\frac{d^2x^\nu}{d\bar\lambda^2} + \Gamma^\nu_{\rho\sigma}\frac{dx^\rho}{d\bar\lambda}\frac{dx^\sigma}{d\bar\lambda} = 0. \tag{4.12}$$
The resulting worldline is called a null geodesic. It is easy to show that this equation is only invariant under reparametrizations $\bar\lambda'(\bar\lambda)$ where $d^2\bar\lambda'/d\bar\lambda^2 = 0$, or $\bar\lambda' = a\bar\lambda + b$ with $a, b$ constant. This is an affine transformation, so $\bar\lambda$ is also called an affine parameter. Similarly, (4.11) is only invariant under a reparametrization $\tau' = a\tau + b$.
As in classical mechanics, we solve the equation of motion using initial conditions $x^\mu(\tau_0) = x_0^\mu$ and $U^\mu(\tau_0) = U_0^\mu$. Due to the affine freedom, we may scale $U_0^\mu$ by any constant without affecting the resulting path of the particle, although only one choice corresponds to the four-velocity $U_0^2 = -1$. $U^2$ remains constant on the path since it is preserved by parallel transport (Ex. 3.3). For massless particles, replace $\tau$ with $\bar\lambda$ and use $U^\mu = dx^\mu/d\bar\lambda$. $U^2 = 0$ always, so there is no preferred scaling of $U_0^\mu$.
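Two of these statements, that $U^2$ is conserved and that rescaling $U_0^\mu$ leaves the path unchanged, can be checked numerically. The sketch below is an illustration under assumptions not made in the text: for simplicity it integrates the geodesic equation on the unit 2-sphere (Euclidean signature, standard Christoffel symbols in coordinates (θ, φ)) rather than in a spacetime, uses a fourth-order Runge-Kutta step, and picks arbitrary initial data.

```python
import math

def acceleration(theta, u):
    """d u^m / d lambda = -Gamma^m_{rho sigma} u^rho u^sigma on the unit sphere."""
    u_theta, u_phi = u
    return (math.sin(theta) * math.cos(theta) * u_phi**2,
            -2.0 * math.cos(theta) / math.sin(theta) * u_theta * u_phi)

def deriv(state):
    """Right-hand side of the first-order system for (theta, phi, u^theta, u^phi)."""
    theta, _, u_theta, u_phi = state
    a_theta, a_phi = acceleration(theta, (u_theta, u_phi))
    return (u_theta, u_phi, a_theta, a_phi)

def rk4_step(state, h):
    """One fourth-order Runge-Kutta step of size h in the affine parameter."""
    def shifted(k, fraction):
        return tuple(s + fraction * h * ki for s, ki in zip(state, k))
    k1 = deriv(state)
    k2 = deriv(shifted(k1, 0.5))
    k3 = deriv(shifted(k2, 0.5))
    k4 = deriv(shifted(k3, 1.0))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, h, steps):
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def u_squared(state):
    """g_{mu nu} U^mu U^nu for the unit sphere."""
    theta, _, u_theta, u_phi = state
    return u_theta**2 + (math.sin(theta) * u_phi)**2

start = (math.pi / 2, 0.0, 0.3, 1.0)
end = integrate(start, h=1e-3, steps=5000)

# Same starting point, velocity doubled, parameter range halved.
end_rescaled = integrate((math.pi / 2, 0.0, 0.6, 2.0), h=0.5e-3, steps=5000)
```

The rescaled run reaches the same point on the sphere with twice the velocity components, which is the affine freedom $\lambda \to a\lambda + b$ described above.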
4.2 The equivalence principle

While the four-acceleration $d^2x^\mu/d\tau^2$ is a four-vector in flat space, it is not a vector in curved space, since $\delta u^\mu = u^\mu(x+\delta x) - u^\mu(x)$ subtracts vectors at two different points. To find the acceleration vector, note that we may write:
$$\frac{d^2x^\mu}{d\tau^2} = \frac{du^\mu}{d\tau} = (\partial_\nu u^\mu)u^\nu, \tag{4.13}$$
using the chain rule*. We can make this transform like a vector by replacing $\partial_\nu$ with $\nabla_\nu$. Thus, we define the acceleration vector
$$a^\mu \equiv (\nabla_\nu u^\mu)u^\nu. \tag{4.14}$$

* This might seem fishy, since ∂ν uµ treats uµ (x) as a field (function of x) although it
is only defined on the path of the particle xµ (τ ). However, it is contracted with uν , so
(∂ν uµ )uν is really the directional derivative along the path.
What is the significance of making this a vector? Due to general covariance, only scalar quantities are measurable, since all other quantities depend on the choice of coordinates. Thus, the perceived acceleration (squared) is $a^2 = a^\nu a_\nu$.
Now let us compare the geodesic equation (4.11) with the equation of motion in flat space (2.31), which is simply*:
$$\frac{d^2x^\nu}{d\tau^2} = 0. \tag{4.15}$$
The geodesic equation can be written in a similar form:
$$\begin{aligned}0 &= \frac{du^\mu}{d\tau} + \Gamma^\mu_{\nu\sigma}u^\nu u^\sigma \\ &= (\partial_\nu u^\mu)u^\nu + \Gamma^\mu_{\nu\sigma}u^\nu u^\sigma \\ &= (\nabla_\nu u^\mu)u^\nu \\ &= a^\mu.\end{aligned} \tag{4.16}$$
Objects on geodesics experience no acceleration. This is the equivalence

principle: moving freely in a gravitational field is locally indistinguishable
from moving freely in flat space. Since the geodesic equation is indepen-
dent of the mass or composition of the object, the equivalence principle
applies to all objects equally.
This implies that objects on non-geodesic paths experience acceleration. Consider an object constrained to move on a path where
$$\frac{d^2x^\mu}{d\tau^2} = 0 \tag{4.17}$$
with respect to a particular choice of coordinates $x^\mu$. Then the perceived acceleration $a^2 \neq 0$ in general. As we will see in Sec. 4.5, one example is when an object is stationary with respect to a mass, such as when one stands on the surface of the Earth (neglecting the rotation of the Earth). This is indistinguishable from standing in an accelerating rocket in flat space (Fig. 4.2).
* Eq. (2.31) only contains the spatial components, but it is easy to show $du^0/d\tau = 0$ using $u^2 = \vec u^2 - (u^0)^2 = -1$.
Figure 4.2: A non-geodesic path (left) is locally indistinguishable from ac-

celeration in flat space (right).
4.3 Fermi normal coordinates

The equivalence principle implies that given a timelike geodesic γ, we may
choose coordinates such that the metric is Minkowski within a small neigh-
borhood of the geodesic: gµν = ηµν . Physically, this is easy to understand:
as you move along the geodesic, simply construct an IRF using clocks and
rulers around you (Fig. 4.3). This IRF moves along the geodesic and de-
fines a coordinate system, called Fermi normal coordinates.
Figure 4.3: Fermi normal coordinates defined by an IRF around the

geodesic γ.
Since the metric is constant in a neighborhood of γ, all the Christoffel

symbols Γµνσ (γ) = 0. However, the derivatives ∂ρ Γµνσ (γ) are nonzero in
general, so the curvature tensors are nonzero, as required for a curved
spacetime.
We will not cover the mathematical details of constructing Fermi normal
coordinates here.
4.4 Local measurements

As we have emphasized, only inertial coordinates have physical significance. Likewise, tensor components only have physical significance in an inertial coordinate system. The transformation law of the metric (3.14) from general coordinates to inertial coordinates can be written as:
$$\eta_{ab} = e^\rho_a(x)\,e^\sigma_b(x)\,g_{\rho\sigma}(x), \tag{4.18}$$
where $e^\rho_a \equiv \partial x^\rho/\partial x^a$. Indices $a, b, \cdots$ denote inertial coordinates while Greek indices denote general coordinates. We will omit the argument $x$ from now on. Likewise, we have:
$$\eta^{ab} = e^a_\rho\,e^b_\sigma\,g^{\rho\sigma}, \tag{4.19}$$
where $e^a_\rho \equiv \partial x^a/\partial x^\rho$. We also have $e^a_\mu e^\nu_a = \delta^\nu_\mu$ and $e^a_\mu e^\mu_b = \delta^a_b$, from (3.25).
Given a tensor in general coordinates $T^{\mu\nu\cdots}{}_{\sigma\rho\cdots}$, we can find the components in inertial coordinates by multiplying by factors of $e^\mu_a$ and $e^a_\mu$.
Now instead of thinking of the $e^\mu_a$ as a coordinate transformation, we can think of them as a set of four orthonormal vectors $e_{(0)}, e_{(1)}, e_{(2)}, e_{(3)}$. They are orthonormal in the sense of (4.18). These form the coordinate axes of the IRF. $e_{(0)}$ is the time axis, which is also the four-velocity of a stationary observer in the IRF. The $e_{(i)}$ are the orthonormal spacelike vectors defined by the rulers, $i = 1, 2, 3$. We can see this by finding the components of the vector $e_{(\nu)}$ in inertial coordinates:
$$e^a_{(\nu)} = \frac{\partial x^a}{\partial x^\mu}e^\mu_{(\nu)} = e^a_\mu e^\mu_{(\nu)} = \delta^a_{(\nu)}. \tag{4.20}$$
Thus, in inertial coordinates, $e_{(0)} = (1, 0, 0, 0)^T$, and so on.
For example, we will encounter the energy-momentum tensor $T^{\mu\nu}$ in Sec. 4.11. The component $T^{00}$ is the energy density in flat space, and the momentum density in the $i$ direction is $T^{0i}$. Then, the energy density measured by an observer with four-velocity $u^\mu$ is: $T^{00}_{\text{inertial}} = e^0_\mu e^0_\nu T^{\mu\nu} = u_\mu u_\nu T^{\mu\nu}$, since $e^0_\mu = -u_\mu$. If the observer erects an IRF defining the $i$ direction with a vector $e^\mu_{(i)}$, the momentum density is $T^{0i}_{\text{inertial}} = e^0_\mu e^i_\nu T^{\mu\nu} = -u_\mu e^i_\nu T^{\mu\nu}$.
4.5 Static spacetimes

A stationary mass distribution such as a non-rotating planet generates a static spacetime. In this case, we may choose time and space coordinates $x = (t, x^1, x^2, x^3)^T$ such that a worldline with constant position $\vec x \equiv (x^1, x^2, x^3)^T = \vec x_0$ corresponds to a stationary object with respect to the mass distribution. Then the metric does not depend on $t$, and $g_{0i} = g^{0i} = 0$ due to time reversal invariance. As before, Latin indices $i, j, \cdots = 1, 2, 3$. $g_{\mu\nu}$ takes the form:
$$g_{\mu\nu} = \begin{pmatrix} g_{00}(\vec x) & 0 \\ 0 & g_{ij}(\vec x)\end{pmatrix} \tag{4.21}$$
where the spatial part of the metric $g_{ij}$ is a $3 \times 3$ matrix. The inverse metric is:
$$g^{\mu\nu} = \begin{pmatrix} 1/g_{00}(\vec x) & 0 \\ 0 & g^{ij}(\vec x)\end{pmatrix} \tag{4.22}$$
where $g^{ij}$ is the matrix inverse of $g_{ij}$.

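For slow objects where $|\vec u| \ll 1$, we have $|u^0| \gg |u^i|$, and so $u^2 = -1 \approx u_0 u^0 = g_{00}(u^0)^2$. The spatial acceleration vector (4.14) becomes approximately:
$$\begin{aligned} a^i &\approx \frac{du^i}{d\tau} + \Gamma^i_{00}(u^0)^2 \\ &\approx \frac{du^i}{d\tau} + \frac{1}{2}g^{ij}\frac{\partial_j g_{00}}{g_{00}},\end{aligned} \tag{4.23}$$
using (3.40) on the second line. The $a^0$ component becomes:
$$a^0 \approx \frac{du^0}{d\tau} \tag{4.24}$$
since $\Gamma^0_{00} = 0$.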
Let us apply this to a stationary, spherically symmetric body of mass $M$ and radius $R$. As we will show in Sec. 4.9, the metric outside of this body is the Schwarzschild solution:
$$ds^2 = -\left(1 - \frac{2GM}{r}\right)dt^2 + \frac{1}{1 - \frac{2GM}{r}}\,dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right), \tag{4.25}$$
where $G \approx 6.674 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$ is Newton’s constant. The spatial coordinates are $(r, \theta, \phi)$. $r$ is the radial direction, and the piece $r^2 d\Omega^2 \equiv r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2$ is the metric for a sphere that you derived in Ex. 3.2. This metric is only valid in the $r > R$ region, with $R > 2GM$.
Plugging the metric (4.25) into (4.23), we obtain:
$$\begin{aligned} a^r &\approx \frac{du^r}{d\tau} + \frac{1}{2}g^{rr}\frac{\partial_r g_{00}}{g_{00}} \\ &= \frac{du^r}{d\tau} + \frac{GM}{r^2} + O(r^{-3}),\end{aligned} \tag{4.26}$$
so the acceleration experienced by a stationary object at radius $r$ is
$$\sqrt{a^2} \approx \sqrt{g_{rr}(a^r)^2} \approx \frac{GM}{r^2}. \tag{4.27}$$
In classical physics, this is generated by the normal force counteracting the “force” of gravity:
$$\vec F_n = \frac{GMm}{r^2}\,\hat r, \tag{4.28}$$
where $m$ is the mass of the object. Due to the equivalence principle, the force of gravity cannot be detected, so only the normal force is experienced.
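To connect (4.26) and (4.27) with everyday numbers, here is a short Python sketch. The first function evaluates the metric term in (4.26) from the Schwarzschild components in (4.25), in natural units and with a finite-difference derivative. The second part restores SI units, where the acceleration becomes $GM/r^2$ in m/s². The Earth mass and radius and the value of the speed of light are assumed reference values, not taken from the text.

```python
def radial_acceleration(r, GM, h=1e-6):
    """(1/2) g^{rr} d_r g_00 / g_00 for the metric (4.25), natural units (c = 1)."""
    def g00(radius):
        return -(1.0 - 2.0 * GM / radius)
    g_rr_inverse = 1.0 - 2.0 * GM / r                   # g^{rr} = 1 / g_{rr}
    d_g00 = (g00(r + h) - g00(r - h)) / (2.0 * h)       # central difference
    return 0.5 * g_rr_inverse * d_g00 / g00(r)

# Natural-units check: GM = 1 at r = 10 should give GM / r^2 = 0.01.
a_natural = radial_acceleration(10.0, 1.0)

# SI values (assumed reference numbers for the Earth).
G_NEWTON = 6.674e-11      # m^3 kg^-1 s^-2, as quoted below (4.25)
C_LIGHT = 299792458.0     # m/s
M_EARTH = 5.972e24        # kg (assumed)
R_EARTH = 6.371e6         # m (assumed)

surface_gravity = G_NEWTON * M_EARTH / R_EARTH**2              # about 9.8 m/s^2
compactness = 2.0 * G_NEWTON * M_EARTH / (C_LIGHT**2 * R_EARTH)  # 2GM/(c^2 R)
```

The tiny value of `compactness` (of order $10^{-9}$) shows how close $g_{00}$ is to $-1$ at the Earth's surface, which is why the weak-field approximation is so accurate there.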
4.6 Gravitational redshift

The static metric (4.21) has the property that time differences ∆t between
null worldlines are preserved from place to place. Consider a light wave
sent by an oscillating charge q at ~x1 (Fig. 4.4), from (t1 , ~x1 ) to (t2 , ~x2 ).
Figure 4.4: A light wave being sent from ~x1 to ~x2 . Red lines indicate null
worldlines corresponding to constant phase of the electromagnetic field.
The lines are not straight since the metric gµν varies with position ~x.
The peaks of the electromagnetic field propagate along null worldlines:
$$ds^2 = 0 = g_{00}(\vec x)\,dt^2 + g_{ij}(\vec x)\,dx^i dx^j. \tag{4.29}$$
Solving for $dt$ and integrating from position $\vec x_1$ to $\vec x_2$:
$$t_2 - t_1 = \int_{\vec x_1}^{\vec x_2}\sqrt{-\frac{g_{ij}(\vec x)\,dx^i dx^j}{g_{00}(\vec x)}}. \tag{4.30}$$
Since g00 and gij are independent of t, t2 − t1 is purely a function of space.

Thus, the next peak occurring a time ∆t = 1/f later at ~x1 will also occur ∆t
later at ~x2 . However, the physical time difference at ~x is the proper time:
p
∆τ = −g00 (~x)∆t. (4.31)
The physical frequency is then:

1 f
fphys = =p (4.32)
∆τ −g00 (~x)
Thus, we have:
p p
f= −g00 (~x1 )fphys,1 = −g00 (~x2 )fphys,2 . (4.33)
Far away from any gravitating bodies, space is nearly flat and g00 ≈ −1.
Close to a massive object, g00 > −1 (4.25). Thus, fphys,far < fphys,near . This is
the phenomenon of gravitational redshift: the frequency of light decreases
as it moves away from a gravitating body, and vice versa* .
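As a rough numerical illustration of (4.32) and (4.33), the following Python sketch evaluates the frequency ratio for light climbing out of the Schwarzschild metric (4.25), with $-g_{00} = 1 - 2GM/r$. The function names and the sample values (GM = 1, emitter at r = 4, observer at r = 10^9) are illustrative assumptions, and natural units (c = 1) are used.

```python
import math

def physical_frequency(f_coord, r, GM):
    """Physical frequency at radius r for coordinate frequency f_coord, from (4.32)."""
    g00 = -(1.0 - 2.0 * GM / r)  # Schwarzschild g_00 in natural units (c = 1)
    return f_coord / math.sqrt(-g00)

def redshift_ratio(r_emit, r_obs, GM):
    """Ratio f_phys(observer) / f_phys(emitter), obtained from (4.33)."""
    return math.sqrt((1.0 - 2.0 * GM / r_emit) / (1.0 - 2.0 * GM / r_obs))

# Illustrative values only: GM = 1, emitter at r = 4, observer very far away.
ratio = redshift_ratio(4.0, 1.0e9, 1.0)
print(f"f_obs / f_emit = {ratio:.6f}")  # below 1: the light is redshifted
```

A ratio below 1 means the distant observer measures a lower frequency than the emitter, in line with the statement that the frequency decreases as light moves away from the gravitating body.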
4.7 Field Lagrangians

Once again, we move from particles to fields. Let us construct an action
for the metric field gµν . From general covariance, it must be a scalar. We
immediately run into a problem: the volume element
\[ d^4x = dx^0\,dx^1\,dx^2\,dx^3 \tag{4.34} \]
is not a scalar, since it transforms as (A.21):
\[ d^4x = \frac{d^4x'}{|\det J|} \tag{4.35} \]
under a coordinate change with Jacobian $J^i{}_j \equiv \partial x'^i / \partial x^j$. Since the
metric is used to convert coordinate lengths $dx^\mu$ into physical scalar
lengths $ds^2$, we can try to multiply $d^4x$ by some function of the metric to
produce a scalar. Using the transformation law (3.15) and the properties
(A.13) and (A.16) of determinants, we see that
\[ g = |\det J|^2\, g', \tag{4.36} \]
so the quantity
\[ \sqrt{-g}\; d^4x \tag{4.37} \]
is a scalar (recall g < 0). It can be interpreted as a physical spacetime
“volume”.
* Generally, redshift/blueshift refers to a decrease/increase in frequency, since red is a
low frequency of the optical spectrum and blue is a high frequency.
We can now form the simplest action
\[ S_\Lambda = -\frac{\Lambda}{8\pi G} \int \sqrt{-g}\; d^4x \tag{4.38} \]
for some constant Λ known as the cosmological constant.
Instead of using the Euler-Lagrange equations, it is easier to find the
change in the action δS given a variation δgµν , and setting δS = 0. We will
need the variations of certain quantities. First,
\[ \delta g = g\, g^{\mu\nu}\, \delta g_{\mu\nu}. \tag{4.39} \]
This is derived as follows. The determinant g is a sum of terms, each one
a product of d = 4 matrix elements $g_{\mu\nu}$ (A.11). Taking the variation of
this product is equivalent to taking the variation of the first item and
multiplying by d, from (A.11). We may write
\[ g = g_{\mu\nu}\, C^{\mu\nu} \tag{4.40} \]
where $C^{\mu\nu}$ represents a product of d − 1 elements. Then
$\delta g = d\, \delta g_{\mu\nu}\, C^{\mu\nu}$. From inspection of (4.40),
$C^{\mu\nu} = g\, g^{\mu\nu}/d$, giving (4.39).
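Now we can easily find $\delta\sqrt{-g}$:
\[ \delta\sqrt{-g} = \frac{1}{2} \sqrt{-g}\; g^{\mu\nu}\, \delta g_{\mu\nu}, \tag{4.41} \]
and
\[ \delta S_\Lambda = -\frac{\Lambda}{16\pi G} \int g^{\mu\nu}\, \delta g_{\mu\nu}\, \sqrt{-g}\; d^4x. \tag{4.42} \]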
This must equal zero for any δgµν (x), so the equation of motion is simply
g µν = 0, which is clearly nonsensical. We started with this simple example
to show the calculational procedure. However, the cosmological constant
term with Λ > 0 is important in cosmology as the source of dark energy, as
explained in Chapter 5.
* Exercise 4.1
Consider the metric for the sphere from Ex. 3.2 again:
ds2 = r2 dθ2 + r2 sin2 θ dφ2 . (4.43)

Show that
\[ \int_0^{2\pi}\!\!\int_0^{\pi} \sqrt{g}\; d\theta\, d\phi \tag{4.44} \]
gives the familiar surface area of the sphere: $4\pi r^2$. We use $+g$ here
since we are in Euclidean space, with (+, +) signature.
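A numerical sanity check can accompany the analytic derivation asked for in this exercise. The sketch below assumes $\sqrt{g} = r^2 \sin\theta$ for the metric (4.43) and integrates it with a simple midpoint rule; the function name `sphere_area` and the grid size are illustrative choices.

```python
import math

def sphere_area(r, n=2000):
    """Midpoint-rule estimate of the integral (4.44) for the metric (4.43)."""
    d_theta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        sqrt_g = r * r * math.sin(theta)  # sqrt(det g), since det g = r^4 sin^2(theta)
        total += sqrt_g * d_theta
    # The integrand does not depend on phi, so the phi integral contributes 2*pi.
    return 2.0 * math.pi * total

area = sphere_area(2.0)
print(area, 4.0 * math.pi * 2.0 ** 2)  # the two values should nearly agree
```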
** Exercise 4.2
Derive the following useful identities:
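\[ \partial_\mu (\sqrt{-g}\, A^\mu) = \sqrt{-g}\; \nabla_\mu A^\mu \tag{4.45} \]
\[ \partial_\mu (\sqrt{-g}\, B^{\mu\nu\cdots}) = \sqrt{-g}\; \nabla_\mu B^{\mu\nu\cdots} \tag{4.46} \]
where $A^\mu$ is a vector field, and $B^{\mu\nu\cdots}$ is a totally antisymmetric tensor
field. Hint: from (4.41), we have:
\[
\begin{aligned}
\partial_\sigma \sqrt{-g} &= \frac{1}{2} \sqrt{-g}\; g^{\mu\nu}\, \partial_\sigma g_{\mu\nu} \\
&= \sqrt{-g}\; \Gamma^\mu_{\sigma\mu}.
\end{aligned}
\tag{4.47}
\]
On the second line, we use the identity
\[ \Gamma^\mu_{\sigma\mu} = \frac{1}{2}\, g^{\mu\nu}\, \partial_\sigma g_{\mu\nu}, \tag{4.48} \]
which comes from contracting (3.39) with $g^{\mu\nu}$.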
4.8 Einstein-Hilbert action

The next simplest scalar involving the metric is the Ricci scalar R (3.70).
This gives the standard action of general relativity, known as the Einstein-
Hilbert action:
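\[ S_{EH} = \frac{1}{16\pi G} \int R\, \sqrt{-g}\; d^4x. \tag{4.49} \]
We have:
\[
\begin{aligned}
\delta S_{EH} &= \frac{1}{16\pi G}\, \delta\! \int R\, \sqrt{-g}\; d^4x \\
&= \frac{1}{16\pi G} \int \left( \delta R\, \sqrt{-g} + R\, \delta\sqrt{-g} \right) d^4x \\
&= \frac{1}{16\pi G} \int \left( \delta R + \frac{1}{2} R\, g^{\mu\nu}\, \delta g_{\mu\nu} \right) \sqrt{-g}\; d^4x \\
&= \frac{1}{16\pi G} \int \left( \delta R_{\mu\nu}\, g^{\mu\nu} + R_{\mu\nu}\, \delta g^{\mu\nu} + \frac{1}{2} R\, g^{\mu\nu}\, \delta g_{\mu\nu} \right) \sqrt{-g}\; d^4x.
\end{aligned}
\tag{4.50}
\]
On the third line, we use (4.41). On the fourth line, we use $R = R_{\mu\nu}\, g^{\mu\nu}$.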
As usual, we must find δRµν and δg µν in terms of δgµν , so that we can
factor out δgµν . Let’s start with δg µν . g µν can be thought of as raising both
indices of gµν :
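\[
\begin{aligned}
\delta g^{\mu\nu} &= \delta(g^{\mu\sigma} g^{\nu\lambda} g_{\sigma\lambda}) \\
&= 2\,\delta g^{\mu\nu} + g^{\mu\sigma} g^{\nu\lambda}\, \delta g_{\sigma\lambda} \\
\Rightarrow\quad \delta g^{\mu\nu} &= -g^{\mu\sigma} g^{\nu\lambda}\, \delta g_{\sigma\lambda}.
\end{aligned}
\tag{4.51}
\]
On the third line, we move the $2\,\delta g^{\mu\nu}$ to the other side and negate both
sides.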
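The identities (4.39) and (4.51) are easy to spot-check numerically. The sketch below uses a 2 × 2 symmetric matrix in place of the 4 × 4 metric (both identities hold in any dimension); the matrix entries, the size of the variation, and the helper names are arbitrary illustrative choices.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

idx = range(2)
g = [[-2.0, 0.3], [0.3, 1.5]]        # arbitrary symmetric "metric" with (-, +) signature
dg = [[1e-6, 2e-6], [2e-6, -3e-6]]   # small symmetric variation delta g_{mu nu}
g_new = [[g[i][j] + dg[i][j] for j in idx] for i in idx]
g_inv, g_new_inv = inv2(g), inv2(g_new)

# (4.39): delta g = g g^{mu nu} delta g_{mu nu}
delta_det_exact = det2(g_new) - det2(g)
delta_det_formula = det2(g) * sum(g_inv[i][j] * dg[i][j] for i in idx for j in idx)

# (4.51): delta g^{mu nu} = -g^{mu sigma} g^{nu lambda} delta g_{sigma lambda}
delta_inv_exact = [[g_new_inv[i][j] - g_inv[i][j] for j in idx] for i in idx]
delta_inv_formula = [
    [-sum(g_inv[i][s] * g_inv[j][l] * dg[s][l] for s in idx for l in idx) for j in idx]
    for i in idx
]
print(delta_det_exact, delta_det_formula)
```

The exact and first-order results differ only at second order in the variation, which is what a first-order identity predicts.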
Finding δRµν involves a long calculation. We will bypass it using dimen-
sional analysis and general covariance.
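First, consider $\delta\Gamma^\mu_{\nu\sigma}$. Although $\Gamma^\mu_{\nu\sigma}$ is not a tensor,
$\delta\Gamma^\mu_{\nu\sigma}$ must be a tensor.
To see this, write the parallel transport equation (3.37) as:
\[ v_\mu^{(g)}(x + \delta x) - v_\mu(x) = \Gamma^{\nu(g)}_{\mu\sigma}(x)\, v_\nu(x)\, \delta x^\sigma, \tag{4.52} \]
where $v_\mu^{(g)}(x + \delta x)$ denotes the vector $v_\mu(x)$ transported to $x + \delta x$ using the
metric $g_{\mu\nu}$. Then we have:
\[
\begin{aligned}
v_\mu^{(g+\delta g)}(x + \delta x) - v_\mu^{(g)}(x + \delta x)
&= \left[ v_\mu^{(g+\delta g)}(x + \delta x) - v_\mu(x) \right] - \left[ v_\mu^{(g)}(x + \delta x) - v_\mu(x) \right] \\
&= \left[ \Gamma^{\nu(g+\delta g)}_{\mu\sigma}(x) - \Gamma^{\nu(g)}_{\mu\sigma}(x) \right] v_\nu(x)\, \delta x^\sigma \\
&= \delta\Gamma^{\nu(g)}_{\mu\sigma}(x)\, v_\nu(x)\, \delta x^\sigma.
\end{aligned}
\tag{4.53}
\]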
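Since the left-hand side (LHS) subtracts vectors at the same point, the
right-hand side (RHS) must be a vector, so $\delta\Gamma^{\nu(g)}_{\mu\sigma}(x)$ must be a
tensor. The most general form it can take is:
\[ \delta\Gamma^\mu_{\nu\sigma} = B^{\mu\lambda\rho\alpha}_{\nu\sigma}\, \nabla_\lambda\, \delta g_{\rho\alpha} \tag{4.54} \]
for some tensor $B^{\mu\lambda\rho\alpha}_{\nu\sigma}$. This is seen as follows. Since $\Gamma^\mu_{\nu\sigma}$ contains one
derivative, the RHS also contains one derivative. We must use the covariant
derivative instead of the ordinary derivative so that the RHS is a tensor.
Since $\nabla_\sigma g_{\mu\nu} = 0$ (3.54), we can only take the derivative of $\delta g_{\mu\nu}$.
$B^{\mu\lambda\rho\alpha}_{\nu\sigma}$ contains no derivatives and only depends on the metric. Indeed,
an explicit calculation gives:
\[ \delta\Gamma^\mu_{\nu\sigma} = \frac{1}{2}\, g^{\mu\lambda} \left( \nabla_\nu\, \delta g_{\lambda\sigma} + \nabla_\sigma\, \delta g_{\lambda\nu} - \nabla_\lambda\, \delta g_{\nu\sigma} \right). \tag{4.55} \]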
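Note that $\delta\Gamma^\mu_{\nu\sigma}$ only depends on derivatives of $\delta g_{\mu\nu}$, and not $\delta g_{\mu\nu}$ directly.
The same is true for $\delta R_{\mu\nu}$, since $R_{\mu\nu}$ only involves products and derivatives
of $\Gamma^\mu_{\nu\sigma}$ (3.58). Then, we may write the most general form of $\delta R_{\mu\nu}$:
\[ \delta R_{\mu\nu} = C^{\alpha\beta\sigma\lambda}_{\mu\nu}\, \nabla_\alpha \nabla_\beta\, \delta g_{\sigma\lambda} \tag{4.56} \]
for some tensor $C^{\alpha\beta\sigma\lambda}_{\mu\nu}$. Again, since $R_{\mu\nu}$ contains two derivatives, so must
the RHS. We must use covariant derivatives so that the RHS is a tensor.
There are no terms like $\nabla_\sigma\, \delta g_{\mu\nu}$ since there are no one-derivative tensors
to contract it with ($\nabla_\sigma g_{\mu\nu} = 0$). Finally, $C^{\alpha\beta\sigma\lambda}_{\mu\nu}$ contains no derivatives and
only depends on the metric.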
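Now we can evaluate the first term in the integral (4.50). Since $C^{\alpha\beta\sigma\lambda}_{\mu\nu}$
only involves the metric, which is covariantly constant, we can factor out
the covariant derivative:
\[
\begin{aligned}
\int \delta R_{\mu\nu}\, g^{\mu\nu}\, \sqrt{-g}\; d^4x
&= \int g^{\mu\nu}\, C^{\alpha\beta\sigma\lambda}_{\mu\nu}\, \nabla_\alpha \nabla_\beta\, \delta g_{\sigma\lambda}\, \sqrt{-g}\; d^4x \\
&= \int \nabla_\alpha A^\alpha\, \sqrt{-g}\; d^4x \\
&= \int \partial_\alpha \left( A^\alpha \sqrt{-g} \right) d^4x,
\end{aligned}
\tag{4.57}
\]
where we define the vector $A^\alpha \equiv g^{\mu\nu}\, C^{\alpha\beta\sigma\lambda}_{\mu\nu}\, \nabla_\beta\, \delta g_{\sigma\lambda}$. On the third line, we
use (4.45). Since this is a total derivative, it can be converted to a boundary
term $\propto \nabla_\beta\, \delta g_{\sigma\lambda}$, using the divergence theorem (2.48). We are only
interested in physics away from the boundary, so we assume that this variation
vanishes fast enough that the boundary integral equals zero.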
Putting it all together, we have:
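\[ \delta S_{EH} = \frac{1}{16\pi G} \int \left( -R^{\mu\nu} + \frac{1}{2} R\, g^{\mu\nu} \right) \delta g_{\mu\nu}\, \sqrt{-g}\; d^4x. \tag{4.58} \]
Since this must vanish for any $\delta g_{\mu\nu}$, we arrive at the Einstein field equations
in vacuum:
\[ R^{\mu\nu} - \frac{1}{2} R\, g^{\mu\nu} = 0. \tag{4.59} \]
We can contract this with $g_{\mu\nu}$, giving R = 0. Plugging this back into
(4.59), we get the simpler form of Einstein’s equations in vacuum:
\[ R^{\mu\nu} = 0. \tag{4.60} \]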
The metrics that satisfy this equation are called vacuum solutions. The sim-
plest one is, of course, flat space. In inertial coordinates, gµν = ηµν , so all
the Christoffel symbols and curvature tensors are zero. In arbitrary coor-
dinates, the Christoffel symbols are not necessarily zero, but the curvature
tensors remain zero since they transform as tensors. Another solution is the
Schwarzschild metric (4.25) describing a black hole* , which we will derive
in Sec. 4.9.
* Historically, it was also the first vacuum solution found after flat space.
Finally, let’s add the cosmological constant term back in:
\[ S = S_\Lambda + S_{EH} = \frac{1}{16\pi G} \int (R - 2\Lambda)\, \sqrt{-g}\; d^4x. \tag{4.61} \]
The variation is
\[ \delta S = \frac{1}{16\pi G} \int \left( -\Lambda g^{\mu\nu} - R^{\mu\nu} + \frac{1}{2} R\, g^{\mu\nu} \right) \delta g_{\mu\nu}\, \sqrt{-g}\; d^4x, \tag{4.62} \]
so the equation of motion is
\[ R^{\mu\nu} - \frac{1}{2} R\, g^{\mu\nu} + \Lambda g^{\mu\nu} = 0. \tag{4.63} \]
4.9 The Schwarzschild solution

Let us look for static, spherically symmetric vacuum solutions. The general
form of such a metric is:
\[ ds^2 = F(R)\,dt^2 + H(R)\,dR^2 + I(R)\,d\Omega^2, \tag{4.64} \]
where $d\Omega^2 \equiv d\theta^2 + \sin^2\theta\, d\phi^2$, as before. R is a radial coordinate, and
F(R), H(R), I(R) are arbitrary functions. Spherical symmetry means that
(θ, φ) only appear in the combination $d\Omega^2$. Now define a new radial
coordinate r by:
\[ I(R) = r^2 \tag{4.65} \]
so that
\[ ds^2 = f(r)\,dt^2 + h(r)\,dr^2 + r^2\, d\Omega^2. \tag{4.66} \]
By differentiating (4.65), one can show that
\[ f(r) = F(R(r)), \qquad h(r) = \frac{4r^2}{\left[ \frac{dI}{dR}(R(r)) \right]^2}\, H(R(r)), \tag{4.67} \]
where R(r) is implicitly defined by (4.65). r is a better radial coordinate,
since a sphere at radius r centered at the origin has surface area $4\pi r^2$. This
is seen by fixing t and r, so that the metric becomes (4.43). Thus, we will
use the form (4.66) as our starting point.
Now we plug and chug* to find Rµν . I will simply show the results but
you should go through the algebra for practice. First, the non-vanishing
Christoffel symbols are (3.40):
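\[ \Gamma^t_{tr} = \frac{1}{2} f^{-1} f', \qquad \Gamma^r_{tt} = -\frac{1}{2} f' h^{-1}, \qquad \Gamma^r_{rr} = \frac{1}{2} h^{-1} h' \tag{4.68} \]
\[ \Gamma^r_{\theta\theta} = -r h^{-1}, \qquad \Gamma^r_{\phi\phi} = -r h^{-1} \sin^2\theta, \qquad \Gamma^\theta_{r\theta} = \Gamma^\phi_{r\phi} = r^{-1} \tag{4.69} \]
\[ \Gamma^\theta_{\phi\phi} = -\cos\theta \sin\theta, \qquad \Gamma^\phi_{\theta\phi} = \cot\theta \tag{4.70} \]
where $f' = df/dr$, $h' = dh/dr$. Plugging into (3.69), we obtain:
\[ R_{tt} = \frac{1}{4} f' h' h^{-2} - \frac{1}{2} f'' h^{-1} + \frac{1}{4} f'^2 f^{-1} h^{-1} - r^{-1} f' h^{-1} = 0 \tag{4.71} \]
\[ R_{rr} = \frac{1}{4} f' f^{-1} h' h^{-1} - \frac{1}{2} f'' f^{-1} + \frac{1}{4} f'^2 f^{-2} + r^{-1} h' h^{-1} = 0. \tag{4.72} \]
We see that $f R_{rr}$ is identical to $h R_{tt}$ except for the last term. We have:
\[ f R_{rr} - h R_{tt} = r^{-1} \left( \frac{h' f}{h} + f' \right) = 0, \tag{4.73} \]
or
\[ \frac{f'}{f} = -\frac{h'}{h}. \tag{4.74} \]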
* Tedious calculations like this one can be automated using symbolic algebra software
such as Mathematica. A good Mathematica package for general relativity is GREATER2.
Plugging this back into hRtt to eliminate h, we get:

\[ \frac{1}{2} f'' + \frac{f'}{r} = 0. \tag{4.75} \]
This can be solved using the substitution $y(r) = f'(r)$ followed by
separation of variables. We get:
\[ f(r) = \frac{c_1}{r} + c_2 \tag{4.76} \]
for constants $c_1$, $c_2$.
To solve for h, we may write (4.74) as:
\[ \frac{df}{f} = -\frac{dh}{h}, \tag{4.77} \]
since the dr in the denominator cancels out. Integrating both sides, we get:
\[ h(r) = \frac{c_3}{f(r)} \tag{4.78} \]
for some constant $c_3$.
To determine the constants $c_1$, $c_2$, $c_3$, we require that the metric be flat
as $r \to \infty$:
\[ ds^2 \to -dt^2 + dr^2 + r^2\, d\Omega^2. \tag{4.79} \]
This condition is called asymptotic flatness. The spatial part is the metric of
3D Euclidean space (3.30). This gives:
\[ c_2 = c_3 = -1. \tag{4.80} \]
Thus, the final metric becomes the Schwarzschild solution (4.25), upon
identifying $c_1 = 2GM$.
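The solution can be checked by substituting it back into (4.71) and (4.72). The following Python sketch does this numerically, using the analytic derivatives of $f = c_1/r + c_2$ and $h = c_3/f$; the function name and the sample radii are illustrative assumptions, and $c_1 = 2$ corresponds to GM = 1.

```python
def ricci_tt_rr(r, c1, c2=-1.0, c3=-1.0):
    """Evaluate R_tt (4.71) and R_rr (4.72) for f = c1/r + c2 and h = c3/f."""
    f = c1 / r + c2
    fp = -c1 / r ** 2             # f'
    fpp = 2.0 * c1 / r ** 3       # f''
    h = c3 / f
    hp = -c3 * fp / f ** 2        # h'
    r_tt = 0.25 * fp * hp / h ** 2 - 0.5 * fpp / h + 0.25 * fp ** 2 / (f * h) - fp / (r * h)
    r_rr = 0.25 * fp * hp / (f * h) - 0.5 * fpp / f + 0.25 * fp ** 2 / f ** 2 + hp / (r * h)
    return r_tt, r_rr

# Sample radii outside the horizon r = c1 = 2 (where f = 0 and h diverges).
values = [ricci_tt_rr(r, c1=2.0) for r in (3.0, 5.0, 10.0)]
for r_tt, r_rr in values:
    print(r_tt, r_rr)  # both should vanish up to floating-point rounding
```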
4.10 Black holes

As mentioned in Sec. 4.5, the Schwarzschild solution (4.25) gives the met-
ric outside of a spherically symmetric body of radius R and mass M , with
R > 2GM . To see why, note that there is no matter outside r = R, so
the metric must obey the vacuum Einstein’s equations Rµν = 0 there. It
must also be static, spherically symmetric, and asymptotically flat. From
the derivation above, the Schwarzschild solution is the only such metric.
However, we may also regard (4.25) as a vacuum solution valid for all
values of r (except r = 2GM and r = 0, where the metric components go
to zero or infinity). This solution is called the Schwarzschild black hole.
In general, a black hole is any object with an event horizon: a boundary
through which light and matter can only pass one way. The event hori-
zon for the Schwarzschild solution is at r = r0 ≡ 2GM . r0 is called the
Schwarzschild radius. Once an object enters the r < r0 region, it can only
travel towards the origin r = 0. There, it encounters a gravitational singu-
larity, where spacetime itself becomes undefined.
You may suspect that spacetime also becomes undefined at $r = r_0$ since
$g_{tt} \to 0$ and $g_{rr} \to \infty$. However, this is merely a coordinate singularity
caused by a poor choice of coordinates. This can be seen by calculating a
scalar quantity such as $K \equiv R_{\mu\nu\sigma\lambda} R^{\mu\nu\sigma\lambda}$ (the Kretschmann scalar), since this
doesn’t depend on the coordinate system. For the Schwarzschild solution,
$K \propto 1/r^6$, with no unusual behavior near $r = r_0$. Conversely, r = 0 contains
a true gravitational singularity, since K blows up there.
To understand the event horizon, we can make a coordinate change to
eliminate the coordinate singularity at r = r0 . First, write (4.25) as:
\[ ds^2 = -f(r)\,dt^2 + \frac{1}{f(r)}\,dr^2 + r^2\, d\Omega^2, \tag{4.81} \]
where $f(r) = 1 - r_0/r$. The singularity comes from $f(r_0) = 0$. Define a new
time coordinate
\[ T = t + l(r) \tag{4.82} \]
where l(r) will be chosen to eliminate the singularity. We have:
\[ dT = dt + l'\,dr, \tag{4.83} \]
where the prime denotes d/dr, as before. The metric in (T, r, θ, φ)
coordinates becomes:
\[ ds^2 = -f\,dT^2 + 2 f l'\, dT\, dr - \left( f l'^2 - \frac{1}{f} \right) dr^2 + r^2\, d\Omega^2. \tag{4.84} \]
By choosing
\[ l' = \frac{1}{f}, \tag{4.85} \]
we both eliminate the $dr^2$ term and make the coefficient of dT dr constant.
Solving this differential equation gives:
\[ l = r + r_0 \ln\left( \frac{r - r_0}{r_1} \right) \tag{4.86} \]
where $r_1$ is a constant of integration. The metric becomes:
\[ ds^2 = -\left( 1 - \frac{r_0}{r} \right) dT^2 + 2\, dT\, dr + r^2\, d\Omega^2. \tag{4.87} \]
These are called Eddington–Finkelstein coordinates. At r = r0 , although the
dT 2 term vanishes, the remaining term 2dT dr is a perfectly valid metric for
the (T, r) plane.
For null worldlines moving in the radial direction (dθ = dφ = 0), we
have:
\[ ds^2 = dT\,(2\,dr - f\,dT) = 0. \tag{4.88} \]
This gives the two equations:
\[ dT = 0, \tag{4.89} \]
\[ \frac{dT}{dr} = \frac{2}{f} = 2l', \tag{4.90} \]
corresponding to ingoing and outgoing light rays. The solutions are:
\[ T = T_0, \tag{4.91} \]
\[ T = 2l + T_1 = 2r + 2r_0 \ln\left( \frac{r - r_0}{r_1} \right), \tag{4.92} \]
for constants $T_0$, $r_1$. We have absorbed the constant $T_1$ into $r_1$, since
changing $r_1$ just corresponds to an additive shift. These are plotted in Fig. 4.5 for
various $T_0$, $r_1$. The curves intersecting a given (T, r) form the “lightcone” in
the T–r plane that bounds the radial motion of particles. We see
that for $r > r_0$, particles can escape to r = ∞, while for $r < r_0$, they can
only go in the −r direction.
Figure 4.5: Null worldlines (dashed lines) and lightcones for varying r at
constant T = 0.
To an observer at constant r = rs > r0 , an object falling into the black

hole appears to take an infinite amount of time to reach the horizon. From
the future lightcones in Fig. 4.5, an outgoing light ray emitted from an
object at T = 0, r = re > r0 follows (4.92). Solving for r1 gives:
\[ r_1 = (r_e - r_0)\, e^{r_e/r_0}. \tag{4.93} \]
Then plugging $r = r_s$ into (4.92) gives $T \to \infty$ as $r_e \to r_0$, since $r_1 \to 0$.
Note that T is a good time coordinate for the observer, since the proper
time of the observer ∆τ is simply proportional to ∆T: $\Delta\tau = \sqrt{f(r_s)}\, \Delta T$.
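The divergence of the arrival time can be made concrete with a few numbers. This Python sketch combines (4.92) and (4.93) to compute the time T at which a ray emitted at T = 0 from radius $r_e$ reaches the observer at $r_s$; the function name and the sample values ($r_0 = 1$, $r_s = 10$) are illustrative assumptions.

```python
import math

def arrival_time(r_emit, r_obs, r0):
    """T at which an outgoing ray emitted at (T = 0, r = r_emit) reaches r_obs."""
    r1 = (r_emit - r0) * math.exp(r_emit / r0)                     # (4.93)
    return 2.0 * r_obs + 2.0 * r0 * math.log((r_obs - r0) / r1)    # (4.92)

r0, r_obs = 1.0, 10.0
offsets = (1e-1, 1e-3, 1e-6, 1e-9)   # emission radius minus horizon radius
times = [arrival_time(r0 + eps, r_obs, r0) for eps in offsets]
for eps, t in zip(offsets, times):
    print(f"r_e - r_0 = {eps:.0e}  ->  T = {t:.3f}")
```

The arrival time grows only logarithmically as the emission point approaches the horizon, but it grows without bound.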
The gravitational redshift (4.33) also becomes infinite at the horizon,
since gtt → 0 there. To an outside observer, objects appear to slow down
and become redder and dimmer as they approach the horizon.
The Schwarzschild solution is the simplest black hole. It is non-rotating
and uncharged. Rotating black holes are described by the Kerr metric, black
holes with electric charge are described by the Reissner–Nordström metric,
and black holes that are both rotating and charged are described by the
Kerr-Newman metric. Real astrophysical black holes closely follow the Kerr
solution, since they are generally rotating and carry negligible charge. We
will not discuss the details of these other black holes here.
4.11 The energy-momentum tensor

Just as electric charges and currents J µ produce an electromagnetic field,
matter causes spacetime to curve. Let us add a general matter action Sm to
SEH and SΛ :
S = SEH + SΛ + Sm . (4.94)
The variation is:
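\[ \delta S = \int \left[ \frac{1}{16\pi G} \left( -\Lambda g^{\mu\nu} - R^{\mu\nu} + \frac{1}{2} R\, g^{\mu\nu} \right) \sqrt{-g} + \frac{\delta S_m}{\delta g_{\mu\nu}} \right] \delta g_{\mu\nu}\; d^4x. \tag{4.95} \]
The variation of $S_m$ becomes an integral over x; compare with (2.3). The
equation of motion is then:
\[ R^{\mu\nu} - \frac{1}{2} R\, g^{\mu\nu} = -\Lambda g^{\mu\nu} + 8\pi G\, T^{\mu\nu}, \tag{4.96} \]
where we define the energy-momentum tensor or stress-energy tensor
\[ T^{\mu\nu}(x) \equiv \frac{2}{\sqrt{-g}}\, \frac{\delta S_m}{\delta g_{\mu\nu}(x)}. \tag{4.97} \]
It is clearly symmetric: $T^{\mu\nu} = T^{\nu\mu}$.
In cases where we know $T^{\mu\nu}$ but not $R^{\mu\nu}$, it is helpful to write (4.96) in
terms of $T = T^\mu{}_\mu$ instead of R. First, contract the indices of (4.96):
\[ -R = -4\Lambda + 8\pi G\, T. \tag{4.98} \]
Then we have:
\[ R^{\mu\nu} = \Lambda g^{\mu\nu} + 8\pi G \left( T^{\mu\nu} - \frac{1}{2} T\, g^{\mu\nu} \right). \tag{4.99} \]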
4.12 Energy-momentum conservation

** Exercise 4.3
Recall the second Bianchi identity (3.74). Contract the λ and ν indices,
multiply by g µρ , and raise the index σ, to obtain:

\[ \nabla_\mu \left( R^{\mu\nu} - \frac{1}{2} R\, g^{\mu\nu} \right) = 0. \tag{4.100} \]
Thus, using (4.96), the stress-energy is covariantly conserved:
∇µ T µν = 0. (4.101)
In flat space with inertial coordinates* , Γµνσ = 0, so we have ∂µ T µν = 0.

Then T µν defines four currents, one for each ν, that are conserved in the
sense of (2.52). T µ0 is the energy current: ∂µ T µ0 = 0 means the change
in energy density equals the incoming energy flux. Likewise, T µi is the
momentum current in direction i. Importantly, the total energy and mo-
mentum are conserved (2.54). In matrix form:
\[
T^{\mu\nu} =
\begin{pmatrix}
T^{00} = \text{Energy density} & T^{0i} = \text{Momentum density} \\
T^{i0} = \text{Energy flux} & T^{ij} = \text{Momentum flux}
\end{pmatrix}.
\tag{4.102}
\]
Because $T^{\mu\nu}$ is symmetric, the momentum density equals the energy flux.
We will see how this works for a mass density in Sec. 4.14.
However, when $\Gamma^\mu_{\nu\sigma} \neq 0$, there is no ordinary conservation law:
\[ \partial_\mu T^{\mu\nu} \neq 0, \tag{4.103} \]
and thus the derivation of charge conservation (2.54) does not hold, since
we cannot convert the spatial integral into a boundary term. In other
words, there is no globally conserved energy or momentum in general
spacetimes.
4.13 T µν for particles

Let us find $T^{\mu\nu}$ for a single particle of mass m. The point-particle action is
(4.2):
\[ S_{pp} = \int L(x_p, U_p, \lambda)\, d\lambda \tag{4.104} \]
where $L(x_p, U_p, \lambda) = -m \sqrt{-g_{\mu\nu}(x_p)\, U_p^\mu U_p^\nu}$. Since we are varying with
respect to $g_{\mu\nu}(x)$ and not x, we write this in terms of a fixed path $x_p(\lambda)$. As
before, $U_p^\mu = dx_p^\mu/d\lambda$.
* We assume that the energy-momentum is small enough that it does not curve the
metric, analogous to a test particle in an electromagnetic field.
We would like the functional derivative δSm /δgµν (x) to be nonzero only
when x is on the path of the particle xp (λ). This can be done using a delta
function:
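\[
\begin{aligned}
\frac{\delta S_{pp}}{\delta g_{\mu\nu}(x)}
&= \int \frac{\delta L}{\delta g_{\mu\nu}}(x_p, U_p, \lambda)\, \delta^4(x - x_p)\, d\lambda \\
&= \frac{1}{2} m \int \frac{U_p^\mu U_p^\nu}{\sqrt{-U_p^2}}\, \delta^4(x - x_p)\, d\lambda \\
&= \frac{1}{2} m \int \frac{(dx^\mu/d\lambda)(dx^\nu/d\lambda)}{d\tau/d\lambda}\, \delta^4(x - x_p)\, d\lambda \\
&= \frac{1}{2} m \int \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau}\, \delta^4(x - x_p)\, d\tau.
\end{aligned}
\tag{4.105}
\]
On the third line, we use $-dx^\mu dx_\mu = d\tau^2$. The differentials dλ on top and
bottom cancel out. The stress-energy tensor is:
\[ T_{pp}^{\mu\nu}(x) = m \int u_p^\mu u_p^\nu\, \frac{\delta^4(x - x_p)}{\sqrt{-g}}\, d\tau, \tag{4.106} \]
where $u_p^\mu = dx_p^\mu/d\tau$.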
4.14 T µν for mass densities

Now instead of a point particle, consider a distributed mass (Fig. 4.6).
Starting with (4.106), the quantity $\delta^4(x)/\sqrt{-g}$ gets smeared out into an
inverse spacetime volume:
\[ \frac{\delta^4(x)}{\sqrt{-g}} \to \frac{H(x)}{d^4x\, \sqrt{-g}} = \frac{H(x)}{dV\, d\tau}. \tag{4.107} \]
Recall that we may convert a coordinate volume $d^4x$ to a physical volume
by multiplying by $\sqrt{-g}$ (4.37). Thus, dV is a physical spatial volume that
contains the mass, and dτ is a proper time. H(x) is 1 when x is within this
spacetime volume centered at the origin, and 0 otherwise. Then we have:
\[ T_\rho^{\mu\nu}(x) = \sum_i u_i^\mu u_i^\nu\, \frac{m_i\, H(x - x_i)}{dV_i} \tag{4.108} \]
\[ \to \rho(x)\, u^\mu(x)\, u^\nu(x). \tag{4.109} \]
The integral becomes a sum over spacetime locations xi that contain each
volume dVi with mass mi . In the infinitesimal limit, this becomes a mass
density ρ(x): a mass per unit volume. uµ (x) is the local four-velocity at x.
Figure 4.6: Particle worldline xp (τ ) and mass density ρ(x).
Energy-momentum conservation gives:
\[ \nabla_\mu T_\rho^{\mu\nu} = \nabla_\mu(\rho u^\mu)\, u^\nu + \rho u^\mu\, \nabla_\mu u^\nu = 0. \tag{4.110} \]
The second term is zero since the mass moves along geodesics (4.16). The
vanishing of the first term then implies that the “mass current” ρuµ is co-
variantly conserved.
In flat space with inertial coordinates, we have (1.32):
\[
\begin{aligned}
u^0 &= \gamma \approx 1 + \frac{1}{2} \vec{v}^2 \\
\vec{u} &= \gamma \vec{v} \approx \vec{v},
\end{aligned}
\tag{4.111}
\]
for $v \ll c$ and expanding up to order $\vec{v}^2$. Then we have:
\[
\begin{aligned}
T_\rho^{00} &= \rho\gamma^2 \approx \rho + \rho\vec{v}^2 \\
T_\rho^{0i} = T_\rho^{i0} &= \rho\gamma^2 v^i \approx \rho v^i \\
T_\rho^{ij} = T_\rho^{ji} &= \rho\gamma^2 v^i v^j \approx \rho v^i v^j.
\end{aligned}
\tag{4.112}
\]
As mentioned above, $T_\rho^{00}$ is the energy density. It consists of the rest energy
density $\rho_{\text{rest}} = \rho c^2$, plus the kinetic energy density $\rho\vec{v}^2$. We see that the
kinetic energy is twice as large as that of a point particle (1.42). This is
because a moving volume holding a mass m gets length contracted in the
direction of motion (Fig. 4.7). Thus, the density ρ acquires another factor
of γ.
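A short numerical check of the low-velocity expansion in (4.112) may help here. The Python sketch below compares the exact $T_\rho^{00} = \rho\gamma^2$ with the approximation $\rho + \rho\vec{v}^2$ and with the single-particle kinetic energy density $\rho\vec{v}^2/2$; the function name and the sample speed v = 0.01 are illustrative assumptions, with c = 1.

```python
def energy_density(rho, v):
    """Exact T^00 = rho * gamma^2 for a mass density moving at speed v (c = 1)."""
    gamma_sq = 1.0 / (1.0 - v * v)
    return rho * gamma_sq

rho, v = 1.0, 0.01
exact = energy_density(rho, v)
approx = rho + rho * v * v                    # (4.112), to order v^2
kinetic_density = exact - rho                 # energy density above the rest energy
particle_like = 0.5 * rho * v * v             # what (1/2) m v^2 per volume would give

print(exact, approx)
print(kinetic_density / particle_like)        # close to 2, as stated in the text
```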
Figure 4.7: A box at rest containing mass m (left) gets length contracted
when moving with velocity v (right).
Tρ0i is the momentum density, since if we take ρ = m/V , we have ρ~v =

m~v /V . It is also the energy flux (power per unit area), since the rest energy
density ρc2 [J/m3 ] is multiplied by the velocity ~v [m/s].
Finally, Tρij is evidently the momentum flux, from the conservation law
∂µ T µj = 0. More intuitively, it is the pressure tensor (force per unit area).
It can be written as a 3 × 3 matrix of rank 1:
\[ Q = \rho\, \vec{v}\, \vec{v}^T \tag{4.113} \]
so that the pressure in any direction n̂ is given by:
\[ \hat{n}^T Q\, \hat{n} = \rho\, (\hat{n} \cdot \vec{v})^2 = \rho v^2 \cos^2\theta, \tag{4.114} \]
where n̂ is a unit vector and θ is the angle between n̂ and ~v . One factor
of |n̂ · ~v | comes from the increased area (lower flux) seen by the surface
perpendicular to n̂, and the other factor comes from the angle between n̂
and ~v (Fig. 4.8).
Figure 4.8: Matter with velocity ~v impinging on a surface dS with normal

vector n̂.
For massless matter, the four-velocity is a null vector $t^\mu$, with $t^\mu t_\mu = 0$.
The stress-energy tensor is:
\[ T_k^{\mu\nu} = k(x)\, t^\mu(x)\, t^\nu(x), \tag{4.115} \]
where k(x) is just some scalar field instead of a mass density.
In flat space, we have:
\[
\begin{aligned}
T_k^{00} &= k\,(t^0)^2 \equiv \rho_E \\
T_k^{0i} &= k\, t^0 t^i = \rho_E\, \hat{t}^i \\
T_k^{ij} &= k\, t^i t^j = \rho_E\, \hat{t}^i \hat{t}^j,
\end{aligned}
\tag{4.116}
\]
where $\hat{t}^i \equiv t^i/t^0$ is a unit vector. Note that the momentum density $T_k^{0i}$
has the same magnitude as the energy density $T_k^{00} = \rho_E$, as expected from
(1.43).
4.15 T µν for ideal fluids

Now consider an ensemble of particles in thermal equilibrium. We may
treat this as a mass distribution (4.109), where ρ(~x) is the time-averaged
mass density and uµ is a statistical quantity. Let us find T µν . From (4.112),
we have:
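\[ T^{00} = \rho \langle \gamma^2 \rangle \tag{4.117} \]
\[ T^{0i} = \rho \langle \gamma^2 v^i \rangle = 0 \tag{4.118} \]
\[ T^{ij} = \rho \langle \gamma^2 v^i v^j \rangle = \rho \langle \gamma^2 (v^i)^2 \rangle\, \delta^{ij} \equiv P\, \delta^{ij}, \tag{4.119} \]
where we define the pressure $P \equiv \rho \langle \gamma^2 (v^i)^2 \rangle$. Here, $\langle A \rangle$ is the time-averaged
value of the quantity A. Since the velocities are equally distributed between
the +i and −i direction, the average $\langle \gamma^2 v^i \rangle = 0$, and $\langle \gamma^2 v^i v^j \rangle = 0$ for the
components in different directions ($i \neq j$). In matrix form:
\[
T_f^{\mu\nu} =
\begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & P & 0 & 0 \\
0 & 0 & P & 0 \\
0 & 0 & 0 & P
\end{pmatrix}.
\tag{4.120}
\]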
Let us show that P is indeed the usual pressure in the nonrelativistic
limit. Assume the particles are enclosed in a cubic box with linear dimen-
sion L. A single particle with mass m and velocity v i in the xi direction
imparts a momentum change ∆pi = 2mv i on a wall during a round trip,
since it hits the wall with velocity +v i and leaves the wall with velocity
−v i . The round trip time is ∆t = 2L/v i . Thus, the time-averaged pres-
sure is P = ∆pi /(∆tL2 ) = m(v i )2 /V . Here, L2 is the area of the wall, and
V = L3 is the volume. Summing over all N particles gives:
\[ P = \frac{Nm}{V} \langle (v^i)^2 \rangle = \rho \langle (v^i)^2 \rangle, \tag{4.121} \]
where ρ = N m/V is the mass density.
Now let us try to write a tensor expression for Tfµν , in terms of g µν and
U µ (x) ≡ huµ (x)i, the average four-velocity. It must take the form:
Tfµν = AU µ U ν + Bg µν (4.122)
for scalars A and B, since these are the only 2-tensors we can form from U µ
and g µν . This must agree with (4.120) in flat space with zero net velocity,
where U µ = (1, 0, 0, 0)T and g µν = η µν . We get:
Tfµν = (ρ + P )U µ U ν + P g µν . (4.123)
Matter with T µν given by (4.123) is also called an ideal fluid. In general,
ρ, P , and U µ can depend on x.
Any equation relating ρ and P using macroscopic (thermodynamic)
quantities is called an equation of state. Different types of matter will have
different equations of state. For example, if the particles are coupled to a
heat bath at temperature T , they follow the ideal gas law (which we will
not derive here):

\[ P = \frac{NkT}{V} = \frac{Nm}{V} \frac{kT}{m} = \rho\, \frac{kT}{m} \tag{4.124} \]
where V is the volume and k is Boltzmann’s constant.

For an ensemble of massless particles in flat space, we have (4.116):
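\[ T^{00} = \rho_E \tag{4.125} \]
\[ T^{0i} = \rho_E \langle \hat{t}^i \rangle = 0 \tag{4.126} \]
\[ T^{ij} = \rho_E \langle \hat{t}^i \hat{t}^j \rangle = \frac{1}{3} \rho_E\, \delta^{ij}. \tag{4.127} \]
The 1/3 factor comes from the average value of $(\hat{t}^i)^2$ for a randomly
oriented unit vector. To find this, integrate $(x^3)^2 = \cos^2\theta$ over the unit sphere
in spherical coordinates, and divide by the total surface area:
\[ \frac{\int_0^{2\pi}\!\int_0^{\pi} \cos^2\theta\, \sin\theta\; d\theta\, d\phi}{\int_0^{2\pi}\!\int_0^{\pi} \sin\theta\; d\theta\, d\phi} = \frac{1}{3}. \tag{4.128} \]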
Thus, Tfµν follows the same tensor expression (4.123), with P = ρ/3. Al-
though each individual particle does not have a four-velocity, we can still
define the average four-velocity as U µ = (1, 0, 0, 0)T in flat space, since the
particle velocities in different directions cancel out.
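The 1/3 factor in (4.128) can be confirmed with a simple numerical integration. The sketch below uses a midpoint rule in θ; the function name and grid size are illustrative choices, and the φ integrals are omitted because they cancel between numerator and denominator.

```python
import math

def mean_cos_squared(n=2000):
    """Average of cos^2(theta) over the unit sphere, as in (4.128)."""
    d_theta = math.pi / n
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        weight = math.sin(theta) * d_theta    # area element per unit phi
        numerator += math.cos(theta) ** 2 * weight
        denominator += weight
    return numerator / denominator            # the 2*pi from phi cancels

average = mean_cos_squared()
print(average)  # close to 1/3
```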
Chapter 5
Cosmology and the expanding universe
We conclude with a basic introduction to cosmology and the history of the

universe. This will apply many of the concepts that we have learned. We
will omit most of the intermediate calculations, but you should go through
them for practice, since they are not particularly onerous.
On a large scale, the universe is homogeneous and isotropic in space ~x,
but evolves in time T . The general form of the metric under these assump-
tions is:
ds2 = −f (T )dT 2 + Hij (T, ~x)dxi dxj , (5.1)
for some 3 × 3 matrix Hij . f (T ) is some function of time but not of space,
due to spatial homogeneity. We may change to another time coordinate
t(T) so that the coefficient of $dt^2$ is −1:
\[ f(T)\,dT^2 = dt^2. \tag{5.2} \]
There are no dt dxi terms due to isotropy of space. We additionally assume

that Hij is separable into a function of time a(t)2 and a function of space
hij (~x):
ds2 = −dt2 + a(t)2 hij (~x)dxi dxj . (5.3)
This is called the Friedmann–Lemaı̂tre–Robertson–Walker (FLRW) metric.
a(t) is known as the scale factor. At fixed time t, it scales all lengths dx
to get the physical distance ds. We will also scale the coordinates ~x so
that a(t0 ) = 1, where t0 is the current time. This can be done since a(t) is
essentially constant within human timescales. In matrix form:

\[
g_{\mu\nu} = \begin{pmatrix} -1 & 0 \\ 0 & a^2 h_{ij} \end{pmatrix}, \qquad
g^{\mu\nu} = \begin{pmatrix} -1 & 0 \\ 0 & a^{-2} h^{ij} \end{pmatrix},
\tag{5.4}
\]
where $h^{ij}$ is the inverse matrix of $h_{ij}$. An increasing a over time corresponds
to an expanding universe. This can be visualized as the size of lightcones
decreasing over time in coordinate space (Fig. 5.1).
Figure 5.1: Lightcones over time for an expanding universe (da/dt > 0).
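The nonzero Christoffel symbols are:
\[ \Gamma^t_{ij} = a\dot{a}\, h_{ij}, \qquad \Gamma^i_{tj} = \frac{\dot{a}}{a}\, \delta^i_j, \qquad \Gamma^i_{jk} = \tilde{\Gamma}^i_{jk}, \tag{5.5} \]
where the dot means d/dt, and $\tilde{\Gamma}^i_{jk}$ is the Christoffel symbol calculated
with the spatial metric $h_{ij}$: $\tilde{\Gamma}^i_{jk} = \frac{1}{2} h^{il} (\partial_j h_{lk} + \partial_k h_{jl} - \partial_l h_{jk})$. In what
follows, $\tilde{A}$ will mean the quantity A calculated with the metric $h_{ij}$ and only
summing over spatial components.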
5.1 Matter, radiation, and dark energy

The matter content is described by a homogeneous ideal fluid (4.123):
Tmµν = (ρm (t) + Pm (t))U µ U ν + Pm (t)g µν . (5.6)

Due to isotropy, the velocity vector is in the t direction: U µ = (1, 0, 0, 0). It

is correctly normalized so that U µ Uµ = −1.
The full equation of motion is (4.96). We use the subscript m in (5.6)
because we will absorb the cosmological constant term in (4.96) into T µν ,
so that the total energy-momentum is:
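\[ T^{\mu\nu} = T_m^{\mu\nu} - \frac{\Lambda}{8\pi G}\, g^{\mu\nu} = (\rho + P)\, U^\mu U^\nu + P g^{\mu\nu}, \tag{5.7} \]
and the equation of motion in terms of T (4.99) becomes:
\[ R_{\mu\nu} = 8\pi G \left( T_{\mu\nu} - \frac{1}{2} T\, g_{\mu\nu} \right), \tag{5.8} \]
where we also lower the indices for later convenience.
From (5.7), $T^{\mu\nu}$ and $T_{\mu\nu}$ in matrix form are:
\[
T^{\mu\nu} = \begin{pmatrix} \rho & 0 \\ 0 & a^{-2} P\, h^{ij} \end{pmatrix}, \qquad
T_{\mu\nu} = \begin{pmatrix} \rho & 0 \\ 0 & a^2 P\, h_{ij} \end{pmatrix}.
\tag{5.9}
\]
The total effective ρ and P are:
\[
\begin{aligned}
\rho &= \rho_m + \frac{\Lambda}{8\pi G} \\
P &= P_m - \frac{\Lambda}{8\pi G}.
\end{aligned}
\tag{5.10}
\]
Finally, the trace $T = T^\mu{}_\mu = 3P - \rho$.
The total energy density and pressure are composed of several sources:
$\rho = \sum_i \rho_i$ and $P = \sum_i P_i$. Each has an equation of state:
\[ P_i = x_i\, \rho_i. \tag{5.11} \]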
• For massive particles at low temperature ($kT \ll m$), $x_i \approx 0$ (4.124).

This is commonly just called matter. In our universe, ordinary matter
makes up only ∼16% of this, while so-called dark matter makes up the
rest. Dark matter is a hypothetical form of matter that only interacts
gravitationally.
• For massless particles, xi = 1/3 (4.127). This is commonly called

radiation. It is composed of photons and neutrinos (which are nearly
massless). Sometimes, only the photons are called radiation.
• For a cosmological constant, xi = −1 (5.10). This is known as dark

energy. Specifically, dark energy is any substance with positive energy
and negative pressure that is highly uniform in space. The cosmolog-
ical constant with Λ > 0 is the simplest model of this substance that
agrees with observation. The fundamental origins of dark energy and
dark matter remain mysterious.
Typically, one source dominates at a time, so we will simply write P = xρ

and consider each x separately.
Let’s start by applying energy-momentum conservation (4.101). This
will relate ρ and P to the scale factor a. We have:
\[ \nabla_\mu T^{\mu 0} = \dot{\rho} + 3(\rho + P)\, \frac{\dot{a}}{a} = 0, \tag{5.12} \]
\[ \nabla_\mu T^{\mu i} = \tilde{\nabla}_j T^{ji} = 0, \tag{5.13} \]
where the dot denotes ∂/∂t. Note that (5.13) is automatically satisfied
since $\tilde{\nabla}_j h^{ji} = 0$.
Rewrite (5.12) as:
\[ \frac{\dot{\rho}}{\dot{a}} = \frac{d\rho}{da} = -3\, \frac{\rho + P}{a}, \tag{5.14} \]
since the dt cancels out. Thus, ρ and P are functions of a. Using P = xρ
gives:
\[ \frac{d\rho}{\rho} = -3(1 + x)\, \frac{da}{a}. \tag{5.15} \]
Integrating, we get:
ρ = ρ0 a−3(1+x) , (5.16)
where $\rho_0$ is the energy density at the current time $t_0$. For matter, $\rho \propto a^{-3}$,
since a constant amount of energy is divided by the volume $V \propto a^3$. For
radiation, $\rho \propto a^{-4}$. One way to understand this is as follows. In quantum
physics, each particle in field mode i contributes an energy $\hbar\omega_i$, so the total
energy is $E = \sum_i \hbar\omega_i N_i$, where $N_i$ is the number of particles in mode i. Since
$\omega \sim |\vec{k}| \propto 1/a$, the energy density E/V acquires another factor of 1/a.
Finally, for the cosmological constant, ρ is constant.
At small a, the universe was dominated by radiation. As a increased,
matter began to dominate, followed by dark energy (Fig. 5.2).
Figure 5.2: Schematic log-log plot of energy density ρ versus a, for various
forms of energy.
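The scaling law (5.16) can be cross-checked by integrating (5.14) numerically with P = xρ. The Python sketch below uses a hand-written fourth-order Runge-Kutta step from a = 1 to a = 2; the function names, the step count, and the choice $\rho_0 = 1$ are illustrative assumptions.

```python
def integrate_density(x, a_end, rho0=1.0, steps=10000):
    """Integrate d(rho)/da = -3 (1 + x) rho / a from a = 1 to a_end with RK4."""
    def slope(a, rho):
        return -3.0 * (1.0 + x) * rho / a

    a, rho = 1.0, rho0
    da = (a_end - 1.0) / steps
    for _ in range(steps):
        k1 = slope(a, rho)
        k2 = slope(a + 0.5 * da, rho + 0.5 * da * k1)
        k3 = slope(a + 0.5 * da, rho + 0.5 * da * k2)
        k4 = slope(a + da, rho + da * k3)
        rho += da * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        a += da
    return rho

def closed_form_density(x, a, rho0=1.0):
    """Closed-form result (5.16)."""
    return rho0 * a ** (-3.0 * (1.0 + x))

results = {}
for name, x in (("matter", 0.0), ("radiation", 1.0 / 3.0), ("dark energy", -1.0)):
    results[name] = (integrate_density(x, 2.0), closed_form_density(x, 2.0))
    print(name, results[name])
```

Doubling the scale factor dilutes matter by a factor of 8 and radiation by a factor of 16, while the cosmological constant term is unchanged.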
5.2 Shape of the universe

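Plugging (5.5) into (3.69), the Ricci tensor is:
\[ R_{tt} = -\frac{3\ddot{a}}{a} \tag{5.17} \]
\[ R_{ij} = \tilde{R}_{ij} + (2\dot{a}^2 + a\ddot{a})\, h_{ij}. \tag{5.18} \]
Plugging these into (5.8), we get:
\[ -\frac{3\ddot{a}}{a} = 4\pi G\, (\rho + 3P) \tag{5.19} \]
\[ \tilde{R}_{ij} + (2\dot{a}^2 + a\ddot{a})\, h_{ij} = 4\pi G a^2\, (\rho - P)\, h_{ij}. \tag{5.20} \]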
From the second equation, we see that
R̃ij = c2 hij (5.21)
for some constant c2 . In fact, this is already required by spatial homogeneity

and isotropy. Intuitively, for a space where every point looks the same, R̃ij
can only be formed from hij and constants. Similarly, the spatial Riemann
tensor R̃ijkl must only involve hij while satisfying the symmetries (3.68).
This gives:
R̃ijkl = c1 (hik hjl − hil hjk ) (5.22)
for some constant c1 . By contracting indices to form R̃ij , we have: 2c1 = c2 .

Mathematically, a homogeneous and isotropic space is called maximally
symmetric. There are three maximally symmetric spaces of dimension n,
one for each sign of c2 : the n-sphere S n (c2 > 0), Euclidean space E n
(c2 = 0), and hyperbolic space H n (c2 < 0). We have already encountered
the 2-sphere S 2 and hyperbolic space H 2 in Exs. 3.8 and 3.9. You can verify
that (5.22) and (5.21) hold for these spaces with c1 = c2 = ±1/R2 , where
R is the characteristic length. In general, for S n and H n , c1 = ±1/R2 and
c2 = ±(n − 1)/R2 . For our 3D space, we have c2 = 2k/R2 for k = −1, 0, or
1:
\[ \tilde{R}_{ij} = \frac{2k}{R^2}\, h_{ij}. \tag{5.23} \]
A more rigorous discussion of maximally symmetric spaces involves isome-
tries and isometry groups. We will not cover this here for brevity.
In cosmology, k = 1 is called a closed universe, since a sphere has a finite
volume. k = 0 is called a flat universe. k = −1 is called an open universe,
since a hyperbolic space has infinite volume. Observations suggest our
universe is nearly flat, but a very small curvature cannot be ruled out.
5.3 Time evolution and fate of the universe

Let us finally solve for a(t). First, solve for ä in (5.19). Then plug into
(5.20) and use (5.23) to get:
\[ \dot{a} = \sqrt{\frac{8\pi G}{3}\, a^2 \rho(a) - \frac{k}{R^2}}. \tag{5.24} \]
We take the positive square root ȧ > 0 for an expanding universe. ρ(a) is
found using (5.14) and the equations of state (5.11). This equation can
have many different behaviors depending on k and the composition of ρ.
We will only consider cases where one source of energy dominates.
First, consider the flat case $k = 0$. Set $\rho = \rho_0 a^{-p}$, with $p = 3$ in the
matter-dominated case and $p = 4$ in the radiation-dominated case. After
separation of variables and integrating, we get:
\[
a = \left[ \frac{p}{2} H_0 (t - t_0) + 1 \right]^{2/p}, \tag{5.25}
\]
where $H_0 \equiv \sqrt{8\pi G \rho_0 / 3}$. Recall that $a(t_0) = 1$ by definition. As $t \to \infty$,
$a \propto t^{2/3}$ for matter and $a \propto t^{1/2}$ for radiation. We also have $a = 0$ at time
$t_s = t_0 - 2/(pH_0)$, when space ceased to exist. This is called the big bang.
For a cosmological constant, $\rho = \rho_0$, and we have
\[
a = e^{H_0 (t - t_0)}. \tag{5.26}
\]
We are currently in the dark-energy-dominated era with such an exponential
expansion. In this case, $H_0$ is also related to $\Lambda$ as: $H_0 = \sqrt{\Lambda/3}$, using
(5.10).
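As a numerical sanity check on (5.25), the short sketch below compares a finite-difference estimate of $\dot{a}$ with the right-hand side of (5.24) for $k = 0$, which reduces to $H_0 a^{1 - p/2}$. The function names, the sample times, and the choice of units $H_0 = 1$, $t_0 = 0$ are assumptions made for illustration only.

```python
import math

def scale_factor_flat(t, p, H0=1.0, t0=0.0):
    """Scale factor (5.25) for k = 0 and rho = rho0 * a**(-p)."""
    return (0.5 * p * H0 * (t - t0) + 1.0) ** (2.0 / p)

def friedmann_rhs(a, p, H0=1.0):
    """Right-hand side of (5.24) for k = 0: H0 * a**(1 - p/2)."""
    return H0 * a ** (1.0 - 0.5 * p)

def max_residual(p, times, h=1e-6):
    """Largest |da/dt - RHS| over the sample times (central difference)."""
    worst = 0.0
    for t in times:
        a_dot = (scale_factor_flat(t + h, p) - scale_factor_flat(t - h, p)) / (2.0 * h)
        rhs = friedmann_rhs(scale_factor_flat(t, p), p)
        worst = max(worst, abs(a_dot - rhs))
    return worst

sample_times = [0.0, 0.5, 1.0, 5.0]  # in units of 1/H0, with t0 = 0
for p, label in [(3, "matter"), (4, "radiation")]:
    print(label, "max residual:", max_residual(p, sample_times))
```

The residuals should be at the level of finite-difference error, which suggests that (5.25) does solve (5.24) in both the matter and radiation cases.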
Next, consider the open universe with $k = -1$. If the curvature term
$1/R^2$ in (5.24) dominates, then we simply have a linear growth:
\[
a = \frac{t - t_0}{R} + 1. \tag{5.27}
\]
If there is matter and radiation but no cosmological constant, the curvature
term eventually dominates, since matter and radiation get diluted as $a^{-3}$ or
$a^{-4}$. If there is a cosmological constant $\Lambda > 0$, it will eventually dominate.
Finally, the most interesting case is the closed universe with $k = 1$. From
(5.24), we see that $\rho$ must be greater than the minimum density
\[
\rho_{\min} = \frac{3}{8\pi G a^2 R^2}. \tag{5.28}
\]
For matter and radiation, $\rho$ decreases as the universe expands. Once it
reaches $\rho_{\min}$, the expansion stops and the universe starts to contract,
eventually reaching $a = 0$. This is called a big crunch. For radiation with
$\rho = \rho_0 a^{-4}$, we can solve (5.24) exactly to obtain:
\[
a_{\rm rad} = \sqrt{(H_0 R)^2 - \left( H_0 R - \frac{t}{R} \right)^2}, \tag{5.29}
\]
where we shift time so the big bang is at $t = 0$. This is plotted in Fig. 5.3.
In order to get $a \geq 1$, we must have $H_0 R \geq 1$, which is the same as the
condition $\rho_0 \geq \rho_{\min}$ at $a = 1$. For matter with $\rho = \rho_0 a^{-3}$, (5.24) cannot
be solved analytically for $a(t)$, but the overall behavior is similar. For the
cosmological constant with $\rho = \rho_0$, the expansion accelerates and there is
no big crunch.
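The closed radiation-dominated solution (5.29) can be checked in the same spirit. The sketch below evaluates it for one illustrative pair of values, $H_0 = 1$ and $R = 2$ (an assumption, chosen so that $H_0 R \geq 1$), and compares its slope with (5.24) for $k = 1$, where $(8\pi G/3)\rho = H_0^2 / a^4$. The helper names are hypothetical.

```python
import math

def a_rad(t, H0, R):
    """Scale factor (5.29) for a closed, radiation-dominated universe."""
    return math.sqrt((H0 * R) ** 2 - (H0 * R - t / R) ** 2)

def adot_friedmann(a, H0, R):
    """|da/dt| from (5.24) with k = 1 and (8 pi G / 3) rho = H0**2 / a**4."""
    return math.sqrt(H0 ** 2 / a ** 2 - 1.0 / R ** 2)

def adot_numeric(t, H0, R, h=1e-6):
    """Central finite-difference estimate of da/dt."""
    return (a_rad(t + h, H0, R) - a_rad(t - h, H0, R)) / (2.0 * h)

H0, R = 1.0, 2.0           # illustrative values only
t_turn = H0 * R ** 2       # expansion stops here, where a = H0 * R
t_crunch = 2.0 * t_turn    # a returns to zero here

print("a at turnaround:", a_rad(t_turn, H0, R))
print("a at big crunch:", a_rad(t_crunch, H0, R))
print("slope check at t = 1:", adot_numeric(1.0, H0, R),
      adot_friedmann(a_rad(1.0, H0, R), H0, R))
```

In this sketch the maximum scale factor is $H_0 R$, reached at $t = H_0 R^2$, and the big crunch occurs at twice that time, by the symmetry of (5.29) about the turnaround.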
Figure 5.3: $a(t)$ in a radiation-dominated closed universe, for different
values of $H_0 R$.
You may wonder why the universe expands in the first place, since mat-
ter should cause it to contract by gravitational attraction. It comes down
to the difference between velocity and acceleration*. If we assume the
universe is expanding in the first place, which we indeed observe, then matter
causes the expansion to decelerate, as seen in the sublinear time depen-
dence of (5.25). More directly, from (5.19), the expansion decelerates if
the pressure P > −ρ/3, and accelerates otherwise. Thus, the important
feature of dark energy that causes accelerating expansion is not the energy
density itself, but rather the negative pressure.
* Zee, Einstein Gravity in a Nutshell, p. 500.

Conclusion
In this primer, I have tried to explain Einstein’s theory of relativity as simply
and deeply as possible. By keeping it to 100 pages, I had to leave out some
fascinating topics: rotating/charged black holes, causal structure/Penrose
diagrams, isometries, de Sitter/anti-de Sitter space, and differential forms,
to name a few. Nevertheless, I hope that it provides a solid foundation for
further learning. Of the textbooks I have read, I highly recommend the
following two:
• Dirac, Paul. General Theory of Relativity.
• Zee, Anthony. Einstein Gravity in a Nutshell.
Appendix A
Linear algebra
Here, we review some basic linear algebra.

Following the mathematician’s style, we start with vector spaces instead
of matrices and linear equations. This approach is more elegant but may be
unfamiliar for some readers. To guide intuition, keep in mind the idea of
a vector as an arrow in 2D or 3D space: something with a magnitude and
direction. These are added “tip to tail”, placing the tail of one vector on
the tip of another. As you read, check how each definition applies to such
vectors.
A.1 Vector spaces

A vector space consists of a set of vectors $V$ as well as vector addition and
scaling operations. The vector space is closed under vector addition and
scaling*: $\vec{v} + \vec{w} \in V$ for all $\vec{v}, \vec{w} \in V$, and $a\vec{v} \in V$ for all $\vec{v} \in V$ and $a$ a
real number†. It must also contain a zero vector $\vec{0}$, where $\vec{v} + \vec{0} = \vec{v}$ for all
$\vec{v} \in V$, and $0\vec{v} = \vec{0}$ for all $\vec{v} \in V$. We only consider real vector spaces here.
For complex vector spaces, $a$ is a complex number.
* The “∈” symbol means “is an element of”.
† Vector addition and scaling must also satisfy some boring and obvious properties like
commutativity ($\vec{v} + \vec{w} = \vec{w} + \vec{v}$), etc. For a full definition, see Wikipedia.
A linear combination of the vectors $\{\vec{v}_i\}$ is any weighted sum:
\[
\sum_i c_i \vec{v}_i \tag{A.1}
\]
where the $\{c_i\}$ are real numbers that are not all zero.
A set of vectors is linearly dependent if some linear combination of them
equals zero. Otherwise they are linearly independent.
A basis is a set of linearly independent vectors $\{\vec{e}_i\}$ in $V$ such that all
vectors $\vec{v}$ in $V$ can be formed from a linear combination of basis vectors:
\[
\vec{v} = \sum_i v^i \vec{e}_i. \tag{A.2}
\]
$v^i$ are the components of the vector $\vec{v}$. The dimension $d$ of the vector space
is the number of basis vectors. As a shorthand, we may write $\vec{v}$ as a column
vector of its components:
\[
\vec{v} = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^d \end{pmatrix}. \tag{A.3}
\]
Of course, this representation depends on the chosen basis.
The span of a set of vectors is the vector space formed from taking all
linear combinations of the vectors. Thus, we may also define a basis as a
set of linearly independent vectors that span the whole space V .
A subspace W of a vector space V is a subset of vectors in V that also
form a vector space.
The simplest vector space of dimension $d$ is $\mathbb{R}^d$, the set of $d$-tuples of
real numbers $(a_1, a_2, \cdots, a_d)$. This can be visualized as Euclidean $d$-space:
$\mathbb{R}^1$ is a line, $\mathbb{R}^2$ is a plane, etc. Every $d$-dimensional vector space $V$ has a
one-to-one correspondence with $\mathbb{R}^d$: simply choose a basis of $V$, then the
components of any vector $\vec{v} \in V$ are a vector in $\mathbb{R}^d$. One reason we do not
simply define a real vector space as $\mathbb{R}^d$ is that this correspondence depends
on the chosen basis of $V$.
* Exercise A.1
1. Show that the vectors $(1, 1, 1)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$ in $\mathbb{R}^3$
are linearly dependent.
2. If $V$ is a $d$-dimensional vector space, show that any set of vectors
with more than $d$ elements is linearly dependent.
A.2 Linear functions and matrices

A linear function $f(\vec{x})$ is any function that satisfies the distributive and
scaling properties: $f(\vec{x} + \vec{y}) = f(\vec{x}) + f(\vec{y})$ and $f(a\vec{x}) = af(\vec{x})$, for vectors
$\vec{x}, \vec{y}$ and real number $a$.
Consider a linear function between vector spaces $f : V \to W$, where $V$
has dimension $d_V$ and $W$ has dimension $d_W$. Let $V$ have a basis
$\{\vec{e}_1, \vec{e}_2, \cdots, \vec{e}_{d_V}\}$, and $W$ have a basis $\{\vec{g}_1, \vec{g}_2, \cdots, \vec{g}_{d_W}\}$. Then $f$ is
completely specified by its action on basis vectors $\{\vec{e}_i\}$, since all vectors $\vec{v}$
in $V$ can be expanded in this basis:
\[
\begin{aligned}
f(\vec{v}) &= f\Big( \sum_i v^i \vec{e}_i \Big) \\
&= \sum_i f\left( v^i \vec{e}_i \right) \\
&= \sum_i v^i f(\vec{e}_i),
\end{aligned} \tag{A.4}
\]
using the distributive and scaling properties of $f$. Let $\vec{w}_i \equiv f(\vec{e}_i)$. Each $\vec{w}_i$
has components $\{w_i^1, \cdots, w_i^{d_W}\}$. Then $f$ can be represented by a matrix $A$
with the $\{\vec{w}_i\}$ as the columns:
\[
\begin{aligned}
f(\vec{v}) &= \begin{pmatrix} \vec{w}_1 & \vec{w}_2 & \cdots & \vec{w}_{d_V} \end{pmatrix}
\begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^{d_V} \end{pmatrix} \\
&= \begin{pmatrix}
w_1^1 & w_2^1 & \cdots & w_{d_V}^1 \\
w_1^2 & w_2^2 & \cdots & w_{d_V}^2 \\
\vdots & \vdots & & \vdots \\
w_1^{d_W} & w_2^{d_W} & \cdots & w_{d_V}^{d_W}
\end{pmatrix}
\begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^{d_V} \end{pmatrix} \\
&= A\vec{v}.
\end{aligned} \tag{A.5}
\]
$A$ has $d_W$ rows and $d_V$ columns and has components $A^j_i \equiv w_i^j$, where the
upper (lower) index is the row (column) index. Then $f(\vec{v})$ can also be
written:
\[
(f(\vec{v}))^j = \sum_i A^j_i v^i. \tag{A.6}
\]
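The construction in (A.4) to (A.6) can be sketched in a few lines of Python: feed each basis vector to $f$, store the results as columns, and check that the resulting matrix reproduces $f$ on an arbitrary vector. The map `f` below is a hypothetical example from $\mathbb{R}^3$ to $\mathbb{R}^2$, and the helper names are assumptions made for illustration.

```python
def matrix_from_linear_map(f, dim_v):
    """Build A column by column: column i holds the components of f(e_i)."""
    columns = []
    for i in range(dim_v):
        e_i = [1.0 if k == i else 0.0 for k in range(dim_v)]
        columns.append(f(e_i))
    dim_w = len(columns[0])
    # A[j][i] = w_i^j: j is the row (upper) index, i the column (lower) index.
    return [[columns[i][j] for i in range(dim_v)] for j in range(dim_w)]

def apply(A, v):
    """(A v)^j = sum_i A^j_i v^i, as in (A.6)."""
    return [sum(a_ji * v_i for a_ji, v_i in zip(row, v)) for row in A]

def f(v):
    """Hypothetical linear map R^3 -> R^2, chosen only for illustration."""
    x, y, z = v
    return [x + 2.0 * y, 3.0 * z - y]

A = matrix_from_linear_map(f, 3)
v = [1.0, -2.0, 0.5]
print("A =", A)
print("A v =", apply(A, v), " f(v) =", f(v))
```

Note that only the three values $f(\vec{e}_i)$ are ever computed; linearity does the rest, which is the content of (A.4).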
A.3 Eigenvectors and eigenvalues

Let $A$ be a $d \times d$ matrix. An eigenvector $\vec{v} \neq \vec{0}$ of $A$ satisfies $A\vec{v} = \lambda\vec{v}$ for some
real number $\lambda$, called the eigenvalue. Clearly, $\vec{v}$ remains an eigenvector with
the same eigenvalue when it is scaled by any nonzero real number. The
rank of $A$ is $r = d - n_z$, where $n_z$ is the number of linearly independent
zero eigenvectors (eigenvectors with $\lambda = 0$).
Let $h(\vec{v}) = A\vec{v}$ be the linear function represented by $A$. $h$ is invertible
when $h(\vec{v}) \neq h(\vec{w})$ for all $\vec{v} \neq \vec{w}$. If $h(\vec{v}) = h(\vec{w})$ for some $\vec{v} \neq \vec{w}$, then
$h(\vec{v} - \vec{w}) = 0$, so $\vec{v} - \vec{w}$ is a zero eigenvector. Conversely, any zero eigenvector
can be written as a difference $\vec{v} - \vec{w}$ for $\vec{v} \neq \vec{w}$. Thus, the rank equals $d$ if
and only if $h$ is invertible. The inverse function $h^{-1}$ is represented by the
inverse matrix $A^{-1}$, so that $AA^{-1} = A^{-1}A = I$, where $I$ is the identity
matrix, $I_{ij} = \delta_{ij}$.
If $A$ has $d$ linearly independent eigenvectors $\{\vec{v}_i\}$, it can be written as:
\[
A = X \Lambda X^{-1}, \tag{A.7}
\]
where $X$ is the matrix whose columns are the $\{\vec{v}_i\}$:
\[
X = \begin{pmatrix} \vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_d \end{pmatrix}, \tag{A.8}
\]
and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)$ is the diagonal matrix of eigenvalues. This can
be seen by acting $A$ on $X$. Since $X^{-1}$ acts on each column of $X$ separately,
$X^{-1}\vec{v}_i$ picks out the $i$'th column of the identity matrix. $\Lambda$ then multiplies by
the corresponding eigenvalue $\lambda_i$, and $X$ turns it back into $\lambda_i \vec{v}_i$.
Two column vectors $\vec{v}, \vec{w}$ are orthogonal if $\vec{v}^T \vec{w} = 0$. A set of vectors $\{\vec{v}_i\}$
is orthonormal if $\vec{v}_i^T \vec{v}_j = \delta_{ij}$. If the $d$ eigenvectors of $A$ are orthonormal,
then $X^{-1} = X^T$ and we have:
\[
A = X \Lambda X^T = \sum_i \lambda_i \vec{v}_i \vec{v}_i^T. \tag{A.9}
\]
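To make (A.9) concrete, the sketch below builds a $2 \times 2$ matrix from a chosen orthonormal pair of vectors and a chosen pair of eigenvalues, then confirms that each vector is indeed an eigenvector with the intended eigenvalue. The angle, the eigenvalues, and the helper names are all assumptions picked for illustration.

```python
import math

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def from_spectrum(eigenvalues, eigenvectors):
    """A = sum_i lambda_i v_i v_i^T, as in (A.9); assumes orthonormal v_i."""
    d = len(eigenvectors[0])
    A = [[0.0] * d for _ in range(d)]
    for lam, v in zip(eigenvalues, eigenvectors):
        P = outer(v, v)
        for r in range(d):
            for c in range(d):
                A[r][c] += lam * P[r][c]
    return A

def is_eigenpair(A, lam, v, tol=1e-12):
    """True if A v equals lam * v component by component, within tol."""
    return all(abs(x - lam * y) < tol for x, y in zip(matvec(A, v), v))

theta = 0.3  # arbitrary angle used to build an orthonormal pair
v1 = [math.cos(theta), math.sin(theta)]
v2 = [-math.sin(theta), math.cos(theta)]
lams = [2.0, -1.0]
A = from_spectrum(lams, [v1, v2])

print(is_eigenpair(A, lams[0], v1), is_eigenpair(A, lams[1], v2))
```

The matrix built this way is symmetric, consistent with $A = X \Lambda X^T$.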
* Exercise A.2
What is the rank of the rotation matrix
\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \tag{A.10}
\]
for some angle $\theta$? Does it have any real eigenvectors when $\theta \notin \{0, \pi\}$?
A.4 Determinants and volumes

In this section, we will use the Einstein summation convention, where re-
peated indices are summed over (p. 18).
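The determinant of an $n \times n$ matrix $A$ is:
\[
\det A \equiv \frac{1}{n!} \epsilon_{i_1 i_2 \cdots i_n} \epsilon_{j_1 j_2 \cdots j_n} A_{i_1 j_1} A_{i_2 j_2} \cdots A_{i_n j_n}, \tag{A.11}
\]
where $\epsilon_{i_1 i_2 \cdots i_n}$ is the totally antisymmetric symbol: it is negated under
exchange of any two indices, and we define $\epsilon_{12 \cdots n} = 1$. For example, for
$n = 3$:
\[
\begin{aligned}
\det A &= \frac{1}{6} \epsilon_{ijk} \epsilon_{lmn} A_{il} A_{jm} A_{kn} \\
&= A_{11} A_{22} A_{33} - A_{11} A_{23} A_{32} + A_{12} A_{23} A_{31} \\
&\quad - A_{12} A_{21} A_{33} + A_{13} A_{21} A_{32} - A_{13} A_{22} A_{31}.
\end{aligned} \tag{A.12}
\]
From the definition, we also have:
\[
\det A = \det A^T. \tag{A.13}
\]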
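The double-epsilon definition (A.11) can be evaluated directly for small matrices. The sketch below is a literal, deliberately inefficient transcription: it sums over pairs of index permutations (other index choices give a vanishing antisymmetric symbol) and divides by $n!$. The function names and the test matrix are assumptions chosen for illustration.

```python
from itertools import permutations
from math import factorial

def levi_civita(indices):
    """Totally antisymmetric symbol: +1/-1 for even/odd permutations, else 0."""
    n = len(indices)
    if len(set(indices)) != n:
        return 0
    sign = 1
    for a in range(n):
        for b in range(a + 1, n):
            if indices[a] > indices[b]:  # each inversion flips the sign
                sign = -sign
    return sign

def det_epsilon(A):
    """det A via (A.11): (1/n!) eps_{i1..in} eps_{j1..jn} A_{i1 j1} ... A_{in jn}."""
    n = len(A)
    total = 0.0
    for rows in permutations(range(n)):
        for cols in permutations(range(n)):
            term = levi_civita(rows) * levi_civita(cols)
            for i, j in zip(rows, cols):
                term *= A[i][j]
            total += term
    return total / factorial(n)

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det_epsilon(A), det_epsilon(transpose(A)))
```

For this matrix, cofactor expansion along the first row gives $2(12 - 2) + 1(1 - 3) = 18$, and the transpose gives the same value, in line with (A.13).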
The determinant can be viewed as a multilinear, antisymmetric function
of the column vectors in the matrix: $\det(\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n)$. Multilinear means
linear in all arguments:
\[
\det(a\vec{v} + b\vec{w}, \vec{v}_2, \cdots, \vec{v}_n) = a \det(\vec{v}, \vec{v}_2, \cdots, \vec{v}_n) + b \det(\vec{w}, \vec{v}_2, \cdots, \vec{v}_n) \tag{A.14}
\]
for every argument. Multilinearity comes from the fact that each term in
the sum (A.11) contains exactly one element from each column.
Antisymmetry means it is negated under exchange of any two arguments. In
particular, if $\vec{v}_i = \vec{v}_j$ for any $i \neq j$, the determinant is zero. Antisymmetry comes
from the antisymmetry of $\epsilon_{j_1 j_2 \cdots j_n}$.
In fact, the determinant is the unique multilinear, antisymmetric function
of $n$ vectors that also satisfies:
\[
\det I = \det(\hat{e}_1, \hat{e}_2, \cdots, \hat{e}_n) = 1, \tag{A.15}
\]
where $\hat{e}_i$ is the standard orthonormal basis. This is because every argument
can be expanded in this basis and the determinant reduced to (constant) ×
$\det(\hat{e}_1, \hat{e}_2, \cdots, \hat{e}_n)$ using antisymmetry and multilinearity. Thus, these three
conditions are enough to define the function.
Note that this definition only agrees with the matrix definition when
the vector arguments are expanded in the standard basis. Then the $\hat{e}_i$ have
components $\hat{e}_1 = (1 \;\, 0 \;\, \cdots \;\, 0)^T$, $\hat{e}_2 = (0 \;\, 1 \;\, \cdots \;\, 0)^T$, etc. In another
basis, the $\hat{e}_i$ have different components, so the identity matrix $I$ does not
equal the matrix $(\hat{e}_1 \;\, \hat{e}_2 \;\, \cdots \;\, \hat{e}_n)$.
The determinant of a product of matrices is:
\[
\det(AB) = \det(A) \det(B). \tag{A.16}
\]
To see this, expand it as:
\[
\det(AB) = \det(A\vec{b}_1, A\vec{b}_2, \cdots, A\vec{b}_n), \tag{A.17}
\]
where the $\vec{b}_i$ are the column vectors of $B$. Viewed as a function of the $\vec{b}_i$,
this is antisymmetric and multilinear, so must equal $c(A) \det B$ for some
constant $c(A)$ depending on $A$. By a similar argument, this also equals
$c(B) \det A$ for some constant $c(B)$ depending on $B$. Thus, it must equal
$\det(A) \det(B)$, with the overall constant fixed by taking $A = B = I$, for
example.
The determinant also gives the oriented volume $\mathrm{Vol}(\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n)$ of the
parallelepiped spanned by the column vectors (Fig. A.1). Oriented volume is
defined as the usual volume but antisymmetric under exchange of vectors.
We normalize it by defining $\mathrm{Vol}(\hat{e}_1, \hat{e}_2, \cdots, \hat{e}_n) = 1$.
Figure A.1: The oriented parallelepiped spanned by $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$. It has
oriented volume $V = \det(\vec{v}_1, \vec{v}_2, \vec{v}_3)$.
Then, in order to show the determinant is the same function, we simply
have to show that Vol is multilinear. First, observe that scaling any vector
by $c$ also scales the volume by $c$. For the distributive property, we can
decompose any argument vector as $\vec{v} = \vec{v}_\parallel + \vec{v}_\perp$, where $\vec{v}_\parallel$ is parallel to the
subspace spanned by the remaining vectors and $\vec{v}_\perp$ is perpendicular to the
subspace. Then we have:
\[
\begin{aligned}
\mathrm{Vol}(\vec{v} + \vec{w}, \vec{v}_2, \cdots, \vec{v}_n) &= \mathrm{Vol}(\vec{v}_\parallel + \vec{v}_\perp + \vec{w}_\parallel + \vec{w}_\perp, \vec{v}_2, \cdots, \vec{v}_n) \\
&= \mathrm{Vol}(\vec{v}_\perp + \vec{w}_\perp, \vec{v}_2, \cdots, \vec{v}_n) \\
&= \mathrm{Vol}(\vec{v}_\perp, \vec{v}_2, \cdots, \vec{v}_n) + \mathrm{Vol}(\vec{w}_\perp, \vec{v}_2, \cdots, \vec{v}_n) \\
&= \mathrm{Vol}(\vec{v}, \vec{v}_2, \cdots, \vec{v}_n) + \mathrm{Vol}(\vec{w}, \vec{v}_2, \cdots, \vec{v}_n)
\end{aligned} \tag{A.18}
\]
On the second line, shifting $\vec{v}_\perp + \vec{w}_\perp$ by a parallel vector $\vec{v}_\parallel + \vec{w}_\parallel$ does not
affect the volume, so the parallel part vanishes. Then the perpendicular
part clearly distributes (third line). On the last line, we restore the parallel
part.
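An infinitesimal volume is defined by the displacement vectors
$\{d\vec{x}_{(1)}, d\vec{x}_{(2)}, \cdots, d\vec{x}_{(n)}\}$ as:
\[
d^n x \equiv \mathrm{Vol}(d\vec{x}_{(1)}, d\vec{x}_{(2)}, \cdots, d\vec{x}_{(n)}). \tag{A.19}
\]
Under a coordinate change $x \to x'$, we have:
\[
dx^i_{(a)} = \frac{\partial x^i}{\partial x'^j} dx'^j_{(a)} \tag{A.20}
\]
for each $a$, so that
\[
\begin{aligned}
d^n x &= \mathrm{Vol}(d\vec{x}_{(1)}, d\vec{x}_{(2)}, \cdots, d\vec{x}_{(n)}) \\
&= \mathrm{Vol}(J d\vec{x}'_{(1)}, J d\vec{x}'_{(2)}, \cdots, J d\vec{x}'_{(n)}) \\
&= |\det J| \, \mathrm{Vol}(d\vec{x}'_{(1)}, d\vec{x}'_{(2)}, \cdots, d\vec{x}'_{(n)}) \\
&= |\det J| \, d^n x',
\end{aligned} \tag{A.21}
\]
where $J^i_j \equiv \partial x^i / \partial x'^j$ is the Jacobian. We use (A.17) and the product rule (A.16)
on the third line.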
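As a brief worked illustration of (A.21), consider the familiar change from Cartesian coordinates $(x, y)$ to polar coordinates $(r, \phi)$ in two dimensions. This example is not part of the derivation above; it is only meant to show the Jacobian factor at work.

```latex
\[
x = r\cos\phi, \qquad y = r\sin\phi, \qquad
J = \begin{pmatrix}
\partial x/\partial r & \partial x/\partial\phi \\
\partial y/\partial r & \partial y/\partial\phi
\end{pmatrix}
= \begin{pmatrix}
\cos\phi & -r\sin\phi \\
\sin\phi & r\cos\phi
\end{pmatrix},
\]
\[
\det J = r\cos^2\phi + r\sin^2\phi = r,
\qquad
dx\,dy = |\det J|\,dr\,d\phi = r\,dr\,d\phi .
\]
```

This reproduces the standard polar area element, with $|\det J| = r$ for $r > 0$.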
Finally, the determinant is non-zero if and only if the matrix $A$ is invertible.
To see this, assume $A$ has a zero eigenvector $\vec{v}$, so is not invertible.
The equation $A\vec{v} = 0$ gives a linear combination of column vectors of $A$ that
equals zero, from (A.5). Thus, we can rewrite one of the column vectors
in this linear combination in terms of the others*. Then by antisymmetry
and multilinearity, the determinant equals zero. Conversely, if $A$ has no
zero eigenvectors, the column vectors are all linearly independent. Then
the volume spanned by the column vectors is non-zero.
To summarize, the following conditions on an $n \times n$ matrix $A$ are all
equivalent:
• $A$ has rank $n$
• $A$ is invertible
• The column vectors of $A$ are linearly independent
• $\det A \neq 0$
* Or, if there is only one column vector in this linear combination, it equals zero and
the determinant is trivially zero.
Appendix B
Lorentz transformation from moving clocks
We can arrive at the Lorentz transformation by first deriving time dilation.
Consider two clocks $C$ and $C'$ moving with velocity $-v/2$ and $+v/2$
respectively, starting from the origin. They are synchronized to read 0 at the
origin. They send light signals to each other when they each read time $\tau_s$.
These signals are received when they each read time $\tau_r$. Now view this
situation from an IRF where $C$ is stationary at $x = 0$ (Fig. B.1). $C'$ now
moves on the path $x = vt$. The $t$-axis in this IRF now corresponds to the
times read by $C$, but not necessarily the times read by the moving clock $C'$.
Instead, $C'$ now sends a signal at time $t = t_s$ and receives a signal at time
$t = t_r$, as measured in the IRF.
Figure B.1: Clocks $C$ and $C'$ each send signals (red) when they read $\tau_s$
and receive signals when they read $\tau_r$. In the IRF shown, $C$ is stationary at
$x = 0$.
From Fig. B.1, we have:
\[
\begin{aligned}
\tau_r &= t_s + v t_s \\
t_r &= \tau_s + v t_r.
\end{aligned} \tag{B.1}
\]
Because $\tau_s$ is arbitrary and time is homogeneous, $t_s$ (the time in the IRF)
is related to $\tau_s$ (the time on $C'$) by a constant factor: $t_s = \gamma \tau_s$. Likewise,
$t_r = \gamma \tau_r$. Combine these four equations and solve for $\gamma$. After some algebra,
we obtain:
\[
\gamma = \frac{1}{\sqrt{1 - v^2}}. \tag{B.2}
\]
Since $\gamma > 1$, the time on the moving clock ($\tau_s$) is smaller than the time in
the IRF ($t_s$). This is called time dilation. Note that as $v \to 1$, $\gamma \to \infty$. If
we take $t_s$ as finite, $\tau_s$ is “frozen” at 0. Thus, the speed of light $c = 1$ is the
speed limit for relative motion of IRFs.
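The algebra leading to (B.2) can be filled in as follows. This is one possible route, obtained by substituting $t_s = \gamma \tau_s$ into the first line of (B.1) and $t_r = \gamma \tau_r$ into the second line, then eliminating $\tau_r$:

```latex
\begin{align*}
\tau_r &= (1 + v)\, t_s = \gamma (1 + v)\, \tau_s, \\
\tau_s &= (1 - v)\, t_r = \gamma (1 - v)\, \tau_r
        = \gamma^2 (1 - v)(1 + v)\, \tau_s, \\
\gamma^2 &= \frac{1}{1 - v^2}.
\end{align*}
```

Taking the positive root gives (B.2), since the clock readings increase together.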
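Back to the Lorentz transformation. $C'$ is also the clock at $x' = 0$ in IRF
$I'$ (Fig. 1.3). It reads time $t'$. For the point $X' = (t', 0)^T$, we now have
$t = \gamma t'$ and $x = vt = v\gamma t'$. This gives two of the matrix elements of $\Lambda(v)$:
\[
\begin{pmatrix} t \\ x \end{pmatrix}
= \begin{pmatrix} \gamma & \Lambda_{12} \\ \gamma v & \Lambda_{22} \end{pmatrix}
\begin{pmatrix} t' \\ x' \end{pmatrix}. \tag{B.3}
\]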
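To find the other two matrix elements, invert Eq. (1.3):
\[
X' = \Lambda^{-1}(v) X. \tag{B.4}
\]
IRF $I'$ is moving with velocity $v$ relative to $I$, so $I$ is moving with velocity
$-v$ relative to $I'$. This gives:
\[
\Lambda^{-1}(v) = \Lambda(-v). \tag{B.5}
\]
The inverse of a $2 \times 2$ matrix is:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
= \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \tag{B.6}
\]
as you may verify. We have:
\[
\Lambda^{-1}(v) = \frac{1}{\gamma (d - bv)} \begin{pmatrix} d & -b \\ -\gamma v & \gamma \end{pmatrix}, \tag{B.7}
\]
where $b = \Lambda_{12}(v)$, $d = \Lambda_{22}(v)$. Plugging this into (B.5), we get:
\[
\frac{1}{\gamma (d - bv)} \begin{pmatrix} d & -b \\ -\gamma v & \gamma \end{pmatrix}
= \begin{pmatrix} \gamma & \bar{b} \\ -\gamma v & \bar{d} \end{pmatrix}, \tag{B.8}
\]
where $\bar{b} = \Lambda_{12}(-v)$, $\bar{d} = \Lambda_{22}(-v)$. The bottom left equation gives:
\[
\gamma (d - bv) = 1, \tag{B.9}
\]
or
\[
b = \frac{\gamma d - 1}{\gamma v}. \tag{B.10}
\]
Then the top left equation gives:
\[
d = \gamma. \tag{B.11}
\]
Plugging back into (B.10), we get:
\[
b = \frac{\gamma^2 - 1}{\gamma v} = \gamma v, \tag{B.12}
\]
using the definition of $\gamma$ (B.2).
Thus, we arrive at Eq. (1.15).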
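The result of this appendix, a matrix with first column $(\gamma, \gamma v)$ and second column $(b, d) = (\gamma v, \gamma)$, can be checked numerically against the two properties used above: (B.5) and the unit determinant implied by (B.9). The sketch below does this for one illustrative velocity in units where $c = 1$; the function names are assumptions.

```python
import math

def gamma(v):
    """Time dilation factor (B.2) in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def boost(v):
    """2x2 matrix with rows (gamma, gamma*v) and (gamma*v, gamma)."""
    g = gamma(v)
    return [[g, g * v], [g * v, g]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

v = 0.6  # illustrative velocity, giving gamma = 1.25
identity_check = matmul(boost(v), boost(-v))
print("boost(v) @ boost(-v) =", identity_check)
print("det boost(v) =", det2(boost(v)))
```

Up to rounding, the product is the identity matrix and the determinant is 1, as (B.5) and (B.9) require.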
I personally find such derivations harder to remember than the one in
the main text, because linear transformations are best understood using
their eigenvectors and eigenvalues. You may have a different view!
Appendix C
Vector calculus
C.1 Identities
Here we derive some 3D vector calculus identities using index notation,
which may be unfamiliar for some readers but is more flexible than
traditional vector notation. Here, $\partial_j \equiv \partial/\partial x^j$, and $\epsilon_{ijk}$ is the totally
antisymmetric symbol: $\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1$, $\epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1$, with all other
entries 0.
We also use the Einstein summation convention where repeated indices
are summed over.
Product of antisymmetric symbols
\[
\epsilon_{ijk} \epsilon_{klm} = \delta_{il} \delta_{jm} - \delta_{im} \delta_{jl} \tag{C.1}
\]
This is shown as follows. Since $k$ is the same index on both $\epsilon$'s, $(ij)$ must
be the same indices as $(lm)$ in some order. There are only two orderings,
corresponding to the two terms with different signs.
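Because every index in (C.1) takes only three values, the identity can also be confirmed by brute force over all 81 combinations of the free indices. The sketch below does exactly that; it uses 0-based indices, and the closed-form expression in `eps` is simply a compact way to encode the antisymmetric symbol, not something taken from the text.

```python
from itertools import product

def eps(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def identity_holds():
    """Check eps_{ijk} eps_{klm} = d_il d_jm - d_im d_jl for all i, j, l, m."""
    for i, j, l, m in product(range(3), repeat=4):
        lhs = sum(eps(i, j, k) * eps(k, l, m) for k in range(3))
        rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
        if lhs != rhs:
            return False
    return True

print(identity_holds())
```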
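Div, curl and cross product
\[
\nabla \cdot \vec{A} \equiv \partial_i A_i \tag{C.2}
\]
\[
(\nabla \times \vec{A})_i \equiv \epsilon_{ijk} \partial_j A_k \tag{C.3}
\]
\[
(\vec{A} \times \vec{B})_i \equiv \epsilon_{ijk} A_j B_k \tag{C.4}
\]
Dot product of cross product
\[
\begin{aligned}
\vec{A} \cdot (\vec{B} \times \vec{C}) &= A_i \epsilon_{ijk} B_j C_k \\
&= \vec{C} \cdot (\vec{A} \times \vec{B}) \\
&= \vec{B} \cdot (\vec{C} \times \vec{A})
\end{aligned} \tag{C.5}
\]
Cross product of curl
\[
\begin{aligned}
\left[ \vec{A} \times (\nabla \times \vec{B}) \right]_i &= \epsilon_{ijk} A_j \epsilon_{klm} \partial_l B_m \\
&= A_j \partial_l B_m (\delta_{il} \delta_{jm} - \delta_{im} \delta_{jl}) \\
&= A_j \partial_i B_j - A_j \partial_j B_i
\end{aligned} \tag{C.6}
\]
Div of curl
\[
\nabla \cdot (\nabla \times \vec{A}) = \partial_i \epsilon_{ijk} \partial_j A_k = 0 \tag{C.7}
\]
due to the antisymmetry of $\epsilon_{ijk}$.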
C.2 Chain rule

The multidimensional chain rule is:
\[
\frac{df}{dx^j}(\vec{y}(\vec{x})) = \frac{\partial f}{\partial y^i}(\vec{y}(\vec{x})) \, \frac{\partial y^i}{\partial x^j}(\vec{x}) \tag{C.8}
\]
for a function $f(\vec{y}(\vec{x}))$. You can visualize this using lines connecting all
possible paths to a variable:
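This extends to further nested functions:
\[
\frac{df}{dx^j}(\vec{z}(\vec{y}(\vec{x}))) = \frac{\partial f}{\partial z^k}(\vec{z}(\vec{y}(\vec{x}))) \, \frac{\partial z^k}{\partial y^i}(\vec{y}(\vec{x})) \, \frac{\partial y^i}{\partial x^j}(\vec{x}) \tag{C.9}
\]
for a function $f(\vec{z}(\vec{y}(\vec{x})))$.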
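A quick numerical illustration of (C.8): the sketch below differentiates a composite function directly and via the sum over intermediate variables, using central finite differences for every partial derivative. The maps `y` and `f` and the evaluation point are hypothetical choices made only for this example.

```python
import math

def y(x):
    """Hypothetical inner map R^2 -> R^2."""
    return [x[0] * x[1], x[0] + math.sin(x[1])]

def f(yv):
    """Hypothetical outer function R^2 -> R."""
    return yv[0] ** 2 + 3.0 * yv[1]

def partial(func, point, j, h=1e-6):
    """Central finite-difference estimate of d func / d point[j]."""
    up = list(point)
    down = list(point)
    up[j] += h
    down[j] -= h
    return (func(up) - func(down)) / (2.0 * h)

def chain_rule(x, j):
    """Right-hand side of (C.8): sum_i (df/dy^i)(y(x)) * (dy^i/dx^j)(x)."""
    yx = y(x)
    return sum(
        partial(f, yx, i) * partial(lambda p, i=i: y(p)[i], x, j)
        for i in range(len(yx))
    )

x0 = [0.7, -1.2]
direct = [partial(lambda p: f(y(p)), x0, j) for j in range(2)]
via_chain = [chain_rule(x0, j) for j in range(2)]
print(direct, via_chain)
```

The two lists should agree to within finite-difference error, with the sum over $i$ playing the role of the "all possible paths" picture.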

Appendix D
Riemann tensor components
The Riemann tensor $R_{\rho\nu\sigma\alpha}$ satisfies:
\[
\begin{aligned}
R_{\rho\nu\sigma\alpha} = -R_{\rho\nu\alpha\sigma} = -R_{\nu\rho\sigma\alpha} & \quad \text{(Antisymmetry)} \\
R_{\rho\nu\sigma\alpha} + R_{\rho\sigma\alpha\nu} + R_{\rho\alpha\nu\sigma} = 0 & \quad \text{(First Bianchi identity)}
\end{aligned} \tag{D.1}
\]
The number of independent components of $R_{\rho\nu\sigma\alpha}$ in $d$ dimensions is:
\[
\frac{d^2 (d^2 - 1)}{12}. \tag{D.2}
\]
For $d = 4$, this is 20.
One derivation is as follows. First, any component with 3 or 4 of the
same index is zero due to antisymmetry. The next largest number of common
indices is 2 of one index and 2 of another, such as $R_{1212}$. Next are
components with 3 different indices, such as $R_{1213}$. Finally, there are
components with all different indices, such as $R_{1234}$. The number of ways to
choose each index pattern is given in Table D.1 as a function of $d$. For
example, for $d = 4$, there is only $\binom{4}{4} = 1$ way to choose 4 different indices.
Here, $\binom{n}{k} \equiv n!/(k!(n - k)!)$ is the choose function.
Index pattern    # choices              Antisymmetry    Antisymmetry+Bianchi
$R_{1212}$       $\binom{d}{2}$         1               1
$R_{1213}$       $\binom{d}{2}(d - 2)$  2               1
$R_{1234}$       $\binom{d}{4}$         6               2
Table D.1: Number of ways to choose each index pattern, and number of
components after imposing antisymmetry or antisymmetry + Bianchi.
For each index pattern, we can apply antisymmetry and the first Bianchi
identity to any particular choice of indices. First, pattern $R_{1212}$ has only 1
component due to antisymmetry. Pattern $R_{1213}$ has 2: $R_{1213}$ and $R_{1312}$.
These are related by Bianchi, reducing the count to 1. Pattern $R_{1234}$ has
$\binom{4}{2} = 6$ by antisymmetry. Applying Bianchi to each index in the first
position gives 4 independent equations, reducing the count to 2. These
are summarized in Table D.1. Multiplying the number of choices per index
pattern by the number of components per choice, and summing, gives:
\[
\binom{d}{2} + \binom{d}{2}(d - 2) + \binom{d}{4} \cdot 2 = \frac{d^2 (d^2 - 1)}{12}. \tag{D.3}
\]
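The counting identity (D.3) is easy to spot-check for small $d$. The sketch below evaluates both sides with exact integer arithmetic; the function names are assumptions, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def count_by_pattern(d):
    """Left-hand side of (D.3): choices per pattern times components per choice."""
    r1212 = comb(d, 2) * 1            # two distinct indices, each used twice
    r1213 = comb(d, 2) * (d - 2) * 1  # three distinct indices
    r1234 = comb(d, 4) * 2            # four distinct indices
    return r1212 + r1213 + r1234

def closed_form(d):
    """Right-hand side of (D.3): d^2 (d^2 - 1) / 12."""
    return d * d * (d * d - 1) // 12

for d in range(1, 8):
    print(d, count_by_pattern(d), closed_form(d))
```

For $d = 4$ the three patterns contribute $6 + 12 + 2 = 20$ components, matching the value quoted after (D.2).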
