The Poor Man's Introduction to Tensors
Justin C. Feng
Author's Note
In the spring semester of 2013, I took a graduate fluid mechanics class taught by Philip J. Morrison. On the first
homework assignment, I found that it was easier to solve some of the problems using tensors in the coordinate basis,
but I was uncomfortable with it because I wasn't sure if I was allowed to use a formalism that had not been taught in
class. Eventually, I decided that if I wrote an appendix on tensors, I would be allowed to use the formalism. Sometime
in 2014, I converted[1] the appendix into a set of introductory notes on tensors and posted it on my academic website,
figuring that they might be useful for someone interested in learning the topic.
Over the past three years, I had become increasingly dissatisfied with the notes. The biggest issue with these notes
was that they were typeset in Mathematica; during my early graduate career, I typeset my homework assignments in
Mathematica, and it was easiest to copy and paste the appendix into another Mathematica notebook (Mathematica
does have a feature to convert notebooks to TeX, but much of the formatting is lost in the process). Furthermore, I had
become extremely dissatisfied with the content; the notes contained many formulas that weren't sufficiently justified,
and there were many examples of sloppy reasoning throughout. At one point, I had even considered removing the
notes from my website, but after receiving some positive feedback from students who benefited from the notes, I
decided to leave it up for the time being, vowing to someday update and rewrite these notes in TeX. My teaching
duties and dissertation work have delayed my progress (it has been more than two years since my last update to these
notes), but I have finally found the time to properly update these notes.
These notes have been rewritten; this means that I have re-typed the notes from beginning to end, revising and
adding to the notes as I went along. What I did not do was copy and paste text from my old notes and edit the
result; every single word in this document has been re-typed. I did this to ensure that all of the shortcomings of the
earlier versions (those that I am aware of, at least) are addressed. All sections have been rewritten and expanded. The
overall organization of these notes is the same as it was in the original, except for the addition of a section on index
gymnastics and a section at the end which covers surface integrals, the divergence theorem, and Stokes' theorem.
Exercises have been added to the end of each section to enhance the reader's understanding of the material. An index
has been added as well. Though I have added a lot of content, I have attempted to maintain the succinctness of the
original notes, and resisted the temptation to include a full-blown treatment of differential forms and Riemannian
geometry; I'll save those for a separate set of notes.
The title, "The Poor Man's Introduction to Tensors," is a reference to Gravitation by Misner, Thorne and Wheeler,
which characterizes simplified approaches to a problem as "the poor man's way to do X." Originally, these notes were
intended to be a short, informal primer on tensors, and were by no means a substitute for a more formal and complete
treatment of the subject. However, I fear that in my effort to overcome the shortcomings of the previous version,
these notes have become too formal to justify the label "Poor Man's" in the original sense of a simplified, informal
treatment of the topic. On the other hand, I have tried to rewrite these notes in a manner that is still accessible to
anyone with a basic training in linear algebra and vector analysis, and I promise to always make these notes freely
available on the web; these notes are in this sense "The Poor Man's Introduction to Tensors."
If you find any errors or have any suggestions for these notes, feel free to contact me at: [email protected]
Have Fun!
Justin C. Feng
Austin, Texas
September 2017
[1] This involved adding some content, in particular the material on the Levi-Civita tensor and integration, and the removal of frivolous
content; the original appendix was full of jokes and humorous remarks, which I often placed in my homework assignments to entertain
myself (and also the grader).
The Poor Man's Introduction to Tensors

Justin C. Feng
Physics Department, The University of Texas at Austin
(Dated: September 2017)
When solving physical problems, one must often choose between writing formulas in a coordinate
independent form, or a form in which calculations are transparent. Tensors are useful because they
provide a formalism that is both coordinate independent and transparent for performing calculations.
In particular, tensors facilitate the transformation of partial differential equations and the formulas
of vector calculus to their corresponding forms in curvilinear coordinates. In these notes, I provide
an introduction to tensors in Euclidean space for those who are familiar with the basics of linear
algebra and vector calculus.
I. INTRODUCTION
These notes were written for a broad audience; I wrote these notes to be accessible to anyone with a basic knowledge
of linear algebra and vector calculus.[2] I have done my best to build up the subject from first principles; the goal of
these notes is not to simply teach you the mechanics of the formalism,[3] but to provide you with a fundamental
understanding of what tensors are. Since these notes are intended for a broad audience, I will avoid discussion of
General Relativity and non-Euclidean geometry, and focus instead on developing the formalism for ordinary three-
dimensional Euclidean space. In addition to providing a fundamental understanding of what tensors are, these notes
are intended to provide you with the tools to effortlessly write down explicit expressions for Partial Differential
Equations and integrals in a general curvilinear coordinate system.[4]
When learning a new topic, I often find it helpful to identify the central ideas and principles first; I usually get more
out of the topic when I do so. For your convenience, I present to you, in a single paragraph, the essence of tensor
analysis:
Simply put, a tensor is a mathematical construction that eats a bunch of vectors, and spits out a
scalar. The central principle of tensor analysis lies in the simple, almost trivial fact that scalars are
unaffected by coordinate transformations. From this trivial fact, one may obtain the main result of tensor
analysis: an equation written in tensor form is valid in any coordinate system.
In attempting to condense the essence of tensor analysis into a single paragraph, I have left out many important
details. For instance, the definition for tensors in the first sentence is an incomplete one; in particular, it leaves out
the fact that tensors are linear maps, as well as the fact that tensors also eat other objects called dual vectors.
These details will be discussed in the remainder of these notes.
III. INDEX NOTATION

If you are already familiar with indices, it may be tempting to skip this section. However, I emphasize some
important points in this section; at the very least, make sure to take note of the boldfaced text.
Indices (the plural of index) provide a useful way to organize a large number of quantities, be they variables,
functions, or abstract elements of sets. They are particularly useful when you have a large collection of equations
that all have a similar form. Before I tell you what an index is, I'd like to provide a quick motivating example first
to sell you on the formalism. Suppose you encounter a physical system which is described by 89 different variables.
If you attempt to represent each variable with a single Latin or Greek letter, you will run out of letters before you
could write down all the variables for the system!
An index is written as a superscript or a subscript that we attach to a symbol; for instance, the subscript letter i in
q_i is an index for the symbol q, as is the superscript letter j in p^j an index for the symbol p. Indices often represent
positive integer values; as an example, for q_i, i can take on the values i = 1, i = 2, i = 3, and so on. In this way, I can
represent all 89 variables by simply writing down q_i, with the understanding that i can have any integer value from
1 to 89. In particular, q_i provides a simple way to represent the full list of 89 variables:
q1 , q2 , q3 , q4 , q5 , q6 , q7 , q8 , q9 , q10 , q11 , q12 , q13 , q14 , q15 , q16 , q17 , q18 , q19 , q20 , q21 , q22 , q23 , q24 ,
q25 , q26 , q27 , q28 , q29 , q30 , q31 , q32 , q33 , q34 , q35 , q36 , q37 , q38 , q39 , q40 , q41 , q42 , q43 , q44 , q45 , q46 ,
q47 , q48 , q49 , q50 , q51 , q52 , q53 , q54 , q55 , q56 , q57 , q58 , q59 , q60 , q61 , q62 , q63 , q64 , q65 , q66 , q67 , q68 ,
q69 , q70 , q71 , q72 , q73 , q74 , q75 , q76 , q77 , q78 , q79 , q80 , q81 , q82 , q83 , q84 , q85 , q86 , q87 , q88 , q89
This is a pain to write out by hand; it's much easier to just write q_i, with i representing integer values from 1 to 89.
[2] For those who are unfamiliar with these topics and those who need a refresher, I can suggest a few books (and a short summary). Linear
Algebra: Step by Step by K. Singh covers all the linear algebra concepts that I assume of the reader. There is also a short 4-page summary
in [25], which summarizes the topics covered in the recent (crudely-titled) book No Bullshit Guide to Linear Algebra by Ivan Savov. The
book Div, Grad, Curl, and All That by H. M. Schey [26] provides an excellent informal introduction to vector calculus. I learned the
basics from the book Mathematical Methods in the Physical Sciences by Mary Boas [4].
[3] In these notes, the word formalism is defined as a collection of rules and techniques for manipulating symbols. A good formalism should
provide a systematic way of writing down a complicated mathematical operation in a much simpler form. One of my goals in writing these
notes is to show you how the formalism of tensors simplifies coordinate transformations for PDEs and integrals.
[4] Curvilinear coordinates on Euclidean space are defined as coordinate systems in which the coordinate lines are curved.
I now consider a more concrete example. Many problems in physics and engineering are formulated in Cartesian
coordinates on three-dimensional Euclidean space. For a three-dimensional Euclidean space, Cartesian coordinates
refer to the three variables x, y, and z. Instead of using three different variables x, y, and z to describe the coordinates
of a point, I can instead write x^i, where i can be any number in the list (1, 2, 3). Explicitly, I write the following:
x^1 = x
x^2 = y        (3.1)
x^3 = z
A word of caution: the superscripts 1, 2, and 3 are NOT exponents! The superscripts 1, 2, and 3 are simply
labels telling you which coordinate you are referring to. To be clear: the 2 in x^2 means that x^2 represents coordinate
number 2; it DOES NOT mean x·x!
You might be wondering why I choose to represent the coordinates x^i with a superscript, rather than a subscript
(for instance, I could have written x_i instead). Though this is partly a matter of convention, the use of superscripts
for coordinate indices is widely used. In any case, I feel that I must emphasize this convention:
In these notes, coordinate indices will always be superscripts
This may seem to be overly pedantic, but I'm doing this because I want to emphasize and alert you to the fact that
in tensor analysis, INDEX PLACEMENT IS IMPORTANT! In case it isn't clear, index placement refers to
whether the index is a superscript or a subscript. I'll take this opportunity to introduce you to some terminology: a
superscript index, like the j in p^j, is called a raised index, and a subscript index, like the i in q_i, is called a lowered
index.
Indices may also be used to describe a vector in three-dimensional Euclidean space. Typically, we write v⃗ to represent
a vector. In three dimensions, we use three numbers to describe a vector, so that for a vector v⃗ in Euclidean space
(assuming Cartesian coordinates), v⃗ represents the list of three numbers (v_x, v_y, v_z), with v_x being the component of
the vector in the x-direction, v_y being the component of the vector in the y-direction, and so on. In index notation,
I may write v⃗ as v^i, so that:

v^1 = v_x
v^2 = v_y        (3.2)
v^3 = v_z

The formulas above (3.2) allow me to identify v^1 as the component of the vector in the x^1-direction, v^2 as the
component of the vector in the x^2-direction, and so on. The expression v^i is therefore a compact way to express the
components of a vector.
Note that I also use raised indices (superscripts) for vectors. This is because introductory courses tend to characterize
vectors as the difference between two points, which in index notation may be written as Δx^i. Since the index i in
Δx^i is raised, vector components defined in this manner should have raised indices.
Index notation may be extended to vector formulas in a straightforward manner. Consider the following well-known
formula:
F⃗ = m a⃗        (3.3)

where F⃗ and a⃗ are vectors, and m is some positive number. To convert this to index form, I replace the arrows with
indices:

F^i = m a^i        (3.4)
Equation (3.4) is just a compact way of writing the following three equations:
F^1 = m a^1
F^2 = m a^2        (3.5)
F^3 = m a^3
Index notation may also be used to describe square matrices. Recall that one may write down a square matrix M
in three dimensions as the following table of numbers:
      ⎛ M_11  M_12  M_13 ⎞
M =   ⎜ M_21  M_22  M_23 ⎟        (3.6)
      ⎝ M_31  M_32  M_33 ⎠
It is tempting to write the components of the matrix as M_ij, and this is often done. I will do this for now, but don't
get too comfortable with it; I will change my conventions in just a bit. In index notation, I may write the eigenvector
formula M v⃗ = λ v⃗ as:

Σ_{j=1}^{3} M_ij v^j = λ v^i        (3.7)
For me, it is uncomfortable to deliberately write this formula down; the feeling is kind of like hearing the sound of
fingernails scraping a chalkboard. This is due to a mismatch in the placement of indices. Notice that on the right-
hand side of (3.7), the index i is raised, but on the left-hand side, the index i is lowered. When a matrix acts on a
vector, the result must also be a vector, but according to my earlier convention, vectors must have raised indices. If
the left-hand side of (3.7) is to form the components of a vector, the index i must also be raised.
If I insist that vector indices must be raised, then the proper way to express the components of a matrix is M^i_j,
so that the eigenvector formula (3.7) becomes:

Σ_{j=1}^{3} M^i_j v^j = λ v^i        (3.8)
At first, this may seem rather awkward to write, since it suggests that the individual matrix elements be written as
M^1_1, M^1_2, ... etc. However, I assure you that as you become more comfortable with tensor analysis, equation (3.8)
will seem less awkward to write than (3.7).
While I'm on the topic of matrices, I'd like to introduce the Kronecker delta δ^i_j, which is just the components of
the identity matrix. Specifically, δ^i_j is defined in the following way:

        ⎧ 1 if i = j                ⎛ δ^1_1  δ^1_2  δ^1_3 ⎞   ⎛ 1  0  0 ⎞
δ^i_j = ⎨                   δ^i_j = ⎜ δ^2_1  δ^2_2  δ^2_3 ⎟ = ⎜ 0  1  0 ⎟        (3.9)
        ⎩ 0 if i ≠ j                ⎝ δ^3_1  δ^3_2  δ^3_3 ⎠   ⎝ 0  0  1 ⎠

The quantities δ_ij and δ^ij are similarly defined, and are also referred to as Kronecker deltas. To drive home the point
that INDEX PLACEMENT IS IMPORTANT, it turns out that δ^i_j form the components of a tensor, but δ_ij
and δ^ij do not; after I present the definition for a tensor, I will give you an exercise (exercise IX.3) where you show
this.
I hope I am not frustrating you with my obsession about index placement. The whole point of this is to alert
you to the importance of index placement in tensor analysis. I'll write it again: INDEX PLACEMENT IS
IMPORTANT! My shouting is especially directed to those who have been exposed to Cartesian tensors.[1] Since
I have emphasized this point repeatedly, it is appropriate for me to give you some idea of why index placement is
important. The placement of the index is used to distinguish two types of objects:[2] vectors v^i and dual vectors w_i.
This distinction will be critically important in upcoming sections and we will not be able to proceed without it.
Exercise III.1
In your own hand, write down the following sentence three times on a sheet of paper:
INDEX PLACEMENT IS IMPORTANT!
Exercise III.2
Let M, P, and Q be n × n matrices, with the respective components M^i_j, P^i_k, and Q^k_j (I choose the
letters for the indices on the components to help you out). Rewrite the formula M = PQ (which involves
matrix multiplication) using index notation. Your result should contain a sum over the values of one index.
[1] Many treatments of tensor analysis begin by studying Cartesian tensors (tensors expressed exclusively in Cartesian coordinates), and when
doing so, the distinction between raised and lowered indices is often ignored. One example is the treatment of tensors in [4], which I used
as an undergraduate, and also [15]. I understand that the intent in taking this approach is to provide a gentler introduction to tensors,
but I feel that this approach obscures the essence of tensors, and can lead to a great deal of confusion when moving on to curvilinear
coordinates.
[2] I'll take this opportunity to introduce some terminology, which I will repeat later on. Vectors v^i are often called contravariant vectors,
and dual vectors w_i are often called covariant vectors.
IV. EINSTEIN SUMMATION CONVENTION

I've been told on more than one occasion something to the effect that Albert Einstein's greatest contribution to
physics and mathematics[1] is his invention of the Einstein summation convention, which is the following rule:
Any time you see a pair of indices (one raised and one lowered) written with the same
symbol, a sum is implied.
For instance, given the matrix components M^i_j and vector components v^i, Einstein summation convention states
that when you see an expression of the form M^i_j v^j, there is a sum over the index j, since the letter j appears twice.
More explicitly, Einstein summation convention states that the following expression:

M^i_j v^j        (4.1)

represents the following explicit sum:

Σ_{j=1}^{3} M^i_j v^j        (4.2)

I state this again: Einstein summation convention states that when I write down the expression M^i_j v^j, I should
automatically assume that there is a sum over any index that appears twice (again, one must be raised and the other
lowered).
Einstein invented this convention after noting that the sums which appear in calculations involving matrix and
vector products always occur over pairs of indices. At first, Einstein summation convention seems like a potentially
dangerous thing to do; if you think about it, we're basically taking a sum like the one in equation (4.2)[2] and erasing
the summation symbol Σ_j. You might imagine that erasing summation symbols Σ_j will produce a bunch of ambiguities
in the formalism. However, Einstein summation convention works (in the sense that it is unambiguous) because when
performing a tensor calculation, the indices you sum over always come in a pair: one raised and one lowered. If you
encounter more than two repeated indices or a pair of indices that are both raised or both lowered, you have either
written down a nonsensical expression, or you have made a mistake.
Summation convention does have one limitation. If you want to refer to a single term in the sum, but you don't
want to specify which one, you have to state that there is no sum implied. One way to get around this (though it is
not standard[3]) is to underline the index pairs that you do not sum over, for instance: M^i_a v^a (with the index a underlined).
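If you like to experiment, the implied sums of the convention map directly onto numpy's einsum; the short sketch below (my addition, not part of the original notes) checks that the implied sum in M^i_j v^j reproduces the ordinary matrix-vector product.

    import numpy as np

    # Components M^i_j and v^j in some basis; the values are arbitrary.
    M = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [4.0, 0.0, 5.0]])
    v = np.array([1.0, 2.0, 3.0])

    # 'ij,j->i' says: the repeated index j is summed, the free index i survives.
    u = np.einsum('ij,j->i', M, v)

    assert np.allclose(u, M @ v)  # the implied sum is just matrix multiplication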
Exercise IV.1

If you did Exercise III.2 properly, you would have noticed that your sum is consistent with Einstein's
observation: the symbol for the indices that you sum over should appear twice in your expression, with one
of the indices raised and the other lowered. If your result is not consistent, fix this. Write out the matrix
multiplication MPQ in index notation using explicit sums (again, assume that M, P, and Q are n × n
matrices). This time, you should get two sums, and, if you do this correctly, you will find that Einstein's
observation[4] holds in this case as well.
[1] I've also been told that Einstein himself made a remark to this effect.
[2] To be clear, summation convention DOES NOT apply to formula (4.2); by including the summation symbol Σ_j, I have made the
summation explicit.
[3] What is usually done is to explicitly state that there is no sum (see, for instance, page 9 of [17]).
[4] Namely, the observation that sums in matrix and vector products occur over pairs of indices.
V. VECTORS
Before I provide the definition of a tensor, I must first provide a definition for a vector; in fact, you will later see
that a vector is itself an example of a tensor.
A vector is simply a directional derivative. Before you write me off as a nut, examine the directional derivative of
some function f(x^a):

∇_v f(x^a) = v^i ∂f/∂x^i        (5.1)
where x^a represent Cartesian coordinates on Euclidean space.[1] I use Einstein summation convention in equation
(5.1); in Einstein summation convention, the index i in the partial derivative ∂/∂x^i is treated as if it were a
lowered index. Now I remove the function from equation (5.1) to obtain the directional derivative operator:

∇_v = v^i ∂/∂x^i   ⇒   ∇_v = v^1 ∂/∂x^1 + v^2 ∂/∂x^2 + v^3 ∂/∂x^3        (5.2)
Now compare this with the vector v⃗ written out in terms of the orthogonal unit vectors ê_a (if you are more familiar
with the unit vectors î, ĵ, and k̂, then you can imagine that ê_1 = î, ê_2 = ĵ, and ê_3 = k̂):

v⃗ = v^i ê_i   ⇒   v⃗ = v^1 ê_1 + v^2 ê_2 + v^3 ê_3        (5.3)

Side-by-side, equations (5.2) and (5.3) suggest that partial derivatives ∂/∂x^i can be thought of as basis vectors!
This is essentially what I'm claiming when I say that a vector is simply a directional derivative. The basis of partial
derivatives, by the way, is called the coordinate basis.
Of course, in order for me to say that a vector is a directional derivative operator, I must show that the directional
derivative operator contains the same information as the explicit components (v^1, v^2, v^3). I can do one
better; I can show you how to extract the components (v^1, v^2, v^3) from the directional derivative operator ∇_v.
Let's see what happens when I feed the trivial function f = x^3 into the directional derivative operator:

∇_v x^3 = v^i ∂x^3/∂x^i = v^1 ∂x^3/∂x^1 + v^2 ∂x^3/∂x^2 + v^3 ∂x^3/∂x^3        (5.4)
Coordinates are independent of each other, so ∂x^i/∂x^j = 0 if i ≠ j and ∂x^i/∂x^j = 1 if i = j. A compact way of
writing this is:

∂x^i/∂x^j = δ^i_j        (5.5)

where δ^i_j is the Kronecker delta defined in (3.9). Equation (5.4) is then:

∇_v x^3 = v^i δ^3_i = v^3        (5.6)
This result (equation (5.6)) means that all I have to do to pick out a component v^i of a vector is to feed the
corresponding coordinate x^i into the directional derivative operator. In fact, you can even define the components v^i
of the vector v⃗ this way:

v^i := ∇_v x^i        (5.7)
The whole point of this discussion is to motivate (in an operational sense) the definition of a vector as the following
operator:

v(·) := ∇_v(·) = v^i ∂/∂x^i (·)   ⇒   v = v^i ∂/∂x^i        (5.8)
[1] Often, indices are suppressed in the arguments of a function: it is typical to write f(x) rather than f(x^a). However, there are some
situations in which it is convenient to use the same symbol for two (or more) different quantities, with the distinction provided by the
number of indices; for instance, one might write p_i for a vector and p_ij for a matrix.
I drop[1] the arrow on the vector v to indicate that it is an operator now (with (·) being the place where you insert a
function). Given the definition (5.8), the components (5.7) may be rewritten as:[2]

v^i := v(x^i)        (5.9)

where the right-hand side is to be interpreted as the (directional derivative) operator v(·) in (5.8) acting on the
coordinate x^i.
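To make the operator definition (5.8) and the component extraction (5.9) concrete, here is a minimal sympy sketch (my addition, not from the original notes); the components v^i are chosen arbitrarily.

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    coords = (x1, x2, x3)
    v_components = (2, -1, 5)  # an arbitrary choice of (v^1, v^2, v^3)

    def v(f):
        # The operator of (5.8): v(f) = v^i df/dx^i
        return sum(vi * sp.diff(f, xi) for vi, xi in zip(v_components, coords))

    # Feeding the coordinates into the operator recovers the components, as in (5.9):
    print([v(xi) for xi in coords])  # prints [2, -1, 5]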
To give you some intuition for the definition (5.8), I'll relate it to an example that may be more familiar to you.
Consider a curve in Euclidean space. I may parameterize the curve by the functions x^i(t); in particular, I write
the coordinates of points that lie along the curve as functions of a parameter t. The notation here is meant to be
suggestive; you might imagine that x^i(t) describes the motion of a particle with respect to time. If I take the derivative
of x^i(t) with respect to t, I obtain a vector v^i that is tangent to the curve x^i(t) (the tangent vector to the curve):

v^i = dx^i/dt        (5.10)
Note that this is a local expression; it can be evaluated at a single point in space. To see how this relates to the
directional derivative definition (5.8) for a vector, consider a function[3] φ(x^a(t)). Using the chain rule, the derivative
of φ(x^a(t)) with respect to t is:

dφ/dt = (dx^i/dt) ∂φ/∂x^i = v^i ∂φ/∂x^i = v(φ)        (5.11)

so the t-derivative of φ along the curve is just the directional derivative operator (5.8), built from the tangent vector
components v^i = dx^i/dt, acting on the function φ.
You can find a discussion of manifolds and their mathematical underpinnings in [2], [27], [20], [11], and [19]. Some General Relativity
textbooks, such as [31] and [5], will also introduce and discuss the concept of a manifold.
[1] Another reason for doing this is to get you comfortable with notation that is commonly used in the literature.
[2] Vectors that have components v^i with raised indices are sometimes called contravariant vectors.
[3] If it helps, you could imagine that φ is the potential energy for a particle, with a corresponding force F⃗ = −∇⃗φ. If x^a(t) is the trajectory
of the particle, then (5.11) computes (minus) the power P = dW/dt (W being work) applied to the particle by the force F⃗: dφ/dt = −F⃗·v⃗ = −P.
[4] When working with tensors, you need to get rid of the idea that a single vector can be defined everywhere in space, if you haven't already.
This idea only makes sense in Euclidean space, and it is only useful in Cartesian coordinates. Though these notes will focus on curvilinear
coordinates in Euclidean space, the study of non-Euclidean spaces is the main motivation for doing tensor analysis, particularly in General
Relativity.
Exercise V.1
In this section, I used Einstein summation convention throughout. Identify all the equations in this section
that use Einstein summation convention, and re-insert the summation symbols Σ_i.
Exercise V.2
In linear algebra, a vector is defined as an element of a vector space. Show that the directional derivative
operator (5.8) is indeed an element of a vector space.
Exercise V.3

You can describe a circle of radius R in Euclidean space as a curve x^i(φ) parameterized by the parameter
φ, with the coordinates x^i(φ) explicitly given by the following formulas:

x^1(φ) = R cos(φ)
x^2(φ) = R sin(φ)
x^3(φ) = 0

Use these formulas to obtain the components of the tangent vector v^i(φ) = dx^i/dφ. Draw a circle (use a
compass!) on a sheet of graph paper, and draw arrows representing the vector v^i for various values of
φ, with the tail ending at the point of the circle corresponding to each value of φ. In doing so, convince
yourself that v^i is indeed a vector tangent to the circle.
VI. THE METRIC GENERALIZES THE DOT PRODUCT

You should be familiar with the dot product u · v for two vectors u and v. Note that I have again dropped the
arrow notation for vectors; for the remainder of these notes, the symbols u and v will be used exclusively for vectors.
In index notation, the dot product can be written as (assuming Cartesian coordinates on Euclidean space):

u · v = δ_ij u^i v^j        (6.1)

where δ_ij is the Kronecker delta with lowered indices (δ_ij = 1 for i = j and δ_ij = 0 for i ≠ j).
The dot product is an example of an inner product[1] ⟨u, v⟩ for vectors u, v, which is a generalization of the dot product.
The inner product may be written in terms of a quantity g_ij called the metric:

⟨u, v⟩ = g_ij u^i v^j        (6.2)

Since inner products are symmetric, ⟨u, v⟩ = ⟨v, u⟩, the metric has the following symmetry:

g_ij = g_ji        (6.3)

We require the existence of an inverse metric[2] g^ij, defined as the solution to the following equation:

g^ik g_kj = δ^i_j        (6.4)

Recall that the Kronecker delta δ^i_j, defined in equation (3.9), is just the components of the identity matrix; this
justifies the term "inverse metric" for g^ij.
For dot products in Cartesian coordinates on Euclidean space (⟨u, v⟩ = u · v), we see that the metric is g_ij = δ_ij.
Explicitly, the metric may be written as the following table (the metric is not exactly a matrix, since both indices are
lowered):

        ⎛ g_11  g_12  g_13 ⎞   ⎛ 1  0  0 ⎞
g_ij =  ⎜ g_21  g_22  g_23 ⎟ = ⎜ 0  1  0 ⎟        (6.5)
        ⎝ g_31  g_32  g_33 ⎠   ⎝ 0  0  1 ⎠

It should not be too hard to infer (see exercise VI.1 below) that the components of the inverse metric are g^ij = δ^ij.
I'll take a moment to explain the meaning of the metric components g_ij. Recall that a three-dimensional vector may
be written in terms of three linearly independent basis vectors e_i in the following way:

v = v^i e_i        (6.6)

Here, I do not assume that e_i are orthogonal unit vectors; they may not be of unit length and they may not be
orthogonal to each other. Since e_i are themselves vectors, I can take inner products of the basis vectors: ⟨e_i, e_j⟩.
Using the properties[3] of inner products, you can show (see exercise VI.2) that ⟨e_i, e_j⟩ form the components of the
metric tensor:

g_ij = ⟨e_i, e_j⟩        (6.7)

Thus, the metric components g_ij are just inner products between basis vectors. In the previous section, I introduced
the idea that partial derivatives ∂/∂x^i are basis vectors. Equation (6.7) in turn suggests that the metric components
may be written as inner products of the coordinate basis vectors:

g_ij = ⟨∂/∂x^i, ∂/∂x^j⟩        (6.8)
[1] In relativity, we often use a definition for inner products that may be slightly different from those used in your linear algebra classes (see
definition 4.1 in [29]). In particular, we define an inner product by the following four properties for vectors p, u, v and scalar α:
1. ⟨u + p, v⟩ = ⟨u, v⟩ + ⟨p, v⟩
2. ⟨αu, v⟩ = α⟨u, v⟩
3. ⟨u, v⟩ = ⟨v, u⟩
4. Nondegeneracy: there exists no nonzero vector u such that ⟨u, v⟩ = 0 holds for all vectors v
The difference is in property 4; in linear algebra classes, the positive-definite condition (⟨v, v⟩ ≥ 0 for all v) is often used in place of
property 4. In special relativity, we use property 4 because special relativity requires a notion of inner product for which ⟨v, v⟩ can have
any value for nonzero vectors v.
[2] Confusingly, some authors refer to both g_ij and g^ij as "the metric." You can get away with this, since the index placement in g^ij is used
to indicate that it is the inverse of g_ij, and g_ij and g^ij can be thought of as two different ways of writing down the same information
(since in principle, you can get one from the other).
[3] See footnote 1.
In the literature, you may encounter statements to the effect that the metric provides a way to measure distances
in space. This is because the metric can be used to construct the line element:

ds² = g_ij dx^i dx^j        (6.9)

which can be thought of as the norm ⟨dx, dx⟩ of an infinitesimal displacement vector dx^i. In the usual x-y-z variables
for Cartesian coordinates, the line element may also be written as (the superscript 2 in (6.10) is an exponent, not an
index):

ds² = dx² + dy² + dz²        (6.10)
Given a curve x^i(t) parameterized by a parameter t, the line element can be used to measure distances along the
curve. The line element (6.9) can be thought of as the square of the infinitesimal distance ds along the curve, and
the distance Δs between any two points x^i_1 = x^i(t_1) and x^i_2 = x^i(t_2) is given by the following integral:

Δs = ∫_{t_1}^{t_2} √( g_ij (dx^i/dt) (dx^j/dt) ) dt        (6.11)
You can derive (6.11) from (6.9) by taking the square root of (6.9) to get a formula for ds, and using the physicist's
trick of multiplying and dividing by differentials:

ds = √( g_ij dx^i dx^j ) = √( g_ij (dx^i/dt) (dx^j/dt) dt² ) = √( g_ij (dx^i/dt) (dx^j/dt) ) dt        (6.12)
One can then integrate equation (6.12) to obtain (6.11).
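As a worked illustration (my addition, not from the original notes), the sketch below applies (6.11) with g_ij = δ_ij to one turn of a helix; the curve and the symbols R, h are my own choices.

    import sympy as sp

    t, R, h = sp.symbols('t R h', positive=True)
    x = (R * sp.cos(t), R * sp.sin(t), h * t)  # a helix x^i(t)

    # The integrand of (6.11): sqrt( g_ij (dx^i/dt)(dx^j/dt) ) with g_ij = delta_ij
    integrand = sp.simplify(sp.sqrt(sum(sp.diff(xi, t)**2 for xi in x)))

    s = sp.integrate(integrand, (t, 0, 2 * sp.pi))
    print(s)  # -> 2*pi*sqrt(R**2 + h**2), the length of one turn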
I must make one last remark before concluding the present discussion. The metric/inner product can be
thought of as a (linear) "machine" that eats two vectors and spits out a scalar. We will later see
that tensors can be defined by a similar characterization.
Exercise VI.1

Show (or argue) that the components of the inverse metric g^ij are given by g^ij = δ^ij, where δ^ij = 1 for
i = j and δ^ij = 0 for i ≠ j.
Exercise VI.2

Show that the inner products between basis vectors e_i form the components of the metric g_ij; in particular,
show that g_ij = ⟨e_i, e_j⟩. You can do this by writing out ⟨u, v⟩ as the explicit sum (dropping Einstein
summation convention):

⟨u, v⟩ = ⟨ Σ_{i=1}^{3} u^i e_i , Σ_{j=1}^{3} v^j e_j ⟩

Here, it is appropriate to think of u^i and v^i as scalars in the sense that they are just coefficients in front
of the basis vectors e_i. In particular, show that:

⟨ Σ_{i=1}^{3} u^i e_i , Σ_{j=1}^{3} v^j e_j ⟩ = Σ_{i=1}^{3} Σ_{j=1}^{3} u^i v^j ⟨e_i, e_j⟩   ⇒   ⟨u, v⟩ = Σ_{i=1}^{3} Σ_{j=1}^{3} u^i v^j ⟨e_i, e_j⟩

by expanding the sums into nine terms and using the property ⟨αu, v⟩ = α⟨u, v⟩ (where α is a scalar)
on each of the nine terms. Finally, compare the result (specifically the formula to the right of the arrow
symbol ⇒) to equation (6.2).
Exercise VI.3

Consider a circle of radius R in Euclidean space. Recall the formulas for the parameterization of the circle
x^i(φ) from exercise V.3. Using the metric g_ij = δ_ij, show that the integral (6.11) yields the correct value
for the circumference of the circle.
VII. DUAL VECTORS

In this section, I wish to introduce a new type of vector, which I call a dual vector (also called a one-form[1], or
a covariant vector). Simply put, a dual vector is a quantity that eats a vector and spits out a scalar (or a real
number). Explicitly, a dual vector w(·) is an operator that does the following:

w(v) = w_i v^i        (7.1)

where w_i are the components of the dual vector (note the lowered index!). In a similar manner, you can think of
vectors v as being dual to dual vectors; vectors can be thought of as operators that eat dual vectors and spit
out scalars:[2]

v(w) = v^i w_i        (7.2)
I will take the opportunity here to introduce some notation. It is standard to write expressions like (7.1) and (7.2)
using the following inner product notation:

⟨w, v⟩ = ⟨v, w⟩ = w_i v^i = v^i w_i        (7.3)

This differs from the inner product in equation (6.2) in that one of the arguments in ⟨ , ⟩ is a dual vector. There is no
ambiguity here; if both of the arguments are vectors, ⟨ , ⟩ is given by equation (6.2), and if one of the arguments is a
vector and one is a dual vector, ⟨ , ⟩ is given by equation (7.3). Later, I'll give you an exercise where you show that
⟨ , ⟩ as defined in (7.3) satisfies the same linearity properties as the usual inner product. In particular, for vectors v, p,
dual vectors w, q, and scalars α, β, γ, δ, we have the properties:

⟨αv + βp, γw + δq⟩ = α⟨v, γw + δq⟩ + β⟨p, γw + δq⟩ = γ⟨αv + βp, w⟩ + δ⟨αv + βp, q⟩
                  = αγ⟨v, w⟩ + αδ⟨v, q⟩ + βγ⟨p, w⟩ + βδ⟨p, q⟩        (7.4)
The notion that vectors and dual vectors are dual to each other (in the sense of equations (7.1), (7.2) and (7.3))
is central to tensor calculus; we will return to this in the next section. For now, I must say a few more things about
dual vectors.

A natural set of basis elements (or basis dual vectors) for the dual vector are the coordinate differentials dx^i, so
that a dual vector may be written as:[3]

w = w_i dx^i        (7.5)
This is in contrast to vectors, which have partial derivatives ∂/∂x^i as basis elements (or basis vectors). The sense in
which coordinate differentials dx^i form a natural set of basis elements comes from the differential of a function, which
is an example of a dual vector. The differential df of a function f(x^a) is defined as:

df := (∂f/∂x^i) dx^i        (7.6)

The components of the differential are just the components of the gradient of the function. To simplify things, I'll use
the symbol G_i to represent the components of the gradient:

G_i := ∂f/∂x^i   ⇒   df = G_i dx^i        (7.7)
The index i in G_i is lowered because, as stated earlier, the index i in the partial derivative ∂/∂x^i is treated as a lowered
index in Einstein summation convention. To see that G_i do indeed form the components of a dual vector (in the
sense of equation (7.1)), apply the directional derivative operator v as defined in (5.8) to the function f. Using (7.7),
I can write the following expression for v(f):

v(f) = v^i ∂f/∂x^i = v^i G_i        (7.8)
[1] Dual vectors are referred to as one-forms because they are an example of a class of tensors called differential forms, which are beyond the
scope of these notes.
[2] I'm being a bit sloppy here, since the dual of a dual vector is technically a double dual, not a vector. However, the vector space formed by
double duals is isomorphic to the vector space that the vector v lives in, so it possesses the same algebraic structure. This is why we can
get away with treating double duals as if they are vectors.
[3] Given equation (7.5), you might wonder if a dual vector w can always be written as a differential of some scalar function f. Equivalently,
you might ask whether there always exists a function f such that w_i = ∂f/∂x^i. It turns out that in general, you can't; this is only possible
if w_i satisfies the (Frobenius) integrability condition: ∂w_i/∂x^j = ∂w_j/∂x^i (as an exercise, show that the gradient components G_i satisfy this
property). This integrability condition is a special case of the Frobenius theorem, the details of which can be found in [3, 11, 16, 27].
Upon comparing the above with equation (7.1) and recalling that v(f) is a scalar, we see that the G_i can be used to
form the components of a dual vector G = df that eats v and spits out a scalar.

The basis elements dx^i are themselves dual vectors, in the same way that basis vectors are themselves vectors, be
they partial derivatives ∂/∂x^i or unit vectors ê_i. There must also be a sense in which dx^i eats vectors and spits out
scalars. Equation (7.3), combined with the expressions v = v^i ∂/∂x^i and w = w_i dx^i, can be used to infer the following
formula for the basis elements:

⟨dx^j, ∂/∂x^i⟩ = ⟨∂/∂x^i, dx^j⟩ = δ^j_i        (7.9)

The formula above states that when dx^j eats a basis vector ∂/∂x^i, it spits out 1 or 0 depending on whether i = j.[1]
It also states that when the basis vector ∂/∂x^j eats dx^i, it also spits out 1 or 0 depending on whether i = j.
When the Differential df Eats a Vector

To give you some intuition for dual vectors, imagine that at some point p, the components of the vector v^i form the coordinates for
some 3d Euclidean space, which I'll call T_pM (which is a fancy name for the tangent space of a manifold; recall the remarks at the end of
section V). We can construct scalar functions φ(v^a) on T_pM, just like we can in ordinary Euclidean space. A dual vector w is just a linear
function φ(v^a) = w_i v^i on the Euclidean space T_pM. It is sometimes helpful to visualize what's going on: φ(v^a) is a linear function, so
surfaces defined by constant values for φ are 2d planes in the 3d Euclidean space T_pM. If you imagine the vector v as an arrow in T_pM
(with the tail at the origin), the tip of the vector lies on the 2d plane corresponding to the value of φ at the tip (w(v) = φ(v^a)).[2]

If you would like a less abstract (but sloppier) example, recall that the differential df is a dual vector, which eats a vector v^i and spits
out a scalar:

df(v) = G_i v^i        (7.10)

In Cartesian coordinates on Euclidean space, displacement vectors Δx^i make sense. Now pretend for a moment that v is a small
displacement vector v^i = Δx^i. If the magnitude of v is small enough, then we can interpret df(v) as the change in value of the function
f between the tip and tail of the displacement vector v^i = Δx^i:

df(v) ≈ f(tip of v) − f(tail of v)

If you think about it, this isn't terribly surprising, since this is virtually the same statement as the definition of the differential (7.6):
df = G_i dx^i, if you imagine that dx^i is a small displacement vector.
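A quick numerical check of this interpretation (my addition, not part of the original notes; the function f and the displacement are arbitrary choices):

    import numpy as np

    def f(x):
        return x[0]**2 + 3.0 * x[1] * x[2]

    def G(x):
        # Gradient components G_i = df/dx^i, computed by hand for this f
        return np.array([2.0 * x[0], 3.0 * x[2], 3.0 * x[1]])

    tail = np.array([1.0, 2.0, -1.0])
    dx = 1e-4 * np.array([0.3, -0.2, 0.5])  # a small displacement v^i

    df_v = G(tail) @ dx  # df(v) = G_i v^i, as in (7.10)
    print(df_v, f(tail + dx) - f(tail))  # the two numbers nearly agree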
Exercise VII.1
Show that ⟨v, w⟩ = v^i w_i (equation (7.3)) satisfies the linearity properties (7.4).
Exercise VII.2

Convince yourself of (7.9). You can do this by writing out ⟨v, w⟩ explicitly:

⟨v, w⟩ = ⟨ v^i ∂/∂x^i , w_j dx^j ⟩ = ⟨ v^1 ∂/∂x^1 + v^2 ∂/∂x^2 + v^3 ∂/∂x^3 , w_1 dx^1 + w_2 dx^2 + w_3 dx^3 ⟩
[1] In this context, you should think of ∂/∂x^j as an abstract basis vector, rather than a partial derivative.
[2] If you tried to learn about differential forms from [18], this is essentially what the whole "bongs of a bell" example is all about. I recall
being bewildered by the explanation in [18], but my good friend Luis Suazo eventually clarified it for me.
VIII. SOME BASIC INDEX GYMNASTICS

In section III, I repeatedly shouted the statement: INDEX PLACEMENT IS IMPORTANT! If you have had
exposure to Cartesian tensors, you may wonder how so many authors can get away with ignoring the issue of index
placement. This section will reveal the reason why you can get away with this in Cartesian coordinates.
If you have a metric g_ij, you can convert vectors to dual vectors. Explicitly, this may be done by performing the
following operation on the components v^i:

v_j = g_ij v^i        (8.1)

The quantities v_j form the components of a dual vector v, which may be written as:

v(·) = ⟨v, ·⟩        (8.2)

where the argument (·) requires a vector. If we feed a vector u into the dual vector v, we recover the inner product
(6.2) between vectors u and v:

v(u) = ⟨v, u⟩ = g_ij v^i u^j        (8.3)

Note that the notation v_j introduced in (8.1) provides a slick way to write the inner product: ⟨v, u⟩ = v_j u^j.
We see that the metric can turn vectors into dual vectors. It would be nice to do the reverse; I'd like to take a dual
vector and turn it into a vector. To do this, recall that we require the existence of an inverse metric g^ij, defined as
the solution to equation (6.4): g^ik g_kj = δ^i_j. The inverse metric g^ij can then be used to turn the components of a dual
vector w_j into the components of a vector w^i:

w^i = g^ij w_j        (8.4)

The process of using metrics and inverse metrics to convert vectors to dual vectors and vice versa is called the
lowering and raising of indices. In Cartesian coordinates, the metric is just the Kronecker delta g_ij = δ_ij, as is the
inverse metric g^ij = δ^ij. As a result, the raising and lowering of indices will not change the value of the components:
in Cartesian coordinates, v_1 will have the same value as v^1, v_2 will have the same value as v^2, and so on. This is why
one can get away with neglecting index placement when working with Cartesian tensors. In curvilinear coordinates,
however, the metric no longer has this simple form, and index placement becomes important.[1]
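The following numpy sketch (my addition, not from the original notes) runs (8.1) and (8.4) with an arbitrary non-trivial metric, where raised and lowered components genuinely differ; with g_ij = δ_ij the two sets of components would coincide.

    import numpy as np

    g = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 0.0, 1.0]])  # a symmetric, invertible metric g_ij
    g_inv = np.linalg.inv(g)         # the inverse metric g^ij, per (6.4)

    v_up = np.array([1.0, 2.0, 3.0])              # components v^i
    v_down = np.einsum('ij,i->j', g, v_up)        # v_j = g_ij v^i, eq. (8.1)
    v_back = np.einsum('ij,j->i', g_inv, v_down)  # w^i = g^ij w_j, eq. (8.4)

    print(v_down)                     # differs from v_up, since g is not delta
    assert np.allclose(v_back, v_up)  # raising undoes lowering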
The raising and lowering of indices is part of a formalism for manipulating indices called index gymnastics. I won't
go through all of the techniques in the formalism here, but you will encounter most of the rules in one way or another
in these notes.[2] On the other hand, I would like to introduce a few of them before moving on to the next section.
An important technique is the contraction of indices, which is a generalization of the trace operation for matrices.
Given an object with two indices or more, index contraction is the process of relabeling (and repositioning) an
index so that it has the same symbol as another index. Einstein summation convention then implies that there is a
sum over the indices. For a matrix M with components M^i_j, the contraction of indices is just the trace:

M^i_i = M^1_1 + M^2_2 + M^3_3        (8.5)
Now consider a quantity with three indices: Q^ijk (don't worry about visualizing it; just think about it as a collection
of 3 × 3 × 3 = 27 variables). If I wish to contract the indices i and k, I can use the metric g_ia to lower the index i to
obtain a quantity Q_a^jk:

Q_a^jk = g_ia Q^ijk        (8.6)

I then relabel the index a: explicitly, I replace the symbol a with the symbol k to get Q_k^jk. The result is Q^j, which
are the components of a vector:

Q^j = Q_k^jk = g_ik Q^ijk        (8.7)
Index contraction provides a way to reduce the number of indices for a quantity. There is an operation, called the
tensor product, which can be used to construct quantities with more indices. The tensor product is a straightforward
[1] A professor in one of my undergraduate classes once made a remark to the following effect: You can spend the rest of your life working
in Cartesian coordinates, but it would be a very miserable life!
[2] If you want, see page 85 of [18] for a list of the techniques that make up index gymnastics.
operation: given the components of a vector v^i and a dual vector w_i, multiply them to form the components of a
matrix[1] K^i_j:

K^i_j := v^i w_j        (8.8)
Tensor products can be performed on quantities with more indices. For instance, you could multiply the quantity
Q^ijk with the matrix components M^l_m to form the components of a big 5-indexed quantity B^ijkl_m:

B^ijkl_m := Q^ijk M^l_m        (8.9)
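In numpy, a tensor product and a subsequent contraction are both one-line einsum calls; this sketch (my addition, not from the notes) builds K^i_j = v^i w_j as in (8.8) and contracts it as in (8.5).

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])    # v^i
    w = np.array([4.0, 0.5, -1.0])   # w_i

    K = np.einsum('i,j->ij', v, w)   # tensor product K^i_j = v^i w_j, eq. (8.8)
    trace = np.einsum('ii->', K)     # contraction K^i_i, the trace of (8.5)

    assert np.isclose(trace, v @ w)  # K^i_i = v^i w_i, a scalar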
A very useful tool for index manipulation is the relabeling of dummy indices. Dummy indices refer to the indices
that I sum over, for instance the indices i and k in the expression g_ik Q^ijk (8.7). They are called "dummy" indices
because I can change the symbols that are being summed over without affecting the meaning of the expression. For
instance, the expressions w_i v^i and w_j v^j have the same exact meaning:

w_i v^i = w_j v^j        (8.10)

If the above isn't clear, it may help to write the above (8.10) as an explicit sum (we drop Einstein summation
convention here):

Σ_{i=1}^{3} w_i v^i = Σ_{j=1}^{3} w_j v^j = w_1 v^1 + w_2 v^2 + w_3 v^3        (8.11)
A more illustrative example is the expression g_ik Q^ijk in (8.7). I can change the label i to a and k to s in the expression
g_ik Q^ijk to obtain g_as Q^ajs; both g_ik Q^ijk and g_as Q^ajs have the same meaning:

g_ik Q^ijk = g_as Q^ajs        (8.12)

If you need more convincing, write both sides of (8.12) as explicit sums and expand; you should see that in each case,
you get the same result (exercise VIII.4). The relabeling of dummy indices can be an extremely useful tool when
dealing with long expressions; it is particularly useful for identifying quantities that are equivalent to each other (see
exercise VIII.5).
Another set of techniques is the symmetrization and antisymmetrization of indices. A quantity S_ij is said to be
symmetric in the indices i and j if it satisfies the property S_ij = S_ji. Given some quantity P_ij that is not symmetric
in i and j, I can symmetrize the indices i and j by performing the following operation:

P_(ij) := (1/2) (P_ij + P_ji)        (8.13)

where P_(ij) is called the symmetric part of P_ij. A quantity A_ij is said to be antisymmetric in the indices i and j if it
satisfies the property A_ij = −A_ji. If P_ij is not antisymmetric in i and j, I can antisymmetrize the indices i and
j by performing the following operation:

P_[ij] := (1/2) (P_ij − P_ji)        (8.14)

where P_[ij] is called the antisymmetric part of P_ij.
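A short numerical check of (8.13) and (8.14) (my addition, not from the notes; it verifies the symmetry properties on a random example, it does not prove them):

    import numpy as np

    rng = np.random.default_rng(0)
    P = rng.normal(size=(3, 3))      # a generic quantity P_ij

    P_sym = 0.5 * (P + P.T)          # P_(ij), eq. (8.13)
    P_antisym = 0.5 * (P - P.T)      # P_[ij], eq. (8.14)

    assert np.allclose(P_sym, P_sym.T)           # P_(ij) = P_(ji)
    assert np.allclose(P_antisym, -P_antisym.T)  # P_[ij] = -P_[ji]
    assert np.allclose(P, P_sym + P_antisym)     # the two parts rebuild P_ij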
Exercise VIII.1

Let u_i and v_j be dual vector components obtained from the respective vector components u^i and v^j. Show
that the inner product ⟨u, v⟩ can be written as: g^ij u_i v_j. Also show that u^i v_i is equivalent to u_i v^i.

[1] A warning: In general, it is not always possible to write matrix components as a product of vector components. One example is the
identity matrix/Kronecker delta δ^i_j; there exist no vectors that can generate δ^i_j by way of formula (8.8).
Exercise VIII.2

Take the trace of δ^i_j, assuming 3d Euclidean space. Now do the same for 2d Euclidean space (in 2d
Euclidean space, the indices i and j only take on two values: 1 and 2). Do the same for 4d Euclidean
space. You should be able to deduce the result for the trace in n-dimensional Euclidean space. What
happens when you contract the indices of δ_ij and δ^ij?
Exercise VIII.3

Consider the quantity Z_ijkl. Contract the indices k and l, and raise an index to obtain a matrix S^i_j. Write
down the expression for S^i_j in terms of the original quantity Z_ijkl, the metric g_ij, and the inverse metric
g^ij. Contract the indices of S^i_j, and write down the result in terms of Z_ijkl, g_ij and g^ij.
Exercise VIII.4

Write g_ik Q^ijk and g_as Q^ajs as explicit sums. Now expand the result (you should have nine terms), and in
doing so, show that equation (8.12) is valid: g_ik Q^ijk = g_as Q^ajs.
Exercise VIII.5

Relabel dummy indices to show that the following three quantities (which are all scalars) are equivalent
to g_jl Q^ijk M^l_i v_k:

g_ki Q^jkl M^i_j v_l
Q^kil v_l M^j_k g_ij
v_d M^b_c g_ab Q^cad

It is quite possible to do this by inspection, but if you wish to show this explicitly, you can do this by
writing out the expression each time you relabel a pair of indices, proceeding until you have the expression
g_jl Q^ijk M^l_i v_k.
Exercise VIII.6

Let A_ij be an antisymmetric quantity, so that A_ij = −A_ji, and S_ij be a symmetric quantity, so that
S_ij = S_ji. Show that if you raise the indices of both quantities to obtain A^ij and S^ij, they satisfy the
same properties: A^ij = −A^ji and S^ij = S^ji. Use this result to show that A^ij S_ij = 0 and A_ij S^ij = 0.
Exercise VIII.7

Show that if you symmetrize an antisymmetric quantity A_ij (A_ij = −A_ji), you get zero: A_(ij) = 0.
Likewise, show that if you antisymmetrize a symmetric quantity S_ij (S_ij = S_ji), you also get zero:
S_[ij] = 0. Finally, show that if you have a quantity P_ij that is neither symmetric nor antisymmetric
(P_ij ≠ P_ji and P_ij ≠ −P_ji), then you can decompose it into its symmetric and antisymmetric parts;
in other words, show that P_ij = P_(ij) + P_[ij].
IX. COORDINATE INVARIANCE AND TENSORS

In this section, I will finally reveal to you the precise definition of a tensor, and show how it follows from the
principle that the value of a scalar function at a point is unaffected by coordinate transformations. This is because
coordinates are just labels or names that we give to points in space.[1] Now I must make an important distinction
here: the principle I mentioned is only valid if I think of scalar functions as functions of points, not coordinates.
From this section on, I will always define scalar functions so that scalar functions are functions of points, not
coordinates. It follows that a coordinate transformation can change how a scalar function depends on coordinates,
but a coordinate transformation does NOT change how a scalar function depends on points.[2]
A vector is a geometric quantity. In kindergarten,[3] we are taught that vectors consist of a magnitude and direction.
How we represent the magnitude and the direction of the vector depends on the coordinates. Again, coordinates are
just labels or names that we give to points in space, so they should not alter the geometric properties of a vector
(its magnitude, for instance); geometry does not care about the way we choose to name points in space.
Intuitively, this means that coordinate transformations cannot change the meaning of an inner product or a directional
derivative. More precisely, both inner products and directional derivatives acting on a function yield scalars,
and the value of a scalar at a point should not depend on the name (coordinates) we give to that point. In other
words, if I evaluate the following scalar quantities at some point p, the numerical values should be unaffected by
coordinate transformations:

⟨u, v⟩ = g_ij u^i v^j        (9.1)

v(f) = ∇_v f        (9.2)

On the other hand, the values for the vector components u^i, v^i and the values for the gradient components G_i = ∂f/∂x^i
do change under coordinate transformations. To proceed, I need to know the transformation law for vectors; in
particular, I want to know how the components v^i of the vectors change under a coordinate transformation.
Before I derive the transformation law for vectors, let me first describe the coordinate transformation I wish
to perform. Consider a coordinate transformation y^α(x^a), where y^α are my new coordinates, and x^i are my old
coordinates. Lowercase Greek[4] letters α, β, γ, ... will be used for indices corresponding to new coordinates y^α, and
lowercase Latin indices a, b, c, ..., i, j, k, ... will be used for indices corresponding to the old coordinates x^i. I assume
that the functions y^α(x^a) are invertible, so that I can obtain from them the functions x^i(y^α) (and vice versa). I can take
derivatives of the functions y^α(x^a) and x^i(y^α) to obtain the transformation matrices; the quantities ∂x^j/∂y^α and ∂y^α/∂x^j form
the components of the transformation matrices. The chain rule tells me that the derivatives/transformation matrices
∂x^j/∂y^α and ∂y^α/∂x^j are inverses of each other in the following way:

(∂x^i/∂y^α) (∂y^α/∂x^j) = ∂x^i/∂x^j = δ^i_j
                                                        (9.3)
(∂y^α/∂x^i) (∂x^i/∂y^β) = ∂y^α/∂y^β = δ^α_β
The last equality comes from the assumption that the coordinates are independent of each other (cf. (5.5)).
The chain rule also tells us how the components of the gradient transform:

∂f/∂y^α = (∂x^j/∂y^α) (∂f/∂x^j)        (9.4)

As discussed in section VII, the components of the gradient G_i = ∂f/∂x^i are the components of a dual vector, which
suggests the following transformation law for the components of dual vectors:

w_α = (∂x^j/∂y^α) w_j        (9.5)
[1] See chapter 1 of [18] to see how points in space can have a physical meaning independent of coordinate labels.
[2] For this reason, all formulas in this section should be thought of as being evaluated at the same point, irrespective of the coordinate values
for that point.
[3] Steven Weinberg often makes this remark when recalling an elementary concept in his lectures.
[4] If you plan to study relativity, I must alert you to the fact that this notation is nonstandard (usually, primes are put on indices). In
relativity, Greek indices are typically reserved for coordinates on spacetime, and Latin indices are either used to denote spatial indices,
spacetime indices in the orthonormal basis (see for instance Appendix J in [5]), or as abstract indices used to keep track of tensor
arguments (the latter is called abstract index notation, which is discussed in [31]).
If v is a vector and w is a dual vector, then the value of w(v) = w_i v^i, being a scalar, cannot be affected by a coordinate
transformation. If the value of w(v) is to remain unchanged under a coordinate transformation, the transformation law
for v^i must be "opposite"[1] to that of w_i:

v^α = (∂y^α/∂x^i) v^i        (9.6)

Under a coordinate transformation, I can then write the following expression for w(v):

w(v) = w_α v^α = (∂x^j/∂y^α) w_j (∂y^α/∂x^i) v^i = (∂x^j/∂y^α)(∂y^α/∂x^i) w_j v^i = δ^j_i w_j v^i = w_i v^i   ⇒   w_α v^α = w_i v^i        (9.7)

This computation (9.7) demonstrates that the transformation laws (9.5) and (9.6), combined with (9.3), guarantee
that w(v) is unchanged under coordinate transformations. In short, I have shown that:

If v^i transforms in a manner that is opposite to the transformation of w_i under a change
of coordinates, then the transformations in w(v) = w_i v^i cancel out, ensuring that the value
of the scalar w(v) remains unchanged under coordinate transformations.
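Here is a numerical sketch of this cancellation (my addition, not part of the original notes), using a 2d Cartesian-to-polar transformation at a sample point; the Jacobian entries are hand-computed from r = √(x² + y²) and θ = arctan(y/x).

    import numpy as np

    x, y = 1.0, 2.0
    r2 = x**2 + y**2
    r = np.sqrt(r2)

    # J[a, i] = dy^a/dx^i, rows (r, theta), columns (x, y)
    J = np.array([[ x / r,  y / r],
                  [-y / r2, x / r2]])
    J_inv = np.linalg.inv(J)   # dx^i/dy^a

    v = np.array([3.0, -1.0])  # old vector components v^i
    w = np.array([0.5,  2.0])  # old dual vector components w_i

    v_new = J @ v          # v^a = (dy^a/dx^i) v^i, eq. (9.6)
    w_new = J_inv.T @ w    # w_a = (dx^j/dy^a) w_j, eq. (9.5)

    assert np.isclose(w_new @ v_new, w @ v)  # the scalar w(v) is unchanged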
Inner products are invariant under coordinate transformations. This demand establishes the transformation law for
the metric tensor. If you examine the expression (9.1) for the inner product ⟨u, v⟩ = g_ij u^i v^j, you may realize that the
indices of the metric tensor must acquire transformation matrices that cancel out the transformation matrices that
the vector components u^i and v^i acquire under a coordinate transformation. The metric tensor therefore satisfies the
following transformation law:

g_αβ = (∂x^i/∂y^α) (∂x^j/∂y^β) g_ij        (9.8)

Since the inner product can also be written as ⟨u, v⟩ = g^ij u_i v_j (see exercise VIII.1), the inverse metric can be
similarly shown to satisfy the following transformation law:

g^αβ = (∂y^α/∂x^i) (∂y^β/∂x^j) g^ij        (9.9)
Now recall the characterization of the metric at the end of section VI as a (linear[2]) "machine" that eats two
vectors and spits out a scalar. One might imagine a more general construction, which eats any number of vectors
and dual vectors, and spits out a scalar. This construction is what we call a tensor, which is defined by the
statement:

A tensor is a linear map that eats vectors and/or dual vectors and spits out a scalar.

By linear map,[3] I mean that a tensor must be a linear function[4] of the things it eats, and also that a tensor vanishes
when it "eats zero," meaning that if you feed a tensor a zero vector v^i = 0 or a dual vector with all vanishing
components w_i = 0, it automatically returns zero. These conditions imply that a tensor that eats one vector v and
one dual vector w must have the following form:

T(w, v) = T^i_j w_i v^j        (9.10)

Note that the above formula is linear in both w_i and v^j, and vanishes if either w_i = 0 or v^j = 0 (as per the "eats
zero" property). For T(w, v) to be a scalar, the components T^i_j must transform in the following way:

T^α_β = (∂y^α/∂x^i) (∂x^j/∂y^β) T^i_j        (9.11)
[1] The transformation is "opposite" in that it uses the transformation matrix ∂y^α/∂x^i instead of ∂x^j/∂y^α; recall from equation (9.3) that
∂y^α/∂x^i is the inverse of ∂x^j/∂y^α and vice versa.
[2] In the sense of properties 1. and 2. of footnote 1 in section VI.
[3] A more formal definition for the term linear map is given in exercise IX.7.
[4] By this I mean that if a tensor eats a vector, it is a linear function of the vector components if everything else it eats is held fixed.
At this point, you may notice a pattern in equations (9.8), (9.9) and (9.11). In all cases, raised indices transform just
like vector indices, and lowered indices transform like dual vector indices. This leads to the following properties of
tensor components:
Raised indices of tensor components transform like vector indices

Lowered indices of tensor components transform like dual vector indices

These properties are often used as defining properties of tensors; in much of the literature, tensors are defined by
components which transform according to the two properties listed above.
Tensors can in general eat an arbitrary number of vectors and dual vectors. The number of vectors and dual
vectors a tensor eats is called the rank of the tensor. The two-indexed tensor T^i_j defined in (9.10) is called a rank-2
tensor (to be more precise, we say that it is a rank (1,1) tensor, since it eats one vector and one dual vector). An
example of a rank-5 tensor (or rank (3,2) tensor), which eats the vectors v, u, q and the dual vectors w, p, is the
following:

S(w, p, v, u, q) = S^ij_klm w_i p_j v^k u^l q^m        (9.12)

Note that the above expression is linear in the individual components w_i, p_j, v^k, u^l, and q^m, and vanishes if any one
of the vectors or dual vectors is zero (the "eats zero" property).
The metric $g_{ij}$ and its inverse $g^{ij}$ are examples of tensors; they are both linear maps, and respectively eat vectors
and dual vectors, and spit out scalars. Vectors and dual vectors are also themselves examples of tensors (they are
both rank-1 tensors); a vector is a linear map that eats a single dual vector and spits out a scalar, and a dual vector
is a linear map that eats a single vector and spits out a scalar:
$$v(w) = v^i\, w_i$$
$$w(v) = w_i\, v^i \qquad (9.13)$$
I've given you the definition for a tensor. Now, I'll deliver the punchline for these notes:

Tensors are useful because tensor equations look the same in all coordinate systems.

Consider, for instance, a tensor equation of the form:
$$G^i{}_j = \kappa\, T^i{}_j \qquad (9.14)$$
where $G^i{}_j$ and $T^i{}_j$ are (rank-2) tensors, and $\kappa$ is some constant. Under a coordinate transformation, $G^i{}_j$ and $T^i{}_j$
transform as:
$$T'^\mu{}_\nu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, T^i{}_j \qquad\qquad G'^\mu{}_\nu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, G^i{}_j \qquad (9.15)$$
Multiplying both sides of (9.14) by the appropriate transformation matrices, I obtain:
$$\frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, G^i{}_j = \kappa\, \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, T^i{}_j \quad\Rightarrow\quad G'^\mu{}_\nu = \kappa\, T'^\mu{}_\nu \qquad (9.16)$$
We see that in the new coordinates, equation (9.16) has the same form as it did in the old coordinates (9.14)!
Exercise IX.1

Use the transformation laws (9.4) and (9.6) to show that the directional derivative $v(f) = v^i\, \frac{\partial f}{\partial x^i}$ is
invariant under coordinate transformations.

Exercise IX.2

Write out the transformation law for the components $S^{ij}{}_{klm}$ of the tensor $S$ defined in (9.12).
Exercise IX.3

Show that the Kronecker delta (3.9) with one raised index and one lowered index, $\delta^i{}_j$, retains its identity
in all coordinate systems. In particular, show that¹ if I define the matrix $D^\mu{}_\nu$ as the coordinate transformation
of the Kronecker delta, $D^\mu{}_\nu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, \delta^i{}_j$, then $D^\mu{}_\nu = \delta^\mu{}_\nu$. Note that this is not in general true
for the lowered-index Kronecker delta $\delta_{ij}$ or the raised-index Kronecker delta $\delta^{ij}$.
Exercise IX.4

Show that if $B^{ij}{}_{klm}$ form the components of a rank-5 tensor, then performing a contraction on any pair
of indices yields a rank-3 tensor. For instance, show that $B^{ij}{}_{ilm}$ is a rank-3 tensor, $B^{ijk}{}_{km} = g^{kl}\, B^{ij}{}_{klm}$
is a rank-3 tensor, and so on.
Exercise IX.5

The tensor product for tensors is simply the multiplication of tensor components to form a tensor of higher
rank (recall the discussion of the tensor product in section VIII). Show that if $Q^{ij}{}_k$ and $P_{ij}$ form the
components of tensors, then the tensor product $Q^{ij}{}_k\, P_{lm}$ transforms as the components of a tensor.
Exercise IX.6

Show that if the tensor components $S_{ij}$ are symmetric in $i$ and $j$, meaning that $S_{ij} = S_{ji}$, then the
transformed tensor components are also symmetric: $S'_{\mu\nu} = S'_{\nu\mu}$. Also show that if the tensor components
$A_{ij}$ are antisymmetric in $i$ and $j$, meaning that $A_{ij} = -A_{ji}$, then the transformed tensor components are
also antisymmetric: $A'_{\mu\nu} = -A'_{\nu\mu}$.
Exercise IX.7

In this exercise, I give a more formal definition for a linear map. A linear map $R$ is defined by the following
properties in each of its arguments:

Additivity: $R(y + z) = R(y) + R(z)$

Homogeneity of degree 1: $R(\alpha\, y) = \alpha\, R(y)$

where $y$, $z$ represent vectors or dual vectors, and $\alpha$ is a scalar. Show that vectors $v(\cdot)$ and dual vectors
$w(\cdot)$ satisfy these properties.

For two arguments, a linear map $R$ satisfies the following (the case of an arbitrary number of arguments is similar):
$$R(\alpha\, w + \beta\, p,\; \gamma\, u + \sigma\, v) = \alpha\gamma\, R(w, u) + \alpha\sigma\, R(w, v) + \beta\gamma\, R(p, u) + \beta\sigma\, R(p, v)$$
where $u$, $v$ are vectors, $w$, $p$ are dual vectors, and $\alpha$, $\beta$, $\gamma$, $\sigma$ are scalars. Show that the tensor $T$ in (9.10)
and the tensor $S$ in (9.12) both satisfy the above formula.
X. TRANSFORMATIONS OF THE METRIC AND THE UNIT VECTOR BASIS

The metric is a tensor; you can find the expression for the components $g_{ij}$ in a different set of coordinates using
the tensor transformation law (9.8). In Cartesian coordinates on Euclidean space, the components of the metric are
just the lowered-index Kronecker delta $\delta_{ij}$. In curvilinear coordinates, the components of the metric $g_{\mu\nu}$ are:
$$g_{\mu\nu} = \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu}\, \delta_{ij} \qquad (10.1)$$
Let's transform the metric to spherical coordinates¹ $(r, \theta, \phi)$. In particular, I choose $y^1 = r$, $y^2 = \theta$, and $y^3 = \phi$. I
write the original Cartesian coordinates $x^i$ as functions of the spherical coordinates $r$, $\theta$, and $\phi$ (or $y^\mu$):
$$x^1 = r\, \cos\phi\, \sin\theta$$
$$x^2 = r\, \sin\phi\, \sin\theta \qquad (10.2)$$
$$x^3 = r\, \cos\theta$$
Using the formulas above (exercise X.2), you can show that the components of the metric tensor take the following
form:
$$g_{\mu\nu} = \begin{pmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix} \qquad (10.3)$$
The above formula (10.3) for the metric components $g_{\mu\nu}$ is useful in two respects. First, it tells you how to measure
distances along curves in spherical coordinates, via the line element (cf. equations (6.9) and (6.10)):
$$ds^2 = g_{\mu\nu}\, dy^\mu\, dy^\nu = dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta\; d\phi^2 \qquad (10.4)$$
where the superscript 2's after the second equality are exponents, not indices. Second, the metric components $g_{\mu\nu}$ in
(10.3) tell you how to take the dot product, given the components of vectors $u$ and $v$ in spherical coordinates:
$$\vec u \cdot \vec v = g_{\mu\nu}\, u^\mu\, v^\nu \qquad (10.5)$$
However, there is a catch. If you have prior exposure to vectors in spherical coordinates, the vector components $u^\mu$
and $v^\mu$ in spherical coordinates may not be the same as those you are familiar with. This is because:

In general, coordinate basis vectors $\partial/\partial y^\mu$ are not unit vectors, and unit vectors are not in
general coordinate basis vectors!
Recall that a unit vector is a vector of unit norm, meaning that a unit vector $u$ satisfies the condition $\vec u \cdot \vec u = 1$.
Also recall that in section VI, the components of the metric $g_{ij}$ are actually inner products (in this case, dot products)
between the basis vectors $e_i$; in particular, recall equation (6.7), $g_{ij} = \langle e_i, e_j \rangle$, and also equation (6.8), which I rewrite
here:
$$g_{ij} = \left\langle \frac{\partial}{\partial x^i},\; \frac{\partial}{\partial x^j} \right\rangle \qquad (10.6)$$
In order for the $\partial/\partial y^\mu$ to be unit vectors, the diagonal elements of the metric, $g_{11}$, $g_{22}$, and $g_{33}$, must all be equal to 1. This
is clearly not the case for the metric in spherical coordinates (10.3); upon comparing (10.6) with (10.3), I find that:
$$\left\langle \frac{\partial}{\partial r},\; \frac{\partial}{\partial r} \right\rangle = g_{11} = 1$$
$$\left\langle \frac{\partial}{\partial \theta},\; \frac{\partial}{\partial \theta} \right\rangle = g_{22} = r^2 \qquad (10.7)$$
$$\left\langle \frac{\partial}{\partial \phi},\; \frac{\partial}{\partial \phi} \right\rangle = g_{33} = r^2 \sin^2\theta$$
¹ I use the physicist's convention, where $\phi$ is the azimuthal angle that runs from 0 to $2\pi$. Incidentally, I find it odd that so many of my
physics colleagues still use the angle $\theta$ for polar coordinates in 2 dimensions (though I admit, I still do this on occasion); even more oddly,
I've seen many of them switch to using $\phi$ when working in cylindrical coordinates!
with all other inner products between the coordinate basis vectors vanishing. Only the basis element $\partial/\partial r$ is a unit
vector.

Note that if the metric is diagonal, meaning that $g_{ij} = \langle e_i, e_j \rangle = 0$ if $i \neq j$, then the basis elements are orthogonal.
If the basis vectors $\partial/\partial y^\mu$ are orthogonal, then the norms (10.7) can be used to obtain expressions for the corresponding
unit vectors in a straightforward manner; in spherical coordinates, simply divide each basis vector by the square root of its norm
to get the unit vectors:
$$\hat r = \frac{1}{\sqrt{g_{11}}}\, \frac{\partial}{\partial y^1} = \frac{\partial}{\partial r}$$
$$\hat\theta = \frac{1}{\sqrt{g_{22}}}\, \frac{\partial}{\partial y^2} = \frac{1}{r}\, \frac{\partial}{\partial \theta} \qquad (10.8)$$
$$\hat\phi = \frac{1}{\sqrt{g_{33}}}\, \frac{\partial}{\partial y^3} = \frac{1}{r \sin\theta}\, \frac{\partial}{\partial \phi}$$
The above formulas allow you to convert vectors between the basis of partial derivatives (called the coordinate basis) and
the basis of unit vectors in spherical coordinates.
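For instance (a short illustration of my own), a vector $\vec v$ with coordinate-basis components $v^\mu$ in spherical coordinates may be converted to the unit-vector basis by way of (10.8):
$$\vec v = v^r\, \frac{\partial}{\partial r} + v^\theta\, \frac{\partial}{\partial \theta} + v^\phi\, \frac{\partial}{\partial \phi} = v^r\, \hat r + \left( r\, v^\theta \right) \hat\theta + \left( r \sin\theta\; v^\phi \right) \hat\phi$$
so the components you may be familiar with from elementary vector calculus are $v^r$, $r\, v^\theta$, and $r \sin\theta\; v^\phi$, not the coordinate-basis components themselves.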
Exercise X.1

For simplicity, let's work in two dimensions for this problem. Consider the coordinate functions relating
Cartesian coordinates $x^i$ to the polar coordinates $y^1 = r$ and $y^2 = \phi$:
$$x^1 = r\, \cos\phi$$
$$x^2 = r\, \sin\phi \qquad (10.9)$$
Use the tensor transformation law (10.1) to obtain the components $g_{\mu\nu}$ for the metric in polar coordinates.
Can you infer the components of the inverse metric $g^{\mu\nu}$?

Exercise X.2

Obtain the components $g_{\mu\nu}$ for the metric in spherical coordinates from the coordinate functions (10.2)
and the transformation law (10.1) for the metric. Use your result to infer the components of the inverse
metric $g^{\mu\nu}$.
Exercise X.3

In Cartesian coordinates, the simplest nonvanishing vector field you can write down is the constant vector
field:
$$v^1 = 1$$
$$v^2 = 0$$
$$v^3 = 0$$
Lower the indices to get the components of the dual vector $v_i$. Obtain the components $v_\mu$ for the dual
vector field in spherical coordinates, and use the components of the inverse metric $g^{\mu\nu}$ (see exercise X.2)
to obtain the components $v^\mu$ for the vector field in spherical coordinates. Finally, obtain the components
for the vector in the unit-vector basis: $\hat r$, $\hat\theta$, $\hat\phi$.
Exercise X.4

Use the coordinate functions (10.9) in exercise X.1 to infer the coordinate functions for cylindrical coordinates.
Use the transformation law (10.1) to obtain an expression for the metric in cylindrical coordinates.
Work out the norm for each basis vector $\partial/\partial y^\mu$, and obtain expressions for the corresponding unit vectors.
XI. DERIVATIVES OF TENSORS

Ultimately, the goal here is to construct a formalism for writing down Partial Differential Equations (PDEs) in a
manner that is transparent¹ for doing calculations, but also coordinate-invariant. Often, we wish to express PDEs in
coordinates adapted to a particular problem, especially when symmetry is involved. For instance, when using a PDE
to model a spherically symmetric system, it is far more appropriate to express the PDE in spherical coordinates than in
cylindrical coordinates.

By definition, PDEs contain partial derivatives. Unfortunately, it turns out that partial derivatives of tensors do
not transform like tensors. The purpose of this section is to develop a formalism for derivatives that preserves the
transformation properties of tensors.
The partial derivative of a scalar function $f(x^a)$, the gradient $G_i = \frac{\partial f}{\partial x^i}$, transforms as a tensor (a dual vector, in
fact). Actually, if you recall the logic in section IX, it is the other way around: these notes derive the transformation
properties of the dual vector $w_i$ (and all the transformation properties of tensors) from the transformation of the
gradient components $G_i$. On the other hand, the partial derivatives of the components $v^i$ of a vector field do not
transform as a tensor. Consider the following quantity:
$$A^i{}_j := \frac{\partial v^i}{\partial x^j} \qquad (11.1)$$
and its corresponding expression in the coordinates $y^\mu$:
$$A'^\mu{}_\nu := \frac{\partial v'^\mu}{\partial y^\nu} \qquad (11.2)$$
If $A^i{}_j$ and $A'^\mu{}_\nu$ were components of a tensor, they would be related to each other by a tensor transformation law (9.11),
which I rewrite here for $T^i{}_j$ and $T'^\mu{}_\nu$:
$$T'^\mu{}_\nu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, T^i{}_j \qquad (11.3)$$
Unfortunately, $A^i{}_j$ and $A'^\mu{}_\nu$ do not satisfy a tensor transformation law of the form (11.3). To see this, I insert the
formula $v'^\mu = \frac{\partial y^\mu}{\partial x^i}\, v^i$ into equation (11.2) for $A'^\mu{}_\nu$:
$$A'^\mu{}_\nu = \frac{\partial}{\partial y^\nu} \left( \frac{\partial y^\mu}{\partial x^i}\, v^i \right) = \frac{\partial x^j}{\partial y^\nu}\, \frac{\partial}{\partial x^j} \left( \frac{\partial y^\mu}{\partial x^i}\, v^i \right) \qquad (11.4)$$
where I have applied the chain rule in the second equality. Upon applying the product rule and recognizing a factor
of $A^i{}_j$, I obtain the result:
$$A'^\mu{}_\nu = \frac{\partial x^j}{\partial y^\nu} \frac{\partial y^\mu}{\partial x^i}\, A^i{}_j + \underline{\frac{\partial x^j}{\partial y^\nu}\, \frac{\partial^2 y^\mu}{\partial x^j\, \partial x^i}\, v^i} \qquad (11.5)$$
If I cover up the underlined term with my hand, the equation above (11.5) looks like the tensor transformation law.
However, the presence of the underlined term means that the quantities $A^i{}_j$ and $A'^\mu{}_\nu$ do not form the components
of a tensor. If we can somehow get rid of the underlined term in (11.5) without placing constraints² on the functions
$y^\mu(x^a)$, then we could recover the transformation law for a tensor.

One way to do this is to construct a new derivative operator that reduces to the usual partial derivative in Cartesian
coordinates. The simplest modification to the partial derivative $\frac{\partial v^i}{\partial x^j}$ (11.1) is to add a correction term. Since I wish
to cancel out the underlined term in (11.5), and since the underlined term contains a factor of $v^i$, the correction
term should also contain a factor of $v^i$. These considerations lead me to define a new derivative operator $\nabla_j$, called the
covariant derivative, which acts on $v^i$ in the following way:
$$\nabla_j v^i = \frac{\partial v^i}{\partial x^j} + \Gamma^i{}_{jk}\, v^k \qquad (11.6)$$
¹ This is the disadvantage of the standard Gibbs-Heaviside formalism for vector analysis that you may be accustomed to (see the article [6]
for a historical discussion of the Gibbs-Heaviside formalism). Abstract expressions such as $\vec\nabla \cdot \vec v$ and $\vec\nabla \times \vec v$ are in some sense coordinate-invariant,
but they give little indication of the methods for computing them in an arbitrary set of curvilinear coordinates. You need to
first specify the meaning of $\vec\nabla \cdot \vec v$ and $\vec\nabla \times \vec v$ in some natural coordinate system (Cartesian coordinates), then derive expressions for $\vec\nabla \cdot \vec v$
and $\vec\nabla \times \vec v$ in another coordinate system by way of coordinate transformations.
² If the second derivative of $y^\mu(x^a)$ vanishes, then the underlined term in (11.5) vanishes, and we recover the tensor transformation law.
However, if $x^i$ are Cartesian coordinates, this is also the condition that the coordinate lines are straight; the only admissible coordinate
transformations would correspond to rigid rotations and translations.
where $\Gamma^i{}_{jk}$ are coefficients, which are sometimes called connection coefficients. The trick here is that the coefficients
$\Gamma^i{}_{jk}$ do not transform as tensors. Instead, I demand that the coefficients $\Gamma^i{}_{jk}$ satisfy the following transformation law:
$$\Gamma'^\mu{}_{\nu\rho} = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \Gamma^i{}_{jk} - \frac{\partial x^j}{\partial y^\nu} \left( \frac{\partial^2 y^\mu}{\partial x^j\, \partial x^i} \right) \frac{\partial x^i}{\partial y^\rho} \qquad (11.7)$$
Note that the bracketed quantity in the second term also appears in the underlined term in (11.5). If we demand that $\nabla_i$
reduces to the ordinary partial derivative in Cartesian coordinates, then in Cartesian coordinates we set $\Gamma^i{}_{jk} = 0$. It
follows that the second term in (11.7) can be used to compute the connection coefficients in any other coordinate
system.
If the coefficients $\Gamma^i{}_{jk}$ transform according to the above transformation law (11.7), then it is not difficult to show
(see exercise XI.1) that the quantity $\nabla_j v^i$ (11.6) transforms as a tensor:
$$\nabla'_\nu\, v'^\mu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, \nabla_j v^i \qquad (11.8)$$
where $\nabla'_\nu\, v'^\mu$ is given by:
$$\nabla'_\nu\, v'^\mu = \frac{\partial v'^\mu}{\partial y^\nu} + \Gamma'^\mu{}_{\nu\rho}\, v'^\rho \qquad (11.9)$$
Equation (11.8) states that the covariant derivative operator $\nabla_i$ yields a rank-2 tensor when acting on a vector. Success!
The next thing to do is to construct a covariant derivative for tensors of higher rank. Note that for a rank-2
tensor $G^{ij}$ with raised indices, each index will pick up a factor of the transformation matrix $\frac{\partial y^\mu}{\partial x^i}$ under a coordinate
transformation. Each factor of $\frac{\partial y^\mu}{\partial x^i}$ will generate an extra term which needs to be canceled out (see exercise XI.2).
The covariant derivative for $G^{ij}$ takes the form:
$$\nabla_k G^{ij} = \frac{\partial G^{ij}}{\partial x^k} + \Gamma^i{}_{km}\, G^{mj} + \Gamma^j{}_{km}\, G^{im} \qquad (11.10)$$
For a rank-3 tensor $Q^{ijl}$, I pick up yet another term:
$$\nabla_k Q^{ijl} = \frac{\partial Q^{ijl}}{\partial x^k} + \Gamma^i{}_{km}\, Q^{mjl} + \Gamma^j{}_{km}\, Q^{iml} + \Gamma^l{}_{km}\, Q^{ijm} \qquad (11.11)$$
I could continue to higher rank, but I think you see the pattern: when taking the covariant derivative, each index
requires a correction term containing a factor of $\Gamma^i{}_{jk}$ and the original tensor components.
I'll take the opportunity to quickly introduce the following notation for the gradient of a scalar function:
$$\nabla_k f = \frac{\partial f}{\partial x^k}, \qquad (11.12)$$
which follows from running the pattern of equations (11.6), (11.10) and (11.11) in reverse: a scalar function $f$ has zero
indices, so no correction term is needed. In case you don't recognize the pattern, I'll summarize the covariant
derivative for $f$, $v^i$, $G^{ij}$ and $Q^{ijl}$:
$$\nabla_k f = \frac{\partial f}{\partial x^k}$$
$$\nabla_k v^i = \frac{\partial v^i}{\partial x^k} + \Gamma^i{}_{km}\, v^m$$
$$\nabla_k G^{ij} = \frac{\partial G^{ij}}{\partial x^k} + \Gamma^i{}_{km}\, G^{mj} + \Gamma^j{}_{km}\, G^{im} \qquad (11.13)$$
$$\nabla_k Q^{ijl} = \frac{\partial Q^{ijl}}{\partial x^k} + \Gamma^i{}_{km}\, Q^{mjl} + \Gamma^j{}_{km}\, Q^{iml} + \Gamma^l{}_{km}\, Q^{ijm}$$
As an exercise (see exercise XI.3), try to construct the covariant derivative for a rank-4 tensor.

I now wish to construct covariant derivatives for lower-indexed objects. I'll begin by constructing the derivative for
a dual vector, which I expect to have the form:
$$\nabla_k w_i = \frac{\partial w_i}{\partial x^k} + C^j{}_{ki}\, w_j \qquad (11.14)$$
Recall that the quantity $v^i w_i$ is a scalar. From (11.12), the covariant derivative $\nabla_k$ acting on a scalar is just the
partial derivative, so:
$$\nabla_k (v^i w_i) = \frac{\partial (v^i w_i)}{\partial x^k} = w_i\, \frac{\partial v^i}{\partial x^k} + v^i\, \frac{\partial w_i}{\partial x^k} \qquad (11.15)$$
where the product rule has been used in the second equality. Now, a good definition for a derivative operator should
respect the product (Leibniz) rule, so I demand that the covariant derivative for dual vectors be consistent with the
following property:
$$\nabla_k \left( v^i w_i \right) = w_i\, \nabla_k v^i + v^i\, \nabla_k w_i \qquad (11.16)$$
I now have two equations, (11.15) and (11.16), for the same quantity $\nabla_k (v^i w_i)$. If I expand (11.16) using (11.6) and (11.14), and subtract
(11.15) from (11.16), I obtain the following result:
$$w_i\, \Gamma^i{}_{kj}\, v^j + v^i\, C^j{}_{ki}\, w_j = w_i\, \Gamma^i{}_{kj}\, v^j + v^j\, C^i{}_{kj}\, w_i = 0 \qquad (11.17)$$
where I have performed a relabeling of dummy indices in the last term (recall the discussion in section VIII on the
relabeling of dummy indices). If I demand that equation (11.17) holds for all $v^i$ and $w_i$, the coefficients $C^i{}_{kj}$ must
satisfy $C^i{}_{kj} = -\Gamma^i{}_{kj}$, so that the covariant derivative for dual vectors is given by:
$$\nabla_k w_i = \frac{\partial w_i}{\partial x^k} - \Gamma^j{}_{ki}\, w_j \qquad (11.18)$$
Given the pattern (11.13) and the covariant derivative (11.18) for dual vectors, I may infer that for a rank-2 tensor
$K_{ij}$ with lowered indices, the covariant derivative is:
$$\nabla_k K_{ij} = \frac{\partial K_{ij}}{\partial x^k} - \Gamma^a{}_{ki}\, K_{aj} - \Gamma^a{}_{kj}\, K_{ia} \qquad (11.19)$$
I may also infer that for a rank-2 tensor $T^i{}_j$ with mixed indices, the covariant derivative is:
$$\nabla_k T^i{}_j = \frac{\partial T^i{}_j}{\partial x^k} + \Gamma^i{}_{ka}\, T^a{}_j - \Gamma^a{}_{kj}\, T^i{}_a \qquad (11.20)$$
Given (11.19), (11.20), and the tower of equations in (11.13), you should be able to infer the covariant derivative
of a tensor of arbitrary rank. If you want a more careful justification of (11.19) and (11.20), see exercise XI.4.
At this point, I could state the punchline and end this section. However, I wish to present a useful formula for $\Gamma^i{}_{jk}$
in terms of the metric tensor:
$$\Gamma^i{}_{jk} = \frac{1}{2}\, g^{ia} \left( \frac{\partial g_{ja}}{\partial x^k} + \frac{\partial g_{ak}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^a} \right) \qquad (11.21)$$
In this form, the coefficients $\Gamma^i{}_{jk}$ are called Christoffel symbols. You can derive the formula (11.21) for the Christoffel
symbols from the following two conditions¹ on the covariant derivative $\nabla_i$:

1. The Torsion-Free Condition: For a scalar function $f$, $\nabla_i$ satisfies:
$$\nabla_i \nabla_j f = \nabla_j \nabla_i f \quad\Leftrightarrow\quad \Gamma^k{}_{ij} = \Gamma^k{}_{ji} \qquad (11.22)$$

2. Metric Compatibility:
$$\nabla_k\, g_{ij} = 0 \qquad (11.23)$$

Since equations (11.22) and (11.23) are constructed from covariant derivatives, they transform as tensors, and have
the same form in all coordinates. It is not too hard to verify that both (11.22) and (11.23) are true in Cartesian
coordinates, provided that we demand that $\Gamma^i{}_{jk} = 0$ in Cartesian coordinates. Thus, instead of computing $\Gamma^i{}_{jk}$ via the
transformation law (11.7) in curvilinear coordinates, we may first work out an expression for the metric in curvilinear
coordinates, then use (11.21) to obtain an expression for $\Gamma^i{}_{jk}$.
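To illustrate how (11.21) is used in practice, here is a short worked example of my own (compare with exercises XI.7 and XI.8). In polar coordinates $(r, \phi)$, the metric components are $g_{11} = 1$ and $g_{22} = r^2$ (see exercise X.1), so the only nonvanishing metric derivative is $\partial g_{22}/\partial r = 2r$. Equation (11.21) then yields exactly three nonzero Christoffel symbols:
$$\Gamma^1{}_{22} = -\frac{1}{2}\, g^{11}\, \frac{\partial g_{22}}{\partial x^1} = -r \qquad\qquad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{2}\, g^{22}\, \frac{\partial g_{22}}{\partial x^1} = \frac{1}{r}$$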
¹ Some terminology: Sometimes, a covariant derivative $\nabla_i$ is called a connection, and the $\Gamma^i{}_{jk}$ are called connection coefficients. The specific
covariant derivative defined by the properties (11.22) and (11.23) is called the Levi-Civita connection (this is not to be confused with the
Levi-Civita symbol and the Levi-Civita pseudotensor, which are entirely different quantities that I will introduce later on).
Now that I have shown you how to take the covariant derivative $\nabla_i$ of a tensor, I can deliver the punchline for this
section:

The covariant derivative of a tensor transforms as a tensor. This is useful because it allows
us to write derivatives in a form that looks the same in all coordinate systems.

The formulas for the covariant derivatives of a tensor are given in (11.19), (11.20), and (11.13). Given a metric
$g_{ij}$, the coefficients $\Gamma^i{}_{jk}$ can be obtained from equation (11.21). Alternately, one may demand that $\Gamma^i{}_{jk} = 0$ in Cartesian coordinates and use the
transformation law (11.7) to obtain an expression for $\Gamma^i{}_{jk}$ in an arbitrary curvilinear coordinate system.

Since covariant derivatives become partial derivatives in Cartesian coordinates, all you need to do to convert a
PDE to a tensor equation is to write down the PDE in Cartesian coordinates, and replace the partial derivatives with
covariant derivatives:¹
$$\frac{\partial}{\partial x^k} \;\to\; \nabla_k \qquad (11.24)$$
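To see the rule (11.24) in action, consider an example of my own: the flat-space heat equation in Cartesian coordinates, $\frac{\partial u}{\partial t} = \alpha\, \delta^{ij}\, \frac{\partial^2 u}{\partial x^i\, \partial x^j}$ (here $\alpha$ is a diffusion constant, and $t$ is treated as an external parameter). Replacing partial derivatives with covariant derivatives converts it to a tensor equation:
$$\frac{\partial u}{\partial t} = \alpha\, g^{ij}\, \nabla_i \nabla_j\, u$$
which, being a tensor equation, now holds in any curvilinear coordinate system.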
Curvature

In these notes, I have avoided the discussion of non-Euclidean geometries, lest I be accused of doing General Relativity (the horror!).
However, I suspect that you may nonetheless be curious about the applications of this formalism to non-Euclidean geometries. In particular,
this formalism may be used to describe curved 2d surfaces, and it does so in a way that can be generalized to n-dimensional manifolds.
Specifically, the intrinsic geometric properties of a curved 2d surface are determined by the metric, from which one can define distances
(using the line element) and angles (using the inner product). A metric for a curved 2d surface differs from a flat metric in that there
exists no coordinate system in which the metric reduces to the Kronecker delta $\delta_{ij}$ everywhere on the surface. The metric is important
for studying the intrinsic geometry of curved 2d surfaces and their higher-dimensional generalizations, but the covariant derivative (called
the connection) is even more important, because you can use it to construct a tensor that corresponds to the curvature of the surface (or
a manifold), which is a measure of the degree to which a surface/manifold fails to satisfy the geometric properties of Euclidean space. The
tensor that measures the intrinsic curvature of a surface/manifold is called the Riemann curvature tensor $R^i{}_{jkl}$, which is formed from the
connection coefficients $\Gamma^i{}_{jk}$ and their partial derivatives; the definition and formula for $R^i{}_{jkl}$ can be found in any textbook on General
Relativity (see for instance [5], [31], and [18]).
Exercise XI.1

Show that if the coefficients $\Gamma^i{}_{jk}$ satisfy the transformation law (11.7), then $\nabla_j v^i$ and $\nabla'_\nu\, v'^\mu$ (as defined in (11.6)
and (11.9)) satisfy (11.8):
$$\nabla'_\nu\, v'^\mu = \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^j}{\partial y^\nu}\, \nabla_j v^i$$
Exercise XI.2

Given the tensor transformation law for $G^{ij}$:
$$G'^{\mu\nu} = \frac{\partial y^\mu}{\partial x^i} \frac{\partial y^\nu}{\partial x^j}\, G^{ij}, \qquad (11.25)$$
show that the partial derivatives $\frac{\partial G^{ij}}{\partial x^k}$ and $\frac{\partial G'^{\mu\nu}}{\partial y^\rho}$ satisfy the following expression:
$$\frac{\partial G'^{\mu\nu}}{\partial y^\rho} = \frac{\partial x^k}{\partial y^\rho} \frac{\partial y^\mu}{\partial x^i} \frac{\partial y^\nu}{\partial x^j}\, \frac{\partial G^{ij}}{\partial x^k} + G^{ij}\, \frac{\partial y^\nu}{\partial x^j} \frac{\partial x^k}{\partial y^\rho}\, \frac{\partial^2 y^\mu}{\partial x^k\, \partial x^i} + G^{ij}\, \frac{\partial y^\mu}{\partial x^i} \frac{\partial x^k}{\partial y^\rho}\, \frac{\partial^2 y^\nu}{\partial x^k\, \partial x^j}. \qquad (11.26)$$
Use (11.26) to show that the covariant derivative $\nabla_k G^{ij}$ as defined in (11.10) transforms as a tensor, or
that:
$$\nabla'_\rho\, G'^{\mu\nu} = \frac{\partial x^k}{\partial y^\rho} \frac{\partial y^\mu}{\partial x^i} \frac{\partial y^\nu}{\partial x^j}\, \nabla_k G^{ij}, \qquad (11.27)$$
with $\nabla'_\rho\, G'^{\mu\nu}$ being given by the expression: $\nabla'_\rho\, G'^{\mu\nu} = \frac{\partial G'^{\mu\nu}}{\partial y^\rho} + \Gamma'^\mu{}_{\rho\sigma}\, G'^{\sigma\nu} + \Gamma'^\nu{}_{\rho\sigma}\, G'^{\mu\sigma}$.
¹ This is sometimes called the comma-to-semicolon rule. The term refers to a commonly used shorthand notation for partial derivatives
and covariant derivatives, in which partial derivatives of tensors are denoted by placing the derivative index after a comma, $T^i{}_{j,k} = \frac{\partial T^i{}_j}{\partial x^k}$,
and covariant derivatives are similarly expressed, this time with a semicolon: $T^i{}_{j;k} = \nabla_k\, T^i{}_j$. The comma-to-semicolon rule refers to the
replacement of partial derivatives (commas) with covariant derivatives (semicolons). For further discussion of this rule and the subtleties
of applying it in non-Euclidean geometries, refer to Chapter 4 of [31] and Box 16.1 of [18].
Exercise XI.3

Following the pattern in (11.13), write out the expression for the covariant derivative of the rank-4 tensor
$R^{ijkl}$.
Exercise XI.4

In this exercise, I will walk you through a more careful justification of equations (11.19) and (11.20). Begin
by assuming that the covariant derivatives of $T^i{}_j$ and $K_{ij}$ have the form:
$$\nabla_k K_{ij} = \frac{\partial K_{ij}}{\partial x^k} + A^a{}_{ki}\, K_{aj} + A^a{}_{kj}\, K_{ia}$$
$$\nabla_k T^i{}_j = \frac{\partial T^i{}_j}{\partial x^k} + P^i{}_{ka}\, T^a{}_j + Q^a{}_{kj}\, T^i{}_a \qquad (11.28)$$
and expand out the covariant derivatives of the vectors $u^i$, $v^i$ and the dual vector $w_i$ using equations (11.6)
and (11.18). Subtract equations (11.29) from (11.30), and demand that the result holds for all choices of
$u^i$, $v^i$, $w_i$, $T^i{}_j$ and $K_{ij}$. From this demand, you may infer that $A^a{}_{ki} = -\Gamma^a{}_{ki}$, $P^i{}_{ka} = \Gamma^i{}_{ka}$, and $Q^a{}_{kj} = -\Gamma^a{}_{kj}$.
Exercise XI.5

In order for the operator $\nabla_i$ to be properly regarded as a derivative operator, it should satisfy the following
two properties for the vector components $u^i$ and $v^i$:

1. Additivity:
$$\nabla_i \left( u^j + v^j \right) = \nabla_i u^j + \nabla_i v^j \qquad (11.31)$$

2. The product (Leibniz) rule:
$$\nabla_i \left( u^j\, v^k \right) = v^k\, \nabla_i u^j + u^j\, \nabla_i v^k \qquad (11.32)$$

Show that the covariant derivative satisfies these properties (for the product rule, apply the definition
(11.10) to the product $G^{jk} = u^j v^k$). Write down the corresponding properties for a pair of dual vectors
$w_i$ and $p_i$, and show that $\nabla_i$ satisfies those properties for dual vectors. Finally, check that the Leibniz
rule holds for a vector $v^i$ and a dual vector $w_i$:
$$\nabla_i \left( v^j\, w_k \right) = w_k\, \nabla_i v^j + v^j\, \nabla_i w_k \qquad (11.33)$$
Exercise XI.6

Derive (11.21) from equations (11.22) and (11.23). To do this, use (11.23) to obtain the expression:
$$\frac{\partial g_{ij}}{\partial x^k} = \Gamma^a{}_{ki}\, g_{aj} + \Gamma^a{}_{kj}\, g_{ia}$$
Then, permute the indices $i$, $j$, $k$ to obtain two more expressions of this form; an appropriate combination
of the three expressions, together with the symmetry property (11.22), yields the result (11.21).
Exercise XI.7

Use equation (11.21) with the metric you obtained in exercise X.4 to derive the coefficients $\Gamma^i{}_{jk}$ in
cylindrical coordinates. You should only find two unique nonzero coefficients.

Exercise XI.8

Use equation (11.21) with equation (10.3) to derive the coefficients $\Gamma^i{}_{jk}$ in spherical coordinates.
Exercise XI.9

Extend the properties (11.31) and (11.32) to tensors of arbitrary rank, and show that the covariant
derivative operator $\nabla_i$ satisfies these properties. For instance, given the tensors $T^i{}_j$, $K^i{}_j$ and $Q^i{}_{jk}$, $P^i{}_{jk}$,
show that the covariant derivative satisfies the following properties:

1. Additivity:
$$\nabla_i \left( T^j{}_k + K^j{}_k \right) = \nabla_i T^j{}_k + \nabla_i K^j{}_k \qquad (11.35)$$
$$\nabla_i \left( Q^j{}_{kl} + P^j{}_{kl} \right) = \nabla_i Q^j{}_{kl} + \nabla_i P^j{}_{kl} \qquad (11.36)$$

and use this result to convince yourself that $\nabla_i$ satisfies similar properties for tensors of arbitrary rank.
XII. DIVERGENCES, LAPLACIANS AND MORE

PDEs are often expressed in terms of divergences and Laplacians. In this short section, I describe their construction
in terms of covariant derivatives $\nabla_i$. In the standard Gibbs-Heaviside notation for vector analysis, the divergence of
a vector field $\vec v$ is written as the following:
$$\vec\nabla \cdot \vec v \qquad (12.1)$$
To write the divergence in tensor form, I first define the raised-index covariant derivative operator:
$$\nabla^i := g^{ik}\, \nabla_k \qquad (12.2)$$
With this, the dot product in (12.1) may be taken with the metric:
$$\vec\nabla \cdot \vec v = g_{ij}\, \nabla^i v^j = g_{ij}\, g^{ik}\, \nabla_k v^j = \delta^k{}_j\, \nabla_k v^j = \nabla_j v^j \qquad (12.3)$$
which leads to the following expression for the divergence of a vector field:
$$\vec\nabla \cdot \vec v = \nabla_i v^i = \frac{\partial v^i}{\partial x^i} + \Gamma^i{}_{ik}\, v^k \qquad (12.4)$$
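As a worked example of my own (using the Christoffel symbols you will obtain in exercise XI.8), the traces of the Christoffel symbols in spherical coordinates are $\Gamma^i{}_{ir} = 2/r$, $\Gamma^i{}_{i\theta} = \cot\theta$, and $\Gamma^i{}_{i\phi} = 0$, so that (12.4) becomes:
$$\vec\nabla \cdot \vec v = \frac{\partial v^r}{\partial r} + \frac{\partial v^\theta}{\partial \theta} + \frac{\partial v^\phi}{\partial \phi} + \frac{2}{r}\, v^r + \cot\theta\; v^\theta$$
for coordinate-basis components $v^\mu$; converting to the unit-vector basis (10.8) recovers the spherical-coordinate divergence formula familiar from elementary vector calculus.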
Suppose that you wish to take the divergence of a matrix field; in other words, you wish to write something to the
effect of $\vec\nabla \cdot \mathbf{M}$ for some $3 \times 3$ matrix field $\mathbf{M}$. An expression of the form $\vec\nabla \cdot \mathbf{M}$ requires a good deal of explanation,
but if the matrix $\mathbf{M}$ is a rank-2 tensor with components $M^i{}_j$, it has a straightforward tensor expression:
$$[\vec\nabla \cdot \mathbf{M}]_j = \nabla_i M^i{}_j = \frac{\partial M^i{}_j}{\partial x^i} + \Gamma^i{}_{ik}\, M^k{}_j - \Gamma^k{}_{ij}\, M^i{}_k \qquad (12.5)$$
This example illustrates the power of the tensor formalism; expressions that are difficult to interpret in the standard
Gibbs-Heaviside notation become straightforward when written in tensor notation. Conversely, it is not too difficult
to imagine operations that are straightforward in tensor notation, but become extremely cumbersome to express in
Gibbs-Heaviside notation. Consider, for instance, the following operator, which I call the matrix Laplacian:
$$M^i{}_j\, \nabla_i\, \nabla^j \qquad (12.6)$$
Another expression that often appears in PDEs is the following:
$$(\vec v \cdot \vec\nabla)\, \vec u \qquad (12.7)$$
which can be thought of as the directional derivative of a vector $\vec u$. This quantity caused me great confusion when
I first saw it, because it was written as $\vec v \cdot \vec\nabla\, \vec u$; in that form, I didn't know whether to apply the dot product to $\vec\nabla$ or
to $\vec u$, so in my mind, there were two meanings for $\vec v \cdot \vec\nabla\, \vec u$:
$$g_{kj}\, v^k\, \nabla^i\, u^j \qquad (12.8)$$
$$v^k\, \nabla_k\, u^i \qquad (12.9)$$
The parentheses in $(\vec v \cdot \vec\nabla)\, \vec u$ indicate that the intended meaning is (12.9); in tensor notation, no such ambiguity arises,
since the index placement makes the contractions explicit.

Finally, consider the Laplacian, which in Gibbs-Heaviside notation is written as:
$$\nabla^2 = \vec\nabla \cdot \vec\nabla \qquad (12.10)$$
In tensor notation, the Laplacian operator takes the form:
$$\nabla^2 := \nabla_i\, \nabla^i = \nabla_i\, g^{ij}\, \nabla_j \qquad (12.11)$$
If the covariant derivative $\nabla_i$ satisfies the condition of metric compatibility (11.23) (it should, if we demand that
$\Gamma^i{}_{jk} = 0$ in Cartesian coordinates), then one can use the product rule to pull the inverse metric out of (12.11) and obtain the result for the operator:
$$\nabla^2 = g^{ij}\, \nabla_i\, \nabla_j \qquad (12.12)$$
For a scalar function $\varphi$, (12.10) becomes:
$$\nabla^2 \varphi = g^{ij}\, \nabla_i \nabla_j\, \varphi = g^{ij} \left( \frac{\partial^2 \varphi}{\partial x^i\, \partial x^j} - \Gamma^k{}_{ij}\, \frac{\partial \varphi}{\partial x^k} \right) \qquad (12.13)$$
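For instance (a worked example of my own), in spherical coordinates the inverse metric is $g^{\mu\nu} = \mathrm{diag}(1,\; 1/r^2,\; 1/(r^2 \sin^2\theta))$, and (12.13) yields the familiar expression:
$$\nabla^2 \varphi = \frac{\partial^2 \varphi}{\partial r^2} + \frac{2}{r}\, \frac{\partial \varphi}{\partial r} + \frac{1}{r^2}\, \frac{\partial^2 \varphi}{\partial \theta^2} + \frac{\cot\theta}{r^2}\, \frac{\partial \varphi}{\partial \theta} + \frac{1}{r^2 \sin^2\theta}\, \frac{\partial^2 \varphi}{\partial \phi^2}$$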
The Laplacian operator (12.12) constructed from covariant derivatives $\nabla_i$ may be used to obtain the action of the
Laplacian on vectors:
$$\nabla^2 v^k = g^{ij}\, \nabla_i \nabla_j\, v^k = g^{ij}\, \nabla_i \left( \frac{\partial v^k}{\partial x^j} + \Gamma^k{}_{ja}\, v^a \right)$$
$$= g^{ij} \left[ \frac{\partial^2 v^k}{\partial x^i\, \partial x^j} + \frac{\partial \Gamma^k{}_{ja}}{\partial x^i}\, v^a + \Gamma^k{}_{ja}\, \frac{\partial v^a}{\partial x^i} + \Gamma^k{}_{ib} \left( \frac{\partial v^b}{\partial x^j} + \Gamma^b{}_{ja}\, v^a \right) - \Gamma^b{}_{ij} \left( \frac{\partial v^k}{\partial x^b} + \Gamma^k{}_{ba}\, v^a \right) \right] \qquad (12.14)$$
Since we know how covariant derivatives act on tensors of arbitrary rank, you could in principle construct the explicit
formula for the Laplacian for a tensor of arbitrary rank.
Exercise XII.1

Obtain the components of $(\vec v \cdot \vec\nabla)\, \vec u$ (12.7) (in tensor form, $v^k\, \nabla_k u^i$ (12.9)) in cylindrical coordinates (recall
exercises X.4 and XI.7).
Exercise XII.2

The Euler equation for an inviscid fluid takes the form:
$$\frac{\partial \vec v}{\partial t} + (\vec v \cdot \vec\nabla)\, \vec v = -\frac{1}{\rho}\, \vec\nabla P + \vec f \qquad (12.15)$$
where $t$ is time, $\vec v$ is the velocity of the fluid, $\rho$ is the fluid density, $P$ is the fluid pressure, and $\vec f$ is an
external force density (force per unit mass). Rewrite the Euler equation in tensor form, then write down
the explicit expression for the Euler equation in cylindrical coordinates (again recall exercises X.4 and
XI.7).
Exercise XII.3

The Schrödinger equation for a single particle of mass $m$ in nonrelativistic Quantum Mechanics has the
form:
$$i \hbar\, \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\, \nabla^2 \psi + V(x^a)\, \psi \qquad (12.16)$$
where $t$ is time, $\psi$ and $V(x^a)$ are scalar functions, $\hbar$ is the reduced Planck constant, and $i = \sqrt{-1}$. Rewrite
the Schrödinger equation in tensor form, then write down the explicit expression for the Schrödinger
equation in spherical coordinates (recall exercises X.2 and XI.8).

Exercise XII.4

Write out the components for the Laplacian of a vector, $\nabla^2 \vec v$, in spherical coordinates.

Exercise XII.5

Write out an explicit expression for the matrix Laplacian (12.6) acting on a scalar function:
$$M^i{}_j\, \nabla_i\, \nabla^j\, \varphi \qquad (12.17)$$
in terms of $\varphi$, $g_{ij}$, $g^{ij}$ and $\Gamma^i{}_{jk}$.

Exercise XII.6

Given the expression (12.11), work out an explicit expression for the Laplacian of a rank-2 tensor field:
$$\nabla^2\, T^i{}_j \qquad (12.18)$$
in terms of $T^i{}_j$, $g_{ij}$, $g^{ij}$ and $\Gamma^i{}_{jk}$.
XIII. THE LEVI-CIVITA TENSOR: CROSS PRODUCTS, CURLS, AND VOLUME INTEGRALS
Many PDEs contain expressions that involve cross products and curls. In the standard Gibbs-Heaviside notation,
the cross product and curl respectively take the form:
$$\vec A \times \vec B \qquad (13.1)$$
$$\vec\nabla \times \vec A \qquad (13.2)$$
for vectors $\vec A$ and $\vec B$. In both cases, the result is a vector.

To express (13.1) and (13.2) in component form, I need to define the following quantity,¹ $\underline{\epsilon}_{ijk}$, called the permutation
symbol:²
$$\underline{\epsilon}_{ijk} = \underline{\epsilon}^{ijk} = \begin{cases} \;\;\,1 & \text{if } \{i,j,k\} \text{ is an even permutation of } \{1,2,3\} \\ -1 & \text{if } \{i,j,k\} \text{ is an odd permutation of } \{1,2,3\} \\ \;\;\,0 & \text{if any two indices are equal} \end{cases} \qquad (13.3)$$
The permutation symbol is defined to take the same values (13.3) in every coordinate system:
$$\underline{\epsilon}'_{\mu\nu\rho} = \underline{\epsilon}_{\mu\nu\rho} \qquad (13.4)$$
For this reason, the permutation symbol $\underline{\epsilon}_{ijk}$ is not a tensor.

In Cartesian coordinates, the cross product (13.1) and the curl (13.2) may be written in terms of the permutation symbol:
$$[\vec A \times \vec B]^i = g^{ia}\, \underline{\epsilon}_{ajk}\, A^j\, B^k \qquad (13.5)$$
$$[\vec\nabla \times \vec A]^i = g^{ia}\, \underline{\epsilon}_{ajk}\, \nabla^j A^k = g^{ia}\, g^{jb}\, \underline{\epsilon}_{ajk}\, \frac{\partial A^k}{\partial x^b} \qquad (13.6)$$
where I have set $g^{ij} = \delta^{ij}$ and $\nabla_i = \frac{\partial}{\partial x^i}$. To be clear: Equations (13.5) and (13.6) are NOT tensor equations!
Equations (13.5) and (13.6) are ONLY valid in Cartesian coordinates. This is because, as stated earlier in bold, $\underline{\epsilon}_{ijk}$
is not a tensor.
To see that $\underline{\epsilon}_{ijk}$ is not a tensor, I'll feed it the components of three vectors, $\vec A$, $\vec B$ and $\vec C$, to obtain the following
expression:
$$\vec A \cdot (\vec B \times \vec C) = \underline{\epsilon}_{ijk}\, A^i\, B^j\, C^k \qquad (13.7)$$
Under a coordinate transformation, I obtain the left-hand side of the following:
$$\underline{\epsilon}_{\mu\nu\rho}\, A'^\mu\, B'^\nu\, C'^\rho = \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \underline{\epsilon}_{ijk}\, A'^\mu\, B'^\nu\, C'^\rho \qquad (13.8)$$
The left-hand side is not equal to the right-hand side because:
$$\underline{\epsilon}_{\mu\nu\rho} \neq \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \underline{\epsilon}_{ijk} \qquad (13.9)$$
¹ Note the underline! It distinguishes the permutation symbol $\underline{\epsilon}_{ijk}$ from the Levi-Civita tensor $\epsilon_{ijk}$, which I introduce later in this section.
² The permutation symbol $\underline{\epsilon}_{ijk}$ is also called the Levi-Civita symbol, not to be confused with the Levi-Civita connection (see footnote 1 on
page 24), which is an entirely different concept.
If both sides of (13.9) were equal for an arbitrary coordinate transformation, then one would obtain a contradiction: either $\underline{\epsilon}'_{\mu\nu\rho}$
or $\underline{\epsilon}_{ijk}$ must fail to satisfy (13.4), but both $\underline{\epsilon}'_{\mu\nu\rho}$ and $\underline{\epsilon}_{ijk}$ are defined by equation (13.4).
I can't proceed without introducing an expression for the determinant of a matrix. The permutation symbol $\underline{\epsilon}_{ijk}$
can be used to define the determinant of a matrix $\mathbf{M}$ with components $M^i{}_j$:
$$\det(\mathbf{M}) = \underline{\epsilon}_{ijk}\, M^i{}_1\, M^j{}_2\, M^k{}_3 = \frac{1}{3!}\, \underline{\epsilon}_{ijk}\, \underline{\epsilon}^{abc}\, M^i{}_a\, M^j{}_b\, M^k{}_c \qquad (13.10)$$
where $3! = 3 \cdot 2 \cdot 1 = 6$ is the factorial of 3. From the determinant formula above, one can deduce the following
identity:
$$\underline{\epsilon}_{ijk}\, \det(\mathbf{M}) = \underline{\epsilon}_{abc}\, M^a{}_i\, M^b{}_j\, M^c{}_k \qquad (13.11)$$
The justification for this identity is a bit involved, so I will put it in a footnote.¹ The determinant can also be defined
for rank-2 tensors with raised or lowered indices. Particularly useful is the determinant of the metric tensor (which I
assume to be positive²):
$$|g| := \det(g_{mn}) = \underline{\epsilon}^{ijk}\, g_{i1}\, g_{j2}\, g_{k3} = \frac{1}{3!}\, \underline{\epsilon}^{ijk}\, \underline{\epsilon}^{abc}\, g_{ia}\, g_{jb}\, g_{kc} \qquad (13.14)$$
Since I require that there exists an inverse metric $g^{ij}$, the determinant is nonvanishing. What makes $|g|$ useful is the
fact that under a coordinate transformation, it yields the square of the Jacobian determinant:
$$|g'| = J^2\, |g| \qquad (13.15)$$
where the Jacobian determinant $J$ is defined as:
$$J := \det\left( \frac{\partial x^i}{\partial y^\mu} \right) \qquad (13.16)$$
I assume that the coordinate transformations do not involve reflections (such as $x \to -x$), so that $J > 0$. The
transformation property (13.15) may be inferred from the following property of determinants for $3 \times 3$ matrices
$\mathbf{M}$ and $\mathbf{N}$:
$$\det(\mathbf{M}\, \mathbf{N}) = \det(\mathbf{M})\, \det(\mathbf{N}) \qquad (13.17)$$
As you might imagine, the transformation property (13.15) will make $|g|$ particularly useful for constructing volume
integrals; I will briefly discuss this later on.
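As a quick check of (13.15) (my own aside; see also exercise XIII.9), the spherical-coordinate metric (10.3) is diagonal, so its determinant is simply the product of the diagonal entries:
$$|g'| = 1 \cdot r^2 \cdot r^2 \sin^2\theta = \left( r^2 \sin\theta \right)^2 = J^2\, |g|$$
which is indeed $J^2$ times the Cartesian metric determinant $|g| = \det(\delta_{ij}) = 1$, with Jacobian determinant $J = r^2 \sin\theta$.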
Now, I go back and compare the left-hand side of equation (13.9) with the identity (13.11). Since the $\frac{\partial x^i}{\partial y^\mu}$ form the
components of a matrix, the identity (13.11) implies:
$$J\; \underline{\epsilon}_{\mu\nu\rho} = \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \underline{\epsilon}_{ijk} \qquad (13.18)$$
Dividing both sides by the Jacobian determinant $J$, I obtain:
$$\underline{\epsilon}_{\mu\nu\rho} = \frac{1}{J}\, \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \underline{\epsilon}_{ijk} \qquad (13.19)$$
¹ To justify (13.11), I need to establish some properties of a completely antisymmetric rank-3 quantity $A_{ijk}$, which is defined to be a quantity
that satisfies the following property:
$$A_{ijk} = A_{jki} = A_{kij} = -A_{jik} = -A_{ikj} = -A_{kji} \qquad (13.12)$$
In 3 dimensions, the antisymmetry property constrains the value of $A_{ijk}$ so that it effectively has one independent component. To see
this, first note that if any two indices of $A_{ijk}$ have the same value, then $A_{ijk} = 0$. The only nonvanishing components are those for which
the indices of $A_{ijk}$ are even or odd permutations of 123, of which there are $3! = 6$. Equation (13.12) amounts to six constraints on the
nonvanishing components of $A_{ijk}$, which implies that the six nonvanishing components must all be equal to the same variable $\lambda$ or its
negative $-\lambda$. One can infer that in 3 dimensions, any completely antisymmetric quantity is proportional to $\underline{\epsilon}_{ijk}$, since $\underline{\epsilon}_{123} = 1$:
$$A_{ijk} = \lambda\, \underline{\epsilon}_{ijk} \qquad (13.13)$$
Now note that both sides of equation (13.11) are completely antisymmetric rank-3 quantities (each has three indices that aren't summed over); the
indices $\{i, j, k\}$ are antisymmetric by virtue of $\underline{\epsilon}_{ijk}$. This observation (that the indices $\{i, j, k\}$ are antisymmetric) tells us two important
facts: equation (13.11) is only nonvanishing if the indices $ijk$ are an even or odd permutation of 123, and the right-hand side of (13.11),
being antisymmetric, is proportional to $\underline{\epsilon}_{ijk}$. That the constant of proportionality is the determinant $\lambda = \det(\mathbf{M})$ comes
from contracting the right-hand side of (13.11) with $\underline{\epsilon}^{ijk}$, applying the identity $\underline{\epsilon}_{ijk}\, \underline{\epsilon}^{ijk} = 3!$, and recognizing the expression for the
determinant in (13.10).
² If you plan to study Special and General Relativity, be warned: while the metric tensor in Euclidean space has positive determinant, the
determinant of the metric tensor in relativistic spacetimes is negative, so the quantity $\sqrt{-g}$ is used in place of $\sqrt{|g|}$.

In this form, the permutation symbol $\underline{\epsilon}_{ijk}$ almost transforms like a tensor; only the factor of $1/J$ prevents (13.19)
from being a tensor transformation law.

Fortunately, there is a simple fix. Recalling that the determinant of the metric $|g| := \det(g_{mn})$ acquires a factor of
$J^2$ under a coordinate transformation (13.15), I can construct from $\underline{\epsilon}_{ijk}$ a quantity that transforms like a tensor by
multiplying $\underline{\epsilon}_{ijk}$ by $\sqrt{|g|}$:
$$\epsilon_{ijk} = \sqrt{|g|}\; \underline{\epsilon}_{ijk} \qquad (13.20)$$
The quantity $\epsilon_{ijk}$ (no underline!) is called the Levi-Civita tensor,¹ and it satisfies the following transformation law:
$$\epsilon'_{\mu\nu\rho} = \frac{\partial x^i}{\partial y^\mu} \frac{\partial x^j}{\partial y^\nu} \frac{\partial x^k}{\partial y^\rho}\, \epsilon_{ijk} \qquad (13.21)$$
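For instance (my own quick example), in spherical coordinates $\sqrt{|g|} = r^2 \sin\theta$, so the nonvanishing components of the Levi-Civita tensor are:
$$\epsilon_{r\theta\phi} = r^2 \sin\theta$$
together with the permutations of $(r, \theta, \phi)$, with signs given by (13.3).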
With the definition (13.20) for the Levi-Civita tensor $\epsilon_{ijk}$, I can now write the cross product (13.5) and curl (13.6) as
tensor equations:
$$[\vec A \times \vec B]^i = g^{ia}\, \epsilon_{ajk}\, A^j\, B^k = \sqrt{|g|}\; g^{ia}\, \underline{\epsilon}_{ajk}\, A^j\, B^k \qquad (13.22)$$
$$[\vec\nabla \times \vec A]^i = g^{ia}\, g^{jb}\, \epsilon_{ajk}\, \nabla_b A^k = \sqrt{|g|}\; g^{ia}\, g^{jb}\, \underline{\epsilon}_{ajk}\, \nabla_b A^k \qquad (13.23)$$
I conclude this section with a brief discussion of the volume integral. It is well known that the volume element
$dx^1\, dx^2\, dx^3$ acquires a factor of the Jacobian determinant $J$ under a coordinate transformation.² From equation
(13.15), it follows that the square root of $|g| := \det(g_{mn})$ acquires a factor of the Jacobian determinant:
$$\sqrt{|g'|} = J\, \sqrt{|g|} \qquad (13.24)$$
In Cartesian coordinates on Euclidean space, the metric $g_{ij}$ is just the Kronecker delta $\delta_{ij}$, which has a determinant
of 1. In curvilinear coordinates on Euclidean space, the determinant of the metric is just the square of the Jacobian
determinant. This motivates the following definition for the volume element $d^3V$ in curvilinear coordinates:
$$d^3V := \sqrt{|g|}\; dx^1\, dx^2\, dx^3 \qquad (13.25)$$
from which we obtain the following expression for the volume integral of a scalar function $f(x^a)$ in curvilinear
coordinates:
$$\int f(x^a)\; d^3V = \int f(x^a)\, \sqrt{|g|}\; dx^1\, dx^2\, dx^3. \qquad (13.26)$$
Exercise XIII.1

If you have taken a course in Electromagnetism, you should be familiar with Maxwell's equations. Write
down the vector form for Maxwell's equations in the standard vector notation; if you have a background
in physics, do it from memory! If not, look them up.³ Rewrite all of Maxwell's equations in tensor form.

Exercise XIII.2

Show (or convince yourself) of the following identities for the permutation symbol $\underline{\epsilon}_{ijk}$ (assuming 3 dimensions):
$$\underline{\epsilon}_{ijk}\, \underline{\epsilon}^{ijk} = 3! = 6 \qquad (13.27)$$
$$\underline{\epsilon}_{ijk}\, \underline{\epsilon}^{ljk} = 2\, \delta_i{}^l \qquad (13.28)$$
$$\underline{\epsilon}_{ijk}\, \underline{\epsilon}^{lmk} = \delta_i{}^l\, \delta_j{}^m - \delta_i{}^m\, \delta_j{}^l \qquad (13.29)$$

¹ Strictly speaking, it is not exactly a tensor, since it acquires an extra negative sign under coordinate transformations involving reflections
(parity transformations), which I ignore in this section. For this reason, the Levi-Civita tensor is sometimes called a pseudotensor.
² The full proof of this statement is beyond the scope of these notes; one can find a proof in [1] (see theorem 15.11), or one may provide a
justification through the formalism of differential forms, which is also beyond the scope of these notes.
³ You may find an example of Maxwell's equations in [12, 13].
Exercise XIII.3

Show that the second equality in (13.10) holds. In particular, show that (expand the sums):
$$\underline{\epsilon}_{ijk}\, M^i{}_1\, M^j{}_2\, M^k{}_3 = \frac{1}{3!}\, \underline{\epsilon}_{ijk}\, \underline{\epsilon}^{abc}\, M^i{}_a\, M^j{}_b\, M^k{}_c \qquad (13.30)$$

Exercise XIII.4

Consider an antisymmetric rank-2 tensor $A_{ij}$. Show that in 3 dimensions, the antisymmetry property
$A_{ij} = -A_{ji}$ reduces the number of independent components to 3.
Exercise XIII.5

Prove the following expressions by writing them out in tensor form:¹
$$\vec A \cdot (\vec B \times \vec C) = \vec B \cdot (\vec C \times \vec A) = \vec C \cdot (\vec A \times \vec B) \qquad (13.31)$$
$$\vec A \times (\vec B \times \vec C) = \vec B\, (\vec A \cdot \vec C) - \vec C\, (\vec A \cdot \vec B) \qquad (13.32)$$
$$\vec\nabla \times (\vec\nabla \times \vec A) = \vec\nabla\, (\vec\nabla \cdot \vec A) - \nabla^2 \vec A \qquad (13.33)$$

Exercise XIII.6

Recall that partial derivatives commute: $\frac{\partial}{\partial x^i} \frac{\partial}{\partial x^j} = \frac{\partial}{\partial x^j} \frac{\partial}{\partial x^i}$. Show that the following expressions hold in all
coordinate systems:²
$$\vec\nabla \cdot (\vec\nabla \times \vec A) = 0 \qquad (13.34)$$
$$\vec\nabla \times (\vec\nabla f) = 0 \qquad (13.35)$$
Exercise XIII.7

Write out the expression for the curl $\vec\nabla \times \vec v$ in cylindrical and spherical coordinates.

Exercise XIII.8

In two dimensions, the permutation symbol $\underline{\epsilon}_{ij}$ only has two indices: $\underline{\epsilon}_{12} = -\underline{\epsilon}_{21} = 1$. Write down the
two-dimensional version of the integral (13.26). Compute the Jacobian determinant $J$ (13.16) for the polar
coordinate functions (10.9) in exercise X.1, and also the metric determinant $|g|$; in doing so, show that
$J = \sqrt{|g|}$, and write down the explicit expression for the volume element in polar coordinates.

Exercise XIII.9

Work out the expression for the Jacobian determinant $J$ (13.16) for the spherical coordinate functions
(10.2), then do the same for $\sqrt{|g|}$ (recall exercise X.2). Check that $J = \sqrt{|g|}$, and write down the volume
element. You should get the familiar result:
$$d^3V = r^2 \sin\theta\; dr\, d\theta\, d\phi \qquad (13.36)$$
¹ Hint: The identities (13.27), (13.28), and (13.29) in exercise XIII.2 may be useful here.
² Hint: Write the equations in tensor form, then show that they hold in Cartesian coordinates. Think about the tensor transformation law
(compare your result with equation (9.16)).
XIV. SURFACE INTEGRALS, THE DIVERGENCE THEOREM AND STOKES' THEOREM

In this last section, I will review the formalism for surface integrals, and briefly describe how the tensor formalism
may be extended to the divergence theorem and Stokes' theorem.

There are two ways of defining a 2d surface in Euclidean space. The first is to define the 2d surface, which I call $\sigma$,
as a level surface of some scalar function $\Phi(x^a)$. In particular, points on the surface must have coordinate values
such that the following constraint is satisfied:
$$\Phi(x^a) = C \qquad (14.1)$$
where $C$ is some constant. This definition for the surface is useful, because the gradient of the function $\Phi(x^a)$ can be
used to obtain the components for the unit normal vector to the surface (see exercise XIV.1):
$$n^i = \frac{1}{\sqrt{g^{ab}\, \nabla_a \Phi\, \nabla_b \Phi}}\; g^{ik}\, \nabla_k \Phi \qquad (14.2)$$
where the right-hand side of equation (14.2) is evaluated at points that lie on the surface $\sigma$.
The other definition is parametric: in particular, I parameterize the surface with two parameters, $z^1$ and $z^2$, which
you can imagine to be coordinates for the surface $\sigma$. In fact, I will simply write $z^1$ and $z^2$ as $z^A$, with the convention
that capital Latin indices correspond to coordinate indices for the surface $\sigma$. The coordinates for points on the surface
are defined by the coordinate functions $x^i(z^A)$, which are explicitly given by:
$$x^1 = x^1(z^1, z^2)$$
$$x^2 = x^2(z^1, z^2) \qquad (14.3)$$
$$x^3 = x^3(z^1, z^2)$$
with $x^1$, $x^2$, and $x^3$ being coordinates in 3-dimensional Euclidean space. The parametric definition is useful because
it can be used to define the components for a metric tensor¹ $\gamma_{AB}$ on the surface $\sigma$:
$$\gamma_{AB} := \frac{\partial x^i}{\partial z^A} \frac{\partial x^j}{\partial z^B}\, g_{ij} \qquad (14.4)$$
The metric tensor $\gamma_{AB}$ is called the induced metric. It is not too difficult to see that $\gamma_{AB}$ is equivalent to the metric
tensor $g_{ij}$ for vectors tangent to the surface $\sigma$; given the components $T^A$ for a tangent vector on $\sigma$ in the basis $\frac{\partial}{\partial z^A}$,
I can construct the components $T^i$ for a tangent vector to $\sigma$ in the basis $\frac{\partial}{\partial x^i}$:
$$T^i = \frac{\partial x^i}{\partial z^A}\, T^A \qquad (14.5)$$
It follows from (14.5) that for vector components $T^A$ and $S^A$ in the basis $\frac{\partial}{\partial z^A}$, the induced metric yields the same
result as the metric $g_{ij}$ applied to the vector components $T^i$ and $S^i$ in the basis $\frac{\partial}{\partial x^i}$:
$$\gamma_{AB}\, T^A\, S^B = g_{ij}\, T^i\, S^j \qquad (14.6)$$
Since the induced metric $\gamma_{AB}$ is equivalent to $g_{ij}$ for tangent vectors, the following line element:
$$ds^2 = \gamma_{AB}\, dz^A\, dz^B \qquad (14.7)$$
is equivalent to the line element (6.9) constructed from $g_{ij}$ for distances along curves on the surface $\sigma$.
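As a simple illustration of my own (the sphere is treated in exercises XIV.5 and XIV.6), consider a cylinder of radius $R$, parameterized by $z^1 = \phi$ and $z^2 = z$ through the coordinate functions $x^1 = R \cos\phi$, $x^2 = R \sin\phi$, $x^3 = z$. Equation (14.4) then gives the induced metric and line element:
$$\gamma_{AB} = \begin{pmatrix} R^2 & 0 \\ 0 & 1 \end{pmatrix} \qquad\qquad ds^2 = R^2\, d\phi^2 + dz^2$$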
I now describe a strategy for establishing the relationship between the constraint definition (14.1) and the parametric
definition (14.3). One way to bridge the gap is to use an induced parameterization, in which two of the Euclidean
coordinates $(x^1, x^2, x^3)$ are used as parameters, so that the parameterization includes the following coordinate functions:
$$x^1 = z^1$$
$$x^2 = z^2 \qquad (14.8)$$

¹ The surface $\sigma$ may in general be curved; in general, there exists no coordinate system on $\sigma$ such that $\gamma_{AB} = \delta_{AB}$ everywhere on the
surface $\sigma$. Incidentally, if the surface $\sigma$ is curved, then it is an example of a non-Euclidean space.
The last coordinate function $x^3(z^1, z^2)$ is then obtained by solving the constraint equation $\Phi(x^a) = C$ for $x^3$. Of
course, the induced parameterization (14.8) assumes that $\sigma$ is not a surface of constant $x^1$ or $x^2$. Also, the induced
parameterization may only work on a portion of the surface, as the surface may contain more than one point that
has the same values for $x^1$ and $x^2$.
Given a metric tensor $\gamma_{AB}$ for the surface $\sigma$ and equation (13.26) for the volume integral, it is natural to construct
the surface integral in the following way:¹
$$\int f(z^A)\; d^2a = \int f(z^A)\, \sqrt{|\gamma|}\; dz^1\, dz^2 \qquad (14.9)$$
where the 2d metric determinant $|\gamma|$ is given by the following formula:
$$|\gamma| := \det(\gamma_{AB}) = \gamma_{11}\, \gamma_{22} - \gamma_{12}\, \gamma_{21} \qquad (14.10)$$
As argued earlier, the metric $\gamma_{AB}$ provides a measure of distance (in the form of the line element (14.7)) on the surface
that is equivalent to the measure of distance (in the form of the line element (6.9)) provided by the metric $g_{ij}$. From
this, one can infer that the area element in (14.9), being constructed from the determinant of $\gamma_{AB}$, is consistent with
the way we usually measure lengths in Euclidean space.

Given the components of the unit normal vector $n^i$ (14.2) and the area element in (14.9), I can define the directed
surface element:
$$d^2\Sigma_i := n_i\, \sqrt{|\gamma|}\; dz^1\, dz^2 \qquad (14.11)$$
For some region $U$ in Euclidean space with a 2d boundary surface $\partial U$, the divergence theorem for a vector field $v^i$
can then be written as:
$$\int_U \nabla_i v^i\, \sqrt{|g|}\; dx^1\, dx^2\, dx^3 = \oint_{\partial U} v^i\; d^2\Sigma_i \qquad (14.12)$$
Now consider a 2d surface $\sigma$ bounded by the closed 1d path $\partial\sigma$. The closed 1d path may be described by a
parameterized curve $x^i(s)$, where $s$ is some parameter. We may write the differential $dx^i$ for the 1d path as:
$$dx^i = \frac{dx^i}{ds}\; ds \qquad (14.13)$$
The classical² Stokes theorem for a vector field $v^i$ can then be written as:
$$\int_\sigma g^{ia}\, g^{jb}\, \epsilon_{ajk}\, \nabla_b\, v^k\; d^2\Sigma_i = \oint_{\partial\sigma} v_i\; dx^i \qquad (14.14)$$
where $\epsilon_{ajk}$ is the Levi-Civita tensor (13.20), and the line integral on the right-hand side is performed in the right-handed
direction with respect to the directed surface element $d^2\Sigma_i$.
Exercise XIV.1

Convert the gradient of a function $\Phi(x^a)$ to a vector (with raised indices) $G^i$. Take the inner product of
the gradient vector $G^i$ with another vector $T^i$, and simplify it to obtain a familiar result. Do you recognize
the result? Use this result to argue that if the vector $T^i$ is tangent to the level surfaces (the surfaces
of constant $\Phi(x^a)$) of the function $\Phi(x^a)$, the inner product $g_{ij}\, G^i\, T^j$ vanishes. Since the inner product
between the gradient vector $G^i$ and a tangent vector $T^i$ vanishes, this demonstrates that the gradient vector
$G^i$ is normal to the level surfaces of $\Phi(x^a)$.

Exercise XIV.2

Briefly explain why I need to divide by $\sqrt{g^{ab}\, \nabla_a \Phi\, \nabla_b \Phi}$ on the right-hand side of the formula (14.2) for the
components of the unit normal vector $n^i$.
Exercise XIV.3

Rewrite equations (14.12) and the (classical) Stokes theorem (14.14) in the (Gibbs-Heaviside) form used
in elementary vector calculus courses.³

¹ The 2d surface element $\sqrt{|\gamma|}\; dz^1\, dz^2$ can be derived from the 3d volume element (13.25) by way of differential forms, but differential forms
are beyond the scope of these notes.
² I say classical Stokes theorem to distinguish (14.14) from the generalized Stokes theorem, which is expressed with differential forms.
Further discussion of the generalized Stokes theorem may be found in many textbooks on General Relativity, such as [5, 8, 9, 18, 21, 31, 32],
and also in the books [2, 3, 11, 16, 19, 20, 27].
³ See, for instance, equations 10.17 and 11.9 in [4], or equations II-30 and III-13 in [26].
Exercise XIV.4

Rewrite the integral version of Maxwell's equations in tensor form.

Exercise XIV.5

Consider the usual Cartesian coordinates $x$, $y$, $z$ on Euclidean space. The function $\Phi$ for a sphere is given
by:
$$\Phi = x^2 + y^2 + z^2 \qquad (14.15)$$
and the constraint is:
$$\Phi = r^2 \quad\Leftrightarrow\quad x^2 + y^2 + z^2 = r^2 \qquad (14.16)$$
for some constant $r$. Construct an induced parameterization for the sphere. Specifically, parameterize the
sphere with the parameters $p$ and $q$, and write down the two coordinate functions:
$$x(p, q) = p$$
$$y(p, q) = q \qquad (14.17)$$
Solve equation (14.16) for $z$ to obtain the function $z(x, y)$, and use (14.17) to obtain the function $z(p, q)$.
Now take the following derivatives:
$$\frac{\partial x}{\partial p},\; \frac{\partial x}{\partial q},\; \frac{\partial y}{\partial p},\; \frac{\partial y}{\partial q},\; \frac{\partial z}{\partial p},\; \frac{\partial z}{\partial q} \qquad (14.18)$$
Finally, use your result for (14.18) with (14.4) to obtain the induced metric $\gamma_{AB}$ for the sphere.
Exercise XIV.6

Again, find the induced metric $\gamma_{AB}$ for the sphere, but this time, start in spherical coordinates (as defined
in (10.2)) on Euclidean space. What is the constraint function $\Phi(r, \theta, \phi)$?¹ It is appropriate to parameterize
the surface using $\theta$ and $\phi$. You should obtain the following result for the metric components:
$$\gamma_{\theta\theta} = r^2 \qquad\qquad \gamma_{\phi\phi} = r^2 \sin^2\theta \qquad (14.19)$$
Now take the determinant of $\gamma_{AB}$ to obtain the surface element, then perform the following integral over
the entire sphere to obtain the surface area:
$$A = \int \sqrt{|\gamma|}\; d\theta\, d\phi \qquad (14.20)$$
and check that your result is consistent with the surface area of the sphere.
Exercise XIV.7

Given a matrix $\mathbf{M}$ dependent on some variable $s$, the Jacobi formula for the derivative of the determinant
is the following:
$$\frac{\partial}{\partial s} \left( \det(\mathbf{M}) \right) = \det(\mathbf{M})\; (M^{-1})^j{}_i\; \frac{\partial M^i{}_j}{\partial s} \qquad (14.21)$$
where $(M^{-1})^j{}_i$ form the components of the inverse matrix $\mathbf{M}^{-1}$. Use this to infer a similar Jacobi determinant
formula for the derivative of $|g|$ with respect to some variable $s$. Use the result to show the following:
$$\frac{\partial \sqrt{|g|}}{\partial x^j} = \sqrt{|g|}\; \Gamma^a{}_{ja} \qquad (14.22)$$
Use (14.22) to show that:
$$\sqrt{|g|}\; \nabla_i\, v^i = \frac{\partial}{\partial x^i} \left( \sqrt{|g|}\; v^i \right) \qquad (14.23)$$
XV. FURTHER READING

As I mentioned in my introductory remarks, these notes are meant to be self-contained, and I have done my best
to build the subject from first principles. Though I have written these notes for a broad audience, these notes may
be unsatisfactory to some; they may not be pedagogical or rigorous enough to satisfy everyone. Those familiar with
early versions of these notes can attest to the significant changes I have made; it turns out that over time, even I
can become dissatisfied with my notes! I must also mention that these notes do not form a complete discussion of
tensor analysis; I have omitted many important topics, such as differential forms (and the associated formalism), Lie
differentiation, and of course, Riemannian geometry.

If you wish to learn more about tensors and their applications (or if you find these notes lacking), I have compiled
a list of books, many of which I have cited throughout these notes. This list is by no means comprehensive, and
mainly reflects my background in General Relativity; I am certain that I have only sampled a small fraction of the
literature which discusses tensors and their applications. The books contained in this list, which form a (rather large)
subset of the references on the following page, are chosen because they either contain a more complete discussion of
the material contained in these notes, contain an extensive discussion of the tensor formalism, or discuss applications
of the tensor formalism in geometry and physics.
Books on Tensors
Synge, J. and Schild, A. (1949). Tensor Calculus: by J.L. Synge and A. Schild. University Press
Jeevanjee, N. (2015). An Introduction to Tensors and Group Theory for Physicists. Springer
Lawden, D. (2012). An Introduction to Tensor Calculus: Relativity and Cosmology. Dover
Lovelock, D. and Rund, H. (1989). Tensors, Differential Forms, and Variational Principles. Dover
Bishop, R. and Goldberg, S. (1968). Tensor Analysis on Manifolds. Dover
Books on Physics and Mathematics
Matzner, R. A. and Shepley, L. C. (1991). Classical Mechanics. Prentice Hall
Schutz, B. (1980). Geometrical Methods of Mathematical Physics. Cambridge University Press
Baez, J. C. and Muniain, J. (1994). Gauge Fields, Knots, and Gravity. World Scientific
Eisenhart, L. (1925). Riemannian Geometry. Princeton University Press
Frankel, T. (2011). The Geometry of Physics: An Introduction. Cambridge University Press
Nash, C. and Sen, S. (2011). Topology and Geometry for Physicists. Dover
Nakahara, M. (2003). Geometry, Topology and Physics, Second Edition. Taylor & Francis
Penrose, R. (2004). The Road to Reality: A Complete Guide to the Laws of the Universe. Random House
General Relativity Textbooks
Schutz, B. (1985). A First Course in General Relativity. Cambridge University Press
Choquet-Bruhat, Y. (2015). Introduction to General Relativity, Black Holes, and Cosmology. Oxford
University Press
Dray, T. (2014). Differential Forms and the Geometry of General Relativity. Taylor & Francis
Zee, A. (2013). Einstein Gravity in a Nutshell. Princeton University Press
Carroll, S. (2004). Spacetime and Geometry: An Introduction to General Relativity. Addison Wesley
Raychaudhuri, A., Banerji, S., and Banerjee, A. (1992). General Relativity, Astrophysics, and Cosmology.
Springer
Padmanabhan, T. (2010). Gravitation: Foundations and Frontiers. Cambridge University Press
Weinberg, S. (1972). Gravitation and cosmology: principles and applications of the general theory of
relativity. Wiley
Wald, R. (1984). General Relativity. University of Chicago Press
Poisson, E. (2004). A Relativists Toolkit: The Mathematics of Black-Hole Mechanics. Cambridge Uni-
versity Press
Ciufolini, I. and Wheeler, J. (1995). Gravitation and Inertia. Princeton University Press
Misner, C. W., Thorne, K. S., and Wheeler, J. A. (1973). Gravitation. W. H. Freeman
ACKNOWLEDGMENTS
My advisor Richard Matzner is responsible for most of my knowledge of the tensor formalism, and I thank him
for his feedback on earlier versions of these notes. Austin Gleeson, Richard Hazeltine and Philip Morrison taught
me some useful identities and introduced me to some interesting applications for the tensor formalism beyond that
of General Relativity. Early in my graduate career, Lawrence Shepley and Philip Morrison organized a seminar in
differential geometry, which further expanded my knowledge and understanding of tensors. I thank Luis Suazo for
his visualization of dual vectors/one-forms, and for helping me to think more carefully about the meaning of the
formalism. I also thank Brandon Furey for alerting me to some definitions that students may not be aware of. Special
thanks to all those that attended the lectures I gave for a crash course in General Relativity, and to all who provided
suggestions, typo fixes and feedback (whether in person or by correspondence). Among them are Mark Baumann,
Sophia Bogat, Alex Buchanan, Bryce Burchak, Joel Doss, Blake Duschatko, Justin Kang, Dave Klein, Baruch Garcia,
Bryton Hall, Alan Myers, Avery Pawelek, Jean-Jacq du Plessis, Mark Selover, Zachariah Rollins, Lucas Spencer,
Romin Stuart-Rasi, Paul Walter, Sam Wang and Shiv Akshar Yadavalli.
REFERENCES