Lecture Notes For Advanced Calculus: James S. Cook Liberty University

Lecture Notes for Advanced Calculus

James S. Cook
Liberty University
Department of Mathematics

Fall 2011

introduction and motivations for these notes


There are many excellent texts on portions of this subject. However, the particular path I choose
this semester is not quite in line with any particular text. I required the text on Advanced
Calculus by Edwards because it contains all the major theorems that traditionally are covered in an
Advanced Calculus course.

My focus differs significantly. If I had students who had already completed a semester in real
analysis then we could delve into the more analytic aspects of the subject. However, real analysis
is not a prerequisite so we take a different path. Generically the story is as follows: a linear
approximation replaces a complicated object very well so long as we are close to the base-point
for the approximation. The first level of understanding is what I would characterize as algebraic,
beyond that is the analytic understanding. I would argue that we must first have a firm grasp of
the algebraic before we can properly attack the analytic aspects of the subject.

Edwards covers both the algebraic and the analytic. This makes his text hard to read in places
because the full story is at some points technical. My goal is to focus on the algebraic. That said,
I will try to at least point the reader to the section of Edwards where the proof can be found.

Linear algebra is not a prerequisite for this course. However, I will use linear algebra. Matrices,
linear transformations and vector spaces are necessary ingredients for a proper discussion of
advanced calculus. I believe an interested student can easily assimilate the needed tools as we go so I
am not terribly worried if you have not had linear algebra previously. I will make a point to include
some baby¹ linear exercises to make sure everyone who is working at this course keeps up with the
story that unfolds.

Doing the homework is doing the course. I cannot overemphasize the importance of thinking
through the homework. I would be happy if you left this course with a working knowledge of:

✓ set-theoretic mapping language, fibers and images and how to picture relationships
diagrammatically.

✓ continuity in view of the metric topology in n-space.

✓ the concept and application of the derivative and differential of a mapping.

✓ continuous differentiability

✓ inverse function theorem

✓ implicit function theorem

✓ tangent space and normal space via gradients


¹ if you view this as an insult then you haven’t met the right babies yet. Baby exercises are cute.

✓ extrema for multivariate functions, critical points and the Lagrange multiplier method

✓ multivariate Taylor series.

✓ quadratic forms

✓ critical point analysis for multivariate functions

✓ dual space and the dual basis.

✓ multilinear algebra.

✓ metric dualities and Hodge duality.

✓ the work and flux form mappings for ℝ3 .

✓ basic manifold theory

✓ vector fields as derivations.

✓ Lie series and how vector fields generate symmetries

✓ differential forms and the exterior derivative

✓ integration of forms

✓ generalized Stokes’s Theorem.

✓ surfaces

✓ fundamental forms and curvature for surfaces

✓ differential form formulation of classical differential geometry

✓ some algebra and calculus of supermathematics

Before we begin, I should warn you that I assume quite a few things from the reader. These notes
are intended for someone who has already grappled with the problem of constructing proofs. I
assume you know the difference between ⇒ and ⇔. I assume the phrase "iff" is known to you.
I assume you are ready and willing to do a proof by induction, strong or weak. I assume you
know what ℝ, ℂ, ℚ, ℕ and ℤ denote. I assume you know what a subset of a set is. I assume you
know how to prove two sets are equal. I assume you are familiar with basic set operations such
as union and intersection (although we don’t use those much). More importantly, I assume you
have started to appreciate that mathematics is more than just calculations. Calculations without
context, without theory, are doomed to failure. At a minimum, theory and proper mathematics
allow you to communicate analytical concepts to other like-educated individuals.

Some of the most seemingly basic objects in mathematics are insidiously complex. We’ve been
taught they’re simple since our childhood, but as adults, mathematical adults, we find the actual
definitions of such objects as ℝ or ℂ are rather involved. I will not attempt to provide foundational
arguments to build numbers from basic set theory. I believe it is possible, I think it’s well-thought-
out mathematics, but we take the existence of the real numbers as an axiom for these notes. We
assume that ℝ exists and that the real numbers possess all their usual properties. In fact, I assume
ℝ, ℂ, ℚ, ℕ and ℤ all exist complete with their standard properties. In short, I assume we have
numbers to work with. We leave the rigorization of numbers to a different course.

The format of these notes is similar to that of my calculus and linear algebra and advanced calculus
notes from 2009-2011. However, I will make a number of definitions in the body of the text. Those
sort of definitions are typically background-type definitions and I will make a point of putting them
in bold so you can find them with ease.

I have avoided use of Einstein’s implicit summation notation in the majority of these notes. This
has introduced some clutter in calculations, but I hope the student finds the added detail helpful.
Naturally if one goes on to study tensor calculations in physics then no such luxury is granted, you
will have to grapple with the meaning of Einstein’s convention. I suspect that is a minority in this
audience so I took that task off the to-do list for this course.

The content of this course differs somewhat from my previous offering. The presentation of
geometry and manifolds is almost entirely altered. Also, I have removed the chapter on Newtonian
mechanics as well as the later chapter on variational calculus. Naturally, the interested student is
invited to study those as independent studies past this course. If interested please ask.

I should mention that James Callahan’s Advanced Calculus: a geometric view has influenced my
thinking in this reformulation of my notes. His discussion of Morse’s work was a useful addition to
the critical point analysis.

I was inspired by Flanders’ text on differential form computation. It is my goal to implement some
of his nicer calculations as an addition to my previous treatment of differential forms. In addition,
I intend to incorporate material from Burns and Gidea’s Differential Geometry and Topology with
a View to Dynamical Systems as well as Munkres’ Analysis on Manifolds. These additions should
greatly improve the depth of the manifold discussion. I intend to go significantly deeper this year
so the student can perhaps begin to appreciate manifold theory.

I plan to take the last few weeks of class to discuss supermathematics. This will serve as a sideways
review for calculus on ℝ𝑛 . In addition, I hope the exercise of abstracting calculus to supernumbers
gives you some ideas about the process of abstraction in general. Abstraction is a cornerstone of
modern mathematics and it is an essential skill for a mathematician. We may also discuss some of
the historical motivations and modern applications of supermath to supersymmetric field theory.

NOTE TO BE DELETED:
-add pictures from 2009 notes.
-change equations to numbered equations.

Contents

1 set-up
1.1 set theory
1.2 vectors and geometry for $n$-dimensional space
1.2.1 vector algebra for three dimensions
1.2.2 compact notations for vector arithmetic
1.3 functions
1.4 elementary topology and limits

2 linear algebra
2.1 vector spaces
2.2 matrix calculation
2.3 linear transformations
2.3.1 a gallery of linear transformations
2.3.2 standard matrices
2.3.3 coordinates and isomorphism
2.4 normed vector spaces

3 differentiation
3.1 the differential
3.2 partial derivatives and the existence of the differential
3.2.1 directional derivatives
3.2.2 continuously differentiable, a cautionary tale
3.2.3 gallery of derivatives
3.3 additivity and homogeneity of the derivative
3.4 chain rule
3.5 product rules?
3.5.1 scalar-vector product rule
3.5.2 calculus of paths in $\mathbb{R}^3$
3.5.3 calculus of matrix-valued functions of a real variable
3.5.4 calculus of complex-valued functions of a real variable
3.6 complex analysis in a nutshell
3.6.1 harmonic functions

4 inverse and implicit function theorems
4.1 inverse function theorem
4.2 implicit function theorem
4.3 implicit differentiation

5 geometry of level sets
5.1 definition of level set
5.2 tangents and normals to a level set
5.3 method of Lagrange multipliers

6 critical point analysis for several variables
6.1 multivariate power series
6.1.1 taylor’s polynomial for one-variable
6.1.2 taylor’s multinomial for two-variables
6.1.3 taylor’s multinomial for many-variables
6.2 a brief introduction to the theory of quadratic forms
6.2.1 diagonalizing forms via eigenvectors
6.3 second derivative test in many-variables
6.3.1 morse theory and future reading

7 multilinear algebra
7.1 dual space
7.2 multilinearity and the tensor product
7.2.1 bilinear maps
7.2.2 trilinear maps
7.2.3 multilinear maps
7.3 wedge product
7.3.1 wedge product of dual basis generates basis for $\Lambda V$
7.3.2 the exterior algebra
7.3.3 connecting vectors and forms in $\mathbb{R}^3$
7.4 bilinear forms and geometry; metric duality
7.4.1 metric geometry
7.4.2 metric duality for tensors
7.4.3 inner products and induced norm
7.5 hodge duality
7.5.1 hodge duality in euclidean space $\mathbb{R}^3$
7.5.2 hodge duality in minkowski space $\mathbb{R}^4$
7.6 coordinate change
7.6.1 coordinate change for 𝑇20 (𝑉 )

8 manifold theory
8.1 manifolds
8.1.1 embedded manifolds
8.1.2 manifolds defined by charts
8.1.3 diffeomorphism
8.2 tangent space
8.2.1 equivalence classes of curves
8.2.2 contravariant vectors
8.2.3 derivations
8.2.4 dictionary between formalisms
8.3 the differential
8.4 cotangent space
8.5 tensors at a point
8.6 tensor fields
8.7 metric tensor
8.7.1 classical metric notation in $\mathbb{R}^m$
8.7.2 metric tensor on a smooth manifold
8.8 on boundaries and submanifolds

9 differential forms
9.1 algebra of differential forms
9.2 exterior derivatives: the calculus of forms
9.2.1 coordinate independence of exterior derivative
9.2.2 exterior derivatives on $\mathbb{R}^3$
9.3 pullbacks
9.4 integration of differential forms
9.4.1 integration of $k$-form on $\mathbb{R}^k$
9.4.2 orientations and submanifolds
9.5 Generalized Stokes Theorem
9.6 poincare’s lemma and converse
9.6.1 exact forms are closed
9.6.2 potentials for closed forms
9.7 classical differential geometry in forms
9.8 E & M in differential form
9.8.1 differential forms in Minkowski space
9.8.2 exterior derivatives of charge forms, field tensors, and their duals
9.8.3 coderivatives and comparing to Griffiths’ relativistic E & M
9.8.4 Maxwell’s equations are relativistically covariant
9.8.5 Electrostatics in Five dimensions

10 supermath
Chapter 1

set-up

In this chapter we settle some basic terminology about sets and functions.

1.1 set theory


Let us denote sets by capital letters in as much as is possible. Often the lower-case letter of the
same symbol will denote an element; $a \in A$ is to mean that the object $a$ is in the set $A$. We can
abbreviate $a_1 \in A$ and $a_2 \in A$ by simply writing $a_1, a_2 \in A$; this is a standard notation. The union
of two sets $A$ and $B$ is denoted $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$. The intersection of two sets is
denoted $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$. If a set $S$ has no elements then we say $S$ is the empty set
and denote this by writing $S = \emptyset$. It is sometimes convenient to use unions or intersections of several
sets:
$$\bigcup_{\alpha \in \Lambda} U_\alpha = \{x \mid \text{there exists } \alpha \in \Lambda \text{ with } x \in U_\alpha\}$$
$$\bigcap_{\alpha \in \Lambda} U_\alpha = \{x \mid \text{for all } \alpha \in \Lambda \text{ we have } x \in U_\alpha\}$$
We say $\Lambda$ is the index set in the definitions above. If $\Lambda$ is a finite set then the union/intersection
is said to be a finite union/intersection. If $\Lambda$ is a countable set then the union/intersection is said
to be a countable union/intersection¹. Suppose $A$ and $B$ are both sets then we say $A$ is a subset
of $B$ and write $A \subseteq B$ iff $a \in A$ implies $a \in B$ for all $a \in A$. If $A \subseteq B$ then we also say $B$ is a
superset of $A$. If $A \subseteq B$ then we say $A \subset B$ iff $A \neq B$ and $A \neq \emptyset$. Recall, for sets $A, B$ we define
$A = B$ iff $a \in A$ implies $a \in B$ for all $a \in A$ and conversely $b \in B$ implies $b \in A$ for all $b \in B$. This
is equivalent to insisting $A = B$ iff $A \subseteq B$ and $B \subseteq A$. The difference of two sets $A$ and $B$ is
denoted $A - B$ and is defined by $A - B = \{a \in A \mid a \notin B\}$².

¹ recall the term countable simply means there exists a bijection to the natural numbers. The cardinality of such
a set is said to be $\aleph_0$.
² other texts sometimes use the notation $A \setminus B$ for $A - B$.
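The set operations defined above can be tried out on small finite sets. The following is a minimal Python sketch, not part of the original notes; the particular sets, the index set `Lambda` and the helper `sets_equal` are illustrative assumptions chosen only to mirror the definitions.

```python
# Finite sets modeled with Python's built-in set type (illustrative values).
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B           # {x | x in A or x in B}
intersection = A & B    # {x | x in A and x in B}
difference = A - B      # {a in A | a not in B}

# An indexed family U_alpha with (hypothetical) index set Lambda = {1, 2, 3}.
Lambda = {1, 2, 3}
U = {alpha: set(range(alpha, alpha + 3)) for alpha in Lambda}

# x is in the union iff some alpha has x in U_alpha.
big_union = set().union(*U.values())
# x is in the intersection iff every alpha has x in U_alpha.
big_intersection = set.intersection(*U.values())


def sets_equal(X, Y):
    """A = B iff A is a subset of B and B is a subset of A."""
    return X.issubset(Y) and Y.issubset(X)


print(union, intersection, difference)
print(big_union, big_intersection)
print(sets_equal(A, {4, 3, 2, 1}))
```

Checking equality through two subset tests is deliberately redundant here (Python's `==` would do); it is written that way to match the double-inclusion argument used in proofs.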

We often make use of the following standard sets:

natural numbers (positive integers); $\mathbb{N} = \{1, 2, 3, \dots\}$.

natural numbers up to the number $n$; $\mathbb{N}_n = \{1, 2, 3, \dots, n-1, n\}$.

integers; $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$. Note, $\mathbb{Z}_{>0} = \mathbb{N}$.

non-negative integers; $\mathbb{Z}_{\geq 0} = \{0, 1, 2, \dots\} = \mathbb{N} \cup \{0\}$.

negative integers; $\mathbb{Z}_{<0} = \{-1, -2, -3, \dots\} = -\mathbb{N}$.

rational numbers; $\mathbb{Q} = \{\frac{p}{q} \mid p, q \in \mathbb{Z}, q \neq 0\}$.

irrational numbers; $\mathbb{J} = \{x \in \mathbb{R} \mid x \notin \mathbb{Q}\}$.

open interval from $a$ to $b$; $(a, b) = \{x \mid a < x < b\}$.

half-open interval; $(a, b] = \{x \mid a < x \leq b\}$ or $[a, b) = \{x \mid a \leq x < b\}$.

closed interval; $[a, b] = \{x \mid a \leq x \leq b\}$.

The final, and for us the most important, construction in set-theory is called the Cartesian product.
Let 𝐴, 𝐵, 𝐶 be sets, we define:

𝐴 × 𝐵 = {(𝑎, 𝑏) ∣ 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵}

By a slight abuse of notation³ we also define:

$$A \times B \times C = \{(a, b, c) \mid a \in A \text{ and } b \in B \text{ and } c \in C\}$$

In the case the sets comprising the Cartesian product are the same we use an exponential notation
for the construction:
$$A^2 = A \times A, \qquad A^3 = A \times A \times A$$
We can extend to finitely many sets. Suppose $A_i$ is a set for $i = 1, 2, \dots, n$ then we denote the
Cartesian product by
$$A_1 \times A_2 \times \cdots \times A_n = \times_{i=1}^{n} A_i$$
and define $\vec{x} \in \times_{i=1}^{n} A_i$ iff $\vec{x} = (a_1, a_2, \dots, a_n)$ where $a_i \in A_i$ for each $i = 1, 2, \dots, n$. An element $\vec{x}$
as above is often called an n-tuple.
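For finite sets the Cartesian product can be enumerated directly. The sketch below is an illustration only, assuming small example sets `A`, `B`, `C` and a hypothetical helper `in_product` that restates the membership condition for an n-tuple; it uses `itertools.product` from the Python standard library.

```python
from itertools import product

# Small illustrative sets (assumed for the example, not from the notes).
A = {1, 2}
B = {"p", "q"}
C = {"z"}

AxB = set(product(A, B))             # ordered pairs (a, b)
AxBxC = set(product(A, B, C))        # flat triples (a, b, c)
A_cubed = set(product(A, repeat=3))  # A x A x A


def in_product(x, factors):
    """x is in A_1 x ... x A_n iff x is an n-tuple with x[i] in A_i for each i."""
    return len(x) == len(factors) and all(xi in Ai for xi, Ai in zip(x, factors))


print(sorted(AxB))
print(len(A_cubed))
print(in_product((2, "q", "z"), [A, B, C]))
```

Order matters in a tuple: `(1, "p")` belongs to `AxB` while `("p", 1)` does not.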

We define $\mathbb{R}^2 = \{(x, y) \mid x, y \in \mathbb{R}\}$. I refer to $\mathbb{R}^2$ as "R-two" in conversational mathematics.
Likewise, "R-three" is defined by $\mathbb{R}^3 = \{(x, y, z) \mid x, y, z \in \mathbb{R}\}$. We are ultimately interested in studying
"R-n" where $\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R} \text{ for } i = 1, 2, \dots, n\}$. In this course if we consider $\mathbb{R}^m$
it is assumed from the context that $m \in \mathbb{N}$.

³ technically $A \times (B \times C) \neq (A \times B) \times C$ since objects of the form $(a, (b, c))$ are not the same as $((a, b), c)$; we
ignore these distinctions and map both of these to the triple $(a, b, c)$ without ambiguity in what follows.

In terms of Cartesian products you can imagine the $x$-axis as the number line; then if we paste
another number line at each $x$ value the union of all such lines constructs the plane; this is the
picture behind $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$. Another interesting Cartesian product is the unit-square; $[0, 1]^2 =
[0, 1] \times [0, 1] = \{(x, y) \mid 0 \leq x \leq 1, 0 \leq y \leq 1\}$. Sometimes a rectangle in the plane with its edges
included can be written as $[x_1, x_2] \times [y_1, y_2]$. If we want to remove the edges use $(x_1, x_2) \times (y_1, y_2)$.

Moving to three dimensions we can construct the unit-cube as $[0, 1]^3$. A generic rectangular
solid can sometimes be represented as $[x_1, x_2] \times [y_1, y_2] \times [z_1, z_2]$ or if we delete the boundary faces:
$(x_1, x_2) \times (y_1, y_2) \times (z_1, z_2)$.

1.2 vectors and geometry for 𝑛-dimensional space


Definition 1.2.1.

Let $n \in \mathbb{N}$, we define $\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_j \in \mathbb{R} \text{ for } j = 1, 2, \dots, n\}$. If $v \in \mathbb{R}^n$
then we say $v$ is an n-vector. The numbers in the vector are called the components;
$v = (v_1, v_2, \dots, v_n)$ has $j$-th component $v_j$.

Notice, a consequence of the definition above and the construction of the Cartesian product⁴ is that
two vectors $v$ and $w$ are equal iff $v_j = w_j$ for all $j \in \mathbb{N}_n$. Equality of two vectors is only true if all
components are found to match. It is therefore logical to define addition and scalar multiplication
in terms of the components of vectors as follows:

Definition 1.2.2.

Define functions $+ : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ and $\cdot : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ by the following rules: for each
$v, w \in \mathbb{R}^n$ and $c \in \mathbb{R}$:

$$(1.)\ (v + w)_j = v_j + w_j \qquad (2.)\ (cv)_j = c v_j$$

for all $j \in \{1, 2, \dots, n\}$. The operation $+$ is called vector addition and it takes two
vectors $v, w \in \mathbb{R}^n$ and produces another vector $v + w \in \mathbb{R}^n$. The operation $\cdot$ is called scalar
multiplication and it takes a number $c \in \mathbb{R}$ and a vector $v \in \mathbb{R}^n$ and produces another
vector $c \cdot v \in \mathbb{R}^n$. Often we simply denote $c \cdot v$ by juxtaposition $cv$.

If you are gifted at visualization then perhaps you can add three-dimensional vectors in your
mind. If your mind is really unhinged maybe you can even add 4 or 5 dimensional vectors. The
beauty of the definition above is that we have no need of pictures. Instead, algebra will do just
fine. That said, let’s draw a few pictures.
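Because the two operations are defined component by component, they translate almost word for word into code. Here is a minimal Python sketch under the assumption that an n-vector is stored as a tuple of numbers; the names `vec_add` and `scalar_mul` are hypothetical helpers, not notation from the notes.

```python
def vec_add(v, w):
    """(v + w)_j = v_j + w_j for each component j."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(vj + wj for vj, wj in zip(v, w))


def scalar_mul(c, v):
    """(c v)_j = c * v_j for each component j."""
    return tuple(c * vj for vj in v)


# Four-dimensional vectors are no harder than three-dimensional ones.
v = (1, 2, 3, 4)
w = (10, 20, 30, 40)
print(vec_add(v, w))
print(scalar_mul(3, v))
```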

⁴ see my Math 200 notes or ask me if interested, it’s not entirely trivial

Notice these pictures go to show how you can break down vectors into component vectors which
point in the direction of the coordinate axes. Vectors of length one which point in the coordinate
directions make up what is called the standard basis⁵. It is convenient to define special notation
for the standard basis. First I define a useful shorthand,
Definition 1.2.3.

The symbol $\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$ is called the Kronecker delta.

For example, $\delta_{22} = 1$ while $\delta_{12} = 0$.


Definition 1.2.4.
Let $e_i \in \mathbb{R}^{n \times 1}$ be defined by $(e_i)_j = \delta_{ij}$. The size of the vector $e_i$ is determined by context.
We call $e_i$ the $i$-th standard basis vector.

Example 1.2.5. Let me expand on what I mean by "context" in the definition above:
In $\mathbb{R}$ we have $e_1 = (1) = 1$ (by convention we drop the brackets in this case).
In $\mathbb{R}^2$ we have $e_1 = (1, 0)$ and $e_2 = (0, 1)$.
In $\mathbb{R}^3$ we have $e_1 = (1, 0, 0)$ and $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$.
In $\mathbb{R}^4$ we have $e_1 = (1, 0, 0, 0)$ and $e_2 = (0, 1, 0, 0)$ and $e_3 = (0, 0, 1, 0)$ and $e_4 = (0, 0, 0, 1)$.

⁵ the term "basis" is carefully developed in the linear algebra course. In a nutshell we need two things: (1.) the
basis has to be big enough that we can add together the basis elements to make anything in the set (2.) the basis is
minimal so no single element in the basis can be formed by adding together other basis elements
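The role of "context" can be made concrete by passing the dimension as an explicit argument. The following Python sketch is only an illustration; the function names `delta` and `e` and the choice to store vectors as tuples are assumptions made for the example.

```python
def delta(i, j):
    """Kronecker delta: 1 when i == j, otherwise 0."""
    return 1 if i == j else 0


def e(i, n):
    """i-th standard basis vector of R^n, defined by (e_i)_j = delta_ij."""
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    return tuple(delta(i, j) for j in range(1, n + 1))


# The same symbol e_1 names a different tuple in each dimension n.
for n in (1, 2, 3, 4):
    print(n, [e(i, n) for i in range(1, n + 1)])
```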

A real linear combination of $\{v_1, v_2, \dots, v_k\}$ is simply a finite weighted-sum of the objects from
the set; $c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$ where $c_1, c_2, \dots, c_k \in \mathbb{R}$. If we take coefficients $c_1, c_2, \dots, c_k \in \mathbb{C}$ then
it is said to be a complex linear combination. I invite the reader to verify that every vector in
$\mathbb{R}^n$ is a linear combination of $e_1, e_2, \dots, e_n$⁶. It is not difficult to prove the following properties for
vector addition and scalar multiplication: for all $x, y, z \in \mathbb{R}^n$ and $a, b \in \mathbb{R}$,

(i.) $x + y = y + x$, (ii.) $(x + y) + z = x + (y + z)$
(iii.) $x + 0 = x$, (iv.) $x - x = 0$
(v.) $1x = x$, (vi.) $(ab)x = a(bx)$,
(vii.) $a(x + y) = ax + ay$, (viii.) $(a + b)x = ax + bx$
(ix.) $x + y \in \mathbb{R}^n$ (x.) $ax \in \mathbb{R}^n$

These properties of $\mathbb{R}^n$ are abstracted in linear algebra to form the definition of an abstract vector
space. Naturally $\mathbb{R}^n$ is a vector space, in fact it is the quintessential model for all other vector spaces.
Fortunately $\mathbb{R}^n$ also has a dot-product. The dot-product is a mapping from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$. We take
in a pair of vectors and output a real number.

Definition 1.2.6. Let $x, y \in \mathbb{R}^n$ we define $x \cdot y \in \mathbb{R}$ by

$$x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

Example 1.2.7. Let $v = (1, 2, 3, 4, 5)$ and $w = (6, 7, 8, 9, 10)$

$$v \cdot w = 6 + 14 + 24 + 36 + 50 = 130$$

Example 1.2.8. Suppose we are given a vector $v \in \mathbb{R}^n$. We can select the $j$-th component by
taking the dot-product of $v$ with $e_j$. Observe that $e_i \cdot e_j = \delta_{ij}$ and consider,

$$v \cdot e_j = \left( \sum_{i=1}^{n} v_i e_i \right) \cdot e_j = \sum_{i=1}^{n} v_i \, e_i \cdot e_j = \sum_{i=1}^{n} v_i \delta_{ij} = v_1 \delta_{1j} + \cdots + v_j \delta_{jj} + \cdots + \delta_{nj} v_n = v_j.$$

The dot-product with $e_j$ has given us the component of the vector $v$ in the $j$-th direction.
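The dot-product and the component-selection calculation can be checked numerically. This is a minimal Python sketch, assuming vectors are stored as tuples; `dot` and `e` are hypothetical helper names, and the vectors are the ones from Example 1.2.7.

```python
def dot(x, y):
    """x . y = x_1 y_1 + x_2 y_2 + ... + x_n y_n."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))


def e(j, n):
    """j-th standard basis vector of R^n (1-indexed, as in the notes)."""
    return tuple(1 if i == j else 0 for i in range(1, n + 1))


v = (1, 2, 3, 4, 5)
w = (6, 7, 8, 9, 10)
n = len(v)

print(dot(v, w))

# Selecting each component of v by dotting with e_j recovers v itself.
components = tuple(dot(v, e(j, n)) for j in range(1, n + 1))
print(components)
```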

The length or norm of a vector and the angle between two vectors are induced from the dot-product:

Definition 1.2.9.

The length or norm of $x \in \mathbb{R}^n$ is a real number which is defined by $\|x\| = \sqrt{x \cdot x}$.
Furthermore, let $x, y$ be nonzero vectors in $\mathbb{R}^n$; we define the angle $\theta$ between $x$ and $y$ by
$\theta = \cos^{-1}\left[ \frac{x \cdot y}{\|x\| \, \|y\|} \right]$. $\mathbb{R}^n$ together with these definitions of length and angle forms a Euclidean
Geometry.

⁶ the calculation is given explicitly in my linear notes

Technically, before we make this definition we should make sure that the formulas given above even
make sense. I have not shown that $x \cdot x$ is nonnegative and how do we know that the argument of
the inverse cosine is within its domain of $[-1, 1]$? I now state the propositions which justify the
preceding definition. (Proofs of the propositions below are found in my linear algebra notes.)

Proposition 1.2.10.

Suppose 𝑥, 𝑦, 𝑧 ∈ ℝ𝑛 and 𝑐 ∈ ℝ then

1. 𝑥 ⋅ 𝑦 = 𝑦 ⋅ 𝑥

2. 𝑥 ⋅ (𝑦 + 𝑧) = 𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧

3. 𝑐(𝑥 ⋅ 𝑦) = (𝑐𝑥) ⋅ 𝑦 = 𝑥 ⋅ (𝑐𝑦)

4. 𝑥 ⋅ 𝑥 ≥ 0 and 𝑥 ⋅ 𝑥 = 0 iff 𝑥 = 0

The formula $\cos^{-1}\left[ \frac{x \cdot y}{\|x\| \, \|y\|} \right]$ is harder to justify. The inequality that we need for it to be reasonable
is $\left| \frac{x \cdot y}{\|x\| \, \|y\|} \right| \leq 1$, otherwise we would not have a number in $\text{dom}(\cos^{-1}) = \text{range}(\cos) = [-1, 1]$.

An equivalent inequality is $|x \cdot y| \leq \|x\| \, \|y\|$ which is known as the Cauchy-Schwarz inequality.

Proposition 1.2.11.

If 𝑥, 𝑦 ∈ ℝ𝑛 then ∣𝑥 ⋅ 𝑦∣ ≤ ∣∣𝑥∣∣ ∣∣𝑦∣∣



Example 1.2.12. Let $v = [1, 2, 3, 4, 5]^T$ and $w = [6, 7, 8, 9, 10]^T$; find the angle between these vectors
and calculate the unit vectors in the same directions as $v$ and $w$. Recall that $v \cdot w = 6 + 14 + 24 +
36 + 50 = 130$. Furthermore,
$$\|v\| = \sqrt{1^2 + 2^2 + 3^2 + 4^2 + 5^2} = \sqrt{1 + 4 + 9 + 16 + 25} = \sqrt{55}$$
$$\|w\| = \sqrt{6^2 + 7^2 + 8^2 + 9^2 + 10^2} = \sqrt{36 + 49 + 64 + 81 + 100} = \sqrt{330}$$
We find unit vectors via the standard trick, you just take the given vector and multiply it by the
reciprocal of its length. This is called normalizing the vector,

$$\hat{v} = \frac{1}{\sqrt{55}} [1, 2, 3, 4, 5]^T \qquad \hat{w} = \frac{1}{\sqrt{330}} [6, 7, 8, 9, 10]^T$$

The angle is calculated from the definition of angle,

$$\theta = \cos^{-1}\left( \frac{130}{\sqrt{55} \sqrt{330}} \right) = 15.21^\circ$$

It’s good we have this definition, 5-dimensional protractors are very expensive.
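In place of a 5-dimensional protractor, the computation in this example can be reproduced with a few lines of code. The sketch below is an illustration using only Python's `math` module; the helper names `dot`, `norm`, `normalize` and `angle` are assumptions, and the clamp inside `angle` is a floating-point safeguard rather than part of the definition.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))


def normalize(x):
    """Multiply x by the reciprocal of its length; x must be nonzero."""
    length = norm(x)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return tuple(xi / length for xi in x)


def angle(x, y):
    """Angle in radians between nonzero vectors x and y."""
    c = dot(x, y) / (norm(x) * norm(y))
    # Cauchy-Schwarz guarantees |c| <= 1; clamp only to absorb rounding error.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)


v = (1, 2, 3, 4, 5)
w = (6, 7, 8, 9, 10)
theta_deg = math.degrees(angle(v, w))
print(normalize(v))
print(theta_deg)
```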

Proposition 1.2.13.

Let 𝑥, 𝑦 ∈ ℝ𝑛 and suppose 𝑐 ∈ ℝ then

1. ∣∣𝑐𝑥∣∣ = ∣𝑐∣ ∣∣𝑥∣∣

2. ∣∣𝑥 + 𝑦∣∣ ≤ ∣∣𝑥∣∣ + ∣∣𝑦∣∣ (triangle inequality)

3. ∣∣𝑥∣∣ ≥ 0

4. ∣∣𝑥∣∣ = 0 iff 𝑥 = 0
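As a worked illustration of how these propositions fit together, here is a short sketch of why the triangle inequality in item 2 follows from the Cauchy-Schwarz inequality (Proposition 1.2.11) and the dot-product properties (Proposition 1.2.10). This is the standard argument and is offered as a sketch; it is not necessarily the proof given in the linear algebra notes.

```latex
\begin{aligned}
\|x + y\|^2 &= (x + y) \cdot (x + y) \\
            &= x \cdot x + 2\, x \cdot y + y \cdot y \\
            &\leq \|x\|^2 + 2\, |x \cdot y| + \|y\|^2 \\
            &\leq \|x\|^2 + 2\, \|x\| \, \|y\| + \|y\|^2 \\
            &= \left( \|x\| + \|y\| \right)^2 .
\end{aligned}
```

Both sides are nonnegative, so taking square roots gives $\|x + y\| \leq \|x\| + \|y\|$.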

The four properties above make $\mathbb{R}^n$ paired with $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ a normed linear space.
We’ll see how differentiation can be defined given this structure. It turns out that we can define a
reasonable concept of differentiation for other normed linear spaces. In this course we’ll study how
to differentiate functions to and from $\mathbb{R}^n$, matrix-valued functions and complex-valued functions of
a real variable. Finally, if time permits, we’ll study differentiation of functions of functions which
is the central task of variational calculus. In each case the underlying linear structure along
with the norm is used to define the limits which are necessary to set up the derivatives. The focus
of this course is the process and use of derivatives and integrals so I have not given proofs of the
linear algebraic propositions in this chapter. The proofs and a deeper view of the meaning of these
propositions are given at length in Math 321. If you haven’t had linear then you’ll just have to trust
me on these propositions⁷.

⁷ or you could just read the linear notes if curious

Definition 1.2.14.

The distance between $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^n$ is defined to be $d(a, b) \equiv \|b - a\|$.

If we draw a picture this definition is very natural. Here we are thinking of the points $a, b$ as vectors
from the origin; then $b - a$ is the vector which points from $a$ to $b$ (this is algebraically clear since
$a + (b - a) = b$). Then the distance between the points is the length of the vector that points from
one point to the other. If you plug in two dimensional vectors you should recognize the distance
formula from middle school math:

$$d((a_1, a_2), (b_1, b_2)) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2}$$

Proposition 1.2.15.

Let 𝑑 : ℝ𝑛 × ℝ𝑛 → ℝ be the distance function then

1. 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥)

2. 𝑑(𝑥, 𝑦) ≥ 0

3. 𝑑(𝑥, 𝑦) = 0 iff 𝑥 = 𝑦

4. 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) ≥ 𝑑(𝑥, 𝑧)
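The distance function and the properties listed in this proposition can be spot-checked on sample points. The following Python sketch is illustrative only: `distance` is a hypothetical helper implementing $d(a, b) = \|b - a\|$ on tuples, and checking a few points is of course not a proof.

```python
import math


def distance(a, b):
    """d(a, b) = ||b - a||, the length of the vector pointing from a to b."""
    if len(a) != len(b):
        raise ValueError("points must lie in the same R^n")
    return math.sqrt(sum((bj - aj) ** 2 for aj, bj in zip(a, b)))


# Sample points in the plane (chosen for the example).
x, y, z = (0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)

print(distance(x, y))                    # length of the vector from x to y
print(distance(x, y) == distance(y, x))  # symmetry
print(distance(y, y))                    # zero distance from a point to itself
print(distance(x, y) + distance(y, z) >= distance(x, z))  # triangle inequality
```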

In real analysis one studies a set paired with a distance function. Abstractly speaking such a pair
is called a metric space. A vector space with a norm is called a normed linear space. Because
we can always induce a distance function from the norm via the formula $d(a, b) = \|b - a\|$ every
normed linear space is a metric space. The converse fails. Metric spaces need not be vector spaces,
a metric space could just be formed from some subset of a vector space or something more exotic⁸.
The absolute value function on $\mathbb{R}$ defines a distance function $d(a, b) = |b - a|$. In your real analysis
course you will study the structure of the metric space $(\mathbb{R}, d)$ with $d(a, b) = |b - a|$ in great depth. I
include these comments here to draw your attention to the connection between this course and the
real analysis course. I primarily use the norm in what follows, but it should be noted that many
things could be written in terms of the distance function.

⁸ there are many texts to read on metric spaces, one nice treatment is Rosenlicht’s Introduction to Analysis, it’s a
good read

1.2.1 vector algebra for three dimensions


Every nonzero vector can be written as a unit vector scalar multiplied by its magnitude.

$v \in V^n$ such that $v \neq 0$ $\Rightarrow$ $v = ||v||\,\hat{v}$ where $\hat{v} = \frac{1}{||v||}\, v$.

You should recall that we can write any vector in $V^3$ as

$v = \langle a, b, c \rangle = a\langle 1, 0, 0 \rangle + b\langle 0, 1, 0 \rangle + c\langle 0, 0, 1 \rangle = a\hat{i} + b\hat{j} + c\hat{k}$

where we defined $\hat{i} = \langle 1, 0, 0 \rangle$, $\hat{j} = \langle 0, 1, 0 \rangle$, $\hat{k} = \langle 0, 0, 1 \rangle$. You can easily verify that
distinct Cartesian unit-vectors are orthogonal. Sometimes we need to produce a vector which is
orthogonal to a given pair of vectors; it turns out the cross-product is one of two ways to do that
in $V^3$. We will see much later that this is special to three dimensions.
Definition 1.2.16.
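If $\vec{A} = \langle A_1, A_2, A_3 \rangle$ and $\vec{B} = \langle B_1, B_2, B_3 \rangle$ are vectors in $V^3$ then the cross-product
of $\vec{A}$ and $\vec{B}$ is a vector $\vec{A} \times \vec{B}$ which is defined by:

$\vec{A} \times \vec{B} = \langle A_2B_3 - A_3B_2,\ A_3B_1 - A_1B_3,\ A_1B_2 - A_2B_1 \rangle.$

The magnitude of $\vec{A} \times \vec{B}$ can be shown to satisfy $||\vec{A} \times \vec{B}|| = ||\vec{A}||\,||\vec{B}|| \sin(\theta)$, where $\theta$ is the
angle between the vectors, and the direction can be deduced by right-hand-rule. The right hand rule
for the unit vectors yields:

$\hat{i} \times \hat{j} = \hat{k}, \qquad \hat{k} \times \hat{i} = \hat{j}, \qquad \hat{j} \times \hat{k} = \hat{i}$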
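
The component formula for the cross product is easy to check by machine. The following Python sketch computes the cross product straight from the definition and confirms on one example that the result is orthogonal to both inputs; the helper names `cross` and `dot` and the sample vectors are my own, not part of the notes.

```python
def cross(A, B):
    """Cross product of two 3-vectors, term by term from the definition."""
    A1, A2, A3 = A
    B1, B2, B3 = B
    return (A2 * B3 - A3 * B2, A3 * B1 - A1 * B3, A1 * B2 - A2 * B1)

def dot(A, B):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(A, B))

i_hat, j_hat, k_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)

A, B = (1, 2, 3), (4, 5, 6)   # arbitrary sample vectors
C = cross(A, B)

print(C)                      # the cross product of the samples
print(dot(A, C), dot(B, C))   # both vanish when C is orthogonal to A and B
print(cross(i_hat, j_hat) == k_hat)
```

Swapping the arguments flips the sign of every component, which is one way to see that the order of the factors matters.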

If I wish to discuss both the point and the vector to which it corresponds we may use the notation

𝑃 = (𝑎1 , 𝑎2 , . . . , 𝑎𝑛 ) ←→ 𝑃⃗ =< 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 >

With this notation we can easily define directed line-segments as the vector which points from one
point to another; also the distance between points is simply the length of the vector which points
from one point to the other:
Definition 1.2.17.
Let 𝑃, 𝑄 ∈ ℝ𝑛 . The directed line segment from 𝑃 to 𝑄 is $\overrightarrow{PQ} = \vec{Q} - \vec{P}$. This vector is
drawn from the tail 𝑃 to the tip 𝑄 where we denote the direction by drawing an arrowhead.
The distance between 𝑃 and 𝑄 is $d(P, Q) = ||\overrightarrow{PQ}||$.

1.2.2 compact notations for vector arithmetic

I prefer the following notations over the hat-notation of the preceding section because this notation
generalizes nicely to 𝑛-dimensions.

𝑒1 =< 1, 0, 0 > 𝑒2 =< 0, 1, 0 > 𝑒3 =< 0, 0, 1 > .

Likewise the Kronecker delta and the Levi-Civita symbol are at times very convenient for abstract
calculation:

$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \qquad \epsilon_{ijk} = \begin{cases} 1 & (i, j, k) \in \{(1, 2, 3), (3, 1, 2), (2, 3, 1)\} \\ -1 & (i, j, k) \in \{(3, 2, 1), (2, 1, 3), (1, 3, 2)\} \\ 0 & \text{if any index repeats} \end{cases}$

An equivalent definition for the Levi-Civita symbol is simply that $\epsilon_{123} = 1$ and it is antisymmetric
with respect to the interchange of any pair of indices;

𝜖𝑖𝑗𝑘 = 𝜖𝑗𝑘𝑖 = 𝜖𝑘𝑖𝑗 = −𝜖𝑘𝑗𝑖 = −𝜖𝑗𝑖𝑘 = −𝜖𝑖𝑘𝑗 .

Now let us restate some earlier results in terms of the Einstein repeated index conventions [9]; let
$\vec{A}, \vec{B} \in V^n$ and $c \in \mathbb{R}$ then

$\vec{A} = A_k e_k$                                      standard basis expansion
$e_i \cdot e_j = \delta_{ij}$                            orthonormal basis
$(\vec{A} + \vec{B})_i = A_i + B_i$                      vector addition
$(\vec{A} - \vec{B})_i = A_i - B_i$                      vector subtraction
$(c\vec{A})_i = cA_i$                                    scalar multiplication
$\vec{A} \cdot \vec{B} = A_k B_k$                        dot product
$(\vec{A} \times \vec{B})_k = \epsilon_{ijk} A_i B_j$    cross product.

All but the last of the above are readily generalized to dimensions other than three by simply
increasing the number of components. However, the cross product is special to three dimensions.
I can’t emphasize enough that the formulas given above for the dot and cross products can be
utilized to yield great efficiency in abstract calculations.
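
To see the index formulas at work, here is a small Python sketch that implements the Kronecker delta and the Levi-Civita symbol for indices in {1, 2, 3} and then evaluates the repeated-index sums literally. The function names are hypothetical helpers of mine, and the loops are written for clarity rather than speed.

```python
def kronecker(i, j):
    """delta_ij: 1 when the indices agree, 0 otherwise."""
    return 1 if i == j else 0

def levi_civita(i, j, k):
    """epsilon_ijk for indices taken from {1, 2, 3}."""
    if (i, j, k) in {(1, 2, 3), (3, 1, 2), (2, 3, 1)}:
        return 1
    if (i, j, k) in {(3, 2, 1), (2, 1, 3), (1, 3, 2)}:
        return -1
    return 0  # some index repeats

INDICES = (1, 2, 3)

def dot_by_index(A, B):
    """A . B = delta_ij A_i B_j, summed over the repeated indices i and j."""
    return sum(kronecker(i, j) * A[i - 1] * B[j - 1]
               for i in INDICES for j in INDICES)

def cross_by_index(A, B):
    """(A x B)_k = epsilon_ijk A_i B_j, summed over the repeated indices i and j."""
    return tuple(
        sum(levi_civita(i, j, k) * A[i - 1] * B[j - 1]
            for i in INDICES for j in INDICES)
        for k in INDICES
    )

A, B = (1, 2, 3), (4, 5, 6)   # arbitrary sample vectors
print(dot_by_index(A, B), cross_by_index(A, B))
```

Only two of the nine terms in each component of the cross product survive, which is exactly the two-term pattern of the component definition.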

Example 1.2.18. . .

[9] There are more details to be seen in the Appendix if you’re curious.

1.3 functions

Suppose 𝐴 and 𝐵 are sets. We say 𝑓 : 𝐴 → 𝐵 is a function if for each 𝑎 ∈ 𝐴 the function 𝑓
assigns a single element 𝑓 (𝑎) ∈ 𝐵. Moreover, if 𝑓 : 𝐴 → 𝐵 is a function we say it is a 𝐵-valued
function of an 𝐴-variable and we say 𝐴 = 𝑑𝑜𝑚(𝑓 ) whereas 𝐵 = 𝑐𝑜𝑑𝑜𝑚𝑎𝑖𝑛(𝑓 ). For example,
if 𝑓 : ℝ2 → [0, 1] then 𝑓 is a real-valued function of ℝ2 . On the other hand, if 𝑓 : ℂ → ℝ2 then
we’d say 𝑓 is a vector-valued function of a complex variable. The term mapping will be used
interchangeably with function in these notes [10]. Suppose 𝑓 : 𝑈 → 𝑉 and 𝑈 ⊆ 𝑆 and 𝑉 ⊆ 𝑇 then
we may concisely express the same data via the notation 𝑓 : 𝑈 ⊆ 𝑆 → 𝑉 ⊆ 𝑇 .

Sometimes we can take two given functions and construct a new function.

1. if 𝑓 : 𝑈 → 𝑉 and 𝑔 : 𝑉 → 𝑊 then 𝑔 ∘ 𝑓 : 𝑈 → 𝑊 is the composite of 𝑔 with 𝑓 .

2. if 𝑓, 𝑔 : 𝑈 → 𝑉 and 𝑉 is a set with an operation of addition then we define 𝑓 ± 𝑔 : 𝑈 → 𝑉
pointwise by the natural assignment (𝑓 ± 𝑔)(𝑥) = 𝑓 (𝑥) ± 𝑔(𝑥) for each 𝑥 ∈ 𝑈 . We say that
𝑓 ± 𝑔 is the sum(+) or difference(−) of 𝑓 and 𝑔.

3. if 𝑓 : 𝑈 → 𝑉 and 𝑐 ∈ 𝑆 where there is an operation of scalar multiplication by 𝑆 on 𝑉 then
𝑐𝑓 : 𝑈 → 𝑉 is defined pointwise by (𝑐𝑓 )(𝑥) = 𝑐𝑓 (𝑥) for each 𝑥 ∈ 𝑈 . We say that 𝑐𝑓 is the scalar
multiple of 𝑓 by 𝑐.

Usually we have in mind 𝑆 = ℝ or 𝑆 = ℂ and often the addition is just that of vectors, however
the definitions (2.) and (3.) apply equally well to matrix-valued functions or operators which is
another term for function-valued functions. For example, in the first semester of calculus we study
𝑑/𝑑𝑥 which is a function of functions; 𝑑/𝑑𝑥 takes an input of 𝑓 and gives the output 𝑑𝑓 /𝑑𝑥. If we
write 𝐿 = 3𝑑/𝑑𝑥 we have a new operator defined by (3𝑑/𝑑𝑥)[𝑓 ] = 3𝑑𝑓 /𝑑𝑥 for each function 𝑓 in
the domain of 𝑑/𝑑𝑥.
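
These constructions translate almost word for word into code. The Python sketch below builds the composite, the pointwise sum and the scalar multiple as new functions, and then mimics the operator 3𝑑/𝑑𝑥. Everything here is illustrative: the names are mine, and `derivative` is a central-difference approximation standing in for 𝑑/𝑑𝑥, not an exact derivative.

```python
def compose(g, f):
    """The composite g o f, defined by (g o f)(x) = g(f(x))."""
    return lambda x: g(f(x))

def add(f, g):
    """Pointwise sum (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(c, f):
    """Scalar multiple (cf)(x) = c f(x)."""
    return lambda x: c * f(x)

def derivative(f, h=1e-6):
    """Approximate d/dx by a central difference; an operator taking f to a new function."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def square(x):
    return x * x

def shift(x):
    return x + 1

def L(f):
    """The operator 3 d/dx, built from the scalar multiple of the operator d/dx."""
    return scale(3, derivative(f))

print(compose(shift, square)(2), compose(square, shift)(2))  # order of composition matters
print(add(square, shift)(2))
print(L(square)(2))  # close to 3 * (2 * 2)
```

Notice that `L` takes a function as its input and returns a function as its output, which is the sense in which an operator is a function of functions.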

[10] In my first set of advanced calculus notes (2010) I used the term function to mean the codomain was real numbers
whereas mapping implied a codomain of vectors. I was following Edwards as he makes this convention in his text. I
am not adopting that terminology any longer; I think it’s better to use the term function as we did in Math 200 or
250. A function is an abstract construction which allows for a vast array of codomains.

Definition 1.3.1.

Suppose 𝑓 : 𝑈 → 𝑉 and 𝑈1 ⊆ 𝑈 . We define the image of 𝑈1 under 𝑓 as follows:

𝑓 (𝑈1 ) = { 𝑦 ∈ 𝑉 ∣ there exists 𝑥 ∈ 𝑈1 with 𝑓 (𝑥) = 𝑦}.

The range of 𝑓 is 𝑓 (𝑈 ). The inverse image of 𝑉1 ⊆ 𝑉 under 𝑓 is defined as follows:

$f^{-1}(V_1) = \{ x \in U \mid f(x) \in V_1 \}.$

The inverse image of a single point in the codomain is called a fiber.
We say 𝑓 is surjective onto 𝑉1 , or simply onto 𝑉1 , iff there exists 𝑈1 ⊆ 𝑈 such that 𝑓 (𝑈1 ) = 𝑉1 . If a function
is onto its codomain then the function is surjective. If 𝑓 (𝑥1 ) = 𝑓 (𝑥2 ) implies 𝑥1 = 𝑥2
for all 𝑥1 , 𝑥2 ∈ 𝑈1 ⊆ 𝑈 then we say 𝑓 is injective on 𝑈1 or 1 − 1 on 𝑈1 . If a function
is injective on its domain then we say the function is injective. If a function is both
injective and surjective then the function is called a bijection or a 1-1 correspondence.

Example 1.3.2. Suppose 𝑓 : ℝ2 → ℝ and 𝑓 (𝑥, 𝑦) = 𝑥 for each (𝑥, 𝑦) ∈ ℝ2 . The function is not
injective since 𝑓 (1, 2) = 1 and 𝑓 (1, 3) = 1 and yet (1, 2) ∕= (1, 3). Notice that the fibers of 𝑓 are
simply vertical lines:

𝑓 −1 (𝑥𝑜 ) = {(𝑥, 𝑦) ∈ 𝑑𝑜𝑚(𝑓 ) ∣ 𝑓 (𝑥, 𝑦) = 𝑥𝑜 } = {(𝑥𝑜 , 𝑦) ∣ 𝑦 ∈ ℝ} = {𝑥𝑜 } × ℝ



Example 1.3.3. Suppose 𝑓 : ℝ → ℝ and $f(x) = x^2 + 1$ for each 𝑥 ∈ ℝ. This function is not
surjective because 0 ∉ 𝑓 (ℝ). In contrast, if we construct 𝑔 : ℝ → [1, ∞) with 𝑔(𝑥) = 𝑓 (𝑥) for each
𝑥 ∈ ℝ then we can argue that 𝑔 is surjective. Neither 𝑓 nor 𝑔 is injective; the fiber containing 𝑥𝑜 is {−𝑥𝑜 , 𝑥𝑜 }
for each 𝑥𝑜 ≠ 0. At all points except zero these maps are said to be two-to-one. This is an
abbreviation of the observation that two points in the domain map to the same point in the range.
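
The image, inverse image and fiber constructions can be tried out on a finite sample of the domain. The Python sketch below uses the map from the example above restricted to the integers from −3 to 3; that finite sample and the helper names are assumptions made purely for illustration, since the actual domain is all of ℝ.

```python
def image(f, U1):
    """f(U1): the set of all values f(x) with x in U1."""
    return {f(x) for x in U1}

def inverse_image(f, U, V1):
    """f^{-1}(V1): the points of U which f sends into V1."""
    return {x for x in U if f(x) in V1}

def is_injective_on(f, U1):
    """True when distinct points of the finite set U1 have distinct images."""
    points = set(U1)
    return len(image(f, points)) == len(points)

def f(x):
    return x * x + 1

U = range(-3, 4)   # finite stand-in for the domain

print(image(f, U))
print(inverse_image(f, U, {5}))   # the fiber over 5
print(inverse_image(f, U, {0}))   # empty, since 0 is not in the range
print(is_injective_on(f, U), is_injective_on(f, range(0, 4)))
```

The last line shows the same formula failing to be injective on the whole sample and yet being injective once the domain is cut down to the nonnegative points.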

Definition 1.3.4.
Suppose 𝑓 : 𝑈 ⊆ ℝ𝑝 → 𝑉 ⊆ ℝ𝑛 and suppose further that for each 𝑥 ∈ 𝑈 ,

𝑓 (𝑥) = (𝑓1 (𝑥), 𝑓2 (𝑥), . . . , 𝑓𝑛 (𝑥)).

Then we say that 𝑓 = (𝑓1 , 𝑓2 , . . . , 𝑓𝑛 ) and for each 𝑗 ∈ ℕ𝑛 the functions 𝑓𝑗 : 𝑈 ⊆ ℝ𝑝 → ℝ are
called the component functions of 𝑓 . Furthermore, we define the projection 𝜋𝑗 : ℝ𝑛 → ℝ
to be the map 𝜋𝑗 (𝑥) = 𝑥 ⋅ 𝑒𝑗 for each 𝑗 = 1, 2, . . . 𝑛. This allows us to express each of the
component functions as a composition 𝑓𝑗 = 𝜋𝑗 ∘ 𝑓 .

Example 1.3.5. Suppose 𝑓 : ℝ3 → ℝ2 and $f(x, y, z) = (x^2 + y^2, z)$ for each (𝑥, 𝑦, 𝑧) ∈ ℝ3 . Identify
that $f_1(x, y, z) = x^2 + y^2$ whereas $f_2(x, y, z) = z$. You can easily see that 𝑟𝑎𝑛𝑔𝑒(𝑓 ) = [0, ∞) × ℝ.
Suppose $R^2 \in [0, \infty)$ and $z_o \in \mathbb{R}$ then

$f^{-1}(\{(R^2, z_o)\}) = S_1(R) \times \{z_o\}$

where $S_1(R)$ denotes a circle of radius 𝑅. This result is a simple consequence of the observation
that $f(x, y, z) = (R^2, z_o)$ implies $x^2 + y^2 = R^2$ and $z = z_o$.

Example 1.3.6. Let 𝑎, 𝑏, 𝑐 ∈ ℝ be particular constants. Suppose 𝑓 : ℝ3 → ℝ and 𝑓 (𝑥, 𝑦, 𝑧) =
𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 for each (𝑥, 𝑦, 𝑧) ∈ ℝ3 . Here there is just one component function so we could say that
𝑓 = 𝑓1 but we don’t usually bother to make such an observation. If at least one of the constants
𝑎, 𝑏, 𝑐 is nonzero then the fibers of this map are planes in three dimensional space with normal
$\langle a, b, c \rangle$:

$f^{-1}(\{d\}) = \{(x, y, z) \in \mathbb{R}^3 \mid ax + by + cz = d\}$

If 𝑎 = 𝑏 = 𝑐 = 0 then the only nonempty fiber of 𝑓 is $f^{-1}(\{0\})$, which is all of ℝ3 , and 𝑟𝑎𝑛𝑔𝑒(𝑓 ) = {0}.

The definition below explains how to put together functions with a common domain. The codomain
of the new function is the cartesian product of the old codomains.

Definition 1.3.7.
Let 𝑓 : 𝑈1 ⊆ ℝ𝑛 → 𝑉1 ⊆ ℝ𝑝 and 𝑔 : 𝑈1 ⊆ ℝ𝑛 → 𝑉2 ⊆ ℝ𝑞 be mappings then (𝑓, 𝑔) is a
mapping from 𝑈1 to 𝑉1 × 𝑉2 defined by (𝑓, 𝑔)(𝑥) = (𝑓 (𝑥), 𝑔(𝑥)) for all 𝑥 ∈ 𝑈1 .
There’s more than meets the eye in the definition above. Let me expand it a bit here:

(𝑓, 𝑔)(𝑥) = (𝑓1 (𝑥), 𝑓2 (𝑥), . . . , 𝑓𝑝 (𝑥), 𝑔1 (𝑥), 𝑔2 (𝑥), . . . , 𝑔𝑞 (𝑥)) where 𝑥 = (𝑥1 , 𝑥2 , . . . , 𝑥𝑛 )

You might notice that Edwards uses 𝜋 for the identity mapping whereas I use 𝐼𝑑. His notation is
quite reasonable given that the identity is the cartesian product of all the projection maps:

𝜋 = (𝜋1 , 𝜋2 , . . . , 𝜋𝑛 )

I’ve had courses where we simply used the coordinate notation itself for projections; in that notation
we have formulas such as 𝑥(𝑎, 𝑏, 𝑐) = 𝑎, 𝑥𝑗 (𝑎) = 𝑎𝑗 and 𝑥𝑗 (𝑒𝑖 ) = 𝛿𝑗𝑖 .
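
Projections, component functions and the product mapping (𝑓, 𝑔) are all short to express in code. In the Python sketch below, points are tuples, `pi(j)` plays the role of the projection onto the 𝑗-th coordinate, and `pair` builds (𝑓, 𝑔) by concatenating outputs. The mapping `g` is a made-up second mapping with the same domain, included only so there is something to pair with.

```python
def pi(j):
    """Projection onto the j-th coordinate (1-indexed): pi_j(x) = x . e_j."""
    return lambda x: x[j - 1]

def pair(f, g):
    """Product mapping (f, g)(x) = (f(x), g(x)), written out in components."""
    return lambda x: tuple(f(x)) + tuple(g(x))

def f(v):
    """The mapping f(x, y, z) = (x^2 + y^2, z) from the earlier example."""
    x, y, z = v
    return (x ** 2 + y ** 2, z)

def g(v):
    """A hypothetical real-valued mapping on the same domain."""
    x, y, z = v
    return (x + y + z,)

def component(f, j):
    """The j-th component function, f_j = pi_j o f."""
    return lambda v: pi(j)(f(v))

def identity(v):
    """The identity written as the product of all the projections."""
    return tuple(pi(j)(v) for j in range(1, len(v) + 1))

point = (1, 2, 3)
print(component(f, 1)(point), component(f, 2)(point))
print(pair(f, g)(point))
print(identity(point) == point)
```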

Another way to modify a given function is to adjust the domain of a given mapping by restriction
and extension.

Definition 1.3.8.
Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping. If 𝑅 ⊂ 𝑈 then we define the restriction of 𝑓
to 𝑅 to be the mapping 𝑓 ∣𝑅 : 𝑅 → 𝑉 where 𝑓 ∣𝑅 (𝑥) = 𝑓 (𝑥) for all 𝑥 ∈ 𝑅. If 𝑈 ⊆ 𝑆 and
𝑉 ⊂ 𝑇 then we say a mapping 𝑔 : 𝑆 → 𝑇 is an extension of 𝑓 iff 𝑔∣𝑑𝑜𝑚(𝑓 ) = 𝑓 .

When I say 𝑔∣𝑑𝑜𝑚(𝑓 ) = 𝑓 this means that these functions have matching domains and they agree at
each point in that domain; 𝑔∣𝑑𝑜𝑚(𝑓 ) (𝑥) = 𝑓 (𝑥) for all 𝑥 ∈ 𝑑𝑜𝑚(𝑓 ). Once a particular subset is chosen
the restriction to that subset is a unique function. Of course there are usually many subsets of
𝑑𝑜𝑚(𝑓 ) so you can imagine many different restrictions of a given function. The concept of extension
is more vague: once you pick the enlarged domain and codomain it is not even necessarily the case
that another extension to that same pair of sets will be the same mapping. To obtain uniqueness
for extensions one needs to add more structure. This is one reason that complex variables are
interesting, there are cases where the structure of the complex theory forces the extension of a
complex-valued function of a complex variable to be unique. This is very surprising. Similarly a
linear transformation is uniquely defined by its values on a basis, it extends uniquely from that
finite set of vectors to the infinite number of points in the vector space. This is very restrictive on
the possible ways we can construct linear mappings. Maybe you can find some other examples of
extensions as you collect your own mathematical storybook.
Definition 1.3.9.
Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping, if there exists a mapping 𝑔 : 𝑓 (𝑈 ) → 𝑈 such that
𝑓 ∘ 𝑔 = 𝐼𝑑𝑓 (𝑈 ) and 𝑔 ∘ 𝑓 = 𝐼𝑑𝑈 then 𝑔 is the inverse mapping of 𝑓 and we denote 𝑔 = 𝑓 −1 .

If a mapping is injective then it can be shown that the inverse mapping is well defined. We define
𝑓 −1 (𝑦) = 𝑥 iff 𝑓 (𝑥) = 𝑦 and the value 𝑥 must be a single value if the function is one-one. When a
function is not one-one then there may be more than one point which maps to a particular point
in the range.
Notice that the inverse image of a set is well-defined even if there is no inverse mapping. Moreover,
it can be shown that the fibers of a mapping are disjoint and their union covers the domain of the
mapping:

$y \neq z \ \Rightarrow\ f^{-1}\{y\} \cap f^{-1}\{z\} = \emptyset, \qquad \bigcup_{y \in range(f)} f^{-1}\{y\} = dom(f).$

This means that the fibers of a mapping partition the domain.


Example 1.3.10. . .

Definition 1.3.11.
Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping. Furthermore, suppose that 𝑠 : 𝑈 → 𝑈 is a
mapping which is constant on each fiber of 𝑓 . In other words, for each nonempty fiber $f^{-1}\{y\} \subseteq U$
we have some constant $u \in f^{-1}\{y\}$ such that $s(f^{-1}\{y\}) = \{u\}$. The subset 𝑠(𝑈 ) ⊆ 𝑈 is called a
cross section of the fiber partition of 𝑓 .
How do we construct a cross section for a particular mapping? For particular examples the details
of the formula for the mapping usually suggests some obvious choice. However, in general if you
accept the axiom of choice then you can be comforted in the existence of a cross section even in
the case that there are infinitely many fibers for the mapping.
Example 1.3.12. . .

Proposition 1.3.13.

Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping. The restriction of 𝑓 to a cross section 𝑆
of 𝑈 is an injective function. The mapping $\tilde{f} : U \to f(U)$ defined by $\tilde{f}(x) = f(x)$ is a surjection.
The mapping $\tilde{f}|_S : S \to f(U)$ is a bijection.
The proposition above tells us that we can take any mapping and cut down the domain and/or
codomain to reduce the function to an injection, surjection or bijection. If you look for it you’ll see
this result behind the scenes in other courses. For example, in linear algebra if we throw out the
kernel of a linear mapping then we get an injection. The idea of a local inverse is also important
to the study of calculus.
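
The fiber partition and the cross section idea can be watched in action on a finite sample. The Python sketch below groups a finite domain into fibers, picks one representative from each fiber, and checks that the restricted map is a bijection onto the range. The sample domain, the rule "take the largest element of each fiber" and the helper names are all my own choices for illustration.

```python
def fibers(f, U):
    """Group the points of a finite domain U by their image under f."""
    out = {}
    for x in U:
        out.setdefault(f(x), set()).add(x)
    return out

def f(x):
    return x * x + 1

U = range(-3, 4)   # finite stand-in for the domain
parts = fibers(f, U)

# The fibers partition U: they are disjoint and together they cover U.
covers = set().union(*parts.values()) == set(U)
disjoint = sum(len(fiber) for fiber in parts.values()) == len(set(U))

# A cross section: one representative from each fiber (here, the largest).
S = {max(fiber) for fiber in parts.values()}

# Restricted to S the map hits each value of the range exactly once.
restricted_values = [f(x) for x in S]
is_bijection = (len(set(restricted_values)) == len(S)
                and set(restricted_values) == set(parts.keys()))
print(covers, disjoint, sorted(S), is_bijection)
```

A different rule for choosing representatives, such as the smallest element of each fiber, gives a different cross section and a different bijection onto the same range.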

Example 1.3.14. . .

Definition 1.3.15.
Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping then we say a mapping 𝑔 is a local inverse of 𝑓
iff there exists 𝑆 ⊆ 𝑈 such that $g = (f|_S)^{-1}$.
Usually we can find local inverses for functions in calculus. For example, 𝑓 (𝑥) = sin(𝑥) is not 1-1
therefore it is not invertible. However, it does have a local inverse $g(y) = \sin^{-1}(y)$. If we were
more pedantic we wouldn’t write $\sin^{-1}(y)$. Instead we would write $g(y) = \left( \sin|_{[-\frac{\pi}{2}, \frac{\pi}{2}]} \right)^{-1}(y)$ since
the inverse sine is actually just a local inverse. To construct a local inverse for some mapping we
must locate some subset of the domain upon which the mapping is injective. Then relative to that
subset we can reverse the mapping. The inverse mapping theorem (which we’ll study mid-course)
will tell us more about the existence of local inverses for a given mapping.
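
A quick numerical experiment shows what "local" means for the inverse sine. In the Python sketch below, `math.asin` undoes `math.sin` on sample points of [−𝜋/2, 𝜋/2] but not at a point outside that interval; the sample points are arbitrary choices of mine.

```python
import math

def g(y):
    """Local inverse of sine: inverts sin restricted to [-pi/2, pi/2]."""
    return math.asin(y)

# Sample points inside [-pi/2, pi/2]: here g(sin(x)) returns x (up to rounding).
inside = [-1.5, -0.5, 0.0, 0.7, 1.5]
recovered = [g(math.sin(x)) for x in inside]

# A point beyond pi/2: g(sin(x)) lands back in [-pi/2, pi/2] instead of at x.
outside = 2.5
not_recovered = g(math.sin(outside))

print(recovered)
print(outside, not_recovered, math.pi - outside)
```

Outside the interval the local inverse returns the point of [−𝜋/2, 𝜋/2] with the same sine, which for 2.5 is 𝜋 − 2.5.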

1.4 elementary topology and limits


In this section we describe the metric topology for ℝ𝑛 . In the study of functions of one real variable
we often need to refer to open or closed intervals. The definition that follows generalizes those
concepts to 𝑛-dimensions. I have included a short discussion of general topology in the Appendix
if you’d like to learn more about the term.

Definition 1.4.1.
An open ball of radius 𝜖 centered at 𝑎 ∈ ℝ𝑛 is the subset of all points in ℝ𝑛 which are less
than 𝜖 units from 𝑎; we denote this open ball by

𝐵𝜖 (𝑎) = {𝑥 ∈ ℝ𝑛 ∣ ∣∣𝑥 − 𝑎∣∣ < 𝜖}

The closed ball of radius 𝜖 centered at 𝑎 ∈ ℝ𝑛 is likewise defined

$\overline{B}_\epsilon(a) = \{x \in \mathbb{R}^n \mid ||x - a|| \leq \epsilon\}$

Notice that in the 𝑛 = 1 case we observe an open ball is an open interval: let 𝑎 ∈ ℝ,

𝐵𝜖 (𝑎) = {𝑥 ∈ ℝ ∣ ∣∣𝑥 − 𝑎∣∣ < 𝜖} = {𝑥 ∈ ℝ ∣ ∣𝑥 − 𝑎∣ < 𝜖} = (𝑎 − 𝜖, 𝑎 + 𝜖)

In the 𝑛 = 2 case we observe that an open ball is an open disk: let (𝑎, 𝑏) ∈ ℝ2 ,

$B_\epsilon((a, b)) = \{(x, y) \in \mathbb{R}^2 \mid ||(x, y) - (a, b)|| < \epsilon\} = \{(x, y) \in \mathbb{R}^2 \mid \sqrt{(x - a)^2 + (y - b)^2} < \epsilon\}$

For 𝑛 = 3 an open ball is a solid sphere without the outer shell. In contrast, a closed ball in 𝑛 = 3 is a
solid sphere which includes the outer shell of the sphere.
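
Membership in an open or closed ball is a one-line test once the norm is available. The Python sketch below works in any dimension; the helper names and the sample points are mine, chosen so that one point sits exactly on the outer shell.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of numbers."""
    return math.sqrt(sum(c * c for c in v))

def in_open_ball(x, a, eps):
    """True when ||x - a|| < eps."""
    return norm([xi - ai for xi, ai in zip(x, a)]) < eps

def in_closed_ball(x, a, eps):
    """True when ||x - a|| <= eps."""
    return norm([xi - ai for xi, ai in zip(x, a)]) <= eps

center = (0.0, 0.0)
on_shell = (3.0, 4.0)   # exactly 5 units from the origin

print(in_open_ball(on_shell, center, 5.0))     # the shell is excluded
print(in_closed_ball(on_shell, center, 5.0))   # the shell is included
print(in_open_ball((0.9,), (0.0,), 1.0))       # n = 1: the interval (-1, 1)
```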

Definition 1.4.2.
Let 𝐷 ⊆ ℝ𝑛 . We say 𝑦 ∈ 𝐷 is an interior point of 𝐷 iff there exists some open ball
centered at 𝑦 which is completely contained in 𝐷. We say 𝑦 ∈ ℝ𝑛 is a limit point of 𝐷 iff
every open ball centered at 𝑦 contains points in 𝐷 − {𝑦}. We say 𝑦 ∈ ℝ𝑛 is a boundary
point of 𝐷 iff every open ball centered at 𝑦 contains points not in 𝐷 and other points which
are in 𝐷 − {𝑦}. We say 𝑦 ∈ 𝐷 is an isolated point of 𝐷 if there exist open balls about
𝑦 which do not contain other points in 𝐷. The set of all interior points of 𝐷 is called the
interior of 𝐷. Likewise the set of all boundary points for 𝐷 is denoted ∂𝐷. The closure
of 𝐷 is defined to be $\overline{D} = D \cup \{y \in \mathbb{R}^n \mid y \text{ is a limit point of } D\}$.
If you’re like me the paragraph above doesn’t help much until you see the picture below. All the terms
are aptly named. The term "limit point" is given because those points are the ones for which it is
natural to define a limit.

Example 1.4.3. . .

Definition 1.4.4.

We say 𝐴 ⊆ ℝ𝑛 is an open set iff for each 𝑥 ∈ 𝐴 there exists 𝜖 > 0 such that
𝐵𝜖 (𝑥) ⊆ 𝐴. We say 𝐵 ⊆ ℝ𝑛 is a closed set iff its complement ℝ𝑛 − 𝐵 = {𝑥 ∈ ℝ𝑛 ∣ 𝑥 ∉ 𝐵}
is an open set.
Notice that ℝ − [𝑎, 𝑏] = (−∞, 𝑎) ∪ (𝑏, ∞). It is not hard to prove that open intervals are open hence
we find that a closed interval is a closed set. Likewise it is not hard to prove that open balls are
open sets and closed balls are closed sets. I may ask you to prove the following proposition in the
homework.

Proposition 1.4.5.

A closed set contains all its limit points, that is 𝐴 ⊆ ℝ𝑛 is closed iff $A = \overline{A}$.

Example 1.4.6. . .

In calculus I the limit of a function is defined in terms of deleted open intervals centered about the
limit point. We can define the limit of a mapping in terms of deleted open balls centered at the
limit point.

Definition 1.4.7.

Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping. We say that 𝑓 has limit 𝑏 ∈ ℝ𝑚 at limit point 𝑎
of 𝑈 iff for each 𝜖 > 0 there exists a 𝛿 > 0 such that 𝑥 ∈ 𝑈 with 0 < ∣∣𝑥 − 𝑎∣∣ < 𝛿 implies
∣∣𝑓 (𝑥) − 𝑏∣∣ < 𝜖. In such a case we can denote the above by stating that

$\lim_{x \to a} f(x) = b.$

We just defined the limit of a mapping in terms of deleted open balls centered at the
limit point. The term "deleted" refers to the fact that we assume 0 < ∣∣𝑥 − 𝑎∣∣ which means we
do not consider 𝑥 = 𝑎 in the limiting process. In other words, the limit of a mapping considers
values close to the limit point but not necessarily the limit point itself. The case that the function
is defined at the limit point is special; when the limit and the mapping agree then we say the
mapping is continuous at that point.

Example 1.4.8. . .

Definition 1.4.9.

Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping. If 𝑎 ∈ 𝑈 is a limit point of 𝑈 then we say that 𝑓
is continuous at 𝑎 iff

$\lim_{x \to a} f(x) = f(a)$

If 𝑎 ∈ 𝑈 is an isolated point then we also say that 𝑓 is continuous at 𝑎. The mapping 𝑓 is
continuous on 𝑆 iff it is continuous at each point in 𝑆. The mapping 𝑓 is continuous iff
it is continuous on its domain.
Notice that in the 𝑚 = 𝑛 = 1 case we recover the definition of continuous functions from calc. I.

Proposition 1.4.10.

Let 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 be a mapping with component functions 𝑓1 , 𝑓2 , . . . , 𝑓𝑚 hence
𝑓 = (𝑓1 , 𝑓2 , . . . , 𝑓𝑚 ). If 𝑎 ∈ 𝑈 is a limit point of 𝑈 then

$\lim_{x \to a} f(x) = b \iff \lim_{x \to a} f_j(x) = b_j$ for each 𝑗 = 1, 2, . . . , 𝑚.
Proof: (⇒) Suppose $\lim_{x \to a} f(x) = b$. Then for each 𝜖 > 0 choose 𝛿 > 0 such that 0 < ∣∣𝑥 − 𝑎∣∣ < 𝛿
implies ∣∣𝑓 (𝑥) − 𝑏∣∣ < 𝜖. This choice of 𝛿 suffices for our purposes as:

$|f_j(x) - b_j| = \sqrt{(f_j(x) - b_j)^2} \leq \sqrt{\sum_{k=1}^{m} (f_k(x) - b_k)^2} = ||f(x) - b|| < \epsilon.$

Hence we have shown that $\lim_{x \to a} f_j(x) = b_j$ for all 𝑗 = 1, 2, . . . 𝑚.

(⇐) Suppose $\lim_{x \to a} f_j(x) = b_j$ for all 𝑗 = 1, 2, . . . 𝑚. Let 𝜖 > 0. Note that 𝜖/𝑚 > 0 and therefore
by the given limits we can choose $\delta_j > 0$ such that $0 < ||x - a|| < \delta_j$ implies $|f_j(x) - b_j| < \epsilon/m$.
Choose $\delta = \min\{\delta_1, \delta_2, \dots, \delta_m\}$; clearly 𝛿 > 0. Moreover, notice $0 < ||x - a|| < \delta \leq \delta_j$ hence
requiring $0 < ||x - a|| < \delta$ automatically induces $0 < ||x - a|| < \delta_j$ for all 𝑗. Suppose that 𝑥 ∈ 𝑈
and 0 < ∣∣𝑥 − 𝑎∣∣ < 𝛿; it follows that

$||f(x) - b|| = \left|\left| \sum_{j=1}^{m} (f_j(x) - b_j) e_j \right|\right| = \sqrt{\sum_{j=1}^{m} |f_j(x) - b_j|^2} < \sqrt{\sum_{j=1}^{m} (\epsilon/m)^2} \leq \sum_{j=1}^{m} \epsilon/m = \epsilon.$

Therefore, $\lim_{x \to a} f(x) = b$ and the proposition follows. □


We can analyze the limit of a mapping by analyzing the limits of the component functions:

Example 1.4.11. . .

The following follows immediately from the preceding proposition.

Proposition 1.4.12.

Suppose that 𝑓 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ𝑚 is a mapping with component functions 𝑓1 , 𝑓2 , . . . , 𝑓𝑚 .
Let 𝑎 ∈ 𝑈 be a limit point of 𝑈 then 𝑓 is continuous at 𝑎 iff 𝑓𝑗 is continuous at 𝑎 for
𝑗 = 1, 2, . . . , 𝑚. Moreover, 𝑓 is continuous on 𝑆 iff all the component functions of 𝑓 are
continuous on 𝑆. Finally, a mapping 𝑓 is continuous iff all of its component functions are
continuous.
The proof of the proposition is in Edwards, it’s his Theorem 7.2. It’s about time I proved something.

Proposition 1.4.13.
The projection functions are continuous. The identity mapping is continuous.

Proof: Let 𝜖 > 0 and choose 𝛿 = 𝜖. If 𝑥 ∈ ℝ𝑛 such that 0 < ∣∣𝑥 − 𝑎∣∣ < 𝛿 then it follows that
∣∣𝑥 − 𝑎∣∣ < 𝜖. Therefore, $\lim_{x \to a} x = a$ which means that $\lim_{x \to a} Id(x) = Id(a)$ for all 𝑎 ∈ ℝ𝑛 .
Hence 𝐼𝑑 is continuous on ℝ𝑛 which means 𝐼𝑑 is continuous. Since the projection functions are
component functions of the identity mapping it follows that the projection functions are also
continuous (using the previous proposition). □

Definition 1.4.14.
The sum and product are functions from ℝ2 to ℝ defined by

𝑠(𝑥, 𝑦) = 𝑥 + 𝑦 𝑝(𝑥, 𝑦) = 𝑥𝑦

Proposition 1.4.15.
The sum and product functions are continuous.

Preparing for the proof: Let the limit point be (𝑎, 𝑏). Consider what we wish to show: given a
point (𝑥, 𝑦) such that 0 < ∣∣(𝑥, 𝑦) − (𝑎, 𝑏)∣∣ < 𝛿 we wish to show that

∣𝑠(𝑥, 𝑦) − (𝑎 + 𝑏)∣ < 𝜖 or for the product ∣𝑝(𝑥, 𝑦) − (𝑎𝑏)∣ < 𝜖

follow for appropriate choices of 𝛿. Think about the sum for a moment,

∣𝑠(𝑥, 𝑦) − (𝑎 + 𝑏)∣ = ∣𝑥 + 𝑦 − 𝑎 − 𝑏∣ ≤ ∣𝑥 − 𝑎∣ + ∣𝑦 − 𝑏∣

I just used the triangle inequality for the absolute value of real numbers. We see that if we could
somehow get control of ∣𝑥 − 𝑎∣ and ∣𝑦 − 𝑏∣ then we’d be getting closer to the prize. We have control
of 0 < ∣∣(𝑥, 𝑦) − (𝑎, 𝑏)∣∣ < 𝛿; notice this reduces to

$||(x - a, y - b)|| < \delta \ \Rightarrow\ \sqrt{(x - a)^2 + (y - b)^2} < \delta$

It is clear that $(x - a)^2 < \delta^2$ since if it was otherwise the inequality above would be violated as
adding a nonnegative quantity $(y - b)^2$ only increases the radicand resulting in the square root to be
larger than 𝛿. Hence we may assume $(x - a)^2 < \delta^2$ and since 𝛿 > 0 it follows ∣𝑥 − 𝑎∣ < 𝛿. Likewise,
∣𝑦 − 𝑏∣ < 𝛿. Thus

$|s(x, y) - (a + b)| = |x + y - a - b| \leq |x - a| + |y - b| < 2\delta$

We see for the sum proof we can choose 𝛿 = 𝜖/2 and it will work out nicely.

Proof: Let 𝜖 > 0 and let (𝑎, 𝑏) ∈ ℝ2 . Choose 𝛿 = 𝜖/2 and suppose (𝑥, 𝑦) ∈ ℝ2 such that
∣∣(𝑥, 𝑦) − (𝑎, 𝑏)∣∣ < 𝛿. Observe that

∣∣(𝑥, 𝑦) − (𝑎, 𝑏)∣∣ < 𝛿 ⇒ ∣∣(𝑥 − 𝑎, 𝑦 − 𝑏)∣∣2 < 𝛿 2 ⇒ ∣𝑥 − 𝑎∣2 + ∣𝑦 − 𝑏∣2 < 𝛿 2 .

It follows ∣𝑥 − 𝑎∣ < 𝛿 and ∣𝑦 − 𝑏∣ < 𝛿. Thus

∣𝑠(𝑥, 𝑦) − (𝑎 + 𝑏)∣ = ∣𝑥 + 𝑦 − 𝑎 − 𝑏∣ ≤ ∣𝑥 − 𝑎∣ + ∣𝑦 − 𝑏∣ < 𝛿 + 𝛿 = 2𝛿 = 𝜖.

Therefore, $\lim_{(x,y) \to (a,b)} s(x, y) = a + b$ and it follows that the sum function is continuous at (𝑎, 𝑏).
But (𝑎, 𝑏) is an arbitrary point, thus 𝑠 is continuous on ℝ2 ; hence the sum function is continuous. □

Preparing for the proof of continuity of the product function: I’ll continue to use the same
notation as above. We need to study ∣𝑝(𝑥, 𝑦) − (𝑎𝑏)∣ = ∣𝑥𝑦 − 𝑎𝑏∣ < 𝜖. Consider that

∣𝑥𝑦 − 𝑎𝑏∣ = ∣𝑥𝑦 − 𝑦𝑎 + 𝑦𝑎 − 𝑎𝑏∣ = ∣𝑦(𝑥 − 𝑎) + 𝑎(𝑦 − 𝑏)∣ ≤ ∣𝑦∣∣𝑥 − 𝑎∣ + ∣𝑎∣∣𝑦 − 𝑏∣

We know that ∣𝑥−𝑎∣ < 𝛿 and ∣𝑦−𝑏∣ < 𝛿. There is one less obvious factor to bound in the expression.
What should we do about ∣𝑦∣? I leave it to the reader to show that:

∣𝑦 − 𝑏∣ < 𝛿 ⇒ ∣𝑦∣ < ∣𝑏∣ + 𝛿

Now put it all together and hopefully we’ll be able to "solve" for 𝛿.

$|xy - ab| \leq |y||x - a| + |a||y - b| < (|b| + \delta)\delta + |a|\delta = \delta^2 + \delta(|a| + |b|)$ "=" $\epsilon$

I put solve in quotes because we have considerably more freedom in our quest for finding 𝛿. We
could just as well find 𝛿 which makes the "=" become an <. That said let’s pursue equality,

$\delta^2 + \delta(|a| + |b|) - \epsilon = 0 \ \Rightarrow\ \delta = \frac{-|a| - |b| \pm \sqrt{(|a| + |b|)^2 + 4\epsilon}}{2}$

Since 𝜖 > 0 and ∣𝑎∣, ∣𝑏∣ ≥ 0 it follows that $\sqrt{(|a| + |b|)^2 + 4\epsilon} > \sqrt{(|a| + |b|)^2} = |a| + |b|$ hence the (+) solution
to the quadratic equation yields a positive 𝛿 namely:

$\delta = \frac{-|a| - |b| + \sqrt{(|a| + |b|)^2 + 4\epsilon}}{2}$
Yowsers, I almost made this a homework. There may be an easier route. You might notice we have
run across a few little lemmas (I’ve boxed the punch lines for the lemmas) which are doubtless
useful in other 𝜖 − 𝛿 proofs. We should collect those once we’re finished with this proof.

Proof: Let 𝜖 > 0 and let (𝑎, 𝑏) ∈ ℝ2 . By the calculations that prepared for the proof we know that
the following quantity is positive, hence choose

$\delta = \frac{-|a| - |b| + \sqrt{(|a| + |b|)^2 + 4\epsilon}}{2} > 0.$

Suppose (𝑥, 𝑦) ∈ ℝ2 such that ∣∣(𝑥, 𝑦) − (𝑎, 𝑏)∣∣ < 𝛿, so that ∣𝑥 − 𝑎∣ < 𝛿 and ∣𝑦 − 𝑏∣ < 𝛿. Note that [11],

$|xy - ab| = |xy - ya + ya - ab| = |y(x - a) + a(y - b)|$      algebra
$\leq |y||x - a| + |a||y - b|$                                 triangle inequality
$< (|b| + \delta)\delta + |a|\delta$                           by the boxed lemmas
$= \delta^2 + \delta(|a| + |b|)$                               algebra
$= \epsilon$

where we know that last step follows due to the steps leading to the boxed equation in the proof
preparation. Therefore, $\lim_{(x,y) \to (a,b)} p(x, y) = ab$ and it follows that the product function is
continuous at (𝑎, 𝑏). But (𝑎, 𝑏) is an arbitrary point, thus 𝑝 is continuous on ℝ2 ; hence the product
function is continuous. □
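
The two choices of 𝛿 made in these proofs can be sanity-checked by random sampling. The Python sketch below draws points (𝑥, 𝑦) within 𝛿 of (𝑎, 𝑏) and tests the 𝜖 inequalities for the sum and the product. The sample values of 𝑎, 𝑏, 𝜖, the fixed random seed and all helper names are assumptions of mine, and passing such a test is evidence rather than a proof.

```python
import math
import random

def delta_sum(eps):
    """The choice delta = eps / 2 from the proof for the sum function."""
    return eps / 2

def delta_product(eps, a, b):
    """Positive root of delta^2 + delta(|a| + |b|) - eps = 0."""
    s = abs(a) + abs(b)
    return (-s + math.sqrt(s * s + 4 * eps)) / 2

def sample_in_ball(a, b, delta, rng):
    """Random point (x, y) with ||(x, y) - (a, b)|| < delta."""
    r = delta * rng.random()            # rng.random() lies in [0, 1)
    t = 2 * math.pi * rng.random()
    return a + r * math.cos(t), b + r * math.sin(t)

def deltas_work(a, b, eps, trials=10000, seed=0):
    """Test |s - (a + b)| < eps and |p - ab| < eps on random nearby points."""
    rng = random.Random(seed)
    d_s, d_p = delta_sum(eps), delta_product(eps, a, b)
    for _ in range(trials):
        x, y = sample_in_ball(a, b, d_s, rng)
        if not abs((x + y) - (a + b)) < eps:
            return False
        x, y = sample_in_ball(a, b, d_p, rng)
        if not abs(x * y - a * b) < eps:
            return False
    return True

print(delta_product(0.01, 3.0, -2.0))
print(deltas_work(3.0, -2.0, 0.01))
```

For very small 𝜖 the subtraction inside `delta_product` loses precision in floating point, so this sketch is only meant for moderate sample values.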

Lemma 1.4.16.
Assume 𝛿 > 0.

1. If 𝑎, 𝑥 ∈ ℝ then ∣𝑥 − 𝑎∣ < 𝛿 ⇒ ∣𝑥∣ < ∣𝑎∣ + 𝛿.

2. If 𝑥, 𝑎 ∈ ℝ𝑛 then ∣∣𝑥 − 𝑎∣∣ < 𝛿 ⇒ ∣𝑥𝑗 − 𝑎𝑗 ∣ < 𝛿 for 𝑗 = 1, 2, . . . 𝑛.

The proof of the lemma above is mostly contained in the remarks of the preceding two pages.

Example 1.4.17. . .

[11] My notation is that when we stack inequalities the inequality in a particular line refers only to the immediate
vertical successor.

Proposition 1.4.18.

Let 𝑓 : 𝑉 ⊆ ℝ𝑝 → ℝ𝑚 and 𝑔 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑝 be mappings. Suppose that
$\lim_{x \to a} g(x) = b$ and suppose that 𝑓 is continuous at 𝑏 then

$\lim_{x \to a} (f \circ g)(x) = f\left( \lim_{x \to a} g(x) \right).$

The proof is in Edwards, see pages 46-47. Notice that the proposition above immediately gives us
the important result below:

Proposition 1.4.19.

Let 𝑓 and 𝑔 be mappings such that 𝑓 ∘ 𝑔 is well-defined. The composite function 𝑓 ∘ 𝑔 is
continuous for points 𝑎 ∈ 𝑑𝑜𝑚(𝑓 ∘ 𝑔) such that the following two conditions hold:

1. 𝑔 is continuous at 𝑎

2. 𝑓 is continuous at 𝑔(𝑎).

I make use of the earlier proposition that a mapping is continuous iff its component functions are
continuous throughout the examples that follow. For example, I know (𝐼𝑑, 𝐼𝑑) is continuous since
𝐼𝑑 was previously proved continuous.
Example 1.4.20. Note that if 𝑓 = 𝑝 ∘ (𝐼𝑑, 𝐼𝑑) then $f(x) = \big(p \circ (Id, Id)\big)(x) = p\big((Id, Id)(x)\big) = p(x, x) = x^2$.
Therefore, the quadratic function $f(x) = x^2$ is continuous on ℝ as it is the composite
of continuous functions.

Example 1.4.21. Note that if 𝑓 = 𝑝 ∘ (𝑝 ∘ (𝐼𝑑, 𝐼𝑑), 𝐼𝑑) then $f(x) = p(x^2, x) = x^3$. Therefore, the
cubic function $f(x) = x^3$ is continuous on ℝ as it is the composite of continuous functions.

Example 1.4.22. The power function is inductively defined by $x^1 = x$ and $x^n = x\,x^{n-1}$ for all
𝑛 ≥ 2. We can prove $f(x) = x^n$ is continuous by induction on 𝑛. We proved the 𝑛 = 1 case
previously. Assume inductively that $f(x) = x^{n-1}$ is continuous. Notice that

$x^n = x\,x^{n-1} = x f(x) = p(x, f(x)) = (p \circ (Id, f))(x).$

Therefore, using the induction hypothesis, we see that $g(x) = x^n$ is the composite of continuous
functions thus it is continuous. We conclude that $x \mapsto x^n$ is continuous for all 𝑛 ∈ ℕ.

We can play similar games with the sum function to prove that sums of power functions are
continuous. In your homework you will prove constant functions are continuous. Putting all of
these things together gives us the well-known result that polynomials are continuous on ℝ.
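
The bookkeeping in these examples, where a formula is rebuilt out of 𝐼𝑑, 𝑝, 𝑠 and composition, can be mirrored in code. The Python sketch below assembles power functions and a small polynomial using nothing but those building blocks. It says nothing about continuity itself; it only illustrates the decomposition, and the helper names `pair`, `compose` and `power` are mine.

```python
def Id(x):
    """The identity function."""
    return x

def p(xy):
    """Product function p(x, y) = xy, taking the pair as a single argument."""
    x, y = xy
    return x * y

def s(xy):
    """Sum function s(x, y) = x + y, taking the pair as a single argument."""
    x, y = xy
    return x + y

def pair(f, g):
    """The mapping (f, g)(x) = (f(x), g(x))."""
    return lambda x: (f(x), g(x))

def compose(outer, inner):
    """The composite outer o inner."""
    return lambda x: outer(inner(x))

def power(n):
    """x^n built inductively: x^1 = Id and x^n = p o (Id, x^(n-1))."""
    f = Id
    for _ in range(n - 1):
        f = compose(p, pair(Id, f))
    return f

square = compose(p, pair(Id, Id))         # p o (Id, Id)
cube = compose(p, pair(square, Id))       # p o (p o (Id, Id), Id)
poly = compose(s, pair(power(3), Id))     # the polynomial x^3 + x

print(square(3), cube(2), power(4)(3), poly(2))
```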

Proposition 1.4.23.

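Let 𝑎 be a limit point of 𝑈 , let 𝑓, 𝑔 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ be mappings and suppose 𝑐 ∈ ℝ. If
$\lim_{x \to a} f(x) = b_1 \in \mathbb{R}$ and $\lim_{x \to a} g(x) = b_2 \in \mathbb{R}$ then

1. $\lim_{x \to a} (f(x) + g(x)) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$.

2. $\lim_{x \to a} (f(x) g(x)) = \left( \lim_{x \to a} f(x) \right) \left( \lim_{x \to a} g(x) \right)$.

3. $\lim_{x \to a} (c f(x)) = c \lim_{x \to a} f(x)$.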

Moreover, if 𝑓, 𝑔 are continuous then 𝑓 + 𝑔, 𝑓 𝑔 and 𝑐𝑓 are continuous.


Proof: Edwards proves (1.) carefully on pg. 48. I’ll do (2.) here: we are given that $\lim_{x \to a} f(x) =
b_1 \in \mathbb{R}$ and $\lim_{x \to a} g(x) = b_2 \in \mathbb{R}$ thus by Proposition 1.4.10 we find $\lim_{x \to a} (f, g)(x) = (b_1, b_2)$.
Consider then,

$\lim_{x \to a} (f(x) g(x)) = \lim_{x \to a} p\big((f, g)(x)\big)$      defn. of product function
$= p\left( \lim_{x \to a} (f, g)(x) \right)$                            since 𝑝 is continuous
$= p(b_1, b_2)$                                                         by Proposition 1.4.10.
$= b_1 b_2$                                                             definition of product function
$= \left( \lim_{x \to a} f(x) \right) \left( \lim_{x \to a} g(x) \right).$

In your homework you proved that $\lim_{x \to a} c = c$ thus item (3.) follows from (2.). □

The proposition that follows does follow immediately from the proposition above, however I give a
proof that again illustrates the idea we used in the examples. Reinterpreting a given function as a
composite of more basic functions is a useful theoretical and calculational technique.

Proposition 1.4.24.

Assume 𝑓, 𝑔 : 𝑈 ⊆ ℝ𝑛 → 𝑉 ⊆ ℝ are continuous functions at 𝑎 ∈ 𝑈 and suppose 𝑐 ∈ ℝ.

1. 𝑓 + 𝑔 is continuous at 𝑎.

2. 𝑓 𝑔 is continuous at 𝑎

3. 𝑐𝑓 is continuous at 𝑎.

Moreover, if 𝑓, 𝑔 are continuous then 𝑓 + 𝑔, 𝑓 𝑔 and 𝑐𝑓 are continuous.


Proof: Observe that (𝑓 + 𝑔)(𝑥) = (𝑠 ∘ (𝑓, 𝑔))(𝑥) and (𝑓 𝑔)(𝑥) = (𝑝 ∘ (𝑓, 𝑔))(𝑥). We’re given that
𝑓, 𝑔 are continuous at 𝑎 and we know 𝑠, 𝑝 are continuous on all of ℝ2 thus the composite functions
𝑠 ∘ (𝑓, 𝑔) and 𝑝 ∘ (𝑓, 𝑔) are continuous at 𝑎 and the proof of items (1.) and (2.) is complete. To
prove (3.) I refer the reader to their homework where it was shown that ℎ(𝑥) = 𝑐 for all 𝑥 ∈ 𝑈 is a
continuous function. We then find (3.) follows from (2.) by setting 𝑔 = ℎ (function multiplication
commutes for real-valued functions). □.

We can use induction arguments to extend these results to arbitrarily many products and sums of
power functions. To prove continuity of algebraic functions we’d need to do some more work with
quotient and root functions. I’ll stop here for the moment, perhaps I’ll ask you to prove a few more
fundamentals from calculus I. I haven’t delved into the definition of exponential or log functions
not to mention sine or cosine. We will assume that the basic functions of calculus are continuous
on the interior of their respective domains. Basically if the formula for a function can be evaluated
at the limit point then the function is continuous.

It’s not hard to see that the comments above extend to functions of several variables and mappings.
If the formula for a mapping is comprised of finite sums and products of power functions
then we can prove such a mapping is continuous using the techniques developed in this
section. If we have a mapping with a more complicated formula built from elementary functions
then that mapping will be continuous provided its component functions have formulas which
are sensibly calculated at the limit point. In other words, if you are willing to believe me that
$\sin(x), \cos(x), e^x, \ln(x), \cosh(x), \sinh(x), \sqrt{x}, \frac{1}{x^n}, \dots$ are continuous on the interior of their domains
then it’s not hard to prove:

$f(x, y, z) = \left( \sin(x) + e^x + \sqrt{\cosh(x^2)} + \sqrt{y + e^x},\ \cosh(xyz),\ x e^{\sqrt{x + \frac{1}{yz}}} \right)$

is a continuous mapping at points where the radicands of the square root functions are nonnegative.
It wouldn’t be very fun to write explicitly but it is clear that this mapping is the Cartesian product
of functions which are the sum, product and composite of continuous functions.
Definition 1.4.25.
A polynomial in 𝑛-variables has the form:

\[ f(x_1, x_2, \dots, x_n) = \sum_{i_1, i_2, \dots, i_n = 0}^{\infty} c_{i_1, i_2, \dots, i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n} \]

where only finitely many coefficients $c_{i_1, i_2, \dots, i_n} \neq 0$. Such functions are also called multinomials, and we denote the set of multinomials in
𝑛-variables as ℝ(𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ).
Polynomials in one variable are ℝ(𝑥). Polynomials in two variables are ℝ(𝑥, 𝑦), for example,

𝑓(𝑥, 𝑦) = 𝑎𝑥 + 𝑏𝑦                              deg(𝑓) = 1, linear function
𝑓(𝑥, 𝑦) = 𝑎𝑥 + 𝑏𝑦 + 𝑐                          deg(𝑓) = 1, affine function
𝑓(𝑥, 𝑦) = 𝑎𝑥² + 𝑏𝑥𝑦 + 𝑐𝑦²                      deg(𝑓) = 2, quadratic form
𝑓(𝑥, 𝑦) = 𝑎𝑥² + 𝑏𝑥𝑦 + 𝑐𝑦² + 𝑑𝑥 + 𝑒𝑦 + 𝑔        deg(𝑓) = 2

If all the terms in the polynomial have the same total degree then it is said to be homogeneous.
In the list above only the linear function and the quadratic form were homogeneous.

Remark 1.4.26.

There are other topologies possible for ℝ𝑛 . For example, one can prove that

∣∣𝑣∣∣1 = ∣𝑣1 ∣ + ∣𝑣2 ∣ + ⋅ ⋅ ⋅ + ∣𝑣𝑛 ∣

gives a norm on ℝ𝑛 and the theorems we proved transfer over almost without change by
just trading ∣∣ ⋅ ∣∣ for ∣∣ ⋅ ∣∣1 . The unit “ball” becomes a diamond for the 1-norm. There are
many other norms which can be constructed, infinitely many it turns out. However, it has
been shown that the topologies of all these different norms are equivalent. This means that
open sets generated from different norms will be the same class of sets. For example, if
you can fit an open disk around every point in a set then it’s clear you can just as well fit
an open diamond and vice-versa. One of the things that makes infinite dimensional linear
algebra more fun is the fact that the topologies generated by distinct norms need not be
equivalent in infinite dimensions. In that setting there can be a difference between the open sets generated by
the Euclidean norm versus those generated by the 1-norm. Incidentally, my thesis work is
mostly built over the 1-norm. It makes the supernumbers happy.
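
To make the disk-versus-diamond comment concrete, here is a small numerical sketch in plain Python. It assumes the standard comparison ∣∣𝑣∣∣ ≤ ∣∣𝑣∣∣1 ≤ √𝑛 ∣∣𝑣∣∣ between the Euclidean norm and the 1-norm on ℝ𝑛 (a fact which is not proved in these notes). The function names and the sample vectors are illustrative choices only.

```python
import math

def norm1(v):
    """1-norm: sum of the absolute values of the components."""
    return sum(abs(x) for x in v)

def norm2(v):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def norms_comparable(v, tol=1e-12):
    """Check norm2(v) <= norm1(v) <= sqrt(n) * norm2(v), up to rounding."""
    n = len(v)
    return (norm2(v) <= norm1(v) + tol
            and norm1(v) <= math.sqrt(n) * norm2(v) + tol)

# Arbitrary sample vectors in R^3 (assumed for illustration).
samples = [(1.0, 0.0, 0.0), (1.0, -2.0, 3.0), (-0.5, 0.25, 0.125)]
results = [norms_comparable(v) for v in samples]
print(results)
```

The first inequality says the diamond of radius 𝑟 sits inside the disk of radius 𝑟, and the second says the disk of radius 𝑟/√𝑛 sits inside the diamond of radius 𝑟. That nesting is the reason both norms produce the same open sets in ℝ𝑛.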
Chapter 2

linear algebra

Our goal in the first section of this chapter is to gain conceptual clarity on the meaning of the
central terms from linear algebra. This is a bird’s-eye view of linear algebra; my selection of topics here is
centered around the goal of helping you to see the linear algebra in calculus. Once you see it then
you can use the theory of linear algebra to help organize your thinking. Our ultimate goal is that
organizational principle. Our goal here is not to learn all of linear algebra, rather we wish to use it
as a tool for the right jobs as they arise this semester.

In the second section we summarize the tools of matrix computation. We will use matrix addition
and multiplication throughout this course. Inverse matrices and the noncommutative nature of
matrix multiplication are illustrated. It is assumed that the reader has some previous experience in
matrix computation; at least in high school you should have spent some time on it.

In the third section the concept of a linear transformation is formalized. The formula for any
linear transformation from ℝ𝑛 to ℝ𝑚 can be expressed as a matrix multiplication. We study this
standard matrix in enough depth to understand its application to differentiation. A number
of examples to visualize the role of a linear transformation are offered for breadth. Finally,
isomorphisms and coordinate maps are discussed.

In the fourth section we define norms for vector spaces. We study how the norm allows us to define
limits for an abstract vector space. This is important since it allows us to quantify continuity for
abstract linear transformations as well as ultimately to define differentiation on a normed vector
space in the chapter that follows.


2.1 vector spaces


Suppose we have a set 𝑉 paired with an addition and scalar multiplication such that for all 𝑥, 𝑦, 𝑧 ∈
𝑉 and 𝑎, 𝑏, 𝑐 ∈ ℝ:
(𝑖.) 𝑥 + 𝑦 = 𝑦 + 𝑥, (𝑖𝑖.) (𝑥 + 𝑦) + 𝑧 = 𝑥 + (𝑦 + 𝑧)
(𝑖𝑖𝑖.) 𝑥 + 0 = 𝑥, (𝑖𝑣.) 𝑥 − 𝑥 = 0
(𝑣.) 1𝑥 = 𝑥, (𝑣𝑖.) (𝑎𝑏)𝑥 = 𝑎(𝑏𝑥),
(𝑣𝑖𝑖.) 𝑎(𝑥 + 𝑦) = 𝑎𝑥 + 𝑎𝑦, (𝑣𝑖𝑖𝑖.) (𝑎 + 𝑏)𝑥 = 𝑎𝑥 + 𝑏𝑥
(𝑖𝑥.) 𝑥 + 𝑦 ∈ 𝑉, (𝑥.) 𝑐𝑥 ∈ 𝑉
then we say that 𝑉 is a vector space over ℝ. To be a bit more precise, by (iii.) I mean to say
that there exists some element 0 ∈ 𝑉 such that 𝑥 + 0 = 𝑥 for each 𝑥 ∈ 𝑉 . Also, (iv.) should be
understood to say that for each 𝑥 ∈ 𝑉 there exists another element −𝑥 ∈ 𝑉 such that 𝑥 + (−𝑥) = 0.
Example 2.1.1. ℝ𝑛 is a vector space with respect to the standard vector addition and scalar
multiplication.
Example 2.1.2. ℂ = {𝑎 + 𝑖𝑏 ∣ 𝑎, 𝑏 ∈ ℝ} is a vector space where the usual complex number addition
provides the vector addition and multiplication by a real number 𝑐(𝑎 + 𝑖𝑏) = 𝑐𝑎 + 𝑖(𝑐𝑏) clearly defines
a scalar multiplication.
Example 2.1.3. The set of all 𝑚 × 𝑛 matrices is a vector space with respect to the usual matrix
addition and scalar multiplication. We will elaborate on the details in an upcoming section.
Example 2.1.4. Suppose ℱ is the set of all functions from a set 𝑆 to a vector space 𝑉 then ℱ is
naturally a vector space with respect to the function addition and multiplication by a scalar. Both
of those operations are well-defined on the values of the function since we assumed the codomain of
each function in ℱ is the vector space 𝑉 .
There are many subspaces of function space which provide interesting examples of vector spaces.
For example, the set of continuous functions:
Example 2.1.5. Let 𝐶⁰(ℝ𝑛 , ℝ𝑚 ) be the set of continuous functions from ℝ𝑛 to ℝ𝑚 ; then 𝐶⁰(ℝ𝑛 , ℝ𝑚 )
is a vector space with respect to function addition and the usual scalar multiplication. This fact relies on
the sum and scalar multiple of continuous functions being once more continuous.

Definition 2.1.6.
We say a subset 𝑆 of a vector space 𝑉 is linearly independent (LI) iff for scalars
𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ,
𝑐1 𝑣1 + 𝑐2 𝑣2 + ⋅ ⋅ ⋅ + 𝑐𝑘 𝑣𝑘 = 0 ⇒ 𝑐1 = 𝑐2 = ⋅ ⋅ ⋅ = 𝑐𝑘 = 0
for each finite subset {𝑣1 , 𝑣2 , . . . , 𝑣𝑘 } of 𝑆.
In the case that 𝑆 is finite it suffices to show the implication for a linear combination of all the
vectors in the set. Notice that if any vector in the set 𝑆 can be written as a linear combination of
the other vectors in 𝑆 then that makes 𝑆 fail the test for linear independence. Moreover, if a set 𝑆
is not linearly independent then we say 𝑆 is linearly dependent.

Example 2.1.7. The standard basis of ℝ𝑛 is denoted {𝑒1 , 𝑒2 , . . . , 𝑒𝑛 }. We can show linear
independence easily via the dot-product: suppose that 𝑐1 𝑒1 + 𝑐2 𝑒2 + ⋅ ⋅ ⋅ + 𝑐𝑛 𝑒𝑛 = 0 and take the dot-product
of both sides with 𝑒𝑗 to obtain

(𝑐1 𝑒1 + 𝑐2 𝑒2 + ⋅ ⋅ ⋅ + 𝑐𝑛 𝑒𝑛 ) ⋅ 𝑒𝑗 = 0 ⋅ 𝑒𝑗 ⇒ 𝑐1 𝑒1 ⋅ 𝑒𝑗 + 𝑐2 𝑒2 ⋅ 𝑒𝑗 + ⋅ ⋅ ⋅ + 𝑐𝑛 𝑒𝑛 ⋅ 𝑒𝑗 = 0 ⇒ 𝑐𝑗 (1) = 0

where the last step uses 𝑒𝑖 ⋅ 𝑒𝑗 = 0 for 𝑖 ≠ 𝑗 and 𝑒𝑗 ⋅ 𝑒𝑗 = 1. But 𝑗 was arbitrary, hence it follows that 𝑐1 = 𝑐2 = ⋅ ⋅ ⋅ = 𝑐𝑛 = 0 which establishes the linear
independence of the standard basis.
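
The dot-product trick above can be tried out numerically. The following is a minimal sketch in plain Python; the helper names `e`, `dot` and `linear_combination` and the sample coefficients are assumptions made for illustration, not notation from these notes.

```python
def e(j, n):
    """Standard basis vector e_j of R^n (1-indexed), stored as a tuple."""
    return tuple(1 if i == j else 0 for i in range(1, n + 1))

def dot(v, w):
    """Dot-product of two vectors of equal length."""
    return sum(a * b for a, b in zip(v, w))

def linear_combination(coeffs):
    """Form c_1 e_1 + c_2 e_2 + ... + c_n e_n."""
    n = len(coeffs)
    total = [0] * n
    for j, c in enumerate(coeffs, start=1):
        total = [t + c * x for t, x in zip(total, e(j, n))]
    return tuple(total)

coeffs = (2, -7, 5)                      # arbitrary sample coefficients
v = linear_combination(coeffs)
# Dotting with e_j selects the j-th coefficient.
recovered = tuple(dot(v, e(j, 3)) for j in range(1, 4))
print(v, recovered)
```

Since dotting with 𝑒𝑗 returns 𝑐𝑗 , a linear combination equal to the zero vector forces every coefficient to be zero.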

Example 2.1.8. Consider 𝑆 = {1, 𝑖} ⊂ ℂ. We can argue 𝑆 is LI as follows: suppose 𝑐1 (1)+𝑐2 (𝑖) =
0. Thus 𝑐1 +𝑖𝑐2 = 0 for some real numbers 𝑐1 , 𝑐2 . Recall that a basic property of complex numbers is
that if 𝑧1 = 𝑧2 then 𝑅𝑒(𝑧1 ) = 𝑅𝑒(𝑧2 ) and 𝐼𝑚(𝑧1 ) = 𝐼𝑚(𝑧2 ) where 𝑧𝑗 = 𝑅𝑒(𝑧𝑗 )+𝑖𝐼𝑚(𝑧𝑗 ). Therefore,
the complex equation 𝑐1 + 𝑖𝑐2 = 0 yields two real equations 𝑐1 = 0 and 𝑐2 = 0.

Example 2.1.9. Let $C^0(\mathbb{R})$ be the vector space of all continuous functions from ℝ to ℝ. Suppose
𝑆 is the set of monic¹ monomials $S = \{1, t, t^2, t^3, \dots\}$. This is an infinite set. We can argue LI
as follows: suppose $c_1 t^{p_1} + c_2 t^{p_2} + \cdots + c_k t^{p_k} = 0$ for distinct powers. For convenience relabel the powers $p_1, p_2, \dots, p_k$
by $p_{i_1}, p_{i_2}, \dots, p_{i_k}$ such that $0 \le p_{i_1} < p_{i_2} < \cdots < p_{i_k}$. This notation just shuffles the terms in the
finite sum around so that the first term has the lowest order: consider

\[ c_{i_1} t^{p_{i_1}} + c_{i_2} t^{p_{i_2}} + \cdots + c_{i_k} t^{p_{i_k}} = 0 \qquad (\star) \]

If $p_{i_1} = 0$ then evaluate ★ at 𝑡 = 0 to obtain $c_{i_1} = 0$. If $p_{i_1} > 0$ then differentiate ★ $p_{i_1}$ times and
denote this new equation by $D^{p_{i_1}}\star$. Evaluate $D^{p_{i_1}}\star$ at 𝑡 = 0 to find

\[ p_{i_1}(p_{i_1} - 1) \cdots 3 \cdot 2 \cdot 1 \, c_{i_1} = 0 \]

hence $c_{i_1} = 0$. Since we set up $p_{i_1} < p_{i_2}$ it follows that after $p_{i_1}$ differentiations the second summand
is still nontrivial in $D^{p_{i_1}}\star$. However, we can continue differentiating ★ until we reach $D^{p_{i_2}}\star$ and
then the constant term is $p_{i_2}!\, c_{i_2}$ so evaluation will show $c_{i_2} = 0$. We continue in this fashion until we
have shown that $c_{i_j} = 0$ for 𝑗 = 1, 2, . . . , 𝑘. It follows that 𝑆 is a linearly independent set.
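
The differentiate-then-evaluate step can be mimicked on a computer. Below is a minimal sketch in plain Python which stores a polynomial as a dictionary from powers to coefficients; that representation, the function names and the sample polynomial are all assumptions chosen for illustration.

```python
from math import factorial

def derivative(poly):
    """Differentiate a polynomial stored as {power: coefficient}."""
    return {p - 1: p * c for p, c in poly.items() if p > 0}

def evaluate_at_zero(poly):
    """At t = 0 only the constant (power 0) term survives."""
    return poly.get(0, 0)

def coefficient_via_derivatives(poly, p):
    """Recover the coefficient of t^p as (D^p poly)(0) / p!."""
    for _ in range(p):
        poly = derivative(poly)
    return evaluate_at_zero(poly) / factorial(p)

# Hypothetical sample: 4 t^2 - 3 t^5 + t^7.
poly = {2: 4, 5: -3, 7: 1}
print(coefficient_via_derivatives(poly, 2))
print(coefficient_via_derivatives(poly, 5))
```

If the polynomial were the zero function then every derivative would vanish at 𝑡 = 0, so each recovered coefficient would have to be zero. That is the content of the argument in Example 2.1.9.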

We spend considerable effort in linear algebra to understand LI from as many angles as possible.
One equivalent formulation of LI is the ability to equate coefficients. In other words, a set of objects
is LI iff whenever we have an equation with those objects we can equate coefficients. In calculus
when we equate coefficients we implicitly assume that the functions in question are LI. Generally
speaking two functions are LI if their graphs have distinct shapes which cannot be related by a
simple vertical stretch.

Example 2.1.10. Consider 𝑆 = {2^𝑡 , 3(1/2)^{−𝑡} } as a subset of the vector space 𝐶⁰(ℝ). To show linear
dependence we observe that 𝑐1 2^𝑡 + 𝑐2 3(1/2)^{−𝑡} = 0 yields (𝑐1 + 3𝑐2 )2^𝑡 = 0. Hence 𝑐1 + 3𝑐2 = 0 which
means nontrivial solutions exist. Take 𝑐2 = 1 then 𝑐1 = −3. Of course the heart of the matter is
that 3(1/2)^{−𝑡} = 3(2^𝑡 ) so the second function is just a scalar multiple of the first function.
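
A quick floating-point check of the nontrivial solution 𝑐1 = −3, 𝑐2 = 1 is sketched below in plain Python. The sample points are arbitrary, and a numerical check at a few points is only a sanity test, not a proof of dependence.

```python
def f1(t):
    """First function in S: 2^t."""
    return 2.0 ** t

def f2(t):
    """Second function in S: 3 (1/2)^(-t)."""
    return 3.0 * 0.5 ** (-t)

c1, c2 = -3.0, 1.0                       # the solution found in the example
sample_points = (-2.0, 0.0, 0.5, 3.0)    # arbitrary test values of t
residuals = [c1 * f1(t) + c2 * f2(t) for t in sample_points]
print(residuals)
```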
¹ monic means that the leading coefficient is 1.

If you’ve taken differential equations then you should recognize the concept of LI from your study
of solution sets to differential equations. Given an 𝑛-th order linear differential equation we always
have a goal of calculating 𝑛-LI solutions. In that context LI is important because it helps us
make sure we do not count the same solution twice. The general solution is formed from a linear
combination of the LI solution set. Of course this is not a course in differential equations, I include
this comment to make connections to the other course. One last example on LI should suffice to
make certain you at least have a good idea of the concept:
Example 2.1.11. Consider ℝ3 as a vector space and consider the set 𝑆 = {⃗𝑣 , î, ĵ, k̂} where we
could also denote î = 𝑒1 , ĵ = 𝑒2 , k̂ = 𝑒3 but I’m aiming to make your mind connect with your
calculus III background. This set is clearly linearly dependent since we can write any vector ⃗𝑣 as
a linear combination of the standard unit-vectors: moreover, we can use dot-products to select the
𝑥, 𝑦 and 𝑧 components as follows:
⃗𝑣 = (⃗𝑣 ⋅ î)î + (⃗𝑣 ⋅ ĵ)ĵ + (⃗𝑣 ⋅ k̂)k̂
Linear independence helps us quantify a type of redundancy for vectors in a given set. The next
definition is equally important and it is sort of the other side of the coin; spanning is a criterion
which helps us ensure a set of vectors will cover a vector space without missing anything.
Definition 2.1.12.
We say a subset 𝑆 of a vector space 𝑉 is a spanning set for 𝑉 iff for each 𝑣 ∈ 𝑉 there
exist scalars 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 and vectors 𝑣1 , 𝑣2 , . . . , 𝑣𝑘 ∈ 𝑆 such that 𝑣 = 𝑐1 𝑣1 + 𝑐2 𝑣2 + ⋅ ⋅ ⋅ + 𝑐𝑘 𝑣𝑘 .
We denote 𝑆𝑝𝑎𝑛{𝑣1 , 𝑣2 , . . . , 𝑣𝑘 } = {𝑐1 𝑣1 + 𝑐2 𝑣2 + ⋅ ⋅ ⋅ + 𝑐𝑘 𝑣𝑘 ∣ 𝑐1 , 𝑐2 , . . . , 𝑐𝑘 ∈ ℝ}.
If 𝑆 ⊂ 𝑉 and 𝑉 is a vector space then it is immediately obvious that 𝑆𝑝𝑎𝑛(𝑆) ⊆ 𝑉 . If 𝑆 is a
spanning set then it is obvious that 𝑉 ⊆ 𝑆𝑝𝑎𝑛(𝑆). It follows that when 𝑆 is a spanning set for 𝑉
we have 𝑆𝑝𝑎𝑛(𝑆) = 𝑉 .
Example 2.1.13. It is easy to show that if 𝑣 ∈ ℝ𝑛 then 𝑣 = 𝑣1 𝑒1 + 𝑣2 𝑒2 + ⋅ ⋅ ⋅ + 𝑣𝑛 𝑒𝑛 . It follows
that ℝ𝑛 = 𝑆𝑝𝑎𝑛{𝑒1 , 𝑒2 , . . . , 𝑒𝑛 }.
Example 2.1.14. Let 1, 𝑖 ∈ ℂ where 𝑖² = −1. Clearly ℂ = 𝑆𝑝𝑎𝑛{1, 𝑖}.
Example 2.1.15. Let 𝑃 be the set of polynomials. Since the sum of any two polynomials and
the scalar multiple of any polynomial is once more a polynomial we find 𝑃 is a vector space with
respect to function addition and multiplication of a function by a scalar. We can argue that the set
of monic monomials {1, 𝑡, 𝑡², . . . } is a spanning set for 𝑃 . Why? Because if 𝑓(𝑡) ∈ 𝑃 then that means
there are scalars 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 such that 𝑓(𝑡) = 𝑎0 + 𝑎1 𝑡 + 𝑎2 𝑡² + ⋅ ⋅ ⋅ + 𝑎𝑛 𝑡ⁿ.

Definition 2.1.16.
We say a subset 𝛽 of a vector space 𝑉 is a basis for 𝑉 iff 𝛽 is a linearly independent
spanning set for 𝑉 . If 𝛽 is a finite set then 𝑉 is said to be finite dimensional and the
number of vectors in 𝛽 is called the dimension of 𝑉 . That is, if 𝛽 = {𝑣1 , 𝑣2 , . . . , 𝑣𝑛 } is a
basis for 𝑉 then 𝑑𝑖𝑚(𝑉 ) = 𝑛. If no finite basis exists for 𝑉 then 𝑉 is said to be infinite
dimensional.

The careful reader will question why this concept of dimension is well-defined. Why can we not
have bases of differing dimension for a given vector space? I leave this question for linear algebra;
the theorem which asserts the uniqueness of dimension is one of the deeper theorems in the course.
However, like most everything in linear algebra, at some level it just boils down to solving some particular
set of equations. You might tell Dr. Sprano it’s just algebra. In any event, it is common practice
to use the term dimension in courses where linear algebra is not assumed. For example, ℝ² is a
two-dimensional space. Or we’ll say that ℝ³ is a three-dimensional space. This terminology agrees
with the general observation of the next example.

Example 2.1.17. The standard basis {𝑒𝑖 }𝑛𝑖=1 for ℝ𝑛 is a basis for ℝ𝑛 and 𝑑𝑖𝑚(ℝ𝑛 ) = 𝑛. This
result holds for all 𝑛 ∈ ℕ. The line is one-dimensional, the plane is two-dimensional, three-space
is three-dimensional etc...

Example 2.1.18. The set {1, 𝑖} is a basis for ℂ. It follows that 𝑑𝑖𝑚(ℂ) = 2. We say that the
complex numbers form a two-dimensional real vector space.

Example 2.1.19. The set of polynomials is clearly infinite dimensional. Contradiction shows this
without much effort. Suppose 𝑃 had a finite basis 𝛽. Choose a polynomial of largest degree (say
𝑘) in 𝛽. Notice that 𝑓(𝑡) = 𝑡^{𝑘+1} is a polynomial and yet clearly 𝑓(𝑡) ∉ 𝑆𝑝𝑎𝑛(𝛽), since every linear
combination of elements of 𝛽 has degree at most 𝑘; hence 𝛽 is not a
spanning set. But this contradicts the assumption 𝛽 is a basis. Hence, by contradiction, no finite
basis exists and we conclude the set of polynomials is infinite dimensional.

There is a more general use of the term dimension which is beyond the context of linear algebra.
For example, in calculus II or III you may have heard that a circle is one-dimensional or a surface
is two-dimensional. Well, circles and surfaces are not usually vector spaces so the terminology is
not taken from linear algebra. In fact, that use of the term dimension stems from manifold theory.
I hope to discuss manifolds later in this course.

2.2 matrix calculation


An 𝑚 × 𝑛 matrix is an array of numbers with 𝑚-rows and 𝑛-columns. We define ℝ 𝑚×𝑛 to be the
set of all 𝑚 × 𝑛 matrices. The set of all 𝑛-dimensional column vectors is ℝ 𝑛×1 = ℝ𝑛 (see footnote 2). The set of
all 𝑛-dimensional row vectors is ℝ1×𝑛 . A given matrix 𝐴 ∈ ℝ 𝑚×𝑛 has 𝑚𝑛-components 𝐴𝑖𝑗 . Notice
that the components are numbers; 𝐴𝑖𝑗 ∈ ℝ for all 𝑖, 𝑗 such that 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛. We
should not write 𝐴 = 𝐴𝑖𝑗 because it is nonsense, however 𝐴 = [𝐴𝑖𝑗 ] is quite fine.

Suppose 𝐴 ∈ ℝ 𝑚×𝑛 , note for 1 ≤ 𝑗 ≤ 𝑛 we have 𝑐𝑜𝑙𝑗 (𝐴) ∈ ℝ𝑚×1 whereas for 1 ≤ 𝑖 ≤ 𝑚 we find
𝑟𝑜𝑤𝑖 (𝐴) ∈ ℝ1×𝑛 . In other words, an 𝑚×𝑛 matrix has 𝑛 columns of length 𝑚 and 𝑚 rows of length 𝑛.

² We will use the convention that points in ℝ𝑛 are column vectors. However, we will use the somewhat subtle
notation (𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ) = [𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ]ᵀ. This helps me write ℝ𝑛 rather than ℝ 𝑛×1 and I don’t have to pepper
transposes all over the place. If you’ve read my linear algebra notes you’ll appreciate the wisdom of our convention.

Two matrices 𝐴 and 𝐵 are equal iff 𝐴𝑖𝑗 = 𝐵𝑖𝑗 for all 𝑖, 𝑗. Given matrices 𝐴, 𝐵 with components
𝐴𝑖𝑗 , 𝐵𝑖𝑗 and constant 𝑐 ∈ ℝ we define

(𝐴 + 𝐵)𝑖𝑗 = 𝐴𝑖𝑗 + 𝐵𝑖𝑗 (𝑐𝐴)𝑖𝑗 = 𝑐𝐴𝑖𝑗 , for all 𝑖, 𝑗.

The zero matrix in ℝ 𝑚×𝑛 is denoted 0 and defined by 0𝑖𝑗 = 0 for all 𝑖, 𝑗. The additive inverse
of 𝐴 ∈ ℝ 𝑚×𝑛 is the matrix −𝐴 such that 𝐴 + (−𝐴) = 0. The components of the additive inverse
matrix are given by (−𝐴)𝑖𝑗 = −𝐴𝑖𝑗 for all 𝑖, 𝑗. Likewise, if 𝐴 ∈ ℝ 𝑚×𝑛 and 𝐵 ∈ ℝ 𝑛×𝑝 then the
product 𝐴𝐵 ∈ ℝ 𝑚×𝑝 is defined by³:

\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \]

for each 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑝. In the case 𝑚 = 𝑝 = 1 the indices 𝑖, 𝑗 are omitted in the equation
since the matrix product is simply a number which needs no index. The identity matrix in ℝ 𝑛×𝑛
is the 𝑛 × 𝑛 square matrix 𝐼 whose components are the Kronecker delta;

\[ I_{ij} = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \]

For example, $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. The notation 𝐼𝑛 is sometimes used if the size of the identity matrix
needs emphasis; otherwise the size of the matrix 𝐼 is to be understood from the context.

Let 𝐴 ∈ ℝ 𝑛×𝑛 . If there exists 𝐵 ∈ ℝ 𝑛×𝑛 such that 𝐴𝐵 = 𝐼 and 𝐵𝐴 = 𝐼 then we say that 𝐴
is invertible and 𝐴−1 = 𝐵. Invertible matrices are also called nonsingular. If a matrix has no
inverse then it is called a noninvertible or singular matrix.

Let 𝐴 ∈ ℝ 𝑚×𝑛 then 𝐴ᵀ ∈ ℝ 𝑛×𝑚 is called the transpose of 𝐴 and is defined by (𝐴ᵀ)𝑗𝑖 = 𝐴𝑖𝑗
for all 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛. It is sometimes useful to know that (𝐴𝐵)ᵀ = 𝐵ᵀ𝐴ᵀ and, for invertible 𝐴,
(𝐴ᵀ)⁻¹ = (𝐴⁻¹)ᵀ. It is also true that (𝐴𝐵)⁻¹ = 𝐵⁻¹𝐴⁻¹ when 𝐴 and 𝐵 are invertible matrices of the same size. Furthermore, note the dot-product of
𝑣, 𝑤 ∈ ℝ𝑛 is given by 𝑣 ⋅ 𝑤 = 𝑣ᵀ𝑤.
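
The transpose identities are easy to spot-check on a small case. The sketch below uses plain Python lists of rows as matrices; the helper names and the particular matrices are assumptions chosen for illustration.

```python
def transpose(A):
    """Swap rows and columns: (A^T)_{ji} = A_{ij}."""
    return [list(column) for column in zip(*A)]

def matmul(A, B):
    """Matrix product from the definition (AB)_{ij} = sum_k A_{ik} B_{kj}."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]      # 3 x 2, hypothetical sample

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T

# Column vectors are n x 1 matrices, so v^T w is a 1 x 1 matrix.
v = [[1], [2], [3]]
w = [[4], [5], [6]]
v_dot_w = matmul(transpose(v), w)
print(lhs, rhs, v_dot_w)
```

Note that the order of the factors reverses under the transpose; 𝐴ᵀ𝐵ᵀ is a 3 × 3 matrix here, so it could not possibly equal the 2 × 2 matrix (𝐴𝐵)ᵀ.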

The 𝑖𝑗-th standard basis matrix for ℝ 𝑚×𝑛 is denoted 𝐸𝑖𝑗 for 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛. The
matrix 𝐸𝑖𝑗 is zero in all entries except for the (𝑖, 𝑗)-th slot where it has a 1. In other words, we
define (𝐸𝑖𝑗 )𝑘𝑙 = 𝛿𝑖𝑘 𝛿𝑗𝑙 . I invite the reader to show that the term basis is justified in this context⁴.
Given this basis we see that the vector space ℝ 𝑚×𝑛 has 𝑑𝑖𝑚(ℝ 𝑚×𝑛 ) = 𝑚𝑛.

Theorem 2.2.1.
³ This product is defined so that the matrix of the composite of linear transformations is the product of the matrices
of the composed transformations. This is illustrated later in this section and is proved in my linear algebra notes.
⁴ The theorem stated below contains the needed results and then some; the proof is given in my linear
algebra notes. It would be wise to just work it out in the 2 × 2 case as a warm-up if you are interested.

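If 𝐴 ∈ ℝ 𝑚×𝑛 and 𝑣 ∈ ℝ𝑛 then

\[ (i.)\ \ v = \sum_{i=1}^{n} v_i e_i \qquad\qquad (ii.)\ \ A = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} E_{ij} \]

\[ (iii.)\ \ [e_i^{T} A] = row_i(A) \qquad\qquad (iv.)\ \ [A e_i] = col_i(A) \]

\[ (v.)\ \ E_{ij} E_{kl} = \delta_{jk} E_{il} \qquad (vi.)\ \ E_{ij} = e_i e_j^{T} \qquad (vii.)\ \ e_i^{T} e_j = e_i \cdot e_j = \delta_{ij}. \]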

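You can look in my linear algebra notes for the details of the theorem. I’ll just expand one point
here: Let 𝐴 ∈ ℝ 𝑚×𝑛 then

\[
\begin{aligned}
A &= \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \cdots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix} \\
&= A_{11} \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \cdots & 0 \\
0 & 0 & \cdots & 0
\end{bmatrix}
+ A_{12} \begin{bmatrix}
0 & 1 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \cdots & 0 \\
0 & 0 & \cdots & 0
\end{bmatrix}
+ \cdots
+ A_{mn} \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \cdots & 0 \\
0 & 0 & \cdots & 1
\end{bmatrix} \\
&= A_{11} E_{11} + A_{12} E_{12} + \cdots + A_{mn} E_{mn}.
\end{aligned}
\]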


The calculation above follows from repeated 𝑚𝑛-applications of the definition of matrix addition
and another 𝑚𝑛-applications of the definition of scalar multiplication of a matrix.
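
The expansion of a matrix in the standard basis matrices, and the multiplication rule 𝐸𝑖𝑗 𝐸𝑘𝑙 = 𝛿𝑗𝑘 𝐸𝑖𝑙 from the theorem, can be checked on a small case. This is a minimal sketch in plain Python with matrices stored as lists of rows; the helper names and the sample matrix are assumptions for illustration.

```python
def E(i, j, m, n):
    """Standard basis matrix E_ij in R^{m x n} (1-indexed indices)."""
    return [[1 if (k == i and l == j) else 0 for l in range(1, n + 1)]
            for k in range(1, m + 1)]

def add(A, B):
    """Componentwise matrix addition."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]

def matmul(A, B):
    """Matrix product from the index definition."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]      # sample 2 x 3 matrix
m, n = 2, 3

# Rebuild A as the sum of A_ij E_ij over all i, j.
total = [[0] * n for _ in range(m)]
for i in range(1, m + 1):
    for j in range(1, n + 1):
        total = add(total, scale(A[i - 1][j - 1], E(i, j, m, n)))

# E_12 E_21 = E_11 (inner indices match), E_12 E_11 = 0 (they do not).
units_ok = (matmul(E(1, 2, 2, 2), E(2, 1, 2, 2)) == E(1, 1, 2, 2)
            and matmul(E(1, 2, 2, 2), E(1, 1, 2, 2)) == [[0, 0], [0, 0]])
print(total, units_ok)
```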

Example 2.2.2. Suppose $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$. We see that 𝐴 has 2 rows and 3 columns thus 𝐴 ∈ ℝ2×3 .
Moreover, 𝐴11 = 1, 𝐴12 = 2, 𝐴13 = 3, 𝐴21 = 4, 𝐴22 = 5, and 𝐴23 = 6. It’s not usually possible to
find a formula for a generic element in the matrix, but this matrix satisfies 𝐴𝑖𝑗 = 3(𝑖 − 1) + 𝑗 for
all 𝑖, 𝑗 ⁵. The columns of 𝐴 are,

\[ col_1(A) = \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \qquad col_2(A) = \begin{bmatrix} 2 \\ 5 \end{bmatrix}, \qquad col_3(A) = \begin{bmatrix} 3 \\ 6 \end{bmatrix}. \]

The rows of 𝐴 are

\[ row_1(A) = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}, \qquad row_2(A) = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}. \]
Example 2.2.3. Suppose $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ then $A^{T} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$. Notice that

𝑟𝑜𝑤1 (𝐴) = 𝑐𝑜𝑙1 (𝐴ᵀ), 𝑟𝑜𝑤2 (𝐴) = 𝑐𝑜𝑙2 (𝐴ᵀ)


⁵ In the statement “for all 𝑖, 𝑗” it is to be understood that those indices range over their allowed values. In the
preceding example 1 ≤ 𝑖 ≤ 2 and 1 ≤ 𝑗 ≤ 3.

and
𝑐𝑜𝑙1 (𝐴) = 𝑟𝑜𝑤1 (𝐴𝑇 ), 𝑐𝑜𝑙2 (𝐴) = 𝑟𝑜𝑤2 (𝐴𝑇 ), 𝑐𝑜𝑙3 (𝐴) = 𝑟𝑜𝑤3 (𝐴𝑇 )
Notice (𝐴𝑇 )𝑖𝑗 = 𝐴𝑗𝑖 = 3(𝑗 − 1) + 𝑖 for all 𝑖, 𝑗; at the level of index calculations we just switch the
indices to create the transpose.
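Example 2.2.4. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$. We calculate

\[ A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}. \]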

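Example 2.2.5. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$. We calculate

\[ A - B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}. \]

Now multiply 𝐴 by the scalar 5,

\[ 5A = 5 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix}. \]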

Example 2.2.6. Let 𝐴, 𝐵 ∈ ℝ 𝑚×𝑛 be defined by 𝐴𝑖𝑗 = 3𝑖 + 5𝑗 and 𝐵𝑖𝑗 = 𝑖² for all 𝑖, 𝑗. Then we
can calculate (𝐴 + 𝐵)𝑖𝑗 = 3𝑖 + 5𝑗 + 𝑖² for all 𝑖, 𝑗.
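Example 2.2.7. Solve the following matrix equation,

\[ 0 = \begin{bmatrix} x & y \\ z & w \end{bmatrix} + \begin{bmatrix} -1 & -2 \\ -3 & -4 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} x - 1 & y - 2 \\ z - 3 & w - 4 \end{bmatrix} \]
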
The definition of matrix equality means this single matrix equation reduces to 4 scalar equations:
0 = 𝑥 − 1, 0 = 𝑦 − 2, 0 = 𝑧 − 3, 0 = 𝑤 − 4. The solution is 𝑥 = 1, 𝑦 = 2, 𝑧 = 3, 𝑤 = 4.
The definition of matrix multiplication ($(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$) is very nice for general proofs, but
pragmatically I usually think of matrix multiplication in terms of dot-products. It turns out we can
view the matrix product as a collection of dot-products: suppose 𝐴 ∈ ℝ 𝑚×𝑛 and 𝐵 ∈ ℝ 𝑛×𝑝 then

\[
AB = \begin{bmatrix}
row_1(A) \cdot col_1(B) & row_1(A) \cdot col_2(B) & \cdots & row_1(A) \cdot col_p(B) \\
row_2(A) \cdot col_1(B) & row_2(A) \cdot col_2(B) & \cdots & row_2(A) \cdot col_p(B) \\
\vdots & \vdots & \cdots & \vdots \\
row_m(A) \cdot col_1(B) & row_m(A) \cdot col_2(B) & \cdots & row_m(A) \cdot col_p(B)
\end{bmatrix}
\]

Let me explain how this works. The formula above claims (𝐴𝐵)𝑖𝑗 = 𝑟𝑜𝑤𝑖 (𝐴) ⋅ 𝑐𝑜𝑙𝑗 (𝐵) for all 𝑖, 𝑗.
Recall that (𝑟𝑜𝑤𝑖 (𝐴))𝑘 = 𝐴𝑖𝑘 and (𝑐𝑜𝑙𝑗 (𝐵))𝑘 = 𝐵𝑘𝑗 thus

\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = \sum_{k=1}^{n} (row_i(A))_k \, (col_j(B))_k \]

Hence, using definition of the dot-product, (𝐴𝐵)𝑖𝑗 = 𝑟𝑜𝑤𝑖 (𝐴) ⋅ 𝑐𝑜𝑙𝑗 (𝐵). This argument holds for
all 𝑖, 𝑗 therefore the dot-product formula for matrix multiplication is valid.
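
The agreement between the index definition and the row-column dot-product formula can be seen in a few lines of plain Python. In this sketch matrices are lists of rows; the helper names and the two sample matrices are assumptions for illustration.

```python
def row(A, i):
    """row_i(A) with 1-indexed i."""
    return A[i - 1]

def col(B, j):
    """col_j(B) with 1-indexed j."""
    return [r[j - 1] for r in B]

def dot(v, w):
    """Dot-product of two lists of equal length."""
    return sum(a * b for a, b in zip(v, w))

def matmul_index(A, B):
    """(AB)_{ij} = sum over k of A_{ik} B_{kj}."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

def matmul_dot(A, B):
    """(AB)_{ij} = row_i(A) . col_j(B)."""
    m, p = len(A), len(B[0])
    return [[dot(row(A, i), col(B, j)) for j in range(1, p + 1)]
            for i in range(1, m + 1)]

A = [[1, 2, 0], [0, 1, -1]]       # 2 x 3 sample
B = [[1, 0], [2, 1], [3, 4]]      # 3 x 2 sample
print(matmul_index(A, B), matmul_dot(A, B))
```

Both functions compute the same sums; the second merely groups the terms as a row of 𝐴 against a column of 𝐵.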

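Example 2.2.8. The product of a 3 × 2 and 2 × 3 is a 3 × 3

\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix}
[1, 0][4, 7]^{T} & [1, 0][5, 8]^{T} & [1, 0][6, 9]^{T} \\
[0, 1][4, 7]^{T} & [0, 1][5, 8]^{T} & [0, 1][6, 9]^{T} \\
[0, 0][4, 7]^{T} & [0, 0][5, 8]^{T} & [0, 0][6, 9]^{T}
\end{bmatrix}
= \begin{bmatrix} 4 & 5 & 6 \\ 7 & 8 & 9 \\ 0 & 0 & 0 \end{bmatrix}
\]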

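Example 2.2.9. The product of a 3 × 1 and 1 × 3 is a 3 × 3

\[
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\begin{bmatrix} 4 & 5 & 6 \end{bmatrix}
= \begin{bmatrix}
4 \cdot 1 & 5 \cdot 1 & 6 \cdot 1 \\
4 \cdot 2 & 5 \cdot 2 & 6 \cdot 2 \\
4 \cdot 3 & 5 \cdot 3 & 6 \cdot 3
\end{bmatrix}
= \begin{bmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{bmatrix}
\]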
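Example 2.2.10. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$. We calculate

\[
\begin{aligned}
AB &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \\
&= \begin{bmatrix} [1, 2][5, 7]^{T} & [1, 2][6, 8]^{T} \\ [3, 4][5, 7]^{T} & [3, 4][6, 8]^{T} \end{bmatrix} \\
&= \begin{bmatrix} 5 + 14 & 6 + 16 \\ 15 + 28 & 18 + 32 \end{bmatrix} \\
&= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
\end{aligned}
\]

Notice the product of square matrices is square. For numbers 𝑎, 𝑏 ∈ ℝ we know the product of 𝑎
and 𝑏 is commutative (𝑎𝑏 = 𝑏𝑎). Let’s calculate the product of 𝐴 and 𝐵 in the opposite order,

\[
\begin{aligned}
BA &= \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \\
&= \begin{bmatrix} [5, 6][1, 3]^{T} & [5, 6][2, 4]^{T} \\ [7, 8][1, 3]^{T} & [7, 8][2, 4]^{T} \end{bmatrix} \\
&= \begin{bmatrix} 5 + 18 & 10 + 24 \\ 7 + 24 & 14 + 32 \end{bmatrix} \\
&= \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}
\end{aligned}
\]

Clearly 𝐴𝐵 ≠ 𝐵𝐴 thus matrix multiplication is noncommutative or nonabelian.
When we say that matrix multiplication is noncommutative that indicates that the product of two
matrices does not generally commute. However, there are special matrices which commute with
other matrices.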

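Example 2.2.11. Let $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. We calculate

\[ IA = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

Likewise calculate,

\[ AI = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]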

Since the matrix 𝐴 was arbitrary we conclude that 𝐼𝐴 = 𝐴𝐼 for all 𝐴 ∈ ℝ2×2 .
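
For readers who like to confirm such calculations by machine, here is a minimal sketch in plain Python. It recomputes 𝐴𝐵 and 𝐵𝐴 for the matrices of Example 2.2.10 and then checks that the identity, and more generally a scalar multiple of the identity, commutes with 𝐴. The helper names and the scalar 3 are assumptions for illustration.

```python
def matmul(A, B):
    """Matrix product from the index definition."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
I = [[1, 0], [0, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
K = scale(3, I)                   # hypothetical scalar matrix 3I

print(AB, BA, AB == BA)
print(matmul(I, A) == matmul(A, I))
print(matmul(K, A) == matmul(A, K))
```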

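Example 2.2.12. Consider 𝐴, 𝑣, 𝑤 from Example ??. The calculations below use $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, 𝑣 = (5, 7) and 𝑤 = (6, 8).

\[ v + w = \begin{bmatrix} 5 \\ 7 \end{bmatrix} + \begin{bmatrix} 6 \\ 8 \end{bmatrix} = \begin{bmatrix} 11 \\ 15 \end{bmatrix} \]

Using the above we calculate,

\[ A(v + w) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 11 \\ 15 \end{bmatrix} = \begin{bmatrix} 11 + 30 \\ 33 + 60 \end{bmatrix} = \begin{bmatrix} 41 \\ 93 \end{bmatrix}. \]

In contrast, we can add 𝐴𝑣 and 𝐴𝑤,

\[ Av + Aw = \begin{bmatrix} 19 \\ 43 \end{bmatrix} + \begin{bmatrix} 22 \\ 50 \end{bmatrix} = \begin{bmatrix} 41 \\ 93 \end{bmatrix}. \]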

Behold, 𝐴(𝑣 + 𝑤) = 𝐴𝑣 + 𝐴𝑤 for this example. It turns out this is true in general.

I collect all my favorite properties for matrix multiplication in the theorem below. To summarize,
matrix math works as you would expect with the exception that matrix multiplication is not
commutative. We must be careful about the order of letters in matrix expressions.

Theorem 2.2.13.

If 𝐴, 𝐵, 𝐶 ∈ ℝ 𝑚×𝑛 , 𝑋, 𝑌 ∈ ℝ 𝑛×𝑝 , 𝑍∈ℝ 𝑝×𝑞 and 𝑐1 , 𝑐2 ∈ ℝ then

1. (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶),

2. (𝐴𝑋)𝑍 = 𝐴(𝑋𝑍),

3. 𝐴 + 𝐵 = 𝐵 + 𝐴,

4. 𝑐1 (𝐴 + 𝐵) = 𝑐1 𝐴 + 𝑐1 𝐵,

5. (𝑐1 + 𝑐2 )𝐴 = 𝑐1 𝐴 + 𝑐2 𝐴,

6. (𝑐1 𝑐2 )𝐴 = 𝑐1 (𝑐2 𝐴),

7. (𝑐1 𝐴)𝑋 = 𝑐1 (𝐴𝑋) = 𝐴(𝑐1 𝑋) = (𝐴𝑋)𝑐1 ,

8. 1𝐴 = 𝐴,

9. 𝐼𝑚 𝐴 = 𝐴 = 𝐴𝐼𝑛 ,

10. 𝐴(𝑋 + 𝑌 ) = 𝐴𝑋 + 𝐴𝑌 ,

11. 𝐴(𝑐1 𝑋 + 𝑐2 𝑌 ) = 𝑐1 𝐴𝑋 + 𝑐2 𝐴𝑌 ,

12. (𝐴 + 𝐵)𝑋 = 𝐴𝑋 + 𝐵𝑋.

Proof: I will prove a couple of these primarily to give you a chance to test your understanding
of the notation. Nearly all of these properties are proved by breaking the statement down to
components then appealing to a property of real numbers. Just a reminder, we assume that it is
known that ℝ is an ordered field. Multiplication of real numbers is commutative, associative and
distributes across addition of real numbers. Likewise, addition of real numbers is commutative,
associative and obeys familiar distributive laws when combined with multiplication.

Proof of (1.): assume 𝐴, 𝐵, 𝐶 are given as in the statement of the Theorem. Observe that

((𝐴 + 𝐵) + 𝐶)𝑖𝑗 = (𝐴 + 𝐵)𝑖𝑗 + 𝐶𝑖𝑗 defn. of matrix add.


= (𝐴𝑖𝑗 + 𝐵𝑖𝑗 ) + 𝐶𝑖𝑗 defn. of matrix add.
= 𝐴𝑖𝑗 + (𝐵𝑖𝑗 + 𝐶𝑖𝑗 ) assoc. of real numbers
= 𝐴𝑖𝑗 + (𝐵 + 𝐶)𝑖𝑗 defn. of matrix add.
= (𝐴 + (𝐵 + 𝐶))𝑖𝑗 defn. of matrix add.

for all 𝑖, 𝑗. Therefore (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶). □


Proof of (6.): assume 𝑐1 , 𝑐2 , 𝐴 are given as in the statement of the Theorem. Observe that

((𝑐1 𝑐2 )𝐴)𝑖𝑗 = (𝑐1 𝑐2 )𝐴𝑖𝑗 defn. scalar multiplication.


= 𝑐1 (𝑐2 𝐴𝑖𝑗 ) assoc. of real numbers
= (𝑐1 (𝑐2 𝐴))𝑖𝑗 defn. scalar multiplication.

for all 𝑖, 𝑗. Therefore (𝑐1 𝑐2 )𝐴 = 𝑐1 (𝑐2 𝐴). □


Proof of (10.): assume 𝐴, 𝑋, 𝑌 are given as in the statement of the Theorem. Observe that

(𝐴(𝑋 + 𝑌 ))𝑖𝑗 = ∑𝑘 𝐴𝑖𝑘 (𝑋 + 𝑌 )𝑘𝑗 defn. matrix multiplication,
= ∑𝑘 𝐴𝑖𝑘 (𝑋𝑘𝑗 + 𝑌𝑘𝑗 ) defn. matrix addition,
= ∑𝑘 (𝐴𝑖𝑘 𝑋𝑘𝑗 + 𝐴𝑖𝑘 𝑌𝑘𝑗 ) dist. of real numbers,
= ∑𝑘 𝐴𝑖𝑘 𝑋𝑘𝑗 + ∑𝑘 𝐴𝑖𝑘 𝑌𝑘𝑗 prop. of finite sum,
= (𝐴𝑋)𝑖𝑗 + (𝐴𝑌 )𝑖𝑗 defn. matrix multiplication (× 2),
= (𝐴𝑋 + 𝐴𝑌 )𝑖𝑗 defn. matrix addition,

for all 𝑖, 𝑗. Therefore 𝐴(𝑋 + 𝑌 ) = 𝐴𝑋 + 𝐴𝑌 . □

The proofs of the other items are similar: we consider the 𝑖, 𝑗-th component of the identity and then
apply the definition of the appropriate matrix operation. This reduces the problem to
a statement about real numbers so we can use the properties of real numbers at the level of
components. Then we reverse the steps. Since the calculation works for arbitrary 𝑖, 𝑗 it follows that
the matrix equation holds true.
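
A numerical spot-check is no substitute for the component proofs, but it is a useful way to catch a misremembered identity. The sketch below, in plain Python, tests items (2.), (10.), (11.) and (12.) of Theorem 2.2.13 on randomly generated integer matrices. The sizes, the seed, the scalars and the helper names are all assumptions chosen for illustration; integer entries are used so that equality is exact.

```python
import random

def add(A, B):
    """Componentwise matrix addition."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]

def matmul(A, B):
    """Matrix product from the index definition."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def random_matrix(rows, cols, rng):
    """Matrix with integer entries drawn from -5..5."""
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)            # fixed seed so the run is repeatable
m, n, p, q = 2, 3, 4, 2
A, B = random_matrix(m, n, rng), random_matrix(m, n, rng)
X, Y = random_matrix(n, p, rng), random_matrix(n, p, rng)
Z = random_matrix(p, q, rng)
c1, c2 = 3, -2

checks = {
    "(2.)  (AX)Z = A(XZ)": matmul(matmul(A, X), Z) == matmul(A, matmul(X, Z)),
    "(10.) A(X+Y) = AX + AY": matmul(A, add(X, Y)) == add(matmul(A, X), matmul(A, Y)),
    "(11.) A(c1 X + c2 Y) = c1 AX + c2 AY":
        matmul(A, add(scale(c1, X), scale(c2, Y)))
        == add(scale(c1, matmul(A, X)), scale(c2, matmul(A, Y))),
    "(12.) (A+B)X = AX + BX": matmul(add(A, B), X) == add(matmul(A, X), matmul(B, X)),
}
for name, ok in checks.items():
    print(name, ok)
```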

2.3 linear transformations


Definition 2.3.1.
Let 𝑉 and 𝑊 be vector spaces over ℝ. If a mapping 𝐿 : 𝑉 → 𝑊 satisfies

1. 𝐿(𝑥 + 𝑦) = 𝐿(𝑥) + 𝐿(𝑦) for all 𝑥, 𝑦 ∈ 𝑉 ; this is called additivity.

2. 𝐿(𝑐𝑥) = 𝑐𝐿(𝑥) for all 𝑥 ∈ 𝑉 and 𝑐 ∈ ℝ; this is called homogeneity.

then we say 𝐿 is a linear transformation. If 𝑉 = 𝑊 then we may say that 𝐿 is a linear
transformation on 𝑉 .

Proposition 2.3.2.

If 𝐴 ∈ ℝ 𝑚×𝑛 and 𝐿 : ℝ𝑛 → ℝ𝑚 is defined by 𝐿(𝑥) = 𝐴𝑥 for each 𝑥 ∈ ℝ𝑛 then 𝐿 is a linear
transformation.
Proof: Let 𝐴 ∈ ℝ 𝑚×𝑛 and define 𝐿 : ℝ𝑛 → ℝ𝑚 by 𝐿(𝑥) = 𝐴𝑥 for each 𝑥 ∈ ℝ𝑛 . Let 𝑥, 𝑦 ∈ ℝ𝑛 and
𝑐 ∈ ℝ,
𝐿(𝑥 + 𝑦) = 𝐴(𝑥 + 𝑦) = 𝐴𝑥 + 𝐴𝑦 = 𝐿(𝑥) + 𝐿(𝑦)
and
𝐿(𝑐𝑥) = 𝐴(𝑐𝑥) = 𝑐𝐴𝑥 = 𝑐𝐿(𝑥)
thus 𝐿 is a linear transformation. □

Obviously this gives us a nice way to construct examples. The following proposition is really at the
heart of all the geometry in this section.

Proposition 2.3.3.

Let ℒ = {𝑝 + 𝑡𝑣 ∣ 𝑡 ∈ [0, 1], 𝑝, 𝑣 ∈ ℝ𝑛 with 𝑣 ≠ 0} define a line segment from 𝑝 to 𝑝 + 𝑣 in
ℝ𝑛 . If 𝑇 : ℝ𝑛 → ℝ𝑚 is a linear transformation then 𝑇 (ℒ) is either a line-segment from
𝑇 (𝑝) to 𝑇 (𝑝 + 𝑣) or a point.
Proof: suppose 𝑇 and ℒ are as in the proposition. Let 𝑦 ∈ 𝑇 (ℒ) then by definition there exists
𝑥 ∈ ℒ such that 𝑇 (𝑥) = 𝑦. But this implies there exists 𝑡 ∈ [0, 1] such that 𝑥 = 𝑝 + 𝑡𝑣 so
𝑇 (𝑝 + 𝑡𝑣) = 𝑦. Notice that

𝑦 = 𝑇 (𝑝 + 𝑡𝑣) = 𝑇 (𝑝) + 𝑇 (𝑡𝑣) = 𝑇 (𝑝) + 𝑡𝑇 (𝑣).

which implies 𝑦 ∈ {𝑇 (𝑝) + 𝑠𝑇 (𝑣) ∣ 𝑠 ∈ [0, 1]} = ℒ2 . Therefore, 𝑇 (ℒ) ⊆ ℒ2 . Conversely, suppose
𝑧 ∈ ℒ2 then 𝑧 = 𝑇 (𝑝) + 𝑠𝑇 (𝑣) for some 𝑠 ∈ [0, 1] but this yields by linearity of 𝑇 that 𝑧 = 𝑇 (𝑝 + 𝑠𝑣)
hence 𝑧 ∈ 𝑇 (ℒ). Since we have that 𝑇 (ℒ) ⊆ ℒ2 and ℒ2 ⊆ 𝑇 (ℒ) it follows that 𝑇 (ℒ) = ℒ2 . Note
that ℒ2 is a line-segment provided that 𝑇 (𝑣) ∕= 0, however if 𝑇 (𝑣) = 0 then ℒ2 = {𝑇 (𝑝)} and the
proposition follows. □
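
The key identity in the proof, 𝑇 (𝑝 + 𝑡𝑣) = 𝑇 (𝑝) + 𝑡𝑇 (𝑣), can be checked numerically for a matrix transformation. The following is a minimal sketch in plain Python; the matrices, the points 𝑝, 𝑣 and the helper names are assumptions chosen for illustration.

```python
def apply(A, v):
    """Compute T(v) = A v for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

def point_on_segment(p, v, t):
    """The point p + t v."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))

def image_matches(A, p, v, ts=(0.0, 0.25, 0.5, 1.0), tol=1e-12):
    """Check T(p + t v) = T(p) + t T(v) at a few sample values of t."""
    Tp, Tv = apply(A, p), apply(A, v)
    for t in ts:
        lhs = apply(A, point_on_segment(p, v, t))
        rhs = point_on_segment(Tp, Tv, t)
        if any(abs(a - b) > tol for a, b in zip(lhs, rhs)):
            return False
    return True

A = [[1, 2], [3, 4]]              # T(v) is nonzero: image is a segment
collapse = [[1, 1], [1, 1]]       # sends v = (1, -1) to zero
p = (1, 0)

print(image_matches(A, p, (2, -1)))
print(apply(collapse, (1, -1)))   # T(v) = 0, so the image is the point T(p)
```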

2.3.1 a gallery of linear transformations


My choice of mapping the unit square has no particular significance in the examples below. I
merely wanted to keep it simple and draw your eye to the distinction between the examples.
In each example we’ll map the four corners of the square to see where the transformation takes
the unit-square. Those corners are simply (0, 0), (1, 0), (1, 1), (0, 1) as we traverse the square in a
counter-clockwise direction.
Example 2.3.4. Let $A = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ for some 𝑘 > 0. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular
this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} kx \\ ky \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (𝑘, 0), 𝐿(1, 1) = (𝑘, 𝑘), 𝐿(0, 1) = (0, 𝑘). This mapping is called
a dilation.
Example 2.3.5. Let $A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ -y \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (−1, 0), 𝐿(1, 1) = (−1, −1), 𝐿(0, 1) = (0, −1). This mapping is
called an inversion.

Example 2.3.6. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + 2y \\ 3x + 4y \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (1, 3), 𝐿(1, 1) = (3, 7), 𝐿(0, 1) = (2, 4). This mapping shall
remain nameless, it is doubtless a combination of the other named mappings.

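Example 2.3.7. Let $A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this
means,

\[ L(x, y) = A(x, y) = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} x - y \\ x + y \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (1/√2)(1, 1), 𝐿(1, 1) = (1/√2)(0, 2), 𝐿(0, 1) = (1/√2)(−1, 1). This mapping
is a rotation by 𝜋/4 radians.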

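Example 2.3.8. Let $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x - y \\ x + y \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (1, 1), 𝐿(1, 1) = (0, 2), 𝐿(0, 1) = (−1, 1). This mapping is a
rotation by 𝜋/4 radians followed by a dilation by 𝑘 = √2.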

Example 2.3.9. Let $A = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular
this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \cos(\theta) - y \sin(\theta) \\ x \sin(\theta) + y \cos(\theta) \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (cos(𝜃), sin(𝜃)), 𝐿(1, 1) = (cos(𝜃) − sin(𝜃), cos(𝜃) + sin(𝜃)), 𝐿(0, 1) =
(− sin(𝜃), cos(𝜃)). This mapping is a rotation by 𝜃 in the counter-clockwise direction. Of course you
could have derived the matrix 𝐴 from the picture below.

Example 2.3.10. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}. \]

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (1, 0), 𝐿(1, 1) = (1, 1), 𝐿(0, 1) = (0, 1). This mapping is a
rotation by zero radians, or you could say it is a dilation by a factor of 1, ... usually we call this
the identity mapping because the image is identical to the preimage.
Example 2.3.11. Let $A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Define 𝑃1 (𝑣) = 𝐴1 𝑣 for all 𝑣 ∈ ℝ². In particular this
means,

\[ P_1(x, y) = A_1(x, y) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}. \]

We find 𝑃1 (0, 0) = (0, 0), 𝑃1 (1, 0) = (1, 0), 𝑃1 (1, 1) = (1, 0), 𝑃1 (0, 1) = (0, 0). This mapping is a
projection onto the first coordinate.
Let $A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. Define 𝑃2 (𝑣) = 𝐴2 𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ P_2(x, y) = A_2(x, y) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}. \]

We find 𝑃2 (0, 0) = (0, 0), 𝑃2 (1, 0) = (0, 0), 𝑃2 (1, 1) = (0, 1), 𝑃2 (0, 1) = (0, 1). This mapping is
projection onto the second coordinate.
We can picture both of these mappings at once:

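Example 2.3.12. Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Define 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ². In particular this means,

\[ L(x, y) = A(x, y) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ x + y \end{bmatrix}. \]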

We find 𝐿(0, 0) = (0, 0), 𝐿(1, 0) = (1, 1), 𝐿(1, 1) = (2, 2), 𝐿(0, 1) = (1, 1). This mapping is not a
projection, but it does collapse the square to a line-segment.
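
The corner calculations in this gallery are mechanical, so they can be reproduced with a short script. The sketch below uses plain Python; the dictionary of matrices, the choice 𝑘 = 2 for the dilation and the helper names are assumptions for illustration.

```python
import math

def apply(A, v):
    """Compute L(v) = A v for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

# Corners of the unit square, traversed counter-clockwise.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]

c = 1 / math.sqrt(2)
gallery = {
    "dilation (k = 2)": [[2, 0], [0, 2]],
    "inversion": [[-1, 0], [0, -1]],
    "rotation by pi/4": [[c, -c], [c, c]],
    "projection P1": [[1, 0], [0, 0]],
    "collapse": [[1, 1], [1, 1]],
}

images = {name: [apply(A, v) for v in corners] for name, A in gallery.items()}
for name, points in images.items():
    print(name, points)
```

Plotting each list of image points as a closed polygon gives a picture of where the unit square goes under each transformation.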

Remark 2.3.13.
The examples here have focused on linear transformations from ℝ2 to ℝ2 . It turns out that
higher dimensional mappings can largely be understood in terms of the geometric operations
we’ve seen in this section.
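Example 2.3.14. Let $A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$. Define $L(v) = Av$ for all $v \in \mathbb{R}^2$. In particular this means,
\[ L(x,y) = A(x,y) = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ x \\ y \end{bmatrix}. \]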
We find 𝐿(0, 0) = (0, 0, 0), 𝐿(1, 0) = (0, 1, 0), 𝐿(1, 1) = (0, 1, 1), 𝐿(0, 1) = (0, 0, 1). This mapping
moves the 𝑥𝑦-plane to the 𝑦𝑧-plane. In particular, the horizontal unit square gets mapped to vertical
unit square; 𝐿([0, 1] × [0, 1]) = {0} × [0, 1] × [0, 1]. This mapping certainly is not surjective because
no point with 𝑥 ≠ 0 is covered in the range.

Example 2.3.15. Let $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$. Define $L(v) = Av$ for all $v \in \mathbb{R}^3$. In particular this means,
\[ L(x,y,z) = A(x,y,z) = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x+y \\ x+y+z \end{bmatrix}. \]

Let’s study how $L$ maps the unit cube. We have $2^3 = 8$ corners on the unit cube,

𝐿(0, 0, 0) = (0, 0), 𝐿(1, 0, 0) = (1, 1), 𝐿(1, 1, 0) = (2, 2), 𝐿(0, 1, 0) = (1, 1)

𝐿(0, 0, 1) = (0, 1), 𝐿(1, 0, 1) = (1, 2), 𝐿(1, 1, 1) = (2, 3), 𝐿(0, 1, 1) = (1, 2).

This mapping squished the unit cube to a shape in the plane which contains the points (0, 0), (0, 1),
(1, 1), (1, 2), (2, 2), (2, 3). Face by face analysis of the mapping reveals the image is a parallelogram.
This mapping is certainly not injective since two different points get mapped to the same point. In
particular, I have color-coded the mapping of top and base faces as they map to line segments. The
vertical faces map to one of the two parallelograms that comprise the image.
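The corner calculation above is easy to reproduce by machine. Here is a small sketch, assuming matrices are stored as lists of rows; the helper `apply` is a hypothetical name of mine. It lists the images of the eight corners and counts how many distinct points appear, which makes the failure of injectivity concrete.

```python
from itertools import product

# The matrix of Example 2.3.15, stored as a list of rows.
A = [[1, 1, 0],
     [1, 1, 1]]

def apply(matrix, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

corners = list(product([0, 1], repeat=3))   # the 8 corners of the unit cube
images = {c: apply(A, c) for c in corners}
distinct = set(images.values())

for c in corners:
    print(c, "->", images[c])
print(len(corners), "corners map to", len(distinct), "distinct points")
```

Two corners sharing an image, for instance (1, 0, 0) and (0, 1, 0), is all that is needed to conclude the map is not injective.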

I have used terms like "vertical" or "horizontal" in the standard manner we associate such terms with three dimensional geometry. Visualization and terminology for higher-dimensional examples are not as obvious. However, with a little imagination we can still draw pictures to capture important aspects of mappings.
Example 2.3.16. Let $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$. Define $L(v) = Av$ for all $v \in \mathbb{R}^4$. In particular this means,
\[ L(x,y,z,t) = A(x,y,z,t) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} x \\ x \end{bmatrix}. \]

Let’s study how $L$ maps the unit hypercube $[0,1]^4 \subset \mathbb{R}^4$. We have $2^4 = 16$ corners on the unit hypercube; note $L(1,a,b,c) = (1,1)$ whereas $L(0,a,b,c) = (0,0)$ for all $a,b,c \in [0,1]$. Therefore, the unit hypercube is squished to a line-segment from $(0,0)$ to $(1,1)$. This mapping is neither surjective nor injective. In the picture below the vertical axis represents the $y,z,t$-directions.


Example 2.3.17. Suppose $f(t,s) = (\sqrt{t},\, s^2 + t)$; note that $f(1,1) = (1,2)$ and $f(4,4) = (2,20)$. Note that $(4,4) = 4(1,1)$, thus if $f$ were linear we should see $f(4,4) = f(4(1,1)) = 4f(1,1) = (4,8)$, but that fails to be true, so $f$ is not a linear transformation.
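A quick numerical version of the homogeneity test used above. This is a sketch under the assumption that the first component of 𝑓 is the square root of 𝑡, which is what the stated values 𝑓(1, 1) = (1, 2) and 𝑓(4, 4) = (2, 20) require; the helper `scale` is a hypothetical name.

```python
import math

def f(t, s):
    """The map of Example 2.3.17, assuming f(t, s) = (sqrt(t), s^2 + t)."""
    return (math.sqrt(t), s**2 + t)

def scale(c, v):
    """Scalar multiple c*v of a tuple v."""
    return tuple(c * x for x in v)

lhs = f(*scale(4, (1, 1)))   # f(4(1,1)): scale first, then apply f
rhs = scale(4, f(1, 1))      # 4 f(1,1): apply f first, then scale

print("f(4(1,1)) =", lhs)
print("4 f(1,1)  =", rhs)
print("homogeneous at this point:", lhs == rhs)
```

A single failure like this one is enough to rule out linearity; passing the test at one point would of course prove nothing.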

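Example 2.3.18. Let $L(x,y) = x^2 + y^2$ define a mapping from $\mathbb{R}^2$ to $\mathbb{R}$. This is not a linear transformation since
\[ L(c(x,y)) = L(cx,cy) = (cx)^2 + (cy)^2 = c^2(x^2 + y^2) = c^2 L(x,y), \]
which differs from $cL(x,y)$ whenever $c \neq 0, 1$ and $(x,y) \neq (0,0)$.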

We say 𝐿 is a nonlinear transformation.

Example 2.3.19. Suppose 𝐿 : ℝ → ℝ is defined by 𝐿(𝑥) = 𝑚𝑥 + 𝑏 for some constants 𝑚, 𝑏 ∈ ℝ. Is this a linear transformation on ℝ? Observe:

𝐿(0) = 𝑚(0) + 𝑏 = 𝑏

whereas every linear transformation maps zero to zero. Thus 𝐿 is not a linear transformation if 𝑏 ≠ 0. On the other hand, if 𝑏 = 0 then 𝐿 is a linear transformation.

A mapping on $\mathbb{R}^n$ which has the form $T(x) = x + b$ is called a translation. If we have a mapping of the form $F(x) = Ax + b$ for some $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ then we say $F$ is an affine transformation on $\mathbb{R}^n$. Technically, in general, the line $y = mx + b$ is the graph of an affine function on $\mathbb{R}$. I invite the reader to prove that affine transformations also map line-segments to line-segments (or points).

2.3.2 standard matrices


Definition 2.3.20.

Let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. The matrix $A \in \mathbb{R}^{m \times n}$ such that $L(x) = Ax$ for all $x \in \mathbb{R}^n$ is called the standard matrix of $L$. We denote this by $[L] = A$ or, more compactly, $[L_A] = A$; we say that $L_A$ is the linear transformation induced by $A$. Moreover, the components of the matrix $A$ are found from $A_{ji} = (L(e_i))_j$.

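Example 2.3.21. Given that $L([x,y,z]^T) = [x+2y,\ 3y+4z,\ 5x+6z]^T$ for $[x,y,z]^T \in \mathbb{R}^3$, find the standard matrix of $L$. We wish to find a $3 \times 3$ matrix $A$ such that $L(v) = Av$ for all $v = [x,y,z]^T \in \mathbb{R}^3$. Write $L(v)$ then collect terms with each coordinate in the domain,
\[ L\left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} \right) = \begin{bmatrix} x + 2y \\ 3y + 4z \\ 5x + 6z \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \\ 5 \end{bmatrix} + y \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} + z \begin{bmatrix} 0 \\ 4 \\ 6 \end{bmatrix}. \]
It’s not hard to see that,
\[ L\left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} \right) = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 4 \\ 5 & 0 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad \Rightarrow \quad A = [L] = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 4 \\ 5 & 0 & 6 \end{bmatrix}. \]
Notice that the columns in $A$ are just as you’d expect from the proof of theorem ??: $[L] = [L(e_1) | L(e_2) | L(e_3)]$. In future examples I will exploit this observation to save writing.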

Example 2.3.22. Suppose that $L((t,x,y,z)) = (t+x+y+z,\ z-x,\ 0,\ 3t-z)$, find $[L]$.
\[ \begin{array}{l} L(e_1) = L((1,0,0,0)) = (1,0,0,3) \\ L(e_2) = L((0,1,0,0)) = (1,-1,0,0) \\ L(e_3) = L((0,0,1,0)) = (1,0,0,0) \\ L(e_4) = L((0,0,0,1)) = (1,1,0,-1) \end{array} \quad \Rightarrow \quad [L] = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 3 & 0 & 0 & -1 \end{bmatrix}. \]

I invite the reader to check my answer here and see that 𝐿(𝑣) = [𝐿]𝑣 for all 𝑣 ∈ ℝ4 as claimed.
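Taking up the invitation to check the answer, here is a hedged sketch that builds the standard matrix column by column from the images of the standard basis vectors. The helper names `standard_matrix` and `apply`, and the test vector, are my own illustrative choices.

```python
def L(v):
    """The map of Example 2.3.22 acting on v = (t, x, y, z)."""
    t, x, y, z = v
    return (t + x + y + z, z - x, 0, 3 * t - z)

def standard_matrix(linear_map, n):
    """Return [L] as a list of rows; column i is linear_map(e_i)."""
    basis = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    columns = [linear_map(e) for e in basis]
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[i][j] for i in range(n)] for j in range(m)]

def apply(matrix, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

M = standard_matrix(L, 4)
v = (2, -1, 5, 3)   # an arbitrary test vector
for row in M:
    print(row)
print("L(v) == [L]v:", apply(M, v) == L(v))
```

The check at one vector is only a sanity test; the general statement follows from linearity, since every vector is a combination of the standard basis.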

Proposition 2.3.23.

Suppose $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^n \to \mathbb{R}^m$ are linear transformations and $c \in \mathbb{R}$; then $S + T$ and $cT$ are linear transformations and

(1.) $[T + S] = [T] + [S]$, (2.) $[T - S] = [T] - [S]$, (3.) $[cT] = c[T]$.

In words, the standard matrix of the sum, difference or scalar multiple of linear transformations is the sum, difference or scalar multiple of the standard matrices of the respective linear transformations.

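Example 2.3.24. Suppose $T(x,y) = (x+y,\ x-y)$ and $S(x,y) = (2x,\ 3y)$. It’s easy to see that
\[ [T] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \ \text{and} \ [S] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \quad \Rightarrow \quad [T + S] = [T] + [S] = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}. \]
Therefore,
\[ (T + S)(x,y) = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x + y \\ x + 2y \end{bmatrix} = (3x + y,\ x + 2y). \]
Naturally this is the same formula that we would obtain through direct addition of the formulas of $T$ and $S$.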

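Proposition 2.3.25.

If $L_1 : \mathbb{R}^m \to \mathbb{R}^n$ and $L_2 : \mathbb{R}^n \to \mathbb{R}^p$ are linear transformations then $L_2 \circ L_1 : \mathbb{R}^m \to \mathbb{R}^p$ is a linear transformation with matrix $[L_2 \circ L_1]$ such that
\[ [L_2 \circ L_1]_{ij} = \sum_{k=1}^{n} [L_2]_{ik} [L_1]_{kj} \]
for all $i = 1, 2, \dots, p$ and $j = 1, 2, \dots, m$.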

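Example 2.3.26. Let $T : \mathbb{R}^{2 \times 1} \to \mathbb{R}^{2 \times 1}$ be defined by
\[ T([x,y]^T) = [x + y,\ 2x - y]^T \]
for all $[x,y]^T \in \mathbb{R}^{2 \times 1}$. Also let $S : \mathbb{R}^{2 \times 1} \to \mathbb{R}^{3 \times 1}$ be defined by
\[ S([x,y]^T) = [x,\ x,\ 3x + 4y]^T \]
for all $[x,y]^T \in \mathbb{R}^{2 \times 1}$. We calculate the composite as follows:
\[ \begin{aligned} (S \circ T)([x,y]^T) &= S(T([x,y]^T)) \\ &= S([x + y,\ 2x - y]^T) \\ &= [x + y,\ x + y,\ 3(x + y) + 4(2x - y)]^T \\ &= [x + y,\ x + y,\ 11x - y]^T. \end{aligned} \]
Notice we can write the formula above as a matrix multiplication,
\[ (S \circ T)([x,y]^T) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 11 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad \Rightarrow \quad [S \circ T] = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 11 & -1 \end{bmatrix}. \]
Notice that the standard matrices of $S$ and $T$ are:
\[ [S] = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 3 & 4 \end{bmatrix} \qquad [T] = \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}. \]
It’s easy to see that $[S \circ T] = [S][T]$ (as we should expect since these are linear operators).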

Notice that 𝑇 ∘ 𝑆 is not even defined since the dimensions of the codomain of 𝑆 do not match
the domain of 𝑇 . Likewise, the matrix product [𝑇 ][𝑆] is not defined since there is a dimension
mismatch; (2 × 2)(3 × 2) is not a well-defined product of matrices.
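Both observations, that the composite has matrix [𝑆][𝑇] and that the product in the other order is undefined, can be checked with a few lines of Python. This is only a sketch: `matmul` is a hypothetical helper written for this illustration and matrices are plain lists of rows.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows; needs cols(A) == rows(B)."""
    if len(A[0]) != len(B):
        raise ValueError("dimension mismatch: (%dx%d)(%dx%d)"
                         % (len(A), len(A[0]), len(B), len(B[0])))
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

S = [[1, 0], [1, 0], [3, 4]]   # [S], a 3x2 matrix
T = [[1, 1], [2, -1]]          # [T], a 2x2 matrix

ST = matmul(S, T)              # matrix of the composite S o T
print("[S][T] =", ST)

try:
    matmul(T, S)               # (2x2)(3x2): should be rejected
    ts_defined = True
except ValueError as err:
    ts_defined = False
    print("[T][S] is not defined:", err)
```

The entry formula inside `matmul` is exactly the sum over 𝑘 from Proposition 2.3.25.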

2.3.3 coordinates and isomorphism


Let 𝑉 be a finite dimensional vector space with basis 𝛽 = {𝑣1 , 𝑣2 , . . . 𝑣𝑛 }. The coordinate map
Φ𝛽 : 𝑉 → ℝ𝑛 is defined by

Φ𝛽 (𝑥1 𝑣1 + 𝑥2 𝑣2 + ⋅ ⋅ ⋅ + 𝑥𝑛 𝑣𝑛 ) = 𝑥1 𝑒1 + 𝑥2 𝑒2 + ⋅ ⋅ ⋅ + 𝑥𝑛 𝑒𝑛

for all $v = x_1 v_1 + x_2 v_2 + \cdots + x_n v_n \in V$. Sometimes we have to adjust the numbering a bit for double-indices. For example:
Example 2.3.27. Let $\Phi : \mathbb{R}^{m \times n} \to \mathbb{R}^{mn}$ be defined by
\[ \Phi\Big( \sum_{i,j} A_{ij} E_{ij} \Big) = (A_{11}, \dots, A_{1n}, A_{21}, \dots, A_{2n}, \dots, A_{m1}, \dots, A_{mn}). \]

This map simply takes the entries in the matrix and strings them out to a vector of length 𝑚𝑛.
Example 2.3.28. Let Ψ : ℂ → ℝ2 be defined by Ψ(𝑥 + 𝑖𝑦) = (𝑥, 𝑦). This is the coordinate map for
the basis {1, 𝑖}.
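Matrix multiplication is for vectors in $\mathbb{R}^n$. Direct matrix multiplication of an abstract vector makes no sense (how would you multiply a polynomial and a matrix?); however, we can use the coordinate map to change the abstract vector to a vector in $\mathbb{R}^n$ and then multiply. The diagram below illustrates the idea for a linear transformation $T$ from an abstract vector space $V$ with basis $\beta$ (where $\dim V = n$) to another abstract vector space $W$ with basis $\bar{\beta}$ (where $\dim W = m$):
\[ \begin{array}{ccc} V & \xrightarrow{\quad T \quad} & W \\ \Phi_\beta^{-1} \big\uparrow & & \big\downarrow \Phi_{\bar{\beta}} \\ \mathbb{R}^n & \xrightarrow{\ L_{[T]_{\beta,\bar{\beta}}}\ } & \mathbb{R}^m \end{array} \]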

Let’s walk through the formula $[T]_{\beta,\bar{\beta}}\, x = \Phi_{\bar{\beta}}(T(\Phi_\beta^{-1}(x)))$: we begin on the RHS with a column vector $x$, then $\Phi_\beta^{-1}$ lifts the column vector up to the abstract vector $\Phi_\beta^{-1}(x)$ in $V$. Next we operate by $T$ which moves us over to the vector $T(\Phi_\beta^{-1}(x))$ which is in $W$. Finally the coordinate map $\Phi_{\bar{\beta}}$ pushes the abstract vector in $W$ back to a column vector $\Phi_{\bar{\beta}}(T(\Phi_\beta^{-1}(x)))$ which is in $\mathbb{R}^{m \times 1}$. The same journey is accomplished by just multiplying $x$ by the $m \times n$ matrix $[T]_{\beta,\bar{\beta}}$.


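Example 2.3.29. Let $\beta = \{1, x, x^2\}$ be the basis for $P_2$ and consider the derivative mapping $D : P_2 \to P_2$. Find the matrix of $D$ assuming that $P_2$ has coordinates with respect to $\beta$ on both copies of $P_2$. Define and observe
\[ \Phi_\beta(x^n) = e_{n+1} \ \text{for} \ n = 0, 1, 2 \qquad \text{whereas} \qquad \Phi_\beta^{-1}(e_n) = x^{n-1} \ \text{for} \ n = 1, 2, 3. \]
Recall $D(ax^2 + bx + c) = 2ax + b$.
\[ \begin{aligned} col_1([D]_{\beta,\beta}) &= \Phi_\beta(D(\Phi_\beta^{-1}(e_1))) = \Phi_\beta(D(1)) = \Phi_\beta(0) = 0 \\ col_2([D]_{\beta,\beta}) &= \Phi_\beta(D(\Phi_\beta^{-1}(e_2))) = \Phi_\beta(D(x)) = \Phi_\beta(1) = e_1 \\ col_3([D]_{\beta,\beta}) &= \Phi_\beta(D(\Phi_\beta^{-1}(e_3))) = \Phi_\beta(D(x^2)) = \Phi_\beta(2x) = 2e_2 \end{aligned} \]
Therefore we find,
\[ [D]_{\beta,\beta} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}. \]
Calculate $D^3$. Is this surprising?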
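If you would like to see the matrix of the derivative in action before doing the suggested calculation by hand, the following sketch may help. It assumes coordinates are ordered as (𝑐, 𝑏, 𝑎) for the polynomial 𝑐 + 𝑏𝑥 + 𝑎𝑥², matching the basis {1, 𝑥, 𝑥²}; the helpers `apply` and `matmul` and the sample polynomial are my own choices.

```python
# Matrix of the derivative on P2 with respect to the basis {1, x, x^2}.
D = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]

def apply(matrix, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

def matmul(A, B):
    """Product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p = (5, 3, 4)                    # coordinates of 5 + 3x + 4x^2
dp = apply(D, p)                 # coordinates of its derivative
D3 = matmul(matmul(D, D), D)     # matrix of the third derivative

print("coordinates of p':", dp)
print("D^3 =", D3)
```

The output for `dp` should be read back through the coordinate map as a polynomial, and `D3` answers the question posed at the end of the example.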
A one-one correspondence is a map which is 1-1 and onto. If we can find such a mapping between two sets then it shows those sets have the same cardinality. Cardinality is a crude idea of size; it turns out that all nonzero finite dimensional vector spaces over ℝ have the same cardinality. On the other hand, not all vector spaces have the same dimension. Isomorphisms help us discern if two vector spaces have the same dimension.
Definition 2.3.30.
Let 𝑉, 𝑊 be vector spaces then Φ : 𝑉 → 𝑊 is an isomorphism if it is a 1-1 and onto
mapping which is also a linear transformation. If there is an isomorphism between vector
spaces 𝑉 and 𝑊 then we say those vector spaces are isomorphic and we denote this by
𝑉 ≅ 𝑊.
Other authors sometimes denote isomorphism by equality. But, I’ll avoid that custom as I am
reserving = to denote set equality. Details of the first two examples below can be found in my
linear algebra notes.
Example 2.3.31. Let $V = \mathbb{R}^3$ and $W = P_2$. Define a mapping $\Phi : P_2 \to \mathbb{R}^3$ by
\[ \Phi(ax^2 + bx + c) = (a, b, c) \]
for all $ax^2 + bx + c \in P_2$. As vector spaces, $\mathbb{R}^3$ and polynomials of up to quadratic order are the same.
Example 2.3.32. Let $S_2$ be the set of $2 \times 2$ symmetric matrices. Let $\Psi : \mathbb{R}^3 \to S_2$ be defined by
\[ \Psi(x, y, z) = \begin{bmatrix} x & y \\ y & z \end{bmatrix}. \]

Example 2.3.33. Let 𝐿(ℝ𝑛 , ℝ𝑚 ) denote the set of all linear transformations from ℝ𝑛 to ℝ𝑚 .
𝐿(ℝ𝑛 , ℝ𝑚 ) forms a vector space under function addition and scalar multiplication. There is a
natural isomorphism to 𝑚 × 𝑛 matrices. Define Ψ : 𝐿(ℝ𝑛 , ℝ𝑚 ) → ℝ 𝑚×𝑛 by Ψ(𝑇 ) = [𝑇 ] for all
linear transformations 𝑇 ∈ 𝐿(ℝ𝑛 , ℝ𝑚 ). In other words, linear transformations and matrices are
the same as vector spaces.
2.4. NORMED VECTOR SPACES 61

The quantification of "same" is a large theme in modern mathematics. In fact, the term isomorphism as we use it here is more accurately phrased vector space isomorphism. There are other kinds of isomorphisms which preserve other interesting structures, like Group, Ring or Lie Algebra isomorphism. But, I think we’ve said more than enough for this course.

2.4 normed vector spaces


Definition 2.4.1.
Suppose 𝑉 is a vector space. If ∣∣ ⋅ ∣∣ : 𝑉 → ℝ is a function such that for all 𝑥, 𝑦 ∈ 𝑉
and 𝑐 ∈ ℝ:

1. ∣∣𝑐𝑥∣∣ = ∣𝑐∣ ∣∣𝑥∣∣

2. ∣∣𝑥 + 𝑦∣∣ ≤ ∣∣𝑥∣∣ + ∣∣𝑦∣∣ (triangle inequality)

3. ∣∣𝑥∣∣ ≥ 0

4. ∣∣𝑥∣∣ = 0 iff 𝑥 = 0

then we say (𝑉, ∣∣ ⋅ ∣∣) is a normed vector space. When there is no danger of ambiguity we
also say that 𝑉 is a normed vector space.
The norms below are basically thieved from the usual Euclidean norm on ℝ𝑛 .

Example 2.4.2. $\mathbb{R}^n$ can be given the Euclidean norm which is defined by $||x|| = \sqrt{x \cdot x}$ for each $x \in \mathbb{R}^n$.

Example 2.4.3. ℝ𝑛 can also be given the 1-norm which is defined by ∣∣𝑥∣∣1 = ∣𝑥1 ∣ + ∣𝑥2 ∣ + ⋅ ⋅ ⋅ + ∣𝑥𝑛 ∣
for each 𝑥 ∈ ℝ𝑛 .

We use the Euclidean norm by default.

Example 2.4.4. Consider $\mathbb{C}$ as a two dimensional real vector space. Let $a + ib \in \mathbb{C}$ and define $||a + ib|| = \sqrt{a^2 + b^2}$. This is a norm for $\mathbb{C}$.

Example 2.4.5. Let $A \in \mathbb{R}^{m \times n}$. For each $A = [A_{ij}]$ we define
\[ ||A|| = \sqrt{A_{11}^2 + A_{12}^2 + \cdots + A_{mn}^2} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}. \]

This is the Frobenius norm for matrices.
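The norms in the examples above are simple to compute. The sketch below is only an illustration: the function names and the sample vectors are my own, and the check of the triangle inequality on one pair of vectors is a spot check, not a proof.

```python
import math

def euclidean(x):
    """Euclidean norm sqrt(x . x) of a tuple x."""
    return math.sqrt(sum(t * t for t in x))

def one_norm(x):
    """1-norm |x_1| + ... + |x_n| of a tuple x."""
    return sum(abs(t) for t in x)

def frobenius(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(a * a for row in A for a in row))

x, y = (3.0, -4.0), (1.0, 2.0)
s = tuple(a + b for a, b in zip(x, y))   # the sum x + y

# Spot check the triangle inequality ||x + y|| <= ||x|| + ||y|| for both norms.
for norm in (euclidean, one_norm):
    print(norm.__name__, norm(x), norm(s) <= norm(x) + norm(y))

print("frobenius:", frobenius([[1, 2], [2, 4]]))
```

Notice the Frobenius norm is just the Euclidean norm of the entries strung out into a single vector, in the spirit of Example 2.3.27.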

Each of the norms above allows us to define a distance function and hence open sets and limits for
functions. An open ball in (𝑉, ∣∣ ⋅ ∣∣𝑉 ) is defined

𝐵𝜖 (𝑥𝑜 ) = {𝑦 ∈ 𝑉 ∣ ∣∣𝑦 − 𝑥𝑜 ∣∣𝑉 < 𝜖}.


We define the deleted open ball by removing the center from the open ball: $B_\epsilon(x_o)^o = B_\epsilon(x_o) - \{x_o\} = \{ y \in V \mid 0 < ||y - x_o||_V < \epsilon \}$. We say $x_o$ is a limit point of a function $f$ iff there exists a deleted open ball centered at $x_o$ which is contained in $dom(f)$. We say $U \subseteq V$ is an open set iff for each $u \in U$ there exists an open ball $B_\epsilon(u) \subseteq U$. Limits are also defined in the same way as in $\mathbb{R}^n$: if $f : V \to W$ is a function from normed vector space $(V, ||\cdot||_V)$ to normed vector space $(W, ||\cdot||_W)$ then we say $\lim_{x \to x_o} f(x) = L$ iff for each $\epsilon > 0$ there exists $\delta > 0$ such that for all $x \in V$ subject to $0 < ||x - x_o||_V < \delta$ it follows $||f(x) - L||_W < \epsilon$. If $\lim_{x \to x_o} f(x) = f(x_o)$ then we say that $f$ is a continuous function at $x_o$.

Let $(V, ||\cdot||_V)$ be a normed vector space; a function from $\mathbb{N}$ to $V$ is called a sequence. Suppose $\{a_n\}$ is a sequence; then we say $\lim_{n \to \infty} a_n = L \in V$ iff for each $\epsilon > 0$ there exists $M \in \mathbb{N}$ such that $||a_n - L||_V < \epsilon$ for all $n \in \mathbb{N}$ with $n > M$. If $\lim_{n \to \infty} a_n = L \in V$ then we say $\{a_n\}$ is a convergent sequence. We say $\{a_n\}$ is a Cauchy sequence iff for each $\epsilon > 0$ there exists $M \in \mathbb{N}$ such that $||a_m - a_n||_V < \epsilon$ for all $m, n \in \mathbb{N}$ with $m, n > M$. In other words, a sequence is Cauchy if the terms in the sequence get arbitrarily close as we go sufficiently far out in the list. Many concepts we cover in calculus II are made clear with proofs built around the concept of a Cauchy sequence. The interesting thing about Cauchy sequences is that for some spaces of numbers we can have a sequence which is Cauchy but does not converge within the space. For example, if you think about the rational numbers $\mathbb{Q}$ we can construct a sequence of truncated decimal expansions of $\pi$:

\[ \{a_n\} = \{3,\ 3.1,\ 3.14,\ 3.141,\ 3.1415, \dots \} \]

note that $a_n \in \mathbb{Q}$ for all $n \in \mathbb{N}$ and yet $a_n \to \pi \notin \mathbb{Q}$. When spaces are missing the limits of their Cauchy sequences they are in some sense incomplete. For this reason a metric space in which every Cauchy sequence converges to a point of the space is known as a complete space. Moreover, a normed vector space which is complete is known as a Banach space. Fortunately all the main examples of this course are built on the real numbers, which are complete; this induces completeness for $\mathbb{C}$, $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$. I may guide you through the proof that $\mathbb{R}$, $\mathbb{C}$, $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$ are Banach spaces in a homework. When you take real analysis you’ll spend some time thinking through the Cauchy concept.
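The truncated decimal expansions of 𝜋 can be handled exactly with rational arithmetic, which keeps us honestly inside ℚ. This is a sketch: the digit string `PI_DIGITS` is typed in by hand, and the function name `a` and the number of terms examined are my own choices.

```python
from fractions import Fraction

PI_DIGITS = "31415926535897932384"   # leading decimal digits of pi

def a(n):
    """n-th truncation of pi as an exact rational: 3, 3.1, 3.14, ..."""
    return Fraction(int(PI_DIGITS[:n + 1]), 10**n)

terms = [a(n) for n in range(10)]

# For m > n the truncations differ by less than 10**(-n): the Cauchy estimate.
gaps = [abs(terms[m] - terms[n]) < Fraction(1, 10**n)
        for n in range(10) for m in range(n + 1, 10)]

print([str(t) for t in terms[:4]])
print("Cauchy estimate holds for the sampled terms:", all(gaps))
```

Every term printed is a ratio of integers, so the sequence lives in ℚ, yet the estimate shows the terms bunch together the way a convergent sequence would.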

Proposition 1.4.23 was given for the specific case of functions whose range is in ℝ. We might be able to mimic the proof of that proposition for the case of normed spaces. We do have a composition of limits theorem and I bet the sum function is continuous on a normed space. Moreover, if the range happens to be a Banach algebra (see footnote 6) then I would wager the product function is continuous. Put these together and we get the normed vector space version of Prop. 1.4.23. That said, a direct proof works nicely here so I’ll just forego the more clever route.

Footnote 6: if $W$ is a Banach space that also has a product $m : W \times W \to W$ such that $||w_1 w_2|| \leq ||w_1||\, ||w_2||$ then $W$ is a Banach algebra.

Proposition 2.4.6.

Let $V, W$ be normed vector spaces. Let $a$ be a limit point of mappings $f, g : U \subseteq V \to W$ and suppose $c \in \mathbb{R}$. If $\lim_{x \to a} f(x) = b_1 \in W$ and $\lim_{x \to a} g(x) = b_2 \in W$ then

1. lim𝑥→𝑎 (𝑓 (𝑥) + 𝑔(𝑥)) = lim𝑥→𝑎 𝑓 (𝑥) + lim𝑥→𝑎 𝑔(𝑥).

2. lim𝑥→𝑎 (𝑐𝑓 (𝑥)) = 𝑐 lim𝑥→𝑎 𝑓 (𝑥).

Moreover, if 𝑓, 𝑔 are continuous then 𝑓 + 𝑔 and 𝑐𝑓 are continuous.


Proof: Let $\epsilon > 0$ and suppose $\lim_{x \to a} f(x) = b_1 \in W$ and $\lim_{x \to a} g(x) = b_2 \in W$. Choose $\delta_1, \delta_2 > 0$ such that $0 < ||x - a|| < \delta_1$ implies $||f(x) - b_1|| < \epsilon/2$ and $0 < ||x - a|| < \delta_2$ implies $||g(x) - b_2|| < \epsilon/2$. Choose $\delta = min(\delta_1, \delta_2)$ and suppose $0 < ||x - a|| < \delta \leq \delta_1, \delta_2$, hence
\[ ||(f + g)(x) - (b_1 + b_2)|| = ||f(x) - b_1 + g(x) - b_2|| \leq ||f(x) - b_1|| + ||g(x) - b_2|| < \epsilon/2 + \epsilon/2 = \epsilon. \]
Item (1.) follows. To prove (2.) note that if $c = 0$ the result is clearly true so suppose $c \neq 0$. Suppose $\epsilon > 0$ and choose $\delta > 0$ such that $0 < ||x - a|| < \delta$ implies $||f(x) - b_1|| < \epsilon/|c|$. Note that if $0 < ||x - a|| < \delta$ then
\[ ||(cf)(x) - cb_1|| = ||c(f(x) - b_1)|| = |c|\, ||f(x) - b_1|| < |c|\, \epsilon/|c| = \epsilon. \]
The claims about continuity follow immediately from the limit properties and that completes the proof. □

Perhaps you recognize these arguments from calculus I. The logic used to prove the basic limit
theorems on ℝ is essentially identical.

Proposition 2.4.7.

Suppose $V_1, V_2, V_3$ are normed vector spaces with norms $||\cdot||_1, ||\cdot||_2, ||\cdot||_3$ respectively. Let $f : dom(f) \subseteq V_2 \to V_3$ and $g : dom(g) \subseteq V_1 \to V_2$ be mappings. Suppose that $\lim_{x \to x_o} g(x) = y_o$ and suppose that $f$ is continuous at $y_o$ then
\[ \lim_{x \to x_o} (f \circ g)(x) = f\Big( \lim_{x \to x_o} g(x) \Big). \]

Proof: Let $\epsilon > 0$ and choose $\beta > 0$ such that $||y - y_o||_2 < \beta$ implies $||f(y) - f(y_o)||_3 < \epsilon$. We can choose such a $\beta$ since $f$ is continuous at $y_o$, thus $\lim_{y \to y_o} f(y) = f(y_o)$ (and the implication is trivial when $y = y_o$). Next choose $\delta > 0$ such that $0 < ||x - x_o||_1 < \delta$ implies $||g(x) - y_o||_2 < \beta$. We can choose such a $\delta$ because we are given that $\lim_{x \to x_o} g(x) = y_o$. Suppose $0 < ||x - x_o||_1 < \delta$ and let $y = g(x)$; note $||g(x) - y_o||_2 < \beta$ yields $||y - y_o||_2 < \beta$ and consequently $||f(y) - f(y_o)||_3 < \epsilon$. Therefore, $0 < ||x - x_o||_1 < \delta$ implies $||f(g(x)) - f(y_o)||_3 < \epsilon$. It follows that $\lim_{x \to x_o} f(g(x)) = f(\lim_{x \to x_o} g(x))$. □

The squeeze theorem relies heavily on the order properties of ℝ. Generally a normed vector space has no natural ordering. For example, is 1 > 𝑖 or is 1 < 𝑖 in ℂ? That said, we can state a squeeze theorem for real-valued functions whose domains reside in a normed vector space. This is a generalization of what we learned in calculus I. That said, the proof offered below is very similar to the typical proof which is not given in calculus I (see footnote 7).

Proposition 2.4.8. squeeze theorem.

Suppose $f : dom(f) \subseteq V \to \mathbb{R}$, $g : dom(g) \subseteq V \to \mathbb{R}$, $h : dom(h) \subseteq V \to \mathbb{R}$ where $V$ is a normed vector space with norm $||\cdot||$. Let $f(x) \leq g(x) \leq h(x)$ for all $x$ on some deleted $\delta > 0$ ball of $a \in V$. If the limits at $a$ exist, then they follow the same ordering,
\[ \lim_{x \to a} f(x) \leq \lim_{x \to a} g(x) \leq \lim_{x \to a} h(x). \]
Moreover, if $\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L \in \mathbb{R}$ then $\lim_{x \to a} g(x) = L$.


Proof: Suppose $f(x) \leq g(x)$ for all $x \in B_{\delta_1}(a)^o$ for some $\delta_1 > 0$ and also suppose $\lim_{x \to a} f(x) = L_f \in \mathbb{R}$ and $\lim_{x \to a} g(x) = L_g \in \mathbb{R}$. We wish to prove that $L_f \leq L_g$. Suppose otherwise towards a contradiction. That is, suppose $L_f > L_g$. Note that $\lim_{x \to a} [g(x) - f(x)] = L_g - L_f$ by the linearity of the limit. It follows that for $\epsilon = \frac{1}{2}(L_f - L_g) > 0$ there exists $\delta_2 > 0$ such that $x \in B_{\delta_2}(a)^o$ implies $|g(x) - f(x) - (L_g - L_f)| < \epsilon = \frac{1}{2}(L_f - L_g)$. Expanding this inequality we have
\[ -\frac{1}{2}(L_f - L_g) < g(x) - f(x) - (L_g - L_f) < \frac{1}{2}(L_f - L_g) \]
adding $L_g - L_f$ yields,
\[ -\frac{3}{2}(L_f - L_g) < g(x) - f(x) < -\frac{1}{2}(L_f - L_g) < 0. \]
Thus, $f(x) > g(x)$ for all $x \in B_{\delta_2}(a)^o$. But, $f(x) \leq g(x)$ for all $x \in B_{\delta_1}(a)^o$ so we find a contradiction for each $x \in B_\delta(a)^o$ where $\delta = min(\delta_1, \delta_2)$. Hence $L_f \leq L_g$. The same proof can be applied to $g$ and $h$ thus the first part of the theorem follows.

Next, we suppose that $\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L \in \mathbb{R}$ and $f(x) \leq g(x) \leq h(x)$ for all $x \in B_{\delta_1}(a)^o$ for some $\delta_1 > 0$. We seek to show that $\lim_{x \to a} g(x) = L$. Let $\epsilon > 0$ and choose $\delta_2 > 0$ such that $|f(x) - L| < \epsilon$ and $|h(x) - L| < \epsilon$ for all $x \in B_{\delta_2}(a)^o$. We are free to choose such a $\delta_2 > 0$ because the limits of $f$ and $h$ are given at $x = a$. Choose $\delta = min(\delta_1, \delta_2)$ and note that if $x \in B_\delta(a)^o$ then
\[ f(x) \leq g(x) \leq h(x) \]
hence,
\[ f(x) - L \leq g(x) - L \leq h(x) - L \]
but $|f(x) - L| < \epsilon$ and $|h(x) - L| < \epsilon$ imply $-\epsilon < f(x) - L$ and $h(x) - L < \epsilon$ thus
\[ -\epsilon < f(x) - L \leq g(x) - L \leq h(x) - L < \epsilon. \]

Footnote 7: this is lifted word for word from my calculus I notes; however, here the meaning of open ball is considerably more general and the linearity of the limit which is referenced is the one proven earlier in this section.

Therefore, for each 𝜖 > 0 there exists 𝛿 > 0 such that 𝑥 ∈ 𝐵𝛿 (𝑎)𝑜 implies ∣𝑔(𝑥) − 𝐿∣ < 𝜖 so
lim𝑥→𝑎 𝑔(𝑥) = 𝐿. □

Our typical use of the theorem above applies to equations of norms from a normed vector space.
The norm takes us from 𝑉 to ℝ so the theorem above is essential to analyze interesting limits. We
shall make use of it in the next chapter.
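Proposition 2.4.9. norm is continuous with respect to itself.
Suppose $V$ has norm $||\cdot||$ then $f : V \to \mathbb{R}$ defined by $f(x) = ||x||$ defines a continuous function on $V$.
Proof: Suppose $x_o \in V$ and $||x_o|| = l_o$. We wish to show that for each $\epsilon > 0$ there exists a $\delta > 0$ such that $0 < ||x - x_o|| < \delta$ implies $|\, ||x|| - l_o | < \epsilon$. Let $\epsilon > 0$ and choose $\delta = \epsilon$. The triangle inequality gives $||x|| = ||(x - x_o) + x_o|| \leq ||x - x_o|| + ||x_o||$ and likewise $||x_o|| \leq ||x_o - x|| + ||x||$; since $||x_o - x|| = ||x - x_o||$ these combine to give $|\, ||x|| - ||x_o||\, | \leq ||x - x_o||$. Hence $0 < ||x - x_o|| < \delta$ implies $|\, ||x|| - l_o | \leq ||x - x_o|| < \delta = \epsilon$. □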
Finally, we would like a "vector of the limits is the limit of the vector" proposition for the context of normed spaces. It is generally true for normed vector spaces that if the component functions converge to particular limits then the limit of the vector function is simply the vector of those limits. The converse is not so simple because the basis expansion for a normed vector space could fail to follow the pattern we expect from our study of $\mathbb{R}^n$.

Let $\mathbb{R}^2$ have basis $\beta = \{\mathcal{E}_1, \mathcal{E}_2\} = \{e_1, -3e_1 + e_2\}$; note the vector $v = 3\mathcal{E}_1 + \mathcal{E}_2 = 3e_1 + (-3e_1 + e_2) = e_2$. With respect to the $\beta$ basis we find $v_1 = 3$ and $v_2 = 1$. The concept of length is muddled in these coordinates. If we tried (incorrectly) to use the pythagorean theorem we’d find $||v|| = \sqrt{9 + 1} = \sqrt{10}$ and yet the length of the vector is clearly just 1 since $v = e_2 = (0, 1)$. The trouble with $\beta$ is that it has different basis elements which overlap (they are not orthogonal). To keep clear the euclidean idea of distance we must insist on the use of an orthonormal basis.
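The mismatch between coordinates and length is easy to see numerically. In this sketch the names `E1`, `E2`, `naive` and `actual` are mine; the code expands the vector with coordinates (3, 1) in the skewed basis and compares the pythagorean guess with the true Euclidean length.

```python
import math

# The skewed basis of R^2 from the text: E1 = e1 and E2 = -3 e1 + e2.
E1 = (1.0, 0.0)
E2 = (-3.0, 1.0)

coords = (3.0, 1.0)   # v = 3 E1 + 1 E2 in the skewed basis
v = tuple(coords[0] * a + coords[1] * b for a, b in zip(E1, E2))

naive = math.sqrt(sum(c * c for c in coords))   # Pythagoras on the coordinates
actual = math.sqrt(sum(t * t for t in v))       # Euclidean length of v itself

print("v in standard coordinates:", v)
print("naive length from coordinates:", naive)
print("actual Euclidean length:", actual)
```

Replacing `E2` by a unit vector orthogonal to `E1` makes the two lengths agree, which is the point of insisting on orthonormal bases.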

I’d rather not explain what that means at this point. Sufficient to say that if $\beta = \{f_1, \dots, f_m\}$ is an orthonormal basis then the coordinates preserve essentially the euclidean idea of vector length. In particular, we can expect that if
\[ v = \sum_{j=1}^{m} v_j f_j \quad \text{then} \quad ||v||^2 = \sum_{j=1}^{m} v_j^2. \]

Proposition 2.4.10.
Let $V, W$ be normed vector spaces and suppose $W$ has basis $\beta = \{w_j\}_{j=1}^{m}$ such that when $v = \sum_{j=1}^{m} v_j w_j$ then $||v||^2 = \sum_{j=1}^{m} v_j^2$. Suppose that $F : dom(F) \subseteq V \to W$ is a mapping with component functions $F_1, F_2, \dots, F_m$ with respect to the $\beta$ basis ($F = \sum_{j=1}^{m} F_j w_j$). Let $a \in V$ be a limit point of $F$ then
\[ \lim_{x \to a} F(x) = B = \sum_{j=1}^{m} B_j w_j \quad \Leftrightarrow \quad \lim_{x \to a} F_j(x) = B_j \ \text{for all} \ j = 1, 2, \dots, m. \]

Proof: Suppose $\lim_{x \to a} F(x) = B = \sum_{j=1}^{m} B_j w_j$. Since we assumed the basis of $W$ was orthonormal we have that:
\[ |F_k(x) - B_k|^2 \leq \sum_{j=1}^{m} |F_j(x) - B_j|^2 = ||F(x) - B||^2 \]
where in the first inequality I simply added nonnegative terms. With the inequality above in mind, let $\epsilon > 0$ and choose $\delta > 0$ such that $0 < ||x - a|| < \delta$ implies $||F(x) - B|| < \epsilon$. It follows that $|F_k(x) - B_k|^2 < \epsilon^2$ and hence $|F_k(x) - B_k| < \epsilon$. The index $k$ is arbitrary therefore, $\lim_{x \to a} F_k(x) = B_k$ for all $k \in \mathbb{N}_m$.

Conversely suppose $\lim_{x \to a} F_k(x) = B_k$ for all $k \in \mathbb{N}_m$. Let $M = max\{ ||w_1||, ||w_2||, \dots, ||w_m|| \}$. Let $\epsilon > 0$ and choose, by virtue of the given limits for the component functions, $\delta_k > 0$ such that $0 < ||x - a|| < \delta_k$ implies $|F_k(x) - B_k| < \frac{\epsilon}{mM}$. Choose $\delta = min\{\delta_1, \delta_2, \dots, \delta_m\}$ and suppose $0 < ||x - a|| < \delta$. Consider
\[ ||F(x) - B|| = \Big|\Big| \sum_{j=1}^{m} (F_j(x) - B_j) w_j \Big|\Big| \leq \sum_{j=1}^{m} ||(F_j(x) - B_j) w_j|| = \sum_{j=1}^{m} |F_j(x) - B_j|\, ||w_j||. \]
However, $||w_j|| \leq M$ for all $j = 1, 2, \dots, m$ hence
\[ ||F(x) - B|| \leq \sum_{j=1}^{m} M |F_j(x) - B_j| < \sum_{j=1}^{m} M \frac{\epsilon}{mM} = \sum_{j=1}^{m} \frac{\epsilon}{m} = \epsilon. \]
Therefore, $\lim_{x \to a} F(x) = B$. □

I leave the case of non-orthonormal bases to the reader. In all the cases we consider it is possible
and natural to choose orthogonal bases to describe the vector space. I’ll avoid the temptation to
do more here (there is more).9

9
add a couple references for further reading here XXX
Chapter 3

differentiation

Our goal in this chapter is to describe differentiation for functions to and from normed linear spaces. It turns out this is actually quite simple given the background of the preceding chapter. The differential at a point is a linear transformation which best approximates the change in a function at a particular point. We can quantify "best" by a limiting process which is naturally defined in view of the fact there is a norm on the spaces we consider.

The most important example is of course the case 𝑓 : ℝ𝑛 → ℝ𝑚 . In this context it is natural to write the differential as a matrix multiplication. The matrix of the differential is what Edwards calls the derivative. Partial derivatives are also defined in terms of directional derivatives. The directional derivative is sometimes defined where the differential fails to exist. We will discuss how the criterion of continuous differentiability allows us to build the differential from the directional derivatives. We’ll see how the Cauchy-Riemann equations of complex analysis are really just an algebraic result if we already have the theorem for continuous differentiability. We will see how this general concept of differentiation recovers all the derivatives you’ve seen previously in calculus and much more.

On the other hand, I postpone implicit differentiation for a future chapter where we have the existence theorems for implicit and inverse functions. I also postpone discussion of the geometry of the differential. In short, existence of the differential and the tangent space are essentially two sides of the same problem. In fact, the approach of this chapter is radically different than my first set of notes on advanced calculus. Last year I followed Edwards a bit more and built up to the definition of the differential on the basis of the directional derivative and geometry. I don’t think students appreciate geometry or directional differentiation well enough to make that approach successful. Consequently, I begin with the unjustified definition of the derivative and then spend the rest of the chapter working out precise implications and examples that flow from the definition. I essentially ignore the question of motivating the definition we begin with. If you want motivation, think backward with this chapter or perhaps read Edwards or my old notes.
68 CHAPTER 3. DIFFERENTIATION

3.1 the differential


The definition below (see footnote 1) says that $\triangle F = F(a + h) - F(a) \approx dF_a(h)$ when $h$ is close to zero.

Definition 3.1.1.

Let $(V, ||\cdot||_V)$ and $(W, ||\cdot||_W)$ be normed vector spaces. Suppose that $U$ is open and $F : U \subseteq V \to W$ is a function; then we say that $F$ is differentiable at $a \in U$ iff there exists a linear mapping $L : V \to W$ such that
\[ \lim_{h \to 0} \left[ \frac{F(a + h) - F(a) - L(h)}{||h||_V} \right] = 0. \]
In such a case we call the linear mapping $L$ the differential at $a$ and we denote $L = dF_a$. In the case $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ are given the standard euclidean norms, the matrix of the differential is called the derivative of $F$ at $a$ and we denote $[dF_a] = F'(a) \in \mathbb{R}^{m \times n}$ which means that $dF_a(v) = F'(a)v$ for all $v \in \mathbb{R}^n$.
Notice this definition gives an equation which implicitly defines 𝑑𝐹𝑎 . For the moment the only way
we have to calculate 𝑑𝐹𝑎 is educated guessing.

Example 3.1.2. Suppose $T : V \to W$ is a linear transformation of normed vector spaces $V$ and $W$. I propose $L = T$. In other words, I think we can show the best linear approximation to the change in a linear function is simply the function itself. Clearly $L$ is linear since $T$ is linear. Consider the difference quotient:
\[ \frac{T(a + h) - T(a) - L(h)}{||h||_V} = \frac{T(a) + T(h) - T(a) - T(h)}{||h||_V} = \frac{0}{||h||_V}. \]
Note $h \neq 0$ implies $||h||_V \neq 0$ by the definition of the norm. Hence the limit of the difference quotient vanishes since it is identically zero for every nonzero value of $h$. We conclude that $dT_a = T$.

Example 3.1.3. Let $T : V \to W$ where $V$ and $W$ are normed vector spaces and define $T(v) = w_o$ for all $v \in V$. I claim the differential is the zero transformation. Linearity of $L(v) = 0$ is trivially verified. Consider the difference quotient:
\[ \frac{T(a + h) - T(a) - L(h)}{||h||_V} = \frac{w_o - w_o - 0}{||h||_V} = \frac{0}{||h||_V}. \]
Using the arguments of the preceding example, we find $dT_a = 0$.

Typically the difference quotient is not identically zero. The pair of examples above are very special
cases. I’ll give a few more abstract examples later in this section. For now we turn to the question
of how this general definition recovers the concept of differentiation we studied in calculus.
1
Some authors might put a norm in the numerator of the quotient. That is an equivalent condition since a function
𝑔 : 𝑉 → 𝑊 has limℎ→0 𝑔(ℎ) = 0 iff limℎ→0 ∣∣𝑔(ℎ)∣∣𝑊 = 0
3.1. THE DIFFERENTIAL 69

Example 3.1.4. Suppose $f : dom(f) \subseteq \mathbb{R} \to \mathbb{R}$ is differentiable at $x$. It follows that there exists a linear function $df_x : \mathbb{R} \to \mathbb{R}$ such that (see footnote 2)
\[ \lim_{h \to 0} \frac{f(x + h) - f(x) - df_x(h)}{|h|} = 0. \]
Since $df_x : \mathbb{R} \to \mathbb{R}$ is linear there exists a constant matrix $m$ such that $df_x(h) = mh$. In this silly case the matrix $m$ is a $1 \times 1$ matrix which is otherwise known as a real number. Note that
\[ \lim_{h \to 0} \frac{f(x + h) - f(x) - df_x(h)}{|h|} = 0 \quad \Leftrightarrow \quad \lim_{h \to 0^{\pm}} \frac{f(x + h) - f(x) - df_x(h)}{|h|} = 0. \]
In the left limit $h \to 0^-$ we have $h < 0$ hence $|h| = -h$. On the other hand, in the right limit $h \to 0^+$ we have $h > 0$ hence $|h| = h$. Thus, differentiability suggests that $\lim_{h \to 0^{\pm}} \frac{f(x + h) - f(x) - df_x(h)}{\pm h} = 0$. But we can pull the minus out of the left limit to obtain $\lim_{h \to 0^-} \frac{f(x + h) - f(x) - df_x(h)}{h} = 0$. Therefore,
\[ \lim_{h \to 0} \frac{f(x + h) - f(x) - df_x(h)}{h} = 0. \]
We seek to show that $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = m$. Observe
\[ m = \lim_{h \to 0} \frac{mh}{h} = \lim_{h \to 0} \frac{df_x(h)}{h}. \]
A theorem from calculus I states that if $\lim(f - g) = 0$ and $\lim(g)$ exists then so must $\lim(f)$ and $\lim(f) = \lim(g)$. Apply that theorem to the fact we know $\lim_{h \to 0} \frac{df_x(h)}{h}$ exists and
\[ \lim_{h \to 0} \left[ \frac{f(x + h) - f(x)}{h} - \frac{df_x(h)}{h} \right] = 0. \]
It follows that
\[ \lim_{h \to 0} \frac{df_x(h)}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]
Consequently,
\[ m = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}, \]
which is how we defined $f'(x)$ in calc. I. Therefore, $df_x(h) = f'(x)h$. In other words, if a function is differentiable in the sense we defined at the beginning of this chapter then it is differentiable in the terminology we used in calculus I. Moreover, the derivative at $x$ is precisely the matrix of the differential.

Footnote 2: unless we state otherwise, $\mathbb{R}^n$ is assumed to have the euclidean norm; in this case $||x||_{\mathbb{R}} = \sqrt{x^2} = |x|$.

Example 3.1.5. Suppose $F : \mathbb{R}^2 \to \mathbb{R}^3$ is defined by $F(x,y) = (xy, x^2, x+3y)$ for all $(x,y) \in \mathbb{R}^2$.
Consider the difference function $\triangle F$ at $(x,y)$:

$$\triangle F = F((x,y)+(h,k)) - F(x,y) = F(x+h, y+k) - F(x,y)$$

Calculate,

$$\triangle F = \big( (x+h)(y+k), \, (x+h)^2, \, x+h+3(y+k) \big) - \big( xy, \, x^2, \, x+3y \big)$$

Simplify by cancelling terms which cancel with $F(x,y)$:

$$\triangle F = \big( xk + hy + hk, \, 2xh + h^2, \, h+3k \big)$$

Identify the linear part of $\triangle F$ as a good candidate for the differential. I claim that

$$L(h,k) = \big( xk + hy, \, 2xh, \, h+3k \big)$$

is the differential for $F$ at $(x,y)$. Observe first that we can write

$$L(h,k) = \begin{bmatrix} y & x \\ 2x & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix},$$

therefore $L : \mathbb{R}^2 \to \mathbb{R}^3$ is manifestly linear. Use the algebra above to simplify the difference quotient below:

$$\lim_{(h,k) \to (0,0)} \left[ \frac{\triangle F - L(h,k)}{||(h,k)||} \right] = \lim_{(h,k) \to (0,0)} \left[ \frac{(hk, \, h^2, \, 0)}{||(h,k)||} \right]$$

Note $||(h,k)|| = \sqrt{h^2+k^2}$, therefore we face the task of showing that $\big( hk/\sqrt{h^2+k^2}, \; h^2/\sqrt{h^2+k^2}, \; 0 \big) \to (0,0,0)$ as $(h,k) \to (0,0)$. Recall from our study of limits that we can prove the vector tends to $(0,0,0)$ by showing that each component tends to zero. The third component is obviously zero, however the first and second components require study. Since $|h| \leq \sqrt{h^2+k^2}$, observe that for $(h,k) \neq (0,0)$

$$0 \leq \frac{|hk|}{\sqrt{h^2+k^2}} \leq |k| \qquad \text{and} \qquad 0 \leq \frac{h^2}{\sqrt{h^2+k^2}} \leq |h|.$$

Clearly $\lim_{(h,k) \to (0,0)} (0) = 0$, $\lim_{(h,k) \to (0,0)} |k| = 0$ and $\lim_{(h,k) \to (0,0)} |h| = 0$, hence the squeeze theorem for multivariate limits shows that both $\frac{hk}{\sqrt{h^2+k^2}}$ and $\frac{h^2}{\sqrt{h^2+k^2}}$ tend to zero as $(h,k) \to (0,0)$. Therefore,

$$dF_{(x,y)}(h,k) = \begin{bmatrix} y & x \\ 2x & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix}.$$
Computation of less trivial multivariate limits is an art we’d like to avoid if possible. It turns out
that we can actually avoid these calculations by computing partial derivatives. However, we still
need a certain multivariate limit to exist for the partial derivative functions so in some sense it’s
unavoidable. The limits are there whether we like to calculate them or not. I want to give a few
more abstract examples before I get into the partial differentiation. The purpose of this section is
to showcase the generality of the definition for differential.
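
The limit in Example 3.1.5 can also be sanity-checked numerically. What follows is only a rough sketch, not part of the proof: the helper names `F`, `L` and `remainder_ratio`, the base point (1, 2) and the direction of approach are my own choices for illustration. If the claimed differential is right, the printed ratios should shrink roughly in proportion to the size of (h, k).

```python
import math

def F(x, y):
    # the mapping of Example 3.1.5
    return (x * y, x ** 2, x + 3 * y)

def L(x, y, h, k):
    # the proposed differential at (x, y), applied to (h, k)
    return (y * h + x * k, 2 * x * h, h + 3 * k)

def remainder_ratio(x, y, h, k):
    # ||F(p + v) - F(p) - L(v)|| / ||v|| for p = (x, y), v = (h, k)
    Fp = F(x, y)
    Fq = F(x + h, y + k)
    Lv = L(x, y, h, k)
    rem = [q - p - l for q, p, l in zip(Fq, Fp, Lv)]
    return math.sqrt(sum(r * r for r in rem)) / math.hypot(h, k)

# approach the origin along (h, k) = (t, -t) with t = 10^-1, ..., 10^-5
ratios = [remainder_ratio(1.0, 2.0, 10.0 ** -n, -(10.0 ** -n)) for n in range(1, 6)]
print(ratios)
```

A numerical check along one path proves nothing about the multivariate limit, of course; the squeeze argument is what does the real work.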
3.1. THE DIFFERENTIAL 71

Example 3.1.6. Suppose $F(t) = U(t) + iV(t)$ for all $t \in dom(F)$ and both $U$ and $V$ are differentiable functions on $dom(F)$. By the arguments given in Example 3.1.4 it suffices to find $L : \mathbb{R} \to \mathbb{C}$ such that

$$\lim_{h \to 0} \left[ \frac{F(t+h) - F(t) - L(h)}{h} \right] = 0.$$
I propose that on the basis of analogy to Example 3.1.4 we ought to have 𝑑𝐹𝑡 (ℎ) = (𝑈 ′ (𝑡) + 𝑖𝑉 ′ (𝑡))ℎ.
Let $L(h) = (U'(t) + iV'(t))h$. Observe that, using properties of $\mathbb{C}$,

$$L(h_1 + ch_2) = (U'(t) + iV'(t))(h_1 + ch_2) = (U'(t) + iV'(t))h_1 + c(U'(t) + iV'(t))h_2 = L(h_1) + cL(h_2)$$

for all $h_1, h_2 \in \mathbb{R}$ and $c \in \mathbb{R}$. Hence $L : \mathbb{R} \to \mathbb{C}$ is linear. Moreover,

$$\begin{aligned} \frac{F(t+h) - F(t) - L(h)}{h} &= \frac{1}{h}\Big( U(t+h) + iV(t+h) - \big(U(t) + iV(t)\big) - \big(U'(t) + iV'(t)\big)h \Big) \\ &= \frac{1}{h}\Big( U(t+h) - U(t) - U'(t)h \Big) + i\,\frac{1}{h}\Big( V(t+h) - V(t) - V'(t)h \Big) \end{aligned}$$

Consider the problem of calculating $\lim_{h \to 0} \frac{F(t+h) - F(t) - L(h)}{h}$. We use a lemma that a complex function converges to zero iff the real and imaginary parts of the function separately converge to zero (this might be a homework). By differentiability of $U$ and $V$ we find, again using Example 3.1.4,

$$\lim_{h \to 0} \frac{1}{h}\Big( U(t+h) - U(t) - U'(t)h \Big) = 0 \qquad \lim_{h \to 0} \frac{1}{h}\Big( V(t+h) - V(t) - V'(t)h \Big) = 0.$$

Therefore, $dF_t(h) = (U'(t) + iV'(t))h$. Note that the quantity $U'(t) + iV'(t)$ is not a real matrix in this case. To write the derivative in terms of a real matrix multiplication we need to construct some further notation which makes use of the isomorphism between $\mathbb{C}$ and $\mathbb{R}^2$. Actually, it’s pretty easy: if you agree that $a + ib = (a, b)$ then $dF_t(h) = (U'(t), V'(t))h$, so the matrix of the differential is the column vector $(U'(t), V'(t)) \in \mathbb{R}^{2 \times 1}$, which makes sense as $F : \mathbb{R} \to \mathbb{C} \approx \mathbb{R}^2$.

Generally constructing the matrix for a function $f : V \to W$ where $V, W \neq \mathbb{R}$ involves a fair number of relatively ad-hoc conventions because the constructions necessarily involve choosing coordinates. The situation is similar in linear algebra. Writing abstract linear transformations in terms of matrix multiplication takes a little thinking. If you look back you’ll notice that I did not bother to try to write a matrix for Examples 3.1.2 or 3.1.3. The same is true for the final example of this section.

Example 3.1.7. Suppose $F : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ is defined by $F(X) = X^2$. Notice

△𝐹 = 𝐹 (𝑋 + 𝐻) − 𝐹 (𝑋) = (𝑋 + 𝐻)(𝑋 + 𝐻) − 𝑋 2 = 𝑋𝐻 + 𝐻𝑋 + 𝐻 2

I propose that 𝐹 is differentiable at 𝑋 and 𝐿(𝐻) = 𝑋𝐻 + 𝐻𝑋. Let’s check linearity,

𝐿(𝐻1 + 𝑐𝐻2 ) = 𝑋(𝐻1 + 𝑐𝐻2 ) + (𝐻1 + 𝑐𝐻2 )𝑋 = 𝑋𝐻1 + 𝐻1 𝑋 + 𝑐(𝑋𝐻2 + 𝐻2 𝑋)


Hence $L : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ is a linear transformation. By construction of $L$ the linear terms in the numerator cancel leaving just the quadratic term,

$$\lim_{H \to 0} \frac{F(X+H) - F(X) - L(H)}{||H||} = \lim_{H \to 0} \frac{H^2}{||H||}.$$

It suffices to show that $\lim_{H \to 0} \frac{||H^2||}{||H||} = 0$ since $\lim(||g||) = 0$ iff $\lim(g) = 0$ in a normed vector space. Fortunately the normed vector space $\mathbb{R}^{n \times n}$ is actually a Banach algebra. A vector space with a multiplication operation is called an algebra. In the current context the multiplication is simply matrix multiplication. A Banach algebra is a normed vector space with a multiplication that satisfies $||XY|| \leq ||X|| \, ||Y||$. Thanks to this inequality we can calculate our limit via the squeeze theorem. Observe $0 \leq \frac{||H^2||}{||H||} \leq ||H||$. As $H \to 0$ it follows $||H|| \to 0$ hence $\lim_{H \to 0} \frac{||H^2||}{||H||} = 0$. We find $dF_X(H) = XH + HX$.
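
The squeeze estimate in Example 3.1.7 is easy to watch numerically. The sketch below is an illustration only: it assumes 2 × 2 matrices stored as lists of rows and the Frobenius norm (the Euclidean norm of the entries), and the matrices `X` and `H0` are arbitrary values I picked. The helper names are mine, not notation from these notes.

```python
import math

def matmul(A, B):
    # product of two square matrices stored as lists of rows
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B, sign=1.0):
    # entrywise A + sign * B
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius(A):
    # Euclidean norm of the entries of A
    return math.sqrt(sum(a * a for row in A for a in row))

def remainder_ratio(X, H):
    # ||(X + H)^2 - X^2 - (XH + HX)|| / ||H||
    XpH = add(X, H)
    delta = add(matmul(XpH, XpH), matmul(X, X), -1.0)
    linear_part = add(matmul(X, H), matmul(H, X))
    return frobenius(add(delta, linear_part, -1.0)) / frobenius(H)

X = [[1.0, 2.0], [3.0, 4.0]]
H0 = [[0.5, -1.0], [2.0, 0.25]]
for n in range(1, 6):
    t = 10.0 ** -n
    H = [[t * h for h in row] for row in H0]
    # the ratio should stay below ||H||, as the Banach algebra inequality predicts
    print(n, remainder_ratio(X, H), frobenius(H))
```

Each printed ratio should sit below the corresponding value of ||H||, and both columns shrink together as H tends to zero.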
XXX- need to adjust example below to reflect orthonormality assumption.
Example 3.1.8. Suppose $V$ is a normed vector space with basis $\beta = \{f_1, f_2, \dots, f_n\}$. Furthermore, let $G : I \subseteq \mathbb{R} \to V$ be defined by

$$G(t) = \sum_{i=1}^{n} G_i(t) f_i$$

where $G_i : I \to \mathbb{R}$ is differentiable on $I$ for $i = 1, 2, \dots, n$. I claim that if $T = \sum_{j=1}^{n} T_j f_j : \mathbb{R} \to V$ then $\lim_{t \to 0} T(t) = \sum_{j=1}^{n} l_j f_j$ iff $\lim_{t \to 0} T_j(t) = l_j$ for all $j = 1, 2, \dots, n$. In words, the limit of a vector-valued function can be parsed into a vector of limits. We’ve not proved this, I may make it a homework. With this in mind consider (again we can trade $|h|$ for $h$ as we explained in-depth in Example 3.1.4) the difference quotient

$$\lim_{h \to 0} \left[ \frac{G(t+h) - G(t) - h \sum_{i=1}^{n} \frac{dG_i}{dt} f_i}{h} \right],$$

factoring out the basis yields:

$$\lim_{h \to 0} \left[ \frac{\sum_{i=1}^{n} \left[ G_i(t+h) - G_i(t) - h \frac{dG_i}{dt} \right] f_i}{h} \right] = \sum_{i=1}^{n} \left[ \lim_{h \to 0} \frac{G_i(t+h) - G_i(t) - h \frac{dG_i}{dt}}{h} \right] f_i$$

The expression on the left is the limit of a vector whereas the expression on the right is a vector of limits. I make the equality by applying the claim. Each limit on the right is zero by Example 3.1.4 since each $G_i$ is differentiable. In any event, I hope you are not surprised that:

$$dG_t(h) = h \sum_{i=1}^{n} \frac{dG_i}{dt} f_i$$

The example above encompasses a number of cases at once:


1. $V = \mathbb{R}$, functions on $\mathbb{R}$, $f : \mathbb{R} \to \mathbb{R}$

2. $V = \mathbb{R}^n$, space curves in $\mathbb{R}^n$, $\vec{r} : \mathbb{R} \to \mathbb{R}^n$

3. $V = \mathbb{C}$, complex-valued functions of a real variable, $f = u + iv : \mathbb{R} \to \mathbb{C}$

4. $V = \mathbb{R}^{m \times n}$, matrix-valued functions of a real variable, $F : \mathbb{R} \to \mathbb{R}^{m \times n}$.

In short, when we differentiate a function which has a real domain then we can define the derivative
of such a function by component-wise differentiation. It gets more interesting when the domain has
several independent variables. We saw this in Examples 3.1.5 and 3.1.7.

Remark 3.1.9.

I have deliberately defined the derivative in slightly more generality than we need for this
course. It’s probably not much trouble to continue to develop the theory of differentiation
for a normed vector space, however I will for the most part stop here. The theorems that
follow are not terribly complicated in the notation of ℝ𝑛 and traditionally this type of
course only covers continuous differentiability, inverse and implicit function theorems in
the context of mappings from ℝ𝑛 to ℝ𝑚 . For the reader interested in generalizing these
results to the context of an abstract normed vector space feel free to discuss it with me
sometime. This much we can conclude from our brief experience thus far, if we study
functions whose domain is in ℝ then differentiation is accomplished component-wise in the
range. This is good news since in all your previous courses I simply defined differentiation
by the component-wise rule. This section at a minimum shows that idea is consistent with
the larger theory we are working out in this chapter. It is likely I will have you work out
the calculus of complex or matrix-valued functions of a real variable in the homework.

3.2 partial derivatives and the existence of the differential


In the preceding section we calculated the differential at a point via educated guessing. We should like to find a better method to derive differentials. It turns out that we can systematically calculate the differential from partial derivatives of the component functions. However, certain topological conditions are required for us to properly paste together the partial derivatives of the component functions. We describe how the criterion of continuous differentiability achieves this goal. Much of this section was covered in calculus III but we do bring new generality and vision to the calculations described here.

3.2.1 directional derivatives


The directional derivative of a mapping 𝐹 at a point 𝑎 ∈ 𝑑𝑜𝑚(𝐹 ) along 𝑣 is defined to be the
derivative of the curve 𝛾(𝑡) = 𝐹 (𝑎 + 𝑡𝑣). In other words, the directional derivative gives you the
instantaneous vector-rate of change in the mapping 𝐹 at the point 𝑎 along 𝑣. In the case that
𝑚 = 1 then 𝐹 : 𝑑𝑜𝑚(𝐹 ) ⊆ ℝ𝑛 → ℝ and the directional derivative gives the instantaneous rate of
change of the function 𝐹 at the point 𝑎 along 𝑣. You probably insisted that ∣∣𝑣∣∣ = 1 in calculus III
but we make no such demand here. We define the directional derivative for mappings and vectors
of non-unit length.

Definition 3.2.1.

Let $F : dom(F) \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and suppose the limit below exists for $a \in dom(F)$ and $v \in \mathbb{R}^n$; then we define the directional derivative of $F$ at $a$ along $v$ to be $D_vF(a) \in \mathbb{R}^m$ where

$$D_vF(a) = \lim_{h \to 0} \frac{F(a+hv) - F(a)}{h}$$

One great contrast we should pause to note is that the definition of the directional derivative is explicit whereas the definition of the differential was implicit. Many similarities do exist. For example: the directional derivative $D_vF(a)$ and the differential $dF_a(v)$ are both homogeneous in $v$.
Proposition 3.2.2.
Let $F : dom(F) \subseteq \mathbb{R}^n \to \mathbb{R}^m$; if $D_vF(a)$ exists in $\mathbb{R}^m$ then $D_{cv}F(a) = cD_vF(a)$.

Proof: Let $F : dom(F) \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and suppose $D_vF(a) \in \mathbb{R}^m$. This means we are given that $\lim_{h \to 0} \frac{F(a+hv) - F(a)}{h} = D_vF(a) \in \mathbb{R}^m$. If $c = 0$ then the proposition is clearly true. Consider, for nonzero $c \in \mathbb{R}$,

$$\lim_{h \to 0} \frac{F(a + h(cv)) - F(a)}{h} = c \lim_{ch \to 0} \frac{F(a + ch(v)) - F(a)}{ch}$$

Hence by the substitution $ch = k$ we find,

$$\lim_{h \to 0} \frac{F(a + h(cv)) - F(a)}{h} = c \lim_{k \to 0} \frac{F(a + kv) - F(a)}{k}$$

Therefore, the limit on the left of the equality exists as the limit on the right of the equality is given and we conclude $D_{cv}F(a) = cD_vF(a)$ for all $c \in \mathbb{R}$. □

If we’re given the derivative of a mapping then the directional derivative exists. The converse is
not so simple as we shall discuss in the next subsection.
Proposition 3.2.3.
If 𝐹 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 is differentiable at 𝑎 ∈ 𝑈 then the directional derivative 𝐷𝑣 𝐹 (𝑎) exists
for each 𝑣 ∈ ℝ𝑛 and 𝐷𝑣 𝐹 (𝑎) = 𝑑𝐹𝑎 (𝑣).

Proof: Suppose $a \in U$ such that $dF_a$ is well-defined; then we are given that

$$\lim_{h \to 0} \frac{F(a+h) - F(a) - dF_a(h)}{||h||} = 0.$$

This is a limit in $\mathbb{R}^n$; when it exists it follows that the limits that approach the origin along particular paths also exist and are zero. In particular we can consider the path $t \mapsto tv$ for $v \neq 0$ and $t > 0$, we find

$$\lim_{tv \to 0, \; t > 0} \frac{F(a+tv) - F(a) - dF_a(tv)}{||tv||} = \frac{1}{||v||} \lim_{t \to 0^+} \frac{F(a+tv) - F(a) - t\,dF_a(v)}{|t|} = 0.$$

Hence, as $|t| = t$ for $t > 0$ we find

$$\lim_{t \to 0^+} \frac{F(a+tv) - F(a)}{t} = \lim_{t \to 0^+} \frac{t\,dF_a(v)}{t} = dF_a(v).$$

Likewise we can consider the path $t \mapsto tv$ for $v \neq 0$ and $t < 0$,

$$\lim_{tv \to 0, \; t < 0} \frac{F(a+tv) - F(a) - dF_a(tv)}{||tv||} = \frac{1}{||v||} \lim_{t \to 0^-} \frac{F(a+tv) - F(a) - t\,dF_a(v)}{|t|} = 0.$$

Note $|t| = -t$ thus the limit above yields

$$\lim_{t \to 0^-} \frac{F(a+tv) - F(a)}{-t} = \lim_{t \to 0^-} \frac{t\,dF_a(v)}{-t} \quad \Rightarrow \quad \lim_{t \to 0^-} \frac{F(a+tv) - F(a)}{t} = dF_a(v).$$

Therefore,

$$\lim_{t \to 0} \frac{F(a+tv) - F(a)}{t} = dF_a(v)$$

and we conclude that $D_vF(a) = dF_a(v)$ for all $v \in \mathbb{R}^n$ since the $v = 0$ case follows trivially. □
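
Propositions 3.2.2 and 3.2.3 can be illustrated with the mapping of Example 3.1.5. The sketch below is a numerical illustration under my own assumptions: the difference quotient uses a small fixed step rather than a limit, and the point `a`, the vector `v` and the scalar `c` are arbitrary choices. The function names are hypothetical helpers, not notation from these notes.

```python
def F(p):
    # mapping from Example 3.1.5, F(x, y) = (xy, x^2, x + 3y)
    x, y = p
    return (x * y, x ** 2, x + 3 * y)

def differential(p, v):
    # dF_p(v) using the matrix found in Example 3.1.5
    x, y = p
    h, k = v
    return (y * h + x * k, 2 * x * h, h + 3 * k)

def directional(F, a, v, h=1e-6):
    # difference quotient (F(a + hv) - F(a)) / h from Definition 3.2.1
    shifted = tuple(ai + h * vi for ai, vi in zip(a, v))
    return tuple((s - b) / h for s, b in zip(F(shifted), F(a)))

a = (1.0, 2.0)
v = (3.0, -1.0)
c = 2.5
Dv = directional(F, a, v)
Dcv = directional(F, a, (c * v[0], c * v[1]))
print(Dv, differential(a, v))            # Proposition 3.2.3: these should nearly agree
print(Dcv, tuple(c * d for d in Dv))     # Proposition 3.2.2: these should nearly agree
```

The agreement is only approximate because the step `h` is finite; shrinking it (within floating point limits) tightens the match.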

Let’s think about the problem we face. We want to find a nice formula for the differential. We
now know that if it exists then the directional derivatives allow us to calculate the values of the
differential in particular directions. The natural thing to do is to calculate the standard matrix
for the differential using the preceding proposition. Recall that if 𝐿 : ℝ𝑛 → ℝ𝑚 then the standard
matrix was simply [𝐿] = [𝐿(𝑒1 )∣𝐿(𝑒2 )∣ ⋅ ⋅ ⋅ ∣𝐿(𝑒𝑛 )] and thus the action of 𝐿 is expressed nicely as a
matrix multiplication; 𝐿(𝑣) = [𝐿]𝑣. Similarly, 𝑑𝑓𝑎 : ℝ𝑛 → ℝ𝑚 is a linear transformation and thus
𝑑𝑓𝑎 (𝑣) = [𝑑𝑓𝑎 ]𝑣 where [𝑑𝑓𝑎 ] = [𝑑𝑓𝑎 (𝑒1 )∣𝑑𝑓𝑎 (𝑒2 )∣ ⋅ ⋅ ⋅ ∣𝑑𝑓𝑎 (𝑒𝑛 )]. Moreover, by the preceding proposition
we can calculate 𝑑𝑓𝑎 (𝑒𝑗 ) = 𝐷𝑒𝑗 𝑓 (𝑎) for 𝑗 = 1, 2, . . . , 𝑛. Clearly the directional derivatives in the
coordinate directions are of great importance. For this reason we make the following definition:

Definition 3.2.4.
Suppose that $F : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is a mapping; then we say that $F$ has partial derivative $\frac{\partial F}{\partial x_i}(a)$ at $a \in U$ iff the directional derivative in the $e_i$ direction exists at $a$. In this case we denote,

$$\frac{\partial F}{\partial x_i}(a) = D_{e_i}F(a).$$

Also we may use the notation $D_{e_i}F(a) = D_iF(a)$ or $\partial_i F = \frac{\partial F}{\partial x_i}$ when convenient. We also construct the partial derivative mapping $\partial_i F : V \subseteq \mathbb{R}^n \to \mathbb{R}^m$ as the mapping defined pointwise for each $v \in V$ where $\partial_i F(v)$ exists.

Let’s expand this definition a bit. Note that if $F = (F_1, F_2, \dots, F_m)$ then

$$D_{e_i}F(a) = \lim_{h \to 0} \frac{F(a + he_i) - F(a)}{h} \quad \Rightarrow \quad [D_{e_i}F(a)] \cdot e_j = \lim_{h \to 0} \frac{F_j(a + he_i) - F_j(a)}{h}$$

for each $j = 1, 2, \dots, m$. But then the limit of the component function $F_j$ is precisely the directional derivative at $a$ along $e_i$, hence we find the result

$$\frac{\partial F}{\partial x_i} \cdot e_j = \frac{\partial F_j}{\partial x_i}, \quad \text{in other words,} \quad \partial_i F = (\partial_i F_1, \partial_i F_2, \dots, \partial_i F_m).$$

Proposition 3.2.5.

If $F : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in U$ then the directional derivative $D_vF(a)$ can be expressed as a sum of partial derivative maps for each $v = \langle v_1, v_2, \dots, v_n \rangle \in \mathbb{R}^n$:

$$D_vF(a) = \sum_{j=1}^{n} v_j \partial_j F(a)$$

Proof: since 𝐹 is differentiable at 𝑎 the differential 𝑑𝐹𝑎 exists and 𝐷𝑣 𝐹 (𝑎) = 𝑑𝐹𝑎 (𝑣) for all 𝑣 ∈ ℝ𝑛 .
Use linearity of the differential to calculate that

𝐷𝑣 𝐹 (𝑎) = 𝑑𝐹𝑎 (𝑣1 𝑒1 + ⋅ ⋅ ⋅ + 𝑣𝑛 𝑒𝑛 ) = 𝑣1 𝑑𝐹𝑎 (𝑒1 ) + ⋅ ⋅ ⋅ + 𝑣𝑛 𝑑𝐹𝑎 (𝑒𝑛 ).

Note 𝑑𝐹𝑎 (𝑒𝑗 ) = 𝐷𝑒𝑗 𝐹 (𝑎) = ∂𝑗 𝐹 (𝑎) and the prop. follows. □

My primary interest in advanced calculus is the differential3 . I discuss the directional derivative
here merely to connect with your past calculations in calculus III where we explored the geometric
and analytic significance of the directional derivative. I do not intend to revisit all of that here
once more. Our focus is elsewhere. That said it’s probably best to include the example below:

Example 3.2.6. Suppose $f : \mathbb{R}^3 \to \mathbb{R}$ then $\nabla f = [\partial_x f, \partial_y f, \partial_z f]^T$ and we can write the directional derivative in terms of

$$D_v f = [\partial_x f, \partial_y f, \partial_z f] \, v = \nabla f \cdot v$$

If we insist that $||v|| = 1$ then we recover the standard directional derivative we discussed in calculus III. Naturally $||\nabla f(a)||$ yields the maximum value for the directional derivative at $a$ if we
limit the inputs to vectors of unit-length. If we did not limit the vectors to unit length then the
directional derivative at 𝑎 can become arbitrarily large as 𝐷𝑣 𝑓 (𝑎) is proportional to the magnitude
of 𝑣. Since our primary motivation in calculus III was describing rates of change along certain
directions for some multivariate function it made sense to specialize the directional derivative to
vectors of unit-length. The definition used in these notes better serves the theoretical discussion. If
you read my calculus III notes you’ll find a derivation of how the directional derivative in Stewart’s
calculus arises from the general definition of the derivative as a linear mapping. Look up page 305g.
Incidentally, those notes may well be better than these in certain respects.
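
The two claims in Example 3.2.6 about unit and non-unit vectors can be checked on a concrete function. The sketch below is illustrative only: the test function f(x, y, z) = x²y + z, the point `a` and the angular sampling grid are assumptions of mine, chosen so the gradient is easy to write by hand.

```python
import math

def grad_f(x, y, z):
    # gradient of the hypothetical test function f(x, y, z) = x^2 y + z
    return (2 * x * y, x ** 2, 1.0)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = (1.0, 2.0, 3.0)
g = grad_f(*a)
g_norm = math.sqrt(dot(g, g))

# D_v f(a) = grad f(a) . v sampled over unit vectors given by spherical angles
samples = []
for i in range(0, 181, 5):
    for j in range(0, 360, 5):
        phi, theta = math.radians(i), math.radians(j)
        v = (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))
        samples.append(dot(g, v))

unit_gradient = tuple(gi / g_norm for gi in g)
best_unit = dot(g, unit_gradient)                       # value along grad/||grad||
scaled = dot(g, tuple(10 * ui for ui in unit_gradient)) # same direction, length 10
print(max(samples), best_unit, g_norm, scaled)
```

No sampled unit vector should beat the value along the normalized gradient, which equals ||∇f(a)||, while stretching `v` to length 10 multiplies the directional derivative by 10.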

3
this is why I have yet to give an example in this section, you should get out your calculus III notes if you need a
refresher on directional derivatives

Proposition 3.2.7.
If 𝐹 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 is differentiable at 𝑎 ∈ 𝑈 then the differential 𝑑𝐹𝑎 has derivative
matrix 𝐹 ′ (𝑎) and it has components which are expressed in terms of partial derivatives of
the component functions:
[𝑑𝐹𝑎 ]𝑖𝑗 = ∂𝑗 𝐹𝑖
for 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛.
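Perhaps it is helpful to expand the derivative matrix explicitly for future reference:

$$F'(a) = \begin{bmatrix} \partial_1 F_1(a) & \partial_2 F_1(a) & \cdots & \partial_n F_1(a) \\ \partial_1 F_2(a) & \partial_2 F_2(a) & \cdots & \partial_n F_2(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_1 F_m(a) & \partial_2 F_m(a) & \cdots & \partial_n F_m(a) \end{bmatrix}$$

Let’s write the operation of the differential for a differentiable mapping at some point $a \in \mathbb{R}^n$ in terms of the explicit matrix multiplication by $F'(a)$. Let $v = (v_1, v_2, \dots, v_n) \in \mathbb{R}^n$,

$$dF_a(v) = F'(a)v = \begin{bmatrix} \partial_1 F_1(a) & \partial_2 F_1(a) & \cdots & \partial_n F_1(a) \\ \partial_1 F_2(a) & \partial_2 F_2(a) & \cdots & \partial_n F_2(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_1 F_m(a) & \partial_2 F_m(a) & \cdots & \partial_n F_m(a) \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

You may recall the notation from calculus III at this point, omitting the $a$-dependence,

$$\nabla F_j = grad(F_j) = \big[ \partial_1 F_j, \partial_2 F_j, \cdots, \partial_n F_j \big]^T$$

So if the derivative exists we can write it in terms of a stack of gradient vectors of the component functions: (I used a transpose to write the stack side-ways),

$$F' = \big[ \nabla F_1 \,|\, \nabla F_2 \,|\, \cdots \,|\, \nabla F_m \big]^T$$

Finally, just to collect everything together,

$$F' = \begin{bmatrix} \partial_1 F_1 & \partial_2 F_1 & \cdots & \partial_n F_1 \\ \partial_1 F_2 & \partial_2 F_2 & \cdots & \partial_n F_2 \\ \vdots & \vdots & \ddots & \vdots \\ \partial_1 F_m & \partial_2 F_m & \cdots & \partial_n F_m \end{bmatrix} = \big[ \partial_1 F \,|\, \partial_2 F \,|\, \cdots \,|\, \partial_n F \big] = \begin{bmatrix} (\nabla F_1)^T \\ (\nabla F_2)^T \\ \vdots \\ (\nabla F_m)^T \end{bmatrix}$$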

Example 3.2.8. Recall that in Example 3.1.5 we showed that $F : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $F(x,y) = (xy, x^2, x+3y)$ for all $(x,y) \in \mathbb{R}^2$ was differentiable. In fact we calculated that

$$dF_{(x,y)}(h,k) = \begin{bmatrix} y & x \\ 2x & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix}.$$

If you recall from calculus III the mechanics of partial differentiation it’s simple to see that

$$\frac{\partial F}{\partial x} = \frac{\partial}{\partial x}(xy, x^2, x+3y) = (y, 2x, 1) = \begin{bmatrix} y \\ 2x \\ 1 \end{bmatrix}$$

$$\frac{\partial F}{\partial y} = \frac{\partial}{\partial y}(xy, x^2, x+3y) = (x, 0, 3) = \begin{bmatrix} x \\ 0 \\ 3 \end{bmatrix}$$

Thus $[dF] = [\partial_x F \,|\, \partial_y F]$ (as we expect given the derivations in this section!)
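
The recipe "columns of the Jacobian are the partial derivatives" translates directly into a finite-difference approximation. The sketch below is an assumption-laden illustration rather than a method from these notes: `numerical_jacobian` is a hypothetical helper that uses central differences with a fixed step, and the evaluation point (2, 5) is arbitrary.

```python
def F(p):
    # mapping of Examples 3.1.5 and 3.2.8
    x, y = p
    return (x * y, x ** 2, x + 3 * y)

def numerical_jacobian(F, a, h=1e-6):
    # column j approximates the partial derivative of F in the e_j direction at a
    n = len(a)
    cols = []
    for j in range(n):
        up = list(a)
        up[j] += h
        dn = list(a)
        dn[j] -= h
        cols.append([(u - d) / (2 * h) for u, d in zip(F(up), F(dn))])
    m = len(cols[0])
    # transpose: rows index the components F_i, columns index the variables x_j
    return [[cols[j][i] for j in range(n)] for i in range(m)]

J = numerical_jacobian(F, (2.0, 5.0))
for row in J:
    print([round(entry, 6) for entry in row])
```

At (x, y) = (2, 5) the rows should come out close to (y, x) = (5, 2), (2x, 0) = (4, 0) and (1, 3), matching the matrix of the differential above.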

3.2.2 continuously differentiable, a cautionary tale


We have noted that differentiability on some set 𝑈 implies all sorts of nice formulas in terms of
the partial derivatives. Curiously the converse is not quite so simple. It is possible for the partial
derivatives to exist on some set and yet the mapping may fail to be differentiable. We need an extra
topological condition on the partial derivatives if we are to avoid certain pathological4 examples.

Example 3.2.9. I found this example in Hubbard’s advanced calculus text (see Ex. 1.9.4, pg. 123). It is a source of endless odd examples, notation and bizarre quotes. Let $f(0) = 0$ and

$$f(x) = \frac{x}{2} + x^2 \sin\frac{1}{x}$$

for all $x \neq 0$. It can be shown that the derivative $f'(0) = 1/2$. Moreover, we can show that $f'(x)$ exists for all $x \neq 0$, we can calculate:

$$f'(x) = \frac{1}{2} + 2x \sin\frac{1}{x} - \cos\frac{1}{x}$$

Notice that $dom(f') = \mathbb{R}$. Note then that the tangent line at $(0,0)$ is $y = x/2$.

4
”pathological” as in, ”your clothes are so pathological, where’d you get them?”

You might be tempted to say then that this function is increasing at a rate of 1/2 for 𝑥 near zero.
But this claim would be false since you can see that 𝑓 ′ (𝑥) oscillates wildly without end near zero.
We have a tangent line at (0, 0) with positive slope for a function which is not increasing at (0, 0)
(recall that increasing is a concept we must define on an open interval to be careful). This sort of
thing cannot happen if the derivative is continuous near the point in question.
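
To see the oscillation of the derivative concretely, one can evaluate the formula for f′(x) from Example 3.2.9 at points where cos(1/x) equals +1 or −1. The sketch below is just an illustration; the sample points are my own choice, and floating point only approximates them.

```python
import math

def f_prime(x):
    # derivative of f(x) = x/2 + x^2 sin(1/x), valid for x != 0
    return 0.5 + 2 * x * math.sin(1 / x) - math.cos(1 / x)

for k in (10, 100, 1000, 10000):
    x_neg = 1 / (2 * math.pi * k)        # cos(1/x) = 1 here, so f'(x) is near -1/2
    x_pos = 1 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1 here, so f'(x) is near 3/2
    print(x_neg, f_prime(x_neg), x_pos, f_prime(x_pos))
```

However small the interval around zero, it contains points of both kinds, so the derivative takes negative values arbitrarily close to a point where the tangent line has slope 1/2.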
The one-dimensional case is really quite special, even though we had discontinuity of the derivative
we still had a well-defined tangent line to the point. However, many interesting theorems in calculus
of one-variable require the function to be continuously differentiable near the point of interest. For
example, to apply the 2nd-derivative test we need to find a point where the first derivative is zero
and the second derivative exists. We cannot hope to compute 𝑓 ′′ (𝑥𝑜 ) unless 𝑓 ′ is continuous at 𝑥𝑜 .
The next example is sick.
Example 3.2.10. Let us define $f(0,0) = 0$ and

$$f(x,y) = \frac{x^2 y}{x^2 + y^2}$$

for all $(x,y) \neq (0,0)$ in $\mathbb{R}^2$. It can be shown that $f$ is continuous at $(0,0)$. Moreover, since $f(x,0) = f(0,y) = 0$ for all $x$ and all $y$ it follows that $f$ vanishes identically along the coordinate axes. Thus the rate of change in the $e_1$ or $e_2$ directions is zero. We can calculate that

$$\frac{\partial f}{\partial x} = \frac{2xy^3}{(x^2+y^2)^2} \qquad \text{and} \qquad \frac{\partial f}{\partial y} = \frac{x^4 - x^2y^2}{(x^2+y^2)^2}$$

Consider the path to the origin $t \mapsto (t,t)$, which gives $f_x(t,t) = 2t^4/(t^2+t^2)^2 = 1/2$; hence $f_x(x,y) \to 1/2$ along the path $t \mapsto (t,t)$, but $f_x(0,0) = 0$, hence the partial derivative $f_x$ is not continuous at $(0,0)$. In this example, the discontinuity of the partial derivatives makes the tangent plane fail to exist.
XXX— need to include graph of this thing.
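
One way to see why the tangent plane fails in Example 3.2.10 is to compare directional derivatives at the origin. If $f$ were differentiable at $(0,0)$, Propositions 3.2.3 and 3.2.5 would force $D_{(1,1)}f(0,0) = D_{e_1}f(0,0) + D_{e_2}f(0,0) = 0$. The sketch below evaluates the difference quotients directly; the helper name `directional_at_origin` and the step size are my own assumptions.

```python
def f(x, y):
    # the function of Example 3.2.10, with f(0, 0) = 0
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def directional_at_origin(v, h=1e-6):
    # difference quotient (f(0 + hv) - f(0)) / h from Definition 3.2.1
    return (f(h * v[0], h * v[1]) - f(0.0, 0.0)) / h

D_e1 = directional_at_origin((1.0, 0.0))
D_e2 = directional_at_origin((0.0, 1.0))
D_diag = directional_at_origin((1.0, 1.0))
print(D_e1, D_e2, D_diag)
```

Since $f(tv)/t = v_1^2 v_2/(v_1^2 + v_2^2)$ for every $t \neq 0$, the quotient along $(1,1)$ is 1/2 no matter how small the step, while both coordinate directions give zero. The directional derivatives all exist but do not depend linearly on $v$, so no differential can exist at the origin.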

Definition 3.2.11.
A mapping 𝐹 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 is continuously differentiable at 𝑎 ∈ 𝑈 iff the partial
derivative mappings 𝐷𝑗 𝐹 exist on an open set containing 𝑎 and are continuous at 𝑎.
The definition above is interesting because of the proposition below. The import of the proposition
is that we can build the tangent plane from the Jacobian matrix provided the partial derivatives
are all continuous. This is a very nice result because the concept of the linear mapping is quite
abstract but partial differentiation of a given mapping is easy.
Proposition 3.2.12.
If 𝐹 is continuously differentiable at 𝑎 then 𝐹 is differentiable at 𝑎

We’ll follow the proof in Edwards on pages 72-73.


- need to include the continuously diff. discuss here then do the examples.
- need to eliminate discussion of directional derivative except where absolutely necessary
- think about where to define the differential as a form. Consider peppering ”differential at a point”
many places...

3.2.3 gallery of derivatives


Our goal here is simply to exhibit the Jacobian matrix and partial derivatives for a few mappings. At the base of all these calculations is the observation that partial differentiation is just ordinary differentiation where we treat all the independent variables not being differentiated as constants. The criterion of independence is important. We’ll study the case where the variables are not independent in a later section.
Remark 3.2.13.
I have put remarks about the rank of the derivative in the examples below. Of course this
has nothing to do with the process of calculating Jacobians. It’s something to think about
once we master the process of calculating the Jacobian. Ignore the red comments for now
if you wish

Example 3.2.14. Let $f(t) = (t, t^2, t^3)$ then $f'(t) = (1, 2t, 3t^2)$. In this case we have

$$f'(t) = [df_t] = \begin{bmatrix} 1 \\ 2t \\ 3t^2 \end{bmatrix}$$
The Jacobian here is a single column vector. It has rank 1 provided the vector is nonzero. We
see that 𝑓 ′ (𝑡) ∕= (0, 0, 0) for all 𝑡 ∈ ℝ. This corresponds to the fact that this space curve has a
well-defined tangent line for each point on the path.

Example 3.2.15. Let $f(\vec{x}, \vec{y}) = \vec{x} \cdot \vec{y}$ be a mapping from $\mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$. I’ll denote the coordinates in the domain by $(x_1, x_2, x_3, y_1, y_2, y_3)$ thus $f(\vec{x}, \vec{y}) = x_1y_1 + x_2y_2 + x_3y_3$. Calculate,

$$[df_{(\vec{x},\vec{y})}] = \nabla f(\vec{x}, \vec{y})^T = [y_1, y_2, y_3, x_1, x_2, x_3]$$

The Jacobian here is a single row vector, so its rank is at most 1. It has rank 1 provided at least one entry of the input vectors is nonzero.

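Example 3.2.16. Let $f(\vec{x}, \vec{y}) = \vec{x} \cdot \vec{y}$ be a mapping from $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. I’ll denote the coordinates in the domain by $(x_1, \dots, x_n, y_1, \dots, y_n)$ thus $f(\vec{x}, \vec{y}) = \sum_{i=1}^{n} x_i y_i$. Calculate,

$$\frac{\partial}{\partial x_j} \left[ \sum_{i=1}^{n} x_i y_i \right] = \sum_{i=1}^{n} \frac{\partial x_i}{\partial x_j} y_i = \sum_{i=1}^{n} \delta_{ij} y_i = y_j$$

Likewise,

$$\frac{\partial}{\partial y_j} \left[ \sum_{i=1}^{n} x_i y_i \right] = \sum_{i=1}^{n} x_i \frac{\partial y_i}{\partial y_j} = \sum_{i=1}^{n} x_i \delta_{ij} = x_j$$

Therefore, noting that $\nabla f = (\partial_{x_1} f, \dots, \partial_{x_n} f, \partial_{y_1} f, \dots, \partial_{y_n} f)$,

$$[df_{(\vec{x},\vec{y})}]^T = (\nabla f)(\vec{x}, \vec{y}) = \vec{y} \times \vec{x} = (y_1, \dots, y_n, x_1, \dots, x_n)$$

The Jacobian here is a single row vector, so its rank is at most 1. It has rank 1 provided at least one entry of the input vectors is nonzero.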

Example 3.2.17. Suppose $F(x,y,z) = (xyz, y, z)$, we calculate,

$$\frac{\partial F}{\partial x} = (yz, 0, 0) \qquad \frac{\partial F}{\partial y} = (xz, 1, 0) \qquad \frac{\partial F}{\partial z} = (xy, 0, 1)$$

Remember these are actually column vectors in my sneaky notation; $(v_1, \dots, v_n) = [v_1, \dots, v_n]^T$. This means the derivative or Jacobian matrix of $F$ at $(x,y,z)$ is

$$F'(x,y,z) = [dF_{(x,y,z)}] = \begin{bmatrix} yz & xz & xy \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Note, $rank(F'(x,y,z)) = 3$ for all $(x,y,z) \in \mathbb{R}^3$ such that $y, z \neq 0$. There are a variety of ways to see that claim, one way is to observe $det[F'(x,y,z)] = yz$ and this determinant is nonzero so long as neither $y$ nor $z$ is zero. In linear algebra we learn that a square matrix is invertible iff it has nonzero determinant iff it has linearly independent column vectors.

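Example 3.2.18. Suppose $F(x,y,z) = (x^2 + z^2, yz)$, we calculate,

$$\frac{\partial F}{\partial x} = (2x, 0) \qquad \frac{\partial F}{\partial y} = (0, z) \qquad \frac{\partial F}{\partial z} = (2z, y)$$

The derivative is a $2 \times 3$ matrix in this example,

$$F'(x,y,z) = [dF_{(x,y,z)}] = \begin{bmatrix} 2x & 0 & 2z \\ 0 & z & y \end{bmatrix}$$

The maximum rank for $F'$ is 2 at a particular point $(x,y,z)$ because there are at most two linearly independent vectors in $\mathbb{R}^2$. You can consider the three square submatrices to analyze the rank for a given point. If the determinant of any one of these is nonzero then the rank (dimension of the column space) is two.

$$M_1 = \begin{bmatrix} 2x & 0 \\ 0 & z \end{bmatrix} \qquad M_2 = \begin{bmatrix} 2x & 2z \\ 0 & y \end{bmatrix} \qquad M_3 = \begin{bmatrix} 0 & 2z \\ z & y \end{bmatrix}$$

We’ll need either $det(M_1) = 2xz \neq 0$ or $det(M_2) = 2xy \neq 0$ or $det(M_3) = -2z^2 \neq 0$. All three of these fail simultaneously exactly when $z = 0$ and $xy = 0$, that is, at the points of the $x$-axis and the $y$-axis. For instance, $F'(1,0,0)$ has a zero second row and hence rank 1. This mapping has maximal rank at all points which are not on the $x$-axis or the $y$-axis.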
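
The submatrix test of Example 3.2.18 is mechanical enough to script. The sketch below is a small illustration under my own naming (`jacobian`, `minors`, `has_rank_two` are hypothetical helpers) and a handful of sample points, including some on the x- and y-axes where all three determinants vanish.

```python
def jacobian(x, y, z):
    # F'(x, y, z) for F(x, y, z) = (x^2 + z^2, yz)
    return [[2 * x, 0.0, 2 * z],
            [0.0, z, y]]

def minors(J):
    # determinants of the three 2x2 submatrices M1, M2, M3 built from column pairs
    pairs = [(0, 1), (0, 2), (1, 2)]
    return [J[0][i] * J[1][j] - J[0][j] * J[1][i] for i, j in pairs]

def has_rank_two(J):
    # a 2x3 matrix has rank 2 iff at least one 2x2 minor is nonzero
    return any(d != 0 for d in minors(J))

points = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 0.0)]
for point in points:
    J = jacobian(*point)
    print(point, minors(J), has_rank_two(J))
```

Exact comparison with zero is fine for these hand-picked inputs; for general floating point data a tolerance would be the safer design.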

Example 3.2.19. Suppose $F(x,y) = (x^2 + y^2, xy, x + y)$, we calculate,

$$\frac{\partial F}{\partial x} = (2x, y, 1) \qquad \frac{\partial F}{\partial y} = (2y, x, 1)$$

The derivative is a $3 \times 2$ matrix in this example,

$$F'(x,y) = [dF_{(x,y)}] = \begin{bmatrix} 2x & 2y \\ y & x \\ 1 & 1 \end{bmatrix}$$

The maximum rank is again 2, this time because we only have two columns. The rank will be two if the columns are not linearly dependent. We can analyze the question of rank a number of ways but I find determinants of submatrices a comforting tool in these sort of questions. If the columns are linearly dependent then the determinants of all three square submatrices of $F'$ will be zero. Conversely, if even one of them is nonvanishing then it follows the columns must be linearly independent. The submatrices for this problem are:

$$M_1 = \begin{bmatrix} 2x & 2y \\ y & x \end{bmatrix} \qquad M_2 = \begin{bmatrix} 2x & 2y \\ 1 & 1 \end{bmatrix} \qquad M_3 = \begin{bmatrix} y & x \\ 1 & 1 \end{bmatrix}$$

You can see $det(M_1) = 2(x^2 - y^2)$, $det(M_2) = 2(x - y)$ and $det(M_3) = y - x$. Apparently we have $rank(F'(x,y)) = 2$ for all $(x,y) \in \mathbb{R}^2$ with $y \neq x$. In retrospect this is not surprising.

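Example 3.2.20. Suppose $P(x,v,m) = (P_o, P_1) = (\frac{1}{2}mv^2 + \frac{1}{2}kx^2, \, mv)$ for some constant $k$. Let’s calculate the derivative via gradients this time,

$$\nabla P_o = (\partial P_o/\partial x, \, \partial P_o/\partial v, \, \partial P_o/\partial m) = (kx, \, mv, \, \tfrac{1}{2}v^2)$$

$$\nabla P_1 = (\partial P_1/\partial x, \, \partial P_1/\partial v, \, \partial P_1/\partial m) = (0, \, m, \, v)$$

Therefore,

$$P'(x,v,m) = \begin{bmatrix} kx & mv & \frac{1}{2}v^2 \\ 0 & m & v \end{bmatrix}$$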

Example 3.2.21. Let 𝐹 (𝑟, 𝜃) = (𝑟 cos 𝜃, 𝑟 sin 𝜃). We calculate,

∂𝑟 𝐹 = (cos 𝜃, sin 𝜃) and ∂𝜃 𝐹 = (−𝑟 sin 𝜃, 𝑟 cos 𝜃)

Hence,

$$F'(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$$
We calculate 𝑑𝑒𝑡(𝐹 ′ (𝑟, 𝜃)) = 𝑟 thus this mapping has full rank everywhere except the origin.

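Example 3.2.22. Let $G(x,y) = (\sqrt{x^2+y^2}, \, \tan^{-1}(y/x))$. We calculate,

$$\partial_x G = \left( \frac{x}{\sqrt{x^2+y^2}}, \; \frac{-y}{x^2+y^2} \right) \qquad \text{and} \qquad \partial_y G = \left( \frac{y}{\sqrt{x^2+y^2}}, \; \frac{x}{x^2+y^2} \right)$$

Hence,

$$G'(x,y) = \begin{bmatrix} \frac{x}{\sqrt{x^2+y^2}} & \frac{y}{\sqrt{x^2+y^2}} \\ \frac{-y}{x^2+y^2} & \frac{x}{x^2+y^2} \end{bmatrix} = \begin{bmatrix} \frac{x}{r} & \frac{y}{r} \\ \frac{-y}{r^2} & \frac{x}{r^2} \end{bmatrix} \qquad \big( \text{using } r = \sqrt{x^2+y^2} \big)$$

We calculate $det(G'(x,y)) = 1/r$ thus this mapping has full rank everywhere except the origin.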

Example 3.2.23. Let $F(x,y) = (x, y, \sqrt{R^2 - x^2 - y^2})$ for a constant $R$. We calculate,

$$\nabla \sqrt{R^2 - x^2 - y^2} = \left( \frac{-x}{\sqrt{R^2 - x^2 - y^2}}, \; \frac{-y}{\sqrt{R^2 - x^2 - y^2}} \right)$$

Also, $\nabla x = (1, 0)$ and $\nabla y = (0, 1)$ thus

$$F'(x,y) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \frac{-x}{\sqrt{R^2 - x^2 - y^2}} & \frac{-y}{\sqrt{R^2 - x^2 - y^2}} \end{bmatrix}$$

This matrix clearly has rank 2 where it is well-defined. Note that we need $R^2 - x^2 - y^2 > 0$ for the derivative to exist. Moreover, we could define $G(y,z) = (\sqrt{R^2 - y^2 - z^2}, y, z)$ and calculate,

$$G'(y,z) = \begin{bmatrix} \frac{-y}{\sqrt{R^2 - y^2 - z^2}} & \frac{-z}{\sqrt{R^2 - y^2 - z^2}} \\ 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Observe that $G'(y,z)$ exists when $R^2 - y^2 - z^2 > 0$. Geometrically, $F$ parametrizes the sphere above the equator at $z = 0$ whereas $G$ parametrizes the half of the sphere with $x > 0$. These parametrizations overlap on the part of the sphere where both $x$ and $z$ are positive. In terms of the parameters of $F$, the overlap corresponds to $\{(x,y) \in \mathbb{R}^2 \mid x > 0 \text{ and } x^2 + y^2 < R^2\}$.


Example 3.2.24. Let $F(x,y,z) = (x, y, z, \sqrt{R^2 - x^2 - y^2 - z^2})$ for a constant $R$. We calculate,

$$\nabla \sqrt{R^2 - x^2 - y^2 - z^2} = \left( \frac{-x}{\sqrt{R^2 - x^2 - y^2 - z^2}}, \; \frac{-y}{\sqrt{R^2 - x^2 - y^2 - z^2}}, \; \frac{-z}{\sqrt{R^2 - x^2 - y^2 - z^2}} \right)$$

Also, $\nabla x = (1, 0, 0)$, $\nabla y = (0, 1, 0)$ and $\nabla z = (0, 0, 1)$ thus

$$F'(x,y,z) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \frac{-x}{\sqrt{R^2 - x^2 - y^2 - z^2}} & \frac{-y}{\sqrt{R^2 - x^2 - y^2 - z^2}} & \frac{-z}{\sqrt{R^2 - x^2 - y^2 - z^2}} \end{bmatrix}$$

This matrix clearly has rank 3 where it is well-defined. Note that we need $R^2 - x^2 - y^2 - z^2 > 0$ for the derivative to exist. This mapping gives us a parametrization of the 3-sphere $x^2 + y^2 + z^2 + w^2 = R^2$ for $w > 0$. (drawing this is a little trickier)

Example 3.2.25. Let $f(x,y,z) = (x + y, y + z, x + z, xyz)$. You can calculate,

$$[df_{(x,y,z)}] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ yz & xz & xy \end{bmatrix}$$
This matrix clearly has rank 3 and is well-defined for all of ℝ3 .

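Example 3.2.26. Let $f(x,y,z) = xyz$. You can calculate,

$$[df_{(x,y,z)}] = \begin{bmatrix} yz & xz & xy \end{bmatrix}$$

This matrix is a single row, so its rank is at most 1. It fails to have rank 1 only where $yz = xz = xy = 0$, which happens exactly when at least two of $x$, $y$, $z$ are zero. In other words, $f'(x,y,z)$ has maximal rank in $\mathbb{R}^3$ provided we are at a point which is not on a coordinate axis. (a point lies on a coordinate axis when it lies on two of the coordinate planes; the coordinate planes are $x = 0$, $y = 0$ and $z = 0$ for the $yz$, $zx$ and $xy$ coordinate planes respectively)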

Example 3.2.27. Let $f(x,y,z) = (xyz, 1 - x - y)$. You can calculate,

$$[df_{(x,y,z)}] = \begin{bmatrix} yz & xz & xy \\ -1 & -1 & 0 \end{bmatrix}$$

This matrix has rank 2 if either $xy \neq 0$ or $(x - y)z \neq 0$. In particular, the derivative has maximal rank at certain points of the coordinate planes. For example, $f'(1,1,0)$ and $f'(0,1,1)$ both give $rank(f') = 2$.

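Example 3.2.28. Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $f(x) = x \times v$ for a fixed vector $v \neq 0$. We denote $x = (x_1, x_2, x_3)$ and calculate,

$$\frac{\partial}{\partial x_a}(x \times v) = \frac{\partial}{\partial x_a} \Big( \sum_{i,j,k} \epsilon_{ijk} x_i v_j e_k \Big) = \sum_{i,j,k} \epsilon_{ijk} \frac{\partial x_i}{\partial x_a} v_j e_k = \sum_{i,j,k} \epsilon_{ijk} \delta_{ia} v_j e_k = \sum_{j,k} \epsilon_{ajk} v_j e_k$$

It follows,

$$\frac{\partial}{\partial x_1}(x \times v) = \sum_{j,k} \epsilon_{1jk} v_j e_k = v_2 e_3 - v_3 e_2 = (0, -v_3, v_2)$$

$$\frac{\partial}{\partial x_2}(x \times v) = \sum_{j,k} \epsilon_{2jk} v_j e_k = v_3 e_1 - v_1 e_3 = (v_3, 0, -v_1)$$

$$\frac{\partial}{\partial x_3}(x \times v) = \sum_{j,k} \epsilon_{3jk} v_j e_k = v_1 e_2 - v_2 e_1 = (-v_2, v_1, 0)$$

Placing these partial derivatives in the columns, the Jacobian is simply,

$$[df_x] = \begin{bmatrix} 0 & v_3 & -v_2 \\ -v_3 & 0 & v_1 \\ v_2 & -v_1 & 0 \end{bmatrix}$$

In fact, $df_p(h) = f(h) = h \times v$ for each $p \in \mathbb{R}^3$. The given mapping is linear so the differential of the mapping is precisely the mapping itself.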
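
The claim that the Jacobian of $x \mapsto x \times v$ reproduces the map itself can be checked by multiplying the matrix against a vector. The sketch below is illustrative; the vectors `v` and `h` are arbitrary sample values, and the function names are my own.

```python
def cross(a, b):
    # standard cross product in R^3
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def jacobian_cross(v):
    # columns are the partial derivatives of x -> x cross v with respect to x1, x2, x3
    v1, v2, v3 = v
    return [[0.0, v3, -v2],
            [-v3, 0.0, v1],
            [v2, -v1, 0.0]]

def apply(M, h):
    # matrix-vector product M h
    return tuple(sum(M[i][j] * h[j] for j in range(3)) for i in range(3))

v = (1.0, -2.0, 0.5)
h = (3.0, 4.0, -1.0)
print(apply(jacobian_cross(v), h), cross(h, v))
```

Both printed vectors should coincide, which is the statement $df_p(h) = h \times v$ in matrix form. Note the matrix is antisymmetric, a useful quick check on the signs.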

Example 3.2.29. Let $f(x,y) = (x, y, 1 - x - y)$. You can calculate,

$$[df_{(x,y)}] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}$$
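Example 3.2.30. Let $X(u,v) = (x, y, z)$ where $x, y, z$ denote functions of $u, v$ and I prefer to omit the explicit dependence to reduce clutter in the equations to follow.

$$\frac{\partial X}{\partial u} = X_u = (x_u, y_u, z_u) \qquad \text{and} \qquad \frac{\partial X}{\partial v} = X_v = (x_v, y_v, z_v)$$

Then the Jacobian is the $3 \times 2$ matrix

$$\big[ dX_{(u,v)} \big] = \begin{bmatrix} x_u & x_v \\ y_u & y_v \\ z_u & z_v \end{bmatrix}$$

The matrix $\big[ dX_{(u,v)} \big]$ has rank 2 if at least one of the determinants below is nonzero,

$$det \begin{bmatrix} x_u & x_v \\ y_u & y_v \end{bmatrix} \qquad det \begin{bmatrix} x_u & x_v \\ z_u & z_v \end{bmatrix} \qquad det \begin{bmatrix} y_u & y_v \\ z_u & z_v \end{bmatrix}$$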

Example 3.2.31. . .

Example 3.2.32. . .

3.3 additivity and homogeneity of the derivative


Suppose $F_1 : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and $F_2 : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$. Furthermore, suppose both of these are differentiable at $a \in U$. It follows that $(dF_1)_a = L_1$ and $(dF_2)_a = L_2$ are linear operators from $\mathbb{R}^n$ to $\mathbb{R}^m$ which approximate the change in $F_1$ and $F_2$ near $a$, in particular,

$$\lim_{h \to 0} \frac{F_1(a+h) - F_1(a) - L_1(h)}{||h||} = 0 \qquad \lim_{h \to 0} \frac{F_2(a+h) - F_2(a) - L_2(h)}{||h||} = 0$$

To prove that $H = F_1 + F_2$ is differentiable at $a$ we need to find a differential at $a$ for $H$. Naturally, we expect $dH_a = d(F_1 + F_2)_a = (dF_1)_a + (dF_2)_a$. Let $L = (dF_1)_a + (dF_2)_a$ and consider,

$$\begin{aligned} \lim_{h \to 0} \frac{H(a+h) - H(a) - L(h)}{||h||} &= \lim_{h \to 0} \frac{F_1(a+h) + F_2(a+h) - F_1(a) - F_2(a) - L_1(h) - L_2(h)}{||h||} \\ &= \lim_{h \to 0} \frac{F_1(a+h) - F_1(a) - L_1(h)}{||h||} + \lim_{h \to 0} \frac{F_2(a+h) - F_2(a) - L_2(h)}{||h||} \\ &= 0 + 0 \\ &= 0 \end{aligned}$$

Note that breaking up the limit was legal because we knew the subsequent limits existed and
were zero by the assumption of differentiability of 𝐹1 and 𝐹2 at 𝑎. Finally, since 𝐿 = 𝐿1 + 𝐿2 we
know 𝐿 is a linear transformation since the sum of linear transformations is a linear transformation.
Moreover, the matrix of 𝐿 is the sum of the matrices for 𝐿1 and 𝐿2 . Let 𝑐 ∈ ℝ and suppose 𝐺 = 𝑐𝐹1
then we can also show that 𝑑𝐺𝑎 = 𝑑(𝑐𝐹1 )𝑎 = 𝑐(𝑑𝐹1 )𝑎 , the calculation is very similar except we just
pull the constant 𝑐 out of the limit. I’ll let you write it out. Collecting our observations:

Proposition 3.3.1.

Suppose 𝐹1 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 and 𝐹2 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 are differentiable at 𝑎 ∈ 𝑈 then


𝐹1 + 𝐹2 is differentiable at 𝑎 and

𝑑(𝐹1 + 𝐹2 )𝑎 = (𝑑𝐹1 )𝑎 + (𝑑𝐹2 )𝑎 or (𝐹1 + 𝐹2 )′ (𝑎) = 𝐹1′ (𝑎) + 𝐹2′ (𝑎)

Likewise, if 𝑐 ∈ ℝ then

𝑑(𝑐𝐹1 )𝑎 = 𝑐(𝑑𝐹1 )𝑎 or (𝑐𝐹1 )′ (𝑎) = 𝑐(𝐹1′ (𝑎))

These results suggest that the differential of a function is a new object which has a vector space
structure. There is much more to say here later.
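
The homogeneity calculation left to the reader in the discussion before Proposition 3.3.1 can be sketched as follows. This is one way to write it out, reusing the symbols of the additivity argument: 𝐿1 = (𝑑𝐹1 )𝑎 , 𝐺 = 𝑐𝐹1 , and the candidate differential is 𝑐𝐿1 .

```latex
\begin{aligned}
\lim_{h \to 0} \frac{G(a+h) - G(a) - cL_1(h)}{||h||}
&= \lim_{h \to 0} \frac{cF_1(a+h) - cF_1(a) - cL_1(h)}{||h||} \\
&= c \lim_{h \to 0} \frac{F_1(a+h) - F_1(a) - L_1(h)}{||h||} \\
&= c \cdot 0 = 0
\end{aligned}
```

Since a scalar multiple of a linear transformation is again linear, 𝑐𝐿1 qualifies as the differential of 𝑐𝐹1 at 𝑎.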

3.4 chain rule


The proof in Edwards is on pages 77-78. I’ll give a heuristic proof here which captures the essence of the
argument. The simplicity of this rule continues to amaze me.

Proposition 3.4.1.

If 𝐹 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑝 is differentiable at 𝑎 and 𝐺 : 𝑉 ⊆ ℝ𝑝 → ℝ𝑚 is differentiable at


𝐹 (𝑎) ∈ 𝑉 then 𝐺 ∘ 𝐹 is differentiable at 𝑎 and

𝑑(𝐺 ∘ 𝐹 )𝑎 = (𝑑𝐺)𝐹 (𝑎) ∘ 𝑑𝐹𝑎 or, in matrix notation, (𝐺 ∘ 𝐹 )′ (𝑎) = 𝐺′ (𝐹 (𝑎))𝐹 ′ (𝑎)

Proof Sketch:

In calculus III you may have learned how to calculate partial derivatives in terms of tree-diagrams
and intermediate variables, etc. We now have a way of understanding those rules and all the
other chain rules in terms of one over-arching calculation: matrix multiplication of the constituent
Jacobians in the composite function. Of course once we have this rule for the composite of two
functions we can generalize to 𝑛-functions by a simple induction argument. For example, for three
suitably defined mappings 𝐹, 𝐺, 𝐻,

(𝐹 ∘ 𝐺 ∘ 𝐻)′ (𝑎) = 𝐹 ′ (𝐺(𝐻(𝑎)))𝐺′ (𝐻(𝑎))𝐻 ′ (𝑎)
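
A quick numerical sanity check of Proposition 3.4.1 may help here. The sketch below approximates Jacobians by central differences and compares the Jacobian of 𝐺 ∘ 𝐹 with the matrix product 𝐺′(𝐹(𝑎))𝐹′(𝑎). The maps `F` and `G`, the base point `a`, and the helper names `jacobian` and `matmul` are all made up for illustration; nothing here comes from Edwards or from the notes.

```python
import math

def jacobian(func, point, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at point (list of rows)."""
    base = func(point)
    rows = [[0.0] * len(point) for _ in base]
    for j in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = func(forward), func(backward)
        for i in range(len(base)):
            rows[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return rows

def matmul(A, B):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical example maps: F : R^2 -> R^3 and G : R^3 -> R^2
def F(p):
    x, y = p
    return [x * y, math.sin(x), x + y * y]

def G(q):
    u, v, w = q
    return [u * w, math.exp(v) + u]

def G_after_F(p):
    return G(F(p))

a = [0.7, -1.2]
lhs = jacobian(G_after_F, a)                      # (G o F)'(a), a 2 x 2 matrix
rhs = matmul(jacobian(G, F(a)), jacobian(F, a))   # G'(F(a)) F'(a), (2 x 3)(3 x 2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # small, limited only by finite-difference error
```

Note the order of the factors: the 2 × 3 Jacobian of 𝐺 multiplies the 3 × 2 Jacobian of 𝐹 from the left, which is the only order in which the sizes match.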

Example 3.4.2. . .

Example 3.4.3. . .

Example 3.4.4. . .

Example 3.4.5. . .

Example 3.4.6. . .

Example 3.4.7. . .

3.5 product rules?


What sort of product can we expect to find among mappings? Remember two mappings have
vector outputs and there is no way to multiply vectors in general. Of course, in the case we have
two mappings that have equal-dimensional outputs we could take their dot-product. There is a
product rule for that case: if $\vec{A}, \vec{B} : \mathbb{R}^n \to \mathbb{R}^m$ then

\[ \partial_j(\vec{A} \cdot \vec{B}) = (\partial_j \vec{A}) \cdot \vec{B} + \vec{A} \cdot (\partial_j \vec{B}) \]

Or in the special case of 𝑚 = 3 we could even take their cross-product and there is another product
rule in that case:
\[ \partial_j(\vec{A} \times \vec{B}) = (\partial_j \vec{A}) \times \vec{B} + \vec{A} \times (\partial_j \vec{B}) \]

In what other cases can we "multiply" vectors? One very important case is ℝ2 = ℂ where it is
customary to use the notation (𝑥, 𝑦) = 𝑥 + 𝑖𝑦 and 𝑓 = 𝑢 + 𝑖𝑣. If our range is complex numbers
then we again have a product rule: if 𝑓 : ℝ𝑛 → ℂ and 𝑔 : ℝ𝑛 → ℂ then
\[ \partial_j(fg) = (\partial_j f)g + f(\partial_j g) \]
I have relegated the proof of these product rules to the end of this chapter. One other object worth
differentiating is a matrix-valued function of ℝ𝑛 . If we define the partial derivative of a matrix to
be the matrix of partial derivatives then partial differentiation will respect the sum and product of
matrices (we may return to this in depth if need be later on):
\[ \partial_j(A + B) = \partial_j A + \partial_j B \qquad \partial_j(AB) = (\partial_j A)B + A(\partial_j B) \]
Moral of this story? If you have a pair of mappings whose ranges allow some sort of product then it
is entirely likely that there is a corresponding product rule⁵.

3.5.1 scalar-vector product rule


There is one product rule which we can state for arbitrary mappings, note that we can always
sensibly multiply a mapping by a function. Suppose then that 𝐺 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 and 𝑓 : 𝑈 ⊆
ℝ𝑛 → ℝ are differentiable at 𝑎 ∈ 𝑈 . It follows that there exist linear transformations 𝐿𝐺 : ℝ𝑛 → ℝ𝑚
and 𝐿𝑓 : ℝ𝑛 → ℝ where
\[ \lim_{h \to 0} \frac{G(a+h) - G(a) - L_G(h)}{||h||} = 0 \qquad \lim_{h \to 0} \frac{f(a+h) - f(a) - L_f(h)}{||h||} = 0 \]
Since 𝐺(𝑎 + ℎ) ≈ 𝐺(𝑎) + 𝐿𝐺 (ℎ) and 𝑓 (𝑎 + ℎ) ≈ 𝑓 (𝑎) + 𝐿𝑓 (ℎ) we expect
\[
\begin{aligned}
(fG)(a+h) &\approx (f(a) + L_f(h))(G(a) + L_G(h)) \\
&= (fG)(a) + \underbrace{G(a)L_f(h) + f(a)L_G(h)}_{\text{linear in } h} + \underbrace{L_f(h)L_G(h)}_{\text{2nd order in } h}
\end{aligned}
\]

⁵ In my research I consider functions on supernumbers, these also can be multiplied. Naturally there is a product
rule for super functions, the catch is that supernumbers 𝑧, 𝑤 do not necessarily commute. However, if they’re
homogeneous then $zw = (-1)^{\epsilon_w \epsilon_z} wz$. Because of this the super product rule is $\partial_M(fg) = (\partial_M f)g + (-1)^{\epsilon_f \epsilon_M} f(\partial_M g)$.

Thus we propose: 𝐿(ℎ) = 𝐺(𝑎)𝐿𝑓 (ℎ) + 𝑓 (𝑎)𝐿𝐺 (ℎ) is the best linear approximation of 𝑓 𝐺.
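\[
\begin{aligned}
&\lim_{h \to 0} \frac{(fG)(a+h) - (fG)(a) - L(h)}{||h||} \\
&= \lim_{h \to 0} \frac{f(a+h)G(a+h) - f(a)G(a) - G(a)L_f(h) - f(a)L_G(h)}{||h||} \\
&= \lim_{h \to 0} \frac{f(a+h)G(a+h) - f(a)G(a) - G(a)L_f(h) - f(a)L_G(h)}{||h||} \\
&\quad + \lim_{h \to 0} \frac{f(a)G(a+h) - G(a+h)f(a)}{||h||} \\
&\quad + \lim_{h \to 0} \frac{f(a+h)G(a) - G(a)f(a+h)}{||h||} \\
&\quad + \lim_{h \to 0} \frac{f(a)G(a) - G(a)f(a)}{||h||} \\
&= \lim_{h \to 0} \left[ f(a)\frac{G(a+h) - G(a) - L_G(h)}{||h||} + \frac{f(a+h) - f(a) - L_f(h)}{||h||}G(a) + \big(f(a+h) - f(a)\big)\frac{G(a+h) - G(a)}{||h||} \right] \\
&= f(a)\left[ \lim_{h \to 0} \frac{G(a+h) - G(a) - L_G(h)}{||h||} \right] + \left[ \lim_{h \to 0} \frac{f(a+h) - f(a) - L_f(h)}{||h||} \right]G(a) + \lim_{h \to 0} \big(f(a+h) - f(a)\big)\frac{G(a+h) - G(a)}{||h||} \\
&= 0
\end{aligned}
\]
The three added limits in the second step are each zero, so nothing has changed; they only regroup the terms. The last limit in the final step vanishes because 𝑓 (𝑎 + ℎ) − 𝑓 (𝑎) → 0 by continuity of 𝑓 while (𝐺(𝑎 + ℎ) − 𝐺(𝑎))/∣∣ℎ∣∣ stays bounded near ℎ = 0 by differentiability of 𝐺.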

Where we have made use of the differentiability and the consequent continuity of both 𝑓 and 𝐺 at
𝑎. Furthermore, note

\[
\begin{aligned}
L(h + ck) &= G(a)L_f(h + ck) + f(a)L_G(h + ck) \\
&= G(a)(L_f(h) + cL_f(k)) + f(a)(L_G(h) + cL_G(k)) \\
&= G(a)L_f(h) + f(a)L_G(h) + c(G(a)L_f(k) + f(a)L_G(k)) \\
&= L(h) + cL(k)
\end{aligned}
\]

for all ℎ, 𝑘 ∈ ℝ𝑛 and 𝑐 ∈ ℝ hence 𝐿 = 𝐺(𝑎)𝐿𝑓 + 𝑓 (𝑎)𝐿𝐺 is a linear transformation. We have proved
(most of) the following proposition:
Proposition 3.5.1.
If 𝐺 : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 and 𝑓 : 𝑈 ⊆ ℝ𝑛 → ℝ are differentiable at 𝑎 ∈ 𝑈 then 𝑓 𝐺 is
differentiable at 𝑎 and

𝑑(𝑓 𝐺)𝑎 = (𝑑𝑓 )𝑎 𝐺(𝑎) + 𝑓 (𝑎)𝑑𝐺𝑎 or (𝑓 𝐺)′ (𝑎) = 𝑓 ′ (𝑎)𝐺(𝑎) + 𝑓 (𝑎)𝐺′ (𝑎)

The argument above covers the ordinary product rule and a host of other less common rules. Note
again that 𝐺(𝑎) and 𝐺′ (𝑎) are vectors.

3.5.2 calculus of paths in ℝ3


A path is a mapping from ℝ to ℝ𝑚 . We use such mappings to model position, velocity and
acceleration of particles in the case 𝑚 = 3. Some of these things were proved in previous sections
of this chapter but I intend for this section to be self-contained so that you can read it without
digging through the rest of this chapter.
Proposition 3.5.2.
If 𝐹, 𝐺 : 𝑈 ⊆ ℝ → ℝ𝑚 are differentiable vector-valued functions and 𝜙 : 𝑈 ⊆ ℝ → ℝ is a
differentiable real-valued function then for each 𝑡 ∈ 𝑈 ,

1. (𝐹 + 𝐺)′ (𝑡) = 𝐹 ′ (𝑡) + 𝐺′ (𝑡).

2. (𝑐𝐹 )′ (𝑡) = 𝑐𝐹 ′ (𝑡).

3. (𝜙𝐹 )′ (𝑡) = 𝜙′ (𝑡)𝐹 (𝑡) + 𝜙(𝑡)𝐹 ′ (𝑡).

4. (𝐹 ⋅ 𝐺)′ (𝑡) = 𝐹 ′ (𝑡) ⋅ 𝐺(𝑡) + 𝐹 (𝑡) ⋅ 𝐺′ (𝑡).

5. provided 𝑚 = 3, (𝐹 × 𝐺)′ (𝑡) = 𝐹 ′ (𝑡) × 𝐺(𝑡) + 𝐹 (𝑡) × 𝐺′ (𝑡).

6. provided 𝜙(𝑈 ) ⊂ 𝑑𝑜𝑚(𝐹 ′ ), (𝐹 ∘ 𝜙)′ (𝑡) = 𝜙′ (𝑡)𝐹 ′ (𝜙(𝑡)).

We have to insist that 𝑚 = 3 for the statement with cross-products since we only have a standard
cross-product in ℝ3 . We prepare for the proof of the proposition with a useful lemma. Notice this
lemma tells us how to actually calculate the derivative of paths in examples. The derivative of
component functions is nothing more than calculus I and one of our goals is to reduce things to
those sort of calculations whenever possible.
Lemma 3.5.3.
If 𝐹 : 𝑈 ⊆ ℝ → ℝ𝑚 is a differentiable vector-valued function then for all 𝑡 ∈ 𝑈 ,

\[ F'(t) = (F_1'(t), F_2'(t), \dots, F_m'(t)) \]

We are given that the following vector limit exists and is equal to 𝐹 ′ (𝑡),
\[ F'(t) = \lim_{h \to 0} \frac{F(t+h) - F(t)}{h} \]
then by Proposition 1.4.10 the limit of a vector is related to the limits of its components as follows:
\[ F'(t) \cdot e_j = \lim_{h \to 0} \frac{F_j(t+h) - F_j(t)}{h}. \]
Thus (𝐹 ′ (𝑡))𝑗 = 𝐹𝑗′ (𝑡) and the lemma follows⁶. ▽

6
this notation I first saw in a text by Marsden, it means the proof is partially completed but you should read on
to finish the proof

Proof of proposition: We use the notation $F = \sum_j F_j e_j = (F_1, \dots, F_m)$ and $G = \sum_i G_i e_i = (G_1, \dots, G_m)$
throughout the proofs below. The ∑ is understood to range over 1, 2, . . . , 𝑚. Begin
with (1.),

\[
\begin{aligned}
[(F + G)']_j &= \frac{d}{dt}[(F + G)_j] && \text{using the lemma} \\
&= \frac{d}{dt}[F_j + G_j] && \text{using def. } (F + G)_j = F_j + G_j \\
&= \frac{d}{dt}[F_j] + \frac{d}{dt}[G_j] && \text{by calculus I, } (f + g)' = f' + g' \\
&= [F' + G']_j && \text{def. of vector addition for } F' \text{ and } G'
\end{aligned}
\]

Hence (𝐹 + 𝐺)′ = 𝐹 ′ + 𝐺′ . The proofs of 2, 3, 5 and 6 are similar. I’ll prove (5.),

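\[
\begin{aligned}
[(F \times G)']_k &= \frac{d}{dt}[(F \times G)_k] && \text{using the lemma} \\
&= \frac{d}{dt}\Big[\sum_{i,j} \epsilon_{ijk} F_i G_j\Big] && \text{using def. } F \times G \\
&= \sum_{i,j} \epsilon_{ijk} \frac{d}{dt}[F_i G_j] && \text{repeatedly using } (f + g)' = f' + g' \\
&= \sum_{i,j} \epsilon_{ijk} \Big[\frac{dF_i}{dt} G_j + F_i \frac{dG_j}{dt}\Big] && \text{repeatedly using } (fg)' = f'g + fg' \\
&= \sum_{i,j} \epsilon_{ijk} \frac{dF_i}{dt} G_j + \sum_{i,j} \epsilon_{ijk} F_i \frac{dG_j}{dt} && \text{property of finite sum} \\
&= \Big(\frac{dF}{dt} \times G\Big)_k + \Big(F \times \frac{dG}{dt}\Big)_k && \text{def. of cross product} \\
&= \Big(\frac{dF}{dt} \times G + F \times \frac{dG}{dt}\Big)_k && \text{def. of vector addition}
\end{aligned}
\]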

Notice that the calculus step really just involves calculus I applied to the components. The ordinary
product rule was the crucial factor to prove the product rule for cross-products. We’ll see the same
for the dot product of mappings. Prove (4.)

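\[
\begin{aligned}
(F \cdot G)'(t) &= \frac{d}{dt}\Big[\sum_k F_k G_k\Big] && \text{using def. } F \cdot G \\
&= \sum_k \frac{d}{dt}[F_k G_k] && \text{repeatedly using } (f + g)' = f' + g' \\
&= \sum_k \Big[\frac{dF_k}{dt} G_k + F_k \frac{dG_k}{dt}\Big] && \text{repeatedly using } (fg)' = f'g + fg' \\
&= \frac{dF}{dt} \cdot G + F \cdot \frac{dG}{dt}. && \text{def. of dot product}
\end{aligned}
\]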

The proof of (3.) follows from applying the product rule to each component of 𝜙(𝑡)𝐹 (𝑡). The proof
of (2.) follows from (3.) in the case that 𝜙(𝑡) = 𝑐 so 𝜙′ (𝑡) = 0. Finally the proof of (6.) follows
from applying the chain-rule to each component. □
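
Items (4.) and (5.) of Proposition 3.5.2 are easy to test numerically. The sketch below differentiates paths one component at a time, in the spirit of Lemma 3.5.3, using central differences. The two paths `F` and `G`, the time `t0`, and all helper names are arbitrary choices made for illustration only.

```python
import math

def cross(a, b):
    """Standard cross product in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical example paths R -> R^3
def F(t):
    return [math.cos(t), math.sin(t), t]

def G(t):
    return [t * t, math.exp(t), 1.0]

def componentwise_derivative(path, t, eps=1e-6):
    """Differentiate each component by a central difference."""
    plus, minus = path(t + eps), path(t - eps)
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

t0 = 0.4
dF = componentwise_derivative(F, t0)
dG = componentwise_derivative(G, t0)

# (F x G)' versus F' x G + F x G'
cross_lhs = componentwise_derivative(lambda t: cross(F(t), G(t)), t0)
cross_rhs = [x + y for x, y in zip(cross(dF, G(t0)), cross(F(t0), dG))]
cross_gap = max(abs(l - r) for l, r in zip(cross_lhs, cross_rhs))

# (F . G)' versus F' . G + F . G'
dot_lhs = componentwise_derivative(lambda t: [dot(F(t), G(t))], t0)[0]
dot_rhs = dot(dF, G(t0)) + dot(F(t0), dG)
dot_gap = abs(dot_lhs - dot_rhs)

print(cross_gap, dot_gap)  # both small, up to finite-difference error
```

For the cross product the order of the factors matters in each term, since 𝐹 × 𝐺 = −𝐺 × 𝐹.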

3.5.3 calculus of matrix-valued functions of a real variable


Definition 3.5.4.
A matrix-valued function of a real variable is a function from 𝐼 ⊆ ℝ to ℝ 𝑚×𝑛 . Suppose
𝐴 : 𝐼 ⊆ ℝ → ℝ 𝑚×𝑛 is such that 𝐴𝑖𝑗 : 𝐼 ⊆ ℝ → ℝ is differentiable for each 𝑖, 𝑗 then we
define
\[ \frac{dA}{dt} = \left[ \frac{dA_{ij}}{dt} \right] \]
which can also be denoted $(A')_{ij} = A'_{ij}$. We likewise define $\int A\,dt = \left[ \int A_{ij}\,dt \right]$ for 𝐴 with
integrable components. Definite integrals and higher derivatives are also defined componentwise.


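Example 3.5.5. Suppose $A(t) = \begin{bmatrix} 2t & 3t^2 \\ 4t^3 & 5t^4 \end{bmatrix}$. I’ll calculate a few items just to illustrate the
definition above. To differentiate a matrix we differentiate each component one at a time:
\[ A'(t) = \begin{bmatrix} 2 & 6t \\ 12t^2 & 20t^3 \end{bmatrix} \qquad A''(t) = \begin{bmatrix} 0 & 6 \\ 24t & 60t^2 \end{bmatrix} \qquad A'(0) = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix} \]

Integrate by integrating each component:

\[ \int A(t)\,dt = \begin{bmatrix} t^2 + c_1 & t^3 + c_2 \\ t^4 + c_3 & t^5 + c_4 \end{bmatrix} \qquad \int_0^2 A(t)\,dt = \begin{bmatrix} t^2 \big|_0^2 & t^3 \big|_0^2 \\ t^4 \big|_0^2 & t^5 \big|_0^2 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ 16 & 32 \end{bmatrix} \]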
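
The componentwise recipe of Example 3.5.5 can be mirrored in a few lines of code. In this sketch each matrix entry is a polynomial stored as a list of coefficients (position 𝑘 holds the coefficient of 𝑡^𝑘). That representation and the helper names `poly_derivative`, `poly_integral` and `poly_eval` are choices made here for illustration; they are not part of the notes.

```python
from fractions import Fraction

def poly_derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

def poly_integral(coeffs):
    """Antiderivative with the constant of integration set to 0."""
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(coeffs)]

def poly_eval(coeffs, t):
    return sum(c * t ** k for k, c in enumerate(coeffs))

# A(t) = [[2t, 3t^2], [4t^3, 5t^4]] from Example 3.5.5
A = [[[0, 2], [0, 0, 3]],
     [[0, 0, 0, 4], [0, 0, 0, 0, 5]]]

# Differentiate every entry, then evaluate A'(0)
A_prime = [[poly_derivative(entry) for entry in row] for row in A]
A_prime_at_0 = [[poly_eval(entry, 0) for entry in row] for row in A_prime]

# Definite integral from 0 to 2, again entry by entry
definite = [[poly_eval(poly_integral(entry), 2) - poly_eval(poly_integral(entry), 0)
             for entry in row] for row in A]

print(A_prime_at_0)
print([[int(value) for value in row] for row in definite])
```

Exact rational arithmetic (`Fraction`) is used so that the results can be compared directly with the hand computation above.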

Proposition 3.5.6.

Suppose 𝐴, 𝐵 are matrix-valued functions of a real variable, 𝑓 is a function of a real variable,


𝑐 is a constant, and 𝐶 is a constant matrix then

1. (𝐴𝐵)′ = 𝐴′ 𝐵 + 𝐴𝐵 ′ (product rule for matrices)

2. (𝐴𝐶)′ = 𝐴′ 𝐶

3. (𝐶𝐴)′ = 𝐶𝐴′

4. (𝑓 𝐴)′ = 𝑓 ′ 𝐴 + 𝑓 𝐴′

5. (𝑐𝐴)′ = 𝑐𝐴′

6. (𝐴 + 𝐵)′ = 𝐴′ + 𝐵 ′

where each of the functions is evaluated at the same time 𝑡 and I assume that the functions
and matrices are differentiable at that value of 𝑡 and of course the matrices 𝐴, 𝐵, 𝐶 are such
that the multiplications are well-defined.

Proof: Suppose 𝐴(𝑡) ∈ ℝ 𝑚×𝑛 and 𝐵(𝑡) ∈ ℝ 𝑛×𝑝 consider,

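\[
\begin{aligned}
((AB)')_{ij} &= \frac{d}{dt}((AB)_{ij}) && \text{defn. derivative of matrix} \\
&= \frac{d}{dt}\Big(\sum_k A_{ik} B_{kj}\Big) && \text{defn. of matrix multiplication} \\
&= \sum_k \frac{d}{dt}(A_{ik} B_{kj}) && \text{linearity of derivative} \\
&= \sum_k \Big[\frac{dA_{ik}}{dt} B_{kj} + A_{ik} \frac{dB_{kj}}{dt}\Big] && \text{ordinary product rules} \\
&= \sum_k \frac{dA_{ik}}{dt} B_{kj} + \sum_k A_{ik} \frac{dB_{kj}}{dt} && \text{algebra} \\
&= (A'B)_{ij} + (AB')_{ij} && \text{defn. of matrix multiplication} \\
&= (A'B + AB')_{ij} && \text{defn. matrix addition}
\end{aligned}
\]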

This proves (1.) as 𝑖, 𝑗 were arbitrary in the calculation above. The proofs of (2.) and (3.) follow
quickly from (1.) since 𝐶 constant means 𝐶 ′ = 0. Proof of (4.) is similar to (1.):

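\[
\begin{aligned}
((fA)')_{ij} &= \frac{d}{dt}((fA)_{ij}) && \text{defn. derivative of matrix} \\
&= \frac{d}{dt}(f A_{ij}) && \text{defn. of scalar multiplication} \\
&= \frac{df}{dt} A_{ij} + f \frac{dA_{ij}}{dt} && \text{ordinary product rule} \\
&= \Big(\frac{df}{dt} A\Big)_{ij} + \Big(f \frac{dA}{dt}\Big)_{ij} && \text{defn. scalar multiplication} \\
&= \Big(\frac{df}{dt} A + f \frac{dA}{dt}\Big)_{ij} && \text{defn. matrix addition.}
\end{aligned}
\]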

The proof of (5.) follows from taking 𝑓 (𝑡) = 𝑐 which has 𝑓 ′ = 0. I leave the proof of (6.) as an
exercise for the reader. □.

To summarize: the calculus of matrices is the same as the calculus of functions with the small
qualifier that we must respect the rules of matrix algebra. The noncommutativity of matrix mul-
tiplication is the main distinguishing feature.
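
To see why the order of factors matters, here is a small numerical sketch. It compares the derivative of 𝐴(𝑡)² with the product rule 𝐴′𝐴 + 𝐴𝐴′ and with the naive guess 2𝐴𝐴′ borrowed from single-variable calculus. The matrix function `A` below is an arbitrary example chosen because 𝐴 and 𝐴′ do not commute; it does not appear in the notes.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Hypothetical example: A(t) and its componentwise derivative A'(t)
def A(t):
    return [[t, 1.0], [0.0, t * t]]

def dA(t):
    return [[1.0, 0.0], [0.0, 2.0 * t]]

def A_squared(t):
    return matmul(A(t), A(t))

def numeric_derivative(M, t, eps=1e-6):
    """Central difference applied to every entry of the matrix function M."""
    plus, minus = M(t + eps), M(t - eps)
    return [[(p - m) / (2 * eps) for p, m in zip(rp, rm)]
            for rp, rm in zip(plus, minus)]

t0 = 1.0
lhs = numeric_derivative(A_squared, t0)
product_rule = matadd(matmul(dA(t0), A(t0)), matmul(A(t0), dA(t0)))  # A'A + AA'
naive_rule = [[2.0 * entry for entry in row] for row in matmul(A(t0), dA(t0))]  # 2AA'

product_gap = max_abs_diff(lhs, product_rule)
naive_gap = max_abs_diff(lhs, naive_rule)
print(product_gap, naive_gap)  # the first is tiny, the second is not
```

The naive rule fails here precisely because 𝐴′𝐴 ≠ 𝐴𝐴′ for this choice of 𝐴.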

3.5.4 calculus of complex-valued functions of a real variable


Differentiation of functions from ℝ to ℂ is defined by splitting a given function into its real and
imaginary parts then we just differentiate with respect to the real variable one component at a
time. For example:

\[
\begin{aligned}
\frac{d}{dt}\big(e^{2t}\cos(t) + ie^{2t}\sin(t)\big) &= \frac{d}{dt}\big(e^{2t}\cos(t)\big) + i\frac{d}{dt}\big(e^{2t}\sin(t)\big) \\
&= (2e^{2t}\cos(t) - e^{2t}\sin(t)) + i(2e^{2t}\sin(t) + e^{2t}\cos(t)) \\
&= e^{2t}(2 + i)(\cos(t) + i\sin(t)) \\
&= (2 + i)e^{(2+i)t}
\end{aligned}
\tag{3.1}
\]

where I have made use of the identity⁷ $e^{x+iy} = e^x(\cos(y) + i\sin(y))$. We just saw that $\frac{d}{dt}e^{\lambda t} = \lambda e^{\lambda t}$
which seems obvious enough until you appreciate that we just proved it for 𝜆 = 2 + 𝑖.
7
or definition, depending on how you choose to set-up the complex exponential, I take this as the definition in
calculus II
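
The calculation (3.1) can be double-checked numerically. This sketch differentiates the real and imaginary parts of 𝑒^{𝜆𝑡} separately by central differences and compares the result with 𝜆𝑒^{𝜆𝑡} for 𝜆 = 2 + 𝑖. It relies on Python's standard `cmath.exp`; the sample time `t0` and the step size are arbitrary choices.

```python
import cmath

lam = 2 + 1j  # the value of lambda used in (3.1)

def z(t):
    return cmath.exp(lam * t)

def componentwise_derivative(t, eps=1e-6):
    """Differentiate real and imaginary parts separately with respect to real t."""
    real_part = (z(t + eps).real - z(t - eps).real) / (2 * eps)
    imag_part = (z(t + eps).imag - z(t - eps).imag) / (2 * eps)
    return complex(real_part, imag_part)

t0 = 0.3
gap = abs(componentwise_derivative(t0) - lam * z(t0))
print(gap)  # small, up to finite-difference error
```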

3.6 complex analysis in a nutshell


Differentiation with respect to a real variable can be reduced to the slogan that we differentiate
componentwise. Differentiation with respect to a complex variable requires additional structure.
The key distinguishing ingredient is complex linearity:
Definition 3.6.1.
If we have some function 𝑇 : ℂ → ℂ such that

(1.) 𝑇 (𝑣 + 𝑤) = 𝑇 (𝑣) + 𝑇 (𝑤) for all 𝑣, 𝑤 ∈ ℂ (2.) 𝑇 (𝑐𝑣) = 𝑐𝑇 (𝑣) for all 𝑐, 𝑣 ∈ ℂ

then we would say that 𝑇 is complex-linear.


Condition (1.) is additivity whereas condition (2.) is homogeneity. Note that complex linearity
implies real linearity however the converse is not true.
Example 3.6.2. Suppose $T(z) = \bar{z}$ where if 𝑧 = 𝑥 + 𝑖𝑦 for 𝑥, 𝑦 ∈ ℝ then $\bar{z} = x - iy$ is the complex
conjugate of 𝑧. Consider for 𝑐 = 𝑎 + 𝑖𝑏 where 𝑎, 𝑏 ∈ ℝ,
\[
\begin{aligned}
T(cz) &= T((a + ib)(x + iy)) \\
&= T(ax - by + i(ay + bx)) \\
&= ax - by - i(ay + bx) \\
&= (a - ib)(x - iy) \\
&= \bar{c}\,T(z)
\end{aligned}
\]
hence this map is not complex linear, since $\bar{c}\,T(z) \neq c\,T(z)$ whenever 𝑏 ≠ 0 and 𝑧 ≠ 0. On the other hand, if we study multiplication by just 𝑎 ∈ ℝ,
\[ T(az) = T(a(x + iy)) = T(ax + iay) = ax - iay = a(x - iy) = aT(x + iy) = aT(z) \]
thus 𝑇 is homogeneous with respect to real-number multiplication and it is also additive hence 𝑇 is
real linear.
Suppose that 𝐿 is a linear mapping from ℝ2 to ℝ2 . It is known from linear algebra that there exists
a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that 𝐿(𝑣) = 𝐴𝑣 for all 𝑣 ∈ ℝ2 . In this section we use the notation
𝑎 + 𝑖𝑏 = (𝑎, 𝑏) and

(𝑎, 𝑏) ∗ (𝑐, 𝑑) = (𝑎 + 𝑖𝑏)(𝑐 + 𝑖𝑑) = 𝑎𝑐 − 𝑏𝑑 + 𝑖(𝑎𝑑 + 𝑏𝑐) = (𝑎𝑐 − 𝑏𝑑, 𝑎𝑑 + 𝑏𝑐).

This construction is due to Gauss in the early nineteenth century, the idea is to use two component
vectors to construct complex numbers. There are other ways to construct complex numbers⁸.
Notice that
\[ L(x + iy) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = (ax + by, cx + dy) = ax + by + i(cx + dy) \]
defines a real linear mapping on ℂ for any choice of the real constants 𝑎, 𝑏, 𝑐, 𝑑. In contrast, complex linearity
puts strict conditions on these constants:
8
the same is true for real numbers, you can construct them in more than one way, however all constructions agree
on the basic properties and as such it is the properties of real or complex numbers which truly define them. That
said, we choose Gauss’ representation for convenience.

Theorem 3.6.3.
The linear mapping 𝐿(𝑣) = 𝐴𝑣 is complex linear iff the matrix 𝐴 has the special form
below:
\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]
To be clear, we mean to identify ℝ2 with ℂ as before. Thus the condition of complex
homogeneity reads 𝐿((𝑎, 𝑏) ∗ (𝑥, 𝑦)) = (𝑎, 𝑏) ∗ 𝐿(𝑥, 𝑦)
Proof: assume 𝐿 is complex linear. Define the matrix of 𝐿 as before:
\[ L(x, y) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \]
This yields,
𝐿(𝑥 + 𝑖𝑦) = 𝑎𝑥 + 𝑏𝑦 + 𝑖(𝑐𝑥 + 𝑑𝑦)
We can gain conditions on the matrix by examining the special points 1 = (1, 0) and 𝑖 = (0, 1)
𝐿(1, 0) = (𝑎, 𝑐) 𝐿(0, 1) = (𝑏, 𝑑)
Note that (𝑐1 , 𝑐2 ) ∗ (1, 0) = (𝑐1 , 𝑐2 ) hence 𝐿((𝑐1 + 𝑖𝑐2 )1) = (𝑐1 + 𝑖𝑐2 )𝐿(1) yields
(𝑎𝑐1 + 𝑏𝑐2 ) + 𝑖(𝑐𝑐1 + 𝑑𝑐2 ) = (𝑐1 + 𝑖𝑐2 )(𝑎 + 𝑖𝑐) = 𝑐1 𝑎 − 𝑐2 𝑐 + 𝑖(𝑐1 𝑐 + 𝑐2 𝑎)
We find two equations by equating the real and imaginary parts:
𝑎𝑐1 + 𝑏𝑐2 = 𝑐1 𝑎 − 𝑐2 𝑐 𝑐𝑐1 + 𝑑𝑐2 = 𝑐1 𝑐 + 𝑐2 𝑎
Therefore, 𝑏𝑐2 = −𝑐2 𝑐 and 𝑑𝑐2 = 𝑐2 𝑎 for all (𝑐1 , 𝑐2 ) ∈ ℂ. Suppose 𝑐1 = 0 and 𝑐2 = 1. We find
𝑏 = −𝑐 and 𝑑 = 𝑎. We leave the converse proof to the reader. The theorem follows. □
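
Theorem 3.6.3 can be explored with a few lines of code. The sketch below applies a real 2 × 2 matrix to a complex number viewed as a vector in ℝ2 and tests complex homogeneity 𝐿(𝑐𝑧) = 𝑐𝐿(𝑧) on a handful of sample values. The sample points, the two matrices and the function names are illustrative assumptions; passing the test on samples is evidence, not a proof.

```python
def apply_matrix(matrix, z):
    """Apply a real 2 x 2 matrix to z = x + iy viewed as the vector (x, y)."""
    (a, b), (c, d) = matrix
    return complex(a * z.real + b * z.imag, c * z.real + d * z.imag)

def is_complex_homogeneous(matrix, samples, tol=1e-12):
    """Check L(c z) = c L(z) for every pair (c, z) drawn from samples."""
    return all(abs(apply_matrix(matrix, c * z) - c * apply_matrix(matrix, z)) <= tol
               for c in samples for z in samples)

samples = [1 + 0j, 1j, 2 - 3j, -0.5 + 0.25j]

# Matrix of the special form [[a, b], [-b, a]] with a = 3, b = -2
special = ((3.0, -2.0), (2.0, 3.0))

# Matrix of complex conjugation, T(x + iy) = x - iy, from Example 3.6.2
conjugation = ((1.0, 0.0), (0.0, -1.0))

print(is_complex_homogeneous(special, samples))      # True
print(is_complex_homogeneous(conjugation, samples))  # False
```

The special matrix acts as multiplication by the single complex number 𝑎 − 𝑖𝑏, which is 3 + 2𝑖 in this example, and that is why it commutes with complex scalars.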

In analogy with the real case we define 𝑓 ′ (𝑧) as follows:


Definition 3.6.4.
Suppose 𝑓 : 𝑑𝑜𝑚(𝑓 ) ⊆ ℂ → ℂ and 𝑧 ∈ 𝑑𝑜𝑚(𝑓 ) then we define 𝑓 ′ (𝑧) by the limit below:

\[ f'(z) = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h}. \]

The derivative function 𝑓 ′ is defined pointwise for all such 𝑧 ∈ 𝑑𝑜𝑚(𝑓 ) that the limit above
exists.
Note that $f'(z) = \lim_{h \to 0} \frac{f'(z)h}{h}$ hence
\[ \lim_{h \to 0} \frac{f'(z)h}{h} = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \quad \Rightarrow \quad \lim_{h \to 0} \frac{f(z + h) - f(z) - f'(z)h}{h} = 0 \]
Note that the limit above simply says that 𝐿(𝑣) = 𝑓 ′ (𝑧)𝑣 is the best complex-linear
approximation of Δ𝑓 = 𝑓 (𝑧 + ℎ) − 𝑓 (𝑧).

Proposition 3.6.5.

If 𝑓 is complex differentiable at 𝑧𝑜 then the linearization 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ is a complex linear
mapping.
Proof: let 𝑐, ℎ ∈ ℂ and note 𝐿(𝑐ℎ) = 𝑓 ′ (𝑧𝑜 )(𝑐ℎ) = 𝑐𝑓 ′ (𝑧𝑜 )ℎ = 𝑐𝐿(ℎ). □

It turns out that complex differentiability automatically induces real differentiability:


Proposition 3.6.6.

If 𝑓 is complex differentiable at 𝑧𝑜 then 𝑓 is (real) differentiable at 𝑧𝑜 with 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ.


Proof: note that $\lim_{h \to 0} \frac{f(z_o + h) - f(z_o) - f'(z_o)h}{h} = 0$ implies

\[ \lim_{h \to 0} \frac{f(z_o + h) - f(z_o) - f'(z_o)h}{|h|} = 0 \]
but then ∣ℎ∣ = ∣∣ℎ∣∣ and we know 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ is real-linear hence 𝐿 is the best linear approximation
to Δ𝑓 at 𝑧𝑜 and the proposition follows. □

Let’s summarize what we’ve learned: if 𝑓 : 𝑑𝑜𝑚(𝑓 ) → ℂ is complex differentiable at 𝑧𝑜 and


𝑓 = 𝑢 + 𝑖𝑣 then,
1. 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ is complex linear.

2. 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ is the best real linear approximation to 𝑓 viewed as a mapping on ℝ2 .


The Jacobian matrix for 𝑓 = (𝑢, 𝑣) has the form
\[ J_f(p_o) = \begin{bmatrix} u_x(p_o) & u_y(p_o) \\ v_x(p_o) & v_y(p_o) \end{bmatrix} \]

Theorem 3.6.3 applies to 𝐽𝑓 (𝑝𝑜 ) since 𝐿 is a complex linear mapping. Therefore we find the Cauchy
Riemann equations: 𝑢𝑥 = 𝑣𝑦 and 𝑢𝑦 = −𝑣𝑥 . We have proved the following theorem:
Theorem 3.6.7.
If 𝑓 = 𝑢 + 𝑖𝑣 is a complex function which is complex-differentiable at 𝑧𝑜 then the partial
derivatives of 𝑢 and 𝑣 exist at 𝑧𝑜 and satisfy the Cauchy-Riemann equations at 𝑧𝑜
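\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \]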

Example 3.6.8. Let 𝑓 (𝑧) = 𝑒𝑧 where the definition of the complex exponential function is given
by the following, for each 𝑥, 𝑦 ∈ ℝ and 𝑧 = 𝑥 + 𝑖𝑦

𝑓 (𝑥 + 𝑖𝑦) = 𝑒𝑥+𝑖𝑦 = 𝑒𝑥 (cos(𝑦) + 𝑖 sin(𝑦)) = 𝑒𝑥 cos(𝑦) + 𝑖𝑒𝑥 sin(𝑦)



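Identify for 𝑓 = 𝑢 + 𝑖𝑣 we have $u(x, y) = e^x\cos(y)$ and $v(x, y) = e^x\sin(y)$. Calculate:

\[ \frac{\partial u}{\partial x} = \frac{\partial}{\partial x}\big[e^x\cos(y)\big] = e^x\cos(y) \quad \& \quad \frac{\partial u}{\partial y} = \frac{\partial}{\partial y}\big[e^x\cos(y)\big] = -e^x\sin(y), \]

\[ \frac{\partial v}{\partial x} = \frac{\partial}{\partial x}\big[e^x\sin(y)\big] = e^x\sin(y) \quad \& \quad \frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\big[e^x\sin(y)\big] = e^x\cos(y). \]

Thus 𝑓 satisfies the CR-equations $\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$. These partial derivatives are
continuous, so by Theorem 3.6.10 below the complex exponential function is complex differentiable.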
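
The Cauchy-Riemann equations are easy to test numerically. The sketch below approximates the four partial derivatives by central differences for the exponential (𝑢 = 𝑒^𝑥 cos 𝑦, 𝑣 = 𝑒^𝑥 sin 𝑦) and, for contrast, for complex conjugation (𝑢 = 𝑥, 𝑣 = −𝑦) from Example 3.6.2. The sample point and the helper names are arbitrary choices made for illustration.

```python
import math

def partial_x(g, x, y, eps=1e-6):
    return (g(x + eps, y) - g(x - eps, y)) / (2 * eps)

def partial_y(g, x, y, eps=1e-6):
    return (g(x, y + eps) - g(x, y - eps)) / (2 * eps)

def cr_residuals(u, v, x, y):
    """Return (|u_x - v_y|, |u_y + v_x|); both vanish when the CR-equations hold."""
    return (abs(partial_x(u, x, y) - partial_y(v, x, y)),
            abs(partial_y(u, x, y) + partial_x(v, x, y)))

# Component functions of the complex exponential
def u_exp(x, y):
    return math.exp(x) * math.cos(y)

def v_exp(x, y):
    return math.exp(x) * math.sin(y)

# Component functions of complex conjugation
def u_conj(x, y):
    return x

def v_conj(x, y):
    return -y

x0, y0 = 0.5, 1.1
exp_residuals = cr_residuals(u_exp, v_exp, x0, y0)
conj_residuals = cr_residuals(u_conj, v_conj, x0, y0)
print(exp_residuals)   # both entries near zero
print(conj_residuals)  # first entry near 2: u_x = 1 but v_y = -1
```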

The converse of Theorem 3.6.7 is not true in general. It is possible to have functions 𝑢, 𝑣 : 𝑈 ⊆
ℝ2 → ℝ that satisfy the CR-equations at 𝑧𝑜 ∈ 𝑈 and yet 𝑓 = 𝑢+𝑖𝑣 fails to be complex differentiable
at 𝑧𝑜 .
Example 3.6.9. Counter-example to converse of Theorem 3.6.7. Suppose
\[ f(x + iy) = \begin{cases} 0 & \text{if } xy \neq 0 \\ 1 & \text{if } xy = 0 \end{cases} \]
Clearly 𝑓 is constant (identically one) on the coordinate axes thus along the 𝑥-axis we can calculate the partial
derivatives 𝑢𝑥 and 𝑣𝑥 and they are both zero. Likewise, along the 𝑦-axis we find 𝑢𝑦 and 𝑣𝑦 exist and
are zero. At the origin we find 𝑢𝑥 , 𝑢𝑦 , 𝑣𝑥 , 𝑣𝑦 all exist and are zero. Therefore, the Cauchy-Riemann
equations hold true at the origin. However, this function is not even continuous at the origin, thus
it is not real differentiable!

The example above equally well serves as an example for a point where a function has partial
derivatives which exist at all orders and yet the differential fails to exist. It’s not a problem of
complex variables in my opinion, it’s a problem of advanced calculus. The key concept to reverse
the theorem is continuous differentiability.

Theorem 3.6.10.

If 𝑢, 𝑣, 𝑢𝑥 , 𝑢𝑦 , 𝑣𝑥 , 𝑣𝑦 are continuous functions in some open disk of 𝑧𝑜 and 𝑢𝑥 (𝑧𝑜 ) = 𝑣𝑦 (𝑧𝑜 )


and 𝑢𝑦 (𝑧𝑜 ) = −𝑣𝑥 (𝑧𝑜 ) then 𝑓 = 𝑢 + 𝑖𝑣 is complex differentiable at 𝑧𝑜 .
Proof: we are given that a function 𝑓 : 𝐷𝜖 (𝑧𝑜 ) ⊂ ℝ2 → ℝ2 is continuous with continuous partial
derivatives of its component functions 𝑢 and 𝑣. Therefore, by Theorem ?? we know 𝑓 is (real)
differentiable at 𝑧𝑜 . Therefore, we have a best linear approximation to the change in 𝑓 near 𝑧𝑜
which can be induced via multiplication of the Jacobian matrix:
\[ L(v_1, v_2) = \begin{bmatrix} u_x(z_o) & u_y(z_o) \\ v_x(z_o) & v_y(z_o) \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}. \]

Note then that the given CR-equations show the matrix of 𝐿 has the form
\[ [L] = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \]

where 𝑎 = 𝑢𝑥 (𝑧𝑜 ) and 𝑏 = 𝑢𝑦 (𝑧𝑜 ) = −𝑣𝑥 (𝑧𝑜 ). Consequently we find 𝐿 is complex linear and it follows that 𝑓
is complex differentiable at 𝑧𝑜 since we have a complex linear map 𝐿 such that
\[ \lim_{h \to 0} \frac{f(z_o + h) - f(z_o) - L(h)}{||h||} = 0 \]
note that the limit with ℎ in the denominator is equivalent to the limit above which followed directly
from the (real) differentiability at 𝑧𝑜 . (The following is not needed for the proof of the theorem, but
perhaps it is interesting anyway.) Moreover, we can write
\[
\begin{aligned}
L(h_1, h_2) &= \begin{bmatrix} u_x & u_y \\ -u_y & u_x \end{bmatrix}\begin{bmatrix} h_1 \\ h_2 \end{bmatrix} \\
&= \begin{bmatrix} u_x h_1 + u_y h_2 \\ -u_y h_1 + u_x h_2 \end{bmatrix} \\
&= u_x h_1 + u_y h_2 + i(-u_y h_1 + u_x h_2) \\
&= (u_x - iu_y)(h_1 + ih_2)
\end{aligned}
\]

Therefore we find 𝑓 ′ (𝑧𝑜 ) = 𝑢𝑥 − 𝑖𝑢𝑦 gives 𝐿(ℎ) = 𝑓 ′ (𝑧𝑜 )ℎ. □

In the preceding section we found necessary and sufficient conditions for the component functions
𝑢, 𝑣 to construct a complex differentiable function 𝑓 = 𝑢 + 𝑖𝑣. The definition that follows is the
next logical step: we say a function is analytic⁹ at 𝑧𝑜 if it is complex differentiable at each point in
some open disk about 𝑧𝑜 .
Definition 3.6.11.
Let 𝑓 = 𝑢 + 𝑖𝑣 be a complex function. If there exists 𝜖 > 0 such that 𝑓 is complex
differentiable for each 𝑧 ∈ 𝐷𝜖 (𝑧𝑜 ) then we say that 𝑓 is analytic at 𝑧𝑜 . If 𝑓 is analytic for
each 𝑧𝑜 ∈ 𝑈 then we say 𝑓 is analytic on 𝑈 . If 𝑓 is not analytic at 𝑧𝑜 then we say that 𝑧𝑜
is a singular point. Singular points may be outside the domain of the function. If 𝑓 is
analytic on the entire complex plane then we say 𝑓 is entire. Analytic functions are
also called holomorphic functions
If you look in my complex variables notes you can find proof of the following theorem (well, partial
proof perhaps, but this result is shown in every good complex variables text)
Theorem 3.6.12.

If 𝑓 : ℝ → ℝ ⊂ ℂ is a function and 𝑓˜ : ℂ → ℂ is an extension of 𝑓 which is analytic then


𝑓˜ is unique. In particular, if there is an analytic extension of sine, cosine, hyperbolic sine
or hyperbolic cosine then those extensions are unique.
⁹ you may recall that a function on ℝ was analytic at 𝑥𝑜 if its Taylor series at 𝑥𝑜 converged to the function in some
neighborhood of 𝑥𝑜 . This terminology is consistent but I leave the details for your complex analysis course.

This means if we demand analyticity then we actually had no freedom in our choice of the exponential.
If we find a complex function which matches the exponential function on a line-segment (in
particular a closed interval in ℝ viewed as a subset of ℂ is a line-segment) then there is just one
complex function which agrees with the real exponential and is complex differentiable everywhere.

\[ f(x) = e^x \ \text{ extends uniquely to } \ \tilde{f}(z) = e^{Re(z)}\big(\cos(Im(z)) + i\sin(Im(z))\big). \]

Note $\tilde{f}(x + 0i) = e^x(\cos(0) + i\sin(0)) = e^x$ thus $\tilde{f}|_{\mathbb{R}} = f$. Naturally, analyticity is a desirable
property for the complex-extension of known functions so this concept of analytic continuation is
very nice. Existence aside, we should first construct sine, cosine, etc. Then we have to check they
are both analytic and also that they actually agree with the real sine or cosine, etc. If a function
on ℝ has vertical asymptotes, points of discontinuity or points where it is not smooth then the
story is more complicated.

3.6.1 harmonic functions


We’ve discussed in some depth how to determine if a given function 𝑓 = 𝑢 + 𝑖𝑣 is in fact analytic.
In this section we study another angle on the story. We learn that the component functions
𝑢, 𝑣 of an analytic function 𝑓 = 𝑢 + 𝑖𝑣 are harmonic conjugates and they satisfy the physically
significant Laplace’s equation $\nabla^2\phi = 0$ where $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$. In addition we’ll learn
that if we have one solution of Laplace’s equation then we can consider it to be the ”𝑢” of some
yet undetermined analytic function 𝑓 = 𝑢 + 𝑖𝑣. The remaining function 𝑣 is then constructed
through some integration guided by the CR-equations. The construction is similar to the problem
of construction of a potential function for a given conservative force in calculus III.

Proposition 3.6.13.

If 𝑓 = 𝑢 + 𝑖𝑣 is analytic on some domain 𝐷 ⊆ ℂ then 𝑢 and 𝑣 are solutions of Laplace’s


equation 𝜙𝑥𝑥 + 𝜙𝑦𝑦 = 0 on 𝐷.
Proof: since 𝑓 = 𝑢 + 𝑖𝑣 is analytic we know the CR-equations hold true; 𝑢𝑥 = 𝑣𝑦 and 𝑢𝑦 = −𝑣𝑥 .
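Moreover, the component functions of an analytic function have continuous partial derivatives of all
orders (a standard result of complex analysis) so we may commute partial derivatives by a theorem
from multivariate calculus. Consider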

𝑢𝑥𝑥 + 𝑢𝑦𝑦 = (𝑢𝑥 )𝑥 + (𝑢𝑦 )𝑦 = (𝑣𝑦 )𝑥 + (−𝑣𝑥 )𝑦 = 𝑣𝑦𝑥 − 𝑣𝑥𝑦 = 0

Likewise,
𝑣𝑥𝑥 + 𝑣𝑦𝑦 = (𝑣𝑥 )𝑥 + (𝑣𝑦 )𝑦 = (−𝑢𝑦 )𝑥 + (𝑢𝑥 )𝑦 = −𝑢𝑦𝑥 + 𝑢𝑥𝑦 = 0
Of course these relations hold for all points inside 𝐷 and the proposition follows. □

Example 3.6.14. Note 𝑓 (𝑧) = 𝑧 2 is analytic with 𝑢 = 𝑥2 − 𝑦 2 and 𝑣 = 2𝑥𝑦. We calculate,

𝑢𝑥𝑥 = 2, 𝑢𝑦𝑦 = −2 ⇒ 𝑢𝑥𝑥 + 𝑢𝑦𝑦 = 0

Note 𝑣𝑥𝑥 = 𝑣𝑦𝑦 = 0 so 𝑣 is also a solution to Laplace’s equation.

Now let’s see if we can reverse this idea.



Example 3.6.15. Let 𝑢(𝑥, 𝑦) = 𝑥 + 𝑐1 notice that 𝑢 solves Laplace’s equation. We seek to find a
harmonic conjugate of 𝑢. We need to find 𝑣 such that,
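\[ \frac{\partial v}{\partial y} = \frac{\partial u}{\partial x} = 1 \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} = 0 \]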

Integrate these equations to deduce 𝑣(𝑥, 𝑦) = 𝑦 + 𝑐2 for some constant 𝑐2 ∈ ℝ. We thus construct
an analytic function 𝑓 (𝑥, 𝑦) = 𝑥 + 𝑐1 + 𝑖(𝑦 + 𝑐2 ) = 𝑥 + 𝑖𝑦 + 𝑐1 + 𝑖𝑐2 . This is just 𝑓 (𝑧) = 𝑧 + 𝑐 for
𝑐 = 𝑐1 + 𝑖𝑐2 .

Example 3.6.16. Suppose 𝑢(𝑥, 𝑦) = 𝑒𝑥 cos(𝑦). Note that 𝑢𝑥𝑥 = 𝑢 whereas 𝑢𝑦𝑦 = −𝑢 hence
𝑢𝑥𝑥 + 𝑢𝑦𝑦 = 0. We seek to find 𝑣 such that

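\[ \frac{\partial v}{\partial y} = \frac{\partial u}{\partial x} = e^x\cos(y) \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} = e^x\sin(y) \]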

Integrating 𝑣𝑦 = 𝑒𝑥 cos(𝑦) with respect to 𝑦 and 𝑣𝑥 = 𝑒𝑥 sin(𝑦) with respect to 𝑥 yields 𝑣(𝑥, 𝑦) =
𝑒𝑥 sin(𝑦). We thus construct an analytic function 𝑓 (𝑥, 𝑦) = 𝑒𝑥 cos(𝑦) + 𝑖𝑒𝑥 sin(𝑦). Of course we
should recognize the function we just constructed, it’s just the complex exponential 𝑓 (𝑧) = 𝑒𝑧 .
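
The integration step in Example 3.6.16 can also be carried out numerically. The sketch below builds a harmonic conjugate by integrating the CR-equations along an L-shaped path from the origin: first 𝑣𝑥 = −𝑢𝑦 along the 𝑥-axis, then 𝑣𝑦 = 𝑢𝑥 in the 𝑦-direction, with the normalization 𝑣(0, 0) = 0. The choice of path, the Simpson-rule helper and the sample point are assumptions made for illustration; the partial derivatives of 𝑢 are supplied by hand.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

# Partial derivatives of u(x, y) = e^x cos(y) from Example 3.6.16
def u_x(x, y):
    return math.exp(x) * math.cos(y)

def u_y(x, y):
    return -math.exp(x) * math.sin(y)

def harmonic_conjugate(x, y):
    """v(x, y) with v(0, 0) = 0, obtained from v_x = -u_y and v_y = u_x."""
    along_x = simpson(lambda s: -u_y(s, 0.0), 0.0, x)  # from (0, 0) to (x, 0)
    along_y = simpson(lambda t: u_x(x, t), 0.0, y)     # from (x, 0) to (x, y)
    return along_x + along_y

x0, y0 = 0.8, 1.3
conjugate_gap = abs(harmonic_conjugate(x0, y0) - math.exp(x0) * math.sin(y0))
print(conjugate_gap)  # small: the construction reproduces v = e^x sin(y)
```

This is the same bookkeeping used to build a potential function for a conservative vector field in calculus III.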

Notice we cannot just construct an analytic function from any given function of two variables. We
have to start with a solution to Laplace’s equation. This condition is rather restrictive. There
is much more to say about harmonic functions, especially where applications are concerned. My
goal here was just to give another perspective on analytic functions. Geometrically one thing we
could see without further work at this point is that for an analytic function 𝑓 = 𝑢 + 𝑖𝑣 the families
of level curves 𝑢(𝑥, 𝑦) = 𝑐1 and 𝑣(𝑥, 𝑦) = 𝑐2 are orthogonal. Note 𝑔𝑟𝑎𝑑(𝑢) = ⟨𝑢𝑥 , 𝑢𝑦 ⟩ and
𝑔𝑟𝑎𝑑(𝑣) = ⟨𝑣𝑥 , 𝑣𝑦 ⟩ have

𝑔𝑟𝑎𝑑(𝑢) ⋅ 𝑔𝑟𝑎𝑑(𝑣) = 𝑢𝑥 𝑣𝑥 + 𝑢𝑦 𝑣𝑦 = −𝑢𝑥 𝑢𝑦 + 𝑢𝑦 𝑢𝑥 = 0

This means the normal lines to the level curves for 𝑢 and 𝑣 are orthogonal. Hence the level curves
of 𝑢 and 𝑣 are orthogonal.
Chapter 4

inverse and implicit function theorems

It is tempting to give a complete and rigorous proof of these theorems here, but I will resist the
temptation in lecture. I’m actually more interested that the student understand what the theorem
claims. I will sketch the proof and show many applications. A nearly complete proof is found in
Edwards where he uses an iterative approximation technique founded on the contraction mapping
principle. All his arguments are in some sense in vain unless you have some working knowledge
of uniform convergence. It’s hidden in his proof, but we cannot conclude the limit of his sequence
of functions has the properties we desire unless the sequence of functions is uniformly convergent.
Sadly that material has its home in real analysis and I dare not trespass in lecture. That said, if
you wish I’d be happy to show you the full proof if you have about 20 extra hours to develop the
material outside of class. Alternatively, as a course of future study, return to the proof after you
have completed Math 431 here at Liberty¹. Some other universities put advanced calculus after
the real analysis course so that more analytical depth can be built into the course².

4.1 inverse function theorem


Consider the problem of finding a local inverse for 𝑓 : 𝑑𝑜𝑚(𝑓 ) ⊆ ℝ → ℝ. If we are given a point
𝑝 ∈ 𝑑𝑜𝑚(𝑓 ) such that there exists an open interval 𝐼 containing 𝑝 with 𝑓 ∣𝐼 a one-one function then
we can reasonably construct an inverse function by the simple rule 𝑓 −1 (𝑦) = 𝑥 iff 𝑓 (𝑥) = 𝑦 for
𝑥 ∈ 𝐼 and 𝑦 ∈ 𝑓 (𝐼). A sufficient condition to ensure the existence of a local inverse is that the
derivative function is either strictly positive or strictly negative on some neighborhood of 𝑝. If we
are given a continuously differentiable function at 𝑝 then it has a derivative which is continuous on
some neighborhood of 𝑝. For such a function if 𝑓 ′ (𝑝) ≠ 0 then there exists some interval centered at
𝑝 for which the derivative is strictly positive or negative. It follows that such a function is strictly
monotonic and is hence one-one thus there is a local inverse at 𝑝. Therefore, even in calculus I we
find the derivative informs us about the local invertibility of a function.

1
often read incorrectly as LU
2
If you think this would be worthwhile then by all means say as much in your exit interview, I believe we should
value the opinions of students, especially when they are geared towards academic excellence


The arguments I just made are supported by theorems that are developed in calculus I. Let me shift
gears a bit and give a direct calculational explanation based on the linearization approximation.
If 𝑥 ≈ 𝑝 then 𝑓 (𝑥) ≈ 𝑓 (𝑝) + 𝑓 ′ (𝑝)(𝑥 − 𝑝). To find the formula for the inverse we solve 𝑦 = 𝑓 (𝑥) for
𝑥:
\[ y \approx f(p) + f'(p)(x - p) \quad \Rightarrow \quad x \approx p + \frac{1}{f'(p)}\big[y - f(p)\big] \]
Therefore, $f^{-1}(y) \approx p + \frac{1}{f'(p)}\big[y - f(p)\big]$ for 𝑦 near 𝑓 (𝑝).

Example 4.1.1. Just to help you believe me, consider 𝑓 (𝑥) = 3𝑥 − 2 then 𝑓 ′ (𝑥) = 3 for all 𝑥.
Suppose we want to find the inverse function near 𝑝 = 2 then the discussion preceding this example
suggests,
\[ f^{-1}(y) = 2 + \frac{1}{3}(y - 4). \]
I invite the reader to check that $f(f^{-1}(y)) = y$ and $f^{-1}(f(x)) = x$ for all 𝑥, 𝑦 ∈ ℝ.

In the example above we found a global inverse exactly, but this is thanks to the linearity of the
function in the example. Generally, inverting the linearization just gives the first approximation to
the inverse.
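
For a nonlinear function the linearized inverse is only approximate, and the error shrinks quickly as 𝑦 approaches 𝑓(𝑝). The sketch below illustrates this with a function chosen for this purpose, 𝑓(𝑥) = 𝑥³ + 𝑥 near 𝑝 = 1, and compares the linearized inverse with the inverse computed by bisection. The function, the bracket [0, 2] used for bisection and the two test values of 𝑦 are all assumptions made for illustration.

```python
def f(x):
    return x ** 3 + x

def f_prime(x):
    return 3 * x ** 2 + 1

p = 1.0  # base point, f(p) = 2 and f'(p) = 4

def linearized_inverse(y):
    """First-order approximation p + (y - f(p)) / f'(p) of the local inverse."""
    return p + (y - f(p)) / f_prime(p)

def inverse_by_bisection(y, lo=0.0, hi=2.0, steps=100):
    """f is strictly increasing, so bisection finds the unique x in [lo, hi] with f(x) = y."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

error_far = abs(linearized_inverse(2.1) - inverse_by_bisection(2.1))
error_near = abs(linearized_inverse(2.01) - inverse_by_bisection(2.01))
print(error_far, error_near)
```

Moving 𝑦 ten times closer to 𝑓(𝑝) = 2 reduces the error by roughly a factor of one hundred, which is the quadratic behaviour one expects from a first-order approximation.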
Consider 𝐹 : 𝑑𝑜𝑚(𝐹 ) ⊆ ℝ𝑛 → ℝ𝑛 . If 𝐹 is differentiable at 𝑝 ∈ ℝ𝑛 then we can write 𝐹 (𝑥) ≈
𝐹 (𝑝) + 𝐹 ′ (𝑝)(𝑥 − 𝑝) for 𝑥 ≈ 𝑝. Set 𝑦 = 𝐹 (𝑥) and solve for 𝑥 via matrix algebra. This time we need
to assume 𝐹 ′ (𝑝) is an invertible matrix in order to isolate 𝑥,

\[ y \approx F(p) + F'(p)(x - p) \quad \Rightarrow \quad x \approx p + (F'(p))^{-1}\big[y - F(p)\big] \]

Therefore, $F^{-1}(y) \approx p + (F'(p))^{-1}\big[y - F(p)\big]$ for 𝑦 near 𝐹 (𝑝). Apparently the condition to find a
local inverse for a mapping on ℝ𝑛 is that the derivative matrix is nonsingular³ in some neighborhood
of the point. Experience has taught us from the one-dimensional case that we must insist the
derivative is continuous near the point in order to maintain the validity of the approximation.
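
The same linearized inverse can be tried in two dimensions. This sketch uses a made-up mapping 𝐹(𝑥1, 𝑥2) = (𝑥1 + 𝑥2², 𝑥2 + 𝑥1²) near 𝑝 = (1, 1), where the derivative matrix has determinant −3 and is therefore invertible. The mapping, the base point, the test point and the helper names are illustrative assumptions only.

```python
def F(x):
    return [x[0] + x[1] ** 2, x[1] + x[0] ** 2]

def F_prime(x):
    """Jacobian matrix of F, computed by hand."""
    return [[1.0, 2.0 * x[1]], [2.0 * x[0], 1.0]]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("derivative matrix is singular; no linearized inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

def linearized_inverse(y, p):
    """Approximate F^{-1}(y) by p + F'(p)^{-1} [y - F(p)]."""
    Fp = F(p)
    Minv = inverse_2x2(F_prime(p))
    dy = [y[0] - Fp[0], y[1] - Fp[1]]
    return [p[0] + Minv[0][0] * dy[0] + Minv[0][1] * dy[1],
            p[1] + Minv[1][0] * dy[0] + Minv[1][1] * dy[1]]

p = [1.0, 1.0]
x_true = [1.01, 0.98]                 # a point near p
x_estimate = linearized_inverse(F(x_true), p)
estimate_error = max(abs(a - b) for a, b in zip(x_estimate, x_true))
print(x_estimate, estimate_error)     # error is second order in the distance from p
```

The estimate recovers `x_true` up to an error of the order of the squared distance from 𝑝, which is why an iterative refinement is needed to obtain the inverse exactly.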

Recall from calculus II that as we attempt to approximate a function with a power series it takes
an infinite series of power functions to recapture the formula exactly. Well, something similar is
true here. However, the method of approximation is through an iterative approximation procedure
which is built off the idea of Newton’s method. The product of this iteration is a nested sequence of
composite functions. To prove the theorem below one must actually provide proof the recursively
generated sequence of functions converges. See pages 160-187 of Edwards for an in-depth exposition
of the iterative approximation procedure. Then see pages 404-411 of Edwards for some material
on uniform convergence⁴. The main analytical tool which is used to prove the convergence is called
the contraction mapping principle. The proof of the principle is relatively easy to follow and,
interestingly, the main non-trivial step is an application of the geometric series. For the student
of analysis this is an important topic which you should spend considerable time really trying to
absorb as deeply as possible. The contraction mapping principle is at the base of a number of
interesting and nontrivial theorems. Read Rosenlicht’s Introduction to Analysis for a broader and
better organized exposition of this analysis. In contrast, Edwards uses analysis as a tool to obtain
results for advanced calculus but his central goal is not a broad or well-framed treatment of analysis.
Consequently, if analysis is your interest then you really need to read something else in parallel to
get a better idea about sequences of functions and uniform convergence. I have some notes from
a series of conversations with a student about Rosenlicht; I’ll post those for the interested student.

³ nonsingular matrices are also called invertible matrices and a convenient test is that $A$ is invertible iff $\det(A) \neq 0$.
⁴ actually that later chapter is part of why I chose Edwards’ text: he makes a point of proving things in $\mathbb{R}^n$ in such
a way that the proof naturally generalizes to function space. This is done by arguing with properties rather than
formulas. The properties often extend to infinite dimensions whereas the formulas usually do not.

These notes focus on the part of the material I require for this course. This is Theorem 3.3 on page
185 of Edwards’ text:

Theorem 4.1.2. ( inverse function theorem )

Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable in an open set $W$ containing $a$ and the
derivative matrix $F'(a)$ is invertible. Then $F$ is locally invertible at $a$. This means that
there exists an open set $U \subseteq W$ containing $a$ and an open set $V$ containing $b = F(a)$ and
a one-to-one, continuously differentiable mapping $G : V \to W$ such that $G(F(x)) = x$ for all
$x \in U$ and $F(G(y)) = y$ for all $y \in V$. Moreover, the local inverse $G$ can be obtained as the
limit of the sequence of successive approximations defined by

\[ G_0(y) = a \quad \text{and} \quad G_{n+1}(y) = G_n(y) - [F'(a)]^{-1}\bigl[ F(G_n(y)) - y \bigr] \quad \text{for all } y \in V. \]

The qualifier local is important to note. If we seek a global inverse then other ideas are needed.
If the function is everywhere injective then logically $F(x) = y$ defines $F^{-1}(y) = x$ and $F^{-1}$ so
constructed is single-valued by virtue of the injectivity of $F$. However, for differentiable mappings,
one might wonder how the criterion of global injectivity can be tested via the differential. Even in
the one-dimensional case a vanishing derivative does not indicate a lack of injectivity; $f(x) = x^3$
has $f^{-1}(y) = \sqrt[3]{y}$ and yet $f'(0) = 0$ (therefore $f'(0)$ is not invertible). On the other hand, we’ll see
in the examples that follow that even if the derivative is invertible over a set it is possible for the
values of the mapping to double-up and once that happens we cannot find a single-valued inverse
function⁵.

Remark 4.1.3. James R. Munkres’ Analysis on Manifolds good for a different proof.

Another good place to read the inverse function theorem is in James R. Munkres’ Analysis
on Manifolds. That text is careful and has rather complete arguments which are not entirely
the same as the ones given in Edwards. Munkres’ text does not use the contraction mapping
principle, instead the arguments are more topological in nature.

⁵ there are scientists and engineers who work with multiply-valued functions with great success; however, as a point
of style if nothing else, we try to use functions in math.

To give some idea of what I mean by topological let me give an example of such an argument.
Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable and $F'(p)$ is invertible. Here’s a sketch of the
argument that $F'(x)$ is invertible for all $x$ near $p$:
1. the function $g : \mathbb{R}^n \to \mathbb{R}$ defined by $g(x) = \det(F'(x))$ is formed by a multinomial in the
component functions of $F'(x)$. This function is continuous since we are given that the
partial derivatives of the component functions of $F$ are all continuous.
2. note we are given $F'(p)$ is invertible and hence $\det(F'(p)) \neq 0$, thus the continuous function $g$
is nonzero at $p$. It follows there is some open set $U$ containing $p$ for which $0 \notin g(U)$.
3. we have $\det(F'(x)) \neq 0$ for all $x \in U$ hence $F'(x)$ is invertible on $U$.
I would argue this is a topological argument because the key idea here is the continuity of 𝑔.
Topology is the study of continuity in general.
Remark 4.1.4. James J. Callahan’s Advanced Calculus: a Geometric View, good reading.
James J. Callahan recently authored Advanced Calculus: a Geometric View. This text has
great merit in both visualization and well-thought use of linear algebraic techniques. In
addition, many students will enjoy his staggered proofs where he first shows the proof for a
simple low dimensional case and then proceeds to the general case. I almost used his text
this semester.

Example 4.1.5. Suppose $F(x, y) = \bigl( \sin(y) + 1, \ \sin(x) + 2 \bigr)$ for $(x, y) \in \mathbb{R}^2$. Clearly $F$ is
continuously differentiable as all its component functions have continuous partial derivatives. Observe,
\[ F'(x, y) = [\, \partial_x F \mid \partial_y F \,] = \begin{bmatrix} 0 & \cos(y) \\ \cos(x) & 0 \end{bmatrix}. \]
Hence $F'(x, y)$ is invertible at points $(x, y)$ such that $\det(F'(x, y)) = -\cos(x)\cos(y) \neq 0$. This
means we may not be able to find local inverses at points $(x, y)$ with $x = \frac{1}{2}(2n + 1)\pi$ or
$y = \frac{1}{2}(2m + 1)\pi$ for some $m, n \in \mathbb{Z}$. Points where $F'(x, y)$ is singular are points where one or both
of $\sin(y)$ and $\sin(x)$ reach extreme values, thus the points where the Jacobian matrix is singular
are in fact points where we cannot find a local inverse. Why? Because the function is clearly not
1-1 on any open set which contains a point of singularity for $dF$. Continuing, recall from precalculus
that sine has a standard inverse on $[-\pi/2, \pi/2]$. Suppose $(x, y) \in [-\pi/2, \pi/2]^2$ and seek to solve
$F(x, y) = (a, b)$ for $(x, y)$:
\[ F(x, y) = \begin{bmatrix} \sin(y) + 1 \\ \sin(x) + 2 \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}
\ \Rightarrow \ \left\{ \begin{array}{l} \sin(y) + 1 = a \\ \sin(x) + 2 = b \end{array} \right\}
\ \Rightarrow \ \left\{ \begin{array}{l} y = \sin^{-1}(a - 1) \\ x = \sin^{-1}(b - 2) \end{array} \right\} \]

It follows that $F^{-1}(a, b) = \bigl( \sin^{-1}(b - 2), \ \sin^{-1}(a - 1) \bigr)$ for $(a, b) \in [0, 2] \times [1, 3]$ where you should
note $F([-\pi/2, \pi/2]^2) = [0, 2] \times [1, 3]$. We’ve found a local inverse for $F$ on the region $[-\pi/2, \pi/2]^2$.
In other words, we just found a global inverse for the restriction of $F$ to $[-\pi/2, \pi/2]^2$. Technically
we ought not write $F^{-1}$; to be more precise we should write:
\[ \bigl( F|_{[-\pi/2, \pi/2]^2} \bigr)^{-1}(a, b) = \bigl( \sin^{-1}(b - 2), \ \sin^{-1}(a - 1) \bigr). \]

It is customary to avoid such detail in many contexts. Inverse functions for sine, cosine, tangent
etc... are good examples of this sleight of language.
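
The successive approximations in the inverse function theorem can be tried on this example. The
sketch below is only an illustration under stated assumptions: I take the base point $(0, 0)$, where
$F'(0, 0)$ is the swap matrix and is its own inverse, and the function names, the target $(1.3, 2.2)$
and the step count are my own choices rather than anything from Edwards.

```python
import math

def F(x, y):
    """The map of the example: F(x, y) = (sin(y) + 1, sin(x) + 2)."""
    return (math.sin(y) + 1.0, math.sin(x) + 2.0)

def successive_approximation(target, steps=30):
    """Iterate G_{n+1} = G_n - [F'(0,0)]^{-1} (F(G_n) - target) starting from G_0 = (0, 0).

    F'(0,0) = [[0, 1], [1, 0]] is its own inverse, so applying it simply swaps
    the two components of the residual.
    """
    a, b = target
    x, y = 0.0, 0.0                          # G_0 is the base point
    for _ in range(steps):
        fx, fy = F(x, y)
        r1, r2 = fx - a, fy - b              # residual F(G_n) - target
        x, y = x - r2, y - r1                # subtract the swapped residual
    return (x, y)

def closed_form_inverse(target):
    """The inverse found algebraically in the example."""
    a, b = target
    return (math.asin(b - 2.0), math.asin(a - 1.0))

if __name__ == "__main__":
    target = (1.3, 2.2)                      # a point near F(0, 0) = (1, 2)
    print("iterated   :", successive_approximation(target))
    print("closed form:", closed_form_inverse(target))
```

For targets near $F(0, 0) = (1, 2)$ the iterates settle quickly onto the closed-form inverse; far from
the base point the frozen matrix $[F'(0,0)]^{-1}$ is a poor guide and convergence is not guaranteed.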

A coordinate system on $\mathbb{R}^n$ is an invertible mapping of $\mathbb{R}^n$ to $\mathbb{R}^n$. However, in practice the term
coordinate system is used with less rigor. Often a coordinate system has various degeneracies. For
example, in polar coordinates you could say $\theta = \pi/4$ or $\theta = 9\pi/4$ or generally $\theta = 2\pi k + \pi/4$ for
any $k \in \mathbb{Z}$. Let’s examine polar coordinates in view of the inverse function theorem.
Example 4.1.6. Let $T(r, \theta) = \bigl( r\cos(\theta), \ r\sin(\theta) \bigr)$ for $(r, \theta) \in [0, \infty) \times (-\pi/2, \pi/2)$. Clearly
$T$ is continuously differentiable as all its component functions have continuous partial derivatives.
To find the inverse we seek to solve $T(r, \theta) = (x, y)$ for $(r, \theta)$. Hence, consider $x = r\cos(\theta)$ and
$y = r\sin(\theta)$. Note that

\[ x^2 + y^2 = r^2\cos^2(\theta) + r^2\sin^2(\theta) = r^2\bigl( \cos^2(\theta) + \sin^2(\theta) \bigr) = r^2 \]

and
\[ \frac{y}{x} = \frac{r\sin(\theta)}{r\cos(\theta)} = \tan(\theta). \]

It follows that $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$ for $(x, y) \in (0, \infty) \times \mathbb{R}$. We find
\[ T^{-1}(x, y) = \Bigl( \sqrt{x^2 + y^2}, \ \tan^{-1}(y/x) \Bigr). \]

Let’s see how the derivative fits with our results. Calculate,
\[ T'(r, \theta) = [\, \partial_r T \mid \partial_\theta T \,] = \begin{bmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{bmatrix} \]

note that $\det(T'(r, \theta)) = r$ hence the inverse function theorem provides the existence of a local
inverse around any point except the origin. Notice the derivative does not detect the defect in the
angular coordinate. Challenge: find the inverse function for $T(r, \theta) = \bigl( r\cos(\theta), \ r\sin(\theta) \bigr)$ with
$\mathrm{dom}(T) = [0, \infty) \times (\pi/2, 3\pi/2)$. Or, find the inverse for polar coordinates in a neighborhood of
$(0, -1)$.
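
A quick numerical check of the polar example can be sketched as follows. The helper names and the
finite-difference step are assumptions of mine; the formulas for $T$ and for its inverse on the right
half-plane are the ones derived in the example.

```python
import math

def T(r, theta):
    """Polar coordinate map T(r, theta) = (r cos(theta), r sin(theta))."""
    return (r * math.cos(theta), r * math.sin(theta))

def T_inverse(x, y):
    """Inverse on the right half-plane x > 0, returning theta in (-pi/2, pi/2)."""
    return (math.hypot(x, y), math.atan(y / x))

def jacobian_det(r, theta, h=1e-6):
    """Central-difference estimate of det T'(r, theta); the exact value is r."""
    xr_p, yr_p = T(r + h, theta)
    xr_m, yr_m = T(r - h, theta)
    xt_p, yt_p = T(r, theta + h)
    xt_m, yt_m = T(r, theta - h)
    dx_dr, dy_dr = (xr_p - xr_m) / (2 * h), (yr_p - yr_m) / (2 * h)
    dx_dt, dy_dt = (xt_p - xt_m) / (2 * h), (yt_p - yt_m) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

if __name__ == "__main__":
    print(T_inverse(*T(2.0, 0.7)))                    # recovers (2.0, 0.7) up to rounding
    print(T_inverse(*T(2.0, 0.7 + 2 * math.pi)))      # the angle is reported as 0.7 again
    print(jacobian_det(2.0, 0.7))                     # close to r = 2
```

The second line of output illustrates the defect in the angular coordinate: two different inputs
$(r, \theta)$ land on the same point, yet the determinant at both inputs is the same nonzero number $r$.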

Example 4.1.7. Suppose 𝑇 : ℝ3 → ℝ3 is defined by 𝑇 (𝑥, 𝑦, 𝑧) = (𝑎𝑥, 𝑏𝑦, 𝑐𝑧) for constants 𝑎, 𝑏, 𝑐 ∈
ℝ where 𝑎𝑏𝑐 ∕= 0. Clearly 𝑇 is continuously differentiable as all its component functions have
continuous partial derivatives. We calculate 𝑇 ′ (𝑥, 𝑦, 𝑧) = [∂𝑥 𝑇 ∣∂𝑦 𝑇 ∣∂𝑧 𝑇 ] = [𝑎𝑒1 ∣𝑏𝑒2 ∣𝑐𝑒3 ]. Thus
𝑑𝑒𝑡(𝑇 ′ (𝑥, 𝑦, 𝑧)) = 𝑎𝑏𝑐 ∕= 0 for all (𝑥, 𝑦, 𝑧) ∈ ℝ3 hence this function is locally invertible everywhere.
Moreover, we calculate the inverse mapping by solving 𝑇 (𝑥, 𝑦, 𝑧) = (𝑢, 𝑣, 𝑤) for (𝑥, 𝑦, 𝑧):

(𝑎𝑥, 𝑏𝑦, 𝑐𝑧) = (𝑢, 𝑣, 𝑤) ⇒ (𝑥, 𝑦, 𝑧) = (𝑢/𝑎, 𝑣/𝑏, 𝑤/𝑐) ⇒ 𝑇 −1 (𝑢, 𝑣, 𝑤) = (𝑢/𝑎, 𝑣/𝑏, 𝑤/𝑐).

Example 4.1.8. Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $F(x) = Ax + b$ for some matrix $A \in \mathbb{R}^{n \times n}$ and
vector $b \in \mathbb{R}^n$. Under what conditions is such a function invertible? Since the formula for
this function gives each component function as a polynomial in the $n$ variables we can conclude the
function is continuously differentiable. You can calculate that $F'(x) = A$. It follows that a sufficient
condition for local inversion is $\det(A) \neq 0$. It turns out that this is also a necessary condition as
$\det(A) = 0$ implies the equation $Av = 0$ has nontrivial solutions. We say $v \in \mathrm{Null}(A)$ iff
$Av = 0$. Note if $v \in \mathrm{Null}(A)$ then $F(v) = Av + b = b$. This is not a problem when $\det(A) \neq 0$
for in that case the null space contains just zero; $\mathrm{Null}(A) = \{0\}$. However, when $\det(A) = 0$ we
learn in linear algebra that $\mathrm{Null}(A)$ contains infinitely many vectors so $F$ is far from injective. For
example, suppose $\mathrm{Null}(A) = \mathrm{span}\{e_1\}$ then you can show that $F(a_1, a_2, \dots, a_n) = F(x, a_2, \dots, a_n)$
for all $x \in \mathbb{R}$. Hence any point will have other points nearby which output the same value under $F$.
Suppose $\det(A) \neq 0$; to calculate the inverse mapping formula we should solve $F(x) = y$ for $x$,

𝑦 = 𝐴𝑥 + 𝑏 ⇒ 𝑥 = 𝐴−1 (𝑦 − 𝑏) ⇒ 𝐹 −1 (𝑦) = 𝐴−1 (𝑦 − 𝑏).

Remark 4.1.9. inverse function theorem holds for higher derivatives.

In Munkres the inverse function theorem is given for 𝑟-times differentiable functions. In
short, a 𝐶 𝑟 function with invertible differential at a point has a 𝐶 𝑟 inverse function local
to the point. Edwards also has arguments for $r > 1$, see page 202 and the
surrounding arguments.

4.2 implicit function theorem


Consider the problem of solving $x^2 + y^2 = 1$ for $y$ as a function of $x$.

\[ x^2 + y^2 = 1 \ \Rightarrow \ y^2 = 1 - x^2 \ \Rightarrow \ y = \pm\sqrt{1 - x^2}. \]

A function cannot have two outputs for a single input, when we write ± in the expression above
it simply indicates our ignorance as to which is chosen. Once further information is given then we
may be able to choose a + or a −. For example:

1. if $x^2 + y^2 = 1$ and we want to solve for $y$ near $(0, 1)$ then $y = \sqrt{1 - x^2}$ is the correct choice
since $y > 0$ at the point of interest.

2. if $x^2 + y^2 = 1$ and we want to solve for $y$ near $(0, -1)$ then $y = -\sqrt{1 - x^2}$ is the correct choice
since $y < 0$ at the point of interest.

3. if 𝑥2 + 𝑦 2 = 1 and we want to solve for 𝑦 near (1, 0) then it’s impossible to find a single
function which reproduces 𝑥2 + 𝑦 2 = 1 on an open disk centered at (1, 0).

What is the defect of case (3.)? The trouble is that no matter how close we zoom in to the point
there are always two $y$-values for each given $x$-value. Geometrically, this suggests either we have a
discontinuity, a kink, or a vertical tangent in the graph. The given problem has a vertical tangent
and hopefully you can picture this with ease since it’s just the unit-circle. In calculus I we studied
implicit differentiation; our starting point was to assume $y = y(x)$ and then we differentiated
equations to work out implicit formulas for $dy/dx$. Take the unit-circle and differentiate both sides,

\[ x^2 + y^2 = 1 \ \Rightarrow \ 2x + 2y\frac{dy}{dx} = 0 \ \Rightarrow \ \frac{dy}{dx} = -\frac{x}{y}. \]

Note $\frac{dy}{dx}$ is not defined for $y = 0$. It’s no accident that those two points $(-1, 0)$ and $(1, 0)$ are
precisely the points at which we cannot solve for $y$ as a function of $x$. Apparently, the singularity
in the derivative indicates where we may have trouble solving an equation for one variable as a
function of the remaining variable.
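
As a sanity check on $dy/dx = -x/y$, the sketch below compares the implicit formula with a finite
difference slope of the explicit upper branch $y = \sqrt{1 - x^2}$. The function names and the step
size $h$ are assumptions made for this illustration.

```python
import math

def upper_branch(x):
    """Explicit local solution y = sqrt(1 - x^2) of x^2 + y^2 = 1 near (0, 1)."""
    return math.sqrt(1.0 - x * x)

def implicit_slope(x, y):
    """Slope dy/dx = -x / y obtained by implicit differentiation (needs y != 0)."""
    return -x / y

def numeric_slope(x, h=1e-6):
    """Central-difference slope of the explicit upper branch."""
    return (upper_branch(x + h) - upper_branch(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (0.0, 0.3, 0.6, 0.9, 0.999):
        y = upper_branch(x)
        print(f"x = {x:5.3f}  implicit = {implicit_slope(x, y):10.4f}  "
              f"numeric = {numeric_slope(x):10.4f}")
```

The two columns agree, and both blow up as $x \to 1$, which is the vertical tangent at $(1, 0)$.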

We wish to study this problem in general. Given $n$ equations in $(m + n)$ unknowns, when can we solve
for the last $n$ variables as functions of the first $m$ variables? Given a continuously differentiable
mapping $G = (G_1, G_2, \dots, G_n) : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ study the level set: (here $k_1, k_2, \dots, k_n$ are
constants)

\[ \begin{array}{c}
G_1(x_1, \dots, x_m, y_1, \dots, y_n) = k_1 \\
G_2(x_1, \dots, x_m, y_1, \dots, y_n) = k_2 \\
\vdots \\
G_n(x_1, \dots, x_m, y_1, \dots, y_n) = k_n
\end{array} \]

We wish to locally solve for $y_1, \dots, y_n$ as functions of $x_1, \dots, x_m$. That is, find a mapping $h : \mathbb{R}^m \to \mathbb{R}^n$
such that $G(x, y) = k$ iff $y = h(x)$ near some point $(a, b) \in \mathbb{R}^m \times \mathbb{R}^n$ such that $G(a, b) = k$. In
this section we use the notation $x = (x_1, x_2, \dots, x_m)$ and $y = (y_1, y_2, \dots, y_n)$.

Before we turn to the general problem let’s analyze the unit-circle problem in this notation. We
are given $G(x, y) = x^2 + y^2$ and we wish to find $f(x)$ such that $y = f(x)$ solves $G(x, y) = 1$.
Differentiate with respect to $x$ and use the chain-rule:

\[ \frac{\partial G}{\partial x}\frac{dx}{dx} + \frac{\partial G}{\partial y}\frac{dy}{dx} = 0 \]

We find that $dy/dx = -G_x/G_y = -x/y$. Given this analysis we should suspect that if we are
given some level curve $G(x, y) = k$ then we may be able to solve for $y$ as a function of $x$ near $p$
if $G(p) = k$ and $G_y(p) \neq 0$. This suspicion is valid and it is one of the many consequences of the
implicit function theorem.

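We again turn to the linearization approximation. Suppose $G(x, y) = k$ where $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$
and suppose $G : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable. Suppose $(a, b) \in \mathbb{R}^m \times \mathbb{R}^n$ has
$G(a, b) = k$. Replace $G$ with its linearization based at $(a, b)$:

\[ G(x, y) \approx k + G'(a, b)(x - a, y - b) \]

here we have the matrix multiplication of the $n \times (m + n)$ matrix $G'(a, b)$ with the $(m + n) \times 1$
column vector $(x - a, y - b)$ to yield an $n$-component column vector. It is convenient to define
partial derivatives with respect to a whole vector of variables,
\[ \frac{\partial G}{\partial x} = \begin{bmatrix} \frac{\partial G_1}{\partial x_1} & \cdots & \frac{\partial G_1}{\partial x_m} \\ \vdots & & \vdots \\ \frac{\partial G_n}{\partial x_1} & \cdots & \frac{\partial G_n}{\partial x_m} \end{bmatrix}
\qquad
\frac{\partial G}{\partial y} = \begin{bmatrix} \frac{\partial G_1}{\partial y_1} & \cdots & \frac{\partial G_1}{\partial y_n} \\ \vdots & & \vdots \\ \frac{\partial G_n}{\partial y_1} & \cdots & \frac{\partial G_n}{\partial y_n} \end{bmatrix} \]
In this notation we can write the $n \times (m + n)$ matrix $G'(a, b)$ as the concatenation of the $n \times m$
matrix $\frac{\partial G}{\partial x}(a, b)$ and the $n \times n$ matrix $\frac{\partial G}{\partial y}(a, b)$
\[ G'(a, b) = \left[ \frac{\partial G}{\partial x}(a, b) \ \Big| \ \frac{\partial G}{\partial y}(a, b) \right] \]
With this notation we have
\[ G(x, y) \approx k + \frac{\partial G}{\partial x}(a, b)(x - a) + \frac{\partial G}{\partial y}(a, b)(y - b) \]
If we are near $(a, b)$ then $G(x, y) \approx k$ thus we are faced with the problem of solving the following
equation for $y$:
\[ k \approx k + \frac{\partial G}{\partial x}(a, b)(x - a) + \frac{\partial G}{\partial y}(a, b)(y - b) \]
Suppose the square matrix $\frac{\partial G}{\partial y}(a, b)$ is invertible; then we find the following approximation
for the implicit solution of $G(x, y) = k$ for $y$ as a function of $x$:
\[ y = b - \left[ \frac{\partial G}{\partial y}(a, b) \right]^{-1} \left[ \frac{\partial G}{\partial x}(a, b)(x - a) \right]. \]
Of course this is not a formal proof, but it does suggest that $\det\left[ \frac{\partial G}{\partial y}(a, b) \right] \neq 0$ is the
condition we need in order to solve for the $y$ variables.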
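
The linearized solution can be tested on the unit circle, where $m = n = 1$ and every matrix is a
scalar. This is a sketch under assumptions of my own: the base point $(a, b) = (0.6, 0.8)$ and the
function names are chosen for illustration only.

```python
import math

def G_x(x, y):
    return 2.0 * x            # partial of G(x, y) = x^2 + y^2 with respect to x

def G_y(x, y):
    return 2.0 * y            # partial of G(x, y) = x^2 + y^2 with respect to y

def linearized_solution(x, a, b):
    """y ~ b - [G_y(a, b)]^{-1} G_x(a, b) (x - a), the approximation derived above."""
    return b - (G_x(a, b) / G_y(a, b)) * (x - a)

if __name__ == "__main__":
    a, b = 0.6, 0.8           # a point on the unit circle with G_y(a, b) = 1.6 != 0
    for x in (0.6, 0.61, 0.65, 0.8):
        exact = math.sqrt(1.0 - x * x)
        approx = linearized_solution(x, a, b)
        print(f"x = {x:4.2f}  linearized = {approx:.6f}  exact = {exact:.6f}")
```

Geometrically the linearized solution is the tangent line to the circle at $(a, b)$, so the agreement
degrades as $x$ moves away from $a$.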

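As before suppose $G : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$. Suppose we have a continuously differentiable function
$h : \mathbb{R}^m \to \mathbb{R}^n$ such that $h(a) = b$ and $G(x, h(x)) = k$. We seek to find the derivative of $h$ in terms
of the derivative of $G$. This is a generalization of the implicit differentiation calculation we perform
in calculus I. I’m including this to help you understand the notation a bit more before I state the
implicit function theorem. Differentiate with respect to $x_l$ for $l \in \mathbb{N}_m$:
\[ \frac{\partial}{\partial x_l}\bigl[ G(x, h(x)) \bigr]
= \sum_{i=1}^{m} \frac{\partial G}{\partial x_i}\frac{\partial x_i}{\partial x_l} + \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_l}
= \frac{\partial G}{\partial x_l} + \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_l} = 0 \]
we made use of the identity $\frac{\partial x_i}{\partial x_l} = \delta_{il}$ to squash the sum over $i$ to the single nontrivial term and the
zero on the r.h.s follows from the fact that $\frac{\partial}{\partial x_l}(k) = 0$. Concatenate these derivatives from $l = 1$
up to $l = m$:
\[ \left[ \frac{\partial G}{\partial x_1} + \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_1} \ \Big| \ \frac{\partial G}{\partial x_2} + \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_2} \ \Big| \ \cdots \ \Big| \ \frac{\partial G}{\partial x_m} + \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_m} \right] = [0 | 0 | \cdots | 0] \]

Properties of matrix addition allow us to parse the expression above as follows:

\[ \left[ \frac{\partial G}{\partial x_1} \ \Big| \ \frac{\partial G}{\partial x_2} \ \Big| \ \cdots \ \Big| \ \frac{\partial G}{\partial x_m} \right]
+ \left[ \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_1} \ \Big| \ \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_2} \ \Big| \ \cdots \ \Big| \ \sum_{j=1}^{n} \frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_m} \right] = [0 | 0 | \cdots | 0] \]

But, this reduces to

\[ \frac{\partial G}{\partial x} + \left[ \frac{\partial G}{\partial y}\frac{\partial h}{\partial x_1} \ \Big| \ \frac{\partial G}{\partial y}\frac{\partial h}{\partial x_2} \ \Big| \ \cdots \ \Big| \ \frac{\partial G}{\partial y}\frac{\partial h}{\partial x_m} \right] = 0 \in \mathbb{R}^{n \times m} \]
The concatenation property of matrix multiplication states $[Ab_1 | Ab_2 | \cdots | Ab_m] = A[b_1 | b_2 | \cdots | b_m]$;
we use this to write the expression once more,

\[ \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}\left[ \frac{\partial h}{\partial x_1} \ \Big| \ \frac{\partial h}{\partial x_2} \ \Big| \ \cdots \ \Big| \ \frac{\partial h}{\partial x_m} \right] = 0
\ \Rightarrow \ \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}\frac{\partial h}{\partial x} = 0
\ \Rightarrow \ \frac{\partial h}{\partial x} = -\left[ \frac{\partial G}{\partial y} \right]^{-1} \frac{\partial G}{\partial x} \]
where in the last implication we made use of the assumption that $\frac{\partial G}{\partial y}$ is invertible.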
Theorem 4.2.1. (Theorem 3.4 in Edwards’ text, see pg. 190)

Let $G : \mathrm{dom}(G) \subseteq \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in an open ball about the
point $(a, b)$ where $G(a, b) = k$ (a constant vector in $\mathbb{R}^n$). If the matrix $\frac{\partial G}{\partial y}(a, b)$ is invertible
then there exists an open ball $U$ containing $a$ in $\mathbb{R}^m$ and an open ball $W$ containing $(a, b)$
in $\mathbb{R}^m \times \mathbb{R}^n$ and a continuously differentiable mapping $h : U \to \mathbb{R}^n$ such that $G(x, y) = k$
iff $y = h(x)$ for all $(x, y) \in W$. Moreover, the mapping $h$ is the limit of the sequence of
successive approximations defined inductively below
\[ h_0(x) = b, \qquad h_{n+1}(x) = h_n(x) - \left[ \tfrac{\partial G}{\partial y}(a, b) \right]^{-1} \bigl[ G(x, h_n(x)) - k \bigr] \quad \text{for all } x \in U. \]

We will not attempt a proof of the last sentence for the same reasons we did not pursue the details
in the inverse function theorem. However, we have already derived the first step in the iteration in
our study of the linearization solution.

Proof: Let $G : \mathrm{dom}(G) \subseteq \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in an open ball $B$ about
the point $(a, b)$ where $G(a, b) = k$ ($k \in \mathbb{R}^n$ a constant). Furthermore, assume the matrix $\frac{\partial G}{\partial y}(a, b)$
is invertible. We seek to use the inverse function theorem to prove the implicit function theorem.
Towards that end consider $F : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^m \times \mathbb{R}^n$ defined by $F(x, y) = (x, G(x, y))$. To begin,
observe that $F$ is continuously differentiable in the open ball $B$ which is centered at $(a, b)$ since
$G$ and $x$ have continuous partials of their components in $B$. Next, calculate the derivative of
$F = (x, G)$,
\[ F'(x, y) = [\partial_x F | \partial_y F] = \begin{bmatrix} \partial_x x & \partial_y x \\ \partial_x G & \partial_y G \end{bmatrix} = \begin{bmatrix} I_m & 0_{m \times n} \\ \partial_x G & \partial_y G \end{bmatrix} \]
The determinant of the matrix above is the product of the determinants of the blocks $I_m$ and
$\partial_y G$; $\det(F'(x, y)) = \det(I_m)\det(\partial_y G) = \det(\partial_y G)$. We are given that $\frac{\partial G}{\partial y}(a, b)$ is invertible and hence
$\det\bigl( \frac{\partial G}{\partial y}(a, b) \bigr) \neq 0$, thus $\det(F'(a, b)) \neq 0$ and we find $F'(a, b)$ is invertible. Consequently, the inverse
function theorem applies to the function $F$ at $(a, b)$. Therefore, there exists $F^{-1} : V \subseteq \mathbb{R}^m \times \mathbb{R}^n \to
U \subseteq \mathbb{R}^m \times \mathbb{R}^n$ such that $F^{-1}$ is continuously differentiable. Note $(a, b) \in U$ and $V$ contains the
point $F(a, b) = (a, G(a, b)) = (a, k)$.

Our goal is to find the implicit solution of 𝐺(𝑥, 𝑦) = 𝑘. We know that

𝐹 −1 (𝐹 (𝑥, 𝑦)) = (𝑥, 𝑦) and 𝐹 (𝐹 −1 (𝑢, 𝑣)) = (𝑢, 𝑣)

for all $(x, y) \in U$ and $(u, v) \in V$. As usual to find the formula for the inverse we can solve
$F(x, y) = (u, v)$ for $(x, y)$; this means we wish to solve $(x, G(x, y)) = (u, v)$ hence $x = u$. The
formula for $y$ is more elusive, but we know it exists by the inverse function theorem. Let’s say
$y = H(u, v)$ where $H : V \to \mathbb{R}^n$ and thus $F^{-1}(u, v) = (u, H(u, v))$. Consider then,

\[ (u, v) = F(F^{-1}(u, v)) = F(u, H(u, v)) = (u, G(u, H(u, v))). \]

Let $v = k$, thus $(u, k) = (u, G(u, H(u, k)))$ for all $(u, k) \in V$. Finally, define $h(u) = H(u, k)$ for
all $(u, k) \in V$ and note that $k = G(u, h(u))$. In particular, $(a, k) \in V$ and at that point we find
$h(a) = H(a, k) = b$ by construction. It follows that $y = h(x)$ provides a continuously differentiable
solution of $G(x, y) = k$ near $(a, b)$.

Uniqueness of the solution follows from the uniqueness for the limit of the sequence of functions
described in Edwards’ text on page 192. However, other arguments for uniqueness can be offered,
independent of the iterative method, for instance: see page 75 of Munkres Analysis on Manifolds. □

Remark 4.2.2. notation and the implementation of the implicit function theorem.
We assumed the variables 𝑦 were to be written as functions of 𝑥 variables to make explicit
a local solution to the equation 𝐺(𝑥, 𝑦) = 𝑘. This ordering of the variables is convenient to
argue the proof, however the real theorem is far more general. We can select any subset of 𝑛
input variables to make up the “$y$” so long as $\frac{\partial G}{\partial y}$ is invertible. I will use this generalization
of the formal theorem in the applications that follow. Moreover, the notations 𝑥 and 𝑦 are
unlikely to maintain the same interpretation as in the previous pages. Finally, we will for
convenience make use of the notation 𝑦 = 𝑦(𝑥) to express the existence of a function 𝑓 such
that 𝑦 = 𝑓 (𝑥) when appropriate. Also, 𝑧 = 𝑧(𝑥, 𝑦) means there is some function ℎ for which
𝑧 = ℎ(𝑥, 𝑦). If this notation confuses then invent names for the functions in your problem.

Example 4.2.3. Suppose 𝐺(𝑥, 𝑦, 𝑧) = 𝑥2 + 𝑦 2 + 𝑧 2 . Suppose we are given a point (𝑎, 𝑏, 𝑐) such
that 𝐺(𝑎, 𝑏, 𝑐) = 𝑅2 for a constant 𝑅. Problem: For which variable can we solve? What, if
any, influence does the given point have on our answer? Solution: to begin, we have one
equation and three unknowns so we should expect to find one of the variables as a function of the
remaining two variables. The implicit function theorem applies as 𝐺 is continuously differentiable.
1. if we wish to solve 𝑧 = 𝑧(𝑥, 𝑦) then we need 𝐺𝑧 (𝑎, 𝑏, 𝑐) = 2𝑐 ∕= 0.

2. if we wish to solve 𝑦 = 𝑦(𝑥, 𝑧) then we need 𝐺𝑦 (𝑎, 𝑏, 𝑐) = 2𝑏 ∕= 0.

3. if we wish to solve 𝑥 = 𝑥(𝑦, 𝑧) then we need 𝐺𝑥 (𝑎, 𝑏, 𝑐) = 2𝑎 ∕= 0.

The point has no local solution for 𝑧 if it is a point on the intersection of the 𝑥𝑦-plane and the
sphere 𝐺(𝑥, 𝑦, 𝑧) = 𝑅2 . Likewise, we cannot solve for 𝑦 = 𝑦(𝑥, 𝑧) on the 𝑦 = 0 slice of the sphere
and we cannot solve for 𝑥 = 𝑥(𝑦, 𝑧) on the 𝑥 = 0 slice of the sphere.

Notice, algebra verifies the conclusions we reached via the implicit function theorem:
\[ z = \pm\sqrt{R^2 - x^2 - y^2} \qquad y = \pm\sqrt{R^2 - x^2 - z^2} \qquad x = \pm\sqrt{R^2 - y^2 - z^2} \]

When we are at zero for one of the coordinates then we cannot choose + or − since we need both on
an open ball intersected with the sphere centered at such a point⁶. Remember, when I talk about
local solutions I mean solutions which exist over the intersection of the solution set and an open
ball in the ambient space ($\mathbb{R}^3$ in this context). The preceding example is the natural extension of
the unit-circle example to $\mathbb{R}^3$. A similar result is available for the $(n-1)$-sphere in $\mathbb{R}^n$. I hope you get
the point of the example: if we have one equation and we wish to solve for a particular variable in
terms of the remaining variables then all we need is continuous differentiability of the level function
and a nonzero partial derivative at the point where we wish to find the solution. Now, the implicit
function theorem doesn’t find the solution for us, but it does provide the existence. In the section
that follows, existence is really all we need since we focus our attention on rates of change rather than
actual solutions to the level set equation.

Example 4.2.4. Consider the equation $e^{xy} + z^3 - xyz = 2$. Can we solve this equation for
$z = z(x, y)$ near $(0, 0, 1)$? Let $G(x, y, z) = e^{xy} + z^3 - xyz$ and note $G(0, 0, 1) = e^0 + 1 + 0 = 2$ hence
$(0, 0, 1)$ is a point on the solution set $G(x, y, z) = 2$. Note $G$ is clearly continuously differentiable
and
\[ G_z(x, y, z) = 3z^2 - xy \ \Rightarrow \ G_z(0, 0, 1) = 3 \neq 0 \]
therefore, there exists a continuously differentiable function $h : \mathrm{dom}(h) \subseteq \mathbb{R}^2 \to \mathbb{R}$ which solves
$G(x, y, h(x, y)) = 2$ for $(x, y)$ near $(0, 0)$ and $h(0, 0) = 1$.

I’ll not attempt an explicit solution for the last example.
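
Although no closed form is attempted, the successive approximations from the implicit function
theorem give numerical values of $h$. The sketch below is an illustration only: it freezes
$G_z(0, 0, 1) = 3$ and subtracts the level value 2 so that a fixed point of the iteration satisfies
$G = 2$. The function names, the sample points and the step count are assumptions of mine.

```python
import math

def G(x, y, z):
    """Level function of the example: G(x, y, z) = exp(xy) + z^3 - xyz."""
    return math.exp(x * y) + z ** 3 - x * y * z

def h_approx(x, y, steps=40):
    """Iterate h_{n+1} = h_n - [G_z(0,0,1)]^{-1} (G(x, y, h_n) - 2), starting from h_0 = 1.

    The derivative G_z(0, 0, 1) = 3 is evaluated once at the base point and never updated.
    """
    z = 1.0                                   # h_0 is the z-coordinate of the base point
    for _ in range(steps):
        z = z - (G(x, y, z) - 2.0) / 3.0
    return z

if __name__ == "__main__":
    for (x, y) in ((0.0, 0.0), (0.1, 0.2), (0.3, -0.2)):
        z = h_approx(x, y)
        print(f"h({x}, {y}) ~ {z:.10f}   residual G - 2 = {G(x, y, z) - 2.0:.2e}")
```

For $(x, y)$ close to $(0, 0)$ the residual drops to rounding error within a handful of steps; the
iteration is only expected to behave this well near the base point.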

Example 4.2.5. Let $(x, y, z) \in S$ iff $x + y + z = 2$ and $y + z = 1$. Problem: For which
variable(s) can we solve? Solution: define $G(x, y, z) = (x + y + z, y + z)$; we wish to study
$G(x, y, z) = (2, 1)$. Notice the solution set is not empty since $G(1, 0, 1) = (1 + 0 + 1, 0 + 1) = (2, 1)$.
Moreover, $G$ is continuously differentiable. In this case we have two equations and three unknowns
so we expect two variables can be written in terms of the remaining free variable. Let’s examine
the derivative of $G$:
\[ G'(x, y, z) = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \]

Suppose we wish to solve $x = x(z)$ and $y = y(z)$; then we should check invertibility of
\[ \frac{\partial G}{\partial (x, y)} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]

The matrix above is invertible hence the implicit function theorem applies and we can solve for $x$
and $y$ as functions of $z$. On the other hand, if we tried to solve for $y = y(x)$ and $z = z(x)$ then
we’ll get no help from the implicit function theorem as the matrix
\[ \frac{\partial G}{\partial (y, z)} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]

is not invertible. Geometrically, we can understand these results from noting that $G(x, y, z) = (2, 1)$
is the intersection of the planes $x + y + z = 2$ and $y + z = 1$. Substituting $y + z = 1$ into $x + y + z = 2$
yields $x + 1 = 2$ hence $x = 1$ on the line of intersection. We can hardly use $x$ as a free variable for
the solution when the problem fixes $x$ from the outset.

⁶ if you consider $G(x, y, z) = R^2$ as a space then the open sets on the space are taken to be the intersection with
the space and open balls in $\mathbb{R}^3$. This is called the subspace topology in topology courses.

The method I just used to analyze the equations in the preceding example was a bit ad hoc. In
linear algebra we do much better for systems of linear equations. A procedure called Gaussian
elimination naturally reduces a system of equations to a form in which it is manifestly obvious how
to eliminate redundant variables in terms of a minimal set of basic free variables. The “$y$” of the
implicit function proof discussions plays the role of the so-called pivotal variables whereas the
“$x$” plays the role of the remaining free variables. These variables are generally intermingled in
the list of total variables so to reproduce the pattern assumed for the implicit function theorem we
would need to relabel variables from the outset of a calculation. The calculations in the examples
that follow are not usually possible. Linear equations are particularly nice and basically what I’m
doing is following the guide of the linearization derivation in the context of specific examples.

Example 4.2.6. XXX

Example 4.2.7. XXX

Example 4.2.8. XXX



4.3 implicit differentiation


Enough theory, let’s calculate. In this section I apply previous theoretical constructions to specific
problems. I also introduce standard notation for ”constrained” partial differentiation which is
also sometimes called ”partial differentiation with a side condition”. The typical problem is the
following: given equations:

𝐺1 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 𝑘1
𝐺2 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 𝑘2
..
.
𝐺𝑛 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 𝑘𝑛

calculate partial derivatives of dependent variables with respect to independent variables. Continuing
with the notation of the implicit function discussion we’ll assume that $y$ will be dependent
on $x$. I want to recast some of our arguments via differentials⁷. Take the total differential of each
equation above,

𝑑𝐺1 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 0
𝑑𝐺2 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 0
..
.
𝑑𝐺𝑛 (𝑥1 , . . . , 𝑥𝑚 , 𝑦1 , . . . , 𝑦𝑛 ) = 0

Hence,

∂𝑥1 𝐺1 𝑑𝑥1 + ⋅ ⋅ ⋅ + ∂𝑥𝑚 𝐺1 𝑑𝑥𝑚 + ∂𝑦1 𝐺1 𝑑𝑦1 + ⋅ ⋅ ⋅ + ∂𝑦𝑛 𝐺1 𝑑𝑦𝑛 = 0


∂𝑥1 𝐺2 𝑑𝑥1 + ⋅ ⋅ ⋅ + ∂𝑥𝑚 𝐺2 𝑑𝑥𝑚 + ∂𝑦1 𝐺2 𝑑𝑦1 + ⋅ ⋅ ⋅ + ∂𝑦𝑛 𝐺2 𝑑𝑦𝑛 = 0
..
.
∂𝑥1 𝐺𝑛 𝑑𝑥1 + ⋅ ⋅ ⋅ + ∂𝑥𝑚 𝐺𝑛 𝑑𝑥𝑚 + ∂𝑦1 𝐺𝑛 𝑑𝑦1 + ⋅ ⋅ ⋅ + ∂𝑦𝑛 𝐺𝑛 𝑑𝑦𝑛 = 0

Notice, this can be nicely written in column vector notation as:

\[ \partial_{x_1}G \, dx_1 + \cdots + \partial_{x_m}G \, dx_m + \partial_{y_1}G \, dy_1 + \cdots + \partial_{y_n}G \, dy_n = 0 \]

Or, in matrix notation:

\[ [\partial_{x_1}G | \cdots | \partial_{x_m}G] \begin{bmatrix} dx_1 \\ \vdots \\ dx_m \end{bmatrix} + [\partial_{y_1}G | \cdots | \partial_{y_n}G] \begin{bmatrix} dy_1 \\ \vdots \\ dy_n \end{bmatrix} = 0 \]

Finally, solve for $dy$; we assume $[\partial_{y_1}G | \cdots | \partial_{y_n}G]^{-1}$ exists,

\[ \begin{bmatrix} dy_1 \\ \vdots \\ dy_n \end{bmatrix} = -[\partial_{y_1}G | \cdots | \partial_{y_n}G]^{-1} [\partial_{x_1}G | \cdots | \partial_{x_m}G] \begin{bmatrix} dx_1 \\ \vdots \\ dx_m \end{bmatrix} \]
Given all of this we can calculate $\frac{\partial y_i}{\partial x_j}$ by simply reading the coefficient of $dx_j$ in the $i$-th row. I will
make this idea quite explicit in the examples that follow.

⁷ in contrast, in the previous section we mostly used derivative notation

Example 4.3.1. Let’s return to a common calculus III problem. Suppose $F(x, y, z) = k$ for some
constant $k$. Find partial derivatives of $x$, $y$ or $z$ with respect to the remaining variables.
Solution: I’ll use the method of differentials once more:

\[ dF = F_x \, dx + F_y \, dy + F_z \, dz = 0 \]

We can solve for $dx$, $dy$ or $dz$ provided $F_x$, $F_y$ or $F_z$ is nonzero respectively and these differential
expressions reveal various partial derivatives of interest:
\[ dx = -\frac{F_y}{F_x} dy - \frac{F_z}{F_x} dz \ \Rightarrow \ \frac{\partial x}{\partial y} = -\frac{F_y}{F_x} \ \ \& \ \ \frac{\partial x}{\partial z} = -\frac{F_z}{F_x} \]

\[ dy = -\frac{F_x}{F_y} dx - \frac{F_z}{F_y} dz \ \Rightarrow \ \frac{\partial y}{\partial x} = -\frac{F_x}{F_y} \ \ \& \ \ \frac{\partial y}{\partial z} = -\frac{F_z}{F_y} \]

\[ dz = -\frac{F_x}{F_z} dx - \frac{F_y}{F_z} dy \ \Rightarrow \ \frac{\partial z}{\partial x} = -\frac{F_x}{F_z} \ \ \& \ \ \frac{\partial z}{\partial y} = -\frac{F_y}{F_z} \]
In each case above, the implicit function theorem allows us to solve for one variable in terms of the
remaining two. If the partial derivative of $F$ in the denominator is zero then the implicit function
theorem does not apply and other thoughts are required. Often calculus texts give the following as a
homework problem:
\[ \frac{\partial x}{\partial y} \frac{\partial y}{\partial z} \frac{\partial z}{\partial x} = -\frac{F_y}{F_x} \frac{F_z}{F_y} \frac{F_x}{F_z} = -1. \]
In the equation above we have $x$ appear as a dependent variable on $y, z$ and also as an independent
variable for the dependent variable $z$. These mixed expressions are actually of interest to engineering
and physics. The less ambiguous notation below helps better handle such expressions:
\[ \left( \frac{\partial x}{\partial y} \right)_z \left( \frac{\partial y}{\partial z} \right)_x \left( \frac{\partial z}{\partial x} \right)_y = -1. \]

In each part of the expression we have clearly denoted which variables are taken to depend on the
others and in turn what sort of partial derivative we mean to indicate. Partial derivatives are not
taken alone; they must be done in concert with an understanding of the totality of the independent
variables for the problem. We hold all the remaining independent variables fixed as we take a partial
derivative.
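
The cyclic identity is easy to test numerically when each variable can be solved for explicitly.
The following is a hedged sketch: the level function $F(x, y, z) = x + y^2 + z^3$, the level value
$K$ and the sample point $(1, 2, 1.5)$ are hypothetical choices made for illustration, not taken
from the notes.

```python
K = 8.375   # level value chosen so that (1, 2, 1.5) lies on x + y^2 + z^3 = K

def x_of(y, z):
    return K - y ** 2 - z ** 3               # x solved in terms of (y, z)

def y_of(x, z):
    return (K - x - z ** 3) ** 0.5           # positive branch of y in terms of (x, z)

def z_of(x, y):
    return (K - x - y ** 2) ** (1.0 / 3.0)   # z in terms of (x, y), valid while K - x - y^2 > 0

def central_diff(f, t, h=1e-6):
    """Central-difference derivative of a one-variable function f at t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def cyclic_product(x0, y0, z0):
    """(dx/dy at fixed z) * (dy/dz at fixed x) * (dz/dx at fixed y) at the given point."""
    dx_dy = central_diff(lambda y: x_of(y, z0), y0)
    dy_dz = central_diff(lambda z: y_of(x0, z), z0)
    dz_dx = central_diff(lambda x: z_of(x, y0), x0)
    return dx_dy * dy_dz * dz_dx

if __name__ == "__main__":
    print(cyclic_product(1.0, 2.0, 1.5))     # close to -1, not +1
```

Each factor holds a different variable fixed, which is exactly what the subscript notation records.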

The explicit independent variable notation is more important for problems where we can choose
more than one set of independent variables for a given dependent variable. In the example that
follows we study $w = w(x, y)$ but we could just as well consider $w = w(x, z)$. Generally it will not
be the case that $\left( \frac{\partial w}{\partial x} \right)_y$ is the same as $\left( \frac{\partial w}{\partial x} \right)_z$. In calculation of $\left( \frac{\partial w}{\partial x} \right)_y$ we hold $y$ constant as we
vary $x$ whereas in $\left( \frac{\partial w}{\partial x} \right)_z$ we hold $z$ constant as we vary $x$. There is no reason these ought to be the
same⁸.
Example 4.3.2. Suppose 𝑥+𝑦+𝑧+𝑤 = 3 and 𝑥2 −2𝑥𝑦𝑧+𝑤3 = 5. Calculate partial derivatives
of 𝑧 and 𝑤 with respect to the independent variables 𝑥, 𝑦. Solution: we begin by calculation
of the differentials of both equations:

𝑑𝑥 + 𝑑𝑦 + 𝑑𝑧 + 𝑑𝑤 = 0
(2𝑥 − 2𝑦𝑧)𝑑𝑥 − 2𝑥𝑧𝑑𝑦 − 2𝑥𝑦𝑑𝑧 + 3𝑤2 𝑑𝑤 = 0

We can solve for (𝑑𝑧, 𝑑𝑤). In this calculation we can treat the differentials as formal variables.

𝑑𝑧 + 𝑑𝑤 = −𝑑𝑥 − 𝑑𝑦
−2𝑥𝑦𝑑𝑧 + 3𝑤2 𝑑𝑤 = −(2𝑥 − 2𝑦𝑧)𝑑𝑥 + 2𝑥𝑧𝑑𝑦

I find matrix notation is often helpful,


[ ][ ] [ ]
1 1 𝑑𝑧 −𝑑𝑥 − 𝑑𝑦
=
−2𝑥𝑦 3𝑤2 𝑑𝑤 −(2𝑥 − 2𝑦𝑧)𝑑𝑥 + 2𝑥𝑧𝑑𝑦

Use Kramer’s rule, multiplication by inverse, substitution, adding/subtracting equations etc... what-
ever technique of solving linear equations you prefer. Our goal is to solve for 𝑑𝑧 and 𝑑𝑤 in terms
of 𝑑𝑥 and 𝑑𝑦. I’ll use Kramer’s rule this time:
[ ]
−𝑑𝑥 − 𝑑𝑦 1
𝑑𝑒𝑡
−(2𝑥 − 2𝑦𝑧)𝑑𝑥 + 2𝑥𝑧𝑑𝑦 3𝑤2 3𝑤2 (−𝑑𝑥 − 𝑑𝑦) + (2𝑥 − 2𝑦𝑧)𝑑𝑥 − 2𝑥𝑧𝑑𝑦
𝑑𝑧 = =
3𝑤2 + 2𝑥𝑦
[ ]
1 1
𝑑𝑒𝑡
−2𝑥𝑦 3𝑤2

Collecting terms,
−3𝑤2 + 2𝑥 − 2𝑦𝑧 −3𝑤2 − 2𝑥𝑧
( ) ( )
𝑑𝑧 = 𝑑𝑥 + 𝑑𝑦
3𝑤2 + 2𝑥𝑦 3𝑤2 + 2𝑥𝑦
From the expression above we can read various implicit derivatives,

−3𝑤2 + 2𝑥 − 2𝑦𝑧 −3𝑤2 − 2𝑥𝑧


( ) ( )
∂𝑧 ∂𝑧
= & =
∂𝑥 𝑦 3𝑤2 + 2𝑥𝑦 ∂𝑦 𝑥 3𝑤2 + 2𝑥𝑦

The notation above indicates that $z$ is understood to be a function of independent variables $x, y$.
$\left(\frac{\partial z}{\partial x}\right)_y$ means we take the derivative of $z$ with respect to $x$ while holding $y$ fixed. The appearance
of the dependent variable $w$ can be removed by using the equations $G(x, y, z, w) = (3, 5)$. Similar
ambiguities exist for implicit differentiation in calculus I.

8 A good exercise would be to do the example over but instead aim to calculate partial derivatives for $y, w$ with
respect to independent variables $x, z$.

Apply Cramer’s rule once more to solve for $dw$:
\[
dw = \frac{\det\begin{bmatrix} 1 & -dx - dy \\ -2xy & -(2x - 2yz)\,dx + 2xz\,dy \end{bmatrix}}
          {\det\begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix}}
   = \frac{-(2x - 2yz)\,dx + 2xz\,dy - 2xy(dx + dy)}{3w^2 + 2xy}
\]

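Collecting terms,

\[
dw = \left(\frac{-2x + 2yz - 2xy}{3w^2 + 2xy}\right) dx + \left(\frac{2xz - 2xy}{3w^2 + 2xy}\right) dy
\]

We can read the following from the differential above:

\[
\left(\frac{\partial w}{\partial x}\right)_y = \frac{-2x + 2yz - 2xy}{3w^2 + 2xy}
\qquad \& \qquad
\left(\frac{\partial w}{\partial y}\right)_x = \frac{2xz - 2xy}{3w^2 + 2xy}
\]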

You should ask: where did we use the implicit function theorem in the preceding example? Notice
our underlying hope is that we can solve for $z = z(x, y)$ and $w = w(x, y)$. The implicit function
theorem guarantees this is possible locally when
$\frac{\partial G}{\partial (z,w)} = \begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix}$ is nonsingular. Interestingly
this is the same matrix we must consider to isolate $dz$ and $dw$. The calculations of the example
are only meaningful if $\det\begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix} = 3w^2 + 2xy \neq 0$. In such a case the implicit function theorem
applies and it is reasonable to suppose $z, w$ can be written as functions of $x, y$.
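
The calculation of Example 4.3.2 can be checked mechanically. What follows is only a minimal sketch, not part of the original notes: the helper name `implicit_partials` and the test point $(2, 0, 0, 1)$ are my own choices for illustration (the point was picked because it satisfies both constraint equations). The code solves the same $2 \times 2$ system for $(dz, dw)$ by Cramer’s rule using exact rational arithmetic.

```python
from fractions import Fraction


def implicit_partials(x, y, z, w):
    """Return (dz/dx, dz/dy, dw/dx, dw/dy) for Example 4.3.2 (hypothetical helper)."""
    x, y, z, w = (Fraction(v) for v in (x, y, z, w))
    # Coefficient matrix dG/d(z, w) = [[1, 1], [-2xy, 3w^2]].
    a11, a12 = Fraction(1), Fraction(1)
    a21, a22 = -2 * x * y, 3 * w**2
    det = a11 * a22 - a12 * a21  # equals 3w^2 + 2xy
    if det == 0:
        raise ValueError("dG/d(z, w) is singular at this point")

    def cramer(b1, b2):
        # Solve [[a11, a12], [a21, a22]] (u, v)^T = (b1, b2)^T by Cramer's rule.
        return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

    # Right-hand-side columns: the coefficients of dx and of dy respectively.
    dz_dx, dw_dx = cramer(Fraction(-1), -(2 * x - 2 * y * z))
    dz_dy, dw_dy = cramer(Fraction(-1), 2 * x * z)
    return dz_dx, dz_dy, dw_dx, dw_dy


# (2, 0, 0, 1) satisfies x + y + z + w = 3 and x^2 - 2xyz + w^3 = 5.
print(implicit_partials(2, 0, 0, 1))
```

At a point where $3w^2 + 2xy = 0$ the helper raises an error, which mirrors the remark above: the computation is only meaningful where the matrix is nonsingular.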

Example 4.3.3. Suppose the temperature in a room is given by $T(x, y, z) = 70 + 10\exp(-x^2 - y^2)$.

Find how the temperature varies on a sphere $x^2 + y^2 + z^2 = R^2$. We can choose any one
variable from (𝑥, 𝑦, 𝑧) and write it as a function of the remaining two on the sphere. However, we
do need to a
Chapter 5

geometry of level sets

Our goal in this chapter is to develop a few tools to analyze the geometry of solution sets to equa-
tion(s) in ℝ𝑛 . These solution sets are commonly called level sets. I assume the reader is already
familiar with the concept of level curves and surfaces from multivariate calculus. We go much fur-
ther in this chapter. Our goal is to describe the tangent and normal spaces for a 𝑝-dimensional level
set in ℝ𝑛 . The dimension of the level set is revealed by its tangent space and we discuss conditions
which are sufficient to ensure the invariance of this dimension over the entirety of the level set. In
contrast, the dimension of the normal space to a 𝑝-dimensional level set in ℝ𝑛 is 𝑛 − 𝑝. The theory
of orthogonal complements is borrowed from linear algebra to help understand how all of this fits
together at a given point on the level set. Finally, we use this geometry and a few simple lemmas
to justify the method of Lagrange multipliers. Lagrange’s technique and the theory of multivariate
Taylor polynomials form the basis for analyzing extrema for multivariate functions. In short, this
chapter deals with the question of extrema on the edges of a set whereas the next chapter deals
with the interior point via the theory of quadratic forms applied to the second-order approximation
to a function of several variables. Finally, we should mention that 𝑝-dimensional level sets provide
examples of 𝑝-dimensional manifolds, however, we defer careful discussion of manifolds for a later
chapter.

5.1 definition of level set


A level set is the solution set of some equation or system of equations. We confine our interest to
level sets of ℝ𝑛 . For example, the set of all (𝑥, 𝑦) that satisfy

𝐺(𝑥, 𝑦) = 𝑘

is called a level curve in ℝ2 . Often we can use 𝑘 to label the curve. You should also recall level
surfaces in ℝ3 are defined by an equation of the form

𝐺(𝑥, 𝑦, 𝑧) = 𝑘.


The set of all $(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$ which solve $G(x_1, x_2, x_3, x_4) = k$ is a level volume in $\mathbb{R}^4$. We
can obtain lower dimensional objects by imposing several equations at once. For
example, suppose $G_1(x, y, z) = z = 1$ and $G_2(x, y, z) = x^2 + y^2 + z^2 = 5$; points $(x, y, z)$ which solve
both of these equations are on the intersection of the plane $z = 1$ and the sphere $x^2 + y^2 + z^2 = 5$.
Let $G = (G_1, G_2)$ and note that $G(x, y, z) = (1, 5)$ describes a circle in $\mathbb{R}^3$, namely $x^2 + y^2 = 4$ in the plane $z = 1$. More generally:

Definition 5.1.1.
Suppose 𝐺 : 𝑑𝑜𝑚(𝐺) ⊆ ℝ𝑛 → ℝ𝑝 . Let 𝑘 be a vector of constants in ℝ𝑝 and suppose
𝑆 = {𝑥 ∈ ℝ𝑛 ∣ 𝐺(𝑥) = 𝑘} is non-empty and 𝐺 is continuously differentiable on an open
set containing 𝑆. We say 𝑆 is an (𝑛 − 𝑝)-dimensional level set iff 𝐺′ (𝑥) has 𝑝 linearly
independent rows at each 𝑥 ∈ 𝑆.
The condition of linear independence of the rows is given to eliminate possible redundancy in the
system of equations. In the case that $p = 1$ the criterion reduces to the condition that the level function has
$G'(x) \neq 0$ at every point of the level set, which then has dimension $n - 1$. Intuitively we think of each equation in $G(x) = k$
as removing one of the dimensions of the ambient space $\mathbb{R}^n$. It is worthwhile to cite a useful result
from linear algebra at this point:

Proposition 5.1.2.

Let 𝐴 ∈ ℝ 𝑚×𝑛 . The number of linearly independent columns in 𝐴 is the same as the
number of linearly independent rows in 𝐴. This invariant of 𝐴 is called the rank of 𝐴.
Given the wisdom of linear algebra we see that we should require a (𝑛 − 𝑝)-dimensional level set
𝑆 = 𝐺−1 (𝑘) to have a level function 𝐺 : ℝ𝑛 → ℝ𝑝 whose derivative is of rank 𝑝 over all of 𝑆. We
can either analyze linear independence of columns or rows.

Example 5.1.3. Consider $G(x, y, z) = x^2 + y^2 - z^2$ and suppose $S = G^{-1}\{0\}$. Calculate,

\[ G'(x, y, z) = [2x, 2y, -2z] \]

Notice that $(0, 0, 0) \in S$ and $G'(0, 0, 0) = [0, 0, 0]$ hence $G'$ is not rank one at the origin. At all
other points in $S$ we have $G'(x, y, z) \neq 0$ which means this is almost a $(3 - 1) = 2$-dimensional
level set. However, almost is not good enough in math. Under our definition the cone $S$ is not a
2-dimensional level set since it fails to meet the full-rank criterion at the vertex of the cone.

Example 5.1.4. Let $G(x, y, z) = (x, y)$ and define $S = G^{-1}(a, b)$ for some fixed pair of constants
$a, b \in \mathbb{R}$. We calculate that $G'(x, y, z) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \in \mathbb{R}^{2 \times 3}$. The two rows are linearly independent, so we have rank two at all points in $S$
hence $S$ is a $(3 - 2) = 1$-dimensional level set. Perhaps you realize $S$ is the vertical line which passes
through $(a, b, 0)$ in the $xy$-plane.
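
The rank condition in the two examples above is easy to test numerically. The following is a small sketch under my own assumptions: the function names `rank` and `cone_jacobian` are hypothetical, and the sample point $(3, 4, 5)$ was chosen only because it lies on the cone. The rank is computed by Gaussian elimination with exact fractions.

```python
from fractions import Fraction


def rank(matrix):
    """Row rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cone_jacobian(x, y, z):
    # G(x, y, z) = x^2 + y^2 - z^2, so G'(x, y, z) = [2x, 2y, -2z].
    return [[2 * x, 2 * y, -2 * z]]


print(rank(cone_jacobian(0, 0, 0)))   # rank at the vertex of the cone
print(rank(cone_jacobian(3, 4, 5)))   # (3, 4, 5) is on the cone: 9 + 16 - 25 = 0
print(rank([[1, 0, 0], [0, 1, 0]]))   # Jacobian of G(x, y, z) = (x, y)
```

The first rank falls short of $p = 1$, which is exactly why the cone fails the definition at its vertex, while the other two cases have full rank.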

5.2 tangents and normals to a level set


There are many ways to define a tangent space for some subset of ℝ𝑛 . One natural definition is
that the tangent space to 𝑝 ∈ 𝑆 is simply the set of all tangent vectors to curves on 𝑆 which pass
through the point 𝑝. In this section we study the geometry of curves on a level-set. We’ll see how
the tangent space is naturally a vector space in the particular context of level-sets in ℝ𝑛 .

Throughout this section we assume that $S$ is a $k$-dimensional level set defined by $G : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}^p$
where $G^{-1}(c) = S$. This means that we can apply the implicit function theorem to $S$: for
any given point $p = (p_x, p_y) \in S$, where $p_x \in \mathbb{R}^k$ and $p_y \in \mathbb{R}^p$, there exists a local continuously
differentiable solution $h : U \subseteq \mathbb{R}^k \to \mathbb{R}^p$ such that $h(p_x) = p_y$ and for all $x \in U$ we have $G(x, h(x)) = c$.
(If necessary we relabel the variables so that the $p$ columns of $G'(p)$ which correspond to $y$ are linearly independent.)
We can view $G(x, y) = c$ for $x$ near $p_x$ as the graph of $y = h(x)$ for $x \in U$. With the set-up above
in mind, suppose that $\gamma : \mathbb{R} \to S$ is a differentiable curve. If we write $\gamma = (\gamma_x, \gamma_y)$ then it follows $\gamma = (\gamma_x, h \circ \gamma_x)$
over the subset $U \times h(U)$ of $S$. More explicitly, for all $t \in \mathbb{R}$ such that $\gamma(t) \in U \times h(U)$ we have
$\gamma(t) = (\gamma_x(t), h(\gamma_x(t)))$. Therefore, if $\gamma(0) = p$ then $\gamma(0) = (p_x, h(p_x))$. Differentiate, using the
chain rule in the second factor, to obtain $\gamma'(t) = (\gamma_x'(t), h'(\gamma_x(t))\gamma_x'(t))$. We find that the tangent
vector to $\gamma$ at $p \in S$ has a rather special form which was forced on us by the implicit function
theorem: $\gamma'(0) = (\gamma_x'(0), h'(p_x)\gamma_x'(0))$. Or to cut through the notation a bit, if $\gamma'(0) = v = (v_x, v_y)$
then $v = (v_x, h'(p_x)v_x)$. The second component of the vector is not free of the first; it is essentially
redundant. This makes us suspect that the tangent space to $S$ at $p$ is $k$-dimensional.

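Theorem 5.2.1.

Let $G : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}^p$ be a level-mapping which defines a $k$-dimensional level set $S$
by $G^{-1}(c) = S$. Suppose $\gamma_1, \gamma_2 : \mathbb{R} \to S$ are differentiable curves with $\gamma_1(0) = \gamma_2(0) = p$, $\gamma_1'(0) = v_1$ and
$\gamma_2'(0) = v_2$. Then there exists a differentiable curve $\gamma : \mathbb{R} \to S$ such that $\gamma'(0) = v_1 + v_2$ and
$\gamma(0) = p$. Moreover, for any real scalar multiple $cv_1$ there exists a differentiable curve $\beta : \mathbb{R} \to S$ such that $\beta'(0) = cv_1$ and
$\beta(0) = p$.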
Proof: It is convenient to define a map which gives a local parametrization of $S$ at $p$. Since
we have a description of $S$ locally as a graph $y = h(x)$ (near $p$) it is simple to construct the
parametrization. Define $\Phi : U \subseteq \mathbb{R}^k \to S$ by $\Phi(x) = (x, h(x))$. Clearly $\Phi(U) = U \times h(U)$ and
the inverse mapping $\Phi^{-1}(x, y) = x$ is well-defined since $y = h(x)$ for each $(x, y) \in U \times h(U)$.
Let $w \in \mathbb{R}^k$ and observe that

\[ \psi(t) = \Phi(\Phi^{-1}(p) + tw) = \Phi(p_x + tw) = (p_x + tw, h(p_x + tw)) \]

is a curve in $S$ (defined for $t$ small enough that $p_x + tw \in U$) such that $\psi(0) = (p_x, h(p_x)) = (p_x, p_y) = p$ and using the chain rule on
the final form of $\psi(t)$:
\[ \psi'(0) = (w, h'(p_x)w). \]
The construction above shows that any vector of the form $(v_x, h'(p_x)v_x)$ is the tangent vector of a
particular differentiable curve in the level set (differentiability of $\psi$ follows from the differentiability
of $h$ and the other maps which we used to construct $\psi$). In particular we can apply this to the
case $w = v_{1x} + v_{2x}$ and we find $\gamma(t) = \Phi(\Phi^{-1}(p) + t(v_{1x} + v_{2x}))$ has $\gamma'(0) = v_1 + v_2$ and $\gamma(0) = p$.

Likewise, apply the construction to the case 𝑤 = 𝑐𝑣1𝑥 to write 𝛽(𝑡) = Φ(Φ−1 (𝑝) + 𝑡(𝑐𝑣1𝑥 )) with
𝛽 ′ (0) = 𝑐𝑣1 and 𝛽(0) = 𝑝. □
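
To make the construction in the proof concrete, here is a minimal numerical sketch on the unit circle $x^2 + y^2 = 1$ (so $k = p = 1$). Everything specific here is my own assumption for illustration: the base point $(0.6, 0.8)$, the choice $w = 2$, and the helper names. The code builds the curve $\psi(t) = (p_x + tw, h(p_x + tw))$ and compares a finite-difference estimate of $\psi'(0)$ with the predicted tangent $(w, h'(p_x)w)$.

```python
import math


def h(x):
    # Local solution of x^2 + y^2 = 1 for y, valid near points with y > 0.
    return math.sqrt(1.0 - x * x)


def h_prime(x):
    return -x / math.sqrt(1.0 - x * x)


def psi(t, p_x, w):
    # psi(t) = Phi(p_x + t w) = (p_x + t w, h(p_x + t w))
    x = p_x + t * w
    return (x, h(x))


def central_difference(curve, eps=1e-6):
    # Numerical approximation of curve'(0).
    forward, backward = curve(eps), curve(-eps)
    return tuple((f - b) / (2 * eps) for f, b in zip(forward, backward))


p_x, w = 0.6, 2.0  # base point p = (0.6, 0.8) on the circle, chosen for illustration
numeric = central_difference(lambda t: psi(t, p_x, w))
predicted = (w, h_prime(p_x) * w)  # the form (w, h'(p_x) w) from the proof
print(numeric, predicted)
```

The two printed pairs should agree to several decimal places, and the predicted tangent is orthogonal to the radius vector $(0.6, 0.8)$ as one expects for a circle.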

The idea of the proof is encapsulated in the picture below. This idea of mapping lines in a flat
domain to obtain standard curves in a curved domain is an idea which plays over and over as you
study manifold theory. The particular redundancy of the $x$ and $y$ sub-vectors is special to the
discussion of level-sets; however, anytime we have a local parametrization we’ll be able to construct
curves with tangents of our choosing by essentially the same construction. In fact, there are
infinitely many curves which produce a particular tangent vector in the tangent space of a manifold.

XXX - read this section again for improper, premature use of the term ”manifold”

Theorem 5.2.1 shows that the definition given below is logical. In particular, it is not at all obvious
that the sum of two tangent vectors ought to again be a tangent vector. However, that is just what
Theorem 5.2.1 told us for level-sets$^{1}$.

Definition 5.2.2.

Suppose 𝑆 is a 𝑘-dimensional level-set defined by 𝑆 = 𝐺−1 {𝑐} for 𝐺 : ℝ𝑘 × ℝ𝑝 → ℝ𝑝 . We


define the tangent space at 𝑝 ∈ 𝑆 to be the set of pairs:

𝑇𝑝 𝑆 = {(𝑝, 𝑣) ∣ there exists differentiable 𝛾 : ℝ → 𝑆 and 𝛾(0) = 𝑝 where 𝑣 = 𝛾 ′ (0)}

Moreover, we define (i.) addition and (ii.) scalar multiplication of vectors by the rules

(𝑖.) (𝑝, 𝑣1 ) + (𝑝, 𝑣2 ) = (𝑝, 𝑣1 + 𝑣2 ) (𝑖𝑖.) 𝑐(𝑝, 𝑣1 ) = (𝑝, 𝑐𝑣1 )

for all (𝑝, 𝑣1 ), (𝑝, 𝑣2 ) ∈ 𝑇𝑝 𝑆 and 𝑐 ∈ ℝ.


When I picture $T_pS$ in my mind I think of vectors pointing out from the base-point $p$. To make
an explicit connection between the pairs of the above definition and the classical geometric form
of the tangent space we simply take the image of $T_pS$ under the mapping $\Psi(x, y) = x + y$, thus
$\Psi(T_pS) = \{p + v \mid (p, v) \in T_pS\}$. I often picture $T_pS$ as $\Psi(T_pS)$.$^{2}$

1
technically, there is another logical gap which I currently ignore. I wonder if you can find it.
2
In truth, as you continue to study manifold theory you’ll find at least three seemingly distinct objects which are
all called ”tangent vectors”; equivalence classes of curves, derivations, contravariant tensors.

We could set out to calculate tangent spaces in view of the definition above, but we are actually
interested in more than just the tangent space for a level-set. In particular, we want a concrete
description of all the vectors which are not in the tangent space.
Definition 5.2.3.
Suppose 𝑆 is a 𝑘-dimensional level-set defined by 𝑆 = 𝐺−1 {𝑐} for 𝐺 : ℝ𝑘 × ℝ𝑝 → ℝ𝑝 and
𝑇𝑝 𝑆 is the tangent space at 𝑝. Note that 𝑇𝑝 𝑆 ≤ 𝑉𝑝 where 𝑉𝑝 = {𝑝} × ℝ𝑘 × ℝ𝑝 is given the
natural vector space structure which we already exhibited on the subspace 𝑇𝑝 𝑆. We define
the inner product on 𝑉𝑝 as follows: for all (𝑝, 𝑣), (𝑝, 𝑤) ∈ 𝑉𝑝 ,

(𝑝, 𝑣) ⋅ (𝑝, 𝑤) = 𝑣 ⋅ 𝑤.

The length of a vector (𝑝, 𝑣) is naturally defined by ∣∣(𝑝, 𝑣)∣∣ = ∣∣𝑣∣∣. Moreover, we say two
vectors (𝑝, 𝑣), (𝑝, 𝑤) ∈ 𝑉𝑝 are orthogonal iff 𝑣 ⋅ 𝑤 = 0. Given a set of vectors 𝑅 ⊆ 𝑉𝑝 we
define the orthogonal complement by

\[ R^{\perp} = \{(p, v) \in V_p \mid (p, v) \cdot (p, r) = 0 \text{ for all } (p, r) \in R\}. \]

Suppose 𝑊1 , 𝑊2 ⊆ 𝑉𝑝 then we say 𝑊1 is orthogonal to 𝑊2 iff 𝑤1 ⋅ 𝑤2 = 0 for all 𝑤1 ∈ 𝑊1


and 𝑤2 ∈ 𝑊2 . We denote orthogonality by writing 𝑊1 ⊥ 𝑊2 . If every 𝑣 ∈ 𝑉𝑝 can be written
as 𝑣 = 𝑤1 + 𝑤2 for a pair of 𝑤1 ∈ 𝑊1 and 𝑤2 ∈ 𝑊2 where 𝑊1 ⊥ 𝑊2 then we say that 𝑉𝑝 is
the direct sum of 𝑊1 and 𝑊2 which is denoted by 𝑉𝑝 = 𝑊1 ⊕ 𝑊2 .
There is much more to say about orthogonality; however, our focus is not in that vein. We just
need the language to properly define the normal space. The calculation below is probably the most
important calculation to understand for a level-set. Suppose we have a curve $\gamma : \mathbb{R} \to S$ where
$S = G^{-1}(c)$ is a $k$-dimensional level-set in $\mathbb{R}^k \times \mathbb{R}^p$. Observe that for all $t \in \mathbb{R}$,

𝐺(𝛾(𝑡)) = 𝑐 ⇒ 𝐺′ (𝛾(𝑡))𝛾 ′ (𝑡) = 0.

In particular, suppose for 𝑡 = 0 we have 𝛾(0) = 𝑝 and 𝑣 = 𝛾 ′ (0) which makes (𝑝, 𝑣) ∈ 𝑇𝑝 𝑆 with

𝐺′ (𝑝)𝑣 = 0.

Recall $G : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}^p$ has a $p \times n$ derivative matrix, with $n = k + p$, where the $j$-th row is the gradient vector
of the $j$-th component function. The equation $G'(p)v = 0$ gives us $p$ independent equations as
we examine it componentwise. In particular, it reveals that $(p, v)$ is orthogonal to $\nabla G_j(p)$ for
$j = 1, 2, \dots, p$. We have derived the following theorem:
Theorem 5.2.4.
Let $G : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}^p$ be a level-mapping which defines a $k$-dimensional level set $S$ by
$G^{-1}(c) = S$. The gradient vectors $\nabla G_j(p)$ are perpendicular to the tangent space at $p$; for
each $j \in \mathbb{N}_p$
\[ (p, (\nabla G_j(p))^T) \in (T_pS)^{\perp}. \]

It’s time to do some counting. Observe that the mapping $v_x \mapsto (p, (v_x, h'(p_x)v_x))$ from $\mathbb{R}^k$ onto $T_pS$
is an isomorphism of vector spaces hence $\dim(T_pS) = k$. Likewise, the mapping $\phi : \mathbb{R}^k \times \mathbb{R}^p \to V_p$ defined by $\phi(v) = (p, v)$
is an isomorphism, so $V_p = \phi(\mathbb{R}^k \times \mathbb{R}^p)$ and $\dim(V_p) = p + k$. In linear algebra we learn that if we have a
$k$-dimensional subspace $W$ of an $n$-dimensional vector space $V$ then the orthogonal complement
$W^{\perp}$ is a subspace of $V$ with codimension $k$. The term codimension is used to indicate a loss
of dimension from the ambient space, in particular $\dim(W^{\perp}) = n - k$. We should note that the
direct sum of $W$ and $W^{\perp}$ covers the whole space; $W \oplus W^{\perp} = V$. In the case of the tangent space,
the codimension of $T_pS \leq V_p$ is found to be $p + k - k = p$. Thus $\dim (T_pS)^{\perp} = p$. Any basis for
this space must consist of $p$ linearly independent vectors which are all orthogonal to the tangent
space. Naturally, the set of vectors $\{(p, (\nabla G_j(p))^T)\}_{j=1}^{p}$ forms just such a basis since it is given
to be linearly independent by the $\mathrm{rank}(G'(p)) = p$ condition. It follows that:

(𝑇𝑝 𝑆)⊥ ≈ 𝑅𝑜𝑤(𝐺′ (𝑝))

where equality can be obtained by the slightly tedious equation (𝑇𝑝 𝑆)⊥ = 𝜙(𝐶𝑜𝑙(𝐺′ (𝑝)𝑇 )) . That
equation simply does the following:
1. transpose 𝐺′ (𝑝) to swap rows to columns

2. construct column space by taking span of columns in 𝐺′ (𝑝)𝑇

3. adjoin 𝑝 to make pairs of vectors which live in 𝑉𝑝


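Many wiser authors wouldn’t bother. The comments above are primarily about notation. Certainly
hiding these details would make this section prettier; however, would it make it better? Finally, I
once more refer the reader to linear algebra where we learn that $(\mathrm{Col}(A))^{\perp} = \mathrm{Null}(A^T)$. Let me
walk you through the proof: let $A \in \mathbb{R}^{m \times n}$. Observe $v \in \mathrm{Null}(A^T)$ iff $A^T v = 0$ for $v \in \mathbb{R}^m$ iff
$v^T A = 0$ iff $v^T \mathrm{col}_j(A) = 0$ for $j = 1, 2, \dots, n$ iff $v \cdot \mathrm{col}_j(A) = 0$ for $j = 1, 2, \dots, n$ iff $v \in \mathrm{Col}(A)^{\perp}$.
Replacing $A$ by $A^T$ and noting $\mathrm{Col}(A^T) = \mathrm{Row}(A)$ (rows viewed as column vectors) gives $(\mathrm{Row}(A))^{\perp} = \mathrm{Null}(A)$.
Another useful identity for the ”perp” is that $(W^{\perp})^{\perp} = W$ for any subspace $W$ of a finite-dimensional inner product space. With those two gems in mind consider
that:
\[ (T_pS)^{\perp} \approx \mathrm{Row}(G'(p)) \;\Rightarrow\; T_pS \approx \mathrm{Row}(G'(p))^{\perp} = \mathrm{Null}(G'(p)) \]
Let me once more replace $\approx$ by a more tedious, but explicit, procedure:

\[ T_pS = \phi(\mathrm{Null}(G'(p))) \]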

Theorem 5.2.5.
Let $G : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}^p$ be a level-mapping which defines a $k$-dimensional level set $S$ by
$G^{-1}(c) = S$. The tangent space $T_pS$ and the normal space at $p \in S$ are given by

\[ T_pS = \{p\} \times \mathrm{Null}(G'(p)) \qquad \& \qquad T_pS^{\perp} = \{p\} \times \mathrm{Col}(G'(p)^T). \]

Moreover, $V_p = T_pS \oplus T_pS^{\perp}$. Every vector can be uniquely written as the sum of a tangent
vector and a normal vector.

The fact that there are only tangents and normals is the key to the method of Lagrange multipliers.
It forces two seemingly distinct objects to be in the same direction as one another.

Example 5.2.6. Let $g : \mathbb{R}^4 \to \mathbb{R}$ be defined by $g(x, y, z, t) = t + x^2 + y^2 - 2z^2$. Note that $g(x, y, z, t) = 6$
gives a three dimensional subset of $\mathbb{R}^4$, let’s call it $M$. Notice $\nabla g = \langle 2x, 2y, -4z, 1 \rangle$ is nonzero
everywhere. Let’s focus on the point $(2, 2, 1, 0)$; note that $g(2, 2, 1, 0) = 0 + 4 + 4 - 2 = 6$ thus the point is on $M$.
The tangent plane at $(2, 2, 1, 0)$ is formed from the union of all tangent vectors to $g = 6$ at the
point $(2, 2, 1, 0)$. To find the equation of the tangent plane we suppose $\gamma : \mathbb{R} \to M$ is a curve with
$\gamma' \neq 0$ and $\gamma(0) = (2, 2, 1, 0)$. By assumption $g(\gamma(s)) = 6$ since $\gamma(s) \in M$ for all $s \in \mathbb{R}$. Define
$\gamma'(0) = \langle a, b, c, d \rangle$; we find a condition from the chain-rule applied to $g \circ \gamma = 6$ at $s = 0$,

\[
\begin{aligned}
\frac{d}{ds}\big( g \circ \gamma(s) \big) = \nabla g(\gamma(s)) \cdot \gamma'(s) = 0
 &\;\Rightarrow\; \nabla g(2, 2, 1, 0) \cdot \langle a, b, c, d \rangle = 0 \\
 &\;\Rightarrow\; \langle 4, 4, -4, 1 \rangle \cdot \langle a, b, c, d \rangle = 0 \\
 &\;\Rightarrow\; 4a + 4b - 4c + d = 0
\end{aligned}
\]

Thus the equation of the tangent plane is $4(x - 2) + 4(y - 2) - 4(z - 1) + t = 0$. I invite the
reader to find a vector in the tangent plane and check it is orthogonal to $\nabla g(2, 2, 1, 0)$. However,
this should not be surprising: the condition the chain rule just gave us is just the statement that
$\langle a, b, c, d \rangle \in \mathrm{Null}(\nabla g(2, 2, 1, 0)^T)$ and that is precisely the set of vectors orthogonal to $\nabla g(2, 2, 1, 0)$.

Example 5.2.7. Let $G : \mathbb{R}^4 \to \mathbb{R}^2$ be defined by $G(x, y, z, t) = (z + x^2 + y^2 - 2,\; z + y^2 + t^2 - 2)$. In
this case $G(x, y, z, t) = (0, 0)$ gives a two-dimensional manifold in $\mathbb{R}^4$; let’s call it $M$. Notice that
$G_1 = 0$ gives $z + x^2 + y^2 = 2$ and $G_2 = 0$ gives $z + y^2 + t^2 = 2$ thus $G = 0$ gives the intersection of
both of these three dimensional manifolds in $\mathbb{R}^4$ (no I can’t ”see” it either). Note,

\[ \nabla G_1 = \langle 2x, 2y, 1, 0 \rangle \qquad \nabla G_2 = \langle 0, 2y, 1, 2t \rangle \]

It turns out that the implicit function theorem says $G = 0$ describes a manifold of dimension 2 if
the gradient vectors above form a linearly independent set of vectors. For the example considered
here the gradient vectors are linearly dependent at the origin since $\nabla G_1(0) = \nabla G_2(0) = (0, 0, 1, 0)$.
In fact, these gradient vectors are collinear along the plane $x = t = 0$ since $\nabla G_1(0, y, z, 0) =
\nabla G_2(0, y, z, 0) = \langle 0, 2y, 1, 0 \rangle$. We again seek to contrast the tangent plane and its normal at
some particular point. Choose $(1, 1, 0, 1)$ which is in $M$ since $G(1, 1, 0, 1) = (0 + 1 + 1 - 2, 0 +
1 + 1 - 2) = (0, 0)$. Suppose that $\gamma : \mathbb{R} \to M$ is a path in $M$ which has $\gamma(0) = (1, 1, 0, 1)$ whereas
$\gamma'(0) = \langle a, b, c, d \rangle$. Note that $\nabla G_1(1, 1, 0, 1) = \langle 2, 2, 1, 0 \rangle$ and $\nabla G_2(1, 1, 0, 1) = \langle 0, 2, 1, 2 \rangle$.
Applying the chain rule to both $G_1$ and $G_2$ yields:

\[ (G_1 \circ \gamma)'(0) = \nabla G_1(\gamma(0)) \cdot \langle a, b, c, d \rangle = 0 \;\Rightarrow\; \langle 2, 2, 1, 0 \rangle \cdot \langle a, b, c, d \rangle = 0 \]

\[ (G_2 \circ \gamma)'(0) = \nabla G_2(\gamma(0)) \cdot \langle a, b, c, d \rangle = 0 \;\Rightarrow\; \langle 0, 2, 1, 2 \rangle \cdot \langle a, b, c, d \rangle = 0 \]

This is two equations and four unknowns; we can solve it and write the vector in terms of two free
variables corresponding to the fact the tangent space is two-dimensional. Perhaps it’s easier to use
matrix techniques to organize the calculation:

\[
\begin{bmatrix} 2 & 2 & 1 & 0 \\ 0 & 2 & 1 & 2 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

We calculate, $\mathrm{rref}\begin{bmatrix} 2 & 2 & 1 & 0 \\ 0 & 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1/2 & 1 \end{bmatrix}$. It’s natural to choose $c, d$ as free
variables then we can read that $a = d$ and $b = -c/2 - d$ hence

\[ \langle a, b, c, d \rangle = \langle d, -c/2 - d, c, d \rangle = \tfrac{c}{2} \langle 0, -1, 2, 0 \rangle + d \langle 1, -1, 0, 1 \rangle \]

We can see a basis for the tangent space. In fact, I can give parametric equations for the tangent
space as follows:
\[ X(u, v) = (1, 1, 0, 1) + u \langle 0, -1, 2, 0 \rangle + v \langle 1, -1, 0, 1 \rangle \]
Not surprisingly the basis vectors of the tangent space are perpendicular to the gradient vectors
$\nabla G_1(1, 1, 0, 1) = \langle 2, 2, 1, 0 \rangle$ and $\nabla G_2(1, 1, 0, 1) = \langle 0, 2, 1, 2 \rangle$ which span the normal plane
$N_p$ to the tangent plane $T_p$ at $p = (1, 1, 0, 1)$. We find that $T_p$ is orthogonal to $N_p$. In summary
$T_p^{\perp} = N_p$ and $T_p \oplus N_p = \mathbb{R}^4$. This is just a fancy way of saying that the normal and the tangent
plane only intersect at zero and they together span the entire ambient space.
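
The tangent space computation of Example 5.2.7 amounts to finding the null space of the derivative matrix, so it can be automated. The sketch below is an illustration under my own assumptions: the helper names `rref`, `null_space_basis` and `jacobian` are hypothetical, and the gradients are recomputed directly from the formulas $\nabla G_1 = \langle 2x, 2y, 1, 0 \rangle$ and $\nabla G_2 = \langle 0, 2y, 1, 2t \rangle$ rather than copied from the text. Exact fractions avoid round-off.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form and pivot columns, with exact arithmetic."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    pivots = []
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][col]
        rows[r] = [e / pivot_value for e in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return rows, pivots


def null_space_basis(matrix):
    """One basis vector per free column, read off from the rref."""
    reduced, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pivot_col in zip(reduced, pivots):
            v[pivot_col] = -row[free]
        basis.append(v)
    return basis


def jacobian(x, y, z, t):
    # Rows are the gradients of G1 = z + x^2 + y^2 - 2 and G2 = z + y^2 + t^2 - 2.
    return [[2 * x, 2 * y, 1, 0], [0, 2 * y, 1, 2 * t]]


J = jacobian(1, 1, 0, 1)
for v in null_space_basis(J):
    # Each tangent vector should be orthogonal to both gradient rows.
    print(v, [sum(a * b for a, b in zip(row, v)) for row in J])
```

Each printed tangent vector is paired with its dot products against the two gradients; both dot products vanish, which is the orthogonality of $T_p$ and $N_p$ in coordinates.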

5.3 method of Lagrange multipliers


Let us begin with a statement of the problem we wish to solve.

Problem: given an objective function 𝑓 : ℝ𝑛 → ℝ and continuously differentiable


constraint function 𝐺 : ℝ𝑛 → ℝ𝑝 , find extreme values for the objective function
𝑓 relative to the constraint 𝐺(𝑥) = 𝑐.
Note that 𝐺(𝑥) = 𝑐 is a vector notation for 𝑝-scalar equations. If we suppose 𝑟𝑎𝑛𝑘(𝐺′ (𝑥)) = 𝑝
then the constraint surface 𝐺(𝑥) = 𝑐 will form an (𝑛 − 𝑝)-dimensional level set. Let us make that
supposition throughout the remainder of this section.

In order to solve a problem it is sometimes helpful to find necessary conditions by assuming an
answer exists. Let us do that here. Suppose $f$ attains a local extremum $f(x_o)$ at $x_o$ on $S = G^{-1}\{c\}$.
This means there exists an open ball around $x_o$ for which $f(x_o)$ is either an upper or lower bound
of all the values of $f$ over the ball intersected with $S$. One clear implication of this data is that
if we take any continuously differentiable curve on $S$ which passes through $x_o$, say $\gamma : \mathbb{R} \to \mathbb{R}^n$
with $\gamma(0) = x_o$ and $G(\gamma(t)) = c$ for all $t$, then the composite $f \circ \gamma$ is a function on $\mathbb{R}$ which takes
an extreme value at $t = 0$. Fermat’s theorem from calculus I applies and as $f \circ \gamma$ is differentiable
near $t = 0$ we find $(f \circ \gamma)'(0) = 0$ is a necessary condition. But, this means we have two necessary
conditions on $\gamma$:

1. 𝐺(𝛾(𝑡)) = 𝑐
2. (𝑓 ∘ 𝛾)′ (0) = 0
Let us expand a bit on both of these conditions:
1. 𝐺′ (𝑥𝑜 )𝛾 ′ (0) = 0
2. 𝑓 ′ (𝑥𝑜 )𝛾 ′ (0) = 0
The first of these conditions places 𝛾 ′ (0) ∈ 𝑇𝑥𝑜 𝑆 but then the second condition says that 𝑓 ′ (𝑥𝑜 ) =
(∇𝑓 )(𝑥𝑜 )𝑇 is orthogonal to 𝛾 ′ (0) hence (∇𝑓 )(𝑥𝑜 )𝑇 ∈ 𝑁𝑥𝑜 . Now, recall from the last section that
the gradient vectors of the component functions to 𝐺 span the normal space, this means any vector
in 𝑁𝑥𝑜 can be written as a linear combination of the gradient vectors. In particular, this means
there exist constants 𝜆1 , 𝜆2 , . . . , 𝜆𝑝 such that
(∇𝑓 )(𝑥𝑜 )𝑇 = 𝜆1 (∇𝐺1 )(𝑥𝑜 )𝑇 + 𝜆2 (∇𝐺2 )(𝑥𝑜 )𝑇 + ⋅ ⋅ ⋅ + 𝜆𝑝 (∇𝐺𝑝 )(𝑥𝑜 )𝑇
We may summarize the method of Lagrange multipliers as follows:

1. choose 𝑛-variables which aptly describe your problem.

2. identify your objective function and write all constraints as level surfaces.

3. solve ∇𝑓 = 𝜆1 ∇𝐺1 + 𝜆2 ∇𝐺2 + ⋅ ⋅ ⋅ + 𝜆𝑝 ∇𝐺𝑝 subject to the constraint 𝐺(𝑥) = 𝑐.

4. test the validity of your proposed extremal points.

The obvious gap in the method is the supposition that an extremum exists for the restriction $f|_S$.
We’ll examine a few examples before I reveal a sufficient condition. We’ll also see how absence of
that sufficient condition does allow the method to fail.
Example 5.3.1. Suppose we wish to find maximum and minimum distance to the origin for points
on the curve $x^2 - y^2 = 1$. In this case we can use the distance-squared function as our objective
$f(x, y) = x^2 + y^2$ and the single constraint function is $g(x, y) = x^2 - y^2$. Observe that $\nabla f = \langle 2x, 2y \rangle$
whereas $\nabla g = \langle 2x, -2y \rangle$. We seek solutions of $\nabla f = \lambda \nabla g$ which gives us $\langle 2x, 2y \rangle =
\lambda \langle 2x, -2y \rangle$. Hence $2x = 2\lambda x$ and $2y = -2\lambda y$. We must solve these equations subject to the
condition $x^2 - y^2 = 1$. Observe that $x = 0$ is not a solution since $0 - y^2 = 1$ has no real solution.
On the other hand, $y = 0$ does fit the constraint and $x^2 - 0 = 1$ has solutions $x = \pm 1$. Consider
then
\[ 2x = 2\lambda x \text{ and } 2y = -2\lambda y \;\Rightarrow\; x(1 - \lambda) = 0 \text{ and } y(1 + \lambda) = 0 \]
Since $x \neq 0$ on the constraint curve it follows that $1 - \lambda = 0$ hence $\lambda = 1$ and we learn that
$y(1 + 1) = 0$ hence $y = 0$. Consequently, $(1, 0)$ and $(-1, 0)$ are the two points where we expect to find
extreme-values of $f$. In this case, the method of Lagrange multipliers served its purpose, as you
can see in the graph. Below the green curves are level curves of the objective function whereas the
particular red curve is the given constraint curve.

The picture below is a screen-shot of the Java applet created by David Lippman and Konrad
Polthier to explore 2D and 3D graphs. Especially nice is the feature of adding vector fields to given
objects, many other plotters require much more effort for similar visualization. See more at the
website: http://dlippman.imathas.com/g1/GrapherLaunch.html.

Note how the gradient vectors to the objective function and constraint function line-up nicely at
those points.

In the previous example, we actually got lucky. There are examples of this sort where we could get
false maxima due to the nature of the constraint function.

Example 5.3.2. Suppose we wish to find the points on the unit circle 𝑔(𝑥, 𝑦) = 𝑥2 + 𝑦 2 = 1 which
give extreme values for the objective function 𝑓 (𝑥, 𝑦) = 𝑥2 − 𝑦 2 . Apply the method of Lagrange
multipliers and seek solutions to ∇𝑓 = 𝜆∇𝑔:

< 2𝑥, −2𝑦 >= 𝜆 < 2𝑥, 2𝑦 >

We must solve 2𝑥 = 2𝑥𝜆 which is better cast as (1 − 𝜆)𝑥 = 0 and −2𝑦 = 2𝜆𝑦 which is nicely written
as (1 + 𝜆)𝑦 = 0. On the basis of these equations alone we have several options:

1. if 𝜆 = 1 then (1 + 1)𝑦 = 0 hence 𝑦 = 0


2. if 𝜆 = −1 then (1 − (−1))𝑥 = 0 hence 𝑥 = 0


But, we also must fit the constraint 𝑥2 + 𝑦 2 = 1 hence we find four solutions:
1. if 𝜆 = 1 then 𝑦 = 0 thus 𝑥2 = 1 ⇒ 𝑥 = ±1 ⇒ (±1, 0)
2. if 𝜆 = −1 then 𝑥 = 0 thus 𝑦 2 = 1 ⇒ 𝑦 = ±1 ⇒ (0, ±1)
We test the objective function at these points to ascertain which type of extrema we’ve located:
𝑓 (0, ±1) = 0² − (±1)² = −1 & 𝑓 (±1, 0) = (±1)² − 0² = 1
When constrained to the unit circle we find the objective function attains a maximum value of 1 at
the points (1, 0) and (−1, 0) and a minimum value of −1 at (0, 1) and (0, −1). Let’s illustrate the
answers as well as a few non-answers to get perspective. Below the green curves are level curves of
the objective function whereas the particular red curve is the given constraint curve.
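
A quick numerical sanity check of Example 5.3.2 is sketched below. It is only an illustration: the helper names `is_lagrange_point` and `sampled_extremes` and the sample count are my own assumptions. The first helper confirms that the four candidate points satisfy $\nabla f = \lambda \nabla g$ on the circle, and the second scans $f$ over many points of the circle by brute force.

```python
import math


def f(x, y):
    return x * x - y * y


def is_lagrange_point(x, y, lam):
    # Checks grad f = lam * grad g with grad f = <2x, -2y> and grad g = <2x, 2y>,
    # together with the constraint x^2 + y^2 = 1 (exact for integer inputs).
    grad_f = (2 * x, -2 * y)
    grad_g = (2 * x, 2 * y)
    on_circle = x * x + y * y == 1
    return on_circle and grad_f == (lam * grad_g[0], lam * grad_g[1])


def sampled_extremes(n=3600):
    # Brute-force scan of f over n equally spaced points on the unit circle.
    values = [f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
              for k in range(n)]
    return max(values), min(values)


candidates = [(1, 0, 1), (-1, 0, 1), (0, 1, -1), (0, -1, -1)]  # (x, y, lambda)
print(all(is_lagrange_point(*c) for c in candidates))
print(sampled_extremes())
```

The sampled maximum and minimum should sit at (or within round-off of) the values $1$ and $-1$ found by the method.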

The success of the last example was no accident. The fact that the constraint curve was a circle
which is a closed and bounded subset of ℝ² means that it is a compact subset of ℝ². A well-known
theorem of analysis states that any real-valued continuous function on a compact domain attains
both maximum and minimum values. The objective function is continuous and the domain is
compact hence the theorem applies and the method of Lagrange multipliers succeeds. In contrast,
the constraint curve of the preceding example was a hyperbola which is not compact. We have
no assurance of the existence of any extrema. Indeed, we only found minima but no maxima in
Example 5.3.1.

The generality of the method of Lagrange multipliers is naturally limited to smooth constraint
curves and smooth objective functions. We must insist the gradient vectors exist at all points of
inquiry. Otherwise, the method breaks down. If we had a constraint curve which has sharp corners
then the method of Lagrange breaks down at those corners. In addition, if there are points of dis-
continuity in the constraint then the method need not apply. This is not terribly surprising, even in
calculus I the main attack to analyze extrema of function on ℝ assumed continuity, differentiability
and sometimes twice differentiability. Points of discontinuity require special attention in whatever
context you meet them.

At this point it is doubtless the case that some of you are, to misquote an ex-student of mine, ”not-
impressed”. Perhaps the following examples better illustrate the dangers of non-compact constraint
curves.

Example 5.3.3. Suppose we wish to find extrema of 𝑓 (𝑥, 𝑦) = 𝑥 when constrained to 𝑥𝑦 = 1.


Identify 𝑔(𝑥, 𝑦) = 𝑥𝑦 = 1 and apply the method of Lagrange multipliers and seek solutions to
∇𝑓 = 𝜆∇𝑔:
< 1, 0 >= 𝜆 < 𝑦, 𝑥 > ⇒ 1 = 𝜆𝑦 and 0 = 𝜆𝑥

If 𝜆 = 0 then 1 = 𝜆𝑦 is impossible to solve hence 𝜆 ∕= 0 and we find 𝑥 = 0. But, if 𝑥 = 0 then


𝑥𝑦 = 1 is not solvable. Therefore, we find no solutions. Well, I suppose we have succeeded here
in a way. We just learned there is no extreme value of 𝑥 on the hyperbola 𝑥𝑦 = 1. Below the
green curves are level curves of the objective function whereas the particular red curve is the given
constraint curve.

Example 5.3.4. Suppose we wish to find extrema of 𝑓 (𝑥, 𝑦) = 𝑥 when constrained to 𝑥2 − 𝑦 2 = 1.


Identify 𝑔(𝑥, 𝑦) = 𝑥2 − 𝑦 2 = 1 and apply the method of Lagrange multipliers and seek solutions to
∇𝑓 = 𝜆∇𝑔:
< 1, 0 >= 𝜆 < 2𝑥, −2𝑦 > ⇒ 1 = 2𝜆𝑥 and 0 = −2𝜆𝑦

If 𝜆 = 0 then 1 = 2𝜆𝑥 is impossible to solve hence 𝜆 ∕= 0 and we find 𝑦 = 0. If 𝑦 = 0 and 𝑥2 −𝑦 2 = 1


then we must solve 𝑥2 = 1 whence 𝑥 = ±1. We are tempted to conclude that:

1. the objective function $f(x, y) = x$ attains a maximum on $x^2 - y^2 = 1$ at $(1, 0)$ since $f(1, 0) = 1$

2. the objective function $f(x, y) = x$ attains a minimum on $x^2 - y^2 = 1$ at $(-1, 0)$ since $f(-1, 0) = -1$

But, both conclusions are false. Note $(\sqrt{2})^2 - 1^2 = 1$ hence $(\pm\sqrt{2}, 1)$ are points on the constraint
curve and $f(\sqrt{2}, 1) = \sqrt{2}$ and $f(-\sqrt{2}, 1) = -\sqrt{2}$. The error of the method of Lagrange multipliers
in this context is the supposition that there exist global extrema to find; in this case there are no such
points. It is possible for the gradient vectors to line up at points where $f$ has no global extremum. Below
the green curves are level curves of the objective function whereas the particular red curve is the
given constraint curve.
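
To see concretely why $f(x, y) = x$ has no maximum or minimum on $x^2 - y^2 = 1$, one can walk along the two branches of the hyperbola. The sketch below assumes the standard parametrizations $(\cosh s, \sinh s)$ and $(-\cosh s, \sinh s)$; the function names and the sample values of $s$ are my own choices for illustration.

```python
import math


def right_branch(s):
    # (cosh s, sinh s) satisfies x^2 - y^2 = 1 with x >= 1.
    return (math.cosh(s), math.sinh(s))


def left_branch(s):
    # (-cosh s, sinh s) satisfies x^2 - y^2 = 1 with x <= -1.
    return (-math.cosh(s), math.sinh(s))


for s in (0.0, 1.0, 2.0, 4.0):
    x_right, _ = right_branch(s)
    x_left, _ = left_branch(s)
    # f(x, y) = x, so these are the objective values on each branch.
    print(s, x_right, x_left)
```

The objective value starts at $\pm 1$ when $s = 0$ (the two Lagrange points) and moves away from them without bound as $s$ grows, so neither candidate is a global extremum.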

Incidentally, if you want additional discussion of Lagrange multipliers for two-dimensional
problems one very nice source I certainly profited from was the YouTube video by Edward Frenkel of
Berkeley. See his website http://math.berkeley.edu/ frenkel/ for links.

Example 5.3.5. XXX- add example from last notes. (chapter 8)

Example 5.3.6. XXX- add example from last notes. (chapter 8)

Example 5.3.7. XXX- add example from last notes. (chapter 8)

XXX—need to polish the notation for normal space. XXX—add examples for level sets.
Chapter 6

critical point analysis for several


variables

In the typical calculus sequence you learn the first and second derivative tests in calculus I. Then
in calculus II you learn about power series and Taylor’s Theorem. Finally, in calculus III, in many
popular texts, you learn an essentially ad-hoc procedure for judging the nature of critical points
as minimum, maximum or saddle. These topics are easily seen as disconnected events. In this
chapter, we connect them. We learn that the geometry of quadratic forms is elegantly revealed by
eigenvectors and more than that this geometry is precisely what elucidates the proper classifications
of critical points of multivariate functions with real values.

6.1 multivariate power series


We set aside the issue of convergence for another course. We will suppose the series discussed in
this section exist and converge on some domain, but we do not seek to treat that topic here and
now. Our focus is computational. How should we calculate the Taylor series for $f(x, y)$ at $(a, b)$?
Or, what about $f(x)$ at $x_o \in \mathbb{R}^n$?

6.1.1 taylor’s polynomial for one-variable


If $f : U \subseteq \mathbb{R} \to \mathbb{R}$ is analytic at $x_o \in U$ then we can write

\[
f(x) = f(x_o) + f'(x_o)(x - x_o) + \frac{1}{2} f''(x_o)(x - x_o)^2 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_o)}{n!} (x - x_o)^n
\]

We could write this in terms of the operator $D = \frac{d}{dt}$ and the evaluation of $t = x_o$

\[
f(x) = \left[ \sum_{n=0}^{\infty} \frac{1}{n!} (x - t)^n D^n f(t) \right]_{t = x_o}
\]


I remind the reader that a function is called entire if it is analytic on all of ℝ, for example 𝑒𝑥 , cos(𝑥)
and sin(𝑥) are all entire. In particular, you should know that:

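\[ e^x = 1 + x + \frac{1}{2}x^2 + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} x^n \]

\[ \cos(x) = 1 - \frac{1}{2}x^2 + \frac{1}{4!}x^4 - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} \]

\[ \sin(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} \]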

Since 𝑒𝑥 = cosh(𝑥) + sinh(𝑥) it also follows that



\[ \cosh(x) = 1 + \frac{1}{2}x^2 + \frac{1}{4!}x^4 + \cdots = \sum_{n=0}^{\infty} \frac{1}{(2n)!} x^{2n} \]

\[ \sinh(x) = x + \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + \cdots = \sum_{n=0}^{\infty} \frac{1}{(2n+1)!} x^{2n+1} \]

The geometric series is often useful, for 𝑎, 𝑟 ∈ ℝ with ∣𝑟∣ < 1 it is known

\[ a + ar + ar^2 + \cdots = \sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r} \]

This generates a whole host of examples, for instance:


\[ \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \cdots \]

\[ \frac{1}{1 - x^3} = 1 + x^3 + x^6 + x^9 + \cdots \]

\[ \frac{x^3}{1 - 2x} = x^3\left(1 + 2x + (2x)^2 + \cdots\right) = x^3 + 2x^4 + 4x^5 + \cdots \]

Moreover, the term-by-term integration and differentiation theorems yield additional results in
conjunction with the geometric series:

\[ \tan^{-1}(x) = \int \frac{dx}{1 + x^2} = \int \sum_{n=0}^{\infty} (-1)^n x^{2n}\, dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} x^{2n+1} = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \cdots \]

\[ \ln(1 - x) = \int \frac{d}{dx} \ln(1 - x)\, dx = \int \frac{-1}{1 - x}\, dx = -\int \sum_{n=0}^{\infty} x^n\, dx = \sum_{n=0}^{\infty} \frac{-1}{n+1} x^{n+1} \]

Of course, these are just the basic building blocks. We also can twist things and make the student
use algebra,
\[ e^{x+2} = e^x e^2 = e^2\left(1 + x + \frac{1}{2}x^2 + \cdots\right) \]

or trigonometric identities,

\[ \sin(x) = \sin(x - 2 + 2) = \sin(x - 2)\cos(2) + \cos(x - 2)\sin(2) \]

\[ \Rightarrow \ \sin(x) = \cos(2) \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}(x - 2)^{2n+1} + \sin(2) \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}(x - 2)^{2n}. \]

Feel free to peruse my most recent calculus II materials to see a host of similarly sneaky calculations.
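
The expansion of sin(𝑥) in powers of (𝑥 − 2) can be checked numerically. The following Python sketch is not part of the original notes; the function name sin_about_2 and the number of terms kept are assumptions made only for illustration. It sums the two series from the trigonometric identity and prints the result next to math.sin.

```python
import math

def sin_about_2(x, terms=10):
    """Partial sum of the expansion of sin(x) in powers of (x - 2)."""
    u = x - 2.0
    # series for sin(x - 2): odd powers of u
    odd = sum((-1) ** n * u ** (2 * n + 1) / math.factorial(2 * n + 1)
              for n in range(terms))
    # series for cos(x - 2): even powers of u
    even = sum((-1) ** n * u ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))
    return math.cos(2.0) * odd + math.sin(2.0) * even

for x in (1.5, 2.0, 2.7):
    print(x, sin_about_2(x), math.sin(x))
```

Near the center 𝑥 = 2 only a few terms are needed; farther away more terms are required for the same accuracy.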

6.1.2 taylor’s multinomial for two-variables


Suppose we wish to find the Taylor polynomial centered at (0, 0) for $f(x, y) = e^x \sin(y)$. It is as
simple as this:

\[ f(x, y) = \left(1 + x + \frac{1}{2}x^2 + \cdots\right)\left(y - \frac{1}{6}y^3 + \cdots\right) = y + xy + \frac{1}{2}x^2 y - \frac{1}{6}y^3 + \cdots \]

The resulting expression is called a multinomial since it is a polynomial in multiple variables. If
all functions 𝑓 (𝑥, 𝑦) could be written as 𝑓 (𝑥, 𝑦) = 𝐹 (𝑥)𝐺(𝑦) then multiplication of series known
from calculus II would often suffice. However, many functions do not possess this very special
form. For example, how should we expand 𝑓 (𝑥, 𝑦) = cos(𝑥𝑦) about (0, 0)? We need to derive the
two-dimensional Taylor’s theorem.
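
Returning for a moment to the product expansion of $e^x \sin(y)$ above: a small Python sketch (not from the original notes; the names f, cubic_multinomial and error are illustrative assumptions) can confirm that the multinomial keeps every term of total degree three or less. If so, the error along the hypothetical test path (𝑠, 2𝑠) should shrink roughly like $s^4$.

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def cubic_multinomial(x, y):
    # all terms of total degree <= 3 from the product of the two series
    return y + x * y + 0.5 * x**2 * y - y**3 / 6.0

def error(s):
    # evaluate along the test path (x, y) = (s, 2s)
    return abs(f(s, 2 * s) - cubic_multinomial(s, 2 * s))

for s in (0.1, 0.05, 0.025):
    # the last column should stay roughly constant if the error is O(s^4)
    print(s, error(s), error(s) / s**4)
```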

We already know Taylor’s theorem for functions on ℝ,


\[ g(x) = g(a) + g'(a)(x - a) + \frac{1}{2} g''(a)(x - a)^2 + \cdots + \frac{1}{k!} g^{(k)}(a)(x - a)^k + R_k \]

If the remainder term $R_k$ vanishes as 𝑘 → ∞ then the function 𝑔 is represented by the Taylor
series given above and we write:

\[ g(x) = \sum_{k=0}^{\infty} \frac{1}{k!} g^{(k)}(a)(x - a)^k. \]

Consider the function of two variables 𝑓 : 𝑈 ⊆ ℝ2 → ℝ which is smooth with smooth partial
derivatives of all orders. Furthermore, let (𝑎, 𝑏) ∈ 𝑈 and construct a line through (𝑎, 𝑏) with
direction vector (ℎ1 , ℎ2 ) as usual:

𝜙(𝑡) = (𝑎, 𝑏) + 𝑡(ℎ1 , ℎ2 ) = (𝑎 + 𝑡ℎ1 , 𝑏 + 𝑡ℎ2 )

for 𝑡 ∈ ℝ. Note 𝜙(0) = (𝑎, 𝑏) and 𝜙′ (𝑡) = (ℎ1 , ℎ2 ) = 𝜙′ (0). Construct 𝑔 = 𝑓 ∘ 𝜙 : ℝ → ℝ and
choose 𝑑𝑜𝑚(𝑔) such that 𝜙(𝑡) ∈ 𝑈 for 𝑡 ∈ 𝑑𝑜𝑚(𝑔). This function 𝑔 is a real-valued function of a

real variable and we will be able to apply Taylor’s theorem from calculus II on 𝑔. However, to
differentiate 𝑔 we’ll need tools from calculus III to sort out the derivatives. In particular, as we
differentiate 𝑔, note we use the chain rule for functions of several variables:

𝑔 ′ (𝑡) = (𝑓 ∘ 𝜙)′ (𝑡) = 𝑓 ′ (𝜙(𝑡))𝜙′ (𝑡)


= ∇𝑓 (𝜙(𝑡)) ⋅ (ℎ1 , ℎ2 )
= ℎ1 𝑓𝑥 (𝑎 + 𝑡ℎ1 , 𝑏 + 𝑡ℎ2 ) + ℎ2 𝑓𝑦 (𝑎 + 𝑡ℎ1 , 𝑏 + 𝑡ℎ2 )

Note 𝑔 ′ (0) = ℎ1 𝑓𝑥 (𝑎, 𝑏) + ℎ2 𝑓𝑦 (𝑎, 𝑏). Differentiate again (I omit (𝜙(𝑡)) dependence in the last steps),

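\[ \begin{aligned}
g''(t) &= h_1 \frac{d}{dt}\big[ f_x(a + t h_1, b + t h_2) \big] + h_2 \frac{d}{dt}\big[ f_y(a + t h_1, b + t h_2) \big] \\
&= h_1 \nabla f_x(\phi(t)) \cdot (h_1, h_2) + h_2 \nabla f_y(\phi(t)) \cdot (h_1, h_2) \\
&= h_1^2 f_{xx} + h_1 h_2 f_{yx} + h_2 h_1 f_{xy} + h_2^2 f_{yy} \\
&= h_1^2 f_{xx} + 2 h_1 h_2 f_{xy} + h_2^2 f_{yy}
\end{aligned} \]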

Thus, making explicit the point dependence, $g''(0) = h_1^2 f_{xx}(a, b) + 2 h_1 h_2 f_{xy}(a, b) + h_2^2 f_{yy}(a, b)$. We
may construct the Taylor series for 𝑔 up to quadratic terms:

\[ \begin{aligned}
g(0 + t) &= g(0) + t g'(0) + \frac{t^2}{2} g''(0) + \cdots \\
&= f(a, b) + t\big[ h_1 f_x(a, b) + h_2 f_y(a, b) \big] + \frac{t^2}{2}\big[ h_1^2 f_{xx}(a, b) + 2 h_1 h_2 f_{xy}(a, b) + h_2^2 f_{yy}(a, b) \big] + \cdots
\end{aligned} \]

Note that 𝑔(𝑡) = 𝑓 (𝑎 + 𝑡ℎ1 , 𝑏 + 𝑡ℎ2 ) hence 𝑔(1) = 𝑓 (𝑎 + ℎ1 , 𝑏 + ℎ2 ) and consequently,

\[ f(a + h_1, b + h_2) = f(a, b) + h_1 f_x(a, b) + h_2 f_y(a, b) + \frac{1}{2}\big[ h_1^2 f_{xx}(a, b) + 2 h_1 h_2 f_{xy}(a, b) + h_2^2 f_{yy}(a, b) \big] + \cdots \]

Omitting point dependence on the 2nd derivatives,

\[ f(a + h_1, b + h_2) = f(a, b) + h_1 f_x(a, b) + h_2 f_y(a, b) + \frac{1}{2}\big[ h_1^2 f_{xx} + 2 h_1 h_2 f_{xy} + h_2^2 f_{yy} \big] + \cdots \]

Sometimes we’d rather have an expansion about (𝑥, 𝑦). To obtain that formula simply substitute
𝑥 − 𝑎 = ℎ1 and 𝑦 − 𝑏 = ℎ2 . Note that the point (𝑎, 𝑏) is fixed in this discussion so the derivatives
are not modified in this substitution,

\[ f(x, y) = f(a, b) + (x - a) f_x(a, b) + (y - b) f_y(a, b) + \frac{1}{2}\big[ (x - a)^2 f_{xx}(a, b) + 2 (x - a)(y - b) f_{xy}(a, b) + (y - b)^2 f_{yy}(a, b) \big] + \cdots \]

At this point we ought to recognize the first three terms give the tangent plane to 𝑧 = 𝑓 (𝑥, 𝑦) at
(𝑎, 𝑏, 𝑓 (𝑎, 𝑏)). The higher order terms are nonlinear corrections to the linearization; the quadratic
terms form a quadratic form. If we computed third, fourth or higher order terms we would find that,
using 𝑎 = 𝑎1 and 𝑏 = 𝑎2 as well as 𝑥 = 𝑥1 and 𝑦 = 𝑥2 ,

\[ f(x, y) = \sum_{n=0}^{\infty} \sum_{i_1=1}^{2} \sum_{i_2=1}^{2} \cdots \sum_{i_n=1}^{2} \frac{1}{n!} \frac{\partial^{n} f(a_1, a_2)}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_n}} (x_{i_1} - a_{i_1})(x_{i_2} - a_{i_2}) \cdots (x_{i_n} - a_{i_n}) \]
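
To make the second-order formula concrete, here is a minimal Python sketch (not part of the original notes). The helper name taylor2 and the test function ln(𝑥 + 2𝑦) expanded about (1, 0) are assumptions chosen for illustration; its partial derivatives at (1, 0) were computed by hand and are passed in as numbers.

```python
import math

def taylor2(f0, fx, fy, fxx, fxy, fyy, a, b):
    """Return the second-order Taylor polynomial about (a, b),
    given the values of f and its partial derivatives at (a, b)."""
    def p(x, y):
        h1, h2 = x - a, y - b
        linear = h1 * fx + h2 * fy
        quadratic = 0.5 * (h1 * h1 * fxx + 2 * h1 * h2 * fxy + h2 * h2 * fyy)
        return f0 + linear + quadratic
    return p

# Hypothetical test function f(x, y) = ln(x + 2y) about (1, 0):
# f = 0, f_x = 1, f_y = 2, f_xx = -1, f_xy = -2, f_yy = -4 at (1, 0).
p = taylor2(0.0, 1.0, 2.0, -1.0, -2.0, -4.0, a=1.0, b=0.0)

x, y = 1.05, 0.02
print(p(x, y), math.log(x + 2 * y))
```

The two printed numbers should agree to about three decimal places, with the gap controlled by the omitted cubic terms.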

Example 6.1.1. Expand 𝑓 (𝑥, 𝑦) = cos(𝑥𝑦) about (0, 0). We calculate derivatives,

𝑓𝑥 = −𝑦 sin(𝑥𝑦) 𝑓𝑦 = −𝑥 sin(𝑥𝑦)

𝑓𝑥𝑥 = −𝑦 2 cos(𝑥𝑦) 𝑓𝑥𝑦 = − sin(𝑥𝑦) − 𝑥𝑦 cos(𝑥𝑦) 𝑓𝑦𝑦 = −𝑥2 cos(𝑥𝑦)

𝑓𝑥𝑥𝑥 = 𝑦 3 sin(𝑥𝑦) 𝑓𝑥𝑥𝑦 = −𝑦 cos(𝑥𝑦) − 𝑦 cos(𝑥𝑦) + 𝑥𝑦 2 sin(𝑥𝑦)

𝑓𝑥𝑦𝑦 = −𝑥 cos(𝑥𝑦) − 𝑥 cos(𝑥𝑦) + 𝑥2 𝑦 sin(𝑥𝑦) 𝑓𝑦𝑦𝑦 = 𝑥3 sin(𝑥𝑦)

Next, evaluate at 𝑥 = 0 and 𝑦 = 0 to find 𝑓 (𝑥, 𝑦) = 1 + ⋅ ⋅ ⋅ to third order in 𝑥, 𝑦 about (0, 0). We
can understand why these derivatives are all zero by approaching the expansion a different route:
simply expand cosine directly in the variable (𝑥𝑦),

\[ f(x, y) = 1 - \frac{1}{2}(xy)^2 + \frac{1}{4!}(xy)^4 + \cdots = 1 - \frac{1}{2} x^2 y^2 + \frac{1}{4!} x^4 y^4 + \cdots. \]

Apparently the given function only has nontrivial derivatives at (0, 0) at orders 0, 4, 8, .... We can
deduce that 𝑓𝑥𝑥𝑥𝑥𝑦 (0, 0) = 0 without further calculation.

This is actually a very interesting function; I think it defies our analysis in the later portion of this
chapter. The second order part of the expansion reveals nothing about the nature of the critical
point (0, 0). Of course, any student of trigonometry should recognize that 𝑓 (0, 0) = 1 is
a local maximum since cos(𝑥𝑦) ≤ 1 everywhere; it’s certainly not a local minimum. The graph
reveals that 𝑓 (0, 0) is a local maximum for 𝑓 restricted to certain rays from the origin whereas 𝑓 is
constant along several special directions (the coordinate axes).

6.1.3 taylor’s multinomial for many-variables


Suppose 𝑓 : 𝑑𝑜𝑚(𝑓 ) ⊆ ℝ𝑛 → ℝ is a function of 𝑛-variables and we seek to derive the Taylor series
centered at 𝑎 = (𝑎1 , 𝑎2 , . . . , 𝑎𝑛 ). Once more consider the composition of 𝑓 with a line in 𝑑𝑜𝑚(𝑓 ).
In particular, let 𝜙 : ℝ → ℝ𝑛 be defined by 𝜙(𝑡) = 𝑎 + 𝑡ℎ where ℎ = (ℎ1 , ℎ2 , . . . , ℎ𝑛 ) gives the
direction of the line and clearly 𝜙′ (𝑡) = ℎ. Let 𝑔 : 𝑑𝑜𝑚(𝑔) ⊆ ℝ → ℝ be defined by 𝑔(𝑡) = 𝑓 (𝜙(𝑡))
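for all 𝑡 ∈ ℝ such that 𝜙(𝑡) ∈ 𝑑𝑜𝑚(𝑓 ). Differentiate, use the multivariate chain rule, and recall here
that $\nabla = e_1 \frac{\partial}{\partial x_1} + e_2 \frac{\partial}{\partial x_2} + \cdots + e_n \frac{\partial}{\partial x_n} = \sum_{i=1}^{n} e_i \partial_i$,

\[ g'(t) = \nabla f(\phi(t)) \cdot \phi'(t) = \nabla f(\phi(t)) \cdot h = \sum_{i=1}^{n} h_i (\partial_i f)(\phi(t)) \]

If we omit the explicit dependence on 𝜙(𝑡) then we find the simple formula $g'(t) = \sum_{i=1}^{n} h_i \partial_i f$.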

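Differentiate a second time,

\[ g''(t) = \frac{d}{dt}\left[ \sum_{i=1}^{n} h_i (\partial_i f)(\phi(t)) \right] = \sum_{i=1}^{n} h_i \frac{d}{dt}\Big[ (\partial_i f)(\phi(t)) \Big] = \sum_{i=1}^{n} h_i \big( \nabla \partial_i f \big)(\phi(t)) \cdot \phi'(t) \]

Omitting the 𝜙(𝑡) dependence and once more using 𝜙′(𝑡) = ℎ we find

\[ g''(t) = \sum_{i=1}^{n} h_i \nabla \partial_i f \cdot h \]

Recall that $\nabla = \sum_{j=1}^{n} e_j \partial_j$ and expand the expression above,

\[ g''(t) = \sum_{i=1}^{n} h_i \left( \sum_{j=1}^{n} e_j \partial_j \partial_i f \right) \cdot h = \sum_{i=1}^{n} \sum_{j=1}^{n} h_i h_j \partial_j \partial_i f \]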

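where we should remember $\partial_j \partial_i f$ depends on 𝜙(𝑡). It should be clear that if we continue and take
𝑘 derivatives then we will obtain:

\[ g^{(k)}(t) = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} h_{i_1} h_{i_2} \cdots h_{i_k} \partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f \]

More explicitly,

\[ g^{(k)}(t) = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} h_{i_1} h_{i_2} \cdots h_{i_k} (\partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f)(\phi(t)) \]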

Hence, by Taylor’s theorem, provided we are sufficiently close to 𝑡 = 0 as to bound the remainder (see footnote 1),

\[ g(t) = \sum_{k=0}^{\infty} \frac{1}{k!} \left( \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} h_{i_1} h_{i_2} \cdots h_{i_k} (\partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f)(a) \right) t^k \]

where the derivatives are evaluated at 𝜙(0) = 𝑎. Recall that 𝑔(𝑡) = 𝑓 (𝜙(𝑡)) = 𝑓 (𝑎 + 𝑡ℎ). Put 𝑡 = 1 (see footnote 2) and bring in the $\frac{1}{k!}$ to derive

\[ f(a + h) = \sum_{k=0}^{\infty} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} \frac{1}{k!} \big( \partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f \big)(a)\, h_{i_1} h_{i_2} \cdots h_{i_k}. \]

Footnote 1: There exist smooth examples for which no neighborhood is small enough; the bump function in one variable has
higher-dimensional analogues. We focus our attention on functions for which it is possible for the series above to
converge.

Naturally, we sometimes prefer to write the series expansion about 𝑎 as an expression in 𝑥 = 𝑎 + ℎ.
With this substitution we have ℎ = 𝑥 − 𝑎 and $h_{i_j} = (x - a)_{i_j} = x_{i_j} - a_{i_j}$ thus

\[ f(x) = \sum_{k=0}^{\infty} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} \frac{1}{k!} \big( \partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f \big)(a)\, (x_{i_1} - a_{i_1})(x_{i_2} - a_{i_2}) \cdots (x_{i_k} - a_{i_k}). \]
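
The nested index sums in the general formula translate almost literally into code. The sketch below is an illustration only and is not part of the original notes: the names taylor_sum and partial_exp are assumptions, and the test function $f(x) = \exp(c \cdot x)$ with a hypothetical constant vector 𝑐 is chosen because all of its mixed partial derivatives are known in closed form, namely $\partial_{i_1} \cdots \partial_{i_k} f(a) = c_{i_1} \cdots c_{i_k} \exp(c \cdot a)$.

```python
import math
from itertools import product

def taylor_sum(partial_at_a, h, order):
    """Sum over k <= order and all index tuples (i1, ..., ik) of
    (1/k!) * (d_i1 ... d_ik f)(a) * h_i1 * ... * h_ik."""
    n = len(h)
    total = 0.0
    for k in range(order + 1):
        # product(..., repeat=0) yields one empty tuple: the k = 0 term f(a)
        for idx in product(range(n), repeat=k):
            term = partial_at_a(idx)
            for i in idx:
                term *= h[i]
            total += term / math.factorial(k)
    return total

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

c = (1.0, -2.0, 0.5)   # hypothetical coefficients for f(x) = exp(c . x)
a = (0.1, 0.2, 0.3)    # center of the expansion

def partial_exp(idx):
    # mixed partial of exp(c . x) at a for the index tuple idx
    value = math.exp(dot(c, a))
    for i in idx:
        value *= c[i]
    return value

h = (0.05, -0.02, 0.04)
approx = taylor_sum(partial_exp, h, order=6)
exact = math.exp(dot(c, tuple(ai + hi for ai, hi in zip(a, h))))
print(approx, exact)
```

Note that the sum visits every ordered index tuple, so a mixed partial such as $\partial_1 \partial_2 f$ is counted once for each ordering, exactly as in the formula.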

Example 6.1.2. Suppose 𝑓 : ℝ3 → ℝ let’s unravel the Taylor series centered at (0, 0, 0) from the
general formula boxed above. Utilize the notation 𝑥 = 𝑥1 , 𝑦 = 𝑥2 and 𝑧 = 𝑥3 in this example.
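\[ f(x) = \sum_{k=0}^{\infty} \sum_{i_1=1}^{3} \sum_{i_2=1}^{3} \cdots \sum_{i_k=1}^{3} \frac{1}{k!} \big( \partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f \big)(0)\, x_{i_1} x_{i_2} \cdots x_{i_k}. \]

The terms to order 2 are as follows:

\[ \begin{aligned}
f(x) = f(0) &+ f_x(0) x + f_y(0) y + f_z(0) z \\
&+ \frac{1}{2}\Big( f_{xx}(0) x^2 + f_{yy}(0) y^2 + f_{zz}(0) z^2 \\
&\qquad + f_{xy}(0) xy + f_{xz}(0) xz + f_{yz}(0) yz + f_{yx}(0) yx + f_{zx}(0) zx + f_{zy}(0) zy \Big) + \cdots
\end{aligned} \]

Partial derivatives commute for smooth functions hence,

\[ \begin{aligned}
f(x) = f(0) &+ f_x(0) x + f_y(0) y + f_z(0) z \\
&+ \frac{1}{2}\Big( f_{xx}(0) x^2 + f_{yy}(0) y^2 + f_{zz}(0) z^2 + 2 f_{xy}(0) xy + 2 f_{xz}(0) xz + 2 f_{yz}(0) yz \Big) \\
&+ \frac{1}{3!}\Big( f_{xxx}(0) x^3 + f_{yyy}(0) y^3 + f_{zzz}(0) z^3 + 3 f_{xxy}(0) x^2 y + 3 f_{xxz}(0) x^2 z \\
&\qquad + 3 f_{yyz}(0) y^2 z + 3 f_{xyy}(0) x y^2 + 3 f_{xzz}(0) x z^2 + 3 f_{yzz}(0) y z^2 + 6 f_{xyz}(0) xyz \Big) + \cdots
\end{aligned} \]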

Example 6.1.3. Suppose $f(x, y, z) = e^{xyz}$. Find a quadratic approximation to 𝑓 near (0, 1, 2).
Observe:

\[ f_x = yz\, e^{xyz} \qquad f_y = xz\, e^{xyz} \qquad f_z = xy\, e^{xyz} \]
\[ f_{xx} = (yz)^2 e^{xyz} \qquad f_{yy} = (xz)^2 e^{xyz} \qquad f_{zz} = (xy)^2 e^{xyz} \]
\[ f_{xy} = z\, e^{xyz} + x y z^2 e^{xyz} \qquad f_{yz} = x\, e^{xyz} + x^2 y z\, e^{xyz} \qquad f_{xz} = y\, e^{xyz} + x y^2 z\, e^{xyz} \]

Footnote 2: If 𝑡 = 1 is not in the domain of 𝑔 then we should rescale the vector ℎ so that 𝑡 = 1 places 𝜙(1) in 𝑑𝑜𝑚(𝑓 ); if 𝑓 is
smooth on some neighborhood of 𝑎 then this is possible.

Evaluating at 𝑥 = 0, 𝑦 = 1 and 𝑧 = 2,

\[ f_x(0, 1, 2) = 2 \qquad f_y(0, 1, 2) = 0 \qquad f_z(0, 1, 2) = 0 \]
\[ f_{xx}(0, 1, 2) = 4 \qquad f_{yy}(0, 1, 2) = 0 \qquad f_{zz}(0, 1, 2) = 0 \]
\[ f_{xy}(0, 1, 2) = 2 \qquad f_{yz}(0, 1, 2) = 0 \qquad f_{xz}(0, 1, 2) = 1 \]

Hence, as $f(0, 1, 2) = e^0 = 1$ we find

\[ f(x, y, z) = 1 + 2x + 2x^2 + 2x(y - 1) + x(z - 2) + \cdots \]

Another way to calculate this expansion is to make use of the adding zero trick,

\[ f(x, y, z) = e^{x(y - 1 + 1)(z - 2 + 2)} = 1 + x(y - 1 + 1)(z - 2 + 2) + \frac{1}{2}\big[ x(y - 1 + 1)(z - 2 + 2) \big]^2 + \cdots \]

Keeping only terms with two or fewer of the 𝑥, (𝑦 − 1) and (𝑧 − 2) variables,

\[ f(x, y, z) = 1 + 2x + x(y - 1)(2) + x(1)(z - 2) + \frac{1}{2} x^2 (1)^2 (2)^2 + \cdots \]

which simplifies once more to $f(x, y, z) = 1 + 2x + 2x(y - 1) + x(z - 2) + 2x^2 + \cdots$.
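
The quadratic approximation found in Example 6.1.3 can be tested numerically. This Python sketch is an illustration added under stated assumptions, not part of the original notes: the names f, quad and error are hypothetical, and the test path steps by the same amount 𝑠 in each of 𝑥, (𝑦 − 1) and (𝑧 − 2). If the quadratic terms are right, the error should fall by roughly a factor of eight when 𝑠 is halved.

```python
import math

def f(x, y, z):
    return math.exp(x * y * z)

def quad(x, y, z):
    # quadratic approximation of exp(xyz) about (0, 1, 2)
    return 1 + 2 * x + 2 * x * (y - 1) + x * (z - 2) + 2 * x**2

def error(s):
    # step s away from (0, 1, 2) in every coordinate
    return abs(f(s, 1 + s, 2 + s) - quad(s, 1 + s, 2 + s))

for s in (0.02, 0.01, 0.005):
    # the last column should stay roughly constant if the error is O(s^3)
    print(s, error(s), error(s) / s**3)
```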

6.2 a brief introduction to the theory of quadratic forms


Definition 6.2.1.
Generally, a quadratic form 𝑄 is a function 𝑄 : ℝ𝑛 → ℝ whose formula can be written
$Q(\vec{x}) = \vec{x}^T A \vec{x}$ for all $\vec{x} \in \mathbb{R}^n$ where $A \in \mathbb{R}^{n \times n}$ such that $A^T = A$. In particular, if $\vec{x} = (x, y)$
and $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ then

\[ Q(\vec{x}) = \vec{x}^T A \vec{x} = a x^2 + b xy + b yx + c y^2 = a x^2 + 2b xy + c y^2. \]

The 𝑛 = 3 case is similar, denote 𝐴 = [𝐴𝑖𝑗 ] and ⃗𝑥 = (𝑥, 𝑦, 𝑧) so that

\[ Q(\vec{x}) = \vec{x}^T A \vec{x} = A_{11} x^2 + 2 A_{12} xy + 2 A_{13} xz + A_{22} y^2 + 2 A_{23} yz + A_{33} z^2. \]

Generally, if $[A_{ij}] \in \mathbb{R}^{n \times n}$ and $\vec{x} = [x_i]^T$ then the associated quadratic form is

\[ Q(\vec{x}) = \vec{x}^T A \vec{x} = \sum_{i,j} A_{ij} x_i x_j = \sum_{i=1}^{n} A_{ii} x_i^2 + \sum_{i<j} 2 A_{ij} x_i x_j. \]

In case you are wondering, yes, you could write a given quadratic form with a different matrix which
is not symmetric, but we will find it convenient to insist that our matrix is symmetric since that
choice is always possible for a given quadratic form.

It is at times useful to use the dot-product to express a given quadratic form:

⃗𝑥𝑇 𝐴⃗𝑥 = ⃗𝑥 ⋅ (𝐴⃗𝑥) = (𝐴⃗𝑥) ⋅ ⃗𝑥 = ⃗𝑥𝑇 𝐴𝑇 ⃗𝑥

Some texts actually use the middle equality above to define a symmetric matrix.
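Example 6.2.2.
\[ 2x^2 + 2xy + 2y^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

Example 6.2.3.
\[ 2x^2 + 2xy + 3xz - 2y^2 - z^2 = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 2 & 1 & 3/2 \\ 1 & -2 & 0 \\ 3/2 & 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]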
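
The correspondence between a matrix and its quadratic form is easy to check by machine. The Python sketch below is not part of the original notes; quadratic_form and poly are illustrative names, A is the symmetric matrix of Example 6.2.3, and B is a hypothetical non-symmetric matrix that happens to produce the same form.

```python
def quadratic_form(A, x):
    """Evaluate x^T A x as the double sum of A[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def poly(x, y, z):
    # the polynomial of Example 6.2.3 written out directly
    return 2 * x**2 + 2 * x * y + 3 * x * z - 2 * y**2 - z**2

A = [[2, 1, 1.5],
     [1, -2, 0],
     [1.5, 0, -1]]

# A non-symmetric matrix (an assumption for illustration) giving the same form:
# the off-diagonal weight is moved entirely above the diagonal.
B = [[2, 2, 3],
     [0, -2, 0],
     [0, 0, -1]]

v = [1, 2, 3]
print(quadratic_form(A, v), quadratic_form(B, v), poly(*v))
```

All three printed values agree, which illustrates why insisting on a symmetric matrix costs nothing: only the sums $A_{ij} + A_{ji}$ matter to the form.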

Proposition 6.2.4.

The values of a quadratic form on ℝ𝑛 − {0} are completely determined by its values on
the (𝑛 − 1)-sphere $S_{n-1} = \{ \vec{x} \in \mathbb{R}^n \mid ||\vec{x}|| = 1 \}$. In particular, $Q(\vec{x}) = ||\vec{x}||^2 Q(\hat{x})$ where
$\hat{x} = \frac{1}{||\vec{x}||} \vec{x}$.

Proof: Let $Q(\vec{x}) = \vec{x}^T A \vec{x}$. Notice that we can write any nonzero vector as the product of its
magnitude $||\vec{x}||$ and its direction $\hat{x} = \frac{1}{||\vec{x}||} \vec{x}$,

\[ Q(\vec{x}) = Q(||\vec{x}||\, \hat{x}) = (||\vec{x}||\, \hat{x})^T A\, ||\vec{x}||\, \hat{x} = ||\vec{x}||^2\, \hat{x}^T A \hat{x} = ||\vec{x}||^2 Q(\hat{x}). \]

Therefore $Q(\vec{x})$ is simply proportional to $Q(\hat{x})$ with proportionality constant $||\vec{x}||^2$. □

The proposition above is very interesting. It says that if we know how 𝑄 works on unit-vectors then
we can extrapolate its action on the remainder of ℝ𝑛 . If 𝑓 : 𝑆 → ℝ then we could say 𝑓 (𝑆) > 0
iff 𝑓 (𝑠) > 0 for all 𝑠 ∈ 𝑆. Likewise, 𝑓 (𝑆) < 0 iff 𝑓 (𝑠) < 0 for all 𝑠 ∈ 𝑆. The proposition below
follows from the proposition above since ∣∣⃗𝑥∣∣2 ranges over all nonzero positive real numbers in the
equations above.
Proposition 6.2.5.
If 𝑄 is a quadratic form on ℝ𝑛 and we denote ℝ𝑛∗ = ℝ𝑛 − {0}

1.(negative definite) 𝑄(ℝ𝑛∗ ) < 0 iff 𝑄(𝑆𝑛−1 ) < 0

2.(positive definite) 𝑄(ℝ𝑛∗ ) > 0 iff 𝑄(𝑆𝑛−1 ) > 0

3.(non-definite) 𝑄(ℝ𝑛∗ ) = ℝ iff 𝑄(𝑆𝑛−1 ) has both positive and negative values.
Before I get too carried away with the theory let’s look at a couple examples.
Example 6.2.6. Consider the quadratic form $Q(x, y) = x^2 + y^2$. You can check for yourself that
𝑧 = 𝑄(𝑥, 𝑦) is a paraboloid and 𝑄 has positive outputs for all inputs except (0, 0). Notice that $Q(v) = ||v||^2$
so it is clear that 𝑄(𝑆1 ) = {1}. We find agreement with the preceding proposition. Next, think about
the application of 𝑄(𝑥, 𝑦) to level curves; $x^2 + y^2 = k$ is simply a circle of radius $\sqrt{k}$ or just the
origin. Here’s a graph of 𝑧 = 𝑄(𝑥, 𝑦):

[Figure: graph of 𝑧 = 𝑄(𝑥, 𝑦)]

Notice that 𝑄(0, 0) = 0 is the absolute minimum for 𝑄. Finally, let’s take a moment to write
$Q(x, y) = [x, y] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$; in this case the matrix is diagonal and we note that the e-values are
𝜆1 = 𝜆2 = 1.

Example 6.2.7. Consider the quadratic form $Q(x, y) = x^2 - 2y^2$. You can check for yourself
that 𝑧 = 𝑄(𝑥, 𝑦) is a hyperbolic paraboloid and 𝑄 has non-definite outputs since sometimes the $x^2$ term
dominates whereas other points have $-2y^2$ as the dominant term. Notice that 𝑄(1, 0) = 1 whereas
𝑄(0, 1) = −2 hence we find 𝑄(𝑆1 ) contains both positive and negative values and consequently we
find agreement with the preceding proposition. Next, think about the application of 𝑄(𝑥, 𝑦) to level
curves; $x^2 - 2y^2 = k$ yields either hyperbolas which open horizontally (𝑘 > 0) or vertically (𝑘 < 0)
or a pair of lines $y = \pm x/\sqrt{2}$ in the 𝑘 = 0 case. Here’s a graph of 𝑧 = 𝑄(𝑥, 𝑦):

[Figure: graph of 𝑧 = 𝑄(𝑥, 𝑦)]

The origin is a saddle point. Finally, let’s take a moment to write $Q(x, y) = [x, y] \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$;
in this case the matrix is diagonal and we note that the e-values are 𝜆1 = 1 and 𝜆2 = −2.
Example 6.2.8. Consider the quadratic form $Q(x, y) = 3x^2$. You can check for yourself that 𝑧 =
𝑄(𝑥, 𝑦) is a parabola-shaped trough along the 𝑦-axis. In this case 𝑄 has positive outputs for all inputs
except (0, 𝑦); we would call this form positive semi-definite. A short calculation reveals that
𝑄(𝑆1 ) = [0, 3]. Since 𝑄(𝑆1 ) contains zero but no negative values, this form is neither definite nor
non-definite, so it falls outside the three cases of the preceding proposition. Next, think
about the application of 𝑄(𝑥, 𝑦) to level curves; $3x^2 = k$ is a pair of vertical lines $x = \pm\sqrt{k/3}$ or
just the 𝑦-axis. Here’s a graph of 𝑧 = 𝑄(𝑥, 𝑦):

[Figure: graph of 𝑧 = 𝑄(𝑥, 𝑦)]

Finally, let’s take a moment to write $Q(x, y) = [x, y] \begin{bmatrix} 3 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$; in this case the matrix is
diagonal and we note that the e-values are 𝜆1 = 3 and 𝜆2 = 0.

Example 6.2.9. Consider the quadratic form $Q(x, y, z) = x^2 + 2y^2 + 3z^2$. Think about the application
of 𝑄(𝑥, 𝑦, 𝑧) to level surfaces; $x^2 + 2y^2 + 3z^2 = k$ is an ellipsoid. I can’t graph a function of three
variables, however, we can look at level surfaces of the function. I use Mathematica to plot several
below:

[Figure: several level surfaces of 𝑄(𝑥, 𝑦, 𝑧)]

Finally, let’s take a moment to write $Q(x, y, z) = [x, y, z] \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$; in this case the matrix
is diagonal and we note that the e-values are 𝜆1 = 1 and 𝜆2 = 2 and 𝜆3 = 3.

6.2.1 diagonalizing forms via eigenvectors


The examples given thus far are the simplest cases. We don’t really need linear algebra to un-
derstand them. In contrast, e-vectors and e-values will prove a useful tool to unravel the later
examples3

Definition 6.2.10.

Let 𝐴 ∈ ℝ 𝑛×𝑛 . If 𝑣 ∈ ℝ 𝑛×1 is nonzero and 𝐴𝑣 = 𝜆𝑣 for some 𝜆 ∈ ℂ then we say 𝑣 is an


eigenvector with eigenvalue 𝜆 of the matrix 𝐴.

Proposition 6.2.11.

Let 𝐴 ∈ ℝ 𝑛×𝑛 then 𝜆 is an eigenvalue of 𝐴 iff 𝑑𝑒𝑡(𝐴 − 𝜆𝐼) = 0. We call 𝑃 (𝜆) = 𝑑𝑒𝑡(𝐴 − 𝜆𝐼)
the characteristic polynomial and 𝑑𝑒𝑡(𝐴 − 𝜆𝐼) = 0 the characteristic equation.
Proof: Suppose 𝜆 is an eigenvalue of 𝐴 then there exists a nonzero vector 𝑣 such that 𝐴𝑣 = 𝜆𝑣
which is equivalent to 𝐴𝑣 − 𝜆𝑣 = 0 which is precisely (𝐴 − 𝜆𝐼)𝑣 = 0. Notice that (𝐴 − 𝜆𝐼)0 = 0
thus the matrix (𝐴 − 𝜆𝐼) is singular as the equation (𝐴 − 𝜆𝐼)𝑥 = 0 has more than one solution.
Consequently 𝑑𝑒𝑡(𝐴 − 𝜆𝐼) = 0.

Conversely, suppose 𝑑𝑒𝑡(𝐴 − 𝜆𝐼) = 0. It follows that (𝐴 − 𝜆𝐼) is singular. Clearly the system
(𝐴 − 𝜆𝐼)𝑥 = 0 is consistent as 𝑥 = 0 is a solution hence we know there are infinitely many
solutions. In particular there exists at least one vector 𝑣 ∕= 0 such that (𝐴 − 𝜆𝐼)𝑣 = 0 which means the
vector 𝑣 satisfies 𝐴𝑣 = 𝜆𝑣. Thus 𝑣 is an eigenvector with eigenvalue 𝜆 for 𝐴. □

Footnote 3: This is the one place in this course where we need eigenvalue and eigenvector calculations. I include these to
illustrate the structure of quadratic forms in general; however, as linear algebra is not a prerequisite you may find some
things in this section mysterious. The homework and study guide will elaborate on what is required this semester.

Example 6.2.12. Let $A = \begin{bmatrix} 3 & 1 \\ 3 & 1 \end{bmatrix}$; find the e-values and e-vectors of 𝐴.

\[ \det(A - \lambda I) = \det \begin{bmatrix} 3 - \lambda & 1 \\ 3 & 1 - \lambda \end{bmatrix} = (3 - \lambda)(1 - \lambda) - 3 = \lambda^2 - 4\lambda = \lambda(\lambda - 4) = 0 \]

We find 𝜆1 = 0 and 𝜆2 = 4. Now find the e-vector with e-value 𝜆1 = 0, let $u_1 = [u, v]^T$ denote the
e-vector we wish to find. Calculate,

\[ (A - 0I) u_1 = \begin{bmatrix} 3 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 3u + v \\ 3u + v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

Obviously the equations above are redundant and we have infinitely many solutions of the form
3𝑢 + 𝑣 = 0 which means 𝑣 = −3𝑢 so we can write $u_1 = \begin{bmatrix} u \\ -3u \end{bmatrix} = u \begin{bmatrix} 1 \\ -3 \end{bmatrix}$. In applications we
often make a choice to select a particular e-vector. Most modern graphing calculators can
calculate e-vectors. It is customary for the e-vectors to be chosen to have length one. That is a useful
choice for certain applications as we will later discuss. If you use a calculator it would likely give
$u_1 = \frac{1}{\sqrt{10}} \begin{bmatrix} 1 \\ -3 \end{bmatrix}$ although the $\sqrt{10}$ would likely be approximated unless your calculator is smart.
Continuing we wish to find eigenvectors $u_2 = [u, v]^T$ such that (𝐴 − 4𝐼)𝑢2 = 0. Notice that 𝑢, 𝑣
are disposable variables in this context, I do not mean to connect the formulas from the 𝜆 = 0 case
with the case considered now.

\[ (A - 4I) u_2 = \begin{bmatrix} -1 & 1 \\ 3 & -3 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} -u + v \\ 3u - 3v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

Again the equations are redundant and we have infinitely many solutions of the form 𝑣 = 𝑢. Hence,
$u_2 = \begin{bmatrix} u \\ u \end{bmatrix} = u \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector for any 𝑢 ∈ ℝ such that 𝑢 ∕= 0.
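
The calculation in Example 6.2.12 can be mirrored in a few lines of Python. This is a sketch added for illustration, not part of the original notes; eig2x2 and matvec are hypothetical helper names, and the approach (solving the characteristic equation $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0$ with the quadratic formula) only handles 2 × 2 matrices with real eigenvalues.

```python
import math

def eig2x2(A):
    """Real eigenvalues of a 2x2 matrix from its characteristic equation."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    r = math.sqrt(disc)
    return (tr - r) / 2, (tr + r) / 2

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[3, 1], [3, 1]]
lam1, lam2 = eig2x2(A)
print(lam1, lam2)

# check A v = lambda v for the e-vectors found by hand
print(matvec(A, [1, -3]))   # should equal lam1 * [1, -3]
print(matvec(A, [1, 1]))    # should equal lam2 * [1, 1]
```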

Theorem 6.2.13.

A matrix 𝐴 ∈ ℝ 𝑛×𝑛 is symmetric iff there exists an orthonormal eigenbasis for 𝐴.



There is a geometric proof of this theorem in Edwards (footnote 4; see Theorem 8.6 pgs 146-147). I prove half
of this theorem in my linear algebra notes by a non-geometric argument (full proof is in Appendix C
of Insel, Spence and Friedberg). It might be very interesting to understand the connection between
the geometric versus algebraic arguments. We’ll content ourselves with an example here:
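Example 6.2.14. Let $A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}$. Observe that $\det(A - \lambda I) = -\lambda(\lambda + 1)(\lambda - 3)$ thus 𝜆1 =
0, 𝜆2 = −1, 𝜆3 = 3. We can calculate orthonormal e-vectors of $v_1 = [1, 0, 0]^T$, $v_2 = \frac{1}{\sqrt{2}}[0, 1, -1]^T$
and $v_3 = \frac{1}{\sqrt{2}}[0, 1, 1]^T$. I invite the reader to check the validity of the following equation:

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]

It’s really neat that to find the inverse of a matrix of orthonormal e-vectors we need only take the
transpose; note

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]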
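
For readers who would rather let a machine check the matrix equation of Example 6.2.14, here is a minimal pure-Python sketch (not part of the original notes; transpose and matmul are hypothetical helpers written out so that no external library is assumed). The columns of P are the orthonormal e-vectors $v_1, v_2, v_3$.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    # entry (i, j) is the dot product of row i of X with column j of Y
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

s = 1.0 / math.sqrt(2.0)
A = [[0, 0, 0],
     [0, 1, 2],
     [0, 2, 1]]
P = [[1, 0, 0],      # columns are v1, v2, v3
     [0, s, s],
     [0, -s, s]]

D = matmul(transpose(P), matmul(A, P))   # expect diag(0, -1, 3) up to rounding
I3 = matmul(transpose(P), P)             # expect the identity up to rounding

for row in D:
    print([round(v, 12) for v in row])
```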

XXX– remove comments about e-vectors and e-value before this section and put them here as
motivating examples for the proposition that follows.

Proposition 6.2.15.

If 𝑄 is a quadratic form on ℝ𝑛 with matrix 𝐴 and e-values 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 with orthonormal
e-vectors 𝑣1 , 𝑣2 , . . . , 𝑣𝑛 then
\[ Q(v_i) = \lambda_i \]
for 𝑖 = 1, 2, . . . , 𝑛. Moreover, if 𝑃 = [𝑣1 ∣𝑣2 ∣ ⋅ ⋅ ⋅ ∣𝑣𝑛 ] then

\[ Q(\vec{x}) = (P^T \vec{x})^T P^T A P\, P^T \vec{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \]

where we defined $\vec{y} = P^T \vec{x}$.


Let me restate the proposition above in simple terms: we can transform a given quadratic form to
a diagonal form by finding orthonormalized e-vectors and performing the appropriate coordinate
transformation. Since 𝑃 is formed from orthonormal e-vectors we know that 𝑃 will be either a
rotation or reflection. This proposition says we can remove "cross-terms" by transforming the
quadratic form with an appropriate rotation or reflection.

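Example 6.2.16. Consider the quadratic form $Q(x, y) = 2x^2 + 2xy + 2y^2$. It’s not immediately
obvious (to me) what the level curves 𝑄(𝑥, 𝑦) = 𝑘 look like. We’ll make use of the preceding
proposition to understand those graphs. Notice $Q(x, y) = [x, y] \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$. Denote the matrix
of the form by 𝐴 and calculate the e-values/vectors:

\[ \det(A - \lambda I) = \det \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} = (\lambda - 2)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3) = 0 \]

Therefore, the e-values are 𝜆1 = 1 and 𝜆2 = 3.

\[ (A - I)\vec{u}_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]

I just solved 𝑢 + 𝑣 = 0 to give 𝑣 = −𝑢; choose 𝑢 = 1 then normalize to get the vector above. Next,

\[ (A - 3I)\vec{u}_2 = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

I just solved 𝑢 − 𝑣 = 0 to give 𝑣 = 𝑢; choose 𝑢 = 1 then normalize to get the vector above. Let
$P = [\vec{u}_1 | \vec{u}_2]$ and introduce new coordinates $\vec{y} = [\bar{x}, \bar{y}]^T$ defined by $\vec{y} = P^T \vec{x}$. Note these can be
inverted by multiplication by 𝑃 to give $\vec{x} = P \vec{y}$. Observe that

\[ P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \ \Rightarrow \ \begin{aligned} x &= \tfrac{1}{\sqrt{2}}(\bar{x} + \bar{y}) \\ y &= \tfrac{1}{\sqrt{2}}(-\bar{x} + \bar{y}) \end{aligned} \quad \text{or} \quad \begin{aligned} \bar{x} &= \tfrac{1}{\sqrt{2}}(x - y) \\ \bar{y} &= \tfrac{1}{\sqrt{2}}(x + y) \end{aligned} \]

The proposition preceding this example shows that substitution of the formulas above into 𝑄 yields (see footnote 5):

\[ \tilde{Q}(\bar{x}, \bar{y}) = \bar{x}^2 + 3\bar{y}^2 \]

It is clear that in the barred coordinate system the level curve 𝑄(𝑥, 𝑦) = 𝑘 is an ellipse. If we draw
the barred coordinate system superposed over the 𝑥𝑦-coordinate system then you’ll see that the graph
of $Q(x, y) = 2x^2 + 2xy + 2y^2 = k$ is an ellipse rotated by 45 degrees. Or, if you like, we can plot
𝑧 = 𝑄(𝑥, 𝑦):

[Figure: graph of 𝑧 = 𝑄(𝑥, 𝑦)]

Footnote 4: Think about it, there is a 1-1 correspondence between symmetric matrices and quadratic forms.
Footnote 5: Technically $\tilde{Q}(\bar{x}, \bar{y})$ is $Q(x(\bar{x}, \bar{y}), y(\bar{x}, \bar{y}))$.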
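
The change of coordinates in Example 6.2.16 can be spot-checked numerically. The sketch below is illustrative only and not part of the original notes; the names Q, to_barred and Q_barred are assumptions, and the sample points are arbitrary. It uses the normalized rotation $\bar{x} = (x - y)/\sqrt{2}$, $\bar{y} = (x + y)/\sqrt{2}$.

```python
import math

def Q(x, y):
    return 2 * x * x + 2 * x * y + 2 * y * y

def to_barred(x, y):
    # barred coordinates y_vec = P^T x_vec for P = [u1 | u2]
    s = 1.0 / math.sqrt(2.0)
    return s * (x - y), s * (x + y)

def Q_barred(xb, yb):
    # diagonal form with e-values 1 and 3
    return xb * xb + 3 * yb * yb

for (x, y) in [(1.0, 0.0), (0.3, -1.2), (2.0, 5.0)]:
    xb, yb = to_barred(x, y)
    print((x, y), Q(x, y), Q_barred(xb, yb))
```

The last two columns should match up to rounding, confirming that the cross-term has been rotated away.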

Example 6.2.17. Consider the quadratic form $Q(x, y) = x^2 + 2xy + y^2$. It’s not immediately obvious
(to me) what the level curves 𝑄(𝑥, 𝑦) = 𝑘 look like. We’ll make use of the preceding proposition to
understand those graphs. Notice $Q(x, y) = [x, y] \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$. Denote the matrix of the form by
𝐴 and calculate the e-values/vectors:

\[ \det(A - \lambda I) = \det \begin{bmatrix} 1 - \lambda & 1 \\ 1 & 1 - \lambda \end{bmatrix} = (\lambda - 1)^2 - 1 = \lambda^2 - 2\lambda = \lambda(\lambda - 2) = 0 \]

Therefore, the e-values are 𝜆1 = 0 and 𝜆2 = 2.

\[ (A - 0I)\vec{u}_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]

I just solved 𝑢 + 𝑣 = 0 to give 𝑣 = −𝑢; choose 𝑢 = 1 then normalize to get the vector above. Next,

\[ (A - 2I)\vec{u}_2 = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

I just solved 𝑢 − 𝑣 = 0 to give 𝑣 = 𝑢; choose 𝑢 = 1 then normalize to get the vector above. Let
$P = [\vec{u}_1 | \vec{u}_2]$ and introduce new coordinates $\vec{y} = [\bar{x}, \bar{y}]^T$ defined by $\vec{y} = P^T \vec{x}$. Note these can be
inverted by multiplication by 𝑃 to give $\vec{x} = P \vec{y}$. Observe that

\[ P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \ \Rightarrow \ \begin{aligned} x &= \tfrac{1}{\sqrt{2}}(\bar{x} + \bar{y}) \\ y &= \tfrac{1}{\sqrt{2}}(-\bar{x} + \bar{y}) \end{aligned} \quad \text{or} \quad \begin{aligned} \bar{x} &= \tfrac{1}{\sqrt{2}}(x - y) \\ \bar{y} &= \tfrac{1}{\sqrt{2}}(x + y) \end{aligned} \]

The proposition preceding this example shows that substitution of the formulas above into 𝑄 yields:

\[ \tilde{Q}(\bar{x}, \bar{y}) = 2\bar{y}^2 \]

It is clear that in the barred coordinate system the level curve 𝑄(𝑥, 𝑦) = 𝑘 is a pair of parallel
lines (for 𝑘 > 0). If we draw the barred coordinate system superposed over the 𝑥𝑦-coordinate system then you’ll
see that the graph of $Q(x, y) = x^2 + 2xy + y^2 = k$ is a pair of lines with slope −1. Indeed, with a little
algebraic insight we could have anticipated this result since $Q(x, y) = (x + y)^2$ so 𝑄(𝑥, 𝑦) = 𝑘 implies
$x + y = \pm\sqrt{k}$ thus $y = \pm\sqrt{k} - x$. Here’s a plot which again verifies what we’ve already found:

[Figure: graph of 𝑧 = 𝑄(𝑥, 𝑦)]

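Example 6.2.18. Consider the quadratic form 𝑄(𝑥, 𝑦) = 4𝑥𝑦. It’s not immediately obvious (to
me) what the level curves 𝑄(𝑥, 𝑦) = 𝑘 look like. We’ll make use of the preceding proposition to
understand those graphs. Notice $Q(x, y) = [x, y] \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$. Denote the matrix of the form by
𝐴 and calculate the e-values/vectors:

\[ \det(A - \lambda I) = \det \begin{bmatrix} -\lambda & 2 \\ 2 & -\lambda \end{bmatrix} = \lambda^2 - 4 = (\lambda + 2)(\lambda - 2) = 0 \]

Therefore, the e-values are 𝜆1 = −2 and 𝜆2 = 2.

\[ (A + 2I)\vec{u}_1 = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]

I just solved 𝑢 + 𝑣 = 0 to give 𝑣 = −𝑢; choose 𝑢 = 1 then normalize to get the vector above. Next,

\[ (A - 2I)\vec{u}_2 = \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow \ \vec{u}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

I just solved 𝑢 − 𝑣 = 0 to give 𝑣 = 𝑢; choose 𝑢 = 1 then normalize to get the vector above. Let
$P = [\vec{u}_1 | \vec{u}_2]$ and introduce new coordinates $\vec{y} = [\bar{x}, \bar{y}]^T$ defined by $\vec{y} = P^T \vec{x}$. Note these can be
inverted by multiplication by 𝑃 to give $\vec{x} = P \vec{y}$. Observe that

\[ P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \ \Rightarrow \ \begin{aligned} x &= \tfrac{1}{\sqrt{2}}(\bar{x} + \bar{y}) \\ y &= \tfrac{1}{\sqrt{2}}(-\bar{x} + \bar{y}) \end{aligned} \quad \text{or} \quad \begin{aligned} \bar{x} &= \tfrac{1}{\sqrt{2}}(x - y) \\ \bar{y} &= \tfrac{1}{\sqrt{2}}(x + y) \end{aligned} \]

The proposition preceding this example shows that substitution of the formulas above into 𝑄 yields:

\[ \tilde{Q}(\bar{x}, \bar{y}) = -2\bar{x}^2 + 2\bar{y}^2 \]

It is clear that in the barred coordinate system the level curve 𝑄(𝑥, 𝑦) = 𝑘 is a hyperbola. If we
draw the barred coordinate system superposed over the 𝑥𝑦-coordinate system then you’ll see that
the graph of 𝑄(𝑥, 𝑦) = 4𝑥𝑦 = 𝑘 is a hyperbola rotated by 45 degrees. The graph 𝑧 = 4𝑥𝑦 is thus a
hyperbolic paraboloid:

[Figure: graph of 𝑧 = 4𝑥𝑦]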

The fascinating thing about the mathematics here is that if you don’t want to graph 𝑧 = 𝑄(𝑥, 𝑦),
but you do want to know the general shape then you can determine which type of quadric surface
you’re dealing with by simply calculating the eigenvalues of the form.
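
As a sketch of how that classification might be automated for two variables (this code is not part of the original notes; eigenvalues_sym2, surface_type, the tolerance and the wording of the returned labels are all assumptions made for illustration), write the form as $Q(x, y) = a x^2 + 2b xy + c y^2$ with matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ and read the shape of 𝑧 = 𝑄(𝑥, 𝑦) off the signs of the two e-values.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def surface_type(a, b, c, tol=1e-12):
    """Shape of z = a x^2 + 2 b x y + c y^2 from the signs of the e-values."""
    lo, hi = eigenvalues_sym2(a, b, c)
    if lo > tol:
        return "elliptic paraboloid opening upward"
    if hi < -tol:
        return "elliptic paraboloid opening downward"
    if lo < -tol and hi > tol:
        return "hyperbolic paraboloid (saddle)"
    if abs(lo) <= tol and abs(hi) <= tol:
        return "plane z = 0"
    return "parabolic trough"

# the three forms from the preceding examples
print(surface_type(2, 1, 2))   # 2x^2 + 2xy + 2y^2
print(surface_type(1, 1, 1))   # x^2 + 2xy + y^2
print(surface_type(0, 2, 0))   # 4xy
```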

Remark 6.2.19.
I made the preceding triple of examples all involve the same rotation. This is purely for my
lecturing convenience. In practice the rotation could be by all sorts of angles. In addition,
you might notice that a different ordering of the e-values would result in a redefinition of
the barred coordinates. 6
We ought to do at least one 3-dimensional example.

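Example 6.2.20. Consider the quadratic form 𝑄 defined below:

\[ Q(x, y, z) = [x, y, z] \begin{bmatrix} 6 & -2 & 0 \\ -2 & 6 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

Denote the matrix of the form by 𝐴 and calculate the e-values/vectors:

\[ \begin{aligned}
\det(A - \lambda I) &= \det \begin{bmatrix} 6 - \lambda & -2 & 0 \\ -2 & 6 - \lambda & 0 \\ 0 & 0 & 5 - \lambda \end{bmatrix} \\
&= [(\lambda - 6)^2 - 4](5 - \lambda) \\
&= [\lambda^2 - 12\lambda + 32](5 - \lambda) \\
&= (\lambda - 4)(\lambda - 8)(5 - \lambda)
\end{aligned} \]

Therefore, the e-values are 𝜆1 = 4, 𝜆2 = 8 and 𝜆3 = 5. After some calculation we find the following
orthonormal e-vectors for 𝐴:

\[ \vec{u}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \qquad \vec{u}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \qquad \vec{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Let $P = [\vec{u}_1 | \vec{u}_2 | \vec{u}_3]$ and introduce new coordinates $\vec{y} = [\bar{x}, \bar{y}, \bar{z}]^T$ defined by $\vec{y} = P^T \vec{x}$. Note these
can be inverted by multiplication by 𝑃 to give $\vec{x} = P \vec{y}$. Observe that

\[ P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} \ \Rightarrow \ \begin{aligned} x &= \tfrac{1}{\sqrt{2}}(\bar{x} + \bar{y}) \\ y &= \tfrac{1}{\sqrt{2}}(\bar{x} - \bar{y}) \\ z &= \bar{z} \end{aligned} \quad \text{or} \quad \begin{aligned} \bar{x} &= \tfrac{1}{\sqrt{2}}(x + y) \\ \bar{y} &= \tfrac{1}{\sqrt{2}}(x - y) \\ \bar{z} &= z \end{aligned} \]

The proposition preceding this example shows that substitution of the formulas above into 𝑄 yields:

\[ \tilde{Q}(\bar{x}, \bar{y}, \bar{z}) = 4\bar{x}^2 + 8\bar{y}^2 + 5\bar{z}^2 \]

It is clear that in the barred coordinate system the level surface 𝑄(𝑥, 𝑦, 𝑧) = 𝑘 is an ellipsoid. If we
draw the barred coordinate system superposed over the 𝑥𝑦𝑧-coordinate system then you’ll see that
the graph of 𝑄(𝑥, 𝑦, 𝑧) = 𝑘 is an ellipsoid rotated by 45 degrees around the 𝑧-axis. Plotted below
are a few representative ellipsoids:

[Figure: representative level surfaces 𝑄(𝑥, 𝑦, 𝑧) = 𝑘]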

In summary, the behaviour of a quadratic form $Q(x) = x^T A x$ is governed by its set of eigenvalues (footnote 7)
{𝜆1 , 𝜆2 , . . . , 𝜆𝑘 }. Moreover, the form can be written as $Q(y) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_k y_k^2$ by choosing
the coordinate system which is built from the orthonormal eigenbasis of 𝑐𝑜𝑙(𝐴). In this coordinate
system the shape of the level-sets of 𝑄 becomes manifest from the signs of the e-values.

Remark 6.2.21.
If you would like to read more about conic sections or quadric surfaces and their connection
to e-values/vectors I recommend sections 9.6 and 9.7 of Anton’s linear algebra text. I
have yet to add examples on how to include translations in the analysis. It’s not much
more trouble but I decided it would just be an unnecessary complication this semester.
Also, sections 7.1, 7.2 and 7.3 in Lay’s linear algebra text show a bit more about how to
use this math to solve concrete applied problems. You might also take a look in Gilbert
Strang’s linear algebra text, his discussion of tests for positive-definite matrices is much
more complete than I will give here.

6.3 second derivative test in many-variables


There is a connection between the shape of level curves 𝑄(𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ) = 𝑘 and the graph 𝑥𝑛+1 =
𝑓 (𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ) of 𝑓 . I’ll discuss 𝑛 = 2 but these comments equally well apply to 𝑤 = 𝑓 (𝑥, 𝑦, 𝑧) or
higher dimensional examples. Consider a critical point (𝑎, 𝑏) for 𝑓 (𝑥, 𝑦) then the Taylor expansion
about (𝑎, 𝑏) has the form
\[ f(a + h, b + k) = f(a, b) + Q(h, k) + \cdots \]
where $Q(h, k) = \frac{1}{2} h^2 f_{xx}(a, b) + hk f_{xy}(a, b) + \frac{1}{2} k^2 f_{yy}(a, b) = [h, k][Q][h, k]^T$. Since $[Q]^T = [Q]$ we can
find orthonormal e-vectors $\vec{u}_1, \vec{u}_2$ for [𝑄] with e-values 𝜆1 and 𝜆2 respectively. Using $U = [\vec{u}_1 | \vec{u}_2]$ we
can introduce rotated coordinates $(\bar{h}, \bar{k})^T = U^T (h, k)^T$. These will give
\[ Q(\bar{h}, \bar{k}) = \lambda_1 \bar{h}^2 + \lambda_2 \bar{k}^2 \]

Footnote 7: This set is called the spectrum of the matrix.


Clearly if 𝜆1 > 0 and 𝜆2 > 0 then 𝑓 (𝑎, 𝑏) yields the local minimum whereas if 𝜆1 < 0 and 𝜆2 < 0
then 𝑓 (𝑎, 𝑏) yields the local maximum. Edwards discusses these matters on pgs. 148-153. In short,
supposing 𝑓 ≈ 𝑓 (𝑝) + 𝑄, if all the e-values of 𝑄 are positive then 𝑓 has a local minimum of 𝑓 (𝑝)
at 𝑝 whereas if all the e-values of 𝑄 are negative then 𝑓 reaches a local maximum of 𝑓 (𝑝) at 𝑝.
Otherwise 𝑄 has both positive and negative e-values and we say 𝑄 is non-definite and the function
has a saddle point. If all the e-values of 𝑄 are positive then 𝑄 is said to be positive-definite
whereas if all the e-values of 𝑄 are negative then 𝑄 is said to be negative-definite. Edwards
gives a few nice tests for ascertaining if a matrix is positive definite without explicit computation
of e-values. Finally, if one of the e-values is zero then the graph will be like a trough.
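
The e-value test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `e_values_sym2` and `classify` and the tolerance `tol` are my own choices, not notation from these notes, and the closed-form e-value formula is the standard one for a symmetric 2 × 2 matrix.

```python
import math

def e_values_sym2(a, b, c):
    """E-values of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

def classify(fxx, fxy, fyy, tol=1e-12):
    """Classify a critical point of f(x, y) from its second partials there.

    The matrix of Q is [[fxx/2, fxy/2], [fxy/2, fyy/2]].
    """
    lam_big, lam_small = e_values_sym2(fxx / 2.0, fxy / 2.0, fyy / 2.0)
    if lam_small > tol:
        return "local minimum"
    if lam_big < -tol:
        return "local maximum"
    if lam_big > tol and lam_small < -tol:
        return "saddle point"
    return "inconclusive: a zero e-value"

# f(x, y) = exp(-x^2 - (y - 1)^2) at (0, 1) has fxx = fyy = -2 and fxy = 0
print(classify(-2.0, 0.0, -2.0))
```

The tolerance only guards against round-off; when an e-value is zero the quadratic form alone does not settle the question.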

Example 6.3.1. Suppose 𝑓 (𝑥, 𝑦) = 𝑒𝑥𝑝(−𝑥2 − 𝑦 2 + 2𝑦 − 1) expand 𝑓 about the point (0, 1):

𝑓 (𝑥, 𝑦) = 𝑒𝑥𝑝(−𝑥2 )𝑒𝑥𝑝(−𝑦 2 + 2𝑦 − 1) = 𝑒𝑥𝑝(−𝑥2 )𝑒𝑥𝑝(−(𝑦 − 1)2 )

expanding,

𝑓 (𝑥, 𝑦) = (1 − 𝑥2 + ⋅ ⋅ ⋅ )(1 − (𝑦 − 1)2 + ⋅ ⋅ ⋅ ) = 1 − 𝑥2 − (𝑦 − 1)2 + ⋅ ⋅ ⋅

Recenter about the point (0, 1) by setting 𝑥 = ℎ and 𝑦 = 1 + 𝑘 so

𝑓 (ℎ, 1 + 𝑘) = 1 − ℎ2 − 𝑘 2 + ⋅ ⋅ ⋅

If (ℎ, 𝑘) is near (0, 0) then the dominant terms are simply those we’ve written above hence the graph
is like that of a quadratic surface with a pair of negative e-values. It follows that 𝑓 (0, 1) is a local
maximum. In fact, it happens to be a global maximum for this function.

Example 6.3.2. Suppose $f(x,y) = 4 - (x-1)^2 - (y-2)^2 + A\exp(-(x-1)^2 - (y-2)^2) + 2B(x-1)(y-2)$
for some constants 𝐴, 𝐵. Analyze what values for 𝐴, 𝐵 will make (1, 2) a local maximum, minimum
or neither. Expanding about (1, 2) we set 𝑥 = 1 + ℎ and 𝑦 = 2 + 𝑘 in order to see clearly the local
behaviour of 𝑓 at (1, 2),

\[ \begin{aligned}
f(1+h, 2+k) &= 4 - h^2 - k^2 + A\exp(-h^2 - k^2) + 2Bhk \\
&= 4 - h^2 - k^2 + A(1 - h^2 - k^2) + 2Bhk + \cdots \\
&= 4 + A - (A+1)h^2 + 2Bhk - (A+1)k^2 + \cdots
\end{aligned} \]

There is no nonzero linear term in the expansion at (1, 2) which indicates that 𝑓 (1, 2) = 4 + 𝐴
may be a local extremum. In this case the quadratic terms are nontrivial which means the graph of
this function is well-approximated by a quadratic surface near (1, 2). The quadratic form
$Q(h,k) = -(A+1)h^2 + 2Bhk - (A+1)k^2$ has matrix
\[ [Q] = \begin{bmatrix} -(A+1) & B \\ B & -(A+1) \end{bmatrix}. \]

The characteristic equation for 𝑄 is
\[ \det([Q] - \lambda I) = \det\begin{bmatrix} -(A+1)-\lambda & B \\ B & -(A+1)-\lambda \end{bmatrix} = (\lambda + A + 1)^2 - B^2 = 0 \]

We find solutions $\lambda_1 = -A - 1 + B$ and $\lambda_2 = -A - 1 - B$. The possibilities break down as follows:


1. if 𝜆1 , 𝜆2 > 0 then 𝑓 (1, 2) is a local minimum.

2. if 𝜆1 , 𝜆2 < 0 then 𝑓 (1, 2) is a local maximum.

3. if just one of 𝜆1 , 𝜆2 is zero then 𝑓 is constant along one direction and min/max along another
so technically it is a local extremum.

4. if 𝜆1 𝜆2 < 0 then 𝑓 (1, 2) is not a local extremum, however it is a saddle point.


In particular, the following choices for 𝐴, 𝐵 will match the choices above
1. Let 𝐴 = −3 and 𝐵 = 1 so 𝜆1 = 3 and 𝜆2 = 1;

2. Let 𝐴 = 3 and 𝐵 = 1 so 𝜆1 = −3 and 𝜆2 = −5

3. Let 𝐴 = −3 and 𝐵 = −2 so 𝜆1 = 0 and 𝜆2 = 4

4. Let 𝐴 = 1 and 𝐵 = 3 so 𝜆1 = 1 and 𝜆2 = −5


Here are the graphs of the cases above. Note the analysis for case 3 is more subtle for Taylor
approximations as opposed to simple quadratic surfaces. In this example, case 3 was also a local
minimum. In contrast, in Example 6.2.17 the graph was like a trough. The behaviour of 𝑓 away
from the critical point includes higher order terms whose influence turns the trough into a local
minimum.

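Example 6.3.3. Suppose 𝑓 (𝑥, 𝑦) = sin(𝑥) cos(𝑦). To find the Taylor series centered at (0, 0) we can
simply multiply the one-dimensional results $\sin(x) = x - \tfrac{1}{3!}x^3 + \tfrac{1}{5!}x^5 + \cdots$ and
$\cos(y) = 1 - \tfrac{1}{2!}y^2 + \tfrac{1}{4!}y^4 + \cdots$ as follows:

\[ \begin{aligned}
f(x,y) &= \left(x - \tfrac{1}{3!}x^3 + \tfrac{1}{5!}x^5 + \cdots\right)\left(1 - \tfrac{1}{2!}y^2 + \tfrac{1}{4!}y^4 + \cdots\right) \\
&= x - \tfrac{1}{2}xy^2 + \tfrac{1}{24}xy^4 - \tfrac{1}{6}x^3 + \tfrac{1}{12}x^3y^2 + \cdots \\
&= x + \cdots
\end{aligned} \]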

The origin (0, 0) is not a critical point: $f_x(0,0) = \cos(0)\cos(0) = 1$ while $f_y(0,0) = 0$, which
matches the nonzero linear term 𝑥 in the expansion. Notice also that 𝑄 = 0 in the Taylor series
for this function at (0, 0), so the quadratic form carries no information here. When 𝑄 = 0 at a
genuine critical point the analysis via the quadratic form term fails. This is analogous to the
inconclusive case of the 2nd derivative test in calculus III.

Example 6.3.4. Suppose 𝑓 (𝑥, 𝑦, 𝑧) = 𝑥𝑦𝑧. Calculate the multivariate Taylor expansion about the
point (1, 2, 3). I’ll actually calculate this one via differentiation, I have used tricks and/or calculus
II results to shortcut any differentiation in the previous examples. Calculate first derivatives

𝑓𝑥 = 𝑦𝑧 𝑓𝑦 = 𝑥𝑧 𝑓𝑧 = 𝑥𝑦,

and second derivatives,


𝑓𝑥𝑥 = 0 𝑓𝑥𝑦 = 𝑧 𝑓𝑥𝑧 = 𝑦
𝑓𝑦𝑥 = 𝑧 𝑓𝑦𝑦 = 0 𝑓𝑦𝑧 = 𝑥
𝑓𝑧𝑥 = 𝑦 𝑓𝑧𝑦 = 𝑥 𝑓𝑧𝑧 = 0,
and the nonzero third derivatives,

𝑓𝑥𝑦𝑧 = 𝑓𝑦𝑧𝑥 = 𝑓𝑧𝑥𝑦 = 𝑓𝑧𝑦𝑥 = 𝑓𝑦𝑥𝑧 = 𝑓𝑥𝑧𝑦 = 1.

It follows,

\[ \begin{aligned}
f(a+h, b+k, c+l) &= f(a,b,c) + f_x(a,b,c)h + f_y(a,b,c)k + f_z(a,b,c)l \\
&\quad + \tfrac{1}{2}\big( f_{xx}hh + f_{xy}hk + f_{xz}hl + f_{yx}kh + f_{yy}kk + f_{yz}kl + f_{zx}lh + f_{zy}lk + f_{zz}ll \big) + \cdots
\end{aligned} \]

Of course certain terms can be combined since $f_{xy} = f_{yx}$ etc... for smooth functions (we assume
smooth in this section, moreover the given function here is clearly smooth). In total,
\[ f(1+h, 2+k, 3+l) = 6 + 6h + 3k + 2l + \tfrac{1}{2}\big( 3hk + 2hl + 3kh + kl + 2lh + lk \big) + \tfrac{1}{3!}(6)hkl \]
Of course, we could also obtain this from simple algebra:

\[ f(1+h, 2+k, 3+l) = (1+h)(2+k)(3+l) = 6 + 6h + 3k + 2l + 3hk + 2hl + kl + hkl. \]
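
As a sanity check on the expansion of 𝑓 (𝑥, 𝑦, 𝑧) = 𝑥𝑦𝑧 about (1, 2, 3), the following sketch compares the Taylor polynomial with the function itself using exact rational arithmetic. The function name `taylor_at_123` is a hypothetical label for this illustration; since 𝑓 is a cubic polynomial, the two sides should agree exactly.

```python
from fractions import Fraction

def f(x, y, z):
    return x * y * z

def taylor_at_123(h, k, l):
    """Taylor polynomial of xyz about (1, 2, 3), terms grouped by order."""
    constant = 6
    linear = 6 * h + 3 * k + 2 * l
    quadratic = Fraction(1, 2) * (3*h*k + 2*h*l + 3*k*h + k*l + 2*l*h + l*k)
    cubic = Fraction(1, 6) * 6 * h * k * l
    return constant + linear + quadratic + cubic

h, k, l = Fraction(1, 2), Fraction(-1, 3), Fraction(2)
print(taylor_at_123(h, k, l) == f(1 + h, 2 + k, 3 + l))
```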

6.3.1 morse theory and future reading


Chapter 7

multilinear algebra

7.1 dual space


Definition 7.1.1.
Suppose 𝑉 is a vector space over ℝ. We define the dual space to 𝑉 to be the set of all
linear functions from 𝑉 to ℝ. In particular, we denote:

𝑉 ∗ = {𝑓 : 𝑉 → ℝ ∣ 𝑓 (𝑥 + 𝑦) = 𝑓 (𝑥) + 𝑓 (𝑦) and 𝑓 (𝑐𝑥) = 𝑐𝑓 (𝑥) ∀𝑥, 𝑦 ∈ 𝑉 and 𝑐 ∈ ℝ}

If 𝛼 ∈ 𝑉 ∗ then we say 𝛼 is a dual vector.


I offer several abstract examples to begin, however the majority of this section concerns ℝ𝑛 .
Example 7.1.2. Suppose ℱ denotes the set of continuous functions on ℝ. Define $\alpha(f) = \int_0^1 f(t)\,dt$.
The mapping 𝛼 : ℱ → ℝ is linear by properties of definite integrals, therefore the definite
integral defines a dual-vector to the vector space of continuous functions.
Example 7.1.3. Suppose 𝑉 = ℱ(𝑊, ℝ) denotes a set of functions from a vector space 𝑊 to ℝ.
Note that 𝑉 is a vector space with respect to point-wise defined addition and scalar multiplication
of functions. Let 𝑤𝑜 ∈ 𝑊 and define 𝛼(𝑓 ) = 𝑓 (𝑤𝑜 ). The mapping 𝛼 : 𝑉 → ℝ is linear since
𝛼(𝑐𝑓 + 𝑔) = (𝑐𝑓 + 𝑔)(𝑤𝑜 ) = 𝑐𝑓 (𝑤𝑜 ) + 𝑔(𝑤𝑜 ) = 𝑐𝛼(𝑓 ) + 𝛼(𝑔) for all 𝑓, 𝑔 ∈ 𝑉 and 𝑐 ∈ ℝ. We find
that the evaluation map defines a dual-vector 𝛼 ∈ 𝑉 ∗ .
Example 7.1.4. The determinant is a mapping from $\mathbb{R}^{n\times n}$ to ℝ but it does not define a dual-vector
to the vector space of square matrices since, in general, $\det(A + B) \neq \det(A) + \det(B)$.
Example 7.1.5. Suppose 𝛼(𝑥) = 𝑥 ⋅ 𝑣 for a particular vector 𝑣 ∈ ℝ𝑛 . We argue 𝛼 ∈ 𝑉 ∗ where we
recall 𝑉 = ℝ𝑛 is a vector space. Additivity follows from a property of the dot-product on ℝ𝑛 ,

𝛼(𝑥 + 𝑦) = (𝑥 + 𝑦) ⋅ 𝑣 = 𝑥 ⋅ 𝑣 + 𝑦 ⋅ 𝑣 = 𝛼(𝑥) + 𝛼(𝑦)

for all 𝑥, 𝑦 ∈ ℝ𝑛 . Likewise, homogeneity follows from another property of the dot-product: observe

𝛼(𝑐𝑥) = (𝑐𝑥) ⋅ 𝑣 = 𝑐(𝑥 ⋅ 𝑣) = 𝑐𝛼(𝑥)


for all 𝑥 ∈ ℝ𝑛 and 𝑐 ∈ ℝ.


Example 7.1.6. Let 𝛼(𝑥, 𝑦) = 2𝑥 + 5𝑦 define a function 𝛼 : ℝ2 → ℝ. Note that
𝛼(𝑥, 𝑦) = (𝑥, 𝑦) ⋅ (2, 5)
hence by the preceding example we find 𝛼 ∈ (ℝ2 )∗ .
The preceding example is no accident. It turns out there is a one-one correspondence between row
vectors and dual vectors on $\mathbb{R}^n$. Let $v \in \mathbb{R}^n$ then we define $\alpha_v(x) = x \cdot v$. We proved in Example
7.1.5 that $\alpha_v \in (\mathbb{R}^n)^*$. Suppose $\alpha \in (\mathbb{R}^n)^*$; we seek to find $v \in \mathbb{R}^n$ such that $\alpha = \alpha_v$. Recall that a
linear function is uniquely defined by its values on a basis; the values of $\alpha$ on the standard basis
will show us how to choose $v$. This is a standard technique. Consider $x \in \mathbb{R}^n$ with¹ $x = \sum_{j=1}^n x^j e_j$:

\[ \underbrace{\alpha(x) = \alpha\Big(\sum_{j=1}^n x^j e_j\Big) = \sum_{j=1}^n \alpha(x^j e_j)}_{additivity} \underbrace{= \sum_{j=1}^n x^j \alpha(e_j)}_{homogeneity} = x \cdot v \]

where we define $v = (\alpha(e_1), \alpha(e_2), \dots, \alpha(e_n)) \in \mathbb{R}^n$. The vector which corresponds naturally² to $\alpha$
is simply the vector of the values of $\alpha$ on the standard basis.

The dual space to $\mathbb{R}^n$ is a vector space and the correspondence $v \to \alpha_v$ gives an isomorphism of $\mathbb{R}^n$
and $(\mathbb{R}^n)^*$. The image of a basis under an isomorphism is once more a basis. Define $\Phi : \mathbb{R}^n \to (\mathbb{R}^n)^*$
by $\Phi(v) = \alpha_v$ to give the correspondence an explicit label. The image of the standard basis under
$\Phi$ is called the standard dual basis for $(\mathbb{R}^n)^*$. Consider $\Phi(e_j)$, let $x \in \mathbb{R}^n$ and calculate
\[ \Phi(e_j)(x) = \alpha_{e_j}(x) = x \cdot e_j \]
In particular, notice that when $x = e_i$ then $\Phi(e_j)(e_i) = e_i \cdot e_j = \delta_{ij}$. Dual vectors are linear
transformations therefore we can define the dual basis by its values on the standard basis.
Definition 7.1.7.
The standard dual basis of $(\mathbb{R}^n)^*$ is denoted $\{e^1, e^2, \dots, e^n\}$ where we define $e^j : \mathbb{R}^n \to \mathbb{R}$
to be the linear transformation such that $e^j(e_i) = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$. Generally, given a
vector space $V$ with basis $\beta = \{f_1, f_2, \dots, f_n\}$ we say the basis $\beta^* = \{f^1, f^2, \dots, f^n\}$ is
dual to $\beta$ iff $f^j(f_i) = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$.
The term basis indicates that $\{e^1, e^2, \dots, e^n\}$ is linearly independent³ and $span\{e^1, e^2, \dots, e^n\} = (\mathbb{R}^n)^*$.
The following calculation is often useful: if $x \in \mathbb{R}^n$ with $x = \sum_{j=1}^n x^j e_j$ then
\[ e^i(x) = e^i\Big(\sum_{j=1}^n x^j e_j\Big) = \sum_{j=1}^n x^j e^i(e_j) = \sum_{j=1}^n x^j \delta_{ij} = x^i \quad\Rightarrow\quad e^i(x) = x^i. \]

1 the super-index is not a power in this context, it is just a notation to emphasize that $v^j$ is a component of the vector $v$.
2 some authors will say $\mathbb{R}^{n\times 1}$ is dual to $\mathbb{R}^{1\times n}$ since $\alpha_v(x) = v^T x$ and $v^T$ is a row vector, I will avoid that language in these notes.
3 direct proof of LI is left to the reader
The calculation above is a prototype for many that follow in this chapter. Next, suppose $\alpha \in (\mathbb{R}^n)^*$
and suppose $x \in \mathbb{R}^n$ with $x = \sum_{j=1}^n x^j e_j$. Calculate,

\[ \alpha(x) = \alpha\Big(\sum_{i=1}^n x^i e_i\Big) = \sum_{i=1}^n \alpha(e_i)e^i(x) \quad\Rightarrow\quad \alpha = \sum_{i=1}^n \alpha(e_i)e^i \]

this shows every dual vector is in the span of the dual basis $\{e^j\}_{j=1}^n$.
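
To make the correspondence between a dual vector and its vector of values on the standard basis concrete, here is a small Python sketch. Vectors are modelled as plain lists, and the helper names `standard_basis`, `representing_vector` and `dot` are assumptions of this illustration rather than notation from the notes; the dual vector used is the one from Example 7.1.6.

```python
def standard_basis(n):
    """Return e_1, ..., e_n as lists of length n."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def representing_vector(alpha, n):
    """v = (alpha(e_1), ..., alpha(e_n)), the values of alpha on the standard basis."""
    return [alpha(e) for e in standard_basis(n)]

def dot(x, v):
    return sum(a * b for a, b in zip(x, v))

def alpha(x):
    # the dual vector of Example 7.1.6: alpha(x, y) = 2x + 5y
    return 2 * x[0] + 5 * x[1]

v = representing_vector(alpha, 2)
x = [3, -1]
print(v, alpha(x), dot(x, v))
```

Evaluating `alpha` directly and taking the dot product with `v` should agree for every input, which is the content of 𝛼 = 𝛼𝑣.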

7.2 multilinearity and the tensor product


A multilinear mapping is a function of a Cartesian product of vector spaces which is linear with
respect to each ”slot”. The goal of this section is to explain what that means. It turns out the set
of all multilinear mappings on a particular set of vector spaces forms a vector space and we’ll show
how the tensor product can be used to construct an explicit basis by tensoring the bases which are
dual to the bases in the domain. We also examine the concepts of symmetric and antisymmetric
multilinear mappings; these form interesting subspaces of the set of all multilinear mappings. Our
approach in this section is to treat the case of bilinearity in depth then transition to the case of
multilinearity. Naturally this whole discussion demands a familiarity with the preceding section.

7.2.1 bilinear maps


Definition 7.2.1.
Suppose 𝑉1 , 𝑉2 are vector spaces then 𝑏 : 𝑉1 × 𝑉2 → ℝ is a bilinear mapping on 𝑉1 × 𝑉2 iff
for all 𝑥, 𝑦 ∈ 𝑉1 , 𝑧, 𝑤 ∈ 𝑉2 and 𝑐 ∈ ℝ:

(1.) 𝑏(𝑐𝑥 + 𝑦, 𝑧) = 𝑐𝑏(𝑥, 𝑧) + 𝑏(𝑦, 𝑧) (linearity in the first slot)


(2.) 𝑏(𝑥, 𝑐𝑧 + 𝑤) = 𝑐𝑏(𝑥, 𝑧) + 𝑏(𝑥, 𝑤) (linearity in the second slot).

bilinear maps on 𝑉 × 𝑉
When 𝑉1 = 𝑉2 = 𝑉 we simply say that 𝑏 : 𝑉 × 𝑉 → ℝ is a bilinear mapping on 𝑉 . The set of
all bilinear maps of 𝑉 is denoted $T^0_2 V$. You can show that $T^0_2 V$ forms a vector space under
the usual point-wise defined operations of function addition and scalar multiplication⁴. Hopefully
you are familiar with the example below.
Example 7.2.2. Define 𝑏 : ℝ𝑛 × ℝ𝑛 → ℝ by 𝑏(𝑥, 𝑦) = 𝑥 ⋅ 𝑦 for all 𝑥, 𝑦 ∈ ℝ𝑛 . Linearity in each slot
follows easily from properties of dot-products:

𝑏(𝑐𝑥 + 𝑦, 𝑧) = (𝑐𝑥 + 𝑦) ⋅ 𝑧 = 𝑐𝑥 ⋅ 𝑧 + 𝑦 ⋅ 𝑧 = 𝑐𝑏(𝑥, 𝑧) + 𝑏(𝑦, 𝑧)

𝑏(𝑥, 𝑐𝑦 + 𝑧) = 𝑥 ⋅ (𝑐𝑦 + 𝑧) = 𝑐𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧 = 𝑐𝑏(𝑥, 𝑦) + 𝑏(𝑥, 𝑧).


4 sounds like homework

We can use matrix multiplication to generate a large class of examples with ease.
Example 7.2.3. Suppose 𝐴 ∈ ℝ 𝑛×𝑛 and define 𝑏 : ℝ𝑛 × ℝ𝑛 → ℝ by 𝑏(𝑥, 𝑦) = 𝑥𝑇 𝐴𝑦 for all
𝑥, 𝑦 ∈ ℝ𝑛 . Observe that, by properties of matrix multiplication,

𝑏(𝑐𝑥 + 𝑦, 𝑧) = (𝑐𝑥 + 𝑦)𝑇 𝐴𝑧 = (𝑐𝑥𝑇 + 𝑦 𝑇 )𝐴𝑧 = 𝑐𝑥𝑇 𝐴𝑧 + 𝑦 𝑇 𝐴𝑧 = 𝑐𝑏(𝑥, 𝑧) + 𝑏(𝑦, 𝑧)

𝑏(𝑥, 𝑐𝑦 + 𝑧) = 𝑥𝑇 𝐴(𝑐𝑦 + 𝑧) = 𝑐𝑥𝑇 𝐴𝑦 + 𝑥𝑇 𝐴𝑧 = 𝑐𝑏(𝑥, 𝑦) + 𝑏(𝑥, 𝑧)


for all 𝑥, 𝑦, 𝑧 ∈ ℝ𝑛 and 𝑐 ∈ ℝ. It follows that 𝑏 is bilinear on ℝ𝑛 .
Suppose $b : V \times V \to \mathbb{R}$ is bilinear and suppose $\beta = \{e_1, e_2, \dots, e_n\}$ is a basis for $V$ whereas
$\beta^* = \{e^1, e^2, \dots, e^n\}$ is a basis of $V^*$ with $e^j(e_i) = \delta_{ij}$. Then

\begin{align}
b(x,y) &= b\Big(\sum_{i=1}^n x^i e_i, \sum_{j=1}^n y^j e_j\Big) \tag{7.1} \\
&= \sum_{i,j=1}^n b(x^i e_i, y^j e_j) \notag \\
&= \sum_{i,j=1}^n x^i y^j b(e_i, e_j) \notag \\
&= \sum_{i,j=1}^n b(e_i, e_j)e^i(x)e^j(y) \notag
\end{align}

Therefore, if we define $b_{ij} = b(e_i, e_j)$ then we may compute $b(x,y) = \sum_{i,j=1}^n b_{ij}x^i y^j$. The calculation
above also indicates that $b$ is a linear combination of certain basic bilinear mappings. In particular,
$b$ can be written as a linear combination of tensor products of dual vectors on $V$.
Definition 7.2.4.
Suppose 𝑉 is a vector space with dual space 𝑉 ∗ . If 𝛼, 𝛽 ∈ 𝑉 ∗ then we define 𝛼⊗𝛽 : 𝑉 ×𝑉 →
ℝ by (𝛼 ⊗ 𝛽)(𝑥, 𝑦) = 𝛼(𝑥)𝛽(𝑦) for all 𝑥, 𝑦 ∈ 𝑉 .
Given the notation⁵ preceding this definition, we note $(e^i \otimes e^j)(x,y) = e^i(x)e^j(y)$ hence for all
$x, y \in V$ we find:

\[ b(x,y) = \sum_{i,j=1}^n b(e_i,e_j)(e^i \otimes e^j)(x,y) \quad\text{therefore,}\quad b = \sum_{i,j=1}^n b(e_i,e_j)\,e^i \otimes e^j \]

We find⁶ that $T^0_2 V = span\{e^i \otimes e^j\}_{i,j=1}^n$. Moreover, it can be argued⁷ that $\{e^i \otimes e^j\}_{i,j=1}^n$ is a linearly
independent set, therefore $\{e^i \otimes e^j\}_{i,j=1}^n$ forms a basis for $T^0_2 V$. We can count there are $n^2$ vectors
in $\{e^i \otimes e^j\}_{i,j=1}^n$ hence $dim(T^0_2 V) = n^2$.

5 perhaps you would rather write $(e^i \otimes e^j)(x,y)$ as $e^i \otimes e^j(x,y)$, that is also fine.
6 with the help of your homework where you will show $\{e^i \otimes e^j\}_{i,j=1}^n \subseteq T^0_2 V$
7 yes, again, in your homework

If 𝑉 = ℝ𝑛 and if {𝑒𝑖 }𝑛𝑖=1 denotes the standard dual basis, then there is a standard notation for
the set of coefficients found in the summation for 𝑏. In particular, we denote 𝐵 = [𝑏] where
𝐵𝑖𝑗 = 𝑏(𝑒𝑖 , 𝑒𝑗 ) hence, following Equation 7.1,

\[ b(x,y) = \sum_{i,j=1}^n x^i y^j b(e_i,e_j) = \sum_{i=1}^n \sum_{j=1}^n x^i B_{ij} y^j = x^T B y \]
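
The passage from a bilinear map to its matrix can be sketched as follows. The particular map `b` below is a hypothetical example chosen only for illustration, and the helper names `matrix_of` and `evaluate` are likewise my own; the point is merely that the entries 𝑏(𝑒𝑖 , 𝑒𝑗 ) recover the map through the double sum.

```python
def standard_basis(n):
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def matrix_of(b, n):
    """B[i][j] = b(e_i, e_j) for the standard basis of R^n."""
    basis = standard_basis(n)
    return [[b(ei, ej) for ej in basis] for ei in basis]

def evaluate(B, x, y):
    """The double sum over i, j of x^i B_ij y^j, i.e. x^T B y."""
    n = len(B)
    return sum(x[i] * B[i][j] * y[j] for i in range(n) for j in range(n))

def b(x, y):
    # hypothetical bilinear map on R^2, chosen only for illustration
    return x[0] * y[1] - 3 * x[1] * y[0] + 2 * x[1] * y[1]

B = matrix_of(b, 2)
x, y = [1, 2], [3, 4]
print(B, b(x, y), evaluate(B, x, y))
```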

Definition 7.2.5.

Suppose 𝑏 : 𝑉 × 𝑉 → ℝ is a bilinear mapping then we say:

1. 𝑏 is symmetric iff 𝑏(𝑥, 𝑦) = 𝑏(𝑦, 𝑥) for all 𝑥, 𝑦 ∈ 𝑉

2. 𝑏 is antisymmetric iff 𝑏(𝑥, 𝑦) = −𝑏(𝑦, 𝑥) for all 𝑥, 𝑦 ∈ 𝑉

Any bilinear mapping on 𝑉 can be written as the sum of a symmetric and antisymmetric bilinear
mapping, this claim follows easily from the calculation below:
\[ b(x,y) = \underbrace{\tfrac{1}{2}\big( b(x,y) + b(y,x) \big)}_{symmetric} + \underbrace{\tfrac{1}{2}\big( b(x,y) - b(y,x) \big)}_{antisymmetric}. \]
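
In terms of matrices on ℝ𝑛 the same splitting reads [𝑏] = ½([𝑏] + [𝑏]ᵀ) + ½([𝑏] − [𝑏]ᵀ). A minimal sketch, assuming a hypothetical 2 × 2 matrix `B` and using exact fractions so that the halves introduce no rounding:

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def split(B):
    """Return (S, A) with S symmetric, A antisymmetric and S + A = B."""
    Bt = transpose(B)
    n = len(B)
    S = [[Fraction(B[i][j] + Bt[i][j], 2) for j in range(n)] for i in range(n)]
    A = [[Fraction(B[i][j] - Bt[i][j], 2) for j in range(n)] for i in range(n)]
    return S, A

B = [[0, 1], [-3, 2]]  # hypothetical matrix of a bilinear map on R^2
S, A = split(B)
print(S, A)
```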

We say 𝑆𝑖𝑗 is symmetric in 𝑖, 𝑗 iff 𝑆𝑖𝑗 = 𝑆𝑗𝑖 for all 𝑖, 𝑗. Likewise, we say 𝐴𝑖𝑗 is antisymmetric in
𝑖, 𝑗 iff 𝐴𝑖𝑗 = −𝐴𝑗𝑖 for all 𝑖, 𝑗. If 𝑆 is a symmetric bilinear mapping and 𝐴 is an antisymmetric bilinear
mapping then the components of 𝑆 are symmetric and the components of 𝐴 are antisymmetric.
Why? Simply note:
𝑆(𝑒𝑖 , 𝑒𝑗 ) = 𝑆(𝑒𝑗 , 𝑒𝑖 ) ⇒ 𝑆𝑖𝑗 = 𝑆𝑗𝑖

and
𝐴(𝑒𝑖 , 𝑒𝑗 ) = −𝐴(𝑒𝑗 , 𝑒𝑖 ) ⇒ 𝐴𝑖𝑗 = −𝐴𝑗𝑖 .

You can prove that the sum or scalar multiple of an (anti)symmetric bilinear mapping is once more
(anti)symmetric therefore the set of antisymmetric bilinear maps $\Lambda^2(V)$ and the set of symmetric
bilinear maps $ST^0_2 V$ are subspaces of $T^0_2 V$. The notation $\Lambda^2(V)$ is part of a larger discussion on
the wedge product, we will return to it in a later section.

Finally, if we consider the special case of $V = \mathbb{R}^n$ once more we find that a bilinear mapping
$b : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ has a symmetric matrix $[b]^T = [b]$ iff $b$ is symmetric whereas it has an antisymmetric
matrix $[b]^T = -[b]$ iff $b$ is antisymmetric.

bilinear maps on 𝑉 ∗ × 𝑉 ∗
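Suppose $h : V^* \times V^* \to \mathbb{R}$ is bilinear then we say $h \in T^2_0 V$. In addition, suppose $\beta = \{e_1, e_2, \dots, e_n\}$
is a basis for $V$ whereas $\beta^* = \{e^1, e^2, \dots, e^n\}$ is a basis of $V^*$ with $e^j(e_i) = \delta_{ij}$. Let $\alpha, \beta \in V^*$
with $\alpha = \sum_{i=1}^n \alpha_i e^i$ and $\beta = \sum_{j=1}^n \beta_j e^j$, where $\alpha_i = \alpha(e_i)$ and $\beta_j = \beta(e_j)$. Then

\begin{align}
h(\alpha, \beta) &= h\Big(\sum_{i=1}^n \alpha_i e^i, \sum_{j=1}^n \beta_j e^j\Big) \tag{7.2} \\
&= \sum_{i,j=1}^n h(\alpha_i e^i, \beta_j e^j) \notag \\
&= \sum_{i,j=1}^n \alpha_i \beta_j h(e^i, e^j) \notag \\
&= \sum_{i,j=1}^n h(e^i, e^j)\alpha(e_i)\beta(e_j) \notag
\end{align}

Therefore, if we define $h^{ij} = h(e^i, e^j)$ then we find the nice formula $h(\alpha, \beta) = \sum_{i,j=1}^n h^{ij}\alpha_i\beta_j$. To
further refine the formula above we need a new concept.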

The dual of the dual is called the double-dual and it is denoted 𝑉 ∗∗ . For a finite dimensional vector
space there is a canonical isomorphism of 𝑉 and 𝑉 ∗∗ . In particular, Φ : 𝑉 → 𝑉 ∗∗ is defined by
Φ(𝑣)(𝛼) = 𝛼(𝑣) for all 𝛼 ∈ 𝑉 ∗ . It is customary to replace 𝑉 with 𝑉 ∗∗ wherever the context allows.
For example, we define the tensor product of two vectors 𝑥, 𝑦 ∈ 𝑉 as follows:
Definition 7.2.6.
Suppose 𝑉 is a vector space with dual space 𝑉 ∗ . We define the tensor product of vectors
𝑥, 𝑦 ∈ 𝑉 as the mapping 𝑥 ⊗ 𝑦 : 𝑉 ∗ × 𝑉 ∗ → ℝ by (𝑥 ⊗ 𝑦)(𝛼, 𝛽) = 𝛼(𝑥)𝛽(𝑦) for all 𝛼, 𝛽 ∈ 𝑉 ∗ .
We could just as well have defined 𝑥 ⊗ 𝑦 = Φ(𝑥) ⊗ Φ(𝑦) where Φ is once more the canonical
isomorphism of 𝑉 and 𝑉 ∗∗ . It’s called canonical because it has no particular dependence on
the coordinates used on 𝑉 . In contrast, the isomorphism of $\mathbb{R}^n$ and $(\mathbb{R}^n)^*$ was built around the
dot-product and the standard basis.

All of this said, note that $\alpha(e_i)\beta(e_j) = e_i \otimes e_j(\alpha, \beta)$ thus,

\[ h(\alpha, \beta) = \sum_{i,j=1}^n h(e^i, e^j)\,e_i \otimes e_j(\alpha, \beta) \quad\Rightarrow\quad h = \sum_{i,j=1}^n h(e^i, e^j)\,e_i \otimes e_j \]

We argue that $\{e_i \otimes e_j\}_{i,j=1}^n$ is a basis⁸.

8 $T^2_0 V$ is a vector space and we’ve shown $T^2_0(V) \subseteq span\{e_i \otimes e_j\}_{i,j=1}^n$ but we should also show $e_i \otimes e_j \in T^2_0 V$ and
check for LI of $\{e_i \otimes e_j\}_{i,j=1}^n$.

Definition 7.2.7.

Suppose ℎ : 𝑉 ∗ × 𝑉 ∗ → ℝ is a bilinear mapping then we say:

1. ℎ is symmetric iff ℎ(𝛼, 𝛽) = ℎ(𝛽, 𝛼) for all 𝛼, 𝛽 ∈ 𝑉 ∗

2. ℎ is antisymmetric iff ℎ(𝛼, 𝛽) = −ℎ(𝛽, 𝛼) for all 𝛼, 𝛽 ∈ 𝑉 ∗

The discussion of the preceding subsection transfers to this context, we simply have to switch some
vectors to dual vectors and move some indices up or down. I leave this to the reader.

bilinear maps on 𝑉 × 𝑉 ∗
Suppose $H : V \times V^* \to \mathbb{R}$ is bilinear, we say $H \in T^1_1 V$ (or, if the context demands this detail
$H \in T_1{}^1 V$). We define $\alpha \otimes x \in T_1{}^1(V)$ by the natural rule; $(\alpha \otimes x)(y, \beta) = \alpha(y)\beta(x)$ for all
$(y, \beta) \in V \times V^*$. We find, by calculations similar to those already given in this section,

\[ H(y, \beta) = \sum_{i,j=1}^n H_i{}^j y^i \beta_j \quad\text{and}\quad H = \sum_{i,j=1}^n H_i{}^j\, e^i \otimes e_j \]

where we defined $H_i{}^j = H(e_i, e^j)$.

bilinear maps on 𝑉 ∗ × 𝑉
Suppose $G : V^* \times V \to \mathbb{R}$ is bilinear, we say $G \in T^1_1 V$ (or, if the context demands this detail
$G \in T^1{}_1 V$). We define $x \otimes \alpha \in T^1{}_1 V$ by the natural rule; $(x \otimes \alpha)(\beta, y) = \beta(x)\alpha(y)$ for all
$(\beta, y) \in V^* \times V$. We find, by calculations similar to those already given in this section,

\[ G(\beta, y) = \sum_{i,j=1}^n G^i{}_j \beta_i y^j \quad\text{and}\quad G = \sum_{i,j=1}^n G^i{}_j\, e_i \otimes e^j \]

where we defined $G^i{}_j = G(e^i, e_j)$.

7.2.2 trilinear maps


Definition 7.2.8.

Suppose 𝑉1 , 𝑉2 , 𝑉3 are vector spaces then 𝑇 : 𝑉1 × 𝑉2 × 𝑉3 → ℝ is a trilinear mapping on
𝑉1 × 𝑉2 × 𝑉3 iff for all 𝑢, 𝑣 ∈ 𝑉1 , 𝑤, 𝑥 ∈ 𝑉2 , 𝑦, 𝑧 ∈ 𝑉3 and 𝑐 ∈ ℝ:

(1.) 𝑇 (𝑐𝑢 + 𝑣, 𝑤, 𝑦) = 𝑐𝑇 (𝑢, 𝑤, 𝑦) + 𝑇 (𝑣, 𝑤, 𝑦) (linearity in the first slot)

(2.) 𝑇 (𝑢, 𝑐𝑤 + 𝑥, 𝑦) = 𝑐𝑇 (𝑢, 𝑤, 𝑦) + 𝑇 (𝑢, 𝑥, 𝑦) (linearity in the second slot).

(3.) 𝑇 (𝑢, 𝑤, 𝑐𝑦 + 𝑧) = 𝑐𝑇 (𝑢, 𝑤, 𝑦) + 𝑇 (𝑢, 𝑤, 𝑧) (linearity in the third slot).

If $T : V \times V \times V \to \mathbb{R}$ is trilinear on $V \times V \times V$ then we say $T$ is a trilinear mapping on $V$ and
we denote the set of all such mappings $T^0_3 V$. The tensor product of three dual vectors is defined
much in the same way as it was for two,

\[ (\alpha \otimes \beta \otimes \gamma)(x, y, z) = \alpha(x)\beta(y)\gamma(z) \]

Let $\{e_i\}_{i=1}^n$ be a basis for $V$ with dual basis $\{e^i\}_{i=1}^n$ for $V^*$. If $T$ is trilinear on $V$ it follows

\[ T(x,y,z) = \sum_{i,j,k=1}^n T_{ijk}x^i y^j z^k \quad\text{and}\quad T = \sum_{i,j,k=1}^n T_{ijk}\, e^i \otimes e^j \otimes e^k \]

where we defined $T_{ijk} = T(e_i, e_j, e_k)$ for all $i, j, k \in \mathbb{N}_n$.

Generally suppose that $V_1, V_2, V_3$ are possibly distinct vector spaces. Moreover, suppose $V_1$ has basis
$\{e_i\}_{i=1}^{n_1}$, $V_2$ has basis $\{f_j\}_{j=1}^{n_2}$ and $V_3$ has basis $\{g_k\}_{k=1}^{n_3}$. Denote the dual bases for $V_1^*, V_2^*, V_3^*$ in
the usual fashion: $\{e^i\}_{i=1}^{n_1}$, $\{f^j\}_{j=1}^{n_2}$, $\{g^k\}_{k=1}^{n_3}$. With this notation, we can write a trilinear mapping
on $V_1 \times V_2 \times V_3$ as follows: (where we define $T_{ijk} = T(e_i, f_j, g_k)$)

\[ T(x,y,z) = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} T_{ijk}x^i y^j z^k \quad\text{and}\quad T = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} T_{ijk}\, e^i \otimes f^j \otimes g^k \]

However, if $V_1, V_2, V_3$ happen to be related by duality then it is customary to use up/down indices.
For example, if $T : V \times V \times V^* \to \mathbb{R}$ is trilinear then we write⁹

\[ T = \sum_{i,j,k=1}^n T_{ij}{}^k\, e^i \otimes e^j \otimes e_k \]

and say $T \in T_2{}^1 V$. On the other hand, if $S : V^* \times V^* \times V \to \mathbb{R}$ is trilinear then we’d write

\[ S = \sum_{i,j,k=1}^n S^{ij}{}_k\, e_i \otimes e_j \otimes e^k \]

and say $S \in T^2{}_1 V$. I’m not sure that I’ve ever seen this notation elsewhere, but perhaps it could
be useful to denote the set of trilinear maps $T : V \times V^* \times V \to \mathbb{R}$ as $T_1{}^1{}_1 V$. Hopefully we will
not need such silly notation in what we consider this semester.

There was a natural correspondence between bilinear maps on $\mathbb{R}^n$ and square matrices. For a
trilinear map we would need a three-dimensional array of components. In some sense you could
picture $T : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ as multiplication by a cube of numbers. Don’t think too hard
about these silly comments, we actually already wrote the useful formulae for dealing with trilinear
objects. Let’s stop to look at an example.

9 we identify $e_k$ with its double-dual hence this tensor product is already defined, but to be safe let me write it out
in this context $e^i \otimes e^j \otimes e_k(x, y, \alpha) = e^i(x)e^j(y)\alpha(e_k)$.

Example 7.2.9. Define $T : \mathbb{R}^3 \times \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ by $T(x,y,z) = \det(x|y|z)$. You may not have
learned this in your linear algebra course¹⁰ but a nice formula¹¹ for the determinant is given by the
Levi-Civita symbol,
\[ \det(A) = \sum_{i,j,k=1}^3 \epsilon_{ijk}A_{i1}A_{j2}A_{k3} \]

note that $col_1(A) = [A_{i1}]$, $col_2(A) = [A_{i2}]$ and $col_3(A) = [A_{i3}]$. It follows that
\[ T(x,y,z) = \sum_{i,j,k=1}^3 \epsilon_{ijk}x^i y^j z^k \]

Multilinearity follows easily from this formula. For example, linearity in the third slot:
\begin{align}
T(x, y, cz + w) &= \sum_{i,j,k=1}^3 \epsilon_{ijk}x^i y^j (cz + w)^k \tag{7.3} \\
&= \sum_{i,j,k=1}^3 \epsilon_{ijk}x^i y^j (cz^k + w^k) \tag{7.4} \\
&= c\sum_{i,j,k=1}^3 \epsilon_{ijk}x^i y^j z^k + \sum_{i,j,k=1}^3 \epsilon_{ijk}x^i y^j w^k \tag{7.5} \\
&= cT(x,y,z) + T(x,y,w). \tag{7.6}
\end{align}

Observe that by properties of determinants, or the Levi-Civita symbol if you prefer, swapping a pair
of inputs generates a minus sign, hence:

𝑇 (𝑥, 𝑦, 𝑧) = −𝑇 (𝑦, 𝑥, 𝑧) = 𝑇 (𝑦, 𝑧, 𝑥) = −𝑇 (𝑧, 𝑦, 𝑥) = 𝑇 (𝑧, 𝑥, 𝑦) = −𝑇 (𝑥, 𝑧, 𝑦).
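
The Levi-Civita formula for 𝑇 can be tested numerically. In this sketch indices run over 0, 1, 2 rather than 1, 2, 3, and the closed form used for the symbol, (𝑖 − 𝑗)(𝑗 − 𝑘)(𝑘 − 𝑖)/2, is a convenience valid only for three indices in that range; the vectors chosen are arbitrary test data.

```python
from itertools import product

def levi_civita(i, j, k):
    """epsilon_ijk for i, j, k in {0, 1, 2}: +1, -1, or 0 on a repeated index."""
    return (i - j) * (j - k) * (k - i) // 2

def T(x, y, z):
    """T(x, y, z) = sum over i, j, k of epsilon_ijk x^i y^j z^k = det(x|y|z)."""
    return sum(levi_civita(i, j, k) * x[i] * y[j] * z[k]
               for i, j, k in product(range(3), repeat=3))

x, y, z, w = (1, 2, 3), (0, 1, 4), (5, 6, 0), (1, 0, 1)
c = 7
cz_plus_w = tuple(c * zi + wi for zi, wi in zip(z, w))

print(T(x, y, z), T(y, x, z))                              # swapping two slots flips the sign
print(T(x, y, cz_plus_w) == c * T(x, y, z) + T(x, y, w))   # linearity in the third slot
```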

If 𝑇 : 𝑉 × 𝑉 × 𝑉 → ℝ is a trilinear mapping such that

𝑇 (𝑥, 𝑦, 𝑧) = −𝑇 (𝑦, 𝑥, 𝑧) = 𝑇 (𝑦, 𝑧, 𝑥) = −𝑇 (𝑧, 𝑦, 𝑥) = 𝑇 (𝑧, 𝑥, 𝑦) = −𝑇 (𝑥, 𝑧, 𝑦)

for all $x, y, z \in V$ then we say $T$ is antisymmetric. Likewise, if $S : V \times V \times V \to \mathbb{R}$ is a trilinear
mapping such that

\[ S(x,y,z) = S(y,x,z) = S(y,z,x) = S(z,y,x) = S(z,x,y) = S(x,z,y) \]

for all $x, y, z \in V$ then we say $S$ is symmetric. Clearly the mapping defined by the determinant
is antisymmetric. In fact, many authors define the determinant of an $n \times n$ matrix as the antisymmetric
$n$-linear mapping which sends the identity matrix to 1. It turns out these criteria uniquely
define the determinant. That is the motivation behind my Levi-Civita symbol definition. That
formula is just the nuts and bolts of complete antisymmetry.

10 maybe you haven’t even taken linear yet!
11 actually, I take this as the definition in linear algebra, it does take considerable effort to recover the expansion
by minors formula which I use for concrete examples

You might wonder, can every trilinear mapping be written as the sum of a symmetric and an
antisymmetric mapping? The answer is no. Take the following trilinear mapping on $\mathbb{R}^3$ for example:

\[ T(x,y,z) = \det[x|y|e_3]\,(e_3 \cdot z) \]

You can verify this is linear in each slot. It is antisymmetric in the first pair of slots,

\[ T(x,y,z) = \det[x|y|e_3]\,(e_3 \cdot z) = -\det[y|x|e_3]\,(e_3 \cdot z) = -T(y,x,z) \]

but it is not antisymmetric in the last pair, since

\[ T(e_1, e_2, e_3) = 1 \quad\text{whereas}\quad T(e_1, e_3, e_2) = 0. \]

If we had $T = S + A$ with $S$ symmetric and $A$ antisymmetric then $0 = T(x,y,z) + T(y,x,z) = 2S(x,y,z)$
would force $S = 0$ and hence $T = A$, which contradicts the calculation above.

Generally, the decomposition of a multilinear mapping into more basic types is a problem which
requires much more thought than we intend here. Representation theory is concerned with precisely
this problem: how can we decompose a tensor product into irreducible pieces. Their idea of tensor
product is not precisely the same as ours, however algebraically the problems are quite intertwined.
I’ll leave it at that unless you’d like to do an independent study on representation theory. Ideally
you’d already have linear algebra and abstract algebra complete before you attempt that study.

7.2.3 multilinear maps


Definition 7.2.10.
Suppose 𝑉1 , 𝑉2 , . . . 𝑉𝑘 are vector spaces then 𝑇 : 𝑉1 × 𝑉2 × ⋅ ⋅ ⋅ × 𝑉𝑘 → ℝ is a 𝑘-multilinear
mapping on 𝑉1 ×𝑉2 ×⋅ ⋅ ⋅×𝑉𝑘 iff for each 𝑐 ∈ ℝ and 𝑥1 , 𝑦1 ∈ 𝑉1 , 𝑥2 , 𝑦2 ∈ 𝑉2 , . . . , 𝑥𝑘 , 𝑦𝑘 ∈ 𝑉𝑘

𝑇 (𝑥1 , . . . , 𝑐𝑥𝑗 + 𝑦𝑗 , . . . , 𝑥𝑘 ) = 𝑐𝑇 (𝑥1 , . . . , 𝑥𝑗 , . . . , 𝑥𝑘 ) + 𝑇 (𝑥1 , . . . , 𝑦𝑗 , . . . , 𝑥𝑘 )

for $j = 1, 2, \dots, k$. In other words, we assume $T$ is linear in each of its $k$ slots. If $T$ is
multilinear on $V^r \times (V^*)^s$ then we say that $T \in T^s_r V$ and we say $T$ is a type $(r, s)$ tensor
on $V$.
The definition above makes a dual vector a type (1, 0) tensor whereas the double dual of a vector is a
type (0, 1) tensor; a bilinear mapping on $V$ is a type (2, 0) tensor and a bilinear mapping on $V^*$ is
a type (0, 2) tensor with respect to $V$.

We are free to define tensor products in this context in the same manner as we have previously.
Suppose 𝛼1 ∈ 𝑉1∗ , 𝛼2 ∈ 𝑉2∗ , . . . , 𝛼𝑘 ∈ 𝑉𝑘∗ and 𝑣1 ∈ 𝑉1 , 𝑣2 ∈ 𝑉2 , . . . , 𝑣𝑘 ∈ 𝑉𝑘 then

𝛼1 ⊗ 𝛼2 ⊗ ⋅ ⋅ ⋅ ⊗ 𝛼𝑘 (𝑣1 , 𝑣2 , . . . , 𝑣𝑘 ) = 𝛼1 (𝑣1 )𝛼2 (𝑣2 ) ⋅ ⋅ ⋅ 𝛼𝑘 (𝑣𝑘 )

It is easy to show the tensor product of $k$ dual vectors as defined above is indeed a $k$-multilinear
mapping. Moreover, the set of all $k$-multilinear mappings on $V_1 \times V_2 \times \cdots \times V_k$ clearly forms a
vector space of dimension $dim(V_1)dim(V_2) \cdots dim(V_k)$ since it naturally takes the tensor product of
the dual bases for $V_1^*, V_2^*, \dots, V_k^*$ as its basis. In particular, suppose for $j = 1, 2, \dots, k$ that $V_j$ has
basis $\{E_{ji}\}_{i=1}^{n_j}$ which is dual to $\{E_j^i\}_{i=1}^{n_j}$, the basis for $V_j^*$. Then we can derive that a $k$-multilinear
mapping can be written as
\[ T = \sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2} \cdots \sum_{i_k=1}^{n_k} T_{i_1 i_2 \dots i_k}\, E_1^{i_1} \otimes E_2^{i_2} \otimes \cdots \otimes E_k^{i_k} \]

If $T$ is a type $(r, s)$ tensor on $V$ then there is no need for the ugly double indexing on the basis
since we need only tensor a basis $\{e_i\}_{i=1}^n$ for $V$ and its dual $\{e^i\}_{i=1}^n$ for $V^*$ in what follows:
\[ T = \sum_{i_1,\dots,i_r=1}^n \sum_{j_1,\dots,j_s=1}^n T_{i_1 i_2 \dots i_r}^{j_1 j_2 \dots j_s}\, e^{i_1} \otimes e^{i_2} \otimes \cdots \otimes e^{i_r} \otimes e_{j_1} \otimes e_{j_2} \otimes \cdots \otimes e_{j_s}. \]

permutations
Before I define symmetric and antisymmetric for 𝑘-linear mappings on 𝑉 I think it is best to discuss
briefly some ideas from the theory of permutations.

Definition 7.2.11.
A permutation on {1, 2, . . . 𝑝} is a bijection on {1, 2, . . . 𝑝}. We define the set of permutations
on {1, 2, . . . 𝑝} to be Σ𝑝 . Further, define the sign of a permutation to be 𝑠𝑔𝑛(𝜎) = 1 if 𝜎 is
the product of an even number of transpositions whereas 𝑠𝑔𝑛(𝜎) = −1 if 𝜎 is the product
of an odd number of transpositions.
Let us consider the set of permutations on {1, 2, 3, . . . 𝑛}, this is called 𝑆𝑛 the symmetric group,
its order is 𝑛! if you were wondering. Let me remind¹² you how the cycle notation works since it
allows us to explicitly present the number of transpositions contained in a permutation,
\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 5 & 4 & 6 & 3 \end{pmatrix} \iff \sigma = (12)(356) = (12)(36)(35) \tag{7.7} \]

recall the cycle notation is to be read right to left. If we think about inputting 5 we can read from
the matrix notation that we ought to find 5 ↦ 6. Clearly that is the case for the first version of
𝜎 written in cycle notation; (356) indicates that 5 ↦ 6 and nothing else messes with 6 after that.
Then consider feeding 5 into the version of 𝜎 written with just two-cycles (a.k.a. transpositions),
first we note (35) indicates 5 ↦ 3, then that 3 hits (36) which means 3 ↦ 6, finally the cycle (12)
doesn’t care about 6 so we again have that 𝜎(5) = 6. Finally we note that 𝑠𝑔𝑛(𝜎) = −1 since it is
made of 3 transpositions.

It is always possible to write any permutation as a product of transpositions; such a decomposition
is not unique. However, if the number of transpositions is even then it will remain so no matter
how we rewrite the permutation. Likewise if the permutation is a product of an odd number of
transpositions then any other decomposition into transpositions is also comprised of an odd number
of transpositions. This is why we can define an even permutation as a permutation comprised of
an even number of transpositions and an odd permutation as one comprised of an odd number of
transpositions.

12 or perhaps, more likely, introduce you to this notation

Example 7.2.12. Sample cycle calculations: we rewrite as product of transpositions to determine
if the given permutation is even or odd,

𝜎 = (12)(134)(152) = (12)(14)(13)(12)(15) =⇒ 𝑠𝑔𝑛(𝜎) = −1

𝜆 = (1243)(3521) = (13)(14)(12)(31)(32)(35) =⇒ 𝑠𝑔𝑛(𝜆) = 1


𝛾 = (123)(45678) = (13)(12)(48)(47)(46)(45) =⇒ 𝑠𝑔𝑛(𝛾) = 1
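
The cycle calculations above can be checked by machine. The sketch below composes cycles right to left, as in the text, and then reads off the sign from the disjoint cycle structure (a cycle of length 𝑚 is a product of 𝑚 − 1 transpositions). The function names `compose_cycles` and `sgn` and the dictionary representation of a permutation are choices made for this illustration only.

```python
def compose_cycles(cycles, n):
    """Permutation of {1, ..., n} given by a product of cycles, read right to left."""
    perm = {}
    for start in range(1, n + 1):
        value = start
        for cycle in reversed(cycles):      # the rightmost cycle acts first
            if value in cycle:
                position = cycle.index(value)
                value = cycle[(position + 1) % len(cycle)]
        perm[start] = value
    return perm

def sgn(perm):
    """Sign of a permutation: each disjoint cycle of length m counts m - 1 transpositions."""
    seen = set()
    transpositions = 0
    for start in perm:
        length = 0
        value = start
        while value not in seen:
            seen.add(value)
            value = perm[value]
            length += 1
        if length > 0:
            transpositions += length - 1
    return 1 if transpositions % 2 == 0 else -1

sigma = compose_cycles([(1, 2), (1, 3, 4), (1, 5, 2)], 5)
lam = compose_cycles([(1, 2, 4, 3), (3, 5, 2, 1)], 5)
gamma = compose_cycles([(1, 2, 3), (4, 5, 6, 7, 8)], 8)
print(sgn(sigma), sgn(lam), sgn(gamma))
```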

We will not actually write down permutations in the calculations that follow this part of the notes.
I merely include this material so as to give a logically complete account of antisymmetry. In practice,
if you understood the terms as they apply to the bilinear and trilinear case it will usually suffice for
concrete examples. Now we are ready to define symmetric and antisymmetric.

Definition 7.2.13.
A 𝑘-linear mapping 𝐿 : 𝑉 × 𝑉 × ⋅ ⋅ ⋅ × 𝑉 → ℝ is completely symmetric if

𝐿(𝑥1 , . . . , 𝑥, . . . , 𝑦, . . . , 𝑥𝑘 ) = 𝐿(𝑥1 , . . . , 𝑦, . . . , 𝑥, . . . , 𝑥𝑘 )

for all possible $x, y \in V$. On the other hand, if a $k$-linear mapping on $V$ has

\[ L(x_1, \dots, x, \dots, y, \dots, x_k) = -L(x_1, \dots, y, \dots, x, \dots, x_k) \]

for all possible pairs $x, y \in V$ then it is said to be completely antisymmetric or alternating.
Equivalently a $k$-linear mapping $L$ is alternating if for all $\pi \in \Sigma_k$

\[ L(x_{\pi_1}, x_{\pi_2}, \dots, x_{\pi_k}) = sgn(\pi)L(x_1, x_2, \dots, x_k) \]

The set of alternating multilinear mappings on 𝑉 is denoted Λ𝑉 , the set of 𝑘-linear alter-
nating maps on 𝑉 is denoted Λ𝑘 𝑉 . Often an alternating 𝑘-linear map is called a 𝑘-form.
Moreover, we say the degree of a 𝑘-form is 𝑘.
Similar terminology applies to the components of tensors. We say $T_{i_1 i_2 \dots i_k}$ is completely symmetric
in $i_1, i_2, \dots, i_k$ iff $T_{i_1 i_2 \dots i_k} = T_{i_{\sigma(1)} i_{\sigma(2)} \dots i_{\sigma(k)}}$ for all $\sigma \in \Sigma_k$. On the other hand, $T_{i_1 i_2 \dots i_k}$ is completely
antisymmetric in $i_1, i_2, \dots, i_k$ iff $T_{i_1 i_2 \dots i_k} = sgn(\sigma)T_{i_{\sigma(1)} i_{\sigma(2)} \dots i_{\sigma(k)}}$ for all $\sigma \in \Sigma_k$. It is a simple
exercise to show that a completely (anti)symmetric tensor¹³ has completely (anti)symmetric components.

13 in this context a tensor is simply a multilinear mapping, in physics there is more attached to the term

The tensor product is an interesting construction to discuss at length. To summarize, it is associative
and distributive across addition. Scalars factor out and it is not generally commutative.
For a given vector space 𝑉 we can in principle generate by tensor products multilinear mappings
of arbitrarily high order. This tensor algebra is infinite dimensional. In contrast, the space Λ𝑉 of
forms on 𝑉 is a finite-dimensional subspace of the tensor algebra. We discuss this next.

7.3 wedge product


We assume 𝑉 is a vector space with basis {𝑒𝑖 }𝑛𝑖=1 throughout this section. The dual basis is denoted
{𝑒𝑖 }𝑛𝑖=1 as is our usual custom. Our goal is to find a basis for the alternating maps on 𝑉 and explore
the structure implicit within its construction. This will lead us to call Λ𝑉 the exterior algebra
of 𝑉 after the discussion below is complete.

7.3.1 wedge product of dual basis generates basis for Λ𝑉


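Suppose $b : V \times V \to \mathbb{R}$ is antisymmetric and $b = \sum_{i,j=1}^n b_{ij}\, e^i \otimes e^j$, it follows that $b_{ij} = -b_{ji}$ for all
$i, j \in \mathbb{N}_n$. Notice this implies that $b_{ii} = 0$ for $i = 1, 2, \dots, n$. For a given pair of indices $i, j$ either
$i < j$ or $j < i$ or $i = j$ hence,
\begin{align}
b &= \sum_{i<j} b_{ij}\, e^i \otimes e^j + \sum_{j<i} b_{ij}\, e^i \otimes e^j + \sum_{i=j} b_{ij}\, e^i \otimes e^j \notag \\
&= \sum_{i<j} b_{ij}\, e^i \otimes e^j + \sum_{j<i} b_{ij}\, e^i \otimes e^j \notag \\
&= \sum_{i<j} b_{ij}\, e^i \otimes e^j - \sum_{j<i} b_{ji}\, e^i \otimes e^j \notag \\
&= \sum_{k<l} b_{kl}\, e^k \otimes e^l - \sum_{k<l} b_{kl}\, e^l \otimes e^k \notag \\
&= \sum_{k<l} b_{kl}\,(e^k \otimes e^l - e^l \otimes e^k). \tag{7.8}
\end{align}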

Therefore, $\{e^k \otimes e^l - e^l \otimes e^k \mid k, l \in \mathbb{N}_n \text{ and } k < l\}$ spans the set of antisymmetric bilinear maps on
$V$. Moreover, you can show this set is linearly independent hence it is a basis for $\Lambda^2 V$. We define
the wedge product of the dual basis vectors by $e^k \wedge e^l = e^k \otimes e^l - e^l \otimes e^k$. With this notation we find that the alternating
bilinear form $b$ can be written as

\[
b = \sum_{k<l} b_{kl}\, e^k \wedge e^l = \frac{1}{2} \sum_{i,j=1}^{n} b_{ij}\, e^i \wedge e^j
\]

where the summation on the r.h.s. is over all indices$^{14}$. Notice that $e^i \wedge e^j$ is an antisymmetric
bilinear mapping because $(e^i \wedge e^j)(x, y) = -(e^i \wedge e^j)(y, x)$; however, there is more structure here than
14
yes there is something to work out here, probably in your homework

just that. It is also true that $e^i \wedge e^j = -e^j \wedge e^i$. This is a conceptually different antisymmetry: it
is the antisymmetry of the wedge product $\wedge$.

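Suppose $b : V \times V \times V \to \mathbb{R}$ is antisymmetric and $b = \sum_{i,j,k=1}^n b_{ijk}\, e^i \otimes e^j \otimes e^k$; it follows that
$b_{ijk} = b_{jki} = b_{kij}$ and $b_{ijk} = -b_{kji} = -b_{jik} = -b_{ikj}$ for all $i, j, k \in \mathbb{N}_n$. Notice this implies that
$b_{ijk} = 0$ whenever two of the indices coincide; in particular $b_{iii} = 0$ for $i = 1, 2, \dots, n$. A calculation similar to the one just offered for the case of a bilinear
map reveals that we can write $b$ as follows:

\[
\begin{aligned}
b = \sum_{i<j<k} b_{ijk} \big( & e^i \otimes e^j \otimes e^k + e^j \otimes e^k \otimes e^i + e^k \otimes e^i \otimes e^j \\
 & - e^k \otimes e^j \otimes e^i - e^j \otimes e^i \otimes e^k - e^i \otimes e^k \otimes e^j \big)
\end{aligned}
\tag{7.9}
\]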

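Define $e^i \wedge e^j \wedge e^k = e^i \otimes e^j \otimes e^k + e^j \otimes e^k \otimes e^i + e^k \otimes e^i \otimes e^j - e^k \otimes e^j \otimes e^i - e^j \otimes e^i \otimes e^k - e^i \otimes e^k \otimes e^j$
thus

\[
b = \sum_{i<j<k} b_{ijk}\, e^i \wedge e^j \wedge e^k = \frac{1}{3!} \sum_{i,j,k=1}^{n} b_{ijk}\, e^i \wedge e^j \wedge e^k
\tag{7.10}
\]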

and it is clear that {𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 ∣ 𝑖, 𝑗, 𝑘 ∈ ℕ𝑛 and 𝑖 < 𝑗 < 𝑘} forms a basis for the set of alternating
trilinear maps on 𝑉 .

Following the patterns above, we define the wedge product of 𝑝 dual basis vectors,

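\[
e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} = \sum_{\pi \in \Sigma_p} \mathrm{sgn}(\pi)\, e^{i_{\pi(1)}} \otimes e^{i_{\pi(2)}} \otimes \cdots \otimes e^{i_{\pi(p)}}
\tag{7.11}
\]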

If 𝑥, 𝑦 ∈ 𝑉 we would like to show that

𝑒𝑖1 ∧ 𝑒𝑖2 ∧ ⋅ ⋅ ⋅ ∧ 𝑒𝑖𝑝 (. . . , 𝑥, . . . , 𝑦, . . . ) = −𝑒𝑖1 ∧ 𝑒𝑖2 ∧ ⋅ ⋅ ⋅ ∧ 𝑒𝑖𝑝 (. . . , 𝑦, . . . , 𝑥, . . . ) (7.12)

follows from the complete antisymmetrization in the definition of the wedge product. Before we
give the general argument, let’s see how this works in the trilinear case. Consider, 𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 =

= 𝑒𝑖 ⊗ 𝑒𝑗 ⊗ 𝑒𝑘 + 𝑒𝑗 ⊗ 𝑒𝑘 ⊗ 𝑒𝑖 + 𝑒𝑘 ⊗ 𝑒𝑖 ⊗ 𝑒𝑗 − 𝑒𝑘 ⊗ 𝑒𝑗 ⊗ 𝑒𝑖 − 𝑒𝑗 ⊗ 𝑒𝑖 ⊗ 𝑒𝑘 − 𝑒𝑖 ⊗ 𝑒𝑘 ⊗ 𝑒𝑗 .

Calculate, noting that 𝑒𝑖 ⊗ 𝑒𝑗 ⊗ 𝑒𝑘 (𝑥, 𝑦, 𝑧) = 𝑒𝑖 (𝑥)𝑒𝑗 (𝑦)𝑒𝑘 (𝑧) = 𝑥𝑖 𝑦 𝑗 𝑧 𝑘 hence

𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 (𝑥, 𝑦, 𝑧) = 𝑥𝑖 𝑦 𝑗 𝑧 𝑘 + 𝑥𝑗 𝑦 𝑘 𝑧 𝑖 + 𝑥𝑘 𝑦 𝑖 𝑧 𝑗 − 𝑥𝑘 𝑦 𝑗 𝑧 𝑖 − 𝑥𝑗 𝑦 𝑖 𝑧 𝑘 − 𝑥𝑖 𝑦 𝑘 𝑧 𝑗

Thus,
𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 (𝑥, 𝑧, 𝑦) = 𝑥𝑖 𝑧 𝑗 𝑦 𝑘 + 𝑥𝑗 𝑧 𝑘 𝑦 𝑖 + 𝑥𝑘 𝑧 𝑖 𝑦 𝑗 − 𝑥𝑘 𝑧 𝑗 𝑦 𝑖 − 𝑥𝑗 𝑧 𝑖 𝑦 𝑘 − 𝑥𝑖 𝑧 𝑘 𝑦 𝑗
and you can check that $(e^i \wedge e^j \wedge e^k)(x, y, z) = -(e^i \wedge e^j \wedge e^k)(x, z, y)$. Similar tedious calculations prove
antisymmetry under the interchange of the first and second or the first and third slots. Therefore,
$e^i \wedge e^j \wedge e^k$ is an alternating trilinear map; it is clearly trilinear since it is built from the sum of

tensor products which we know are likewise trilinear.

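The multilinear case follows essentially the same argument. Label the arguments $x_1, \dots, x_p$ and write $x_m^{\,i}$ for the $i$-th component of $x_m$. With $x_j$ in slot $j$ and $x_k$ in slot $k$, note

\[
(e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p})(\dots, x_j, \dots, x_k, \dots) = \sum_{\pi \in \Sigma_p} \mathrm{sgn}(\pi)\, x_1^{i_{\pi(1)}} \cdots x_j^{i_{\pi(j)}} \cdots x_k^{i_{\pi(k)}} \cdots x_p^{i_{\pi(p)}}
\tag{7.13}
\]

whereas, with $x_k$ placed in slot $j$ and $x_j$ placed in slot $k$,

\[
(e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p})(\dots, x_k, \dots, x_j, \dots) = \sum_{\sigma \in \Sigma_p} \mathrm{sgn}(\sigma)\, x_1^{i_{\sigma(1)}} \cdots x_k^{i_{\sigma(j)}} \cdots x_j^{i_{\sigma(k)}} \cdots x_p^{i_{\sigma(p)}}.
\tag{7.14}
\]

Suppose we take each permutation $\sigma$ and substitute $\delta \in \Sigma_p$ such that $\sigma(j) = \delta(k)$ and $\sigma(k) = \delta(j)$
and otherwise $\delta$ and $\sigma$ agree. In cycle notation, $\delta(jk) = \sigma$. As $\sigma$ ranges over $\Sigma_p$ so does $\delta$. Substituting $\delta$ into Equation 7.14:

\[
\begin{aligned}
(e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p})(\dots, x_k, \dots, x_j, \dots)
 &= \sum_{\delta \in \Sigma_p} \mathrm{sgn}(\delta(jk))\, x_1^{i_{\delta(1)}} \cdots x_k^{i_{\delta(k)}} \cdots x_j^{i_{\delta(j)}} \cdots x_p^{i_{\delta(p)}} \\
 &= -\sum_{\delta \in \Sigma_p} \mathrm{sgn}(\delta)\, x_1^{i_{\delta(1)}} \cdots x_j^{i_{\delta(j)}} \cdots x_k^{i_{\delta(k)}} \cdots x_p^{i_{\delta(p)}} \\
 &= -(e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p})(\dots, x_j, \dots, x_k, \dots)
\end{aligned}
\tag{7.15}
\]

Here the $\mathrm{sgn}$ of a permutation $\sigma$ is $(-1)^N$ where $N$ is the number of transpositions in a factorization of $\sigma$ into transpositions. We observe
that $\delta(jk)$ has one more transposition than $\delta$ hence $\mathrm{sgn}(\delta(jk)) = -\mathrm{sgn}(\delta)$. Therefore, we have shown that
$e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} \in \Lambda^p V$.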
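
As a sanity check on Equation 7.11 and the antisymmetry just argued, here is a minimal Python sketch. It assumes vectors are plain tuples of coordinates with 0-based index positions; the helper names `sgn` and `wedge_basis` are illustrative and are not notation from these notes.

```python
from itertools import permutations

def sgn(perm):
    """Sign of a permutation of 0..p-1, computed from its number of inversions."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def wedge_basis(indices, vectors):
    """Evaluate (e^{i_1} ^ ... ^ e^{i_p})(x_1, ..., x_p) as in Equation 7.11.

    indices: tuple of 0-based coordinate positions (i_1, ..., i_p)
    vectors: list of p coordinate tuples (x_1, ..., x_p)
    """
    p = len(indices)
    total = 0
    for perm in permutations(range(p)):
        term = sgn(perm)
        for slot in range(p):
            # e^{i_{perm(slot)}} applied to the vector in this slot picks out one coordinate
            term *= vectors[slot][indices[perm[slot]]]
        total += term
    return total

x, y, z = (1, 2, 3, 4), (0, 1, 5, 2), (2, 2, 1, 7)
value = wedge_basis((0, 1, 3), [x, y, z])
swapped = wedge_basis((0, 1, 3), [x, z, y])   # interchange the last two slots
print(value, swapped)                          # the two values differ by a sign
```

Swapping any two arguments flips the sign, and repeating an argument gives zero, which is what membership in $\Lambda^p V$ requires.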

Recall that $e^i \wedge e^j = -e^j \wedge e^i$ in the $p = 2$ case. There is a generalization of that result to the
$p > 2$ case. In words, the wedge product is antisymmetric with respect to the interchange of any two
dual vectors. For $p = 3$ we have the following identities for the wedge product:

\[
e^i \wedge e^j \wedge e^k
 = -\underbrace{e^j \wedge e^i}_{swapped} \wedge\, e^k
 = e^j \wedge \underbrace{e^k \wedge e^i}_{swapped}
 = -\underbrace{e^k \wedge e^j}_{swapped} \wedge\, e^i
 = e^k \wedge \underbrace{e^i \wedge e^j}_{swapped}
 = -\underbrace{e^i \wedge e^k}_{swapped} \wedge\, e^j
\]

I’ve indicated how these signs are consistent with the 𝑝 = 2 antisymmetry. Any permutation of
the dual vectors can be thought of as a combination of several transpositions. In any event, it is
sometimes useful to just know that the wedge product of three elements is invariant under cyclic
permutations of the dual vectors,
𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 = 𝑒𝑗 ∧ 𝑒𝑘 ∧ 𝑒𝑖 = 𝑒𝑘 ∧ 𝑒𝑖 ∧ 𝑒𝑗
and changes by a sign for anticyclic permutations of the given object,
𝑒𝑖 ∧ 𝑒𝑗 ∧ 𝑒𝑘 = −𝑒𝑗 ∧ 𝑒𝑖 ∧ 𝑒𝑘 = −𝑒𝑘 ∧ 𝑒𝑗 ∧ 𝑒𝑖 = −𝑒𝑖 ∧ 𝑒𝑘 ∧ 𝑒𝑗
Generally we can argue that, for any permutation 𝜋 ∈ Σ𝑝 :

𝑒𝑖1 ∧ 𝑒𝑖2 ∧ ⋅ ⋅ ⋅ ∧ 𝑒𝑖𝑝 = 𝑠𝑔𝑛(𝜋)𝑒𝑖𝜋(1) ∧ 𝑒𝑖𝜋(2) ∧ ⋅ ⋅ ⋅ ∧ 𝑒𝑖𝜋(𝑝)

This is just a slick formula which says the wedge product generates a minus whenever you flip two
dual vectors which are wedged.

7.3.2 the exterior algebra


The careful reader will realize we have yet to define wedge products of anything except for the dual
basis. But, naturally you must wonder if we can take the wedge product of other dual vectors or
more generally alternating tensors. The answer is yes. Let us define the general wedge product:
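Definition 7.3.1. Suppose $\alpha \in \Lambda^p V$ and $\beta \in \Lambda^q V$. We define $\mathcal{I}_p$ to be the set of all increasing lists
of $p$-indices, this set can be empty if $dim(V)$ is not sufficiently large. Moreover, if $I = (i_1, i_2, \dots, i_p)$
then introduce notation $e^I = e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}$ hence:

\[
\alpha = \sum_{i_1, i_2, \dots, i_p = 1}^{n} \frac{1}{p!}\, \alpha_{i_1 i_2 \dots i_p}\, e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}
       = \sum_{I} \frac{1}{p!}\, \alpha_I e^I
       = \sum_{I \in \mathcal{I}_p} \alpha_I e^I
\]

and

\[
\beta = \sum_{j_1, j_2, \dots, j_q = 1}^{n} \frac{1}{q!}\, \beta_{j_1 j_2 \dots j_q}\, e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q}
      = \sum_{J} \frac{1}{q!}\, \beta_J e^J
      = \sum_{J \in \mathcal{I}_q} \beta_J e^J
\]

Naturally, $e^I \wedge e^J = e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} \wedge e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q}$ and we defined this carefully in the
preceding subsection. Define $\alpha \wedge \beta \in \Lambda^{p+q} V$ as follows:

\[
\alpha \wedge \beta = \sum_{I} \sum_{J} \frac{1}{p!\,q!}\, \alpha_I \beta_J\, e^I \wedge e^J.
\]

Again, but with less slick notation:

\[
\alpha \wedge \beta = \sum_{i_1, i_2, \dots, i_p = 1}^{n}\ \sum_{j_1, j_2, \dots, j_q = 1}^{n} \frac{1}{p!\,q!}\, \alpha_{i_1 i_2 \dots i_p}\, \beta_{j_1 j_2 \dots j_q}\, e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} \wedge e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q}
\]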

All the definition above really says is that we extend the wedge product on the basis to distribute
over the addition of dual vectors. What this means calculationally is that the wedge product obeys
the usual laws of addition and scalar multiplication. The one feature that is perhaps foreign is the
antisymmetry of the wedge product. We must take care to maintain the order of expressions since
the wedge product is not generally commutative.

Proposition 7.3.2.
Let 𝛼, 𝛽, 𝛾 be forms on 𝑉 and 𝑐 ∈ ℝ then

(𝑖) (𝛼 + 𝛽) ∧ 𝛾 = 𝛼 ∧ 𝛾 + 𝛽 ∧ 𝛾 distributes across vector addition


(𝑖𝑖) 𝛼 ∧ (𝛽 + 𝛾) = 𝛼 ∧ 𝛽 + 𝛼 ∧ 𝛾 distributes across vector addition
(𝑖𝑖𝑖) (𝑐𝛼) ∧ 𝛽 = 𝛼 ∧ (𝑐𝛽) = 𝑐 (𝛼 ∧ 𝛽) scalars factor out
(𝑖𝑣) 𝛼 ∧ (𝛽 ∧ 𝛾) = (𝛼 ∧ 𝛽) ∧ 𝛾 associativity

I leave the proof of this proposition to the reader.



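Proposition 7.3.3. graded commutativity of homogeneous forms.

Let $\alpha, \beta$ be forms on $V$ of degree $p$ and $q$ respectively then

\[
\alpha \wedge \beta = (-1)^{pq}\, \beta \wedge \alpha
\]

Proof: suppose $\alpha = \sum_I \frac{1}{p!} \alpha_I e^I$ is a $p$-form on $V$ and $\beta = \sum_J \frac{1}{q!} \beta_J e^J$ is a $q$-form on $V$. Calculate:

\[
\begin{aligned}
\alpha \wedge \beta &= \sum_I \sum_J \frac{1}{p!\,q!}\, \alpha_I \beta_J\, e^I \wedge e^J && \text{by defn. of } \wedge, \\
 &= \sum_I \sum_J \frac{1}{p!\,q!}\, \beta_J \alpha_I\, e^I \wedge e^J && \text{coefficients are scalars,} \\
 &= (-1)^{pq} \sum_I \sum_J \frac{1}{p!\,q!}\, \beta_J \alpha_I\, e^J \wedge e^I && \text{(details on sign given below)} \\
 &= (-1)^{pq}\, \beta \wedge \alpha
\end{aligned}
\]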

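Let’s expand in detail why $e^J \wedge e^I = (-1)^{pq}\, e^I \wedge e^J$. Suppose $I = (i_1, i_2, \dots, i_p)$ and $J = (j_1, j_2, \dots, j_q)$,
the key is that every swap of dual vectors generates a sign:

\[
\begin{aligned}
e^I \wedge e^J &= e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} \wedge e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q} \\
 &= (-1)^q\, e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_{p-1}} \wedge e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q} \wedge e^{i_p} \\
 &= (-1)^q (-1)^q\, e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_{p-2}} \wedge e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q} \wedge e^{i_{p-1}} \wedge e^{i_p} \\
 &\ \ \vdots \\
 &= \underbrace{(-1)^q (-1)^q \cdots (-1)^q}_{p\text{-factors}}\, e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_q} \wedge e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p} \\
 &= (-1)^{pq}\, e^J \wedge e^I.
\end{aligned}
\]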

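Example 7.3.4. Let $\alpha$ be a 2-form defined by

\[
\alpha = a\, e^1 \wedge e^2 + b\, e^2 \wedge e^3
\]

and let $\beta$ be a 1-form defined by

\[
\beta = 3 e^1.
\]

Consider then,

\[
\begin{aligned}
\alpha \wedge \beta &= (a\, e^1 \wedge e^2 + b\, e^2 \wedge e^3) \wedge (3 e^1) \\
 &= 3a\, e^1 \wedge e^2 \wedge e^1 + 3b\, e^2 \wedge e^3 \wedge e^1 \\
 &= 3b\, e^1 \wedge e^2 \wedge e^3,
\end{aligned}
\tag{7.16}
\]

whereas,

\[
\begin{aligned}
\beta \wedge \alpha &= 3 e^1 \wedge (a\, e^1 \wedge e^2 + b\, e^2 \wedge e^3) \\
 &= 3a\, e^1 \wedge e^1 \wedge e^2 + 3b\, e^1 \wedge e^2 \wedge e^3 \\
 &= 3b\, e^1 \wedge e^2 \wedge e^3,
\end{aligned}
\tag{7.17}
\]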

so this agrees with the proposition: $(-1)^{pq} = (-1)^2 = 1$, so we should have found that $\alpha \wedge \beta = \beta \wedge \alpha$.
This illustrates that although the wedge product is antisymmetric on the basis, it is not always
antisymmetric; in particular it is commutative for even forms.
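
The bookkeeping in the example above can be mechanized. The following is a minimal sketch, assuming a form is stored as a Python dict from increasing index tuples to coefficients; the names `sort_sign` and `wedge` and the sample values of `a` and `b` are illustrative choices, not part of the notes.

```python
def sort_sign(indices):
    """Sort an index tuple by adjacent swaps.

    Returns (sign, sorted tuple), or (0, None) if an index repeats,
    since a wedge product with a repeated dual basis vector vanishes.
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, None
    sign = 1
    # bubble sort: each adjacent swap of wedge factors contributes a factor of -1
    for end in range(len(idx) - 1, 0, -1):
        for pos in range(end):
            if idx[pos] > idx[pos + 1]:
                idx[pos], idx[pos + 1] = idx[pos + 1], idx[pos]
                sign = -sign
    return sign, tuple(idx)

def wedge(alpha, beta):
    """Wedge two forms stored as {increasing index tuple: coefficient}."""
    result = {}
    for I, a_coeff in alpha.items():
        for J, b_coeff in beta.items():
            sign, K = sort_sign(I + J)
            if sign:
                result[K] = result.get(K, 0) + sign * a_coeff * b_coeff
    return {K: c for K, c in result.items() if c != 0}

a, b = 5, 7                        # arbitrary sample coefficients
alpha = {(1, 2): a, (2, 3): b}     # a e^1^e^2 + b e^2^e^3
beta = {(1,): 3}                   # 3 e^1
print(wedge(alpha, beta))
print(wedge(beta, alpha))
```

Both products reduce to a single multiple of $e^1 \wedge e^2 \wedge e^3$, while wedging two one-forms in opposite orders produces opposite signs.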
The graded commutativity rule $\alpha \wedge \beta = (-1)^{pq}\, \beta \wedge \alpha$ has some surprising implications. This rule is
ultimately the reason $\Lambda V$ is finite dimensional. Let’s see how that happens.
Proposition 7.3.5. linearly dependent one-forms wedge to zero:
If 𝛼, 𝛽 ∈ 𝑉 ∗ and 𝛼 = 𝑐𝛽 for some 𝑐 ∈ ℝ then 𝛼 ∧ 𝛽 = 0.

Proof: to begin, note that 𝛽 ∧ 𝛽 = −𝛽 ∧ 𝛽 hence 2𝛽 ∧ 𝛽 = 0 and it follows that 𝛽 ∧ 𝛽 = 0. Note:

𝛼 ∧ 𝛽 = 𝑐𝛽 ∧ 𝛽 = 𝑐(0) = 0

therefore the proposition is true. □


Proposition 7.3.6.

Suppose that 𝛼1 , 𝛼2 , . . . , 𝛼𝑝 are linearly dependent 1-forms then

𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝 = 0.

Proof: by assumption of linear dependence there exist constants $c_1, c_2, \dots, c_p$ not all zero such
that
\[
c_1 \alpha_1 + c_2 \alpha_2 + \cdots + c_p \alpha_p = 0.
\]
Suppose that $c_k$ is a nonzero constant in the sum above, then we may divide by it and consequently
we can write $\alpha_k$ in terms of all the other 1-forms,
\[
\alpha_k = \frac{-1}{c_k} \Big( c_1 \alpha_1 + \cdots + c_{k-1} \alpha_{k-1} + c_{k+1} \alpha_{k+1} + \cdots + c_p \alpha_p \Big)
\]
Insert this sum into the wedge product in question,

𝛼1 ∧ 𝛼2 ∧ . . . ∧ 𝛼𝑝 = 𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑘 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝
= (−𝑐1 /𝑐𝑘 )𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼1 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝
+(−𝑐2 /𝑐𝑘 )𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝 + ⋅ ⋅ ⋅
+(−𝑐𝑘−1 /𝑐𝑘 )𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑘−1 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝 (7.18)
+(−𝑐𝑘+1 /𝑐𝑘 )𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑘+1 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝 + ⋅ ⋅ ⋅
+(−𝑐𝑝 /𝑐𝑘 )𝛼1 ∧ 𝛼2 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝 ∧ ⋅ ⋅ ⋅ ∧ 𝛼𝑝
= 0.

We know all the wedge products are zero in the above because in each there is at least one 1-form
repeated, we simply permute the wedge products till they are adjacent and by the previous propo-
sition the term vanishes. The proposition follows. □

Let us pause to reflect on the meaning of the proposition above for an $n$-dimensional vector space
$V$. The dual space $V^*$ is likewise $n$-dimensional, this is a general result which applies to all
finite-dimensional vector spaces$^{15}$. Thus, any set of more than $n$ dual vectors is necessarily linearly
dependent. Consequently, using the proposition above, we find the wedge product of more than $n$
one-forms is trivial. Therefore, while it is possible to construct $\Lambda^k V$ for $k > n$ we should understand
that this space only contains zero. The highest degree of a nontrivial form over a vector space of
dimension $n$ is an $n$-form.

Moreover, we can use the proposition to deduce the number of elements in a basis for $\Lambda^p V$; it must consist
of wedge products of $p$ distinct linearly independent one-forms. The number of ways to choose $p$
distinct objects from a list of $n$ distinct objects is precisely "n choose p",

\[
\binom{n}{p} = \frac{n!}{(n-p)!\,p!} \qquad \text{for } 0 \leq p \leq n.
\tag{7.19}
\]

Proposition 7.3.7.

If $V$ is an $n$-dimensional vector space then $\Lambda^p V$ is an $\binom{n}{p}$-dimensional vector space of $p$-forms.
Moreover, the direct sum of all forms over $V$ has the structure

\[
\Lambda V = \mathbb{R} \oplus \Lambda^1 V \oplus \cdots \oplus \Lambda^{n-1} V \oplus \Lambda^n V
\]

and is a vector space of dimension $2^n$.


Proof: define $\Lambda^0 V = \mathbb{R}$ then it is clear $\Lambda^k V$ forms a vector space for $k = 0, 1, \dots, n$. Moreover,
$\Lambda^j V \cap \Lambda^k V = \{0\}$ for $j \neq k$ hence the term "direct sum" is appropriate. It remains to show
$dim(\Lambda V) = 2^n$ where $dim(V) = n$. A natural basis $\beta$ for $\Lambda V$ is found from taking the union of the
bases for each subspace of $k$-forms,

\[
\beta = \{1,\ e^{i_1},\ e^{i_1} \wedge e^{i_2},\ \dots,\ e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_n} \mid 1 \leq i_1 < i_2 < \cdots < i_n \leq n\}
\]

But, we can count the number of vectors $N$ in the set above as follows:

\[
N = 1 + n + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n}
\]

Recall the binomial theorem states

\[
(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = a^n + n a^{n-1} b + \cdots + n a b^{n-1} + b^n.
\]

Recognize that $N = (1 + 1)^n$ and the proposition follows. □
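
The count in the proof can be checked by brute force. The sketch below is only an illustration, assuming Python 3.8 or later (for `math.comb`); the name `basis_labels` is hypothetical. It lists the increasing index tuples that label the basis of each $\Lambda^p V$.

```python
from itertools import combinations
from math import comb

def basis_labels(n):
    """Increasing index lists labelling a basis of Lambda^p V, for p = 0..n, when dim(V) = n."""
    return {p: list(combinations(range(1, n + 1), p)) for p in range(n + 1)}

for n in range(1, 6):
    labels = basis_labels(n)
    # each degree p contributes "n choose p" basis elements ...
    assert all(len(labels[p]) == comb(n, p) for p in range(n + 1))
    # ... and altogether there are 2^n of them
    assert sum(len(v) for v in labels.values()) == 2 ** n

print(basis_labels(3)[2])   # index pairs labelling the basis two-forms when n = 3
```

The empty tuple in degree $p = 0$ stands for the basis element $1$ of $\Lambda^0 V = \mathbb{R}$.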

15
however, in infinite dimensions, the story is not so simple

We should note that in the basis above the space of $n$-forms is one-dimensional because there is
only one way to choose a strictly increasing list of $n$ integers in $\mathbb{N}_n$. In particular, it is useful to note
$\Lambda^n V = span\{e^1 \wedge e^2 \wedge \cdots \wedge e^n\}$. The form $e^1 \wedge e^2 \wedge \cdots \wedge e^n$ is sometimes called the top-form$^{16}$.

Example 7.3.8. exterior algebra of $\mathbb{R}^2$. Let us begin with the standard dual basis $\{e^1, e^2\}$. By
definition we take the $p = 0$ case to be the field itself; $\Lambda^0 V \equiv \mathbb{R}$, it has basis 1. Next, $\Lambda^1 V = span(e^1, e^2) = V^*$
and $\Lambda^2 V = span(e^1 \wedge e^2)$ is all we can do here. This makes $\Lambda V$ a $2^2 = 4$-dimensional
vector space with basis
\[
\{1,\ e^1,\ e^2,\ e^1 \wedge e^2\}.
\]

Example 7.3.9. exterior algebra of ℝ3 Let us begin with the standard dual basis {𝑒1 , 𝑒2 , 𝑒3 }.
By definition we take the 𝑝 = 0 case to be the field itself; Λ0 𝑉 ≡ ℝ, it has basis 1. Next, Λ1 𝑉 =
𝑠𝑝𝑎𝑛(𝑒1 , 𝑒2 , 𝑒3 ) = 𝑉 ∗ . Now for something a little more interesting,

Λ2 𝑉 = 𝑠𝑝𝑎𝑛(𝑒1 ∧ 𝑒2 , 𝑒1 ∧ 𝑒3 , 𝑒2 ∧ 𝑒3 )

and finally,
Λ3 𝑉 = 𝑠𝑝𝑎𝑛(𝑒1 ∧ 𝑒2 ∧ 𝑒3 ).
This makes $\Lambda V$ a $2^3 = 8$-dimensional vector space with basis

{1, 𝑒1 , 𝑒2 , 𝑒3 , 𝑒1 ∧ 𝑒2 , 𝑒1 ∧ 𝑒3 , 𝑒2 ∧ 𝑒3 , 𝑒1 ∧ 𝑒2 ∧ 𝑒3 }

it is curious that the number of independent one-forms and 2-forms are equal.

Example 7.3.10. exterior algebra of ℝ4 Let us begin with the standard dual basis {𝑒1 , 𝑒2 , 𝑒3 , 𝑒4 }.
By definition we take the 𝑝 = 0 case to be the field itself; Λ0 𝑉 ≡ ℝ, it has basis 1. Next, Λ1 𝑉 =
𝑠𝑝𝑎𝑛(𝑒1 , 𝑒2 , 𝑒3 , 𝑒4 ) = 𝑉 ∗ . Now for something a little more interesting,

Λ2 𝑉 = 𝑠𝑝𝑎𝑛(𝑒1 ∧ 𝑒2 , 𝑒1 ∧ 𝑒3 , 𝑒1 ∧ 𝑒4 , 𝑒2 ∧ 𝑒3 , 𝑒2 ∧ 𝑒4 , 𝑒3 ∧ 𝑒4 )

and three forms,

\[
\Lambda^3 V = span(e^1 \wedge e^2 \wedge e^3,\ e^1 \wedge e^2 \wedge e^4,\ e^1 \wedge e^3 \wedge e^4,\ e^2 \wedge e^3 \wedge e^4)
\]

and $\Lambda^4 V = span(e^1 \wedge e^2 \wedge e^3 \wedge e^4)$. Thus $\Lambda V$ is a $2^4 = 16$-dimensional vector space. Note that, in contrast
to $\mathbb{R}^3$, we do not have the same number of independent one-forms and two-forms over $\mathbb{R}^4$.

Let’s explore how this algebra fits with calculations we already know about determinants.

Example 7.3.11. Suppose $A = [A_1 | A_2]$. I propose the determinant of $A$ is given by the top-form
on $\mathbb{R}^2$ via the formula $det(A) = (e^1 \wedge e^2)(A_1, A_2)$. Suppose $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ then $A_1 = (a, c)$ and
16
or volume form for reasons we will explain later, other authors begin the discussion of forms from the consideration
of volume, see Chapter 4 in Bernard Schutz’ Geometrical methods of mathematical physics

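$A_2 = (b, d)$. Thus,

\[
\begin{aligned}
det \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= (e^1 \wedge e^2)(A_1, A_2) \\
 &= (e^1 \otimes e^2 - e^2 \otimes e^1)((a, c), (b, d)) \\
 &= e^1(a, c)\, e^2(b, d) - e^2(a, c)\, e^1(b, d) \\
 &= ad - bc.
\end{aligned}
\]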

I hope this is not surprising!

Example 7.3.12. Suppose 𝐴 = [𝐴1 ∣𝐴2 ∣𝐴3 ]. I propose the determinant of 𝐴 is given by the top-
form on ℝ3 via the formula 𝑑𝑒𝑡(𝐴) = (𝑒1 ∧𝑒2 ∧𝑒3 )(𝐴1 , 𝐴2 , 𝐴3 ). Let’s see if we can find the expansion
by cofactors. By the definition we have 𝑒1 ∧ 𝑒2 ∧ 𝑒3 =

= 𝑒1 ⊗ 𝑒2 ⊗ 𝑒3 + 𝑒2 ⊗ 𝑒3 ⊗ 𝑒1 + 𝑒3 ⊗ 𝑒1 ⊗ 𝑒2 − 𝑒3 ⊗ 𝑒2 ⊗ 𝑒1 − 𝑒2 ⊗ 𝑒1 ⊗ 𝑒3 − 𝑒1 ⊗ 𝑒3 ⊗ 𝑒2
= 𝑒1 ⊗ (𝑒2 ⊗ 𝑒3 − 𝑒3 ⊗ 𝑒2 ) − 𝑒2 ⊗ (𝑒1 ⊗ 𝑒3 − 𝑒3 ⊗ 𝑒1 ) + 𝑒3 ⊗ (𝑒1 ⊗ 𝑒2 − 𝑒2 ⊗ 𝑒1 )
= 𝑒1 ⊗ (𝑒2 ∧ 𝑒3 ) − 𝑒2 ⊗ (𝑒1 ∧ 𝑒3 ) + 𝑒3 ⊗ (𝑒1 ∧ 𝑒2 ).

I submit to the reader that this is precisely the cofactor expansion formula with respect to the first
column of $A$. Suppose $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ then $A_1 = (a, d, g)$, $A_2 = (b, e, h)$ and $A_3 = (c, f, i)$.
Calculate,

\[
\begin{aligned}
det(A) &= e^1(A_1)(e^2 \wedge e^3)(A_2, A_3) - e^2(A_1)(e^1 \wedge e^3)(A_2, A_3) + e^3(A_1)(e^1 \wedge e^2)(A_2, A_3) \\
 &= a\,(e^2 \wedge e^3)(A_2, A_3) - d\,(e^1 \wedge e^3)(A_2, A_3) + g\,(e^1 \wedge e^2)(A_2, A_3) \\
 &= a(ei - fh) - d(bi - ch) + g(bf - ce)
\end{aligned}
\]

which is precisely my claim.
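
The two determinant examples can be tested numerically. Here is a minimal sketch, assuming the columns of the matrix are given as tuples; the function names `sgn` and `top_form` are illustrative, and the sample columns are arbitrary.

```python
from itertools import permutations

def sgn(perm):
    """Sign of a permutation of 0..n-1, computed from its number of inversions."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def top_form(columns):
    """Evaluate (e^1 ^ ... ^ e^n)(A_1, ..., A_n) on the n columns of a square matrix."""
    n = len(columns)
    total = 0
    for perm in permutations(range(n)):
        term = sgn(perm)
        for slot in range(n):
            # e^{perm(slot)+1} applied to column A_{slot+1}
            term *= columns[slot][perm[slot]]
        total += term
    return total

# columns of the matrix [[2, 1, 0], [0, 3, 1], [1, 0, 4]]
A1, A2, A3 = (2, 0, 1), (1, 3, 0), (0, 1, 4)
a, b, c = 2, 1, 0
d, e, f = 0, 3, 1
g, h, i = 1, 0, 4
cofactor = a * (e * i - f * h) - d * (b * i - c * h) + g * (b * f - c * e)
print(top_form([A1, A2, A3]), cofactor)   # the two numbers agree
```

The same function applied to two columns $(a, c)$ and $(b, d)$ returns $ad - bc$, matching Example 7.3.11.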

7.3.3 connecting vectors and forms in ℝ3


There are a couple ways to connect vectors and forms in ℝ3 . Mainly we need the following maps:

Definition 7.3.13.

Given 𝑣 =< 𝑎, 𝑏, 𝑐 >∈ ℝ3 we can construct a corresponding one-form 𝜔𝑣 = 𝑎𝑒1 + 𝑏𝑒2 + 𝑐𝑒3
or we can construct a corresponding two-form Φ𝑣 = 𝑎𝑒2 ∧ 𝑒3 + 𝑏𝑒3 ∧ 𝑒1 + 𝑐𝑒1 ∧ 𝑒2
Recall that 𝑑𝑖𝑚(Λ1 ℝ3 ) = 𝑑𝑖𝑚(Λ2 ℝ3 ) = 3 hence the space of vectors, one-forms, and also two-
forms are isomorphic as vector spaces. It is not difficult to show that 𝜔𝑣1 +𝑐𝑣2 = 𝜔𝑣1 + 𝑐𝜔𝑣2 and
Φ𝑣1 +𝑐𝑣2 = Φ𝑣1 + 𝑐Φ𝑣2 for all 𝑣1 , 𝑣2 ∈ ℝ3 and 𝑐 ∈ ℝ. Moreover, 𝜔𝑣 = 0 iff 𝑣 = 0 and Φ𝑣 = 0 iff
𝑣 = 0 hence 𝑘𝑒𝑟(𝜔) = {0} and 𝑘𝑒𝑟(Φ) = {0} but this means that 𝜔 and Φ are injective and since

the dimensions of the domain and codomain are 3 and these are linear transformations17 it follows
𝜔 and Φ are isomorphisms.

It appears we have two ways to represent vectors with forms in ℝ3 . We’ll see why this is important
as we study integration of forms. It turns out the two-forms go with surfaces whereas the one-
forms attach to curves. This corresponds to the fact in calculus III we have two ways to integrate
a vector-field, we can either calculate flux or work. Partly for this reason the mapping 𝜔 is called
the work-form correspondence and Φ is called the flux-form correspondence. Integration
has to wait a bit, for now we focus on algebra.
Example 7.3.14. Suppose 𝑣 =< 2, 0, 3 > and 𝑤 =< 0, 1, 2 > then 𝜔𝑣 = 2𝑒1 +3𝑒3 and 𝜔𝑤 = 𝑒2 +2𝑒3 .
Calculate the wedge product,

𝜔𝑣 ∧ 𝜔𝑤 = (2𝑒1 + 3𝑒3 ) ∧ (𝑒2 + 2𝑒3 )


= 2𝑒1 ∧ (𝑒2 + 2𝑒3 ) + 3𝑒3 ∧ (𝑒2 + 2𝑒3 )
= 2𝑒1 ∧ 𝑒2 + 4𝑒1 ∧ 𝑒3 + 3𝑒3 ∧ 𝑒2 + 6𝑒3 ∧ 𝑒3
= −3𝑒2 ∧ 𝑒3 − 4𝑒3 ∧ 𝑒1 + 2𝑒1 ∧ 𝑒2
= Φ<−3,−4,2>
= Φ𝑣×𝑤 (7.20)

Coincidence? Nope.

Proposition 7.3.15.

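Suppose $v, w \in \mathbb{R}^3$ then $\omega_v \wedge \omega_w = \Phi_{v \times w}$ where $v \times w$ denotes the cross-product which is
defined by $v \times w = \sum_{i,j,k=1}^{3} \epsilon_{ijk}\, v_i w_j e_k$.

Proof: Suppose $v = \sum_{i=1}^{3} v_i e_i$ and $w = \sum_{j=1}^{3} w_j e_j$ then $\omega_v = \sum_{i=1}^{3} v_i e^i$ and $\omega_w = \sum_{j=1}^{3} w_j e^j$.
Calculate,

\[
\omega_v \wedge \omega_w = \Big( \sum_{i=1}^{3} v_i e^i \Big) \wedge \Big( \sum_{j=1}^{3} w_j e^j \Big) = \sum_{i=1}^{3} \sum_{j=1}^{3} v_i w_j\, e^i \wedge e^j
\]

I invite the reader to show $e^i \wedge e^j = \Phi\big( \sum_{k=1}^{3} \epsilon_{ijk} e_k \big)$ where I’m using $\Phi_v = \Phi(v)$ to make the
argument of the flux-form mapping easier to read, hence,

\[
\omega_v \wedge \omega_w = \sum_{i=1}^{3} \sum_{j=1}^{3} v_i w_j\, \Phi\Big( \sum_{k=1}^{3} \epsilon_{ijk} e_k \Big)
 = \underbrace{\Phi\Big( \sum_{i,j,k=1}^{3} v_i w_j \epsilon_{ijk} e_k \Big)}_{\text{linearity of } \Phi}
 = \Phi(v \times w) = \Phi_{v \times w}
\]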

Of course, if you don’t like my proof you could just work it out like the example that precedes this
proposition. I gave the proof to show off the mappings a bit more. □
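
For readers who prefer to "just work it out", the following minimal sketch checks the proposition on sample vectors. It assumes vectors are 3-tuples and reports the two-form by its coefficients on $(e^2 \wedge e^3,\ e^3 \wedge e^1,\ e^1 \wedge e^2)$, which is exactly the data of the flux-form $\Phi$; the function names are illustrative.

```python
def cross(v, w):
    """Usual cross-product on R^3."""
    return (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )

def wedge_one_forms(v, w):
    """Coefficients of omega_v ^ omega_w on (e^2^e^3, e^3^e^1, e^1^e^2)."""
    coeff = {}
    for i in range(3):
        for j in range(3):
            if i != j:
                # e^i ^ e^j = -e^j ^ e^i, so collect each pair on the key with i < j
                key, sign = ((i, j), 1) if i < j else ((j, i), -1)
                coeff[key] = coeff.get(key, 0) + sign * v[i] * w[j]
    # e^3 ^ e^1 = -(e^1 ^ e^3), hence the minus sign on the middle entry
    return (coeff[(1, 2)], -coeff[(0, 2)], coeff[(0, 1)])

v, w = (2, 0, 3), (0, 1, 2)        # the vectors of Example 7.3.14
print(wedge_one_forms(v, w))       # flux-form coefficients of omega_v ^ omega_w
print(cross(v, w))                 # components of v x w
```

The two printed triples coincide, in agreement with Equation 7.20.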

17
this is not generally true, note $f(x) = x^2$ has $f(x) = 0$ iff $x = 0$ and yet $f$ is not injective. The linearity is key.

Is the wedge product just the cross-product generalized? Well, not really. I think they’re quite
different animals. The wedge product is an associative product which makes sense in any vector
space. The cross-product only matches the wedge product after we interpret it through a pair of
isomorphisms ($\omega$ and $\Phi$) which are special to $\mathbb{R}^3$. However, there is debate; largely the question
comes down to what you think makes the cross-product the cross-product. If you think it must
pick a unique perpendicular direction to a pair of given directions then that is only going to work
in $\mathbb{R}^3$ since even in $\mathbb{R}^4$ there is a whole plane of perpendicular vectors to a given pair. On the other
hand, if you think the cross-product in $\mathbb{R}^4$ should pick the unique perpendicular to a given triple
of vectors then you could set something up. You could define $v \times w \times x = \omega^{-1}(\psi(\omega_v \wedge \omega_w \wedge \omega_x))$
where $\psi : \Lambda^3 \mathbb{R}^4 \to \Lambda^1 \mathbb{R}^4$ is an isomorphism we’ll describe in an upcoming section. But, you see it’s
no longer a product of two vectors; it’s not a binary operation, it’s a ternary operation. In any
event, you can read a lot more on this if you wish. We have all the tools we need for this course.
The wedge product provides the natural antisymmetric algebra for $n$ dimensions and the work and
flux-form maps naturally connect us to the special world of three-dimensional mathematics.

There is more algebra for forms on ℝ3 however we defer it to a later section where we have a few
more tools. Chief among those is the Hodge dual. But, before we can discuss Hodge duality we
need to generalize our idea of a dot-product just a little.

7.4 bilinear forms and geometry; metric duality


The concept of a metric goes beyond the familiar case of the dot-product. If you want a more
strict generalization of the dot-product then you should think about an inner-product. In contrast
to the definition below, the inner-product replaces non-degeneracy with the stricter condition of
positive-definite which would read $g(x, x) > 0$ for $x \neq 0$. I included a discussion of inner products
at the end of this section for the interested reader although we are probably not going to need all
of that material.

7.4.1 metric geometry


A geometry is a vector space paired with a metric. For example, if we pair ℝ𝑛 with the dot-
product you get Euclidean space. However, if we pair ℝ4 with the Minkowski metric then we
obtain Minkowski space.
Definition 7.4.1.
If 𝑉 is a vector space and 𝑔 : 𝑉 × 𝑉 → ℝ is

1. bilinear: 𝑔 ∈ 𝑇20 𝑉 ,

2. symmetric: 𝑔(𝑥, 𝑦) = 𝑔(𝑦, 𝑥) for all 𝑥, 𝑦 ∈ 𝑉 ,

3. nondegenerate: 𝑔(𝑥, 𝑦) = 0 for all 𝑥 ∈ 𝑉 implies 𝑦 = 0.

then we call $g$ a metric on $V$.



If $V = \mathbb{R}^n$ then we can write $g(x, y) = x^T G y$ where $[g] = G$. Moreover, $g(x, y) = g(y, x)$ implies
$G^T = G$. Nondegenerate means that $g(x, y) = 0$ for all $y \in \mathbb{R}^n$ iff $x = 0$. It follows that $Gy = 0$
has no non-trivial solutions hence $G^{-1}$ exists.

Example 7.4.2. Suppose $g(x, y) = x^T y$ for all $x, y \in \mathbb{R}^n$. This defines a metric for $\mathbb{R}^n$, it is just
the dot-product. Note that $g(x, y) = x^T y = x^T I y$ hence we see $[g] = I$ where $I$ denotes the identity
matrix in $\mathbb{R}^{n \times n}$.

Example 7.4.3. Suppose $v = (v^0, v^1, v^2, v^3)$, $w = (w^0, w^1, w^2, w^3) \in \mathbb{R}^4$ then define the Minkowski
product of $v$ and $w$ as follows:

\[
g(v, w) = -v^0 w^0 + v^1 w^1 + v^2 w^2 + v^3 w^3
\]

It is useful to write the Minkowski product in terms of a matrix multiplication. Observe that for
$x, y \in \mathbb{R}^4$,

\[
g(x, y) = -x^0 y^0 + x^1 y^1 + x^2 y^2 + x^3 y^3 =
\begin{pmatrix} x^0 & x^1 & x^2 & x^3 \end{pmatrix}
\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} y^0 \\ y^1 \\ y^2 \\ y^3 \end{pmatrix}
\equiv x^T \eta y
\]

where we have introduced $\eta$ the matrix of the Minkowski product. Notice that $\eta^T = \eta$ and $det(\eta) = -1 \neq 0$
hence $g(x, y) = x^T \eta y$ makes $g$ a symmetric, nondegenerate bilinear form on $\mathbb{R}^4$. The
formula is clearly related to the dot-product. Suppose $\bar{v} = (v^0, \vec{v})$ and $\bar{w} = (w^0, \vec{w})$ then note

\[
g(v, w) = -v^0 w^0 + \vec{v} \cdot \vec{w}
\]

For vectors with zero in the zeroth slot this Minkowski product reduces to the dot-product. However,
for vectors which have nonzero entries in both the zeroth and later slots much differs. Recall that
any vector’s dot-product with itself gives the square of the vector’s length. Of course this means that
$\vec{x} \cdot \vec{x} = 0$ iff $\vec{x} = 0$. Contrast that with the following: if $v = (1, 1, 0, 0)$ then

\[
g(v, v) = -1 + 1 = 0
\]

Yet $v \neq 0$. Why study such a strange generalization of length? The answer lies in physics. I’ll give
you a brief account by defining a few terms: Let $v = (v^0, v^1, v^2, v^3) \in \mathbb{R}^4$ then we say

1. 𝑣 is a timelike vector if < 𝑣, 𝑣 > < 0

2. 𝑣 is a lightlike vector if < 𝑣, 𝑣 > = 0

3. 𝑣 is a spacelike vector if < 𝑣, 𝑣 > > 0
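
The three cases above are easy to test by machine. This is a minimal sketch, assuming $< v, v >$ in the list denotes the Minkowski product $g(v, v) = v^T \eta v$ from Example 7.4.3; the names `ETA`, `minkowski` and `classify` are illustrative.

```python
# matrix of the Minkowski product, signature (-, +, +, +)
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def minkowski(v, w):
    """g(v, w) = v^T eta w for v, w in R^4."""
    return sum(v[m] * ETA[m][n] * w[n] for m in range(4) for n in range(4))

def classify(v):
    """Label v by the sign of g(v, v), following the three cases in the list."""
    s = minkowski(v, v)
    if s < 0:
        return "timelike"
    if s == 0:
        return "lightlike"
    return "spacelike"

for v in [(2, 1, 0, 0), (1, 1, 0, 0), (0, 1, 0, 0)]:
    print(v, minkowski(v, v), classify(v))
```

Note that with this literal reading the zero vector is also reported as lightlike, since $g(0, 0) = 0$.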



If we consider the trajectory of a massive particle in $\mathbb{R}^4$ that begins at the origin then at any later
time the trajectory will be located at a timelike vector. If we consider a light beam emitted from
the origin then at any future time it will be located at the tip of a lightlike vector. Finally, spacelike
vectors point to points in $\mathbb{R}^4$ which cannot be reached by the motion of physical particles that pass
through the origin. We say that massive particles are confined within their light cones, this
means that they are always located at timelike vectors relative to their current position in
spacetime. If you’d like to know more I can recommend a few books.

At this point you might wonder if there are other types of metrics beyond these two examples.
Surprisingly, in a certain sense, no. A rather old theorem of linear algebra due to Sylvester states
that we can change coordinates so that the metric more or less resembles either the dot-product or
something like it with some sign-flips. We’ll return to this in a later section.

7.4.2 metric duality for tensors


Throughout this section we consider a vector space $V$ paired with a metric $g : V \times V \to \mathbb{R}$.
Moreover, the vector space $V$ has basis $\{e_i\}_{i=1}^n$ which has a $g$-dual basis $\{e^i\}_{i=1}^n$. Up to this point
we always have used a $g$-dual basis where the duality was offered by the dot-product. In the
context of Minkowski geometry that sort of duality is no longer natural. Instead we must follow
the definition below:

Definition 7.4.4.

If 𝑉 is a vector space with metric 𝑔 and basis {𝑒𝑖 }𝑛𝑖=1 then we say the basis {𝑒𝑖 }𝑛𝑖=1 is 𝑔-dual
iff
Suppose $e^i(e_j) = \delta^i_j$ and consider $g = \sum_{i,j=1}^{n} g_{ij}\, e^i \otimes e^j$. Furthermore, suppose $g^{ij}$ are the
components of the inverse matrix to $(g_{ij})$; this means that $\sum_{k=1}^{n} g_{ik} g^{kj} = \delta_i^{\ j}$. We use the components
of the metric and its inverse to raise and lower indices on tensors. Here are the basic conventions:
given an object $A^j$ which has the contravariant index $j$ we can lower it to be covariant by
contracting against the metric components as follows:

\[
A_i = \sum_{j} g_{ij} A^j
\]

On the other hand, given an object $B_j$ which has a covariant index $j$ we can raise it to be
contravariant by contracting against the inverse components of the metric:

\[
B^i = \sum_{j} g^{ij} B_j
\]

I like to think of this as some sort of conservation of indices. Strict adherence to the notation
drives us to write things such as $\sum_{k=1}^{n} g_{ik} g^{kj} = \delta_i^{\ j}$ just to keep up the up/down index pattern.
I should mention that these formulas are much more beautiful in the physics literature, you can

look at my old Math 430 notes from NCSU if you’d like a healthy dose of that notation18 . I use
Einstein’s implicit summation notation throughout those notes and I discuss this index calculation
more in the way a physicist typically approaches it. Here I am trying to be careful enough that
these equations are useful to mathematicians. Let me show you some examples:

Example 7.4.5. Specialize for this example to $V = \mathbb{R}^4$ with $g(x, y) = x^T \eta y$. Suppose $x = \sum_{\mu=0}^{3} x^\mu e_\mu$;
the components $x^\mu$ are called contravariant components. The metric allows us
to define covariant components by

\[
x_\nu = \sum_{\mu=0}^{3} \eta_{\nu\mu}\, x^\mu.
\]

For the Minkowski metric this just adjoins a minus to the zeroth component: if $(x^\mu) = (a, b, c, d)$
then $(x_\mu) = (-a, b, c, d)$.

Example 7.4.6. Suppose we are working on $\mathbb{R}^n$ with the Euclidean metric $g_{ij} = \delta_{ij}$ and it follows
that $g^{ij} = \delta^{ij}$ or to be a purist for a moment $\sum_k g_{ik} g^{kj} = \delta_i^{\ j}$. In this case $v^i = \sum_j g^{ij} v_j = \sum_j \delta^{ij} v_j = v_i$.
The covariant and contravariant components are the same. This is why it was ok
to ignore up/down indices when we work with a dot-product exclusively.

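What if we lower an index and then raise it back up once more? Do we really get back where we
started? Given $x^\mu$ we lower the index by $x_\nu = \sum_\mu g_{\nu\mu} x^\mu$ then we raise it once more by

\[
x^\alpha = \sum_{\nu} g^{\alpha\nu} x_\nu = \sum_{\nu} g^{\alpha\nu} \sum_{\mu} g_{\nu\mu} x^\mu = \sum_{\mu,\nu} g^{\alpha\nu} g_{\nu\mu} x^\mu = \sum_{\mu} \delta^\alpha_\mu x^\mu
\]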

and the last summation squishes down to 𝑥𝛼 once more. It would seem this procedure of raising
and lowering indices is at least consistent.
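
The round trip can also be checked numerically. Below is a minimal sketch, assuming the metric is given as a square list of lists and is nondegenerate (so the elimination always finds a pivot); exact fractions avoid rounding issues. The names `inverse`, `lower` and `raise_index` are illustrative.

```python
from fractions import Fraction

def inverse(G):
    """Invert a nondegenerate square matrix by Gauss-Jordan elimination over exact fractions."""
    n = len(G)
    # augment G with the identity matrix
    M = [
        [Fraction(G[r][c]) for c in range(n)] + [Fraction(int(r == c)) for c in range(n)]
        for r in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [entry / scale for entry in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def lower(G, x_up):
    """x_nu = sum_mu g_{nu mu} x^mu"""
    n = len(x_up)
    return [sum(G[nu][mu] * x_up[mu] for mu in range(n)) for nu in range(n)]

def raise_index(G_inv, x_down):
    """x^alpha = sum_nu g^{alpha nu} x_nu"""
    n = len(x_down)
    return [sum(G_inv[al][nu] * x_down[nu] for nu in range(n)) for al in range(n)]

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
x_up = [5, 1, 2, 3]
x_down = lower(ETA, x_up)                       # flips the sign of the zeroth component
x_back = raise_index(inverse(ETA), x_down)      # raising undoes the lowering
print(x_down, x_back == x_up)
```

The same check works for any symmetric nondegenerate matrix in place of `ETA`, not only for diagonal ones.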

Example 7.4.7. Suppose we raise the index on the basis $\{e_i\}$ and formally obtain $\{e^j = \sum_k g^{jk} e_k\}$;
on the other hand suppose we lower the index on the dual basis $\{e^l\}$ to formally obtain $\{e_m = \sum_l g_{ml} e^l\}$.
I’m curious, are these consistent? We should get $e^j(e_m) = \delta^j_m$; I’ll be nice and look at
$e_m(e^j)$ in the following sense:

\[
\Big( \sum_{l} g_{ml} e^l \Big) \Big( \sum_{k} g^{jk} e_k \Big) = \sum_{l,k} g_{ml} g^{jk}\, e^l(e_k) = \sum_{l,k} g_{ml} g^{jk} \delta^l_k = \sum_{k} g_{mk} g^{jk} = \delta_m^{\ j}
\]

𝑙 𝑘 𝑙,𝑘 𝑙,𝑘 𝑘

Interesting, but what does it mean?

I used the term formal in the preceding example to mean that the example makes sense in as much
as you accept the equations which are written. If you think harder about it then you’ll find it was
rather meaningless. That said, this index notation is rather forgiving.

18
just a taste: $v_\mu = \eta_{\mu\nu} v^\nu$ or $v^\mu = \eta^{\mu\nu} v_\nu$ or $v^\mu v_\mu = \eta^{\mu\nu} v_\nu v_\mu = \eta_{\mu\nu} v^\mu v^\nu$

Ok, but what are we doing? Recall that I insisted on using lower indices for forms and upper
indices for vectors? The index conventions I’m toying with above are the reason for this strange
notation. When we lower an index we might be changing a vector to a dual vector, or vice-versa
when we raise an index we might be changing a dual vector into a vector. Let me be explicit.
1. given $v \in V$ we create $\alpha_v \in V^*$ by the rule $\alpha_v(x) = g(x, v)$.

2. given $\alpha \in V^*$ we create $v_\alpha \in V^{**}$ by the rule $v_\alpha(\beta) = g^{-1}(\alpha, \beta)$ where $g^{-1}(\alpha, \beta) = \sum_{i,j} \alpha_i \beta_j g^{ij}$.

Recall we at times identify $V$ and $V^{**}$. Let’s work out the component structure of $\alpha_v$ and see how
it relates to $v$,

\[
\alpha_v(e_i) = g(v, e_i) = g\Big( \sum_{j} v^j e_j,\ e_i \Big) = \sum_{j} v^j g(e_j, e_i) = \sum_{j} v^j g_{ji}
\]

Thus, $\alpha_v = \sum_i v_i e^i$ where $v_i = \sum_j v^j g_{ji}$. When we lower the index we’re actually using an
isomorphism which is provided by the metric to map vectors to forms. The process of raising the
index is just the inverse of this isomorphism.

\[
v_\alpha(e^i) = g^{-1}(\alpha, e^i) = g^{-1}\Big( \sum_{j} \alpha_j e^j,\ e^i \Big) = \sum_{j} \alpha_j g^{ji}
\]

thus $v_\alpha = \sum_i \alpha^i e_i$ where $\alpha^i = \sum_j \alpha_j g^{ji}$.

Suppose we want to change a type $(0, 2)$ tensor to a type $(2, 0)$ tensor. We’re given $T : V^* \times V^* \to \mathbb{R}$
where $T = \sum_{i,j} T^{ij}\, e_i \otimes e_j$. Define $\tilde{T} : V \times V \to \mathbb{R}$ as follows:

\[
\tilde{T}(v, w) = T(\alpha_v, \alpha_w)
\]

What does this look like in components? Note $\alpha_{e_i}(e_j) = g(e_i, e_j) = g_{ij}$ hence $\alpha_{e_i} = \sum_j g_{ij} e^j$ and

\[
\tilde{T}(e_i, e_j) = T(\alpha_{e_i}, \alpha_{e_j}) = T\Big( \sum_{k} g_{ik} e^k,\ \sum_{l} g_{jl} e^l \Big) = \sum_{k,l} g_{ik} g_{jl}\, T(e^k, e^l) = \sum_{k,l} g_{ik} g_{jl}\, T^{kl}
\]

Or, as is often customary, we could write $T_{ij} = \sum_{k,l} g_{ik} g_{jl} T^{kl}$. However, this is an abuse of notation
since $T_{ij}$ are not technically components for $T$. If we have a metric we can recover either $T$ from $\tilde{T}$
or vice-versa. Generally, if we are given two tensors, say $T_1$ of rank $(r, s)$ and $T_2$ of rank $(r', s')$,
then these might be equivalent if $r + s = r' + s'$. It may be that through raising and lowering
indices (a.k.a. appropriately composing with the vector ↔ dual vector isomorphisms) we can convert
$T_1$ to $T_2$. If you read Gravitation by Misner, Thorne and Wheeler you’ll find many more thoughts
on this equivalence. Challenge: can you find the explicit formulas like $\tilde{T}(v, w) = T(\alpha_v, \alpha_w)$ which
back up the index calculations below?

\[
T_{ij}{}^{k} = \sum_{a,b} g_{ia} g_{jb}\, T^{abk} \qquad \text{or} \qquad S^{ij} = \sum_{a,b} g^{ia} g^{jb}\, S_{ab}
\]

I hope I’ve given you enough to chew on in this section to put these together.

7.4.3 inner products and induced norm


There are generalized dot-products on many abstract vector spaces, we call them inner-products.

Definition 7.4.8.
Suppose 𝑉 is a vector space. If <, >: 𝑉 × 𝑉 → ℝ is a function such that for all 𝑥, 𝑦, 𝑧 ∈ 𝑉
and 𝑐 ∈ ℝ:

1. < 𝑥, 𝑦 >=< 𝑦, 𝑥 > (symmetric)

2. < 𝑥 + 𝑦, 𝑧 >=< 𝑥, 𝑧 > + < 𝑦, 𝑧 > (additive in the first slot)

3. < 𝑐𝑥, 𝑦 >= 𝑐 < 𝑥, 𝑦 > (homogeneity in the first slot)

4. < 𝑥, 𝑥 > ≥ 0 and < 𝑥, 𝑥 >= 0 iff 𝑥 = 0

then we say (𝑉, <, >) is an inner-product space with inner product <, >.
Given an inner-product space $(V, <, >)$ we can easily induce a norm for $V$ by the formula $||x|| = \sqrt{\langle x, x \rangle}$
for all $x \in V$. Properties (1.), (3.) and (4.) in the definition of the norm are fairly obvious
for the induced norm. Let’s think through the triangle inequality for the induced norm:

∣∣𝑥 + 𝑦∣∣2 = < 𝑥 + 𝑦, 𝑥 + 𝑦 > def. of induced norm


= < 𝑥, 𝑥 + 𝑦 > + < 𝑦, 𝑥 + 𝑦 > additive prop. of inner prod.
= < 𝑥 + 𝑦, 𝑥 > + < 𝑥 + 𝑦, 𝑦 > symmetric prop. of inner prod.
= < 𝑥, 𝑥 > + < 𝑦, 𝑥 > + < 𝑥, 𝑦 > + < 𝑦, 𝑦 > additive prop. of inner prod.
= ∣∣𝑥∣∣2 + 2 < 𝑥, 𝑦 > +∣∣𝑦∣∣2

At this point we’re stuck. A nontrivial inequality$^{19}$ called the Cauchy-Schwarz inequality helps us
proceed; $\langle x, y \rangle \leq ||x||\,||y||$. It follows that $||x + y||^2 \leq ||x||^2 + 2||x||\,||y|| + ||y||^2 = (||x|| + ||y||)^2$.
However, the induced norm is clearly nonnegative$^{20}$ so we find $||x + y|| \leq ||x|| + ||y||$.

Most linear algebra texts have a whole chapter on inner-products and their applications, you can
look at my notes for a start if you’re curious. That said, this is a bit of a digression for this course.

19
I prove this for the dot-product in my linear notes, however, the proof is written in such a way it equally well
applies to a general inner-product
20
note: if you have (−5)² < (−7)² it does not follow that −5 < −7; in order to take the square root of the inequality
we need nonnegative terms squared

7.5 hodge duality


We can prove that $\binom{n}{p} = \binom{n}{n-p}$. This follows from explicit computation of the formula for $\binom{n}{p}$ or
from the symmetry of Pascal’s triangle if you prefer. In any event, this equality suggests there is
some isomorphism between 𝑝 and (𝑛 − 𝑝)-forms. When we are given a metric 𝑔 on a vector space
𝑉 (and the notation of the preceding section) it is fairly simple to construct the isomorphism.
Suppose we are given 𝛼 ∈ Λ𝑝 𝑉 and following our usual notation:
$$\alpha = \sum_{i_1, i_2, \dots, i_p = 1}^{n} \frac{1}{p!}\, \alpha_{i_1 i_2 \dots i_p}\, e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}$$

Then, define ∗𝛼 the hodge dual to be the (𝑛 − 𝑝)-form given below:

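$$\ast\alpha = \sum_{i_1, \dots, i_p,\, j_1, \dots, j_{n-p} = 1}^{n} \frac{1}{p!\,(n-p)!}\, \alpha^{i_1 i_2 \dots i_p}\, \epsilon_{i_1 i_2 \dots i_p j_1 j_2 \dots j_{n-p}}\, e^{j_1} \wedge e^{j_2} \wedge \cdots \wedge e^{j_{n-p}}$$

Here the indices on $\alpha$ have been raised with the metric, and the sum runs over all $n$ indices $i_1, \dots, i_p, j_1, \dots, j_{n-p}$.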

I should admit, to prove this is a reasonable definition we’d need to do some work. It’s clearly a
linear transformation, but bijectivity and coordinate invariance of this definition might take a little
work. I intend to omit those details and instead focus on how this works for ℝ3 or ℝ4 . My advisor
taught a course on fiber bundles and there is a much more general and elegant presentation of the
hodge dual over a manifold. Ask if interested, I think I have a pdf.

7.5.1 hodge duality in euclidean space ℝ3


To begin, consider a scalar 1, this is a 0-form so we expect the hodge dual to give a 3-form:
$$\ast 1 = \sum_{i,j,k} \frac{1}{0!\,3!}\, \epsilon_{ijk}\, e^i \wedge e^j \wedge e^k = e^1 \wedge e^2 \wedge e^3$$

Interesting, the hodge dual of 1 is the top-form on ℝ³. Conversely, calculate the dual of the top-form; note $e^1 \wedge e^2 \wedge e^3 = \sum_{i,j,k} \frac{1}{6}\, \epsilon_{ijk}\, e^i \wedge e^j \wedge e^k$ reveals the components of the top-form are precisely $\epsilon_{ijk}$, thus:

$$\ast(e^1 \wedge e^2 \wedge e^3) = \sum_{i,j,k=1}^{3} \frac{1}{3!\,(3-3)!}\, \epsilon^{ijk} \epsilon_{ijk} = \frac{1}{6}\left(1 + 1 + 1 + (-1)^2 + (-1)^2 + (-1)^2\right) = 1.$$

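Next, consider $e^1$; note that $e^1 = \sum_k \delta_{k1}\, e^k$ hence the components are $\delta_{k1}$. Thus,

$$\ast e^1 = \sum_{i,j,k} \frac{1}{1!\,2!}\, \epsilon_{ijk}\, \delta^{k1}\, e^i \wedge e^j = \frac{1}{2} \sum_{i,j} \epsilon_{ij1}\, e^i \wedge e^j = \frac{1}{2}\left(\epsilon_{231}\, e^2 \wedge e^3 + \epsilon_{321}\, e^3 \wedge e^2\right) = e^2 \wedge e^3$$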

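Similar calculations reveal $\ast e^2 = e^3 \wedge e^1$ and $\ast e^3 = e^1 \wedge e^2$. What about the duals of the two-forms?
Begin with $\alpha = e^1 \wedge e^2$; note that $e^1 \wedge e^2 = e^1 \otimes e^2 - e^2 \otimes e^1$ thus we can see the components are
$\alpha_{ij} = \delta_{i1}\delta_{j2} - \delta_{i2}\delta_{j1}$. Thus,

$$\ast(e^1 \wedge e^2) = \sum_{i,j,k} \frac{1}{2!\,1!}\, \epsilon_{ijk} \left(\delta_{i1}\delta_{j2} - \delta_{i2}\delta_{j1}\right) e^k = \frac{1}{2}\left(\sum_k \epsilon_{12k}\, e^k - \sum_k \epsilon_{21k}\, e^k\right) = \frac{1}{2}\left(e^3 - (-e^3)\right) = e^3.$$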

Similar calculations show that ∗(𝑒2 ∧ 𝑒3 ) = 𝑒1 and ∗(𝑒3 ∧ 𝑒1 ) = 𝑒2 . Put all of this together and we
find that
∗(𝑎𝑒1 + 𝑏𝑒2 + 𝑐𝑒3 ) = 𝑎𝑒2 ∧ 𝑒3 + 𝑏𝑒3 ∧ 𝑒1 + 𝑐𝑒1 ∧ 𝑒2
and
∗(𝑎𝑒2 ∧ 𝑒3 + 𝑏𝑒3 ∧ 𝑒1 + 𝑐𝑒1 ∧ 𝑒2 ) = 𝑎𝑒1 + 𝑏𝑒2 + 𝑐𝑒3
Which means that ∗𝜔𝑣 = Φ𝑣 and ∗Φ𝑣 = 𝜔𝑣 . Hodge duality links the two different form-representations
of vectors in a natural manner. Moreover, for ℝ3 we should also note that ∗∗𝛼 = 𝛼 for all 𝛼 ∈ Λℝ3 .
In general, for other metrics, we can have a change of signs which depends on the degree of 𝛼.

We can summarize hodge duality for three-dimensional Euclidean space as follows:


∗1= 𝑒1 ∧ 𝑒2 ∧ 𝑒3 ∗ (𝑒1 ∧ 𝑒2 ∧ 𝑒3 ) = 1
∗ 𝑒1 = 𝑒2 ∧ 𝑒3 ∗ (𝑒2 ∧ 𝑒3 ) = 𝑒1
∗ 𝑒2 = 𝑒3 ∧ 𝑒1 ∗ (𝑒3 ∧ 𝑒1 ) = 𝑒2
∗ 𝑒3 = 𝑒1 ∧ 𝑒2 ∗ (𝑒1 ∧ 𝑒2 ) = 𝑒3
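The table above can be reproduced by brute force from the defining sum. The following Python sketch is one way to do that for Euclidean ℝ³, where raising indices changes nothing. The function names (`perm_sign`, `epsilon`, `components`, `hodge`) and the convention of storing a form as a dictionary from sorted index tuples to coefficients are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial

N = 3  # dimension; indices run over 1, 2, 3

def perm_sign(seq):
    """Sign of the permutation that sorts seq; 0 if an index repeats."""
    if len(set(seq)) != len(seq):
        return 0
    inversions = sum(1 for a in range(len(seq))
                     for b in range(a + 1, len(seq)) if seq[a] > seq[b])
    return -1 if inversions % 2 else 1

def epsilon(*idx):
    """Antisymmetric symbol with epsilon(1, 2, 3) = 1."""
    return perm_sign(idx)

def components(I, k):
    """Component alpha_k of the basis form e^I = e^{i1} ^ ... ^ e^{ip}."""
    if sorted(k) != sorted(I):
        return 0
    return perm_sign(I) * perm_sign(k)  # sign of the permutation taking I to k

def hodge(I):
    """Hodge dual of e^I in Euclidean R^3 as {sorted index tuple: coefficient}."""
    p, q = len(I), N - len(I)
    result = {}
    for k in product(range(1, N + 1), repeat=p):
        a = components(I, k)
        if a == 0:
            continue
        for j in product(range(1, N + 1), repeat=q):
            e = epsilon(*k, *j)
            if e == 0:
                continue
            # e^{j1} ^ ... ^ e^{jq} equals perm_sign(j) times the sorted basis form
            J = tuple(sorted(j))
            term = Fraction(a * e * perm_sign(j), factorial(p) * factorial(q))
            result[J] = result.get(J, 0) + term
    return {J: c for J, c in result.items() if c != 0}

for I in [(), (1,), (2,), (3,), (2, 3), (3, 1), (1, 2), (1, 2, 3)]:
    print(I, "->", hodge(I))
```

Note that `hodge((2,))` comes back as a coefficient of −1 on the sorted form e¹ ∧ e³, which is the same thing as e³ ∧ e¹ in the table.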

A simple rule to calculate the hodge dual of a basis form is as follows


1. begin with the top-form 𝑒1 ∧ 𝑒2 ∧ 𝑒3

2. permute the forms until the basis form you wish to hodge dual is to the left of the expression,
whatever remains to the right is the hodge dual.
For example, to calculate the dual of 𝑒2 ∧ 𝑒3 note

$$e^1 \wedge e^2 \wedge e^3 = \underbrace{e^2 \wedge e^3}_{\text{to be dualed}} \wedge \underbrace{e^1}_{\text{the dual}} \quad \Rightarrow \quad \ast(e^2 \wedge e^3) = e^1.$$

Consider what happens if we calculate ∗ ∗ 𝛼, since the dual is a linear operation it suffices to think
about the basis forms. Let me sketch the process of ∗ ∗ 𝑒𝐼 where 𝐼 is a multi-index:
1. begin with 𝑒1 ∧ 𝑒2 ∧ 𝑒3

2. write 𝑒1 ∧ 𝑒2 ∧ 𝑒3 = (−1)𝑁 𝑒𝐼 ∧ 𝑒𝐽 and identify ∗𝑒𝐼 = (−1)𝑁 𝑒𝐽 .

3. then to calculate the second dual once more begin with 𝑒1 ∧ 𝑒2 ∧ 𝑒3 and note

𝑒1 ∧ 𝑒2 ∧ 𝑒3 = (−1)𝑁 𝑒𝐽 ∧ 𝑒𝐼

since moving the block 𝑒𝐼 past the block 𝑒𝐽 takes 𝑝(3 − 𝑝) transpositions, where 𝑝 is the degree of 𝑒𝐼 , and 𝑝(3 − 𝑝) is even for 𝑝 = 0, 1, 2, 3. Consequently ∗𝑒𝐽 = (−1)𝑁 𝑒𝐼 .

4. It follows that ∗ ∗ 𝑒𝐼 = 𝑒𝐼 for any multi-index hence ∗ ∗ 𝛼 = 𝛼 for all 𝛼 ∈ Λℝ3 .

I hope that once you get past the index calculation you can see the hodge dual is not a terribly
complicated construction. Some of the index calculation in this section was probably gratuitous,
but I would like you to be aware of such techniques. Brute-force calculation has its place, but a
well-thought index notation can bring far more insight with much less effort.

7.5.2 hodge duality in minkowski space ℝ4


The logic here follows fairly closely that of the last section; however, the wrinkle is that the metric here
demands more attention. We must take care to raise the indices on the forms when we Hodge dual
them. First let’s list the basis forms; we have to add time to the mix (again 𝑐 = 1, so 𝑥⁰ = 𝑐𝑡 = 𝑡, if
you were worried about it). Remember that the Greek indices are defined to range over 0, 1, 2, 3. Here

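Name          Degree    Typical Element                                                                                      Basis for $\Lambda^p \mathbb{R}^4$

function      $p = 0$   $f$                                                                                                  $1$
one-form      $p = 1$   $\alpha = \sum_{\mu} \alpha_\mu\, e^\mu$                                                             $e^0,\ e^1,\ e^2,\ e^3$
two-form      $p = 2$   $\beta = \sum_{\mu,\nu} \frac{1}{2} \beta_{\mu\nu}\, e^\mu \wedge e^\nu$                             $e^2 \wedge e^3,\ e^3 \wedge e^1,\ e^1 \wedge e^2,\ e^0 \wedge e^1,\ e^0 \wedge e^2,\ e^0 \wedge e^3$
three-form    $p = 3$   $\gamma = \sum_{\mu,\nu,\alpha} \frac{1}{3!} \gamma_{\mu\nu\alpha}\, e^\mu \wedge e^\nu \wedge e^\alpha$   $e^1 \wedge e^2 \wedge e^3,\ e^0 \wedge e^2 \wedge e^3,\ e^0 \wedge e^1 \wedge e^3,\ e^0 \wedge e^1 \wedge e^2$
four-form     $p = 4$   $g\, e^0 \wedge e^1 \wedge e^2 \wedge e^3$                                                           $e^0 \wedge e^1 \wedge e^2 \wedge e^3$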

the top form is degree four since in four dimensions we can have at most four dual-basis vectors
without a repeat. Wedge products work the same as they have before, just now we have 𝑒0 to play
with. Hodge duality may offer some surprises though.

Definition 7.5.1. The antisymmetric symbol in flat ℝ4 is denoted 𝜖𝜇𝜈𝛼𝛽 and it is defined by the
value
𝜖0123 = 1
plus the demand that it be completely antisymmetric.

We must not assume that this symbol is invariant under a cyclic exchange of indices. Consider,

$$\begin{aligned}
\epsilon_{0123} &= -\epsilon_{1023} && \text{flipped } (01) \\
&= +\epsilon_{1203} && \text{flipped } (02) \\
&= -\epsilon_{1230} && \text{flipped } (03).
\end{aligned} \tag{7.21}$$

In four dimensions we’ll use antisymmetry directly and forego the cyclicity shortcut. It’s not a big
deal if you notice it before it confuses you.
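The flip-counting in (7.21) can be sketched in a few lines of Python: sort the indices by adjacent swaps and flip the sign at every swap. The function names `sign_by_swaps` and `cyclic_shift` are hypothetical, chosen only for this illustration.

```python
def sign_by_swaps(indices):
    """Sort distinct indices by adjacent swaps, flipping the sign at each swap."""
    idx = list(indices)
    sign = 1
    for end in range(len(idx) - 1, 0, -1):
        for a in range(end):
            if idx[a] > idx[a + 1]:
                idx[a], idx[a + 1] = idx[a + 1], idx[a]
                sign = -sign
    return sign

def cyclic_shift(indices):
    """Move the first index to the end: (a, b, c, ...) -> (b, c, ..., a)."""
    return tuple(indices[1:]) + tuple(indices[:1])

# Three indices: a cyclic shift costs two swaps, so the sign is unchanged.
print(sign_by_swaps((1, 2, 3)), sign_by_swaps(cyclic_shift((1, 2, 3))))        # prints: 1 1
# Four indices: a cyclic shift costs three swaps, so the sign flips.
print(sign_by_swaps((0, 1, 2, 3)), sign_by_swaps(cyclic_shift((0, 1, 2, 3))))  # prints: 1 -1
```

This is why the cyclicity shortcut from three dimensions must be dropped in four.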

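Example 7.5.2. Find the Hodge dual of $\gamma = e^1$ with respect to the Minkowski metric $\eta_{\mu\nu}$. To begin,
notice that $e^1$ has components $\gamma_\mu = \delta^1_\mu$ as is readily verified by the equation $e^1 = \sum_\mu \delta^1_\mu\, e^\mu$. Let’s
raise the index using $\eta$ as we learned previously,

$$\gamma^\mu = \sum_\nu \eta^{\mu\nu} \gamma_\nu = \sum_\nu \eta^{\mu\nu} \delta^1_\nu = \eta^{1\mu} = \delta^{1\mu}$$

Starting with the definition of Hodge duality we calculate

$$\begin{aligned}
\ast(e^1) &= \sum_{\alpha,\beta,\mu,\nu} \frac{1}{p!} \frac{1}{(n-p)!}\, \gamma^\mu\, \epsilon_{\mu\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= \sum_{\alpha,\beta,\mu,\nu} (1/6)\, \delta^{1\mu}\, \epsilon_{\mu\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= \sum_{\alpha,\beta,\nu} (1/6)\, \epsilon_{1\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= (1/6)\big[\epsilon_{1023}\, e^0 \wedge e^2 \wedge e^3 + \epsilon_{1230}\, e^2 \wedge e^3 \wedge e^0 + \epsilon_{1302}\, e^3 \wedge e^0 \wedge e^2 \\
&\qquad\quad + \epsilon_{1320}\, e^3 \wedge e^2 \wedge e^0 + \epsilon_{1203}\, e^2 \wedge e^0 \wedge e^3 + \epsilon_{1032}\, e^0 \wedge e^3 \wedge e^2\big] \\
&= (1/6)\big[-e^0 \wedge e^2 \wedge e^3 - e^2 \wedge e^3 \wedge e^0 - e^3 \wedge e^0 \wedge e^2 \\
&\qquad\quad + e^3 \wedge e^2 \wedge e^0 + e^2 \wedge e^0 \wedge e^3 + e^0 \wedge e^3 \wedge e^2\big] \\
&= -e^2 \wedge e^3 \wedge e^0 = -e^0 \wedge e^2 \wedge e^3.
\end{aligned} \tag{7.22}$$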

The difference between the three and four dimensional Hodge dual arises from two sources: first,
we are using the Minkowski metric so indices up or down makes a difference; second, the
antisymmetric symbol has more possibilities than before because the Greek indices take four values.

I suspect we can calculate the hodge dual by the following pattern: suppose we wish to find the
dual of 𝛼 where 𝛼 is a basis form for Λℝ4 with the Minkowski metric

1. begin with the top-form 𝑒0 ∧ 𝑒1 ∧ 𝑒2 ∧ 𝑒3

2. permute factors as needed to place 𝛼 to the left,

3. the form which remains to the right will be the hodge dual of 𝛼 if no 𝑒0 is in 𝛼 otherwise the
form to the right multiplied by −1 is ∗𝛼.
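The three-step pattern above can be sketched in Python for the basis forms of Λℝ⁴. This is a minimal illustration of the stated rule, not a derivation from the definition: the names `TOP`, `perm_sign`, `hodge_basis` and `show` are hypothetical, and a basis form is represented simply by the tuple of its indices in the order they are wedged.

```python
from itertools import combinations

TOP = (0, 1, 2, 3)  # indices of the top-form e^0 ^ e^1 ^ e^2 ^ e^3

def perm_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct indices."""
    inversions = sum(1 for a in range(len(seq))
                     for b in range(a + 1, len(seq)) if seq[a] > seq[b])
    return -1 if inversions % 2 else 1

def hodge_basis(I):
    """Apply the rule to e^I; return (sign, J) meaning *e^I = sign * e^J."""
    J = tuple(mu for mu in TOP if mu not in I)  # what remains to the right
    sign = perm_sign(tuple(I) + J)              # top-form = sign * e^I ^ e^J
    if 0 in I:
        sign = -sign                            # extra sign when e^0 appears in e^I
    return sign, J

def show(I):
    """Readable name for a basis form; the empty tuple is the constant 1."""
    return " ^ ".join(f"e{mu}" for mu in I) or "1"

for p in range(5):
    for I in combinations(TOP, p):
        sign, J = hodge_basis(I)
        print(f"*({show(I)}) = {'-' if sign < 0 else ''}{show(J)}")
```

For instance `hodge_basis((1,))` returns `(-1, (0, 2, 3))`, in agreement with the result of Example 7.5.2.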

Note this works for the previous example as follows:

1. begin with 𝑒0 ∧ 𝑒1 ∧ 𝑒2 ∧ 𝑒3

2. note 𝑒0 ∧ 𝑒1 ∧ 𝑒2 ∧ 𝑒3 = −𝑒1 ∧ 𝑒0 ∧ 𝑒2 ∧ 𝑒3 = 𝑒1 ∧ (−𝑒0 ∧ 𝑒2 ∧ 𝑒3 )

3. identify ∗𝑒1 = −𝑒0 ∧ 𝑒2 ∧ 𝑒3 (no extra sign since no 𝑒0 appears in 𝑒1 )

Follow the algorithm for finding the dual of 𝑒0 ,

1. begin with 𝑒0 ∧ 𝑒1 ∧ 𝑒2 ∧ 𝑒3

2. note 𝑒0 ∧ 𝑒1 ∧ 𝑒2 ∧ 𝑒3 = 𝑒0 ∧ (𝑒1 ∧ 𝑒2 ∧ 𝑒3 )

3. identify ∗𝑒0 = −𝑒1 ∧ 𝑒2 ∧ 𝑒3 ( added sign since 𝑒0 appears in form being hodge dualed)

Let’s check from the definition if my algorithm worked out right.

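Example 7.5.3. Find the Hodge dual of $\gamma = e^0$ with respect to the Minkowski metric $\eta_{\mu\nu}$. To begin,
notice that $e^0$ has components $\gamma_\mu = \delta^0_\mu$ as is readily verified by the equation $e^0 = \sum_\mu \delta^0_\mu\, e^\mu$. Let’s
raise the index using $\eta$ as we learned previously,

$$\gamma^\mu = \sum_\nu \eta^{\mu\nu} \gamma_\nu = \sum_\nu \eta^{\mu\nu} \delta^0_\nu = \eta^{\mu 0} = -\delta^{0\mu}$$

the minus sign is due to the Minkowski metric. Starting with the definition of Hodge duality we
calculate

$$\begin{aligned}
\ast(e^0) &= \sum_{\alpha,\beta,\mu,\nu} \frac{1}{p!} \frac{1}{(n-p)!}\, \gamma^\mu\, \epsilon_{\mu\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= \sum_{\alpha,\beta,\mu,\nu} -(1/6)\, \delta^{0\mu}\, \epsilon_{\mu\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= \sum_{\alpha,\beta,\nu} -(1/6)\, \epsilon_{0\nu\alpha\beta}\, e^\nu \wedge e^\alpha \wedge e^\beta \\
&= \sum_{i,j,k} -(1/6)\, \epsilon_{0ijk}\, e^i \wedge e^j \wedge e^k \\
&= \sum_{i,j,k} -(1/6)\, \epsilon_{ijk}\, \epsilon_{ijk}\, e^1 \wedge e^2 \wedge e^3 \qquad \leftarrow \text{sneaky step} \\
&= -e^1 \wedge e^2 \wedge e^3.
\end{aligned} \tag{7.23}$$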
Notice I am using the convention that Greek indices sum over 0, 1, 2, 3 whereas Latin indices sum
over 1, 2, 3.

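Example 7.5.4. Find the Hodge dual of $\gamma = e^0 \wedge e^1$ with respect to the Minkowski metric $\eta_{\mu\nu}$. To
begin, notice the following identity; it will help us find the components of $\gamma$:

$$e^0 \wedge e^1 = \sum_{\mu,\nu} \frac{1}{2}\, 2\, \delta^0_\mu \delta^1_\nu\, e^\mu \wedge e^\nu$$

now we antisymmetrize to get the components of the form,

$$e^0 \wedge e^1 = \sum_{\mu,\nu} \frac{1}{2}\, \delta^0_{[\mu} \delta^1_{\nu]}\, e^\mu \wedge e^\nu$$

where $\delta^0_{[\mu} \delta^1_{\nu]} = \delta^0_\mu \delta^1_\nu - \delta^0_\nu \delta^1_\mu$ and the factor of two is used up in the antisymmetrization. Let’s raise
the indices using $\eta$ as we learned previously,

$$\gamma^{\alpha\beta} = \sum_{\mu,\nu} \eta^{\alpha\mu} \eta^{\beta\nu} \gamma_{\mu\nu} = \sum_{\mu,\nu} \eta^{\alpha\mu} \eta^{\beta\nu}\, \delta^0_{[\mu} \delta^1_{\nu]} = \eta^{\alpha 0} \eta^{\beta 1} - \eta^{\beta 0} \eta^{\alpha 1} = -\delta^{[\alpha 0} \delta^{\beta] 1}$$

the minus sign is due to the Minkowski metric, since $\eta^{\alpha 0} = -\delta^{\alpha 0}$ while $\eta^{\beta 1} = \delta^{\beta 1}$. Starting with the definition of Hodge duality we
calculate

$$\begin{aligned}
\ast(e^0 \wedge e^1) &= \sum_{\alpha,\beta,\mu,\nu} \frac{1}{p!} \frac{1}{(n-p)!}\, \gamma^{\alpha\beta}\, \epsilon_{\alpha\beta\mu\nu}\, e^\mu \wedge e^\nu \\
&= \sum_{\alpha,\beta,\mu,\nu} (1/4) \left(-\delta^{[\alpha 0} \delta^{\beta] 1}\right) \epsilon_{\alpha\beta\mu\nu}\, e^\mu \wedge e^\nu \\
&= \sum_{\mu,\nu} -(1/4) \left(\epsilon_{01\mu\nu}\, e^\mu \wedge e^\nu - \epsilon_{10\mu\nu}\, e^\mu \wedge e^\nu\right) \\
&= \sum_{\mu,\nu} -(1/2)\, \epsilon_{01\mu\nu}\, e^\mu \wedge e^\nu \\
&= -(1/2) \left[\epsilon_{0123}\, e^2 \wedge e^3 + \epsilon_{0132}\, e^3 \wedge e^2\right] \\
&= -e^2 \wedge e^3
\end{aligned} \tag{7.24}$$

Note, the algorithm works out the same,

$$e^0 \wedge e^1 \wedge e^2 \wedge e^3 = \underbrace{e^0 \wedge e^1}_{\text{has } e^0} \wedge\, (e^2 \wedge e^3) \quad \Rightarrow \quad \ast(e^0 \wedge e^1) = -e^2 \wedge e^3$$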

The other Hodge duals of the basic two-forms follow by almost the same calculation. Let us make
a table of all the basic Hodge dualities in Minkowski space. I have grouped the terms to emphasize
the isomorphisms between the one-dimensional Λ⁰ℝ⁴ and Λ⁴ℝ⁴, the four-dimensional Λ¹ℝ⁴ and
Λ³ℝ⁴, and the six-dimensional Λ²ℝ⁴ and itself.

$$\begin{array}{ll}
\ast 1 = e^0 \wedge e^1 \wedge e^2 \wedge e^3 & \ast(e^0 \wedge e^1 \wedge e^2 \wedge e^3) = -1 \\[4pt]
\ast(e^1 \wedge e^2 \wedge e^3) = -e^0 & \ast e^0 = -e^1 \wedge e^2 \wedge e^3 \\
\ast(e^0 \wedge e^2 \wedge e^3) = -e^1 & \ast e^1 = -e^2 \wedge e^3 \wedge e^0 \\
\ast(e^0 \wedge e^3 \wedge e^1) = -e^2 & \ast e^2 = -e^3 \wedge e^1 \wedge e^0 \\
\ast(e^0 \wedge e^1 \wedge e^2) = -e^3 & \ast e^3 = -e^1 \wedge e^2 \wedge e^0 \\[4pt]
\ast(e^3 \wedge e^0) = e^1 \wedge e^2 & \ast(e^1 \wedge e^2) = -e^3 \wedge e^0 \\
\ast(e^1 \wedge e^0) = e^2 \wedge e^3 & \ast(e^2 \wedge e^3) = -e^1 \wedge e^0 \\
\ast(e^2 \wedge e^0) = e^3 \wedge e^1 & \ast(e^3 \wedge e^1) = -e^2 \wedge e^0
\end{array}$$

Notice that the dimension of Λℝ⁴ is 16 which we have
explained in depth in the previous section. Finally, it is useful to point out the three-dimensional
work and flux form mappings to provide some useful identities in this 1 + 3-dimensional setting.

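$$\ast\omega_{\vec v} = -e^0 \wedge \Phi_{\vec v} \qquad \ast\Phi_{\vec v} = e^0 \wedge \omega_{\vec v} \qquad \ast\left(e^0 \wedge \Phi_{\vec v}\right) = -\omega_{\vec v}$$

I leave verification of these formulas to the reader (use the table). Finally let us analyze the process
of taking two hodge duals in succession. In the context of ℝ³ we found that ∗∗𝛼 = 𝛼; we seek to
discern if a similar formula is available in the context of ℝ⁴ with the Minkowski metric. We can
calculate one type of example with the identities above:

$$\ast\omega_{\vec v} = -e^0 \wedge \Phi_{\vec v} \quad \Rightarrow \quad \ast\ast\omega_{\vec v} = -\ast\left(e^0 \wedge \Phi_{\vec v}\right) = \omega_{\vec v}$$

On the other hand, the table gives $\ast\ast 1 = \ast(e^0 \wedge e^1 \wedge e^2 \wedge e^3) = -1$ and $\ast\ast(e^1 \wedge e^2) = -\ast(e^3 \wedge e^0) = -e^1 \wedge e^2$. Evidently the sign depends on the degree of the form.

If we accept my algorithm then it’s not too hard to sort through using multi-index notation: since
hodge duality is linear it suffices to consider a basis element $e^I$ where $I$ is a multi-index of length $p$,

1. transpose dual vectors so that $e^0 \wedge e^1 \wedge e^2 \wedge e^3 = (-1)^N e^I \wedge e^J$. Moving the block $e^J$ past the block $e^I$ takes $p(4-p)$ transpositions, hence also $e^0 \wedge e^1 \wedge e^2 \wedge e^3 = (-1)^{N + p(4-p)}\, e^J \wedge e^I$.

2. if $0 \notin I$ then $\ast e^I = (-1)^N e^J$ and $0 \in J$ since $I \cup J = \{0, 1, 2, 3\}$. Take a second dual: since $0 \in J$ the algorithm gives $\ast e^J = -(-1)^{N + p(4-p)}\, e^I$. We find $\ast\ast e^I = -(-1)^{p(4-p)}\, e^I$ for all $I$ not containing the 0-index.

3. if $0 \in I$ then $\ast e^I = -(-1)^N e^J$ and $0 \notin J$ since $I \cup J = \{0, 1, 2, 3\}$. Take a second dual: since $0 \notin J$ the algorithm gives $\ast e^J = (-1)^{N + p(4-p)}\, e^I$. We find $\ast\ast e^I = -(-1)^{p(4-p)}\, e^I$ for all $I$ containing the 0-index.

4. since $p(4-p)$ is even when $p$ is even and odd when $p$ is odd, it follows that $\ast\ast\alpha = (-1)^{p+1} \alpha$ for all $\alpha \in \Lambda^p \mathbb{R}^4$ with the Minkowski metric. In other words, $\ast\ast\alpha = -\alpha$ for $p = 0, 2, 4$ whereas $\ast\ast\alpha = \alpha$ for $p = 1, 3$.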


To conclude, I would warn the reader that the results in this section pertain to our choice of notation
for ℝ⁴. Some other texts use a metric which is −𝜂 relative to our notation. This modifies many
signs in this section. See Misner, Thorne and Wheeler’s Gravitation or Bertlmann’s Anomalies in
Field Theory for further reading on Hodge duality and a more systematic explanation of how and
when these signs arise from the metric.

7.6 coordinate change


Suppose $V$ has two bases $\bar\beta = \{\bar f_1, \bar f_2, \dots, \bar f_n\}$ and $\beta = \{f_1, f_2, \dots, f_n\}$. If $v \in V$ then we can write
$v$ as a linear combination of the $\bar\beta$ basis or the $\beta$ basis:

$$v = x^1 f_1 + x^2 f_2 + \cdots + x^n f_n \qquad \text{and} \qquad v = \bar x^1 \bar f_1 + \bar x^2 \bar f_2 + \cdots + \bar x^n \bar f_n$$

Given the notation above, we define coordinate maps as follows:

$$\Phi_\beta(v) = (x^1, x^2, \dots, x^n) = x \qquad \text{and} \qquad \Phi_{\bar\beta}(v) = (\bar x^1, \bar x^2, \dots, \bar x^n) = \bar x$$

We sometimes use the notation $\Phi_\beta(v) = [v]_\beta = x$ whereas $\Phi_{\bar\beta}(v) = [v]_{\bar\beta} = \bar x$. A coordinate map
takes an abstract vector $v$ and maps it to a particular representative in $\mathbb{R}^n$. A natural question
to ask is how do different representatives compare? How do $x$ and $\bar x$ compare in our current
notation? Because the coordinate maps are isomorphisms it follows that $\Phi_\beta \circ \Phi_{\bar\beta}^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is an
isomorphism and given the domain and codomain we can write its formula via matrix multiplication:

$$(\Phi_\beta \circ \Phi_{\bar\beta}^{-1})(u) = P u \quad \Rightarrow \quad (\Phi_\beta \circ \Phi_{\bar\beta}^{-1})(\bar x) = P \bar x$$

However, $\Phi_{\bar\beta}^{-1}(\bar x) = v$ hence $\Phi_\beta(v) = P \bar x$ and consequently, $x = P \bar x$. Conversely, to switch to
barred coordinates we multiply the coordinate vectors by $P^{-1}$; $\bar x = P^{-1} x$.

Continuing this discussion we turn to the dual space. Suppose $\bar\beta^* = \{\bar f^j\}_{j=1}^n$ is dual to $\bar\beta = \{\bar f_j\}_{j=1}^n$
and $\beta^* = \{f^j\}_{j=1}^n$ is dual to $\beta = \{f_j\}_{j=1}^n$. By definition we are given that $f^j(f_i) = \delta_{ij}$ and
$\bar f^j(\bar f_i) = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$. Suppose $\alpha \in V^*$ is a dual vector with components $\alpha_j$ with respect
to the $\beta^*$ basis and components $\bar\alpha_j$ with respect to the $\bar\beta^*$ basis. In particular this means we can
either write $\alpha = \sum_{j=1}^n \alpha_j f^j$ or $\alpha = \sum_{j=1}^n \bar\alpha_j \bar f^j$. Likewise, given a vector $v \in V$ we can either write
$v = \sum_{i=1}^n x^i f_i$ or $v = \sum_{i=1}^n \bar x^i \bar f_i$. With these notations in mind calculate:

$$\alpha(v) = \Big(\sum_{j=1}^n \alpha_j f^j\Big)\Big(\sum_{i=1}^n x^i f_i\Big) = \sum_{i,j=1}^n \alpha_j x^i f^j(f_i) = \sum_{i=1}^n \sum_{j=1}^n \alpha_j x^i \delta_{ij} = \sum_{i=1}^n \alpha_i x^i$$

and by the same calculation in the barred coordinates we find, $\alpha(v) = \sum_{i=1}^n \bar\alpha_i \bar x^i$. Therefore,

$$\sum_{i=1}^n \alpha_i x^i = \sum_{i=1}^n \bar\alpha_i \bar x^i.$$

Recall, $x = P \bar x$. In components, $x^i = \sum_{k=1}^n P^i_k \bar x^k$. Substituting,

$$\sum_{i=1}^n \sum_{k=1}^n \alpha_i P^i_k \bar x^k = \sum_{i=1}^n \bar\alpha_i \bar x^i.$$

But, this formula holds for all possible vectors $v$ and hence all possible coordinate vectors $\bar x$. If we
consider $v = \bar f_j$ then $\bar x^i = \delta_{ij}$ hence $\sum_{i=1}^n \bar\alpha_i \bar x^i = \sum_{i=1}^n \bar\alpha_i \delta_{ij} = \bar\alpha_j$. Moreover, $\sum_{i=1}^n \sum_{k=1}^n \alpha_i P^i_k \bar x^k =
\sum_{i=1}^n \sum_{k=1}^n \alpha_i P^i_k \delta_{kj} = \sum_{i=1}^n \alpha_i P^i_j$. Thus, $\bar\alpha_j = \sum_{i=1}^n P^i_j \alpha_i$. Compare how vectors and dual vectors
transform:

$$\bar\alpha_j = \sum_{i=1}^n P^i_j\, \alpha_i \qquad \text{versus} \qquad \bar x^j = \sum_{i=1}^n (P^{-1})^j_i\, x^i.$$
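A small numerical illustration of the two transformation rules may help. The sketch below uses exact rational arithmetic on a 2 × 2 example; the particular matrix `P`, the components `x` and `alpha`, and the helper names are all hypothetical values chosen for illustration. The convention `P[i][k]` for $P^i_k$ (row index up, column index down) is an assumption consistent with $x = P\bar x$.

```python
from fractions import Fraction as F

# Hypothetical change-of-coordinates matrix with x = P x_bar; P[i][k] stands for P^i_k.
P = [[F(2), F(1)],
     [F(3), F(2)]]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    """Matrix times column vector."""
    return [sum(M[i][k] * x[k] for k in range(2)) for i in range(2)]

def transpose(M):
    return [[M[k][i] for k in range(2)] for i in range(2)]

x = [F(3), F(-1)]       # components x^i of a vector v in the beta basis
alpha = [F(5), F(2)]    # components alpha_i of a dual vector in the beta* basis

x_bar = matvec(inverse_2x2(P), x)        # x_bar^j = sum_i (P^{-1})^j_i x^i
alpha_bar = matvec(transpose(P), alpha)  # alpha_bar_j = sum_i P^i_j alpha_i

pairing = sum(a * c for a, c in zip(alpha, x))
pairing_bar = sum(a * c for a, c in zip(alpha_bar, x_bar))
print(pairing, pairing_bar)  # the number alpha(v) is the same in both coordinate systems
```

The components change in opposite ways (one with $P^{-1}$, the other with the transpose of $P$), and that is exactly what keeps $\alpha(v)$ coordinate-independent.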

It is customary to use lower-indices on the components of dual-vectors and upper-indices on the
components of vectors: we say $x = \sum_{i=1}^n x^i e_i \in \mathbb{R}^n$ has contravariant components whereas
$\alpha = \sum_{j=1}^n \alpha_j e^j \in (\mathbb{R}^n)^*$ has covariant components. These terms arise from the coordinate
change properties we derived in this section. The convenience of the up/down index notation will
be more apparent as we continue our study to more complicated objects. It is interesting to note
the basis elements transform inversely:

$$\bar f^j = \sum_{i=1}^n (P^{-1})^j_i\, f^i \qquad \text{versus} \qquad \bar f_j = \sum_{i=1}^n P^i_j\, f_i.$$

The formulas above can be derived by arguments similar to those we already gave in this section,
however I think it may be more instructive to see how these rules work in concert:

$$\begin{aligned}
v = \sum_{i=1}^n \bar x^i \bar f_i &= \sum_{i=1}^n \sum_{j=1}^n (P^{-1})^i_j\, x^j \bar f_i \\
&= \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n (P^{-1})^i_j\, x^j P^k_i f_k \\
&= \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n (P^{-1})^i_j\, P^k_i\, x^j f_k \\
&= \sum_{j=1}^n \sum_{k=1}^n \delta^k_j\, x^j f_k \\
&= \sum_{k=1}^n x^k f_k.
\end{aligned} \tag{7.25}$$

7.6.1 coordinate change for $T^0_2(V)$


For an abstract vector space, or for $\mathbb{R}^n$ with a nonstandard basis, we have to replace $v, w$ with their
coordinate vectors. If $V$ has basis $\bar\beta = \{\bar f_1, \bar f_2, \dots, \bar f_n\}$ with dual basis $\bar\beta^* = \{\bar f^1, \bar f^2, \dots, \bar f^n\}$ and
$v, w$ have coordinate vectors $\bar x, \bar y$ (which means $v = \sum_{i=1}^n \bar x^i \bar f_i$ and $w = \sum_{i=1}^n \bar y^i \bar f_i$) then,

$$b(v, w) = \sum_{i,j=1}^n \bar x^i \bar y^j \bar B_{ij} = \bar x^T \bar B\, \bar y$$

where $\bar B_{ij} = b(\bar f_i, \bar f_j)$. If $\beta = \{f_1, f_2, \dots, f_n\}$ is another basis on $V$ with dual basis $\beta^*$ then we
define $B_{ij} = b(f_i, f_j)$ and we have

$$b(v, w) = \sum_{i,j=1}^n x^i y^j B_{ij} = x^T B y.$$

Recall that $\bar f_i = \sum_{k=1}^n P^k_i f_k$. With this in mind calculate:

$$\bar B_{ij} = b(\bar f_i, \bar f_j) = b\Big(\sum_{k=1}^n P^k_i f_k,\ \sum_{l=1}^n P^l_j f_l\Big) = \sum_{k,l=1}^n P^k_i P^l_j\, b(f_k, f_l) = \sum_{k,l=1}^n P^k_i P^l_j B_{kl}$$

We find the components of a bilinear map transform as follows:

$$\bar B_{ij} = \sum_{k,l=1}^n P^k_i P^l_j B_{kl}$$
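As a hedged illustration, the transformation rule can be checked on a small example in Python. The matrices `P` and `B` and the coordinate vectors below are hypothetical numbers; `P[k][i]` stands for $P^k_i$, so the rule reads $\bar B = P^T B P$ in matrix form.

```python
n = 2

# Hypothetical data: P[k][i] stands for P^k_i (so x = P x_bar), B[k][l] = b(f_k, f_l).
P = [[2, 1],
     [3, 2]]
B = [[1, 2],
     [0, 3]]

# B_bar_ij = sum_{k,l} P^k_i P^l_j B_kl, written exactly as the index formula.
B_bar = [[sum(P[k][i] * P[l][j] * B[k][l] for k in range(n) for l in range(n))
          for j in range(n)] for i in range(n)]

def bilinear(M, u, w):
    """Evaluate u^T M w."""
    return sum(u[i] * M[i][j] * w[j] for i in range(n) for j in range(n))

def matvec(M, u):
    return [sum(M[i][k] * u[k] for k in range(n)) for i in range(n)]

x_bar, y_bar = [1, 2], [-1, 1]               # barred coordinates of v and w
x, y = matvec(P, x_bar), matvec(P, y_bar)    # unbarred coordinates via x = P x_bar

value = bilinear(B, x, y)
value_bar = bilinear(B_bar, x_bar, y_bar)
print(B_bar, value, value_bar)  # b(v, w) does not depend on the coordinates used
```

Both lower indices of $B$ pick up a factor of $P$, in contrast to the single factor of $P^{-1}$ on the components of a vector.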

XXX- include general coordinate change and metrics with sylvester’s theorem.
Chapter 8

manifold theory

In this chapter I intend to give you a fairly accurate account of the modern definition of a manifold1 .
In a nutshell, a manifold is simply a set which allows for calculus locally. Alternatively, many people
say that a manifold is simply a set which is locally ”flat”, or it locally ”looks like ℝ𝑛 ”. This covers
most of the objects you’ve seen in calculus III. However, the technical details most closely resemble
the parametric view-point.

1
the definitions we follow are primarily taken from Burns and Gidea’s Differential Geometry and Topology With
a View to Dynamical Systems, I like their notation, but you should understand this definition is known to many
authors


8.1 manifolds

Definition 8.1.1.
We define a smooth manifold of dimension 𝑚 as follows: suppose we are given a set 𝑀 ,
a collection of open subsets 𝑈𝑖 of ℝ𝑚 , and a collection of mappings 𝜙𝑖 : 𝑈𝑖 ⊆ ℝ𝑚 → 𝑉𝑖 ⊆ 𝑀
which satisfies the following three criteria:

1. each map 𝜙𝑖 : 𝑈𝑖 → 𝑉𝑖 is injective

2. if 𝑉𝑖 ∩ 𝑉𝑗 ∕= ∅ then there exists a smooth mapping

$$\theta_{ij} : \phi_j^{-1}(V_i \cap V_j) \to \phi_i^{-1}(V_i \cap V_j)$$

such that 𝜙𝑗 = 𝜙𝑖 ∘ 𝜃𝑖𝑗

3. 𝑀 = ∪𝑖 𝜙𝑖 (𝑈𝑖 )

Moreover, we call the mappings $\phi_i$ the local parametrizations or patches of $M$ and the
space $U_i$ is called the parameter space. The range $V_i$ together with the inverse $\phi_i^{-1}$ is
called a coordinate chart on $M$. The component functions of a chart $(V, \phi^{-1})$ are usually
denoted $\phi^{-1} = (x^1, x^2, \dots, x^m)$ where $x^j : V \to \mathbb{R}$ for each $j = 1, 2, \dots, m$.

We could add to this definition that 𝑖 is taken from an index set ℐ (which could be an infinite
set). The union given in criterion (3.) is called a covering of 𝑀 . Most often, we deal with finitely
covered manifolds. You may recall that there are infinitely many ways to parametrize the lines
or surfaces we dealt with in calculus III. The story here is no different. It follows that when we
consider classification of manifolds the definition we just offered is a bit lacking. We would also like
to lump in all other possible compatible parametrizations. In short, the definition we gave says a
manifold is a set together with an atlas of compatible charts. If we take that atlas and adjoin
to it all possible compatible charts then we obtain the so-called maximal atlas which defines a
differentiable structure on the set 𝑀 . Many other authors define a manifold as a set together
with a differentiable structure. That said, our less ambitious definition will do.

We should also note that $\theta_{ij} = \phi_i^{-1} \circ \phi_j$ hence $\theta_{ij}^{-1} = (\phi_i^{-1} \circ \phi_j)^{-1} = \phi_j^{-1} \circ \phi_i = \theta_{ji}$. The functions
$\theta_{ij}$ are called the transition functions of $M$. These explain how we change coordinates locally.

I now offer a few examples so you can appreciate how general this definition is, in contrast to the
level-set definition we explored previously. We will recover those as examples of this more general
definition later in this chapter.
Example 8.1.2. Let 𝑀 = ℝᵐ and suppose the identity mapping 𝜙 : ℝᵐ → ℝᵐ ( 𝜙(𝑢) = 𝑢 for
all 𝑢 ∈ ℝᵐ ) defines the collection of parametrizations on 𝑀 . In this case the collection is just one
mapping and 𝑈 = 𝑉 = ℝᵐ; clearly 𝜙 is injective and 𝑉 covers ℝᵐ. The remaining overlap criterion
is trivially satisfied since there is only one patch to consider.

Example 8.1.3. Let 𝑈 = ℝᵐ and suppose 𝑝𝑜 ∈ ℝᵖ; then 𝜙 : 𝑈 → ℝᵖ × ℝᵐ defined by 𝜙(𝑢) = 𝑝𝑜 × 𝑢
makes 𝜙(𝑈 ) = 𝑀 an 𝑚-dimensional manifold. Again we have no overlap and the covering criterion
is clearly satisfied so that leaves injectivity of 𝜙. Note 𝜙(𝑢) = 𝜙(𝑢′ ) implies 𝑝𝑜 × 𝑢 = 𝑝𝑜 × 𝑢′ hence
𝑢 = 𝑢′ .
Example 8.1.4. Suppose 𝑉 is an 𝑚-dimensional vector space over ℝ with basis 𝛽 = {𝑒1 , 𝑒2 , . . . , 𝑒𝑚 }.
Define 𝜙 : ℝ𝑚 → 𝑉 as follows, for each 𝑢 = (𝑢1 , 𝑢2 , . . . , 𝑢𝑚 ) ∈ ℝ𝑚

𝜙(𝑢) = 𝑢1 𝑒1 + 𝑢2 𝑒2 + ⋅ ⋅ ⋅ + 𝑢𝑚 𝑒𝑚 .

Injectivity of the map follows from the linear independence of 𝛽. The overlap criteria is trivially
satisfied. Moreover, 𝑠𝑝𝑎𝑛(𝛽) = 𝑉 thus we know that 𝜙(ℝ𝑚 ) = 𝑉 which means the vector space is
covered. All together we find 𝑉 is an 𝑚-dimensional manifold. Notice that the inverse of 𝜙 is the
coordinate mapping Φ𝛽 from our earlier work and so we find the coordinate chart is a coordinate
mapping in the context of a vector space. Of course, this is a very special case since most manifolds
are not spanned by a basis.
You might notice that there seems to be little contact with criterion two in the examples above.
These are rather special cases in truth. When we deal with curved manifolds we cannot avoid it
any longer. I should mention we can (and often do) consider other coordinate systems on ℝᵐ.
Moreover, in the context of a vector space we also have infinitely many coordinate systems to
use. We will have to analyze compatibility of those new coordinates as we adjoin them. For the
vector space it’s simple to see the transition maps are smooth since they’ll just be invertible linear
mappings. On the other hand, it is more work to show new curvilinear coordinates on ℝᵐ are
compatible with Cartesian coordinates. The inverse function theorem would likely be needed.
Example 8.1.5. Let 𝑀 = {(cos(𝜃), sin(𝜃)) ∣ 𝜃 ∈ (0, 2𝜋)}, the unit circle with the point (1, 0) removed so that the two patches below cover 𝑀 . Define 𝜙1 (𝑢) = (cos(𝑢), sin(𝑢)) for all
𝑢 ∈ (0, 3𝜋/2) = 𝑈1 . Also, define 𝜙2 (𝑣) = (cos(𝑣), sin(𝑣)) for all 𝑣 ∈ (𝜋, 2𝜋) = 𝑈2 . Injectivity
follows from the basic properties of sine and cosine and covering follows from the obvious geometry
of these mappings. However, overlap we should check. Let 𝑉1 = 𝜙1 (𝑈1 ) and 𝑉2 = 𝜙2 (𝑈2 ). Note
𝑉1 ∩ 𝑉2 = {(cos(𝜃), sin(𝜃)) ∣ 𝜋 < 𝜃 < 3𝜋/2}. We need to find the formula for

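$$\theta_{12} : \phi_2^{-1}(V_1 \cap V_2) \to \phi_1^{-1}(V_1 \cap V_2)$$

In this example, this means

$$\theta_{12} : (\pi, 3\pi/2) \to (\pi, 3\pi/2)$$

and, since $\phi_1$ and $\phi_2$ are given by the same formula, $\theta_{12}(v) = \phi_1^{-1}(\phi_2(v)) = v$, which is certainly smooth.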
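The overlap map of this example can be sketched numerically. In the Python snippet below the helpers `angle`, `phi1_inv` and `theta12` are hypothetical names; the inverse of the first patch is computed with `math.atan2`, which is an implementation choice for this illustration and not something taken from the notes.

```python
import math

def phi1(u):
    """First patch, intended for u in (0, 3*pi/2)."""
    return (math.cos(u), math.sin(u))

def phi2(v):
    """Second patch, intended for v in (pi, 2*pi)."""
    return (math.cos(v), math.sin(v))

def angle(point):
    """Angle in [0, 2*pi) of a point on the unit circle."""
    x, y = point
    return math.atan2(y, x) % (2 * math.pi)

def phi1_inv(point):
    """Chart for the first patch; only defined on V1 = phi1(U1)."""
    a = angle(point)
    if not 0 < a < 1.5 * math.pi:
        raise ValueError("point is not in V1")
    return a

def theta12(v):
    """Transition function phi1^{-1} o phi2 on the overlap (pi, 3*pi/2)."""
    return phi1_inv(phi2(v))

for v in (1.1 * math.pi, 1.25 * math.pi, 1.4 * math.pi):
    print(v, theta12(v))  # the two numbers agree up to rounding
```

Trying `theta12(1.8 * math.pi)` raises an error, which reflects the fact that such a point lies in 𝑉2 but not in 𝑉1.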

Example 8.1.6. Let’s return to the vector space example. This time we want to allow for all
possible coordinate systems. Once more suppose 𝑉 is an 𝑚-dimensional vector space over ℝ. For
each basis 𝛽 = {𝑒1 , 𝑒2 , . . . , 𝑒𝑚 } define 𝜙𝛽 : ℝ𝑚 → 𝑉 as follows, for each 𝑢 = (𝑢1 , 𝑢2 , . . . , 𝑢𝑚 ) ∈ ℝ𝑚

𝜙𝛽 (𝑢) = 𝑢1 𝑒1 + 𝑢2 𝑒2 + ⋅ ⋅ ⋅ + 𝑢𝑚 𝑒𝑚 .

Suppose $\beta, \beta'$ are bases for $V$ which define local parametrizations $\phi_\beta, \phi_{\beta'}$ respectively. The transition
functions $\theta : \mathbb{R}^m \to \mathbb{R}^m$ are given by

$$\theta = \phi_\beta^{-1} \circ \phi_{\beta'}$$

Note 𝜃 is the composition of linear mappings and is therefore a linear mapping on ℝ𝑚 . It follows
that 𝜃(𝑥) = 𝑃 𝑥 for some 𝑃 ∈ 𝐺𝐿(𝑚) = {𝑋 ∈ ℝ 𝑚×𝑚 ∣ 𝑑𝑒𝑡(𝑋) ∕= 0}. It follows that 𝜃 is a smooth
mapping since each component function of 𝜃 is simply a linear combination of the variables in ℝ𝑚 .
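Let’s take a moment to connect with linear algebra notation. If $\theta = \phi_\beta^{-1} \circ \phi_{\beta'}$ then $\theta \circ \phi_{\beta'}^{-1} = \phi_\beta^{-1}$
hence $\theta \circ \Phi_{\beta'} = \Phi_\beta$ as we used $\Phi_\beta : V \to \mathbb{R}^m$ as the coordinate chart in linear algebra and $\phi_\beta^{-1} = \Phi_\beta$.
Thus, $\theta(\Phi_{\beta'}(v)) = \Phi_\beta(v)$ implies $P [v]_{\beta'} = [v]_\beta$. This matrix $P$ is the coordinate change matrix from
linear algebra.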

The contrast of Examples 8.1.4 and 8.1.6 stems from the allowed coordinate systems. In Example
8.1.4 we had just one coordinate system whereas in Example 8.1.6 we allowed infinitely many. We
could construct other manifolds over the set 𝑉 . We could take all coordinate systems that are of a
particular type. If 𝑉 = ℝ𝑚 then it is often interesting to consider only those coordinate systems for
which the Pythagorean theorem holds true, such coordinates have transition functions in the group
of orthogonal transformations. Or, if 𝑉 = ℝ4 then we might want to consider only inertially related
coordinates. Inertially related coordinates on ℝ4 preserve the interval defined by the Minkowski
product and the transition functions form a group of Lorentz transformations. Orthogonal matrices
and Lorentz matrices are simply the matrices of the aptly named transformations. In my opinion
this is one nice feature of saving the maximal atlas concept for the differentiable structure. Manifolds
as we have defined them give us a natural mathematical context to restrict the choice of coordinates.
From the viewpoint of physics, the maximal atlas contains many coordinate systems which are
unnatural for physics. Of course, it is possible to take a given theory of physics and translate
physically natural equations into less natural equations in non-standard coordinates. For example,
look up how Newton’s simple equation 𝐹⃗ = 𝑚⃗𝑎 is translated into rotating coordinate systems.

8.1.1 embedded manifolds


The manifolds in Examples 8.1.2, 8.1.3 and 8.1.5 were all defined as subsets of euclidean space.
Generally, if a manifold is a subset of Euclidean space ℝ𝑛 then we say the manifold is embedded
in ℝ𝑛 . In contrast, Examples 8.1.4 and 8.1.6 are called abstract manifolds since the points in the
manifold were not found in Euclidean space2 . If you are only interested in embedded manifolds3
then the definition is less abstract:

Definition 8.1.7. embedded manifold.

We say ℳ is a smooth embedded manifold of dimension 𝑚 iff we are given a set
ℳ ⊆ ℝⁿ such that for each 𝑝 ∈ ℳ there is an open subset 𝑈𝑖 of ℝᵐ and an open subset
𝑉𝑖 of ℳ containing 𝑝 such that the mapping 𝜙𝑖 : 𝑈𝑖 ⊆ ℝᵐ → 𝑉𝑖 ⊆ ℳ satisfies the following
criteria:

1. each map 𝜙𝑖 : 𝑈𝑖 → 𝑉𝑖 is injective

2. each map 𝜙𝑖 : 𝑈𝑖 → 𝑉𝑖 is smooth

3. each map $\phi_i^{-1} : V_i \to U_i$ is continuous

4. the differential $d(\phi_i)_x$ has rank 𝑚 for each 𝑥 ∈ 𝑈𝑖 .

You may identify that this definition more closely resembles the parametrized objects from your
multivariate calculus course. There are two key differences with this definition:

1. the set 𝑉𝑖 is assumed to be ”open in ℳ” where ℳ ⊆ ℝⁿ. This means that for each point 𝑝 ∈ 𝑉𝑖
there exists an open 𝑛-ball 𝐵 ⊂ ℝⁿ containing 𝑝 such that 𝐵 ∩ ℳ ⊆ 𝑉𝑖 . This is called the subspace
topology for ℳ induced from the euclidean topology of ℝ𝑛 . No topological assumptions were
given for 𝑉𝑖 in the abstract definition. In practice, for the abstract case, we use the charts
to lift open sets to ℳ, we need not assume any topology on ℳ since the machinery of the
manifold allows us to build our own. However, this can lead to some pathological cases so
those cases are usually ruled out by stating that our manifold is Hausdorff and the covering
has a countable basis of open sets4 . I will leave it at that since this is not a topology course.

2. the condition that the inverse of the local parametrization be continuous and 𝜙𝑖 be smooth
were not present in the abstract definition. Instead, we assumed smoothness of the transition
functions.

One can prove that the embedded manifold of Definition 8.1.7 is simply a subcase of the abstract
manifold given by Definition 8.1.1. See Munkres Theorem 24.1 where he shows the transition
2
a vector space could be euclidean space, but it could also be a set of polynomials, operators or a lot of other
rather abstract objects.
3
The definition I gave for embedded manifold here is mostly borrowed from Munkres’ excellent text Analysis on
Manifolds where he primarily analyzes embedded manifolds
4
see Burns and Gidea page 11 in Differential Geometry and Topology With a View to Dynamical Systems

functions of an embedded manifold are smooth. In fact, his theorem is given for the case of a
manifold with boundary which adds a few complications to the discussion. We’ll discuss manifolds
with boundary at the conclusion of this chapter.

Example 8.1.8. A line is a one dimensional manifold with a global coordinate patch:

𝜙(𝑡) = ⃗𝑟𝑜 + 𝑡⃗𝑣

for all 𝑡 ∈ ℝ. We can think of this as the mapping which takes the real line and glues it in ℝ𝑛 along
some line which points in the direction ⃗𝑣 and the new origin is at ⃗𝑟𝑜 . In this case 𝜙 : ℝ → ℝ𝑛 and
𝑑𝜙𝑡 has matrix ⃗𝑣 which has rank one iff ⃗𝑣 ∕= 0.

Example 8.1.9. A plane is a two dimensional manifold with a global coordinate patch: suppose
$\vec A, \vec B$ are any two linearly independent vectors in the plane, and $\vec r_o$ is a particular point in the plane,

$$\phi(u, v) = \vec r_o + u \vec A + v \vec B$$

for all $(u, v) \in \mathbb{R}^2$. This amounts to pasting a copy of the $xy$-plane in $\mathbb{R}^n$ where we moved the
origin to $\vec r_o$. If we just wanted a little parallelogram then we could restrict $(u, v) \in [0, 1] \times [0, 1]$,
then we would envision that the unit-square has been pasted on to a parallelogram. Lengths and
angles need not be maintained in this process of gluing. Note that the rank two condition for $d\phi$ says
the derivative $\phi'(u, v) = \left[\frac{\partial \phi}{\partial u} \,\Big|\, \frac{\partial \phi}{\partial v}\right] = [\vec A \,|\, \vec B]$ must have rank two. But, this amounts to insisting the
vectors $\vec A, \vec B$ are linearly independent. In the case of $\mathbb{R}^3$ this is conveniently tested by computation
of $\vec A \times \vec B$ which happens to be the normal to the plane.
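For planes in ℝ³ the cross-product test can be sketched in Python as follows. The helper names `cross`, `is_regular` and `plane_patch`, as well as the sample vectors, are hypothetical and serve only to illustrate the rank-two condition.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_regular(A, B):
    """[A|B] has rank two exactly when A x B is not the zero vector."""
    return any(c != 0 for c in cross(A, B))

def plane_patch(r0, A, B):
    """Return the patch phi(u, v) = r0 + u*A + v*B as a Python function."""
    def phi(u, v):
        return tuple(r0[i] + u * A[i] + v * B[i] for i in range(3))
    return phi

A, B = (1, 0, 2), (0, 1, -1)
print(cross(A, B), is_regular(A, B))      # nonzero normal vector, so the patch is regular
print(is_regular((1, 2, 3), (2, 4, 6)))   # parallel vectors: the rank drops to one

phi = plane_patch((1, 1, 1), A, B)
print(phi(2, 3))
```

The vector returned by `cross(A, B)` is orthogonal to both 𝐴 and 𝐵, which is why it serves as the normal to the plane.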

Example 8.1.10. A cone is almost a manifold, define

𝜙(𝑡, 𝑧) = (𝑧 cos(𝑡), 𝑧 sin(𝑡), 𝑧)

for 𝑡 ∈ [0, 2𝜋] and 𝑧 ≥ 0. What two problems does this potential coordinate patch 𝜙 : 𝑈 ⊆ ℝ2 → ℝ3
suffer from? Can you find a modification of 𝑈 which makes 𝜙(𝑈 ) a manifold (it could be a subset
of what we call a cone)

The cone is not a manifold because of its point. Generally a space which is mostly like a manifold
except at a finite, or discrete, number of singular points is called an orbifold. Recently, in the
past decade or two, the orbifold has been used in string theory. The singularities can be used to
fit various charge to fields through a mathematical process called the blow-up.

Example 8.1.11. Let 𝜙(𝜃, 𝛾) = (cos(𝜃) cosh(𝛾), sin(𝜃) cosh(𝛾), sinh(𝛾)) for 𝜃 ∈ (0, 2𝜋) and 𝛾 ∈ ℝ.
This gives us a patch on the hyperboloid 𝑥² + 𝑦² − 𝑧² = 1.

Example 8.1.12. Let 𝜙(𝑥, 𝑦, 𝑧, 𝑡) = (𝑥, 𝑦, 𝑧, 𝑅 cos(𝑡), 𝑅 sin(𝑡)) for 𝑡 ∈ (0, 2𝜋) and (𝑥, 𝑦, 𝑧) ∈ ℝ3 .
This gives a copy of ℝ3 inside ℝ5 where a circle has been attached at each point of space in the two
transverse directions of ℝ5 . You could imagine that 𝑅 is nearly zero so we cannot traverse these
extra dimensions.

Example 8.1.13. The following patch describes the Mobius band which is obtained by gluing a
line segment to each point along a circle. However, these segments twist as you go around the circle
and the structure of this manifold is less trivial than those we have thus far considered. The mobius
band is an example of a manifold which is not oriented. This means that there is not a well-defined
normal vectorfield over the surface. The patch is:
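$$\phi(t, \lambda) = \left( \left[1 + \tfrac{1}{2} \lambda \cos(\tfrac{t}{2})\right] \cos(t),\ \left[1 + \tfrac{1}{2} \lambda \sin(\tfrac{t}{2})\right] \sin(t),\ \tfrac{1}{2} \lambda \sin(\tfrac{t}{2}) \right)$$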

for 0 ≤ 𝑡 ≤ 2𝜋 and −1 ≤ 𝜆 ≤ 1. To understand this mapping better try studying the map evaluated
at various values of 𝑡;

𝜙(0, 𝜆) = (1 + 𝜆/2, 0, 0), 𝜙(𝜋, 𝜆) = (−1, 0, 𝜆/2), 𝜙(2𝜋, 𝜆) = (1 − 𝜆/2, 0, 0)

Notice the line segment parametrized by 𝜙(0, 𝜆) and 𝜙(2𝜋, 𝜆) is the same set of points, however the
orientation is reversed.
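The three evaluations above, and the reversal of orientation, can be checked numerically. The Python sketch below transcribes the patch as it is printed in these notes; the function name `mobius` and the tolerance are assumptions made for illustration.

```python
import math

def mobius(t, lam):
    """The patch phi(t, lambda) as printed above, for 0 <= t <= 2*pi and -1 <= lambda <= 1."""
    x = (1 + 0.5 * lam * math.cos(t / 2)) * math.cos(t)
    y = (1 + 0.5 * lam * math.sin(t / 2)) * math.sin(t)
    z = 0.5 * lam * math.sin(t / 2)
    return (x, y, z)

def close(p, q, tol=1e-12):
    """Componentwise comparison of two points up to rounding error."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
    # the segment at t = 2*pi is the segment at t = 0 traversed backwards
    print(lam, close(mobius(0.0, lam), mobius(2 * math.pi, -lam)))
```

Each line should report agreement, since 𝜙(0, 𝜆) = (1 + 𝜆/2, 0, 0) and 𝜙(2𝜋, −𝜆) = (1 + 𝜆/2, 0, 0) up to rounding.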

Example 8.1.14. A regular surface is a two-dimensional manifold embedded in ℝ³. We need
$\phi_i : U_i \subseteq \mathbb{R}^2 \to S \subset \mathbb{R}^3$ such that, for each $i$, $d(\phi_i)_{(u,v)}$ has rank two for all $(u, v) \in U_i$. Moreover, in
this case we can define a normal vector field 𝑁 (𝑢, 𝑣) = ∂𝑢 𝜙 × ∂𝑣 𝜙 and if we visualize these vectors
as attached to the surface they will point in or out of the surface and provide the normal to the
tangent plane at the point considered. The surface 𝑆 is called orientable iff the normal vector field
is non-vanishing on 𝑆. XXX, clean up normal idea

Example 8.1.15. Graphs

Example 8.1.16. (𝑛 − 𝑝)-dimensional level set.

8.1.2 manifolds defined by charts


Given patches which define a manifold you can derive charts, conversely, given properly constructed
charts you can just as well find patches and hence the manifold structure. I include these examples
to illustrate why charts are a nice starting point for certain examples. Recall that if 𝜙 : 𝑈 ⊆ ℝ𝑚 →
𝑉 ⊆ 𝑀 is a local parametrization then 𝜙−1 : 𝑉 → 𝑈 is called a local coordinate chart. We seek
to translate the definition which we gave in patch-notation into chart notation for this subsection.
Throughout this subsection we denote $\phi_i^{-1} = \chi_i$; then we must insist, for all 𝑖, that 𝜒𝑖 : 𝑉𝑖 → 𝑈𝑖 , where
𝑈𝑖 is open in ℝ𝑚 , satisfies

1. each map 𝜒𝑖 : 𝑉𝑖 → 𝑈𝑖 is bijective

2. if 𝑉𝑖 ∩ 𝑉𝑗 ∕= ∅ then there exists a smooth mapping

𝜃𝑖𝑗 : 𝜒𝑗 (𝑉𝑖 ∩ 𝑉𝑗 ) → 𝜒𝑖 (𝑉𝑖 ∩ 𝑉𝑗 )

such that $\theta_{ij} = \chi_i \circ \chi_j^{-1}$.

3. ℳ = ∪𝑖 𝑉𝑖

For convenience of discussion we suppose the local parametrizations are also bijective. There is
no loss of generality since we can always make an injective map bijective by simply shrinking the
codomain. The original definition only assumed injectivity since that was sufficient. Now that we
talk about inverse mappings it is convenient to add the supposition of surjectivity.

Example 8.1.17. Let ℳ = {(𝑥, 𝑦) ∈ ℝ2 ∣ 𝑥2 + 𝑦 2 = 1}.

1. Let 𝑉+ = {(𝑥, 𝑦) ∈ ℳ ∣ 𝑦 > 0} = 𝑑𝑜𝑚(𝜒+ ) and define 𝜒+ (𝑥, 𝑦) = 𝑥

2. Let 𝑉− = {(𝑥, 𝑦) ∈ ℳ ∣ 𝑦 < 0} = 𝑑𝑜𝑚(𝜒− ) and define 𝜒− (𝑥, 𝑦) = 𝑥

3. Let 𝑉𝑅 = {(𝑥, 𝑦) ∈ ℳ ∣ 𝑥 > 0} = 𝑑𝑜𝑚(𝜒𝑅 ) and define 𝜒𝑅 (𝑥, 𝑦) = 𝑦

4. Let 𝑉𝐿 = {(𝑥, 𝑦) ∈ ℳ ∣ 𝑥 < 0} = 𝑑𝑜𝑚(𝜒𝐿 ) and define 𝜒𝐿 (𝑥, 𝑦) = 𝑦

The set of charts 𝒜 = {(𝑉+ , 𝜒+ ), (𝑉− , 𝜒− ), (𝑉𝑅 , 𝜒𝑅 ), (𝑉𝐿 , 𝜒𝐿 )} forms an atlas on ℳ which gives
the circle a differentiable structure⁵. It is not hard to show the transition functions are smooth on
the image of the intersection of their respective domains. For example, 𝑉+ ∩ 𝑉𝑅 = 𝑊+𝑅 = {(𝑥, 𝑦) ∈
ℳ ∣ 𝑥, 𝑦 > 0}, and it is easy to calculate that $\chi_+^{-1}(x) = (x, \sqrt{1-x^2})$, hence

\[ (\chi_R \circ \chi_+^{-1})(x) = \chi_R(x, \sqrt{1-x^2}) = \sqrt{1-x^2} \]

for each 𝑥 ∈ 𝜒+ (𝑊+𝑅 ). Note 𝑥 ∈ 𝜒+ (𝑊+𝑅 ) implies 0 < 𝑥 < 1 hence it is clear the transition
function is smooth. Similar calculations hold for all the other overlapping charts. This manifold is
usually denoted ℳ = 𝑆1 .
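
The transition function of Example 8.1.17 can be evaluated directly. The following is a small sketch; the function names are illustrative, and the derivative check uses a central difference as an assumed stand-in for the exact derivative −𝑥/√(1 − 𝑥²).

```python
import math

def chi_plus(x, y):
    """Chart on V_+ (upper half of the circle): keep x."""
    return x

def chi_plus_inv(x):
    """Inverse chart: recover the point of the upper half circle."""
    return (x, math.sqrt(1 - x * x))

def chi_R(x, y):
    """Chart on V_R (right half of the circle): keep y."""
    return y

def transition(x):
    """chi_R composed with the inverse of chi_plus, defined for 0 < x < 1."""
    return chi_R(*chi_plus_inv(x))

value = transition(0.6)
# Central-difference estimate of the derivative at x = 0.6.
h = 1e-6
slope = (transition(0.6 + h) - transition(0.6 - h)) / (2 * h)
```

At 𝑥 = 0.6 the transition function takes the value 0.8, and the estimated slope agrees with −0.6/0.8 = −0.75.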

A cylinder is the Cartesian product of a line and a circle. In other words, we can create a cylinder
by gluing a copy of a circle at each point along a line. If all these copies line up and don’t twist
around then we get a cylinder. The example that follows here illustrates a more general pattern:
we can take a given manifold and paste a copy at each point along another manifold by using a
Cartesian product.

Example 8.1.18. Let 𝒫 = {(𝑥, 𝑦, 𝑧) ∈ ℝ3 ∣ 𝑥2 + 𝑦 2 = 1}.

1. Let 𝑉+ = {(𝑥, 𝑦, 𝑧) ∈ 𝒫 ∣ 𝑦 > 0} = 𝑑𝑜𝑚(𝜒+ ) and define 𝜒+ (𝑥, 𝑦, 𝑧) = (𝑥, 𝑧)

2. Let 𝑉− = {(𝑥, 𝑦, 𝑧) ∈ 𝒫 ∣ 𝑦 < 0} = 𝑑𝑜𝑚(𝜒− ) and define 𝜒− (𝑥, 𝑦, 𝑧) = (𝑥, 𝑧)

3. Let 𝑉𝑅 = {(𝑥, 𝑦, 𝑧) ∈ 𝒫 ∣ 𝑥 > 0} = 𝑑𝑜𝑚(𝜒𝑅 ) and define 𝜒𝑅 (𝑥, 𝑦, 𝑧) = (𝑦, 𝑧)

4. Let 𝑉𝐿 = {(𝑥, 𝑦, 𝑧) ∈ 𝒫 ∣ 𝑥 < 0} = 𝑑𝑜𝑚(𝜒𝐿 ) and define 𝜒𝐿 (𝑥, 𝑦, 𝑧) = (𝑦, 𝑧)

The set of charts 𝒜 = {(𝑉+ , 𝜒+ ), (𝑉− , 𝜒− ), (𝑉𝑅 , 𝜒𝑅 ), (𝑉𝐿 , 𝜒𝐿 )} forms an atlas on 𝒫 which gives the
cylinder a differentiable structure. It is not hard to show the transition functions are smooth on the
image of the intersection of their respective domains. For example, 𝑉+ ∩ 𝑉𝑅 = 𝑊+𝑅 = {(𝑥, 𝑦, 𝑧) ∈
𝒫 ∣ 𝑥, 𝑦 > 0}, and it is easy to calculate that $\chi_+^{-1}(x, z) = (x, \sqrt{1-x^2}, z)$, hence

\[ (\chi_R \circ \chi_+^{-1})(x, z) = \chi_R(x, \sqrt{1-x^2}, z) = (\sqrt{1-x^2}, z) \]

for each (𝑥, 𝑧) ∈ 𝜒+ (𝑊+𝑅 ). Note (𝑥, 𝑧) ∈ 𝜒+ (𝑊+𝑅 ) implies 0 < 𝑥 < 1 hence it is clear the
transition function is smooth. Similar calculations hold for all the other overlapping charts.

⁵ meaning that if we adjoin the infinity of likewise compatible charts, that defines a differentiable structure on ℳ

Generally, given two manifolds ℳ and 𝒩 we can construct ℳ×𝒩 by taking the Cartesian product
of the charts. Suppose 𝜒ℳ : 𝑉 ⊆ ℳ → 𝑈 ⊆ ℝ𝑚 and 𝜒𝒩 : 𝑉 ′ ⊆ 𝒩 → 𝑈 ′ ⊆ ℝ𝑛 then you can define
the product chart 𝜒 : 𝑉 × 𝑉 ′ → 𝑈 × 𝑈 ′ as 𝜒 = 𝜒ℳ × 𝜒𝒩 . The Cartesian product ℳ × 𝒩 together
with all such product charts naturally is given the structure of an (𝑚 + 𝑛)-dimensional manifold.
For example, in the preceding example we took ℳ = 𝑆1 and 𝒩 = ℝ to construct 𝒫 = 𝑆1 × ℝ.

Example 8.1.19. The 2-torus, or donut, is constructed as 𝑇2 = 𝑆1 ×𝑆1 . The 𝑛-torus is constructed
by taking the product of 𝑛 circles:

\[ T^n = \underbrace{S^1 \times S^1 \times \cdots \times S^1}_{n\ \text{copies}} \]

The atlas on this space can be obtained by simply taking the product of the 𝑆1 charts 𝑛 times.

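One of the surprising discoveries in manifold theory is that a particular set of points may have many
different possible differentiable structures. This is why mathematicians often say a manifold is a
set together with a maximal atlas. For example, Milnor showed in 1956 that the 7-sphere 𝑆7 admits
differentiable structures which are not diffeomorphic to the standard one; there are 28 of them up to
orientation-preserving diffeomorphism. Not every higher-dimensional sphere behaves this way. In
contrast, 𝑆𝑛 for 𝑛 = 1, 2, 3, 5, 6 has just one differentiable structure, and the case 𝑛 = 4 is an open
problem. See J. Milnor, On manifolds homeomorphic to the 7-sphere, Annals of Mathematics 64 (1956),
and M. Kervaire and J. Milnor, Groups of homotopy spheres: I, Annals of Mathematics 77 (1963).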

The most familiar example of a manifold is just ℝ2 or ℝ3 itself. One may ask which coordinates
are in the atlas which contains the standard Cartesian coordinate chart. The most commonly used
charts other than Cartesian would probably be the spherical and cylindrical coordinate systems for
ℝ3 or the polar coordinate system for ℝ2 . Technically, certain restrictions must be made on the
domain of these non-Cartesian coordinates if we are to correctly label them "coordinate charts".
Interestingly, applications are greedier than manifold theorists: we do need to include those points
in ℝ𝑛 which spoil the injectivity of spherical or cylindrical coordinates. On the other hand, those
bad points are just the origin and a ray of points which do not contribute noticeably in the
calculation of a surface or volume integral.

I will not attempt to make explicit the domain of the coordinate charts in the following two examples
( you might find them in a homework):

Example 8.1.20. Define 𝜒𝑠𝑝ℎ𝑒𝑟𝑖𝑐𝑎𝑙 (𝑥, 𝑦, 𝑧) = (𝑟, 𝜃, 𝜙) implicitly by the coordinate transformations

𝑥 = 𝑟 cos(𝜃) sin(𝜙), 𝑦 = 𝑟 sin(𝜃) sin(𝜙), 𝑧 = 𝑟 cos(𝜙)

These can be inverted,

\[ r = \sqrt{x^2+y^2+z^2}, \qquad \theta = \tan^{-1}\left[\frac{y}{x}\right], \qquad \phi = \cos^{-1}\left[\frac{z}{\sqrt{x^2+y^2+z^2}}\right] \]

To show compatibility with the standard Cartesian coordinates we would need to select a subset of ℝ3
for which 𝜒𝑠𝑝ℎ𝑒𝑟𝑖𝑐𝑎𝑙 is 1-1, and then since 𝜒𝐶𝑎𝑟𝑡𝑒𝑠𝑖𝑎𝑛 = 𝐼𝑑 the transition functions are just $\chi_{spherical}^{-1}$.
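
A round trip through the spherical coordinate formulas gives a quick consistency check. This is a sketch under one stated assumption: the angle 𝜃 is computed with the two-argument arctangent and reduced to [0, 2𝜋), since the formula tan⁻¹[𝑦/𝑥] by itself only determines 𝜃 when 𝑥 > 0. The function names are illustrative.

```python
import math

def to_cartesian(r, theta, phi):
    """Coordinate transformations of Example 8.1.20."""
    return (r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(phi))

def to_spherical(x, y, z):
    """Inverse formulas; atan2 is an assumed choice of branch for theta."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x) % (2 * math.pi)
    phi = math.acos(z / r)
    return (r, theta, phi)

original = (2.0, 4.0, 1.0)   # r > 0, 0 < theta < 2*pi, 0 < phi < pi
recovered = to_spherical(*to_cartesian(*original))
errors = [abs(a - b) for a, b in zip(original, recovered)]
```

On the domain indicated in the comment the round trip returns the starting coordinates up to rounding.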

Example 8.1.21. Define 𝜒𝑐𝑦𝑙𝑖𝑛𝑑𝑟𝑖𝑐𝑎𝑙 (𝑥, 𝑦, 𝑧) = (𝑠, 𝜃, 𝑧) implicitly by the coordinate transformations

𝑥 = 𝑠 cos(𝜃), 𝑦 = 𝑠 sin(𝜃), 𝑧 = 𝑧

These can be inverted,

\[ s = \sqrt{x^2+y^2}, \qquad \theta = \tan^{-1}\left[\frac{y}{x}\right], \qquad z = z \]

You can take 𝑑𝑜𝑚(𝜒𝑐𝑦𝑙𝑖𝑛𝑑𝑟𝑖𝑐𝑎𝑙 ) = {(𝑥, 𝑦, 𝑧) ∣ 0 < 𝜃 < 2𝜋} − {(0, 0, 0)}.

Remark 8.1.22.
I would encourage you to read Burns and Gidea and/or Munkres for future study. You’ll
find much more material, motivation and depth in those texts and if you were interested in
an independent study after you’ve completed real analysis feel free to ask.

8.1.3 diffeomorphism
At the outset of this study I emphasized that the purpose of a manifold was to give a natural
language for calculus on curved spaces. This definition begins to expose how this is accomplished.
Definition 8.1.23. smoothness on manifolds.
Suppose ℳ and 𝒩 are smooth manifolds and 𝑓 : ℳ → 𝒩 is a function then we say 𝑓 is
smooth iff for each 𝑝 ∈ ℳ there exist local parametrizations 𝜙𝑀 : 𝑈𝑀 ⊆ ℝ𝑚 → 𝑉𝑀 ⊆ ℳ
and 𝜙𝑁 : 𝑈𝑁 ⊆ ℝ𝑛 → 𝑉𝑁 ⊆ 𝒩 such that 𝑝 ∈ 𝑉𝑀 and $\phi_N^{-1} \circ f \circ \phi_M$ is a smooth mapping
from ℝ𝑚 to ℝ𝑛 . If 𝑓 : ℳ → 𝒩 is a smooth bijection with a smooth inverse then we say 𝑓 is a diffeomorphism.
Moreover, if 𝑓 is a diffeomorphism then we say ℳ and 𝒩 are diffeomorphic.
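In other words, 𝑓 is smooth iff its local coordinate representative is smooth. It suffices to check one
representative since any other will be related by transition functions which are smooth: suppose
we have patches $\bar{\phi}_M : \bar{U}_M \subseteq \mathbb{R}^m \to \bar{V}_M \subseteq \mathcal{M}$ and $\bar{\phi}_N : \bar{U}_N \subseteq \mathbb{R}^n \to \bar{V}_N \subseteq \mathcal{N}$ such that $p \in \bar{V}_M$, then

\[ \underbrace{\phi_N^{-1} \circ f \circ \phi_M}_{\text{local rep. of } f} = \underbrace{\phi_N^{-1} \circ \bar{\phi}_N}_{\text{trans. fnct.}} \circ \underbrace{\bar{\phi}_N^{-1} \circ f \circ \bar{\phi}_M}_{\text{local rep. of } f} \circ \underbrace{\bar{\phi}_M^{-1} \circ \phi_M}_{\text{trans. fnct.}} \]

wherever the composites are defined. Since a composite of smooth mappings is smooth by the chain
rule for mappings, this formula shows that if 𝑓 is smooth with respect to a
particular pair of coordinates then its representative will likewise be smooth for any other pair of
compatible patches.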

Example 8.1.24. Recall in Example 8.1.3 we studied ℳ = {𝑝𝑜 } × ℝ𝑚 . Recall we have one
parametrization 𝜙 : ℝ𝑚 → ℳ which is defined by 𝜙(𝑢) = (𝑝𝑜 , 𝑢). Clearly 𝜙−1 (𝑝𝑜 , 𝑢) = 𝑢 for all
(𝑝𝑜 , 𝑢) ∈ ℳ. Let ℝ𝑚 have Cartesian coordinates so the identity map is the patch for ℝ𝑚 . Consider
the function 𝑓 = 𝜙 : ℝ𝑚 → ℳ, we have only the local coordinate representative 𝜙−1 ∘ 𝑓 ∘ 𝐼𝑑 to
consider:
𝜙−1 ∘ 𝑓 ∘ 𝐼𝑑 = 𝜙−1 ∘ 𝜙 ∘ 𝐼𝑑 = 𝐼𝑑.
Hence, 𝜙 is a smooth bijection from ℝ𝑚 to ℳ. The same calculation shows the inverse 𝜙−1 also has
local representative 𝐼𝑑, so it is smooth as well, and we find ℳ is diffeomorphic to ℝ𝑚 .

The preceding example naturally generalizes to an arbitrary coordinate chart. Suppose ℳ is a
manifold and 𝜙−1 : 𝑉 → ℝ𝑚 is a coordinate chart around the point 𝑝 ∈ ℳ. We argue that 𝜙−1 is a
diffeomorphism onto its image. Once more take the Cartesian coordinate system for ℝ𝑚 and suppose $\bar{\phi} : \bar{U} \to \bar{V}$
is a local parametrization with $p \in \bar{V}$. The local coordinate representative of 𝜙−1 is simply the
transition function since:

\[ \mathrm{Id} \circ \phi^{-1} \circ \bar{\phi} = \phi^{-1} \circ \bar{\phi}. \]

We find 𝜙−1 is smooth on $V \cap \bar{V}$. It follows that 𝜙−1 is a diffeomorphism since we know transition
functions are smooth on a manifold. We arrive at the following characterization of a manifold: a
manifold is a space which is locally diffeomorphic to ℝ𝑚 .

However, just because a manifold is locally diffeomorphic to ℝ𝑚 that does not mean it is actually
diffeomorphic to ℝ𝑚 . For example, it is a well-known fact that there does not exist a smooth
bijection between the 2-sphere and ℝ2 . The global shape of the manifold obstructs such a mapping:
the sphere is compact whereas ℝ2 is not.

8.2 tangent space


Since a manifold is generally an abstract object we would like to give a definition for the tangent
space which is not directly based on the traditional geometric meaning. On the other hand, we
should expect that the definition which is given in the abstract reduces to the usual geometric
meaning for the context of an embedded manifold. It turns out there are three common viewpoints.

1. a tangent vector is an equivalence class of curves.

2. a tangent vector is a contravariant vector.

3. a tangent vector is a derivation.

I will explain each case and we will find explicit isomorphisms between each language. We assume
that ℳ is an 𝑚-dimensional smooth manifold throughout this section.

8.2.1 equivalence classes of curves


I essentially used case (1.) as the definition for the tangent space of a level-set. Suppose 𝛾 :
𝐼 ⊆ ℝ → ℳ is a smooth curve with 𝛾(0) = 𝑝. In this context, this means that all the local
coordinate representatives of 𝛾 are smooth curves on ℝ𝑚 ; that is, for every parametrization 𝜙 :
𝑈 ⊆ ℝ𝑚 → 𝑉 ⊆ ℳ containing 𝑝 ∈ 𝑉 the mapping6 𝜙−1 ∘ 𝛾 is a smooth mapping from ℝ to
ℝ𝑚 . Given the coordinate system defined by 𝜙−1 , we define two smooth curves 𝛾1 , 𝛾2 on ℳ with
𝛾1 (0) = 𝛾2 (0) = 𝑝 ∈ ℳ to be similar at 𝑝 iff (𝜙−1 ∘ 𝛾1 )′ (0) = (𝜙−1 ∘ 𝛾2 )′ (0). If smooth curves
𝛾1 , 𝛾2 are similar at 𝑝 ∈ ℳ then we denote this by writing 𝛾1 ∼𝑝 𝛾2 . We insist the curves be
parametrized such that they reach the point of interest at the parameter 𝑡 = 0, this is not a severe
restriction since we can always reparametrize a given curve which reaches 𝑝 at 𝑡 = 𝑡𝑜 by replacing
the parameter with 𝑡 − 𝑡𝑜 . Observe that ∼𝑝 defines an equivalence relation on the set of smooth
curves through 𝑝 which reach 𝑝 at 𝑡 = 0 in their domain.

(i) reflexive: 𝛾 ∼𝑝 𝛾 iff 𝛾(0) = 𝑝 and (𝜙−1 ∘ 𝛾)′ (0) = (𝜙−1 ∘ 𝛾)′ (0). If 𝛾 is a smooth curve on ℳ
with 𝛾(0) = 𝑝 then clearly the reflexive property holds true.

(ii) symmetric: Suppose 𝛾1 ∼𝑝 𝛾2 then (𝜙−1 ∘ 𝛾1 )′ (0) = (𝜙−1 ∘ 𝛾2 )′ (0) hence (𝜙−1 ∘ 𝛾2 )′ (0) =
(𝜙−1 ∘ 𝛾1 )′ (0) and we find 𝛾2 ∼𝑝 𝛾1 thus ∼𝑝 is a symmetric relation.

(iii) transitive: if 𝛾1 ∼𝑝 𝛾2 and 𝛾2 ∼𝑝 𝛾3 then (𝜙−1 ∘ 𝛾1 )′ (0) = (𝜙−1 ∘ 𝛾2 )′ (0) and (𝜙−1 ∘ 𝛾2 )′ (0) =
(𝜙−1 ∘ 𝛾3 )′ (0) thus (𝜙−1 ∘ 𝛾1 )′ (0) = (𝜙−1 ∘ 𝛾3 )′ (0) which shows 𝛾1 ∼𝑝 𝛾3 .

The equivalence classes of ∼𝑝 partition the set of smooth curves with 𝛾(0) = 𝑝. Each equivalence
class 𝛾˜ = {𝛽 : 𝐼 ⊆ ℝ → ℳ ∣ 𝛽 ∼𝑝 𝛾} corresponds uniquely to a particular vector (𝜙−1 ∘ 𝛾)′ (0) in
ℝ𝑚 . Conversely, given 𝑣 = (𝑣 1 , 𝑣 2 , . . . , 𝑣 𝑚 ) ∈ ℝ𝑚 we can write the equation for a line in ℝ𝑚 with
direction 𝑣 and base-point 𝜙−1 (𝑝):

\[ \vec{r}(t) = \phi^{-1}(p) + tv. \]

⁶ Note, we may have to restrict the domain of 𝜙−1 ∘ 𝛾 such that the image of 𝛾 falls inside 𝑉 , keep in mind this
poses no threat to the construction since we only consider the derivative of the curve at zero in the final construction.
That said, keep in mind as we construct composites in this section we always suppose the domain of a curve includes
some neighborhood of zero. We need this assumption in order that the derivative at zero exist.

We compose $\vec{r}$ with 𝜙 to obtain a smooth curve through 𝑝 ∈ ℳ which corresponds to the vector 𝑣.
I invite the reader to verify that $\gamma = \phi \circ \vec{r}$ has

(1.) 𝛾(0) = 𝑝 (2.) (𝜙−1 ∘ 𝛾)′ (0) = 𝑣.

Notice that the correspondence is made between a vector in ℝ𝑚 and a whole family of curves.
There are naturally many curves that share the same tangent vector to a given point.

Moreover, we show these equivalence classes are not coordinate dependent. Suppose 𝛾 ∼𝑝 𝛽 relative
to the chart 𝜙−1 : 𝑉 → 𝑈 , with 𝑝 ∈ 𝑉 . In particular, we suppose 𝛾(0) = 𝛽(0) = 𝑝 and
(𝜙−1 ∘ 𝛾)′ (0) = (𝜙−1 ∘ 𝛽)′ (0). Let $\bar{\phi}^{-1} : \bar{V} \to \bar{U}$, with $p \in \bar{V}$; we seek to show 𝛾 ∼𝑝 𝛽 relative to the
chart $\bar{\phi}^{-1}$. Note that $\bar{\phi}^{-1} \circ \gamma = \bar{\phi}^{-1} \circ \phi \circ \phi^{-1} \circ \gamma$ hence, by the chain rule,

\[ (\bar{\phi}^{-1} \circ \gamma)'(0) = (\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p))\,(\phi^{-1} \circ \gamma)'(0) \]

Likewise, $(\bar{\phi}^{-1} \circ \beta)'(0) = (\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p))\,(\phi^{-1} \circ \beta)'(0)$. Recognize that $(\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p))$ is an
invertible matrix since it is the derivative of the invertible transition functions, label $(\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p)) = P$
to obtain:

\[ (\bar{\phi}^{-1} \circ \gamma)'(0) = P(\phi^{-1} \circ \gamma)'(0) \quad \text{and} \quad (\bar{\phi}^{-1} \circ \beta)'(0) = P(\phi^{-1} \circ \beta)'(0) \]

The equality $(\bar{\phi}^{-1} \circ \gamma)'(0) = (\bar{\phi}^{-1} \circ \beta)'(0)$ follows and this shows that 𝛾 ∼𝑝 𝛽 relative to the $\bar{\phi}^{-1}$
coordinate chart. We find that the equivalence classes of curves are independent of the coordinate system.
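
The coordinate independence just shown can be illustrated numerically on the circle. In the sketch below two different curves through 𝑝 = (0, 1) are compared in the chart 𝜒+ of Example 8.1.17 and in a second chart given by the polar angle. The angle chart, the two curves, and the central-difference derivative are all assumptions chosen for illustration.

```python
import math

def chart_x(point):
    """chi_+ from Example 8.1.17: keep the x-coordinate."""
    return point[0]

def chart_angle(point):
    """A second chart near (0, 1): the polar angle (assumed compatible)."""
    return math.atan2(point[1], point[0])

def gamma(t):
    """Unit-speed curve on the circle with gamma(0) = (0, 1)."""
    return (math.cos(math.pi / 2 + t), math.sin(math.pi / 2 + t))

def beta(t):
    """A differently parametrized curve on the circle with beta(0) = (0, 1)."""
    norm = math.sqrt(1 + t * t)
    return (-t / norm, 1 / norm)

def velocity(chart, curve, h=1e-6):
    """Central-difference estimate of (chart o curve)'(0)."""
    return (chart(curve(h)) - chart(curve(-h))) / (2 * h)

gap_in_x_chart = abs(velocity(chart_x, gamma) - velocity(chart_x, beta))
gap_in_angle_chart = abs(velocity(chart_angle, gamma) - velocity(chart_angle, beta))
```

Both gaps are zero up to the accuracy of the difference quotient: the curves are similar at 𝑝 in one chart and remain similar in the other.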

With the analysis above in mind we define addition and scalar multiplication of equivalence classes
of curves as follows: given a coordinate chart 𝜙−1 : 𝑉 → 𝑈 with 𝑝 ∈ 𝑉 , equivalence classes 𝛾˜1 , 𝛾˜2
at 𝑝 and 𝑐 ∈ ℝ, if 𝛾˜1 has (𝜙−1 ∘ 𝛾1 )′ (0) = 𝑣1 in ℝ𝑚 and 𝛾˜2 has (𝜙−1 ∘ 𝛾2 )′ (0) = 𝑣2 in ℝ𝑚 then we
define

(i) 𝛾˜1 + 𝛾˜2 = 𝛼˜ where (𝜙−1 ∘ 𝛼)′ (0) = 𝑣1 + 𝑣2

(ii) 𝑐𝛾˜1 = 𝛽˜ where (𝜙−1 ∘ 𝛽)′ (0) = 𝑐𝑣1 .

We know 𝛼 and 𝛽 exist because we can simply push the lines in ℝ𝑚 based at 𝜙−1 (𝑝) with directions
𝑣1 + 𝑣2 and 𝑐𝑣1 up to ℳ to obtain the desired curve and hence the required equivalence class.
Moreover, we know this construction is coordinate independent since the equivalence classes are
independent of coordinates.

Definition 8.2.1.
Suppose ℳ is an 𝑚-dimensional smooth manifold. We define the tangent space at 𝑝 ∈ ℳ
to be the set of ∼𝑝 -equivalence classes of curves. In particular, denote:

𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ = {𝛾˜ ∣ 𝛾 smooth and 𝛾(0) = 𝑝}

Keep in mind this is just one of three equivalent definitions which are commonly implemented.

8.2.2 contravariant vectors


If (𝜙¯−1 ∘ 𝛾)′ (0) = 𝑣¯ and (𝜙−1 ∘ 𝛾)′ (0) = 𝑣 then 𝑣¯ = 𝑃 𝑣 where 𝑃 = (𝜙¯−1 ∘ 𝜙)′ (𝜙−1 (𝑝)). With this in
mind we could use the pair (𝑝, 𝑣) or (𝑝, 𝑣¯) to describe a tangent vector at 𝑝. The cost of using (𝑝, 𝑣)
is it brings in questions of coordinate dependence.

The equivalence class viewpoint is at times quite useful, but the definition of vector offered here is
a bit easier in certain respects. In particular, relative to a particular coordinate chart 𝜙−1 : 𝑉 → 𝑈 ,
with 𝑝 ∈ 𝑉 , we define (temporary notation)

𝑣𝑒𝑐𝑡𝑇𝑝 ℳ = {(𝑝, 𝑣) ∣ 𝑣 ∈ ℝ𝑚 }

Vectors are added and scalar multiplied in the obvious way:

(𝑝, 𝑣1 ) + (𝑝, 𝑣2 ) = (𝑝, 𝑣1 + 𝑣2 ) and 𝑐(𝑝, 𝑣1 ) = (𝑝, 𝑐𝑣1 )

for all (𝑝, 𝑣1 ), (𝑝, 𝑣2 ) ∈ 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ and 𝑐 ∈ ℝ. Moreover, if we change from the 𝜙−1 chart to the $\bar{\phi}^{-1}$
coordinate chart then the vector changes form as indicated in the previous subsection; $(p, v) \to (p, \bar{v})$
where $\bar{v} = Pv$ and $P = (\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p))$. The components of (𝑝, 𝑣) are said to transform
contravariantly.

Technically, this is also an equivalence class construction. A more honest notation would be to
replace (𝑝, 𝑣) with (𝑝, 𝜙, 𝑣) and then we could state that $(p, \phi, v) \sim (p, \bar{\phi}, \bar{v})$ iff $\bar{v} = Pv$ and
$P = (\bar{\phi}^{-1} \circ \phi)'(\phi^{-1}(p))$. However, this notation is tiresome so we do not pursue it further. I prefer the
notation of the next viewpoint.

8.2.3 derivations
To begin, let us define the set of locally smooth functions at 𝑝 ∈ ℳ:

𝐶 ∞ (𝑝) = {𝑓 : ℳ → ℝ ∣ 𝑓 is smooth on an open set containing 𝑝}

In particular, we suppose 𝑓 ∈ 𝐶 ∞ (𝑝) to mean there exists a patch 𝜙 : 𝑈 → 𝑉 ⊆ ℳ such that 𝑓 is


smooth on 𝑉 . Since we use Cartesian coordinates on ℝ by convention it follows that 𝑓 : 𝑉 → ℝ
smooth indicates the local coordinate representative 𝑓 ∘ 𝜙 : 𝑈 → ℝ is smooth (it has continuous
partial derivatives of all orders).

Definition 8.2.2.
Suppose 𝑋𝑝 : 𝐶 ∞ (𝑝) → ℝ is a linear transformation which satisfies the Leibniz rule then
we say 𝑋𝑝 is a derivation on 𝐶 ∞ (𝑝). Moreover, we denote 𝑋𝑝 ∈ 𝒟𝑝 ℳ iff 𝑋𝑝 (𝑓 + 𝑐𝑔) =
𝑋𝑝 (𝑓 ) + 𝑐𝑋𝑝 (𝑔) and 𝑋𝑝 (𝑓 𝑔) = 𝑓 (𝑝)𝑋𝑝 (𝑔) + 𝑋𝑝 (𝑓 )𝑔(𝑝) for all 𝑓, 𝑔 ∈ 𝐶 ∞ (𝑝) and 𝑐 ∈ ℝ.

Example 8.2.3. Let ℳ = ℝ and consider $X_{t_o} = \frac{d}{dt}\big|_{t_o}$. Clearly $X_{t_o}$ is a derivation on smooth
functions near 𝑡𝑜 .

Example 8.2.4. Consider ℳ = ℝ2 . Pick 𝑝 = (𝑥𝑜 , 𝑦𝑜 ) and define $X_p = \frac{\partial}{\partial x}\big|_p$ and $Y_p = \frac{\partial}{\partial y}\big|_p$.
Once more it is clear that $X_p, Y_p \in \mathcal{D}_p\mathbb{R}^2$. The action of these derivations is accomplished by partial
differentiation followed by evaluation at 𝑝.

Example 8.2.5. Suppose ℳ = ℝ𝑚 . Pick 𝑝 ∈ ℝ𝑚 and define $X = \frac{\partial}{\partial x^j}\big|_p$. Clearly this is a
derivation for any 𝑗 ∈ ℕ𝑚 .
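
The two defining properties of a derivation can be checked concretely for 𝑑/𝑑𝑡 at a point. The sketch below uses dual numbers (an assumed technique, not one used in these notes) to differentiate polynomial expressions exactly; the class `Dual`, the helper `derivation_at`, and the test functions are hypothetical names introduced for illustration.

```python
class Dual:
    """Number a + b*eps with eps**2 = 0; the eps part carries a derivative."""

    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    @staticmethod
    def lift(other):
        """Treat plain numbers as duals with zero eps part."""
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def derivation_at(t0):
    """Return X = d/dt at t0, acting on functions built from + and *."""
    return lambda f: f(Dual(t0, 1.0)).eps


f = lambda t: t * t * t + 2 * t
g = lambda t: 3 * t * t + 1
X = derivation_at(2.0)

# Leibniz rule: X(fg) = f(p) X(g) + X(f) g(p) with p = 2.
leibniz_lhs = X(lambda t: f(t) * g(t))
leibniz_rhs = f(2.0) * X(g) + X(f) * g(2.0)

# Linearity: X(f + c g) = X(f) + c X(g) with c = 5.
linear_lhs = X(lambda t: f(t) + 5 * g(t))
linear_rhs = X(f) + 5 * X(g)
```

Both sides of each identity agree, which is exactly the statement that 𝑑/𝑑𝑡 at 𝑡𝑜 = 2 is a derivation on these functions.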


Are there other types of derivations? Is every derivation a partial derivative operator?
Before we can explore this question we need to define partial differentiation on a manifold. We
should hope the definition is consistent with the language we already used in multivariate calculus
(and the preceding pair of examples) and yet is also general enough to be stated on any abstract
smooth manifold.

Definition 8.2.6.
Let ℳ be a smooth 𝑚-dimensional manifold and let 𝜙 : 𝑈 → 𝑉 be a local parametrization
with 𝑝 ∈ 𝑉 . The 𝑗-th coordinate function 𝑥𝑗 : 𝑉 → ℝ is the 𝑗-component function of
𝜙−1 : 𝑉 → 𝑈 . In other words:

𝜙−1 (𝑝) = 𝑥(𝑝) = (𝑥1 (𝑝), 𝑥2 (𝑝), . . . , 𝑥𝑚 (𝑝))

These 𝑥𝑗 are manifold coordinates. In contrast, we will denote the standard Cartesian
coordinates in 𝑈 ⊆ ℝ𝑚 via 𝑢𝑗 so a typical point has the form (𝑢1 , 𝑢2 , . . . , 𝑢𝑚 ) and viewed
as functions 𝑢𝑗 : ℝ𝑚 → ℝ where 𝑢𝑗 (𝑣) = 𝑒𝑗 (𝑣) = 𝑣 𝑗 . We define the partial derivative
with respect to 𝑥𝑗 at 𝑝 for 𝑓 ∈ 𝐶 ∞ (𝑝) as follows:

\[ \frac{\partial f}{\partial x^j}(p) = \frac{\partial}{\partial u^j}\Big[(f \circ \phi)(u)\Big]\Big|_{u=\phi^{-1}(p)} = \frac{\partial}{\partial u^j}\Big[f \circ x^{-1}\Big]\Big|_{x(p)}. \]

The idea of the definition is simply to take the function 𝑓 with domain in ℳ then pull it back to
a function 𝑓 ∘ 𝑥−1 : 𝑈 ⊆ ℝ𝑚 → 𝑉 → ℝ on ℝ𝑚 . Then we can take partial derivatives of 𝑓 ∘ 𝑥−1
in the same way we did in multivariate calculus. In particular, the partial derivative w.r.t. 𝑢𝑗 is
calculated by:

\[ \frac{\partial f}{\partial x^j}(p) = \frac{d}{dt}\Big[(f \circ \phi)(x(p) + te_j)\Big]\Big|_{t=0} \]

which is precisely the directional derivative of 𝑓 ∘ 𝑥−1 in the 𝑗-direction at 𝑥(𝑝). In fact, note
(𝑓 ∘ 𝜙)(𝑥(𝑝) + 𝑡𝑒𝑗 ) = 𝑓 (𝑥−1 (𝑥(𝑝) + 𝑡𝑒𝑗 )).

The curve 𝑡 → 𝑥−1 (𝑥(𝑝) + 𝑡𝑒𝑗 ) is the curve on ℳ through 𝑝 where all coordinates are fixed except
the 𝑗-coordinate. It is a coordinate curve on ℳ.
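
Differentiating along a coordinate curve is easy to imitate numerically. The sketch below takes polar coordinates on the plane as the manifold coordinates, so the patch is (𝑟, 𝜃) → (𝑟 cos 𝜃, 𝑟 sin 𝜃), and estimates ∂𝑓/∂𝑥𝑗 by a central difference in 𝑡. The sample function 𝑓, the base point, and the step size are illustrative assumptions.

```python
import math

def patch(r, theta):
    """phi = x^{-1}: polar coordinates -> point of the plane."""
    return (r * math.cos(theta), r * math.sin(theta))

def partial(f, j, coords, h=1e-6):
    """Estimate d/dt f(phi(x(p) + t e_j)) at t = 0 by a central difference."""
    plus, minus = list(coords), list(coords)
    plus[j] += h
    minus[j] -= h
    return (f(*patch(*plus)) - f(*patch(*minus))) / (2 * h)

f = lambda x, y: x * x * y      # sample smooth function on the plane
coords = (2.0, 0.5)             # x(p) = (r, theta)

df_dr = partial(f, 0, coords)
df_dtheta = partial(f, 1, coords)

# Hand computation: f o phi = r^3 cos^2(theta) sin(theta).
r, th = coords
exact_dr = 3 * r ** 2 * math.cos(th) ** 2 * math.sin(th)
exact_dtheta = r ** 3 * (math.cos(th) ** 3 - 2 * math.cos(th) * math.sin(th) ** 2)
```

The numerical values agree with the hand computation to within the accuracy of the difference quotient.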

XXX-need a picture of coordinate curves on manifold!

Notice in the case that ℳ = ℝ𝑚 is given Cartesian coordinates 𝜙 = 𝐼𝑑 then 𝑥−1 = 𝐼𝑑 as well and
the curve 𝑡 → 𝑥−1 (𝑥(𝑝) + 𝑡𝑒𝑗 ) reduces to 𝑡 → 𝑝 + 𝑡𝑒𝑗 which is just the 𝑗-th coordinate curve through 𝑝 on
ℝ𝑚 . It follows that the partial derivative defined for manifolds naturally reduces to the ordinary
partial derivative in the context of ℳ = ℝ𝑚 with Cartesian coordinates. The beautiful thing is
that almost everything we know for ordinary partial derivatives equally well transfers to $\frac{\partial}{\partial x^j}\big|_p$.
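Theorem 8.2.7. Partial differentiation on manifolds
Let ℳ be a smooth 𝑚-dimensional manifold with coordinates 𝑥1 , 𝑥2 , . . . , 𝑥𝑚 near 𝑝. Furthermore,
suppose coordinates 𝑦 1 , 𝑦 2 , . . . , 𝑦 𝑚 are also defined near 𝑝. Suppose 𝑓, 𝑔 ∈ 𝐶 ∞ (𝑝)
and 𝑐 ∈ ℝ then:

1. $\frac{\partial}{\partial x^j}\Big|_p \big[f + g\big] = \frac{\partial f}{\partial x^j}\Big|_p + \frac{\partial g}{\partial x^j}\Big|_p$

2. $\frac{\partial}{\partial x^j}\Big|_p \big[cf\big] = c\,\frac{\partial f}{\partial x^j}\Big|_p$

3. $\frac{\partial}{\partial x^j}\Big|_p \big[fg\big] = f(p)\,\frac{\partial g}{\partial x^j}\Big|_p + \frac{\partial f}{\partial x^j}\Big|_p\, g(p)$

4. $\frac{\partial x^i}{\partial x^j}\Big|_p = \delta_{ij}$

5. $\sum_{k=1}^{m} \frac{\partial x^k}{\partial y^j}\Big|_p \frac{\partial y^i}{\partial x^k}\Big|_p = \delta_{ij}$

6. $\frac{\partial f}{\partial y^j}\Big|_p = \sum_{k=1}^{m} \frac{\partial x^k}{\partial y^j}\Big|_p \frac{\partial f}{\partial x^k}\Big|_p$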

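Proof: The proof of (1.) and (2.) follows from the calculation below:

\begin{align*}
\frac{\partial (f + cg)}{\partial x^j}(p) &= \frac{\partial}{\partial u^j}\Big[(f + cg) \circ x^{-1}\Big]\Big|_{x(p)} \\
&= \frac{\partial}{\partial u^j}\Big[f \circ x^{-1} + c\,g \circ x^{-1}\Big]\Big|_{x(p)} \\
&= \frac{\partial}{\partial u^j}\Big[f \circ x^{-1}\Big]\Big|_{x(p)} + c\,\frac{\partial}{\partial u^j}\Big[g \circ x^{-1}\Big]\Big|_{x(p)} \\
&= \frac{\partial f}{\partial x^j}(p) + c\,\frac{\partial g}{\partial x^j}(p) \tag{8.1}
\end{align*}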

The key in this argument is that composition (𝑓 + 𝑐𝑔) ∘ 𝑥−1 = 𝑓 ∘ 𝑥−1 + 𝑐𝑔 ∘ 𝑥−1 alongside the
linearity of the partial derivative. Item (3.) follows from the identity (𝑓 𝑔) ∘ 𝑥−1 = (𝑓 ∘ 𝑥−1 )(𝑔 ∘ 𝑥−1 )
in tandem with the product rule for a partial derivative on ℝ𝑚 . The reader may be asked to complete
the argument for (3.) in the homework. Continuing to (4.) we calculate from the definition:

\[ \frac{\partial x^i}{\partial x^j}\Big|_p = \frac{\partial}{\partial u^j}\Big[(x^i \circ x^{-1})(u)\Big]\Big|_{x(p)} = \frac{\partial u^i}{\partial u^j}\Big|_{x(p)} = \delta_{ij}. \]

where the last equality is known from multivariate calculus. I invite the reader to prove it from
the definition if unaware of this fact. Before we prove (5.) it helps to have a picture and a bit
more notation in mind. Near the point 𝑝 we have two coordinate charts 𝑥 : 𝑉 → 𝑈 ⊆ ℝ𝑚 and
𝑦 : 𝑉 → 𝑊 ⊆ ℝ𝑚 , we take the chart domain 𝑉 to be small enough so that both charts are
defined. Denote Cartesian coordinates on 𝑈 by 𝑢1 , 𝑢2 , . . . , 𝑢𝑚 and for 𝑊 we likewise use Cartesian
coordinates 𝑤1 , 𝑤2 , . . . , 𝑤𝑚 . Let us denote patches 𝜙, 𝜓 as the inverses of these charts; 𝜙−1 = 𝑥
and 𝜓 −1 = 𝑦. Transition functions 𝜓 −1 ∘ 𝜙 = 𝑦 ∘ 𝑥−1 are mappings from 𝑈 ⊆ ℝ𝑚 to 𝑊 ⊆ ℝ𝑚 and
we note

\[ \frac{\partial}{\partial u^j}\Big[(y^i \circ x^{-1})(u)\Big] = \frac{\partial y^i}{\partial x^j} \]

Likewise, the inverse transition functions 𝜙−1 ∘ 𝜓 = 𝑥 ∘ 𝑦 −1 are mappings from 𝑊 ⊆ ℝ𝑚 to 𝑈 ⊆ ℝ𝑚 and

\[ \frac{\partial}{\partial w^j}\Big[(x^i \circ y^{-1})(w)\Big] = \frac{\partial x^i}{\partial y^j} \]

Recall that if 𝐹, 𝐺 : ℝ𝑚 → ℝ𝑚 and 𝐹 ∘ 𝐺 = 𝐼𝑑 then 𝐹 ′ 𝐺′ = 𝐼 by the chain rule, hence (𝐹 ′ )−1 = 𝐺′ .
Apply this general fact to the transition functions, we find their derivative matrices are inverses.
Item (5.) follows. In matrix notation item (5.) reads $\frac{\partial x}{\partial y}\frac{\partial y}{\partial x} = I$. Item (6.) follows from:
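\begin{align*}
\frac{\partial f}{\partial y^j}\Big|_p &= \frac{\partial}{\partial w^j}\Big[(f \circ y^{-1})(w)\Big]\Big|_{y(p)} \\
&= \frac{\partial}{\partial w^j}\Big[(f \circ x^{-1} \circ x \circ y^{-1})(w)\Big]\Big|_{y(p)} \\
&= \frac{\partial}{\partial w^j}\Big[(f \circ x^{-1})(u^1(w), \dots, u^m(w))\Big]\Big|_{y(p)} && \text{: where } u^k(w) = (x \circ y^{-1})^k(w) \\
&= \sum_{k=1}^{m} \frac{\partial (x \circ y^{-1})^k}{\partial w^j}\Big|_{y(p)} \frac{\partial (f \circ x^{-1})}{\partial u^k}\Big|_{(x \circ y^{-1})(y(p))} && \text{: chain rule} \\
&= \sum_{k=1}^{m} \frac{\partial (x^k \circ y^{-1})}{\partial w^j}\Big|_{y(p)} \frac{\partial (f \circ x^{-1})}{\partial u^k}\Big|_{x(p)} \\
&= \sum_{k=1}^{m} \frac{\partial x^k}{\partial y^j}\Big|_p \frac{\partial f}{\partial x^k}\Big|_p
\end{align*}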

The key step was the multivariate chain rule. □

This theorem proves we can lift calculus on ℝ𝑚 to ℳ in a natural manner. Moreover, we should
note that items (1.), (2.) and (3.) together show $\frac{\partial}{\partial x^i}\big|_p$ is a derivation at 𝑝. Item (6.) should remind
the reader of the contravariant vector discussion. Removing the 𝑓 from the equation reveals that

\[ \frac{\partial}{\partial y^j}\Big|_p = \sum_{k=1}^{m} \frac{\partial x^k}{\partial y^j}\Big|_p \frac{\partial}{\partial x^k}\Big|_p \]

A notation convenient to the current discussion is that a contravariant transformation is (𝑝, 𝑣𝑥 ) →
(𝑝, 𝑣𝑦 ) where 𝑣𝑦 = 𝑃 𝑣𝑥 and $P = (y \circ x^{-1})'(x(p)) = \frac{\partial y}{\partial x}\big|_{x(p)}$. Notice this is the inverse of what we see
in (6.). This suggests that the partial derivatives change coordinates like a basis for the tangent
space. To complete this thought we need a few well-known propositions for derivations.
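
Item (5.) of Theorem 8.2.7 says the two Jacobian matrices of a coordinate change are inverses. A minimal sketch with 𝑥 taken to be Cartesian coordinates and 𝑦 taken to be polar coordinates on the plane follows; this choice of charts, the base point, and the function names are assumptions made for illustration.

```python
import math

def jac_cartesian_wrt_polar(r, theta):
    """Matrix [dx^k/dy^j]: rows k (Cartesian), columns j (polar)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def jac_polar_wrt_cartesian(px, py):
    """Matrix [dy^i/dx^k]: rows i (polar), columns k (Cartesian)."""
    r2 = px * px + py * py
    r = math.sqrt(r2)
    return [[px / r, py / r],
            [-py / r2, px / r2]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r, theta = 2.0, 0.7
px, py = r * math.cos(theta), r * math.sin(theta)

# Entry (i, j) is sum_k (dy^i/dx^k)(dx^k/dy^j), which should be delta_ij.
product = matmul(jac_polar_wrt_cartesian(px, py),
                 jac_cartesian_wrt_polar(r, theta))
```

The product is the identity matrix up to rounding, in agreement with item (5.).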

Proposition 8.2.8. a derivation of a constant function is zero.

If 𝑓 ∈ 𝐶 ∞ (𝑝) is a constant function and 𝑋𝑝 ∈ 𝒟𝑝 ℳ then 𝑋𝑝 (𝑓 ) = 0.

Proof: Suppose 𝑓 (𝑥) = 𝑐 for all 𝑥 ∈ 𝑉 , define 𝑔(𝑥) = 1 for all 𝑥 ∈ 𝑉 and note 𝑓 = 𝑓 𝑔 on 𝑉 . Since
𝑋𝑝 is a derivation it satisfies the Leibniz rule hence

𝑋𝑝 (𝑓 ) = 𝑋𝑝 (𝑓 𝑔) = 𝑓 (𝑝)𝑋𝑝 (𝑔) + 𝑋𝑝 (𝑓 )𝑔(𝑝) = 𝑐𝑋𝑝 (𝑔) + 𝑋𝑝 (𝑓 ) ⇒ 𝑐𝑋𝑝 (𝑔) = 0.

Moreover, by homogeneity of 𝑋𝑝 , note 𝑐𝑋𝑝 (𝑔) = 𝑋𝑝 (𝑐𝑔) = 𝑋𝑝 (𝑓 ). Thus, 𝑋𝑝 (𝑓 ) = 0. □

Proposition 8.2.9.

If 𝑓, 𝑔 ∈ 𝐶 ∞ (𝑝) and 𝑓 (𝑥) = 𝑔(𝑥) for all 𝑥 ∈ 𝑉 and 𝑋𝑝 ∈ 𝒟𝑝 ℳ then 𝑋𝑝 (𝑓 ) = 𝑋𝑝 (𝑔).

Proof: Note that 𝑓 (𝑥) = 𝑔(𝑥) implies ℎ(𝑥) = 𝑓 (𝑥) − 𝑔(𝑥) = 0 for all 𝑥 ∈ 𝑉 . Thus, the previous
proposition yields 𝑋𝑝 (ℎ) = 0. Thus, 𝑋𝑝 (𝑓 − 𝑔) = 0 and by linearity 𝑋𝑝 (𝑓 ) − 𝑋𝑝 (𝑔) = 0. The
proposition follows. □

Proposition 8.2.10.

Suppose 𝑋𝑝 ∈ 𝒟𝑝 ℳ and 𝑥 is a chart defined near 𝑝, then

\[ X_p = \sum_{j=1}^{m} X_p(x^j)\,\frac{\partial}{\partial x^j}\Big|_p \]

Proof: this is a less trivial proposition. We need a standard lemma before we begin.

Lemma 8.2.11.

Let 𝑝 be a point in smooth manifold ℳ and let 𝑓 : ℳ → ℝ be a smooth function. If
𝑥 : 𝑉 → 𝑈 is a chart with 𝑝 ∈ 𝑉 and 𝑥(𝑝) = 0 then there exist smooth functions 𝑔𝑗 : ℳ → ℝ
whose values at 𝑝 satisfy $g_j(p) = \frac{\partial f}{\partial x^j}(p)$. In addition, for all 𝑞 near enough to 𝑝 we have

\[ f(q) = f(p) + \sum_{j=1}^{m} x^j(q)\,g_j(q) \]

Proof: follows from proving a similar identity on ℝ𝑚 then lifting to the manifold. I leave this as a
nontrivial exercise for the reader. This can be found in many texts, see Burns and Gidea page 29
for one source. ▽

Consider 𝑓 ∈ 𝐶 ∞ (𝑝), and use the lemma; we assume 𝑥(𝑝) = 0 and $g_j(p) = \frac{\partial f}{\partial x^j}(p)$:

\begin{align*}
X_p(f) &= X_p\Big( f(p) + \sum_{j=1}^{m} x^j g_j \Big) \\
&= X_p(f(p)) + \sum_{j=1}^{m} X_p\big(x^j g_j\big) \\
&= \sum_{j=1}^{m} \Big[ X_p(x^j)\,g_j(p) + x^j(p)\,X_p(g_j) \Big] \\
&= \sum_{j=1}^{m} X_p(x^j)\,\frac{\partial f}{\partial x^j}(p).
\end{align*}

Here 𝑋𝑝 (𝑓 (𝑝)) = 0 by Proposition 8.2.8 since 𝑓 (𝑝) is a constant, and 𝑥𝑗 (𝑝) = 0 by assumption.
The calculation above holds for arbitrary 𝑓 ∈ 𝐶 ∞ (𝑝) hence the proposition follows. □

We’ve answered the question posed earlier in this section. It is true that every derivation on a
manifold is simply a linear combination of partial derivatives. We can say more. The set of
derivations at 𝑝 naturally forms a vector space under the usual addition and scalar multiplication of
operators: if 𝑋𝑝 , 𝑌𝑝 ∈ 𝒟𝑝 ℳ then we define 𝑋𝑝 + 𝑌𝑝 by (𝑋𝑝 + 𝑌𝑝 )(𝑓 ) = 𝑋𝑝 (𝑓 ) + 𝑌𝑝 (𝑓 ) and 𝑐𝑋𝑝 by
(𝑐𝑋𝑝 )(𝑓 ) = 𝑐𝑋𝑝 (𝑓 ) for all 𝑓 ∈ 𝐶 ∞ (𝑝) and 𝑐 ∈ ℝ. It is easy to show 𝒟𝑝 ℳ is a vector space under
these operations. Moreover, the preceding proposition shows that $\mathcal{D}_p\mathcal{M} = \mathrm{span}\big\{ \frac{\partial}{\partial x^j}\big|_p \big\}_{j=1}^{m}$ hence
𝒟𝑝 ℳ is an 𝑚-dimensional vector space⁷.

Finally, let’s examine coordinate change for derivations. Given two coordinate charts 𝑥, 𝑦 at 𝑝 ∈ ℳ
we have two ways to write the derivation 𝑋𝑝 :

\[ X_p = \sum_{j=1}^{m} X_p(x^j)\,\frac{\partial}{\partial x^j}\Big|_p \qquad \text{or} \qquad X_p = \sum_{k=1}^{m} X_p(y^k)\,\frac{\partial}{\partial y^k}\Big|_p \]

⁷ technically, we should show the coordinate derivations $\frac{\partial}{\partial x^j}\big|_p$ are linearly independent to make this conclusion. I
don’t suppose we’ve done that directly at this juncture. You might find this as a homework.

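It is simple to connect these formulas. Apply the 𝑥-coordinate expression for 𝑋𝑝 to the coordinate
function 𝑦𝑘 to obtain

\[ X_p(y^k) = \sum_{j=1}^{m} X_p(x^j)\,\frac{\partial y^k}{\partial x^j}\Big|_p \qquad (8.2) \]

This is the contravariant transformation rule. In contrast, recall $\frac{\partial}{\partial y^j}\big|_p = \sum_{k=1}^{m} \frac{\partial x^k}{\partial y^j}\big|_p \frac{\partial}{\partial x^k}\big|_p$. We
should have anticipated this pattern since from the outset it is clear there is no coordinate
dependence in the definition of a derivation.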

8.2.4 dictionary between formalisms


We have three competing views of how to characterize a tangent vector.

1. 𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ = {𝛾˜ ∣ 𝛾 smooth and 𝛾(0) = 𝑝}

2. 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ = {(𝑝, 𝑣) ∣ 𝑣 ∈ ℝ𝑚 }

3. 𝑑𝑒𝑟𝑇𝑝 ℳ = 𝒟𝑝 ℳ

Perhaps it is not terribly obvious how to get a derivation from an equivalence class of curves.
Suppose 𝛾˜ is a tangent vector to ℳ at 𝑝 and let 𝑓, 𝑔 ∈ 𝐶 ∞ (𝑝). Define an operator 𝑉𝑝 associated to
𝛾˜ via 𝑉𝑝 (𝑓 ) = (𝑓 ∘ 𝛾)′ (0). Consider, ((𝑓 +𝑐𝑔) ∘ 𝛾)(𝑡) = (𝑓 +𝑐𝑔)(𝛾(𝑡)) = 𝑓 (𝛾(𝑡))+𝑐𝑔(𝛾(𝑡)); differentiate
and set 𝑡 = 0 to verify that 𝑉𝑝 (𝑓 +𝑐𝑔) = ((𝑓 +𝑐𝑔) ∘ 𝛾)′ (0) = 𝑉𝑝 (𝑓 )+𝑐𝑉𝑝 (𝑔). Furthermore, observe
that ((𝑓 𝑔) ∘ 𝛾)(𝑡) = 𝑓 (𝛾(𝑡))𝑔(𝛾(𝑡)) therefore by the product rule from calculus I,

((𝑓 𝑔) ∘ 𝛾)′ (𝑡) = (𝑓 ∘ 𝛾)′ (𝑡)𝑔(𝛾(𝑡)) + 𝑓 (𝛾(𝑡))(𝑔 ∘ 𝛾)′ (𝑡)

hence, noting 𝛾(0) = 𝑝 we verify the Leibniz rule for 𝑉𝑝 ,

𝑉𝑝 (𝑓 𝑔) = ((𝑓 𝑔) ∘ 𝛾)′ (0) = (𝑓 ∘ 𝛾)′ (0)𝑔(𝑝) + 𝑓 (𝑝)(𝑔 ∘ 𝛾)′ (0) = 𝑉𝑝 (𝑓 )𝑔(𝑝) + 𝑓 (𝑝)𝑉𝑝 (𝑔)

In view of these calculations we find that Ξ : 𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ → 𝑑𝑒𝑟𝑇𝑝 ℳ defined by Ξ(𝛾˜) = 𝑉𝑝 maps into
the derivations at 𝑝. It is well-defined since, by the chain rule, (𝑓 ∘ 𝛾)′ (0) = (𝑓 ∘ 𝜙)′ (𝜙−1 (𝑝))(𝜙−1 ∘ 𝛾)′ (0)
depends only on the equivalence class of 𝛾. Moreover, we can show Ξ is an isomorphism. To be clear, we define:

Ξ(𝛾˜)(𝑓 ) = 𝑉𝑝 (𝑓 ) = (𝑓 ∘ 𝛾)′ (0).

I’ll begin with injectivity. Suppose Ξ(𝛾˜) = Ξ(𝛽˜) then for all 𝑓 ∈ 𝐶 ∞ (𝑝) we have Ξ(𝛾˜)(𝑓 ) = Ξ(𝛽˜)(𝑓 )
hence (𝑓 ∘ 𝛾)′ (0) = (𝑓 ∘ 𝛽)′ (0) for all smooth functions 𝑓 at 𝑝. Take 𝑓 = 𝑥𝑗 for each component of a
chart 𝑥 : 𝑉 → 𝑈 and it follows that (𝑥 ∘ 𝛾)′ (0) = (𝑥 ∘ 𝛽)′ (0), which is to say
𝛾 ∼𝑝 𝛽, hence 𝛾˜ = 𝛽˜ and we have shown Ξ is injective. Linearity of Ξ must be judged on
the basis of our definition for the addition of equivalence classes of curves. I leave linearity and
surjectivity to the reader. Once those are established it follows that Ξ is an isomorphism and
𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ ≈ 𝑑𝑒𝑟𝑇𝑝 ℳ.

The isomorphism between 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ and 𝑑𝑒𝑟𝑇𝑝 ℳ was nearly given in the previous subsection.
Essentially we can just paste the components from 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ onto the partial derivative basis for
derivations. Define Υ : 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ → 𝑑𝑒𝑟𝑇𝑝 ℳ for each (𝑝, 𝑣𝑥 ) ∈ 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ, relative to coordinates 𝑥
at 𝑝 ∈ ℳ,

\[ \Upsilon\Big(p, \sum_{k=1}^{m} v_x^k e_k\Big) = \sum_{k=1}^{m} v_x^k\,\frac{\partial}{\partial x^k}\Big|_p \]

Note that if we used a different chart 𝑦 then (𝑝, 𝑣𝑥 ) → (𝑝, 𝑣𝑦 ) and consequently

\[ \Upsilon\Big(p, \sum_{k=1}^{m} v_y^k e_k\Big) = \sum_{k=1}^{m} v_y^k\,\frac{\partial}{\partial y^k}\Big|_p = \sum_{k=1}^{m} v_x^k\,\frac{\partial}{\partial x^k}\Big|_p. \]

Thus Υ is single-valued on each equivalence class of vectors. Furthermore, the inverse mapping is
simple to write: for a chart 𝑥 at 𝑝,

\[ \Upsilon^{-1}(X_p) = \Big(p, \sum_{k=1}^{m} X_p(x^k)\,e_k\Big) \]

and the value of the mapping above is related contravariantly if we were to use a different chart 𝑦

\[ \Upsilon^{-1}(X_p) = \Big(p, \sum_{k=1}^{m} X_p(y^k)\,e_k\Big). \]

See Equation 8.2 and the surrounding discussion if you forgot. It is not hard to verify that Υ
is bijective and linear thus Υ is an isomorphism. We have shown 𝑣𝑒𝑐𝑡𝑇𝑝 ℳ ≈ 𝑑𝑒𝑟𝑇𝑝 ℳ. Let us
summarize:

𝑣𝑒𝑐𝑡𝑇𝑝 ℳ ≈ 𝑑𝑒𝑟𝑇𝑝 ℳ ≈ 𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ

Sorry to be anticlimactic here, but we choose the following for future use:

Definition 8.2.12. tangent space

We denote 𝑇𝑝 ℳ = 𝑑𝑒𝑟𝑇𝑝 ℳ.

8.3 the differential


In this section we generalize the concept of the differential to the context of manifolds. Recall that
for 𝐹 : 𝑈 ⊆ ℝ𝑚 → 𝑉 ⊆ ℝ𝑛 the differential 𝑑𝑝 𝐹 : ℝ𝑚 → ℝ𝑛 was a linear transformation which best
approximated the change in 𝐹 near 𝑝. Notice that while the domain of 𝐹 could be a mere subset
of ℝ𝑚 the differential always took all of ℝ𝑚 as its domain. This suggests we should really think
of the differential as a mapping which transports tangent vectors at 𝑝 ∈ 𝑈 to tangent vectors at 𝐹 (𝑝) ∈ 𝑉 .
Often 𝑑𝑝 𝑓 is called the push-forward by 𝑓 at 𝑝 because it pushes tangent vectors along with the
mapping.
Definition 8.3.1. differential for manifolds.
Suppose ℳ and 𝒩 are smooth manifolds of dimension 𝑚 and 𝑛 respectively. Furthermore,
suppose 𝑓 : ℳ → 𝒩 is a smooth mapping. We define 𝑑𝑝 𝑓 : 𝑇𝑝 ℳ → 𝑇𝑓 (𝑝) 𝒩 as follows: for
each 𝑋𝑝 ∈ 𝑇𝑝 ℳ and 𝑔 ∈ 𝐶 ∞ (𝑓 (𝑝))

𝑑𝑝 𝑓 (𝑋𝑝 )(𝑔) = 𝑋𝑝 (𝑔 ∘ 𝑓 ).

Notice that 𝑔 : 𝑑𝑜𝑚(𝑔) ⊆ 𝒩 → ℝ and consequently 𝑔 ∘ 𝑓 : ℳ → 𝒩 → ℝ and it follows 𝑔 ∘ 𝑓 ∈
𝐶 ∞ (𝑝) and it is natural to find 𝑔 ∘ 𝑓 in the domain of 𝑋𝑝 . In addition, it is not hard to show
𝑑𝑝 𝑓 (𝑋𝑝 ) ∈ 𝒟𝑓 (𝑝) 𝒩 . Observe:
1. 𝑑𝑝 𝑓 (𝑋𝑝 )(𝑔 + ℎ) = 𝑋𝑝 ((𝑔 + ℎ) ∘ 𝑓 ) = 𝑋𝑝 (𝑔 ∘ 𝑓 + ℎ ∘ 𝑓 ) = 𝑋𝑝 (𝑔 ∘ 𝑓 ) + 𝑋𝑝 (ℎ ∘ 𝑓 ) = 𝑑𝑝 𝑓 (𝑋𝑝 )(𝑔) + 𝑑𝑝 𝑓 (𝑋𝑝 )(ℎ)
2. 𝑑𝑝 𝑓 (𝑋𝑝 )(𝑐𝑔) = 𝑋𝑝 ((𝑐𝑔) ∘ 𝑓 ) = 𝑋𝑝 (𝑐(𝑔 ∘ 𝑓 )) = 𝑐𝑋𝑝 (𝑔 ∘ 𝑓 ) = 𝑐𝑑𝑝 𝑓 (𝑋𝑝 )(𝑔)
The proof of the Leibniz rule is similar. Now that we have justified the definition let’s look at an
interesting application to the study of surfaces in ℝ3 .
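
Before turning to surfaces, the definition 𝑑𝑝 𝑓 (𝑋𝑝 )(𝑔) = 𝑋𝑝 (𝑔 ∘ 𝑓 ) can be compared with the familiar Jacobian picture in the case ℳ = 𝒩 = ℝ2 . In the sketch below 𝑋𝑝 is differentiation at 𝑝 along a vector 𝑣, and the push-forward is checked against differentiation at 𝑓 (𝑝) along 𝐽𝑣, where 𝐽 is the Jacobian of 𝑓 at 𝑝. The particular map 𝑓 , the test function 𝑔, and the central-difference derivative are assumptions chosen for illustration.

```python
import math

def directional(h, p, v, step=1e-6):
    """Derivation at p along v: estimate d/dt h(p + t v) at t = 0."""
    plus = h(p[0] + step * v[0], p[1] + step * v[1])
    minus = h(p[0] - step * v[0], p[1] - step * v[1])
    return (plus - minus) / (2 * step)

def f(a, b):
    """Sample smooth map from the plane to the plane."""
    return (a * a - b * b, 2 * a * b)

def jacobian_f(a, b):
    """Jacobian matrix of f, computed by hand."""
    return ((2 * a, -2 * b), (2 * b, 2 * a))

def pushforward(p, v):
    """d_p f(X_p) as an operator on test functions g, via X_p(g o f)."""
    return lambda g: directional(lambda a, b: g(*f(a, b)), p, v)

p, v = (1.0, 2.0), (0.5, -1.0)
J = jacobian_f(*p)
Jv = (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])

g = lambda u, w: u * w + math.sin(u)      # sample test function near f(p)
via_definition = pushforward(p, v)(g)     # X_p(g o f)
via_jacobian = directional(g, f(*p), Jv)  # derivative at f(p) along J v
```

The two numbers agree to within the accuracy of the difference quotients, which is the coordinate form of the push-forward.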

Suppose 𝑆 ⊂ ℝ3 is an embedded two-dimensional manifold. In particular suppose 𝑆 is a regular
surface which means that for each parametrization 𝜙 : 𝑈 → 𝑉 the normal vector field 𝑁 (𝑢, 𝑣) =
(𝜙𝑢 × 𝜙𝑣 )(𝑢, 𝑣) is a smooth non-vanishing vector field on 𝑆. Recall that the unit-sphere 𝑆² = {𝑥 ∈
ℝ3 ∣ ∣∣𝑥∣∣ = 1} is also a manifold, perhaps you showed this in a homework. In any event, the mapping
𝐺 : 𝑆 → 𝑆² defined by
\[ G(u,v) = \frac{\phi_u \times \phi_v}{\|\phi_u \times \phi_v\|} \]
provides a smooth mapping from the surface to the unit sphere. The change in 𝐺 measures how
the normal deflects as we move about the surface 𝑆. One natural scalar we can use to quantify
that curving of the normal is called the Gaussian curvature which is defined by 𝐾 = 𝑑𝑒𝑡(𝑑𝐺).
Likewise, we define 𝐻 = 𝑡𝑟𝑎𝑐𝑒(𝑑𝐺) which is the mean curvature of 𝑆. If 𝑘1 , 𝑘2 are the eigenvalues
of the operator 𝑑𝑝 𝐺 then it is a well-known result of linear algebra that 𝑑𝑒𝑡(𝑑𝑝 𝐺) = 𝑘1 𝑘2 and
𝑡𝑟𝑎𝑐𝑒(𝑑𝑝 𝐺) = 𝑘1 + 𝑘2 . The eigenvalues are called the principal curvatures. Moreover, it can be
shown that the matrix of 𝑑𝑝 𝐺 is symmetric and a theorem of linear algebra says that the eigenvalues
are real and we can select an orthogonal basis of eigenvectors for 𝑇𝑝 𝑆.

Example 8.3.2. Consider the plane 𝑆 with base point 𝑟𝑜 and containing the vectors \(\vec{A}, \vec{B}\), write

\[ \phi(u,v) = r_o + u\vec{A} + v\vec{B} \]

to place coordinates 𝑢, 𝑣 on the plane 𝑆. Calculate the Gauss map,

\[ G(u,v) = \frac{\phi_u \times \phi_v}{\|\phi_u \times \phi_v\|} = \frac{\vec{A} \times \vec{B}}{\|\vec{A} \times \vec{B}\|} \]

This is constant on 𝑆 hence 𝑑𝑝 𝐺 = 0 for each 𝑝 ∈ 𝑆. The curvatures (mean, Gaussian and
principal) are all zero for this case. Makes sense, a plane isn’t curved!

Let me outline how to calculate the curvature directly when 𝐺 is not trivial. Calculate,

\[ d_pG\Big(\frac{\partial}{\partial x^k}\Big)(y^j) = \frac{\partial}{\partial x^k}\big(y^j \circ G\big) = \frac{\partial (y^j \circ G)}{\partial x^k} \]

Thus, using the discussion of the preceding section,


\[ d_pG\Big(\frac{\partial}{\partial x^k}\Big) = \sum_{j=1}^{2} \frac{\partial (y^j \circ G)}{\partial x^k}\,\frac{\partial}{\partial y^j} \]

Therefore, the matrix of 𝑑𝑝 𝐺 is the 2 × 2 matrix \(\big[\partial (y^j \circ G)/\partial x^k\big]\) with respect to the choice of
coordinates 𝑥1 , 𝑥2 on 𝑆 and 𝑦 1 , 𝑦 2 on the sphere.

Example 8.3.3. Suppose \(\phi(u,v) = \big(u, v, \sqrt{R^2 - u^2 - v^2}\big)\) parameterizes part of a sphere 𝑆𝑅 of
radius 𝑅 > 0. You can calculate the Gauss map and the result should be geometrically obvious:

\[ G(u,v) = \frac{1}{R}\Big( u, v, \sqrt{R^2 - u^2 - v^2} \Big) \]

Then the 𝑢 and 𝑣 components of 𝐺(𝑢, 𝑣) are simply 𝑢/𝑅 and 𝑣/𝑅 respectively. Calculate,

\[ [d_pG] = \begin{bmatrix} \frac{\partial}{\partial u}\big[\frac{u}{R}\big] & \frac{\partial}{\partial v}\big[\frac{u}{R}\big] \\ \frac{\partial}{\partial u}\big[\frac{v}{R}\big] & \frac{\partial}{\partial v}\big[\frac{v}{R}\big] \end{bmatrix} = \begin{bmatrix} \frac{1}{R} & 0 \\ 0 & \frac{1}{R} \end{bmatrix} \]

Thus the Gaussian curvature of the sphere is 𝐾 = 1/𝑅². The principal curvatures are 𝑘1 = 𝑘2 = 1/𝑅
and the mean curvature is simply 𝐻 = 2/𝑅. Notice that as 𝑅 → ∞ we find agreement with the
curvature of a plane.

Example 8.3.4. Suppose 𝑆 is a cylinder which is parametrized by 𝜙(𝑢, 𝑣) = (𝑅 cos 𝑢, 𝑅 sin 𝑢, 𝑣).
The Gauss map yields 𝐺(𝑢, 𝑣) = (cos 𝑢, sin 𝑢, 0). I leave the explicit details to the reader, but it can
be shown that 𝑘1 = 1/𝑅, 𝑘2 = 0 and hence 𝐾 = 0 whereas 𝐻 = 1/𝑅.
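
The details left to the reader in Example 8.3.4 can be sketched as follows. This is only one way to
organize the computation: it assumes we express 𝑑𝑝 𝐺 relative to the basis 𝜙𝑢 , 𝜙𝑣 of 𝑇𝑝 𝑆, which is
reasonable since the tangent plane to the sphere at 𝐺(𝑝) is parallel to 𝑇𝑝 𝑆 (both are orthogonal to 𝐺(𝑝)).

```latex
\begin{align*}
\phi_u &= (-R\sin u,\; R\cos u,\; 0), \qquad \phi_v = (0,\,0,\,1), \\
\phi_u \times \phi_v &= (R\cos u,\; R\sin u,\; 0), \qquad \|\phi_u \times \phi_v\| = R, \\
G(u,v) &= (\cos u,\; \sin u,\; 0), \\
G_u &= (-\sin u,\; \cos u,\; 0) = \tfrac{1}{R}\,\phi_u, \qquad G_v = (0,0,0) = 0\cdot\phi_v, \\
[d_pG] &= \begin{bmatrix} 1/R & 0 \\ 0 & 0 \end{bmatrix}
\quad\Longrightarrow\quad k_1 = \tfrac{1}{R},\;\; k_2 = 0,\;\; K = k_1 k_2 = 0,\;\; H = k_1 + k_2 = \tfrac{1}{R}.
\end{align*}
```

The normal only turns as we move around the circular direction 𝑢; it does not change along the axis 𝑣,
which is why one principal curvature vanishes.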

The differential is actually easier to frame in the equivalence class curve formulation of 𝑇𝑝 ℳ. In
particular, let us write 𝛾˜ = [𝛾] as a more convenient notation for what follows. In addition, suppose
𝐹 : ℳ → 𝒩 is a smooth function and [𝛾] ∈ 𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ then we define 𝑑𝑝 𝐹 : 𝑐𝑢𝑟𝑣𝑒𝑇𝑝 ℳ →
𝑐𝑢𝑟𝑣𝑒𝑇𝐹 (𝑝) 𝒩 as follows:
𝑑𝑝 𝐹 ([𝛾]) = [𝐹 ∘ 𝛾]
There is a chain-rule for differentials. It’s the natural rule you’d expect. If 𝐹 : ℳ → 𝒩 and
𝐺 : 𝒩 → 𝒫 then, denoting 𝑞 = 𝐹 (𝑝),

𝑑𝑝 (𝐺 ∘ 𝐹 ) = 𝑑𝑞 𝐺 ∘ 𝑑𝑝 𝐹.

The proof is simple in the curve notation:

\[ \big(d_qG \circ d_pF\big)([\gamma]) = d_qG\big( d_pF([\gamma]) \big) = d_qG([F \circ \gamma]) = [G \circ (F \circ \gamma)] = [(G \circ F) \circ \gamma] = d_p(G \circ F)([\gamma]). \]

You can see why the curve formulation of tangent vectors is useful. It does simplify certain questions.
That said, we will insist 𝑇𝑝 ℳ = 𝒟𝑝 ℳ in the sequel.

The push-forward need not be an abstract8 exercise.


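Example 8.3.5. Suppose \(F : \mathbb{R}^2_{r,\theta} \to \mathbb{R}^2_{x,y}\) is the polar coordinate transformation. In particular,

\[ F(r,\theta) = (r\cos\theta,\; r\sin\theta) \]

Let’s examine where 𝐹 pushes \(\frac{\partial}{\partial r}\big|_p\) and \(\frac{\partial}{\partial \theta}\big|_p\). We use (𝑥, 𝑦) as coordinates in the codomain and
the problem is to calculate, for 𝑋𝑝 = ∂𝑟 ∣𝑝 or 𝑋𝑝 = ∂𝜃 ∣𝑝 , the values of 𝑑𝑝 𝐹 (𝑋𝑝 )(𝑥) and 𝑑𝑝 𝐹 (𝑋𝑝 )(𝑦)
as we know 𝑑𝑝 𝐹 (𝑋𝑝 ) = 𝑑𝑝 𝐹 (𝑋𝑝 )(𝑥)∂𝑥 ∣𝑞 + 𝑑𝑝 𝐹 (𝑋𝑝 )(𝑦)∂𝑦 ∣𝑞 where 𝑞 = 𝐹 (𝑝).

\begin{align*}
d_pF(\partial_r|_p) &= \partial_r|_p(x \circ F)\,\partial_x|_q + \partial_r|_p(y \circ F)\,\partial_y|_q \\
&= \partial_r|_p(r\cos\theta)\,\partial_x|_q + \partial_r|_p(r\sin\theta)\,\partial_y|_q \\
&= \cos\theta\,\partial_x|_q + \sin\theta\,\partial_y|_q
\end{align*}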

A similar calculation follows for 𝑑𝑝 𝐹 (∂𝜃 ∣𝑝 ). However, let me do the calculation in the traditional
notation from multivariate calculus. If we denote 𝐹 = (𝑥, 𝑦) and drop the point dependence on the
partials we find the formula for ∂/∂𝜃 below:

\[ \frac{\partial}{\partial\theta} = \frac{\partial x}{\partial\theta}\frac{\partial}{\partial x} + \frac{\partial y}{\partial\theta}\frac{\partial}{\partial y} = -r\sin\theta\,\frac{\partial}{\partial x} + r\cos\theta\,\frac{\partial}{\partial y}. \]

Therefore, the push-forward is a tool which we can use to change coordinates for vectors. Given
the coordinate transformation on a manifold we just push the vector of interest presented in one
coordinate system to the other through the formulas above. In multivariate calculus we simply
thought of this as changing notation on a given problem. It would be good if you came to the same
understanding here.
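
For comparison, the same push-forward can be organized as a matrix. The following is a sketch in the
traditional multivariate notation, assuming we order the bases as (∂𝑟 , ∂𝜃 ) in the domain and (∂𝑥 , ∂𝑦 )
in the codomain; under that assumption the columns of the Jacobian are the coordinate vectors of the
pushed-forward basis vectors.

```latex
\[
[d_pF] =
\begin{bmatrix}
\partial x/\partial r & \partial x/\partial\theta \\
\partial y/\partial r & \partial y/\partial\theta
\end{bmatrix}
=
\begin{bmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{bmatrix}
\]
\[
d_pF(\partial_r) = \cos\theta\,\partial_x + \sin\theta\,\partial_y,
\qquad
d_pF(\partial_\theta) = -r\sin\theta\,\partial_x + r\cos\theta\,\partial_y
\]
\[
\det[d_pF] = r\cos^2\theta + r\sin^2\theta = r
\]
```

Since the determinant is 𝑟, the push-forward is an isomorphism of tangent spaces exactly at points with 𝑟 ≠ 0.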
8
it’s not my idea of abstract that is wrong... think about that. ⌣

8.4 cotangent space


The tangent space to a smooth manifold ℳ is a vector space of derivations and we denote it by
𝑇𝑝 ℳ. The dual space to this vector space is called the cotangent space and the typical elements
are called covectors.
Definition 8.4.1. cotangent space 𝑇𝑝 ℳ∗
Suppose ℳ is a smooth manifold and 𝑇𝑝 ℳ is the tangent space at 𝑝 ∈ ℳ. We define,
\[ T_p\mathcal{M}^* = \{ \alpha_p : T_p\mathcal{M} \to \mathbb{R} \mid \alpha_p \text{ is linear} \}. \]
If 𝑥 is a local coordinate chart at 𝑝 and \(\frac{\partial}{\partial x^1}\big|_p, \frac{\partial}{\partial x^2}\big|_p, \dots, \frac{\partial}{\partial x^m}\big|_p\) is a basis for 𝑇𝑝 ℳ then we denote
the dual basis 𝑑𝑝 𝑥1 , 𝑑𝑝 𝑥2 , . . . , 𝑑𝑝 𝑥𝑚 where \(d_px^i\big(\partial_k|_p\big) = \delta_{ik}\). Moreover, if 𝛼 is a covector at 𝑝 then⁹:

\[ \alpha = \sum_{k=1}^{m} \alpha_k\, dx^k \]

where \(\alpha_k = \alpha\big(\frac{\partial}{\partial x^k}\big|_p\big)\) and 𝑑𝑥𝑘 is a short-hand for 𝑑𝑝 𝑥𝑘 . We should understand that covectors
are defined at a point even if the point is not explicitly indicated in a particular context. This
does lead to some ambiguity in the same way that the careless identification of the function 𝑓 and
its value 𝑓 (𝑥) does throughout calculus. That said, an abbreviated notation is often important to
help us see through more difficult patterns without getting distracted by the minutiae of the problem.

You might worry the notation used for the differential and our current notation for the dual basis
of covectors is not consistent. After all, we have two rather different meanings for 𝑑𝑝 𝑥𝑘 at this time:
1. 𝑥𝑘 : 𝑉 → ℝ is a smooth function hence 𝑑𝑝 𝑥𝑘 : 𝑇𝑝 ℳ → 𝑇𝑥𝑘 (𝑝) ℝ
is defined as a push-forward, 𝑑𝑝 𝑥𝑘 (𝑋𝑝 )(𝑔) = 𝑋𝑝 (𝑔 ∘ 𝑥𝑘 )
2. 𝑑𝑝 𝑥𝑘 : 𝑇𝑝 ℳ → ℝ where \(d_px^k\big(\partial_j|_p\big) = \delta_{jk}\)
It is customary to identify 𝑇𝑥𝑘 (𝑝) ℝ with ℝ hence there is no trouble. Let us examine how the
dual-basis condition can be derived for the differential, suppose 𝑔 : ℝ → ℝ hence 𝑔 ∘ 𝑥𝑘 : 𝑉 → ℝ,
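\[ d_px^k\Big(\frac{\partial}{\partial x^j}\Big|_p\Big)(g) = \frac{\partial}{\partial x^j}\Big|_p\big(g \circ x^k\big) = \underbrace{\frac{dg}{dt}\Big|_{x^k(p)}\,\frac{\partial x^k}{\partial x^j}\Big|_p}_{\text{chain rule}} = \delta_{jk}\,\frac{dg}{dt}\Big|_{x^k(p)} = \delta_{jk}\,\frac{d}{dt}\Big|_{x^k(p)}(g) \]

Here we’ve made the identification \(1 = \frac{d}{dt}\big|_{x^k(p)}\) (which is the nuts and bolts of writing 𝑇𝑥𝑘 (𝑝) ℝ = ℝ)
and hence have the beautiful identity:

\[ d_px^k\Big(\frac{\partial}{\partial x^j}\Big|_p\Big) = \delta_{jk}. \]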
9
we explained this for an arbitrary vector space 𝑉 and its dual 𝑉 ∗ in a previous chapter, we simply apply those
results once more here in the particular context 𝑉 = 𝑇𝑝 ℳ

In contrast, there is no need to derive this for case (2.) since in that context this serves as the
definition for the object. Personally, I find the multiple interpretations of objects in manifold theory
to be one of the most difficult aspects of the theory. On the other hand, the notation is really neat
once you understand how subtly it assumes many theorems. You should understand the notation
we enjoy at this time is the result of generations of mathematical thought. Following a similar
derivation for an arbitrary vector 𝑋𝑝 ∈ 𝑇𝑝 ℳ and 𝑓 : ℳ → ℝ we find

𝑑𝑝 𝑓 (𝑋𝑝 ) = 𝑋𝑝 (𝑓 )

This notation is completely consistent with the total differential as commonly discussed in multi-
variate calculus. Recall that if 𝑓 : ℝ𝑚 → ℝ then we defined
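\[ df = \frac{\partial f}{\partial x^1}\,dx^1 + \frac{\partial f}{\partial x^2}\,dx^2 + \cdots + \frac{\partial f}{\partial x^m}\,dx^m. \]
Notice that the 𝑗-th component of 𝑑𝑓 is simply \(\frac{\partial f}{\partial x^j}\). Notice that the identity 𝑑𝑝 𝑓 (𝑋𝑝 ) = 𝑋𝑝 (𝑓 )
gives us the same component if we simply evaluate the covector 𝑑𝑝 𝑓 on the coordinate basis \(\frac{\partial}{\partial x^j}\big|_p\),

\[ d_pf\Big(\frac{\partial}{\partial x^j}\Big|_p\Big) = \frac{\partial f}{\partial x^j}\Big|_p \]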

8.5 tensors at a point


Given a smooth 𝑚-dimensional manifold ℳ and a point 𝑝 ∈ ℳ we have a tangent space 𝑇𝑝 ℳ and
a cotangent space 𝑇𝑝 ℳ∗ . The set of tensors at 𝑝 ∈ ℳ is simply the set of all multilinear mappings
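on the tangent and cotangent space at 𝑝. We again define the set of all type (𝑟, 𝑠) tensors to be
\(T^r_s\mathcal{M}_p\), meaning \(L \in T^r_s\mathcal{M}_p\) iff 𝐿 is a multilinear mapping of the form

\[ L : \underbrace{T_p\mathcal{M} \times \cdots \times T_p\mathcal{M}}_{r\ \text{copies}} \times \underbrace{T_p\mathcal{M}^* \times \cdots \times T_p\mathcal{M}^*}_{s\ \text{copies}} \to \mathbb{R}. \]

Relative to a particular coordinate chart 𝑥 at 𝑝 we can build a basis for \(T^r_s\mathcal{M}_p\) via the tensor
product. In particular, for each \(L \in T^r_s\mathcal{M}_p\) there exist constants \(L^{j_1 j_2 \dots j_s}_{i_1 i_2 \dots i_r}\) such that

\[ L_p = \sum_{i_1,\dots,i_r,j_1,\dots,j_s=1}^{m} L^{j_1 j_2 \dots j_s}_{i_1 i_2 \dots i_r}(p)\; d_px^{i_1} \otimes \cdots \otimes d_px^{i_r} \otimes \frac{\partial}{\partial x^{j_1}}\Big|_p \otimes \cdots \otimes \frac{\partial}{\partial x^{j_s}}\Big|_p. \]

The components can be calculated by contraction with the appropriate vectors and covectors:

\[ L^{j_1 j_2 \dots j_s}_{i_1 i_2 \dots i_r}(p) = L\Big( \frac{\partial}{\partial x^{i_1}}\Big|_p, \dots, \frac{\partial}{\partial x^{i_r}}\Big|_p,\; d_px^{j_1}, \dots, d_px^{j_s} \Big). \]

We can summarize the equations above with multi-index notation:

\[ d_px^I = d_px^{i_1} \otimes d_px^{i_2} \otimes \cdots \otimes d_px^{i_r} \quad\text{and}\quad \frac{\partial}{\partial x^J}\Big|_p = \frac{\partial}{\partial x^{j_1}}\Big|_p \otimes \cdots \otimes \frac{\partial}{\partial x^{j_s}}\Big|_p \]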

Consequently, \(L_p = \sum_{I,J} L^J_I(p)\, d_px^I \otimes \frac{\partial}{\partial x^J}\big|_p\). We may also construct wedge products and build the
exterior algebra as we did for an arbitrary vector space. Given a metric \(g_p \in T^2_0\mathcal{M}_p\) we can calculate
Hodge duals in Λℳ𝑝 . All these constructions are possible at each point in a smooth manifold¹⁰.

8.6 tensor fields


Since the tangent and cotangent space are defined at each point in a smooth manifold we can
construct the tangent bundle and cotangent bundle by simply taking the union of all the tangent
or cotangent spaces:

Definition 8.6.1. tangent and cotangent bundles.

Suppose ℳ is a smooth manifold then we define the tangent bundle 𝑇 ℳ and the cotangent
bundle 𝑇 ℳ∗ as follows:

𝑇 ℳ = ∪𝑝∈ℳ 𝑇𝑝 ℳ and 𝑇 ℳ∗ = ∪𝑝∈ℳ 𝑇𝑝 ℳ∗ .

The canonical projections 𝜋, 𝜋˜ tell us where a particular vector or covector is found on the
manifold:
𝜋(𝑋𝑝 ) = 𝑝 and 𝜋˜ (𝛼𝑝 ) = 𝑝

I usually picture this construction as follows:

XXX– add projection pictures

Notice the fibers of 𝜋 and 𝜋˜ are 𝜋 −1 (𝑝) = 𝑇𝑝 ℳ and 𝜋˜ −1 (𝑝) = 𝑇𝑝 ℳ∗ . Generally a fiber bundle
(𝐸, ℳ, 𝜋) consists of a base manifold ℳ, a bundle space 𝐸 and a projection
𝜋 : 𝐸 → ℳ. A local section of 𝐸 is a mapping 𝑠 : 𝑉 ⊆ ℳ → 𝐸 such that 𝜋 ∘ 𝑠 = 𝐼𝑑𝑉 .
In other words, the image of a section hits each fiber over its domain just once. A section selects
a particular element of each fiber. Here’s an abstract picture of a section; I sometimes think of the
section as its image although technically the section is actually a mapping:

XXX– add section picture–


Given the language above we find a natural language to define vector and covector-fields on a
manifold. However, for reasons that become clear later, we call a covector-field a differential
one-form.

Definition 8.6.2. tensor fields.


10
I assume ℳ is Hausdorff and has a countable basis, see Burns and Gidea Theorem 3.2.5 on page 116.

Let 𝑉 ⊆ ℳ, we define:

1. 𝑋 is a vector field on 𝑉 iff 𝑋 is a section of 𝑇 ℳ on 𝑉

2. 𝛼 is a differential one-form on 𝑉 iff 𝛼 is a section of 𝑇 ℳ∗ on 𝑉 .

3. 𝐿 is a type (𝑟, 𝑠) tensor-field on 𝑉 iff 𝐿 is a section of 𝑇𝑠𝑟 ℳ on 𝑉 .

We consider only smooth sections and it turns out this is equivalent11 to the demand that the
component functions of the fields above are smooth on 𝑉 .

8.7 metric tensor


I’ll begin by discussing briefly the informal concept of a metric. The calculations given in the first
part of this section show you how to think for nice examples that are embedded in ℝ𝑚 . In such
cases the metric can be deduced by setting appropriate terms for the metric on ℝ𝑚 to zero. The
metric is then used to set up arclength integrals over a curved space, see my Chapter on Variational
Calculus from the previous notes if you want examples.

In the second part of this chapter I give the careful definition which applies to an arbitrary manifold.
I include this whole section mostly for informational purposes. Our main thrust in this course is
with the calculus of differential forms and the metric is actually, ignoring the task of hodge duals,
not on the center stage. That said, any student of differential geometry will be interested in the
metric. The problem of parallel transport12 , and the definition and calculation of geodesics13 are
fascinating problems beyond this course.

8.7.1 classical metric notation in ℝ𝑚


Definition 8.7.1.
The Euclidean metric is 𝑑𝑠² = 𝑑𝑥² + 𝑑𝑦² + 𝑑𝑧². Generally, for orthogonal curvilinear
coordinates 𝑢, 𝑣, 𝑤 we calculate
\[ ds^2 = \frac{1}{\|\nabla u\|^2}\,du^2 + \frac{1}{\|\nabla v\|^2}\,dv^2 + \frac{1}{\|\nabla w\|^2}\,dw^2. \]

The beauty of the metric is that it allows us to calculate in other coordinates, consider
𝑥 = 𝑟 cos(𝜃) 𝑦 = 𝑟 sin(𝜃)
For which we have implicit inverse coordinate transformations 𝑟2 = 𝑥2 + 𝑦 2 and 𝜃 = tan−1 (𝑦/𝑥).
From these inverse formulas we calculate:
∇𝑟 = < 𝑥/𝑟, 𝑦/𝑟 > ∇𝜃 = < −𝑦/𝑟2 , 𝑥/𝑟2 >
11
all the bundles above are themselves manifolds, for example 𝑇 ℳ is a 2𝑚-dimensional manifold, and as such the
term smooth has already been defined. I do not intend to delve into that aspect of the theory here. See any text on
manifold theory for details.
12
how to move vectors around in a curved manifold
13
curve of shortest distance on a curved space, basically they are the lines on a manifold

Thus, ∣∣∇𝑟∣∣ = 1 whereas ∣∣∇𝜃∣∣ = 1/𝑟. We find that the metric in polar coordinates takes the form:

𝑑𝑠2 = 𝑑𝑟2 + 𝑟2 𝑑𝜃2
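
As a cross-check, the same polar metric can be recovered by direct substitution into the Euclidean
metric of the plane. This is a sketch in the informal classical notation, treating 𝑑𝑥, 𝑑𝑦 as total
differentials of 𝑥 = 𝑟 cos 𝜃 and 𝑦 = 𝑟 sin 𝜃 and multiplying them out as commuting symbols.

```latex
\begin{align*}
dx &= \cos\theta\,dr - r\sin\theta\,d\theta, \\
dy &= \sin\theta\,dr + r\cos\theta\,d\theta, \\
dx^2 + dy^2 &= (\cos^2\theta + \sin^2\theta)\,dr^2
  + 2r(-\sin\theta\cos\theta + \sin\theta\cos\theta)\,dr\,d\theta
  + r^2(\sin^2\theta + \cos^2\theta)\,d\theta^2 \\
 &= dr^2 + r^2\,d\theta^2.
\end{align*}
```

The cross term cancels precisely because the polar coordinate directions are orthogonal.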

Physicists and engineers tend to like to think of these as arising from calculating the length of
infinitesimal displacements in the 𝑟 or 𝜃 directions. Generically, for 𝑢, 𝑣, 𝑤 coordinates
\[ dl_u = \frac{1}{\|\nabla u\|}\,du \qquad dl_v = \frac{1}{\|\nabla v\|}\,dv \qquad dl_w = \frac{1}{\|\nabla w\|}\,dw \]

and \(ds^2 = dl_u^2 + dl_v^2 + dl_w^2\). So in that notation we just found 𝑑𝑙𝑟 = 𝑑𝑟 and 𝑑𝑙𝜃 = 𝑟𝑑𝜃. Notice then
that cylindrical coordinates have the metric,

𝑑𝑠2 = 𝑑𝑟2 + 𝑟2 𝑑𝜃2 + 𝑑𝑧 2 .

For spherical coordinates 𝑥 = 𝑟 cos(𝜙) sin(𝜃), 𝑦 = 𝑟 sin(𝜙) sin(𝜃) and 𝑧 = 𝑟 cos(𝜃) (here 0 ≤ 𝜙 ≤ 2𝜋
and 0 ≤ 𝜃 ≤ 𝜋, physics notation). Calculation of the metric follows from the line elements,

𝑑𝑙𝑟 = 𝑑𝑟 𝑑𝑙𝜙 = 𝑟 sin(𝜃)𝑑𝜙 𝑑𝑙𝜃 = 𝑟𝑑𝜃

Thus,
𝑑𝑠2 = 𝑑𝑟2 + 𝑟2 sin2 (𝜃)𝑑𝜙2 + 𝑟2 𝑑𝜃2 .
We now have all the tools we need for examples in spherical or cylindrical coordinates. What about
other cases? In general, given some 𝑝-manifold embedded in ℝ𝑛 how does one find the metric on
that manifold? If we are to follow the approach of this section we’ll need to find coordinates on
ℝ𝑛 such that the manifold 𝑆 is described by setting all but 𝑝 of the coordinates to a constant.
For example, in ℝ4 we have generalized cylindrical coordinates (𝑟, 𝜙, 𝑧, 𝑡) defined implicitly by the
equations below
𝑥 = 𝑟 cos(𝜙), 𝑦 = 𝑟 sin(𝜙), 𝑧 = 𝑧, 𝑡 = 𝑡
On the hyper-cylinder 𝑟 = 𝑅 we have the metric 𝑑𝑠² = 𝑅² 𝑑𝜙² + 𝑑𝑧² + 𝑑𝑡². There are
mathematicians/physicists whose careers are founded upon the discovery of a metric for some manifold. This
is generally a difficult task.

8.7.2 metric tensor on a smooth manifold


A metric on a smooth manifold ℳ is a type (2, 0) tensor field on ℳ which is at each point 𝑝 a
metric on 𝑇𝑝 ℳ. In particular, 𝑔 is a metric iff 𝑔 makes the assignment 𝑝 → 𝑔𝑝 for each 𝑝 ∈ ℳ where
the mapping 𝑔𝑝 : 𝑇𝑝 ℳ × 𝑇𝑝 ℳ → ℝ is a metric. Recall this means 𝑔𝑝 is a symmetric, nondegenerate
bilinear form on 𝑇𝑝 ℳ. Relative to a particular coordinate system 𝑥 at 𝑝 we write

\[ g = \sum_{i,j} g_{ij}\, dx^i \otimes dx^j \]

In this context 𝑔𝑖𝑗 : 𝑉 → ℝ are assumed to be smooth functions, the values may vary from point to
point in 𝑉 . Furthermore, we know that 𝑔𝑖𝑗 = 𝑔𝑗𝑖 for all 𝑖, 𝑗 ∈ ℕ𝑚 and the matrix [𝑔𝑖𝑗 ] is invertible
by the nondegeneracy of 𝑔. Recall we use the notation \(g^{ij}\) for components of the inverse matrix, in
particular we suppose that \(\sum_{k=1}^{m} g_{ik}\, g^{kj} = \delta_{ij}\).

Recall that according to Sylvester’s theorem we can choose coordinates at some point 𝑝 which
will diagonalize the metric and leave 𝑑𝑖𝑎𝑔(𝑔𝑖𝑗 ) = {−1, −1, . . . , −1, 1, 1, . . . , 1}. In other words, we
can orthogonalize the coordinate basis at a particular point 𝑝. The interesting feature of a curved
manifold ℳ is that as we travel away from the point where we straightened the coordinates it is
generally the case the components of the metric will not stay diagonal and constant over the whole
coordinate chart. If it is possible to choose coordinates centered on 𝑉 such that the coordinates are
constantly orthogonal with respect to the metric over 𝑉 then the manifold ℳ is said to be flat on 𝑉 .
Examples of flat manifolds include ℝ𝑚 , cylinders and even cones without their point. A manifold
is said to be curved if it is not flat. The definition I gave just now is probably not one you’ll find
in a mathematics text14 . Instead, the curvature of a manifold is quantified through various tensors
which are derived from the metric and its derivatives. In particular, the Ricci and Riemann tensors
are used to carefully characterize the geometry of a manifold. It is very tempting to say more
about the general theory of curvature, but I will resist. If you would like to do further study I can
recommend a few books. We will consider some geometry of embedded two-dimensional manifolds
in ℝ3 . That particular case was studied in the 19-th century by Gauss and others and some of the
notation below goes back to that time.
Example 8.7.2. Consider a regular surface 𝑆 which has a global parametrization 𝜙 : 𝑈 ⊆ ℝ2 →
𝑆 ⊆ ℝ3 . In the usual notation in ℝ3 ,
𝜙(𝑢, 𝑣) = (𝑥(𝑢, 𝑣), 𝑦(𝑢, 𝑣), 𝑧(𝑢, 𝑣))
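Consider a curve 𝛾 : [0, 1] → 𝑆; we can calculate the arclength of 𝛾 via the usual calculation in ℝ3 .
The magnitude of velocity 𝛾 ′ (𝑡) is ∣∣𝛾 ′ (𝑡)∣∣ and naturally this gives us 𝑑𝑠/𝑑𝑡 hence 𝑑𝑠 = ∣∣𝛾 ′ (𝑡)∣∣𝑑𝑡 and
the following integral calculates the length of 𝛾,
\[ s_\gamma = \int_0^1 \|\gamma'(t)\|\,dt \]

Since 𝛾[0, 1] ⊂ 𝑆 it follows there must exist some two-dimensional curve 𝑡 → (𝑢(𝑡), 𝑣(𝑡)) for which
𝛾(𝑡) = 𝜙(𝑢(𝑡), 𝑣(𝑡)). Observe by the chain rule that
\[ \gamma'(t) = \Big( \frac{\partial x}{\partial u}\frac{du}{dt} + \frac{\partial x}{\partial v}\frac{dv}{dt},\; \frac{\partial y}{\partial u}\frac{du}{dt} + \frac{\partial y}{\partial v}\frac{dv}{dt},\; \frac{\partial z}{\partial u}\frac{du}{dt} + \frac{\partial z}{\partial v}\frac{dv}{dt} \Big) \]
We can calculate the square of the speed in view of the formula above, let \(\frac{du}{dt} = \dot{u}\) and \(\frac{dv}{dt} = \dot{v}\),
\[ \begin{aligned}
\|\gamma'(t)\|^2 ={} & x_u^2\,\dot{u}^2 + 2x_u x_v\,\dot{u}\dot{v} + x_v^2\,\dot{v}^2 \\
 & + y_u^2\,\dot{u}^2 + 2y_u y_v\,\dot{u}\dot{v} + y_v^2\,\dot{v}^2 \\
 & + z_u^2\,\dot{u}^2 + 2z_u z_v\,\dot{u}\dot{v} + z_v^2\,\dot{v}^2
\end{aligned} \tag{8.3} \]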
14
this was the definition given in a general relativity course I took with the physicist Martin Rocek of SUNY Stony
Brook. He then introduced non-coordinate form-fields which kept the metric constant. I may find a way to show you
some of those calculations at the end of this course.

Collecting together terms which share either \(\dot{u}^2\), \(\dot{u}\dot{v}\) or \(\dot{v}^2\) and noting that \(x_u^2 + y_u^2 + z_u^2 = \phi_u \cdot \phi_u\),
\(x_u x_v + y_u y_v + z_u z_v = \phi_u \cdot \phi_v\) and \(x_v^2 + y_v^2 + z_v^2 = \phi_v \cdot \phi_v\) we obtain:

\[ \|\gamma'(t)\|^2 = (\phi_u \cdot \phi_u)\,\dot{u}^2 + 2(\phi_u \cdot \phi_v)\,\dot{u}\dot{v} + (\phi_v \cdot \phi_v)\,\dot{v}^2 \]

Or, in the notation of Gauss, 𝜙𝑢 ⋅ 𝜙𝑢 = 𝐸, 𝜙𝑢 ⋅ 𝜙𝑣 = 𝐹 and 𝜙𝑣 ⋅ 𝜙𝑣 = 𝐺 hence the arclength on 𝑆 is
given by
\[ s_\gamma = \int_0^1 \sqrt{E\,\dot{u}^2 + 2F\,\dot{u}\dot{v} + G\,\dot{v}^2}\;dt \]
We discover that on 𝑆 there is a metric induced from the ambient euclidean metric. In the current
coordinates, using (𝑢, 𝑣) = 𝜙−1 ,

\[ g = E\,du \otimes du + F\,(du \otimes dv + dv \otimes du) + G\,dv \otimes dv \]

hence the length of a tangent vector is defined via \(\|X\| = \sqrt{g(X,X)}\); we calculate the length of a
curve by integrating its speed along its extent and the speed is simply the magnitude of the tangent
vector at each point. The new thing here is that we judge the magnitude on the basis of a metric
which is intrinsic to the surface.

If arclength on 𝑆 is given by Gauss’ 𝐸, 𝐹, 𝐺 then what about surface area? We know the magnitude
of the cross product of the tangent vectors 𝜙𝑢 , 𝜙𝑣 on 𝑆 will give us the area of a tiny parallelogram
corresponding to a change 𝑑𝑢 in 𝑢 and 𝑑𝑣 in 𝑣. Thus:

\[ dA = \|\phi_u \times \phi_v\|\,du\,dv \]

However, Lagrange’s identity says \(\|\phi_u \times \phi_v\|^2 = \|\phi_u\|^2\|\phi_v\|^2 - (\phi_u \cdot \phi_v)^2\) hence \(dA = \sqrt{EG - F^2}\,du\,dv\)
and we can calculate surface area (if this integral exists!) via
\[ \text{Area}(S) = \int_U \sqrt{EG - F^2}\;du\,dv. \]

I make use of the standard notation for double integrals from multivariate calculus and the integra-
tion is to be taken over the domain of the parametrization of 𝑆.
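
To see the area formula at work, here is a sketch for the sphere of radius 𝑅. The parametrization
below (with 𝑢 the azimuthal angle and 𝑣 the polar angle) is my own illustrative choice; it is regular
only for 0 < 𝑣 < 𝜋, so the two poles are omitted, which does not change the area.

```latex
\begin{align*}
\phi(u,v) &= (R\cos u\sin v,\; R\sin u\sin v,\; R\cos v), \qquad 0 < u < 2\pi,\; 0 < v < \pi, \\
\phi_u &= (-R\sin u\sin v,\; R\cos u\sin v,\; 0), \\
\phi_v &= (R\cos u\cos v,\; R\sin u\cos v,\; -R\sin v), \\
E &= \phi_u\cdot\phi_u = R^2\sin^2 v, \qquad F = \phi_u\cdot\phi_v = 0, \qquad G = \phi_v\cdot\phi_v = R^2, \\
\sqrt{EG - F^2} &= R^2\sin v, \\
\text{Area} &= \int_0^{2\pi}\!\!\int_0^{\pi} R^2\sin v\;dv\,du = 2\pi R^2\,\big[-\cos v\big]_0^{\pi} = 4\pi R^2.
\end{align*}
```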

Many additional formulas are known for 𝐸, 𝐹, 𝐺 and there are entire texts devoted to exploring
the geometric intricacies of surfaces in ℝ3 . For example, John Oprea’s Differential Geometry and
its Applications. Theorem 4.1 of that text is the celebrated Theorema Egregium of Gauss which
states the curvature of a surface depends only on the metric of the surface as given by 𝐸, 𝐹, 𝐺. In
particular, for an orthogonal parametrization (𝐹 = 0),
\[ K = \frac{-1}{2\sqrt{EG}}\left( \frac{\partial}{\partial v}\Big(\frac{E_v}{\sqrt{EG}}\Big) + \frac{\partial}{\partial u}\Big(\frac{G_u}{\sqrt{EG}}\Big) \right). \]
Here curvature at 𝑝 is defined by 𝐾(𝑝) = 𝑑𝑒𝑡(𝑆𝑝 ) where the shape operator 𝑆𝑝 is defined
by the covariant derivative 𝑆𝑝 (𝑣) = −∇𝑣 𝑈 = −(𝑣(𝑈1 ), 𝑣(𝑈2 ), 𝑣(𝑈3 )) and 𝑈 is simply the unit normal
vector field to 𝑆 defined by \(U(u,v) = (\phi_u \times \phi_v)/\|\phi_u \times \phi_v\|\) in our current notation.
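
As a sanity check on the curvature formula, here is a sketch for the sphere of radius 𝑅. I assume the
parametrization 𝜙(𝑢, 𝑣) = (𝑅 cos 𝑢 sin 𝑣, 𝑅 sin 𝑢 sin 𝑣, 𝑅 cos 𝑣) with 0 < 𝑣 < 𝜋 (an illustrative choice),
for which 𝐸 = 𝑅² sin² 𝑣, 𝐹 = 0 and 𝐺 = 𝑅², so the formula for orthogonal coordinates applies.

```latex
\begin{align*}
\sqrt{EG} &= R^2\sin v, \qquad E_v = 2R^2\sin v\cos v, \qquad G_u = 0, \\
\frac{E_v}{\sqrt{EG}} &= 2\cos v, \qquad
\frac{\partial}{\partial v}\Big(\frac{E_v}{\sqrt{EG}}\Big) = -2\sin v, \qquad
\frac{\partial}{\partial u}\Big(\frac{G_u}{\sqrt{EG}}\Big) = 0, \\
K &= \frac{-1}{2R^2\sin v}\,\big(-2\sin v + 0\big) = \frac{1}{R^2}.
\end{align*}
```

This agrees with the value 𝐾 = 1/𝑅² found from the Gauss map, even though only 𝐸 and 𝐺 were used.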

It turns out there is an easier way to calculate curvature via wedge products. I will hopefully show
how that is done in the next chapter. However, I do not attempt to motivate why the curvature is
called curvature. You really should read something like Oprea if you want those thoughts.

Example 8.7.3. Let ℳ = ℝ4 and choose an atlas of charts which are all inertially related to
the standard Cartesian coordinates on ℝ4 . In other words, we allow coordinates \(\bar{x}\) which can be
obtained from a Lorentz transformation; \(\bar{x} = \Lambda x\) and \(\Lambda \in \mathbb{R}^{4\times 4}\) such that \(\Lambda^T \eta \Lambda = \eta\). Define
\(g = \sum_{\mu,\nu=0}^{3} \eta_{\mu\nu}\, dx^\mu \otimes dx^\nu\) for the standard Cartesian coordinates on ℝ4 . We can show that the
metric is invariant as we change coordinates, if you calculate the components of 𝑔 in some other
coordinate system then you will once more obtain 𝜂𝜇𝜈 as the components. This means that if we
can write the equation for the interval between events in one coordinate system then that interval
equation must also hold true in any other inertial coordinate system. In particle physics this
is a very useful observation because it means if we want to analyze a relativistic interaction then
we can study the problem in the frame of reference which makes the problem simplest to understand.

In physics a coordinate system is also called a “frame of reference”; technically there is something
missing from our construction of ℳ from a relativity perspective. As a mathematical model of
spacetime ℝ4 is not quite right. Why? Because Einstein’s first axiom or postulate of special relativity
is that there is no “preferred frame of reference”. With ℝ4 there certainly is a preferred frame, it’s
implicit within the very definition of the set ℝ4 , we get Cartesian coordinates for free. To eliminate
this convenient set of, according to Einstein, unphysical coordinates you have to consider an affine
space which is diffeomorphic to ℝ4 . If you take modern geometry you’ll learn all about affine space.
I will not pursue it further here, and as a bad habit I tend to say ℳ paired with 𝜂 is “Minkowski
space”. Technically this is not quite right for the reasons I just explained.

8.8 on boundaries and submanifolds


A manifold with boundary15 is basically just a manifold which has an edge which is also a manifold.
The boundary of a disk is a circle. In fact, in general, a closed ball in (𝑛 + 1)-dimensional euclidean
space has a boundary which is the 𝑛-sphere 𝑆ⁿ.

The boundary of quadrants I and II of the 𝑥𝑦-plane is the 𝑥-axis. Or, to generalize this example,
we define the upper-half of ℝ𝑛 as follows:

ℍⁿ = {(𝑥1 , 𝑥2 , . . . , 𝑥𝑛−1 , 𝑥𝑛 ) ∈ ℝ𝑛 ∣ 𝑥𝑛 ≥ 0}.

The boundary of ℍⁿ is the 𝑥1 𝑥2 ⋅ ⋅ ⋅ 𝑥𝑛−1 -hyperplane which is the solution set of 𝑥𝑛 = 0 in ℝ𝑛 ; we
can denote the boundary by ∂ℍⁿ hence, ∂ℍⁿ = ℝⁿ⁻¹ × {0}. Furthermore, we define

ℍⁿ₊ = {(𝑥1 , 𝑥2 , . . . , 𝑥𝑛−1 , 𝑥𝑛 ) ∈ ℝ𝑛 ∣ 𝑥𝑛 > 0}.
15
I am glossing over some analytical details here concerning extensions and continuity, smoothness etc... see section
24 of Munkres for a bit more detail in the embedded case.

It follows that ℍⁿ = ℍⁿ₊ ∪ (ℝⁿ⁻¹ × {0}). Note that a subset 𝑈 of ℍⁿ is said to be open in ℍⁿ
iff there exists some open set 𝑈 ′ ⊆ ℝ𝑛 such that 𝑈 ′ ∩ ℍⁿ = 𝑈 . For example, if we consider ℝ3
then the open sets in the 𝑥𝑦-plane are formed from intersecting open sets in ℝ3 with the plane; an
open ball intersects to give an open disk on the plane. Or for ℝ2 open disks intersected with
the 𝑥-axis give open intervals.

Definition 8.8.1.
We say ℳ is a smooth 𝑚-dimensional manifold with boundary iff there exists a family
{𝑈𝑖 } of open subsets of ℝ𝑚 or ℍ 𝑚 and local parameterizations 𝜙𝑖 : 𝑈𝑖 → 𝑉𝑖 ⊆ ℳ such
that the following criteria hold:

1. each map 𝜙𝑖 : 𝑈𝑖 → 𝑉𝑖 is injective

2. if 𝑉𝑖 ∩ 𝑉𝑗 ≠ ∅ then there exists a smooth mapping

\[ \theta_{ij} : \phi_j^{-1}(V_i \cap V_j) \to \phi_i^{-1}(V_i \cap V_j) \]

such that 𝜙𝑗 = 𝜙𝑖 ∘ 𝜃𝑖𝑗

3. ℳ = ∪𝑖 𝜙𝑖 (𝑈𝑖 )

We again refer to the inverse of a local parametrization as a coordinate chart and often
use the notation 𝜙−1 (𝑝) = (𝑥1 (𝑝), 𝑥2 (𝑝), . . . , 𝑥𝑚 (𝑝)). If there exists 𝑈 open in ℝ𝑚 such that
𝜙 : 𝑈 → 𝑉 is a local parametrization with 𝑝 ∈ 𝑉 then 𝑝 is an interior point. Any point
𝑝 ∈ ℳ which is not an interior point is a boundary point. The set of all boundary points
is called the boundary of ℳ and is denoted ∂ℳ.
A more pragmatic characterization16 of a boundary point is that 𝑝 ∈ ∂ℳ iff there exists a chart
at 𝑝 such that 𝑥𝑚 (𝑝) = 0. A manifold without boundary is simply a manifold in our definition
since the definitions match precisely if there are no half-space-type charts. In the case that ∂ℳ is
nonempty we can show that it forms a manifold without boundary. Moreover, the atlas for ∂ℳ is
naturally induced from that of ℳ by restriction.

Proposition 8.8.2.

Suppose ℳ is a smooth 𝑚-dimensional manifold with boundary ∂ℳ ≠ ∅ then ∂ℳ is a
smooth manifold of dimension 𝑚 − 1. In other words, ∂ℳ is an (𝑚 − 1)-dimensional manifold
with boundary and ∂(∂ℳ) = ∅.
Proof: Let 𝑝 ∈ ∂ℳ and suppose 𝜙 : 𝑈 ⊆ ℍ 𝑚 → 𝑉 ⊆ ℳ is a local parametrization with
𝑝 ∈ 𝑉 . It follows 𝜙−1 = (𝑥1 , 𝑥2 , . . . , 𝑥𝑚−1 , 𝑥𝑚 ) : 𝑉 → 𝑈 is a chart at 𝑝 with 𝑥𝑚 (𝑝) = 0. Define
the restriction of 𝜙 to 𝑥𝑚 = 0 as 𝜓 : 𝑈 ′ → 𝑉 ∩ (∂ℳ) with 𝜓(𝑢) = 𝜙(𝑢, 0) where 𝑈 ′ =
{(𝑢1 , . . . , 𝑢𝑚−1 ) ∈ ℝ 𝑚−1 ∣ (𝑢1 , . . . , 𝑢𝑚−1 , 0) ∈ 𝑈 }. It follows that 𝜓 −1 : 𝑉 ∩ (∂ℳ) → 𝑈 ′ ⊆ ℝ 𝑚−1
is just the first 𝑚 − 1 coordinates of the chart 𝜙−1 which is to say 𝜓 −1 = (𝑥1 , 𝑥2 , . . . , 𝑥𝑚−1 ). We
16
I leave it to the reader to show this follows from the words in green.

construct charts in this fashion at each point in ∂ℳ. Note that 𝑈 ′ is open in ℝ 𝑚−1 hence the man-
ifold ∂ℳ only has interior points. There is no parametrization in ∂ℳ which takes a boundary-type
subset half-plane as its domain. It follows that ∂(∂ℳ) = ∅. I leave compatibility and smoothness
of the restricted charts on ∂ℳ to the reader. □

Given the terminology in this section we should note that there are shapes of interest which simply
do not fit our terminology. For example, a rectangle 𝑅 = [𝑎, 𝑏] × [𝑐, 𝑑] is not a manifold with
boundary since if it were we would have a boundary with sharp edges (which is not a smooth manifold!).

I have not included a full discussion of submanifolds in these notes. However, I would like to
give you some brief comments concerning how they arise from particular functions. In short, a
submanifold is a subset of a manifold which is also a manifold in a natural manner. Burns and Gidea
define for a smooth mapping 𝑓 from a manifold ℳ to another manifold 𝒩 that
a. 𝑝 ∈ ℳ is a critical point of 𝑓 if 𝑑𝑝 𝑓 : 𝑇𝑝 ℳ → 𝑇𝑓 (𝑝) 𝒩 is not surjective. Moreover, the image
𝑓 (𝑝) is called a critical value of 𝑓 .

b. 𝑝 ∈ ℳ is a regular point of 𝑓 if 𝑝 is not critical. Moreover, 𝑞 ∈ 𝒩 is called a regular value
of 𝑓 iff 𝑓 −1 {𝑞} contains no critical points.
It turns out that:
Theorem 8.8.3.
If 𝑓 : ℳ → 𝒩 is a smooth function on smooth manifolds ℳ, 𝒩 of dimensions 𝑚, 𝑛
respectively and 𝑞 ∈ 𝒩 is a regular value of 𝑓 with nonempty fiber 𝑓 −1 {𝑞} then the fiber
𝑓 −1 {𝑞} is a submanifold of ℳ of dimension (𝑚 − 𝑛).
Proof: see page 46 of Burns and Gidea. □.

The idea of this theorem is a variant of the implicit function theorem. Recall if we are given
𝐺 : ℝ𝑘 × ℝ𝑛 → ℝ𝑛 then the local solution 𝑦 = ℎ(𝑥) of 𝐺(𝑥, 𝑦) = 𝑘 exists provided ∂𝐺/∂𝑦 is invertible.
But, this local solution suitably restricted is injective and hence the mapping 𝜙(𝑥) = (𝑥, ℎ(𝑥)) is a
local parametrization of a manifold in ℝ𝑘 × ℝ𝑛 . In fact, the graph 𝑦 = ℎ(𝑥) gives a 𝑘-dimensional
submanifold of the manifold ℝ𝑘 × ℝ𝑛 (think of ℳ = ℝ𝑘 × ℝ𝑛 hence 𝑚 = 𝑘 + 𝑛 and 𝑚 − 𝑛 = 𝑘 so
we find agreement with the theorem above at least in the concrete case of level-sets).

Example 8.8.4. Consider 𝑓 : ℝ2 → ℝ defined by 𝑓 (𝑥, 𝑦) = 𝑥² + 𝑦². Calculate 𝑑𝑓 = 2𝑥𝑑𝑥 + 2𝑦𝑑𝑦;
we find that the only critical point of 𝑓 is (0, 0) since otherwise either 𝑥 or 𝑦 is nonzero and as a
consequence 𝑑𝑓 is surjective. It follows that 𝑓 −1 {𝑅²} is a submanifold of ℝ2 for any 𝑅 > 0. I think
you’ve seen these submanifolds before. What are they?
Example 8.8.5. Consider 𝑓 : ℝ3 → ℝ defined by 𝑓 (𝑥, 𝑦, 𝑧) = 𝑧² − 𝑥² − 𝑦²; calculate that 𝑑𝑓 =
−2𝑥𝑑𝑥 − 2𝑦𝑑𝑦 + 2𝑧𝑑𝑧. Note (0, 0, 0) is a critical point of 𝑓 with critical value 0. Furthermore, note
𝑓 −1 {0} is the cone 𝑧² = 𝑥² + 𝑦² which is not a submanifold of ℝ3 . It turns out that in general just about anything

can arise as the inverse image of a critical value. It could happen that the inverse image is a
submanifold, it’s just not a given.

Theorem 8.8.6.
If ℳ is a smooth manifold without boundary and 𝑓 : ℳ → ℝ is a smooth function with a
regular value 𝑎 ∈ ℝ then 𝑓 −1 (−∞, 𝑎] is a smooth manifold with boundary 𝑓 −1 {𝑎}.
Proof: see page 50 of Burns and Gidea. □.

Example 8.8.7. Suppose 𝑓 : ℝ𝑚 → ℝ is defined by 𝑓 (𝑥) = ∣∣𝑥∣∣² then 𝑥 = 0 is the only critical point
of 𝑓 and, for 𝑅 > 0, we find 𝑓 −1 (−∞, 𝑅²] is a submanifold with boundary 𝑓 −1 {𝑅²}. Note that 𝑓 −1 (−∞, 0) = ∅
in this case. However, perhaps you also see 𝐵 𝑚 = 𝑓 −1 [0, 𝑅²] is the closed 𝑚-ball and ∂𝐵 𝑚 =
𝑆𝑚−1 (𝑅) is the (𝑚 − 1)-sphere of radius 𝑅.

Theorem 8.8.8.
Let ℳ be a smooth manifold with boundary ∂ℳ and 𝒩 a smooth manifold without bound-
ary. If 𝑓 : ℳ → 𝒩 and 𝑓 ∣∂ℳ : ∂ℳ → 𝒩 have regular value 𝑞 ∈ 𝒩 then 𝑓 −1 {𝑞} is a smooth
(𝑚 − 𝑛)-dimensional manifold with boundary 𝑓 −1 {𝑞} ∩ ∂ℳ.
Proof: see page 50 of Burns and Gidea. □.

This theorem would seem to give us a generalization of the implicit function theorem for some
closed sets. Interesting. Finally, I should mention that it is customary to also allow the set
𝕃1 = {𝑥 ∈ ℝ ∣ 𝑥 ≤ 0} as the domain of a parametrization in the case of one-dimensional manifolds.
Chapter 9

differential forms

9.1 algebra of differential forms


In this section we apply the results of the previous section on exterior algebra to the vector space

𝑉 = 𝑇𝑝 𝑀. Recall that { ∂𝑥 𝑖 ∣𝑝 } is a basis of 𝑇𝑝 𝑀 and thus the basis {𝑒𝑖 } of 𝑉 utilized throughout
the previous section on exterior algebra will be taken to be

𝑒𝑖 = ∣ , 1≤𝑖≤𝑛
∂𝑥𝑖 𝑝
in this section. Also recall that the set of covectors {𝑑𝑥𝑖 } is a basis of 𝑇𝑝∗ 𝑀 which is dual to { ∂𝑥

𝑖 ∣𝑝 }
𝑗
and consequently the {𝑒 } in the previous section is taken to be

𝑒𝑗 = 𝑑𝑥𝑗 , 1≤𝑗≤𝑛

in the present context. With these choices the machinery of the previous section takes over and
one obtains a vector space ∧𝑘 (𝑇𝑝 𝑀 ) for each 1 ≤ 𝑘 and for arbitrary 𝑝 ∈ 𝑀 . We write ∧𝑘 𝑇 𝑀
for the set of ordered pairs (𝑝, 𝛼) where 𝑝 ∈ 𝑀 and 𝛼 ∈ ∧𝑘 (𝑇𝑝 𝑀 ) and we refer to ∧𝑘 (𝑇 𝑀 ) as the
k-th exterior power of the tangent bundle 𝑇 𝑀 . There is a projection 𝜋 : ∧𝑘 (𝑇 𝑀 ) → 𝑀 defined by
𝜋(𝑝, 𝛼) = 𝑝 for (𝑝, 𝛼) ∈ ∧𝑘 (𝑇 𝑀 ). One refers to (∧𝑘 𝑇 𝑀, 𝜋) as a vector bundle for reasons we do not
pursue at this point. To say that $\hat{\alpha}$ is a section of this vector bundle means that $\hat{\alpha} : M \to \wedge^k(TM)$
is a (smooth) function such that $\hat{\alpha}(p) \in \wedge^k(T_pM)$ for all $p \in M$. Such functions are also called
differential forms, or in this case, k-forms.

Definition 9.1.1. vector field on open subset of ℝ𝑛 .

To say that $X$ is a vector field on an open subset $U$ of $M$ means that
\[ X = X^1 \frac{\partial}{\partial x^1} + X^2 \frac{\partial}{\partial x^2} + \cdots + X^n \frac{\partial}{\partial x^n} \]
where $X^1, X^2, \dots, X^n$ are smooth functions from $U$ into $\mathbb{R}$.

Note that in this context we implicitly require that differential forms be smooth. To explain this
we write out the requirements more fully below.

If 𝛽 is a function with domain 𝑀 such that for each 𝑝 ∈ 𝑀 , 𝛽(𝑝) ∈ ∧𝑘 (𝑇𝑝 𝑀 ) then 𝛽 is called a
differential k-form on 𝑀 if for all local vector fields 𝑋1 , 𝑋2 , ⋅ ⋅ ⋅ , 𝑋𝑘 defined on an arbitrary open
subset 𝑈 of 𝑀 it follows that the map defined by

𝑝 → 𝛽𝑝 (𝑋1 (𝑝), 𝑋2 (𝑝), ⋅ ⋅ ⋅ , 𝑋𝑘 (𝑝))

is smooth on $U$. For example if $(x^1, x^2, \dots, x^n)$ is a chart then its domain $V \subset M$ is open in $M$
and the map
\[ p \mapsto d_p x^i \]
is a differential 1-form on $V$. Similarly the map
\[ p \mapsto d_p x^i \wedge d_p x^j \]
is a differential 2-form on $V$. Generally if $\beta$ is a 1-form and $(x^1, x^2, \dots, x^n)$ is a chart then there
are functions $(b_i)$ defined on the domain of $x$ such that
\[ \beta(q) = \sum_{i=1}^{n} b_i(q)\, d_q x^i \]
for all $q$ in the domain of $x$.


Similarly if $\gamma$ is a 2-form on $M$ and $x = (x^1, x^2, \dots, x^n)$ is any chart on $M$ then there are smooth
functions $c_{ij}$ on $dom(x)$ such that
\[ \gamma_p = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij}(p)\,(d_p x^i \wedge d_p x^j) \]
and such that $c_{ij}(p) = -c_{ji}(p)$ for all $p \in dom(x)$.
Generally if $\alpha$ is a $k$-form and $x$ is a chart then on $dom(x)$
\[ \alpha_p = \sum \frac{1}{k!}\, a_{i_1 i_2 \cdots i_k}(p)\,(d_p x^{i_1} \wedge \cdots \wedge d_p x^{i_k}) \]
where the $\{a_{i_1 i_2 \cdots i_k}\}$ are smooth real-valued functions on $U = dom(x)$ and $a_{i_{\sigma(1)} i_{\sigma(2)} \cdots i_{\sigma(k)}} = sgn(\sigma)\, a_{i_1 i_2 \cdots i_k}$
for every permutation $\sigma$. (This is just a fancy way of saying if you switch any pair of indices it
generates a minus sign.)

The algebra of differential forms follows the same rules as the exterior algebra we previously
discussed. Remember, a differential form evaluated at a particular point gives us a wedge product of a
bunch of dual vectors. It follows that the differential form in total also follows the general properties
of the exterior algebra.

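Theorem 9.1.2.

If $\alpha$ is a $p$-form, $\beta$ is a $k$-form, and $\gamma$ is a $l$-form on $M$ then

1. $\alpha \wedge (\beta \wedge \gamma) = (\alpha \wedge \beta) \wedge \gamma$

2. $\alpha \wedge \beta = (-1)^{pk} (\beta \wedge \alpha)$

3. $\alpha \wedge (a\beta + b\gamma) = a(\alpha \wedge \beta) + b(\alpha \wedge \gamma)$ for $a, b \in \mathbb{R}$ (here $k = l$ so that the sum makes sense)
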
Notice that in ℝ3 the set of differential forms
ℬ = {1, 𝑑𝑥, 𝑑𝑦, 𝑑𝑧, 𝑑𝑦 ∧ 𝑑𝑧, 𝑑𝑧 ∧ 𝑑𝑥, 𝑑𝑥 ∧ 𝑑𝑦, 𝑑𝑥 ∧ 𝑑𝑦 ∧ 𝑑𝑧}
is a basis of the space of differential forms in the sense that every form on ℝ3 is a linear combination
of the forms in ℬ with smooth real-valued functions on ℝ3 as coefficients.
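Example 9.1.3. Let $\alpha = f\,dx + g\,dy$ and let $\beta = 3\,dx + dz$ where $f, g$ are functions. Find $\alpha \wedge \beta$
and write the answer in terms of the basis $\mathcal{B}$ given above:
\[
\begin{aligned}
\alpha \wedge \beta &= (f\,dx + g\,dy) \wedge (3\,dx + dz) \\
&= f\,dx \wedge (3\,dx + dz) + g\,dy \wedge (3\,dx + dz) \\
&= 3f\,dx \wedge dx + f\,dx \wedge dz + 3g\,dy \wedge dx + g\,dy \wedge dz \\
&= g\,dy \wedge dz - f\,dz \wedge dx - 3g\,dx \wedge dy
\end{aligned}
\tag{9.1}
\]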
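The wedge computation in Example 9.1.3 can be checked mechanically. The sketch below is only an illustration, not part of the notes' formal development: it assumes a form at a single point is stored as a dictionary from increasing index tuples (0 for dx, 1 for dy, 2 for dz) to numerical coefficients, and the helper names `sort_with_sign` and `wedge` are hypothetical. With the sample values f = 2 and g = 5 the coefficient of dy ∧ dz comes out as +g, the coefficient of dx ∧ dy as −3g, and the coefficient of dx ∧ dz as f (that is, −f on dz ∧ dx).

```python
from itertools import product


def sort_with_sign(indices):
    """Sort a tuple of distinct indices; return (sorted tuple, sign of the permutation)."""
    idx = list(indices)
    sign = 1
    # Bubble sort: every adjacent swap flips the sign of the wedge product.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return tuple(idx), sign


def wedge(a, b):
    """Wedge product of two forms stored as {increasing index tuple: coefficient}."""
    result = {}
    for (ia, ca), (ib, cb) in product(a.items(), b.items()):
        combined = ia + ib
        if len(set(combined)) < len(combined):
            continue  # a repeated differential such as dx ^ dx vanishes
        key, sign = sort_with_sign(combined)
        result[key] = result.get(key, 0) + sign * ca * cb
    return {k: v for k, v in result.items() if v != 0}


# Sample values of the coefficient functions at one point (an assumption for illustration).
f, g = 2, 5
alpha = {(0,): f, (1,): g}   # f dx + g dy
beta = {(0,): 3, (2,): 1}    # 3 dx + dz
ab = wedge(alpha, beta)      # keys: (0,1) = dx^dy, (0,2) = dx^dz, (1,2) = dy^dz
ba = wedge(beta, alpha)      # for one-forms this should equal -ab
top = wedge({(0, 1, 2): 1}, {(1,): 1})  # top form wedged with dy
```

The same dictionary representation also illustrates property 2 of Theorem 9.1.2 for one-forms (`ba` is the negative of `ab`) and the top-form argument (`top` is empty).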
Example 9.1.4. Top form: Let $\alpha = dx \wedge dy \wedge dz$ and let $\beta$ be any other form with degree $p > 0$.
We argue that $\alpha \wedge \beta = 0$. Notice that if $p > 0$ then every term of $\beta$ contains at least one differential,
so by linearity it suffices to consider a term of the form $\beta = dx^k \wedge \gamma$ for some $\gamma$. Then consider,
\[ \alpha \wedge \beta = dx \wedge dy \wedge dz \wedge dx^k \wedge \gamma \tag{9.2} \]
now $k$ has to be either 1, 2 or 3 therefore we will have $dx^k$ repeated, thus the wedge product will be
zero. (Can you prove this?)

9.2 exterior derivatives: the calculus of forms


The operation ∧ depends only on the values of the forms point by point. We define an operator 𝑑
on differential forms which depends not only on the value of the differential form at a point but on
its value in an entire neighborhood of the point. Thus if 𝛽 is a 𝑘-form then to define 𝑑𝛽 at a point
𝑝 we need to know not only the value of 𝛽 at 𝑝 but we also need to know its value at every 𝑞 in a
neighborhood of 𝑝.

You might note the derivative below does not directly involve the construction of differential forms
from tensors. Also, the rule given below is easily taken as a starting point for formal calculations.
In other words, even if you don’t understand the nuts and bolts of manifold theory you can still
calculate with differential forms. In the same sense that high school students "do" calculus, you can
"do" differential form calculations. I don’t believe this is a futile exercise so long as you understand
you have more to learn. Which is not to say we don’t know some things!

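Definition 9.2.1. the exterior derivative.

If $\beta$ is a $k$-form and $x = (x^1, x^2, \dots, x^n)$ is a chart and $\beta = \sum_I \frac{1}{k!} \beta_I\, dx^I$ then we define a
$(k+1)$-form $d\beta$ to be the form
\[ d\beta = \sum_I \frac{1}{k!}\, d\beta_I \wedge dx^I . \]
Where $d\beta_I$ is defined as it was in calculus III,
\[ d\beta_I = \sum_{j=1}^{n} \frac{\partial \beta_I}{\partial x^j}\, dx^j . \]
Note that $d\beta_I$ is well-defined as
\[ \beta_I = \beta_{i_1 i_2 \cdots i_k} \]
is just a real-valued function on $dom(x)$. The definition in an expanded form is given by
\[ d_p\beta = \frac{1}{k!} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} (d_p \beta_{i_1 i_2 \cdots i_k}) \wedge d_p x^{i_1} \wedge \cdots \wedge d_p x^{i_k} \]
where
\[ \beta_q = \frac{1}{k!} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_k=1}^{n} \beta_{i_1 i_2 \cdots i_k}(q)\, d_q x^{i_1} \wedge \cdots \wedge d_q x^{i_k} . \]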

9.2.1 coordinate independence of exterior derivative


The Einstein summation convention is used in this section and throughout the remainder
of this chapter; please feel free to email me if it confuses you somewhere. When an index is
repeated in a single summand it is implicitly assumed there is a sum over all values of that
index.
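It must be shown that this definition is independent of the chart used to define $d\beta$. Suppose for
example, that
\[ \beta_q = \bar{\beta}_J(q)\,(d_q \bar{x}^{j_1} \wedge \cdots \wedge d_q \bar{x}^{j_k}) \]
for all $q$ in the domain of a chart $(\bar{x}^1, \bar{x}^2, \dots, \bar{x}^n)$ where
\[ dom(x) \cap dom(\bar{x}) \neq \emptyset . \]
We assume, of course, that the coefficients $\{\bar{\beta}_J(q)\}$ are skew-symmetric in $J$ for all $q$. We will
have defined $d\beta$ in this chart by
\[ d\beta = d\bar{\beta}_J \wedge d\bar{x}^J . \]
We need to show that $d_p \bar{\beta}_J \wedge d_p \bar{x}^J = d_p \beta_I \wedge d_p x^I$ for all $p \in dom(x) \cap dom(\bar{x})$ if this definition is
to be meaningful. Since $\beta$ is given to be a well-defined form we know
\[ \beta_I(p)\, d_p x^I = \beta_p = \bar{\beta}_J(p)\, d_p \bar{x}^J . \]
Using the identities
\[ d\bar{x}^j = \frac{\partial \bar{x}^j}{\partial x^i}\, dx^i \]
we have
\[ \beta_I\, dx^I = \bar{\beta}_J\, \frac{\partial \bar{x}^{j_1}}{\partial x^{i_1}} \frac{\partial \bar{x}^{j_2}}{\partial x^{i_2}} \cdots \frac{\partial \bar{x}^{j_k}}{\partial x^{i_k}}\, dx^I \]
so that
\[ \beta_I = \bar{\beta}_J \left( \frac{\partial \bar{x}^{j_1}}{\partial x^{i_1}} \frac{\partial \bar{x}^{j_2}}{\partial x^{i_2}} \cdots \frac{\partial \bar{x}^{j_k}}{\partial x^{i_k}} \right) . \]
Exchanging the roles of the two charts gives, in the same way, $\bar{\beta}_J = \beta_I \frac{\partial x^{i_1}}{\partial \bar{x}^{j_1}} \cdots \frac{\partial x^{i_k}}{\partial \bar{x}^{j_k}}$. Consequently,
\[
\begin{aligned}
d\bar{\beta}_J \wedge d\bar{x}^J
&= \frac{\partial \bar{\beta}_J}{\partial \bar{x}^\lambda}\,(d\bar{x}^\lambda \wedge d\bar{x}^J)
 = \frac{\partial}{\partial \bar{x}^\lambda}\left[ \beta_I\, \frac{\partial x^{i_1}}{\partial \bar{x}^{j_1}} \cdots \frac{\partial x^{i_k}}{\partial \bar{x}^{j_k}} \right] (d\bar{x}^\lambda \wedge d\bar{x}^J) \\
&\overset{*}{=} \frac{\partial \beta_I}{\partial \bar{x}^\lambda} \left( \frac{\partial x^{i_1}}{\partial \bar{x}^{j_1}} \cdots \frac{\partial x^{i_k}}{\partial \bar{x}^{j_k}} \right) (d\bar{x}^\lambda \wedge d\bar{x}^J) \\
&\qquad + \beta_I \sum_r \left( \frac{\partial x^{i_1}}{\partial \bar{x}^{j_1}} \cdots \frac{\partial^2 x^{i_r}}{\partial \bar{x}^\lambda \partial \bar{x}^{j_r}} \cdots \frac{\partial x^{i_k}}{\partial \bar{x}^{j_k}} \right) (d\bar{x}^\lambda \wedge d\bar{x}^J) \\
&= \frac{\partial \beta_I}{\partial \bar{x}^\lambda}\, d\bar{x}^\lambda \wedge \left( \frac{\partial x^{i_1}}{\partial \bar{x}^{j_1}}\, d\bar{x}^{j_1} \right) \wedge \cdots \wedge \left( \frac{\partial x^{i_k}}{\partial \bar{x}^{j_k}}\, d\bar{x}^{j_k} \right) \\
&= \frac{\partial \beta_I}{\partial x^p} \left[ \left( \frac{\partial x^p}{\partial \bar{x}^\lambda} \right) d\bar{x}^\lambda \right] \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_k} \\
&= d\beta_I \wedge dx^I
\end{aligned}
\]
where in (*) the sum $\sum_r$ is zero since:
\[ \frac{\partial^2 x^{i_r}}{\partial \bar{x}^\lambda \partial \bar{x}^{j_r}}\,(d\bar{x}^\lambda \wedge d\bar{x}^J) = \pm \frac{\partial^2 x^{i_r}}{\partial \bar{x}^\lambda \partial \bar{x}^{j_r}} \left[ (d\bar{x}^\lambda \wedge d\bar{x}^{j_r}) \wedge d\bar{x}^{j_1} \wedge \cdots \wedge \widehat{d\bar{x}^{j_r}} \wedge \cdots \wedge d\bar{x}^{j_k} \right] = 0 . \]
(The second partial derivatives are symmetric in $\lambda$ and $j_r$ whereas $d\bar{x}^\lambda \wedge d\bar{x}^{j_r}$ is antisymmetric.)
It follows that $d\beta$ is independent of the coordinates used to define it.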

Consequently we see that for each 𝑘 the operator 𝑑 maps ∧𝑘 (𝑀 ) into ∧𝑘+1 (𝑀 ) and has the following
properties:

Theorem 9.2.2. properties of the exterior derivative.

If 𝛼 ∈ ∧𝑘 (𝑀 ), 𝛽 ∈ ∧𝑙 (𝑀 ) and 𝑎, 𝑏 ∈ R then

1. 𝑑(𝑎𝛼 + 𝑏𝛽) = 𝑎(𝑑𝛼) + 𝑏(𝑑𝛽)

2. 𝑑(𝛼 ∧ 𝛽) = (𝑑𝛼 ∧ 𝛽) + (−1)𝑘 (𝛼 ∧ 𝑑𝛽)

3. 𝑑(𝑑𝛼) = 0

Proof: The proof of (1) is obvious. To prove (2), let 𝑥 = (𝑥1 , ⋅ ⋅ ⋅ , 𝑥𝑛 ) be a chart on 𝑀 then
(ignoring the factorial coefficients)
𝑑(𝛼 ∧ 𝛽) = 𝑑(𝛼𝐼 𝛽𝐽 ) ∧ 𝑑𝑥𝐼 ∧ 𝑑𝑥𝐽 = (𝛼𝐼 𝑑𝛽𝐽 + 𝛽𝐽 𝑑𝛼𝐼 ) ∧ 𝑑𝑥𝐼 ∧ 𝑑𝑥𝐽
= 𝛼𝐼 (𝑑𝛽𝐽 ∧ 𝑑𝑥𝐼 ∧ 𝑑𝑥𝐽 )
+𝛽𝐽 (𝑑𝛼𝐼 ∧ 𝑑𝑥𝐼 ∧ 𝑑𝑥𝐽 )
= 𝛼𝐼 (𝑑𝑥𝐼 ∧ (−1)𝑘 (𝑑𝛽𝐽 ∧ 𝑑𝑥𝐽 ))
+𝛽𝐽 ((𝑑𝛼𝐼 ∧ 𝑑𝑥𝐼 ) ∧ 𝑑𝑥𝐽 )
= (𝛼 ∧ (−1)𝑘 𝑑𝛽) + 𝛽𝐽 (𝑑𝛼 ∧ 𝑑𝑥𝐽 )
= 𝑑𝛼 ∧ 𝛽 + (−1)𝑘 (𝛼 ∧ 𝑑𝛽) .

9.2.2 exterior derivatives on ℝ3


We begin by noting that vector fields may correspond either to a one-form or to a two-form.
Definition 9.2.3. dictionary of vectors versus forms on ℝ3 .
Let $\vec{A} = (A^1, A^2, A^3)$ denote a vector field in $\mathbb{R}^3$. Define then,
\[ \omega_A = \delta_{ij} A^i dx^j = A_i\, dx^i \]
which we will call the work-form of $\vec{A}$. Also define
\[ \Phi_A = \frac{1}{2}\, \delta_{il} A^l \epsilon_{ijk}\,(dx^j \wedge dx^k) = \frac{1}{2}\, A_i \epsilon_{ijk}\,(dx^j \wedge dx^k) \]
which we will call the flux-form of $\vec{A}$.
If you accept the primacy of differential forms, then you can see that vector calculus confuses two
separate objects. Apparently there are two types of vector fields. In fact, if you have studied
coordinate change for vector fields deeply then you will have encountered the qualifiers axial or polar vector
fields. Those fields which are axial correspond directly to two-forms whereas those corresponding
to one-forms are called polar. As an example, the magnetic field is axial whereas the electric field
is polar.
Example 9.2.4. Gradient: Consider three-dimensional Euclidean space. Let $f : \mathbb{R}^3 \to \mathbb{R}$ then
\[ df = \frac{\partial f}{\partial x^i}\, dx^i = \omega_{\nabla f} \]
which gives the one-form corresponding to $\nabla f$.
Example 9.2.5. Curl: Consider three-dimensional Euclidean space. Let 𝐹⃗ be a vector field and
let 𝜔𝐹 = 𝐹𝑖 𝑑𝑥𝑖 be the corresponding one-form then
𝑑𝜔𝐹 = 𝑑𝐹𝑖 ∧ 𝑑𝑥𝑖
= ∂𝑗 𝐹𝑖 𝑑𝑥𝑗 ∧ 𝑑𝑥𝑖
= ∂𝑥 𝐹𝑦 𝑑𝑥 ∧ 𝑑𝑦 + ∂𝑦 𝐹𝑥 𝑑𝑦 ∧ 𝑑𝑥 + ∂𝑧 𝐹𝑥 𝑑𝑧 ∧ 𝑑𝑥 + ∂𝑥 𝐹𝑧 𝑑𝑥 ∧ 𝑑𝑧 + ∂𝑦 𝐹𝑧 𝑑𝑦 ∧ 𝑑𝑧 + ∂𝑧 𝐹𝑦 𝑑𝑧 ∧ 𝑑𝑦
= (∂𝑥 𝐹𝑦 − ∂𝑦 𝐹𝑥 )𝑑𝑥 ∧ 𝑑𝑦 + (∂𝑧 𝐹𝑥 − ∂𝑥 𝐹𝑧 )𝑑𝑧 ∧ 𝑑𝑥 + (∂𝑦 𝐹𝑧 − ∂𝑧 𝐹𝑦 )𝑑𝑦 ∧ 𝑑𝑧
= Φ∇×𝐹⃗ .

Thus we recover the curl.

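Example 9.2.6. Divergence: Consider three-dimensional Euclidean space. Let $\vec{G}$ be a vector
field and let $\Phi_G = \frac{1}{2} \epsilon_{ijk} G_i\, dx^j \wedge dx^k$ be the corresponding two-form then
\[
\begin{aligned}
d\Phi_G &= d(\tfrac{1}{2} \epsilon_{ijk} G_i) \wedge dx^j \wedge dx^k \\
&= \tfrac{1}{2} \epsilon_{ijk} (\partial_m G_i)\, dx^m \wedge dx^j \wedge dx^k \\
&= \tfrac{1}{2} \epsilon_{ijk} (\partial_m G_i)\, \epsilon_{mjk}\, dx \wedge dy \wedge dz \\
&= \tfrac{1}{2}\, 2\delta_{im} (\partial_m G_i)\, dx \wedge dy \wedge dz \\
&= \partial_i G_i\, dx \wedge dy \wedge dz \\
&= (\nabla \cdot \vec{G})\, dx \wedge dy \wedge dz
\end{aligned}
\]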

and in this way we recover the divergence.
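The gradient, curl and divergence calculations can be reproduced by a small program that applies the coordinate formula for $d$ term by term. The following is a minimal sketch under stated assumptions: coefficient functions are restricted to polynomials in x, y, z stored as dictionaries from exponent triples to coefficients, forms are dictionaries from increasing index tuples (0 for dx, 1 for dy, 2 for dz) to such polynomials, and the names `pdiff`, `padd` and `d` are hypothetical helpers introduced only for this illustration.

```python
def pdiff(poly, j):
    """Partial derivative in variable j of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[j] > 0:
            e = list(exps)
            e[j] -= 1
            key = tuple(e)
            out[key] = out.get(key, 0) + c * exps[j]
    return out


def padd(p, q, sign=1):
    """Return p + sign*q, dropping zero coefficients."""
    out = dict(p)
    for exps, c in q.items():
        out[exps] = out.get(exps, 0) + sign * c
    return {e: c for e, c in out.items() if c != 0}


def d(form, n=3):
    """Exterior derivative of a form {increasing index tuple: polynomial}."""
    out = {}
    for idx, poly in form.items():
        for j in range(n):
            if j in idx:
                continue  # dx^j ^ dx^I = 0 when dx^j already occurs in dx^I
            dp = pdiff(poly, j)
            if not dp:
                continue
            # Moving dx^j from the front into sorted position costs one sign
            # for each index of I that is smaller than j.
            pos = sum(1 for i in idx if i < j)
            sign = -1 if pos % 2 else 1
            key = idx[:pos] + (j,) + idx[pos:]
            out[key] = padd(out.get(key, {}), dp, sign)
    return {k: p for k, p in out.items() if p}


# Work-form of F = (-y, x, z^2): omega_F = -y dx + x dy + z^2 dz, with curl F = (0, 0, 2).
work = {(0,): {(0, 1, 0): -1}, (1,): {(1, 0, 0): 1}, (2,): {(0, 0, 2): 1}}
d_work = d(work)

# Flux-form of G = (x^2, xy, yz): x^2 dy^dz + xy dz^dx + yz dx^dy, with div G = 3x + y.
# Note dz^dx = -dx^dz, hence the coefficient -xy on the key (0, 2).
flux = {(1, 2): {(2, 0, 0): 1}, (0, 2): {(1, 1, 0): -1}, (0, 1): {(0, 1, 1): 1}}
d_flux = d(flux)

# A zero-form f = x^2 y + y z^3; applying d twice should give the zero form.
f = {(): {(2, 1, 0): 1, (0, 1, 3): 1}}
dd_f = d(d(f))
```

Here `d_work` is the flux-form of the curl (2 dx ∧ dy), `d_flux` is the divergence times dx ∧ dy ∧ dz, and `dd_f` is empty, which illustrates property 3 of Theorem 9.2.2 in one instance.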

9.3 pullbacks
Another important operation one can perform on differential forms is the “pull-back” of a form
under a map1 . The definition is constructed in large part by a sneaky application of the push-
forward (aka differential) discussed in the preceding chapter.

Definition 9.3.1. pull-back of a differential form.

If 𝑓 : ℳ → 𝒩 is a smooth map and 𝜔 ∈ ∧𝑘 (𝑁 ) then 𝑓 ∗ 𝜔 is the form on ℳ defined by

(𝑓 ∗ 𝜔)𝑝 (𝑋1 , ⋅ ⋅ ⋅ , 𝑋𝑘 ) = 𝜔𝑓 (𝑝) (𝑑𝑝 𝑓 (𝑋1 ), 𝑑𝑝 𝑓 (𝑋2 ), ⋅ ⋅ ⋅ , 𝑑𝑝 𝑓 (𝑋𝑘 )) .

for each 𝑝 ∈ ℳ and 𝑋1 , 𝑋2 , . . . , 𝑋𝑘 ∈ 𝑇𝑝 ℳ. Moreover, in the case 𝑘 = 0 we have a smooth


function 𝜔 : 𝒩 → ℝ and the pull-back is accomplished by composition (𝑓 ∗ 𝜔)(𝑝) = (𝜔 ∘ 𝑓 )(𝑝)
for all 𝑝 ∈ ℳ.
This operation is linear on forms and commutes with the wedge product and exterior derivative:

Theorem 9.3.2. properties of the pull-back.

If 𝑓 : 𝑀 → 𝑁 is a 𝐶 1 -map and 𝜔 ∈ ∧𝑘 (𝑁 ), 𝜏 ∈ ∧𝑙 (𝑁 ) then

1. 𝑓 ∗ (𝑎𝜔 + 𝑏𝜏 ) = 𝑎(𝑓 ∗ 𝜔) + 𝑏(𝑓 ∗ 𝜏 ) 𝑎, 𝑏 ∈ ℝ

2. 𝑓 ∗ (𝜔 ∧ 𝜏 ) = 𝑓 ∗ 𝜔 ∧ (𝑓 ∗ 𝜏 )

3. 𝑓 ∗ (𝑑𝜔) = 𝑑(𝑓 ∗ 𝜔)

1
thanks to my advisor R.O. Fulp for the arguments that follow

Proof: The proof of (1) is clear. We now prove (2).

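\[
\begin{aligned}
[f^*(\omega \wedge \tau)]_p (X_1, \dots, X_{k+l})
&= (\omega \wedge \tau)_{f(p)} (d_p f(X_1), \dots, d_p f(X_{k+l})) \\
&= \sum_\sigma sgn(\sigma)\, (\omega \otimes \tau)_{f(p)} (d_p f(X_{\sigma(1)}), \dots, d_p f(X_{\sigma(k+l)})) \\
&= \sum_\sigma sgn(\sigma)\, \omega(d_p f(X_{\sigma(1)}), \dots, d_p f(X_{\sigma(k)}))\, \tau(d_p f(X_{\sigma(k+1)}), \dots, d_p f(X_{\sigma(k+l)})) \\
&= \sum_\sigma sgn(\sigma)\, (f^*\omega)_p (X_{\sigma(1)}, \dots, X_{\sigma(k)})\, (f^*\tau)_p (X_{\sigma(k+1)}, \dots, X_{\sigma(k+l)}) \\
&= [(f^*\omega) \wedge (f^*\tau)]_p (X_1, X_2, \dots, X_{k+l})
\end{aligned}
\]

Finally we prove (3).

\[
\begin{aligned}
[f^*(d\omega)]_p (X_1, X_2, \dots, X_{k+1})
&= (d\omega)_{f(p)} (df(X_1), \dots, df(X_{k+1})) \\
&= (d\omega_I \wedge dx^I)_{f(p)} (df(X_1), \dots, df(X_{k+1})) \\
&= \left( \frac{\partial \omega_I}{\partial x^\lambda} \Big|_{f(p)} \right) (dx^\lambda \wedge dx^I)_{f(p)} (df(X_1), \dots, df(X_{k+1})) \\
&= \left( \frac{\partial \omega_I}{\partial x^\lambda} \Big|_{f(p)} \right) [d_p(x^\lambda \circ f) \wedge d_p(x^I \circ f)] (X_1, \dots, X_{k+1}) \\
&= [d(\omega_I \circ f) \wedge d(x^I \circ f)]_p (X_1, \dots, X_{k+1}) \\
&= d[(\omega_I \circ f)\, d(x^I \circ f)]_p (X_1, \dots, X_{k+1}) \\
&= d(f^*\omega)_p (X_1, \dots, X_{k+1}) .
\end{aligned}
\]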

The theorem follows. □.

We saw that one important application of the push-forward was to change coordinates for a given
vector. Similar comments apply here. If we wish to change coordinates on a given differential form
then we can use the pull-back. However, given the direction of the operation we need to use the
inverse coordinate transformation to pull forms forward. Let me mirror the example from the last
chapter for forms on ℝ2 . We wish to convert from 𝑟, 𝜃 to 𝑥, 𝑦 notation.
Example 9.3.3. Suppose 𝐹 : ℝ2𝑟,𝜃 → ℝ2𝑥,𝑦 is the polar coordinate transformation. In particular,

𝐹 (𝑟, 𝜃) = (𝑟 cos 𝜃, 𝑟 sin 𝜃)

The inverse transformation, at least for appropriate angles, is given by


\[ F^{-1}(x, y) = \left( \sqrt{x^2 + y^2},\ \tan^{-1}(y/x) \right) . \]

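Let us calculate the pull-back of $dr$ under $F^{-1}$: let $p = F^{-1}(q)$
\[ F^{-1*}(dr)_q = d_p r(\partial_x|_p)\, d_p x + d_p r(\partial_y|_p)\, d_p y \]
Again, drop the annoying point-dependence to see this clearly:
\[ F^{-1*}(dr) = dr(\partial_x)\, dx + dr(\partial_y)\, dy = \frac{\partial r}{\partial x}\, dx + \frac{\partial r}{\partial y}\, dy \]
Likewise,
\[ F^{-1*}(d\theta) = d\theta(\partial_x)\, dx + d\theta(\partial_y)\, dy = \frac{\partial \theta}{\partial x}\, dx + \frac{\partial \theta}{\partial y}\, dy \]
Note that $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$ have the following partial derivatives:
\[ \frac{\partial r}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}} = \frac{x}{r} \quad \text{and} \quad \frac{\partial r}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}} = \frac{y}{r} \]
\[ \frac{\partial \theta}{\partial x} = \frac{-y}{x^2 + y^2} = \frac{-y}{r^2} \quad \text{and} \quad \frac{\partial \theta}{\partial y} = \frac{x}{x^2 + y^2} = \frac{x}{r^2} \]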
Of course the expressions using 𝑟 are pretty, but to make the point, we have changed into 𝑥, 𝑦-
notation via the pull-back of the inverse transformation as advertised. We find:

\[ dr = \frac{x\,dx + y\,dy}{\sqrt{x^2 + y^2}} \quad \text{and} \quad d\theta = \frac{-y\,dx + x\,dy}{x^2 + y^2} . \]

Once again we have found results with the pull-back that we might previously have chalked up to
substitution in multivariate calculus. That’s often the idea behind an application of the pull-back.
It’s just a formal language to be precise about a substitution. It takes us past simple symbol
pushing and helps us think about where things are defined and how we may recast them to work
together with other objects. I leave it at that for here.
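As a sanity check on the polar example, the coefficients of $dx$ and $dy$ in $dr$ and $d\theta$ can be compared with finite-difference estimates of the partial derivatives of $r(x, y)$ and $\theta(x, y)$. This is only a numerical sketch: the sample point (1.2, 0.7), the step size and the helper name `partials` are assumptions made for illustration, and `math.atan2` is used in place of $\tan^{-1}(y/x)$, with which it agrees for $x > 0$.

```python
import math


def r(x, y):
    return math.hypot(x, y)


def theta(x, y):
    return math.atan2(y, x)  # agrees with arctan(y/x) when x > 0


def partials(func, x, y, h=1e-6):
    """Central-difference estimates of (d func/dx, d func/dy) at (x, y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy


x0, y0 = 1.2, 0.7                      # sample point with x0 > 0
r0 = r(x0, y0)

dr_num = partials(r, x0, y0)           # numerical coefficients of dx, dy in dr
dtheta_num = partials(theta, x0, y0)   # numerical coefficients of dx, dy in dtheta

dr_formula = (x0 / r0, y0 / r0)                 # dr = (x dx + y dy) / r
dtheta_formula = (-y0 / r0**2, x0 / r0**2)      # dtheta = (-y dx + x dy) / r^2
```

The numerical and closed-form coefficient pairs agree to within the finite-difference error, which is consistent with reading the pull-back as a careful substitution.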

9.4 integration of differential forms


The general strategy is as follows:

(i) there is a natural way to calculate the integral of a 𝑘-form on a subset of ℝ𝑘

(ii) given a 𝑘-form on a manifold we can locally pull it back to a subset of ℝ𝑘 provided the
manifold is an oriented² 𝑘-dimensional manifold, and thus by the previous idea we have an integral.

(iii) globally we should expect that we can add the results from various local charts and arrive at
a total value for the manifold, assuming of course the integral in each chart is finite.

We will only investigate items (i) and (ii) in these notes. There are many other excellent texts
which take great effort to carefully expand on point (iii) and I do not wish to replicate that effort
here. You can read Edwards and see about pavings, or read Munkres, who has at least 100
pages devoted to the careful study of multivariate integration. I do not get into those topics in my
notes because we simply do not have sufficient analytical power to do them justice. I would encour-
age the student interested in deeper ideas of integration to find time to talk to Dr. Skoumbourdis,
he has thought a long time about these matters and he really understands integration in a way we
dare not cover in the calculus sequence. You really should have that conversation after you’ve taken
2
we will discuss this as the section progresses

real analysis and have gained a better sense of what analysis’ purpose is in mathematics. That
said, what we do cover in this section and the next is fascinating whether or not we understand all
the analytical underpinnings of the subject!

9.4.1 integration of 𝑘-form on ℝ𝑘


Note that on $U \subseteq \mathbb{R}^k$ a $k$-form $\alpha$ is the top form thus there exists some smooth function $f$ on $U$
such that $\alpha_x = f(x)\, dx^1 \wedge dx^2 \wedge \cdots \wedge dx^k$ for all $x \in U$. If $D$ is a subset of $U$ then we define the
integral of $\alpha$ over $D$ via the corresponding integral of $k$-variables in $\mathbb{R}^k$. In particular,
\[ \int_D \alpha = \int_D f(x)\, d^k x \]
where on the r.h.s. the symbol $d^k x$ is meant to denote the usual integral of $k$-variables on $\mathbb{R}^k$. It
is sometimes convenient to write such an integral as:
\[ \int_D f(x)\, d^k x = \int_D f(x)\, dx^1 dx^2 \cdots dx^k \]

but, to be more careful, the integration of $f$ over $D$ is a quantity which is independent of the
particular order in which the variables on $\mathbb{R}^k$ are assigned. On the other hand, the order of the
variables in the formula for $\alpha$ certainly can introduce signs. Note
\[ \alpha_x = -f(x)\, dx^2 \wedge dx^1 \wedge \cdots \wedge dx^k . \]
How then can we reasonably maintain the integral proposed above? Well, the answer is to make
a convention that we must write the form to match the standard orientation of $\mathbb{R}^k$. The
standard orientation of $\mathbb{R}^k$ is given by $Vol_k = dx^1 \wedge dx^2 \wedge \cdots \wedge dx^k$. If the given form is written
$\alpha_x = f(x)\, Vol_k$ then we define $\int_D \alpha = \int_D f(x)\, d^k x$. Since it is always possible to write a $k$-form as
a function multiplying $Vol_k$ on $\mathbb{R}^k$ this definition suffices to cover all possible $k$-forms. I expand a
few basic cases below:

Suppose $\alpha_x = f(x)\, dx$ on some subset $D = [a, b]$ of $\mathbb{R}$,
\[ \int_D \alpha = \int_D f(x)\, dx = \int_a^b f(x)\, dx . \]
Or, if $\alpha_{(x,y)} = f(x, y)\, dx \wedge dy$ then for $D$ a subset of $\mathbb{R}^2$,
\[ \int_D \alpha = \int_D f(x, y)\, dx\, dy = \int_D f\, dA . \]
If $\alpha_{(x,y,z)} = f(x, y, z)\, dx \wedge dy \wedge dz$ then for $D$ a subset of $\mathbb{R}^3$,
\[ \int_D \alpha = \int_D f(x, y, z)\, dx\, dy\, dz = \int_D f\, dV . \]
In practice we tend to break the integrals above down into an iterated integral thanks to Fubini’s
theorems. The integrals $\int_D f\, dA$ and $\int_D f\, dV$ are not in and of themselves dependent on orientation.
However the set $D$ may be oriented, the value of those integrals is the same for a fixed function
$f$. The orientation dependence of the form integral is completely wrapped up in our rule that the
form must be written as a multiple of the volume form on the given space.
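The convention can be made concrete with a small numerical experiment. The sketch below assumes a rectangular domain, a midpoint-rule approximation of the ordinary double integral, and a hypothetical helper `integrate_two_form` that accepts the order in which the differentials are written; none of these names come from the notes. The point is only that the ordinary integral of $f$ is unchanged, while rewriting the form as a multiple of $dx \wedge dy$ supplies the sign.

```python
def midpoint_2d(f, ax, bx, ay, by, n=200):
    """Midpoint-rule approximation of the ordinary double integral of f over [ax,bx] x [ay,by]."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy


def integrate_two_form(coeff, order, ax, bx, ay, by):
    """Integrate coeff * dx^a ^ dx^b over a rectangle; order = (a, b) with 0 for x and 1 for y."""
    if order == (0, 1):
        sign = 1       # already a multiple of Vol_2 = dx ^ dy
    elif order == (1, 0):
        sign = -1      # dy ^ dx = -dx ^ dy, so rewrite before integrating
    else:
        raise ValueError("order must be (0, 1) or (1, 0)")
    return sign * midpoint_2d(coeff, ax, bx, ay, by)


def f(x, y):
    return x * y**2


# Integral of f dx^dy over [0,1] x [0,2]; the exact value is (1/2)(8/3) = 4/3.
I_xy = integrate_two_form(f, (0, 1), 0.0, 1.0, 0.0, 2.0)
# The same coefficient written against dy^dx integrates to the negative.
I_yx = integrate_two_form(f, (1, 0), 0.0, 1.0, 0.0, 2.0)
```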

9.4.2 orientations and submanifolds


Given a 𝑘-manifold ℳ we say it is an oriented manifold iff all coordinates on ℳ are consistently
oriented. If we make a choice and say 𝜙0 : 𝑈0 → 𝑉0 is right-handed then any overlapping patch
𝜙1 : 𝑈1 → 𝑉1 is said to be right-handed iff 𝑑𝑒𝑡(𝑑𝜃01 ) > 0. Otherwise, if 𝑑𝑒𝑡(𝑑𝜃01 ) < 0 then the
patch 𝜙1 : 𝑈1 → 𝑉1 is said to be left-handed. If the manifold is orientable then as we continue to
travel across the manifold we can choose coordinates such that on each overlap the transition func-
tions satisfy 𝑑𝑒𝑡(𝑑𝜃𝑖𝑗 ) > 0. In this way we find an atlas for an orientable ℳ which is right-handed.

We can also say ℳ is oriented if there exists a nonzero volume-form on ℳ. If ℳ is 𝑘-dimensional


then a volume form 𝑉 𝑜𝑙 is simply a nonzero 𝑘-form. At each point 𝑝 ∈ ℳ we can judge if a given
coordinate system is left or right handed. We have to make a convention to be precise and I do so
at this point. We assume 𝑉 𝑜𝑙 is positive and we say a coordinate system with chart (𝑥1 , 𝑥2 , . . . , 𝑥𝑘 )
is positively oriented iff 𝑉 𝑜𝑙(∂1 ∣𝑝 , ∂2 ∣𝑝 , . . . , ∂𝑘 ∣𝑝 ) > 0. If a coordinate system is not positively
oriented then it is said to be negatively oriented and we will find 𝑉 𝑜𝑙(∂1 ∣𝑝 , ∂2 ∣𝑝 , . . . , ∂𝑘 ∣𝑝 ) < 0 in
that case. It is important that we suppose 𝑉 𝑜𝑙𝑝 ∕= 0 at each 𝑝 ∈ ℳ since that is what allows us to
demarcate coordinate systems as positively or negatively oriented.

Naturally, you are probably wondering: is a positively oriented coordinate system the same idea
as a right-handed coordinate system as defined above? To answer that we should analyze how
$Vol$ changes coordinates on an overlap. Suppose we are given a positive volume form $Vol$ and
a point $p \in \mathcal{M}$ where two coordinate systems $x$ and $y$ are both defined. There must exist some
function $f$ such that
\[ Vol_x = f(x)\, dx^1 \wedge dx^2 \wedge \cdots \wedge dx^k \]
To change coordinates recall $dx^j = \sum_{i=1}^{k} \frac{\partial x^j}{\partial y^i}\, dy^i$ and substitute,
\[
\begin{aligned}
Vol &= (f \circ x \circ y^{-1})(y) \sum_{j_1, \dots, j_k = 1}^{k} \frac{\partial x^1}{\partial y^{j_1}} \frac{\partial x^2}{\partial y^{j_2}} \cdots \frac{\partial x^k}{\partial y^{j_k}}\, dy^{j_1} \wedge dy^{j_2} \wedge \cdots \wedge dy^{j_k} \\
&= (f \circ x \circ y^{-1})(y)\, det\left[ \frac{\partial x}{\partial y} \right] dy^1 \wedge dy^2 \wedge \cdots \wedge dy^k
\end{aligned}
\tag{9.3}
\]
If you calculate the value of $Vol$ on $\partial_{x_I}|_p = (\partial_{x_1}|_p, \partial_{x_2}|_p, \dots, \partial_{x_k}|_p)$ you’ll find $Vol(\partial_{x_I}|_p) = f(x(p))$.
Whereas, if you evaluate $Vol$ on $\partial_{y_I}|_p = (\partial_{y_1}|_p, \partial_{y_2}|_p, \dots, \partial_{y_k}|_p)$ then the value is
$Vol(\partial_{y_I}|_p) = f(x(p))\, det\left[ \frac{\partial x}{\partial y} \right](p)$. But, we should recognize that $det\left[ \frac{\partial x}{\partial y} \right] = det(d\theta_{ij})$ hence two coordinate
systems which are positively oriented must also be consistently oriented. Why? Assume
$Vol(\partial_{x_I}|_p) = f(x(p)) > 0$ then $Vol(\partial_{y_I}|_p) = f(x(p))\, det\left[ \frac{\partial x}{\partial y} \right](p) > 0$ iff $det\left[ \frac{\partial x}{\partial y} \right](p) > 0$ hence $y$ is positively
oriented if we are given that $x$ is positively oriented and $det\left[ \frac{\partial x}{\partial y} \right] > 0$.
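To see the determinant criterion at work, consider the plane with $Vol = dx \wedge dy$ and polar coordinates away from the origin. The sketch below is an illustration only (the sample point $r = 2$, $\theta = 0.3$ and the helper names are assumptions): it evaluates the Jacobian determinant of $(x, y)$ with respect to $(r, \theta)$, which is $Vol(\partial_r, \partial_\theta)$ up to the positive factor $f = 1$, and then repeats the calculation with the coordinates listed in the order $(\theta, r)$.

```python
import math


def jacobian_polar(r, theta):
    """Matrix of partial derivatives of (x, y) = (r cos(theta), r sin(theta)) w.r.t. (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def swap_columns(m):
    """Jacobian with respect to the same coordinates listed in the opposite order."""
    return [[row[1], row[0]] for row in m]


J = jacobian_polar(2.0, 0.3)
det_r_theta = det2(J)                 # equals r = 2 > 0: (r, theta) is positively oriented
det_theta_r = det2(swap_columns(J))   # equals -r < 0: (theta, r) is negatively oriented
```

Since the first determinant is positive and the second negative, the chart $(r, \theta)$ is consistently oriented with $(x, y)$ while $(\theta, r)$ is not.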

Let $\mathcal{M}$ be an oriented $k$-manifold with orientation given by the volume form $Vol$ and an associated
atlas of positively oriented charts. Furthermore, let $\alpha$ be a $k$-form defined on $V \subseteq \mathcal{M}$. Suppose
there exists a local parametrization $\phi : U \subseteq \mathbb{R}^k \to V \subseteq \mathcal{M}$ and $D \subset V$ then there is a smooth
function $h$ such that $\alpha_q = h(q)\, dx^1 \wedge dx^2 \wedge \cdots \wedge dx^k$ for each $q \in V$. We define the integral of $\alpha$
over $D$ as follows:
\[ \int_D \alpha = \int_{\phi^{-1}(D)} h(\phi(x))\, d^k x \qquad \leftarrow [\star_x] \]

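Is this definition dependent on the coordinate system $\phi : U \subseteq \mathbb{R}^k \to V \subseteq \mathcal{M}$? If we instead used
coordinate system $\psi : \bar{U} \subseteq \mathbb{R}^k \to \bar{V} \subseteq \mathcal{M}$ with coordinates $y^1, y^2, \dots, y^k$ on $\bar{V}$ then the given
form $\alpha$ has a different coefficient of $dy^1 \wedge dy^2 \wedge \cdots \wedge dy^k$
\[ \alpha = h(x)\, dx^1 \wedge dx^2 \wedge \cdots \wedge dx^k = (h \circ x \circ y^{-1})(y)\, det\left[ \frac{\partial x}{\partial y} \right] dy^1 \wedge dy^2 \wedge \cdots \wedge dy^k \]
Thus, as we change over to $y$ coordinates the function picks up a factor which is precisely the
determinant of the derivative of the transition functions.
\[
\begin{aligned}
\int_D \alpha &= \int_D (h \circ x \circ y^{-1})(y)\, det\left[ \frac{\partial x}{\partial y} \right] dy^1 \wedge dy^2 \wedge \cdots \wedge dy^k \\
&= \int_{\psi^{-1}(D)} (h \circ x \circ y^{-1})(y)\, det\left[ \frac{\partial x}{\partial y} \right] d^k y \qquad \leftarrow [\star_y]
\end{aligned}
\]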

We need $\star_x = \star_y$ in order for the integral $\int_D \alpha$ to be well-defined. Fortunately, the needed equality
is almost provided by the change of variables theorem for multivariate integrals on $\mathbb{R}^k$. Recall,
\[ \int_R f(x)\, d^k x = \int_{\bar{R}} \tilde{f}(y) \left| det\, \frac{\partial x}{\partial y} \right| d^k y \]
where $\tilde{f}$ is more pedantically written as $\tilde{f} = f \circ y^{-1}$; notation aside, it’s just the function $f$ written
in terms of the new $y$-coordinates. Likewise, $\bar{R}$ limits $y$-coordinates so that the corresponding
$x$-coordinates are found in $R$. Applying this theorem to our pull-back expression,
\[ \int_{\phi^{-1}(D)} h(\phi(x))\, d^k x = \int_{\psi^{-1}(D)} (h \circ x \circ y^{-1})(y) \left| det\left[ \frac{\partial x}{\partial y} \right] \right| d^k y . \]
Equality of $\star_x$ and $\star_y$ follows from the fact that $\mathcal{M}$ is oriented and has transition functions³ $\theta_{ij}$
which satisfy $det(d\theta_{ij}) > 0$, so the absolute value may be dropped. We see that this integral is well-defined only for oriented manifolds.
To integrate over manifolds without an orientation additional ideas are needed, but it is possible.
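A small numerical experiment shows both halves of this argument. The sketch below makes several assumptions for illustration: the domain is $D = [1, 4] \times [0, 1]$ with $h(x, y) = xy$, the second coordinate system is defined by $x = a^2$, $y = b$ (so the Jacobian determinant is $2a$), and the integrals are approximated by a midpoint rule. With $a \in [1, 2]$ the determinant is positive and the two coordinate expressions agree; with $a \in [-2, -1]$ the same region is covered by an orientation-reversing chart and the form integral changes sign.

```python
def midpoint_2d(f, a0, a1, b0, b1, n=200):
    """Midpoint-rule approximation of the double integral of f over [a0,a1] x [b0,b1]."""
    ha, hb = (a1 - a0) / n, (b1 - b0) / n
    total = 0.0
    for i in range(n):
        a = a0 + (i + 0.5) * ha
        for j in range(n):
            b = b0 + (j + 0.5) * hb
            total += f(a, b)
    return total * ha * hb


def h(x, y):
    return x * y


def pulled_back(a, b):
    """Coefficient of da ^ db after substituting x = a^2, y = b: h times det = 2a."""
    return h(a**2, b) * (2 * a)


# star_x: integrate h dx^dy over D = [1,4] x [0,1] directly; exact value 15/4.
star_x = midpoint_2d(h, 1.0, 4.0, 0.0, 1.0)

# star_y with a in [1,2]: det = 2a > 0, the chart is consistently oriented.
star_y = midpoint_2d(pulled_back, 1.0, 2.0, 0.0, 1.0)

# a in [-2,-1] covers the same D but det = 2a < 0: the signed formula flips sign.
star_y_reversed = midpoint_2d(pulled_back, -2.0, -1.0, 0.0, 1.0)
```

The first two values agree (approximately 3.75) and the third is approximately −3.75, which is why the determinant without absolute value gives a well-defined answer only when the charts are consistently oriented.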

3
once more recall the notation $\frac{\partial x}{\partial y}$ is just the matrix of the linear transformation $d\theta_{ij}$ and the determinant of a
linear transformation is the determinant of the matrix of the transformation

Perhaps the most interesting case to consider is that of an embedded 𝑘-manifold in ℝ𝑛 . In this
context we must deal with both the coordinates of the ambient ℝ𝑛 and the local parametrizations
of the 𝑘-manifold. In multivariate calculus we often consider vector fields which are defined on an
open subset of ℝ3 and then we calculate the flux over a surface or the work along a curve. What
we have defined thus far is in essence like defining how to integrate a vector field on a surface
or a vector field along a curve; no mention of the vector field off the domain of integration was
made. We supposed the forms were already defined on the oriented manifold, but, what if we are
instead given a formula for a differential form on ℝ𝑛 ? How can we restrict that differential form
to a surface or line or more generally a parametrized 𝑘-dimensional submanifold of ℝ𝑛 ? That is
the problem we concern ourselves with for the remainder of this section.

Let’s begin with a simple object. Consider a one-form $\alpha = \sum_{i=1}^{n} \alpha_i\, dx^i$ where the function $p \mapsto \alpha_i(p)$
is smooth on some subset of $\mathbb{R}^n$. Suppose $C$ is a curve parametrized by $X : D \subseteq \mathbb{R} \to C \subseteq \mathbb{R}^n$ then
the natural chart on $C$ is provided by the parameter $t$, in particular we have $T_pC = span\{ \frac{\partial}{\partial t} \big|_{t_o} \}$
where $X(t_o) = p$ and $T_pC^* = span\{ d_{t_o}t \}$ hence a vector field along $C$ has the form $f(t) \frac{\partial}{\partial t}$ and a
differential form has the form $g(t)\,dt$. How can we use the one-form $\alpha$ on $\mathbb{R}^n$ to naturally obtain a
one-form defined along $C$? I propose:
\[ \alpha|_C (t) = \sum_{i=1}^{n} \alpha_i(X(t))\, \frac{\partial X^i}{\partial t}\, dt \]
It can be shown that $\alpha|_C$ is a one-form on $C$. If we change coordinates on the curve by reparametrizing
$t \to s$ then the component relative to $s$ and the component relative to $t$ are related:
\[ \sum_{i=1}^{n} \alpha_i(X(t(s)))\, \frac{dX^i}{ds} = \sum_{i=1}^{n} \alpha_i(X(t))\, \frac{dt}{ds} \frac{\partial X^i}{\partial t} = \frac{dt}{ds} \left( \sum_{i=1}^{n} \alpha_i(X(t))\, \frac{\partial X^i}{\partial t} \right) \]
This is precisely the transformation rule we want for the components of a one-form.

Example 9.4.1. Suppose $\alpha = dx + 3x^2\, dy + y\, dz$ and $C$ is the curve $X : \mathbb{R} \to C \subseteq \mathbb{R}^3$ defined by
$X(t) = (1, t, t^2)$. We have $x = 1$, $y = t$ and $z = t^2$ hence $dx = 0$, $dy = dt$ and $dz = 2t\,dt$ on $C$ hence
$\alpha|_C = 0 + 3\,dt + t(2t\,dt) = (3 + 2t^2)\,dt$.
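The restriction formula is easy to evaluate numerically. The sketch below is a hedged illustration: the function names are hypothetical, the velocity of the curve is estimated by a central difference rather than computed symbolically, and the data are those of the example ($\alpha = dx + 3x^2\,dy + y\,dz$ and $X(t) = (1, t, t^2)$). The coefficient of $dt$ should match $3 + 2t^2$.

```python
def alpha_components(x, y, z):
    """Components (alpha_1, alpha_2, alpha_3) of alpha = dx + 3x^2 dy + y dz."""
    return (1.0, 3.0 * x**2, y)


def X(t):
    """Parametrization of the curve C."""
    return (1.0, t, t**2)


def velocity(t, h=1e-6):
    """Central-difference estimate of dX/dt."""
    plus, minus = X(t + h), X(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))


def restricted_coefficient(t):
    """Coefficient of dt in the restriction of alpha to C: sum_i alpha_i(X(t)) dX^i/dt."""
    comps = alpha_components(*X(t))
    return sum(c * v for c, v in zip(comps, velocity(t)))


samples = [-1.0, 0.0, 0.5, 2.0]
numeric = [restricted_coefficient(t) for t in samples]
closed_form = [3.0 + 2.0 * t**2 for t in samples]
```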

Next, consider a two-form $\beta = \sum_{i,j=1}^{n} \frac{1}{2} \beta_{ij}\, dx^i \wedge dx^j$. Once more we consider a parametrized
submanifold of $\mathbb{R}^n$. In particular use the notation $X : D \subseteq \mathbb{R}^2 \to S \subseteq \mathbb{R}^n$ where $u, v$ serve as
coordinates on the surface $S$. We can write an arbitrary two-form on $S$ in the form $h(u, v)\, du \wedge dv$
where $h : S \to \mathbb{R}$ is a smooth function on $S$. How should we construct $h(u, v)$ given $\beta$? Again, I
think the following formula is quite natural, honestly, what else would you do⁴?
\[ \beta|_S (u, v) = \sum_{i,j=1}^{n} \beta_{ij}(X(u, v))\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, du \wedge dv \]

4
include the $\frac{1}{2}$ you say? we’ll see why not soon enough

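The coefficient function of $du \wedge dv$ is smooth because we assume $\beta_{ij}$ is smooth on $\mathbb{R}^n$ and the local
parametrization is also assumed smooth so the functions $\frac{\partial X^i}{\partial u}$ and $\frac{\partial X^i}{\partial v}$ are smooth. Moreover, the
component function has the desired coordinate change property with respect to a reparametrization
of $S$. Suppose we reparametrize by $s, t$, then suppressing the point-dependence of $\beta_{ij}$,
\[ \beta|_S = \sum_{i,j=1}^{n} \beta_{ij}\, \frac{\partial Y^i}{\partial s} \frac{\partial Y^j}{\partial t}\, ds \wedge dt = \sum_{i,j=1}^{n} \beta_{ij}\, \frac{du}{ds} \frac{dv}{dt} \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, ds \wedge dt = \sum_{i,j=1}^{n} \beta_{ij}\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, du \wedge dv . \]
Therefore, the restriction of $\beta$ to $S$ is coordinate independent and we have thus constructed a
two-form on a surface from the two-form in the ambient space.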
Example 9.4.2. Consider $\beta = y^2\, dt \wedge dx + z\, dx \wedge dy + (x + y + z + t)\, dt \wedge dz$. Suppose $S \subseteq \mathbb{R}^4$ is
parametrized by
\[ X(u, v) = (1, u^2 v^2, 3u, v) \]
In other words, we are given that
\[ t = 1, \quad x = u^2 v^2, \quad y = 3u, \quad z = v \]
Hence, $dt = 0$, $dx = 2uv^2\, du + 2u^2 v\, dv$, $dy = 3\,du$ and $dz = dv$. Computing $\beta|_S$ is just a matter of
substituting in all the formulas above; fortunately $dt = 0$ so only the $z\, dx \wedge dy$ term is nontrivial:
\[ \beta|_S = v(2uv^2\, du + 2u^2 v\, dv) \wedge (3\,du) = 6u^2 v^2\, dv \wedge du = -6u^2 v^2\, du \wedge dv . \]
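The same answer comes out of the component formula $\sum_{i,j} \beta_{ij}\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}$ when $\beta_{ij}$ are taken to be the antisymmetric tensor components. The sketch below assumes the coordinate order $(t, x, y, z)$ with indices 0 to 3, uses hand-computed partial derivatives of $X(u, v) = (1, u^2v^2, 3u, v)$, and introduces hypothetical helper names; it is meant as a check on the arithmetic rather than a general implementation.

```python
def beta_components(t, x, y, z):
    """Antisymmetric tensor components beta_ij of
    beta = y^2 dt^dx + z dx^dy + (x + y + z + t) dt^dz, in coordinates (t, x, y, z)."""
    b = [[0.0] * 4 for _ in range(4)]

    def put(i, j, value):
        b[i][j] = value
        b[j][i] = -value

    put(0, 1, y**2)            # dt ^ dx
    put(1, 2, z)               # dx ^ dy
    put(0, 3, x + y + z + t)   # dt ^ dz
    return b


def X(u, v):
    return (1.0, u**2 * v**2, 3.0 * u, v)


def X_u(u, v):
    """Partial derivative of X with respect to u."""
    return (0.0, 2.0 * u * v**2, 3.0, 0.0)


def X_v(u, v):
    """Partial derivative of X with respect to v."""
    return (0.0, 2.0 * u**2 * v, 0.0, 1.0)


def restricted_coefficient(u, v):
    """Coefficient of du ^ dv in the restriction of beta to S."""
    b = beta_components(*X(u, v))
    xu, xv = X_u(u, v), X_v(u, v)
    return sum(b[i][j] * xu[i] * xv[j] for i in range(4) for j in range(4))


value = restricted_coefficient(1.5, 2.0)   # expected -6 u^2 v^2 at (1.5, 2.0)
```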

It is fairly clear that we can restrict any 𝑝-form on ℝ𝑛 to a 𝑝-dimensional parametrized submanifold
by the procedure we explained above for 𝑝 = 1, 2. That is the underlying idea in the definitions
which follow. Beyond that, once we have restricted the 𝑝-form 𝛽 on ℝ𝑛 to 𝛽∣ℳ then we pull-back the
restricted form to an open subset of ℝ𝑝 and reduce the problem to an ordinary multivariate integral.

Our goal in the remainder of the section is to make contact with the5 integrals we study in calculus.
Note that an embedded manifold with a single patch is almost trivially oriented since there is no
overlap to consider. In particular, if 𝜙 : 𝑈 ⊆ ℝ𝑘 → ℳ ⊆ ℝ𝑛 is a local parametrization with
𝜙−1 = (𝑢1 , 𝑢2 , . . . , 𝑢𝑘 ) then 𝑑𝑢1 ∧ 𝑑𝑢2 ∧ ⋅ ⋅ ⋅ ∧ 𝑑𝑢𝑘 is a volume form for ℳ. This is the natural
generalization of the normal-vector field construction for surfaces in ℝ3 .
Definition 9.4.3. integral of one-form along oriented curve:
Let $\alpha = \alpha_i\, dx^i$ be a one form and let $C$ be an oriented curve with parametrization $X(t) :
[a, b] \to C$ then we define the integral of the one-form $\alpha$ along the curve $C$ as follows,
\[ \int_C \alpha \equiv \int_a^b \alpha_i(X(t))\, \frac{dX^i}{dt}(t)\, dt \]
where $X(t) = (X^1(t), X^2(t), \dots, X^n(t))$ so we mean $X^i$ to be the $i^{th}$ component of $X(t)$.
Moreover, the indices are understood to range over the dimension of the ambient space, if
we consider forms in $\mathbb{R}^2$ then $i = 1, 2$ if in $\mathbb{R}^3$ then $i = 1, 2, 3$ if in Minkowski $\mathbb{R}^4$ then $i$
should be replaced with $\mu = 0, 1, 2, 3$ and so on.
5
hopefully known to you already from multivariate calculus

Example 9.4.4. One form integrals vs. line integrals of vector fields: We begin with a
vector field $\vec{F}$ and construct the corresponding one-form $\omega_{\vec{F}} = F_i\, dx^i$. Next let $C$ be an oriented
curve with parametrization $X : [a, b] \subset \mathbb{R} \to C \subset \mathbb{R}^n$, observe
\[ \int_C \omega_{\vec{F}} = \int_a^b F_i(X(t))\, \frac{dX^i}{dt}(t)\, dt = \int_C \vec{F} \cdot d\vec{l} \]
You may note that the definition of a line integral of a vector field is not special to three dimensions;
we can clearly construct the line integral in n dimensions. Likewise the correspondence $\omega$ can be
written between one-forms and vector fields in any dimension, provided we have a metric to lower
the index of the vector field components. The same cannot be said of the flux-form correspondence;
it is special to three dimensions for reasons we have explored previously.
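The defining formula for the integral of a one-form along a curve translates directly into a one-dimensional quadrature. The sketch below assumes a planar example chosen for illustration (the field $F = (-y, x)$ and the unit circle traversed in both directions), exact velocities supplied by hand, and a hypothetical helper `line_integral` using the midpoint rule. It also shows the role of orientation: reversing the direction of the parametrization reverses the sign.

```python
import math


def line_integral(alpha, X, dX, a, b, n=1000):
    """Midpoint-rule approximation of the integral over [a,b] of alpha_i(X(t)) dX^i/dt."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        comps = alpha(*X(t))
        vel = dX(t)
        total += sum(c * v for c, v in zip(comps, vel))
    return total * h


def F(x, y):
    """Components of the work-form of F = (-y, x)."""
    return (-y, x)


def ccw(t):
    return (math.cos(t), math.sin(t))


def ccw_velocity(t):
    return (-math.sin(t), math.cos(t))


def cw(t):
    return (math.cos(t), -math.sin(t))


def cw_velocity(t):
    return (-math.sin(t), -math.cos(t))


work_ccw = line_integral(F, ccw, ccw_velocity, 0.0, 2.0 * math.pi)   # 2*pi
work_cw = line_integral(F, cw, cw_velocity, 0.0, 2.0 * math.pi)      # -2*pi
```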

Definition 9.4.5. integral of two-form over an oriented surface:

Let $\beta = \frac{1}{2} \beta_{ij}\, dx^i \wedge dx^j$ be a two-form and let $S$ be an oriented piecewise smooth surface with
parametrization $X(u, v) : D_2 \subset \mathbb{R}^2 \to S \subset \mathbb{R}^n$ then we define the integral of the two-form
$\beta$ over the surface $S$ as follows,
\[ \int_S \beta \equiv \int_{D_2} \beta_{ij}(X(u, v))\, \frac{\partial X^i}{\partial u}(u, v)\, \frac{\partial X^j}{\partial v}(u, v)\, du\, dv \]
where $X(u, v) = (X^1(u, v), X^2(u, v), \dots, X^n(u, v))$ so we mean $X^i$ to be the $i^{th}$ component
of $X(u, v)$. Moreover, the indices are understood to range over the dimension of the ambient
space, if we consider forms in $\mathbb{R}^2$ then $i, j = 1, 2$ if in $\mathbb{R}^3$ then $i, j = 1, 2, 3$ if in Minkowski
$\mathbb{R}^4$ then $i, j$ should be replaced with $\mu, \nu = 0, 1, 2, 3$ and so on.

Example 9.4.6. Two-form integrals vs. surface integrals of vector fields in ℝ3 : We begin
with a vector field $\vec{F}$ and construct the corresponding two-form $\Phi_{\vec{F}} = \frac{1}{2} \epsilon_{ijk} F_k\, dx^i \wedge dx^j$ which is to
say $\Phi_{\vec{F}} = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$. Next let $S$ be an oriented piecewise smooth surface
with parametrization $X : D \subset \mathbb{R}^2 \to S \subset \mathbb{R}^3$, then
\[ \int_S \Phi_{\vec{F}} = \int_S \vec{F} \cdot d\vec{A} \]
Proof: Recall that the normal to the surface $S$ has the form,
\[ N(u, v) = \frac{\partial X}{\partial u} \times \frac{\partial X}{\partial v} = \epsilon_{ijk}\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, e_k \]
at the point $X(u, v)$. This gives us a vector which points along the outward normal to the surface
and it is nonvanishing throughout the whole surface by our assumption that $S$ is oriented. Moreover
the vector surface integral of $\vec{F}$ over $S$ was defined by the formula,
\[ \int_S \vec{F} \cdot d\vec{A} \equiv \iint_D \vec{F}(X(u, v)) \cdot \vec{N}(u, v)\, du\, dv . \]
Now that the reader is reminded what’s what, let’s prove the proposition. Dropping the $(u, v)$ dependence
to reduce clutter we find,
\[
\begin{aligned}
\int_S \vec{F} \cdot d\vec{A} &= \iint_D \vec{F} \cdot \vec{N}\, du\, dv \\
&= \iint_D F_k N_k\, du\, dv \\
&= \iint_D F_k\, \epsilon_{ijk}\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, du\, dv \\
&= \iint_D (\Phi_{\vec{F}})_{ij}\, \frac{\partial X^i}{\partial u} \frac{\partial X^j}{\partial v}\, du\, dv \\
&= \int_S \Phi_{\vec{F}}
\end{aligned}
\]
Notice that we have again used our convention that $(\Phi_{\vec{F}})_{ij}$ refers to the tensor components of
the 2-form $\Phi_{\vec{F}}$ meaning we have $\Phi_{\vec{F}} = (\Phi_{\vec{F}})_{ij}\, dx^i \otimes dx^j$ whereas with the wedge product
$\Phi_{\vec{F}} = \frac{1}{2} (\Phi_{\vec{F}})_{ij}\, dx^i \wedge dx^j$. I mention this in case you are concerned there is a half in $\Phi_{\vec{F}}$ yet we never found
a half in the integral. Well, we don’t expect to because we defined the integral of the form with
respect to the tensor components of the form; again they don’t contain the half.

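Example 9.4.7. Consider the vector field $\vec{F} = (0, 0, 3)$ then the corresponding two-form is simply
$\Phi_F = 3\,dx \wedge dy$. Let’s calculate the surface integral and two-form integrals over the square
$D = [0, 1] \times [0, 1]$ in the $xy$-plane. In this case the parameters can be taken to be $x$ and $y$ so $X(x, y) = (x, y, 0)$
and,
\[ N(x, y) = \frac{\partial X}{\partial x} \times \frac{\partial X}{\partial y} = (1, 0, 0) \times (0, 1, 0) = (0, 0, 1) \]
which is nice. Now calculate,
\[
\begin{aligned}
\int_S \vec{F} \cdot d\vec{A} &= \iint_D \vec{F} \cdot \vec{N}\, dx\, dy \\
&= \iint_D (0, 0, 3) \cdot (0, 0, 1)\, dx\, dy \\
&= \int_0^1 \int_0^1 3\, dx\, dy \\
&= 3 .
\end{aligned}
\]
Consider that $\Phi_F = 3\,dx \wedge dy = 3\,dx \otimes dy - 3\,dy \otimes dx$ therefore we may read directly that
$(\Phi_F)_{12} = -(\Phi_F)_{21} = 3$ and all other components are zero,
\[
\begin{aligned}
\int_S \Phi_F &= \iint_D (\Phi_F)_{ij}\, \frac{\partial X^i}{\partial x} \frac{\partial X^j}{\partial y}\, dx\, dy \\
&= \iint_D \left( 3\, \frac{\partial X^1}{\partial x} \frac{\partial X^2}{\partial y} - 3\, \frac{\partial X^2}{\partial x} \frac{\partial X^1}{\partial y} \right) dx\, dy \\
&= \int_0^1 \int_0^1 \left( 3\, \frac{\partial x}{\partial x} \frac{\partial y}{\partial y} - 3\, \frac{\partial y}{\partial x} \frac{\partial x}{\partial y} \right) dx\, dy \\
&= 3 .
\end{aligned}
\]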
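Both sides of the identity $\int_S \Phi_{\vec{F}} = \int_S \vec{F} \cdot d\vec{A}$ can be evaluated numerically and compared. The sketch below is an illustration under stated assumptions: the parameter domain is the unit square, integrals are approximated by the midpoint rule, the tensor components are taken to be $(\Phi_F)_{ij} = \epsilon_{ijk} F_k$, and all helper names are hypothetical. Besides the flat square of the example it also uses one curved test surface, $X(u, v) = (u, v, uv)$ with $\vec{F} = (x, y, z)$, for which $\vec{F} \cdot \vec{N} = -uv$ and the flux is $-1/4$.

```python
def levi_civita(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def midpoint_unit_square(g, n=100):
    """Midpoint-rule approximation of the double integral of g over [0,1] x [0,1]."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h


def flux_vector(F, X, Xu, Xv):
    """Surface integral of F . N with N = X_u x X_v."""
    def integrand(u, v):
        N = cross(Xu(u, v), Xv(u, v))
        return sum(f * n for f, n in zip(F(*X(u, v)), N))
    return midpoint_unit_square(integrand)


def flux_form(F, X, Xu, Xv):
    """Integral of the flux-form using tensor components (Phi_F)_ij = eps_ijk F_k."""
    def integrand(u, v):
        Fp, xu, xv = F(*X(u, v)), Xu(u, v), Xv(u, v)
        return sum(levi_civita(i, j, k) * Fp[k] * xu[i] * xv[j]
                   for i in range(3) for j in range(3) for k in range(3))
    return midpoint_unit_square(integrand)


# Flat square in the xy-plane with F = (0, 0, 3), as in the example.
flat = dict(F=lambda x, y, z: (0.0, 0.0, 3.0),
            X=lambda u, v: (u, v, 0.0),
            Xu=lambda u, v: (1.0, 0.0, 0.0),
            Xv=lambda u, v: (0.0, 1.0, 0.0))

# Curved test surface z = xy with F = (x, y, z).
curved = dict(F=lambda x, y, z: (x, y, z),
              X=lambda u, v: (u, v, u * v),
              Xu=lambda u, v: (1.0, 0.0, v),
              Xv=lambda u, v: (0.0, 1.0, u))

flat_vec, flat_form = flux_vector(**flat), flux_form(**flat)
curved_vec, curved_form = flux_vector(**curved), flux_form(**curved)
```

In both cases the two computations agree, and no factor of one half appears because the sum over all $i, j$ with antisymmetric components already counts each wedge term twice.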

Definition 9.4.8. integral of a three-form over an oriented volume:

Let $\gamma = \frac{1}{6}\gamma_{ijk}\,dx^i\wedge dx^j\wedge dx^k$ be a three-form and let $V$ be an oriented piecewise smooth
volume with parametrization $X(u,v,w) : D_3\subset\mathbb{R}^3\to V\subset\mathbb{R}^n$; then we define the integral
of the three-form $\gamma$ in the volume $V$ as follows,

\[ \int_V \gamma \equiv \int_{D_3} \gamma_{ijk}(X(u,v,w))\,\frac{\partial X^i}{\partial u}\frac{\partial X^j}{\partial v}\frac{\partial X^k}{\partial w}\,du\,dv\,dw \]

where $X(u,v,w) = (X^1(u,v,w), X^2(u,v,w), \dots, X^n(u,v,w))$ so we mean $X^i$ to be the $i$-th
component of $X(u,v,w)$. Moreover, the indices are understood to range over the dimension
of the ambient space: if we consider forms in $\mathbb{R}^3$ then $i,j,k = 1,2,3$; if in Minkowski $\mathbb{R}^4$ then
$i,j,k$ should be replaced with $\mu,\nu,\sigma = 0,1,2,3$, and so on.
Finally we define the integral of a $p$-form over a $p$-dimensional subspace of $\mathbb{R}^n$; we assume that
$p \le n$ so that it is possible to embed such a subspace in $\mathbb{R}^n$,
Definition 9.4.9. integral of a p-form over an oriented volume:

Let $\gamma = \frac{1}{p!}\beta_{i_1\dots i_p}\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}$ be a p-form and let $S$ be an oriented piecewise smooth
subspace with parametrization $X(u_1,\dots,u_p) : D_p\subset\mathbb{R}^p\to S\subset\mathbb{R}^n$ (for $n\ge p$); then we
define the integral of the p-form $\gamma$ in the subspace $S$ as follows,

\[ \int_S \gamma \equiv \int_{D_p} \beta_{i_1\dots i_p}(X(u_1,\dots,u_p))\,\frac{\partial X^{i_1}}{\partial u_1}\cdots\frac{\partial X^{i_p}}{\partial u_p}\,du_1\cdots du_p \]

where $X(u_1,\dots,u_p) = (X^1(u_1,\dots,u_p), X^2(u_1,\dots,u_p), \dots, X^n(u_1,\dots,u_p))$ so we mean $X^i$
to be the $i$-th component of $X(u_1,\dots,u_p)$. Moreover, the indices are understood to range
over the dimension of the ambient space.
Integrals of forms play an important role in modern physics. I hope you can begin to appreciate
that forms recover all the formulas we learned in multivariate calculus and give us a way to extend
calculation into higher dimensions with ease. I include a toy example at the conclusion of this
chapter just to show you how electromagnetism is easily translated into higher dimensions.

9.5 Generalized Stokes Theorem


The generalized Stokes theorem contains within it most of the main theorems of integral calculus,
namely the fundamental theorem of calculus, the fundamental theorem of line integrals (a.k.a.
the FTC in three dimensions), Green's Theorem in the plane, Gauss' Theorem and also Stokes
Theorem, not to mention a myriad of higher dimensional not so commonly named theorems. The
breadth of its application is hard to overstate, yet the statement of the theorem is simple,

Theorem 9.5.1. Generalized Stokes Theorem:

Let $S$ be an oriented, piecewise smooth (p+1)-dimensional subspace of $\mathbb{R}^n$ where $n \ge p+1$
and let $\partial S$ be its boundary, which is consistently oriented; then for a $p$-form $\alpha$ which behaves
reasonably on $S$ we have that
\[ \int_S d\alpha = \int_{\partial S}\alpha \]

The proof of this theorem (and a more careful statement of it) can be found in a number of places,
Susan Colley’s Vector Calculus or Steven H. Weintraub’s Differential Forms: A Complement to
Vector Calculus or Spivak’s Calculus on Manifolds just to name a few. I believe the argument in
Edwards' text is quite complete. In any event, you should already be familiar with the idea from
the usual Stokes Theorem where we must insist the boundary curve to the surface is related to
the surface’s normal field according to the right-hand-rule. Explaining how to orient the boundary
∂ℳ given an oriented ℳ is the problem of generalizing the right-hand-rule to many dimensions. I
leave it to your homework for the time being.

Let's work out how this theorem reproduces the main integral theorems of calculus.

Example 9.5.2. Fundamental Theorem of Calculus in ℝ: Let $f : \mathbb{R}\to\mathbb{R}$ be a zero-form
and consider the interval $[a,b]$ in ℝ. If we let $S = [a,b]$ then $\partial S = \{a,b\}$. Further observe that
$df = f'(x)\,dx$. Notice by the definition of one-form integration
\[ \int_S df = \int_a^b f'(x)\,dx \]
On the other hand (the integral of a zero-form is taken to be the evaluation
map; perhaps we should have defined this earlier, oops, but it's only going to come up here so I'm
leaving it) we find
\[ \int_{\partial S} f = f(b) - f(a) \]
Hence in view of the definition above we find that
\[ \int_a^b f'(x)\,dx = f(b) - f(a) \iff \int_S df = \int_{\partial S} f \]

Example 9.5.3. Fundamental Theorem of Calculus in ℝ³: Let $f : \mathbb{R}^3\to\mathbb{R}$ be a zero-form
and consider a curve $C$ from $p\in\mathbb{R}^3$ to $q\in\mathbb{R}^3$ parametrized by $\phi : [a,b]\to\mathbb{R}^3$. Note that
$\partial C = \{\phi(a) = p,\ \phi(b) = q\}$. Next note that
\[ df = \frac{\partial f}{\partial x^i}\,dx^i \]
Then consider that the exterior derivative of a function corresponds to the gradient of the function,
thus we are not too surprised to find that
\[ \int_C df = \int_a^b \frac{\partial f}{\partial x^i}\frac{dx^i}{dt}\,dt = \int_C (\nabla f)\cdot d\vec{l} \]
On the other hand, we use the definition of the integral over a two-point set again to find
\[ \int_{\partial C} f = f(q) - f(p) \]
Hence if the Generalized Stokes Theorem is true then so is the FTC in three dimensions,
\[ \int_C (\nabla f)\cdot d\vec{l} = f(q) - f(p) \iff \int_C df = \int_{\partial C} f \]
Another popular title for this theorem is the "fundamental theorem for line integrals". As a final
thought here we notice that this calculation easily generalizes to 2, 4, 5, 6, ... dimensions.
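A quick numerical check of the FTC for line integrals can make the statement concrete. The sketch below is only an illustration: the function f, the helix used for the curve, and the midpoint rule are all choices made here, not taken from the notes.

```python
import math


def f(x, y, z):
    return x * y + math.sin(z)


def grad_f(x, y, z):
    # Gradient of f computed by hand.
    return (y, x, math.cos(z))


def curve(t):
    # A helix running from p = curve(0) to q = curve(1).
    return (math.cos(t), math.sin(t), t)


def curve_velocity(t):
    return (-math.sin(t), math.cos(t), 1.0)


def line_integral_of_gradient(a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of (grad f) . dl along the curve."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        g = grad_f(*curve(t))
        v = curve_velocity(t)
        total += sum(gi * vi for gi, vi in zip(g, v)) * h
    return total


lhs = line_integral_of_gradient()
rhs = f(*curve(1.0)) - f(*curve(0.0))
print(lhs, rhs)
```

The two printed numbers should agree to several decimal places, which is the statement that the integral of df over C equals the "integral" of f over the two-point boundary.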
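Example 9.5.4. Green's Theorem: Let us recall the statement of Green's Theorem, as I have
not replicated it yet in the notes. Let $D$ be a region in the $xy$-plane and let $\partial D$ be its consistently
oriented boundary; then if $\vec{F} = (M(x,y), N(x,y), 0)$ is well behaved on $D$,
\[ \int_{\partial D} M\,dx + N\,dy = \iint_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy \]
We begin by finding the one-form corresponding to $\vec{F}$, namely $\omega_F = M\,dx + N\,dy$. Consider then that
\[ d\omega_F = d(M\,dx + N\,dy) = dM\wedge dx + dN\wedge dy = \frac{\partial M}{\partial y}\,dy\wedge dx + \frac{\partial N}{\partial x}\,dx\wedge dy \]
which simplifies to,
\[ d\omega_F = \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\wedge dy = \Phi_{\left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)\hat{k}} \]
Thus, using our discussion in the last section we recall
\[ \int_{\partial D}\omega_F = \int_{\partial D}\vec{F}\cdot d\vec{l} = \int_{\partial D} M\,dx + N\,dy \]
where we have reminded the reader that the notation in the rightmost expression is just another
way of denoting the line integral in question. Next observe,
\[ \int_D d\omega_F = \int_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)\hat{k}\cdot d\vec{A} \]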

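And clearly, since $d\vec{A} = \hat{k}\,dx\,dy$ we have
\[ \int_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)\hat{k}\cdot d\vec{A} = \int_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy \]
Therefore,
\[ \int_{\partial D} M\,dx + N\,dy = \iint_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy \iff \int_D d\omega_F = \int_{\partial D}\omega_F \]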
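To see both sides of Green's Theorem agree on a concrete case, here is a small Python sketch. The choices M = xy, N = x², the unit square, and the midpoint rule are assumptions made for illustration only.

```python
def M(x, y):
    return x * y


def N(x, y):
    return x * x


def curl_z(x, y):
    # dN/dx - dM/dy for the fields above, computed by hand: 2x - x.
    return 2 * x - x


def boundary_integral(n=1000):
    """Integral of M dx + N dy counterclockwise around the unit square."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += M(s, 0.0) * h             # bottom edge, dx = +h
        total += N(1.0, s) * h             # right edge,  dy = +h
        total += M(1.0 - s, 1.0) * (-h)    # top edge, traversed right to left
        total += N(0.0, 1.0 - s) * (-h)    # left edge, traversed top to bottom
    return total


def area_integral(n=200):
    """Integral of (dN/dx - dM/dy) dx dy over the unit square."""
    h = 1.0 / n
    return sum(curl_z((a + 0.5) * h, (b + 0.5) * h)
               for a in range(n) for b in range(n)) * h * h


print(boundary_integral(), area_integral())
```

Both values should come out close to 1/2, which is what a hand computation gives for either side.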

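Example 9.5.5. Gauss Theorem: Let us recall Gauss Theorem to begin, for suitably defined $\vec{F}$
and $V$,
\[ \int_{\partial V}\vec{F}\cdot d\vec{A} = \int_V \nabla\cdot\vec{F}\,d\tau \]
First we recall our earlier result that
\[ d(\Phi_F) = (\nabla\cdot\vec{F})\,dx\wedge dy\wedge dz \]
Now note that we may integrate the three-form over a volume,
\[ \int_V d(\Phi_F) = \int_V (\nabla\cdot\vec{F})\,dx\,dy\,dz \]
whereas,
\[ \int_{\partial V}\Phi_F = \int_{\partial V}\vec{F}\cdot d\vec{A} \]
so there it is,
\[ \int_V (\nabla\cdot\vec{F})\,d\tau = \int_{\partial V}\vec{F}\cdot d\vec{A} \iff \int_V d(\Phi_F) = \int_{\partial V}\Phi_F \]
I have left a little detail out here, I may assign it for homework.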
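Example 9.5.6. Stokes Theorem: Let us recall Stokes Theorem to begin, for suitably defined $\vec{F}$
and $S$,
\[ \int_S (\nabla\times\vec{F})\cdot d\vec{A} = \int_{\partial S}\vec{F}\cdot d\vec{l} \]
Next recall we have shown in the last chapter that,
\[ d(\omega_F) = \Phi_{\nabla\times\vec{F}} \]
Hence,
\[ \int_S d(\omega_F) = \int_S (\nabla\times\vec{F})\cdot d\vec{A} \]
whereas,
\[ \int_{\partial S}\omega_F = \int_{\partial S}\vec{F}\cdot d\vec{l} \]
which tells us that,
\[ \int_S (\nabla\times\vec{F})\cdot d\vec{A} = \int_{\partial S}\vec{F}\cdot d\vec{l} \iff \int_S d(\omega_F) = \int_{\partial S}\omega_F \]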

The Generalized Stokes Theorem is perhaps the most persuasive argument for mathematicians to
be aware of differential forms; it is clear they allow for deeper and more sweeping statements of
the calculus. The generality of differential forms is what drives modern physicists to work with
them, string theorists for example examine higher dimensional theories so they are forced to use
a language more general than that of the conventional vector calculus. See the end of the next
chapter for an example of such thinking.

9.6 poincare’s lemma and converse


This section is in large part inspired by M. Gockeler and T. Schucker’s Differential geometry, gauge
theories, and gravity pages 20-22. The converse calculation is modelled on the argument found in
H. Flanders Differential Forms with Applications to the Physical Sciences. The original work was
done around the dawn of the twentieth century and can be found in many texts besides the two I
mentioned here.

9.6.1 exact forms are closed


Proposition 9.6.1.

The exterior derivative of the exterior derivative is zero. 𝑑2 = 0

Proof: Let $\alpha$ be an arbitrary $p$-form then

\[ d\alpha = \frac{1}{p!}(\partial_m \alpha_{i_1 i_2\dots i_p})\,dx^m\wedge dx^{i_1}\wedge dx^{i_2}\wedge\cdots\wedge dx^{i_p} \tag{9.4} \]

then differentiate again,

\begin{align*}
d(d\alpha) &= d\left[\frac{1}{p!}(\partial_m \alpha_{i_1 i_2\dots i_p})\,dx^m\wedge dx^{i_1}\wedge dx^{i_2}\wedge\cdots\wedge dx^{i_p}\right] \\
&= \frac{1}{p!}(\partial_k\partial_m \alpha_{i_1 i_2\dots i_p})\,dx^k\wedge dx^m\wedge dx^{i_1}\wedge dx^{i_2}\wedge\cdots\wedge dx^{i_p} \tag{9.5} \\
&= 0
\end{align*}

since the partial derivatives commute whereas the wedge product anticommutes. The pair of
indices (k,m) is symmetric for the derivatives but antisymmetric for the wedge, and as we
know the sum of symmetric against antisymmetric vanishes (see equation ?? part $iv$ if you forgot).

Definition 9.6.2.

A differential form $\alpha$ is closed iff $d\alpha = 0$. A differential form $\beta$ is exact iff there exists $\gamma$
such that $\beta = d\gamma$.

Proposition 9.6.3.

All exact forms are closed. However, there exist closed forms which are not exact.

Proof: Exact implies closed is easy: let $\beta$ be exact such that $\beta = d\gamma$, then

\[ d\beta = d(d\gamma) = 0 \]

using the theorem $d^2 = 0$. To prove that there exists a closed form which is not exact it suffices
to give an example. A popular example (due to its physical significance to magnetic monopoles,
Dirac strings and the like) is the following differential form in $\mathbb{R}^2$
\[ \phi = \frac{1}{x^2+y^2}(x\,dy - y\,dx) \tag{9.6} \]
You may verify that $d\phi = 0$ in homework. Observe that if $\phi$ were exact then there would exist $f$
such that $\phi = df$, meaning that
\[ \frac{\partial f}{\partial x} = -\frac{y}{x^2+y^2}, \qquad \frac{\partial f}{\partial y} = \frac{x}{x^2+y^2} \]
which are solved by $f = \arctan(y/x) + c$ where $c$ is arbitrary. Observe that $f$ is ill-defined along
the $y$-axis $x = 0$ (this is the Dirac string if we put things in context), however the natural domain
of $\phi$ is $\mathbb{R}^2 - \{(0,0)\}$. Since $f$ is the polar angle, which cannot be defined continuously on all of
$\mathbb{R}^2 - \{(0,0)\}$, no such $f$ exists on the whole domain and $\phi$ is closed but not exact there.
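One way to see the failure of exactness concretely: by the Generalized Stokes Theorem an exact one-form integrates to zero around any closed loop (a loop has empty boundary), yet the integral of the form in (9.6) around a circle about the origin is not zero. The following Python sketch, an illustration added here with a hypothetical helper name `loop_integral`, approximates that integral with a midpoint rule.

```python
import math


def phi_coefficients(x, y):
    """Coefficients (a, b) of phi = a dx + b dy from equation (9.6)."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def loop_integral(radius=1.0, n=1000):
    """Integral of phi counterclockwise around the circle of the given radius."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t) * h, radius * math.cos(t) * h
        a, b = phi_coefficients(x, y)
        total += a * dx + b * dy
    return total


print(loop_integral(1.0), loop_integral(3.0), 2.0 * math.pi)
```

All three printed numbers should agree: the loop integral is 2π regardless of the radius, so the form cannot be df for any function f defined on the whole punctured plane.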

9.6.2 potentials for closed forms


Poincaré suggested the following partial converse: closed implies exact provided we place
a topological restriction on the domain of the form. In particular, if the domain of a closed form
is smoothly deformable to a point then that closed form is exact. We'll work out a proof of that
result for a subset of $\mathbb{R}^n$. Be patient, we have to build some toys before we play.

Suppose $U\subseteq\mathbb{R}^n$ and $I = [0,1]$; we denote a typical point in $I\times U$ as $(t,x)$ where $t\in I$ and $x\in U$.
Define maps,
\[ J_1 : U\to I\times U, \qquad J_0 : U\to I\times U \]
by $J_1(x) = (1,x)$ and $J_0(x) = (0,x)$ for each $x\in U$. Flanders encourages us to view $I\times U$ as a
cylinder where the map $J_1$ maps $U$ to the top and $J_0$ maps $U$ to the base. We can pull back
forms on the cylinder to $U$ on the top ($t = 1$) or to the base ($t = 0$). For instance, if we consider
$\omega = (x+t)\,dx + dt$ for the case $n = 1$ then

\[ J_0^*\omega = x\,dx, \qquad J_1^*\omega = (x+1)\,dx. \]

Define a smooth mapping $K$ of $(p+1)$-forms on $I\times U$ to $p$-forms on $U$ as follows:

\[ (1.)\ \ K\big(a(t,x)\,dx^I\big) = 0, \qquad (2.)\ \ K\big(a(t,x)\,dt\wedge dx^J\big) = \left(\int_0^1 a(t,x)\,dt\right)dx^J \]

for multi-indices $I$ of length $(p+1)$ and $J$ of length $p$. The cases (1.) and (2.) simply divide
the possible monomial (see footnote 6) inputs from $\Lambda^{p+1}(I\times U)$ into forms which have $dt$ and those which don't.
Then $K$ is defined for a general $(p+1)$-form on $I\times U$ by linearly extending the formulas above to
multinomials of the basic monomials.

It turns out that the following identity holds for 𝐾:

Lemma 9.6.4. the 𝐾-lemma.

If 𝜔 is a differential form on 𝐼 × 𝑈 then

𝐾(𝑑𝜔) + 𝑑(𝐾(𝜔)) = 𝐽1∗ 𝜔 − 𝐽0∗ 𝜔.

Proof: Since the equation is given for linear operations it suffices to check the formula for mono-
mials since we can extend the result linearly once those are affirmed. As in the definition of 𝐾
there are two basic categories of forms on 𝐼 × 𝑈 :

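Case 1: If $\omega = a(t,x)\,dx^I$ then clearly $K(\omega) = 0$ hence $d(K(\omega)) = 0$. Observe,

\[ d\omega = da\wedge dx^I = \sum_j \frac{\partial a}{\partial x^j}\,dx^j\wedge dx^I + \frac{\partial a}{\partial t}\,dt\wedge dx^I \]

Hence $K(d\omega)$ is calculated as follows:

\begin{align*}
K(d\omega) &= K\left(\sum_j \frac{\partial a}{\partial x^j}\,dx^j\wedge dx^I\right) + K\left(\frac{\partial a}{\partial t}\,dt\wedge dx^I\right) \\
&= \left(\int_0^1 \frac{\partial a}{\partial t}\,dt\right)dx^I \\
&= \big[a(1,x) - a(0,x)\big]\,dx^I \\
&= J_1^*\omega - J_0^*\omega \tag{9.7}
\end{align*}

where we used the FTC in the next to last step. The pull-backs in this case just amount to evaluation
at $t = 0$ or $t = 1$ as there is no $dt$-type term to squash in $\omega$. The identity follows.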

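Case 2: Suppose $\omega = a(t,x)\,dt\wedge dx^J$. Calculate,

\[ d\omega = \sum_k \frac{\partial a}{\partial x^k}\,dx^k\wedge dt\wedge dx^J + \frac{\partial a}{\partial t}\,\underbrace{dt\wedge dt}_{\text{zero!}}\wedge\, dx^J \]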

Footnote 6: 𝑑𝑥 ∧ 𝑑𝑦 is a monomial whereas 𝑑𝑥 + 𝑑𝑦 is a binomial in this context.

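Thus, using $dx^k\wedge dt = -dt\wedge dx^k$, we calculate:

\begin{align*}
K(d\omega) &= K\left(-\sum_k \frac{\partial a}{\partial x^k}\,dt\wedge dx^k\wedge dx^J\right) \\
&= -\sum_k \left(\int_0^1 \frac{\partial a}{\partial x^k}\,dt\right)dx^k\wedge dx^J
\end{align*}

at which point we cannot proceed further since $a$ is an arbitrary function which can include a
nontrivial time-dependence. We turn to the calculation of $d(K(\omega))$. Recall we defined
\[ K(\omega) = \left(\int_0^1 a(t,x)\,dt\right)dx^J. \]

We calculate the exterior derivative of $K(\omega)$:

\begin{align*}
d(K(\omega)) &= d\left(\int_0^1 a(t,x)\,dt\right)\wedge dx^J \\
&= \left(\frac{\partial}{\partial t}\underbrace{\left[\int_0^1 a(\tau,x)\,d\tau\right]}_{\text{constant in } t}dt + \sum_k \frac{\partial}{\partial x^k}\left[\int_0^1 a(t,x)\,dt\right]dx^k\right)\wedge dx^J \\
&= \sum_k \left(\int_0^1 \frac{\partial a}{\partial x^k}\,dt\right)dx^k\wedge dx^J. \tag{9.8}
\end{align*}

Therefore, $K(d\omega) + d(K(\omega)) = 0$ and clearly $J_0^*\omega = J_1^*\omega = 0$ in this case since the pull-backs squash
the $dt$ to zero. The lemma follows. □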
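As a sanity check of the lemma (a worked instance added here for illustration, not part of Flanders' argument), take the one-form $\omega = (x+t)\,dx + dt$ on $I\times U$ with $n = 1$ that was used above to illustrate the pull-backs. The second monomial has an empty $dx^J$, so $K(dt)$ is just the function $\int_0^1 1\,dt = 1$:

```latex
\begin{align*}
d\omega &= d(x+t)\wedge dx = (dx + dt)\wedge dx = dt\wedge dx, \\
K(d\omega) &= \left(\int_0^1 1\,dt\right)dx = dx, \\
K(\omega) &= K\big((x+t)\,dx\big) + K(dt) = 0 + \int_0^1 1\,dt = 1,
\qquad d(K(\omega)) = 0, \\
J_1^*\omega - J_0^*\omega &= (x+1)\,dx - x\,dx = dx .
\end{align*}
```

Both sides of the identity equal $dx$, as the lemma predicts.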

Definition 9.6.5.
A subset 𝑈 ⊆ ℝ𝑛 is deformable to a point 𝑃 if there exists a smooth mapping 𝐺 : 𝐼 ×𝑈 → 𝑈
such that 𝐺(1, 𝑥) = 𝑥 and 𝐺(0, 𝑥) = 𝑃 for all 𝑥 ∈ 𝑈 .
The map 𝐺 deforms 𝑈 smoothly into the point 𝑃 . Recall that 𝐽1 (𝑥) = (1, 𝑥) and 𝐽0 (𝑥) = (0, 𝑥)
hence the conditions on the deformation can be expressed as:

𝐺(𝐽1 (𝑥)) = 𝑥 𝐺(𝐽0 (𝑥)) = 𝑃

Denoting 𝐼𝑑 for the identity on 𝑈 and 𝑃 as the constant mapping with value 𝑃 on 𝑈 we have

𝐺 ∘ 𝐽1 = 𝐼𝑑 𝐺 ∘ 𝐽0 = 𝑃

Therefore, if 𝛾 is a (𝑝 + 1)-form on 𝑈 we calculate,

\[ (G\circ J_1)^*\gamma = \mathrm{Id}^*\gamma \ \Rightarrow\ J_1^*[G^*\gamma] = \gamma \]



whereas,
\[ (G\circ J_0)^*\gamma = P^*\gamma = 0 \ \Rightarrow\ J_0^*[G^*\gamma] = 0 \]

Apply the $K$-lemma to the form $\omega = G^*\gamma$ on $I\times U$ and we find:

\[ K(d(G^*\gamma)) + d(K(G^*\gamma)) = \gamma. \]

However, recall that we proved that pull-backs and exterior derivatives commute thus

\[ d(G^*\gamma) = G^*(d\gamma) \]

and we find an extremely interesting identity,

\[ \boxed{K(G^*(d\gamma)) + d(K(G^*\gamma)) = \gamma.} \]

Proposition 9.6.6.

If $U\subseteq\mathbb{R}^n$ is deformable to a point $P$ then a $p$-form $\gamma$ on $U$ is closed iff $\gamma$ is exact.

Proof: Suppose $\gamma$ is exact, then $\gamma = d\beta$ for some $(p-1)$-form $\beta$ on $U$, hence $d\gamma = d(d\beta) = 0$ by
Proposition 9.6.1, hence $\gamma$ is closed. Conversely, suppose $\gamma$ is closed. Apply the consequence
of the $K$-lemma derived above, $K(G^*(d\gamma)) + d(K(G^*\gamma)) = \gamma$, and note that $K(G^*(0)) = 0$ since we assume $d\gamma = 0$. We find,

\[ d(K(G^*\gamma)) = \gamma \]

Identify that $G^*\gamma$ is a $p$-form on $I\times U$ whereas $K(G^*\gamma)$ is a $(p-1)$-form on $U$ by the very
construction of $K$. It follows that $\gamma$ is exact since we have shown how it is obtained as the exterior
derivative of another differential form of one degree less. □

Where was deformability to a point 𝑃 used in the proof above? The key is the existence of the
mapping 𝐺. In other words, if you have a space which is not deformable to a point then no
deformation map 𝐺 is available and the construction via 𝐾 breaks down. Basically, if the space
has a hole which you get stuck on as you deform loops to a point then it is not deformable to a
point. Spaces in which every loop can be deformed to a point are often called simply connected. Careful definition of these terms is too difficult
for calculus; deformation of loops and higher dimensional objects is properly covered in algebraic
topology. In any event, the connection between deformation and exactness of closed forms allows
topologists to use differential forms to detect holes in spaces. In particular:

Definition 9.6.7. de Rham cohomology:



We define several real vector spaces of differential forms over some subset $U$ of $\mathbb{R}^n$,

𝑍 𝑝 (𝑈 ) ≡ {𝜙 ∈ Λ𝑝 𝑈 ∣ 𝜙 closed}

the space of closed p-forms on 𝑈 . Then,

𝐵 𝑝 (𝑈 ) ≡ {𝜙 ∈ Λ𝑝 𝑈 ∣ 𝜙 exact}

the space of exact p-forms where by convention $B^0(U) = \{0\}$. The de Rham cohomology
groups are defined by the quotient of closed/exact,

\[ H^p(U) \equiv Z^p(U)/B^p(U). \]

The dimension $\dim(H^p(U))$ is the $p$-th Betti number of $U$.


We observe that regions deformable to a point have Betti numbers zero for $p\ge 1$ since $Z^p(U) = B^p(U)$
implies that $H^p(U) = \{0\}$. Of course there is much more to say about cohomology, I just wanted
to give you a taste and alert you to the fact that differential forms can be used to reveal aspects of
topology. Not all algebraic topology uses differential forms though, there are several other calcula-
tional schemes based on triangulation of the space, or studying singular simplexes. One important
event in 20-th century mathematics was the discovery that all these schemes described the same
homology groups. Steenrod reduced the problem to a few central axioms and it was shown that all
the calculational schemes adhere to that same set of axioms.

One interesting aspect of the proof (copied from Flanders, see footnote 7) is that it is not a mere existence
proof. It actually lays out how to calculate the form which provides exactness. Let's call $\beta$ the
potential form of $\gamma$ if $\gamma = d\beta$. Notice this is totally reasonable language since in the case of
classical mechanics we consider conservative forces $\vec{F}$ which are derivable from a scalar potential
$V$ by $\vec{F} = -\nabla V$. Translated into differential forms we have $\omega_{\vec{F}} = -dV$. Let's explore how the
$K$-mapping and proof indicate the potential of a vector field ought to be calculated.

Suppose $U$ is deformable to a point and $F$ is a smooth conservative vector field on $U$. Perhaps you
recall that conservative vector fields are irrotational, hence $\nabla\times F = 0$. Recall that $d\omega_F = \Phi_{\nabla\times F} = \Phi_0 = 0$,
thus the one-form corresponding to a conservative vector field is a closed form. Apply the identity:
let $G : I\times U\to U\subseteq\mathbb{R}^3$ be the deformation of $U$ to a point $P$,
\[ d(K(G^*\omega_F)) = \omega_F \]
Hence, including the minus to make energy conservation natural,
\[ V = -K(G^*\omega_F) \]
For convenience, let's suppose the space considered is the unit-ball $B$ and let's use a deformation to
the origin. Explicitly, $G(t,r) = tr$ for all $r\in\mathbb{R}^3$ such that $||r||\le 1$. Note that clearly $G(0,r) = 0$
Footnote 7: I don’t know the complete history of this calculation at the present. It would be nice to find it since I doubt
Flanders is the originator.

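whereas $G(1,r) = r$ and $G$ has a nice formula so it's smooth (see footnote 8). We wish to calculate the pull-back
of $\omega_F = P\,dx + Q\,dy + R\,dz$ under $G$; from the definition of pull-back we have

\[ (G^*\omega_F)(X) = \omega_F(dG(X)) \]

for each smooth vector field $X$ on $I\times B$. Differential forms on $I\times B$ are written as linear combinations
of $dt, dx, dy, dz$ with smooth functions as coefficients. We can calculate the coefficients by
evaluation on the corresponding vector fields $\partial_t, \partial_x, \partial_y, \partial_z$. Observe, since $G(t,x,y,z) = (tx,ty,tz)$
we have
\[ dG(\partial_t) = \frac{\partial G^1}{\partial t}\frac{\partial}{\partial x} + \frac{\partial G^2}{\partial t}\frac{\partial}{\partial y} + \frac{\partial G^3}{\partial t}\frac{\partial}{\partial z} = x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z} \]
whereas,
\[ dG(\partial_x) = \frac{\partial G^1}{\partial x}\frac{\partial}{\partial x} + \frac{\partial G^2}{\partial x}\frac{\partial}{\partial y} + \frac{\partial G^3}{\partial x}\frac{\partial}{\partial z} = t\frac{\partial}{\partial x} \]
and similarly,
\[ dG(\partial_y) = t\frac{\partial}{\partial y}, \qquad dG(\partial_z) = t\frac{\partial}{\partial z} \]
Furthermore,
\begin{align*}
\omega_F(dG(\partial_t)) &= \omega_F(x\partial_x + y\partial_y + z\partial_z) = xP + yQ + zR \\
\omega_F(dG(\partial_x)) &= \omega_F(t\partial_x) = tP, \quad \omega_F(dG(\partial_y)) = \omega_F(t\partial_y) = tQ, \quad \omega_F(dG(\partial_z)) = \omega_F(t\partial_z) = tR
\end{align*}
Therefore,

\[ G^*\omega_F = (xP + yQ + zR)\,dt + tP\,dx + tQ\,dy + tR\,dz = (xP + yQ + zR)\,dt + t\,\omega_F \]

Now we can calculate $K(G^*\omega_F)$ and hence $V$. Note that only the coefficient of $dt$ gives a nontrivial
contribution so in retrospect we did a bit more calculation than necessary. That said, I'll just
keep it as a celebration of extreme youth for calculation. Also, I've been a bit careless in omitting
the point up to this point; let's include the point dependence since it will be critical to properly
understand the formula.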
\[ K(G^*\omega_F)(t,x,y,z) = K\Big(\big(xP(tx,ty,tz) + yQ(tx,ty,tz) + zR(tx,ty,tz)\big)\,dt\Big) \]

Therefore,
\[ V(x,y,z) = -K(G^*\omega_F) = -\int_0^1 \big(xP(tx,ty,tz) + yQ(tx,ty,tz) + zR(tx,ty,tz)\big)\,dt \]

Notice this is precisely the line-integral of $\vec{F} = \langle P, Q, R\rangle$ along the line $C$ with direction $\langle x,y,z\rangle$
from the origin to $(x,y,z)$. In particular, if $\vec{r}(t) = \langle tx,ty,tz\rangle$ then $\frac{d\vec{r}}{dt} = \langle x,y,z\rangle$ hence we
identify
\[ V(x,y,z) = -\int_0^1 \vec{F}(\vec{r}(t))\cdot\frac{d\vec{r}}{dt}\,dt = -\int_C \vec{F}\cdot d\vec{r} \]
Footnote 8: there is of course a deeper meaning to the word, but, for brevity I gloss over this.

Perhaps you recall this is precisely how we calculate the potential function for a conservative vector
field provided we take the origin as the zero for the potential.
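The formula for $V$ is easy to try out numerically. The sketch below is an illustration under stated assumptions: the helper name `potential` is hypothetical, the field $F = (2xy, x^2, 1)$ is chosen here because it equals $-\nabla V$ for $V = -(x^2y + z)$, and the integral over $t$ is approximated by a midpoint rule.

```python
def potential(F, point, n=1000):
    """Approximate V(x,y,z) = -int_0^1 (xP + yQ + zR)(tx, ty, tz) dt."""
    x, y, z = point
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        P, Q, R = F(t * x, t * y, t * z)
        total += (x * P + y * Q + z * R) * h
    return -total


def F(x, y, z):
    # Conservative field chosen for illustration; F = -grad V with V = -(x^2 y + z).
    return (2.0 * x * y, x * x, 1.0)


point = (0.5, 0.4, 0.3)          # a point inside the unit ball
V_numeric = potential(F, point)
V_exact = -(point[0] ** 2 * point[1] + point[2])
print(V_numeric, V_exact)
```

The two printed values should agree to several decimal places, and the potential vanishes at the origin as expected from the choice of base point.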

Actually, this calculation is quite interesting. Suppose we used a different deformation
$\tilde{G} : I\times U\to U$. For a fixed point $Q$ we travel from the origin to the point by the path $t\mapsto\tilde{G}(t,Q)$.
Of course this path need not be a line. The space considered might look like a snake where a line
cannot reach from the base point 𝑃 to the point 𝑄. But, the same potential is derived. Why?
Path independence of the vector field is one answer. The criterion ∇ × 𝐹 = 0 suffices for a simply
connected region. However, we see something deeper. The criterion of a closed form paired
with a simply connected (deformable) domain suffices to construct a potential for the given form.
This result reproduces the familiar case of conservative vector fields derived from scalar potentials
and much more. In Flanders he calculates the potential for a closed two-form. This ought to
be the mathematics underlying the construction of the so-called vector potential of magnetism.
In junior-level electromagnetism (see footnote 9) the magnetic field 𝐵 satisfies ∇ ⋅ 𝐵 = 0 and thus the two-form
Φ𝐵 has exterior derivative 𝑑Φ𝐵 = ∇ ⋅ 𝐵𝑑𝑥 ∧ 𝑑𝑦 ∧ 𝑑𝑧 = 0. The magnetic field corresponds to a
closed form. Poincare’s lemma shows that there exists a one-form 𝜔𝐴 such that 𝑑𝜔𝐴 = Φ𝐵 . But
this means Φ∇×𝐴 = Φ𝐵 hence in the language of vector fields we expect the vector potential 𝐴
generates the magnetic field 𝐵 through the curl 𝐵 = ∇ × 𝐴. Indeed, this is precisely what a
typical junior level physics student learns in the magnetostatic case. Appreciate that it goes deeper
still, the Poincare lemma holds for 𝑝-forms which correspond to objects which don’t match up with
the quaint vector fields of 19-th century physics. We can be confident to find potential for 3-form
fluxes in a 10-dimensional space, or wherever our imagination takes us. I explain at the end of this
chapter how to translate electromagnetics into the language of differential forms, it may well be
that in the future we think about forms the way we currently think about vectors. This is one of
the reasons I like Flanders' text, he really sticks with the language of differential forms throughout.
In contrast to these notes, he just does what is most interesting. I think undergraduates need to
see more detail and not just the most clever calculations, but, I can hardly blame Flanders! His
text makes no claim to be an undergraduate work.

Finally, I should at least mention that though we can derive a potential 𝛽 for a given closed form
𝛼 on a simply connected domain it need not be unique. In fact, it will not be unique unless we add
further criteria for the potential. This ambiguity is called gauge freedom in physics. Mathematically
it's really simple given form language. If $\alpha = d\beta$ where $\beta$ is a $(p-1)$-form then we can take any
smooth $(p-2)$-form $\lambda$ and calculate that

\[ d(\beta + d\lambda) = d\beta + d^2\lambda = d\beta = \alpha \]

Therefore, if 𝛽 is a potential-form for 𝛼 then 𝛽 + 𝑑𝜆 is also a potential-form for 𝛼.

Footnote 9: just discussing magnetostatic case here to keep it simple

9.7 classical differential geometry in forms


XXX- already worked it out, copy from my notepad.

9.8 E & M in differential form


Warning: I will use Einstein’s implicit summation convention throughout this section.
I have made a point of abstaining from Einstein’s convention in these notes up to this point.
However, I just can’t bear the summations in this section. They’re just too ugly.

9.8.1 differential forms in Minkowski space


The logic here follows fairly close to the last section, however the wrinkle is that the metric here
demands more attention. We must take care to raise the indices on the forms when we Hodge dual
them. First we list the basis differential forms; we have to add time to the mix (again $c = 1$ so
$x^0 = ct = t$, if you are worried about it):

Name         Degree   Typical Element                              "Basis" for Λ^p(ℝ^4)

function     p = 0    f                                            1
one-form     p = 1    α = α_μ dx^μ                                 dt, dx, dy, dz
two-form     p = 2    β = (1/2) β_μν dx^μ ∧ dx^ν                   dy ∧ dz, dz ∧ dx, dx ∧ dy,
                                                                   dt ∧ dx, dt ∧ dy, dt ∧ dz
three-form   p = 3    γ = (1/3!) γ_μνα dx^μ ∧ dx^ν ∧ dx^α          dx ∧ dy ∧ dz, dt ∧ dy ∧ dz,
                                                                   dt ∧ dx ∧ dz, dt ∧ dx ∧ dy
four-form    p = 4    g dt ∧ dx ∧ dy ∧ dz                          dt ∧ dx ∧ dy ∧ dz

Greek indices are defined to range over 0, 1, 2, 3. Here the top form is degree four since in four
dimensions we can have four differentials without a repeat. Wedge products work the same as they
have before, just now we have 𝑑𝑡 to play with. Hodge duality may offer some surprises though.

Definition 9.8.1. The antisymmetric symbol in flat $\mathbb{R}^4$ is denoted $\epsilon_{\mu\nu\alpha\beta}$ and it is defined by the
value
\[ \epsilon_{0123} = 1 \]
plus the demand that it be completely antisymmetric.

We must not assume that this symbol is invariant under a cyclic exchange of indices. Consider,

\begin{align*}
\epsilon_{0123} &= -\epsilon_{1023} && \text{flipped (01)} \\
&= +\epsilon_{1203} && \text{flipped (02)} \tag{9.9} \\
&= -\epsilon_{1230} && \text{flipped (03).}
\end{align*}

Example 9.8.2. We now compute the Hodge dual of $\gamma = dx$ with respect to the Minkowski metric
$\eta_{\mu\nu}$. First notice that $dx$ has components $\gamma_\mu = \delta^1_\mu$ as is readily verified by the equation $dx = \delta^1_\mu\,dx^\mu$.
We raise the index using $\eta$, as follows

\[ \gamma^\mu = \eta^{\mu\nu}\gamma_\nu = \eta^{\mu\nu}\delta^1_\nu = \eta^{1\mu} = \delta^{1\mu}. \]

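Beginning with the definition of the Hodge dual we calculate

\begin{align*}
{}^*(dx) &= \tfrac{1}{(4-1)!}\,\delta^{1\mu}\epsilon_{\mu\nu\alpha\beta}\,dx^\nu\wedge dx^\alpha\wedge dx^\beta \\
&= (1/6)\,\epsilon_{1\nu\alpha\beta}\,dx^\nu\wedge dx^\alpha\wedge dx^\beta \\
&= (1/6)\big[\epsilon_{1023}\,dt\wedge dy\wedge dz + \epsilon_{1230}\,dy\wedge dz\wedge dt + \epsilon_{1302}\,dz\wedge dt\wedge dy \\
&\qquad\quad + \epsilon_{1320}\,dz\wedge dy\wedge dt + \epsilon_{1203}\,dy\wedge dt\wedge dz + \epsilon_{1032}\,dt\wedge dz\wedge dy\big] \tag{9.10} \\
&= (1/6)\big[-dt\wedge dy\wedge dz - dy\wedge dz\wedge dt - dz\wedge dt\wedge dy \\
&\qquad\quad + dz\wedge dy\wedge dt + dy\wedge dt\wedge dz + dt\wedge dz\wedge dy\big] \\
&= -dy\wedge dz\wedge dt.
\end{align*}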

The difference between the three and four dimensional Hodge dual arises from two sources, for
one we are using the Minkowski metric so indices up or down makes a difference, and second the
antisymmetric symbol has more possibilities than before because the Greek indices take four values.

Example 9.8.3. We find the Hodge dual of $\gamma = dt$ with respect to the Minkowski metric $\eta_{\mu\nu}$.
Notice that $dt$ has components $\gamma_\mu = \delta^0_\mu$ as is easily seen using the equation $dt = \delta^0_\mu\,dx^\mu$. Raising the
index using $\eta$ as usual, we have

\[ \gamma^\mu = \eta^{\mu\nu}\gamma_\nu = \eta^{\mu\nu}\delta^0_\nu = \eta^{0\mu} = -\delta^{0\mu} \]

where the minus sign is due to the Minkowski metric. Starting with the definition of Hodge duality
we calculate
\begin{align*}
{}^*(dt) &= -(1/6)\,\delta^{0\mu}\epsilon_{\mu\nu\alpha\beta}\,dx^\nu\wedge dx^\alpha\wedge dx^\beta \\
&= -(1/6)\,\epsilon_{0\nu\alpha\beta}\,dx^\nu\wedge dx^\alpha\wedge dx^\beta \\
&= -(1/6)\,\epsilon_{0ijk}\,dx^i\wedge dx^j\wedge dx^k \tag{9.11} \\
&= -(1/6)\,\epsilon_{ijk}\,dx^i\wedge dx^j\wedge dx^k \\
&= -dx\wedge dy\wedge dz.
\end{align*}
For the case here we are able to use some of our old three dimensional ideas. The Hodge dual of $dt$
cannot have a $dt$ in it, which means our answer will only have $dx, dy, dz$ in it, and that is why we
were able to shortcut some of the work (compared to the previous example).

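Example 9.8.4. Finally, we find the Hodge dual of $\gamma = dt\wedge dx$ with respect to the Minkowski metric
$\eta_{\mu\nu}$. Recall that ${}^*(dt\wedge dx) = \frac{1}{(4-2)!}\,\epsilon_{01\mu\nu}\gamma^{01}(dx^\mu\wedge dx^\nu)$ and that $\gamma^{01} = \eta^{0\lambda}\eta^{1\rho}\gamma_{\lambda\rho} = (-1)(1)\gamma_{01} = -1$.
Thus
\begin{align*}
{}^*(dt\wedge dx) &= -(1/2)\,\epsilon_{01\mu\nu}\,dx^\mu\wedge dx^\nu \\
&= -(1/2)\big[\epsilon_{0123}\,dy\wedge dz + \epsilon_{0132}\,dz\wedge dy\big] \tag{9.12} \\
&= -dy\wedge dz.
\end{align*}
Notice also that since $dt\wedge dx = -dx\wedge dt$ we find ${}^*(dx\wedge dt) = dy\wedge dz$.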

The other Hodge duals of the basic two-forms follow from similar calculations. Here is a table of
all the basic Hodge dualities in Minkowski space:

∗1 = dt ∧ dx ∧ dy ∧ dz              ∗(dt ∧ dx ∧ dy ∧ dz) = −1
∗(dx ∧ dy ∧ dz) = −dt               ∗dt = −dx ∧ dy ∧ dz
∗(dt ∧ dy ∧ dz) = −dx               ∗dx = −dy ∧ dz ∧ dt
∗(dt ∧ dz ∧ dx) = −dy               ∗dy = −dz ∧ dx ∧ dt
∗(dt ∧ dx ∧ dy) = −dz               ∗dz = −dx ∧ dy ∧ dt
∗(dz ∧ dt) = dx ∧ dy                ∗(dx ∧ dy) = −dz ∧ dt
∗(dx ∧ dt) = dy ∧ dz                ∗(dy ∧ dz) = −dx ∧ dt
∗(dy ∧ dt) = dz ∧ dx                ∗(dz ∧ dx) = −dy ∧ dt

In the table the terms are grouped as they are to
emphasize the isomorphisms between the one-dimensional $\Lambda^0(M)$ and $\Lambda^4(M)$, the four-dimensional
$\Lambda^1(M)$ and $\Lambda^3(M)$, and the six-dimensional $\Lambda^2(M)$ and itself. Notice that the dimension of $\Lambda(M)$ is
16, which just happens to be $2^4$.
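The whole table can be regenerated mechanically. The following Python sketch is an illustration, not the author's method: it assumes the rule suggested by Examples 9.8.2 to 9.8.4, namely that for a basis form with increasing indices $I$ the dual is (product of $\eta^{ii}$ for $i \in I$) times $\epsilon_{IJ}$ times the basis form on the complementary increasing indices $J$. The function names `parity`, `hodge_basis` and `show` are hypothetical.

```python
from itertools import combinations

ETA = (-1, 1, 1, 1)                 # diagonal of the Minkowski metric
NAMES = ("dt", "dx", "dy", "dz")


def parity(perm):
    """Sign of a permutation of distinct integers, found by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def hodge_basis(I):
    """Return (sign, J) such that *(dx^I) = sign * dx^J, with I and J increasing."""
    J = tuple(k for k in range(4) if k not in I)
    sign = parity(I + J)            # epsilon_{I J}
    for i in I:
        sign *= ETA[i]              # raising each index costs a factor eta^{ii}
    return sign, J


def show(I):
    return " ^ ".join(NAMES[i] for i in I) or "1"


for p in range(5):
    for I in combinations(range(4), p):
        sign, J = hodge_basis(I)
        print(f"*({show(I)}) = {'+' if sign > 0 else '-'}{show(J)}")
```

The output lists each dual with the complementary differentials in the order dt, dx, dy, dz, so for instance the dual of dx appears as −dt ∧ dy ∧ dz, which is the same form as −dy ∧ dz ∧ dt in the table.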

Now that we’ve established how the Hodge dual works on the differentials we can easily take the
Hodge dual of arbitrary differential forms on Minkowski space. We begin with the example of the
4-current $\mathcal{J}$.
Example 9.8.5. Four Current: often in relativistic physics we would even just call the four
current simply the current, however it actually includes the charge density $\rho$ and current density
$\vec{J}$. Consequently, we define,

\[ (\mathcal{J}^\mu) \equiv (\rho, \vec{J}), \]
moreover if we lower the index we obtain,

\[ (\mathcal{J}_\mu) = (-\rho, \vec{J}) \]

which are the components of the current one-form,

\[ \mathcal{J} = \mathcal{J}_\mu\,dx^\mu = -\rho\,dt + J_x\,dx + J_y\,dy + J_z\,dz \]

This equation could be taken as the definition of the current as it is equivalent to the vector definition.
Now we can rewrite the last equation using the vectors ↦ forms mapping as,

\[ \mathcal{J} = -\rho\,dt + \omega_{\vec{J}}. \]

Consider the Hodge dual of $\mathcal{J}$,

\begin{align*}
{}^*\mathcal{J} &= {}^*(-\rho\,dt + J_x\,dx + J_y\,dy + J_z\,dz) \\
&= -\rho\,{}^*dt + J_x\,{}^*dx + J_y\,{}^*dy + J_z\,{}^*dz \tag{9.13} \\
&= \rho\,dx\wedge dy\wedge dz - J_x\,dy\wedge dz\wedge dt - J_y\,dz\wedge dx\wedge dt - J_z\,dx\wedge dy\wedge dt \\
&= \rho\,dx\wedge dy\wedge dz - \Phi_{\vec{J}}\wedge dt.
\end{align*}

We will find it useful to appeal to this calculation in a later section.



Example 9.8.6. Four Potential: often in relativistic physics we would call the four potential
simply the potential, however it actually includes the scalar potential $V$ and the vector potential $\vec{A}$
(discussed at the end of chapter 3). To be precise we define,

\[ (A^\mu) \equiv (V, \vec{A}) \]

we can lower the index to obtain,

\[ (A_\mu) = (-V, \vec{A}) \]
which are the components of the potential one-form,

\[ A = A_\mu\,dx^\mu = -V\,dt + A_x\,dx + A_y\,dy + A_z\,dz \]

Sometimes this equation is taken as the definition of the four potential. We can rewrite the four
potential vector field using the vectors ↦ forms mapping as,

\[ A = -V\,dt + \omega_{\vec{A}}. \]

The Hodge dual of $A$ is

\[ {}^*A = V\,dx\wedge dy\wedge dz - \Phi_{\vec{A}}\wedge dt. \tag{9.14} \]
Several steps were omitted because they are identical to the calculation of the dual of the 4-current
above.

Definition 9.8.7. Faraday tensor.

Given an electric field $\vec{E} = (E_1, E_2, E_3)$ and a magnetic field $\vec{B} = (B_1, B_2, B_3)$ we define a
2-form $F$ by
\[ F = \omega_E\wedge dt + \Phi_B. \]
This 2-form is often called the electromagnetic field tensor or the Faraday tensor. If
we write it in tensor components as $F = \frac{1}{2}F_{\mu\nu}\,dx^\mu\wedge dx^\nu$ and then consider its matrix $(F_{\mu\nu})$
of components then it is easy to see that
\[ (F_{\mu\nu}) = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix} \tag{9.15} \]

Convention: Notice that when we write the matrix version of the tensor components we take the
first index to be the row index and the second index to be the column index; that means $F_{01} = -E_1$
whereas $F_{10} = E_1$.
Example 9.8.8. In this example we demonstrate various conventions which show how one can
transform the field tensor to other type tensors. Define a type (1, 1) tensor by raising the first index
by the inverse metric $\eta^{\alpha\mu}$ as follows,
\[ F^\alpha{}_\nu = \eta^{\alpha\mu}F_{\mu\nu} \]

The zeroth row,
\[ (F^0{}_\nu) = (\eta^{0\mu}F_{\mu\nu}) = (0, E_1, E_2, E_3) \]
Then row one is unchanged since $\eta^{1\mu} = \delta^{1\mu}$,
\[ (F^1{}_\nu) = (\eta^{1\mu}F_{\mu\nu}) = (E_1, 0, B_3, -B_2) \]
and likewise for rows two and three. In summary the (1,1) tensor $F' = F^\alpha{}_\nu\,(\frac{\partial}{\partial x^\alpha}\otimes dx^\nu)$ has the
components below
\[ (F^\alpha{}_\nu) = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix}. \tag{9.16} \]
At this point we raise the other index to create a (2, 0) tensor,

\[ F^{\alpha\beta} = \eta^{\alpha\mu}\eta^{\beta\nu}F_{\mu\nu} \tag{9.17} \]

and we see that it takes one copy of the inverse metric to raise each index and $F^{\alpha\beta} = \eta^{\beta\nu}F^\alpha{}_\nu$ so
we can pick up where we left off in the (1, 1) case. We could proceed case by case like we did with
the (1, 1) case but it is better to use matrix multiplication. Notice that $\eta^{\beta\nu}F^\alpha{}_\nu = F^\alpha{}_\nu\,\eta^{\nu\beta}$ is just
the $(\alpha,\beta)$ component of the following matrix product,
\[ (F^{\alpha\beta}) = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & B_3 & -B_2 \\ -E_2 & -B_3 & 0 & B_1 \\ -E_3 & B_2 & -B_1 & 0 \end{pmatrix}. \tag{9.18} \]
So we find a (2, 0) tensor $F'' = F^{\alpha\beta}\,(\frac{\partial}{\partial x^\alpha}\otimes\frac{\partial}{\partial x^\beta})$. Other books might even use the same symbol $F$ for
$F'$ and $F''$; it is in fact typically clear from the context which version of $F$ one is thinking about.
Pragmatically physicists just write the components so it's not even an issue for them.
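The index gymnastics above amount to two matrix multiplications, which can be checked with a few lines of Python. This is a sketch with made-up sample values E = (1, 2, 3) and B = (4, 5, 6); the helper names `faraday_lower` and `matmul` are hypothetical.

```python
# Inverse Minkowski metric; numerically it has the same entries as the metric.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def faraday_lower(E, B):
    """Matrix (F_mu_nu) of equation (9.15); first index is the row."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0, -E1, -E2, -E3],
            [E1, 0, B3, -B2],
            [E2, -B3, 0, B1],
            [E3, B2, -B1, 0]]


def matmul(A, C):
    return [[sum(A[i][k] * C[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


E, B = (1, 2, 3), (4, 5, 6)          # sample values chosen for illustration
F_low = faraday_lower(E, B)
F_mixed = matmul(ETA, F_low)         # F^alpha_nu   = eta^{alpha mu} F_{mu nu}
F_up = matmul(F_mixed, ETA)          # F^{alpha beta} = F^alpha_nu eta^{nu beta}

for row in F_up:
    print(row)
```

Raising the first index flips the sign of row zero, and raising the second flips the sign of column zero, which is the pattern visible in (9.16) and (9.18). The fully raised matrix is again antisymmetric.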
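Example 9.8.9. Field tensor's dual: We now calculate the Hodge dual of the field tensor,
\[
\begin{aligned}
{}^*F &= {}^*(\omega_{\vec E} \wedge dt + \Phi_{\vec B}) \\
&= E_x\,{}^*(dx \wedge dt) + E_y\,{}^*(dy \wedge dt) + E_z\,{}^*(dz \wedge dt) \\
&\quad + B_x\,{}^*(dy \wedge dz) + B_y\,{}^*(dz \wedge dx) + B_z\,{}^*(dx \wedge dy) \\
&= E_x\,dy \wedge dz + E_y\,dz \wedge dx + E_z\,dx \wedge dy \\
&\quad - B_x\,dx \wedge dt - B_y\,dy \wedge dt - B_z\,dz \wedge dt \\
&= \Phi_{\vec E} - \omega_{\vec B} \wedge dt.
\end{aligned}
\]
We can also write the components of ${}^*F$ in matrix form:
\[
({}^*F_{\mu\nu}) =
\begin{pmatrix}
0 & B_1 & B_2 & B_3 \\
-B_1 & 0 & E_3 & -E_2 \\
-B_2 & -E_3 & 0 & E_1 \\
-B_3 & E_2 & -E_1 & 0
\end{pmatrix}.
\tag{9.19}
\]
Notice that the net effect of Hodge duality on the field tensor was to make the exchanges
$\vec E \mapsto -\vec B$ and $\vec B \mapsto \vec E$.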
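
As a cross-check of the matrix (9.19), here is a short Python sketch. It assumes the component convention $({}^*F)_{\mu\nu} = \tfrac{1}{2}\epsilon_{\mu\nu\alpha\beta}F^{\alpha\beta}$ with $\epsilon_{0123} = +1$; that formula is our assumption for this sketch (the notes define the Hodge dual elsewhere), adopted here because it reproduces (9.19). The function names and the sample field values are hypothetical.

```python
from itertools import permutations


def permutation_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def field_lower(E, B):
    """Components F_{mu nu}; row = first index, so F_01 = -E1 and F_10 = E1."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [
        [0, -E1, -E2, -E3],
        [E1, 0, B3, -B2],
        [E2, -B3, 0, B1],
        [E3, B2, -B1, 0],
    ]


def raise_both(F):
    """Raise both indices with diag(-1, 1, 1, 1): flip zeroth row and column."""
    sign = [-1, 1, 1, 1]
    return [[sign[m] * sign[n] * F[m][n] for n in range(4)] for m in range(4)]


def hodge_dual(F_up):
    """Assumed convention: (*F)_{mu nu} = (1/2) eps_{mu nu a b} F^{a b}."""
    dual = [[0] * 4 for _ in range(4)]
    for p in permutations(range(4)):
        m, n, a, b = p
        dual[m][n] += permutation_sign(p) * F_up[a][b]
    # The sum runs over both (a, b) and (b, a), so every entry is even;
    # halving applies the factor 1/2 exactly for integer inputs.
    return [[entry // 2 for entry in row] for row in dual]


E = (1, 2, 3)    # arbitrary sample values
B = (5, 7, 11)   # arbitrary sample values

dual = hodge_dual(raise_both(field_lower(E, B)))
# The exchange E -> -B, B -> E applied to the layout of F_{mu nu}.
swapped = field_lower(tuple(-b for b in B), E)
print(dual == swapped)
```

With integer inputs the comparison is exact, so the exchange rule can be confirmed entry by entry.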


9.8.2 exterior derivatives of charge forms, field tensors, and their duals

In the last chapter we found that the single operation of exterior differentiation reproduces the
gradient, curl and divergence of vector calculus provided we make the appropriate identifications
under the "work" and "flux" form mappings. We now move on to some four-dimensional examples.

Example 9.8.10. Charge conservation: Consider the 4-current we introduced in example 9.8.5.
Take the exterior derivative of the dual of the current to get,

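\[
\begin{aligned}
d({}^*\mathcal{J}) &= d(\rho\,dx \wedge dy \wedge dz - \Phi_{\vec J} \wedge dt) \\
&= d\rho \wedge dx \wedge dy \wedge dz - d[(J_x\,dy \wedge dz + J_y\,dz \wedge dx + J_z\,dx \wedge dy) \wedge dt] \\
&= (\partial_t \rho)\,dt \wedge dx \wedge dy \wedge dz \\
&\quad - \partial_x J_x\,dx \wedge dy \wedge dz \wedge dt - \partial_y J_y\,dy \wedge dz \wedge dx \wedge dt - \partial_z J_z\,dz \wedge dx \wedge dy \wedge dt \\
&= (\partial_t \rho + \nabla \cdot \vec J)\,dt \wedge dx \wedge dy \wedge dz.
\end{aligned}
\]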

We work through the same calculation using index techniques,

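\[
\begin{aligned}
d({}^*\mathcal{J}) &= d(\rho\,dx \wedge dy \wedge dz - \Phi_{\vec J} \wedge dt) \\
&= d(\rho) \wedge dx \wedge dy \wedge dz - d[\tfrac{1}{2}\epsilon_{ijk} J_i\,dx^j \wedge dx^k \wedge dt] \\
&= (\partial_t \rho)\,dt \wedge dx \wedge dy \wedge dz - \tfrac{1}{2}\epsilon_{ijk}\,\partial_\mu J_i\,dx^\mu \wedge dx^j \wedge dx^k \wedge dt \\
&= (\partial_t \rho)\,dt \wedge dx \wedge dy \wedge dz - \tfrac{1}{2}\epsilon_{ijk}\,\partial_m J_i\,dx^m \wedge dx^j \wedge dx^k \wedge dt \\
&= (\partial_t \rho)\,dt \wedge dx \wedge dy \wedge dz - \tfrac{1}{2}\epsilon_{ijk}\epsilon_{mjk}\,\partial_m J_i\,dx \wedge dy \wedge dz \wedge dt \\
&= (\partial_t \rho)\,dt \wedge dx \wedge dy \wedge dz - \tfrac{1}{2}\,2\delta_{im}\,\partial_m J_i\,dx \wedge dy \wedge dz \wedge dt \\
&= (\partial_t \rho + \nabla \cdot \vec J)\,dt \wedge dx \wedge dy \wedge dz.
\end{aligned}
\]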

Observe that we can now phrase charge conservation by the following equation

𝑑(∗ 𝒥 ) = 0 ⇐⇒ ∂𝑡 𝜌 + ∇ ⋅ 𝐽⃗ = 0.

In the classical scheme of things this was a derived consequence of the equations of electromagnetism;
however, it is possible to build the theory regarding this equation as fundamental. Rindler describes
that formal approach in a late chapter of "Introduction to Special Relativity".
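
The index calculation above rests on the contraction $\epsilon_{ijk}\epsilon_{mjk} = 2\delta_{im}$. A brute-force check in Python is sketched below; the closed formula used for the Levi-Civita symbol and the sample matrix `dJ` (standing in for the derivatives $\partial_m J_i$ at a point) are our own illustrative choices.

```python
def epsilon3(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


R3 = range(3)

# contraction[i][m] = sum over j, k of eps_{ijk} eps_{mjk}
contraction = [[sum(epsilon3(i, j, k) * epsilon3(m, j, k) for j in R3 for k in R3)
                for m in R3] for i in R3]
print(contraction)

# Sample values for d_m J_i at a point (hypothetical numbers): dJ[m][i].
dJ = [[4, -1, 2], [0, 3, 5], [7, -2, 6]]

# (1/2) eps_{ijk} eps_{mjk} d_m J_i should collapse to the divergence d_i J_i.
half_contracted = sum(epsilon3(i, j, k) * epsilon3(m, j, k) * dJ[m][i]
                      for i in R3 for j in R3 for k in R3 for m in R3) // 2
divergence = sum(dJ[i][i] for i in R3)
print(half_contracted, divergence)
```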

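Proposition 9.8.11.

If $(A_\mu) = (-V, \vec A)$ is the vector potential (which gives the magnetic field) and
$A = -V\,dt + \omega_{\vec A}$, then $dA = \omega_{\vec E} \wedge dt + \Phi_{\vec B} = F$ where $F$ is the
electromagnetic field tensor. Moreover,
\[ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \]

Proof: The proof uses the definitions $\vec E = -\nabla V - \partial_t \vec A$ and $\vec B = \nabla \times \vec A$
and some vector identities:

\[
\begin{aligned}
dA &= d(-V\,dt + \omega_{\vec A}) \\
&= -dV \wedge dt + d(\omega_{\vec A}) \\
&= -dV \wedge dt + (\partial_t A_i)\,dt \wedge dx^i + (\partial_j A_i)\,dx^j \wedge dx^i \\
&= \omega_{(-\nabla V)} \wedge dt - \omega_{\partial_t \vec A} \wedge dt + \Phi_{\nabla \times \vec A} \\
&= (\omega_{(-\nabla V)} - \omega_{\partial_t \vec A}) \wedge dt + \Phi_{\nabla \times \vec A} \\
&= \omega_{(-\nabla V - \partial_t \vec A)} \wedge dt + \Phi_{\nabla \times \vec A} \\
&= \omega_{\vec E} \wedge dt + \Phi_{\vec B} \\
&= F = \tfrac{1}{2} F_{\mu\nu}\,dx^\mu \wedge dx^\nu.
\end{aligned}
\]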
Moreover we also have:

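\[
\begin{aligned}
dA &= d(A_\nu) \wedge dx^\nu \\
&= \partial_\mu A_\nu\,dx^\mu \wedge dx^\nu \\
&= \tfrac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu)\,dx^\mu \wedge dx^\nu + \tfrac{1}{2}(\partial_\mu A_\nu + \partial_\nu A_\mu)\,dx^\mu \wedge dx^\nu \\
&= \tfrac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu)\,dx^\mu \wedge dx^\nu.
\end{aligned}
\]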

Comparing the two identities we see that 𝐹𝜇𝜈 = ∂𝜇 𝐴𝜈 − ∂𝜈 𝐴𝜇 and the proposition follows.

Example 9.8.12. Exterior derivative of the field tensor: We have just seen that the field
tensor is the exterior derivative of the potential one-form. We now compute the exterior derivative
of the field tensor expecting to find Maxwell's equations since the derivatives of the fields are governed
by Maxwell's equations,

\[
\begin{aligned}
dF &= d(E_i\,dx^i \wedge dt) + d(\Phi_{\vec B}) \\
&= \partial_m E_i\,(dx^m \wedge dx^i \wedge dt) + (\nabla \cdot \vec B)\,dx \wedge dy \wedge dz + \tfrac{1}{2}\epsilon_{ijk}(\partial_t B_i)(dt \wedge dx^j \wedge dx^k).
\end{aligned}
\tag{9.20}
\]

We pause here to explain our logic. In the above we dropped the $\partial_t E_i\,dt \wedge \cdots$ term because it was
wedged with another $dt$ in the term so it vanished. Also we broke up the exterior derivative on the
flux form of $\vec B$ into the space and then time derivative terms and used our work in example 9.2.6.
Continuing the calculation,

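\[
\begin{aligned}
dF &= [\partial_j E_k + \tfrac{1}{2}\epsilon_{ijk}(\partial_t B_i)]\,dx^j \wedge dx^k \wedge dt + (\nabla \cdot \vec B)\,dx \wedge dy \wedge dz \\
&= [\partial_x E_y - \partial_y E_x + \epsilon_{i12}(\partial_t B_i)]\,dx \wedge dy \wedge dt \\
&\quad + [\partial_z E_x - \partial_x E_z + \epsilon_{i31}(\partial_t B_i)]\,dz \wedge dx \wedge dt \\
&\quad + [\partial_y E_z - \partial_z E_y + \epsilon_{i23}(\partial_t B_i)]\,dy \wedge dz \wedge dt \\
&\quad + (\nabla \cdot \vec B)\,dx \wedge dy \wedge dz \\
&= (\nabla \times \vec E + \partial_t \vec B)_i\,\Phi_{e_i} \wedge dt + (\nabla \cdot \vec B)\,dx \wedge dy \wedge dz \\
&= \Phi_{\nabla \times \vec E + \partial_t \vec B} \wedge dt + (\nabla \cdot \vec B)\,dx \wedge dy \wedge dz
\end{aligned}
\tag{9.21}
\]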

where we used the fact that Φ is an isomorphism of vector spaces (at a point) and Φ𝑒1 = 𝑑𝑦 ∧ 𝑑𝑧,
Φ𝑒2 = 𝑑𝑧 ∧ 𝑑𝑥, and Φ𝑒3 = 𝑑𝑥 ∧ 𝑑𝑦. Behold, we can state two of Maxwell’s equations as

\[ dF = 0 \iff \nabla \times \vec E + \partial_t \vec B = 0, \qquad \nabla \cdot \vec B = 0 \tag{9.22} \]

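Example 9.8.13. We now compute the exterior derivative of the dual to the field tensor:

\[
\begin{aligned}
d\,{}^*F &= d(-B_i\,dx^i \wedge dt) + d(\Phi_{\vec E}) \\
&= \Phi_{-\nabla \times \vec B + \partial_t \vec E} \wedge dt + (\nabla \cdot \vec E)\,dx \wedge dy \wedge dz
\end{aligned}
\tag{9.23}
\]

This follows directly from the last example by replacing $\vec E \mapsto -\vec B$ and $\vec B \mapsto \vec E$.
We obtain the two inhomogeneous Maxwell's equations by setting $d\,{}^*F$ equal to the Hodge dual of the
4-current,

\[ d\,{}^*F = \mu_o\,{}^*\mathcal{J} \iff -\nabla \times \vec B + \partial_t \vec E = -\mu_o \vec J, \qquad \nabla \cdot \vec E = \mu_o \rho \tag{9.24} \]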

Here we have used example 9.8.5 to find the RHS of the Maxwell equations.

We now know how to write Maxwell’s equations via differential forms. The stage is set to prove that
Maxwell’s equations are Lorentz covariant, that is they have the same form in all inertial frames.

9.8.3 coderivatives and comparing to Griffiths' relativistic E & M


Optional section for those who wish to compare our tensorial E & M with that of
Griffiths; you may skip ahead to the next section if not interested.

I should mention that this is not the only way to phrase Maxwell's equations in terms of
differential forms. If you try to see how what we have done here compares with the equations
presented in Griffiths' text it is not immediately obvious. He works with $F^{\mu\nu}$, $G^{\mu\nu}$ and $J^\mu$, none
of which are the components of differential forms. Nevertheless he recovers Maxwell's equations
as $\partial_\mu F^{\mu\nu} = J^\nu$ and $\partial_\mu G^{\mu\nu} = 0$. If we compare the components of ${}^*F$ with equation 12.119 (the
matrix form of $G^{\mu\nu}$) in Griffiths' text,
\[
(G^{\mu\nu}(c = 1)) =
\begin{pmatrix}
0 & B_1 & B_2 & B_3 \\
-B_1 & 0 & -E_3 & E_2 \\
-B_2 & E_3 & 0 & -E_1 \\
-B_3 & -E_2 & E_1 & 0
\end{pmatrix}
= -({}^*F^{\mu\nu}).
\tag{9.25}
\]

we find that we obtain the negative of Griffiths' "dual tensor" (recall that raising the indices has
the net effect of multiplying the zeroth row and column by −1). The equation $\partial_\mu F^{\mu\nu} = J^\nu$ does not
follow directly from an exterior derivative, rather it is the component form of a "coderivative". The
coderivative is defined by $\delta = {}^*d\,{}^*$: the first Hodge dual takes a $p$-form to an $(n-p)$-form, then $d$
makes it an $(n-p+1)$-form, then finally the second Hodge dual takes it to an $(n-(n-p+1))$-form.
That is, $\delta$ takes a $p$-form to a $(p-1)$-form. We stated Maxwell's equations as

𝑑𝐹 = 0 𝑑∗ 𝐹 = ∗ 𝒥

Now we can take the Hodge dual of the inhomogeneous equation to obtain,
\[ {}^*d\,{}^*F = \delta F = {}^{**}\mathcal{J} = \pm\mathcal{J} \]

where I leave the sign for you to figure out. Then the other equation

∂𝜇 𝐺𝜇𝜈 = 0

can be understood as the component form of 𝛿 ∗ 𝐹 = 0 but this is really 𝑑𝐹 = 0 in disguise,

0 = 𝛿 ∗ 𝐹 = ∗ 𝑑∗∗ 𝐹 = ±∗ 𝑑𝐹 ⇐⇒ 𝑑𝐹 = 0

so even though it looks like Griffiths is using the dual field tensor for the homogeneous Maxwell's
equations and the field tensor for the inhomogeneous Maxwell's equations it is in fact not the case.
The key point is that there are coderivatives implicit within Griffiths' equations, so you have to
read between the lines a little to see how it matches up with what we've done here. I have not
entirely proved it here. To be complete we should look at the component form of $\delta F = \mathcal{J}$ and explicitly
show that this gives us $\partial_\mu F^{\mu\nu} = J^\nu$. I don't think it is terribly difficult but I'll leave it to the reader.
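
The degree bookkeeping for the coderivative $\delta = {}^*d\,{}^*$ described earlier in this section can be traced mechanically. The Python sketch below only tracks form degrees (it does not compute any forms), and the function names are our own.

```python
def hodge_degree(p, n):
    """The Hodge dual takes a p-form on an n-dimensional space to an (n - p)-form."""
    return n - p


def d_degree(p):
    """The exterior derivative raises the degree by one."""
    return p + 1


def coderivative_degree(p, n):
    """Degree after applying *, then d, then * again."""
    return hodge_degree(d_degree(hodge_degree(p, n)), n)


n = 4  # Minkowski space
for p in range(1, n + 1):
    print(p, "->", coderivative_degree(p, n))
```

In particular the field tensor, a 2-form on four-dimensional Minkowski space, is sent to a 1-form, which is the right type of object to equate with the current $\mathcal{J}$.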

Comparing with Griffiths is fairly straightforward because he uses the same metric as we have.
Other texts use the mostly negative metric; it's just a convention. If you try to compare to such
a book you'll find that our equations are almost the same up to a sign. One good careful book
is Reinhold A. Bertlmann's Anomalies in Quantum Field Theory; you will find much of what we
have done here done there with respect to the other metric. Another good book which shares our
conventions is Sean M. Carroll's Spacetime and Geometry: An Introduction to General Relativity;
that text has a no-nonsense introduction to tensors, forms and much more over a curved space
(in contrast to our approach which has been over a vector space which is flat). By now there are
probably thousands of texts on tensors; these are a few we have found useful here.

9.8.4 Maxwell’s equations are relativistically covariant


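Let us begin with the definition of the field tensor once more. We define the components of the
field tensor in terms of the 4-potentials as we take the viewpoint that those are the basic objects (not
the fields). If
\[ F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu, \]
then the field tensor $F = F_{\mu\nu}\,dx^\mu \otimes dx^\nu$ is a tensor, or is it? We should check that the components
transform as they ought according to the discussion in section ??. Let $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$ then we observe,
\[
\begin{aligned}
(1.)\quad & \bar A_\mu = (\Lambda^{-1})^{\alpha}{}_{\mu} A_\alpha \\
(2.)\quad & \frac{\partial}{\partial \bar x^\nu} = \frac{\partial x^\beta}{\partial \bar x^\nu}\frac{\partial}{\partial x^\beta} = (\Lambda^{-1})^{\beta}{}_{\nu}\frac{\partial}{\partial x^\beta}
\end{aligned}
\tag{9.26}
\]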
where (2.) is simply the chain rule of multivariate calculus and (1.) is not at all obvious. We will
assume that (1.) holds, that is we assume that the 4-potential transforms in the appropriate way
for a one-form. In principle one could prove that from more base assumptions. After all electro-
magnetism is the study of the interaction of charged objects, we should hope that the potentials

are derivable from the source charge distribution. Indeed, there exist formulas to calculate the
potentials for moving distributions of charge. We could take those as definitions for the potentials,
then it would be possible to actually calculate if (1.) is true. We’d just change coordinates via a
Lorentz transformation and verify (1.). For the sake of brevity we will just assume that (1.) holds.
We should mention that alternatively one can show the electric and magnetic fields transform so as to
make $F_{\mu\nu}$ a tensor. Those derivations assume that charge is an invariant quantity and just apply
Lorentz transformations to special physical situations to deduce the field transformation rules. See
Griffiths' chapter on special relativity or look in Resnick for example.

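Let us find how the field tensor transforms assuming that (1.) and (2.) hold; again we consider
$\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$,
\[
\begin{aligned}
\bar F_{\mu\nu} &= \bar\partial_\mu \bar A_\nu - \bar\partial_\nu \bar A_\mu \\
&= (\Lambda^{-1})^{\alpha}{}_{\mu}\,\partial_\alpha\big((\Lambda^{-1})^{\beta}{}_{\nu} A_\beta\big) - (\Lambda^{-1})^{\beta}{}_{\nu}\,\partial_\beta\big((\Lambda^{-1})^{\alpha}{}_{\mu} A_\alpha\big) \\
&= (\Lambda^{-1})^{\alpha}{}_{\mu}(\Lambda^{-1})^{\beta}{}_{\nu}(\partial_\alpha A_\beta - \partial_\beta A_\alpha) \\
&= (\Lambda^{-1})^{\alpha}{}_{\mu}(\Lambda^{-1})^{\beta}{}_{\nu} F_{\alpha\beta}.
\end{aligned}
\tag{9.27}
\]
Therefore the field tensor really is a tensor over Minkowski space.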
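
The transformation law (9.27) can be tested on a concrete case. The Python sketch below assumes a boost along $x$ with speed $v = 3/5$ (so $\gamma = 5/4$ and all arithmetic stays rational) and a potential linear in the coordinates, $A_\alpha(x) = C_{\alpha\nu}x^\nu$. Both the boost speed and the coefficients `C` are illustrative assumptions. It computes $\bar F_{\mu\nu}$ two ways: directly from the transformed potential, and from the tensor law.

```python
from fractions import Fraction

v = Fraction(3, 5)       # sample boost speed, units with c = 1
gamma = Fraction(5, 4)   # equals 1/sqrt(1 - v^2) for v = 3/5

# L_inv[a][m] stands for (Lambda^{-1})^a_m, the inverse of a boost along x.
L_inv = [
    [gamma, gamma * v, 0, 0],
    [gamma * v, gamma, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
R = range(4)

# Linear potential A_a(x) = sum_n C[a][n] x^n (illustrative coefficients).
C = [
    [2, -1, 4, 3],
    [1, 5, -2, 7],
    [-3, 6, 1, 2],
    [4, -5, 8, 1],
]

# F_{ab} = d_a A_b - d_b A_a, with d_a A_b = C[b][a].
F = [[C[b][a] - C[a][b] for b in R] for a in R]

# Barred potential: A-bar_m(x-bar) = L_inv[a][m] * C[a][n] * L_inv[n][s] * x-bar^s.
C_bar = [[sum(L_inv[a][m] * C[a][n] * L_inv[n][s] for a in R for n in R)
          for s in R] for m in R]

# Route 1: differentiate the barred potential in barred coordinates.
F_bar_direct = [[C_bar[n][m] - C_bar[m][n] for n in R] for m in R]

# Route 2: apply the tensor transformation law to F.
F_bar_law = [[sum(L_inv[a][m] * L_inv[b][n] * F[a][b] for a in R for b in R)
              for n in R] for m in R]

print(F_bar_direct == F_bar_law)
print(F_bar_law[0][1] == F[0][1])  # the component along the boost is unchanged
```

For this boost the component $F_{01}$, that is the electric field along the direction of motion, comes out unchanged, as the determinant of the $2 \times 2$ block $\gamma^2 - \gamma^2 v^2 = 1$ suggests.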

Proposition 9.8.14.

The dual to the field tensor is a tensor over Minkowski space. For a given Lorentz
transformation $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$ it follows that
\[ {}^*\bar F_{\mu\nu} = (\Lambda^{-1})^{\alpha}{}_{\mu}(\Lambda^{-1})^{\beta}{}_{\nu}\,{}^*F_{\alpha\beta} \]

Proof: homework (just kidding in 2010), it follows quickly from the definition and the fact we
already know that the field tensor is a tensor.

Proposition 9.8.15.

The four-current is a four-vector. That is, under the Lorentz transformation $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$ we
can show,
\[ \bar{\mathcal{J}}_\mu = (\Lambda^{-1})^{\alpha}{}_{\mu}\,\mathcal{J}_\alpha \]

Proof: follows from arguments involving the invariance of charge, time dilation and length
contraction. See Griffiths for details, sorry we have no time.

Corollary 9.8.16.

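The dual to the four current transforms as a 3-form. That is, under the Lorentz
transformation $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$ we can show,
\[ {}^*\bar{\mathcal{J}}_{\mu\nu\sigma} = (\Lambda^{-1})^{\alpha}{}_{\mu}(\Lambda^{-1})^{\beta}{}_{\nu}(\Lambda^{-1})^{\gamma}{}_{\sigma}\,{}^*\mathcal{J}_{\alpha\beta\gamma} \]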

Up to now the content of this section is simply an admission that we have been a little careless in
defining things up to this point. The main point is that if we say that something is a tensor then we
need to make sure that is in fact the case. With the knowledge that our tensors are indeed tensors
the proof of the covariance of Maxwell's equations is trivial.

𝑑𝐹 = 0 𝑑∗ 𝐹 = ∗ 𝒥

are coordinate invariant expressions which we have already proved give Maxwell’s equations in one
frame of reference, thus they must give Maxwell’s equations in all frames of reference.
The essential point is simply that

\[ F = \tfrac{1}{2} F_{\mu\nu}\,dx^\mu \wedge dx^\nu = \tfrac{1}{2} \bar F_{\mu\nu}\,d\bar x^\mu \wedge d\bar x^\nu \]

Again, we have no hope for the equation above to be true unless we know that
$\bar F_{\mu\nu} = (\Lambda^{-1})^{\alpha}{}_{\mu}(\Lambda^{-1})^{\beta}{}_{\nu} F_{\alpha\beta}$. That transformation follows from the fact that the four-potential is a
four-vector. It should be mentioned that others prefer to "prove" the field tensor is a tensor by
studying how the electric and magnetic fields transform under a Lorentz transformation. We in
contrast have derived the field transforms based ultimately on the seemingly innocuous assumption
that the four-potential transforms according to $\bar A_\mu = (\Lambda^{-1})^{\alpha}{}_{\mu} A_\alpha$. OK enough about that.

So the fact that Maxwell's equations have the same form in all relativistically inertial frames
of reference simply stems from the fact that we derived Maxwell's equations in an arbitrary inertial
frame, and the field tensor looks the same in the new barred frame so we can again go through all
the same arguments with barred coordinates. Thus we find that Maxwell's equations are the same
in all relativistic frames of reference, that is if they hold in one inertial frame then they will hold
in any other frame which is related by a Lorentz transformation.

9.8.5 Electrostatics in Five dimensions

We will endeavor to determine the electric field of a point charge in 5 dimensions where we are
thinking of adding an extra spatial dimension. Let's call the fourth spatial dimension the $w$-direction
so that a typical point in spacetime will be $(t, x, y, z, w)$. First we note that the electromagnetic
field tensor can still be derived from a one-form potential,

𝐴 = −𝜌𝑑𝑡 + 𝐴1 𝑑𝑥 + 𝐴2 𝑑𝑦 + 𝐴3 𝑑𝑧 + 𝐴4 𝑑𝑤

we will find it convenient to make our convention for this section that 𝜇, 𝜈, ... = 0, 1, 2, 3, 4 whereas
𝑚, 𝑛, ... = 1, 2, 3, 4 so we can rewrite the potential one-form as,

𝐴 = −𝜌𝑑𝑡 + 𝐴𝑚 𝑑𝑥𝑚

This is derived from the vector potential $A^\mu = (\rho, A_m)$ under the assumption we use the natural
generalization of the Minkowski metric, namely the 5 by 5 matrix,
\[
(\eta_{\mu\nu}) =
\begin{pmatrix}
-1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
= (\eta^{\mu\nu}).
\tag{9.28}
\]

We could study the linear isometries of this metric; they would form the group $O(1, 4)$. Now we
form the field tensor by taking the exterior derivative of the one-form potential,
\[ F = dA = \tfrac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu)\,dx^\mu \wedge dx^\nu \]
Now we would like to find the electric and magnetic "fields" in 4 dimensions. Perhaps we should
say 4+1 dimensions; just understand that I take there to be 4 spatial directions throughout this
discussion if in doubt. Note that we are faced with a dilemma of interpretation. There are 10
independent components of a 5 by 5 antisymmetric tensor. Naively we would expect that the electric
and magnetic fields each would have 4 components, but that is not possible: we'd be missing
two components. The solution is this: the time components of the field tensor are understood to
correspond to the electric part of the fields whereas the remaining 6 components are said to be
magnetic. This aligns with what we found in 3 dimensions; it's just that in 3 dimensions we had the
fortunate quirk that the number of linearly independent one and two forms were equal at any point.
This definition means that the magnetic field will in general not be a vector field but rather a "flux"
encoded by a 2-form.
\[
(F_{\mu\nu}) =
\begin{pmatrix}
0 & -E_x & -E_y & -E_z & -E_w \\
E_x & 0 & B_z & -B_y & H_1 \\
E_y & -B_z & 0 & B_x & H_2 \\
E_z & B_y & -B_x & 0 & H_3 \\
E_w & -H_1 & -H_2 & -H_3 & 0
\end{pmatrix}
\tag{9.29}
\]
Now we can write this compactly via the following equation,

𝐹 = 𝐸 ∧ 𝑑𝑡 + 𝐵
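
The component count behind this split into electric and magnetic parts is easy to tabulate. The following Python sketch (the function names are ours) counts the independent entries of an antisymmetric tensor in $s + 1$ spacetime dimensions and splits them into the $s$ time components and the remaining purely spatial ones.

```python
def antisymmetric_components(n):
    """Number of independent components of an n by n antisymmetric tensor."""
    return n * (n - 1) // 2


def electric_magnetic_split(spatial_dims):
    """Return (electric, magnetic) component counts for the field tensor."""
    total = antisymmetric_components(spatial_dims + 1)
    electric = spatial_dims          # components F_{0m}
    magnetic = total - electric      # components F_{mn} with m < n
    return electric, magnetic


for s in (3, 4):
    print(s, electric_magnetic_split(s))
```

Only for three spatial dimensions do the two counts agree, which is the quirk that lets the magnetic field be treated as a vector field there.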

I admit there are subtle points about how exactly we should interpret the magnetic field, however
I’m going to leave that to your imagination and instead focus on the electric sector. What is the
generalized Maxwell’s equation that 𝐸 must satisfy?

𝑑∗ 𝐹 = 𝜇𝑜 ∗ 𝒥 =⇒ 𝑑∗ (𝐸 ∧ 𝑑𝑡 + 𝐵) = 𝜇𝑜 ∗ 𝒥

where $\mathcal{J} = -\rho\,dt + J_m\,dx^m$ so the 5 dimensional Hodge dual will give us a $5 - 1 = 4$ form; in
particular we will be interested in just the term stemming from the dual of $dt$,

\[ {}^*(-\rho\,dt) = \rho\,dx \wedge dy \wedge dz \wedge dw \]

the corresponding term in $d\,{}^*F$ is $d\,{}^*(E \wedge dt)$ thus, using $\mu_o = \frac{1}{\epsilon_o}$,

\[ d\,{}^*(E \wedge dt) = \frac{1}{\epsilon_o}\,\rho\,dx \wedge dy \wedge dz \wedge dw \tag{9.30} \]
is the 4-dimensional Gauss's equation. Now consider the case we have an isolated point charge
which has somehow always existed at the origin. Moreover consider a 3-sphere that surrounds the
charge. We wish to determine the generalized Coulomb field due to the point charge. First we note
that the solid 3-sphere is a 4-dimensional object; it is the set of all $(x, y, z, w) \in \mathbb{R}^4$ such that

\[ x^2 + y^2 + z^2 + w^2 \leq r^2 \]

We may parametrize a three-sphere of radius $r$ via generalized spherical coordinates,

\[
\begin{aligned}
x &= r\sin(\theta)\cos(\phi)\sin(\psi) \\
y &= r\sin(\theta)\sin(\phi)\sin(\psi) \\
z &= r\cos(\theta)\sin(\psi) \\
w &= r\cos(\psi)
\end{aligned}
\tag{9.31}
\]

Now it can be shown that the volume and surface area of the radius $r$ three-sphere are as follows,

\[ vol(S^3) = \frac{\pi^2}{2} r^4 \qquad area(S^3) = 2\pi^2 r^3 \]
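
A quick numerical sanity check of the parametrization (9.31) and of the two formulas above is sketched here in Python. The sample radius, angles and step size are arbitrary choices of ours, and the function names are hypothetical. The check confirms that a parametrized point lies at distance $r$ from the origin and that the surface area is the $r$-derivative of the enclosed volume, approximated by a central difference.

```python
import math


def hyperspherical_to_cartesian(r, theta, phi, psi):
    """Generalized spherical coordinates on the three-sphere of radius r."""
    x = r * math.sin(theta) * math.cos(phi) * math.sin(psi)
    y = r * math.sin(theta) * math.sin(phi) * math.sin(psi)
    z = r * math.cos(theta) * math.sin(psi)
    w = r * math.cos(psi)
    return x, y, z, w


def ball_volume(r):
    """Four-dimensional volume enclosed by the three-sphere of radius r."""
    return math.pi ** 2 * r ** 4 / 2


def sphere_area(r):
    """Three-dimensional surface volume of the three-sphere of radius r."""
    return 2 * math.pi ** 2 * r ** 3


r = 2.0
point = hyperspherical_to_cartesian(r, 0.7, 1.9, 2.3)
radius = math.sqrt(sum(c * c for c in point))

h = 1e-4  # step for the central difference
dvol_dr = (ball_volume(r + h) - ball_volume(r - h)) / (2 * h)

print(radius, dvol_dr, sphere_area(r))
```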
We may write the charge density of a smeared out point charge $q$ as,
\[
\rho =
\begin{cases}
2q/\pi^2 a^4, & 0 \leq r \leq a \\
0, & r > a
\end{cases}.
\tag{9.32}
\]

Notice that integrating $\rho$ over any four-dimensional region which contains the solid three-sphere
of radius $a$ gives the enclosed charge $q$. Now let $M$ be the solid region bounded by a Gaussian
3-sphere $S^3$ of radius $r \geq a$ and integrate over $M$,
\[ \int_M d\,{}^*(E \wedge dt) = \frac{1}{\epsilon_o} \int_M \rho\,dx \wedge dy \wedge dz \wedge dw \]
now use the Generalized Stokes Theorem to deduce,
\[ \int_{\partial M} {}^*(E \wedge dt) = \frac{q}{\epsilon_o} \]
but by the "spherical" symmetry of the problem we find that $E$ must be independent of the direction
it points; this means that it can only have a radial component. Thus we may calculate the integral
with respect to generalized spherical coordinates and we will find that it is the product of $E_r \equiv E$
and the surface volume of the four dimensional solid three sphere. That is,
\[ \int_{\partial M} {}^*(E \wedge dt) = 2\pi^2 r^3 E = \frac{q}{\epsilon_o} \]

Thus,
\[ E = \frac{q}{2\pi^2 \epsilon_o r^3}. \]
The Coulomb field is weaker if it propagates in 4 spatial dimensions: it falls off like $1/r^3$ rather
than $1/r^2$. Qualitatively what has happened is that we have taken the same net flux and spread it out
over an additional dimension; this means it thins out quicker. A very similar idea is used in some
brane world scenarios. String theorists posit that the gravitational field spreads out in more than four
dimensions while in contrast the standard model fields of electromagnetism, and the strong and weak
forces are confined to a four-dimensional brane. That sort of model attempts an explanation as to why
gravity is so weak in comparison to the other forces. Also it gives large scale corrections to gravity
that some hope will match observations which at present don't seem to fit the standard gravitational models.
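
To make the comparison concrete, here is a small Python sketch of the result just derived alongside the familiar three-dimensional Coulomb field $q/(4\pi\epsilon_o r^2)$. The function names, the choice $\epsilon_o = 1$ and the sample values of $q$, $a$ and $r$ are assumptions made only for illustration. It checks that the smeared density (9.32) integrates to $q$, that the flux through the Gaussian three-sphere is $q/\epsilon_o$, and how each field scales when the distance doubles.

```python
import math


def ball_volume_4d(r):
    """Volume of the solid three-sphere of radius r."""
    return math.pi ** 2 * r ** 4 / 2


def sphere_area_4d(r):
    """Surface volume of the three-sphere of radius r."""
    return 2 * math.pi ** 2 * r ** 3


def charge_density(q, a):
    """Uniform density of a charge q smeared over the solid three-sphere of radius a."""
    return 2 * q / (math.pi ** 2 * a ** 4)


def coulomb_field_4d(q, r, eps0=1.0):
    """Radial field in 4 spatial dimensions: q / (2 pi^2 eps0 r^3)."""
    return q / (2 * math.pi ** 2 * eps0 * r ** 3)


def coulomb_field_3d(q, r, eps0=1.0):
    """Usual radial field in 3 spatial dimensions: q / (4 pi eps0 r^2)."""
    return q / (4 * math.pi * eps0 * r ** 2)


q, a, r = 3.0, 0.5, 2.0   # sample values
enclosed = charge_density(q, a) * ball_volume_4d(a)
flux = sphere_area_4d(r) * coulomb_field_4d(q, r)

print(enclosed, flux)
print(coulomb_field_4d(q, 2 * r) / coulomb_field_4d(q, r))
print(coulomb_field_3d(q, 2 * r) / coulomb_field_3d(q, r))
```

Doubling the distance reduces the four-dimensional field by a factor of 8, compared with a factor of 4 in three dimensions.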

This example is but a taste of the theoretical discussion that differential forms allow. As a
final comment I remind the reader that we have done things for flat space for the most part in
this course; when considering a curved space there are a few extra considerations that must enter.
Coordinate vector fields $e_i$ must be thought of as derivations $\partial/\partial x^\mu$ for one. Also the metric is not
a constant tensor like $\delta_{ij}$ or $\eta_{\mu\nu}$, rather it depends on position; this means Hodge duality acquires
a coordinate dependence as well. Doubtless I have forgotten something else in this brief warning.
One more advanced treatment of many of our discussions is Dr. Fulp's Fiber Bundles 2001 notes
which I have posted on my webpage. He uses the other metric but it is rather elegantly argued; all
his arguments are coordinate independent. He also deals with the issue of the magnetic induction
and the dielectric, issues which we have entirely ignored since we always have worked in free space.

References and Acknowledgements:

I have drawn from many sources to assemble the content of the last couple chapters, the refer-
ences are listed approximately in the order of their use to the course, additionally we are indebted
to Dr. Fulp for his course notes from many courses (ma 430, ma 518, ma 555, ma 756, ...). Also
Manuela Kulaxizi helped me towards the correct (I hope) interpretation of 5-dimensional E&M in
the last example.

Vector Calculus, Susan Jane Colley

Introduction to Special Relativity, Robert Resnick

Differential Forms and Connections, R.W.R. Darling

Differential geometry, gauge theories, and gravity, M. Göckeler & T. Schücker

Anomalies in Quantum Field Theory, Reinhold A. Bertlmann



"The Differential Geometry and Physical Basis for the Applications of Feynman Diagrams", S.L.
Marateck, Notices of the AMS, Vol. 53, Number 7, pp. 744-752

Abstract Linear Algebra, Morton L. Curtis

Gravitation, Misner, Thorne and Wheeler

Introduction to Special Relativity, Wolfgang Rindler

Differential Forms A Complement to Vector Calculus, Steven H. Weintraub

Differential Forms with Applications to the Physical Sciences, Harley Flanders

Introduction to Electrodynamics, (3rd ed.) David J. Griffiths

The Geometry of Physics: An Introduction, Theodore Frankel

Spacetime and Geometry: An Introduction to General Relativity, Sean M. Carroll

Gauge Theory and Variational Principles, David Bleecker

Group Theory in Physics, Wu-Ki Tung


Chapter 10

supermath
