AdvancedCalculus2017_2
James S. Cook
Liberty University
Department of Mathematics
Fall 2017
This course is primarily concerned with abstractions of calculus and geometry which are accessible
to the undergraduate. This is not a course in real analysis and it does not have a prerequisite of
real analysis so my typical students are not prepared for topology or measure theory. We defer
the topology of manifolds and exposition of abstract integration in the measure theoretic sense to
another course. Our focus for the course as a whole is on what you might call the linear algebra of
abstract calculus. Indeed, McInerney essentially declares the same philosophy so his text is the
natural extension of what I share in these notes. So, what do we study here? In these notes:
In particular, we study: basic linear algebra, spanning, linear independence, basis, coordinates,
norms, distance functions, inner products, metric topology, limits and their laws in a NLS, conti-
nuity of mappings on NLS’s, linearization, Frechet derivatives, partial derivatives and continuous
differentiability, properties of differentials, generalized chain and product rules, in-
tuition and statement of inverse and implicit function theorems, implicit differentiation via the
method of differentials, manifolds in Rn from an implicit or parametric viewpoint, tangent and nor-
mal spaces of a submanifold of Euclidean space, Lagrange multiplier technique, compact sets and
the extreme value theorem, theory of quadratic forms, proof of real Spectral Theorem via method
of Lagrange multipliers, higher derivatives and the multivariate Taylor theorem, multivariate power
series, critical point analysis for multivariate functions, introduction to variational calculus.
In contrast to some previous versions of this course, I do not study contraction mappings, dif-
ferentiating under the integral and other questions related to uniform convergence. I leave such
topics to a future course which likely takes Math 431 (our real analysis course) as a prerequisite.
Furthermore, these notes have little to say about further calculus (differential forms, vector fields,
etc.). We read on those topics in McInerney once we exhaust these notes.
There are many excellent texts on calculus of many variables. Three which have had significant
influence on my thinking and the creation of these notes are:
These notes are a work in progress; do let me know about any errors you find. Thanks!
2 differentiation 25
2.1 the Frechet differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 properties of the Frechet derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 partial derivatives of differentiable maps . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.1 partial differentiation in a finite dimensional real vector space . . . . . . . . . 34
2.3.2 partial differentiation for real . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 examples of Jacobian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.4 on chain rule and Jacobian matrix multiplication . . . . . . . . . . . . . . . . 43
2.4 continuous differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5 the product rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 higher derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.7 differentiation in an algebra variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
A normed linear space is a vector space which also has a concept of vector length. We use this
length function to set-up limits for maps on normed linear spaces. The idea of the limit is the same
as it was in first semester calculus; we say the map approaches a value when we can make values of
the map arbitrarily close to the value by taking inputs sufficiently close to the limit point. A map
is continuous at a limit point in its domain if and only if its limiting value matches its actual value
at the limit point. We derive the usual limit laws and work out results which are based on the
component expansion with respect to a basis. We try to provide a fairly complete account of why
common maps are continuous. For example, we argue why the determinant map is a continuous
map from square matrices to real numbers.
We also introduce elementary concepts of topology. Open and closed sets are defined in terms of
the metric topology induced from a given norm. We also discuss inner products and the more
general concept of a distance function or metric. We explain why the set of invertible matrices is
topologically open.
This Chapter concludes with a brief introduction into sequential methods. We define completeness
of a normed linear space and hence introduce the concept of a Banach Space. Finally, the matrix
exponential is shown to exist by an analytical appeal to the completeness of matrices.
Certain topics are not covered in depth in this work; I survey them here to attempt to provide con-
text for the larger world of math I hope my students soon discover. In particular, while I introduce
inner products, metric spaces and the rudiments of functional analysis, there is certainly far more
to learn and I indicate some future reading as we go. For future chapters we need to understand
both linear algebra and limits carefully so my focus here is on normed linear spaces and limits.
These suffice for us to begin our study of Frechet differentiation in the next chapter.
History is important and I must admit failure on this point. I do not know the history of these
topics as deeply as I’d like. Similar comments apply to the next Chapter. I believe most of the
linear algebra and analysis was discovered between about 1870 and 1910 by the likes of Frobenius,
Frechet, Banach and other great analysts of that time, but, I have doubtless left out important
work and names.
(3.) Cn = {(z1 , . . . , zn ) | z1 , . . . , zn ∈ C}. Once more define addition and scalar multiplication
component-wise; for z, w ∈ Cn and c ∈ C define (z + w)i = zi + wi and (cz)i = czi . Since
R ⊆ C the complex scalar multiplication in Cn also provides a real scalar multiplication. We
can either view Cn as a real or complex vector space.
(4.) C m×n is the set of m × n complex matrices. If Z, W ∈ C m×n and c ∈ C then (Z + W )ij =
Zij + Wij and (cZ)ij = cZij for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Just as in the previous example,
we can view C m×n either as a complex vector space or as a real vector space.
(5.) If V and W are real vector spaces then HomR (V, W ) is the set of all linear transformations from
V to W . This forms a vector space with respect to the usual pointwise addition of functions.
If V = W then we denote HomR (V, V ) = EndR (V ) for endomorphisms of V . The set of
endomorphisms forms an algebra with respect to composition of functions since the composite
of linear maps is once more linear. The set of invertible endomorphisms of V forms GL(V ).
In the particular case that V = Rn we denote GL(Rn ) = GL(n, R). Notice GL(V ) is not a
subspace since IdV ∈ GL(V ) where IdV (x) = x for all x ∈ V and IdV − IdV = 0 ∉ GL(V ).
Definition 1.1.1.
If V is a real vector space and S ⊆ V then define the span of S by

span(S) = {c1 s1 + · · · + ck sk | k ∈ N, c1 , . . . , ck ∈ R, s1 , . . . , sk ∈ S}.
In words, span(S) is the set of all finite R-linear combinations of vectors from S. Since the scalar
multiple and linear combination of linear combinations is once more a linear combination we find
that span(S) ≤ V . That is, span(S) forms a subspace of V . The set S is called a spanning set
or generating set for span(S).
Definition 1.1.2.
Let V be a real vector space and S ⊆ V . If c1 , . . . , ck ∈ R and s1 , . . . , sk ∈ S with
c1 s1 +· · ·+ck sk = 0 imply c1 = 0, . . . , ck = 0 for each k ∈ N then S is linearly independent
(LI). Otherwise, we say S is linearly dependent.
When generating sets are linearly independent they are minimal: if you remove any vector from
a minimal spanning set then the resulting span is smaller. In contrast, if S is linearly dependent
then there exists S′ ⊂ S for which span(S′) = span(S). Our convention is that span(∅) = {0}.
Definition 1.1.3.
Let V be a real vector space. If β is a linearly independent spanning set for V then we say
β is a basis for V . Furthermore, using # to denote cardinality, #(β) is the dimension of
V . If #(β) = n ∈ N then we say V is an n-dimensional vector space and write dim(V ) = n.
Bases are very helpful for calculations. In particular, if β = {v1 , . . . , vn } then
x1 v1 + · · · + xn vn = y1 v1 + · · · + yn vn ⇒ xi = yi for i = 1, . . . , n. (1.3)
We call this calculation equating coefficients with respect to the basis β.
Definition 1.1.4.
Let V be a real finite dimensional vector space with basis β = {v1 , . . . , vn } then for each
x ∈ V there exist ci ∈ R for which x = c1 v1 + · · · + cn vn . We write [x]β = (c1 , . . . , cn )
and say [x]β is the coordinate vector of x with respect to the β basis. We also denote
Φβ (x) = [x]β and say Φβ : V → Rn is the coordinate map with respect to the basis β.
If β = {v1 , . . . , vn } is a basis for the real vector space V and ψ ∈ GL(V ) then ψ(β) = {ψ(v1 ), . . . , ψ(vn )}
forms a basis for V . Clearly the choice of basis is far from unique. That said, it is useful for us to
settle on a standard basis for our usual real examples:
(1.) Let (ei )j = δij hence e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0) and en = (0, . . . , 0, 1). If β =
{e1 , . . . , en } then x = (x1 , . . . , xn ) = x1 e1 + · · · + xn en and [x]β = x. We say β is the standard basis
of column vectors and note #(β) = n = dim(Rn ).
(2.) Let (Eij )kl = δik δjl for 1 ≤ i, k ≤ m and 1 ≤ j, l ≤ n define Eij ∈ R m×n . The matrix Eij has
a 1 in the ij-th entry and zeros elsewhere. For any A ∈ R m×n we have
A = Σ_{i=1}^m Σ_{j=1}^n Aij Eij    (1.4)
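As a concrete sanity check of Equation 1.4, here is a small numerical sketch of my own (not part of the original notes) which builds the standard basis matrices Eij with numpy and verifies that a sample matrix is recovered from its coordinates:

import numpy as np

m, n = 2, 3
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])

# standard basis: E_ij has a 1 in the ij-th entry and zeros elsewhere
E = {(i, j): np.zeros((m, n)) for i in range(m) for j in range(n)}
for (i, j), Eij in E.items():
    Eij[i, j] = 1.0

# reassemble A = sum_{i,j} A_ij E_ij, as in Equation 1.4
B = sum(A[i, j] * E[(i, j)] for i in range(m) for j in range(n))
print(np.allclose(A, B))   # True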
Viewing Cn and C m×n as real vector spaces there are at least two natural choices for the basis,
(3.) For Cn notice β = {e1 , ie1 , . . . , en , ien } and γ = {e1 , . . . , en , ie1 , . . . , ien } serve as natural bases.
If z = x + iy where x, y ∈ Rn then we define Re(z) = x and Im(z) = y. Hence,1
Φγ (z) = (x, y), & Φβ (z) = (x1 , y1 , x2 , y2 , . . . , xn , yn ). (1.7)
Note dimR (Cn ) = 2n.
(4.) For C m×n notice β = {E11 , iE11 , . . . , Emn , iEmn } and γ = {E11 , . . . , Emn , iE11 , . . . , iEmn }
serve as natural bases. If Z = X + iY where X, Y ∈ R m×n then we define Re(Z) = X and
Im(Z) = Y . With this notation,
[Z]γ = (X11 , . . . , Xmn , Y11 , . . . , Ymn ) & [Z]β = (X11 , Y11 , . . . , Xmn , Ymn ). (1.8)
For example, if

    A = [ 1 + i    2  ]
        [ 3 + 4i   5i ]

then [A]β = (1, 1, 2, 0, 3, 4, 0, 5) and [A]γ = (1, 2, 3, 0, 1, 0, 4, 5).
Finally, note dimR (C m×n ) = 2mn.
Naturally, dimC (Cn ) = n and dimC (C m×n ) = mn, but, our primary interest is in the calculus of
real vector spaces so we just need such formulas as a conceptual backdrop.
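For the curious, the following little numpy sketch (my own illustration, not from the notes) computes the coordinate vectors [A]β and [A]γ of the 2 × 2 complex matrix in the example above by interleaving, respectively stacking, the real and imaginary parts of the entries read off row by row:

import numpy as np

A = np.array([[1 + 1j, 2 + 0j],
              [3 + 4j, 5j]])

X, Y = A.real, A.imag                      # A = X + iY with X, Y real matrices

# gamma-coordinates: all real parts first, then all imaginary parts (row major)
A_gamma = np.concatenate([X.ravel(), Y.ravel()])

# beta-coordinates: interleave real and imaginary parts entry by entry
A_beta = np.ravel(np.column_stack([X.ravel(), Y.ravel()]))

print(A_beta)    # [1. 1. 2. 0. 3. 4. 0. 5.]
print(A_gamma)   # [1. 2. 3. 0. 1. 0. 4. 5.]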
then we say (V, || · ||) is a normed vector space. When there is no danger of ambiguity we
also say that V is a normed vector space or a normed linear space (NLS).
Notice that we did not assume V was finite-dimensional in the definition above. Our current focus
is on finite-dimensional cases.
1
technically, this is an abuse of notation, I’m ignoring the distinction between a vector of vectors and a vector
(1.) the standard euclidean norm on Rn is defined by ||v|| = √(v • v).

(2.) the taxicab norm on Rn is defined by ||v||1 = |v1 | + · · · + |vn |.
We could play similar games for our other favorite examples, but primarily we just use the analog of
the p = 1, 2 or ∞ norms in our application of norms. Let us agree by convention that ||x|| = √(x • x)
for x ∈ Rn ; since the coordinate map yields real column vectors, the r.h.s. makes use of this
convention in each of the following examples:
(2.) the standard norm for Rm×n is given by ||A|| = ||Φβ (A)|| where Φβ is the standard
coordinate map for Rm×n as defined in Equation 1.6.
(3.) the standard norm for Cn is given by ||z|| = ||Φβ (z)|| where Φβ is the standard
coordinate map described in Equation 1.7.
(4.) the standard norm for Cm×n is given by ||Z|| = ||Φβ (Z)|| where Φβ is the standard
coordinate map for Cm×n as defined in Equation 1.8.
In each case above there is some slick formula which hides the simple truth I described above;
the length of matrices and complex vectors is simply the Euclidean length of the corresponding
coordinate vectors.
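To make the last comment concrete, here is a quick numerical check (my addition, not in the original notes) that the standard norm of a complex matrix, computed as the euclidean length of its real coordinate vector, is just the square root of the sum of the squared moduli of its entries:

import numpy as np

Z = np.array([[1 + 1j, 2 + 0j],
              [3 + 4j, 5j]])

# euclidean length of the real coordinate vector (real and imaginary parts stacked)
coords = np.concatenate([Z.real.ravel(), Z.imag.ravel()])
norm_via_coords = np.linalg.norm(coords)

# the "slick formula": square root of the sum of |Z_ij|^2
norm_direct = np.sqrt(np.sum(np.abs(Z) ** 2))

print(norm_via_coords, norm_direct)              # same number
print(np.isclose(norm_via_coords, norm_direct))  # True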
where the complex vector v = (v1 , . . . , vn ) has conjugate vector v̄ = (v̄1 , . . . , v̄n ) and the complex
matrix Z has conjugates Z̄ defined by (Z̄)ij = Z̄ij and Z † = Z̄ T is the Hermitian conjugate.
Again, to be clear, there is not just one choice of norm for Cn , Rm×n or Cm×n . The set paired with
the norm is what gives us the structure of a normed space. We conclude this Section with norms
which are a bit less obvious.
Example 1.2.2. Let C([a, b], R) denote the set of continuous real-valued functions with domain
[a, b]. If f ∈ C([a, b], R) then we define kf k = max{|f (x)| | x ∈ [a, b]}. It is not too difficult to
check this defines a norm on the infinite dimensional vector space C([a, b], R).
Example 1.2.3. Suppose V, W are normed linear spaces and T : V → W is a linear transformation.
Then we may define the norm kT k as follows: kT k = sup{ kT (x)kW | x ∈ V, kxkV = 1 }.
When V is infinite dimensional there is no reason that kT k must be finite. In fact, the linear
transformations with finite norm are special. I leave the completion of this thought to your functional
analysis course. On the other hand, for finite dimensional V we can argue kT k is finite.
Incidentally, given T : V → W with kT k < ∞ you can show kT (x)k ≤ kT k kxk for all x ∈ V . To
see this claim, consider x ≠ 0 has kxk ≠ 0 hence:

kT (x)k = kT ( kxk · x/kxk )k    (1.9)
        = k kxk T (x/kxk) k
        = kxk kT (x/kxk)k
        ≤ kxk kT k
I include the next example to give you a sense of what sort of calculation takes the place of
coordinates in infinite dimensions. I’m mostly including these examples so we can appreciate the
technical meaning of continuously differentiable in our later work.
Example 1.2.4. Assume a < b. Define T (f ) = ∫_a^b f (x) dx for each f ∈ C([a, b], R). Observe T is
a linear transformation. Also,

|T (f )| = | ∫_a^b f (x) dx | ≤ ∫_a^b |f (x)| dx.
Use the max-norm of Example 1.2.2. If kf k = max{|f (x)| | x ∈ [a, b]} = 1 then |f (x)| ≤ 1 for
a ≤ x ≤ b. Thus |T (f )| ≤ ∫_a^b dx = b − a. However, the constant function f (x) = 1 has kf k = 1
and T (1) = ∫_a^b dx = b − a thus kT k = sup{kT (f )k | f ∈ C([a, b], R), kf k = 1} = b − a.
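The example can also be explored numerically. The sketch below (mine, not from the notes) approximates T (f ) by a trapezoid rule for a few continuous functions with max-norm 1 on [a, b] = [0, 2]; every value of |T (f )| stays below b − a = 2 and the constant function attains it, consistent with kT k = b − a. The sample functions are my own choices.

import numpy as np

a, b = 0.0, 2.0
x = np.linspace(a, b, 2001)

def T(f_vals):
    # trapezoid-rule approximation of the integral of f over [a, b]
    return np.sum((f_vals[1:] + f_vals[:-1]) / 2.0 * np.diff(x))

# a few continuous functions with max-norm 1 on [a, b]
samples = {
    "f(x) = 1": np.ones_like(x),
    "f(x) = cos(pi x)": np.cos(np.pi * x),
    "f(x) = x/2": x / 2.0,
}

for name, f in samples.items():
    print(name, "||f|| =", round(np.max(np.abs(f)), 3), "|T(f)| =", round(abs(T(f)), 3))
# each |T(f)| is at most b - a = 2, and the constant function attains 2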
At this point I introduce some notation I found in Zorich. I think it’s a useful addition to my
standard notations. Pay attention to the semi-colon.
Example 1.2.6. If T ∈ L(V1 , . . . , Vk ; W ) where V1 , . . . , Vk , W are normed linear spaces then define2
and
|det(x1 , x2 , . . . , xn )| ≤ kdetkkx1 kkx2 k · · · kxn k. (1.13)
But, det(I) = 1 thus kdetk = 1.
I’ve probably done a bit more than we need here; I hope it is not too disturbing.
At this point we’re stuck. A nontrivial inequality3 called the Cauchy-Schwarz inequality helps us
proceed; ⟨x, y⟩ ≤ ||x|| ||y||. It follows that ||x + y||² ≤ ||x||² + 2||x|| ||y|| + ||y||² = (||x|| + ||y||)².
However, the induced norm is clearly positive so we find ||x + y|| ≤ ||x|| + ||y||.
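A quick numerical illustration of the Cauchy-Schwarz inequality and the resulting triangle inequality for the dot-product on Rn (my own sketch, not part of the notes):

import numpy as np

rng = np.random.default_rng(0)
ok = True
for _ in range(10000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    cs = abs(np.dot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    tri = np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
    ok = ok and cs and tri
print(ok)   # True: <x,y> <= ||x|| ||y|| and ||x+y|| <= ||x|| + ||y|| in every trial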
Most linear algebra texts have a whole chapter on inner-products and their applications, you can
look at my notes for a start if you’re curious. That said, this is a bit of a digression for this course.
Primarily we use the dot-product paired with Rn in certain applications. I should mention, Rn with
the usual dot-product forms Euclidean n-space. We’ll say more just before we use the theory of
orthogonal complements to understand how to find extreme values on curves or surfaces.
3
I prove this for the dot-product in my linear notes, however, the proof is written in such a way it equally well
applies to a general inner-product
Definition 1.2.8.
A function d : S ×S → R is a metric or distance function on S if d satisfies the following:
for all x, y, z ∈ S,
4
at Liberty University we still cover elementary ε − δ proofs in the beginning calculus course
BR (xo ) = {x ∈ V | kx − xo k < R}.
If xo ∈ U and there exists R > 0 for which BR (xo ) ⊆ U then we say xo is an interior point
of U . When each point in U ⊆ V is an interior point we say U is an open set. If S ⊂ V
has V − S open then we say S is a closed set.
In the case V = Rn with n = 1, 2, 3 we have other terms.
Intuitively, an open set either has no edges, or, has only fuzzy edges whereas a closed set either has
no edges, or, has solid edges. The larger problem of studying which sets are open and how that
relates to the continuity of functions is known as topology. Briefly, a topology is a set paired
with the set of all sets declared to be open. The topology we study here is metric topology as it
is derived from a distance function. Moving on,
Definition 1.3.2. limit points, isolated points and boundary points in an NLS.
Let (V, k k) be a NLS. We define a deleted open ball centered at xo with radius R
by:
BR (xo ) − {xo } = {x ∈ V | 0 < kx − xo k < R}.
We say xo is a limit point of a function f if and only if there exists a deleted open ball
which is contained in the dom(f ). If yo ∈ dom(f ) and there exists an open ball centered at
yo which contains no other points in dom(f ) then yo is called an isolated point of dom(f ).
A boundary point of S ⊆ V is a point xo ∈ V for which every open ball centered at xo
contains points in S as well as points outside S.
Notice a limit point of f need not be in the domain of f . Also, a boundary point of S need not be
in S. Furthermore, if we consider g : N → V then each point in dom(g) = N is isolated.
If f : dom(f ) ⊆ V → W is a function from normed space (V, || · ||V ) to normed vector space
(W, || · ||W ) and xo is either a limit point or an isolated point of dom(f ) and L ∈ W then
we say limx→xo f (x) = L if and only if for each ε > 0 there exists δ > 0 such that if x ∈ V
with 0 < ||x − xo ||V < δ then ||f (x) − L||W < ε. If limx→xo f (x) = f (xo ) then we say
that f is a continuous function at xo .
The definition above indicates functions are by default continuous at isolated points, my apologies
if you find this bothersome. Let me give a few examples then we’ll turn our attention to proving
limit laws for an NLS.
Example 1.3.4. Suppose V is an NLS and let c ∈ R with c ≠ 0. Also fix bo ∈ V . Let F (x) = cx+bo
for each x ∈ V . We wish to calculate limx→a F (x). Naturally, we expect the limit is simply ca + bo
hence we work towards proving our intuition is correct. If ε > 0 then choose δ = ε/|c| and note
0 < kx − ak < δ = ε/|c| provides 0 < |c| kx − ak < ε. With this estimate in mind we calculate:
Example 1.3.5. Let V and W be normed linear spaces. Fix wo ∈ W and define F (x) = wo for
each x ∈ V . I leave it to the reader to prove limx→a (F (x)) = wo for any a ∈ V . In other words, a
constant function is everywhere continuous in the context of a NLS.
Example 1.3.6. Let F : Rn − {a} → Rn be defined by F (x) = (x − a)/||x − a||. In this case, certainly
a is a limit point of F but geometrically it is clear that limx→a F (x) does not exist. Notice for n = 1,
the discontinuity of F at a can be understood by seeing that left and right limits exist, but are not
equal. On the other hand, G(x) = (||x − a||/||x − a||)(x − a) clearly has limx→a G(x) = 0 and we could
classify the discontinuity of G at x = a as removable. Clearly G̃(x) = x − a is a continuous extension
of G to all of Rn .
Proof: Suppose a ∈ V and let > 0. Choose δ = and consider x ∈ V such that 0 < ||x − a|| < δ.
Observe ||x|| = ||x − a + a|| ≤ ||x − a|| + ||a|| = δ + ||a|| and hence
It is generally quite challenging to prove limits directly from the definition. Fortunately, there are
many useful properties which typically allow us to avoid direct attack.6 One fun point to make
here, if you missed the proof of the so-called limit laws in calculus then you can retroactively apply
the arguments we soon offer here.
Proof: Let ε > 0 and suppose limx→a f (x) = b1 ∈ W and limx→a g(x) = b2 ∈ W . Choose δ1 , δ2 > 0
such that 0 < ||x−a|| < δ1 implies ||f (x)−b1 || < ε/2 and 0 < ||x−a|| < δ2 implies ||g(x)−b2 || < ε/2.
Choose δ = min(δ1 , δ2 ) and suppose 0 < ||x − a|| < δ ≤ δ1 , δ2 hence

||(f + g)(x) − (b1 + b2 )|| = ||f (x) − b1 + g(x) − b2 || ≤ ||f (x) − b1 || + ||g(x) − b2 || < ε/2 + ε/2 = ε.

Item (1.) follows. To prove (2.) note that if c = 0 the result is clearly true so suppose c ≠ 0.
Suppose ε > 0 and choose δ > 0 such that 0 < ||x − a|| < δ implies ||f (x) − b1 || < ε/|c|. Note that if
0 < ||x − a|| < δ then

||(cf )(x) − cb1 || = ||c(f (x) − b1 )|| = |c| ||f (x) − b1 || < |c| ε/|c| = ε.
The claims about continuity follow immediately from the limit properties.
Induction easily extends the result above to linear combinations of three or more functions;
lim_{x→a} Σ_{i=1}^n ci Fi (x) = Σ_{i=1}^n ci lim_{x→a} Fi (x).    (1.14)
We now turn to analyzing limits of a map in terms of the limits of its component functions. First
a Lemma which is a slight twist on what we already proved.
We soon need this Lemma to pull basis vectors out of a limit in the proof of Theorem 1.3.11.
Given the limits of each component function we may assemble the limit of the function. Notice,
this is a comment about breaking up the limit in the range of the map. In contrast, there is no
easy way to break a multivariate limit into one-dimensional limits in the domain, hopefully you
saw examples in multivariable calculus which illustrate this subtle point. Only in one dimension
do we have the luxury of reducing a full limit to a pair of path limits. See this question and
answer, beware wolfram alpha not so good here, Maple wins and this master list of advice on how
to calculate multivariate limits that arise in calculus III. There are many examples linked there if
you need to see evidence of my claim here.
Theorem 1.3.11.
Suppose V, W are NLSs where W has basis γ = {w1 , . . . , wm } and F : dom(F ) ⊆ V → W
has component functions Fi : dom(F ) ⊆ V → R for i = 1, . . . , m. If limx→a Fi (x) = Li ∈ R
for i = 1, . . . , m then limx→a F (x) = Σ_{i=1}^m Li wi .
Therefore, the limit of a map may be assembled from the limits of its component functions.
It turns out the converse of this Theorem is also true, but, I need to prepare some preliminary
ideas to give the proof in the desired generality. Basically, the trouble is that at one point in my
proof I need the magnitude of a component to a vector x = x1 v1 + · · · + xn vn to be smaller than
the norm of the whole vector; |xi | ≤ kxk. Certainly this is true for orthonormal bases, but, notice
β = {(1, ε), (1, 0)} is a basis for R2 which is not orthonormal in the euclidean sense for any ε ≠ 0
and:
x = (1, ε) − (1, 0) = (0, ε) (1.17)
hence kxk = |ε| and [x]β = (1, −1) so both components of x in the β basis have magnitude 1. But,
we can make |ε| as small as we like. So, clearly, I cannot just assume for any basis of a NLS we have
this property |xi | ≤ kx1 v1 + · · · + xn vn k. It is a special property for certain nice bases. In fact, it is
true for most examples we consider. You use it a great deal in study of complex analysis as it says
|Re(z)|, |Im(z)| ≤ |z|. But, we’re trying to study abstract NLSs, so we must face the difficulty.
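Here is a numerical version of the warning above (my own sketch, not in the notes): for the basis β = {(1, ε), (1, 0)} the β-coordinates of x = (0, ε) are (1, −1), so the coordinate magnitudes stay at 1 while ||x|| = |ε| can be made as small as we like.

import numpy as np

for eps in [1.0, 1e-3, 1e-6]:
    v1, v2 = np.array([1.0, eps]), np.array([1.0, 0.0])
    x = v1 - v2                               # x = (0, eps)
    # solve for the beta-coordinates of x: x = c1 v1 + c2 v2
    P = np.column_stack([v1, v2])
    coords = np.linalg.solve(P, x)
    print(f"eps={eps:g}: ||x|| = {np.linalg.norm(x):.2e}, [x]_beta = {coords}")
# the coordinates stay (1, -1) while ||x|| = eps shrinks, so |x_i| <= ||x|| fails badly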
thus

Σ_{j=1}^m ( Σ_{i=1}^m F̄i Pij ) wj = Σ_{j=1}^m Fj wj ⇒ Σ_{i=1}^m F̄i Pij = Fj    (1.19)
It always amuses me to see how the basis and components transform inversely. Continuing to use
the notation of the previous Theorem and Lemma,
Proposition 1.3.13.
If limx→a F̄j (x) = Lj for j = 1, . . . , m then limx→a Fi (x) = Σ_{j=1}^m Pij Lj .

Proof: use Lemma 1.3.12 to see Fi (x) = Σ_{j=1}^m Pij F̄j (x). Then, by linearity of the limit,

lim_{x→a} Fi (x) = Σ_{j=1}^m Pij lim_{x→a} F̄j (x) = Σ_{j=1}^m Pij Lj .    (1.20)
The coordinate change results above are most interesting when paired with an additional freedom
to analyze limits in finite dimensional vector spaces.
(1.) The metric topology for a finite dimensional normed linear space is independent of
our choice of norm7 . For example, in R2 , if we find a point is interior with respect
the euclidean norm then it’s easy to see the point is also interior w.r.t. the taxicab
or sup norm. I might assign a homework which helps you prove this claim.
(2.) Given normed linear spaces V, W and a function F : dom(F ) ⊆ V → W , we find
F is continuous if and only if the inverse image under F of each open set in W is
open in V .8
(3.) Since different choices of norm provide the same open sets it follows that the
calculation of a limit in a finite dimensional NLS is in fact independent of the
choice of norm.
Given any basis for finite dimensional real vector space we can construct an inner product by
essentially mimicking the dot-product.
Lemma 1.3.14. existence of inner product which makes given basis orthonormal.
If (V, k k) is a normed linear space with basis β = {v1 , . . . , vn } then ⟨vi , vj ⟩ = δij ex-
tended bilinearly serves to define an inner product for V where β is an orthonormal basis.
Furthermore, if kxk2 = √⟨x, x⟩ and x = x1 v1 + · · · + xn vn then

kxk2 = √( x1² + x2² + · · · + xn² )
7
see this question and the answers for some interesting discussion of this point
8
Notice, we insist that ∅ is open, my apologies if my earlier wording was insufficiently clear on this point.
Theorem 1.3.15.
lim_{x→a} F (x) = B = Σ_{j=1}^m Bj wj ⇔ lim_{x→a} Fj (x) = Bj for all j = 1, 2, . . . , m.
|Fi (x) − Bi | = |Fi (x) − Bi | kwi k = k(Fi (x) − Bi )wi k ≤ kF (x) − Bk.    (1.21)

Hence, for ε > 0 choose δ > 0 such that 0 < kx − ak < δ implies kF (x) − Bk < ε. Hence, by
Inequality 1.21 we find 0 < kx − ak < δ implies |Fi (x) − Bi | < ε for each i = 1, 2, . . . , m. Thus
limx→a Fj (x) = Bj for each j = 1, . . . , m and this remains true when a different norm is given to
W (here I use the result that the limit calculated in a finite dimensional NLS is independent of our
choice of norm since all norms produce the same topology).
The converse direction follows from Theorem 1.3.11, but I include the argument below since it's good
to see. Conversely, suppose limx→a Fj (x) = Bj as x → a for all j ∈ Nm . Let ε > 0 and choose
δj > 0 such that 0 < ||x − a|| < δj implies ||Fj (x) − Bj || < ε/(m||wj ||). We are free to choose such δj by
the given limits as clearly m||wj || > 0 for each j. Choose δ = min{δj | j ∈ Nm } and suppose x ∈ V
such that 0 < ||x − a|| < δ. Using properties ||x + y|| ≤ ||x|| + ||y|| and ||cx|| = |c| ||x|| multiple
times yields:

||F (x) − B|| = ||Σ_{j=1}^m (Fj (x) − Bj )wj || ≤ Σ_{j=1}^m |Fj (x) − Bj | ||wj || < Σ_{j=1}^m (ε/(m||wj ||)) ||wj || = Σ_{j=1}^m ε/m = ε.
Our next goal is to explain why polynomials in coordinates of an NLS are continuous. Many
examples fall into this general category so it’s worth the effort. The first result we need is the
observation that we are free to pull limits out of continuous functions on an NLS:
Suppose V1 , V2 , V3 are normed vector spaces with norms || · ||1 , || · ||2 , || · ||3 respectively. Let
f : dom(f ) ⊆ V2 → V3 and g : dom(g) ⊆ V1 → V2 be mappings. Suppose that
limx→xo g(x) = yo and suppose that f is continuous at yo then

lim_{x→xo} (f ◦ g)(x) = f ( lim_{x→xo} g(x) ).
Proof: Let ε > 0 and choose β > 0 such that 0 < ||y − yo ||2 < β implies ||f (y) − f (yo )||3 < ε. We
can choose such a β since f is continuous at yo , thus limy→yo f (y) = f (yo ).
Next choose δ > 0 such that 0 < ||x − xo ||1 < δ implies ||g(x) − yo ||2 < β. We can choose such
a δ because we are given that limx→xo g(x) = yo . Suppose 0 < ||x − xo ||1 < δ and let y = g(x)
note ||g(x) − yo ||2 < β yields ||y − yo ||2 < β and consequently ||f (y) − f (yo )||3 < ε. Therefore, 0 <
||x−xo ||1 < δ implies ||f (g(x))−f (yo )||3 < ε. It follows that limx→xo f (g(x)) = f (limx→xo g(x)).
The following functions are surprisingly useful as we seek to describe continuity of functions.
Definition 1.3.17.
s(x, y) = x + y,    p(x, y) = xy
Proposition 1.3.18.
The proof that the product is continuous is not entirely trivial, but, once you have it, so many
things follow:
Proposition 1.3.19.
Of course, we can continue to products of three or more factors by iterating the product:
and by an argument much like that given in Equation 1.22 we can argue that the product of three
continuous real-valued functions on a subset of a NLS V is once more continuous. It should be
clear we can extend by induction this result to any product of finitely many real-valued continuous
functions.
Lemma 1.3.20.
Let V be a NLS with basis {v1 , . . . , vn }. Define coordinate function xi : V → R as follows:
given a = a1 v1 + · · · + an vn set xi (a) = ai . Then Φβ = (x1 , x2 , . . . , xn ) and each coordinate
function is continuous on V .
Proof: if a = a1 v1 + · · · + an vn then Φβ (a) = (a1 , . . . , an ) = (x1 (a), . . . , xn (a)) therefore Φβ =
(x1 , . . . , xn ). I leave the proof that xi : V → R is continuous for each i = 1, . . . , m as a likely
homework for the reader.
Definition 1.3.21.
Let x1 , . . . , xn be coordinate functions with respect to basis β for a NLS V then a function
f : V → R such that for constants c0 , ci , cij , . . . , ci1 ,...,ik ∈ R,
f (x) = c0 + Σ_{i=1}^n ci xi + Σ_{i,j=1}^n cij xi xj + · · · + Σ_{i1 ,...,ik} ci1 ,...,ik xi1 · · · xik

is called a multinomial of order k in the coordinates x1 , . . . , xn .
Theorem 1.3.22.
hence det(A) ∈ R[Aij | 1 ≤ i, j ≤ n] is an n-th order multinomial in the coordinates Aij with respect
to the standard matrix basis for R n×n . Thus the determinant is a continuous real-valued function
of matrices.
I’ll let you explain why the complex-valued determinant function on Cn×n is also continuous. Let’s
enjoy the application of these results:
Example 1.3.24. The general linear group GL(n, R) = {A ∈ R n×n | det(A) 6= 0} is an open
subset of Rn×n . To see this notice that GL(n, R) = det−1 ((−∞, 0) ∪ (0, ∞)). But, the determinant
is continuous and the inverse image of open sets is open. Clearly (−∞, 0) ∪ (0, ∞) is open since
each point is interior.
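The openness of GL(n, R) can also be seen experimentally: if det(A) ≠ 0 then every matrix sufficiently close to A still has nonzero determinant. The sketch below (my own, with an arbitrarily chosen A) perturbs an invertible matrix by small random matrices and watches the determinant stay away from zero:

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])              # det(A) = 6, so A is in GL(2, R)

for radius in [1e-1, 1e-2, 1e-3]:
    dets = []
    for _ in range(1000):
        H = rng.normal(size=(2, 2))
        H *= radius / np.linalg.norm(H)  # rescale the perturbation so ||H|| = radius
        dets.append(np.linalg.det(A + H))
    print(f"radius {radius:g}: det(A+H) ranges over [{min(dets):.4f}, {max(dets):.4f}]")
# for small enough radius the determinant never reaches 0, so a whole ball about A lies in GL(2, R)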
To be picky, I have not shown the inverse image of open sets is open for a continuous map on an
NLS, but, I will likely assign that as a homework, so, don’t worry, you’ll get a chance to ponder it.
The squeeze theorem relies heavily on the order properties of R. Generally a normed vector space
has no natural ordering. For example, is 1 > i or is 1 < i in C ? That said, we can state a
squeeze theorem for real-valued functions whose domain reside in a normed vector space. This is
a generalization of what we learned in calculus I. That said, the proof offered below is very similar
to the typical proof which is not given in calculus I9
9
this is lifted word for word from my calculus I notes, however here the meaning of open ball is considerably more
general and the linearity of the limit which is referenced is the one proven earlier in this section
Furthermore, if the limits of f (x) and h(x) exist with limx→a f (x) = limx→a h(x) = L ∈ R
then the limit of g(x) likewise exists and limx→a g(x) = L.
Proof: Suppose f (x) ≤ g(x) for all10 x ∈ Bδ1 (a)o for some δ1 > 0 and also suppose limx→a f (x) =
Lf ∈ R and limx→a g(x) = Lg ∈ R. We wish to prove that Lf ≤ Lg . Suppose otherwise towards a
contradiction. That is, suppose Lf > Lg . Note that limx→a [g(x) − f (x)] = Lg − Lf by the linearity
of the limit. It follows that for ε = (1/2)(Lf − Lg ) > 0 there exists δ2 > 0 such that x ∈ Bδ2 (a)o implies
|g(x) − f (x) − (Lg − Lf )| < ε = (1/2)(Lf − Lg ). Expanding this inequality we have

−(1/2)(Lf − Lg ) < g(x) − f (x) − (Lg − Lf ) < (1/2)(Lf − Lg )

adding Lg − Lf yields,

−(3/2)(Lf − Lg ) < g(x) − f (x) < −(1/2)(Lf − Lg ) < 0.
Thus, f (x) > g(x) for all x ∈ Bδ2 (a)o . But, f (x) ≤ g(x) for all x ∈ Bδ1 (a)o so we find a contradic-
tion for each x ∈ Bδ (a) where δ = min(δ1 , δ2 ). Hence Lf ≤ Lg . The same proof can be applied to
g and h thus the first part of the theorem follows.
Next, we suppose that limx→a f (x) = limx→a h(x) = L ∈ R and f (x) ≤ g(x) ≤ h(x) for all
x ∈ Bδ1 (a) for some δ1 > 0. We seek to show that limx→a g(x) = L. Let ε > 0 and choose δ2 > 0
such that |f (x) − L| < ε and |h(x) − L| < ε for all x ∈ Bδ2 (a)o . We are free to choose such a
δ2 > 0 because the limits of f and h are given at x = a. Choose δ = min(δ1 , δ2 ) and note that if
x ∈ Bδ (a)o then
f (x) ≤ g(x) ≤ h(x)
hence,
f (x) − L ≤ g(x) − L ≤ h(x) − L
but |f (x) − L| < ε and |h(x) − L| < ε imply −ε < f (x) − L and h(x) − L < ε thus

−ε < g(x) − L < ε, that is, |g(x) − L| < ε.

Therefore, for each ε > 0 there exists δ > 0 such that x ∈ Bδ (a)o implies |g(x) − L| < ε so
limx→a g(x) = L.
Our typical use of the theorem above applies to equations of norms from a normed vector space.
The norm takes us from V to R so the theorem above is essential to analyze interesting limits. We
shall make use of it in future analysis.
10
I use the notation Bδ1 (a)o to denote the deleted open ball of radius δ1 centered at a; Bδ1 (a)o = Bδ1 (a) − {a}.
Fortunately all the main examples of this course are built on the real numbers which are complete;
this induces completeness for C, Rn and R m×n . The proof that R, C, Rn and R m×n are Banach
spaces follows from arguments similar to those given in the example below.
Example 1.4.4. Claim: R complete implies R2 is complete.
Proof: suppose (xn , yn ) is a Cauchy sequence in R2 . Therefore, for each ε > 0 there exists N ∈ N
such that m, n ∈ N with N < m < n implies ||(xm , ym ) − (xn , yn )|| < ε. Consider that:

||(xm , ym ) − (xn , yn )|| = √( (xm − xn )² + (ym − yn )² )
Therefore, as |xm − xn | = √( (xm − xn )² ), it is clear that:

|xm − xn | ≤ ||(xm , ym ) − (xn , yn )||
But, this proves that {xn } is a Cauchy sequence of real numbers since for each ε > 0 we can choose
N > 0 such that N < m < n implies |xm − xn | < ε. The same holds true for the sequence {yn }.
By completeness of R we have xn → x and yn → y as n → ∞. We propose that (xn , yn ) → (x, y).
Let ε > 0 once more and choose Nx > 0 such that n > Nx implies |xn − x| < ε/2 and Ny > 0 such
that n > Ny implies |yn − y| < ε/2. Let N = max(Nx , Ny ) and suppose n > N :

||(xn , yn ) − (x, y)|| = ||(xn − x, 0) + (0, yn − y)|| ≤ |xn − x| + |yn − y| < ε/2 + ε/2 = ε.
The key point here is that components of a Cauchy sequence form Cauchy sequences in R. That
will also be true for sets of matrices and complex numbers.
for such A ∈ R n×n as the series above converges. Convergence of a series of matrices is measured
by the convergence of the sequence of partial sums. For eA the n-th partial sum is simply:
Sn = Σ_{k=0}^{n−1} (1/k!) A^k = I + A + · · · + (1/(n−1)!) A^{n−1}    (1.25)
The identity kABk ≤ kAk kBk inductively extends to kA^k k ≤ kAk^k for k ∈ N. With this identity
and the triangle inequality we find:

kSm − Sn k ≤ Σ_{k=n}^{m−1} (1/k!) kAk^k = sm − sn    (1.27)
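To watch the Cauchy estimate in action, the following sketch (my own addition, not from the notes) computes partial sums Sn of the matrix exponential series for a sample A and compares kSm − Sn k against the scalar tail sm − sn ; I use the operator 2-norm, which satisfies the needed submultiplicative identity, and A itself is an arbitrary choice.

import numpy as np
from math import factorial

A = np.array([[0.0, 1.0],
              [-2.0, 0.5]])
normA = np.linalg.norm(A, 2)              # operator 2-norm, submultiplicative

def S(n):
    # n-th partial sum S_n = sum_{k=0}^{n-1} A^k / k!
    total = np.zeros_like(A)
    term = np.eye(2)                      # holds A^k
    for k in range(n):
        total = total + term / factorial(k)
        term = term @ A
    return total

def s(n):
    # scalar partial sums of e^{||A||}
    return sum(normA ** k / factorial(k) for k in range(n))

for n, m in [(3, 6), (6, 10), (10, 15)]:
    lhs = np.linalg.norm(S(m) - S(n), 2)
    rhs = s(m) - s(n)
    print(f"||S_{m} - S_{n}|| = {lhs:.3e} <= s_{m} - s_{n} = {rhs:.3e}")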
I’m fond of the argument above, it was shown to me in some course I took with R.O Fulp, maybe
a few courses. There is another argument from linear algebra which uses the real Jordan form.
Since A = P −1 JP for some P ∈ GL(n, R) and e^J is easily calculated we obtain existence of e^A
from the fact that e^J = e^{P AP^{−1}} = P e^A P^{−1} . But, admittedly, it does take a little work to prove the
existence of the real Jordan form for any A ∈ R n×n . I bet there are many other arguments to show
eA is well-defined. The abstract concept of the exponential is much more useful than you might
first expect. The past two summers I learned that an exponential on the appropriate algebra solves any
constant coefficient ODE, even when the coefficients are taken from algebras with all sorts of weird
features.
Chapter 2
differentiation
Our goal in this chapter is to describe differentiation for functions to and from normed linear spaces.
It turns out this is actually quite simple given the background of the preceding chapter. The dif-
ferential at a point is a linear transformation which best approximates the change in a function at
a particular point. We can quantify "best" by a limiting process which is naturally defined in view
of the fact there is a norm on the spaces we consider.
The most important example is of course the case f : Rn → Rm . In this context it is natural
to write the differential as a matrix multiplication. The matrix of the differential is known as
the Jacobian matrix. Partial derivatives are also defined in terms of directional derivatives. The
directional derivative is sometimes defined where the differential fails to exist. We will discuss how
the criteria of continuous differentiability allows us to build the differential from the directional
derivatives. We study how the general concept of Frechet differentiation recovers all the derivatives
you’ve seen previously in calculus and much more.
The general theory of differentiation is a bit of an adjustment from our previous experience dif-
ferentiating. Dieudonne said it best: this is the introduction to his chapter on differentiation in
Modern Analysis Chapter VIII.
The subject matter of this Chapter is nothing else but the elementary theorems of
Calculus, which however are presented in a way which will probably be new to most
students. That presentation, which throughout adheres strictly to our general ”geomet-
ric” outlook on Analysis, aims at keeping as close as possible to the fundamental idea
of Calculus, namely the ”local” approximation of functions by linear functions. In
the classical teaching of Calculus, the idea is immediately obscured by the
accidental fact that, on a one-dimensional vector space, there is a one-to-
one correspondence between linear forms and numbers, and therefore the
derivative at a point is defined as a number instead of a linear form. This
slavish subservience to the shibboleth1 of numerical interpretation at any
cost becomes much worse when dealing with functions of several variables...
Dieudonne then spends the next half page continuing this thought with explicit examples of how
this custom of our calculus presentation injures the conceptual generalization. If you want to see
1
from Wikipedia: a shibboleth is a word, sound, or custom that a person unfamiliar with its significance may not pronounce
or perform correctly relative to those who are familiar with it. It is used to identify foreigners or those who do not
belong to a particular class or group of people. It also refers to features of language, and particularly to a word or
phrase whose pronunciation identifies a speaker as belonging to a particular group.
differentiation written for mathematicians, that is the place to look. He proves many results for
infinite dimensions because, well, why not?
In this chapter I define the Frechet differential and exhibit a number of abstract examples. Then we
turn to proving the basic properties of the Frechet derivative including linearity and the chain rule.
My proof of the chain rule has a bit of a gap, but, I hope the argument gives you some intuition as
to why we should expect a chain rule. Next we explore partial derivatives in an NLS with respect
to a given abstract basis. After that we focus on Rn . Many many examples of Jacobians are given.
We study a few perverse examples which fail to be continuously differentiable. We show continuous
differentiability implies differentiability by a standard, but interesting, argument. I prove a quite
general product rule, discuss the problem of higher derivatives in the abstract (I punt details to
Zorich for now, sorry Fall 2017). Finally, I share some insights I’ve recently come to understand
about A-Calculus. In particular, I discuss some of the rudiments of differentiating with respect to
algebra variables.
Definition 2.1.1.
Let (V, || · ||V ) and (W, || · ||W ) be normed vector spaces. Suppose that U is open and
F : U ⊆ V → W is a function then we say that F is differentiable at a ∈ U iff there exists
a linear mapping L : V → W such that
lim_{h→0} [F (a + h) − F (a) − L(h)] / ||h||V = 0.
In such a case we call the linear mapping L the differential at a and we denote L = dFa .
In the case V = Rn and W = Rm the matrix of the differential is called the derivative of
F at a or the Jacobian matrix of F at a and we denote [dFa ] = F 0 (a) ∈ R m×n which
means that dFa (v) = F 0 (a)v for all v ∈ Rn .
Notice this definition gives an equation which implicitly defines dFa . For the moment the only way
we have to calculate dFa is educated guessing. We simply use brute-force calculation to suggest
a guess for L which forces the Frechet quotient to vanish. In the next section we’ll discover a
systematic calculational method for functions on euclidean spaces. The purpose of this section is
to understand the definition of the differential and to connect it to basic calculus. I’ll begin with
basic calculus as you probably are itching to understand where your beloved difference quotient
has gone:
2
Some authors might put a norm in the numerator of the quotient. That is an equivalent condition since a function
g : V → W has limh→0 g(h) = 0 iff limh→0 ||g(h)||W = 0
It is a simple exercise to show that if lim(A − B) = 0 and lim(B) exists then lim(A) exists and
lim(A) = lim(B). Identify A = [f (x + h) − f (x)]/h and B = dfx (h)/h. Therefore,

m = lim_{h→0} [f (x + h) − f (x)] / h .
Consequently, we find the 1 × 1 matrix m of the differential is precisely f 0 (x) as we defined it via
a difference quotient in first semester calculus. In summary, we find dfx (h) = f 0 (x)h . In other
words, if a function is differentiable in the sense we defined at the beginning of this chapter then it
is differentiable in the terminology we used in calculus I. Moreover, the derivative at x is precisely
the matrix of the differential.
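A tiny numerical check of the one-dimensional case (mine, not from the notes): for f (x) = x² at x = 3 with the candidate dfx (h) = 2xh, the Frechet quotient works out to exactly h, so it visibly tends to zero.

def f(x):
    return x ** 2

def df(x, h):
    # candidate differential: df_x(h) = f'(x) h with f'(x) = 2x
    return 2 * x * h

x = 3.0
for h in [1e-1, 1e-3, 1e-5]:
    quotient = (f(x + h) - f(x) - df(x, h)) / h
    print(h, quotient)      # the quotient equals h here, so it tends to 0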
Remark 2.1.3.
Incidentally, I should mention that dfx is the differential of f at the point x. The differential
of f would be the mapping x 7→ dfx . Technically, the differential df is a function from R to
the set of linear transformations on R. You can contrast this view with that of first semester
calculus. There we say the mapping x 7→ f 0 (x) defines the derivative f 0 as a function from
R to R. This simplification in perspective is only possible because calculus in one-dimension
is so special. More on this later. This distinction is especially important to understand if
you begin to look at questions of higher derivatives.
Example 2.1.4. Suppose T : V → W is a linear transformation of normed vector spaces V and W .
I propose L = T . In other words, I think we can show the best linear approximation to the change
in a linear function is simply the function itself. Clearly L is linear since T is linear. Consider the
difference quotient:
[T (a + h) − T (a) − L(h)] / ||h||V = [T (a) + T (h) − T (a) − T (h)] / ||h||V = 0 / ||h||V .
Note h ≠ 0 implies ||h||V ≠ 0 by the definition of the norm. Hence the limit of the difference quotient
vanishes since it is identically zero for every nonzero value of h. We conclude that dTa = T .
3
unless we state otherwise, Rn is assumed to have the euclidean norm, in this case ||x||R = √(x²) = |x|.
Example 2.1.5. Let T : V → W where V and W are normed vector spaces and define T (v) = wo
for all v ∈ V . I claim the differential is the zero transformation. Linearity of L(v) = 0 is trivially
verified. Consider the difference quotient:
[T (a + h) − T (a) − L(h)] / ||h||V = [wo − wo − 0] / ||h||V = 0 / ||h||V .
Using the arguments of the preceding example, we find dTa = 0.
Typically the difference quotient is not identically zero. The pair of examples above are very special
cases. Our next example requires a bit more thought:
Example 2.1.6. Suppose F : R2 → R3 is defined by F (x, y) = (xy, x2 , x + 3y) for all (x, y) ∈ R2 .
Consider the difference function ΔF at (x, y): ΔF = F (x + h, y + k) − F (x, y). Calculate,

ΔF = ( (x + h)(y + k) − xy, (x + h)² − x², (x + h) + 3(y + k) − (x + 3y) ) = ( xk + hy + hk, 2xh + h², h + 3k ).

Identify the linear part of ΔF as a good candidate for the differential. I claim that:

L(h, k) = ( xk + hy, 2xh, h + 3k ).
therefore L : R2 → R3 is manifestly linear. Use the algebra above to simplify the difference quotient
below:
lim_{(h,k)→(0,0)} [ΔF − L(h, k)] / ||(h, k)|| = lim_{(h,k)→(0,0)} (hk, h², 0) / ||(h, k)||

Note ||(h, k)|| = √(h² + k²) therefore we face the task of showing that (hk, h², 0)/√(h² + k²) → (0, 0, 0)
as (h, k) → (0, 0). Notice that ||(hk, h², 0)|| = |h| √(h² + k²). Therefore, as (h, k) → 0 we find

|| (hk, h², 0) / √(h² + k²) || = |h| → 0.
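As a numerical companion to Example 2.1.6 (my own sketch, not from the notes), the quotient above can be evaluated at (x, y) = (1, −2) along a shrinking sequence of displacements; with L(h, k) = (xk + hy, 2xh, h + 3k) it collapses to zero as predicted. The direction of approach below is an arbitrary choice.

import numpy as np

def F(x, y):
    return np.array([x * y, x ** 2, x + 3 * y])

def L(x, y, h, k):
    # candidate differential from Example 2.1.6
    return np.array([x * k + h * y, 2 * x * h, h + 3 * k])

x, y = 1.0, -2.0
for t in [1e-1, 1e-3, 1e-5]:
    h, k = t, -2 * t                  # approach (0,0) along a fixed direction
    num = F(x + h, y + k) - F(x, y) - L(x, y, h, k)
    print(t, np.linalg.norm(num) / np.hypot(h, k))
# the printed quotients shrink like t, just as the estimate above predicts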
Let (V, || · ||V ) and (W, || · ||W ) be normed vector spaces and suppose F : dom(F ) ⊆ V → W
is differentiable at p then the linearization of F at p is given by LpF (x) = F (p)+dFp (x−p)
for all x ∈ V . We also say LpF : V → W is the affinization of F at p.
Perhaps the term linearization is a holdover from the terminology linear function of the form
f (x) = mx + b. Of course, this is an offense to the student of pure linear algebra. Unless b = 0 such
a map is not technically linear. What is it? It’s an affine function. So, I added the terminology
affinization of F to the definition above. However, I must admit, I don’t think that terminology
is standard. Much can be said about affine maps of normed linear spaces, I probably fail to paint
the big picture of affine maps in these notes. Maybe I should make it homework...
Example 2.1.8. Suppose F : R2 → R3 is defined by F (x, y) = (xy, x2 , x + 3y) for all (x, y) ∈ R2
then calculate the linearization of F at (1, −2). Following Example 2.1.6 we find

df(x,y) (h, k) = [ y    x ]
                [ 2x   0 ] (h, k)T
                [ 1    3 ]

hence

df(1,−2) (h, k) = [ −2   1 ]
                 [  2   0 ] (h, k)T .
                 [  1   3 ]
Consider the problem of calculating limh→0 [F (t + h) − F (t) − L(h)] / h. We observe that a complex function
converges to zero iff the real and imaginary parts of the function separately converge to zero (this
is covered by Theorem 1.3.22). By differentiability of U and V we find again using Example 2.1.2
lim_{h→0} (1/h) [ U (t + h) − U (t) − U ′(t)h ] = 0        lim_{h→0} (1/h) [ V (t + h) − V (t) − V ′(t)h ] = 0.
Therefore, dFt (h) = (U ′(t) + iV ′(t))h. Note that the quantity U ′(t) + iV ′(t) is not a real matrix
in this case. To write the derivative in terms of a real matrix multiplication we need to construct
some further notation which makes use of the isomorphism between C and R2 . Actually, it's pretty
easy if you agree that a + ib = (a, b) then dFt (h) = (U ′(t), V ′(t))h so the matrix of the differential
is (U ′(t), V ′(t)) ∈ R2×1 which makes sense since F : R → C ≈ R2 .
Example 2.1.10. Suppose V is a normed vector space with basis β = {f1 , f2 , . . . , fn }. Furthermore,
let G : I ⊆ R → V be defined by

G(t) = G1 (t)f1 + · · · + Gn (t)fn
Supposing each component function Gi is differentiable at t, consider:

lim_{h→0} (1/h) Σ_{i=1}^n [ Gi (t + h) − Gi (t) − h dGi /dt ] fi = Σ_{i=1}^n ( lim_{h→0} [ Gi (t + h) − Gi (t) − h dGi /dt ] / h ) fi = 0
where the zero above follows from the supposed differentiability of each component function. It
follows that:
dGt (h) = h Σ_{i=1}^n (dGi /dt) fi
(1.) V = R, functions on R, f : R → R
(2.) V = Rn , space curves in Rn , ~r : R → Rn
(3.) V = C, complex-valued functions of a real variable, f = u + iv : R → C
(4.) V = R m×n , matrix-valued functions of a real variable, F : R → R m×n .
In short, when we differentiate a function which has a real domain then we can define the derivative
of such a function by component-wise differentiation. It gets more interesting when the domain has
several independent variables as Examples 2.1.6 and 2.1.11 illustrate.
ΔF = F (A + H) − F (A) = (A + H)(A + H) − A² = AH + HA + H² . Identify L(H) = AH + HA.
Hence L : R n×n → R n×n is a linear transformation. By construction of L the linear terms in the
numerator cancel leaving just the quadratic term,
lim_{H→0} [F (A + H) − F (A) − L(H)] / ||H|| = lim_{H→0} H² / ||H|| .
It suffices to show that limH→0 ||H²|| / ||H|| = 0 since lim(||g||) = 0 iff lim(g) = 0 in a normed vector
space. Fortunately the normed vector space R n×n is actually a Banach algebra. A vector space
with a multiplication operation is called an algebra. In the current context the multiplication is
simply matrix multiplication. A Banach algebra is a normed vector space with a multiplication that
satisfies ||XY || ≤ ||X|| ||Y ||. Thanks to this inequality4 we can calculate our limit via the squeeze
theorem. Observe 0 ≤ ||H²||/||H|| ≤ ||H||. As H → 0 it follows ||H|| → 0 hence limH→0 ||H²||/||H|| = 0. We
find dFA (H) = AH + HA.
4
it does take a bit of effort to prove this inequality holds for the matrix norm, I omit it since it would be distracting
here
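Here is a numerical companion to the F (A) = A² example (my own sketch, using the Frobenius norm, which is also submultiplicative; the matrix A is a random choice): the quotient ||F (A + H) − F (A) − (AH + HA)||/||H|| equals ||H²||/||H|| ≤ ||H|| and collapses as H → 0.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))

def F(M):
    return M @ M

for scale in [1e-1, 1e-3, 1e-5]:
    H = rng.normal(size=(3, 3))
    H *= scale / np.linalg.norm(H)        # rescale so ||H|| = scale (Frobenius norm)
    quotient = np.linalg.norm(F(A + H) - F(A) - (A @ H + H @ A)) / np.linalg.norm(H)
    print(scale, quotient)                # bounded by ||H|| = scale, so it tends to 0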
0 ≤ kηF (h)k = ( kηF (h)k / khk ) khk < ε khk.    (2.5)
Thus kηF (h)k → 0 as h → 0 by the squeeze theorem. Consequently,
lim_{h→0} ηF (h) = 0.    (2.6)
Proof: Let ηF (h) = F (a + h) − F (a) − dFa (h) and ηG (h) = G(a + h) − G(a) − dGa (h) for all h ∈ V .
Assume F and G differentiable at a hence limh→0 ηF (h)/khk = 0 and limh→0 ηG (h)/khk = 0. Moreover,
dFa , dGa : V → W are linear hence cdFa + dGa : V → W is linear. Hence calculate,
ηcF +G (h) = (cF + G)(a + h) − (cF + G)(a) − (cdFa + dGa )(h) (2.7)
= c (F (a + h) − F (a) − dFa (h)) + G(a + h) − G(a) − dGa (h)
= cηF (h) + ηG (h)
Therefore, by Proposition 1.3.8, we complete the proof:
lim_{h→0} ηcF +G (h)/khk = lim_{h→0} [ cηF (h) + ηG (h) ] / khk = c lim_{h→0} ηF (h)/khk + lim_{h→0} ηG (h)/khk = 0.
The proof I offer here is not quite complete. The main ideas are here, but, there is a pesky term at
the end which I have not quite pinned down to my liking. I found these notes by J. C. M. Grajales
on page 40 have a proof which appears complete.
Proof: since G is differentiable at a we have the existence of ηG continuous at h = 0 defined by:
ηG (h) = G(a + h) − G(a) − dGa (h) (2.9)
Also, by differentiability of F at G(a) we have the existence of ηF continuous at k = 0 given by:
ηF (k) = F (G(a) + k) − F (G(a)) − dFG(a) (k) (2.10)
Furthermore, the differentials are linear transformations and thus their composite dFG(a) ◦ dGa is
likewise linear. It remains to show ηF ◦ G formed with dFG(a) ◦ dGa has the needed limiting property.
Thus consider,
ηF ◦ G (h) = (F ◦ G)(a + h) − (F ◦ G)(a) − (dFG(a) ◦ dGa )(h) (2.11)
= F (G(a + h)) − F (G(a)) − dFG(a) (dGa (h))
= F (G(a) + dGa (h) + ηG (h)) − F (G(a)) − dFG(a) (dGa (h))
= F (G(a)) + dFG(a) (dGa (h) + ηG (h)) + ηF (dGa (h) + ηG (h))
− F (G(a)) − dFG(a) (dGa (h))
= dFG(a) (ηG (h)) + ηF (dGa (h) + ηG (h))
where I used Equation 2.10 to make the expansion marked in blue. I need a bit of notation to help
guide the remainder of the proof:
ηF ◦ G (h)/khk = (1/khk) dFG(a) (ηG (h)) + (1/khk) ηF (dGa (h) + ηG (h))    (2.12)

where (I.) denotes the first term and (II.) the second term on the right hand side.
We can understand (I.) using linearity and continuity of the linear map dFG(a) :
lim_{h→0} (1/khk) dFG(a) (ηG (h)) = lim_{h→0} dFG(a) ( ηG (h)/khk ) = dFG(a) ( lim_{h→0} ηG (h)/khk ) = dFG(a) (0) = 0.    (2.13)
Dv F (a) = lim_{h→0} [ F (a + hv) − F (a) ] / h
One great contrast we should pause to note is that the definition of the directional derivative is
explicit whereas the definition of the differential was implicit. Naturally, if we take V = W = R
then we recover the first semester difference quotient definition of the derivative at a point. This
also reproduces the directional derivatives you were shown in multivariate calculus, except, we do
not insist v have kvk = 1. Don’t be fooled by the proof of the next Theorem, it’s easier than it
looks. Summary: since differentiability at a point controls the change of the map in all directions
at a point in terms of the differential we can control the change in the map in a particular direction
at the given point via the differential.
Theorem 2.3.3. Differentiability implies directional differentiability.
Let V, W be real normed linear spaces. If F : U ⊆ V → W is differentiable at a ∈ U then
the directional derivative Dv F (a) exists for each v ∈ V and Dv F (a) = dFa (v).
Proof: Suppose a ∈ U such that dFa is well-defined then we are given that
lim_{h→0} [ F (a + h) − F (a) − dFa (h) ] / ||h|| = 0.
This is a limit in V , when it exists it follows that the limits that approach the origin along particular
paths also exist and are zero. Consider the path t ↦ tv for v ≠ 0 and t > 0, we find

lim_{tv→0, t>0} [ F (a + tv) − F (a) − dFa (tv) ] / ||tv|| = (1/||v||) lim_{t→0+} [ F (a + tv) − F (a) − t dFa (v) ] / |t| = 0.
Hence, as |t| = t for t > 0 we find
lim_{t→0+} [ F (a + tv) − F (a) ] / t = lim_{t→0} t dFa (v) / t = dFa (v).
Likewise we can consider the path t ↦ tv for v ≠ 0 and t < 0

lim_{tv→0, t<0} [ F (a + tv) − F (a) − dFa (tv) ] / ||tv|| = (1/||v||) lim_{t→0−} [ F (a + tv) − F (a) − t dFa (v) ] / |t| = 0.
Note |t| = −t thus the limit above yields
lim_{t→0−} [ F (a + tv) − F (a) ] / (−t) = lim_{t→0−} t dFa (v) / (−t) ⇒ lim_{t→0−} [ F (a + tv) − F (a) ] / t = dFa (v).
Therefore,
lim_{t→0} [ F (a + tv) − F (a) ] / t = dFa (v)
and we conclude that Dv F (a) = dFa (v) for all v ∈ V since the v = 0 case follows trivially.
Partial derivatives are just directional derivatives in standard directions. In particular, given a
basis β = {v1 , . . . , vn } with coordinate maps x1 , . . . , xn there is a standard concept of partial
differentiation on an NLS:
Definition 2.3.4. partial derivative with respect to coordinate on an NLS.
Let V be a NLS with basis β = {v1 , . . . , vn } and coordinates Φβ = (x1 , . . . , xn ). Then if
F : dom(F ) ⊆ V → W we define, for such points a ∈ dom(F ) as the limit exists,
∂F/∂xi (a) = Dvi F (a) = lim_{h→0} [ F (a + hvi ) − F (a) ] / h .
If we know a map of normed linear spaces is differentiable then we can express the differential in
terms of partial derivatives.
Theorem 2.3.6. differentials can be built from partial derivatives.
Let V, W be real normed linear spaces where V has basis β = {v1 , . . . , vn } with coordinates
x1 , . . . , xn . If F : dom(F ) ⊆ V → W is differentiable at a and h = h1 v1 + · · · + hn vn then

dFa (h) = Σ_{i=1}^n hi ∂F/∂xi (a).
I should emphasize, at this point in our development, we cannot conclude the differential exists
merely from partial derivatives existing6 . The example above is reasonable because we have already
shown differentiability of the F (A) = A2 map in Example 2.1.11.
Remark 2.3.8.
I have deliberately defined the derivative in slightly more generality than we need for this
course. It’s probably not much trouble to continue to develop the theory of differentiation
for a normed vector space, however I will for the most part stop here modulo an example
here or there. If you understand many of the theorems that follow from here on out for
Rn then it is a simple matter to transfer arguments to the setting of a Banach space by
using an appropriate isomorphism. Traditionally this type of course only covers continuous
differentiability, inverse and implicit function theorems in the context of mappings from
Rn to Rm .
For the reader interested in generalizing these results to the context of an abstract normed
vector space feel free to discuss it with me sometime. Also, if you want to read a master
on these topics you could look at the text by Shlomo Sternberg on Advanced Calculus.
He develops many things for normed spaces. Or, take a look at Dieudonne’s Modern
Analysis which pays special attention to reaping infinite dimensional results from our finite-
dimensional arguments. I also find Zorich’s two volume set on Mathematical Analysis is
quite helpful. I’m hoping to borrow some arguments from Zorich in this update to my notes.
Any of these texts would be good to read to follow-up my course with something deeper.
6
we study this in depth in Section 2.4.
Similar pictures can be imagined for partial derivatives of more variables, even for vector-valued
maps, but direct visualization is not possible (at least for me).
The proposition below shows how the differential of a m-vector-valued function of n-real variables
is connected to a matrix of partial derivatives.
Proposition 2.3.10.
If F : U ⊆ Rn → Rm is differentiable at a ∈ U then the differential dFa has derivative
matrix F 0 (a) and it has components which are expressed in terms of partial derivatives of
the component functions:
[dFa ]ij = ∂j Fi
for 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Perhaps it is helpful to expand the derivative matrix explicitly for future reference:
           [ ∂1 F1 (a)   ∂2 F1 (a)   · · ·   ∂n F1 (a) ]
F ′(a) =   [ ∂1 F2 (a)   ∂2 F2 (a)   · · ·   ∂n F2 (a) ]
           [    ..           ..        ..        ..    ]
           [ ∂1 Fm (a)   ∂2 Fm (a)   · · ·   ∂n Fm (a) ]
Let’s write the operation of the differential for a differentiable mapping at some point a ∈ Rn in
terms of the explicit matrix multiplication by F ′(a). Let v = (v1 , v2 , . . . , vn ) ∈ Rn ,

                     [ ∂1 F1 (a)   ∂2 F1 (a)   · · ·   ∂n F1 (a) ] [ v1 ]
dFa (v) = F ′(a)v =  [ ∂1 F2 (a)   ∂2 F2 (a)   · · ·   ∂n F2 (a) ] [ v2 ]
                     [    ..           ..        ..        ..    ] [ .. ]
                     [ ∂1 Fm (a)   ∂2 Fm (a)   · · ·   ∂n Fm (a) ] [ vn ]
You may recall the notation from calculus III at this point, omitting the a-dependence,

∇Fj = grad(Fj ) = [ ∂1 Fj , ∂2 Fj , · · · , ∂n Fj ]T

So if the derivative exists we can write it in terms of a stack of gradient vectors of the component
functions (I used a transpose to write the stack side-ways):

F ′ = [ ∇F1 | ∇F2 | · · · | ∇Fm ]T

that is,

      [ ∂1 F1    ∂2 F1    · · ·    ∂n F1 ]                                       [ (∇F1 )T ]
F ′ = [ ∂1 F2    ∂2 F2    · · ·    ∂n F2 ]  =  [ ∂1 F | ∂2 F | · · · | ∂n F ]  =  [ (∇F2 )T ]
      [   ..       ..       ..       ..  ]                                       [   ..    ]
      [ ∂1 Fm    ∂2 Fm    · · ·    ∂n Fm ]                                       [ (∇Fm )T ]
Example 2.3.11. Recall that in Example 2.1.6 we showed that $F : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $F(x,y) = (xy, x^2, x+3y)$ for all $(x,y) \in \mathbb{R}^2$ was differentiable. In fact we calculated that
$$dF_{(x,y)}(h,k) = \begin{bmatrix} y & x \\ 2x & 0 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} h \\ k \end{bmatrix}.$$
If you recall from calculus III the mechanics of partial differentiation it's simple to see that
$$\frac{\partial F}{\partial x} = \frac{\partial}{\partial x}(xy, x^2, x+3y) = (y, 2x, 1) = \begin{bmatrix} y \\ 2x \\ 1 \end{bmatrix} \qquad \text{and} \qquad \frac{\partial F}{\partial y} = \frac{\partial}{\partial y}(xy, x^2, x+3y) = (x, 0, 3) = \begin{bmatrix} x \\ 0 \\ 3 \end{bmatrix}$$
Thus $[dF] = [\,\partial_x F\,|\,\partial_y F\,]$ (as we expect given the derivations in this section!)
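For readers who like to double-check such computations with a computer, here is a small sketch (assuming sympy is available; this is only an illustration, not part of the development) reproducing the Jacobian of Example 2.3.11 and confirming it is the concatenation of the two partial derivative columns.

    # Sketch: verify [dF] = [ d_x F | d_y F ] for F(x,y) = (xy, x^2, x+3y).
    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    F = sp.Matrix([x*y, x**2, x + 3*y])

    J = F.jacobian([x, y])       # the derivative matrix F'(x,y)
    dxF = F.diff(x)              # column of partial derivatives in x
    dyF = F.diff(y)              # column of partial derivatives in y

    print(J == sp.Matrix.hstack(dxF, dyF))   # True
    print(J)                                 # Matrix([[y, x], [2*x, 0], [1, 3]])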
Directional derivatives and partial derivatives are of secondary importance in this course. They are
merely the substructure of what is truly of interest: the differential. That said, it is useful to know
how to construct directional derivatives via partial derivative formulas. In fact, in careless calculus texts this is sometimes presented as the definition.
Proposition 2.3.12.
If $F : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in U$ then the directional derivative $D_v F(a)$ can be expressed as a sum of partial derivative maps for each $v = \langle v_1, v_2, \dots, v_n \rangle \in \mathbb{R}^n$:
$$D_v F(a) = \sum_{j=1}^n v_j \, \partial_j F(a).$$
Proof: since F is differentiable at a the differential dFa exists and Dv F (a) = dFa (v) for all v ∈ Rn .
Use linearity of the differential to calculate that
Dv F (a) = dFa (v1 e1 + · · · + vn en ) = v1 dFa (e1 ) + · · · + vn dFa (en ).
Note dFa (ej ) = Dej F (a) = ∂j F (a) and the prop. follows.
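A quick numerical sanity check of the proposition (a sketch only; the map and the point below are my own choices): compare the difference quotient for $D_v F(a)$ with the weighted sum of partial derivatives.

    # Sketch: D_v F(a) ~ sum_j v_j * d_j F(a), checked by finite differences.
    import numpy as np

    def F(p):
        x, y = p
        return np.array([x*y, x**2, x + 3*y])

    a = np.array([1.0, 2.0])
    v = np.array([0.7, -0.3])
    t = 1e-6

    Dv_numeric = (F(a + t*v) - F(a)) / t          # difference quotient

    e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    d1F = (F(a + t*e1) - F(a)) / t                # approximate d_1 F(a)
    d2F = (F(a + t*e2) - F(a)) / t                # approximate d_2 F(a)
    Dv_from_partials = v[0]*d1F + v[1]*d2F

    print(np.allclose(Dv_numeric, Dv_from_partials, atol=1e-4))   # True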
Example 2.3.13. Suppose $f : \mathbb{R}^3 \to \mathbb{R}$; then $\nabla f = [\partial_x f, \partial_y f, \partial_z f]^T$ and we can write the directional derivative as
$$D_v f = (\nabla f)^T v = \nabla f \cdot v.$$
If we insist that $\|v\| = 1$ then we recover the standard directional derivative we discuss in calculus III. Naturally, $\|\nabla f(a)\|$ yields the maximum value for the directional derivative at $a$ if we limit the inputs to vectors of unit length. If we did not limit the vectors to unit length then the directional derivative at $a$ could become arbitrarily large, as $D_v f(a)$ is proportional to the magnitude of $v$. Since our primary motivation in calculus III was describing rates of change along certain directions for some multivariate function, it made sense to specialize the directional derivative to vectors of unit length. The definition used in these notes better serves the theoretical discussion.
Example 2.3.14. Let $f(t) = (t, t^2, t^3)$; then $f'(t) = (1, 2t, 3t^2)$. In this case we have
$$f'(t) = [df_t] = \begin{bmatrix} 1 \\ 2t \\ 3t^2 \end{bmatrix}$$
Example 2.3.15. Let $f(\vec{x}, \vec{y}) = \vec{x}\cdot\vec{y}$ be a mapping from $\mathbb{R}^3\times\mathbb{R}^3 \to \mathbb{R}$. I'll denote the coordinates in the domain by $(x_1,x_2,x_3,y_1,y_2,y_3)$, thus $f(\vec{x},\vec{y}) = x_1y_1 + x_2y_2 + x_3y_3$. Calculate,
$$f'(\vec{x},\vec{y}) = [\,y_1,\; y_2,\; y_3,\; x_1,\; x_2,\; x_3\,].$$
Example 2.3.16. Let $f(\vec{x},\vec{y}) = \vec{x}\cdot\vec{y}$ be a mapping from $\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$. I'll denote the coordinates in the domain by $(x_1,\dots,x_n,y_1,\dots,y_n)$, thus $f(\vec{x},\vec{y}) = \sum_{i=1}^n x_iy_i$. Calculate,
$$\frac{\partial}{\partial x_j}\sum_{i=1}^n x_iy_i = \sum_{i=1}^n \frac{\partial x_i}{\partial x_j}y_i = \sum_{i=1}^n \delta_{ij}y_i = y_j$$
Likewise,
$$\frac{\partial}{\partial y_j}\sum_{i=1}^n x_iy_i = \sum_{i=1}^n x_i\frac{\partial y_i}{\partial y_j} = \sum_{i=1}^n x_i\delta_{ij} = x_j$$
Consider next $F(x,y,z) = (xyz, y, z)$, for which $\partial_x F = (yz, 0, 0)$, $\partial_y F = (xz, 1, 0)$ and $\partial_z F = (xy, 0, 1)$. Remember these are actually column vectors in my sneaky notation; $(v_1, \dots, v_n) = [v_1, \dots, v_n]^T$. This means the derivative or Jacobian matrix of $F$ at $(x,y,z)$ is
$$F'(x,y,z) = [dF_{(x,y,z)}] = \begin{bmatrix} yz & xz & xy \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Example 2.3.20. Suppose $P(x,v,m) = (P_0, P_1) = \big( \tfrac{1}{2}mv^2 + \tfrac{1}{2}kx^2, \; mv \big)$ for some constant $k$. Let's calculate the derivative via gradients this time: $\nabla P_0 = (kx, mv, \tfrac{1}{2}v^2)$ and $\nabla P_1 = (0, m, v)$, hence
$$P'(x,v,m) = \begin{bmatrix} (\nabla P_0)^T \\ (\nabla P_1)^T \end{bmatrix} = \begin{bmatrix} kx & mv & \tfrac{1}{2}v^2 \\ 0 & m & v \end{bmatrix}.$$
Example 2.3.21. Let $F(r,\theta) = (r\cos\theta, r\sin\theta)$. Then $\partial_r F = (\cos\theta, \sin\theta)$ and $\partial_\theta F = (-r\sin\theta, r\cos\theta)$. Hence,
$$F'(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$$
Example 2.3.22. Let $G(x,y) = \big( \sqrt{x^2+y^2}, \; \tan^{-1}(y/x) \big)$. We calculate,
$$\partial_x G = \Big( \tfrac{x}{\sqrt{x^2+y^2}}, \; \tfrac{-y}{x^2+y^2} \Big) \qquad \text{and} \qquad \partial_y G = \Big( \tfrac{y}{\sqrt{x^2+y^2}}, \; \tfrac{x}{x^2+y^2} \Big)$$
Hence,
$$G'(x,y) = \begin{bmatrix} \frac{x}{\sqrt{x^2+y^2}} & \frac{y}{\sqrt{x^2+y^2}} \\ \frac{-y}{x^2+y^2} & \frac{x}{x^2+y^2} \end{bmatrix} = \begin{bmatrix} \frac{x}{r} & \frac{y}{r} \\ \frac{-y}{r^2} & \frac{x}{r^2} \end{bmatrix} \qquad \text{using } r = \sqrt{x^2+y^2}.$$
Example 2.3.23. Let $F(x,y) = \big( x, y, \sqrt{R^2 - x^2 - y^2} \big)$ for a constant $R$. We calculate,
$$\nabla\sqrt{R^2 - x^2 - y^2} = \Big( \tfrac{-x}{\sqrt{R^2-x^2-y^2}}, \; \tfrac{-y}{\sqrt{R^2-x^2-y^2}} \Big)$$
Example 2.3.24. Let $F(x,y,z) = \big( x, y, z, \sqrt{R^2 - x^2 - y^2 - z^2} \big)$ for a constant $R$. We calculate,
$$\nabla\sqrt{R^2 - x^2 - y^2 - z^2} = \Big( \tfrac{-x}{\sqrt{R^2-x^2-y^2-z^2}}, \; \tfrac{-y}{\sqrt{R^2-x^2-y^2-z^2}}, \; \tfrac{-z}{\sqrt{R^2-x^2-y^2-z^2}} \Big)$$
Consider $f : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $f(x) = x \times v$ for a fixed vector $v \in \mathbb{R}^3$. It follows,
$$\frac{\partial}{\partial x_1}(x\times v) = \sum_{j,k}\epsilon_{1jk}v_j e_k = v_2e_3 - v_3e_2 = (0, -v_3, v_2)$$
$$\frac{\partial}{\partial x_2}(x\times v) = \sum_{j,k}\epsilon_{2jk}v_j e_k = v_3e_1 - v_1e_3 = (v_3, 0, -v_1)$$
$$\frac{\partial}{\partial x_3}(x\times v) = \sum_{j,k}\epsilon_{3jk}v_j e_k = v_1e_2 - v_2e_1 = (-v_2, v_1, 0)$$
Thus the Jacobian is simply,
$$[df_x] = \begin{bmatrix} 0 & v_3 & -v_2 \\ -v_3 & 0 & v_1 \\ v_2 & -v_1 & 0 \end{bmatrix}$$
In fact, $df_p(h) = f(h) = h \times v$ for each $p \in \mathbb{R}^3$. The given mapping is linear, so the differential of the mapping is precisely the mapping itself (we could short-cut much of this calculation and simply quote Example 2.1.4 where we proved $dT = T$ for linear $T$).
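Since the map is linear, its Jacobian is constant and its action agrees with the map itself; here is a small numerical sketch of that fact (the vector $v$ below is an arbitrary choice of mine).

    # Sketch: for f(x) = x cross v, the Jacobian is the constant matrix
    # [[0, v3, -v2], [-v3, 0, v1], [v2, -v1, 0]] and df_p(h) = h cross v.
    import numpy as np

    v = np.array([2.0, -1.0, 3.0])
    J = np.array([[ 0.0,   v[2], -v[1]],
                  [-v[2],  0.0,   v[0]],
                  [ v[1], -v[0],  0.0]])

    h = np.array([0.4, 1.5, -2.0])
    print(np.allclose(J @ h, np.cross(h, v)))   # True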
Example 2.3.29. Let $f(x,y) = (x, y, 1-x-y)$. You can calculate,
$$[df_{(x,y)}] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}$$
Example 2.3.30. Let $X(u,v) = (x,y,z)$ where $x, y, z$ denote functions of $u, v$; I prefer to omit the explicit dependence to reduce clutter in the equations to follow. Write
$$\frac{\partial X}{\partial u} = X_u = (x_u, y_u, z_u) \qquad \text{and} \qquad \frac{\partial X}{\partial v} = X_v = (x_v, y_v, z_v).$$
Then the Jacobian is the $3\times 2$ matrix
$$[dX_{(u,v)}] = \begin{bmatrix} x_u & x_v \\ y_u & y_v \\ z_u & z_v \end{bmatrix}$$
Remark 2.3.31.
I return to these examples in the next chapter and we’ll explore the geometric content of
these formulas as they support the application of certain theorems. More on that later, for
the remainder of this chapter we continue to focus on properties of differentiation.
2.4 continuous differentiability
You might be tempted to say then that this function is increasing at a rate of 1/2 for x near zero.
But this claim would be false since you can see that f 0 (x) oscillates wildly without end near zero.
We have a tangent line at (0, 0) with positive slope for a function which is not increasing at (0, 0)
(recall that increasing is a concept we must define in a open interval to be careful). This sort of
thing cannot happen if the derivative is continuous near the point in question.
The one-dimensional case is really quite special, even though we had discontinuity of the derivative
we still had a well-defined tangent line to the point. However, many interesting theorems in calculus
of one-variable require the function to be continuously differentiable near the point of interest. For
example, to apply the 2nd-derivative test we need to find a point where the first derivative is zero
and the second derivative exists. We cannot hope to compute f 00 (xo ) unless f 0 is continuous at xo .
The next example is sick.
Example 2.4.2. Let us define $f(0,0) = 0$ and
$$f(x,y) = \frac{x^2 y}{x^2 + y^2}$$
for all $(x,y) \neq (0,0)$ in $\mathbb{R}^2$. It can be shown that $f$ is continuous at $(0,0)$. Moreover, since $f(x,0) = f(0,y) = 0$ for all $x$ and all $y$ it follows that $f$ vanishes identically along the coordinate axes. Thus the rate of change in the $e_1$ or $e_2$ directions is zero. We can calculate that
$$\frac{\partial f}{\partial x} = \frac{2xy^3}{(x^2+y^2)^2} \qquad \text{and} \qquad \frac{\partial f}{\partial y} = \frac{x^4 - x^2y^2}{(x^2+y^2)^2}$$
If you examine the plot of z = f (x, y) you can see why the tangent plane does not exist at (0, 0, 0).
Notice the sides of the box in the picture are parallel to the $x$ and $y$ axes, so the path considered below would fall on a diagonal slice of these boxes (the argument to follow stands alone; you don't need to understand the picture to understand the math here, but it's nice if you do). Consider the path to the origin $t \mapsto (t,t)$: it gives $f_x(t,t) = 2t^4/(t^2+t^2)^2 = 1/2$, hence $f_x(x,y) \to 1/2$ along the path $t \mapsto (t,t)$, but $f_x(0,0) = 0$, hence the partial derivative $f_x$ is not continuous at $(0,0)$. In this example, the discontinuity of the partial derivatives makes the tangent plane fail to exist.
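A tiny sketch (not part of the notes proper) makes the discontinuity of $f_x$ concrete: along the diagonal path the partial derivative sits at $1/2$, while at the origin itself it is $0$.

    # Sketch: f(x,y) = x^2 y / (x^2 + y^2), f(0,0) = 0.
    # fx along the path t -> (t,t) stays at 1/2, but fx(0,0) = 0.
    def fx(x, y):
        if x == 0.0 and y == 0.0:
            return 0.0            # the rate of change in the e1 direction at the origin
        return 2*x*y**3 / (x**2 + y**2)**2

    for t in [0.1, 0.01, 0.001]:
        print(t, fx(t, t))        # always 0.5
    print(fx(0.0, 0.0))           # 0.0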
One might be tempted to suppose that if a function is continuous at a given point and all the possible directional derivatives exist there, then differentiability should follow. It turns out this is not sufficient, since continuity of the function does not force any continuity of the partial derivatives. For example:
Example 2.4.3. Let us define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x,y) = 0$ for $y \neq x^2$ and $f(x, x^2) = x$. I invite the reader to verify that this function is continuous at the origin. Moreover, consider the directional derivatives at $(0,0)$. We calculate, for $v = \langle a, b \rangle$,
$$D_v f(0,0) = \lim_{h\to 0}\frac{f(ha, hb) - f(0,0)}{h} = \lim_{h\to 0}\frac{f(ha, hb)}{h} = 0.$$
To see why $f(ah, bh) = 0$ for small $h$, consider the intersection of $\vec{r}(h) = (ha, hb)$ and $y = x^2$: the intersection is found at $hb = (ha)^2$, hence, noting $h = 0$ is not of interest in the limit, $b = ha^2$. If $a = 0$ then clearly $(ah, bh)$ only falls on $y = x^2$ at $(0,0)$. If $a \neq 0$ then the solution $h = b/a^2$ gives $f(ha, hb) = ha$, a nontrivial value. However, as $h \to 0$ we eventually reach values close enough to $(0,0)$ that $f(ah, bh) = 0$. Hence we find all directional derivatives exist and are zero at $(0,0)$.
Let’s examine the graph of this example to see how this happened. The pictures below graph the
xy-plane as red and the nontrivial values of f as a blue curve. The union of these forms the graph
z = f (x, y).
Clearly, f is continuous at (0, 0) as I invited you to prove. Moreover, clearly z = f (x, y) cannot be
well-approximated by a tangent plane at (0, 0, 0). If we capture the xy-plane then we lose the blue
curve of the graph. On the other hand, if we use a tilted plane then we lose the xy-plane part of
the graph.
The moral of the story in the last two examples is simply that derivatives at a point, or even all
directional derivatives at a point do not necessarily tell you much about the function near the point.
This much is clear: something else is required if the differential is to have meaning which extends
beyond one point in a nice way. Therefore, we consider the following:
It would seem the trouble has something to do with discontinuity in the derivative. Continuity of the derivative requires that the assignment $a \mapsto dF_a$ is continuous. Or,
$$\lim_{x\to a} dF_x = dF_a. \qquad (2.18)$$
But, this is a limit of operators. Let us study this limit in view of the operator norm we discussed in the previous chapter. Let $\epsilon > 0$; then we must be able to find $\delta > 0$ such that $0 < \|x-a\| < \delta$ implies $\|dF_x - dF_a\| < \epsilon$. So, we need to control $\|dF_x - dF_a\|$ to be sure the derivative is continuous. Consider,
$$\|dF_x - dF_a\| = \sup\{\|(dF_x - dF_a)(u)\| : \|u\| = 1\} \qquad (2.19)$$
$$= \sup\{\|dF_x(u) - dF_a(u)\| : \|u\| = 1\}$$
$$= \sup\left\{ \left\| \sum_{i=1}^n u_i\frac{\partial F}{\partial x_i}(x) - \sum_{i=1}^n u_i\frac{\partial F}{\partial x_i}(a) \right\| : \|u\| = 1 \right\}$$
$$\leq \sum_{i=1}^n \left\| \frac{\partial F}{\partial x_i}(x) - \frac{\partial F}{\partial x_i}(a) \right\|$$
Therefore, the data $\lim_{x\to a}\frac{\partial F}{\partial x_i}(x) = \frac{\partial F}{\partial x_i}(a)$ for $i = 1, \dots, n$ allows us to prove $\lim_{x\to a} dF_x = dF_a$.
Naturally, when we teach multivariate calculus the preferred concept does not involve operator
norms. Therefore, to be nice to the non-math majors we define:
Definition 2.4.4.
A mapping F : U ⊆ Rn → Rm is continuously differentiable at a ∈ U iff the partial
derivative mappings Dj F exist on an open set containing a and are continuous at a.
Equation 2.19 shows maps continuously differentiable at x = a are those for which the mapping
x → dFx is a continuous mapping at x = a.
The import of the theorem below is that we can build the tangent plane from the Jacobian matrix
provided the partial derivatives exist near the point of tangency and are continuous at the point
of tangency. This is a very nice result because the concept of the linear mapping is quite abstract
but partial differentiation of a given mapping is often easy. The proof that follows here is found in
many texts, in particular see C.H. Edwards Advanced Calculus of Several Variables on pages 72-73.
Theorem 2.4.5.
If F : Rn → R is continuously differentiable at a then F is differentiable at a
Proof: Consider $a+h$ sufficiently close to $a$ that all the partial derivatives of $F$ exist. Furthermore, consider going from $a$ to $a+h$ by traversing a hyper-parallelepiped, travelling $n$ perpendicular paths:
$$\underbrace{a}_{p_0} \to \underbrace{a + h_1e_1}_{p_1} \to \underbrace{a + h_1e_1 + h_2e_2}_{p_2} \to \cdots \to \underbrace{a + h_1e_1 + \cdots + h_ne_n}_{p_n} = a + h.$$
Let us denote $p_j = a + b_j$ where clearly $b_j$ ranges from $b_0 = 0$ to $b_n = h$ and $b_j = \sum_{i=1}^j h_ie_i$. Notice that the difference between $p_j$ and $p_{j-1}$ is given by:
$$p_j - p_{j-1} = a + \sum_{i=1}^j h_ie_i - a - \sum_{i=1}^{j-1} h_ie_i = h_je_j$$
This is to say the change in $F$ from $p_0 = a$ to $p_n = a+h$ can be expressed as a sum of the changes along the $n$ steps. Furthermore, if we consider the difference $F(p_j) - F(p_{j-1})$ you can see that only the $j$-th component of the argument of $F$ changes. Since the $j$-th partial derivative exists on the interval for $h_j$ considered, by construction we can apply the mean value theorem to locate $c_j$ such that:
$$h_j\,\partial_j F(p_{j-1} + c_je_j) = F(p_j) - F(p_{j-1})$$
Therefore, using the mean value theorem for each interval, we select $c_1, \dots, c_n$ with:
$$F(a+h) - F(a) = \sum_{j=1}^n h_j\,\partial_j F(p_{j-1} + c_je_j)$$
Define $L(h) = \sum_{j=1}^n h_j\,\partial_j F(a)$.
It is clear that $L$ is linear (in fact, perhaps you recognize this as $L(h) = (\nabla F)(a)\bullet h$). Let us prepare to study the Frechet quotient,
$$F(a+h) - F(a) - L(h) = \sum_{j=1}^n h_j\,\partial_j F(p_{j-1} + c_je_j) - \sum_{j=1}^n h_j\,\partial_j F(a) = \sum_{j=1}^n h_j\underbrace{\big[\partial_j F(p_{j-1} + c_je_j) - \partial_j F(a)\big]}_{g_j(h)}$$
Observe that $p_{j-1} + c_je_j \to a$ as $h \to 0$. Thus, $g_j(h) \to 0$ by the continuity of the partial derivatives at $x = a$. Finally, consider the Frechet quotient:
$$\lim_{h\to 0}\frac{F(a+h) - F(a) - L(h)}{\|h\|} = \lim_{h\to 0}\frac{\sum_j h_jg_j(h)}{\|h\|} = \lim_{h\to 0}\sum_j\frac{h_j}{\|h\|}g_j(h)$$
Notice $|h_j| \leq \|h\|$, hence $\big|\frac{h_j}{\|h\|}\big| \leq 1$ and
$$0 \leq \left|\frac{h_j}{\|h\|}g_j(h)\right| \leq |g_j(h)|$$
Apply the squeeze theorem to deduce each term in the sum limits to zero. Consequently, $L(h)$ satisfies the Frechet quotient and we have shown that $F$ is differentiable at $x = a$ and the differential is expressed in terms of partial derivatives as expected: $dF_a(h) = \sum_{j=1}^n h_j\,\partial_j F(a)$.
Theorem 2.4.6.
Theorem 2.5.1.
Proof: assume the notation given in the Theorem and define structure constants $c_{ijk} \in \mathbb{R}$ such that:
$$v_i \star w_j = \sum_{k=1}^{m_3} c_{ijk}\,\varepsilon_k. \qquad (2.20)$$
These constants characterize the nature of the multiplication $\star$. Interestingly, they have little to do with the proof; essentially they play the role of bystanders. Assuming $F : U \to W_1$ and $G : U \to W_2$ are continuously differentiable at $a$ means their component functions $F_1, \dots, F_{m_1} : U \to \mathbb{R}$ with respect to $\gamma_1$ and $G_1, \dots, G_{m_2} : U \to \mathbb{R}$ with respect to $\gamma_2$ are continuously differentiable at $a$. The component functions of $F \star G$ are naturally related to those of $F$ and $G$ as follows:
$$F\star G = \left(\sum_{i=1}^{m_1} F_iv_i\right)\star\left(\sum_{j=1}^{m_2} G_jw_j\right) \qquad (2.21)$$
$$= \sum_{i=1}^{m_1}\sum_{j=1}^{m_2} F_iG_j(v_i\star w_j) = \sum_{i=1}^{m_1}\sum_{j=1}^{m_2} F_iG_j\left(\sum_{k=1}^{m_3} c_{ijk}\varepsilon_k\right) = \sum_{k=1}^{m_3}\left(\sum_{i=1}^{m_1}\sum_{j=1}^{m_2} F_iG_jc_{ijk}\right)\varepsilon_k$$
Thus $(F\star G)_k = \sum_{i,j} F_iG_jc_{ijk}$. Differentiating with the single-variable product rule gives $\partial_l(F\star G)_k = \sum_{i,j}\big[(\partial_l F_i)G_j + F_i(\partial_l G_j)\big]c_{ijk}$, which, reading the calculation of Equation 2.21 in reverse, says $\partial_l(F\star G) = (\partial_l F)\star G + F\star(\partial_l G)$. The calculation makes it explicitly clear that the partial derivatives of $F\star G$ are sums and products of continuous functions, hence $F\star G$ is continuously differentiable as claimed. Finally, we can construct
the differential from partial derivatives: for $h = \sum_{l=1}^n h_lr_l$ calculate:
$$d(F\star G)_a(h) = \sum_{l=1}^n h_l\,\partial_l(F\star G)(a) \qquad (2.23)$$
$$= \sum_{l=1}^n h_l\big[(\partial_l F)(a)\star G(a) + F(a)\star(\partial_l G)(a)\big]$$
$$= \left[\sum_{l=1}^n h_l(\partial_l F)(a)\right]\star G(a) + F(a)\star\left[\sum_{l=1}^n h_l(\partial_l G)(a)\right]$$
$$= dF_a(h)\star G(a) + F(a)\star dG_a(h).$$
This completes the proof.
Let’s unwrap a few common cases of this general product rule. I’ll continue to use the W1 , W2 , W3
and V notation to connect directly to Theorem 2.5.1.
(1.) Set $W_1 = W_2 = W_3 = \mathbb{R}$ and $V = \mathbb{R}$ to produce the usual first semester calculus product rule:
$$\frac{d}{dt}(fg) = \frac{df}{dt}g + f\frac{dg}{dt}.$$
Of course, this was the heart of the proof.
(2.) Set $W_1 = W_2 = W_3 = \mathbb{R}$ and $V = \mathbb{R}^n$ to produce the usual product rule for real-valued functions of several variables:
$$\frac{\partial}{\partial x_i}(fg) = \frac{\partial f}{\partial x_i}g + f\frac{\partial g}{\partial x_i}.$$
(3.) Set $W_1 = \mathbb{R}$, $W_2 = W_3$ and $V = \mathbb{R}^n$ to produce the usual product rule for a scalar function multiplied on a vector-valued function:
$$\frac{\partial}{\partial x_i}(f\vec{v}) = \frac{\partial f}{\partial x_i}\vec{v} + f\frac{\partial\vec{v}}{\partial x_i}.$$
(4.) Set $W_1 = W_2 = \mathbb{R}^n$, $W_3 = \mathbb{R}$ and $V = \mathbb{R}$ to produce the product rule for dot-products of paths:
$$\frac{d}{dt}(\vec{v}\bullet\vec{w}) = \frac{d\vec{v}}{dt}\bullet\vec{w} + \vec{v}\bullet\frac{d\vec{w}}{dt}.$$
(5.) Set $W_1 = W_2 = \mathbb{R}^3$, $W_3 = \mathbb{R}^3$ and $V = \mathbb{R}$ to produce the product rule for cross-products of paths:
$$\frac{d}{dt}(\vec{v}\times\vec{w}) = \frac{d\vec{v}}{dt}\times\vec{w} + \vec{v}\times\frac{d\vec{w}}{dt}.$$
(6.) Set $W_1 = W_2 = W_3 = \mathbb{R}^{n\times n}$ and $V = \mathbb{R}$ to produce the product rule for matrix-valued functions of a real variable $t \mapsto A(t)$, $t \mapsto B(t)$:
$$\frac{d}{dt}(AB) = \frac{dA}{dt}B + A\frac{dB}{dt}.$$
(7.) Set $W_1 = W_2 = W_3 = \mathbb{C}$ and $V = \mathbb{C}$; with $z = x + iy$ we find for $f_1 = u_1 + iv_1$ and $f_2 = u_2 + iv_2$
$$\frac{\partial}{\partial x}(f_1f_2) = \frac{\partial f_1}{\partial x}f_2 + f_1\frac{\partial f_2}{\partial x} \qquad \& \qquad \frac{\partial}{\partial y}(f_1f_2) = \frac{\partial f_1}{\partial y}f_2 + f_1\frac{\partial f_2}{\partial y}.$$
Of course, there is much more. I simply wish to impress on you that these product rules are all
simply the standard product rule married to the algebraic structure of the given product. So long
as the product has the needed linearity properties, there will be a corresponding product rule for
functions.
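Case (6.) is easy to experiment with; here is a small sketch (assuming sympy is available; the particular matrix functions below are my own arbitrary choice) checking $\frac{d}{dt}(AB) = \frac{dA}{dt}B + A\frac{dB}{dt}$ on a pair of 2x2 examples.

    # Sketch: the matrix product rule d/dt (A B) = (dA/dt) B + A (dB/dt).
    import sympy as sp

    t = sp.symbols('t', real=True)
    A = sp.Matrix([[t, sp.sin(t)], [t**2, 1]])
    B = sp.Matrix([[sp.exp(t), 2], [t, t**3]])

    lhs = (A*B).diff(t)
    rhs = A.diff(t)*B + A*B.diff(t)
    print((lhs - rhs).expand())   # the zero matrix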
Example 2.7.2. The direct product algebra of A = R × R is defined by (a, b)(x, y) = (ax, by).
Here (1, 1)(x, y) = (x, y) for all (x, y) ∈ A and in fact 1A = (1, 1).
Example 2.7.3. The hyperbolic numbers are of the form a + bj where j 2 = 1. In particular,
define (a + bj)(c + jd) = ac + bd + j(ad + bc).
Example 2.7.4. The 3-hyperbolic numbers are of the form $a + bj + cj^2$ where $j^3 = 1$. In particular, define
$$(a + bj + cj^2)(x + jy + j^2z) = ax + cy + bz + j(bx + ay + cz) + j^2(cx + by + az).$$
All the algebras I’ve listed thus far are commutative. There are also many noncommutative
algebras like the quaternions or matrix algebras. Notice R n×n forms an algebra. Basically, I think
of algebras as generalized number systems. So, given that, it is interesting to ask what it means
to differentiate with respect to a variable which takes values in A. In fact, we have a whole course
devoted to studying what happens when you do calculus with respect to a complex variable. Many
schools have such a course. What is less known, which is a shame since it’s really pretty simple, is
that you can differentiate with respect to an algebra variable in much the same way.
Definition 2.7.5. Let $U \subseteq A$ be an open set containing $p$. If $f : U \to A$ is a function then we say $f$ is $A$-differentiable at $p$ if there exists a linear function $d_pf \in R_A$ such that
$$\lim_{h\to 0}\frac{f(p+h) - f(p) - d_pf(h)}{\|h\|} = 0. \qquad (2.26)$$
When I say $d_pf \in R_A$ this simply means that $d_pf : A \to A$ is an $\mathbb{R}$-linear mapping on $A$ and $d_pf(v\star w) = d_pf(v)\star w$ for all $v, w \in A$. In other words, $A$-differentiability amounts to differentiability at $p$ with an extra condition. Furthermore, we define the derivative at $p$ as follows:
$$(d_pf)(h) = f'(p)\,h \qquad (2.27)$$
But, since (dp f )(h) = dp f (1 ? h) = dp f (1) ? h = f 0 (p)h we have f 0 (p) = dp f (1). In contrast to
the differential of an arbitrary real differentiable map on A, the formula for dp f is equivalent to
the selection of a number in A for p. In other words, there is a natural manner to interpret the
derivative of a function as a function once more. Furthermore, it can be shown for higher derivatives
of an A-differentiable function we have
$$d^nf(v_1, v_2, \dots, v_n) = d^nf(1, 1, \dots, 1)\star v_1\star v_2\star\cdots\star v_n \qquad (2.28)$$
So the $n$-th derivative is also uniquely fixed by the value of $d^nf(1,1,\dots,1)$. In fact, we can naturally identify the $n$-th derivative of a function as a function once more. In general, the $n$-th derivative is a symmetric $n$-linear function. Finally, I must tell you a beautiful formula which makes $A$-calculus so very interesting: provided the basis for $A$ has $1_A = 1$ paired with coordinate $x_1$,
$$\frac{\partial^nf}{\partial x_{i_1}\partial x_{i_2}\cdots\partial x_{i_n}} = \frac{\partial^nf}{\partial x_1^n}\star v_{i_1}\star v_{i_2}\star\cdots\star v_{i_n} \qquad (2.29)$$
If $A = \mathbb{R}^n$ as a point set and $e_1 = 1$ then the formulas describing $A$-calculus are quite nice.
Example 2.7.6. Consider $f = u + iv$ which is complex differentiable at $p \in \mathbb{C}$. Use $z = x + iy$ as the typical variable in $\mathbb{C}$. Notice, $d_pf(i) = d_pf(1)\,i$ implies that $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial x}\,i$. These are the famed Cauchy Riemann equations. To help the reader make the connection, note $f_y = u_y + iv_y$ and $f_x = u_x + iv_x$, hence $f_y = if_x$ amounts to $(u_y + iv_y) = i(u_x + iv_x)$, hence $u_y = -v_x$ and $v_y = u_x$. Jumping ahead a bit, with no intention of explaining why here, it is fun to note that since $i^2 + 1 = 0$ it follows $f_{yy} + f_{xx} = 0$, hence the component functions of a complex differentiable function are solutions to Laplace's equation.
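To make this concrete, here is a small sketch checking the claims for $f(z) = z^2$, that is $u = x^2 - y^2$ and $v = 2xy$ (the particular function is just my illustrative choice).

    # Sketch: for u = x^2 - y^2, v = 2xy (f(z) = z^2), check the
    # Cauchy-Riemann equations u_y = -v_x, v_y = u_x and Laplace's equation.
    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    u = x**2 - y**2
    v = 2*x*y

    print(sp.simplify(u.diff(y) + v.diff(x)))        # 0   (u_y = -v_x)
    print(sp.simplify(v.diff(y) - u.diff(x)))        # 0   (v_y = u_x)
    print(sp.simplify(u.diff(x, 2) + u.diff(y, 2)))  # 0   (u is harmonic)
    print(sp.simplify(v.diff(x, 2) + v.diff(y, 2)))  # 0   (v is harmonic)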
Basically, any identity which appears amongst the basis elements of an algebra will be mirrored in a PDE which is solved by each function differentiable over the algebra. The most familiar case is $\mathbb{C}$, where harmonic functions are a standard and beautiful topic. But, this is just one of many function theories. In ordinary real analysis essentially $A = \mathbb{R}$ itself, so this feature cannot be seen. However, once $A$ is two or more dimensional, differentiability with respect to $A$ binds the real variables together in such a way that the change in one real variable is necessarily coupled to the rest.
Ok, so, let's return to our uber product rule once more: assume $f, g$ are $A$-differentiable at $p$ in a commutative algebra. Then, applying the product rule of Theorem 2.5.1 together with the right-$A$-linearity of the differentials and evaluating at $h = 1$, we obtain $(f\star g)'(p) = f'(p)\star g(p) + f(p)\star g'(p)$.
Many further results about the calculus over an algebra are known and many closely resemble the calculus you've already seen. However, I've also found a few surprises, mostly thanks to the students who've helped me study $A$-calculus the past few years. If this section was a bit too terse, my apologies; I have much more to say in my primer on $A$-calculus: Introduction to A-Calculus, my A-Calculus II paper with Daniel Freese, and my differential equations on an algebra paper with Nathan BeDell. I will probably share some tidbits about these papers when the time seems right in this course. But, our main focus is elsewhere.
Chapter 3
inverse and implicit function theorems
It is tempting to give a complete and rigourous proof of these theorems at the outset, but I will
resist the temptation in lecture. I’m actually more interested that the student understand what the
theorem claims before I show the real proof. I will sketch the proof and show many applications.
A nearly complete proof is found in Edwards where he uses an iterative approximation technique
founded on the contraction mapping principle, we will go through that a bit later in the course. I
probably will not have typed notes on that material this semester, but Edwards' is fairly readable
and I think we’ll profit from working through those sections. That said, we develop an intuition for
just what these theorems are all about to start. That is the point of this chapter: to grasp what
the linear algebra of the Jacobian suggests about the local behaviour of functions and equations.
The arguments I just made are supported by theorems that are developed in calculus I. Let me shift gears a bit and give a direct calculational explanation based on the linearization approximation. If $x \approx p$ then $f(x) \approx f(p) + f'(p)(x-p)$. To find the formula for the inverse we solve $y = f(x)$ for $x$:
$$y \approx f(p) + f'(p)(x-p) \quad\Rightarrow\quad x \approx p + \frac{1}{f'(p)}\big(y - f(p)\big)$$
Therefore, $f^{-1}(y) \approx p + \dfrac{1}{f'(p)}\big(y - f(p)\big)$ for $y$ near $f(p)$.
Example 3.1.1. Just to help you believe me, consider f (x) = 3x − 2 then f 0 (x) = 3 for all x.
Suppose we want to find the inverse function near p = 2 then the discussion preceding this example
suggests,
$$f^{-1}(y) = 2 + \frac{1}{3}(y - 4).$$
I invite the reader to check that f (f −1 (y)) = y and f −1 (f (x)) = x for all x, y ∈ R.
In the example above we found a global inverse exactly, but this is thanks to the linearity of the
function in the example. Generally, inverting the linearization just gives the first approximation to
the inverse.
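For a genuinely nonlinear function, the formula above is only a first approximation; a tiny sketch (the choice $f(x) = x^3 + x$ and $p = 1$ is my own, purely for illustration):

    # Sketch: f(x) = x^3 + x, p = 1, so f(p) = 2 and f'(p) = 4.
    # The linearization-based inverse p + (y - f(p))/f'(p) is only approximate.
    def f(x):
        return x**3 + x

    p, fp, dfp = 1.0, 2.0, 4.0
    y = 2.2                                  # a value near f(p) = 2
    x_approx = p + (y - fp)/dfp              # first approximation to f^{-1}(y)
    print(x_approx, f(x_approx))             # 1.05, f(1.05) = 2.2076... (close to 2.2)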
The same reasoning applies to a mapping $F$ on $\mathbb{R}^n$: if $F(x) \approx F(p) + F'(p)(x-p)$ and $F'(p)$ is invertible, then solving $y = F(x)$ for $x$ gives
$$F^{-1}(y) \approx p + \big(F'(p)\big)^{-1}\big(y - F(p)\big)$$
for $y$ near $F(p)$. Apparently the condition to find a local inverse for a mapping on $\mathbb{R}^n$ is that the derivative matrix is nonsingular (nonsingular matrices are also called invertible matrices; a convenient test is that $A$ is invertible iff $\det(A) \neq 0$) in some neighborhood of the point. Experience has taught us from the one-dimensional case that we must insist the derivative is continuous near the point in order to maintain the validity of the approximation.
Recall from calculus II that as we attempt to approximate a function with a power series it takes
an infinite series of power functions to recapture the formula exactly. Well, something similar is
true here. However, the method of approximation is through an iterative approximation procedure
which is built off the idea of Newton’s method. The product of this iteration is a nested sequence
of composite functions. To prove the theorem below one must actually provide proof the recur-
sively generated sequence of functions converges. See pages 160-187 of Edwards for an in-depth
exposition of the iterative approximation procedure. Then see pages 404-411 of Edwards for some
material on uniform convergence2 The main analytical tool which is used to prove the convergence
is called the contraction mapping principle. The proof of the principle is relatively easy to
follow and interestingly the main non-trivial step is an application of the geometric series. For
the student of analysis this is an important topic which you should spend considerable time really
trying to absorb as deeply as possible. The contraction mapping is at the base of a number of
interesting and nontrivial theorems. Read Rosenlicht’s Introduction to Analysis for a broader and
better organized exposition of this analysis. In contrast, Edwards’ uses analysis as a tool to obtain
results for advanced calculus but his central goal is not a broad or well-framed treatment of analysis.
Consequently, if analysis is your interest then you really need to read something else in parallel to
get a better ideas about sequences of functions and uniform convergence. I have some notes from
a series of conversations with a student about Rosenlicht, I’ll post those for the interested student.
These notes focus on the part of the material I require for this course. This is Theorem 3.3 on page
185 of Edwards’ text:
$G_0(y) = a$ and $G_{n+1}(y) = G_n(y) - [F'(a)]^{-1}\big[F(G_n(y)) - y\big]$ for all $y \in V$.
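Here is a small sketch of that iteration in action (the map $F$ and the points below are my own illustrative choices, not Edwards'): for $F(x) = (x_1 + x_2^2,\; x_2 + x_1^2)$ near $a = (0,0)$, where $F'(a)$ is the identity, the sequence $G_n(y)$ converges rapidly to $F^{-1}(y)$ for $y$ near $F(a) = (0,0)$.

    # Sketch: G_0(y) = a, G_{n+1}(y) = G_n(y) - [F'(a)]^{-1} (F(G_n(y)) - y).
    import numpy as np

    def F(x):
        return np.array([x[0] + x[1]**2, x[1] + x[0]**2])

    a = np.array([0.0, 0.0])
    dFa_inv = np.linalg.inv(np.eye(2))     # F'(a) is the identity at a = (0,0)

    y = np.array([0.1, -0.05])             # a point near F(a) = (0,0)
    G = a.copy()
    for n in range(20):
        G = G - dFa_inv @ (F(G) - y)

    print(G, F(G))    # F(G) is essentially y, so G approximates F^{-1}(y)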
The qualifier local is important to note. If we seek a global inverse then other ideas are needed. If the function is everywhere injective then logically $F(x) = y$ defines $F^{-1}(y) = x$, and $F^{-1}$ so constructed is single-valued by virtue of the injectivity of $F$. However, for differentiable mappings, one might wonder how the criterion of global injectivity can be tested via the differential. Even in the one-dimensional case a vanishing derivative does not indicate a lack of injectivity; $f(x) = x^3$ has $f^{-1}(y) = \sqrt[3]{y}$ and yet $f'(0) = 0$ (therefore $f'(0)$ is not invertible). On the other hand, we'll see in the examples that follow that even if the derivative is invertible over a set it is possible for the values of the mapping to double up, and once that happens we cannot find a single-valued inverse function. (There are scientists and engineers who work with multiply-valued functions with great success; however, as a point of style if nothing else, we try to use functions in math.)
Remark 3.1.3. James R. Munkres' Analysis on Manifolds is good for a different proof.
Another good place to read the inverse function theorem is in James R. Munkres Analysis
on Manifolds. That text is careful and has rather complete arguments which are not entirely
the same as the ones given in Edwards. Munkres’ text does not use the contraction mapping
principle, instead the arguments are more topological in nature.
To give some idea of what I mean by topological let be give an example of such an argument.
Suppose F : Rn → Rn is continuously differentiable and F 0 (p) is invertible. Here’s a sketch of the
argument that F 0 (x) is invertible for all x near p as follows:
1. define $g : \mathbb{R}^n \to \mathbb{R}$ by $g(x) = \det(F'(x))$; since $F$ is continuously differentiable and the determinant is a continuous function of the matrix entries, $g$ is continuous.
2. note we are given $F'(p)$ is invertible and hence $\det(F'(p)) \neq 0$, thus the continuous function $g$ is nonzero at $p$. It follows there is some open set $U$ containing $p$ for which $0 \notin g(U)$; that is, $\det(F'(x)) \neq 0$ and hence $F'(x)$ is invertible for each $x \in U$.
I would argue this is a topological argument because the key idea here is the continuity of g.
Topology is the study of continuity in general.
Remark 3.1.4. James J. Callahan’s Advanced Calculus: a Geometric View, good reading.
James J. Callahan’s Advanced Calculus: a Geometric View has great merit in both visual-
ization and well-thought use of linear algebraic techniques. In addition, many students will
enjoy his staggered proofs where he first shows the proof for a simple low dimensional case
and then proceeds to the general case.
Example 3.1.5. Suppose $F(x,y) = \big(\sin(y) + 1,\; \sin(x) + 2\big)$ for $(x,y) \in \mathbb{R}^2$. Clearly $F$ is continuously differentiable as all its component functions have continuous partial derivatives. Observe,
$$F'(x,y) = [\,\partial_xF\,|\,\partial_yF\,] = \begin{bmatrix} 0 & \cos(y) \\ \cos(x) & 0 \end{bmatrix}$$
Hence $F'(x,y)$ is invertible at points $(x,y)$ such that $\det(F'(x,y)) = -\cos(x)\cos(y) \neq 0$. This means we may not be able to find local inverses at points $(x,y)$ with $x = \tfrac{1}{2}(2n+1)\pi$ or $y = \tfrac{1}{2}(2m+1)\pi$ for some $m, n \in \mathbb{Z}$. Points where $F'(x,y)$ is singular are points where one or both of $\sin(y)$ and $\sin(x)$ reach extreme values, thus the points where the Jacobian matrix is singular are in fact points where we cannot find a local inverse. Why? Because the function is clearly not 1-1 on any set which contains the points of singularity for $dF$. Continuing, recall from precalculus that sine has a standard inverse on $[-\pi/2, \pi/2]$. Suppose $(x,y) \in [-\pi/2,\pi/2]^2$ and seek to solve $F(x,y) = (a,b)$ for $(x,y)$:
$$F(x,y) = \begin{bmatrix} \sin(y)+1 \\ \sin(x)+2 \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix} \quad\Rightarrow\quad \begin{matrix} \sin(y)+1 = a \\ \sin(x)+2 = b \end{matrix} \quad\Rightarrow\quad \begin{matrix} y = \sin^{-1}(a-1) \\ x = \sin^{-1}(b-2) \end{matrix}$$
It follows that $F^{-1}(a,b) = \big(\sin^{-1}(b-2),\; \sin^{-1}(a-1)\big)$ for $(a,b) \in [0,2]\times[1,3]$, where you should note $F([-\pi/2,\pi/2]^2) = [0,2]\times[1,3]$. We've found a local inverse for $F$ on the region $[-\pi/2,\pi/2]^2$. In other words, we just found a global inverse for the restriction of $F$ to $[-\pi/2,\pi/2]^2$. Technically we ought not write $F^{-1}$; to be more precise we should write $\big(F|_{[-\pi/2,\pi/2]^2}\big)^{-1}$. It is customary to avoid such detail in many contexts. Inverse functions for sine, cosine, tangent, etc. are good examples of this sleight of language.
Example 3.1.6. Consider the polar coordinate mapping $T(r,\theta) = \big(r\cos(\theta),\; r\sin(\theta)\big)$. Writing $(x,y) = T(r,\theta)$ we have $x^2 + y^2 = r^2$ and
$$\frac{y}{x} = \frac{r\sin(\theta)}{r\cos(\theta)} = \tan(\theta).$$
It follows that $r = \sqrt{x^2+y^2}$ and $\theta = \tan^{-1}(y/x)$ for $(x,y) \in (0,\infty)\times\mathbb{R}$. We find
$$T^{-1}(x,y) = \big(\sqrt{x^2+y^2},\; \tan^{-1}(y/x)\big).$$
Let's see how the derivative fits with our results. Calculate,
$$T'(r,\theta) = [\,\partial_rT\,|\,\partial_\theta T\,] = \begin{bmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{bmatrix}$$
Note that $\det(T'(r,\theta)) = r$, hence the inverse function theorem provides the existence of a local inverse around any point with $r \neq 0$. Notice the derivative does not detect the defect in the angular coordinate. Challenge: find the inverse function for $T(r,\theta) = \big(r\cos(\theta), r\sin(\theta)\big)$ with $\mathrm{dom}(T) = [0,\infty)\times(\pi/2, 3\pi/2)$. Or, find the inverse for polar coordinates in a neighborhood of $(0,-1)$.
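A small numerical sketch of how the derivatives fit together here (the particular point is my own choice): away from $r = 0$ the Jacobian of $T^{-1}$ at $(x,y) = T(r,\theta)$ is the matrix inverse of the Jacobian of $T$ at $(r,\theta)$.

    # Sketch: det(T'(r,theta)) = r, and [T^{-1}]'(x,y) = [T'(r,theta)]^{-1}.
    import numpy as np

    r0, th0 = 2.0, 0.7
    JT = np.array([[np.cos(th0), -r0*np.sin(th0)],
                   [np.sin(th0),  r0*np.cos(th0)]])

    x0, y0 = r0*np.cos(th0), r0*np.sin(th0)
    JTinv = np.array([[ x0/np.hypot(x0, y0),   y0/np.hypot(x0, y0)],
                      [-y0/(x0**2 + y0**2),    x0/(x0**2 + y0**2)]])

    print(np.linalg.det(JT))                      # equals r0 = 2.0
    print(np.allclose(JTinv, np.linalg.inv(JT)))  # True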
Example 3.1.7. Suppose T : R3 → R3 is defined by T (x, y, z) = (ax, by, cz) for constants a, b, c ∈
R where abc 6= 0. Clearly T is continuously differentiable as all its component functions have
continuous partial derivatives. We calculate T 0 (x, y, z) = [∂x T |∂y T |∂z T ] = [ae1 |be2 |ce3 ]. Thus
det(T 0 (x, y, z)) = abc 6= 0 for all (x, y, z) ∈ R3 hence this function is locally invertible everywhere.
Moreover, we calculate the inverse mapping by solving T (x, y, z) = (u, v, w) for (x, y, z):
(ax, by, cz) = (u, v, w) ⇒ (x, y, z) = (u/a, v/b, w/c) ⇒ T −1 (u, v, w) = (u/a, v/b, w/c).
Example 3.1.8. Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $F(x) = Ax + b$ for some matrix $A \in \mathbb{R}^{n\times n}$ and vector $b \in \mathbb{R}^n$. Under what conditions is such a function invertible? Since the formula for this function gives each component function as a polynomial in the $n$ variables, we can conclude the function is continuously differentiable. You can calculate that $F'(x) = A$. It follows that a sufficient condition for local inversion is $\det(A) \neq 0$. It turns out that this is also a necessary condition, as $\det(A) = 0$ implies the matrix $A$ has nontrivial solutions of $Av = 0$. We say $v \in \mathrm{Null}(A)$ iff $Av = 0$. Note if $v \in \mathrm{Null}(A)$ then $F(v) = Av + b = b$. This is not a problem when $\det(A) \neq 0$, for in that case the null space contains just the zero vector: $\mathrm{Null}(A) = \{0\}$. However, when $\det(A) = 0$ we learn in linear algebra that $\mathrm{Null}(A)$ contains infinitely many vectors, so $F$ is far from injective. For example, suppose $\mathrm{Null}(A) = \mathrm{span}\{e_1\}$; then you can show that $F(a_1, a_2, \dots, a_n) = F(x, a_2, \dots, a_n)$ for all $x \in \mathbb{R}$. Hence any point will have other points nearby which output the same value under $F$. Suppose $\det(A) \neq 0$; to calculate the inverse mapping formula we solve $F(x) = y$ for $x$: clearly $x = A^{-1}(y-b)$, hence $F^{-1}(y) = A^{-1}(y-b)$.
In Munkres the inverse function theorem is given for $r$-times differentiable functions. In short, a $C^r$ function with invertible differential at a point has a $C^r$ inverse function local to the point. Edwards also has arguments for $r > 1$; see page 202 and the surrounding arguments.
A function cannot have two outputs for a single input; when we write $\pm$ in an expression such as $y = \pm\sqrt{1-x^2}$ (from solving $x^2 + y^2 = 1$ for $y$) it simply indicates our ignorance as to which is chosen. Once further information is given then we may be able to choose a $+$ or a $-$. For example:
1. if $x^2 + y^2 = 1$ and we want to solve for $y$ near $(0,1)$ then $y = \sqrt{1-x^2}$ is the correct choice since $y > 0$ at the point of interest.
2. if $x^2 + y^2 = 1$ and we want to solve for $y$ near $(0,-1)$ then $y = -\sqrt{1-x^2}$ is the correct choice since $y < 0$ at the point of interest.
3. if $x^2 + y^2 = 1$ and we want to solve for $y$ near $(1,0)$ then it's impossible to find a single function which reproduces $x^2 + y^2 = 1$ on an open disk centered at $(1,0)$.
What is the defect of case (3.) ? The trouble is that no matter how close we zoom in to the point
there are always two y-values for each given x-value. Geometrically, this suggests either we have a
discontinuity, a kink, or a vertical tangent in the graph. The given problem has a vertical tangent
and hopefully you can picture this with ease since it's just the unit circle. In calculus I we studied implicit differentiation; our starting point was to assume $y = y(x)$ and then we differentiated equations to work out implicit formulas for $dy/dx$. Take the unit circle and differentiate both sides,
$$x^2 + y^2 = 1 \quad\Rightarrow\quad 2x + 2y\frac{dy}{dx} = 0 \quad\Rightarrow\quad \frac{dy}{dx} = -\frac{x}{y}.$$
Note $\frac{dy}{dx}$ is not defined for $y = 0$. It's no accident that those two points $(-1,0)$ and $(1,0)$ are precisely the points at which we cannot solve for $y$ as a function of $x$. Apparently, the singularity in the derivative indicates where we may have trouble solving an equation for one variable as a function of the remaining variable.
We wish to study this problem in general. Given n-equations in (m+n)-unknowns when can we solve
for the last n-variables as functions of the first m-variables ? Given a continuously differentiable
mapping G = (G1 , G2 , . . . , Gn ) : Rm × Rn → Rn study the level set: (here k1 , k2 , . . . , kn are
constants)
G1 (x1 , . . . , xm , y1 , . . . , yn ) = k1
G2 (x1 , . . . , xm , y1 , . . . , yn ) = k2
..
.
Gn (x1 , . . . , xm , y1 , . . . , yn ) = kn
We wish to locally solve for y1 , . . . , yn as functions of x1 , . . . xm . That is, find a mapping h : Rm →
Rn such that G(x, y) = k iff y = h(x) near some point (a, b) ∈ Rm × Rn such that G(a, b) = k. In
this section we use the notation x = (x1 , x2 , . . . xm ) and y = (y1 , y2 , . . . , yn ).
Before we turn to the general problem let’s analyze the unit-circle problem in this notation. We
are given G(x, y) = x2 + y 2 and we wish to find f (x) such that y = f (x) solves G(x, y) = 1.
Differentiate with respect to x and use the chain-rule:
$$\frac{\partial G}{\partial x}\frac{dx}{dx} + \frac{\partial G}{\partial y}\frac{dy}{dx} = 0$$
We find that dy/dx = −Gx /Gy = −x/y. Given this analysis we should suspect that if we are
given some level curve G(x, y) = k then we may be able to solve for y as a function of x near p
if G(p) = k and Gy (p) 6= 0. This suspicion is valid and it is one of the many consequences of the
implicit function theorem.
here we have the matrix multiplication of the $n\times(m+n)$ matrix $G'(a,b)$ with the $(m+n)\times 1$ column vector $(x-a, y-b)$ to yield an $n$-component column vector. It is convenient to define partial derivatives with respect to a whole vector of variables,
$$\frac{\partial G}{\partial x} = \begin{bmatrix} \frac{\partial G_1}{\partial x_1} & \cdots & \frac{\partial G_1}{\partial x_m} \\ \vdots & & \vdots \\ \frac{\partial G_n}{\partial x_1} & \cdots & \frac{\partial G_n}{\partial x_m} \end{bmatrix} \qquad \frac{\partial G}{\partial y} = \begin{bmatrix} \frac{\partial G_1}{\partial y_1} & \cdots & \frac{\partial G_1}{\partial y_n} \\ \vdots & & \vdots \\ \frac{\partial G_n}{\partial y_1} & \cdots & \frac{\partial G_n}{\partial y_n} \end{bmatrix}$$
In this notation we can write the $n\times(m+n)$ matrix $G'(a,b)$ as the concatenation of the $n\times m$ matrix $\frac{\partial G}{\partial x}(a,b)$ and the $n\times n$ matrix $\frac{\partial G}{\partial y}(a,b)$:
$$G'(a,b) = \left[ \frac{\partial G}{\partial x}(a,b) \,\Big|\, \frac{\partial G}{\partial y}(a,b) \right]$$
Therefore, for points close to $(a,b)$ we have:
$$G(x,y) \approx k + \frac{\partial G}{\partial x}(a,b)(x-a) + \frac{\partial G}{\partial y}(a,b)(y-b)$$
The nonlinear problem $G(x,y) = k$ has been (locally) replaced by the linear problem of solving what follows for $y$:
$$k \approx k + \frac{\partial G}{\partial x}(a,b)(x-a) + \frac{\partial G}{\partial y}(a,b)(y-b) \qquad (3.1)$$
Suppose the square matrix $\frac{\partial G}{\partial y}(a,b)$ is invertible at $(a,b)$; then we find the following approximation for the implicit solution of $G(x,y) = k$ for $y$ as a function of $x$:
$$y = b - \left[\frac{\partial G}{\partial y}(a,b)\right]^{-1}\left[\frac{\partial G}{\partial x}(a,b)(x-a)\right].$$
Of course this is not a formal proof, but it does suggest that $\det\frac{\partial G}{\partial y}(a,b) \neq 0$ is a necessary condition for solving for the $y$ variables.
we made use of the identity $\frac{\partial x_i}{\partial x_k} = \delta_{ik}$ to squash the sum over $i$ to the single nontrivial term, and the zero on the r.h.s. follows from the fact that $\frac{\partial}{\partial x_l}(k) = 0$. Concatenate these derivatives from $k = 1$ up to $k = m$:
$$\left[ \frac{\partial G}{\partial x_1} + \sum_{j=1}^n\frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_1} \,\Bigg|\, \frac{\partial G}{\partial x_2} + \sum_{j=1}^n\frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_2} \,\Bigg|\, \cdots \,\Bigg|\, \frac{\partial G}{\partial x_m} + \sum_{j=1}^n\frac{\partial G}{\partial y_j}\frac{\partial h_j}{\partial x_m} \right] = [\,0\,|\,0\,|\,\cdots\,|\,0\,]$$
The concatenation property of matrix multiplication states $[Ab_1|Ab_2|\cdots|Ab_m] = A[b_1|b_2|\cdots|b_m]$; we use this to write the expression once more,
$$\frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}\left[\frac{\partial h}{\partial x_1}\,\Big|\,\frac{\partial h}{\partial x_2}\,\Big|\,\cdots\,\Big|\,\frac{\partial h}{\partial x_m}\right] = 0 \;\;\Rightarrow\;\; \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}\frac{\partial h}{\partial x} = 0 \;\;\Rightarrow\;\; \frac{\partial h}{\partial x} = -\left[\frac{\partial G}{\partial y}\right]^{-1}\frac{\partial G}{\partial x}$$
where in the last implication we made use of the assumption that $\frac{\partial G}{\partial y}$ is invertible.
Theorem 3.2.1. (Theorem 3.4 in Edwards’s Text see pg 190)
We will not attempt a proof of the last sentence for the same reasons we did not pursue the details
in the inverse function theorem. However, we have already derived the first step in the iteration in
our study of the linearization solution.
To sketch the idea, define $F(x,y) = (x, G(x,y))$; one can check $F'(a,b)$ is invertible precisely when $\frac{\partial G}{\partial y}(a,b)$ is invertible, so the inverse function theorem applies to the function $F$ at $(a,b)$. Therefore, there exists $F^{-1} : V \subseteq \mathbb{R}^m\times\mathbb{R}^n \to U \subseteq \mathbb{R}^m\times\mathbb{R}^n$ such that $F^{-1}$ is continuously differentiable. Note $(a,b) \in U$ and $V$ contains the point $F(a,b) = (a, G(a,b)) = (a,k)$. Moreover,
$$F^{-1}(F(x,y)) = (x,y) \qquad \text{and} \qquad F(F^{-1}(u,v)) = (u,v)$$
for all $(x,y) \in U$ and $(u,v) \in V$. As usual, to find the formula for the inverse we can solve $F(x,y) = (u,v)$ for $(x,y)$; this means we wish to solve $(x, G(x,y)) = (u,v)$, hence $x = u$. The formula for $y$ is more elusive, but we know it exists by the inverse function theorem. Let's say $y = H(u,v)$ where $H : V \to \mathbb{R}^n$, and thus $F^{-1}(u,v) = (u, H(u,v))$. Consider then,
$$(u,v) = F(F^{-1}(u,v)) = F(u, H(u,v)) = (u, G(u, H(u,v))).$$
Let $v = k$, thus $(u,k) = (u, G(u, H(u,k)))$ for all $(u,k) \in V$. Finally, define $h(u) = H(u,k)$ and note that $k = G(u, h(u))$. In particular, $(a,k) \in V$ and at that point we find $h(a) = H(a,k) = b$ by construction. It follows that $y = h(x)$ provides a continuously differentiable solution of $G(x,y) = k$ near $(a,b)$.
Uniqueness of the solution follows from the uniqueness for the limit of the sequence of functions
described in Edwards’ text on page 192. However, other arguments for uniqueness can be offered,
independent of the iterative method, for instance: see page 75 of Munkres Analysis on Manifolds.
Remark 3.2.2. notation and the implementation of the implicit function theorem.
Example 3.2.3. Suppose G(x, y, z) = x2 + y 2 + z 2 . Suppose we are given a point (a, b, c) such
that G(a, b, c) = R2 for a constant R. Problem: For which variable can we solve? What, if
any, influence does the given point have on our answer? Solution: to begin, we have one
equation and three unknowns so we should expect to find one of the variables as functions of the
remaining two variables. The implicit function theorem applies as $G$ is continuously differentiable. Note $\nabla G = (2x, 2y, 2z)$, so $G_z = 2z$ vanishes exactly on the $xy$-plane: the point has no local solution for $z$ if it is a point on the intersection of the $xy$-plane and the sphere $G(x,y,z) = R^2$. Likewise, we cannot solve for $y = y(x,z)$ on the $y = 0$ slice of the sphere and we cannot solve for $x = x(y,z)$ on the $x = 0$ slice of the sphere.
Notice, algebra verifies the conclusions we reached via the implicit function theorem:
$$z = \pm\sqrt{R^2 - x^2 - y^2} \qquad y = \pm\sqrt{R^2 - x^2 - z^2} \qquad x = \pm\sqrt{R^2 - y^2 - z^2}$$
When we are at zero for one of the coordinates then we cannot choose $+$ or $-$ since we need both on an open ball intersected with the sphere centered at such a point (if you consider $G(x,y,z) = R^2$ as a space then the open sets on the space are taken to be the intersections of the space with open balls in $\mathbb{R}^3$; this is called the subspace topology in topology courses). Remember, when I talk about local solutions I mean solutions which exist over the intersection of the solution set and an open
ball in the ambient space (R3 in this context). The preceding example is the natural extension of
the unit-circle example to R3 . A similar result is available for the n-sphere in Rn . I hope you get
the point of the example: if we have one equation and we wish to solve for a particular variable in terms of the remaining variables, then all we need is continuous differentiability of the level function and a nonzero partial derivative at the point where we wish to find the solution. Now, the implicit function theorem doesn't find the solution for us, but it does provide the existence. In the section on implicit differentiation, existence is really all we need since we focus our attention on rates of change rather than actual solutions to the level set equation.
Example 3.2.4. Consider the equation $e^{xy} + z^3 - xyz = 2$. Can we solve this equation for $z = z(x,y)$ near $(0,0,1)$? Let $G(x,y,z) = e^{xy} + z^3 - xyz$ and note $G(0,0,1) = e^0 + 1 + 0 = 2$, hence $(0,0,1)$ is a point on the solution set $G(x,y,z) = 2$. Note $G$ is clearly continuously differentiable and
$$G_z(x,y,z) = 3z^2 - xy \quad\Rightarrow\quad G_z(0,0,1) = 3 \neq 0,$$
therefore there exists a continuously differentiable function $h : \mathrm{dom}(h) \subseteq \mathbb{R}^2 \to \mathbb{R}$ which solves $G(x,y,h(x,y)) = 2$ for $(x,y)$ near $(0,0)$ and $h(0,0) = 1$.
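A short numerical sketch of this conclusion (assuming sympy is available; the sample point is my own choice): a root-finder solves $G(x,y,z) = 2$ for $z$ near $1$ when $(x,y)$ is near $(0,0)$, and the implicit derivative $\partial z/\partial x = -G_x/G_z$ can be read off at the solution.

    # Sketch: solve exp(x*y) + z^3 - x*y*z = 2 for z = h(x,y) near (0,0,1),
    # then evaluate dz/dx = -G_x / G_z at that point.
    import sympy as sp

    x, y, z = sp.symbols('x y z', real=True)
    G = sp.exp(x*y) + z**3 - x*y*z

    x0, y0 = 0.1, -0.2
    z0 = sp.nsolve(G.subs({x: x0, y: y0}) - 2, z, 1.0)   # z near 1
    print(z0)

    Gx, Gz = G.diff(x), G.diff(z)
    dzdx = (-Gx/Gz).subs({x: x0, y: y0, z: z0})
    print(sp.N(dzdx))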
Consider next $G(x,y,z) = (x+y+z,\; y+z)$ and the level equation $G(x,y,z) = (2,1)$. If we wish to solve for $x$ and $y$ as functions of $z$ we examine
$$\frac{\partial G}{\partial(x,y)} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$
The matrix above is invertible, hence the implicit function theorem applies and we can solve for $x$ and $y$ as functions of $z$. On the other hand, if we tried to solve for $y = y(x)$ and $z = z(x)$ then we'll get no help from the implicit function theorem as the matrix
$$\frac{\partial G}{\partial(y,z)} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
is not invertible. Geometrically, we can understand these results from noting that G(x, y, z) = (2, 1)
is the intersection of the plane x + y + z = 2 and y + z = 1. Substituting y + z = 1 into x + y + z = 2
yields x + 1 = 2 hence x = 1 on the line of intersection. We can hardly use x as a free variable for
the solution when the problem fixes x from the outset.
The method I just used to analyze the equations in the preceding example was a bit ad hoc. In linear algebra we do much better for systems of linear equations. A procedure called Gaussian elimination naturally reduces a system of equations to a form in which it is manifestly obvious how to eliminate redundant variables in terms of a minimal set of basic free variables. The "y" of the implicit function proof discussions plays the role of the so-called pivotal variables whereas the "x" plays the role of the remaining free variables. These variables are generally intermingled in the list of total variables, so to reproduce the pattern assumed for the implicit function theorem we would need to relabel variables from the outset of a calculation. In the following example, I show how reordering the variables allows us to solve for various pairs. In short, put the dependent variables first and the independent variables second so the Gaussian elimination shows the solution with minimal effort. (A notational side-note: the symbol $\frac{\partial G}{\partial(y,z)}$ used above should not be confused with $\frac{\partial(x,y)}{\partial(u,v)}$, which is used to denote a particular determinant associated with coordinate change of integrals, or the pull-back of a differential form, as explained on page 100 of H.M. Edwards' Advanced Calculus: A Differential Forms Approach; we should discuss it in a later chapter.) Here's how:
Consider the system $u = 3x + 2y + 1$ and $v = 2x + y - 3$, that is, $3x + 2y - u = -1$ and $2x + y - v = 3$. Ordering the variables $(x, y, u, v)$, Gaussian elimination gives:
$$\mathrm{rref}\begin{bmatrix} 3 & 2 & -1 & 0 & -1 \\ 2 & 1 & 0 & -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & -2 & 7 \\ 0 & 1 & -2 & 3 & -11 \end{bmatrix}$$
We can immediately read from the result above that $x, y$ can be taken to depend on $u, v$ via the formulas:
$$x = -u + 2v + 7, \qquad y = 2u - 3v - 11.$$
On the other hand, if we order the variables $(u, v, x, y)$ then Gaussian elimination gives:
$$\mathrm{rref}\begin{bmatrix} -1 & 0 & 3 & 2 & -1 \\ 0 & -1 & 2 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -3 & -2 & 1 \\ 0 & 1 & -2 & -1 & -3 \end{bmatrix}$$
and we can read off
$$u = 3x + 2y + 1, \qquad v = 2x + y - 3.$$
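The reordering trick is easy to experiment with; here is a small sketch (assuming sympy is available, and assuming the underlying equations are $u = 3x+2y+1$ and $v = 2x+y-3$ as read off above) reproducing both reduced forms.

    # Sketch: row reduce the same two equations with the variables ordered
    # (x, y, u, v) and then (u, v, x, y); the pivots land on whichever pair
    # is listed first.
    import sympy as sp

    A_xy_first = sp.Matrix([[ 3,  2, -1,  0, -1],
                            [ 2,  1,  0, -1,  3]])
    A_uv_first = sp.Matrix([[-1,  0,  3,  2, -1],
                            [ 0, -1,  2,  1,  3]])

    print(A_xy_first.rref()[0])   # rows read: x + u - 2v = 7,  y - 2u + 3v = -11
    print(A_uv_first.rref()[0])   # rows read: u - 3x - 2y = 1, v - 2x - y = -3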
I could solve the problem below in the efficient style above, but I will instead follow the method we discussed in the paragraphs surrounding Equation 3.1. In contrast to the general case, because the problem is linear the solution of Equation 3.1 is also a solution of the actual problem. Let us solve $G(x,y,z,a,b) = (24, 30, 42)$ for $x(a,b)$, $y(a,b)$, $z(a,b)$ by the method of Equation 3.1. I'll omit the point-dependence of the Jacobian since it clearly has none. Expanding about the point $(1,2,3,4,5)$,
$$G(x,y,z,a,b) = \begin{bmatrix} 24 \\ 30 \\ 42 \end{bmatrix} + \frac{\partial G}{\partial(x,y,z)}\begin{bmatrix} x-1 \\ y-2 \\ z-3 \end{bmatrix} + \frac{\partial G}{\partial(a,b)}\begin{bmatrix} a-4 \\ b-5 \end{bmatrix}$$
Let me make the notational chimera above explicit:
$$G(x,y,z,a,b) = \begin{bmatrix} 24 \\ 30 \\ 42 \end{bmatrix} + \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix}\begin{bmatrix} x-1 \\ y-2 \\ z-3 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} a-4 \\ b-5 \end{bmatrix}$$
To solve $G(x,y,z,a,b) = (24,30,42)$ for $(x,y,z)$ we may use the expression above. After a little calculation one finds:
$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix}^{-1} = \frac{1}{3}\begin{bmatrix} -4 & 1 & 2 \\ 5 & -2 & -1 \\ 2 & 1 & -1 \end{bmatrix}$$
The constant term cancels and we find:
$$\begin{bmatrix} x-1 \\ y-2 \\ z-3 \end{bmatrix} = -\frac{1}{3}\begin{bmatrix} -4 & 1 & 2 \\ 5 & -2 & -1 \\ 2 & 1 & -1 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} a-4 \\ b-5 \end{bmatrix}$$
Multiplying the matrices gives:
$$\begin{bmatrix} x-1 \\ y-2 \\ z-3 \end{bmatrix} = -\frac{1}{3}\begin{bmatrix} 0 & 3 \\ 3 & 0 \\ 3 & 3 \end{bmatrix}\begin{bmatrix} a-4 \\ b-5 \end{bmatrix} = -\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} a-4 \\ b-5 \end{bmatrix} = \begin{bmatrix} 5-b \\ 4-a \\ 9-a-b \end{bmatrix}$$
Therefore,
$$x = 6 - b, \qquad y = 6 - a, \qquad z = 12 - a - b.$$
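As a sanity check, here is a sketch (assuming sympy is available; the equations are written out from the Jacobian blocks above) solving the linear system symbolically, which reproduces $x = 6-b$, $y = 6-a$, $z = 12-a-b$.

    # Sketch: solve G(x,y,z,a,b) = (24, 30, 42) for x, y, z with a, b symbolic.
    import sympy as sp

    x, y, z, a, b = sp.symbols('x y z a b')
    eqs = [x + y + z + 2*a + 2*b - 24,
           x + 2*z + 2*a + 3*b - 30,
           3*x + 2*y + z + 3*a + 4*b - 42]

    sol = sp.solve(eqs, [x, y, z], dict=True)[0]
    print(sol)   # {x: 6 - b, y: 6 - a, z: -a - b + 12}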
Is it possible to solve for any triple of the variables $x, y, z, a, b$ for the given system? In fact, no. Let me explain by linear algebra. The augmented coefficient matrix for $G(x,y,z,a,b) = (24,30,42)$ Gaussian eliminates as follows:
$$\mathrm{rref}\begin{bmatrix} 1 & 1 & 1 & 2 & 2 & 24 \\ 1 & 0 & 2 & 2 & 3 & 30 \\ 3 & 2 & 1 & 3 & 4 & 42 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 6 \\ 0 & 1 & 0 & 1 & 0 & 6 \\ 0 & 0 & 1 & 1 & 1 & 12 \end{bmatrix}.$$
First, note this is consistent with the answer we derived above. Second, examine the columns of $\mathrm{rref}[G']$. You can ignore the 6th column in the interest of this thought extending to nonlinear systems. The question of the suitability of a triple amounts to the invertibility of the submatrix of $G'$ which corresponds to the triple. Examine:
$$\frac{\partial G}{\partial(y,z,a)} = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 2 \\ 2 & 1 & 3 \end{bmatrix}, \qquad \frac{\partial G}{\partial(x,z,b)} = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 3 & 1 & 4 \end{bmatrix};$$
both of these are clearly singular since the third column is the sum of the first two columns. Alter-
natively, you can calculate the determinant of each of the matrices above is zero. In contrast,
$$\frac{\partial G}{\partial(z,a,b)} = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 2 & 3 \\ 1 & 3 & 4 \end{bmatrix}$$
is non-singular. How do I know there is no linear dependence? Well, we could calculate the determinant: $1(8-9) - 2(8-3) + 2(6-2) = -3 \neq 0$. Or, we could examine the row reduction
above. The column correspondence property (I like to call it the CCP in my linear notes) states that linear dependences amongst columns of a matrix are preserved under row reduction. This means we can easily deduce dependence (if there is any) from the reduced matrix. Observe that column 4 is clearly the sum of columns 2 and 3. Likewise, column 5 is the sum of columns 1 and 3. On the other hand, columns 3, 4, 5 admit no linear dependence. In general, more calculation would be required to "see" the independence of the far right columns: one reorders the columns and performs a new reduction to ascertain dependence.
No such calculation is needed here since the problem is not that complicated.
I find calculating the determinant of sub-Jacobian matrices is the simplest way for most students
to quickly understand. I’ll showcase this method in a series of examples attached to a later section.
I have made use of some matrix theory in this section. If you didn’t learn it in linear (or haven’t
taken linear yet) it’s worth learning. These are nice tools to keep for later problems in life.
3.3 implicit differentiation
Example 3.3.1. Let's return to a common calculus III problem. Suppose $F(x,y,z) = k$ for some constant $k$. Find partial derivatives of $x$, $y$ or $z$ with respect to the remaining variables.
Solution: I'll use the method of differentials once more:
$$dF = F_x\,dx + F_y\,dy + F_z\,dz = 0$$
We can solve for $dx$, $dy$ or $dz$ provided $F_x$, $F_y$ or $F_z$ is nonzero, respectively, and these differential expressions reveal various partial derivatives of interest:
$$dx = -\frac{F_y}{F_x}dy - \frac{F_z}{F_x}dz \quad\Rightarrow\quad \frac{\partial x}{\partial y} = -\frac{F_y}{F_x} \;\;\&\;\; \frac{\partial x}{\partial z} = -\frac{F_z}{F_x}$$
$$dy = -\frac{F_x}{F_y}dx - \frac{F_z}{F_y}dz \quad\Rightarrow\quad \frac{\partial y}{\partial x} = -\frac{F_x}{F_y} \;\;\&\;\; \frac{\partial y}{\partial z} = -\frac{F_z}{F_y}$$
$$dz = -\frac{F_x}{F_z}dx - \frac{F_y}{F_z}dy \quad\Rightarrow\quad \frac{\partial z}{\partial x} = -\frac{F_x}{F_z} \;\;\&\;\; \frac{\partial z}{\partial y} = -\frac{F_y}{F_z}$$
In each case above, the implicit function theorem allows us to solve for one variable in terms of the remaining two. If the partial derivative of $F$ appearing in the denominator is zero then the implicit function theorem does not apply and other thoughts are required. Often calculus texts give the following as a homework problem:
$$\frac{\partial x}{\partial y}\frac{\partial y}{\partial z}\frac{\partial z}{\partial x} = \Big(-\frac{F_y}{F_x}\Big)\Big(-\frac{F_z}{F_y}\Big)\Big(-\frac{F_x}{F_z}\Big) = -1.$$
In the equation above we have $x$ appearing as a dependent variable on $y, z$ and also as an independent variable for the dependent variable $z$. These mixed expressions are actually of interest to engineering and physics. The less ambiguous notation below helps better handle such expressions:
$$\frac{\partial x}{\partial y}\Big|_z\,\frac{\partial y}{\partial z}\Big|_x\,\frac{\partial z}{\partial x}\Big|_y = -1.$$
In each part of the expression we have clearly denoted which variables are taken to depend on the others and in turn what sort of partial derivative we mean to indicate. Partial derivatives are not taken alone; they must be done in concert with an understanding of the totality of the independent variables for the problem. We hold all the remaining independent variables fixed as we take a partial derivative.
The explicit independent variable notation is more important for problems where we can choose more than one set of independent variables for a given dependent variable. In the example that follows we study $w = w(x,y)$, but we could just as well consider $w = w(x,z)$. Generally it will not be the case that $\frac{\partial w}{\partial x}\big|_y$ is the same as $\frac{\partial w}{\partial x}\big|_z$. In calculating $\frac{\partial w}{\partial x}\big|_y$ we hold $y$ constant as we vary $x$, whereas in $\frac{\partial w}{\partial x}\big|_z$ we hold $z$ constant as we vary $x$. There is no reason these ought to be the same. (A good exercise would be to do the example over, but instead aim to calculate partial derivatives of $y, w$ with respect to the independent variables $x, z$.)
Consider the system $x + y + z + w = c_1$ and $x^2 - 2xyz + w^3 = c_2$ for constants $c_1, c_2$, where we aim to describe $z, w$ as functions of $x, y$. Taking differentials of both equations yields:
$$dx + dy + dz + dw = 0$$
$$(2x - 2yz)dx - 2xz\,dy - 2xy\,dz + 3w^2dw = 0$$
We can solve for $(dz, dw)$. In this calculation we can treat the differentials as formal variables.
$$dz + dw = -dx - dy$$
$$-2xy\,dz + 3w^2dw = -(2x-2yz)dx + 2xz\,dy$$
I find matrix notation is often helpful,
$$\begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix}\begin{bmatrix} dz \\ dw \end{bmatrix} = \begin{bmatrix} -dx - dy \\ -(2x-2yz)dx + 2xz\,dy \end{bmatrix}$$
Use Cramer's rule, multiplication by the inverse, substitution, adding/subtracting equations, etc.: whatever technique of solving linear equations you prefer. Our goal is to solve for $dz$ and $dw$ in terms of $dx$ and $dy$. I'll use Cramer's rule this time:
$$dz = \frac{\det\begin{bmatrix} -dx-dy & 1 \\ -(2x-2yz)dx + 2xz\,dy & 3w^2 \end{bmatrix}}{\det\begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix}} = \frac{3w^2(-dx-dy) + (2x-2yz)dx - 2xz\,dy}{3w^2 + 2xy}$$
Collecting terms,
$$dz = \frac{-3w^2 + 2x - 2yz}{3w^2 + 2xy}\,dx + \frac{-3w^2 - 2xz}{3w^2 + 2xy}\,dy$$
From the expression above we can read various implicit derivatives,
$$\frac{\partial z}{\partial x}\Big|_y = \frac{-3w^2 + 2x - 2yz}{3w^2 + 2xy} \qquad \& \qquad \frac{\partial z}{\partial y}\Big|_x = \frac{-3w^2 - 2xz}{3w^2 + 2xy}$$
You should ask: where did we use the implicit function theorem in the preceding example? Notice our underlying hope is that we can solve for $z = z(x,y)$ and $w = w(x,y)$. The implicit function theorem states this is possible precisely when
$$\frac{\partial G}{\partial(z,w)} = \begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix}$$
is nonsingular. Interestingly, this is the same matrix we must consider to isolate $dz$ and $dw$. The calculations of the example are only meaningful if $\det\begin{bmatrix} 1 & 1 \\ -2xy & 3w^2 \end{bmatrix} \neq 0$. In such a case the implicit function theorem applies and it is reasonable to suppose $z, w$ can be written as functions of $x, y$.
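The same bookkeeping can be automated; here is a small sketch (assuming sympy is available, using the system of differentials from the example above) which solves for $dz$ symbolically and recovers the coefficients we just found, up to rearrangement.

    # Sketch: treat dx, dy, dz, dw as formal symbols and solve
    #   dz + dw = -dx - dy
    #   -2xy dz + 3w^2 dw = -(2x - 2yz) dx + 2xz dy
    # for dz; the dx and dy coefficients are the implicit partial derivatives.
    import sympy as sp

    x, y, z, w, dx, dy, dz, dw = sp.symbols('x y z w dx dy dz dw')

    eq1 = sp.Eq(dz + dw, -dx - dy)
    eq2 = sp.Eq(-2*x*y*dz + 3*w**2*dw, -(2*x - 2*y*z)*dx + 2*x*z*dy)

    sol = sp.solve([eq1, eq2], [dz, dw])
    dz_expr = sp.expand(sol[dz])
    print(sp.simplify(dz_expr.coeff(dx)))   # (-3w^2 + 2x - 2yz)/(3w^2 + 2xy)
    print(sp.simplify(dz_expr.coeff(dy)))   # (-3w^2 - 2xz)/(3w^2 + 2xy)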
Example 3.3.4. Suppose E = pv + t2 then dE = vdp + pdv + 2tdt. In this example the dependent
variable is E whereas the independent variables are p, v and t.
Example 3.3.5. Problem: what are $\partial F/\partial x$ and $\partial F/\partial y$ if we know that $F = F(x,y)$ and $dF = (x^2+y)dx - \cos(xy)dy$?
Solution: if $F = F(x,y)$ then the total differential has the form $dF = F_x\,dx + F_y\,dy$. We simply compare the general form to the given $dF = (x^2+y)dx - \cos(xy)dy$ to obtain:
$$\frac{\partial F}{\partial x} = x^2 + y, \qquad \frac{\partial F}{\partial y} = -\cos(xy).$$
Example 3.3.6. Suppose $w = xyz$; then $dw = yz\,dx + xz\,dy + xy\,dz$. On the other hand, we can solve for $z = z(x,y,w)$:
$$z = \frac{w}{xy} \quad\Rightarrow\quad dz = -\frac{w}{x^2y}dx - \frac{w}{xy^2}dy + \frac{1}{xy}dw. \qquad (\star)$$
If we solve $dw = yz\,dx + xz\,dy + xy\,dz$ directly for $dz$ we obtain:
$$dz = -\frac{z}{x}dx - \frac{z}{y}dy + \frac{1}{xy}dw. \qquad (\star\star)$$
Are $\star$ and $\star\star$ consistent? Well, yes. Note $\frac{w}{x^2y} = \frac{xyz}{x^2y} = \frac{z}{x}$ and $\frac{w}{xy^2} = \frac{xyz}{xy^2} = \frac{z}{y}$.
Which variables are independent/dependent in the example above? It depends. In the initial portion of the example we treated $x, y, z$ as independent whereas $w$ was dependent. But, in the last half we treated $x, y, w$ as independent and $z$ was the dependent variable. Consider this: if I ask you what the value of $\frac{\partial z}{\partial x}$ is in the example above then this question is ambiguous!
$$\underbrace{\frac{\partial z}{\partial x} = 0}_{z\ \text{independent of}\ x} \qquad \text{versus} \qquad \underbrace{\frac{\partial z}{\partial x} = \frac{-z}{x}}_{z\ \text{depends on}\ x}$$
Obviously this sort of ambiguity is rather unpleasant. A natural solution to this trouble is simply to write a bit more when variables are used in multiple contexts. In particular,
$$\underbrace{\frac{\partial z}{\partial x}\Big|_{y,z} = 0}_{\text{means}\ x,y,z\ \text{independent}} \qquad \text{is different than} \qquad \underbrace{\frac{\partial z}{\partial x}\Big|_{y,w} = \frac{-z}{x}}_{\text{means}\ x,y,w\ \text{independent}}.$$
(I invite the reader to verify the notation "defined" in this section is in fact totally sympatico with our previous definitions.)
The key concept is that all the other independent variables are held fixed as an independent variable is partially differentiated. Holding $y, z$ fixed as $x$ varies means $z$ does not change, hence $\frac{\partial z}{\partial x}\big|_{y,z} = 0$. On the other hand, if we hold $y, w$ fixed as $x$ varies then the change in $z$ need not be trivial: $\frac{\partial z}{\partial x}\big|_{y,w} = \frac{-z}{x}$. Let me expand on how this notation interfaces with the total differential.
Definition 3.3.7.
If $w, x, y, z$ are variables then
$$dw = \frac{\partial w}{\partial x}\Big|_{y,z}dx + \frac{\partial w}{\partial y}\Big|_{x,z}dy + \frac{\partial w}{\partial z}\Big|_{x,y}dz.$$
Alternatively,
$$dx = \frac{\partial x}{\partial w}\Big|_{y,z}dw + \frac{\partial x}{\partial y}\Big|_{w,z}dy + \frac{\partial x}{\partial z}\Big|_{w,y}dz.$$
The larger idea here is that we can identify partial derivatives from the coefficients in equations of
differentials. I’d say a differential equation but you might get the wrong idea... Incidentally, there
is a whole theory of solving differential equations by clever use of differentials, I have books if you
are interested.
Example 3.3.8. Suppose $w = x + y + z$ and $x + y = wz$. Calculate $\frac{\partial w}{\partial x}\big|_y$ and $\frac{\partial w}{\partial x}\big|_z$. Notice we must choose dependent and independent variables to make sense of the partial derivatives in question.
1. suppose $w, z$ both depend on $x, y$. Calculate,
$$\frac{\partial w}{\partial x}\Big|_y = \frac{\partial}{\partial x}\Big|_y(x+y+z) = \frac{\partial x}{\partial x}\Big|_y + \frac{\partial y}{\partial x}\Big|_y + \frac{\partial z}{\partial x}\Big|_y = 1 + 0 + \frac{\partial z}{\partial x}\Big|_y \qquad (\star)$$
Therefore,
$$\frac{\partial w}{\partial x}\Big|_z = 2.$$
I hope you can begin to see how the game is played. Basically the example above generalizes the
idea of implicit differentiation to several equations of many variables. This is actually a pretty
important type of calculation for engineering. The study of thermodynamics is full of variables
which are intermittently used as either dependent or independent variables. The so-called equation
of state can be given in terms of about a dozen distinct sets of state variables.
Example 3.3.9. The ideal gas law states that for a fixed number of particles $n$ the pressure $P$, volume $V$ and temperature $T$ are related by $PV = nRT$ where $R$ is a constant. Calculate,
$$\frac{\partial P}{\partial V}\Big|_T = \frac{\partial}{\partial V}\Big|_T\left(\frac{nRT}{V}\right) = -\frac{nRT}{V^2},$$
$$\frac{\partial V}{\partial T}\Big|_P = \frac{\partial}{\partial T}\Big|_P\left(\frac{nRT}{P}\right) = \frac{nR}{P},$$
$$\frac{\partial T}{\partial P}\Big|_V = \frac{\partial}{\partial P}\Big|_V\left(\frac{PV}{nR}\right) = \frac{V}{nR}.$$
You might expect that $\frac{\partial P}{\partial V}\big|_T\frac{\partial V}{\partial T}\big|_P\frac{\partial T}{\partial P}\big|_V = 1$. Is it true?
$$\frac{\partial P}{\partial V}\Big|_T\,\frac{\partial V}{\partial T}\Big|_P\,\frac{\partial T}{\partial P}\Big|_V = -\frac{nRT}{V^2}\cdot\frac{nR}{P}\cdot\frac{V}{nR} = \frac{-nRT}{PV} = -1.$$
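A sketch confirming the minus sign (assuming sympy is available; this just repeats the calculation above symbolically):

    # Sketch: for P V = n R T, the cyclic product of the three partial
    # derivatives is -1, not +1.
    import sympy as sp

    P, V, T, n, R = sp.symbols('P V T n R', positive=True)

    dPdV = sp.diff(n*R*T/V, V)        # holding T fixed
    dVdT = sp.diff(n*R*T/P, T)        # holding P fixed
    dTdP = sp.diff(P*V/(n*R), P)      # holding V fixed

    prod = dPdV * dVdT * dTdP
    print(sp.simplify(prod))                          # -n*R*T/(P*V)
    print(sp.simplify(prod.subs(T, P*V/(n*R))))       # -1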
For a level curve $F(x,y) = k$ the analogous two-factor product carries no sign surprise:
$$\frac{\partial x}{\partial y}\frac{\partial y}{\partial x} = \Big(-\frac{F_y}{F_x}\Big)\Big(-\frac{F_x}{F_y}\Big) = 1$$
for $(x,y)$ such that $F_x \neq 0$ and $F_y \neq 0$. The condition $F_y \neq 0$ suggests we can solve for $y = y(x)$ whereas the condition $F_x \neq 0$ suggests we can solve for $x = x(y)$.
Remark 3.4.1.
I have put remarks about the rank of the derivative in red for the examples below.
Example 3.4.2. Let $f(t) = (t, t^2, t^3)$; then $f'(t) = (1, 2t, 3t^2)$. In this case we have
$$f'(t) = [df_t] = \begin{bmatrix} 1 \\ 2t \\ 3t^2 \end{bmatrix}$$
The Jacobian here is a single column vector. It has rank 1 provided the vector is nonzero. We see that $f'(t) \neq (0,0,0)$ for all $t \in \mathbb{R}$. This corresponds to the fact that this space curve has a well-defined tangent line for each point on the path.
Example 3.4.3. Let f(~x, ~y) = ~x · ~y be a mapping from R³ × R³ → R. I'll denote the coordinates
in the domain by (x1, x2, x3, y1, y2, y3) thus f(~x, ~y) = x1y1 + x2y2 + x3y3. Calculate,

    f'(~x, ~y) = [y1, y2, y3, x1, x2, x3]

The Jacobian here is a single row vector. It has rank 1 provided it is nonzero; that is, provided
not both ~x and ~y are the zero vector.
Example 3.4.4. Let f(~x, ~y) = ~x · ~y be a mapping from Rⁿ × Rⁿ → R. I'll denote the coordinates
in the domain by (x1, . . . , xn, y1, . . . , yn) thus f(~x, ~y) = Σ_{i=1}^n x_i y_i. Calculate,

    ∂/∂x_j Σ_{i=1}^n x_i y_i = Σ_{i=1}^n (∂x_i/∂x_j) y_i = Σ_{i=1}^n δ_ij y_i = y_j

Likewise,

    ∂/∂y_j Σ_{i=1}^n x_i y_i = Σ_{i=1}^n x_i (∂y_i/∂y_j) = Σ_{i=1}^n x_i δ_ij = x_j

The Jacobian here is a single row vector, f'(~x, ~y) = [y1, . . . , yn, x1, . . . , xn]. It has rank 1
provided it is nonzero; that is, provided not both ~x and ~y are the zero vector.
Remember these are actually column vectors in my sneaky notation; (v1 , . . . , vn ) = [v1 , . . . , vn ]T .
This means the derivative or Jacobian matrix of F at (x, y, z) is

    F'(x, y, z) = [dF_(x,y,z)] = [yz, xz, xy; 0, 1, 0; 0, 0, 1]

Note, rank(F'(x, y, z)) = 3 for all (x, y, z) ∈ R³ such that y, z ≠ 0. There are a variety of ways to
see that claim; one way is to observe det[F'(x, y, z)] = yz and this determinant is nonzero so long
as neither y nor z is zero. In linear algebra we learn that a square matrix is invertible iff it has
nonzero determinant iff it has linearly independent column vectors.
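For a machine check, here is a minimal sketch assuming sympy. The map F itself is not shown at this point in the extracted notes, so F(x, y, z) = (xyz, y, z) below is my guess, chosen only because it reproduces the displayed Jacobian:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    F = sp.Matrix([x*y*z, y, z])          # assumed form of F, consistent with the matrix above
    J = F.jacobian([x, y, z])
    print(J)                              # Matrix([[y*z, x*z, x*y], [0, 1, 0], [0, 0, 1]])
    print(J.det())                        # y*z, so the rank is 3 away from y = 0 or z = 0
    print(J.subs({x: 1, y: 2, z: 3}).rank())   # 3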
The maximum rank for F' is 2 at a particular point (x, y, z) because there are at most two linearly
independent vectors in R². You can consider the three 2 × 2 submatrices to analyze the rank for
a given point. If any one of their determinants is nonzero then the rank (dimension of the column space) is two.

    M1 = [2x, 0; 0, z]        M2 = [2x, 2z; 0, y]        M3 = [0, 2z; z, y]

We'll need either det(M1) = 2xz ≠ 0 or det(M2) = 2xy ≠ 0 or det(M3) = −2z² ≠ 0. These three
determinants vanish simultaneously exactly when z = 0 and xy = 0; that is, along the x and y
coordinate axes. This mapping has maximal rank at all points off those two axes.
The maximum rank is again 2, this time because we only have two columns. The rank will be two
if the columns are not linearly dependent. We can analyze the question of rank a number of ways
but I find determinants of submatrices a comforting tool in these sorts of questions. If the columns
are linearly dependent then all three 2 × 2 submatrices of F' have zero determinant. Conversely, if even one
of them is nonvanishing then it follows the columns must be linearly independent. The submatrices
for this problem are:

    M1 = [2x, 2y; y, x]        M2 = [2x, 2y; 1, 1]        M3 = [y, x; 1, 1]

You can see det(M1) = 2(x² − y²), det(M2) = 2(x − y) and det(M3) = y − x. Apparently we have
rank(F'(x, y)) = 2 for all (x, y) ∈ R² with y ≠ x. In retrospect this is not surprising.
Example 3.4.8. Let F(x, y) = (x, y, √(R² − x² − y²)) for a constant R. We calculate,

    ∇√(R² − x² − y²) = ( −x/√(R² − x² − y²), −y/√(R² − x² − y²) )

hence

    F'(x, y) = [1, 0; 0, 1; −x/√(R²−x²−y²), −y/√(R²−x²−y²)]

This matrix clearly has rank 2 where it is well-defined. Note that we need R² − x² − y² > 0 for the
derivative to exist. Moreover, we could define G(y, z) = (√(R² − y² − z²), y, z) and calculate,

    G'(y, z) = [−y/√(R²−y²−z²), −z/√(R²−y²−z²); 1, 0; 0, 1]

Observe that G'(y, z) exists when R² − y² − z² > 0. Geometrically, F parametrizes the part of the sphere
above the equator at z = 0 whereas G parametrizes the right-half of the sphere with x > 0. These
parametrizations overlap in the region where both x and z are positive. In particular, dom(F') ∩
dom(G') = {(x, y) ∈ R² | x, y > 0 and x² + y² < R²}.
Example 3.4.9. Let F(x, y, z) = (x, y, z, √(R² − x² − y² − z²)) for a constant R. We calculate,

    ∇√(R² − x² − y² − z²) = ( −x/√(R²−x²−y²−z²), −y/√(R²−x²−y²−z²), −z/√(R²−x²−y²−z²) )

This matrix clearly has rank 3 where it is well-defined. Note that we need R² − x² − y² − z² > 0 for the
derivative to exist. This mapping gives us a parametrization of the 3-sphere x² + y² + z² + w² = R²
for w > 0. (drawing this is a little trickier)
This matrix fails to have rank 3 if x, y or z are zero. In other words, f'(x, y, z) has rank 3 in
R³ provided we are at a point which is not on some coordinate plane. (the coordinate planes are
x = 0, y = 0 and z = 0, that is, the yz, zx and xy coordinate planes respectively)
Example 3.4.13. Let X(u, v) = (x, y, z) where x, y, z denote functions of u, v and I prefer to omit
the explicit dependence to reduce clutter in the equations to follow.

    ∂X/∂u = X_u = (x_u, y_u, z_u)        and        ∂X/∂v = X_v = (x_v, y_v, z_v)

Then the Jacobian is the 3 × 2 matrix

    dX_(u,v) = [x_u, x_v; y_u, y_v; z_u, z_v]

The matrix dX_(u,v) has rank 2 if at least one of the determinants below is nonzero,

    det[x_u, x_v; y_u, y_v]        det[x_u, x_v; z_u, z_v]        det[y_u, y_v; z_u, z_v]
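Here is a short sketch, assuming sympy, of this rank test for a concrete parametrized surface; the saddle X(u, v) = (u, v, uv) is my own choice of illustration, not one taken from the notes:

    import sympy as sp

    u, v = sp.symbols('u v')
    X = sp.Matrix([u, v, u*v])
    dX = X.jacobian([u, v])          # the 3 x 2 matrix [[1, 0], [0, 1], [v, u]]
    # the three 2 x 2 minors discussed above:
    minors = [dX.extract([0, 1], [0, 1]).det(),
              dX.extract([0, 2], [0, 1]).det(),
              dX.extract([1, 2], [0, 1]).det()]
    print(minors)                    # [1, u, -v]; the first never vanishes, so rank 2 everywhere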
Chapter 4: two views of manifolds in Rn
In this chapter we describe spaces inside Rn which are k-dimensional 1. Technically, to make this
precise we would need to study manifolds with boundary. Careful discussion of manifolds with
boundary in euclidean space can be found in Munkres' Analysis on Manifolds. In the interest of
focusing on examples, I'll be a bit fuzzy about the definition of a k-dimensional subspace S of
euclidean space. This much we can say: there are two ways to envision the geometry of S:

(2.) Implicitly: provide a level function G : Rk × Rp → Rp and a value c such that S = G⁻¹{c}. This
viewpoint casts S as the set of points x ∈ Rk × Rp for which G(x) = c. The canonical example:

The canonical examples of (1.) and (2.) are both the x1 . . . xk-coordinate plane embedded in Rn.
Just to take it down a notch: if n = 3 then we could look at the xy-plane in either view as follows:
Which viewpoint should we adopt? What is the dimension of a given space S? How should we
find the tangent space to S? How should we find the normal space to S? These are the questions we
set out to answer in this chapter.
Orthogonal complements help us to understand how all of this fits together. This is possible since we
deal with embedded manifolds for which the euclidean dot-product of Rn is available to sort out the
geometry. Finally, we use this geometry and a few simple lemmas to justify the method of Lagrange
multipliers. Lagrange’s technique paired with the theory of multivariate Taylor polynomials form
the basis for analyzing extrema for multivariate functions. In this chapter we deal with the question
of extrema on the edges of a set. The second half of the story is found in the next chapter where
we deal with the interior points via the theory of quadratic forms applied to the second-order
approximation to a function of several variables.
1 I'll try to stick with this notation for this chapter: n ≥ k and n = p + k
Therefore, if γ(0) = p then γ(0) = (p_x, h(p_x)). Differentiate, using the chain rule in the second factor,
to obtain:

    γ'(t) = (γ_x'(t), h'(γ_x(t)) γ_x'(t)).

We find that the tangent vector to S at p along γ has a rather special form which was forced on us by
the implicit function theorem:

    γ'(0) = (γ_x'(0), h'(p_x) γ_x'(0)).

Or to cut through the notation a bit, if γ'(0) = v = (v_x, v_y) then v = (v_x, h'(p_x) v_x). The second
component of the vector is not free of the first; it is essentially redundant. This makes us suspect
that the tangent space to S at p is k-dimensional.
Theorem 4.2.1.
ψ(t) = Φ(Φ−1 (p) + tw) = Φ(px + tw) = (px + tw, h(px + tw))
is a curve from R to U ⊆ S such that ψ(0) = (px , h(px )) = (px , py ) = p and using the chain rule on
the final form of ψ(t):
ψ 0 (0) = (w, h0 (px )w).
The construction above shows that any vector of the form (vx , h0 (px )vx ) is the tangent vector of a
particular differentiable curve in the level set (differentiability of ψ follows from the differentiability
of h and the other maps which we used to construct ψ). In particular we can apply this to the
case w = v1x + v2x and we find γ(t) = Φ(Φ−1 (p) + t(v1x + v2x )) has γ 0 (0) = v1 + v2 and γ(0) = p.
Likewise, apply the construction to the case w = cv1x to write β(t) = Φ(Φ−1 (p) + t(cv1x )) with
β 0 (0) = cv1 and β(0) = p.
The idea of the proof is encapsulated in the picture below. This idea of mapping lines in a flat
domain to obtain standard curves in a curved domain is an idea which plays over and over as you
study manifold theory. The particular redundancy of the x and y sub-vectors is special to the
discussion of level-sets, however anytime we have a local parametrization we'll be able to construct
curves with tangents of our choosing by essentially the same construction. In fact, there are infinitely
many curves which produce a particular tangent vector in the tangent space of a manifold.
Theorem 4.2.1 shows that the definition given below is logical. In particular, it is not at all obvious
that the sum of two tangent vectors ought to again be a tangent vector. However, that is just what
the Theorem 4.2.1 told us for level-sets2 .
Definition 4.2.2.
Moreover, we define (i.) addition and (ii.) scalar multiplication of vectors by the rules
2 technically, there is another logical gap which I currently ignore. I wonder if you can find it.
3 in truth, as you continue to study manifold theory you'll find at least three seemingly distinct objects which are all called "tangent vectors": equivalence classes of curves, derivations, contravariant tensors.
We could set out to calculate tangent spaces in view of the definition above, but we are actually
interested in more than just the tangent space for a level-set. In particular, we want a concrete
description of all the vectors which are not in the tangent space.
Definition 4.2.3.
(p, v) · (p, w) = v · w.
The length of a vector (p, v) is naturally defined by ||(p, v)|| = ||v||. Moreover, we say two
vectors (p, v), (p, w) ∈ Vp are orthogonal iff v · w = 0. Given a set of vectors R ⊆ Vp we
define the orthogonal complement by
In particular, suppose for t = 0 we have γ(0) = p and v = γ'(0), which makes (p, v) ∈ T_p S with
G'(p)v = 0.

Recall G : Rk × Rp → Rp has a p × n derivative matrix where the j-th row is the gradient vector
of the j-th component function. The equation G'(p)v = 0 gives us p independent equations as
we examine it componentwise. In particular, it reveals that (p, v) is orthogonal to ∇G_j(p) for
j = 1, 2, . . . , p. We have derived the following theorem:
Theorem 4.2.4.
It's time to do some counting. Observe that the mapping φ : Rk → T_p S defined by φ(v) = (p, v)
is an isomorphism of vector spaces hence dim(T_p S) = k. But, by the same sort of isomorphism we can
see that V_p = φ(Rk × Rp) hence dim(V_p) = p + k. In linear algebra we learn that if we have a
k-dimensional subspace W of an n-dimensional vector space V then the orthogonal complement
W⊥ is a subspace of V with codimension k. The term codimension is used to indicate a loss
of dimension from the ambient space, in particular dim(W⊥) = n − k. We should note that the
direct sum of W and W⊥ covers the whole space; W ⊕ W⊥ = V. In the case of the tangent space,
the codimension of T_p S ≤ V_p is found to be p + k − k = p. Thus dim((T_p S)⊥) = p. Any basis for
this space must consist of p linearly independent vectors which are all orthogonal to the tangent
space. Naturally, the set of vectors {(p, (∇G_j(p))^T)}_{j=1}^p forms just such a basis since it is given
to be linearly independent by the rank(G'(p)) = p condition. It follows that:
many wiser authors wouldn't bother. The comments above are primarily about notation. Certainly
hiding these details would make this section prettier, however, would it make it better? Finally, I
once more refer the reader to linear algebra where we learn that (Col(A))⊥ = Null(A^T). Let me
walk you through the proof: let A ∈ R^{m×n}. Observe v ∈ Null(A^T) iff A^T v = 0 for v ∈ Rm iff
v^T A = 0 iff v^T col_j(A) = 0 for j = 1, 2, . . . , n iff v · col_j(A) = 0 for j = 1, 2, . . . , n iff v ∈ Col(A)⊥.
Another useful identity for the "perp" is that (W⊥)⊥ = W for a subspace W. With those two gems in mind consider
that:

    (T_p S)⊥ ≈ Row(G'(p)) = Col(G'(p)^T)   ⇒   T_p S ≈ Col(G'(p)^T)⊥ = Null(G'(p))

Let me once more replace ≈ by a more tedious, but explicit, procedure:
Theorem 4.2.5.
Example 4.2.6. Let g : R⁴ → R be defined by g(x, y, z, t) = t + x² + y² − 2z². Note that the level set
g(x, y, z, t) = 6 is a three dimensional subset of R⁴; let's call it M. Notice ∇g = < 2x, 2y, −4z, 1 > is nonzero
everywhere. Let's focus on the point (2, 2, 1, 0); note that g(2, 2, 1, 0) = 0 + 4 + 4 − 2 = 6 thus the point is on M.
The tangent plane at (2, 2, 1, 0) is formed from the union of all tangent vectors to M at the
point (2, 2, 1, 0). To find the equation of the tangent plane we suppose γ : R → M is a curve with
γ' ≠ 0 and γ(0) = (2, 2, 1, 0). By assumption g(γ(s)) = 6 since γ(s) ∈ M for all s ∈ R. Define
γ'(0) = < a, b, c, d >; we find a condition from the chain rule applied to g ∘ γ at s = 0,

    (d/ds)[g ∘ γ(s)] = ∇g(γ(s)) · γ'(s) = 0   ⇒   ∇g(2, 2, 1, 0) · < a, b, c, d > = 0
                                           ⇒   < 4, 4, −4, 1 > · < a, b, c, d > = 0
                                           ⇒   4a + 4b − 4c + d = 0

Thus the equation of the tangent plane is 4(x − 2) + 4(y − 2) − 4(z − 1) + t = 0. I invite the
reader to find a vector in the tangent plane and check it is orthogonal to ∇g(2, 2, 1, 0). However,
this should not be surprising: the condition the chain rule just gave us is just the statement that
< a, b, c, d > ∈ Null(∇g(2, 2, 1, 0)^T) and that is precisely the set of vectors orthogonal to ∇g(2, 2, 1, 0).
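For readers who want a quick machine check of this null-space description, here is a sketch assuming sympy (an editor-added illustration, not part of the original notes):

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    g = t + x**2 + y**2 - 2*z**2
    grad = sp.Matrix([g]).jacobian([x, y, z, t])     # row vector [2x, 2y, -4z, 1]
    gradp = grad.subs({x: 2, y: 2, z: 1, t: 0})      # [4, 4, -4, 1]
    print(gradp)
    print(gradp.nullspace())    # three basis vectors spanning the tangent space at (2,2,1,0)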
Example 4.2.7. Let G : R4 → R2 be defined by G(x, y, z, t) = (z + x2 + y 2 − 2, z + y 2 + t2 − 2). In
this case G(x, y, z, t) = (0, 0) gives a two-dimensional manifold in R4 let’s call it M . Notice that
G1 = 0 gives z + x2 + y 2 = 2 and G2 = 0 gives z + y 2 + t2 = 2 thus G = 0 gives the intersection of
both of these three dimensional manifolds in R4 (no I can’t ”see” it either). Note,
∇G1 =< 2x, 2y, 1, 0 > ∇G2 =< 0, 2y, 1, 2t >
It turns out that the inverse mapping theorem says G = 0 describes a manifold of dimension 2 if
the gradient vectors above form a linearly independent set of vectors. For the example considered
here the gradient vectors are linearly dependent at the origin since ∇G1(0) = ∇G2(0) = (0, 0, 1, 0).
In fact, these gradient vectors are collinear along the plane x = t = 0 since ∇G1(0, y, z, 0) =
∇G2(0, y, z, 0) = < 0, 2y, 1, 0 >. We again seek to contrast the tangent plane and its normal at
some particular point. Choose (1, 1, 0, 1) which is in M since G(1, 1, 0, 1) = (0 + 1 + 1 − 2, 0 +
1 + 1 − 2) = (0, 0). Suppose that γ : R → M is a path in M which has γ(0) = (1, 1, 0, 1) whereas
γ'(0) = < a, b, c, d >. Note that ∇G1(1, 1, 0, 1) = < 2, 2, 1, 0 > and ∇G2(1, 1, 0, 1) = < 0, 2, 1, 2 >.
Applying the chain rule to both G1 and G2 yields:

    (G1 ∘ γ)'(0) = ∇G1(γ(0)) · < a, b, c, d > = 0   ⇒   < 2, 2, 1, 0 > · < a, b, c, d > = 0
    (G2 ∘ γ)'(0) = ∇G2(γ(0)) · < a, b, c, d > = 0   ⇒   < 0, 2, 1, 2 > · < a, b, c, d > = 0

This is two equations and four unknowns; we can solve it and write the vector in terms of two free
variables, corresponding to the fact the tangent space is two-dimensional. Perhaps it's easier to use
matrix techniques to organize the calculation:

    [2, 2, 1, 0; 0, 2, 1, 2] [a; b; c; d] = [0; 0]

We calculate, rref [2, 2, 1, 0; 0, 2, 1, 2] = [1, 0, 0, −1; 0, 1, 1/2, 1]. It's natural to choose c, d as free
variables; then we can read that a = d and b = −c/2 − d hence

    < a, b, c, d > = < d, −c/2 − d, c, d > = (c/2) < 0, −1, 2, 0 > + d < 1, −1, 0, 1 >
We can see a basis for the tangent space. In fact, I can give parametric equations for the tangent
space as follows:
Not surprisingly the basis vectors of the tangent space are perpendicular to the gradient vectors
∇G1(1, 1, 0, 1) = < 2, 2, 1, 0 > and ∇G2(1, 1, 0, 1) = < 0, 2, 1, 2 > which span the normal plane
N_p to the tangent plane T_p at p = (1, 1, 0, 1). We find that T_p is orthogonal to N_p. In summary,
T_p⊥ = N_p and T_p ⊕ N_p = R⁴. This is just a fancy way of saying that the normal and the tangent
plane only intersect at zero and together they span the entire ambient space.
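A quick computational cross-check of this example, assuming sympy (an editor-added sketch):

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    G = sp.Matrix([z + x**2 + y**2 - 2, z + y**2 + t**2 - 2])
    J = G.jacobian([x, y, z, t])
    Jp = J.subs({x: 1, y: 1, z: 0, t: 1})
    print(Jp)                 # Matrix([[2, 2, 1, 0], [0, 2, 1, 2]]), the two gradients as rows
    print(Jp.nullspace())     # two vectors spanning T_p M, each orthogonal to both gradient rows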
If p = (a, b, ab) ∈ S then T_p S = {(a, b, ab)} × span{(1, 0, b), (0, 1, a)}. The normal space is found
from Null(R'(a, b)^T). A short calculation shows that

    Null [1, 0, b; 0, 1, a] = span{(−b, −a, 1)}
As a quick check, note (1, 0, b) • (−b, −a, 1) = 0 and (0, 1, a) • (−b, −a, 1) = 0. We conclude, for
p = (a, b, ab) the normal space is simply:
In the previous example, we could rightly call Tp S the tangent plane at p and Np S the normal line
through p. Moreover, we could have used three-dimensional vector analysis to find the normal line
from the cross-product. However, that will not be possible in what follows:
If p = (1, 9, 3, 1) ∈ S then T_p S = {(1, 9, 3, 1)} × span{(2, 0, 0, 1), (0, 6, 3, 0)}. The normal space is
found from Null(R'(1, 3)^T). A short calculation shows that

    Null [2, 0, 0, 1; 0, 6, 3, 0] = span{(−1, 0, 0, 2), (0, −3, 6, 0)}
(II.) the tangent space at xo for the k-dimensional set S is found from:
(a) attaching the span of the vectors {∂1 R(to ), . . . , ∂k R(to )} to xo = R(to ) ∈ S.
(b) attaching the Row(F 0 (xo ))⊥ to xo ∈ S.
2. (f ◦ γ)0 (0) = 0
Let us expand a bit on both of these conditions:
1. G0 (xo )γ 0 (0) = 0
2. f 0 (xo )γ 0 (0) = 0
The first of these conditions places γ'(0) ∈ T_{xo} S, but then the second condition says that (∇f)(xo)
is orthogonal to γ'(0); hence (∇f)(xo) ∈ N_{xo}. Now, recall from the last section that
the gradient vectors of the component functions of G span the normal space; this means any vector
in N_{xo} can be written as a linear combination of the gradient vectors. In particular, this means
there exist constants λ1, λ2, . . . , λp such that
2. identify your objective function and write all constraints as level surfaces.
The obvious gap in the method is the supposition that an extremum exists for the restriction f|_S.
We'll examine a few examples before I reveal a sufficient condition. We'll also see how the absence of
that sufficient condition allows the method to fail.
Example 4.5.1. Suppose we wish to find maximum and minimum distance to the origin for points
on the curve x2 − y 2 = 1. In this case we can use the distance-squared function as our objective
f (x, y) = x2 + y 2 and the single constraint function is g(x, y) = x2 − y 2 . Observe that ∇f =<
2x, 2y > whereas ∇g =< 2x, −2y >. We seek solutions of ∇f = λ∇g which gives us < 2x, 2y >=
λ < 2x, −2y >. Hence 2x = 2λx and 2y = −2λy. We must solve these equations subject to the
condition x2 − y 2 = 1. Observe that x = 0 is not a solution since 0 − y 2 = 1 has no real solution.
On the other hand, y = 0 does fit the constraint and x2 − 0 = 1 has solutions x = ±1. Consider
then
2x = 2λx and 2y = −2λy ⇒ x(1 − λ) = 0 and y(1 + λ) = 0
Since x ≠ 0 on the constraint curve it follows that 1 − λ = 0 hence λ = 1 and we learn that
y(1 + 1) = 0 hence y = 0. Consequently, (1, 0) and (−1, 0) are the two points where we expect to find
extreme values of f. In this case, the method of Lagrange multipliers served its purpose, as you
can see in the graph. Below the green curves are level curves of the objective function whereas the
particular red curve is the given constraint curve.
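If you'd like to let a computer grind through the Lagrange system of this example, here is a minimal sketch assuming sympy (editor-added, not part of the notes):

    import sympy as sp

    x, y, lam = sp.symbols('x y lam', real=True)
    f = x**2 + y**2          # objective (distance squared)
    g = x**2 - y**2          # constraint level function
    eqs = [sp.Eq(sp.diff(f, x), lam*sp.diff(g, x)),
           sp.Eq(sp.diff(f, y), lam*sp.diff(g, y)),
           sp.Eq(g, 1)]
    print(sp.solve(eqs, [x, y, lam]))   # [(-1, 0, 1), (1, 0, 1)]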
The picture below is a screen-shot of the Java applet created by David Lippman and Konrad
Polthier to explore 2D and 3D graphs. Especially nice is the feature of adding vector fields to given
objects, many other plotters require much more effort for similar visualization. See more at the
website: http://dlippman.imathas.com/g1/GrapherLaunch.html.
Note how the gradient vectors to the objective function and constraint function line-up nicely at
those points.
In the previous example, we actually got lucky. There are examples of this sort where we could get
false maxima due to the nature of the constraint function.
Example 4.5.2. Suppose we wish to find the points on the unit circle g(x, y) = x2 + y 2 = 1 which
give extreme values for the objective function f (x, y) = x2 − y 2 . Apply the method of Lagrange
multipliers and seek solutions to ∇f = λ∇g:
We must solve 2x = 2xλ which is better cast as (1 − λ)x = 0 and −2y = 2λy which is nicely written
as (1 + λ)y = 0. On the basis of these equations alone we have several options:
1. if λ = 1 then (1 + 1)y = 0 hence y = 0
When constrained to the unit circle we find the objective function attains a maximum value of 1 at
the points (1, 0) and (−1, 0) and a minimum value of −1 at (0, 1) and (0, −1). Let’s illustrate the
answers as well as a few non-answers to get perspective. Below the green curves are level curves of
the objective function whereas the particular red curve is the given constraint curve.
The success of the last example was no accident. The fact that the constraint curve was a circle
which is a closed and bounded subset of R² means that it is a compact subset of R². A well-known
theorem of analysis states that any real-valued continuous function on a compact domain attains
both maximum and minimum values. The objective function is continuous and the domain is
compact hence the theorem applies and the method of Lagrange multipliers succeeds. In contrast,
the constraint curve of the preceding example was a hyperbola which is not compact. We have
no assurance of the existence of any extrema. Indeed, we only found minima but no maxima in
Example 4.5.1.
The generality of the method of Lagrange multipliers is naturally limited to smooth constraint
curves and smooth objective functions. We must insist the gradient vectors exist at all points of
inquiry. Otherwise, the method breaks down. If we had a constraint curve which has sharp corners
then the method of Lagrange breaks down at those corners. In addition, if there are points of dis-
continuity in the constraint then the method need not apply. This is not terribly surprising, even in
calculus I the main attack to analyze extrema of function on R assumed continuity, differentiability
and sometimes twice differentiability. Points of discontinuity require special attention in whatever
context you meet them.
At this point it is doubtless the case that some of you are, to misquote an ex-student of mine,
"not-impressed". Perhaps the following examples better illustrate the dangers of non-compact constraint
curves.
Incidentally, if you want additional discussion of Lagrange multipliers for two-dimensional problems,
one very nice source I certainly profited from was the YouTube video by Edward Frenkel of Berkeley.
See his website http://math.berkeley.edu/~frenkel/ for links.
Chapter 5: critical point analysis for several variables
In the typical calculus sequence you learn the first and second derivative tests in calculus I. Then
in calculus II you learn about power series and Taylor’s Theorem. Finally, in calculus III, in many
popular texts, you learn an essentially ad-hoc procedure for judging the nature of critical points
as minimum, maximum or saddle. These topics are easily seen as disconnected events. In this
chapter, we connect them. We learn that the geometry of quadratic forms is elegantly revealed by
eigenvectors, and more than that, this geometry is precisely what elucidates the proper classification
of critical points of multivariate functions with real values.
I remind the reader that a function is called entire if it is analytic on all of R, for example ex , cos(x)
and sin(x) are all entire. In particular, you should know that:
    e^x = 1 + x + (1/2)x² + ··· = Σ_{n=0}^∞ xⁿ/n!

    cos(x) = 1 − (1/2)x² + (1/4!)x⁴ − ··· = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) x^{2n}

    sin(x) = x − (1/3!)x³ + (1/5!)x⁵ − ··· = Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)!) x^{2n+1}

    cosh(x) = 1 + (1/2)x² + (1/4!)x⁴ + ··· = Σ_{n=0}^∞ (1/(2n)!) x^{2n}

    sinh(x) = x + (1/3!)x³ + (1/5!)x⁵ + ··· = Σ_{n=0}^∞ (1/(2n+1)!) x^{2n+1}
The geometric series is often useful: for a, r ∈ R with |r| < 1 it is known that

    a + ar + ar² + ··· = Σ_{n=0}^∞ arⁿ = a/(1 − r)

For example,

    1/(1 + x²) = 1 − x² + x⁴ − x⁶ + ···

    1/(1 − x³) = 1 + x³ + x⁶ + x⁹ + ···

    x³/(1 − 2x) = x³(1 + 2x + (2x)² + ···) = x³ + 2x⁴ + 4x⁵ + ···
Moreover, the term-by-term integration and differentiation theorems yield additional results in
conjunction with the geometric series:

    tan⁻¹(x) = ∫ dx/(1 + x²) = ∫ Σ_{n=0}^∞ (−1)ⁿ x^{2n} dx = Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)) x^{2n+1} = x − (1/3)x³ + (1/5)x⁵ − ···

    ln(1 − x) = ∫ (d/dx) ln(1 − x) dx = ∫ (−1/(1 − x)) dx = − ∫ Σ_{n=0}^∞ xⁿ dx = Σ_{n=0}^∞ (−1/(n+1)) x^{n+1}
Of course, these are just the basic building blocks. We also can twist things and make the student
use algebra,

    e^{x+2} = e^x e² = e²(1 + x + (1/2)x² + ···)
or trigonometric identities,

    sin(x) = sin(x − 2 + 2) = sin(x − 2) cos(2) + cos(x − 2) sin(2)

    ⇒ sin(x) = cos(2) Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)!) (x − 2)^{2n+1} + sin(2) Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) (x − 2)^{2n}.
Feel free to peruse my most recent calculus II materials to see a host of similarly sneaky calculations.
Consider the function of two variables f : U ⊆ R2 → R which is smooth with smooth partial
derivatives of all orders. Furthermore, let (a, b) ∈ U and construct a line through (a, b) with
direction vector (h1 , h2 ) as usual:
φ(t) = (a, b) + t(h1 , h2 ) = (a + th1 , b + th2 )
for t ∈ R. Note φ(0) = (a, b) and φ0 (t) = (h1 , h2 ) = φ0 (0). Construct g = f ◦ φ : R → R and
choose dom(g) such that φ(t) ∈ U for t ∈ dom(g). This function g is a real-valued function of a
real variable and we will be able to apply Taylor’s theorem from calculus II on g. However, to
differentiate g we’ll need tools from calculus III to sort out the derivatives. In particular, as we
differentiate g, note we use the chain rule for functions of several variables:
g 0 (t) = (f ◦ φ)0 (t) = f 0 (φ(t))φ0 (t)
= ∇f (φ(t)) · (h1 , h2 )
= h1 fx (a + th1 , b + th2 ) + h2 fy (a + th1 , b + th2 )
Note g'(0) = h1 fx(a, b) + h2 fy(a, b). Differentiate again (I omit the (φ(t)) dependence in the last steps),

    g''(t) = h1 (d/dt)[fx(a + th1, b + th2)] + h2 (d/dt)[fy(a + th1, b + th2)]
           = h1 ∇fx(φ(t)) · (h1, h2) + h2 ∇fy(φ(t)) · (h1, h2)
           = h1² fxx + h1h2 fyx + h2h1 fxy + h2² fyy
           = h1² fxx + 2h1h2 fxy + h2² fyy

Thus, making explicit the point dependence, g''(0) = h1² fxx(a, b) + 2h1h2 fxy(a, b) + h2² fyy(a, b). We
may construct the Taylor series for g up to quadratic terms:

    g(0 + t) = g(0) + t g'(0) + (t²/2) g''(0) + ···
             = f(a, b) + t[h1 fx(a, b) + h2 fy(a, b)] + (t²/2)[h1² fxx(a, b) + 2h1h2 fxy(a, b) + h2² fyy(a, b)] + ···
Setting t = 1 yields

    f(a + h1, b + h2) = f(a, b) + h1 fx(a, b) + h2 fy(a, b) + (1/2)[h1² fxx + 2h1h2 fxy + h2² fyy] + ···
Sometimes we'd rather have an expansion about (x, y). To obtain that formula simply substitute
x − a = h1 and y − b = h2. Note that the point (a, b) is fixed in this discussion so the derivatives
are not modified in this substitution.

At this point we ought to recognize the first three terms give the tangent plane to z = f(x, y) at
(a, b, f(a, b)). The higher order terms are nonlinear corrections to the linearization; these quadratic
terms form a quadratic form. If we computed third, fourth or higher order terms we would find that,
using a = a1 and b = a2 as well as x = x1 and y = x2,

    f(x, y) = Σ_{n=0}^∞ Σ_{i1=1}^2 Σ_{i2=1}^2 ··· Σ_{in=1}^2 (1/n!) (∂^(n) f(a1, a2)/∂x_{i1}∂x_{i2}···∂x_{in}) (x_{i1} − a_{i1})(x_{i2} − a_{i2}) ··· (x_{in} − a_{in})
Example 5.1.1. Expand f(x, y) = cos(xy) about (0, 0). We calculate derivatives,

    fx = −y sin(xy)        fy = −x sin(xy)

This is actually a very interesting function; I think it defies our analysis in the later portion of this
chapter. The second order part of the expansion reveals nothing about the nature of the critical
point (0, 0). Of course, any student of trigonometry should recognize that f(0, 0) = 1 is likely
a local maximum; it's certainly not a local minimum. The graph reveals that f(0, 0) is a local
maximum for f restricted to certain rays from the origin whereas it is constant on several special
directions (the coordinate axes).
And, if you were wondering, yes, we could also derive this from substitution of u = xy into the
standard expansion cos(u) = 1 − (1/2)u² + (1/4!)u⁴ − ···. Often such substitutions are the quickest way
to generate interesting examples.
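A sketch of that substitution trick, assuming sympy (editor-added; notice the expansion has no genuinely quadratic part in x, y, matching the remark that the second order term is inconclusive here):

    import sympy as sp

    x, y, u = sp.symbols('x y u')
    series_u = sp.cos(u).series(u, 0, 6).removeO()   # 1 - u**2/2 + u**4/24
    print(sp.expand(series_u.subs(u, x*y)))          # 1 - x**2*y**2/2 + x**4*y**4/24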
If we omit the explicit dependence on φ(t) then we find the simple formula g'(t) = Σ_{i=1}^n h_i ∂_i f.
Differentiate a second time,

    g''(t) = (d/dt) Σ_{i=1}^n h_i ∂_i f(φ(t)) = Σ_{i=1}^n h_i (d/dt)[∂_i f(φ(t))] = Σ_{i=1}^n h_i ∇(∂_i f)(φ(t)) · φ'(t)

Omitting the φ(t) dependence and once more using φ'(t) = h we find

    g''(t) = Σ_{i=1}^n h_i ∇(∂_i f) · h

Recall that ∇ = Σ_{j=1}^n e_j ∂_j and expand the expression above,

    g''(t) = Σ_{i=1}^n h_i ( Σ_{j=1}^n e_j ∂_j ∂_i f ) · h = Σ_{i=1}^n Σ_{j=1}^n h_i h_j ∂_j ∂_i f

where we should remember ∂_j ∂_i f depends on φ(t). It should be clear that if we continue and take
k derivatives then we will obtain:

    g^(k)(t) = Σ_{i1=1}^n Σ_{i2=1}^n ··· Σ_{ik=1}^n h_{i1} h_{i2} ··· h_{ik} ∂_{i1} ∂_{i2} ··· ∂_{ik} f

More explicitly,

    g^(k)(t) = Σ_{i1=1}^n Σ_{i2=1}^n ··· Σ_{ik=1}^n h_{i1} h_{i2} ··· h_{ik} (∂_{i1} ∂_{i2} ··· ∂_{ik} f)(φ(t))
Hence, by Taylor's theorem, provided we are sufficiently close to t = 0 as to bound the remainder1,

    g(t) = Σ_{k=0}^∞ (1/k!) [ Σ_{i1=1}^n Σ_{i2=1}^n ··· Σ_{ik=1}^n h_{i1} h_{i2} ··· h_{ik} (∂_{i1} ∂_{i2} ··· ∂_{ik} f)(a) ] t^k

Recall that g(t) = f(φ(t)) = f(a + th). Put2 t = 1 and bring in the 1/k! to derive

    f(a + h) = Σ_{k=0}^∞ Σ_{i1=1}^n Σ_{i2=1}^n ··· Σ_{ik=1}^n (1/k!) ∂_{i1} ∂_{i2} ··· ∂_{ik} f(a) h_{i1} h_{i2} ··· h_{ik}.

Alternatively, writing the series in terms of x = a + h,

    f(x) = Σ_{k=0}^∞ Σ_{i1=1}^n Σ_{i2=1}^n ··· Σ_{ik=1}^n (1/k!) ∂_{i1} ∂_{i2} ··· ∂_{ik} f(a) (x_{i1} − a_{i1})(x_{i2} − a_{i2}) ··· (x_{ik} − a_{ik}).
Example 5.1.2. Suppose f : R³ → R; let's unravel the Taylor series centered at (0, 0, 0) from the
general formula boxed above. Utilize the notation x = x1, y = x2 and z = x3 in this example.

    f(x) = Σ_{k=0}^∞ Σ_{i1=1}^3 Σ_{i2=1}^3 ··· Σ_{ik=1}^3 (1/k!) ∂_{i1} ∂_{i2} ··· ∂_{ik} f(0) x_{i1} x_{i2} ··· x_{ik}.

Writing out the first few orders, with all nine second-order terms listed separately,

    f(x) = f(0) + fx(0)x + fy(0)y + fz(0)z
         + (1/2)[ fxx(0)x² + fyy(0)y² + fzz(0)z² + fxy(0)xy + fxz(0)xz + fyz(0)yz + fyx(0)yx + fzx(0)zx + fzy(0)zy ] + ···

and then, collecting the equal mixed partials,

    f(x) = f(0) + fx(0)x + fy(0)y + fz(0)z
         + (1/2)[ fxx(0)x² + fyy(0)y² + fzz(0)z² + 2fxy(0)xy + 2fxz(0)xz + 2fyz(0)yz ]
         + (1/3!)[ fxxx(0)x³ + fyyy(0)y³ + fzzz(0)z³ + 3fxxy(0)x²y + 3fxxz(0)x²z
                 + 3fyyz(0)y²z + 3fxyy(0)xy² + 3fxzz(0)xz² + 3fyzz(0)yz² + 6fxyz(0)xyz ] + ···
1 there exist smooth examples for which no neighborhood is small enough; the bump function in one variable has higher-dimensional analogues. We focus our attention on functions for which it is possible for the series below to converge.
2 if t = 1 is not in the domain of g then we should rescale the vector h so that t = 1 places φ(1) in dom(f); if f is smooth on some neighborhood of a then this is possible.
Example 5.1.3. Suppose f(x, y, z) = e^{xyz}. Find a quadratic approximation to f near (0, 1, 2).
Observe:

    fx = yz e^{xyz}            fy = xz e^{xyz}            fz = xy e^{xyz}
    fxx = (yz)² e^{xyz}        fyy = (xz)² e^{xyz}        fzz = (xy)² e^{xyz}
    fxy = z e^{xyz} + xyz² e^{xyz}    fyz = x e^{xyz} + x²yz e^{xyz}    fxz = y e^{xyz} + xy²z e^{xyz}

Evaluating at x = 0, y = 1 and z = 2 gives fx = 2, fy = fz = 0, fxx = 4, fyy = fzz = 0, fxy = 2,
fyz = 0 and fxz = 1, so f(x, y, z) ≈ 1 + 2x + 2x(y − 1) + x(z − 2) + 2x² + ···.

Another way to calculate this expansion is to make use of the adding zero trick,

    f(x, y, z) = e^{x(y−1+1)(z−2+2)} = 1 + x(y − 1 + 1)(z − 2 + 2) + (1/2)[x(y − 1 + 1)(z − 2 + 2)]² + ···

Keeping only terms with two or fewer of the x, (y − 1) and (z − 2) variables,

    f(x, y, z) = 1 + 2x + x(y − 1)(2) + x(1)(z − 2) + (1/2)x²(1)²(2)² + ···

which simplifies once more to f(x, y, z) = 1 + 2x + 2x(y − 1) + x(z − 2) + 2x² + ···.
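A sketch of this expansion with a computer algebra system, assuming sympy (editor-added; here h = x, k = y − 1 and l = z − 2 are the shifted variables):

    import sympy as sp

    h, k, l = sp.symbols('h k l')
    f = sp.exp(h*(k + 1)*(l + 2))               # f(x, y, z) written in shifted variables
    poly = sp.expand(f.series(h, 0, 3).removeO())
    quad = sum(t for t in poly.as_ordered_terms()
               if sp.Poly(t, h, k, l).total_degree() <= 2)
    print(quad)                                  # 2*h**2 + 2*h*k + h*l + 2*h + 1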
Generally, if [A_ij] ∈ R^{n×n} and ~x = [x_i]^T then the associated quadratic form is

    Q(~x) = ~x^T A ~x = Σ_{i,j} A_ij x_i x_j = Σ_{i=1}^n A_ii x_i² + 2 Σ_{i<j} A_ij x_i x_j.

In case you're wondering, yes you could write a given quadratic form with a different matrix which
is not symmetric, but we will find it convenient to insist that our matrix is symmetric since that
Some texts actually use the middle equality above to define a symmetric matrix.
Example 5.2.2.

    2x² + 2xy + 2y² = [x, y] [2, 1; 1, 2] [x; y]

Example 5.2.3.

    2x² + 2xy + 3xz − 2y² − z² = [x, y, z] [2, 1, 3/2; 1, −2, 0; 3/2, 0, −1] [x; y; z]
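A quick check of Example 5.2.3's matrix, assuming sympy (editor-added sketch):

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    A = sp.Matrix([[2, 1, sp.Rational(3, 2)],
                   [1, -2, 0],
                   [sp.Rational(3, 2), 0, -1]])
    xvec = sp.Matrix([x, y, z])
    print(sp.expand((xvec.T * A * xvec)[0]))   # 2*x**2 + 2*x*y + 3*x*z - 2*y**2 - z**2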
Proposition 5.2.4.

Proof: Let Q(~x) = ~x^T A ~x. Notice that we can write any nonzero vector as the product of its
magnitude ||~x|| and its direction x̂ = ~x/||~x||,
The proposition above is very interesting. It says that if we know how Q works on unit-vectors then
we can extrapolate its action on the remainder of Rn . If f : S → R then we could say f (S) > 0
iff f (s) > 0 for all s ∈ S. Likewise, f (S) < 0 iff f (s) < 0 for all s ∈ S. The proposition below
follows from the proposition above since ||~x||2 ranges over all nonzero positive real numbers in the
equations above.
Proposition 5.2.5.
3.(non-definite) Q(Rn∗ ) = R − {0} iff Q(Sn−1 ) has both positive and negative values.
Before I get too carried away with the theory let’s look at a couple examples.
Example 5.2.6. Consider the quadratic form Q(x, y) = x² + y². You can check for yourself that
z = Q(x, y) is a paraboloid and Q has positive outputs for all inputs except (0, 0). Notice that Q(v) = ||v||²
so it is clear that Q(S¹) = {1}. We find agreement with the preceding proposition. Next, think about
the application of Q(x, y) to level curves; x² + y² = k is simply a circle of radius √k or just the
origin. Here's a graph of z = Q(x, y):
Example 5.2.7. Consider the quadratic form Q(x, y) = x² − 2y². You can check for yourself
that z = Q(x, y) is a hyperbolic paraboloid and Q has non-definite outputs since sometimes the x² term
dominates whereas other points have −2y² as the dominant term. Notice that Q(1, 0) = 1 whereas
Q(0, 1) = −2 hence we find Q(S¹) contains both positive and negative values and consequently we
find agreement with the preceding proposition. Next, think about the application of Q(x, y) to level
curves; x² − 2y² = k yields hyperbolas which open horizontally (k > 0) or vertically (k < 0),
or the pair of lines y = ±x/√2 in the k = 0 case. Here's a graph of z = Q(x, y):

The origin is a saddle point. Finally, let's take a moment to write

    Q(x, y) = [x, y] [1, 0; 0, −2] [x; y]

in this case the matrix is diagonal and we note that the e-values are λ1 = 1 and λ2 = −2.
Example 5.2.8. Consider the quadratic form Q(x, y) = 3x². You can check for yourself that z =
Q(x, y) is a parabola-shaped trough along the y-axis. In this case Q has positive outputs for all inputs
except (0, y); we would call this form positive semi-definite. A short calculation reveals that
Q(S¹) = [0, 3] thus we again find agreement with the preceding proposition (case 3). Next, think
about the application of Q(x, y) to level curves; 3x² = k is a pair of vertical lines x = ±√(k/3), or
just the y-axis when k = 0. Here's a graph of z = Q(x, y):

Finally, let's take a moment to write

    Q(x, y) = [x, y] [3, 0; 0, 0] [x; y]

in this case the matrix is diagonal and we note that the e-values are λ1 = 3 and λ2 = 0.
Example 5.2.9. Consider the quadratic form Q(x, y, z) = x² + 2y² + 3z². Think about the application
of Q(x, y, z) to level surfaces; x² + 2y² + 3z² = k is an ellipsoid. I can't graph a function of three
variables, however, we can look at level surfaces of the function. I use Mathematica to plot several
below:

Finally, let's take a moment to write

    Q(x, y, z) = [x, y, z] [1, 0, 0; 0, 2, 0; 0, 0, 3] [x; y; z]

in this case the matrix is diagonal and we note that the e-values are λ1 = 1, λ2 = 2 and λ3 = 3.
Definition 5.2.10.
Let A ∈ R n×n . If v ∈ R n×1 is nonzero and Av = λv for some λ ∈ C then we say v is an
eigenvector with eigenvalue λ of the matrix A.
Proposition 5.2.11.

Let A ∈ R^{n×n}; then λ is an eigenvalue of A iff det(A − λI) = 0. We call P(λ) = det(A − λI)
the characteristic polynomial and det(A − λI) = 0 the characteristic equation.
Proof: Suppose λ is an eigenvalue of A then there exists a nonzero vector v such that Av = λv
which is equivalent to Av − λv = 0 which is precisely (A − λI)v = 0. Notice that (A − λI)0 = 0
thus the matrix (A − λI) is singular as the equation (A − λI)x = 0 has more than one solution.
Consequently det(A − λI) = 0.
Conversely, suppose det(A − λI) = 0. It follows that (A − λI) is singular. Clearly the system
(A − λI)x = 0 is consistent as x = 0 is a solution hence we know there are infinitely many solu-
tions. In particular there exists at least one vector v 6= 0 such that (A − λI)v = 0 which means the
vector v satisfies Av = λv. Thus v is an eigenvector with eigenvalue λ for A.
Remark 5.2.12.
I found a pretty derivation of the eigenvector condition from the method of Lagrange multipliers.
I shared it in Lecture 10, part 1. It's likely I cover that argument again in lecture this year;
my apologies it has not made it into these notes at this time.
Example 5.2.13. Let A = [3, 1; 3, 1]; find the e-values and e-vectors of A.

    det(A − λI) = det [3−λ, 1; 3, 1−λ] = (3 − λ)(1 − λ) − 3 = λ² − 4λ = λ(λ − 4) = 0

We find λ1 = 0 and λ2 = 4. Now find the e-vector with e-value λ1 = 0; let u1 = [u, v]^T denote the
e-vector we wish to find. Calculate,

    (A − 0I)u1 = [3, 1; 3, 1] [u; v] = [3u + v; 3u + v] = [0; 0]

Obviously the equations above are redundant and we have infinitely many solutions of the form
3u + v = 0, which means v = −3u, so we can write u1 = [u; −3u] = u[1; −3]. In applications we
often make a choice to select a particular e-vector. Most modern graphing calculators can calculate
e-vectors. It is customary for the e-vectors to be chosen to have length one. That is a useful
choice for certain applications as we will later discuss. If you use a calculator it would likely give
u1 = (1/√10)[1; −3], although the √10 would likely be approximated unless your calculator is smart.
Continuing, we wish to find eigenvectors u2 = [u, v]^T such that (A − 4I)u2 = 0. Notice that u, v
are disposable variables in this context; I do not mean to connect the formulas from the λ = 0 case
with the case considered now.

    (A − 4I)u2 = [−1, 1; 3, −3] [u; v] = [−u + v; 3u − 3v] = [0; 0]

Again the equations are redundant and we have infinitely many solutions of the form v = u. Hence,
u2 = [u; u] = u[1; 1] is an eigenvector for any u ∈ R such that u ≠ 0.
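A quick check of this calculation, assuming sympy (editor-added sketch):

    import sympy as sp

    A = sp.Matrix([[3, 1], [3, 1]])
    print(A.eigenvals())    # {0: 1, 4: 1}
    print(A.eigenvects())   # eigenvectors proportional to (1, -3) and (1, 1)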
Theorem 5.2.14.

There is a geometric proof of this theorem in Edwards4 (see Theorem 8.6, pgs. 146-147). I prove half
of this theorem in my linear algebra notes by a non-geometric argument (the full proof is in Appendix C
of Insel, Spence and Friedberg). It might be very interesting to understand the connection between
the geometric versus algebraic arguments. We'll content ourselves with an example here:
Example 5.2.15. Let A = [0, 0, 0; 0, 1, 2; 0, 2, 1]. Observe that det(A − λI) = −λ(λ + 1)(λ − 3) thus λ1 =
0, λ2 = −1, λ3 = 3. We can calculate orthonormal e-vectors v1 = [1, 0, 0]^T, v2 = (1/√2)[0, 1, −1]^T
and v3 = (1/√2)[0, 1, 1]^T. I invite the reader to check the validity of the following equation:

    [1, 0, 0; 0, 1/√2, −1/√2; 0, 1/√2, 1/√2] [0, 0, 0; 0, 1, 2; 0, 2, 1] [1, 0, 0; 0, 1/√2, 1/√2; 0, −1/√2, 1/√2] = [0, 0, 0; 0, −1, 0; 0, 0, 3]

It's really neat that to find the inverse of a matrix of orthonormal e-vectors we need only take the
transpose; note

    [1, 0, 0; 0, 1/√2, −1/√2; 0, 1/√2, 1/√2] [1, 0, 0; 0, 1/√2, 1/√2; 0, −1/√2, 1/√2] = [1, 0, 0; 0, 1, 0; 0, 0, 1].
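Both identities are easy to confirm by machine; a sketch assuming sympy (editor-added):

    import sympy as sp

    A = sp.Matrix([[0, 0, 0], [0, 1, 2], [0, 2, 1]])
    s = 1/sp.sqrt(2)
    P = sp.Matrix([[1, 0, 0],
                   [0, s, s],
                   [0, -s, s]])          # columns are v1, v2, v3
    print(sp.simplify(P.T * A * P))      # diag(0, -1, 3)
    print(sp.simplify(P.T * P))          # the identity matrix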
Proposition 5.2.16.
4 think about it, there is a 1-1 correspondence between symmetric matrices and quadratic forms
Example 5.2.17. Consider the quadratic form Q(x, y) = 2x² + 2xy + 2y². It's not immediately
obvious (to me) what the level curves Q(x, y) = k look like. We'll make use of the preceding
proposition to understand those graphs. Notice Q(x, y) = [x, y] [2, 1; 1, 2] [x; y]. Denote the matrix
of the form by A and calculate the e-values/vectors:

    det(A − λI) = det [2−λ, 1; 1, 2−λ] = (λ − 2)² − 1 = λ² − 4λ + 3 = (λ − 1)(λ − 3) = 0

    (A − 1I)~u1 = [1, 1; 1, 1] [u; v] = [0; 0]    ⇒    ~u1 = (1/√2) [1; −1]

I just solved u + v = 0 to give v = −u, chose u = 1, then normalized to get the vector above. Next,

    (A − 3I)~u2 = [−1, 1; 1, −1] [u; v] = [0; 0]    ⇒    ~u2 = (1/√2) [1; 1]

I just solved u − v = 0 to give v = u, chose u = 1, then normalized to get the vector above. Let
P = [~u1|~u2] and introduce new coordinates ~y = [x̄, ȳ]^T defined by ~y = P^T ~x. Note these can be
inverted by multiplication by P to give ~x = P~y. Observe that

    P = (1/√2) [1, 1; −1, 1]    ⇒    x = (x̄ + ȳ)/√2,  y = (−x̄ + ȳ)/√2      or      x̄ = (x − y)/√2,  ȳ = (x + y)/√2
The proposition preceding this example shows that substitution of the formulas above into Q yields5
Q̃(x̄, ȳ) = x̄² + 3ȳ². It is clear that in the barred coordinate system the level curve Q(x, y) = k is an ellipse. If we draw
the barred coordinate system superposed over the xy-coordinate system then you'll see that the graph
of Q(x, y) = 2x² + 2xy + 2y² = k is an ellipse rotated by 45 degrees. Or, if you like, we can plot
z = Q(x, y):
5 technically Q̃(x̄, ȳ) is Q(x(x̄, ȳ), y(x̄, ȳ))
Example 5.2.18. Consider the quadratic form Q(x, y) = x² + 2xy + y². It's not immediately obvious
(to me) what the level curves Q(x, y) = k look like. We'll make use of the preceding proposition to
understand those graphs. Notice Q(x, y) = [x, y] [1, 1; 1, 1] [x; y]. Denote the matrix of the form by
A and calculate the e-values/vectors:

    det(A − λI) = det [1−λ, 1; 1, 1−λ] = (λ − 1)² − 1 = λ² − 2λ = λ(λ − 2) = 0

    (A − 0I)~u1 = [1, 1; 1, 1] [u; v] = [0; 0]    ⇒    ~u1 = (1/√2) [1; −1]

I just solved u + v = 0 to give v = −u, chose u = 1, then normalized to get the vector above. Next,

    (A − 2I)~u2 = [−1, 1; 1, −1] [u; v] = [0; 0]    ⇒    ~u2 = (1/√2) [1; 1]

I just solved u − v = 0 to give v = u, chose u = 1, then normalized to get the vector above. Let
P = [~u1|~u2] and introduce new coordinates ~y = [x̄, ȳ]^T defined by ~y = P^T ~x. Note these can be
inverted by multiplication by P to give ~x = P~y. Observe that

    P = (1/√2) [1, 1; −1, 1]    ⇒    x = (x̄ + ȳ)/√2,  y = (−x̄ + ȳ)/√2      or      x̄ = (x − y)/√2,  ȳ = (x + y)/√2
The proposition preceding this example shows that substitution of the formulas above into Q yields
Q̃(x̄, ȳ) = 2ȳ². It is clear that in the barred coordinate system the level curve Q(x, y) = k is a pair of parallel
lines. If we draw the barred coordinate system superposed over the xy-coordinate system then you'll
see that the graph of Q(x, y) = x² + 2xy + y² = k is a pair of lines with slope −1. Indeed, with a little
algebraic insight we could have anticipated this result since Q(x, y) = (x + y)², so Q(x, y) = k implies
x + y = ±√k thus y = ±√k − x. Here's a plot which again verifies what we've already found:
Example 5.2.19. Consider the quadratic form Q(x, y) = 4xy. It's not immediately obvious (to
me) what the level curves Q(x, y) = k look like. We'll make use of the preceding proposition to
understand those graphs. Notice Q(x, y) = [x, y] [0, 2; 2, 0] [x; y]. Denote the matrix of the form by
A and calculate the e-values/vectors:

    det(A − λI) = det [−λ, 2; 2, −λ] = λ² − 4 = (λ + 2)(λ − 2) = 0

Normalized e-vectors are ~u1 = (1/√2)[1; −1] for λ1 = −2 and ~u2 = (1/√2)[1; 1] for λ2 = 2. Let
P = [~u1|~u2]; as before,

    P = (1/√2) [1, 1; −1, 1]    ⇒    x = (x̄ + ȳ)/√2,  y = (−x̄ + ȳ)/√2      or      x̄ = (x − y)/√2,  ȳ = (x + y)/√2
The proposition preceding this example shows that substitution of the formulas above into Q yields
Q̃(x̄, ȳ) = −2x̄² + 2ȳ². It is clear that in the barred coordinate system the level curve Q(x, y) = k is a hyperbola. If we
draw the barred coordinate system superposed over the xy-coordinate system then you'll see that
the graph of Q(x, y) = 4xy = k is a hyperbola rotated by 45 degrees. The graph z = 4xy is thus a
hyperbolic paraboloid:

The fascinating thing about the mathematics here is that if you don't want to graph z = Q(x, y),
but you do want to know the general shape, then you can determine which type of quadric surface
you're dealing with by simply calculating the eigenvalues of the form.
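That "read the shape off the eigenvalues" recipe is easy to automate; a sketch assuming sympy (editor-added), applied to Q(x, y) = 4xy:

    import sympy as sp

    A = sp.Matrix([[0, 2], [2, 0]])
    print(A.eigenvals())              # {2: 1, -2: 1}: one positive, one negative, so a saddle
    P, D = A.diagonalize(normalize=True)
    print(P)                          # orthonormal eigenvectors (the 45 degree rotation)
    print(D)                          # the diagonal matrix of e-values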
Remark 5.2.20.
I made the preceding triple of examples all involve the same rotation. This is purely for my
lecturing convenience. In practice the rotation could be by all sorts of angles. In addition,
you might notice that a different ordering of the e-values would result in a redefinition of
the barred coordinates.
We ought to do at least one 3-dimensional example.
Therefore, the e-values are λ1 = 4, λ2 = 8 and λ3 = 5. After some calculation we find the following
orthonormal e-vectors for A:

    ~u1 = (1/√2) [1; 1; 0]        ~u2 = (1/√2) [1; −1; 0]        ~u3 = [0; 0; 1]

Let P = [~u1|~u2|~u3] and introduce new coordinates ~y = [x̄, ȳ, z̄]^T defined by ~y = P^T ~x. Note these
can be inverted by multiplication by P to give ~x = P~y. Observe that

    P = (1/√2) [1, 1, 0; −1, 1, 0; 0, 0, √2]    ⇒    x = (x̄ + ȳ)/√2,  y = (−x̄ + ȳ)/√2,  z = z̄      or      x̄ = (x − y)/√2,  ȳ = (x + y)/√2,  z̄ = z
The proposition preceding this example shows that substitution of the formulas above into Q yields:
It is clear that in the barred coordinate system the level surface Q(x, y, z) = k is an ellipsoid. If we
draw the barred coordinate system superposed over the xyz-coordinate system then you’ll see that
the graph of Q(x, y, z) = k is an ellipsoid rotated by 45 degrees around the z − axis. Plotted below
are a few representative ellipsoids:
In summary, the behaviour of a quadratic form Q(x) = x^T Ax is governed by its set of eigenvalues7
{λ1, λ2, . . . , λk}. Moreover, the form can be written as Q(y) = λ1y1² + λ2y2² + ··· + λkyk² by choosing
the coordinate system which is built from the orthonormal eigenbasis of A. In this coordinate
system the shape of the level-sets of Q becomes manifest from the signs of the e-values.
Remark 5.2.22.
If you would like to read more about conic sections or quadric surfaces and their connection
to e-values/vectors I recommend sections 9.6 and 9.7 of Anton's linear algebra text. I
have yet to add examples on how to include translations in the analysis. It's not much
more trouble but I decided it would just be an unnecessary complication this semester.
Also, sections 7.1, 7.2 and 7.3 in Lay's linear algebra text show a bit more about how to
use this math to solve concrete applied problems. You might also take a look in Gilbert
Strang's linear algebra text; his discussion of tests for positive-definite matrices is much
more complete than I will give here.
Clearly if λ1 > 0 and λ2 > 0 then f (a, b) yields the local minimum whereas if λ1 < 0 and λ2 < 0
then f (a, b) yields the local maximum. Edwards discusses these matters on pgs. 148-153. In short,
supposing f ≈ f (p) + Q, if all the e-values of Q are positive then f has a local minimum of f (p)
at p whereas if all the e-values of Q are negative then f reaches a local maximum of f (p) at p.
Otherwise Q has both positive and negative e-values and we say Q is non-definite and the function
has a saddle point. If all the e-values of Q are positive then Q is said to be positive-definite
whereas if all the e-values of Q are negative then Q is said to be negative-definite. Edwards
gives a few nice tests for ascertaining if a matrix is positive definite without explicit computation
of e-values. Finally, if one of the e-values is zero then the graph will be like a trough.
7 this set is called the spectrum of the matrix
Example 5.3.1. Suppose f(x, y) = exp(−x² − y² + 2y − 1); expand f about the point (0, 1). Expanding,

    f(h, 1 + k) = 1 − h² − k² + ···

If (h, k) is near (0, 0) then the dominant terms are simply those we've written above, hence the graph
is like that of a quadric surface with a pair of negative e-values. It follows that f(0, 1) is a local
maximum. In fact, it happens to be a global maximum for this function.
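The same conclusion drops out of the Hessian eigenvalues; a sketch assuming sympy (editor-added):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.exp(-x**2 - y**2 + 2*y - 1)
    grad = [sp.diff(f, v) for v in (x, y)]
    print([g.subs({x: 0, y: 1}) for g in grad])       # [0, 0], so (0, 1) is a critical point
    H = sp.hessian(f, (x, y)).subs({x: 0, y: 1})
    print(H, H.eigenvals())                           # diag(-2, -2): both e-values negative, a local max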
    f(1 + h, 2 + k) = 4 − h² − k² + A exp(−h² − k²) + 2Bhk
                    = 4 − h² − k² + A(1 − h² − k²) + 2Bhk + ···
                    = 4 + A − (A + 1)h² + 2Bhk − (A + 1)k² + ···

There is no nonzero linear term in the expansion at (1, 2) which indicates that f(1, 2) = 4 + A
may be a local extremum. In this case the quadratic terms are nontrivial which means the graph of
this function is well-approximated by a quadric surface near (1, 2). The quadratic form Q(h, k) =
−(A + 1)h² + 2Bhk − (A + 1)k² has matrix

    [Q] = [−(A + 1), B; B, −(A + 1)].
3. if just one of λ1, λ2 is zero then f is constant along one direction and has a min/max along another,
so technically it is a local extremum.

In particular, the following choices for A, B will match the choices above.

Here are the graphs of the cases above; note the analysis for case 3 is more subtle for Taylor
approximations as opposed to simple quadric surfaces. In this example, case 3 was also a local
minimum. In contrast, in Example 5.2.18 the graph was like a trough. The behaviour of f away
from the critical point includes higher order terms whose influence turns the trough into a local
minimum.
Example 5.3.3. Suppose f(x, y) = sin(x) cos(y); to find the Taylor series centered at (0, 0) we can
simply multiply the one-dimensional results sin(x) = x − (1/3!)x³ + (1/5!)x⁵ − ··· and
cos(y) = 1 − (1/2!)y² + (1/4!)y⁴ − ··· as follows:
The origin (0, 0) is a critical point since fx (0, 0) = 0 and fy (0, 0) = 0, however, this particular
critical point escapes the analysis via the quadratic form term since Q = 0 in the Taylor series
for this function at (0, 0). This is analogous to the inconclusive case of the 2nd derivative test in
calculus III.
Example 5.3.4. Suppose f(x, y, z) = xyz. Calculate the multivariate Taylor expansion about the
point (1, 2, 3). I'll actually calculate this one via differentiation; I have used tricks and/or calculus
II results to shortcut differentiation in the previous examples. Calculate the first derivatives

    fx = yz        fy = xz        fz = xy.

It follows,

    f(a + h, b + k, c + l) = f(a, b, c) + fx(a, b, c)h + fy(a, b, c)k + fz(a, b, c)l
        + (1/2)( fxx hh + fxy hk + fxz hl + fyx kh + fyy kk + fyz kl + fzx lh + fzy lk + fzz ll ) + ···
Of course certain terms can be combined since fxy = fyx etc. for smooth functions (we assume
smoothness in this section; moreover the given function here is clearly smooth). In total,

    f(1 + h, 2 + k, 3 + l) = 6 + 6h + 3k + 2l + (1/2)(3hk + 2hl + 3kh + kl + 2lh + lk) + (1/3!)(6)hkl
Of course, we could also obtain this from simple algebra:
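The "simple algebra" is just expanding the product f(1 + h, 2 + k, 3 + l) = (1 + h)(2 + k)(3 + l); a quick check assuming sympy (editor-added):

    import sympy as sp

    h, k, l = sp.symbols('h k l')
    print(sp.expand((1 + h)*(2 + k)*(3 + l)))
    # 6 + 6*h + 3*k + 2*l + 3*h*k + 2*h*l + k*l + h*k*l, matching the expansion above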
Chapter 6: introduction to variational calculus

6.1 history
The problem of variational calculus is almost as old as modern calculus. Variational calculus seeks
to answer questions such as:
Remark 6.1.1.
2. what is the path of least time for a mass sliding without friction down some path
between two given points ?
3. what is the path which minimizes the energy for some physical system ?
4. given two points on the x-axis and a particular area what curve has the longest
perimeter and bounds that area between those points and the x-axis?
You’ll notice these all involve a variable which is not a real variable or even a vector-valued-variable.
Instead, the answers to the questions posed above will be paths or curves depending on how you
wish to frame the problem. In variational calculus the variable is a function and we wish to find
extreme values for a functional. In short, a functional is an abstract function of functions. A
functional takes as an input a function and gives as an output a number. The space from which
these functions are taken varies from problem to problem. Often we put additional constraints
or conditions on the space of admissible solutions. To read about the full generality of the
problem you should look in a text such as Hans Sagan's. Our treatment is introductory in this chapter;
my aim is to show you why it is plausible and then to show you how we use variational calculus.
We will see that the problem of finding an extreme value for a functional is equivalent to solving
the Euler-Lagrange equations or Euler equations for the functional. Euler predates Lagrange in his
discovery of the equations bearing their names. Euler's initial attack on the problem was to chop
the hypothetical solution curve up into a polygonal path. The unknowns in that approach were
the coordinates of the vertices in the polygonal path. Then through some ingenious calculations
he arrived at the Euler-Lagrange equations. Apparently there were logical flaws in Euler’s origi-
nal treatment. Lagrange later derived the same equations using the viewpoint that the variable
was a function and the variation was one of shifting by an arbitrary function. The treatment of
variational calculus in Edwards is neither Euler nor Lagrange’s approach, it is a refined version
which takes in the contributions of generations of mathematicians working on the subject and then
merges it with careful functional analysis. I’m no expert of the full history, I just give you a rough
sketch of what I’ve gathered from reading a few variational calculus texts.
Physics played a large role in the development of variational calculus. Lagrange was a physicist
as well as a mathematician. At the present time, every physicist takes course(s) in Lagrangian
Mechanics. Moreover, the use of variational calculus is fundamental since Hamilton’s principle says
that all physics can be derived from the principle of least action. In short this means that nature is
lazy. The solutions realized in the physical world are those which minimize the action. The action

    S[y] = ∫ L(y, y', t) dt

is constructed from the Lagrangian L = T − U where T is the kinetic energy and U is the potential
energy. In the case of classical mechanics the Euler Lagrange equations are precisely Newton’s
equations. The Hamiltonian H = T + U is similar to the Lagrangian except that the fundamental
variables are taken to be momentum and position in contrast to velocity and position in Lagrangian
mechanics.
Hamiltonians and Lagrangians are used to set up new physical theories. Euler-Lagrange equations
are said to give the so-called classical limit of modern field theories. The concept of a force is not
so useful to quantum theories; instead the concept of energy plays the central role. Moreover, the
problem of quantizing and then renormalizing field theory brings in very sophisticated mathematics.
In fact, the math of modern physics is not fully understood. In this chapter I'll just show you a
few famous classical mechanics problems which are beautifully solved by Lagrange's approach. We'll
also see how expressing the Lagrangian in non-Cartesian coordinates can give us an easy way to
derive forces that arise from geometric constraints.
I am following the typical physics approach to variational calculus. Edwards’ last chapter is more
natural mathematically but I think the math is a bit much for your first exposure to the subject.
The treatment given here is close to that of Arfken and Weber’s Mathematical Physics text, how-
ever I suspect you can find these calculations in dozens of classical mechanics texts. More or less
our approach is that of Lagrange.
We suppose that f is given but y is a variable. Consider that if we are given a function y ∗ ∈ Fo
and another function η such that η(x1 ) = η(x2 ) = 0 then we can reach a whole family of functions
indexed by a real variable α as follows (relabel y ∗ (x) by y(x, 0) so it matches the rest of the family
of functions):
y(x, α) = y(x, 0) + αη(x)
    δy = αη(x)

This means y(x, α) = y(x, 0) + δy. We may write J as a function of α given the variation we just
described:

    J(α) = ∫_{x1}^{x2} f(y(x, α), y(x, α)', x) dx.

It is intuitively obvious that if the function y*(x) = y(x, 0) is an extremum of the functional then
we ought to expect

    ∂J(α)/∂α |_{α=0} = 0

Notice that we can calculate the derivative above using multivariate calculus. Remember that
y(x, α) = y(x, 0) + αη(x) hence y(x, α)' = y(x, 0)' + αη(x)', thus ∂y/∂α = η and ∂y'/∂α = η' = dη/dx.
Consider that:
    ∂J(α)/∂α = ∂/∂α ∫_{x1}^{x2} f(y(x, α), y(x, α)', x) dx

             = ∫_{x1}^{x2} [ (∂f/∂y)(∂y/∂α) + (∂f/∂y')(∂y'/∂α) + (∂f/∂x)(∂x/∂α) ] dx

             = ∫_{x1}^{x2} [ (∂f/∂y) η + (∂f/∂y')(dη/dx) ] dx                            (6.1)
Observe that

    (d/dx)[ (∂f/∂y') η ] = (d/dx)(∂f/∂y') η + (∂f/∂y')(dη/dx)
Hence continuing Equation 6.1 in view of the product rule above,

    ∂J(α)/∂α = ∫_{x1}^{x2} [ (∂f/∂y) η + (d/dx)((∂f/∂y') η) − (d/dx)(∂f/∂y') η ] dx

             = (∂f/∂y') η |_{x1}^{x2} + ∫_{x1}^{x2} [ ∂f/∂y − (d/dx)(∂f/∂y') ] η dx       (6.2)

             = ∫_{x1}^{x2} [ ∂f/∂y − (d/dx)(∂f/∂y') ] η dx
Note we used the conditions η(x1) = η(x2) = 0 to see that (∂f/∂y') η |_{x1}^{x2} = (∂f/∂y')(x2)η(x2) − (∂f/∂y')(x1)η(x1) = 0.
Our goal is to find the extreme values for the functional J. Let me take a few sentences to again restate
our set-up. Generally, we take a function y and then J assigns to it the number J[y]. The family of
functions indexed by α gives a whole ensemble of functions in Fo which are near y* according to
the formula,

    y(x, α) = y*(x) + αη(x)
Let’s call this set of functions Wη . If we took another function like η, say ζ such that ζ(x1 ) =
ζ(x2 ) = 0 then we could look at another family of functions:
and we could denote the set of all such functions generated from ζ to be Wζ. The total variation
of y based at y* should include all possible families of functions in Fo. You could think of Wη and
Wζ as two different subspaces of Fo. If η ≠ ζ then these subspaces of Fo are likely disjoint except
for the proposed extremal solution y*. It is perhaps a bit unsettling to realize there are infinitely
many such subspaces because there are infinitely many choices for the function η or ζ. In any event,
each possible variation of y* must satisfy the condition ∂J(α)/∂α |_{α=0} = 0 since we assume that y*
is an extreme value of the functional J. It follows that Equation 6.2 holds for all possible η.
Therefore, we ought to expect that any extreme value of the functional J[y] = ∫_{x1}^{x2} f(y, y', x) dx
must solve the Euler-Lagrange equation (d/dx)(∂f/∂y') − ∂f/∂y = 0.
Therefore, since δy = 0 at the endpoints of integration, the Euler-Lagrange equations follow from
δJ = 0. Now, if you're like me, the argument above is less than satisfying since we never actually
defined what it means to "take δ" of something. Also, why could I commute the variational δ and
d/dx? That said, the formal method is not without use since it allows the focus to be on the
Euler-Lagrange equations rather than the technical details of the variation.
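To make the preceding argument concrete, here is a small numerical sketch (my own illustration; the integrand f = (y′)² and the perturbation η(x) = sin(πx) are choices I made, not part of the text). On [0, 1] with y(0) = 0 and y(1) = 1, the extremal of ∫ (y′)² dx is y∗(x) = x; perturbing along η, which vanishes at the endpoints, we can check numerically that dJ/dα = 0 at α = 0 while J(α) > J(0) for α ≠ 0.

import numpy as np
from scipy.integrate import quad

def J(alpha):
    # y(x, alpha) = x + alpha*sin(pi x)  =>  y'(x, alpha) = 1 + alpha*pi*cos(pi x)
    integrand = lambda x: (1.0 + alpha * np.pi * np.cos(np.pi * x))**2
    return quad(integrand, 0.0, 1.0)[0]

h = 1e-6
print("J(0)            =", J(0.0))                      # 1.0
print("dJ/dalpha at 0  =", (J(h) - J(-h)) / (2 * h))    # approximately 0
print("J(0.1) - J(0)   =", J(0.1) - J(0.0))             # positive: the perturbation increases J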
Remark 6.3.1.
The more adept reader at this point should realize the hypocrisy of me calling the above
calculation formal since even my presentation here was formal. I also used an analogy: I
assumed that the theory of extreme values for multivariate calculus extends to function
space. But Fo is not Rn; it's much bigger. Edwards builds the correct formalism for a
rigorous calculation of the variational derivative. To be careful we'd need to develop the
norm on function space and prove a number of results about infinite dimensional linear
algebra. Take a look at the last chapter in Edwards’ text if you’re interested. I don’t
believe I’ll have time to go over that material this semester.
For the arclength functional J[y] = ∫_{x1}^{x2} √(1 + (y′)²) dx we have f(y, y′, x) = √(1 + (y′)²), hence

∂f/∂y = 0      and      ∂f/∂y′ = y′/√(1 + (y′)²).

The Euler-Lagrange equation then reads

d/dx( ∂f/∂y′ ) = ∂f/∂y      ⇒      d/dx[ y′/√(1 + (y′)²) ] = 0      ⇒      y′/√(1 + (y′)²) = k

for some constant k, which forces y′ to be constant; if x1 ≠ x2 this gives the line y = y1 + ((y2 − y1)/(x2 − x1))(x − x1).
If instead y1 ≠ y2 we may treat x as a function of y, so the integrand is √(1 + (x′)²)
where x′ = dx/dy and the Euler-Lagrange equations would yield the solution

x = x1 + ((x2 − x1)/(y2 − y1)) (y − y1).
Finally, if both coordinates are equal then (x1 , y1 ) = (x2 , y2 ) and the shortest path between these
points is the trivial path, the armchair solution. Silly comments aside, we have shown that a
straight line provides the curve with the shortest arclength between any two points in the plane.
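If you would like a computer algebra check of this example, the following SymPy sketch (my own illustration; the euler_equations helper is SymPy's, the example is mine) applies the Euler-Lagrange machinery to the arclength integrand and recovers the condition y′′ = 0:

import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
y = sp.Function('y')(x)

f = sp.sqrt(1 + y.diff(x)**2)          # arclength integrand
eq = euler_equations(f, y, x)[0]       # Euler-Lagrange equation for y
print(sp.simplify(eq))                 # reduces to a positive multiple of y''(x) = 0, i.e. a straight line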
If we choose x as the parameter this yields dA = 2πy√(1 + (y′)²) dx. To find the surface of revolution
of minimal area we ought to consider the functional:

A[y] = ∫_{x1}^{x2} 2πy √(1 + (y′)²) dx
The usual Euler-Lagrange equations are not easy to solve for this problem; it is easier to work with
the equations you derived in homework,

∂f/∂x − d/dx( f − y′ ∂f/∂y′ ) = 0.
Hence,

d/dx[ 2πy√(1 + (y′)²) − 2πy(y′)²/√(1 + (y′)²) ] = 0

Dividing by 2π and making a common denominator,

d/dx[ y/√(1 + (y′)²) ] = 0      ⇒      y/√(1 + (y′)²) = k
where k is a constant with respect to x. Squaring the equation above yields

y²/( 1 + (dy/dx)² ) = k²      ⇒      y² − k² = k² (dy/dx)²

Solve for dx and integrate, assuming the given points are in the first quadrant:

x = ∫ dx = ∫ k dy/√(y² − k²) = k cosh⁻¹(y/k) + c

Hence,

y = k cosh( (x − c)/k )
generates the surface of revolution of least area between two points. These shapes are called
catenoids; they can be observed in soap films stretched between rings. There is a vast
literature on this subject and there are many cases to consider; I simply exhibit a simple solution.
For a given pair of points it is not immediately obvious whether there exists a solution to the
Euler-Lagrange equations which fits the data (see page 622 of Arfken).
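As a numerical aside (my own sketch, not taken from Arfken): one can hunt for the constants k and c by root-finding on the two endpoint conditions y1 = k cosh((x1 − c)/k) and y2 = k cosh((x2 − c)/k). The endpoints and initial guess below are arbitrary choices of mine; for rings that are too far apart the solver fails to converge, which reflects the nonexistence issue just mentioned.

import numpy as np
from scipy.optimize import root

x1, y1 = 0.0, 1.0          # assumed endpoint data
x2, y2 = 1.0, 1.0

def endpoint_conditions(p):
    k, c = p
    return [k * np.cosh((x1 - c) / k) - y1,
            k * np.cosh((x2 - c) / k) - y2]

sol = root(endpoint_conditions, x0=[0.8, 0.5])
print(sol.success, sol.x)   # for this data: k near 0.85 and c = 0.5 by symmetry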
6.4.3 Brachistochrone
Suppose a particle slides freely along some curve from (x1 , y1 ) to (x2 , y2 ) = (0, 0) under the influence
of gravity where we take y to be the vertical direction. What is the curve of quickest descent?
Notice that if x1 = 0 then the answer is easy to see; however, if x1 ≠ 0 then the question is not
trivial. To solve this problem we must first offer a functional which accounts for the time of descent.
Note that the speed v = ds/dt so we'd clearly like to minimize J = ∫_{(0,0)}^{(x1,y1)} ds/v. Since the object is
assumed to fall freely we may assume that energy is conserved in the motion, hence

½ mv² = mg(y1 − y)      ⇒      v = √(2g(y1 − y))
As we've discussed in previous examples, ds = √(1 + (y′)²) dx, so we find

J[y] = ∫_0^{x1} √( (1 + (y′)²)/(2g(y1 − y)) ) dx

where the integrand plays the role of f(y, y′, x).
Notice that the modified Euler-Lagrange equations ∂f/∂x − d/dx( f − y′ ∂f/∂y′ ) = 0 are convenient since
fx = 0. We calculate that

∂f/∂y′ = [ 1/( 2√((1 + (y′)²)/(2g(y1 − y))) ) ] · [ 2y′/(2g(y1 − y)) ] = y′/√( 2g(y1 − y)(1 + (y′)²) )

Hence there should exist some constant 1/(k√(2g)) such that

√( (1 + (y′)²)/(2g(y1 − y)) ) − (y′)²/√( 2g(y1 − y)(1 + (y′)²) ) = 1/(k√(2g))

It follows that,

1/√( (y1 − y)(1 + (y′)²) ) = 1/k      ⇒      (y1 − y)( 1 + (dy/dx)² ) = k²
The integral is not trivial. It turns out that the solution is a cycloid (Arfken p. 624):

x = ((a + b)/2)( θ + sin(θ) ) − d,      y = ((a + b)/2)( 1 − cos(θ) ) − b
This is the curve that is traced out by a point on a wheel as it travels. If you take this solution
and calculate J[y_cycloid] you can show the time of descent is simply

T = π √( y1/(2g) )
if the mass begins to descend from (x2, y2). But, this point has no connection with (x1, y1) except
that they both reside on the same cycloid. It follows that the period of a pendulum that follows
a cycloidal path is independent of the starting point on the path. This is not true for a circular
pendulum in general; we need the small angle approximation to derive simple harmonic motion.
It turns out that it is possible to make a pendulum follow a cycloidal path if you let the string be
guided by a frame which is also cycloidal. The neat thing is that even as it loses energy it still
follows a cycloidal path and hence has the same period. The "Brachistochrone" problem was posed
by Johann Bernoulli in 1696 and it actually predates the variational calculus of Lagrange by some
50 or so years. This problem and ones like it are what eventually prompted Lagrange and Euler to
systematically develop the subject. Apparently Galileo also studied this problem, but he lacked
the mathematics to crack it.
See this GeoGebra demonstration to compare and contrast lines versus parabolas versus the cycloid.
A Google search will show you dozens of these.
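In the same spirit as that demonstration, here is a rough numerical comparison (my own sketch; g = 9.81, the endpoint, and the comparison curves are my choices). The endpoint is picked so that the cycloid with a = 1 reaches it at θ = π; along the cycloid ds/v = √(a/g) dθ, so its descent time is exactly π√(a/g), while the other times are computed by quadrature.

import numpy as np
from scipy.integrate import quad

g = 9.81                      # assumed
a = 1.0                       # cycloid parameter (assumed)
x2, y2 = np.pi * a, 2.0 * a   # endpoint reached by the cycloid at theta = pi

def descent_time(dxdy):
    # time along a curve x = x(y), with y the drop measured downward from the start:
    # T = int_0^{y2} sqrt((1 + (dx/dy)^2)/(2 g y)) dy; the substitution y = u^2
    # removes the 1/sqrt(y) endpoint singularity.
    integrand = lambda u: 2.0 * np.sqrt(1.0 + dxdy(u**2)**2) / np.sqrt(2.0 * g)
    return quad(integrand, 0.0, np.sqrt(y2))[0]

T_line  = descent_time(lambda y: x2 / y2)               # straight chute x = (x2/y2) y
T_parab = descent_time(lambda y: 2.0 * x2 * y / y2**2)  # parabola x = x2 (y/y2)^2
T_cycl  = np.pi * np.sqrt(a / g)                        # cycloid: theta_2 * sqrt(a/g)

print(f"line {T_line:.3f} s, parabola {T_parab:.3f} s, cycloid {T_cycl:.3f} s")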
Here we use (yi) as shorthand for (y1, y2, . . . , yn) and (ẏi) as shorthand for (ẏ1, ẏ2, . . . , ẏn). We
suppose that n conditions are given for each of the endpoints in this problem; yi(t1) = yi1 and
yi (t2 ) = yi2 . Moreover, we define Fo to be the set of paths from R to Rn subject to the conditions
just stated. We now set out to find necessary conditions on a proposed solution to the extreme
value problem for the functional J above. As before let’s assume that an extremal solution y∗ ∈ Fo
exists. Moreover, imagine varying the solution by some variational function η = (ηi ) which has
η(t1 ) = (0, 0, . . . , 0) and η(t2 ) = (0, 0, . . . , 0). Consequently the family of paths defined below are
all in Fo ,
y(t, α) = y ∗ (t) + αη(t)
Thus y(t, 0) = y∗. In terms of component functions we have that

yi(t, α) = yi∗(t) + αηi(t).

You can identify that δyi = yi(t, α) − yi∗(t) = αηi(t). Since y∗ is an extreme solution we should
expect that (∂J/∂α)|_{α=0} = 0. Differentiate the functional with respect to α and make use of the
chain rule for f which is a function of some 2n + 1 variables,
∂J(α)/∂α = ∂/∂α ∫_{t1}^{t2} f( yi(t, α), ẏi(t, α), t ) dt
         = ∫_{t1}^{t2} Σ_{j=1}^{n} [ (∂f/∂yj)(∂yj/∂α) + (∂f/∂ẏj)(∂ẏj/∂α) ] dt
         = ∫_{t1}^{t2} Σ_{j=1}^{n} [ (∂f/∂yj) ηj + (∂f/∂ẏj) dηj/dt ] dt      (6.4)
         = Σ_{j=1}^{n} (∂f/∂ẏj) ηj |_{t1}^{t2} + ∫_{t1}^{t2} Σ_{j=1}^{n} [ ∂f/∂yj − d/dt( ∂f/∂ẏj ) ] ηj dt
Since η(t1 ) = η(t2 ) = 0 the first term vanishes. Moreover, since we may repeat this calculation for
all possible variations about the optimal solution y ∗ it follows that we obtain a set of Euler-Lagrange
equations for each component function of the solution:
∂f/∂yj − d/dt( ∂f/∂ẏj ) = 0,      j = 1, 2, . . . , n      (Euler-Lagrange Eqns. for J[(yi)] = ∫_{t1}^{t2} f(yi, ẏi, t) dt)
Often we simply use y1 = x, y2 = y and y3 = z, which denote the position of a particle or perhaps
just the component functions of a path which gives the geodesic on some surface. In either case
we should have 3 sets of Euler-Lagrange equations, one for each coordinate. We will also use non-
Cartesian coordinates to describe certain Lagrangians. We develop many useful results for the set-up
of Lagrangians in non-Cartesian coordinates in the next section.
6.5.2 geodesics in R3
A geodesic is the path of minimal length between a pair of points on some manifold. Note we
already proved that geodesics in the plane are just lines. In general, for R3 , the square of the
infinitesimal arclength element is ds² = dx² + dy² + dz². The arclength integral from p = 0 to
q = (qx , qy , qz ) in R3 is most naturally given from the parametric viewpoint:
S = ∫_0^1 √( ẋ² + ẏ² + ż² ) dt
We assume (x(0), y(0), z(0)) = (0, 0, 0) and (x(1), y(1), z(1)) = q and it should be clear that the
integral above calculates the arclength. The Euler-Lagrange equations for x, y, z are
d/dt[ ẋ/√(ẋ² + ẏ² + ż²) ] = 0,      d/dt[ ẏ/√(ẋ² + ẏ² + ż²) ] = 0,      d/dt[ ż/√(ẋ² + ẏ² + ż²) ] = 0.
It follows that there exist constants, say a, b and c, such that
a = ẋ/√(ẋ² + ẏ² + ż²),      b = ẏ/√(ẋ² + ẏ² + ż²),      c = ż/√(ẋ² + ẏ² + ż²).
These equations are said to be coupled since each involves derivatives of the others. We usually
need a way to uncouple the equations if we are to be successful in solving the system. Dividing
each equation by its constant, we can equate each ratio with 1:

1 = ẋ/( a√(ẋ² + ẏ² + ż²) ) = ẏ/( b√(ẋ² + ẏ² + ż²) ) = ż/( c√(ẋ² + ẏ² + ż²) ).
But, multiplying by the denominator reveals an interesting identity

√(ẋ² + ẏ² + ż²) = ẋ/a = ẏ/b = ż/c
The solution has the form x(t) = t qx, y(t) = t qy and z(t) = t qz for 0 ≤ t ≤ 1. These are the
parametric equations for the line segment from the origin to q. Therefore,

ẍ = 0,      ÿ = 0,      z̈ = 0.

The solution of these equations is clearly a line. In this formalism the equations were uncoupled
from the outset.
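A SymPy sketch of the same computation (my own illustration; the symbols qx, qy, qz are assumptions of the sketch): generate the three Euler-Lagrange equations for the arclength integrand and check that the straight line x = tqx, y = tqy, z = tqz satisfies each of them.

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
qx, qy, qz = sp.symbols('q_x q_y q_z', positive=True)
x, y, z = sp.Function('x')(t), sp.Function('y')(t), sp.Function('z')(t)

f = sp.sqrt(x.diff(t)**2 + y.diff(t)**2 + z.diff(t)**2)   # arclength integrand
eqs = euler_equations(f, [x, y, z], t)                    # one equation per dependent variable

for eq in eqs:
    residual = eq.lhs.subs({x: t*qx, y: t*qy, z: t*qz}).doit()
    print(sp.simplify(residual))                          # each residual prints 0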
Definition 6.6.1.
Consider polar coordinates

x = r cos(θ),      y = r sin(θ)

for which we have the implicit inverse coordinate transformations r² = x² + y² and θ = tan⁻¹(y/x).
From these inverse formulas we calculate:

∇r = ⟨ x/r, y/r ⟩,      ∇θ = ⟨ −y/r², x/r² ⟩.

Thus, ||∇r|| = 1 whereas ||∇θ|| = 1/r. We find that the metric in polar coordinates takes the form:

ds² = dr² + r² dθ².
Physicists and engineers tend to like to think of these as arising from calculating the length of
infinitesimal displacements in the r or θ directions. Generically, for u, v, w coordinates

dl_u = du/||∇u||,      dl_v = dv/||∇v||,      dl_w = dw/||∇w||

and ds² = dl_u² + dl_v² + dl_w². So in that notation we just found dl_r = dr and dl_θ = r dθ. Notice then
that cylindrical coordinates have the metric

ds² = dr² + r² dθ² + dz².
For spherical coordinates x = r cos(φ) sin(θ), y = r sin(φ) sin(θ) and z = r cos(θ) (here 0 ≤ φ ≤ 2π
and 0 ≤ θ ≤ π, physics notation). Calculation of the metric follows from the line elements

dl_r = dr,      dl_θ = r dθ,      dl_φ = r sin(θ) dφ.

Thus,

ds² = dr² + r² sin²(θ) dφ² + r² dθ².
We now have all the tools we need for examples in spherical or cylindrical coordinates. What about
other cases? In general, given some p-manifold in Rn how does one find the metric on that manifold?
If we are to follow the approach of this section we’ll need to find coordinates on Rn such that the
manifold S is described by setting all but p of the coordinates to a constant. For example, in R4
we have generalized cylindrical coordinates (r, θ, z, w) defined implicitly by the equations
x = r cos(θ), y = r sin(θ), z = z and w = w.
On the hyper-cylinder r = R we have the metric ds² = R² dθ² + dz² + dw². There are mathemati-
cians/physicists whose careers are founded upon the discovery of a metric for some manifold. This
is generally a difficult task.
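When guessing line elements is harder, one can always compute the metric components directly as g = JᵀJ, where J is the Jacobian of the map from the new coordinates to Cartesian coordinates. A SymPy sketch for spherical coordinates (my own illustration):

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
X = sp.Matrix([r*sp.cos(ph)*sp.sin(th), r*sp.sin(ph)*sp.sin(th), r*sp.cos(th)])
J = X.jacobian([r, th, ph])
g = sp.simplify(J.T * J)     # g_ij = sum_k (dx_k/du_i)(dx_k/du_j)
print(g)                     # diag(1, r**2, r**2*sin(theta)**2): ds^2 = dr^2 + r^2 dtheta^2 + r^2 sin^2(theta) dphi^2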
6.7 geodesics
A geodesic is a path of smallest distance on some manifold. In general relativity, it turns out that
freely falling particles follow geodesics of the 4-dimensional curved spacetime determined by
Einstein's field equations: for example, projectiles or planets in the absence of frictional or other
non-gravitational forces. We don't follow a geodesic in our daily life because the earth
pushes back up with a normal force. Also, to be honest, the idea of length in general relativity is a
bit more abstract than the geometric length studied in this section. The metric of general relativity
is non-Euclidean. General relativity is based on semi-Riemannian geometry whereas this section
is all Riemannian geometry. The metric in Riemannian geometry is positive definite. The metric
in semi-Riemannian geometry can be written as a quadratic form with both positive and negative
eigenvalues. In any event, if you want to know more I know some books you might like.
On the cylinder r = R the metric is

ds² = R² dθ² + dz².

Therefore, we ought to minimize the following functional in order to locate the parametric equations
of a geodesic on the cylinder: note ds² = [ R² (dθ/dt)² + (dz/dt)² ] dt², thus:

S = ∫ ( R² θ̇² + ż² ) dt

The Euler-Lagrange equations for θ and z are simply

θ̈ = 0,      z̈ = 0

and integrating twice gives

θ(t) = θo + At,      z(t) = zo + Bt.
Therefore, the geodesic on a cylinder is simply the image of a straight line in the plane which is
rolled up to assemble the cylinder; in other words, a helix. Simple cases that are easy to understand:
For the sphere r = R we have f = R² θ̇² + R² sin²(θ) φ̇², and the Euler-Lagrange equations for the
dependent variables φ and θ are simply fθ = d/dt(fθ̇) and fφ = d/dt(fφ̇), which yield:

2R² sin(θ) cos(θ) φ̇² = d/dt( 2R² θ̇ ),      0 = d/dt( 2R² sin²(θ) φ̇ ).
We find a constant of motion L = 2R² sin²(θ) φ̇; inserting this in the equation for the polar
angle θ yields:

2R² θ̈ = 2R² sin(θ) cos(θ) ( L/(2R² sin²(θ)) )²      ⇒      θ̈ = L² cos(θ)/( 4R⁴ sin³(θ) ).
If you can solve these and demonstrate through some reasonable argument that the solutions are
great circles then I will give you points. I have some solutions but nothing looks too pretty.
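Here is one such "reasonable argument" carried out numerically (my own sketch; the radius and initial data are assumed values): integrate the two equations with SciPy and check that the resulting path stays in the plane through the origin spanned by the initial position and velocity, which is exactly what it means to be a great circle.

import numpy as np
from scipy.integrate import solve_ivp

R = 1.0                                   # sphere radius (assumed)

def rhs(t, s):
    # s = (theta, theta_dot, phi, phi_dot); theta polar, phi azimuthal
    th, thd, ph, phd = s
    return [thd,
            np.sin(th) * np.cos(th) * phd**2,              # from d/dt(2R^2 theta') = 2R^2 sin(th)cos(th) phi'^2
            phd,
            -2.0 * (np.cos(th) / np.sin(th)) * thd * phd]  # from d/dt(2R^2 sin^2(th) phi') = 0

s0 = [1.0, 0.3, 0.0, 0.7]                 # assumed initial angles and angular rates
sol = solve_ivp(rhs, (0.0, 10.0), s0, max_step=0.01, rtol=1e-9, atol=1e-12)

def cart(th, ph):
    return R * np.array([np.cos(ph)*np.sin(th), np.sin(ph)*np.sin(th), np.cos(th)])

th, thd, ph, phd = s0
r0 = cart(th, ph)
v0 = R * np.array([-np.sin(ph)*np.sin(th)*phd + np.cos(ph)*np.cos(th)*thd,
                    np.cos(ph)*np.sin(th)*phd + np.sin(ph)*np.cos(th)*thd,
                   -np.sin(th)*thd])
n = np.cross(r0, v0)
n /= np.linalg.norm(n)                    # normal of the plane a great circle would lie in

rs = np.array([cart(a, b) for a, b in zip(sol.y[0], sol.y[2])])
print("max |n . r(t)| =", np.abs(rs @ n).max())   # near 0: the path lies in that plane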
and Px = mẋ, Py = mẏ and Pz = mż. These equations are easiest to solve when the force is
not a function of velocity or time. In particular, if the force F is conservative then there exists a
potential energy function U : R3 → R such that F = −∇U. We can prove that in the case the
force is conservative the total energy is conserved.
The kinetic energy is

T = ½ m( ẋ² + ẏ² + ż² ).
If F is a conservative force then its work integral is independent of path, so we may construct the
potential energy function as follows:

U(r) = − ∫_O^r F · dr
Here O is the origin for the potential and we can prove that the potential energy constructed in
this manner has F = −∇U. We can prove that the total (mechanical) energy E = T + U for
a conservative system is a constant; dE/dt = 0. Hopefully these comments are at least vaguely
familiar from some physics course in your distant memory. If not, relax; calculationally this chapter
is self-contained, so read onward.
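For instance, a quick check of the last claim: along a solution of Newton's equation m r̈ = F = −∇U we have

dE/dt = d/dt( ½ m ṙ · ṙ + U(r) ) = m ṙ · r̈ + ∇U · ṙ = ṙ · ( m r̈ + ∇U ) = ṙ · ( m r̈ − F ) = 0.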
We already calculated that if we use T as the Lagrangian then the Euler-Lagrange equations
produce Newton’s equations in the case that the force is zero (see 6.5.1). Suppose that we define
the Lagrangian to be L = T −U for a system governed by a conservative force with potential energy
function U . We seek to prove the Euler-Lagrange equations are precisely Newton’s equations for
this conservative system.¹ Generically we have a Lagrangian of the form
L(x, y, z, ẋ, ẏ, ż) = ½ m( ẋ² + ẏ² + ż² ) − U(x, y, z).
We wish to find extrema for the functional S = ∫ L(t) dt. This yields three sets of Euler-Lagrange
equations, one for each dependent variable x, y or z:

d/dt( ∂L/∂ẋ ) = ∂L/∂x,      d/dt( ∂L/∂ẏ ) = ∂L/∂y,      d/dt( ∂L/∂ż ) = ∂L/∂z.
Note that ∂L/∂ẋ = mẋ, ∂L/∂ẏ = mẏ and ∂L/∂ż = mż. Also note that ∂L/∂x = −∂U/∂x = Fx,
∂L/∂y = −∂U/∂y = Fy and ∂L/∂z = −∂U/∂z = Fz. It follows that

mẍ = Fx,      mÿ = Fy,      mz̈ = Fz.

Of course this is precisely ma = F for a net force F = ⟨Fx, Fy, Fz⟩. We have shown that
Hamilton's principle reproduces Newton's Second Law for conservative forces. Let me take a
moment to state it.
¹Don't mistake this example as an admission that Lagrangian mechanics is limited to conservative systems. Quite
the contrary, Lagrangian mechanics is actually more general than the original framework of Newton!
If a physical system has generalized coordinates qj with velocities q˙j and Lagrangian L =
T − U then the solutions of physics will minimize the action S defined below:
S = ∫_{t1}^{t2} L( qj, q̇j, t ) dt
Example 6.8.2. Projectile motion: take z as the vertical direction and suppose a bullet is fired
with initial velocity vo = ⟨vox, voy, voz⟩. The potential energy due to gravity is simply U = mgz
and kinetic energy is given by T = ½ m(ẋ² + ẏ² + ż²). Thus,

L = ½ m( ẋ² + ẏ² + ż² ) − mgz
The Euler-Lagrange equations are simply:

d/dt( mẋ ) = 0,      d/dt( mẏ ) = 0,      d/dt( mż ) = ∂/∂z( −mgz ) = −mg.

Integrating twice and applying initial conditions gives us the (possibly familiar) equations

x(t) = xo + vox t,      y(t) = yo + voy t,      z(t) = zo + voz t − ½ g t².
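A SymPy sketch of this example (my own illustration; the symbols and the call to dsolve are choices of the sketch): euler_equations reproduces the three equations of motion and dsolve recovers the z kinematics.

import sympy as sp
from sympy.calculus.euler import euler_equations

t, m, g = sp.symbols('t m g', positive=True)
x, y, z = sp.Function('x')(t), sp.Function('y')(t), sp.Function('z')(t)

L = sp.Rational(1, 2)*m*(x.diff(t)**2 + y.diff(t)**2 + z.diff(t)**2) - m*g*z
eqs = euler_equations(L, [x, y, z], t)
for eq in eqs:
    print(eq)                              # equivalent to m x'' = 0, m y'' = 0, m z'' = -m g

z_eq = [eq for eq in eqs if eq.has(z)][0]  # pick out the equation involving z
print(sp.dsolve(z_eq, z))                  # z(t) = C1 + C2*t - g*t**2/2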
Example 6.8.3. Simple Pendulum: let θ denote the angle measured off the vertical for a simple
pendulum of mass m and length l. Trigonometry tells us that

x = l sin(θ),      y = −l cos(θ)      ⇒      ẋ = l cos(θ) θ̇,      ẏ = l sin(θ) θ̇.

Thus T = ½ m(ẋ² + ẏ²) = ½ ml² θ̇². Also, the potential energy due to gravity is U = mgy = −mgl cos(θ),
which gives us

L = ½ ml² θ̇² + mgl cos(θ)
Then, the Euler-Lagrange equation in θ is simply:

d/dt( ∂L/∂θ̇ ) = ∂L/∂θ      ⇒      d/dt( ml² θ̇ ) = −mgl sin(θ)      ⇒      θ̈ + (g/l) sin(θ) = 0.
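A quick numerical sketch of this equation (my own illustration; g, l and the release angle are assumed values): integrate θ̈ + (g/l) sin(θ) = 0 with SciPy and compare against the small-angle solution θ0 cos(√(g/l) t).

import numpy as np
from scipy.integrate import solve_ivp

g, l = 9.81, 1.0
omega = np.sqrt(g / l)
theta0 = 0.5                           # release angle in radians, starting from rest

def rhs(t, s):                         # s = (theta, theta_dot)
    return [s[1], -(g / l) * np.sin(s[0])]

ts = np.linspace(0.0, 10.0, 500)
sol = solve_ivp(rhs, (ts[0], ts[-1]), [theta0, 0.0], t_eval=ts, rtol=1e-9)

sho = theta0 * np.cos(omega * ts)      # small-angle (simple harmonic) approximation
print("max deviation from SHM:", np.abs(sol.y[0] - sho).max())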