
Lagrange multipliers on manifolds

stevecheng
2013-03-21 19:30:54
We discuss in this article the theoretical aspects of the Lagrange multiplier method.
To enhance understanding, proofs and intuitive explanations of the Lagrange multiplier method will be given from several different viewpoints, both elementary and advanced.

1 Statements of theorem

Let N be an n-dimensional differentiable manifold (without boundary), and let f : N → R and g_i : N → R, for i = 1, …, k, be continuously differentiable. Set M = ⋂_{i=1}^k g_i^{-1}({0}).

1.1 Formulation with differential forms

Theorem 1. Suppose the dg_i are linearly independent at each point of M. If p ∈ M is a local minimum or maximum point of f restricted to M, then there exist Lagrange multipliers λ_1, …, λ_k ∈ R, depending on p, such that

df(p) = λ_1 dg_1(p) + ⋯ + λ_k dg_k(p).

Here, d denotes the exterior derivative.

Of course, as in one-dimensional calculus, the condition df(p) = Σ_i λ_i dg_i(p) by itself does not guarantee that p is a minimum or maximum point, even locally.
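As a concrete illustration (an example we supply here, not one from the original text), consider maximizing f(x, y) = x + y on the unit circle, the zero set of g(x, y) = x² + y² − 1. A short numerical check confirms that df(p) is a multiple of dg(p) at the maximizer:

```python
import math

# Hypothetical example (ours, not from the article): maximize f(x, y) = x + y
# on the unit circle M = {g = 0}, where g(x, y) = x^2 + y^2 - 1.
def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

# By symmetry the maximizer is p = (1/sqrt(2), 1/sqrt(2)).
p = (1.0 / math.sqrt(2), 1.0 / math.sqrt(2))

# Solve df(p) = lam * dg(p) from the first component ...
lam = grad_f(*p)[0] / grad_g(*p)[0]

# ... and check that the second component agrees, so the multiplier exists.
assert abs(grad_f(*p)[1] - lam * grad_g(*p)[1]) < 1e-12

print(lam)  # 1/sqrt(2), approximately 0.7071
```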
⟨LagrangeMultipliersOnManifolds⟩ created: ⟨2013-03-21⟩ by: ⟨stevecheng⟩ version: ⟨37276⟩ Privacy setting: ⟨1⟩ ⟨Topic⟩ ⟨58C05⟩ ⟨49-00⟩
This text is available under the Creative Commons Attribution/Share-Alike License 3.0. You can reuse this document or portions thereof only if you do so under terms that are compatible with the CC-BY-SA license.

1.2 Formulation with gradients

The version of Lagrange multipliers typically used in calculus is the special case N = R^n of Theorem 1. In this case, the conclusion of the theorem can also be written in terms of gradients instead of differential forms:

Theorem 2. Suppose the ∇g_i are linearly independent at each point of M. If p ∈ M is a local minimum or maximum point of f restricted to M, then there exist Lagrange multipliers λ_1, …, λ_k ∈ R, depending on p, such that

∇f(p) = λ_1 ∇g_1(p) + ⋯ + λ_k ∇g_k(p).

This formulation and the first one are equivalent, since the 1-form df can be identified with the gradient ∇f via the formula ∇f(p) · v = df(p; v) = df_p(v).
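The identification ∇f(p) · v = df(p; v) can be checked numerically for any smooth function. The following sketch (with an f chosen arbitrarily for illustration) compares the pairing of the gradient with v against a finite-difference directional derivative:

```python
import math

# Sketch with an arbitrary smooth f (our choice, for illustration only):
# f(x, y) = x^2 * y + sin(x).
def f(x, y):
    return x * x * y + math.sin(x)

def grad_f(x, y):
    return (2.0 * x * y + math.cos(x), x * x)

p = (0.7, -1.3)
v = (0.4, 0.9)

# Directional derivative df(p; v) by a central difference.
h = 1e-6
num = (f(p[0] + h * v[0], p[1] + h * v[1])
       - f(p[0] - h * v[0], p[1] - h * v[1])) / (2.0 * h)

# Pairing of the gradient with v: grad f(p) . v.
dot = grad_f(*p)[0] * v[0] + grad_f(*p)[1] * v[1]

# The two agree up to discretization error.
assert abs(num - dot) < 1e-6
```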

1.3 Formulation with tangent maps

The functions g_i can also be coalesced into a vector-valued function g : N → R^k. Then we have:

Theorem 3. Let g = (g_1, …, g_k) : N → R^k. Suppose the tangent map Dg is surjective at each point of M. If p ∈ M is a local minimum or maximum point of f restricted to M, then there exists a Lagrange multiplier vector λ ∈ (R^k)*, depending on p, such that

Df(p) = Dg(p)* λ.

Here, Dg(p)* : (R^k)* → (T_p N)* denotes the pullback of the linear transformation Dg(p) : T_p N → R^k.
If Dg is represented by its Jacobian matrix, then the condition that it be surjective is equivalent to its Jacobian matrix having full rank.

Note the deliberate use of the dual space (R^k)* for the Lagrange multiplier vector λ, rather than R^k, to which it is isomorphic. It turns out that the Lagrange multiplier vector naturally lives in the dual space and not the original vector space R^k. This distinction is particularly important in the infinite-dimensional generalizations of Lagrange multipliers. But even in the finite-dimensional setting, we see hints that the dual space has to be involved, because a transpose appears in the matrix expression for the Lagrange multipliers.

If the expression Dg(p)* λ is written out in coordinates, it is apparent that the components λ_i of the vector λ are exactly the Lagrange multipliers from Theorems 1 and 2.
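In coordinates, the condition Df(p) = Dg(p)* λ reads J^T λ = ∇f(p), where J is the Jacobian of g at p. The following sketch (an illustrative example of our own, with two constraints in R³ cutting out the unit circle in the xy-plane) solves for the multiplier vector:

```python
# Hypothetical example (ours): f(x, y, z) = x restricted to the unit circle
# in the xy-plane, written as g = (g1, g2) with g1 = x^2 + y^2 + z^2 - 1
# and g2 = z.  A maximizer of f on M is p = (1, 0, 0).
p = (1.0, 0.0, 0.0)

grad_f = (1.0, 0.0, 0.0)                    # Df(p) as a row vector
J = [(2.0 * p[0], 2.0 * p[1], 2.0 * p[2]),  # row i is grad g_i(p)
     (0.0, 0.0, 1.0)]                       # rows independent: full rank k = 2

# Df(p) = Dg(p)* lam reads, in coordinates, grad_f = lam1 * row1 + lam2 * row2,
# i.e. J^T lam = grad_f; the transpose is where the dual space (R^k)* appears.
lam = (grad_f[0] / J[0][0], grad_f[2] / J[1][2])   # solve from columns 1 and 3

# Verify all three scalar equations hold, so the multiplier vector exists.
for col in range(3):
    assert abs(lam[0] * J[0][col] + lam[1] * J[1][col] - grad_f[col]) < 1e-12

print(lam)  # (0.5, 0.0)
```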

2 Proofs

The proof of the Lagrange multiplier theorem is surprisingly short and elegant,
when properly phrased in the language of abstract manifolds and differential
forms.

However, for the benefit of the readers not versed in these topics, we provide,
in addition to the abstract proof, a concrete translation of the arguments in the
more familiar setting N = Rn .

2.1 Beautiful abstract proof

Proof. Since the dg_i are linearly independent at each point of M = ⋂_{i=1}^k g_i^{-1}({0}), M is an embedded submanifold of N, of dimension m = n − k. Let φ : U → M, with U open in R^m, be a coordinate chart for M such that φ(0) = p. Then f ∘ φ has a local minimum or maximum at 0, and therefore 0 = d(f ∘ φ) = φ* df at 0. But φ* at p is an isomorphism (T_p M)* → (T_0 R^m)*, so the preceding equation says that df vanishes on T_p M.

Now, by the definition of M, we have g_i ∘ φ = 0, so 0 = d(g_i ∘ φ) = φ* dg_i. So, like df, each dg_i vanishes on T_p M.

In other words, dg_i(p) lies in the annihilator (T_p M)^0 of the subspace T_p M ⊆ T_p N. Since T_p M has dimension m = n − k, and T_p N has dimension n, the annihilator (T_p M)^0 has dimension k. Now the dg_i(p) ∈ (T_p M)^0 are linearly independent, so they must in fact be a basis for (T_p M)^0. But we have argued that df(p) ∈ (T_p M)^0. Therefore df(p) may be written as a unique linear combination of the dg_i(p):

df(p) = λ_1 dg_1(p) + ⋯ + λ_k dg_k(p).
The last paragraph of the previous proof can also be rephrased, based on the same underlying ideas, to make evident the fact that the Lagrange multiplier vector λ lives in the dual space (R^k)*.

Alternative argument. A general theorem in linear algebra states that, for any linear transformation L, the image of the pullback L* is the annihilator of the kernel of L. Since ker Dg(p) = T_p M and df(p) ∈ (T_p M)^0, it immediately follows that there exists λ ∈ (R^k)* such that df(p) = Dg(p)* λ.
Yet another proof could be devised by observing that the result is obvious if N = R^n and the constraint functions are just coordinate projections on R^n:

g_i(y_1, …, y_n) = y_i,  i = 1, …, k.

We clearly must have ∂f/∂y_{k+1} = ⋯ = ∂f/∂y_n = 0 at a point p that minimizes f(y) over y_1 = ⋯ = y_k = 0. The general case can be reduced to this by a coordinate change:
Alternate argument. Since the dg_i are linearly independent, we can find a coordinate chart for N about the point p, with coordinate functions y_1, …, y_n : N → R such that y_i = g_i for i = 1, …, k. Then

df = ∂f/∂y_1 dy_1 + ⋯ + ∂f/∂y_n dy_n
   = ∂f/∂g_1 dg_1 + ⋯ + ∂f/∂g_k dg_k + ∂f/∂y_{k+1} dy_{k+1} + ⋯ + ∂f/∂y_n dy_n,

but ∂f/∂y_{k+1} = ⋯ = ∂f/∂y_n = 0 at the point p. Set λ_i = ∂f/∂g_i at p.
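As a minimal instance of this coordinate change (an illustration of our own): take N = R² with the single constraint g_1(x_1, x_2) = x_1 − x_2², so that M is the parabola x_1 = x_2². The map (x_1, x_2) ↦ (x_1 − x_2², x_2) is a diffeomorphism of R², so y_1 = g_1 and y_2 = x_2 are coordinates of the required kind, and

```latex
df \;=\; \frac{\partial f}{\partial y_1}\, dg_1
      \;+\; \frac{\partial f}{\partial y_2}\, dy_2 .
```

At a constrained extremum p the coefficient ∂f/∂y_2 vanishes, since it is the derivative of f along M, leaving df(p) = λ_1 dg_1(p) with λ_1 = ∂f/∂y_1 (p).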

2.2 Clumsy, but down-to-earth proof

Proof. We assume that N = R^n. Consider the list vector g = (g_1, …, g_k) discussed earlier, and its Jacobian matrix Dg in Euclidean coordinates. The i-th row of this matrix is

( ∂g_i/∂x_1 ⋯ ∂g_i/∂x_n ) = (∇g_i)^T.

So the matrix Dg has full rank (i.e. rank Dg = k) if and only if the k gradients ∇g_i are linearly independent.
Consider each solution q ∈ M of g(q) = 0. Since Dg has full rank, we can apply the implicit function theorem, which states that there exist smooth solution parameterizations φ : U → M around each point q ∈ M. (U is an open set in R^m, m = n − k.) These are the coordinate charts which give M = g^{-1}({0}) a manifold structure.

We now consider specially the point q = p; without loss of generality, assume φ(0) = p. Then f ∘ φ is a function on Euclidean space having a local minimum or maximum at 0, so its derivative vanishes at 0. Calculating by the chain rule, we have 0 = D(f ∘ φ)(0) = Df(p) · Dφ(0). In other words, ker Df(p) contains the range of Dφ(0), which is T_p M. Intuitively, this says that the directional derivatives at p of f lying in the tangent space T_p M of the manifold M vanish.

By the definition of g and φ, we have g ∘ φ = 0. By the chain rule again, we derive 0 = Dg(p) · Dφ(0).

Let the columns of Dφ(0) be the column vectors v_1, …, v_m, which span the m-dimensional space T_p M, and look at the matrix equation 0 = Df(p) · Dφ(0) again. The equation for each entry of this matrix, which consists of only one row, is:

∇f(p) · v_j = 0,  j = 1, …, m.

In other words, ∇f(p) is orthogonal to v_1, …, v_m, and hence it is orthogonal to the entire tangent space T_p M.

Similarly, the matrix equation 0 = Dg(p) · Dφ(0) can be split into individual scalar equations:

∇g_i(p) · v_j = 0,  i = 1, …, k, j = 1, …, m.

Thus ∇g_i(p) is orthogonal to T_p M. But the ∇g_i(p) are, by hypothesis, linearly independent, and there are k of these gradients, so they must form a basis for the orthogonal complement of T_p M, of n − m = k dimensions. Hence ∇f(p) can be written as a unique linear combination of the ∇g_i(p):

∇f(p) = λ_1 ∇g_1(p) + ⋯ + λ_k ∇g_k(p).
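The chain-rule step 0 = D(f ∘ φ)(0) = Df(p) · Dφ(0) is easy to verify numerically in a toy case. In the following sketch (an example we supply, using the circle chart φ(t) = (cos t, sin t)), the derivative of f ∘ φ vanishes at the constrained maximizer of f(x, y) = x + y:

```python
import math

# Sketch (our example): M is the unit circle g(x, y) = x^2 + y^2 - 1 = 0
# with chart phi(t) = (cos t, sin t), and f(x, y) = x + y, which attains
# its maximum on M at p = phi(pi/4).
def f(x, y):
    return x + y

def phi(t):
    return (math.cos(t), math.sin(t))

t0 = math.pi / 4.0    # phi(t0) = p

# D(f o phi)(t0) by a central difference: at the constrained extremum it
# vanishes, i.e. grad f(p) is orthogonal to phi'(t0), which spans T_p M.
h = 1e-6
d = (f(*phi(t0 + h)) - f(*phi(t0 - h))) / (2.0 * h)
assert abs(d) < 1e-8
```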

3 Intuitive interpretations

We now discuss the intuitive and geometric interpretations of Lagrange multipliers.

3.1 Normals to tangent hyperplanes

Each equation g_i = 0 defines a hypersurface M_i in R^n, a manifold of dimension n − 1. If we consider the tangent hyperplane at p of these hypersurfaces, T_p M_i, the gradient ∇g_i(p) gives the normal vector to these hyperplanes.

The manifold M is the intersection of the hypersurfaces M_i. Presumably, the tangent space T_p M is the intersection of the T_p M_i, and the subspace perpendicular to T_p M would be spanned by the normals ∇g_i(p). Now, the directional derivatives at p of f with respect to each vector in T_p M, as we have proved, vanish. So the direction of ∇f(p), the direction of greatest change in f at p, should be perpendicular to T_p M. Hence ∇f(p) can be written as a linear combination of the ∇g_i(p).
Note, however, that this geometric picture, and the manipulations with the gradients ∇f(p) and ∇g_i(p), do not carry over to abstract manifolds. The notions of gradients and normals to surfaces depend on the inner product structure of R^n, which is not present in an abstract manifold (without a Riemannian metric).

On the other hand, this explains the mysterious appearance of annihilators in the last paragraph of the abstract proof. Annihilators and dual space theory serve as the proper tools to formalize the manipulations we made with the matrix equations 0 = Df(p) · Dφ(0) and 0 = Dg(p) · Dφ(0), without resorting to Euclidean coordinates, which, of course, are not even defined on an abstract manifold.

3.2 With infinitesimals

If we are willing to interpret the quantities df and dg_i as infinitesimals, even the abstract version of the result has an intuitive explanation. Suppose we are at the point p of the manifold M, and consider an infinitesimal movement δp about this point. The infinitesimal movement δp is a vector in the tangent space T_p M, because, near p, M looks like the linear space T_p M. And as p moves, the function f changes by a corresponding infinitesimal amount df that is approximately linear in δp.

Furthermore, the change df may be decomposed as the sum of a change as p moves along the manifold M, and a change as p moves out of the manifold M. But if f has a local minimum at p, then there cannot be any change of f along M; thus f only changes when moving out of M. Now M is described by the equations g_i = 0, so a movement out of M is described by the infinitesimal changes dg_i. As df is linear in the change δp, we ought to be able to write it as a weighted sum of the changes dg_i. The weights are, of course, the Lagrange multipliers λ_i.

The linear algebra performed in the abstract proof can be regarded as the precise, rigorous translation of the preceding argument.

3.3 As rates of substitution

Observe that the formula for Lagrange multipliers is formally very similar to the standard formula for expressing a differential form in terms of a basis:

df(p) = ∂f/∂y_1 dy_1 + ⋯ + ∂f/∂y_k dy_k.

In fact, if the dg_i(p) are linearly independent, then they do form a basis for (T_p M)^0 that can be extended to a basis for (T_p N)*. By the uniqueness of the basis representation, we must have

λ_i = ∂f/∂g_i.

That is, λ_i is the differential of f with respect to changes in g_i.


In applications of Lagrange multipliers to economic problems, the multipliers λ_i are rates of substitution: they give the rate of improvement in the objective function f as the constraints g_i are relaxed.
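This interpretation can be tested numerically. In the following sketch (our own example) we maximize f(x, y) = x + y subject to x² + y² − 1 = c; the optimal value is V(c) = √(2(1 + c)), and its derivative at c = 0 matches the multiplier λ = 1/√2 found from df = λ dg:

```python
import math

# Numerical sketch (our example): maximize f(x, y) = x + y subject to the
# relaxed constraint g(x, y) = x^2 + y^2 - 1 = c.  The feasible set is the
# circle of radius sqrt(1 + c), so the optimal value is V(c) = sqrt(2(1 + c)).
def V(c):
    return math.sqrt(2.0 * (1.0 + c))

# At c = 0 the multiplier is lambda = 1/sqrt(2), from df = lambda dg at the
# maximizer (1/sqrt(2), 1/sqrt(2)).  Check dV/dc(0) = lambda: the rate of
# improvement of the objective as the constraint is relaxed.
h = 1e-6
dV = (V(h) - V(-h)) / (2.0 * h)
assert abs(dV - 1.0 / math.sqrt(2)) < 1e-8
```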

4 Stationary points

In applications, sometimes we are interested in finding stationary points p of f: points p such that df vanishes on T_p M, or equivalently, such that the Taylor expansion of f at p, under any system of coordinates for M, has no terms of first order. The Lagrange multiplier method works for this situation too.

The following theorem incorporates this more general notion of stationary points.

Theorem 4. Let N be an n-dimensional differentiable manifold (without boundary), and let f : N → R and g_i : N → R, for i = 1, …, k, be continuously differentiable. Suppose p ∈ M = ⋂_{i=1}^k g_i^{-1}({0}), and the dg_i(p) are linearly independent.

Then p is a stationary point (e.g. a local extremum point) of f restricted to M if and only if there exist λ_1, …, λ_k ∈ R such that

df(p) = λ_1 dg_1(p) + ⋯ + λ_k dg_k(p).

The Lagrange multipliers λ_i, which depend on p, are unique when they exist.
In this formulation, M is not necessarily a manifold, but it is one when intersected with a sufficiently small neighborhood about p. So it makes sense to talk about T_p M, although we are abusing notation here. The subspace in question can be more accurately described as the subspace annihilated by span{dg_i(p)}.

It is also enough that the dg_i be linearly independent only at the point p. For the dg_i are continuous, so they remain linearly independent at points near p, and we may restrict our attention to a sufficiently small neighborhood around p, and the proofs carry through.
The proof involves only simple modifications to that of Theorem 1. For instance, the converse implication follows because we have already proved that the dg_i(p) form a basis for the annihilator of T_p M, independently of whether or not p is a stationary point of f on M.
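To see that Theorem 4 genuinely covers more than extrema, here is a small example of our own: f(x, y) = x³ restricted to the unit circle has a stationary point at p = (0, 1), where df(p) = 0 = 0 · dg(p) (so λ_1 = 0), yet p is not a local extremum of f on M:

```python
import math

# Hypothetical example of a stationary point that is not a local extremum:
# f(x, y) = x^3 restricted to the circle g(x, y) = x^2 + y^2 - 1 = 0, at
# p = (0, 1).  There df(p) = 3x^2 dx = 0, so Theorem 4 holds with
# lambda_1 = 0, yet p is neither a minimum nor a maximum of f on M.
def f_on_M(t):            # f composed with the chart phi(t) = (cos t, sin t)
    return math.cos(t) ** 3

t0 = math.pi / 2.0        # phi(t0) = p = (0, 1)

# First-order term of the Taylor expansion vanishes: p is stationary ...
h = 1e-6
d = (f_on_M(t0 + h) - f_on_M(t0 - h)) / (2.0 * h)
assert abs(d) < 1e-8

# ... but f takes values of both signs arbitrarily close to f(p) = 0,
# so p is not a local extremum of f on M.
assert f_on_M(t0 - 0.1) > 0.0 > f_on_M(t0 + 0.1)
```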
