Introduction to the Calculus of Variations
Jim Fischer
March 20, 1999
Abstract
This is a self-contained paper which introduces a fundamental problem in the calculus of variations, the problem of finding extreme values
of functionals. The reader should have a solid background in one-variable calculus.
Contents
1 Introduction
2 Partial Derivatives
3 The Chain Rule
4 Statement of the Problem
5 The Euler-Lagrange Equation
6 The Brachistochrone Problem
7 Concluding Remarks
1 Introduction
We begin with an introduction to partial differentiation of functions of several variables. After partial derivatives are introduced we discuss some forms
of the chain rule. In section 4 we formulate the fundamental problem. In
section 5 we state and prove the fundamental result (The Euler-Lagrange
Equation). We conclude the paper with a solution of one of the most famous
problems from the calculus of variations, The Brachistochrone Problem.
2 Partial Derivatives
Given a function of one variable, say $f(x)$, we define the derivative of $f(x)$ at $x = a$ to be
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h},$$
provided this limit exists. For a function of several variables the total derivative is not as easy to define. However, if we hold all but one of the independent variables constant, we can define the partial derivative of the function by a limit similar to the one above. For example, if $f$ is a function of the variables $x$, $y$ and $z$, we can set $x = a$ and $y = b$ and define the partial derivative of $f$ with respect to $z$ to be
$$\frac{\partial f}{\partial z}(a, b, z) = \lim_{h \to 0} \frac{f(a, b, z + h) - f(a, b, z)}{h},$$
wherever this limit exists. Note that the partial derivative is a function of
all three variables x, y and z. The partial derivative of f with respect to z
gives the instantaneous rate of change of f in the z direction. The partial derivatives of f with respect to x and y are defined in a similar way. Computing partial derivatives is no harder than computing ordinary one-variable derivatives: one simply treats the fixed variables as constants.
Example 1. Suppose $f(x, y, z) = x^2 y^2 z^2 + y\cos(z)$. Then
$$\frac{\partial f}{\partial x}(x, y, z) = 2xy^2z^2,$$
$$\frac{\partial f}{\partial y}(x, y, z) = 2x^2yz^2 + \cos(z),$$
$$\frac{\partial f}{\partial z}(x, y, z) = 2x^2y^2z - y\sin(z).$$
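The formulas in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original development: the helper `partial`, the step size and the test point are our own choices, and the central-difference quotient is only an approximation to the limit in the definition.

```python
import math

def f(x, y, z):
    # The function from Example 1.
    return x**2 * y**2 * z**2 + y * math.cos(z)

def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to the coordinate at position `index` (hypothetical helper)."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (func(*forward) - func(*backward)) / (2 * h)

# Closed-form partial derivatives as stated in Example 1.
def f_x(x, y, z):
    return 2 * x * y**2 * z**2

def f_y(x, y, z):
    return 2 * x**2 * y * z**2 + math.cos(z)

def f_z(x, y, z):
    return 2 * x**2 * y**2 * z - y * math.sin(z)

point = (1.2, -0.7, 0.5)  # arbitrary test point
for name, exact, index in (("x", f_x, 0), ("y", f_y, 1), ("z", f_z, 2)):
    # Each line prints the numerical estimate next to the closed form.
    print(name, partial(f, point, index), exact(*point))
```

The two numbers printed on each line should agree to several decimal places, which is all a finite-difference check of this kind can show.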
We can take higher order partial derivatives by continuing in the same
manner. In the example above, first taking the partial derivative of f with
respect to y and then with respect to z yields:
$$\frac{\partial^2 f}{\partial z\,\partial y} = 4x^2yz - \sin(z).$$
Such a derivative is called a mixed partial derivative of $f$. By a well-known theorem of advanced calculus, if the second order partial derivatives of $f$ exist in a neighborhood of the point $(a, b, c)$ and are continuous at $(a, b, c)$, then the mixed partial derivatives do not depend on the order in which the derivatives are taken. That is, for example,
$$\frac{\partial^2 f}{\partial z\,\partial y}(a, b, c) = \frac{\partial^2 f}{\partial y\,\partial z}(a, b, c).$$
This result was first proved by Leonhard Euler in 1734.
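As a hedged numerical illustration of this symmetry for the function of Example 1, the sketch below nests two central-difference quotients in both orders. The helper names `d_dy` and `d_dz`, the step size and the test point are assumptions made for the illustration only.

```python
import math

def f(x, y, z):
    # The function from Example 1.
    return x**2 * y**2 * z**2 + y * math.cos(z)

def d_dy(func, h=1e-4):
    """Return a function approximating the partial derivative of func in y."""
    return lambda x, y, z: (func(x, y + h, z) - func(x, y - h, z)) / (2 * h)

def d_dz(func, h=1e-4):
    """Return a function approximating the partial derivative of func in z."""
    return lambda x, y, z: (func(x, y, z + h) - func(x, y, z - h)) / (2 * h)

f_zy = d_dz(d_dy(f))  # differentiate in y first, then in z
f_yz = d_dy(d_dz(f))  # differentiate in z first, then in y

point = (1.2, -0.7, 0.5)  # arbitrary test point
x0, y0, z0 = point
exact = 4 * x0**2 * y0 * z0 - math.sin(z0)  # closed form from the text
print(f_zy(*point), f_yz(*point), exact)
```

Both orders of differentiation should reproduce the closed form up to the discretization error of the difference quotients.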
Exercise 1. Let $f(x, y, z) = \sqrt{\dfrac{1 + y^2}{z^2}}$. Compute all three partial derivatives of $f$.
Exercise 2. For f in Example 1, compute a couple of mixed partial derivatives and verify that the order in which you differentiate does not matter.
3 The Chain Rule
We begin with a review of the chain rule for functions of one variable. Suppose $f(x)$ is a differentiable function of $x$ and $x = x(t)$ is a differentiable function of $t$. By the chain rule theorem, the composite function $z(t) = (f \circ x)(t)$ is a differentiable function of $t$ and
$$\frac{dz}{dt} = \frac{dz}{dx}\,\frac{dx}{dt}. \tag{3.1}$$
For example, if $f(x) = \sin(x)$ and $x(t) = t^2$, then the derivative with respect to $t$ of $z = \sin(t^2)$ is given by $\cos(t^2) \cdot 2t$. It turns out that there is a chain rule for functions of several variables. For example, suppose $x$ and $y$ are functions of $t$ and consider the function $z = [x(t)]^2 + 3[y(t)]^3$. We can think of $z$ as the composite of the function $f(x, y) = x^2 + 3y^3$ with the functions $x(t)$ and $y(t)$. By a chain rule theorem for functions of several variables,
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt}. \tag{3.2}$$
Note the similarity between (3.1) and (3.2). For functions of several variables, one needs to keep track of each of the independent variables separately,
applying a chain rule to each. The hypotheses of the chain rule theorem require the function z = f (x, y) to have continuous partial derivatives and x(t) and y(t) to be differentiable.
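Formula (3.2) can be tested on the example $z = [x(t)]^2 + 3[y(t)]^3$. The sketch below assumes the particular inner functions $x(t) = \cos t$ and $y(t) = t^2$, which are our own choices and do not come from the text, and compares the chain-rule value with a direct difference quotient of $z(t)$.

```python
import math

# Hypothetical inner functions, chosen only for this check.
def x(t):
    return math.cos(t)

def y(t):
    return t**2

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return 2 * t

def z(t):
    # z as a function of t alone, after substituting x(t) and y(t).
    return x(t)**2 + 3 * y(t)**3

def dz_dt_chain(t):
    # Right-hand side of (3.2) with f(x, y) = x^2 + 3 y^3.
    dz_dx = 2 * x(t)
    dz_dy = 9 * y(t)**2
    return dz_dx * dx_dt(t) + dz_dy * dy_dt(t)

def dz_dt_numeric(t, h=1e-6):
    # Central-difference approximation of the left-hand side of (3.2).
    return (z(t + h) - z(t - h)) / (2 * h)

t0 = 0.8  # arbitrary test value
print(dz_dt_chain(t0), dz_dt_numeric(t0))
```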
4 Statement of the Problem
We begin with a simple example. Let P and Q be two points in the xy-plane
and consider the collection of all smooth curves which connect P to Q. Let
y(x) be such a curve with P = (a, y(a)) and Q = (b, y(b)). The arc-length
of the curve y(x) is given by the integral
$$\int_a^b \sqrt{1 + [y'(x)]^2}\,dx.$$
Suppose now that we wish to determine which curve will minimize the above
integral. Certainly our knowledge of ordinary geometry suggests that the
curve which minimizes the arc-length is the straight line connecting P to Q.
However, what if instead we were interested in finding which curve minimizes
a different integral? For example, consider the integral
$$\int_a^b \sqrt{\frac{1 + [y'(x)]^2}{y(x)}}\,dx.$$
It is not obvious what choice of y(x) will result in minimizing this integral.
Further, it is not at all obvious that such a minimum exists!
One way to proceed is to notice that the above integrals can be viewed as
special kinds of functions, functions whose inputs are functions and whose
outputs are real numbers. For example we could write
$$F[y(x)] = \int_a^b \sqrt{1 + [y'(x)]^2}\,dx.$$
More generally we could write:
$$F[y(x)] = \int_a^b f(x, y(x), y'(x))\,dx.$$
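To make the idea of a function whose input is a curve concrete, the following sketch evaluates the arc-length integral for two curves joining $(0, 0)$ to $(1, 1)$. The name `functional`, the midpoint quadrature rule and the two test curves are assumptions made for illustration; any reasonable numerical integration scheme would serve.

```python
import math

def functional(f, y, dy, a, b, n=2000):
    """Approximate F[y] = integral from a to b of f(x, y(x), y'(x)) dx
    with the composite midpoint rule on n subintervals."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += f(x, y(x), dy(x))
    return total * width

def arc_length_integrand(x, y, yp):
    # f(x, y, y') = sqrt(1 + y'^2); x and y do not appear explicitly.
    return math.sqrt(1.0 + yp**2)

# Two curves joining P = (0, 0) to Q = (1, 1), each given as (y, y').
line = (lambda x: x, lambda x: 1.0)
parabola = (lambda x: x**2, lambda x: 2.0 * x)

length_line = functional(arc_length_integrand, line[0], line[1], 0.0, 1.0)
length_parabola = functional(arc_length_integrand, parabola[0], parabola[1], 0.0, 1.0)
print(length_line, length_parabola)
```

Each curve is sent to a single real number, and the straight line yields the smaller value of the two, in agreement with ordinary geometry.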
A function like $F$ is called a functional. This name is used to distinguish $F$ from ordinary real-valued functions whose domains consist of ordinary variables. The function $f$ in the integral is to be viewed as an ordinary function of the variables $x$, $y$ and $y'$ (this should become more clear in the next section).¹ One of the fundamental problems with which the calculus of variations is concerned is locating the extrema of functionals.
¹ We don't call $f(x, y, y')$ a functional: applied to a curve $y(x)$, it yields a function of $x$ rather than a real number, so its range is not $\mathbb{R}$.
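Before we formally state the problem, we need to specify the domain of $F$ more precisely. Consider the interval $[a, b] \subset \mathbb{R}$ and define $C^1_{[a,b]}$ to be the set
$$C^1_{[a,b]} = \{\, y(x) \mid y : [a, b] \to \mathbb{R},\ y \text{ has a continuous first derivative on } [a, b] \,\}.$$
The set $C^2_{[a,b]}$ is defined in the same way, with a continuous second derivative required as well.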
We will consider only functionals which have certain desirable properties. Let $F$ be a functional whose domain is $C^1_{[a,b]}$,
$$F[y(x)] = \int_a^b f(x, y(x), y'(x))\,dx.$$
We will require that the function $f$ in the integral have continuous partial derivatives with respect to $x$, $y$ and $y'$. We require the continuity of derivatives because we will need to apply chain rules and the Leibniz rule for differentiation.
We now state the fundamental problem.
Problem: Let $F$ be a functional defined on $C^2_{[a,b]}$, with $F[y(x)]$ given by
$$F[y(x)] = \int_a^b f(x, y(x), y'(x))\,dx.$$
Suppose the functional $F$ attains a minimum (or maximum) value.² How do we determine the curve $y(x)$ which produces such a minimum (maximum) value for $F$?
In the next section we will show that the minimizing curve y(x) must satisfy
a differential equation known as the Euler-Lagrange Equation.
5 The Euler-Lagrange Equation
We begin this section with the fundamental result:
Theorem 1. If $y(x)$ is a curve in $C^2_{[a,b]}$ which minimizes the functional
$$F[y(x)] = \int_a^b f(x, y(x), y'(x))\,dx,$$
then the following differential equation must be satisfied:
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) = 0.$$
This equation is called the Euler-Lagrange Equation.
² This is an important assumption, for there do exist functionals which have no extrema.
Before proving this theorem, we consider an example.
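Example 2. If $F[y(x)] = \int_a^b \sqrt{1 + [y'(x)]^2}\,dx$, then the Euler-Lagrange Equation is given by:
$$\begin{aligned}
0 &= \frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) \\
&= 0 - \frac{d}{dx}\left(\frac{y'(x)}{\sqrt{1 + [y'(x)]^2}}\right) \\
&= -\frac{\sqrt{1 + [y'(x)]^2}\;y''(x) - [y'(x)]^2\,y''(x)\left(1 + [y'(x)]^2\right)^{-1/2}}{1 + [y'(x)]^2} \\
&= -\frac{\left(1 + [y'(x)]^2\right)y''(x) - [y'(x)]^2\,y''(x)}{\left(1 + [y'(x)]^2\right)^{3/2}} \\
&= -\frac{y''(x)}{\left(1 + [y'(x)]^2\right)^{3/2}}.
\end{aligned}$$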
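The simplification in Example 2 can be checked along a sample curve. In the sketch below the test curve $y(x) = \sin x$ is an assumption made only for this check (it is not a minimizer); the code compares a difference quotient of $\partial f/\partial y'$ taken along the curve with the closed form on the last line of the example.

```python
import math

# Hypothetical test curve y(x) = sin(x); only its derivatives are needed.
def dy(x):
    return math.cos(x)

def d2y(x):
    return -math.sin(x)

def df_dyprime(x):
    # Partial derivative of f = sqrt(1 + y'^2) with respect to y',
    # evaluated along the curve.
    return dy(x) / math.sqrt(1.0 + dy(x)**2)

def euler_lagrange_lhs(x, h=1e-6):
    # df/dy - d/dx(df/dy'), where df/dy = 0 for this integrand and the
    # x-derivative is approximated by a central difference.
    return 0.0 - (df_dyprime(x + h) - df_dyprime(x - h)) / (2 * h)

def closed_form(x):
    # Last line of Example 2: -y'' / (1 + y'^2)^(3/2).
    return -d2y(x) / (1.0 + dy(x)**2) ** 1.5

for x in (0.3, 1.0, 2.5):
    print(x, euler_lagrange_lhs(x), closed_form(x))
```

The two columns agree but are not zero, which reflects the fact that the sine curve does not satisfy the Euler-Lagrange equation for arc length.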
Exercise 3. Show that the solution to
$$0 = \frac{y''(x)}{\left(1 + [y'(x)]^2\right)^{3/2}}$$
is a straight line, that is, $y(x) = Ax + B$. Is this a proof that the shortest path between two points is a straight line?
The proof of Theorem 1 relies on three things: the Leibniz rule, integration by parts and Lemma 1. It is assumed that the reader is familiar with integration by parts. We will discuss the Leibniz rule later, and we state and prove Lemma 1 now.
Lemma 1. Let $M(x)$ be a continuous function on the interval $[a, b]$. Suppose that for any continuous function $h(x)$ with $h(a) = h(b) = 0$ we have
$$\int_a^b M(x)h(x)\,dx = 0.$$
Then $M(x)$ is identically zero³ on $[a, b]$.
³ If $M$ is not assumed to be continuous, the conclusion is only that the function is zero almost everywhere. This means that the set of $x$ values where the function is not zero has a length of zero.
Proof of Lemma 1: Since $h(x)$ can be any continuous function with $h(a) = h(b) = 0$, we choose $h(x) = -M(x)(x - a)(x - b)$. Clearly $h(x)$ is continuous since $M$ is continuous. Also, $M(x)h(x) \ge 0$ on $[a, b]$ (check this). But if the definite integral of a continuous non-negative function is zero, then the function itself must be zero. So we conclude that
$$0 = M(x)h(x) = [M(x)]^2\,[-(x - a)(x - b)].$$
This and the fact that $[-(x - a)(x - b)] > 0$ on $(a, b)$ imply that $[M(x)]^2 = 0$ on $(a, b)$, and by the continuity of $M$ the same holds at the endpoints $a$ and $b$. Finally, $[M(x)]^2 = 0$ on $[a, b]$ implies that $M(x) = 0$ on $[a, b]$.
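The choice of $h$ in the proof can be illustrated numerically. The sketch below assumes a particular continuous $M(x) = \sin(3x)$ on $[0, 2]$, which is not from the text, builds $h(x) = -M(x)(x - a)(x - b)$ and estimates the integral of $M h$ with a midpoint rule. Because this $M$ is not identically zero, the integral comes out strictly positive, which is the contrapositive of the lemma.

```python
import math

A, B = 0.0, 2.0  # the interval [a, b], chosen for illustration

def M(x):
    # Hypothetical continuous function that changes sign on [A, B].
    return math.sin(3.0 * x)

def h(x):
    # The test function used in the proof of Lemma 1.
    return -M(x) * (x - A) * (x - B)

def integrate(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    width = (b - a) / n
    return width * sum(g(a + (i + 0.5) * width) for i in range(n))

integral = integrate(lambda x: M(x) * h(x), A, B)
print(h(A), h(B), integral)
```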
Proof of Theorem 1: Suppose y(x) is a curve which minimizes the
functional F . That is, for any other permissible curve g(x), F [y(x)] ≤
F [g(x)]. The basic idea in this proof will be to construct a function of one
real variable, say $H(\epsilon)$, which has the following properties:
1. $H(\epsilon)$ is a differentiable function near $\epsilon = 0$.
2. $H(0)$ is a local minimum for $H$.
After constructing H, we show that Property 2 implies the Euler-Lagrange
equation must be satisfied.
We begin by constructing a variation of $y(x)$. Let $\epsilon$ be a small real number (positive or negative), and consider the new function given by:
$$y_\epsilon(x) = y(x) + \epsilon h(x),$$
where $h(x) \in C^2_{[a,b]}$ and $h(a) = h(b) = 0$. We can now define the function $H$ to be
$$H(\epsilon) = F[y_\epsilon(x)].$$
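Since $y_0(x) = y(x)$ and $y(x)$ minimizes $F[y(x)]$, it follows that $\epsilon = 0$ minimizes $H(\epsilon)$. Now, since $H(0)$ is a minimum value for $H$, we know from ordinary calculus that $H'(0) = 0$. The function $H$ can be differentiated by using the Leibniz rule⁴:
$$\begin{aligned}
\frac{d}{d\epsilon}\,H(\epsilon) &= \frac{d}{d\epsilon}\int_a^b f(x, y_\epsilon, y'_\epsilon)\,dx \\
&= \int_a^b \frac{\partial}{\partial \epsilon}\, f(x, y_\epsilon, y'_\epsilon)\,dx.
\end{aligned}$$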
⁴ For a proof of the Leibniz rule, check out a text on advanced calculus.
Figure 1: A variation of y(x). [Plot of the curve y(x) and a variation of y(x), both joining P to Q between x = a and x = b.]
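Applying the chain rule within the integral we obtain:
\begin{align}
\frac{\partial}{\partial \epsilon}\, f(x, y_\epsilon, y'_\epsilon)
&= \frac{\partial f}{\partial x}\frac{\partial x}{\partial \epsilon} + \frac{\partial f}{\partial y_\epsilon}\frac{\partial y_\epsilon}{\partial \epsilon} + \frac{\partial f}{\partial y'_\epsilon}\frac{\partial y'_\epsilon}{\partial \epsilon} \tag{5.1} \\
&= \frac{\partial f}{\partial y_\epsilon}\frac{\partial y_\epsilon}{\partial \epsilon} + \frac{\partial f}{\partial y'_\epsilon}\frac{\partial y'_\epsilon}{\partial \epsilon} \tag{5.2} \\
&= \frac{\partial f}{\partial y_\epsilon}\,h(x) + \frac{\partial f}{\partial y'_\epsilon}\,h'(x) \tag{5.3}
\end{align}
Exercise 4. Show that equations (5.1) through (5.3) are true.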
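From these computations, we have
$$H'(\epsilon) = \int_a^b \left[\frac{\partial f}{\partial y_\epsilon}\,h(x) + \frac{\partial f}{\partial y'_\epsilon}\,h'(x)\right] dx. \tag{5.4}$$
Evaluating this equation at $\epsilon = 0$ yields
$$0 = \int_a^b \left[\frac{\partial f}{\partial y}\,h(x) + \frac{\partial f}{\partial y'}\,h'(x)\right] dx. \tag{5.5}$$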
At this point we would like to apply Lemma 1, but in order to do so we must first apply integration by parts to the second term in the above integral. The boundary term vanishes because $h(a) = h(b) = 0$, and the following equation is obtained from equation (5.5):
$$0 = \int_a^b \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right] h(x)\,dx. \tag{5.6}$$
Finally, since this procedure works for any function h(x) with h(a) = h(b) =
0, we can apply Lemma 1 and conclude that
$$0 = \frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right).$$
This completes the proof of Theorem 1.
Exercise 5. Verify equation (5.6) by doing the integration by parts in equation (5.5).
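The role of the function $H(\epsilon)$ in the proof can be seen in a small numerical experiment. The sketch below assumes the arc-length functional, the straight line $y(x) = x$ on $[0, 1]$ and the variation $h(x) = \sin(\pi x)$; these choices, the midpoint quadrature and the step sizes are ours and serve only as an illustration of the two properties required of $H$.

```python
import math

def arc_length(slope, a, b, n=2000):
    """Midpoint-rule estimate of the integral of sqrt(1 + slope(x)^2) over [a, b]."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += math.sqrt(1.0 + slope(x) ** 2)
    return total * width

A, B = 0.0, 1.0

def y_prime(x):
    # y(x) = x, the straight line from (0, 0) to (1, 1).
    return 1.0

def h_prime(x):
    # h(x) = sin(pi x), which satisfies h(0) = h(1) = 0.
    return math.pi * math.cos(math.pi * x)

def H(eps):
    # H(eps) = F[y + eps h] for the arc-length functional.
    return arc_length(lambda x: y_prime(x) + eps * h_prime(x), A, B)

step = 1e-4
slope_at_zero = (H(step) - H(-step)) / (2 * step)  # estimate of H'(0)
print(H(-0.1), H(0.0), H(0.1), slope_at_zero)
```

The value at $\epsilon = 0$ is the smallest of the three printed lengths and the estimated derivative there is close to zero, as the proof requires of a minimizing curve.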
Beltrami Identity
Often in applications, the function f which appears in the integrand does not depend directly on the variable x. In these situations, the Euler-Lagrange equation takes a particularly nice form. This simplification of the Euler-Lagrange equation is known as the Beltrami Identity. We present the Beltrami Identity without proof. It is not obvious how it arises from the Euler-Lagrange equation; however, its derivation is straightforward.
The Beltrami Identity: If $\dfrac{\partial f}{\partial x} = 0$ then the Euler-Lagrange equation is equivalent to:
$$f - y'\,\frac{\partial f}{\partial y'} = C, \tag{5.7}$$
where $C$ is a constant.
Exercise 6. Use the Beltrami identity to produce the differential equation
in Example 2.
6 The Brachistochrone Problem
Suppose P and Q are two points in the plane. Imagine there is a thin,
flexible wire connecting the two points. Suppose P is above Q, and we
let a frictionless bead travel down the wire impelled by gravity alone. By
changing the shape of the wire we might alter the amount of time it takes for
the bead to travel from P to Q. The brachistochrone problem (or quickest
descent problem) is concerned with determining what shape (if any) will
result in the bead reaching the point Q in the least amount of time. This
problem was posed by Johann Bernoulli in 1696, and solutions were soon found
by several mathematicians, Isaac Newton among them. In this section we set up the relevant
functional and then apply Theorem 1 to see what the differential equation
associated with this problem looks like. Finally, we provide a solution to
this differential equation.
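First, we let a curve $y(x)$ that connects $P$ and $Q$ represent the wire. As before assume $P = (a, y(a))$ and $Q = (b, y(b))$. We will restrict ourselves to curves that belong to $C^2_{[a,b]}$. Given such a curve, the time it takes for the bead to go from $P$ to $Q$ is given by the functional⁵
$$F[y(x)] = \int_a^b \frac{ds}{v}, \tag{6.1}$$
where $ds = \sqrt{1 + [y'(x)]^2}\,dx$ is the element of arc length and $v(x)$ is the speed of the bead. The bead starts from rest at $P$, so by conservation of energy (the kinetic energy gained equals the potential energy lost) we obtain
$$\frac{1}{2}\,m[v(x)]^2 = mg\,\bigl(y(x) - y(a)\bigr),$$
where the sign reflects the convention that $y$ increases in the downward direction. This allows us to rewrite the functional (6.1) as
$$F[y(x)] = \int_a^b \sqrt{\frac{1 + [y'(x)]^2}{2g\,\bigl(y(x) - y(a)\bigr)}}\,dx.$$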
Assuming a minimum time exists, we can apply Theorem 1 to the functional $F$. Notice that the integrand does not depend directly on the variable $x$, and therefore we can apply the Beltrami Identity. We can also make computations a little easier by letting $P = (0, 0)$. The resulting equation is then
$$\sqrt{\frac{1 + [y'(x)]^2}{2gy(x)}} - y'(x)\left(\frac{1}{2}\sqrt{\frac{2gy(x)}{1 + [y'(x)]^2}}\;\frac{y'(x)}{gy(x)}\right) = C. \tag{6.2}$$
⁵ For convenience we use the down direction to represent positive y values.
Equation (6.2) simplifies to
$$\left(1 + [y'(x)]^2\right) y(x) = \frac{1}{2gC^2} = k^2. \tag{6.3}$$
Finally, equation (6.3) is well known and the solution is a cycloid.⁶ The parametric equations of the cycloid are given by:
$$x(\theta) = \tfrac{1}{2}k^2(\theta - \sin\theta),$$
$$y(\theta) = \tfrac{1}{2}k^2(1 - \cos\theta).$$
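That these parametric equations satisfy equation (6.3) can be confirmed numerically. The sketch below assumes the value $k = 1.5$, which is arbitrary, and uses $dy/dx = (dy/d\theta)/(dx/d\theta) = \sin\theta/(1 - \cos\theta)$ to evaluate the left-hand side of (6.3) at several parameter values.

```python
import math

K = 1.5  # hypothetical value of the constant k

def y_of(theta):
    # Depth of the cycloid below the starting point.
    return 0.5 * K**2 * (1.0 - math.cos(theta))

def slope(theta):
    # dy/dx = (dy/dtheta) / (dx/dtheta); the common factor k^2 / 2 cancels.
    return math.sin(theta) / (1.0 - math.cos(theta))

def lhs(theta):
    # Left-hand side of equation (6.3): (1 + y'^2) y.
    return (1.0 + slope(theta) ** 2) * y_of(theta)

for theta in (0.5, 1.0, 2.0, 3.0):
    print(theta, lhs(theta), K**2)
```

At every sampled parameter value the left-hand side equals $k^2$ up to rounding.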
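Example 3. Take $P = (0, 0)$, $Q = (\pi/2, -1)$ and $k = 1$, where the coordinates of $Q$ are written as plotted in Figure 2, with the upward direction positive, so that $Q$ lies at depth 1. The parametric equations above reach $Q$ at $\theta = \pi$, so the parameter runs over $0 \le \theta \le \pi$. Figure 2 shows the cycloid solution to the Brachistochrone problem.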
A strange property of the cycloid is the following: if we let the frictionless bead start from rest at any point on the cycloid, the amount of time it takes to reach the lowest point of the arch (the point Q in Example 3) is always the same.
Figure 2: The Cycloid Solution for Example 3. [Plot of the cycloid in the xy-plane from P to Q = (π/2, −1).]
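As a rough comparison, the descent time along the cycloid with $k = 1$ and $0 \le \theta \le \pi$ can be set against the descent time along the straight wire to the same endpoint, taken here to lie a horizontal distance $\pi/2$ from $P$ and at depth 1. The sketch below is an illustration under stated assumptions: the value $g = 9.81$, the helper `descent_time`, and the parametrization $x = (\pi/2)s^2$, depth $= s^2$ of the line are our own choices, the last one made so that the integrand of (6.1) stays bounded at the starting point, where the bead is at rest.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2

def descent_time(dx, dy, depth, s0, s1, n=4000):
    """Midpoint-rule estimate of the integral of ds / v along a parametric
    curve, with ds = sqrt(dx^2 + dy^2) and v = sqrt(2 g depth)."""
    width = (s1 - s0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * width
        total += math.hypot(dx(s), dy(s)) / math.sqrt(2.0 * G * depth(s))
    return total * width

X_Q, DEPTH_Q = math.pi / 2, 1.0  # endpoint: horizontal distance and depth

# Cycloid with k = 1, parametrized by theta in [0, pi].
t_cycloid = descent_time(
    lambda th: 0.5 * (1.0 - math.cos(th)),  # dx/dtheta
    lambda th: 0.5 * math.sin(th),          # dy/dtheta
    lambda th: 0.5 * (1.0 - math.cos(th)),  # depth y(theta)
    0.0, math.pi)

# Straight line with x = X_Q s^2 and depth = DEPTH_Q s^2 for s in [0, 1].
t_line = descent_time(
    lambda s: 2.0 * X_Q * s,      # dx/ds
    lambda s: 2.0 * DEPTH_Q * s,  # dy/ds
    lambda s: DEPTH_Q * s * s,    # depth
    0.0, 1.0)

print(t_cycloid, t_line)
```

Under these assumptions the cycloid time is the shorter of the two, even though the cycloid is the longer path.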
⁶ The cycloid is the path traced by a fixed point on the rim of a wheel as the wheel rolls along a straight line.
7 Concluding Remarks
After introducing the notion of partial derivatives and the chain rule for functions of several variables, we were able to state a problem with which the calculus of variations is concerned. This is the problem of identifying extreme values for functionals (which are functions of functions). In section 5 we showed that under the assumption that a minimum (or maximum) solution exists, the solution must satisfy the Euler-Lagrange equation. The analysis which led to this equation relied heavily on the fact from ordinary calculus that the derivative of a function at an extreme value is zero (provided that the derivative exists there).
We admit that many of the analytical details have been omitted and encourage the interested reader to look further into these matters. It turns out that problems like the brachistochrone can be extended to situations with an arbitrary number of variables as well as to regions which are non-Euclidean. In fact, many books about General Relativity deduce the Einstein field equations via a variational approach which is based on the ideas that were discussed in this paper.