Second Text
Joseph D. Fehribach
WPI Mathematical Sciences
100 Institute Rd.
Worcester, MA 01609-2247
Copyright © 2019 Joseph D. Fehribach
Foreword
In the late 1990s, I was asked to teach our second-year course on vector and tensor calculus.
When I looked at what books were available covering this material at roughly this level, I found
that there were few choices. In fact, as best as I can now recall, I found only one: P.C. Matthews,
Vector Calculus. Happily, this book covered exactly the topics in our course description, and
at precisely the correct level. I adopted Vector Calculus for my class that year, and as far as I
know, it has been used for the course ever since. With few exceptions, students from my classes
that I have spoken to about the book have also liked it and found it helpful in learning the
material.
About ten years later, as I was teaching our course on multivariable calculus (which is typically
the course that WPI students take before perhaps taking vector and tensor calculus), it occurred
to me that there should be a book covering this multivariable material that was the counterpart
to Vector Calculus. When I checked, I could not find such a book, so, eventually, I proposed to
write it myself. The current text is the result. It is similar in spirit, though perhaps somewhat
more rigorous mathematically (more emphasis on proofs) than the book that inspired it.
This book covers the material normally included in an American multivariable and vector
calculus course. It is written, I hope, at a relatively high level, designed for students who have
earned high marks on the AP Calculus BC exam (American system) or a maths A-level (British
system) and who are interested in learning multivariable calculus in some depth. As is the case
for Prof. Matthews' text, this book is an alternative to the standard thousand-page calculus
text that covers all of classical calculus with many applications and much discussion of ancillary
material. Because this text is relatively brief, there is no remedial material, no discussion of
numerical analysis, and there is no extensive treatment of applications (though a number of
connections to physics are highlighted). It can be used as the text for a course or for self study
by students working on their own. But in any case, I hope that it is useful to students trying
to master multivariable and vector calculus.
Joseph D. Fehribach
Worcester, Massachusetts
August 2019
To all those who have been my teachers and mentors in mathematics, physics and chemistry
over the years, in particular
Notes to the Reader
Readers should be aware of the following points as they work through this book:
• Throughout the book, several bits of mathematical shorthand are included, in part to
help the reader understand what they mean and how they are used.
· The symbol “:=” is read as “defined equal to” and is used in definitions to indicate
that the new entity on the left of the colon is defined to be the previously discussed
entity on the right.
· The symbol “≡” is read as “identically equal to” and means that the two expressions
on either side are always equal; so, for example, $\sin^2\theta + \cos^2\theta \equiv 1$ no matter what
angle θ is chosen.
· The standard equal sign “=” is used in all other cases; so x = 2 in one example, but
x may take on other values in other discussions.
Contents
I Introduction/Background
1.2.1 Vectors
1.2.2 Planes in R3
1.2.3 Lines in R3
1.2.4 Projections
II Vector Functions
2.3.1 Tangent Vectors
2.3.3 Acceleration
3.1 Limits in Rn
3.2 Continuity in Rn
3.4.1 The Basic Chain Rule
4.3 Maximums and Minimums for Continuous Functions on Closed and Bounded Domains
5.1.4 The Fubini Theorem and the Relationship between Riemann and Iterated Integrals
5.1.5 When It All Goes Wrong: Functions Not Riemann Integrable
5.2.4 What Does It All Mean? What Do Double Integrals Represent?
Chapter I
Introduction/Background
As its title indicates, this book discusses multivariable and vector calculus. Most or all students
who use this book will be familiar with single-variable calculus where, typically, y = f (x).
Most books or courses on single-variable calculus cover topics such as single-variable limits,
differentiation, integration, and frequently, sequences and series. All of these will be taken as
basic and background to this book.
Concepts from single-variable calculus are widely used today. Many problems in science, engineering,
social science and elsewhere, however, do not fit within the boundaries of single-variable
calculus. They involve, for example, temperature, which can depend on the location in
three-dimensional space as well as time, or sales figures that can vary with products, store locations,
population density and time. In the case of temperature, a function T might depend on four
independent variables: x, y, z and t. Written another way, $T : \mathbb{R}^4 \to \mathbb{R}$, which is read “T
maps $\mathbb{R}^4$ into the real numbers.” Often one does not want to concentrate on the view of T as
a function, but rather to think of it as a dependent variable, i.e., T = T (x, y, z, t). So in this
case T is both a multivariable function and the dependent variable—this is one type of function
that we will concentrate on in this book.
In another case, one might want to study the track of a particle travelling through space. Now
the only independent variable might just be time t, but the dependent variables are the particle's
location in three-dimensional space. Then among several alternatives, one might choose to
express this location as a vector function: $\mathbf{x} : \mathbb{R} \to \mathbb{R}^3$ or $\mathbf{x} = \mathbf{x}(t)$. Here one can think of x as
(1) the point where the particle is located in $\mathbb{R}^3$, (2) the location vector in three-dimensional
space from the origin to the point where the particle is located, and (3) the vector function
giving that location as a function of time. All of these views of x will be discussed in this text.
[Figure 1.1 diagram: Single-Variable Calculus, y = f(x); Vector Functions, x = f(t); Multivariable Functions, z = f(x, y) and w = F(x, y, z); Vector Fields, y = F(x).]
Figure 1.1: General overview of calculus. As one moves across this diagram, the number of
domain variables increases; as one moves downward, the number of range variables increases.
The diagram shows the main areas of calculus; those discussed in this book are inside the red
polygon.
An overview of the structure or layout of topics in calculus is displayed graphically in Figure 1.1.
In the upper left-hand corner of the diagram is single-variable calculus where y = f (x). As
mentioned above, all of the topics, concepts and material in this area are taken as given for our
present discussion. We focus on extending calculus to the functions inside the red polygon:
multivariable functions, vector functions and vector fields. Multivariable functions have several
independent variables, but a single dependent variable, and are typically of the forms z = f (x, y)
or w = F (x, y, z). Vector functions, on the other hand, have a single independent variable and
a vector with several components as the dependent variables. Vector functions are typically
of the form x = x(t) and the independent variable is often time. Vector fields have multiple
independent and dependent variables. The arrows in Figure 1.1 indicate roughly the flow of the
discussion of the material: We will first extend the concepts of calculus to vector functions and
multivariable functions, then combine all of this to understand calculus and vector fields.
1.2 Vectors, Lines and Planes in R3
This short section deals with several basic concepts in three-dimensional space (R3 ). While
nothing in this section involves calculus, a clear understanding of this material is necessary to
study multivariable calculus. Students familiar with the basics of these concepts, particularly
in regard to vectors, may wish to skip this section or at least the first subsection.
1.2.1 Vectors
Figure 1.2: Coordinate axes, vectors (red) between points (blue) in $\mathbb{R}^3$. The vector from the
origin to a point $(a_1, a_2, a_3)$ is $\mathbf{a} := \langle a_1, a_2, a_3 \rangle$; the vector from $(x_o, y_o, z_o)$ to $(x, y, z)$ is
$\langle x - x_o, y - y_o, z - z_o \rangle$.
Another issue that will arise from time to time is how to name the coordinate axes. Following
the standard tradition, the axes will normally be labeled as x and y in R2 and x, y and z in
R3 . But at times, it will be convenient to number the axes; thus in R3 the axes would be x1 , x2
and x3 . This is particularly true if we are working in Rn for n > 3. The reader should become
familiar with both of these conventions since both are used in various areas of mathematics,
science and engineering.
Definition: The length of a vector $\mathbf{a} \in \mathbb{R}^3$ is defined as $|\mathbf{a}| := \sqrt{a_1^2 + a_2^2 + a_3^2}$.
Definition: Vector Addition: The sum of a and b is defined as $\mathbf{a} + \mathbf{b} := \langle a_1 + b_1, a_2 + b_2, a_3 + b_3 \rangle$.
Vector Subtraction: The difference between a and b is $\mathbf{a} - \mathbf{b} := \langle a_1 - b_1, a_2 - b_2, a_3 - b_3 \rangle$. So
the difference is the sum of a and the negative of b.
Definition: Scalar Multiplication: If c is a real number ($c \in \mathbb{R}$), then $c\mathbf{a} := \langle ca_1, ca_2, ca_3 \rangle$.
Definition: The scalar product, inner product, or dot product (three names for the same product)
of two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ is defined as $\mathbf{a} \cdot \mathbf{b} := a_1 b_1 + a_2 b_2 + a_3 b_3$.
Definition: The cross product of two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ is given by expanding the following
pseudo-determinant by its first row:
$$\mathbf{a} \times \mathbf{b} := \langle a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1 \rangle \equiv \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}$$
Remarks:
1. Of these five definitions, the first three are probably fairly straightforward. A vector’s
length is simply how long the arrow is. This definition of length is of course based on
the Pythagorean theorem.∗ It is probably not so obvious why the definitions of the two
products (inner and cross) are the correct ones, i.e., the ones that match what nature and
reality require. There are in fact other ways to define vector products, but as we shall
see, these two arise naturally in many applications in math, physics and engineering.
∗ Named for the Greek philosopher Pythagoras of Samos, from the 6th century BC.
2. There is a secret word for how the basic vector operations other than the cross product
work: componentwise. Vector addition, subtraction, scalar multiplication and the dot
product all involve matching up the various components of the vectors involved.
3. Vector addition has an important graphical implication: from the definition of addition,
if the tail of vector b is placed at the head of vector a, then the sum a + b is the vector
from the tail of a to the head of vector b as is shown in Figure 1.3.
Figure 1.3: The sum of a and b is the red vector a + b that runs from the tail of a to the head
of b. The diagram also makes clear that the order in which vectors are added does not change the sum:
a + b = b + a. What is the blue vector in terms of a and b?
4. In the definition of the cross product, the array to the right of the identity sign, ≡, is a
pseudo-determinant. It is a pseudo-determinant (rather than a determinant) because its
first row is made up of unit vectors rather than real numbers, and this is why it must
be expanded by this row. The coordinate unit vectors are $\mathbf{i} := \langle 1, 0, 0 \rangle$, $\mathbf{j} := \langle 0, 1, 0 \rangle$
and $\mathbf{k} := \langle 0, 0, 1 \rangle$ (see Figure 1.4). Anyone who is not familiar with determinants can
either ignore this array and simply use the vector between the definition sign, :=, and the
identity sign as the definition, or can look in almost any linear algebra text to see how a
determinant is expanded.
5. All the definitions above and the results below are basically the same in Rn for all n ∈ Z+ ,
except for the cross product which is fundamentally a three-dimensional concept.†
6. Notice that the cross product of two vectors is itself a vector, but the dot product of two
vectors is a scalar, that is, a real number.
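Example 1.1: If $\mathbf{x} = \langle 3, -2, 7 \rangle$ and $\mathbf{y} = \langle -1, 2, 5 \rangle$, it is easy to compute $|\mathbf{x}|$, $|\mathbf{y}|$, $\mathbf{x} + \mathbf{y}$,
$\mathbf{y} - \mathbf{x}$, $\mathbf{x} \cdot \mathbf{y}$, $\mathbf{x} \times \mathbf{y}$: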
† There is, in fact, a cross product in $\mathbb{R}^7$ associated with the Fano plane, but this latter version is much less
frequently used.
Answer:
$$|\mathbf{x}| = \sqrt{3^2 + (-2)^2 + 7^2} = \sqrt{62}$$
$$|\mathbf{y}| = \sqrt{(-1)^2 + 2^2 + 5^2} = \sqrt{30}$$
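The remaining quantities in Example 1.1 follow directly from the componentwise definitions. As a rough sketch (the helper names `length`, `add`, `sub`, `dot` and `cross` are illustrative choices, not notation from the text), they might be computed as follows:

```python
import math

def length(a):
    """Length |a|: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in a))

def add(a, b):
    """Componentwise sum a + b."""
    return tuple(p + q for p, q in zip(a, b))

def sub(a, b):
    """Componentwise difference a - b."""
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    """Dot product: sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product in R^3, from expanding the pseudo-determinant."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

x = (3, -2, 7)
y = (-1, 2, 5)

print(length(x), length(y))  # square roots of 62 and 30
print(add(x, y))             # (2, 0, 12)
print(sub(y, x))             # (-4, 4, -2)
print(dot(x, y))             # 28
print(cross(x, y))           # (-24, -22, 4)
```

A quick sanity check on the last line is that the cross product has zero dot product with both x and y.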
The next proposition gives some basic results that begin to show why length, dot product and
cross product are so important.
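Proposition 1 For any two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$, with θ the (smaller) angle between them,
1. $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$
2. $|\mathbf{a}|^2 = \mathbf{a} \cdot \mathbf{a}$
3. $\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}$
4. $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta$
5. $|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}||\mathbf{b}| \sin\theta$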
7. The direction of a × b is perpendicular to the plane containing a and b, and obeys the
right-hand rule. This relationship is shown in Figure 1.4.
Remark: The right-hand rule is a mnemonic for remembering the direction of the cross
product relative to the two vectors, as well as the standard orientation of the coordinate axes.
According to the rule, if one points the fingers of one's right hand in the direction of a vector
$\mathbf{a} \in \mathbb{R}^3$, then curls the fingers of that hand toward the direction of $\mathbf{b} \in \mathbb{R}^3$, then the thumb points
in the direction of the cross product a × b. This rule also gives the orientation of a right-handed
coordinate system in $\mathbb{R}^3$; if one points the fingers of one's right hand in the x or $x_1$ direction, then
curls the fingers toward the direction of y or $x_2$, then the thumb points in the direction of
Figure 1.4: Coordinate axes, coordinate unit vectors (blue) and the cross product (red) in R3 .
As depicted here, the standard convention is that i points in the x-direction, j points in the
y-direction, and k points in the z-direction. The vectors a and b are more-or-less randomly
placed in this diagram, but given a and b, the vector a × b is completely determined. Notice
that the cross product a × b is perpendicular to both a and b.
z or $x_3$. Notice that if one carries out the above procedure with one's left hand, then the thumb
points in the opposite direction.
Proof: For (1), (2) and (3), each proof is simply the application of the definition and is left
as an exercise (Exercise 1.7). For (4), the key is the law of cosines applied to the diagram in
Figure 1.5 below. According to the law of cosines, $|\mathbf{a} - \mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}||\mathbf{b}| \cos\theta$. On the
Figure 1.5: Diagram to determine the relationship between the dot product and the angle
θ. By the law of cosines, the lengths of the three vectors must be related as $|\mathbf{a} - \mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}||\mathbf{b}| \cos\theta$.
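The law-of-cosines argument for (4) can be finished along standard lines. The following is only a sketch of one common route, not necessarily the author's own wording; it assumes the distributive property of the dot product together with items (1) and (2) of Proposition 1:

```latex
\begin{align*}
|\mathbf{a} - \mathbf{b}|^2
  &= (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b})
   = |\mathbf{a}|^2 - 2\,\mathbf{a} \cdot \mathbf{b} + |\mathbf{b}|^2
   && \text{(items (1) and (2))} \\
|\mathbf{a} - \mathbf{b}|^2
  &= |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}||\mathbf{b}| \cos\theta
   && \text{(law of cosines)} \\
\Longrightarrow \quad
\mathbf{a} \cdot \mathbf{b} &= |\mathbf{a}||\mathbf{b}| \cos\theta
\end{align*}
```

Equating the two right-hand sides and cancelling $|\mathbf{a}|^2 + |\mathbf{b}|^2$ gives the last line.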
Example 1.2: Suppose that $\mathbf{a} = \langle 1, -1, 3 \rangle$ and $\mathbf{b} = \langle 6, 2, -2 \rangle$. Please find the angle between
these two vectors when they are placed tail-to-tail.
Answer: The angle between two given vectors can be found using either (4) or (5) from
Proposition 1, but since dot products are easier to compute, (4) is the easier choice. Using (4), one
finds that
$$\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|} = \frac{1(6) - 1(2) + 3(-2)}{\sqrt{1^2 + (-1)^2 + 3^2}\,\sqrt{6^2 + 2^2 + (-2)^2}} = \frac{-1}{11} = \cos\theta ,$$
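To turn the cosine found in Example 1.2 into an actual angle, one can apply the inverse cosine. A minimal sketch, assuming the vectors are stored as plain tuples (the function name `angle_between` is illustrative only):

```python
import math

def dot(a, b):
    """Dot product: sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def angle_between(a, b):
    """Smaller angle (in radians) between nonzero vectors a and b,
    from a . b = |a||b| cos(theta)."""
    cos_theta = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

a = (1, -1, 3)
b = (6, 2, -2)
theta = angle_between(a, b)
print(theta, math.degrees(theta))  # about 1.66 radians, slightly more than 90 degrees
```

Because the cosine is negative, the angle is slightly obtuse.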
There are two important results that follow immediately from (4) and (5) in the previous
proposition and help us determine whether vectors are perpendicular or parallel.‡
Definition: Two nonzero vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ are perpendicular ($\mathbf{a} \perp \mathbf{b}$) iff the angle between
them is ±π/2. These vectors are parallel iff the angle between them is zero. They are antiparallel
iff the angle between them is ±π. If a and b are either parallel or antiparallel, then $\mathbf{a} \parallel \mathbf{b}$.
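Example 1.3: For what value of $x_1$ are the vectors $\langle x_1, 6, -3 \rangle$ and $\langle 5, -2, 7 \rangle$ perpendicular?
Answer: This result follows from the first corollary above: vectors are perpendicular exactly
when their dot product is zero. So the requirement is that
$$\langle x_1, 6, -3 \rangle \cdot \langle 5, -2, 7 \rangle = 5x_1 - 12 - 21 = 0 \quad\Longrightarrow\quad x_1 = 33/5 ,$$
which means that $\langle 33/5, 6, -3 \rangle$ and $\langle 5, -2, 7 \rangle$ are perpendicular.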
1.2.2 Planes in R3
Next we need to define something that most people have an intuitive sense of: a plane in R3 .
(x, y, z) is an arbitrary point on Π, then the vector $\langle x - x_o, y - y_o, z - z_o \rangle$ is perpendicular to
N (see Figure 1.6). In other words, with $\mathbf{N} = \langle A, B, C \rangle$,
$$\mathbf{N} \cdot \langle x - x_o, y - y_o, z - z_o \rangle = A(x - x_o) + B(y - y_o) + C(z - z_o) = 0 .$$
Figure 1.6: The point-normal equation of a plane. Here N represents any vector normal or
perpendicular to the plane, and $\langle x - x_o, y - y_o, z - z_o \rangle$ is a vector in the plane, starting at a
fixed point $(x_o, y_o, z_o)$ and ending at any point (x, y, z) on the plane. Since these vectors are
always perpendicular, their dot product must be zero.
Example 1.4: Find the equation of the plane perpendicular to $\langle 3, -4, 7 \rangle$ that passes through
the point (1, 2, 6).
Answer: In this case, $\mathbf{N} = \langle 3, -4, 7 \rangle$ and $(x_o, y_o, z_o) = (1, 2, 6)$. So the equation of the plane
is simply $3(x - 1) - 4(y - 2) + 7(z - 6) = 0$ or $3x - 4y + 7z = 37$.
Example 1.5: Find the equation of the plane passing through the points (1, 0, −1), (0, −1, 1)
and (−1, 1, 0).
The key to solving this problem is to note that the three points can be used to form several
different vectors that all lie in the plane. The cross product of any two of these vectors (which
are not scalar multiples of each other) will produce a normal vector, which can then be used with
any of the three points in the point-normal equation of the plane.
Answer: So the vector a can be defined as $\langle 1 - 0, 0 - (-1), -1 - 1 \rangle = \langle 1, 1, -2 \rangle$, while b can be defined as
$\langle 0 - (-1), -1 - 1, 1 - 0 \rangle = \langle 1, -2, 1 \rangle$. Then $\mathbf{N} = \mathbf{a} \times \mathbf{b} = \langle -3, -3, -3 \rangle$. Taking $(x_o, y_o, z_o) = (1, 0, -1)$,
one finds that
$$-3(x - 1) - 3(y - 0) - 3(z + 1) = 0 \quad\text{or}\quad x + y + z = 0 .$$
Remark: An equivalent equation (or perhaps the same equation) would be obtained if a and
b were defined using the given points in a different order, or if a different point in the plane
were used as (xo , yo , zo ).
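The three-point construction in Example 1.5 is mechanical enough to sketch in code. The function `plane_through` below is a hypothetical helper, not something defined in the text; it forms two in-plane vectors, takes their cross product as the normal N, and returns the plane in the form N · (x, y, z) = d:

```python
def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(a - b for a, b in zip(p, q))

def dot(a, b):
    """Dot product: sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product in R^3."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def plane_through(p, q, r):
    """Return (N, d) such that the plane through p, q, r is N . (x, y, z) = d."""
    normal = cross(sub(p, q), sub(q, r))
    if normal == (0, 0, 0):
        raise ValueError("the three points are collinear, so no unique plane exists")
    return normal, dot(normal, p)

points = [(1, 0, -1), (0, -1, 1), (-1, 1, 0)]
normal, d = plane_through(*points)
print(normal, d)  # (-3, -3, -3) 0, i.e. x + y + z = 0 after dividing by -3
```

Reordering the points may flip the sign of N, which is consistent with the remark above: the resulting equation is equivalent.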
1.2.3 Lines in R3
Before studying even the most basic version of calculus, everyone is likely to be familiar with
the equation that represents a line in the x, y-plane. Specifically any line in the x, y-plane that
is not vertical can be represented by an equation of the form y = mx + b where m ∈ R is its
slope and b ∈ R is its y-intercept. For a horizontal line, the equation is y = b which has this
form with m = 0. For a vertical line, the equation is x = a where a ∈ R is some fixed value;
this equation is of course not in the slope-intercept form.
Vector Form: Now we return our attention to representing lines in R3 . At first, one might
expect again that this would be done by a single linear equation. A single linear equation,
however, was shown in the previous section to represent a plane in R3 , not a line. As we shall
see, a line in $\mathbb{R}^3$ will require a vector equation, or equivalently, three scalar equations.
Consider a line L in R3 depicted in Figure 1.7. Let xo be any vector from the origin to L, and
Figure 1.7: The use of vectors to represent a line L in R3 . The vector xo is from the origin to
some fixed point on L; the vector m lies along the line. The vector function x(t) = mt + xo is
from the origin to a point that sweeps out the line as t varies from −∞ to ∞.
let m be any vector of positive length lying along L. Notice that m can be placed at the head
of xo , as depicted in Figure 1.7, and that the line L is then traced out by the vector equation
x(t) = mt + xo (I.1)
as t runs through all real values.§ In particular, x(0) = xo , while for t = 1, the vector x has
moved so that its head is now at the head of m in Figure 1.7. For t = k, where k is a positive
integer, the vector x has moved so that its head is k lengths of m along the line L. For t = −k,
the vector x has moved k lengths of m in the opposite direction from the head of xo . The vector
m is the direction vector for L, and is the rough equivalent of the slope m in the slope-intercept
form described above. This vector equation (I.1) is called the vector form of the equation of a
line in R3 .
Example 1.6: Find an equation in vector form for the line passing through the points (1, 0, −2)
and (−1, 3, 1).
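Answer: Among the many possible choices for how to define xo and m in this case, let’s pick
$\mathbf{x}_o = \langle 1, 0, -2 \rangle$ (based on the first of the two points), and $\mathbf{m} = \langle -1 - 1, 3 - 0, 1 - (-2) \rangle = \langle -2, 3, 3 \rangle$ (a
vector connecting the two points). The equation of the line is then
$$\mathbf{x}(t) = \langle -2, 3, 3 \rangle\, t + \langle 1, 0, -2 \rangle .$$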
Remark:
Keep in mind that the equation in the previous example is an equation for the line; it is not
the equation—it is not unique. Infinitely many other equations of this general form represent
the same line, for example,
which uses the other of the two points as xo , and uses an m of a different length and in the
opposite direction. The two things that all representations of this line will have in common,
however, are that xo will always correspond to a point of the line and that the differing choices
for m will all be nonzero multiples of each other. Also, any line in R3 can be represented in
this way, even a vertical line or one that parallels any coordinate axis.
Parametric Form: While any line can be represented in the vector form described just above,
this is not the only format to represent lines. Instead of using vectors, it is often useful to
use parametric form, which means using three scalar linear equations. Still, the most important
thing to keep in mind about these two formats is that they are equivalent.
In this form, the parameter is t. As is the case with vector form, the representation in this form
is not unique: $a_o$, $b_o$ and $c_o$ correspond to any point on L and $m_1$, $m_2$ and $m_3$ correspond to
the vector between any two points on the line.
Example 1.7: Find an equation in parametric form for the line passing through the points
(1, 0, −2) and (−1, 3, 1). (These are the same points and thus the same line as in the previous
example.)
Answer: The same xo and m as in the previous example can be used to write the equation in
parametric form:
$x_1(t) = -2t + 1$
$x_2(t) = 3t$
$x_3(t) = 3t - 2$
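Since the vector and parametric forms carry the same information, a single small routine can produce both. The sketch below uses illustrative names (`line_through`, `point_at`); it returns the pair (xo, m) and evaluates x(t) = m t + xo one component at a time, which is exactly the parametric form:

```python
def line_through(p, q):
    """Return (x_o, m) for the line x(t) = m t + x_o through points p and q."""
    m = tuple(b - a for a, b in zip(p, q))
    if all(c == 0 for c in m):
        raise ValueError("the two points must be distinct")
    return p, m

def point_at(x_o, m, t):
    """Evaluate the three parametric equations x_i(t) = m_i t + (x_o)_i."""
    return tuple(mi * t + xi for mi, xi in zip(m, x_o))

x_o, m = line_through((1, 0, -2), (-1, 3, 1))
print(m)                    # (-2, 3, 3)
print(point_at(x_o, m, 0))  # (1, 0, -2)
print(point_at(x_o, m, 1))  # (-1, 3, 1)
```

Setting t = 0 and t = 1 recovers the two given points, as the discussion of (I.1) suggests.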
1.2.4 Projections
Suppose there is a plane Π in R3 , and a vector a not in this plane (see Figure 1.9). How can
one find the vector lying in the plane closest to the given vector? Alternatively, if we have two
vectors, how can one find the vector that parallels the second vector while being closest to the
first? This subsection addresses these questions; interestingly, both have essentially the same
answer.
Let us first consider the projection of one vector onto another; the definition of this projection
is based on some simple geometry as shown in Figure 1.8. Given two vectors a and b, the vector
Figure 1.8: The length of the vector projection. This length is the distance along the vector b
defined by the dashed perpendicular segment from the head of a. The vector projection is then
this length multiplied by the unit vector in the direction of b.
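$$\operatorname{proj}_{\mathbf{b}}(\mathbf{a}) := \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\, \mathbf{b}$$
provided that the vector b ≠ 0. To justify this definition, suppose that the angle between a and
b is denoted θ. Then the length of the projection should be
$$|\mathbf{a}| \cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|} .$$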
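The direction for $\operatorname{proj}_{\mathbf{b}}(\mathbf{a})$ is then given by the unit vector in the direction of b, which of course
is just $\mathbf{b}/|\mathbf{b}|$. Combining these length and direction results, one finds the vector projection given
in (1.2.4), i.e.,
$$\operatorname{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\, \mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|}\, \frac{\mathbf{b}}{|\mathbf{b}|} .$$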
The projection of a vector onto a plane is depicted in Figure 1.9. The formal definition for
this projection onto a plane Π is based on the observation that vector a should be the sum
of its vector projection onto the normal vector N for the plane and its projection onto the
corresponding plane:
$$\operatorname{proj}_{\Pi}(\mathbf{a}) := \mathbf{a} - \operatorname{proj}_{\mathbf{N}}(\mathbf{a})$$
[Figure 1.9: A vector a, its projection proj_N(a) onto the normal vector N, and its projection proj_Π(a) onto the plane Π.]
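Example 1.8: Find the vector in Π, the plane 2x − y + 5z = 1, that is closest to the vector
$\mathbf{a} = \langle -1, 0, 4 \rangle$. Also find the projection of a onto any vector perpendicular to Π.
Answer: Notice that for this plane, an easy choice for a normal vector is $\mathbf{N} = \langle 2, -1, 5 \rangle$, and
since this normal vector and a are not perpendicular, a is not in the plane Π. So what vector
in this plane is closest to $\langle -1, 0, 4 \rangle$? Of course it is the projection:
$$\operatorname{proj}_{\Pi}(\mathbf{a}) = \mathbf{a} - \operatorname{proj}_{\mathbf{N}}(\mathbf{a}) = \langle -1, 0, 4 \rangle - \frac{18}{30} \langle 2, -1, 5 \rangle = \langle -11/5, 3/5, 1 \rangle$$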
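Notice that this vector is perpendicular to N, and hence it does lie in the plane. On the other
hand, the projection of a onto N (or any other vector perpendicular to Π) is
$$\operatorname{proj}_{\mathbf{N}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{N}}{|\mathbf{N}|^2}\, \mathbf{N} = \frac{18}{30} \langle 2, -1, 5 \rangle = \langle 6/5, -3/5, 3 \rangle .$$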
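Both projections in Example 1.8 can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions` module and assumes integer (or `Fraction`) components; the function names `proj_onto_vector` and `proj_onto_plane` are illustrative, not taken from the text:

```python
from fractions import Fraction

def dot(a, b):
    """Dot product: sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def proj_onto_vector(a, b):
    """Vector projection of a onto a nonzero vector b: (a . b / |b|^2) b."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = Fraction(dot(a, b), bb)  # exact ratio; assumes integer or Fraction inputs
    return tuple(scale * c for c in b)

def proj_onto_plane(a, normal):
    """Projection of a onto the plane with the given normal: a - proj_N(a)."""
    p = proj_onto_vector(a, normal)
    return tuple(ai - pi for ai, pi in zip(a, p))

a = (-1, 0, 4)
N = (2, -1, 5)
along_normal = proj_onto_vector(a, N)  # components 6/5, -3/5, 3
in_plane = proj_onto_plane(a, N)       # components -11/5, 3/5, 1
print(along_normal, in_plane)
print(dot(in_plane, N))                # zero, so in_plane is perpendicular to N
```

Adding the two results componentwise recovers a, which is the observation the definition of the projection onto a plane is built on.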
1.3 Basic Surfaces in R3
We have already seen one type of surface in R3 : planes. This section discusses more general
surfaces, particularly quadratic (or quadric) surfaces—those with only first and second degree
terms.
Although it is easy to overstate their importance, there are a number of named quadratic
surfaces in R3 . The most significant of these will be introduced through a series of examples.
Example 1.9: The simplest and most important quadratic surface is the unit sphere:
$$x^2 + y^2 + z^2 = 1 .$$
This is of course the set of all points a distance 1 from the origin in $\mathbb{R}^3$. In terms of the definition
for a surface given above, $F(x, y, z) = x^2 + y^2 + z^2 - 1$ is one possible function that would define
this surface. The general equation of a sphere in $\mathbb{R}^3$ is $(x - x_o)^2 + (y - y_o)^2 + (z - z_o)^2 = r^2$;
this sphere is centred at $(x_o, y_o, z_o)$ with radius r.
Figure 1.10: An ellipsoid centred at the origin. The three principal axes a, b and c are shown
in red. This shape is similar to that of a rugby ball; the ends of an American football are too
pointed for it to be considered an ellipsoid.
Example 1.11: Consider the surface $2x^2 + 3y^2 - 6y - z + 8 = 0$. Because there are both
quadratic and linear terms in y, one must complete the square with respect to y to find the
standard form for this surface: $z = 2x^2 + 3(y - 1)^2 + 5$. This surface is an elliptic paraboloid
opening upward in z with vertex at (0, 1, 5). It is elliptic in that the scaling differs for the two
quadratic terms. Thus the intersection of this surface with any horizontal plane z = c (where
c > 5 is a constant) is an ellipse.
Example 1.12: The surface $z^2 = x^2 + y^2$ is a circular cone. For any fixed r > 0, the intersection
of this cone with either of the planes z = −r and z = r is the circle $x^2 + y^2 = r^2$. The origin
is the vertex of this cone; this is the only singular point where there is no normal vector to the
surface.
Example 1.13: A circular cylinder is a surface of the form $x_1^2 + x_2^2 = r^2$ in $\mathbb{R}^3$. Its axis is
the third coordinate axis, here the $x_3$ axis. In general, the axis of the cylinder is the coordinate
axis for the coordinate not mentioned in the equation.
Example 1.14: The final quadratic surface to be discussed here is a hyperbolic paraboloid:
$$z = \frac{x^2}{a^2} - \frac{y^2}{b^2}$$
This is an example of a more general type of surface called a saddle surface which we will
consider later.
1.4 Polar, Cylindrical and Spherical Coordinates
There is an old aphorism that says “You can’t put a square peg in a round hole.” (Or is it “You
can’t put a round peg in a square hole.”?) This aphorism gets at a key issue in mathematics
and science: some two- or three-dimensional problems are most easily described in rectangular
coordinates (either x and y in two dimensions, or x, y and z in three dimensions), while others
are most easily described by the distance between some object and a reference point (usually
the origin) and some set of angles giving the direction of that object from that reference point.¶
The latter situation usually leads to polar coordinates in two-dimensional space, and either
cylindrical or spherical coordinates in three-dimensional space. The relationship between
these coordinate systems is discussed in this section.
Consider a point on the standard x, y-plane (R2 ) as is shown in Figure 1.11. As is customary,
Figure 1.11: The standard x, y-plane, with one point distinguished. In rectangular coordinates,
this point is (x, y). In polar coordinates, this point is (r, θ), a distance r from the origin at an
angle θ with the positive x-axis. The four quadrants are enumerated by Roman numerals.
suppose that position on this plane is measured by the two coordinate axes, with the x-axis
on the horizontal and the y-axis on the vertical. By convention, the four quadrants defined by
these axes are enumerated starting with the first quadrant where both x and y are positive, and
increasing from there as one moves counterclockwise between the quadrants, again as shown in
Figure 1.11. It is relatively easy to see that there is a one-to-one correspondence between points
on this plane and x and y values on the respective axes, but are there other ways to determine
¶ Rectangular coordinates are also called Cartesian coordinates in honor of René Descartes (1596–1650).
the location of points on the plane? The answer in fact is that there are many other ways, but
in truth there is one of particular importance: polar coordinates.
To understand the motivation behind polar coordinates, it is perhaps helpful to think of our-
selves as being on a ship on the surface of an ocean. The ocean surface is for our purposes
a plane. Suppose we have a compass, so we can determine east, west, north and south, and
we can define east as the positive x direction and north as the positive y direction. Our ship
then is at (0, 0), the origin as in Figure 1.11. Suppose there is another ship within sight of our
ship; how can we determine its location? There is no easy way to directly measure this other
ship’s x or y coordinate value, but it is easy to measure the angle that a line between the two ships
makes with the positive x direction (i.e., east). There are also a number of ways to measure the
distance between the two ships, for example, by measuring the time between seeing a cannon
flash and hearing the sound of that cannon. Notice that these two pieces of information, this
angle and this distance, determine the position of the other ship on the ocean surface relative
to the position of our ship.
Now let us return to the standard plane in Figure 1.11. Let r (for radial) be the distance
between the point at (x, y) and the origin (0, 0), and let θ be the angle between the positive
x-axis and a ray (half line) from the origin through the point at (x, y). This distance, angle and
ray, along with the point and the axes are depicted in red in Figure 1.11. Notice that generally
by convention r > 0 and θ ∈ [0, 2π).
There is one remaining issue that must be dealt with: What is the relationship between these
two coordinate systems, rectangular coordinates and polar coordinates? The answer is just a
basic application of right-angle trigonometry: Notice that r is the length of the hypotenuse
of a right triangle one of whose sides has length x and the other has length y, and θ is the
angle between the side of length x and the hypotenuse. Applying right-angle trigonometry, one
finds that x = r cos θ and y = r sin θ. To find r in terms of x and y, one can appeal to the
Pythagorean theorem: $r^2 = x^2 + y^2$, and since r is a distance and therefore never negative,
$r = \sqrt{x^2 + y^2}$.
What should always be remembered about the relationship between polar and rectangular coor-
dinates? Along with the diagram in Figure 1.11 that is certainly the key memorable image, three
of the above formulas and equations are particularly simple and always worth remembering:
$x = r \cos\theta$
$y = r \sin\theta$
$r^2 = x^2 + y^2$
These three and Figure 1.11 can always be used to construct any of the other details.
Example 1.15:
18
√
Answer:
p For (a) the solution is simply x = 6 cos(π/6) = 3 3 and y = 6 sin(π/6) = 3. For (b),
r = (−3)2 + 42 = 5, while tan(θ) = y/x = −4/3 ⇒ θ ' 2.214. There are in fact two values
of θ for which tan(θ) = −4/3, one in the second quadrant, and one in the fourth. The correct
choice must match the location of (x, y), which in this case is in the second quadrant.
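The two conversions used in this answer can be sketched in Python. This is only an illustration: the function names `polar_to_rect` and `rect_to_polar` are introduced here, and the two-argument arctangent `math.atan2` is used (rather than Tan⁻¹(y/x)) because it chooses the quadrant from the signs of x and y.

```python
import math

def polar_to_rect(r, theta):
    """Return (x, y) for polar coordinates (r, theta), theta in radians."""
    return r * math.cos(theta), r * math.sin(theta)

def rect_to_polar(x, y):
    """Return (r, theta) with r >= 0 and theta in [0, 2*pi)."""
    r = math.hypot(x, y)
    # atan2 uses the signs of both x and y, so the angle lands in the
    # quadrant that actually contains (x, y); the modulo shifts it to [0, 2*pi).
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta

x, y = polar_to_rect(6, math.pi / 6)   # values from part (a) of the answer
r, theta = rect_to_polar(-3, 4)        # values from part (b) of the answer
print(x, y)
print(r, theta)
```

Here `theta` comes out in the second quadrant, matching the location of (−3, 4).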
Example 1.16: Please describe the curve whose equation in polar coordinates is r = 4 cos θ.
Answer: At first glance, one might be tempted to simply replace r and θ by their expressions
in rectangular coordinates. This approach, however, does not produce an equation that is easy
to recognize:
$\sqrt{x^2 + y^2} = 4\cos(\mathrm{Tan}^{-1}(y/x))$ .
While this expression could be reduced to the desired result, it is perhaps better to go back
to the polar expression and simplify it. Specifically, if r = 4 cos θ, then multiplying both sides
by r produces $r^2 = 4r\cos\theta$, which in rectangular coordinates is just $x^2 + y^2 = 4x$. Bringing
the right-hand side to the left and completing the square, one finds that $(x - 2)^2 + y^2 = 4$.
This expression is now easy to recognize and describe as the equation of the circle of radius 2
centered on the point (x, y) = (2, 0).
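As a numerical sanity check of this example, one can sample points on r = 4 cos θ and confirm that each satisfies the circle equation. The sketch below restricts θ to [−π/2, π/2] so that r stays nonnegative; the helper name `on_circle` and the tolerance are choices made here for illustration.

```python
import math

def on_circle(theta, tol=1e-12):
    """True if the polar point (4 cos(theta), theta) satisfies (x-2)^2 + y^2 = 4."""
    r = 4 * math.cos(theta)
    x, y = r * math.cos(theta), r * math.sin(theta)
    return abs((x - 2) ** 2 + y ** 2 - 4) < tol

# 25 sample angles from -pi/2 to pi/2, where r = 4 cos(theta) >= 0
angles = [-math.pi / 2 + k * math.pi / 24 for k in range(25)]
all_on_circle = all(on_circle(t) for t in angles)
print(all_on_circle)
```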
Polar coordinates in $\mathbb{R}^2$ generalize in two particularly important ways in $\mathbb{R}^3$. The first is simply
to take two of the three rectangular coordinates for $\mathbb{R}^3$ and replace them by polar coordinates.
The result is cylindrical coordinates: (r, θ, z) where as in polar coordinates, x = r cos θ and y =
r sin θ. The third coordinate z is unchanged between rectangular and cylindrical coordinates.
The relationship between the rectangular and cylindrical coordinates for a point in $\mathbb{R}^3$ is shown
graphically in Figure 1.12.
Figure 1.12: Cylindrical coordinates (r, θ, z) along with rectangular coordinates (x, y, z) for $\mathbb{R}^3$.
Example 1.17: Consider the surface $z = x^4 - y^4$; what is its equation in cylindrical coordinates?
Figure 1.13: Spherical coordinates (ρ, θ, φ) along with rectangular coordinates (x, y, z), and
cylindrical coordinates (r, θ, z) for $\mathbb{R}^3$.
The definitions of spherical coordinates in $\mathbb{R}^3$ are based on the same key idea as polar coordinates
are in $\mathbb{R}^2$. Suppose one is standing at the origin in $\mathbb{R}^3$; consider the position of a point that
in rectangular coordinates is located at (x, y, z). What is the distance between the origin and
this point? As in polar coordinates, the answer comes from the Pythagorean theorem and gives
us the first spherical coordinate: $\rho := \sqrt{x^2 + y^2 + z^2}$. Once this distance is determined, as in
the polar case, the rest of the work is down to setting up the correct angles. But whereas one
angle was needed in $\mathbb{R}^2$, two angles are needed in $\mathbb{R}^3$. Fortunately, the first of these is just the
polar or cylindrical angle θ. Here this angle is defined by projecting the ray from the origin to
the point at (x, y, z) onto the x, y-plane. The result of this projection is the ray in our polar
coordinate discussion above, and θ is again the angle this ray in the x, y-plane makes with the
positive x-axis. The second angle (the third coordinate) in the spherical triple is φ, which is
defined relative to the positive z-axis: the angle φ is measured from the positive z-axis to the
ray in R3 from the origin to the point at (x, y, z). This third coordinate then must take on
values between 0 (at the positive z-axis) and π (at the negative z-axis). In the middle, when
ϕ = π/2, is the x, y-plane.†† The relationship between the rectangular, cylindrical and spherical
coordinates for a point in R3 is shown graphically in Figure 1.13.
Example 1.18: If (x, y, z) = (1, 2, −3), please represent this point in spherical coordinates.
Answer: Here $\rho = \sqrt{1^2 + 2^2 + (-3)^2} = \sqrt{14}$, $\theta = \mathrm{Tan}^{-1}(2/1) \approx 1.107$ (about 63 degrees), and
$\varphi = \mathrm{Cos}^{-1}(z/\rho) = \mathrm{Cos}^{-1}(-3/\sqrt{14}) \approx 2.501$ (about 143 degrees).
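The computation in Example 1.18 can be sketched as a pair of conversion functions. The names `rect_to_spherical` and `spherical_to_rect` are introduced here for illustration; the code follows this text's convention that φ is measured from the positive z-axis, and it assumes ρ > 0 (the origin has no well-defined angles).

```python
import math

def rect_to_spherical(x, y, z):
    """Return (rho, theta, phi) with phi measured from the positive z-axis."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x) % (2 * math.pi)   # quadrant-aware polar angle
    phi = math.acos(z / rho)                   # in [0, pi]; requires rho > 0
    return rho, theta, phi

def spherical_to_rect(rho, theta, phi):
    """Inverse conversion, using r = rho sin(phi) and z = rho cos(phi)."""
    r = rho * math.sin(phi)
    return r * math.cos(theta), r * math.sin(theta), rho * math.cos(phi)

rho, theta, phi = rect_to_spherical(1, 2, -3)
print(rho, theta, phi)
print(spherical_to_rect(rho, theta, phi))
```

Converting back with `spherical_to_rect` recovers (1, 2, −3) up to rounding, which is a quick way to catch a misplaced sine or cosine.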
Figure 1.14: The circle (in blue) formed by the intersection of the sphere $x^2 + y^2 + z^2 = 9$ and
the half cone $z^2 = x^2 + y^2$, z > 0. The equations for this circle in rectangular coordinates are
$x^2 + y^2 = 9/2$, $z = 3\sqrt{2}/2$.
Example 1.19: Describe in cylindrical and spherical coordinates the intersection of the sphere
$x^2 + y^2 + z^2 = 9$ and the half cone $z^2 = x^2 + y^2$, z > 0.
Answer: To obtain the cylindrical representation of this intersection, combine the equations for
the sphere and the cone, eliminating both x and y. One finds that $2z^2 = 9 \Rightarrow z = 3\sqrt{2}/2$
(since z > 0), and thus $r^2 = x^2 + y^2 = z^2 \Rightarrow r = 3\sqrt{2}/2$ as well. Since θ can take on any value,
the intersection in cylindrical coordinates is $(r, \theta, z) = (3\sqrt{2}/2,\ \theta,\ 3\sqrt{2}/2)$ where 0 ≤ θ < 2π. So
this is the circle with radius $3\sqrt{2}/2$ centered on the z-axis in the plane $z = 3\sqrt{2}/2$.
†† Unlike cylindrical coordinates where the coordinate-triple is almost always given as (r, θ, z), the order,
names and even definitions of the three spherical coordinates vary among authors and disciplines. The angle ϕ
sometimes takes on values in [−π/2, π/2], rather than [0, π], with ϕ = 0 being the x, y-plane. Also many authors
use ρ to denote density, and hence must use something else for the spherical radial distance. And the order that
the angles θ and ϕ appear in the coordinate triple may be reversed. Nevertheless, this text will stick to the
definitions and order given above since they are traditional in calculus.
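For the spherical representation, first note that in cylindrical coordinates the entire half cone is simply $z^2 = r^2$, which reduces to z = r because
both z and r are positive. Thus, since $r = \rho\sin\varphi$ and $z = \rho\cos\varphi$,
$$1 = \frac{r}{z} = \frac{\rho\sin\varphi}{\rho\cos\varphi} = \tan\varphi$$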
Exercises 1
1. If $a = \langle 1, 3, -2\rangle$ and $b = \langle 4, -1, -1\rangle$, please compute the length of a, the length of b, the
sum of these two vectors, and their dot and cross products.
Answer: $|a| = \sqrt{14}$, $|b| = \sqrt{18}$, $a + b = \langle 5, 2, -3\rangle$, $a \cdot b = 3$, $a \times b = \langle -5, -7, -13\rangle$.
3. What is the (smaller) angle θ between the vectors $\langle 1, 3, -2\rangle$ and $\langle 4, -1, -1\rangle$?
Answer: $\theta = \mathrm{Cos}^{-1}(\sqrt{7}/14) \approx 1.381$
4. Consider the six vectors $a_1 = \langle 1, 0, -3\rangle$, $a_2 = \langle -2, 1, 4\rangle$, $a_3 = \langle \pi, 0, -3\pi\rangle$, $a_4 = \langle 3, 2, 1\rangle$,
$a_5 = \langle 1, -1/2, -2\rangle$ and $a_6 = \langle 1, 8, 0\rangle$. Which of these vectors are perpendicular to each
other? Which are parallel?
7. Use the definitions of the dot and cross products to prove the first three identities in
Proposition 1.
8. Please show that for any vectors a, b ∈ R3 , the cross product a × b is perpendicular to
both a and b. Hint: Use the definition of a × b and compute the dot products a × b · a
and a × b · b. Notice that the notation a × b · a implicitly requires that one compute the
cross product first and then the dot product.
9. Prove that |a × b| = |a||b| sin θ. Hint: Compute $|a \times b|^2$ from the definitions, then
separately expand out $|a|^2|b|^2\sin^2\theta = |a|^2|b|^2(1 - \cos^2\theta) = |a|^2|b|^2 - (a \cdot b)^2$ to arrive at
the same polynomial.
10. Please prove the triangle inequality: If $a, b \in \mathbb{R}^3$, then |a + b| ≤ |a| + |b|. Hint: Write
$|a + b|^2$ as the dot product of a vector with itself, then distribute out this product, use
the Cauchy-Schwarz inequality to bound a · b and form $(|a| + |b|)^2$.
11. What is an equation for the plane passing through the point (4, 0, −2) perpendicular to
the vector $\langle -1, 5, 3\rangle$?
Answer: x − 5y − 3z = 10 (or any equivalent equation).
12. In Example 1.5, choose the vectors a and b in a different way, and choose a different
point on the plane for $(x_o, y_o, z_o)$, then carry out the computation to arrive at the same
equation for the plane as in the example.
13. Find an equation for the plane passing through the points (2, 1, 0), (1, 0, 2) and (0, 2, 1).
14. Please find a vector equation for the line passing through the points (1, −1, 3) and (−3, 2, 1).
Answer: $x(t) = \langle 1, -1, 3\rangle + \langle 4, -3, 2\rangle\, t$ (there are many other possibilities).
15. Please find a parametric representation for the line of intersection for the planes x + 2y −
3z = 4 and 3x − y + z = −1.
16. Please find the vector projection of $\langle 6, 5, -1\rangle$ onto $\langle -2, 1, -3\rangle$.
Answer: Here $a = \langle 6, 5, -1\rangle$, $b = \langle -2, 1, -3\rangle$, and $\mathrm{proj}_b(a) = \langle 4, -2, 6\rangle/7$.
17. For the plane Π: −3x − 2y + z = 5, and the vector $a = \langle 1, -1, 3\rangle$, please find $\mathrm{proj}_N(a)$
and $\mathrm{proj}_\Pi(a)$.
18. Consider the vector form for the equation of a line: $\mathbf{x}(t) = \mathbf{x}_o + \mathbf{m}t$ where $\mathbf{x} = \langle x, y, z\rangle$,
$\mathbf{x}_o = \langle x_o, y_o, z_o\rangle$, and $\mathbf{m} = \langle m_1, m_2, m_3\rangle$.
(a) By eliminating the time parameter t, obtain the following system of two linear equations:
$$\frac{x - x_o}{m_1} = \frac{y - y_o}{m_2} = \frac{z - z_o}{m_3}$$
provided that $m_i \neq 0$ for i = 1, 2, 3.
(b) These two equations determine a line as the intersection of two planes. What is the
equation for each of the planes? (The choice of the two equations and the two planes
is not unique.)
19. If $z = g(x, y) = \sqrt{3 - x^2 - y^2}$, how can one describe this surface in words? What is an
equivalent implicit form for the equation for this surface? Why is $x^2 + y^2 + z^2 - 3 = 0$
not an equivalent implicit form?
20. Is it possible to give a single explicit equation for the surface given implicitly by $x^2 - y + z^2 - 1 = 0$?
If possible, please write one example of an equivalent explicit equation.
(a) $x^2 + y^2 + z = 0$ (c) $5 - x^2 + y^2 - z = 0$
22. For (x, y) = (1, 2) ∈ R2 , what are the corresponding polar coordinates?
Answer: $(r, \theta) = (\sqrt{5},\ \mathrm{Tan}^{-1}(2)) \approx (2.236, 1.107)$ .
23. Please write (x, y, z) = (−1, 3, −7) ∈ R3 in both cylindrical and spherical coordinates.
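Answer: $(r, \theta, z) = (\sqrt{10},\ \pi + \mathrm{Tan}^{-1}(-3),\ -7) \approx (3.162, 1.893, -7)$;
$(\rho, \theta, \varphi) = (\sqrt{59},\ \pi + \mathrm{Tan}^{-1}(-3),\ \mathrm{Cos}^{-1}(-7/\sqrt{59})) \approx (7.681, 1.893, 2.717)$.
Here π is added to $\mathrm{Tan}^{-1}(-3)$ because (x, y) = (−1, 3) lies in the second quadrant.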
24. Please express the vector from the origin to (x, y, z) first in cylindrical, then in spherical
coordinates. What are the cylindrical and spherical expressions of this vector for the
specific point $(\sqrt{3}, 1, 0)$?
25. Please transform the equations of each of the following surfaces from rectangular co-
ordinates to both cylindrical and spherical coordinates (where possible, also name the
surface):
(a) $z = x^2 + y^2$ (e) $z = x^2$
(b) $z = x^2 - y^2$ (f) x = y
(d) x = y (h) $z^2 = x^2 + y^2$
26. Please transform the equations of each of the following surfaces from cylindrical coordi-
nates to rectangular coordinates (where possible, also name the surface):
(b) $z = x^2 - y^2$ (f) x = y
(c) $r^2 = z^2$ (g) x = y
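The numerical answers given for Exercises 1, 3 and 16 above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `dot`, `cross`, `length`, `angle` and `proj` are introduced here and are not notation from the text.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def length(a):
    """Euclidean length |a|."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Smaller angle between a and b, in radians."""
    return math.acos(dot(a, b) / (length(a) * length(b)))

def proj(a, b):
    """Vector projection of a onto b: (a.b / b.b) b."""
    c = dot(a, b) / dot(b, b)
    return tuple(c * bi for bi in b)

a, b = (1, 3, -2), (4, -1, -1)
print(dot(a, b), cross(a, b))          # Exercise 1
print(angle(a, b))                     # Exercise 3
print(proj((6, 5, -1), (-2, 1, -3)))   # Exercise 16
```

The same `dot` and `cross` helpers also give a quick check of Exercise 8 for these particular vectors, since the dot product of a × b with either a or b should be zero.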
Chapter II
Vector Functions
We now turn our attention to extending calculus from scalar functions (as one studies in a
basic calculus course) to vector functions. This first section introduces and extends the most
important calculus concepts to the realm of vectors. Later sections discuss and interpret vector
functions more geometrically.
As was the case earlier in dealing with constant vectors, this component representation of vector
functions is used heavily in the present discussion.∗
Figure 2.1: Two specimen vectors v(0) and v(1) for the vector function $v(t) = \langle \sqrt{3}\,t,\ 1/(1 + t^2),\ 1 + t^2 \rangle$. The vectors are drawn emanating from the origin.
$$w(t) = \left\langle \sqrt{1 - t^2},\ \frac{1}{1 - t^2} \right\rangle$$
For this w, the range is a subset of $\mathbb{R}^2$, while the domain is D = (−1, 1). This domain is the
intersection of the domains of each of the two components: $\mathrm{Domain}(\sqrt{1 - t^2}) = [-1, 1]$ while
$\mathrm{Domain}(1/(1 - t^2)) = (-\infty, -1) \cup (-1, 1) \cup (1, \infty)$.
There are of course many, many other vector functions. Can you think of one that takes on
values in R3 ?
provided that each and every component limit on the right exists and is finite.
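Notice that this solution is valid for all $t_o \in \mathbb{R}$ since each component is continuous on its entire
domain. Another example: Suppose that
$$w(t) = \left\langle \frac{\cos^2(1 - t) - 1}{(1 - t)^2},\ \frac{1 - t}{1 - \sqrt{t}} \right\rangle$$
for $t \neq 1$. Then
$$\lim_{t\to 1} w(t) = \left\langle \lim_{t\to 1} \frac{\cos^2(1 - t) - 1}{(1 - t)^2},\ \lim_{t\to 1} \frac{1 - t}{1 - \sqrt{t}} \right\rangle = \langle -1, 2 \rangle$$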
In computing the limits of the components, one has all the usual tools: trig identities, factoring,
l’Hôpital’s rule, etc. Thus in the previous example, one can notice that $1 - t = (1 - \sqrt{t})(1 + \sqrt{t})$
to compute the limit of the second component.
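A componentwise limit such as this one can also be explored numerically by evaluating the function at points approaching t = 1 from both sides. The sketch below is only a plausibility check, not a proof; the function name `w` mirrors the example, and the step sizes are arbitrary choices.

```python
import math

def w(t):
    """Components of w(t) from the example above; undefined at t = 1."""
    u = 1 - t
    first = (math.cos(u) ** 2 - 1) / u ** 2
    second = u / (1 - math.sqrt(t))
    return first, second

# Approach t = 1 from the left and from the right
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, w(1 - h), w(1 + h))
```

Both components settle toward the values obtained analytically, the first toward −1 and the second toward 2.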
Remark: It is possible to give an ε-δ definition for the limit of a vector function as is normally
done for scalar limits. Indeed in a mathematical sense, the ε-δ definition should be preferred. But
the definition given here is equivalent, and it is easier to use computationally.
As in the single-variable case, the velocity of the particle is the derivative of position, and
acceleration is the derivative of velocity. So to continue our discussion, we need to define the
derivative of a vector function; this definition must be consistent with the single-variable case
and uses the definition of limit above.
Definition: The derivative of a vector function can now be defined as a vector function limit:
For a vector function v, the derivative is
$$\dot{v}(t) \equiv \frac{dv}{dt} := \lim_{h\to 0} \frac{v(t + h) - v(t)}{h}
= \left\langle \lim_{h\to 0} \frac{v_1(t + h) - v_1(t)}{h},\ \lim_{h\to 0} \frac{v_2(t + h) - v_2(t)}{h},\ \ldots,\ \lim_{h\to 0} \frac{v_n(t + h) - v_n(t)}{h} \right\rangle$$
Example 2.3: Suppose that $v(t) := \langle \cos t, \sin t, 1 \rangle$. Here the natural domain is again
$D = \mathbb{R}$ (t can take on any real value). Computing componentwise, one finds that $\dot{v}(t) =
\langle -\sin t, \cos t, 0 \rangle$. Notice that in this example, $v(t) \perp \dot{v}(t)$ (i.e., $v(t) \cdot \dot{v}(t) = 0$) regardless of
the value of t. This situation is a special case of a general result that will be discussed below.
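The componentwise derivative in Example 2.3 can be compared against a difference quotient. The sketch below uses a central difference (v(t + h) − v(t − h))/(2h) rather than the one-sided quotient in the definition, simply because it is more accurate for a fixed h; the helper names and the sample time t = 0.7 are choices made here.

```python
import math

def v(t):
    """The vector function of Example 2.3."""
    return (math.cos(t), math.sin(t), 1.0)

def derivative(f, t, h=1e-6):
    """Componentwise central-difference approximation to the derivative of f at t."""
    plus, minus = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

t = 0.7
approx = derivative(v, t)
exact = (-math.sin(t), math.cos(t), 0.0)
print(approx, exact)
print(dot(v(t), exact))   # perpendicularity check
```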
Remark: The use of a dot over a time-dependent variable to denote derivative goes back
to Isaac Newton and is similar to prime notation. Here both dot and prime notation will be
used, with dot being reserved for differentiation with respect to time, and prime indicating
differentiation with respect to some other (specified) variable.
Finally we close this section by defining the integral of a vector function; the definition is again
given componentwise:
provided that each of the component integrals exists. The time $t_o$ is some convenient reference
time; often $t_o = 0$.
Notice that when each of the components $v_i$ is continuous, then by the fundamental theorem
of calculus,
$$\dot{V}(t) = v(t) = \langle v_1(t), v_2(t), \ldots, v_n(t) \rangle ,$$
that is, the derivative of a vector function defined as a function of the upper limit of integration
of an integral is simply the vector integrand evaluated at that variable of differentiation t.
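Example 2.4: As in the previous example, suppose that $v(t) := \langle \cos t, \sin t, 1 \rangle$. Then with
$t_o = 0$,
$$V(t) \equiv \int_0^t v(\tau)\,d\tau
= \left\langle \int_0^t \cos\tau\,d\tau,\ \int_0^t \sin\tau\,d\tau,\ \int_0^t 1\,d\tau \right\rangle
= \langle \sin t,\ 1 - \cos t,\ t \rangle .$$
Also
$$\dot{V}(t) = \langle \cos t, \sin t, 1 \rangle = v(t) .$$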
2.2 Parametric Curves in R2 and R3
This section is a brief pause in our discussion of vector functions to turn our attention to a closely
related topic: parametric curves.
Definition: Suppose that f, g and h are three continuous, real-valued functions, all defined
on some interval $I \subset \mathbb{R}$. In symbols, $f, g, h : I \subset \mathbb{R} \to \mathbb{R}$. For these functions and this interval
I, the parametric curve C is the set of all points (x, y, z) of the form x = f(t), y = g(t),
z = h(t) for some t ∈ I. Again, in symbols $C := \{(x, y, z) \in \mathbb{R}^3 \mid x = f(t),\ y = g(t),\ z =
h(t) \text{ for some } t \in I\}$. Also γ := (f, g, h) is a parameterization of C, and one can write γ(t) for the
point (f(t), g(t), h(t)) that lies on the curve C and is reached when the parameter takes on the
value t. Indeed one can think of the curve C being traced out by the parameterization γ as the
parameter t runs through all its values.
Figure 2.2: A parametric curve for some finite interval in the x, y-plane. The curve crosses
itself, so γ(t1 ) = γ(t2 ) for some t1 < t2 .
A generic example of a parametric curve is shown in Figure 2.2. If I = [a, b] is a finite, closed
interval for some real numbers a < b (in general, it could be infinite), then γ begins at the point
α = (f(a), g(a), h(a)) and ends at the point ω = (f(b), g(b), h(b)). To obtain a parametric
curve in $\mathbb{R}^2$ (the x, y-plane), simply leave off the third function h and the third variable z:
$\gamma := \{(x, y) \in \mathbb{R}^2 \mid x = f(t),\ y = g(t) \text{ for some } t \in I\}$. Also one can write $\gamma(t_o)$ for either the
point $(f(t_o), g(t_o), h(t_o))$ or the singleton set $\{(x, y, z) \in \mathbb{R}^3 \mid x = f(t_o),\ y = g(t_o),\ z = h(t_o)\}$ where
$t_o$ is a specific time.
Example 2.5: Suppose that I = [−2, 1], that f(t) = 2t + 1 and $g(t) = (1 + t)^2$. Then the
parametric curve C is a parabola with x = 2t + 1 and $y = (1 + t)^2$, beginning at α = (−3, 1)
and ending at ω = (3, 4). This curve C is shown in Figure 2.3. Because C is in the x, y-plane,
and because one can solve for t in terms of x, a single equation for the curve can be found by
eliminating the parameter t. Notice that t = (x − 1)/2 and this formula for t can be substituted
into g: $y = (1 + t)^2 = (1 + (x - 1)/2)^2 = (x + 1)^2/4$. Also notice that one cannot solve for t or
x in terms of y.
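The elimination of the parameter in Example 2.5 can be checked numerically: every point produced by the parameterization should satisfy the single equation in x and y. In the sketch below, the names `gamma` and `y_of_x` and the choice of eleven sample times are illustrative assumptions.

```python
def gamma(t):
    """Parameterization from Example 2.5: x = 2t + 1, y = (1 + t)^2."""
    return 2 * t + 1, (1 + t) ** 2

def y_of_x(x):
    """Equation obtained by eliminating t: y = (x + 1)^2 / 4."""
    return (x + 1) ** 2 / 4

# Eleven evenly spaced sample times in I = [-2, 1]
ts = [-2 + 3 * k / 10 for k in range(11)]
max_gap = max(abs(gamma(t)[1] - y_of_x(gamma(t)[0])) for t in ts)
print(gamma(-2), gamma(1))   # the endpoints alpha and omega
print(max_gap)
```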
Figure 2.3: Parabolic arc from α = (−3, 1) to ω = (3, 4) along $y = (x + 1)^2/4$.
For a given interval I and given functions f, g and h, there is a single (unique) curve C.
Interestingly, though, this does not work the other way around: a given curve C will have many
parameterizations using different I, f, g and/or h. Which parameterization is preferred often
depends on what one is trying to do.
Example 2.6: Suppose that $I_1 = [-1, 1]$, that $f_1(t) = 2t$ and $g_1(t) = t^2$. Also suppose
that $I_2 = [-2, 2]$, that $f_2(t) = t$ and $g_2(t) = t^2/4$. Both of these parameterizations describe
the curve $y = x^2/4$ beginning at (−2, 1) and ending at (2, 1).
The previous two examples may seem rather trivial, but there are more important ones:
Example 2.7: Let C be the portion of the unit circle lying in the first quadrant: $C =
\{(x, y) \mid x^2 + y^2 = 1,\ x \geq 0,\ y \geq 0\}$. This circular arc is depicted in Figure 2.4. One parameterization
is simply $f_1(t) = \sqrt{1 - t^2}$, $g_1(t) = t$ and $I_1 = [0, 1]$, beginning at (1, 0) and ending at
(0, 1). Another rather different parameterization, also beginning at (1, 0) and ending at (0, 1),
is $f_2(t) = \cos t$, $g_2(t) = \sin t$ and $I_2 = [0, \pi/2]$. The first parameterization may seem simpler,
and it may be preferred in some circumstances, but it does not extend to the entire circle. The
second does; one simply needs to extend $I_2$ to $I_2 = [0, 2\pi]$.
The real advantage of a parametric representation for a curve becomes most clear in R3 for
more complicated and thus more interesting curves, or curves that cross themselves.
Figure 2.5: The portion of the helix $C = \{(x, y, z) \mid x = \cos(\pi t),\ y = \sin(\pi t),\ z = t,\ t \in [0, 4]\}$.
Example 2.9: Let I = (−∞, ∞), with $f(t) = 1 - t^2$ and $g(t) = t(1 - t^2)$. Then this parametric
curve is an alpha curve: $C = \{(x, y) \mid x = 1 - t^2,\ y = t(1 - t^2),\ t \in (-\infty, +\infty)\}$. It is possible to
give a single equation in x and y that represents this curve: $y^2 = (1 - x)x^2$, but notice that one
cannot solve this equation for either x or y as a single-valued function of the other. So the parametric representation again has advantages. This
alpha curve is shown in Figure 2.6; the curve is in the shape of the Greek letter alpha.
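Two claims in Example 2.9 are easy to test numerically: that points of the parameterization satisfy the single equation relating x and y, and that the curve passes through the same point at two different parameter values. The helper names and sample times below are choices made for this sketch; t = ±1 is where both coordinates vanish.

```python
def gamma(t):
    """Alpha curve of Example 2.9: x = 1 - t^2, y = t(1 - t^2)."""
    return 1 - t ** 2, t * (1 - t ** 2)

def implicit_residual(x, y):
    """Zero exactly when (x, y) satisfies y^2 = (1 - x) x^2."""
    return y ** 2 - (1 - x) * x ** 2

# Sample times from -3 to 3 in steps of 1/4
ts = [k / 4 for k in range(-12, 13)]
worst = max(abs(implicit_residual(*gamma(t))) for t in ts)
print(worst)
print(gamma(-1), gamma(1))   # the same point reached at two distinct times
```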
Notice that the alpha curve in the previous example (Example 2.9) and the generic curve shown
in Figure 2.2 both have the feature that they cross over themselves, i.e., there are two or more
distinct times t1 and t2 such that γ(t1 ) = γ(t2 ). This is an important issue to watch for.
Another is whether or not a curve is smooth, a term that is defined next.
Anyone who has studied calculus should not be surprised that the definition of “smooth” involves
differentiability, but the second condition (that there be no time t where all the derivatives are
simultaneously zero) may be a surprise. Why is zero a problem here? The next example should
make this clear.
Example 2.10: Let I = [−1, 1] and $\gamma(t) = (f(t), g(t)) = (t^3, t^2)$. Since f and g are both
monomials, they are both differentiable, and one might think that nothing too interesting will
happen. But a plot of this curve shows otherwise: see Figure 2.7. If the parameter is eliminated,
and we solve for y in terms of x, the resulting equation is $y = x^{2/3}$. This curve has a sharp
point at (x, y) = (0, 0) known as a cusp.
Understanding what goes wrong when all the derivatives are simultaneously zero is perhaps best
seen by considering a particle moving along the curve whose position on the plane at time t is
(f (t), g(t)) (a view that will be explored more fully in the next section). If all of the derivatives
are simultaneously zero, the particle will stop at least for a moment. This means that when it
starts again, it can move in a very different direction from the one it had been heading before.
This is why there can be a sharp corner or cusp at such a point. Interestingly there is a very
important example of this sort of thing in weather forecasting: hurricanes. Predicting where
a hurricane is heading is much easier when the hurricane is moving; when it stops, it can be
difficult or impossible to say where it will go next after it starts moving again.
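The behavior described here can be seen numerically for the curve of Example 2.10. The sketch below uses the hand-computed derivative (3t², 2t) of γ(t) = (t³, t²) and compares the direction of motion just before and just after t = 0; the names `gamma_dot` and `unit` and the offset 1e-6 are illustrative choices.

```python
import math

def gamma_dot(t):
    """Derivative of gamma(t) = (t^3, t^2), computed by hand."""
    return 3 * t ** 2, 2 * t

def unit(v):
    """Scale a nonzero plane vector to unit length."""
    n = math.hypot(v[0], v[1])
    return v[0] / n, v[1] / n

stopped = gamma_dot(0)            # both derivatives vanish at t = 0
before = unit(gamma_dot(-1e-6))   # direction of motion just before the cusp
after = unit(gamma_dot(1e-6))     # direction of motion just after the cusp
print(stopped, before, after)
```

Just before t = 0 the particle is heading almost straight down, and just after it is heading almost straight up, so the direction reverses at the moment the particle stops.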
Figure 2.7: The cusp in the curve $y = x^{2/3}$ on the interval [−1, 1].
Now we connect the two previous sections and use them to represent particle motion in two
and three dimensions. There are many situations that involve particle motion, but one of the
most dramatic was a news item literally around the world on 4 October 1957: the launch of Sputnik. This was the
first artificial satellite ever to orbit the earth; it was more-or-less spherical, a little more than
half a meter in diameter, and it emitted a regular radio beep that made it easy to track. But
how could someone describe its position, velocity, acceleration, etc.? As one might expect, the
central mathematical concepts for this description are vector functions and parametric curves.
Suppose that the position of Sputnik or any moving particle is determined by a vector function
x(t) = h x1 (t), x2 (t), x3 (t) i whose tail is at some fixed reference point and whose head is at the
position of the particle at time t (see Figure 2.8). The particle’s path is then the parametric
curve traced out by the particle, parameterized by the components of the vector function:
$$f(t) = x_1(t) , \quad g(t) = x_2(t) , \quad h(t) = x_3(t)$$
Figure 2.8: Sputnik orbiting the earth, tracked by a vector function x. Its position at time t is
x(t), and its initial position at some reference time $t_o$ is $x(t_o)$. The blue point on the earth is
the location of the tracking station.
As in single-variable calculus, velocity of the particle is defined to be the derivative of position,
and acceleration is defined to be the derivative of velocity. A key question, then, is “Where do
the velocity and acceleration vectors lie relative to the path of the particle and the position
vector?” The rest of this section is devoted to answering this question.
How is a vector function related to its derivative? To answer this question, one needs to recall
the definition of the derivative for vector functions: For the vector function x, by definition its
derivative is
$$\dot{x}(t) := \lim_{h\to 0} \frac{x(t + h) - x(t)}{h}$$
provided this limit exists.
What can be seen from this definition is that the derivative is the limit of a difference quotient
whose numerator is a secant vector connecting points on the particle’s path. As h → 0, the
direction of this secant vector converges to a direction tangent to the particle path. The length
of this secant vector converges to zero, but of course, the denominator also goes to zero. As a
result, the length of the derivative vector (which is the speed that the particle is moving along
its path) is a real value between zero (if the numerator goes to zero faster than the denominator)
and infinity (if the denominator goes to zero faster than the numerator). This representation
of the derivative is illustrated in Figure 2.9. The discussion makes clear that the direction of
the derivative is tangent to the path traced out by the particle and its position vector function.
Figure 2.9: The secant vector x(t+h)−x(t) (in green) moves to the tangent vector ẋ(t) (in brown)
as h → 0. As h → 0, the head of the green secant vector moves backward along the blue curve
toward the head of x(t) (in red). Since the derivative is the limit of the difference quotient, the
direction of the derivative must be the limit of the direction of the numerator. This limiting
direction is tangent to the curve.
All of this discussion motivates the following definition:
Definition: Suppose that x(t) is the position vector (as a function of time t) of a particle
moving in either R2 or R3 . Then v(t) := ẋ(t) is the velocity vector, and a(t) := v̇(t) = ẍ(t) is
the acceleration vector. The length of the velocity vector is the speed † that the particle moves
along its path: ṡ(t) := |v(t)| . Notice that ∀ t, 0 ≤ ṡ(t) < +∞.
† One might have expected that s would be speed, rather than ṡ, but in fact s is traditionally arc length, and
this will be discussed below.
Since velocity is always tangent to the curve traced out by its position vector, the velocity vector
can be used to define a tangent vector with unit length, provided the velocity is nonzero:
$$T(t) := \frac{v(t)}{|v(t)|}$$
Example 2.11: Consider a vector function that traces out an elliptic helix:
$$x(t) = \langle \cos(\pi t),\ 2\sin(\pi t),\ t \rangle$$
for 0 ≤ t ≤ 7. Please find a parametric representation of the curve traced out by this function,
the speed, and the velocity, acceleration and unit tangent vectors.
Answer: A parameterization for the curve traced out by x(t) is given by simply using the
components of x as f, g and h: the parametric curve is $\gamma = \{(x, y, z) \in \mathbb{R}^3 \mid x = \cos(\pi t),\ y =
2\sin(\pi t),\ z = t \text{ for some } t \in [0, 7]\}$. If one thinks of a particle moving along the helix whose
position is given by x(t), for t ∈ [0, 7], then the velocity of the particle is given by
$$v(t) = \dot{x}(t) = \langle -\pi\sin(\pi t),\ 2\pi\cos(\pi t),\ 1 \rangle ,$$
the acceleration is
$$a(t) = \dot{v}(t) = \langle -\pi^2\cos(\pi t),\ -2\pi^2\sin(\pi t),\ 0 \rangle ,$$
and speed is $\dot{s}(t) = \sqrt{\pi^2\sin^2(\pi t) + 4\pi^2\cos^2(\pi t) + 1} = \sqrt{1 + \pi^2 + 3\pi^2\cos^2(\pi t)}$. Finally, the
unit tangent vector is
$$T(t) = \frac{\langle -\pi\sin(\pi t),\ 2\pi\cos(\pi t),\ 1 \rangle}{\sqrt{1 + \pi^2 + 3\pi^2\cos^2(\pi t)}}$$
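The quantities in Example 2.11 can be cross-checked numerically. The sketch below codes the elliptic helix x(t) = ⟨cos(πt), 2 sin(πt), t⟩ together with velocity and acceleration obtained by differentiating each component by hand, then compares them with central-difference approximations. The function names, the sample time and the step size h are all choices made for this illustration.

```python
import math

PI = math.pi

def x(t):
    """Position on the elliptic helix of Example 2.11."""
    return (math.cos(PI * t), 2 * math.sin(PI * t), t)

def velocity(t):
    """Componentwise derivative of x, computed by hand."""
    return (-PI * math.sin(PI * t), 2 * PI * math.cos(PI * t), 1.0)

def acceleration(t):
    """Componentwise derivative of velocity, computed by hand."""
    return (-PI ** 2 * math.cos(PI * t), -2 * PI ** 2 * math.sin(PI * t), 0.0)

def speed(t):
    """Simplified speed formula from the example."""
    return math.sqrt(1 + PI ** 2 + 3 * PI ** 2 * math.cos(PI * t) ** 2)

def unit_tangent(t):
    s = speed(t)
    return tuple(c / s for c in velocity(t))

def central_difference(f, t, h=1e-6):
    """Componentwise numerical derivative of a vector function f at t."""
    return tuple((p - m) / (2 * h) for p, m in zip(f(t + h), f(t - h)))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

t = 0.3
print(velocity(t), central_difference(x, t))
print(acceleration(t), central_difference(velocity, t))
print(speed(t), norm(velocity(t)), norm(unit_tangent(t)))
```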
So far, we have found that the derivative of a vector function is tangent to the curve traced
out by that vector function and that this derivative can be used to find the unit tangent vector
T . Interestingly, this unit tangent vector can be used to find a unit normal vector N that is
perpendicular to the particle’s path (in this sense, “normal” means “perpendicular”). The key
mathematical result that allows us to define N is the following proposition:
Proposition 2 Suppose that ∀ τ , u(τ ) is a unit vector: |u(τ )| = 1 ∀ τ . (Here the independent
variable or parameter τ may denote time or may be some other quantity.) Then ∀ τ
u(τ ) ⊥ u̇(τ )
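Proof: Since |u(τ)| = 1, then ∀ τ, $|u(\tau)|^2 = u(\tau) \cdot u(\tau) = 1$. Differentiating with respect to
τ, one finds that
$$\frac{d}{d\tau}\big(u(\tau) \cdot u(\tau)\big) = \frac{d}{d\tau}(1) = 0 .$$
But by the product rule, this equation becomes ∀ τ
$$\dot{u}(\tau) \cdot u(\tau) + u(\tau) \cdot \dot{u}(\tau) = 2\,u(\tau) \cdot \dot{u}(\tau) = 0 ,$$
so u(τ) ⊥ u̇(τ).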
Thus since T (t) is a unit vector ∀ t, Proposition 2 implies that Ṫ (t) is perpendicular to T (t)
and also to the curve traced out by our vector function x. Hence Ṫ (t) is a normal vector. All
of this leads to the following definition:
$$N(t) := \frac{\dot{T}(t)}{|\dot{T}(t)|}$$
Remark: One might think that the derivative of a unit vector is itself a unit vector; this is
not the case. Therefore one must divide Ṫ (t) by its length to obtain a unit vector.
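Both Proposition 2 and this remark can be illustrated numerically. The sketch below assumes a circular helix x(t) = ⟨cos t, sin t, t⟩ (a choice made here for simplicity), whose unit tangent is ⟨−sin t, cos t, 1⟩/√2, and approximates Ṫ by a central difference. All function names are introduced for this illustration.

```python
import math

def T(t):
    """Unit tangent of the circular helix <cos t, sin t, t>, computed by hand."""
    r = math.sqrt(2)
    return (-math.sin(t) / r, math.cos(t) / r, 1 / r)

def T_dot(t, h=1e-6):
    """Central-difference approximation to the derivative of T."""
    return tuple((p - m) / (2 * h) for p, m in zip(T(t + h), T(t - h)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def N(t):
    """Unit normal: T_dot divided by its own length."""
    d = T_dot(t)
    n = norm(d)
    return tuple(c / n for c in d)

t = 1.2
print(dot(T(t), T_dot(t)))   # approximately 0, as Proposition 2 predicts
print(norm(T_dot(t)))        # approximately 0.707, so T_dot is not a unit vector
print(N(t))                  # approximately <-cos t, -sin t, 0>
```

For this helix the unit normal points horizontally from the curve toward the z-axis.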
But notice that since we are only interested in the direction of Ṫ (t), not its length, we may
drop the scalar coefficient (including the denominator) and simply use the vector to compute
N (t):
Still this result is rather messy; see how much simpler these calculations are for a circular helix
(Exercise 2.10).
2.3.3 Acceleration
Unlike the velocity vector, which is always tangent to the particle path, there is no fixed direction
for the acceleration vector. But for a particle moving in either $\mathbb{R}^2$ or $\mathbb{R}^3$, it is possible to write the
acceleration vector as the sum of two vectors whose directions and lengths can be interpreted.
Specifically, provided that $|v(t)| \neq 0$, acceleration can be written as
$$a(t) = \frac{d}{dt}\big(v(t)\big) = \frac{d}{dt}\left(|v(t)|\,\frac{v(t)}{|v(t)|}\right)
= \frac{d}{dt}\big(\dot{s}(t)\,T(t)\big) = \ddot{s}(t)\,T(t) + \dot{s}(t)\,\dot{T}(t) ,$$
where the last step is the product rule.
So acceleration can be written as the sum of two terms, the first being tangent to the particle
path, the second, normal to that path. The coefficient of the unit tangent vector T is s̈(t),
the time derivative of speed. This means that s̈(t) is the magnitude of the acceleration in the
tangential direction. This is the sort of acceleration one causes in a car by pushing down on
either the accelerator or brake pedals; it changes the speed, but not the direction of motion.
Figure 2.10: The unit tangent vector T (t) and its derivative Ṫ (t) for the vector function x.
By Proposition 2 above, one can see that the second term is normal to the curve: since T (t) is
a unit vector ∀t, Ṫ (t) ⊥ T (t). But understanding the meaning of the coefficient of Ṫ (t) is not
yet easy because although T (t) is a unit vector, Ṫ (t) in general is not. Also since the particle’s
path is independent of the speed at which the particle moves along the curve, using time t as the
parameter for the motion is not best for understanding the meaning of the normal coefficient.
To resolve these two issues, we need to introduce arc length, the distance the particle has moved
along the curve, and then use arc length rather than time to locate points on the curve traced
out by the vector function. Arc length is defined and discussed in the next section, and then
we return to our discussion of acceleration.
2.4 Arc Length
In the previous section, speed was symbolized by ṡ(t); this symbolism is traditional, but also
leads to the obvious question: “What is speed the derivative of?”, or in other words, “What is
s?” The answer is arc length.
provided this integral exists‡. If the arc length s is finite, then the curve C is rectifiable.
If each of the component derivatives $dx_1/dt$, $dx_2/dt$ and $dx_3/dt$ is continuous (i.e., if the
components themselves are continuously differentiable), then the integral defining s will exist
and be finite, so in this case, the curve C is rectifiable.
Figure 2.11: Arc Length: the distance along a curve between two specific points, α and ω. The
curve C is traced out by the vector function x. Things associated with the curve C are in blue;
those associated with the vector function x are in red.
‡ Arc length can be defined for curves traced out by vector functions that are not differentiable, but for our
purposes, it is sufficient to only define arc length for differentiable vector functions.
Remarks:
1. Arc length can be defined for curves traced out by vector functions that are not differ-
entiable, but for our purposes, it is sufficient to only define it for differentiable vector
functions. One can extend this definition in the obvious way for vector functions that are
piecewise differentiable by computing the arc length of each piece.
2. Essentially the same definition works in two-dimensional space. One must simply leave
out the z-component of the vector function and the dz/dt term in the integrand.
3. One might expect that using [0, 1] as the integration interval would be unnecessarily
restrictive. This is not the case since for a given curve C, we can always rescale or
translate the vector function so that t = 0 corresponds to α and t = 1 corresponds to ω.
So this choice of [0, 1] is simply a matter of convenience, not a restriction.
Why does the above integral represent distance along the curve? The answer can most easily
be seen in Figure 2.12. There the integration interval [0, 1] is partitioned into n subintervals,
and the end points of these subintervals correspond to points on the curve C. One can now
sum up the lengths of the line segments that connect successive points on C:
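$$\sum_{i=1}^{n} \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2 + (z_i - z_{i-1})^2}
= \sum_{i=1}^{n} \sqrt{\left(\frac{x_i - x_{i-1}}{t_i - t_{i-1}}\right)^2 + \left(\frac{y_i - y_{i-1}}{t_i - t_{i-1}}\right)^2 + \left(\frac{z_i - z_{i-1}}{t_i - t_{i-1}}\right)^2}\ (t_i - t_{i-1}) .$$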
Taking the limit as the length of the longest subinterval in the partition (the norm of the
partition) goes to zero (and thus n goes to infinity), one sees that fractions inside the square
roots all approach derivatives, while the summation approaches an integral. The result is the
expression for s given in the definition above.
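The approximation by line segments described above translates directly into code. The sketch below sums chord lengths over a uniform partition of [0, 1]; the function name `polygon_length` and the uniform partition are choices made here, and the test curve ⟨3t/2, t^(3/2)⟩ is the one used in Example 2.12, for which evaluating (3/2)∫₀¹ √(1 + t) dt by hand gives 2√2 − 1.

```python
import math

def polygon_length(curve, n, a=0.0, b=1.0):
    """Sum of the lengths of n chords joining successive points curve(t_i)."""
    pts = [curve(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def curve(t):
    """The plane curve <3t/2, t^(3/2)> on [0, 1]."""
    return (1.5 * t, t ** 1.5)

exact = 2 * math.sqrt(2) - 1   # from (3/2) * integral of sqrt(1 + t) over [0, 1]
for n in (1, 10, 100, 1000):
    print(n, polygon_length(curve, n), exact)
```

As the partition is refined, the polygonal length increases toward the value of the integral, which is the limiting process described in the text.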
Now let us consider several examples. The first is a simple example set in just the x, y-plane,
the second is more interesting, and the third can only be computed numerically.
Example 2.12: Suppose that $x(t) := \langle 3t/2,\ t^{3/2} \rangle$. Then $\dot{x}(t) = \langle dx/dt,\ dy/dt \rangle = \langle 3/2,\ 3\sqrt{t}/2 \rangle$,
α = (0, 0), ω = (3/2, 1), and
$$s = \int_0^1 |\dot{x}(t)|\,dt = \frac{3}{2}\int_0^1 \sqrt{1 + t}\,dt = 2\sqrt{2} - 1 .$$
Figure 2.12: Arc Length: the curve C is approximated by a sequence of line segments between
two specific points, α and ω. Again things associated with the curve C are in blue; those
associated with the approximating segments are in green.
Example 2.13: Now suppose that x(t) traces out a portion of a helix: $x(t) := \langle \cos 4\pi t,\ \sin 4\pi t,\ t \rangle$.
As shown in Figure 2.13, as t runs from 0 to 1, x(t) traces out two cycles of our helix. But
despite this being a more interesting curve, its arc length is computed exactly as in the previous
example: $\dot{x}(t) = \langle -4\pi\sin 4\pi t,\ 4\pi\cos 4\pi t,\ 1 \rangle$, α = (1, 0, 0), ω = (1, 0, 1), and
$$s = \int_0^1 |\dot{x}(t)|\,dt = \int_0^1 \sqrt{16\pi^2\sin^2 4\pi t + 16\pi^2\cos^2 4\pi t + 1}\ dt = \sqrt{16\pi^2 + 1} \approx 12.61 .$$
Example 2.14: Our final example in this set shows that not all smooth curves have arc
length integrals that can be computed in closed form. Suppose that $\mathbf{x}(t) := \langle t,\ t^3 \rangle$. Then
$\dot{\mathbf{x}}(t) = \langle 1,\ 3t^2 \rangle$, $\alpha = (0, 0)$, $\omega = (1, 1)$, and
\[
s = \int_0^1 |\dot{\mathbf{x}}(t)|\,dt = \int_0^1 \sqrt{1 + 9t^4}\;dt .
\]
Unfortunately this integral cannot be computed in closed form (i.e., exactly). This time the value
of $s$ can only be approximated numerically: $s \approx 1.548$.
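One way to obtain such a numerical value is sketched below. This is only an illustration under stated assumptions: it uses composite Simpson's rule (a standard quadrature method, not one introduced in this text) with an arbitrarily chosen 200 subintervals, and the function names `simpson` and `speed` are hypothetical.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weights 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def speed(t):
    """|x'(t)| for x(t) = <t, t^3>, the curve of Example 2.14."""
    return math.sqrt(1 + 9 * t**4)

s = simpson(speed, 0.0, 1.0)
print(f"s is approximately {s:.3f}")
```

Any other quadrature rule, a calculator, or computer algebra software would serve equally well here.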
Up to now, we have considered only the arc length of a curve between two fixed points; now we
turn our attention to arc length as a function of time where the beginning point is still fixed,
but the ending point now varies with t.
Figure 2.13: Two cycles of the helix whose arc length is computed in Example 2.13
C. Then
\[
s(t) := \int_0^t |\dot{\mathbf{x}}(\tau)|\,d\tau
\]
is arc length as a function of t. It is the distance along the curve C from the reference point to
the point on the curve at the head of x(t) (see Figure 2.14) provided this integral exists.
Figure 2.14: Arc Length as a function of time t: the distance along a curve between some
reference point corresponding to x(0) and a point at time t corresponding to x(t). Again,
things associated with the curve C are in blue; those associated with the vector function x are
in red. Things associated with arc length are in copper.
Remark: Notice that when t < 0 then s(t) < 0, (i.e., here, arc length can be negative).
The above definition for s(t) and the fundamental theorem of calculus give the answer to the
question at the start of this section: By the fundamental theorem of calculus, if the integrand
is continuous, the derivative of any integral that is a function of the upper limit of integration
is just that integrand evaluated at the upper limit of integration. So in this case
\[
\dot{s}(t) = \frac{d}{dt}\int_0^t |\dot{\mathbf{x}}(\tau)|\,d\tau = |\dot{\mathbf{x}}(t)| = \dot{s}(t) .
\]
The final equality above is just a reiteration of the fact that both |ẋ(t)| and ṡ(t) were described
as speed in the previous section.
Example 2.15: Suppose that a particle moves along a curve C with its velocity given by
$\mathbf{v}(t) = \langle \cos t,\ \sin t \rangle$. Please find $s(t)$.
Answer: Recall that $\dot{s}(t) = |\mathbf{v}(t)|$; thus in this case,
$\dot{s}(t) = \sqrt{\cos^2 t + \sin^2 t} = 1$, implying that
$s(t) = t$. (Remember that $s(0) = 0$ because we measure the arc length from the point on the
curve corresponding to $t = 0$.)
2.5 Acceleration Decomposition
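We can now complete our decomposition of acceleration. Recall that provided that $|\mathbf{v}(t)| \neq 0$,
\[
\mathbf{a}(t) = \frac{d}{dt}\big(\mathbf{v}(t)\big)
= \frac{d}{dt}\left(|\mathbf{v}(t)|\,\frac{\mathbf{v}(t)}{|\mathbf{v}(t)|}\right)
= \frac{d}{dt}\big(\dot{s}(t)\,\mathbf{T}(t)\big)
= \ddot{s}(t)\,\mathbf{T}(t) + \dot{s}(t)\,\dot{\mathbf{T}}(t) .
\]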
Previously we discussed the first term; now we consider the second. To understand the second
term, the best parameter to measure position along the curve is arc length s. Using arc length
allows one to describe the curve without regard to the speed of motion. By the chain rule§,
\[
\dot{\mathbf{T}}(t) = \frac{d\mathbf{T}}{dt} = \frac{d\mathbf{T}}{ds}\,\frac{ds}{dt} = \mathbf{T}'(s)\,\dot{s}(t)
\]
where prime ($'$) denotes differentiation with respect to $s$. One now forms a unit vector here in
the same way as always: for a nonzero vector, divide the vector by its length:
\[
\frac{d\mathbf{T}}{ds} = \mathbf{T}'(s) = |\mathbf{T}'(s)|\,\frac{\mathbf{T}'(s)}{|\mathbf{T}'(s)|} = |\mathbf{T}'(s)|\,\mathbf{n}(s)
\]
where by definition $\mathbf{n}(s) := \mathbf{T}'(s)/|\mathbf{T}'(s)|$ is the unit normal vector as a function of arc length. Notice
that since $|\mathbf{T}'(s)|$ is here always positive, $\mathbf{n}(s)$ points in the same normal direction as $\mathbf{T}'(s)$.
This direction is the direction toward which the motion is turning, i.e., into the curve. Also notice
that $\mathbf{n}(s(t)) = \mathbf{N}(t)$.
Now combining both of the above steps, one finds that the acceleration can be written as
\[
\mathbf{a}(t) = \ddot{s}(t)\,\mathbf{T}(t) + (\dot{s}(t))^2\,\kappa(s)\,\mathbf{n}(s) = a_T(t)\,\mathbf{T}(t) + a_N(t)\,\mathbf{N}(t), \qquad s = s(t),
\]
where $\kappa(s) := |\mathbf{T}'(s)|$ is defined to be the curvature of the motion, $a_T(t) := \ddot{s}(t)$ is the (scalar)
tangential component of the acceleration, and $a_N(t) := (\dot{s}(t))^2 \kappa(s)$ is the (scalar) normal
component (see Figure 2.15). This curvature $\kappa(s)$ measures the tendency of the motion to turn
(or curve) into the motion of the particle, i.e., into the curve. Curvature is always nonnegative;
it is zero for straight-line motion and positive if the path is curving. This part of the acceleration
is always perpendicular to the motion, and it corresponds to the acceleration one achieves in
a car using the steering wheel to change the path of motion without necessarily changing the
speed.
§ Mathematicians often object to using both $\mathbf{T}(t)$ and $\mathbf{T}(s)$ since then $\mathbf{T}(1)$ becomes ambiguous. (Is this
$t = 1$ or $s = 1$?) This indeed can be a serious problem, but the problem will be avoided here by using a prime
rather than a dot to indicate differentiation with respect to $s$ and only writing $\mathbf{T}'$ as a function of $s$.
Figure 2.15: Acceleration and its normal and tangential components form a right triangle, thus
the lengths of these vectors satisfy $|\mathbf{a}(t)|^2 = (a_T(t))^2 + (a_N(t))^2$
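Example 2.16: Decompose the acceleration of the planar vector function $\mathbf{x}(t) = \langle t,\ t^2 \rangle$ into
its tangential and normal components. Also compute the unit tangent and unit normal vectors,
the speed and the curvature.
Answer: By direct computation, $\mathbf{v}(t) = \langle 1,\ 2t \rangle$, $\mathbf{a}(t) = \langle 0,\ 2 \rangle$ and the speed is
$\dot{s}(t) = |\mathbf{v}(t)| = \sqrt{1 + 4t^2}$. The unit tangent vector is
\[
\mathbf{T}(t) = \frac{\mathbf{v}(t)}{|\mathbf{v}(t)|} = \frac{\langle 1,\ 2t \rangle}{\sqrt{1 + 4t^2}}
= \left\langle \frac{1}{\sqrt{1 + 4t^2}},\ \frac{2t}{\sqrt{1 + 4t^2}} \right\rangle
\]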
The (scalar) tangential component of acceleration is
\[
a_T(t) = \ddot{s}(t) = \frac{4t}{\sqrt{1 + 4t^2}}
\]
The curvature and the (scalar) normal component of acceleration can be computed directly,
but it is more convenient to compute these indirectly from the results that have already been
obtained. Since $\mathbf{a}(t)$ and its normal and tangential components form the sides of a right triangle
(cf. Figure 2.15), the main tool in this indirect computation is the Pythagorean theorem. So
\[
a_N(t) = \sqrt{|\mathbf{a}(t)|^2 - (a_T(t))^2} = \sqrt{4 - \frac{16t^2}{1 + 4t^2}} = \frac{2}{\sqrt{1 + 4t^2}}
\]
Since $a_N(t) = (\dot{s}(t))^2 \kappa(s(t))$ and $(\dot{s}(t))^2 = 1 + 4t^2$,
\[
\kappa(s(t)) = \frac{2}{(1 + 4t^2)^{3/2}}
\]
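Finally
\[
\mathbf{n}(s(t)) = \frac{\mathbf{a}(t) - a_T(t)\,\mathbf{T}(t)}{a_N(t)}
= \frac{\langle 0,\ 2 \rangle - \dfrac{\langle 4t,\ 8t^2 \rangle}{1 + 4t^2}}{\dfrac{2}{\sqrt{1 + 4t^2}}}
= \frac{\langle -2t,\ 1 \rangle}{\sqrt{1 + 4t^2}}
\]
Notice how $\mathbf{n}(s(t))$ and $\mathbf{T}(t)$ are related: they are perpendicular as expected.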
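The closed-form answers in Example 2.16 can be spot-checked numerically. The sketch below is only an illustration; the helper name `decompose` is hypothetical. It assumes that the tangential component can be computed as the dot product a · T, which follows from the decomposition of a(t) because T and n are perpendicular unit vectors. It then uses the Pythagorean relation from Figure 2.15 and the relation between the normal component, the speed and the curvature.

```python
import math

def decompose(v, a):
    """Return (speed, T, a_T, a_N, kappa, n) from a velocity v and an acceleration a."""
    speed = math.hypot(*v)
    T = tuple(c / speed for c in v)                              # unit tangent
    a_T = sum(ac * Tc for ac, Tc in zip(a, T))                   # a . T (assumed equal to s'')
    a_N = math.sqrt(max(sum(c * c for c in a) - a_T**2, 0.0))    # Pythagorean theorem
    kappa = a_N / speed**2                                       # from a_N = (speed)^2 * kappa
    n = tuple((ac - a_T * Tc) / a_N for ac, Tc in zip(a, T)) if a_N > 0 else None
    return speed, T, a_T, a_N, kappa, n

# Compare with the formulas of Example 2.16 for x(t) = <t, t^2> at a sample time.
t = 1.0
speed, T, a_T, a_N, kappa, n = decompose((1.0, 2 * t), (0.0, 2.0))
root = math.sqrt(1 + 4 * t**2)
print("a_T  :", a_T, "vs", 4 * t / root)
print("a_N  :", a_N, "vs", 2 / root)
print("kappa:", kappa, "vs", 2 / root**3)
print("n    :", n, "vs", (-2 * t / root, 1 / root))
```

The sample time t = 1 is arbitrary; other values of t can be substituted to test the formulas further.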
Example 2.17: For the circle centred at the origin with radius $r$ (satisfying $x^2 + y^2 = r^2$),
please find the curvature $\kappa(s)$.
Answer: Because curvature is a function of arc length, it will be the same regardless of how
one parameterizes the circle. So one should choose the simplest parameterization for computing
derivatives: let $x = r \cos t$ and $y = r \sin t$. Then $\mathbf{x}(t) = \langle r \cos t,\ r \sin t \rangle$, and $\dot{s}(t) = |\mathbf{v}(t)| = r$
(constant for all $t$). Because $\dot{s}$ is constant, $a_T = \ddot{s} = 0$, and thus $a_N(t) = |\mathbf{a}(t)| = r$. Finally
since $a_N(t) = (\dot{s}(t))^2 \kappa(s(t))$, one finds that $\kappa(s) = 1/r$. So not surprisingly, the curvature of a
circle is constant and inversely proportional to its radius.
Exercises 2
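1. For each of the following vector functions, $\mathbf{v}$, please determine its domain, and decide
whether or not the indicated value of $t_o$ is in the domain. Then provided the limit exists,
please compute $\lim_{t \to t_o} \mathbf{v}(t)$.
(a) $\mathbf{v}(t) = \left\langle 1 - 5t^2,\ 14\pi e^{-2t} \right\rangle$, $t_o = 3$
(b) $\mathbf{v}(t) = \left\langle \dfrac{1}{1 - t^2},\ \dfrac{\sin t}{t},\ \sqrt{4 - t^2} \right\rangle$, $t_o = 0$
(c) $\mathbf{v}(t) = \left\langle \dfrac{1 - t}{1 - t^2},\ \dfrac{\sin \pi t}{1 - t},\ \sqrt{1 - t^2} \right\rangle$, $t_o = 1$
(d) $\mathbf{v}(t) = \left\langle \dfrac{1 - \sqrt{t}}{1 - t^2},\ \dfrac{3t}{e^{t-1} - e^{1-t}} \right\rangle$, $t_o = 1$
(e) $\mathbf{v}(t) = \left\langle \dfrac{1 - \sqrt{t}}{1 - t^2},\ \dfrac{3t}{e^{t-1} - e^{1-t}},\ 14 - 9t + t^2 \right\rangle$, $t_o = 7$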
2. For each of the following expressions w(t), please find its derivative ẇ(t). When necessary,
please mention any restrictions on the domains of either w or ẇ.
(a) $\mathbf{w}(t) = \langle 3t^2 - 4t + 7,\ e^2,\ \cosh(t^2) \rangle$ (d) $\mathbf{w}(t) = \langle \ln(1 + 4t^2),\ t \ln(1 + 4t^2) \rangle$
3. For $\mathbf{v}(t) = \langle e^{-t} \cos 2t,\ e^{-t} \sin 2t \rangle$, please find $\dot{\mathbf{v}}(t)$. Can you see an orthogonality
relationship between $\mathbf{v}(t)$ and $\dot{\mathbf{v}}(t)$?
4. For $\mathbf{v}(\tau) = \langle 6\tau^2 - 5,\ \tan \tau \rangle$, please find the integral of $\mathbf{v}$ over the interval $[0, t]$. What is
the domain $D$ of the function defined by this integral?
Answer: $\langle 2t^3 - 5t,\ -\ln(\cos t) \rangle$; $D = (-\pi/2, \pi/2)$.
5. Consider the vector equation $\dot{\mathbf{v}}(t) + \mathbf{v}(t) = \langle 1,\ 0 \rangle$. Verify that $\mathbf{v}(t) = \langle 2e^{-t} + 1,\ 3e^{-t} \rangle$
satisfies this equation along with the condition that $\mathbf{v}(0) = \langle 3,\ 3 \rangle$.
6. For each parameterization $\gamma(t)$, please find an equation for the corresponding curve $\gamma$ in
$\mathbb{R}^2$ in terms of just $x$ and $y$
(a) $\gamma(t) = ( 5t^2,\ \sqrt{5}\,t )$, $t \in [-3, 5]$
7. Please parameterize each of the following curves in $\mathbb{R}^2$ using sine and cosine functions.
(a) $x^2 + y^2 = 4$
(c) $3x^2 + 2y^2 = 12$
Answer: (a) $x = 2 \cos t$, $y = 2 \sin t$, $t \in [0, 2\pi)$. There are many other parameterizations.
8. Describe the curve $\gamma$ in $\mathbb{R}^3$ traced out by the parameterization $\gamma(t) = ( 1 - t^2,\ t(1 - t^2),\ t )$.
9. For $\mathbf{x}(t) = \langle \cos t,\ \sin t,\ t \rangle$, please find the velocity, speed, acceleration, and unit tangent
and unit normal vectors. Notice that for the direct calculation of the unit normal vector,
only the vector portion of the derivative of the unit tangent vector is needed, since the
scalar portion is cancelled out when one computes the length.
Answer: $\mathbf{v}(t) = \langle -\sin t,\ \cos t,\ 1 \rangle$, $\dot{s}(t) = |\mathbf{v}(t)| = \sqrt{2}$, $\mathbf{a}(t) = \langle -\cos t,\ -\sin t,\ 0 \rangle$,
$\mathbf{T}(t) = \langle -\sin t,\ \cos t,\ 1 \rangle / \sqrt{2}$, $\mathbf{N}(t) = \langle -\cos t,\ -\sin t,\ 0 \rangle$.
12. For a baseball (particle) whose acceleration vector is $\mathbf{a}(t) = \langle 0,\ 0,\ -32 \rangle$, please find the
velocity and position (give the most general answer possible). This is projectile motion
neglecting air resistance and using $g = 32$ ft/s$^2$ as the acceleration due to gravity. Notice
that without air resistance, throwing a curve ball is not possible. Hint: Suppose that
the initial position of the baseball is $\mathbf{x}(0) = \mathbf{x}_o = \langle a_o,\ b_o,\ c_o \rangle$, and the initial velocity is
$\mathbf{v}(0) = \mathbf{v}_o = \langle u_o,\ v_o,\ w_o \rangle$ where all of the components of $\mathbf{x}_o$ and $\mathbf{v}_o$ are constant.
13. (a) Please calculate velocity, speed and the unit tangent vector $\mathbf{T}(t)$ for $\mathbf{x}(t) = \langle t^2,\ t^3 \rangle$.
Answer: $\dot{s}(t) = |t| \sqrt{4 + 9t^2}$.
(b) Now calculate the unit normal vector $\mathbf{N}(t)$, the tangential and normal components
of acceleration, $a_T(t)$ and $a_N(t)$, and the curvature $\kappa(s(t))$.
Answer: $\mathbf{a}_N(t) = 6t \langle -3t,\ 2 \rangle / (4 + 9t^2)$, $\kappa(s(t)) = 6 / \big( |t| (4 + 9t^2)^{3/2} \big)$.
14. Calculate $\mathbf{T}(t)$, $\mathbf{N}(t)$, the tangential and normal components of acceleration, and the
curvature for the vector function $\mathbf{x}(t) = \langle \cos t,\ \sin t,\ t \rangle / \sqrt{2}$. Notice that these calculations
are much simpler for this circular helix than for the elliptical one in Example 2.11.
15. Please draw (either by hand or using your favorite calculator or computer software) the
curves in Example 2.12 and Example 2.14.
16. Suppose that a particle is moving along a curve so that its arc length is a linear function
of time: s(t) = σt for some constant σ.
17. For $\mathbf{x}(t) = \langle t^2,\ 2t^3/3 \rangle$, please find α, ω and the arc length of the curve traced out by this
vector function $\mathbf{x}$, first for (a) $t \in [1, 4]$, then for (b) $t \in [-2, 2]$.
18. Consider the graph of $y = f(x) = x^2/2$ between the points $(0, 0)$ and $(1, 1/2)$. What is
the length of this graph (its arc length as a curve)? Hint: One can always parameterize
the graph of $y = f(x)$ as $\gamma(t) = (t, f(t))$, i.e., $x = t$ and $y = f(t)$, then choose the
appropriate values for the beginning and ending points in $t$.
19. Consider the curve traced out by the vector function $\mathbf{x}(t) = \langle t,\ t^2 \rangle$, as in Example 2.16.
Please find $s(t)$. Hint: Computing $s(t)$ by hand requires a trig substitution, writing $\sec^3 \theta$
as $\sec \theta (1 + \tan^2 \theta)$, then a double integration by parts. To avoid all this, one can use the
internet, computer software, or a good table of integrals.
20. Consider the helix traced out by the vector function $\mathbf{x}(t) = \langle 2 \cos t,\ 2 \sin t,\ 3t \rangle$. Find
$s(t)$, $\mathbf{T}(t)$, $\mathbf{N}(t)$, $a_T(t)$, $a_N(t)$ and $\kappa(s)$.
21. Suppose that for a certain curve, s(t) = 5t and curvature is constant κ = 2. Please find
|a(t)| for the vector function that traces out this curve.
23. Suppose that for $t > 0$, the arc length of a certain curve is $s(t) = t^2$. Please find the
curvature for this curve as a function of arc length in terms of the length of acceleration.
Answer: $\kappa(s) = \sqrt{|\mathbf{a}(\sqrt{s})|^2 - 4}\,/\,4s$.
Chapter III
Multivariable
Derivatives—Differentiation in Rn
3.1 Limits in Rn
The most important new concept introduced by calculus is the concept of limit; “limit” is the
thing that Leibniz and Newton never understood, and it was up to Cauchy to define the concept
about a century later∗. This section discusses limits in $\mathbb{R}^n$ and explores the similarities and
differences between these limits and single-variable limits (those in $\mathbb{R}$ or for vector functions).
Before we can define the multivariable limit, we need to state the definition of distance in $\mathbb{R}^n$:
Definition: For points (or vectors) $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$, the distance
between $\mathbf{x}$ and $\mathbf{y}$ is
\[
d(\mathbf{x}, \mathbf{y}) := \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} .
\]
Thus distance is a function that assigns a nonnegative real number to any two points or vectors
$\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, and when $n = 2$ or $n = 3$, this nonnegative real number is the classical Euclidean
distance. Hence this is the traditional idea of distance, given in mathematical terms. For $n = 2$,
the diagram in Figure 3.1 may be helpful, as should the next example.
∗ Augustin-Louis Cauchy (1789-1857) was a French mathematician, scientist and engineer who worked to
make the basic concepts of calculus understood via rigorous proof.
Figure 3.1: Distance in $\mathbb{R}^2$ where the dimensions and the axes are numbered. What point is
at the corner of the right angle of the triangle? Where are $\mathbf{x}$ and $\mathbf{y}$? Where should $d(\mathbf{x}, \mathbf{y})$ be
placed on this diagram?
Example 3.1: For which values of β is the distance between the points $\mathbf{x} = (3, -1, 2)$ and
$\mathbf{y} = (\beta, 1 - \beta, 3)$ less than 2? Which vector $\mathbf{v}$ has its tail at $\mathbf{x}$ and its head at $\mathbf{y}$, and what is
its length?
Answer: We must decide for which β is $d(\mathbf{x}, \mathbf{y}) < 2$, which means that
$(3 - \beta)^2 + (-1 - (1 - \beta))^2 + (2 - 3)^2 < 4$. Simplifying this inequality, one finds that it is
equivalent to $\beta^2 - 5\beta + 5 < 0$. Since the roots of this quadratic are $(5 \pm \sqrt{5})/2$, we are looking for
\[
\beta \in \left( \frac{5 - \sqrt{5}}{2},\ \frac{5 + \sqrt{5}}{2} \right) .
\]
The vector with $(\beta, 1 - \beta, 3)$ at its head and $(3, -1, 2)$ at its tail is
$\mathbf{v} = \langle \beta - 3,\ (1 - \beta) - (-1),\ 3 - 2 \rangle = \langle \beta - 3,\ 2 - \beta,\ 1 \rangle$. Finally,
the length of this vector is exactly the distance between these two points:
$|\mathbf{v}| = d(\mathbf{x}, \mathbf{y}) = \sqrt{(3 - \beta)^2 + (-1 - (1 - \beta))^2 + (2 - 3)^2} = \sqrt{2\beta^2 - 10\beta + 14}$.
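The answer to Example 3.1 can be checked with a few lines of Python. This is a sketch only: the function names `distance` and `y` are hypothetical, and the sample values of β are arbitrary choices on either side of the two roots.

```python
import math

def distance(x, y):
    """Euclidean distance between two points of R^n given as equal-length tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x = (3.0, -1.0, 2.0)

def y(beta):
    """The point y = (beta, 1 - beta, 3) from Example 3.1."""
    return (beta, 1.0 - beta, 3.0)

lower = (5 - math.sqrt(5)) / 2
upper = (5 + math.sqrt(5)) / 2

# Values outside, at the ends of, and inside the interval found in the example.
for beta in (1.0, lower, 2.5, upper, 4.0):
    print(f"beta = {beta:.4f}   d(x, y) = {distance(x, y(beta)):.4f}")
```

The distance is 2 at the two endpoints (up to rounding), smaller than 2 between them, and larger than 2 outside.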
An observant reader might have noticed that there is a connection between distance as it is
defined here and the length of a difference vector (see Figure 3.2):
[Figure 3.2: the points $(x_1, x_2)$ and $(y_1, y_2)$ joined by the difference vector $\mathbf{y} - \mathbf{x}$.]
Proposition 3 From the definitions above for distance and vector length,
d(x, y) = d(y, x) = |x − y| = |y − x| .
Proof: From their definitions, all four of the expressions in this proposition are equal to
\[
\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2} .
\]
Our distance function can now be used to give a mathematical definition of limit. In everyday
language, this definition says precisely that if $\mathbf{x}$ is sufficiently close to $\mathbf{x}_o$, then $f(\mathbf{x})$ is as close
as we want to a limiting value, $L$.
Remarks:
1. If the reader has seen the ε,δ-definition of the limit in single-variable calculus, there is an
important observation about this definition: It is exactly the same as the single-variable
case! If the reader has only seen a more-intuitive development of the concept of limit,
the important thing to understand is that this definition is just a mathematically-exact
statement of the intuition. It says that whenever $\mathbf{x}$ is close to $\mathbf{x}_o$, then $f(\mathbf{x})$ is close to $L$.
Or in other words, you tell me how close you want $f(\mathbf{x})$ to be to $L$ (the ε), and then I’ll tell
you how close $\mathbf{x}$ must be to $\mathbf{x}_o$ to guarantee the result you want (the δ). A single-variable
limit is depicted in Figure 3.3; a multivariable depiction is given in Figure 3.4.
Figure 3.3: A single-variable limit. How close must $x$ be to $x_o$ to guarantee that $f(x)$ is as close
to $L$ as we want? Where do the δ and ε go in this diagram? Notice that the portion of the curve
$y = f(x)$ above the interval on the x-axis lies entirely within the band centered on $L$ coming
across the y-axis.
2. As written, this definition only applies to functions defined for all $\mathbf{x} \in \mathbb{R}^n$; often
$D := \mathrm{Domain}(f) \subsetneq \mathbb{R}^n$, i.e., $f$ is not defined at some points in $\mathbb{R}^n$. In this case, the definition
is essentially the same, at least if $D$ is an open set, except that $\mathbf{x}_o$ must be in or adjacent
to $D$ and $f : D \to \mathbb{R}^m$.
Figure 3.4: A multivariable limit. Here the function $f$ sends x-values from a domain in $\mathbb{R}^2$ on
the left to y-values in some range in $\mathbb{R}^2$ on the right, that is $\mathbf{y} = f(\mathbf{x})$. Let $\mathbf{x}_o$ be any point
in the domain on the left. The ball of radius δ centred at $\mathbf{x}_o$ excluding the centre $\mathbf{x}_o$ itself is
denoted as $B^{\circ}_{\delta}(\mathbf{x}_o)$. If the limit of $f$ as $\mathbf{x} \to \mathbf{x}_o$ exists, then for some δ sufficiently small, $f$
sends every point from $B^{\circ}_{\delta}(\mathbf{x}_o)$ into the ball of radius ε centred at $L$, $B_{\varepsilon}(L)$. In symbols, this
is written $f(B^{\circ}_{\delta}(\mathbf{x}_o)) \subset B_{\varepsilon}(L)$. Again, where do the δ and ε go in this diagram?
3. f (xo ) is not necessarily defined! It might be, but it does not have to be. Also x must
be different from xo ; the two must be close, but different.
4. For a given ε, if the limit exists, there will be many choices for δ. Indeed if a certain δ
works, then so does δ/2, δ/4, δ/10 and so forth.
Notation: If the limit of $f$ as $\mathbf{x}$ approaches $\mathbf{x}_o$ exists and equals $L$, then we can write
\[
\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x}) = L
\]
Example 3.2: Some limits are rather easy to evaluate without carefully referring to the
definition. For example, if $\mathbf{x} = (x, y) \in \mathbb{R}^2$, to evaluate
\[
\lim_{(x,y) \to (1,-\pi)} x \cos(y) + y
\]
one needs only to note that as $y$ approaches $-\pi$, $\cos(y)$ approaches $-1$. Since $x$ is approaching
1, the entire expression is approaching $1(-1) - \pi = -1 - \pi$, and thus $L = -1 - \pi$. Notice that
in evaluating this limit we have implicitly used that cosine and indeed the entire expression is
continuous. Continuity will be discussed in the next section.
One can use our ε-δ definition to show that a limit exists for a specific function, but this is not the
most important reason for having a clear, mathematical definition. The real advantage of such
a definition is that it can be used to prove that limits have certain important properties, for
example:
Theorem 1 If the limit of $f(\mathbf{x})$ as $\mathbf{x} \to \mathbf{x}_o$ exists, then its value is unique.
Proof: Suppose there are two values, that is, suppose that $\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x}) = L_1$ and also
$\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x}) = L_2$. From the definition of limit, given any $\varepsilon > 0$, $\exists\, \delta > 0$ such that when
$0 < d(\mathbf{x}, \mathbf{x}_o) < \delta$, then $d(f(\mathbf{x}), L_1) < \varepsilon/2$ and $d(f(\mathbf{x}), L_2) < \varepsilon/2$. But from the triangle
inequality,
\[
d(L_1, L_2) \le d(L_1, f(\mathbf{x})) + d(f(\mathbf{x}), L_2) < \varepsilon/2 + \varepsilon/2 = \varepsilon .
\]
So $d(L_1, L_2)$ is less than any positive real number, which means that $d(L_1, L_2) = 0$. This is
only possible if $L_1 = L_2$, so there cannot be two distinct values for the limit.
Again, as in the single-variable case, one of the most interesting situations occurs when there is
a quotient with the numerator and denominator both going to zero as x approaches xo . This
situation is called a 0/0 indeterminate form. Other indeterminate forms are possible in the
multivariable case, but since derivatives are always of the 0/0 indeterminate form, this is the
only indeterminate form studied here. We will first consider limits of this form when there is
no single limiting value, implying that the limit does not exist. Later we will discuss computing
limits of this form when a single limiting value does exist.
When the Limit Does Not Exist (DNE): The uniqueness of the limit value as proven in
Theorem 1 has a very important practical consequence: if one appears to obtain distinct values
for a limit as one approaches xo ∈ Rn along two distinct paths, then the limit in fact does not
exist. This sort of thing happens with single-variable limits too—when the value from the left
differs from the value from the right. But in the multivariable case, there are an uncountable
number of ways to approach xo (see Figure 3.5), and for the limit to exist, all of these approach
paths must yield the same limiting value.
Figure 3.5: Three Distinct Paths to $\mathbf{x}_o$. Can you draw another path?
In this case, both the numerator and the denominator go to zero as (x, y) → (0, 0). So this
limit is in 0/0 indeterminate form.
Consider the value as one approaches the origin along the path $y = 0$, $x > 0$, $x \to 0$. Along
this path, the numerator is always zero, while the denominator is positive. So
\[
\lim_{\substack{x \to 0^+ \\ y = 0}} \frac{xy}{x^2 + y^2} = \lim_{x \to 0^+} \frac{x(0)}{x^2 + 0^2} = \lim_{x \to 0^+} \frac{0}{x^2} = \lim_{x \to 0^+} 0 = 0
\]
Now try the path $x = 0$, $y > 0$, $y \to 0$; the same sort of analysis leads one to
\[
\lim_{\substack{y \to 0^+ \\ x = 0}} \frac{xy}{x^2 + y^2} = \lim_{y \to 0^+} \frac{0}{y^2} = 0 .
\]
So at first glance, one might be led to suspect that this limit exists and $L = 0$. But two paths
are surely not all paths. Consider the path $y = x$, $x > 0$, $x \to 0$. Here†
\[
\lim_{\substack{x \to 0^+ \\ y = x}} \frac{xy}{x^2 + y^2} = \lim_{x \to 0^+} \frac{x(x)}{x^2 + x^2} = \lim_{x \to 0^+} \frac{x^2}{2x^2} = \lim_{x \to 0^+} \frac{1}{2} = 1/2
\]
Thus since the value of this limit would depend on the path of approach, the limit itself does
not exist (DNE).
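The path dependence is easy to observe numerically. The short sketch below is only an illustration: it evaluates the quotient xy/(x² + y²) at points approaching the origin along the three paths just considered and, as an added assumption, along the line y = −x as well. The parameter values are arbitrary.

```python
def f(x, y):
    """The quotient xy / (x^2 + y^2), defined away from the origin."""
    return x * y / (x * x + y * y)

# Each path maps a small positive parameter t to a point near the origin.
paths = {
    "y = 0": lambda t: (t, 0.0),
    "x = 0": lambda t: (0.0, t),
    "y = x": lambda t: (t, t),
    "y = -x": lambda t: (t, -t),
}

for name, path in paths.items():
    values = [f(*path(t)) for t in (1e-1, 1e-3, 1e-6)]
    print(f"{name:7s}", values)
```

The values stay constant along each path but differ from path to path, which is exactly why no single limiting value exists.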
Where does the name of this problem come from? A three-dimensional graph (see Figure 3.6)
of the expression in this problem can answer this question, as well as help make clear why this
limit is path-dependent. Imagine two railroad lines travelling on the three-dimensional surface
$z = xy/(x^2 + y^2)$, one on $y = x$ and the other on $y = -x$. Notice that a (two-dimensional,
zero-width) train on the $y = -x$ line will pass under another train on the $y = x$ line as both
cross through $(0, 0)$.
† Keep in mind that one can divide through by $x^2$ exactly because $x \neq 0$.
Figure 3.6: The surface $z = xy/(x^2 + y^2)$ near the origin.
There are, of course, other ways that limits can fail to exist, for example, if f (x) becomes
unbounded or oscillates wildly as x approaches xo . These cases may be familiar from single-
variable calculus, but they will also be discussed in the section on types of discontinuities below.
When the Limit Does Exist: The above example more or less gives the method for showing
that a limit does not exist: find two or more paths of approach along which the limit would
either not exist or take on differing values. Unfortunately it is often more difficult to show that
a limit that exists really does exist. To begin with, it is never possible to show that a limit does
exist by considering two or more paths—it always might be the case that there is still another
path where the limit would approach a distinct value. One can never check every possible path
separately because there are uncountably many of them.
The next two examples present two key principles for evaluating limits that do exist. The first is
to reduce (if possible) a multivariable limit to a single-variable limit. The second is to use polar
coordinates (or spherical coordinates in three or more dimensions). Neither of these techniques
works in all cases, but both are useful for a wide variety of problems.
Again, both the numerator and the denominator go to zero as $(x, y) \to (0, 0)$, so this limit is
also in indeterminate form. Since the numerator is a trig expression while the denominator is
a (very simple) polynomial, there is no way to simplify this quotient. On the other hand, the
quotient is reminiscent of $\sin x / x$ and the limit of this latter quotient is a famous one:
\[
\lim_{x \to 0} \frac{\sin x}{x} = 1
\]
(see almost any single-variable calculus text for a proof of this result). Can knowledge of this
latter limit be used to compute the limit in this example? The answer is “yes,” and to see how
to do this, one must recall that the limit of a product is the product of the limits. This allows
regrouping after multiplying the numerator and the denominator by $y$ provided that $y \neq 0$,
followed by the substitution $u = xy$:
\[
\begin{aligned}
\lim_{\substack{(x,y) \to (0,0) \\ x \neq 0}} \frac{\sin xy}{x}
&= \left( \lim_{\substack{(x,y) \to (0,0) \\ x,\,y \neq 0}} \frac{\sin xy}{xy} \right) \left( \lim_{(x,y) \to (0,0)} y \right) \\
&= \left( \lim_{u \to 0} \frac{\sin u}{u} \right) \left( \lim_{(x,y) \to (0,0)} y \right) \\
&= (1)(0) = 0
\end{aligned}
\]
The substitution $u = xy$ allows one to treat the first limit in the product as a single-variable
limit, making it relatively easy to compute. Notice that writing the limit of a product as the
product of the limits depends on the latter two limits both existing, which fortunately they do
in this example. Also one should note that $y = 0$ is allowed in the original limit, so this case
must be considered too. Again fortunately $\sin xy / x$ is zero when $y = 0$.
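As a sanity check (not a proof, since sampling paths can never establish that a limit exists), one can evaluate the quotient sin(xy)/x at points approaching the origin. The sketch below is an illustration under stated assumptions: the function name `g` and the four sample paths are hypothetical choices, and the comparison with |y| rests on the standard inequality |sin u| ≤ |u|, which is not used in the text above.

```python
import math

def g(x, y):
    """sin(xy)/x, defined wherever x is nonzero."""
    return math.sin(x * y) / x

# A few sample paths into the origin, all with x != 0 for t > 0.
paths = {
    "y = 0": lambda t: (t, 0.0),
    "y = x": lambda t: (t, t),
    "y = sqrt(x)": lambda t: (t, math.sqrt(t)),
    "x = y^2": lambda t: (t * t, t),
}

for name, path in paths.items():
    print(f"{name:12s}", [g(*path(t)) for t in (1e-1, 1e-2, 1e-4)])
```

Along every sampled path the values shrink toward zero, and each value is no larger in size than the corresponding |y|, which is consistent with the limit computed above.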
The computation in the previous example required the following technical proposition:
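• $\displaystyle \lim_{\mathbf{x} \to \mathbf{x}_o} \alpha(\mathbf{x}) f(\mathbf{x}) = \left( \lim_{\mathbf{x} \to \mathbf{x}_o} \alpha(\mathbf{x}) \right) \left( \lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x}) \right)$
• For $m = 1$, $\displaystyle \lim_{\mathbf{x} \to \mathbf{x}_o} \frac{f(\mathbf{x})}{g(\mathbf{x})} = \frac{\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x})}{\lim_{\mathbf{x} \to \mathbf{x}_o} g(\mathbf{x})}$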
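Proof: Let us start with the limit of a sum: Since the limits on the right exist, let
$L_f := \lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x})$ and $L_g := \lim_{\mathbf{x} \to \mathbf{x}_o} g(\mathbf{x})$. Given any $\varepsilon > 0$, pick δ sufficiently small so that when
$0 < d(\mathbf{x}, \mathbf{x}_o) < \delta$, both $d(f(\mathbf{x}), L_f) < \varepsilon/2$ and $d(g(\mathbf{x}), L_g) < \varepsilon/2$ (this is what the definition of limit
gives us). Then by Proposition 3 and the triangle inequality (Proposition 1),
\[
\begin{aligned}
d(f(\mathbf{x}) + g(\mathbf{x}),\ L_f + L_g) &= |(f(\mathbf{x}) + g(\mathbf{x})) - (L_f + L_g)| = |(f(\mathbf{x}) - L_f) - (L_g - g(\mathbf{x}))| \\
&\le |f(\mathbf{x}) - L_f| + |L_g - g(\mathbf{x})| = d(f(\mathbf{x}), L_f) + d(g(\mathbf{x}), L_g) < \varepsilon/2 + \varepsilon/2 = \varepsilon .
\end{aligned}
\]
So, again by the definition of limit, $\lim_{\mathbf{x} \to \mathbf{x}_o} (f(\mathbf{x}) + g(\mathbf{x})) = L_f + L_g = \lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x}) + \lim_{\mathbf{x} \to \mathbf{x}_o} g(\mathbf{x})$.
The proofs of all the other parts of this proposition are left as exercises (see Exercise 3.7).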
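Answer: Notice that since limits of vector expressions are computed componentwise, there
are in fact two separate limits to consider here. The limit for the first component is of 0/0
indeterminate form; for the second component, there actually is no indeterminate form. One
could proceed to compute these limits separately, but the expression in the second component
gives a hint as to what to do with the first component:
\[
\begin{aligned}
\left\langle \frac{\sqrt{3 + xy} - \sqrt{3 - xy}}{3xy},\ \frac{e^{3xy}}{\sqrt{3 + xy} + \sqrt{3 - xy}} \right\rangle
&= \frac{1}{\sqrt{3 + xy} + \sqrt{3 - xy}} \left\langle \frac{(3 + xy) - (3 - xy)}{3xy},\ e^{3xy} \right\rangle \\
&= \frac{1}{\sqrt{3 + xy} + \sqrt{3 - xy}} \left\langle \frac{2}{3},\ e^{3xy} \right\rangle
\end{aligned}
\]
provided that no denominator is zero. The key is that the numerator of the first component and
the denominator of the second are conjugates, so multiplying the numerator and denominator
of the first component by the sum of the square roots allows us to simplify the first component.
So using the second point from Proposition 4 above, we can compute
\[
\begin{aligned}
\lim_{\substack{(x,y) \to (0,0) \\ x,\,y > 0}} \left\langle \frac{\sqrt{3 + xy} - \sqrt{3 - xy}}{3xy},\ \frac{e^{3xy}}{\sqrt{3 + xy} + \sqrt{3 - xy}} \right\rangle
&= \left( \lim_{\substack{(x,y) \to (0,0) \\ x,\,y > 0}} \frac{1}{\sqrt{3 + xy} + \sqrt{3 - xy}} \right)
   \left( \lim_{\substack{(x,y) \to (0,0) \\ x,\,y > 0}} \left\langle \frac{2}{3},\ e^{3xy} \right\rangle \right) \\
&= \frac{1}{2\sqrt{3}} \left\langle \frac{2}{3},\ 1 \right\rangle = \frac{\sqrt{3}}{18} \langle 2,\ 3 \rangle .
\end{aligned}
\]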
Notice that this looks rather like the first example of this section, Example 3.1.2, where the
limit did not exist. Is the story the same here? The answer is “no,” and perhaps the best way
to see this is to use polar coordinates. Let
\[
x = r \cos \theta \qquad y = r \sin \theta
\]
As $r$ goes to zero, the value of θ may vary between 0 and 2π, but both $\cos \theta$ and $\sin \theta$ are
bounded between −1 and 1. Thus the extra power of $r$ in the numerator forces the limit to
zero:
\[
\lim_{(x,y) \to (0,0)} \frac{x^2 y}{x^2 + y^2} = 0
\]
By the way, it is worth noting that polar coordinates could also have been used in the first
example above where the limit DNE:
\[
\lim_{(x,y) \to (0,0)} \frac{xy}{x^2 + y^2} = \lim_{r \to 0} \cos \theta \sin \theta = \cos \theta \sin \theta
\]
since the expression in the limit does not depend on $r$. But since the apparent value of the limit
depends on θ (i.e., the direction of approach), this limit cannot exist.
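For the limit that does exist, the polar substitution can be written out in full. The display below is a sketch of the intermediate step; it uses only $x = r \cos \theta$, $y = r \sin \theta$, and the bound $|\cos^2 \theta \sin \theta| \le 1$, which holds for every θ no matter how θ varies as $r \to 0$:

```latex
\[
\frac{x^2 y}{x^2 + y^2}
  = \frac{r^3 \cos^2\theta \, \sin\theta}{r^2 (\cos^2\theta + \sin^2\theta)}
  = r \cos^2\theta \, \sin\theta ,
\qquad
0 \le \left| r \cos^2\theta \, \sin\theta \right| \le r \to 0
\quad \text{as } r \to 0 .
\]
```

In the quotient $xy/(x^2 + y^2)$, by contrast, the powers of $r$ cancel completely, so nothing forces the expression toward a single value.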
The above examples give some sense of how to compute limits that appear to exist, but they
do not attempt to show how to handle all possible cases where a multivariable limit is in
indeterminate form. Other techniques may be needed. For example, algebraic simplifications
are often helpful in the multivariable case as they are in the single-variable case, and although there
is no multivariable l'Hôpital's rule, one can still use this tool after reducing a multivariable limit
to a single-variable limit as in the second example above.
Putting this limit into standard polar coordinates does not help, and considering different paths
does not help either, since this limit does exist. The key step in simplifying this limit is factoring
the numerator and cancelling the common factor of x − y. After doing this, one finds that this
limit is 2.
3.1.3 Something that Does Not Work
A tempting approach for trying to show that a limit exists is to show that $f(\mathbf{x})$ approaches
the same limiting value $L$ as $\mathbf{x} \to \mathbf{x}_o$ along every line that passes through $\mathbf{x}_o$. Specifically, if
$f : \mathbb{R}^2 \to \mathbb{R}$, and if $\mathbf{x}_o = (0, 0)$, one would consider $y = mx$ for all values of $m$ and show that
\[
\lim_{x \to 0} f(x, mx) = L
\]
independent of $m$. This idea is appealing, but unfortunately it is not sufficient. The limit must
take on the same value as $\mathbf{x} \to \mathbf{x}_o$ in any manner, along any path, not just along lines.
Consider the following counterexample: Suppose $f(x, y) = 0$ for all $(x, y) \in \mathbb{R}^2$ except for
$x^2/2 < y < 2x^2$. For the region bounded by these two parabolas, suppose that $f = 1$. Then the limit
along‡ $y = mx$ is 0 for every $m > 0$, because every such line is above the upper parabola for $x > 0$
sufficiently near zero and below both parabolas for $x < 0$ (see Figure 3.7). On the parabola $y = x^2$
when $x > 0$, however, $f(x, x^2) = 1$. So the limit
\[
\lim_{(x,y) \to (0,0)} f(x, y) \quad \text{DNE} .
\]
Figure 3.7: Why testing along lines is not enough to show that a limit exists. Because the line
$y = mx$ (in red) is above the upper parabola for every $m > 0$ when $x$ is sufficiently close to 0,
the limit of $f$ is 0 along any of these lines. But the limit is 1 along $y = x^2$, so in general, the
limit DNE.
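The counterexample is simple enough to try on a computer. The following sketch is only an illustration: the slopes and sample points are arbitrary, and the function name `f` simply mirrors the text. It evaluates f along a few lines y = mx and along the parabola y = x² at points approaching the origin.

```python
def f(x, y):
    """1 strictly between the parabolas y = x^2/2 and y = 2x^2, and 0 elsewhere."""
    return 1.0 if x * x / 2 < y < 2 * x * x else 0.0

xs = (1e-1, 1e-2, 1e-3, 1e-4)

# Along each line the values are eventually 0 once x is small enough ...
for m in (0.01, 1.0, 5.0):
    print(f"along y = {m}x:", [f(x, m * x) for x in xs])

# ... but along the parabola y = x^2 the value is always 1.
print("along y = x^2:", [f(x, x * x) for x in xs])
```

For a shallow line such as y = 0.01x, the line may still pass through the region between the parabolas before x becomes small enough; only the values for x sufficiently close to 0 matter for the limit along that line.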
‡ One would also have to consider the line $x = 0$ separately.
3.2 Continuity in Rn
Now we turn our attention to the concept of continuity, a concept closely related to that of limit.
In fact, at first glance, the definitions look the same. Can you spot the two differences?
Remarks:
1. This definition is just like the definition of the limit, except that here d(x, xo ) = 0 is
possible, so we must now consider x = xo and thus f (xo ) must be defined. In addition
now the value of the limit must be the value of the function: L = f (xo ). This is all
exactly the same as the single-variable case.
Figure 3.8: Single-Variable Continuity. This diagram tells the story in both the single-variable
and multivariable cases. Where is $L$ in this case? Where are δ and ε?
2. If $\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x})$ DNE, then $f$ cannot possibly be continuous at $\mathbf{x}_o$, and there is no need
for any further consideration. So since the limit DNE in the first example in the previous
section, there is no need to consider further whether or not the railroad underpass function
is continuous at the origin. On the other hand, if $\lim_{\mathbf{x} \to \mathbf{x}_o} f(\mathbf{x})$ exists and has the value $L$,
then $f$ can be made continuous at $\mathbf{x}_o$ (if it is not already) by changing the value of $f(\mathbf{x}_o)$
to $L$.
That is, continuity is equivalent to being able to exchange the order of taking the limit
and evaluating the function.
Since this is indeed the value that one finds by plugging (1, −π) into x cos(y) + y, the function
f (x, y) = x cos(y) + y is continuous at (1, −π). Unfortunately many limit/continuity problems
are not as easy as this.
\[
\lim_{\substack{(x,y) \to (0,0) \\ x \neq 0}} \frac{\sin xy}{x} = 0
\]
but this is, in itself, not enough to make $\dfrac{\sin xy}{x}$ continuous. The problem is that as written,
$\dfrac{\sin xy}{x}$ is undefined at the origin (do you see why?). But since the limit exists, it is possible to
overcome this problem by extending the domain to include the origin. Thus if one defines
\[
f(x, y) = \begin{cases} \dfrac{\sin xy}{x} & x \neq 0 \\[1ex] y & x = 0 \end{cases}
\]
then $f$ is continuous at the origin.
The next proposition is really just a restatement of Proposition 4 where now the value of each
limit is the value of the function at x = xo .
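Proposition 5 For some open domain $D \subset \mathbb{R}^n$, suppose that $f, g : D \to \mathbb{R}^m$ are both continuous
at $\mathbf{x}_o \in D$, and suppose that $\alpha : D \to \mathbb{R}$ is a scalar-valued multivariable function that is
also continuous at $\mathbf{x}_o$. Provided that there is no division by zero, then:
• $(\alpha f)(\mathbf{x}) := \alpha(\mathbf{x}) f(\mathbf{x})$ is continuous at $\mathbf{x}_o$.
• For $m = 1$, $\left( \dfrac{f}{g} \right)(\mathbf{x}) = \dfrac{f(\mathbf{x})}{g(\mathbf{x})}$ is continuous at $\mathbf{x}_o$.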
Remark: Proposition 5 may seem formulaic, even trivial, but it has many important
consequences. For example, it means that any polynomial or vector with polynomial components
must be continuous no matter how many variables are involved.
Thus the only point not in the domain of f is (x, y) = (1, 2). If the numerator is nonzero
at (x, y) = (1, 2), then there is no way to extend the definition of f at this point to make it
continuous. For this f , however, the numerator is in fact zero at (x, y) = (1, 2), so
$$\lim_{(x,y)\to(1,2)} f(x,y)$$
To determine how f behaves near (x, y) = (1, 2), it is helpful to transform the domain variables;
define
x̃ := x − 1 ỹ := y − 2 .
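Substituting $x = \tilde{x} + 1$ and $y = \tilde{y} + 2$ into f , one finds that
$$f(\tilde{x}+1, \tilde{y}+2) = \frac{\tilde{x}\tilde{y}^2 + \tilde{x}^2 + \tilde{y}^2}{\tilde{x}^2 + \tilde{y}^2} = \frac{\tilde{x}\tilde{y}^2}{\tilde{x}^2 + \tilde{y}^2} + 1,$$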
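and thus, using polar coordinates $\tilde{x} = \tilde{r}\cos\tilde{\theta}$ and $\tilde{y} = \tilde{r}\sin\tilde{\theta}$,
$$\lim_{(x,y)\to(1,2)} f(x,y) = \lim_{(\tilde{x},\tilde{y})\to(0,0)} f(\tilde{x}+1, \tilde{y}+2) = \lim_{\tilde{r}\to 0} \left(\tilde{r}\cos\tilde{\theta}\sin^2\tilde{\theta} + 1\right) = 1 .$$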
The final value of this limit is obtained by introducing polar coordinates as in Example 3.6.
Hence this function can be extended to be continuous for all of R2 :
$$f(x,y) = \begin{cases} \dfrac{xy^2 + x^2 - 4xy + 2x + 1}{x^2 + y^2 - 2x - 4y + 5} & (x,y) \neq (1,2) \\[1ex] 1 & (x,y) = (1,2) \end{cases}$$
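The continuity of this extension at (1, 2) can also be probed numerically. The following is only a sketch, not part of the argument above: the helper name `max_gap` and the choice of sample circles are assumptions made for illustration. It samples the extended function on small circles about (1, 2) and reports how far the values stray from 1.

```python
import math


def f(x, y):
    """The rational expression away from (1, 2), and the value 1 at (1, 2)."""
    if x == 1.0 and y == 2.0:
        return 1.0
    numerator = x * y**2 + x**2 - 4 * x * y + 2 * x + 1
    denominator = x**2 + y**2 - 2 * x - 4 * y + 5
    return numerator / denominator


def max_gap(radius, samples=360):
    """Largest |f - f(1, 2)| sampled on a circle of the given radius about (1, 2)."""
    worst = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x = 1.0 + radius * math.cos(theta)
        y = 2.0 + radius * math.sin(theta)
        worst = max(worst, abs(f(x, y) - f(1.0, 2.0)))
    return worst


for radius in (1e-1, 1e-2, 1e-3):
    # The gap should shrink roughly in proportion to the radius.
    print(radius, max_gap(radius))
```

The shrinking gap is consistent with the polar form of the shifted expression, whose distance from 1 is at most the radius. A numerical sample like this suggests continuity but does not prove it.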
For real-valued multivariable functions, there are four main types of discontinuities: removable,
jump, pole and essential. It is possible to combine these to obtain a discontinuity that is not of any
one of these types, but nonetheless these four are important to know.
Examples of removable discontinuities are given in Examples 3.9 and 3.10. There the function or
expression is defined and continuous except at a single point or along a curve. But in addition,
limits approaching this point or curve exist and are all equal, and one can use the values of
these limits to “remove” the discontinuity and extend the definition of the function or expression
continuously.
Jump discontinuities, in contrast, cannot be removed. These exist typically along a curve, and
while the limits exist from both sides up to this curve, the limiting values are different on the
two sides of the curve. Such a function was given in the example in Section 3.1.3; there the
function takes on the value 1 on one side of the parabolas and 0 on the other side.
In contrast to jumps, poles typically occur at points, and more importantly, the function or
expression becomes unbounded as this point is approached, i.e., the function or expression
diverges to ±∞ as the point is approached. A simple example of a pole is
$$f(x,y) = \frac{1}{x^2 + y^2}$$
which is defined on all of R2 except at the origin. As the origin is approached, this f diverges
to +∞.
The final discontinuity type, an essential discontinuity, is characterized by wild oscillations near
a point or curve. Perhaps the best example of an essential discontinuity is the function
$$f(x,y) = \sin\left(\frac{1}{\sqrt{x^2 + y^2}}\right)$$
near (x, y) = (0, 0). This f has all of the oscillation of sin(r) for r ∈ (1, +∞) compressed into
the circular disk of radius 1 centred on the origin.
3.3 The Derivative in Rn
Everyone who has taken a basic calculus course should be familiar with the following definition:
$$\lim_{x \to x_o} \frac{f(x) - f(x_o)}{x - x_o} .$$
When this limit exists, it is called the derivative and denoted by $f'(x_o) \equiv \dfrac{df}{dx}(x_o)$.
This definition for the derivative in R is central to everything in single-variable calculus, but
unfortunately, it is not easy to extend directly to the multivariable case where n > 1, i.e., the
derivative in Rn . To reach a useful multivariable definition for the derivative, it is useful to
think geometrically.
In single-variable calculus, if $f'(x_o)$ exists, then the graph y = f (x) must have a unique
nonvertical tangent line at xo . This means both that f is continuous at xo and that y = f (x) has
no jumps or corners at xo (see Figure 3.9). This attribute of the single-variable derivative is
in fact much more helpful in defining the multivariable derivative. To make the presentation
easier to visualize, suppose n = 2 and m = 1.
Figure 3.9: The graphs of three functions, f , g and h, each plotted near xo . Which functions are
continuous at xo ? Which functions are differentiable at xo ?
Remarks:
1. The above definition easily generalizes to other values of m and n, but it is then no longer
possible to easily represent the tangent “plane” on a two-dimensional page.
Figure 3.10: The tangent plane for the surface $z = 4 - x^2 - y^2$ at the point $(\sqrt{2}/2, \sqrt{2}/2, 3)$.
What is the equation for this plane (see §3.2)?
2. The above definition is intuitively appealing, but it suffers from one serious mathematical
flaw: up to this point we have not precisely defined what “a unique nonvertical tangent
plane” is. Perhaps surprisingly, this definition is not too difficult to make, but does require
some new concepts that we turn our attention to now.
Before we can meaningfully describe what it means to have a unique nonvertical tangent plane
and the entire derivative, we must define and study an important part of the story: the partial
derivative.
Remark: This definition explicitly contains the key idea behind the partial derivative: that
all of the variables other than the one mentioned in the derivative notation are held constant.
For those familiar with American football, this is similar to how the offense must be set just
before a play begins: all players except one must be stationary; only one player is allowed to
be in motion. In the partial derivative, ξ is this moving player.
Notation: There may be no concept in mathematics with more notation than the partial
derivative. If f : R2 → R is a real-valued function given by z = f (x, y), then the following are all
equivalent:
$$\frac{\partial f}{\partial x} \equiv \frac{\partial z}{\partial x} \equiv f_x \equiv z_x \equiv \partial_x f \equiv \partial_x z .$$
The fractional notation is an extension of the standard Leibniz derivative notation $\frac{df}{dx}$, and
the subscript notation is an extension of the standard Newton prime (or dot) notation $f'$.
The symbol ‘∂’ is a Cyrillic script ‘d’ and is perhaps the only Cyrillic letter widely used in
mathematics today.
$$= 3y^3 \sin z \, \lim_{\xi \to x} \frac{(\xi - x)(\xi + x)}{\xi - x}$$
Notice that once the factors involving y and z are factored out, what remains is just a single-
variable derivative—something that should be familiar from previous study. The two other
partial derivatives, fy and fz can be computed in the same way, and one finds that they too
are just what one gets by holding the other two variables constant and carrying out a standard
single-variable differentiation with respect to the variable in question, either y or z. Indeed
because of this, all of the standard rules of calculus (product rule, chain rule, etc.) apply
in the usual way, and we can apply these rules to compute partial derivatives, rather than
always computing the above limit. Still it should be kept in mind that in any situation where
the standard rules do not apply (for example, if one wants to compute a partial derivative of
f (x, y) = xy|xy| at (x, y) = (0, 0)), one must fall back to computing this limit. But either by
explicitly computing the limit, or by applying the standard rules, one can find that
$$\frac{\partial f}{\partial y}(x,y,z) = 9x^2 y^2 \sin z$$
and that
$$\frac{\partial f}{\partial z}(x,y,z) = 3x^2 y^3 \cos z .$$
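As a rough numerical check of these formulas, one can approximate each partial derivative by a difference quotient in which only one variable moves. The sketch below assumes that the function of this example is f (x, y, z) = 3x²y³ sin z (inferred from the partial derivatives stated above), and it uses a central difference rather than the one-sided quotient in the definition. The helper name `partial` and the sample point are illustrative choices.

```python
import math


def f(x, y, z):
    # Assumed form of the example function, inferred from the stated partials.
    return 3 * x**2 * y**3 * math.sin(z)


def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative in slot `index`.

    Only the coordinate in that slot is moved; all others are held constant.
    """
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (func(*forward) - func(*backward)) / (2 * h)


point = (1.5, -0.5, 0.7)
x, y, z = point

# Each pair below should agree to several decimal places.
print(partial(f, point, 0), 6 * x * y**3 * math.sin(z))
print(partial(f, point, 1), 9 * x**2 * y**2 * math.sin(z))
print(partial(f, point, 2), 3 * x**2 * y**3 * math.cos(z))
```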
3.3.2 Higher Partial Derivatives
$$\frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)$$
again provided all of the limits implicit in this definition exist. In simple words, this is just the
second partial with respect to y and it just means to differentiate with respect to y twice.
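$$\frac{\partial^2 f}{\partial x \, \partial t} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial t}\right) = -2\csc^2 x = \frac{\partial^2 f}{\partial t \, \partial x}$$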
Notation: There is a bit of an ambiguity here: for subscript notation, when the order matters,
which order does one use? In this publication (and in most, but not all, others), the convention is
$$f_{xy} \equiv \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) .$$
If it matters, the reader needs to be certain that a given publication follows this standard.
Example 3.3.2 leads to an obvious question: Are mixed partial derivatives always the same no
matter which order the various derivatives are taken? It turns out that, perhaps surprisingly,
the answer is no unless there is an important additional hypothesis:
$$\frac{\partial^2 f}{\partial x \, \partial y}(x,y) = \frac{\partial^2 f}{\partial y \, \partial x}(x,y)$$
The proof of this theorem is not particularly difficult, but it is also not particularly helpful, so
it is not included here; see [5, pp. 235-236].
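To see that some additional hypothesis really is needed, it may help to look at a standard counterexample that is not taken from this text: f (x, y) = xy(x² − y²)/(x² + y²) with f (0, 0) = 0. For this function the two mixed partial derivatives at the origin differ. The sketch below estimates them with nested difference quotients; the step sizes h and k are illustrative assumptions, with the inner step much smaller than the outer one.

```python
def f(x, y):
    """A standard counterexample (not from this text), set to 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def f_x(x, y, h=1e-8):
    """Central-difference estimate of the first partial with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def f_y(x, y, h=1e-8):
    """Central-difference estimate of the first partial with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


k = 1e-3  # outer step, much larger than the inner step h

# Following the convention f_xy = d/dy (df/dx): differentiate f_x in y at the origin.
f_xy = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)
# Likewise f_yx = d/dx (df/dy): differentiate f_y in x at the origin.
f_yx = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)

print(f_xy, f_yx)  # the two estimates have opposite signs
```

Here the first partials work out to f_x(0, y) = −y and f_y(x, 0) = x, so the mixed partials at the origin are −1 and +1. The second partials of this function are not continuous at the origin.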
3.3.3 Tangent Planes and Unique Tangent Planes
Suppose we are given a continuous surface z = f (x, y) in R3 . Can we find the equation for the
tangent plane to this surface at some point (xo , yo , zo ) = (xo , yo , f (xo , yo )) on the surface? For
example, can we find the equation for the tangent plane to the surface $z = 4 - x^2 - y^2$ at the
point $(\sqrt{2}/2, \sqrt{2}/2, 3)$, as shown in Figure 3.10?
Ax + By + Cz + D = 0
Any plane in R3 is defined by this equation for some choice of the constants A, B, C and D.
If C = 0, then the plane is vertical, i.e., perpendicular to the x, y-plane, hence not the sort of
plane we are looking for here. So assume that C ≠ 0; one can then divide through by C and
solve for z. The resulting equation has the form
z = d + ax + by
where a := −A/C, b := −B/C and d := −D/C. The question now is how do we choose a, b
and d so that this becomes the equation of the tangent plane for the surface z = f (x, y) at the
point (xo , yo , f (xo , yo )).
Figure 3.11: The partial derivative fx at the point (xo , yo ) is the slope of the tangent line with
yo fixed. The surface is z = f (x, y). The curve z = f (x, yo ) (in red) is the intersection of this
surface and the plane y = yo . The point of tangency (xo , yo , zo ) is in blue, while the tangent
line, with slope fx (xo , yo ), is in green.
First let us consider what a must be. Consider the curve lying on the surface z = f (x, y) passing
through the point (xo , yo , f (xo , yo )) with y fixed at y = yo (cf. Figure 3.11). The equations for
this curve are z = f (x, yo ), y = yo , and, of course, the slope of this curve is the derivative at
(xo , yo ): zx (xo , yo ). So the line tangent to this curve (and therefore tangent to the surface) is
the line passing through this point with slope zx (xo , yo ) (again cf. Figure 3.11). Since a is the
slope of the line z = d + ax + byo (recall that y = yo is here fixed),
$$a = z_x(x_o, y_o) \equiv \frac{\partial z}{\partial x}(x_o, y_o) .$$
By similar reasoning, one finds that b = zy (xo , yo ). To find d, notice that the tangent plane
must also pass through the point (xo , yo , f (xo , yo )). Thus
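$$z = f(x_o, y_o) = d + a x_o + b y_o = d + \frac{\partial z}{\partial x}(x_o, y_o)\, x_o + \frac{\partial z}{\partial y}(x_o, y_o)\, y_o$$
or
$$d = f(x_o, y_o) - \frac{\partial z}{\partial x}(x_o, y_o)\, x_o - \frac{\partial z}{\partial y}(x_o, y_o)\, y_o .$$
Using these values for a, b and d, the equation for the tangent plane is
$$\begin{aligned}
z &= d + ax + by \\
  &= f(x_o, y_o) - \frac{\partial z}{\partial x}(x_o, y_o)\, x_o - \frac{\partial z}{\partial y}(x_o, y_o)\, y_o + \frac{\partial z}{\partial x}(x_o, y_o)\, x + \frac{\partial z}{\partial y}(x_o, y_o)\, y \\
  &= f(x_o, y_o) + \frac{\partial z}{\partial x}(x_o, y_o)(x - x_o) + \frac{\partial z}{\partial y}(x_o, y_o)(y - y_o) .
\end{aligned}$$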
Example 3.13: Find the tangent plane to the surface $z = 4 - x^2 + y^3/3$ at the point (1, 3, 12).
In this case, $z_x = -2x$ and $z_y = y^2$. So $z_x(1, 3) = -2$ while $z_y(1, 3) = 9$, which implies that the
tangent plane to $z = 4 - x^2 + y^3/3$ at (1, 3, 12) is
$$\begin{aligned}
z &= 12 - 2(x - 1) + 9(y - 3) \\
  &= -2x + 9y - 13 .
\end{aligned}$$
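The recipe for a, b and d can be sketched in a few lines of code. This is an illustration only: the function name `tangent_plane` is hypothetical, and the partial derivatives are estimated by central differences rather than computed exactly.

```python
def f(x, y):
    """The surface of Example 3.13."""
    return 4 - x**2 + y**3 / 3


def tangent_plane(func, xo, yo, h=1e-6):
    """Return (a, b, d) for the candidate tangent plane z = d + a*x + b*y.

    a and b are central-difference estimates of the partial derivatives at
    (xo, yo); d is chosen so that the plane passes through (xo, yo, func(xo, yo)).
    """
    a = (func(xo + h, yo) - func(xo - h, yo)) / (2 * h)
    b = (func(xo, yo + h) - func(xo, yo - h)) / (2 * h)
    d = func(xo, yo) - a * xo - b * yo
    return a, b, d


a, b, d = tangent_plane(f, 1.0, 3.0)
print(a, b, d)  # approximately -2, 9 and -13
```

Note that this computes only the coefficients of the one plane that could be tangent. It says nothing about whether a tangent plane actually exists.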
The equation presented above defines the unique nonvertical tangent plane for a surface z =
f (x, y), provided the surface actually has a tangent plane. That is, no other equation can give
the tangent plane. But one might ask if there are any cases where all of the terms in this
equation can be computed, but the surface does not actually have a tangent plane. Could a
tangent plane not exist even though it would be unique if it did? The answer is “Yes” and the
following example presents such a case.
This surface is shown graphically in Figure 3.12. Does this surface have a tangent plane at the
origin (0, 0, 0)? The answer must be no, since the surface is not even continuous there.
Nonetheless, since f (x, y) ≡ 0 on both the x and y axes, each term in the above equation can be
found, and indeed, a = 0, b = 0 and d = 0. So each of the partial derivatives can be found even
though this surface has no tangent plane at the origin.
Figure 3.12: A surface with no tangent plane, but where the equation for the tangent plane
can be computed. The surface is z = f (x, y) = 0 everywhere, except in the wedge between the
two rays y = 2x and y = x/2 in the first quadrant. In this wedge the surface varies smoothly,
rising to a maximum value of f = 1 on the dashed line. What is the equation of the dashed line?
The final example of the previous section gets at a sort of mathematical paradox: it is possible
to compute the equation of a plane that is the only possible tangent plane, even though no
tangent plane exists. In other words, uniqueness does not imply existence. How can one be
certain that a tangent plane exists?
To answer this question, we must observe what works in the basic example above, but fails in
the final example. The difference is that the slopes of tangent lines to curves in the first surface
that pass near (xo , yo , f (xo , yo )) are all close to the slope of the tangent plane, while in the final
example, the slopes of some curves passing near the origin are far from the zero slope of the
plane z = 0. So in the first case, the tangent plane is a good approximation to the surface, both
in the sense that points of the surface are close to the plane and in the sense that the slopes
are close, while in the latter case, the plane is not at all a good approximation. The following
definition makes this general idea mathematically exact:
Definition: For some open domain D ⊂ R2 , suppose that f : D → R, and let (xo , yo ) ∈ D.
Then f has a unique nonvertical tangent plane at (xo , yo , f (xo , yo )) provided that fx (xo , yo ) and
fy (xo , yo ) both exist and that
$$\lim_{(x,y)\to(x_o,y_o)} \frac{f(x,y) - \left[ f(x_o,y_o) + \frac{\partial f}{\partial x}(x_o,y_o)(x - x_o) + \frac{\partial f}{\partial y}(x_o,y_o)(y - y_o) \right]}{d((x,y),(x_o,y_o))} = 0 .$$
Notice that the quotient inside the limit is exactly the difference between the function and the only
possible tangent plane, all divided by the distance between the base point (xo , yo ) and the point
(x, y) where the function and the plane are being evaluated.
Example 3.15: Show that the function defined by $f(x,y) = 4 - x^2 + y^3/3$ has a unique
nonvertical tangent plane at (1, 3, 12).
Answer: From Example 3.13 above, the only possible tangent plane is z = −2x + 9y − 13. So
by the definition of unique nonvertical tangent plane, the limit that must be zero is
$$\lim_{(x,y)\to(1,3)} \frac{4 - x^2 + y^3/3 - [-2x + 9y - 13]}{\sqrt{(x-1)^2 + (y-3)^2}}$$
The expression inside the limit can be written in terms of shifted polar coordinates: define
x̃ := x − 1 and ỹ := y − 3, and convert (x̃, ỹ) to polar coordinates using x̃ = r̃ cos θ̃ and
ỹ = r̃ sin θ̃. This shift is similar to the one in Example 3.10 above. Notice that this choice of
coordinates is dictated by the distance expression in the denominator; this choice reduces the
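denominator to just $\tilde{r}$:
$$\frac{4 - x^2 + y^3/3 - [-2x + 9y - 13]}{\sqrt{(x-1)^2 + (y-3)^2}} = \frac{\tilde{y}^3 + 9\tilde{y}^2 - 3\tilde{x}^2}{3\sqrt{\tilde{x}^2 + \tilde{y}^2}}$$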
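Thus
$$\lim_{(x,y)\to(1,3)} \frac{4 - x^2 + y^3/3 - [-2x + 9y - 13]}{\sqrt{(x-1)^2 + (y-3)^2}} = \lim_{\tilde{r}\to 0} \left( \frac{1}{3}\tilde{r}^2 \sin^3\tilde{\theta} + 3\tilde{r}\sin^2\tilde{\theta} - \tilde{r}\cos^2\tilde{\theta} \right) = 0$$
since $\sin\tilde{\theta}$ and $\cos\tilde{\theta}$ are bounded between −1 and +1. So in this example, f has a unique
nonvertical tangent plane at (1, 3, 12).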
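The behaviour of this quotient can be observed numerically as well. The sketch below is an illustration rather than a proof: the names `quotient` and `worst_quotient` and the sampling of 360 directions are assumptions. It evaluates the quotient from the definition on circles of shrinking radius about (1, 3).

```python
import math


def f(x, y):
    """The surface of Example 3.15."""
    return 4 - x**2 + y**3 / 3


def plane(x, y):
    """The only possible tangent plane at (1, 3, 12), from Example 3.13."""
    return -2 * x + 9 * y - 13


def quotient(r, theta):
    """(f - plane) / distance at the point a distance r from (1, 3) in direction theta."""
    x = 1 + r * math.cos(theta)
    y = 3 + r * math.sin(theta)
    return (f(x, y) - plane(x, y)) / r


def worst_quotient(r, samples=360):
    """Largest |quotient| over equally spaced directions on the circle of radius r."""
    return max(abs(quotient(r, 2 * math.pi * k / samples)) for k in range(samples))


for r in (1e-1, 1e-2, 1e-3):
    # The polar form above bounds the quotient by 3r + r**2 / 3.
    print(r, worst_quotient(r))
```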
3.3.5 Multivariable Derivative
Everything discussed above deals with when a function has a tangent plane and when it is
differentiable. But we have not yet stated what the derivative is in the multivariable setting—
we need one more definition:
The derivative has various meanings in various settings. Among the most important are the
gradient (when f : Rn → R) and the derivative of a vector function (when f : R → Rm ) that
was discussed in the previous chapter. But in general (when f : Rn → Rm ) if f is differentiable,
then the Jacobian matrix gives the linear approximation of f near xo :
$$f(x) \simeq f(x_o) + J(f)(x_o) * (x - x_o)$$
where ∗ denotes matrix/vector multiplication and ≃ should be read “approximately equal to”
in a way that, for the moment, we will not make exact.
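Answer: Here
$$f(x_o) = \begin{pmatrix} -3 \\ 9 \end{pmatrix}$$
and
$$J(f)(x_o) = \left. \begin{pmatrix} 2xy\cos(z\pi) & x^2\cos(z\pi) & -\pi x^2 y \sin(z\pi) \\ 3yz^2 & 3xz^2 & 6xyz \end{pmatrix} \right|_{(x_o,y_o,z_o)=(1,3,-1)} = \begin{pmatrix} -6 & -1 & 0 \\ 9 & 3 & -18 \end{pmatrix} .$$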
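So the linear approximation in this case is
$$f(x) = f(x,y,z) \simeq \begin{pmatrix} -3 \\ 9 \end{pmatrix} + \begin{pmatrix} -6 & -1 & 0 \\ 9 & 3 & -18 \end{pmatrix} \begin{pmatrix} x - 1 \\ y - 3 \\ z + 1 \end{pmatrix}$$
or just
$$f(x,y,z) \simeq \begin{pmatrix} -6x - y + 6 \\ 9x + 3y - 18z - 27 \end{pmatrix} \quad \text{near } (1, 3, -1) .$$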
This linearization is two hyperplanes, one in each of the two dimensions of the range. These are
hyperplanes rather than just planes in that each is a three-dimensional affine-linear subspace
of four-dimensional space.
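A small sketch can make the role of the Jacobian matrix concrete. It assumes that the function of this example is f (x, y, z) = ⟨x²y cos(zπ), 3xyz²⟩, which is inferred from the rows of the Jacobian above. The helper names `jacobian` and `linearize` are hypothetical, and the entries of the matrix are estimated by central differences.

```python
import math


def f(x, y, z):
    # Assumed component functions, inferred from the rows of the Jacobian above.
    return (x**2 * y * math.cos(math.pi * z), 3 * x * y * z**2)


def jacobian(func, point, h=1e-6):
    """Central-difference estimate of the Jacobian: row i is component i, column j is variable j."""
    rows = len(func(*point))
    cols = len(point)
    matrix = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        forward = list(point)
        backward = list(point)
        forward[j] += h
        backward[j] -= h
        high = func(*forward)
        low = func(*backward)
        for i in range(rows):
            matrix[i][j] = (high[i] - low[i]) / (2 * h)
    return matrix


def linearize(func, base):
    """Return the map x -> f(base) + J(f)(base) * (x - base)."""
    value = func(*base)
    matrix = jacobian(func, base)

    def approximation(*point):
        return tuple(
            value[i] + sum(matrix[i][j] * (point[j] - base[j]) for j in range(len(base)))
            for i in range(len(value))
        )

    return approximation


base = (1.0, 3.0, -1.0)
J = jacobian(f, base)
approx = linearize(f, base)
nearby = (1.01, 3.02, -0.99)
print(J)                           # close to [[-6, -1, 0], [9, 3, -18]]
print(f(*nearby), approx(*nearby))  # the two should be close near the base point
```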
3.4 The Chain Rule in Rn
At some point in single-variable calculus, students likely learn to remember the chain rule as
$$\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}$$
with the admonition not to think of this as cancellation of the dx “factors” on the right hand
side. Why cancellation is not the correct view becomes clear when one studies the multivariable
chain rule.
Suppose that f, g, h : R2 → R are three differentiable functions with w = f (x, y), x = g(s, t)
and y = h(s, t). Then the chain rule implies that
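$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}$$
and also
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} .$$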
The pattern for these chain rules is perhaps best remembered using the diagram in Figure 3.13.
There, derivatives of the top variable w with respect to either of the bottom variables s or t
can be obtained by considering all of the paths through the middle variables x and y.
Figure 3.13: Dependency diagram for this first example, with w at the top, x and y in the
middle, and s and t at the bottom. Notice that there are two paths from
the top variable w to, for example, s at the bottom, one through x and the other through y.
Example 3.17: Suppose that $w = f(x,y) = e^x \cos y$, $x = g(s,t) = 2s + 5t$ and $y = h(s,t) = 3s - 7t$.
Then substituting g and h into f , one finds that $w = e^{2s+5t}\cos(3s - 7t)$, and computing
∂w/∂s directly (recall that t is by definition a constant for this computation), one finds that
$$\frac{\partial w}{\partial s} = 2e^{2s+5t}\cos(3s - 7t) - 3e^{2s+5t}\sin(3s - 7t)$$
which of course is exactly the first expression above when all x and y are written in terms of s
and t. Similarly
$$\frac{\partial w}{\partial t} = 5e^{2s+5t}\cos(3s - 7t) + 7e^{2s+5t}\sin(3s - 7t)$$
which is the second expression.
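The agreement between the two computations in Example 3.17 can be checked numerically at a sample point. In the sketch below, the names `chain_rule` and `numeric` and the sample point are illustrative assumptions. The first function assembles the two chain rule sums from the partial derivatives of f , g and h, and the second differentiates the substituted expression by central differences.

```python
import math


def w_direct(s, t):
    """w written directly in terms of s and t after substituting g and h into f."""
    return math.exp(2 * s + 5 * t) * math.cos(3 * s - 7 * t)


def chain_rule(s, t):
    """(dw/ds, dw/dt) assembled as a sum over the paths through x and y."""
    x, y = 2 * s + 5 * t, 3 * s - 7 * t
    w_x = math.exp(x) * math.cos(y)    # partial of f with respect to x
    w_y = -math.exp(x) * math.sin(y)   # partial of f with respect to y
    x_s, x_t = 2.0, 5.0                # partials of g
    y_s, y_t = 3.0, -7.0               # partials of h
    return (w_x * x_s + w_y * y_s, w_x * x_t + w_y * y_t)


def numeric(s, t, h=1e-6):
    """Central-difference estimates of (dw/ds, dw/dt) from the substituted expression."""
    w_s = (w_direct(s + h, t) - w_direct(s - h, t)) / (2 * h)
    w_t = (w_direct(s, t + h) - w_direct(s, t - h)) / (2 * h)
    return (w_s, w_t)


print(chain_rule(0.2, -0.1))
print(numeric(0.2, -0.1))  # should agree with the line above to several digits
```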
Of course an example does not prove that the expressions above are valid in general; a theorem
is needed along with a clear proof:
Theorem 3 For some open domain U ⊂ R2 , suppose that f : U → R, and for some open
domain V ⊂ R2 , suppose that g : V → R2 with g(V ) ⊂ U . Suppose that (so , to ) ∈ V , that
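xo = g(so , to ) ∈ U , and that $g(s,t) = \langle g_1(s,t), g_2(s,t) \rangle$ when written componentwise. If f is
differentiable at xo , and g is differentiable at (so , to ), then $(f \circ g)(s,t) := f(g_1(s,t), g_2(s,t))$ is
differentiable at (so , to ), and if $w = (f \circ g)(s,t)$ with $x = g_1(s,t)$ and $y = g_2(s,t)$, then
$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}$$
and
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t}$$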
where all of the partial derivatives are evaluated at xo and/or (so , to ). With gradients written
as column vectors, this result can also be written as
$$\nabla(f \circ g)(s_o, t_o) = J(g)(s_o, t_o)^{T} * \nabla f(x_o, y_o)$$
where $J(g)(s_o, t_o)$ is the Jacobian matrix (see Section 3.3) for the vector field g evaluated at (so , to ),
and the superscript T denotes the transpose.
The proof given here assumes that the partial derivatives are continuous; this assumption is not
necessary for the theorem, but it does simplify the presentation and leads to a more transparent
proof using the mean-value theorem. For a proof not using this assumption, see [5, pp. 214-215].
Proof: First consider ∂w/∂s; from the definition of partial derivative, the variable t is held
constant at to when this partial is computed. So using this definition, one finds that
$$\begin{aligned}
\frac{\partial w}{\partial s} &= \frac{\partial}{\partial s}\Big[ f(x_1(s,t), x_2(s,t)) \Big]_{(s_o,t_o)} \\
&= \lim_{s \to s_o} \left[ \frac{f(x_1(s,t_o), x_2(s,t_o)) - f(x_1(s_o,t_o), x_2(s,t_o))}{s - s_o} + \frac{f(x_1(s_o,t_o), x_2(s,t_o)) - f(x_1(s_o,t_o), x_2(s_o,t_o))}{s - s_o} \right]
\end{aligned}$$
where $f(x_1(s_o,t_o), x_2(s,t_o))$ was subtracted and added in the numerator. Next, we apply the
mean value theorem from single-variable calculus to both fractions in this sum. Recall that if
F is a differentiable function of a single variable x, then $F(x) - F(x_o) = F'(c)(x - x_o)$ for some
c between x and xo . The same result works for multivariable functions, if all but one variable
is fixed:
$$\frac{\partial w}{\partial s} = \lim_{s \to s_o} \left[ \frac{\partial f}{\partial x_1}(c_1, x_2(s,t_o)) \, \frac{x_1(s,t_o) - x_1(s_o,t_o)}{s - s_o} + \frac{\partial f}{\partial x_2}(x_1(s_o,t_o), c_2) \, \frac{x_2(s,t_o) - x_2(s_o,t_o)}{s - s_o} \right]$$
where for k = 1, 2, the value $c_k$ is somewhere between $x_k(s_o,t_o)$ and $x_k(s,t_o)$. Finally, using
the continuity of the two partial derivatives, one can take the limit through the sum:
$$\begin{aligned}
\frac{\partial w}{\partial s} &= \frac{\partial f}{\partial x_1}(x_1(s_o,t_o), x_2(s_o,t_o)) \lim_{s \to s_o} \frac{x_1(s,t_o) - x_1(s_o,t_o)}{s - s_o} + \frac{\partial f}{\partial x_2}(x_1(s_o,t_o), x_2(s_o,t_o)) \lim_{s \to s_o} \frac{x_2(s,t_o) - x_2(s_o,t_o)}{s - s_o} \\
&= \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}
\end{aligned}$$
The rest of this section expands on the basic multivariable chain rule, presenting by example
several interesting extensions.
Example 3.18: Suppose w = f (u, v), u = g(x, y), v = h(x, y, z), x = φ(r, s), y = ξ(r, t) and
z = ζ(s, t). If all of the functions are differentiable, use the chain rule to find $\dfrac{\partial w}{\partial t}$.
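Answer: The variable dependency diagram (Figure 3.14) may be helpful in determining this derivative. From
this diagram, one can see that there are four paths from w to r and three from w to t, thus the
chain rule expansion for $\frac{\partial w}{\partial r}$ has four terms, while the one for $\frac{\partial w}{\partial t}$ has three terms:
$$\frac{\partial w}{\partial r} = \frac{\partial w}{\partial u}\frac{\partial u}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial w}{\partial u}\frac{\partial u}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial y}\frac{\partial y}{\partial r}$$
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial u}\frac{\partial u}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial z}\frac{\partial z}{\partial t}$$
Figure 3.14: Dependency diagram for this second example, with w at the top, then u and v,
then x, y and z, and r, s and t at the bottom. Why are there no lines between y
and s, or between u and z?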
Example 3.19: In fluid dynamics and materials, one needs to distinguish between the partial
derivative with respect to time and the total or material derivative with respect to time. Suppose
that the velocity vector v for a small fluid or material element is a function of both the position of
this element and time. If the motion is in three-dimensional space, then $v = v(t, x_1, x_2, x_3)$. But
in addition, since the element is moving, its position is also a function of time. So there are
functions ξ, η and ζ such that $x_1 = \xi(t)$, $x_2 = \eta(t)$ and $x_3 = \zeta(t)$, meaning that $v = v(t, \xi(t), \eta(t), \zeta(t))$.
Please find a chain rule expansion for the acceleration a := dv/dt, the total derivative of velocity
with respect to time t.
Answer: As before in our discussion of vector functions, acceleration is defined to be the
derivative of velocity with respect to all time dependencies, not just with respect to the first variable
slot. The dependency diagram for velocity is given in Figure 3.15. Let $a = \langle a_1, a_2, a_3 \rangle$ and
$v = \langle v_1, v_2, v_3 \rangle$. Noting that dt/dt ≡ 1, we can use the chain rule to compute each component
of the acceleration:
Figure 3.15: Dependency diagram for the velocity vector in fluid dynamics and materials.
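$$a_i = \frac{dv_i}{dt} = \frac{\partial v_i}{\partial t}\frac{dt}{dt} + \frac{\partial v_i}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial v_i}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial v_i}{\partial x_3}\frac{dx_3}{dt} = \frac{\partial v_i}{\partial t} + v \cdot \nabla v_i \qquad \text{(III.1)}$$
Here $\langle dx_1/dt, dx_2/dt, dx_3/dt \rangle = \langle \xi'(t), \eta'(t), \zeta'(t) \rangle$ is the derivative of position with respect to
time, thus it is the velocity vector v, and ∇ is understood to be the vector of spatial derivatives.
One often sees this component equation (III.1) written as a vector equation
$$a = \frac{dv}{dt} = \frac{\partial v}{\partial t} + v \cdot \nabla v$$
but one must be careful to view the final term on the right as the dot product of v with ∇,
applied to v, that is, as $(v \cdot \nabla)v$.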
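To illustrate the difference between the partial and the total time derivative, consider a hypothetical velocity field that is not taken from this text: $v(t, x_1, x_2, x_3) = \langle -x_2, x_1, t \rangle$, together with the path $\langle \cos t, \sin t, t^2/2 \rangle$, which moves with this velocity. The sketch below compares the total derivative of v along the path with the chain rule expression ∂v_i/∂t + v · ∇v_i, both estimated by central differences. All function names are illustrative assumptions.

```python
import math


def velocity(t, x1, x2, x3):
    # Hypothetical field (an assumption): rotation about the x3-axis plus a drift growing in time.
    return (-x2, x1, t)


def position(t):
    # A path whose time derivative equals velocity(t, position(t)) for the field above.
    return (math.cos(t), math.sin(t), 0.5 * t * t)


def total_derivative(t, h=1e-5):
    """d/dt of v(t, x(t)): every time dependence is differentiated."""
    forward = velocity(t + h, *position(t + h))
    backward = velocity(t - h, *position(t - h))
    return tuple((a - b) / (2 * h) for a, b in zip(forward, backward))


def chain_rule_acceleration(t, h=1e-5):
    """Component-wise dv_i/dt + v . grad(v_i), differentiating one argument slot at a time."""
    args = (t,) + position(t)
    v = velocity(*args)
    result = []
    for i in range(3):
        def slot_derivative(slot):
            up = list(args)
            down = list(args)
            up[slot] += h
            down[slot] -= h
            return (velocity(*up)[i] - velocity(*down)[i]) / (2 * h)

        time_partial = slot_derivative(0)                  # first slot only
        gradient = [slot_derivative(k) for k in (1, 2, 3)]  # spatial slots
        result.append(time_partial + sum(vk * gk for vk, gk in zip(v, gradient)))
    return tuple(result)


print(total_derivative(0.7))
print(chain_rule_acceleration(0.7))  # should agree with the line above
```

For this field the partial derivative ∂v/∂t alone is ⟨0, 0, 1⟩, which misses the centripetal part of the acceleration entirely.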
Remark: The presence of the negative sign makes clear that the idea of cancellation of
“factors,” even as a mnemonic, does not work for the multivariable chain rule.
Proof: Since $F(x_1, x_2, \ldots, x_n) = 0$ for all $(x_1, x_2, \ldots, x_n)$ on the hypersurface, both sides of
this equation can be differentiated with respect to $x_j$; applying the chain rule yields
$$F_{x_1}\frac{\partial x_1}{\partial x_j} + F_{x_2}\frac{\partial x_2}{\partial x_j} + \ldots + F_{x_n}\frac{\partial x_n}{\partial x_j} = \frac{\partial}{\partial x_j}(0) = 0 .$$
Now there are three values for the partial derivatives involving the variables: (1) $\partial x_k/\partial x_j = 0$
for $k \neq i, j$, because one holds $x_k$ constant for $k \neq i, j$ when differentiating with respect to $x_j$;
(2) $\partial x_j/\partial x_j = 1$; and (3) $\partial x_i/\partial x_j$ is the quantity to be found. These values imply that
$$F_{x_i}\frac{\partial x_i}{\partial x_j} + F_{x_j} = 0 ,$$
and if one solves for $\partial x_i/\partial x_j$, one finds the desired result.
Example 3.20: Suppose a certain gas obeys the ideal gas law: P V = nRT where P is pressure,
V is volume, n is the amount of gas present (in moles), R is the universal gas constant, and T
is temperature. How does volume change as pressure changes if all the other state variables are
held constant?
Answer: Let F (P, V, n, T ) = P V − nRT = 0 be our version of the ideal gas law. We wish to
find ∂V /∂P . Since $F_V = P$ and $F_P = V$, the desired partial derivative is
$\partial V/\partial P = -F_P/F_V = -V/P = -nRT/P^2$. Which of these two versions of the answer is preferred
depends on which of the state variables one wishes to work with.
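Since the ideal gas law can be solved explicitly for V , the implicit result can be compared with a direct difference quotient. The sketch below uses the relation $F_{x_i}\,\partial x_i/\partial x_j + F_{x_j} = 0$ from the proof above, which gives ∂V/∂P = −F_P/F_V = −V/P. The numerical values (one mole at 300 K and atmospheric pressure) and the function names are illustrative assumptions.

```python
R = 8.314  # universal gas constant in J/(mol K), rounded


def volume(P, n, T):
    """Ideal gas law solved explicitly for V."""
    return n * R * T / P


def implicit_dV_dP(P, V):
    """-F_P / F_V for F = P*V - n*R*T, where F_P = V and F_V = P."""
    return -V / P


def numeric_dV_dP(P, n, T, h=1.0):
    """Central difference in P with n and T held constant."""
    return (volume(P + h, n, T) - volume(P - h, n, T)) / (2 * h)


P, n, T = 101325.0, 1.0, 300.0  # illustrative state: pressure in Pa, moles, kelvin
V = volume(P, n, T)
print(implicit_dV_dP(P, V), numeric_dV_dP(P, n, T))
```

Both values are negative: at constant temperature, the volume decreases as the pressure increases.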
3.5 Directional Derivatives
Previously we’ve considered derivatives in certain directions, the coordinate directions. These
are the partial derivatives. This section deals with derivatives in other directions—directional
derivatives.
Remark: The above definition for the directional derivative differs from that found in many
texts in that it is one-sided. Many texts use h → 0 rather than h → 0+, meaning that
the direction of u and the direction opposite u are both considered. The alternate definition
has the advantage that it agrees with the definition of partial derivative when u lies in a
coordinate direction. But the definition used here allows the directional derivative to exist in
some meaningful cases where it would not if one were required to use a two-sided limit. Consider
the following example:
Example 3.21: Compute the directional derivative for each direction for a certain cone
opening downward with vertex at (0, 0, 1):
$$z = f(x,y) = 1 - \sqrt{x^2 + y^2}$$
Answer: It is relatively easy to show that this surface has no unique tangent plane and is thus
not differentiable at its vertex. But regarding the directional derivative, the limit to consider is
$$D_u f(0) := \lim_{h \to 0^+} \frac{f(0 + hu) - f(0)}{h}$$
where $u = \langle \cos\theta, \sin\theta \rangle$ is an easy way to represent the unit vector in any direction determined
by θ ∈ [0, 2π). Hence
$$D_u f(0) = \lim_{h \to 0^+} \frac{-h\sqrt{\cos^2\theta + \sin^2\theta}}{h} = -1$$
independent of θ and the direction of u. This is what one would expect standing on the pointed
vertex of this cone: Every direction is the same and equally downhill.
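The one-sided nature of this limit can be seen by evaluating the difference quotient for both signs of h. In the sketch below, the name `difference_quotient` and the step size are illustrative assumptions.

```python
import math


def f(x, y):
    """The cone of Example 3.21, with vertex at (0, 0, 1)."""
    return 1 - math.hypot(x, y)


def difference_quotient(theta, h):
    """(f(0 + h*u) - f(0)) / h at the vertex, for u = <cos(theta), sin(theta)>."""
    ux, uy = math.cos(theta), math.sin(theta)
    return (f(h * ux, h * uy) - f(0.0, 0.0)) / h


for k in range(8):
    theta = k * math.pi / 4
    # Positive h gives about -1 in every direction; negative h gives about +1.
    print(theta, difference_quotient(theta, 1e-6), difference_quotient(theta, -1e-6))
```

Because the quotients for positive and negative h approach different values, a two-sided limit would not exist at the vertex, while the one-sided limit used here does.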
One might think that having the same slope in each direction is in some sense the typical
case—it is not. This is in fact the singular case where a generally smooth surface is not smooth
at a specific point (in the previous example, this is the point of the cone), and thus it is not
differentiable at a specific point. The generic case where a surface is smooth at and near a given
point can be seen in the next example:
Example 3.22: For the paraboloid $z = f(x,y) = 4 - x^2 - y^2$ at the point $(\sqrt{2}/2, \sqrt{2}/2, 3)$,
in which direction does the surface increase most rapidly? In which direction does it decrease
most rapidly?
Answer: As in the previous example, the unit direction vector can be represented as
$u = \langle \cos\theta, \sin\theta \rangle$ for some θ ∈ [0, 2π). So the directional derivative is
$$\begin{aligned}
D_u f(\sqrt{2}/2, \sqrt{2}/2) &= \lim_{h \to 0^+} \frac{f(\sqrt{2}/2 + h\cos\theta, \sqrt{2}/2 + h\sin\theta) - f(\sqrt{2}/2, \sqrt{2}/2)}{h} \\
&= \lim_{h \to 0^+} \frac{4 - (\sqrt{2}/2 + h\cos\theta)^2 - (\sqrt{2}/2 + h\sin\theta)^2 - 3}{h} \\
&= \lim_{h \to 0^+} \frac{-\sqrt{2}\,h\cos\theta - \sqrt{2}\,h\sin\theta - h^2}{h} \\
&= -\sqrt{2}(\cos\theta + \sin\theta)
\end{aligned}$$
This time, the value of this limit definitely does depend on θ and therefore the direction. Noting
the minus sign, one sees that the maximum value of the directional derivative occurs when
θ = 5π/4 and its minimum value occurs when θ = π/4. All of this is exactly what one would
expect on a paraboloid that opens downward: from this point (or any point) on the paraboloid,
the direction of most rapid increase is the direction from the point toward the vertex and the
direction of most rapid decrease is opposite this direction.
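A simple scan over directions reproduces these angles. The sketch below approximates the one-sided difference quotient with a small fixed h and searches a grid of 360 equally spaced angles. The grid, the step size and the variable names are illustrative assumptions.

```python
import math


def f(x, y):
    """The paraboloid of Example 3.22."""
    return 4 - x**2 - y**2


def directional_derivative(theta, h=1e-6):
    """One-sided difference quotient at (sqrt(2)/2, sqrt(2)/2) in the direction theta."""
    xo = yo = math.sqrt(2) / 2
    x = xo + h * math.cos(theta)
    y = yo + h * math.sin(theta)
    return (f(x, y) - f(xo, yo)) / h


angles = [2 * math.pi * k / 360 for k in range(360)]
steepest_up = max(angles, key=directional_derivative)
steepest_down = min(angles, key=directional_derivative)

print(steepest_up, 5 * math.pi / 4)   # direction of most rapid increase
print(steepest_down, math.pi / 4)     # direction of most rapid decrease
```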
The surface in the previous example along with its tangent plane at $(\sqrt{2}/2, \sqrt{2}/2, 3)$ is shown
in Figure 3.10. Notice that on this tangent plane, the direction in the x, y-plane of most rapid
increase is opposite the direction in the x, y-plane of most rapid decrease,
and the plane is level in the perpendicular directions. This in fact is
always true when a function f is differentiable and there is a unique nonvertical tangent plane,
as is stated in the next theorem:
where $\nabla f := \langle f_{x_1}, f_{x_2}, \ldots, f_{x_n} \rangle$ is the vector of first partial derivatives (the n-th entry is the n-th
first partial derivative of f ).
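$f(x_1, x_2, \ldots, x_n)$ has a unique, nonvertical tangent plane (or hyperplane), and thus
$$\lim_{\xi \to x} \frac{f(\xi) - \left[ f(x) + \frac{\partial f}{\partial x_1}(x)(\xi_1 - x_1) + \frac{\partial f}{\partial x_2}(x)(\xi_2 - x_2) + \ldots + \frac{\partial f}{\partial x_n}(x)(\xi_n - x_n) \right]}{d(\xi, x)} = 0$$
$$\begin{aligned}
&= \lim_{h \to 0^+} \frac{\nabla f(x) \cdot \langle \xi_1 - x_1, \xi_2 - x_2, \ldots, \xi_n - x_n \rangle}{h} \\
&= \lim_{h \to 0^+} \frac{\nabla f(x) \cdot (hu)}{h} \\
&= \nabla f(x) \cdot u
\end{aligned}$$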
Theorem 5 has an immediate important implication whose proof is a direct application of this
theorem:
• $u = \dfrac{\nabla f(x)}{|\nabla f(x)|}$ points in the direction of most rapid increase for f , and the rate of increase
in this direction is $|\nabla f(x)|$.
• $u = -\dfrac{\nabla f(x)}{|\nabla f(x)|}$ points in the direction of most rapid decrease for f , and the directional
derivative in this direction is $-|\nabla f(x)|$.
• Any u where $u \perp \nabla f(x)$ points in a direction where the directional derivative is zero and
hence f is not changing.
Example 3.23: For the surface z = f (x, y) = sin(x) cos(y) at the point (π/4, π/4, 1/2), in
which direction does the surface decrease most rapidly? In which direction is the surface level?
Answer: According to Corollary 2, the direction of steepest decrease is that opposite the
gradient:
$$-\nabla f(\pi/4, \pi/4) = -\langle \cos(\pi/4)\cos(\pi/4), -\sin(\pi/4)\sin(\pi/4) \rangle = \langle -1/2, 1/2 \rangle = \frac{1}{2}\langle -1, 1 \rangle$$
or simply the direction of $\langle -1, 1 \rangle$. The level directions are those perpendicular to $\langle -1, 1 \rangle$, so
either $\langle 1, 1 \rangle$ or $\langle -1, -1 \rangle$.
Exercises 3
2. Please find the equation satisfied by all points equidistant from (3, −1, 1) and (1, 2, 5).
3. Please describe the values of α and β for which the distance between the points x =
(α, 1, α) and y = (β, β, 1) equals 1. Answer: The circle which is the intersection of the
sphere $(\alpha - 1)^2 + (\beta - 1)^2 + \gamma^2 = 1$ and the plane γ = α − β in α,β,γ-space.
4. Calculate the path-dependent limit for the Railroad Underpass surface (Example 3.1.2)
when y = −2x and x < 0; show that for this path, the value of the path-dependent limit
is −2/5.
5. For each of the following, please compute the limit, or explain why it does not exist. In
some cases, considering separate paths will be helpful. In other cases, polar coordinates,
factoring/cancellation or a substitution will be helpful. Also computer plotting software
may be helpful.
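(a) $\lim_{(x,y)\to(4,3)} x^2 + 2xy$
(b) $\lim_{\substack{(x,y)\to(0,0) \\ y \neq \pm x}} \dfrac{x - y}{x^2 - y^2}$
(c) $\lim_{(x,y)\to(0,0)} e^{x+y}$
(d) $\lim_{\substack{(x,y)\to(1,1) \\ y \neq x}} \dfrac{x^6 - y^6}{x^2 - y^2}$
(e) $\lim_{(x,y)\to(0,0)} \dfrac{x^2 - y^2}{x^2 + y^2}$
(f) $\lim_{(x,y)\to(0,0)} \dfrac{x + y}{\sqrt{x^2 + y^2}}$
(g) $\lim_{\substack{(x,y)\to(1,3) \\ y \neq 3x}} \dfrac{27x^3 - y^3}{9x^2 - y^2}$
(h) $\lim_{(x,y)\to(0,0)} \dfrac{\sin 4(x^2 + y^2)}{x^2 + y^2}$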
6. The expressions below are the same as those in the limits in the previous exercise. In each
case, is the expression continuous at (xo , yo ), or is it possible to extend the definition of
the expression so that it is continuous at (xo , yo )? If such an extension is possible, what
value should the expression be assigned to make it continuous at (xo , yo )?
(a) $x^2 + 2xy$ at (xo , yo ) = (4, 3)
(b) $\dfrac{x - y}{x^2 - y^2}$ at (xo , yo ) = (0, 0)
(c) $e^{x+y}$ at (xo , yo ) = (0, 0)
(d) $\dfrac{x^6 - y^6}{x^2 - y^2}$ at (xo , yo ) = (1, 1)
(e) $\dfrac{x^2 - y^2}{x^2 + y^2}$ at (xo , yo ) = (0, 0)
(f) $\dfrac{x + y}{\sqrt{x^2 + y^2}}$ at (xo , yo ) = (0, 0)
(g) $\dfrac{27x^3 - y^3}{9x^2 - y^2}$ at (xo , yo ) = (1, 3)
(h) $\dfrac{\sin 4(x^2 + y^2)}{x^2 + y^2}$ at (xo , yo ) = (0, 0)
7. Please prove the remaining portions of Proposition 4. For the limit of f − g, follow the
proof given in the text for the limit of f + g. For the limit of the scalar product, let
$\alpha_L := \lim_{x \to x_o} \alpha(x)$, $f_L := \lim_{x \to x_o} f(x)$, and notice that
8. Suppose functions f : D ⊂ R2 → R are given by each of the expressions listed below, with
D being all of R2 where the denominator is not zero. Following Example 3.2.1 above,
decide whether or not each function can be extended to be continuous for all of R2 .
$$f(x,y) = \frac{1}{y - x}$$
where it is not continuous (and perhaps not defined) on the x, y-plane. Is the discontinuity
removable, a jump, a pole or some combination of these?
10. For each of the following expressions, please find the first and second partial derivatives:
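$$\frac{\partial}{\partial x}, \quad \frac{\partial}{\partial y}, \quad \frac{\partial^2}{\partial x^2}, \quad \frac{\partial^2}{\partial x \, \partial y} \quad \text{and} \quad \frac{\partial^2}{\partial y^2}$$
(a) $x^2 + 2xy$
(b) $x^2 y \sin(xy^2)$
(c) $xy e^{x^2 + y^2}$
(d) $\dfrac{x + y}{\sqrt{x^2 + y^2}}$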
11. Please find an equation for the tangent plane to the surface z = x sin y at (2, π/6, 1) .
13. Please explain why the coefficient of y in the equation of the tangent plane to the surface
z = f (x, y) at the point (xo , yo , f (xo , yo )) must be fy (xo , yo ). Draw a diagram similar
to that in Figure 3.11 but now illustrating fy (xo , yo ) as the slope of the tangent curve
z = f (xo , y).
15. For $F(x, y, z) = 2y^3 + 3x^2 y - xyz + 5yz^2 - 9$, please find the equation of the plane tangent to the surface F(x, y, z) = 0 at (−1, 1, −1).
16. For the ellipsoid $x^2 + 9y^2 + 4z^2 = 36$, please find the equations of the two tangent planes whose normal vector is $N = \langle 1, 3, 2 \rangle$.
17. Suppose the definition of the directional derivative is changed from a one-sided limit to a two-sided limit:
\[ \tilde{D}_u f(x) := \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h} \]
provided this limit exists. Show that $\tilde{D}_u f(0)$ does not exist for any direction for the cone $z = f(x, y) = 1 - \sqrt{x^2 + y^2}$. (This is the cone discussed in the first example in the directional derivative section.)
df df
(b) Please compute and .
dx dx
(c) Please compute $\dfrac{df}{dx}$ for $f(x, x^2, x^3)$, both directly using substitution and by using the chain rule.
p 2 2
20. For x2 (1 + y + z 2 )exy z = 1, please compute ∂z/∂x implicitly.
21. An alternative to the ideal gas law is the (more elaborate) van der Waals gas law: $(P + n^2 a / V^2)(V - nb) = nRT$, where a and b are constants that depend on the specific gas. Please find $\partial V / \partial P$ relative to the van der Waals gas law.
23. Prove that the basic multivariable chain rule implies the expansion in Example 3.18.
24. For the surface $z = f(x, y) = x^2 + y^2$, please find the directional derivative $D_u f(1, 2)$ in the direction given by the vector $\langle 2, 1 \rangle$.
Answer: $D_u f(1, 2) = 8\sqrt{5}/5$
25. For the surface z = f(x, y) = sin(x) cos(y) at the point (π/4, π/4, 1/2), what is the maximum rate of increase in f, i.e., what is the maximum value of the directional derivative? In which direction does this increase occur?
26. For the surface $z = f(x, y) = xy^2 e^{xy}$, in which directions from the origin is the directional derivative equal to half of its maximum value?
Chapter IV
Implications of Multivariable
Derivatives
Several important implications of multivariable derivatives are covered in this section. Before
discussing these, however, we introduce one more basic topic: level curves and surfaces.
There are many ways that various curves (or surfaces) can relate to each other. One of the
most important is that of being level curves (or level surfaces).
Definition: Given a function f : D → R for some domain D ⊂ R2 , the level curves or contours
for f are the curves of the form f (x, y) = c for some constant c in the range of f .
This definition may seem complicated, but a simple example should clarify the concept:
Example 4.1: Suppose that $f(x, y) = x^2 + y^2$. Then the level curves for this function f are circles of the form $x^2 + y^2 = c$, where in this case $\sqrt{c}$ is the radius of the circle, provided of course that c > 0. The one special case is c = 0, which is a single point, the origin; it can be thought of as a degenerate circle. This family of level curves is shown in Figure 4.1.
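To make the relation between the constant c and the radius concrete, here is a minimal Python sketch (not from the original text; the helper name `level_curve_point` is hypothetical). It samples one point on each level curve shown in Figure 4.1 and checks that f takes the value c there.

```python
import math

def f(x, y):
    """The function from Example 4.1."""
    return x**2 + y**2

def level_curve_point(c, theta):
    """Point at polar angle theta on the level curve f = c (a circle of radius sqrt(c))."""
    r = math.sqrt(c)
    return r * math.cos(theta), r * math.sin(theta)

# The constants used in Figure 4.1; c = 0 gives the degenerate circle (the origin).
for c in (0, 1, 4, 9):
    x, y = level_curve_point(c, 0.7)
    print(f"c = {c}: radius = {math.sqrt(c):.1f}, f at sampled point = {f(x, y):.6f}")
```

Any angle could be used in place of 0.7; the value of f depends only on the radius.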
The concept of level curves can be extended to three-dimensional space in the obvious way:
Definition: Given a function F : D → R for some domain D ⊂ R3 , the level surfaces for F are
the surfaces of the form F (x, y, z) = c for some constant c in the range of F .
Figure 4.1: Level curves for f (x, y) = x2 + y 2 . Here level curves are circles; the constant values
shown are c0 = 0, c1 = 1, c2 = 4 and c3 = 9.
where α, β, γ > 0 are constants. Notice that $F(x, y, z) = c^2$ is a family of ellipsoids with principal axes cα, cβ and cγ.
Example 4.3: Consider the function G defined as $G(x_1, x_2, x_3, x_4) = x_4 - x_3^2 - x_2^2 - x_1^2$. Then $G(x_1, x_2, x_3, x_4) = c$ is a family of spherical-hyperparaboloids opening upward in $x_4$. For each fixed value of $x_4 > c$, the corresponding three-dimensional surface in $(x_1, x_2, x_3)$ is a sphere with radius $R = \sqrt{x_4 - c}$.
4.2 The Gradient ∇F for the surface F (x, y, z) = 0
Previously we discussed the role and significance of the gradient ∇f for the surface z = f (x, y),
i.e., when one can explicitly solve for z in terms of the other two variables x and y. Now consider
the case when one can not (or perhaps has not) solved for any of the three variables x, y and
z. So here F : R3 → R, and we consider the level surface F (x, y, z) = 0. ∗ The big question to
consider now is “Where is the gradient ∇F relative to the surface F (x, y, z) = 0?” This is both
a very interesting result in its own right, and a very good review of many of the key concepts
that have been covered so far.
[Figure 4.2 labels: the curve x(t) with x = x1(t), y = x2(t), z = x3(t), lying in the surface F(x, y, z) = 0, and its tangent vector x′(to).]
Figure 4.2: The gradient ∇F (xo , yo , zo ) is perpendicular to any curve lying in the surface, and
thus perpendicular to the surface itself.
Applying the chain rule, one finds that
\[
\begin{aligned}
\frac{d}{dt} F(x_1(t), x_2(t), x_3(t)) &= F_x(x_1(t), x_2(t), x_3(t))\, x_1'(t) + F_y(x_1(t), x_2(t), x_3(t))\, x_2'(t) + F_z(x_1(t), x_2(t), x_3(t))\, x_3'(t) \\
&= \nabla F(x_1(t), x_2(t), x_3(t)) \cdot \mathbf{x}'(t) .
\end{aligned}
\]
But since $F(x_1(t), x_2(t), x_3(t)) = 0$ for all t (the curve is in the surface), one also has that
\[ \frac{d}{dt} F(x_1(t), x_2(t), x_3(t)) = 0 . \]
Combining these two expressions for $\frac{d}{dt} F(x_1(t), x_2(t), x_3(t))$, one finds that at $t = t_o$
\[ \nabla F(x_o, y_o, z_o) \cdot \mathbf{x}'(t_o) = 0 . \]
Now recall what the dot product of two vectors being zero implies: they must be perpendicular. And since $\mathbf{x}'(t_o)$ is tangent to the curve defined by $\mathbf{x}(t)$, the gradient $\nabla F(x_o, y_o, z_o)$ must be perpendicular to the curve. Now since this calculation applies to any curve lying in the surface, $\nabla F(x_o, y_o, z_o)$ must be perpendicular (or normal) to the surface.
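The perpendicularity argument can be checked numerically. The following is a minimal sketch under stated assumptions: the surface (the unit sphere, written as F = 0) and the particular curve on it are illustrative choices that do not come from the text, and the tangent vector is approximated by a central difference rather than computed exactly.

```python
import math

def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 1, so F = 0 is the unit sphere."""
    return (2 * x, 2 * y, 2 * z)

def curve(t):
    """An illustrative curve lying in the unit sphere: longitude t, latitude 0.5*sin(3t)."""
    s = 0.5 * math.sin(3 * t)
    return (math.cos(t) * math.cos(s), math.sin(t) * math.cos(s), math.sin(s))

def tangent(t, h=1e-6):
    """Central-difference approximation of the tangent vector x'(t)."""
    p, q = curve(t + h), curve(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The dot product of grad F with x'(t0) should vanish up to discretization error.
for t0 in (0.0, 0.8, 2.5):
    print(t0, dot(grad_F(*curve(t0)), tangent(t0)))
```

The printed dot products should be zero to within the finite-difference error, whichever parameter value is used.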
Example 4.4: For $F(x, y, z) = 2x^3 + 3xy^2 - xyz + 5yz^2 - 9$, please find the equation of the plane tangent to the surface F(x, y, z) = 0 at (1, 1, 1).
Answer: First, it is important to confirm that F(1, 1, 1) = 0, and indeed this is the case. Applying Theorem 6, one finds that $N = \nabla F(1, 1, 1) = \langle 8, 10, 9 \rangle$. So the equation of the tangent plane is just
\[ 8(x - 1) + 10(y - 1) + 9(z - 1) = 0 . \]
Remark: The previous theorem tells us that ∇F is perpendicular to the surface F (x, y, z) = 0.
The same sort of result is true in other dimensions: in R2 , if f (x, y) = 0 is a curve, then ∇f is
perpendicular to this curve (see Exercise 4.5); in Rn , if F (x1 , x2 , . . . , xn ) = 0 is a hypersurface,
then ∇F is perpendicular to this hypersurface.
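Returning to Example 4.4, the normal vector quoted there can be cross-checked numerically. This is only a sketch: the helper `grad` is a hypothetical name, and it approximates the partial derivatives by central differences instead of differentiating F symbolically.

```python
def F(x, y, z):
    """The function from Example 4.4."""
    return 2 * x**3 + 3 * x * y**2 - x * y * z + 5 * y * z**2 - 9

def grad(fn, p, h=1e-5):
    """Central-difference approximation of the gradient of fn at the point p."""
    g = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((fn(*up) - fn(*dn)) / (2 * h))
    return g

p0 = (1.0, 1.0, 1.0)
N = grad(F, p0)
print("F at p0:", F(*p0))                         # the point must lie on the surface F = 0
print("approximate normal:", [round(c, 6) for c in N])
```

The approximate normal should agree with ⟨8, 10, 9⟩ to within the finite-difference error, which gives the tangent plane 8(x − 1) + 10(y − 1) + 9(z − 1) = 0.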
4.3 Maximums and Minimums for Continuous Functions on Closed
and Bounded Domains
Towards the middle of the 19th century, almost any mathematician would have believed that a continuous function on a closed and bounded set achieves its minimum and maximum values, but none of them would have known exactly how to prove it. The key concept which finally allowed a full, rigorous proof was compactness. The details of why compactness is important and a proof of the result are beyond the scope of this text (for details, see for example Rudin [5] or Rosenlicht [4]), but the following theorem is the foundation of what is discussed below.
Remarks:
1. Notice that this theorem guarantees the existence of points in Q where F achieves maximum and minimum values, but it says nothing about those locations being unique. For example, there could be separate points x1 and x2 where F(x1) = F(x2) = F(xM), even though x1, x2 and xM are all distinct. The same is of course true for the locations of the minimum value. So in short, the maximum and minimum values are unique, but their locations need not be.
The previous theorem guarantees the existence of locations where minimum and maximum
values occur, but it does not tell us anything about how to find these values. The next result
addresses this issue, but first a little more notation is needed.
Figure 4.3: Schematic drawing of a closed, bounded region, its boundary and its interior.
Notation: Suppose that Q is a closed, bounded region in Rn (so in R2 , the region Q could
be a rectangular area, a circular disk, or some other region). Then ∂Q denotes the boundary
of the region (the rectangle around the box, the circle around the disk, etc.), and Q◦ denotes
the interior of the region (Q = Q◦ ∪ ∂Q since Q is closed). The interior and the boundary are
shown schematically in Figure 4.3.
1. x• ∈ Q◦ and ∇F(x•) = 0.
2. x• ∈ Q◦ and F is not differentiable at x•.
3. x• ∈ ∂Q.
At first glance, it might seem that Theorem 8 would be difficult to prove; quite the opposite is true.
Proof: From basic topology, Q = Q◦ ∪ ∂Q (i.e., a closed, bounded region is the union of its interior and its boundary), and Q◦ and ∂Q are disjoint. So either 3. is true, or if not, x• ∈ Q◦. If x• ∈ Q◦, then either F is differentiable at x• or it is not. So at x•, either 2. is true, or if not, F is differentiable at x•. In this last case, x• is an interior extremum at which F is differentiable, so each partial derivative of F must vanish there; hence ∇F(x•) = 0 and 1. is true.
Remark: All of the above works equally well if “region” is replaced by “set.” But complicated
sets may have complicated boundaries, so in terms of the present discussion, only regions will
be covered.
Theorems 7 and 8 together show us how to find the locations of maximum and minimum values
and the values themselves in many useful cases, as the next two examples demonstrate. These
examples are in R2 , but the approach presented here generalizes to Rn . It is true, however, that
if n is larger than n = 2 or n = 3, the procedure discussed here would likely require many steps.
Example 4.5: Suppose that $f(x, y) = x^3 - x^2 y + y$ and suppose that $R \subset \mathbb{R}^2$ is the triangle with corners (0, 0), (3, 0) and (0, 3).
Answer: In this case, f is a polynomial function, hence it is continuously differentiable on the entire plane: $f \in C^1(\mathbb{R}^2)$. So there are no critical points of type 2. from Theorem 8. If ∇f(x•) = 0, then $\partial f/\partial x\,(x, y) = 3x^2 - 2xy = 0$ and $\partial f/\partial y\,(x, y) = -x^2 + 1 = 0$. Notice that we need both of these equations to be satisfied. So from the second equation, x = ±1, and from the first equation, the two possible critical points are (−1, −3/2) and (1, 3/2). But after considering where R is, one finds that (−1, −3/2) ∉ R, so only (1, 3/2) is a critical point in R.
Now we must turn our attention to ∂R which in this case is three line segments. Notice that
this leads to three single-variable maximum-minimum problems:
For the first of these segments, $f(x, 0) = x^3$, so $\frac{d}{dx} f(x, 0) = 3x^2 > 0$ on the interior of this segment, and there are no single-variable critical points on this segment. For the second, $f(0, y) = y$, so again $\frac{d}{dy} f(0, y) = 1 > 0$ on this segment, and again there are no critical points. Finally, for the third, $f(x, 3 - x) = x^3 - x^2(3 - x) + (3 - x) = 2x^3 - 3x^2 - x + 3$, so $\frac{d}{dx} f(x, 3 - x) = 6x^2 - 6x - 1 = 0$ implies that $x = 1/2 \pm \sqrt{15}/6$. Only one of these roots is in the interval $0 \le x \le 3$: $x = 1/2 + \sqrt{15}/6 \simeq 1.1455$ (the other root, $x = 1/2 - \sqrt{15}/6 \simeq -0.1455$, lies outside the segment).
Finally there are three more points that must be considered: the corners, which are the endpoints of the boundary segments. Evaluating f at these points, one finds the results in the following table:
Point (x, y)          f(x, y)
(1, 3/2)              1
(1.1455, 1.8545)      0.9242
(0, 0)                0
(3, 0)                27
(0, 3)                3
Theorems 7 and 8 together guarantee that f achieves its maximum and minimum values at two of these five points, so these values and their locations can simply be read off the list above. For the maximum, xM = (3, 0) and the value is f(3, 0) = 27, while for the minimum, xm = (0, 0) and the value is f(0, 0) = 0.
Remark: It may seem that the above list of possible extrema locations and function values
could get rather long, but the important point is that such a list is often finite.
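As an independent sanity check on Example 4.5, one can simply sample f on a fine grid covering the triangle. This brute-force sketch is not the method of Theorems 7 and 8 and proves nothing; it is only a way to confirm that the extreme values read off the candidate list are plausible. The function name `grid_extrema` and the grid resolution are arbitrary choices.

```python
def f(x, y):
    """The function from Example 4.5."""
    return x**3 - x**2 * y + y

def grid_extrema(n=300):
    """Sample f at grid points of the triangle x >= 0, y >= 0, x + y <= 3.

    Returns (max_value, x, y) and (min_value, x, y) over the sampled points.
    """
    best_max = best_min = None
    for i in range(n + 1):
        for j in range(n + 1 - i):          # keeps i + j <= n, i.e. x + y <= 3
            x, y = 3 * i / n, 3 * j / n
            v = f(x, y)
            if best_max is None or v > best_max[0]:
                best_max = (v, x, y)
            if best_min is None or v < best_min[0]:
                best_min = (v, x, y)
    return best_max, best_min

largest, smallest = grid_extrema()
print("largest sampled value and location:", largest)
print("smallest sampled value and location:", smallest)
```

Because the corners of the triangle are grid points, the sampled maximum and minimum land on (3, 0) and (0, 0), in agreement with the values 27 and 0 found above.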
Example 4.6: What are the maximum and minimum values of the function $g(x, y) = 4x^2 + 9y^2 - 4x - 6y + 2$ on the circular disk of radius 1 centered at the origin?
Answer: Again, this function g is a polynomial, so an interior critical point can only occur where ∇g = 0. Notice that in this case, the surface z = g(x, y) is a paraboloid opening upward, so its minimum value will be at its vertex provided that this point is in the unit disk.
Here $\nabla g(x, y) = \langle 8x - 4, 18y - 6 \rangle = 0$ implies that (x, y) = (1/2, 1/3). This must be the vertex of the paraboloid, and since it is in the interior of the disk, this is the location of the minimum of g. By direct evaluation, one finds that g(1/2, 1/3) = 0.
The maximum must occur on the boundary of the disk, the unit circle. Because the boundary is the unit circle, polar coordinates are most helpful; here $g(\cos\theta, \sin\theta) = 4\cos^2\theta + 9\sin^2\theta - 4\cos\theta - 6\sin\theta + 2 = 5\sin^2\theta - 4\cos\theta - 6\sin\theta + 6$. On differentiating with respect to θ, one finds that $\frac{d}{d\theta}\, g(\cos\theta, \sin\theta) = 10\sin\theta\cos\theta + 4\sin\theta - 6\cos\theta$. Setting this derivative equal to zero and solving this equation numerically, one sees that the two critical points on the boundary are at θ1 ≃ 0.42977 and θ2 ≃ 4.4628. Again by direct evaluation, one finds that g(cos θ1, sin θ1) ≃ 0.73182 and that g(cos θ2, sin θ2) ≃ 17.497. Therefore the minimum value of g on this unit disk is 0 and its maximum value is approximately 17.497.
Finally one might wonder whether the endpoints of the polar parameterization, say θ = 0 and θ = 2π, must also be checked. Because this parameterization is fully periodic, the answer is “No.”
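The numerical solve in Example 4.6 can be reproduced with a few lines of Python. This is a sketch rather than the method the author used: it locates the two zeros of the boundary derivative by bisection, and the bracketing intervals [0, π/2] and [π, 3π/2] are assumptions chosen because the derivative changes sign on each of them.

```python
import math

def g_boundary(theta):
    """g restricted to the unit circle, g(cos(theta), sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return 4 * c**2 + 9 * s**2 - 4 * c - 6 * s + 2

def dg(theta):
    """Derivative of g_boundary with respect to theta."""
    c, s = math.cos(theta), math.sin(theta)
    return 10 * s * c + 4 * s - 6 * c

def bisect(fn, a, b, tol=1e-12):
    """Find a zero of fn in [a, b], assuming fn changes sign on the interval."""
    fa = fn(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = fn(m)
        if (fa < 0) == (fm < 0):
            a, fa = m, fm       # zero lies in the right half
        else:
            b = m               # zero lies in the left half
    return 0.5 * (a + b)

theta1 = bisect(dg, 0.0, math.pi / 2)
theta2 = bisect(dg, math.pi, 3 * math.pi / 2)
print(theta1, g_boundary(theta1))
print(theta2, g_boundary(theta2))
```

The two angles and boundary values printed should match the figures quoted in the example to the precision given there.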
4.4 Local Extrema
The previous section dealt with finding the absolute maximum and absolute minimum values
for continuous functions defined on closed and bounded sets. In that case, there were often
interior critical points that might have had the highest or lowest value of the function among
any of the nearby points, but whose values were superseded by other points on the boundary or
elsewhere as the absolute maximum or minimum. This section will now consider in more detail
these local maxima and minima—local extrema.
Before discussing local extrema in detail, there are several new terms and bits of notation that
need to be introduced. The first was briefly used earlier, but perhaps a formal definition now
is in order:
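Definition: Suppose that $x_o \in \mathbb{R}^n$ is some point and $\varepsilon > 0$. The ball of radius $\varepsilon$ centred at $x_o$ is defined as
\[ B_\varepsilon(x_o) := \{\, x \in \mathbb{R}^n \mid d(x, x_o) < \varepsilon \,\} . \]
So all the points that are a distance less than $\varepsilon$ from $x_o$ are in this ball.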
where Cb is the vector resulting from the standard matrix-vector product. This bra-ket notation
is due to Dirac† and is widely used in quantum mechanics.
For those who are not familiar with matrix multiplication, the next example may be helpful.
So the product Cb is obtained by computing the dot product of each row of C with the vector
b.
Definition: An n × n matrix C is positive definite if and only if for any nonzero vector $x \in \mathbb{R}^n$, ⟨x|C|x⟩ > 0. C is negative definite if and only if ⟨x|C|x⟩ < 0 when x ≠ 0.
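Answer: One must show that ⟨x|C|x⟩ > 0 for any x ≠ 0. But by direct computation,
\[
\langle x|C|x \rangle
= \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \begin{pmatrix} 2x_1 + x_2 \\ x_1 + 2x_2 \end{pmatrix}
= 2x_1^2 + 2x_1 x_2 + 2x_2^2 > x_1^2 + 2x_1 x_2 + x_2^2 = (x_1 + x_2)^2 \ge 0
\]
as long as x ≠ 0.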
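The computation above is easy to mirror in code. The sketch below uses hypothetical helper names `matvec` and `quad_form`; it forms Cx by taking the dot product of each row of C with x, and then evaluates ⟨x|C|x⟩ for the matrix used in this example at a few sample vectors. Sampling cannot prove positive definiteness, it only illustrates the inequality.

```python
def matvec(C, b):
    """Matrix-vector product: the dot product of each row of C with b."""
    return [sum(c_ij * b_j for c_ij, b_j in zip(row, b)) for row in C]

def quad_form(C, x):
    """The quantity <x|C|x>, i.e. the dot product of x with Cx."""
    return sum(x_i * y_i for x_i, y_i in zip(x, matvec(C, x)))

C = [[2, 1],
     [1, 2]]

# Includes a vector with x1 + x2 = 0, where (x1 + x2)^2 vanishes but <x|C|x> does not.
for x in ([1, 0], [1, -1], [-3, 2], [0.5, -0.5]):
    print(x, quad_form(C, x))
```

The vector [1, −1] is the interesting case: the lower bound $(x_1 + x_2)^2$ is zero there, yet the quadratic form is still strictly positive.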
The key result in determining where local extrema are located is the multivariable version of
the Taylor theorem‡ :
Definition: The Hessian matrix for $f \in C^2$ evaluated at $x_o$ is the matrix of second partial derivatives, all evaluated at $x_o$:
\[
H_f(x_o) := \begin{pmatrix}
f_{x_1 x_1}(x_o) & f_{x_1 x_2}(x_o) & \cdots & f_{x_1 x_n}(x_o) \\
f_{x_2 x_1}(x_o) & f_{x_2 x_2}(x_o) & \cdots & f_{x_2 x_n}(x_o) \\
\vdots & \vdots & \ddots & \vdots \\
f_{x_n x_1}(x_o) & f_{x_n x_2}(x_o) & \cdots & f_{x_n x_n}(x_o)
\end{pmatrix}
\]
‡ Named for Brook Taylor (1685–1731), who stated but did not prove an early version of the theorem.
Proof: The proof of this multivariable result is just an extension of the single-variable version;
see e.g., Marsden & Tromba [2], §3.2, for a detailed proof.
Theorem 10 (First Derivative Test) Suppose that $D \subset \mathbb{R}^n$ is open and that a function f : D → R has continuous partial derivatives to at least second order: $f \in C^2(D)$. A point xo ∈ D is the location of a local extremum of f only if ∇f(xo) = 0, i.e., xo is a local extremum ⇒ ∇f(xo) = 0.
Proof: This result can be proven using Theorem 8, but it also follows from the Taylor theorem, Theorem 9. Since $f \in C^2(D)$, the Taylor expansion must hold using xo ∈ D as the base point. If ∇f(xo) ≠ 0, then f must increase in the direction of this gradient, and must decrease in the direction opposite this gradient, so xo cannot be the location of a local extremum.
But what if ∇f(xo) = 0? The Taylor expansion then says that the Hessian term is the next one to consider to decide how f behaves near xo ∈ D, at least if Hf(xo) ≠ [0]. In general, if the Hessian is positive definite, xo is a local minimum; if the Hessian is negative definite, xo is a local maximum; if the Hessian is neither positive definite nor negative definite, but also is nonzero, then xo is some sort of saddle point:
Theorem 11 (Second Derivative Test) Suppose that D ⊂ Rn is open and that a function
f : D → R has continuous partial derivatives to at least second order: f ∈ C 2 (D). Suppose also
that for some xo ∈ D, ∇f (xo ) = 0. Then if Hf (xo ) is positive definite, xo is a local minimum;
if Hf (xo ) is negative definite, xo is a local maximum; if Hf (xo ) is neither positive definite nor
negative definite, but also is nonzero, then xo is a saddle point.
Proof: This result follows directly from the Taylor theorem, Theorem 9. The details are a
bit beyond the scope of our discussion, but can be found in Marsden & Tromba [2], §3.3, at
least when f ∈ C 3 (D).
The second derivative test has a simpler form at least in two dimensions, when f : D ⊂ R2 → R.
Corollary 3 (Second Derivative Test for $D \subset \mathbb{R}^2$) Suppose that $D \subset \mathbb{R}^2$ is open and that a function f : D → R has continuous partial derivatives to at least second order: $f \in C^2(D)$. Suppose also that for some xo ∈ D, ∇f(xo) = 0. Let $D := \det(H_f(x_o)) \equiv f_{xx}(x_o)\, f_{yy}(x_o) - (f_{xy}(x_o))^2$ be the discriminant (the determinant of the Hessian).
• If D > 0 and fxx (xo ) < 0, then xo is a local maximum.
Example 4.9: Suppose that $f(x, y) = x^2 + 3y^2 - 2x - 12y + 13$. Please find and characterize the critical points of f.
Answer: Since this function is a polynomial, it is everywhere differentiable, so the only critical points occur where ∇f(x) = 0. Here $\nabla f(x) = \langle 2x - 2, 6y - 12 \rangle$, and so the only critical point is xo = (1, 2). For the second derivative test, the Hessian is the same for all $(x, y) \in \mathbb{R}^2$, and
\[ \det(H_f)(1, 2) = \begin{vmatrix} 2 & 0 \\ 0 & 6 \end{vmatrix} = 12 > 0 . \]
Since also $f_{xx}(1, 2) = 2 > 0$, the point (1, 2) is a local minimum. Of course, for this polynomial function, one could also have arrived at this result by completing the square in x and y separately and finding that the surface z = f(x, y) is a paraboloid that opens upward.
Example 4.10: Suppose that $f(x, y) = 2x^3 + y^2 - 6x - 2y$. Please find and characterize the critical points of f.
Answer: Again since this function is a polynomial, it is everywhere differentiable, so the only critical points occur where ∇f(x) = 0. Now $\nabla f(x) = \langle 6x^2 - 6, 2y - 2 \rangle$, so critical points occur at (−1, 1) and (1, 1). In this case,
\[ \det(H_f)(-1, 1) = \begin{vmatrix} 12x & 0 \\ 0 & 2 \end{vmatrix}_{(-1, 1)} = -24 < 0 \]
\[ \det(H_f)(1, 1) = \begin{vmatrix} 12x & 0 \\ 0 & 2 \end{vmatrix}_{(1, 1)} = 24 > 0 \]
The first of these calculations implies that (−1, 1) is a saddle (pringle) point. Because both $f_{xx}(1, 1) > 0$ and $f_{yy}(1, 1) > 0$, the second implies that (1, 1) is a local minimum.
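The two-variable test used in Examples 4.9 and 4.10 can be sketched as a small function of the three second partial derivatives at a critical point. The function name `classify` is hypothetical, and reporting "inconclusive" when the discriminant is zero is an assumption based on the standard form of the test rather than on wording from this text. The Hessian entries passed in below are the ones quoted in the two answers.

```python
def classify(fxx, fyy, fxy):
    """Classify a critical point of f(x, y) from its second partials there.

    Uses the discriminant fxx*fyy - fxy**2 (the determinant of the Hessian).
    """
    disc = fxx * fyy - fxy**2
    if disc > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if disc < 0:
        return "saddle point"
    return "inconclusive"  # assumption: a zero discriminant gives no information

# Example 4.9: Hessian [[2, 0], [0, 6]] at (1, 2)
print(classify(2, 6, 0))
# Example 4.10: Hessian [[12x, 0], [0, 2]] at x = -1 and at x = 1
print(classify(12 * (-1), 2, 0))
print(classify(12 * 1, 2, 0))
```

When the discriminant is positive, fxx and fyy necessarily have the same sign, which is why checking fxx alone is enough in the first branch.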
4.5 Lagrange Multipliers
There is an interesting connection between gradients and extrema; this connection was discovered by Joseph-Louis Lagrange¶ and bears his name: Lagrange multipliers. The method of Lagrange multipliers allows one to efficiently find extreme values for functions subject to constraints (side conditions). A simple example will help illustrate this method:
Example 4.11: What is the minimum value of $f(x, y) = x^2 + y^2$ on the parabola $g(x, y) = x^2 + 3x + 3 - y = 0$?
Answer: Based on what we have discussed so far, one might solve this problem by solving g(x, y) = 0 for y, plugging this expression for y into f(x, y), and then differentiating the resulting expression with respect to x and setting this derivative equal to zero. This approach works, and the result is that the minimum occurs at (−1, 1). (This must be a minimum because f is positive and grows without bound along this parabola.) But notice what also happens at this point; consider Figure 4.4. The gradient of the function f is a scalar multiple of the gradient of the constraint function g. Suppose that cm is the value of f at the point where these gradients line up. The level curves of f for c > cm cross the constraint curve transversely; those for c < cm miss the constraint curve altogether. Only when f achieves its extreme value are the curves tangent and the gradients parallel.
Figure 4.4: An example of how Lagrange multipliers work. The red curve is the constraint g(x, y) = 0 ($y = x^2 + 3x + 3$); the blue curves are the level curves of $f(x, y) = x^2 + y^2$. The red and blue vectors are the gradients of g and f respectively. When the constraint curve and a level curve are tangent (and therefore when their gradients are parallel), f achieves its minimum value on the constraint curve.
Theorem 12 (Lagrange) Suppose that f, g : D → R for some domain $D \subset \mathbb{R}^2$, and suppose that both f and g are continuously differentiable on their domain: $f, g \in C^1(D)$. Suppose also that c is in the range of g, and consider the constraint curve C defined by g(x, y) = c. Finally suppose that ∇g(xo, yo) ≠ 0 for some point (xo, yo) ∈ C. If f restricted to C has an extreme value at (xo, yo), then
\[ \nabla f(x_o, y_o) = \lambda \nabla g(x_o, y_o) \]
for some λ ∈ R. So extrema occur only when these gradients are parallel.
Proof: The proof of this result is a variation on the proof of Theorem 6; see Exercise 4.x.
Remark: The real number λ in Theorem 12 is the Lagrange multiplier. Notice that the
Lagrange multiplier might be zero.
Example 4.11: What is the minimum value of $f(x, y) = x^2 + y^2$ on the parabola $g(x, y) = x^2 + 3x + 3 - y = 0$?
Answer: Now let us use Lagrange multipliers to answer this question. Here $\nabla f(x, y) = 2\langle x, y \rangle$ and $\nabla g(x, y) = \langle 2x + 3, -1 \rangle$. So the minimum of f must occur when
\[
\begin{aligned}
2x &= \lambda(2x + 3) \\
2y &= \lambda(-1) \\
y &= x^2 + 3x + 3
\end{aligned}
\]
(the first two equations above come from the gradients being parallel, and the third is the constraint). Solving these three equations simultaneously, one finds that (x, y) = (−1, 1) and that λ = −2. Notice that the exact value of the Lagrange multiplier is not really important in answering the question.
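One way to solve the three equations of Example 4.11 numerically is sketched below. The elimination strategy is a choice made for this illustration, not a step taken in the text: the second equation gives λ = −2y, substituting into the first gives x + y(2x + 3) = 0, and the constraint replaces y by x² + 3x + 3, leaving a single equation in x that is solved by bisection on the assumed bracket [−3, 0].

```python
def constraint_y(x):
    """y on the constraint parabola g(x, y) = 0."""
    return x**2 + 3 * x + 3

def residual(x):
    """Left side of x + y(2x + 3) = 0 with y taken from the constraint."""
    return x + constraint_y(x) * (2 * x + 3)

def bisect(fn, a, b, tol=1e-12):
    """Find a zero of fn in [a, b], assuming fn changes sign on the interval."""
    fa = fn(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = fn(m)
        if (fa < 0) == (fm < 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

x = bisect(residual, -3.0, 0.0)
y = constraint_y(x)
lam = -2 * y                     # from the second equation, 2y = -lambda
print(x, y, lam, x**2 + y**2)    # point, multiplier, and the constrained minimum of f
```

The residual expands to the cubic 2x³ + 9x² + 16x + 9, whose derivative has no real roots, so the zero found by bisection is the only solution of the system.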
The Lagrange multiplier method can be used in higher dimensions; the theorem for n = 3 is
similar to Theorem 12 above, except that the constraint is now a surface rather than a curve.
Answer: To use Lagrange multipliers (or the Lagrange multiplier method), one first needs to write the constraint in the form g(x, y, z) = 0. In this case, the constraint can be written as g(x, y, z) = 2x + 3y − z = 0. So $\nabla f(x, y, z) = \langle 2x, 4y, -1 \rangle$ and $\nabla g(x, y, z) = \langle 2, 3, -1 \rangle$; the Lagrange multiplier system that must be solved is
\[
\begin{aligned}
2x &= 2\lambda \\
4y &= 3\lambda \\
-1 &= -\lambda \\
2x + 3y - z &= 0
\end{aligned}
\]
In this case, the system is easy to solve: λ = 1, x = 1, y = 3/4 and z = 17/4. The minimum value of f is then f(1, 3/4, 17/4) = −17/8. This value is certainly a minimum (rather than a maximum) since f(0, 0, 0) = 0.
For this particular example, one could solve the problem without using Lagrange multipliers by
using the constraint to eliminate z and then completing the squares:
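A sketch of that computation follows. It assumes the objective is $f(x, y, z) = x^2 + 2y^2 - z$, which is the form consistent with the gradient ⟨2x, 4y, −1⟩ and the value f(0, 0, 0) = 0 quoted in the answer, and it substitutes z = 2x + 3y from the constraint:

```latex
\begin{aligned}
f(x, y, 2x + 3y) &= x^2 + 2y^2 - 2x - 3y \\
&= (x - 1)^2 - 1 + 2\left(y - \tfrac{3}{4}\right)^2 - \tfrac{9}{8} \\
&= (x - 1)^2 + 2\left(y - \tfrac{3}{4}\right)^2 - \tfrac{17}{8} .
\end{aligned}
```

Both squared terms vanish exactly at x = 1 and y = 3/4, which reproduces the minimum value −17/8 found with the Lagrange multiplier method.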
Exercises 4
1. For each of the following functions F, please describe the level curves or level surfaces corresponding to F(x, y) = c or F(x, y, z) = c, and please state which values of c correspond to curves or surfaces.
(c) F(x, y) = mx + b − y, where m, b are constants
(d) F(x, y) = sin xy
(g) $F(x, y, z) = x^2 + 2y^2 - 3z^2$
(h) $F(x, y, z) = \ln(x^2 + 7y^2 + 1) - z$
2. What can be said about the level hypersurfaces for the function defined by $F(x_1, x_2, x_3, x_4) = x_1^2 + x_2^2 - x_3^2 + x_4$?
3. Find an equation for the plane tangent to the surface $F(x, y, z) = 5xy^2 z^2 - x^2 y^3 + 3x^3 yz = 25$ at the point (x, y, z) = (1, 1, 2).
Answer: 36(x − 1) + 43(y − 1) + 23(z − 2) = 0
4. Please find an equation for the plane tangent to the surface $F(x, y, z) = xz - x^2 y + yz^2 = 1$ at the point (x, y, z) = (xo, yo, zo), assuming that this point is on the surface.
6. Find the maximum and minimum values and their locations for the following functions f:
(a) $f(x, y) = x^2 + 9y^2 - 2x - 12y + 8$ on the closed triangular domain whose corners are (0, 0), (4, 0) and (0, 5).
(b) $f(x, y) = x^2 + 9y^2 - 2x - 12y + 8$ on the closed circular disk of radius 5 centred at the origin.
(c) $f(x, y) = (1 - x + y)/(1 + x^2 + y^2)$ on the closed circular disk of radius 2 centred at the origin.
Hint: Define g(r, θ) := f(r cos θ, r sin θ) and solve the entire problem in polar coordinates.
7. What points on the circle $(x - 2)^2 + (y - 3)^2 = 1$ and on the line y = −x − 1 are closest to each other? (Which point on this circle is closest to which point on this line?)
Hint: To use the concepts discussed in this chapter to solve this problem, let (x, y) be
a point on the circle and (X, Y ) be a point on the line, and minimize the square of the
distance between these points. As an alternative, can you think of a geometric way to
solve this problem?
8. What points on the ellipse $\dfrac{(x - 2)^2}{4} + (y - 5)^2 = 1$ and on the line y = −2x − 3 are closest to each other? (Which point on this ellipse is closest to which point on this line?)
Hint: As in the previous exercise, let (x, y) be a point on the ellipse and (X, Y ) be a
point on the line, and minimize the square of the distance between these points. Again,
can you think of a geometric way to solve this problem? Notice that geometry can be
helpful in simplifying calculations.
10. Please find and classify the local extrema for the following functions:
(a) $f(x, y) = 3x^2 + y^2 - 2xy$
(b) $f(x, y) = 5x^2 + 3y^2 - 2xy + 2x - 6y$
(d) $f(x, y) = e^{-x^2 - 2x - y^2}$
(e) $f(x, y) = \dfrac{1 - x + y}{1 + x^2 + y^2}$
Answer: (a) Local and global minimum at (0, 0) where f(0, 0) = 0; no maximum. (d) No minimum; local and global maximum at (−1, 0) where f(−1, 0) = e.
12. What is the maximum value of $f(x, y) = -x^2 - y^2$ on the parabola $g(x, y) = x^2 - 4x + 3 + y = 0$? Where does this maximum occur?
14. Please find the minimum distance between the point (1, 2, 3) and the hyperboloid $x^2 + y^2 - z^2 = 1$.
Chapter V
Multiple Integrals—Integration in Rn
In principle, the definition for Riemann∗ integration in Rn is essentially the same as that of
Riemann integration in R1 , so let’s begin by reviewing the classic definition of the Riemann
integral from single-variable calculus. Suppose that f : [a, b] → R, i.e., f is a real-valued
function defined on an interval [a, b] ⊂ R. Suppose further that f is bounded on [a, b], meaning
that there is a fixed real number M > 0 such that |f (x)| < M for all x ∈ [a, b]. Let P be a
partition of this interval: P := {xo , x1 , x2 , ..., xn } where xo = a, xn = b and for all i, xi−1 < xi .
So P divides the interval into n subintervals. Now define ∆xi := xi − xi−1 for all i, and
define the norm of the partition P as the length of the longest subinterval: |P | := maxi ∆xi .
Finally let SP be a set of sampling points for this partition: SP := {ξ1 , ξ2 , ..., ξn } where for
all i, ξi ∈ [xi−1 , xi ]. So each subinterval of the partition has a corresponding sampling point
in SP . The function is sampled at each of these points, and the value f (ξi ) is used as the
representative value of the function for the entire subinterval. For a given partition P and a
given sampling-point set SP , the Riemann sum R(f, P, SP ) is
\[ R(f, P, S_P) := \sum_{i=1}^{n} f(\xi_i)\, \Delta x_i . \]
∗ Named in honor of Bernhard Riemann (1826–1866), a German mathematician who first carried out much of the rigorous work on this type of integration in the middle of the 19th century.
Figure 5.1: A Riemann sum for a continuous function f where f > 0. The partition P := {a =
xo , x1 , x2 , ..., xn = b} is indicated by the green vertical segments, and the sampling points are
indicated by the red vertical segments. Notice that for the second subinterval, the sampling
point is at the right endpoint of that subinterval. Where on the diagram are the sampling points
ξi for each of the other subintervals?
Definition: The Riemann integral of the function f over the interval [a, b] is
\[ \int_a^b f(x)\, dx := \lim_{|P| \to 0} \sum_{i=1}^{n} f(\xi_i)\, \Delta x_i \]
provided that this limit exists no matter how the partition is chosen and no matter how the sampling points are chosen, as long as the partition is sufficiently fine (i.e., its norm is sufficiently small). Any function f for which this limit exists and is finite for a given interval [a, b] is said to be Riemann integrable on [a, b].
Assuming that a certain function f is Riemann integrable on [a, b], i.e., that the above limit
converges to a finite value for all partitions and all sampling sets, then one can compute it
directly by choosing a uniform partition with ∆x = (b−a)/n being the width of each subinterval,
and by sampling each subinterval using the right endpoint of each subinterval: ξi = xi . As it
turns out, continuous functions are always Riemann integrable.
Example 5.1: Using the definition of the Riemann integral with a uniform partition, please compute the value of
\[ \int_1^3 x^2\, dx . \]
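equivalent to n → ∞. Hence
\[
\begin{aligned}
\int_1^3 x^2\, dx &= \lim_{|P| \to 0} \sum_{i=1}^{n} \left(1 + i\,\frac{2}{n}\right)^2 \frac{2}{n} \\
&= \lim_{n \to \infty} \frac{2}{n} \sum_{i=1}^{n} \left(1 + i\,\frac{4}{n} + i^2\,\frac{4}{n^2}\right) \\
&= \lim_{n \to \infty} \frac{2}{n} \left( \sum_{i=1}^{n} 1 + \frac{4}{n} \sum_{i=1}^{n} i + \frac{4}{n^2} \sum_{i=1}^{n} i^2 \right) \\
&= \lim_{n \to \infty} \frac{2}{n} \left( n + \frac{4}{n}\,\frac{n(n+1)}{2} + \frac{4}{n^2}\,\frac{n(n+1)(2n+1)}{6} \right) \\
&= \lim_{n \to \infty} \left( 2 + 4\left(1 + \frac{1}{n}\right) + \frac{4}{3}\left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right) \right) \\
&= 2 + 4 + \frac{8}{3} = \frac{26}{3} .
\end{aligned}
\]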
The above calculation uses two of the standard summation formulas which can be proven by induction:
• $\sum_{i=1}^{n} i = \dfrac{n(n+1)}{2}$ †
• $\sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6}$
Of course, anyone who has studied calculus knows that the long computation in the previous example is not necessary to evaluate the given integral. By the fundamental theorem of calculus, the integral is just
\[ \int_1^3 x^2\, dx = \left[ \frac{x^3}{3} \right]_1^3 = 9 - \frac{1}{3} = \frac{26}{3} . \]
This indicates something that will be demonstrated many times across this chapter and the rest
of this book: The definition of the Riemann integral is important to establish many of its basic
properties and to understand what it represents, but it is seldom used to evaluate an integral.
One almost always uses powerful theorems to evaluate integrals.
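Although the definition is rarely used to evaluate integrals by hand, it is easy to watch it converge numerically. The following minimal sketch (the function name `riemann_sum_right` is hypothetical) implements the Riemann sum of Example 5.1 with a uniform partition and right-endpoint sampling, and prints its value for increasingly fine partitions.

```python
def riemann_sum_right(f, a, b, n):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at right endpoints."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

# The integral of x^2 over [1, 3]; the exact value is 26/3.
for n in (10, 100, 1000, 10000):
    approx = riemann_sum_right(lambda x: x**2, 1.0, 3.0, n)
    print(n, approx, abs(approx - 26 / 3))
```

The error shrinks roughly in proportion to 1/n, which matches the 1/n terms that drop out in the limit computed in Example 5.1.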
† This formula has been known for hundreds of years, but according to legend, as a child Gauss rediscovered it, and used it to avoid the punishment of having to add up the first hundred integers.
5.1.2 Multivariable Riemann Integration
The generalization of the definition of the Riemann integral from R1 to Rn , at least when the
domain of integration is rectangular, follows immediately from the single-variable definition.
For simplicity, assume that n = 2. Let R := [a, b] × [c, d] be a rectangle in R2 . Suppose that
f : R → R, i.e., f is a real-valued function defined on the rectangle R. Suppose further that f is
bounded on R. Let P1 be a partition of [a, b], i.e., P1 := {xo , x1 , x2 , ..., xn } where xo = a, xn = b
and for all i, xi−1 < xi , and let P2 be a partition of [c, d], i.e., P2 := {yo , y1 , y2 , ..., ym } where
yo = c, ym = d and for all j, yj−1 < yj . So P1 × P2 divides the rectangle into nm subrectangles.
Again define ∆xi := xi − xi−1 for all i, and in addition, define ∆yj := yj − yj−1 for all j. Define
∆Aij := ∆xi ∆yj . Let P := P1 × P2 and define the norm of P as the largest subrectangle
length or width: |P | := maxij {∆xi , ∆yj }. Finally let SP be a set of sampling points for this
partition: SP := {ξ11 , ξ21 , ξ12 , ..., ξnm } where for all (i, j), ξij ∈ [xi−1 , xi ] × [yj−1 , yj ]. So again
each subrectangle has a corresponding sampling point.
provided this limit exists no matter how the partitions are chosen and no matter how the sampling points are chosen, as long as the partition is sufficiently fine (i.e., its norm is sufficiently small). Any function f for which this limit exists and is finite for a given rectangle R is said to be Riemann integrable on R.
Remarks:
1. For integration over rectangular prisms (boxes) in $\mathbb{R}^n$ for n ≥ 3, the definition is extended in the obvious way: the prism is partitioned into subprisms with sufficiently small edge lengths, then the integrand f is sampled at points inside each subprism. If the sum of the products of these sampled function values times the volumes of the subprisms approaches a fixed, finite value as the edge length decreases no matter how the partitioning or sampling is done, then this f is Riemann integrable over the given rectangular prism, and this fixed, finite value is the value of the integral.
2. As was mentioned above, this definition is not very helpful in evaluating integrals. Its main
use is in proving basic results about integrals. In particular, one can use this definition
to show that if f is continuous on a given rectangle, then f is integrable. Most of the
functions we consider in this chapter are continuous.
3. Strictly speaking, the definition given above only allows integration over rectangles, and
by extension, rectangular prisms; frequently, we will need to integrate over more general
domains in R2 and R3 . Although the details are somewhat technical and beyond the scope
of this text, our definition can be extended to domains that can be well-approximated by
rectangles or rectangular prisms. Riemann integrals over domains in R2 are called double
integrals; those over domains in R3 are called triple integrals.
where for the inner integral, the outer variable x is treated as a constant. A specific example is
useful to clarify this definition.
\[
= \int_0^1 \left( \int_3^5 x^2 (y + 1) \, dy \right) dx = \int_0^1 10 x^2 \, dx = \frac{10}{3} .
\]
Notice that iterated integrals never involve higher-dimensional Riemann sums. They are simply
several single-variable integrals evaluated one after the other.
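The remark that an iterated integral is nothing more than single-variable integrals evaluated one after the other can be sketched in code. The snippet below is an illustration only: it assumes the integrand x²(y + 1) on 0 ≤ x ≤ 1, 3 ≤ y ≤ 5, and it uses a composite Simpson rule (the helper simpson is a name introduced here, not something from the text) as the single-variable integrator.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def inner(x):
    # Inner integral in y; the outer variable x is treated as a constant.
    return simpson(lambda y: x**2 * (y + 1), 3, 5)

value = simpson(inner, 0, 1)   # outer integral in x of the inner result
print(value, 10 / 3)
```

No two-dimensional Riemann sum appears anywhere: the outer integrator simply calls the inner one at each x-value it needs.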
5.1.4 The Fubini Theorem and the Relationship between Riemann and Iterated Integrals
The examples above suggest the following question: “When do iterated integrals agree with the
corresponding Riemann integrals?” Notice that when these integrals agree, we can evaluate
the Riemann integral by evaluating a corresponding iterated integral as we did in Example 5.2.
The most general answer to this question is in fact complicated and hinges on more advanced
concepts from measure theory. But when the integrand f is continuous on the domain of
integration R, a famous result known as the Fubini theorem gives the needed answer:
Answer: By using the Fubini theorem, this double integral can be computed as two single
integrals in either order. When integrating with respect to one variable, all other variables are
treated as constants. Thus
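\[
\begin{aligned}
\iint_R x^2 y + x y^2 \, dA &= \int_1^3 \int_2^5 \left( x^2 y + x y^2 \right) dy \, dx = \int_1^3 \left[ \frac{x^2 y^2}{2} + \frac{x y^3}{3} \right]_{y=2}^{y=5} dx \\
&= \int_1^3 \left( \frac{21}{2} x^2 + \frac{117}{3} x \right) dx = \left[ \frac{7}{2} x^3 + \frac{117}{6} x^2 \right]_{x=1}^{x=3} = 247 .
\end{aligned}
\]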
In the next example, it is important to realize that the x-integral should be computed first,
even though Fubini’s theorem guarantees that the value is the same in either order:
† Named in honor of Italian mathematician Guido Fubini (1879–1943). The result that Fubini actually proved
involves measure theory and Lebesgue integration. This result was known before Fubini’s work, but nonetheless,
has come to be known by his name.
Answer: Because the integrand for this integral is continuous on the domain of integration R,
the Fubini theorem can be applied to evaluate this integral:
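\[
\begin{aligned}
\iint_R y \sin xy \, dA &= \int_{-2}^{1} \left( \int_{\pi/2}^{3\pi/2} y \sin (xy) \, dx \right) dy \\
&= \int_{-2}^{1} \left( \int_{\pi y/2}^{3\pi y/2} \sin u \, du \right) dy \\
&= \int_{-2}^{1} \Big[ -\cos u \Big]_{\pi y/2}^{3\pi y/2} \, dy \\
&= -\int_{-2}^{1} \left( \cos \frac{3\pi}{2} y - \cos \frac{\pi}{2} y \right) dy \\
&= -\left[ \frac{2}{3\pi} \sin \frac{3\pi}{2} y \right]_{-2}^{1} + \left[ \frac{2}{\pi} \sin \frac{\pi}{2} y \right]_{-2}^{1} \\
&= \frac{8}{3\pi} .
\end{aligned}
\]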
Evaluation of the inner integral above uses the substitution u = xy.
Remark: Notice that the Fubini theorem says that the result will be the same regardless of
whether the x-integral or the y-integral is computed first (is on the inside). In this example,
however, because of the nature of the integrand, it is much easier to compute the x-integral
first. If one computes the y-integral first, one quickly faces the integrand u sin u which requires
integration by parts to compute by hand. Indeed it is possible that picking the wrong order
can lead to an integral that cannot be computed by hand, while picking the right order yields
an integral that is relatively easy to compute.
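The claim that the order of integration does not affect the value can be checked numerically. The sketch below assumes the integrand y sin(xy) on the rectangle π/2 ≤ x ≤ 3π/2, −2 ≤ y ≤ 1 and evaluates the iterated integral in both orders; the helper simpson is an illustrative composite Simpson rule, not something defined in the text.

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

f = lambda x, y: y * math.sin(x * y)
x_lo, x_hi = math.pi / 2, 3 * math.pi / 2
y_lo, y_hi = -2.0, 1.0

# x-integral on the inside, y-integral on the outside
dx_first = simpson(lambda y: simpson(lambda x: f(x, y), x_lo, x_hi), y_lo, y_hi)
# y-integral on the inside, x-integral on the outside
dy_first = simpson(lambda x: simpson(lambda y: f(x, y), y_lo, y_hi), x_lo, x_hi)

print(dx_first, dy_first, 8 / (3 * math.pi))
```

A numerical integrator is indifferent to the order; the difference in difficulty described in the remark arises only when antiderivatives must be found by hand.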
Which functions are Riemann integrable? Perhaps surprisingly, this question is difficult to an-
swer exactly. The limit in the definition of the Riemann integral exists for continuous functions,
piecewise continuous functions and many other important functions, but by no means for all
functions. Even proving that continuous functions are Riemann integrable requires relatively
advanced concepts; a rigorous proof did not exist until the concept of compactness was
understood in the second half of the 19th century. To study the details of why continuous functions
are Riemann integrable, one can see, for example, Rosenlicht [4], Rudin [5] or Kosmala [1].
While continuity everywhere on the integration domain is not necessary for a function to be
Riemann integrable, a complete lack of continuity is a serious problem. There is, for example,
one easily-defined function where the limit in the definition of the Riemann integral does not
exist:
Example 5.5: Consider the characteristic function for the rationals:
\[
\mathcal{X}_{\mathbb{Q}}(x) := \begin{cases} 1, & x \in \mathbb{Q} \\ 0, & x \notin \mathbb{Q} \end{cases} .
\]
In words, XQ (x) is 1 when x is rational, and 0 when x is irrational. What happens when the
limit in the definition of the Riemann integral is applied to this characteristic function over
the interval [0, 1]? Consider any partition P of [0, 1]. Notice that the value of the sum over P
depends not on the norm of the partition, but rather whether the sampling is done at rational
or irrational numbers (since there are both rational and irrational numbers in each subinterval
of any partition). That is, if one samples only at irrational numbers, then ξi ∉ Q for all i,
1 ≤ i ≤ n, and XQ (ξi ) = 0, implying that the sum is zero. If on the other hand one samples
only at rational numbers, then ξi ∈ Q for all i, 1 ≤ i ≤ n, and XQ (ξi ) = 1. In the first
case, every Riemann sum must be zero no matter what partition is used, and in the second
case, every Riemann sum must be one (the subinterval lengths always add up to the length of
[0, 1]). Thus the limit cannot exist, and this function is not
Riemann integrable.
In many ways the problem with the integration in the previous example is with the definition
of the Riemann integral, not with the function XQ . Years after the work of Riemann, it was
understood that a more sophisticated definition for the integral was possible, and under this
latter definition, the value of the integral of XQ over any interval is zero. Discussion of this
newer integral, the Lebesgue† integral, is beyond the scope of this book. Nonetheless, the
defining of the Lebesgue integral was a major step forward in our understanding of this area
of mathematics. Indeed Lebesgue measure also gives the amount of continuity that a function
must possess to be Riemann integrable: a function is Riemann integrable if and only if its set
of discontinuities has Lebesgue measure zero.
†
Henri Lebesgue (1875–1941) was a French mathematician famous for extending the concept of integration
in his 1902 PhD thesis.
5.2 Double Integrals: Integration over Domains in R2
In the previous section, iterated integrals were used to evaluate Riemann integrals over
rectangles in R2 . Now iterated integrals will be used to evaluate Riemann integrals over more
general domains in R2 . As the Fubini theorem was first stated, it would apply only to
continuous functions and only to rectangular domains. Neither of these two assumptions, however,
is essential, and we will now use the Fubini theorem for more complicated domains and for
piecewise continuous integrands.
Figure 5.2: Domain of integration D between two curves, y = c(x) and y = d(x), with c(x) ≤
y ≤ d(x) and a ≤ x ≤ b. For this domain, the y-integral should be inside, and the x-integral
should be outside. The gold rays indicate how the y-integration depends on x: for each x-value
corresponding to D, the ray begins at y = c(x) and ends at y = d(x). The single red ray
indicates the final x-integration.
where the integration domain D is bounded by two piecewise smooth curves given by the
expressions y = c(x) and y = d(x) over some interval [a, b] where c(x) ≤ d(x) and c and d
are continuous and piecewise differentiable functions (see Figure 5.2). So in this case, D =
{(x, y) | a ≤ x ≤ b, c(x) ≤ y ≤ d(x)}, and our double integral can be evaluated as an iterated
integral:
Remarks:
And if our domain D can be described in both of these ways, then one has a choice of how
to set up the iterated integrals.
2. At times, it will be useful to write an iterated integral without giving the specific integra-
tion limits for each variable, but rather just stating the domain as is done for a double
integral. In this case, the differentials at the end of the integral will indicate the type of
integral; dA still indicates a double integral, while dy dx or dx dy indicates an iterated
integral. Thus the Fubini theorem can be written as
\[
\iint_D f(x, y) \, dA = \iint_D f(x, y) \, dy \, dx = \iint_D f(x, y) \, dx \, dy
\]
provided that D is bounded and its boundary is rectifiable so that we can make sense of
the iterated integrals.
Example 5.6: For the triangular integration domain T bounded by the lines y = x, y = 0 and
x = 1 (see Figure 5.3), please compute
\[
\iint_T x^2 \sin \pi y \, dA .
\]
Figure 5.3: Triangular domain of integration T set off by solid blue segments, bounded by
y = x, x = 1 and y = 0. The gold rays indicate how the y-integration depends on x: for each
x-value corresponding to T , the ray begins at y = 0 and ends at y = x. The single red ray
indicates the final x-integration from x = 0 to x = 1.
Answer: The crucial question here is “What are the bounds of integration for the iterated
integrals that correspond to this domain?” The first guess might be to have both x- and
y-integrals go from 0 to 1, but in fact we already know that these choices correspond to the unit
square: S := [0, 1] × [0, 1]. Notice that for the upper boundary, neither x nor y is constant.
Let us put the y-integral inside the x-integral; notice that the lower integration limit is y = 0
independent of x. The key observation about the upper limit of integration is that it depends
on x: the upper limit is the line y = x. So the integration is
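\[
\begin{aligned}
\iint_T x^2 \sin \pi y \, dA &= \int_0^1 \int_0^x x^2 \sin \pi y \, dy \, dx = \int_0^1 x^2 \int_0^x \sin \pi y \, dy \, dx \\
&= -\int_0^1 \frac{x^2}{\pi} \Big[ \cos \pi y \Big]_0^x \, dx = \int_0^1 \frac{x^2}{\pi} \left( 1 - \cos \pi x \right) dx = \frac{1}{3\pi} + \frac{2}{\pi^3} = \frac{\pi^2 + 6}{3\pi^3} .
\end{aligned}
\]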
Notice that because the integrand was in fact the product of a function depending only on x
and a function depending only on y, the x-function can be treated as a constant for the y-
integration, and thus placed outside the y-integral. Also evaluating the final integral requires
(a) double integration by parts, (b) a set of tables of integrals, or (c) a CAS (computer algebra
system). The author recommends a computer.
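In the spirit of the recommendation above, here is a small numerical sketch (an illustration, not the author's computation). It assumes the integrand x² sin πy and integrates once over the triangle, with the inner upper limit depending on x, and once over the unit square, to show that the tempting limits 0 to 1 for both variables give a different number. The helper simpson is a composite Simpson rule introduced only for this sketch.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

f = lambda x, y: x**2 * math.sin(math.pi * y)

# Triangle: for each x in [0, 1], y runs from 0 up to the line y = x.
triangle = simpson(lambda x: simpson(lambda y: f(x, y), 0.0, x), 0.0, 1.0)
# Unit square: both variables run from 0 to 1 (the incorrect first guess).
square = simpson(lambda x: simpson(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)

exact = (math.pi**2 + 6) / (3 * math.pi**3)
print(triangle, exact, square)
```

The only change between the two computations is the upper limit of the inner integral, which is exactly the key observation of the example.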
Our next example shows that we may not always want to carry out the y-integration first.
where the domain of integration D is the bounded region between the x-axis, the curve y = x²
and the line y = 8 − 2x as shown in Figure 5.4.
Figure 5.4: Domain of integration D bounded by the x-axis, the curve y = x² and the line
y = 8 − 2x. Again the gold rays indicate how the y-integration depends on x, but now the red
rays indicate how the x-integration depends on y.
Answer: Suppose that we try the typical first approach of carrying out the y-integration first.
This approach is indicated by the gold rays in Figure 5.4. The small problem is that under
this approach, the upper boundary of the domain changes as one carries out the integration: it
changes from y = x² for 0 ≤ x ≤ 2 to y = 8 − 2x for 2 ≤ x ≤ 4. This change means that to
complete the y-integration first, one must break the iterated integral into two parts, based on
the two distinct upper boundary curves. Thus
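\[
\begin{aligned}
\iint_D 4xy^2 \, dA &= \int_0^2 \left( \int_0^{x^2} 4xy^2 \, dy \right) dx + \int_2^4 \left( \int_0^{8-2x} 4xy^2 \, dy \right) dx \\
&= 4 \int_0^2 x \left[ \frac{y^3}{3} \right]_0^{x^2} dx + 4 \int_2^4 x \left[ \frac{y^3}{3} \right]_0^{8-2x} dx \\
&= \frac{4}{3} \int_0^2 x^7 \, dx + \frac{4}{3} \int_2^4 \left( 512x - 384x^2 + 96x^3 - 8x^4 \right) dx \\
&= \left[ \frac{1}{6} x^8 \right]_0^2 + \frac{4}{3} \left[ 256x^2 - 128x^3 + 24x^4 - \frac{8}{5} x^5 \right]_2^4 \\
&= \frac{128}{3} + \frac{4}{3} \left( 256(12) - 128(56) + 24(240) - \frac{8}{5}(992) \right) \\
&= \frac{128}{3} + \frac{4}{3} \left( 1664 - \frac{7936}{5} \right) = \frac{2176}{15} .
\end{aligned}
\]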
The above computation is simple but tedious, in part because there are two separate
integrations. One might ask whether there is an easier way. The answer is yes, though there
is still some tedious work: carry out the x-integration first. This is indicated by the red rays in
Figure 5.4.
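\[
\begin{aligned}
\iint_D 4xy^2 \, dA &= \int_0^4 \left( \int_{\sqrt{y}}^{4-y/2} 4xy^2 \, dx \right) dy = 4 \int_0^4 y^2 \left[ \frac{x^2}{2} \right]_{\sqrt{y}}^{4-y/2} dy \\
&= 2 \int_0^4 y^2 \left( (4 - y/2)^2 - y \right) dy = \left[ \frac{32}{3} y^3 - \frac{5}{2} y^4 + \frac{y^5}{10} \right]_0^4 = \frac{2176}{15} .
\end{aligned}
\]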
This second approach is surely less tedious; it is helped in part because of how the limits
of integration interact with the integrand as the integration proceeds. If the integrand were
different, this second approach could be more tedious.
In general, it is not always possible to predict which integration order is best when both are
possible. One simply must try both orders to see which is easier. Of course, as was the case
above, these two calculations can be used as a check on each other.
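The idea of using the two integration orders as a check on each other can also be carried out numerically. The following sketch assumes the integrand 4xy² and the domain bounded by the x-axis, y = x² and y = 8 − 2x; the helper simpson is a composite Simpson rule introduced only for illustration.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

f = lambda x, y: 4 * x * y**2

# y first: the upper boundary changes at x = 2, so two iterated integrals are needed.
y_first = (simpson(lambda x: simpson(lambda y: f(x, y), 0.0, x**2), 0.0, 2.0)
           + simpson(lambda x: simpson(lambda y: f(x, y), 0.0, 8 - 2 * x), 2.0, 4.0))

# x first: a single iterated integral, x from sqrt(y) to 4 - y/2, then y from 0 to 4.
x_first = simpson(lambda y: simpson(lambda x: f(x, y), math.sqrt(y), 4 - y / 2), 0.0, 4.0)

print(y_first, x_first, 2176 / 15)
```

Agreement of the two values to several digits is good evidence that the limits of integration were set up consistently in both orders.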
5.2.2 Polar Integration
When the domain of integration D has polar rather than rectangular symmetry, the use of polar
coordinates may be called for.
Example 5.8: Consider the semicircular disk D shown in Figure 5.5. Please compute
\[
\iint_D x^2 y \, dA .
\]
Figure 5.5: Semicircular integration domain D for 0 ≤ r ≤ 1 and 0 ≤ θ ≤ π. The gold rays
indicate integration in the radial direction; the red arc indicates the outer θ integration.
Answer: In this case, the limits of integration are constant in polar coordinates. For each θ,
radial integration must go from r = 0 to r = 1. Then the angle θ must be integrated from
θ = 0 to θ = π to cover the entire semicircular disk. So in this regard, this integration is very
much like integration over a rectangle in rectangular coordinates. Next the integrand x²y must be
written in polar coordinates: x²y = r³ cos²θ sin θ.
It may seem that expressing the integration limits and the integrand in polar coordinates are the
only two issues that need to be resolved to carry out polar integration, but there is one more: dA
must be expressed in polar coordinates. In rectangular coordinates, dA in the double integral
becomes dx dy in the iterated integral, but in polar coordinates it is a bit more complicated:
dA now becomes r dr dθ. Why dA has this form is discussed after the example; the good news
is that r dr dθ always replaces dA whenever a double integral is written in polar coordinates,
no matter what the integration limits and integrand are.
Thus
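\[
\begin{aligned}
\iint_D x^2 y \, dA &= \int_0^{\pi} \int_0^1 r^3 \cos^2\theta \sin\theta \; r \, dr \, d\theta = \int_0^{\pi} \left[ \frac{r^5}{5} \right]_0^1 \cos^2\theta \sin\theta \, d\theta \\
&= \frac{1}{5} \int_0^{\pi} \cos^2\theta \, (\sin\theta \, d\theta) = \frac{1}{5} \int_1^{-1} u^2 \, (-du) = \frac{1}{5} \int_{-1}^{1} u^2 \, du = \frac{2}{15} .
\end{aligned}
\]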
The above integration uses the substitution u := cos θ.
Notice that this integral can also be evaluated in rectangular coordinates, though the limits of
integration are no longer constant:
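\[
\begin{aligned}
\iint_D x^2 y \, dA &= \int_{-1}^{1} \left( \int_0^{\sqrt{1-x^2}} x^2 y \, dy \right) dx = \int_{-1}^{1} x^2 \left( \int_0^{\sqrt{1-x^2}} y \, dy \right) dx \\
&= \int_{-1}^{1} x^2 \left[ \frac{y^2}{2} \right]_0^{\sqrt{1-x^2}} dx = \frac{1}{2} \int_{-1}^{1} x^2 \left( 1 - x^2 \right) dx \\
&= \frac{1}{2} \int_{-1}^{1} \left( x^2 - x^4 \right) dx = \frac{1}{2} \left[ \frac{x^3}{3} - \frac{x^5}{5} \right]_{-1}^{1} = \frac{1}{3} - \frac{1}{5} = \frac{2}{15} .
\end{aligned}
\]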
Notice that the upper limit of integration is now the upper semicircle in rectangular coordinates:
y = √(1 − x²). Evaluating an integral in rectangular coordinates when the domain is easily
described in polar coordinates may be much more difficult than it was in this example; indeed
it might be impossible.
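The agreement between the polar and rectangular computations can be illustrated numerically. The sketch below assumes the integrand x²y on the upper half of the unit disk and evaluates it once in polar coordinates, including the extra factor r that comes from dA, and once in rectangular coordinates. The helper simpson is a composite Simpson rule introduced here for illustration.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

f = lambda x, y: x**2 * y

# Polar: constant limits 0 <= r <= 1, 0 <= theta <= pi, and dA becomes r dr dtheta.
polar = simpson(
    lambda t: simpson(lambda r: f(r * math.cos(t), r * math.sin(t)) * r, 0.0, 1.0),
    0.0, math.pi)

# Rectangular: the upper limit in y is the semicircle y = sqrt(1 - x^2).
rect = simpson(
    lambda x: simpson(lambda y: f(x, y), 0.0, math.sqrt(max(0.0, 1 - x * x))),
    -1.0, 1.0)

print(polar, rect, 2 / 15)
```

Dropping the factor r in the polar version changes the answer, which is one quick way to see that dA is not simply dr dθ.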
Figure 5.6: The area ∆A (in blue) of the portion of the R2 -plane with radial length ∆r
subtending an angle ∆θ at a distance r from the origin is ∆A = (∆r)(r ∆θ) = r ∆r∆θ. Notice that
the plane can be tiled with patches of this form. This tiling suggests that in polar coordinates,
dA should become r dr dθ.
But while Figure 5.6 is suggestive, it is not rigorous. What is needed is a theorem. The following
result tells us how to change variables in R. Its proof is not presented here, but can be found
in Rudin [5, pp. 252-253].
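\[
\frac{\partial(x, y)}{\partial(\xi, \eta)} :=
\begin{vmatrix}
\dfrac{\partial x}{\partial \xi} & \dfrac{\partial x}{\partial \eta} \\[2ex]
\dfrac{\partial y}{\partial \xi} & \dfrac{\partial y}{\partial \eta}
\end{vmatrix}
\]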
Remark: Careful inspection of the second integral in Theorem 16 might lead someone to
think that there is a redundant determinant in this integral. In fact, the outer pair of bars does
not denote a determinant; it denotes an absolute value. In other words, it is the absolute value
of the Jacobian determinant that appears in the change of variable formula for multiple integrals.
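For the polar change of variables x = r cos θ, y = r sin θ, the Jacobian determinant works out to r, which is where the factor r in r dr dθ comes from. The sketch below checks this numerically with central differences; the function names polar_map and jacobian_det are illustrative and not taken from the text.

```python
import math

def polar_map(r, theta):
    """The change of variables (r, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(T, u, v, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian determinant of T at (u, v)."""
    xu = (T(u + h, v)[0] - T(u - h, v)[0]) / (2 * h)   # dx/du
    yu = (T(u + h, v)[1] - T(u - h, v)[1]) / (2 * h)   # dy/du
    xv = (T(u, v + h)[0] - T(u, v - h)[0]) / (2 * h)   # dx/dv
    yv = (T(u, v + h)[1] - T(u, v - h)[1]) / (2 * h)   # dy/dv
    return xu * yv - xv * yu

for r, theta in [(0.5, 0.3), (2.0, 1.2), (3.0, 4.0)]:
    # The absolute value of the determinant should be close to r at every sample point.
    print(r, theta, abs(jacobian_det(polar_map, r, theta)))
```

The same finite-difference helper can be pointed at any other smooth change of variables to estimate the factor that multiplies the new differentials.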
The next example is a bit more difficult, but it still can be solved directly in polar coordinates:
Example 5.9: Let S be the portion of the circular disk x² + (y − 1)² ≤ 1 lying in the first
quadrant as shown in Figure 5.7. Please compute
\[
\iint_S xy \, dA .
\]
Answer: At first glance, it may seem that this domain would be difficult to describe in polar
coordinates, at least without setting up a nonstandard polar system with its origin at the centre
of this disk. Standard polar coordinates, however, can be used to evaluate the integral in this
example.
Figure 5.7: Semicircular disk S centred at (0, 1) with radius 1 lying in the first quadrant. The
boundary circle is r = 2 sin θ. The gold rays indicate integration in the radial direction; the red
arc indicates the outer θ integration.
The key issue is finding the equation of the boundary circle for this disk in polar coordinates.
Notice that in rectangular coordinates, this circle is x2 + (y − 1)2 = 1 since the centre is at
(0, 1) and the radius is 1. Setting x = r cos θ and y = r sin θ, one finds that r2 − 2r sin θ = 0.
Since normally r > 0 in polar coordinates, it must be the case that r = 2 sin θ, and this is the
equation of this circle in polar coordinates. Using this equation, one can see that integration
over this circular disk can be described as in Figure 5.7, resulting in the following computation:
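\[
\begin{aligned}
\iint_S xy \, dA &= \int_0^{\pi/2} \int_0^{2\sin\theta} r^2 \cos\theta \sin\theta \; r \, dr \, d\theta = \int_0^{\pi/2} \left[ \frac{r^4}{4} \right]_0^{2\sin\theta} \cos\theta \sin\theta \, d\theta \\
&= 4 \int_0^{\pi/2} \sin^5\theta \, (\cos\theta \, d\theta) = 4 \int_0^1 u^5 \, du = \frac{2}{3} .
\end{aligned}
\]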
The above integration uses the substitution u := sin θ, and again dA is represented in polar
integration as r dr dθ.
Until now, our discussion of double integrals has concentrated on how to evaluate them using
rectangular or polar coordinates. But equally important to evaluation is understanding how to
interpret these integrals, that is, understanding what these integrals represent. As before, we
can start by reviewing the single-variable case.
For a continuous, nonnegative function f defined on an interval [a, b], in single-variable calculus,
one learns that
\[
\int_a^b f(x) \, dx = A
\]
where A is the area under the curve y = f (x) above the x-axis between a and b. But this area
could also be found as a double integral, specifically,
\[
\int_a^b \int_0^{f(x)} dy \, dx = \int_a^b \int_0^{f(x)} 1 \, dy \, dx = \int_a^b f(x) \, dx = A ,
\]
where the integrand in the first double integral is implicitly 1, implying that the y-integral
can be evaluated immediately to reach the generic integral from single-variable calculus. This
leads to the first major interpretation for the double integral: If D ⊂ R2 is a bounded domain with a
rectifiable boundary ∂D, then
\[
\iint_D dA = \iint_D 1 \, dA = \mathrm{Area}(D) .
\]
A second generalization of the integral from single-variable calculus involves keeping f in the
integrand, but changing the domain from [a, b] to some bounded D ⊂ R2 . Suppose again that
the function f ≥ 0 is continuous. This situation is depicted in Figure 5.8; from this figure, one
can see that
\[
\iint_D f(x, y) \, dA = V
\]
where V is the volume under the surface z = f (x, y) above the domain D ⊂ R2 .
Figure 5.8: The volume V under the surface z = f (x, y) above the domain D ⊂ R2 .
Double integrals can also be used to define the average value of functions of two variables, as
well as the centre or centroid of a bounded, two-dimensional domain, D:
and the centre of D is (x̄, ȳ) where
\[
\bar{x} := \frac{1}{|D|} \iint_D x \, dA \qquad \text{and} \qquad \bar{y} := \frac{1}{|D|} \iint_D y \, dA
\]
provided that all of the integrals above exist and are finite.
Example 5.10: Please find the centre of the triangle T bounded by the x-axis, the line y = x
and the line y = 6 − 2x. Hint: Draw a diagram for the integration domain.
Answer: Because here the domain is a triangle, there are other ways to find its centre, but using
the definitions above, one finds that
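\[
|T| = \int_0^2 \left( \int_y^{3-y/2} dx \right) dy = 3
\]
so
\[
\bar{x} := \frac{1}{3} \int_0^2 \left( \int_y^{3-y/2} x \, dx \right) dy = \frac{5}{3} \qquad \text{and} \qquad \bar{y} := \frac{1}{3} \int_0^2 \left( \int_y^{3-y/2} y \, dx \right) dy = \frac{2}{3} .
\]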
One can compare these values with those found using the traditional definition for the centre
of a triangle to see that they are the same.
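The comparison suggested above can be sketched in a few lines of code. The snippet assumes the triangle with vertices (0, 0), (3, 0) and (2, 2), which are the pairwise intersections of the x-axis, y = x and y = 6 − 2x, and it takes the "traditional definition" to mean the average of the three vertices. The helpers simpson and over_T are names introduced only for this illustration.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def over_T(g):
    """Iterated integral of g over T: x from the line x = y to the line x = 3 - y/2, y from 0 to 2."""
    return simpson(lambda y: simpson(lambda x: g(x, y), y, 3 - y / 2), 0.0, 2.0)

area = over_T(lambda x, y: 1.0)
x_bar = over_T(lambda x, y: x) / area
y_bar = over_T(lambda x, y: y) / area

vertices = [(0.0, 0.0), (3.0, 0.0), (2.0, 2.0)]
vertex_average = (sum(v[0] for v in vertices) / 3, sum(v[1] for v in vertices) / 3)
print(area, (x_bar, y_bar), vertex_average)
```

For a triangle the two approaches agree; for a general domain D only the integral definition is available.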
5.3 Triple Integrals: Integration over Domains in R3
We now turn our attention to integration in three-dimensional space; as in the previous section,
two of the most important issues in evaluating triple integrals are which coordinate system to
use and in which order to integrate the variables.
where the integrand f is continuous and the integration domain Ω is bounded, connected and
can be described as
where z = a3 (x, y) and z = b3 (x, y) are smooth surfaces defined on some domain D = {(x, y) ∈
R2 |a1 ≤ x ≤ b1 , a2 (x) ≤ y ≤ b2 (x)} and in turn, y = a2 (x) and y = b2 (x) are smooth curves
defined on some interval [a1 , b1 ]. Implicitly we assume that a3 lies below b3 (i.e., a3 (x, y) ≤
b3 (x, y) for all (x, y) ∈ D) and that a2 lies below or to the left of b2 (i.e., a2 (x) ≤ b2 (x) for all
x ∈ [a1 , b1 ]). This integration domain Ω is shown in Figure 5.9.
Figure 5.9: The integration domain Ω ⊂ R3 where Ω lies above the surface z = a3 (x, y) and
below the surface z = b3 (x, y). The projection of Ω onto the x, y-plane is D which is between
the curves y = a2 (x) and y = b2 (x). Finally, the projection of D onto the x-axis is [a1 , b1 ] .
As was the case for the double integral, the main tool for computing triple integrals is the Fubini
theorem:
Theorem 17 (Fubini) If f is continuous on a domain Ω described above, then
\[
\iiint_\Omega f(x, y, z) \, dV = \int_{a_1}^{b_1} \left( \int_{a_2(x)}^{b_2(x)} \left( \int_{a_3(x,y)}^{b_3(x,y)} f(x, y, z) \, dz \right) dy \right) dx .
\]
Figure 5.10: The cube C := [−1, 1]×[0, 2]×[1, 3] in R3 lying above the square D := [−1, 1]×[0, 2].
Notice that this cube is the same whether the integration variables are x, y and z, or x1 , x2
and x3 .
Answer: This requires, of course, just a direct application of the Fubini theorem:
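\[
\begin{aligned}
\iiint_C 1 + x^2 + y^2 + z^2 \, dV &= \int_{-1}^{1} \int_0^2 \int_1^3 \left( 1 + x^2 + y^2 + z^2 \right) dz \, dy \, dx \\
&= \int_{-1}^{1} \int_0^2 \left[ z + x^2 z + y^2 z + \frac{z^3}{3} \right]_{z=1}^{z=3} dy \, dx = 2 \int_{-1}^{1} \int_0^2 \left( \frac{16}{3} + x^2 + y^2 \right) dy \, dx \\
&= 2 \int_{-1}^{1} \left[ \frac{16}{3} y + x^2 y + \frac{y^3}{3} \right]_{y=0}^{y=2} dx = 4 \int_{-1}^{1} \left( \frac{20}{3} + x^2 \right) dx = 8 \int_0^1 \left( \frac{20}{3} + x^2 \right) dx \\
&= 8 \left[ \frac{20}{3} x + \frac{x^3}{3} \right]_0^1 = \frac{168}{3} = 56 .
\end{aligned}
\]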
The simplest interpretation of a triple integral is as the volume of the integration domain Ω; in
this case, the integrand is f ≡ 1 as in the next example:
Example 5.12: Please find the volume of a prism P in R3 which lies in the first octant and is
bounded by the plane passing through the points (a, 0, 0), (0, b, 0) and (0, 0, c) where a, b, c > 0
(see Figure 5.11).
Figure 5.11: (Left) Prism P in the first octant bounded by the plane (x/a) + (y/b) + (z/c) = 1.
The blue rays indicate the z-integration from z = 0 to the bounding plane (upper surface).
(Right) The triangular domain in the x, y-plane that the prism lies above. Here the gold rays
indicate y-integration from y = 0 to y = b(1 − x/a).
Answer: First recall that the general equation of a plane in three-dimensional space is Ax +
By + Cz = D where A, B, C and D are constants, and that three noncollinear points uniquely
determine a plane. So the only plane that passes through these three points is
\[
\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1 .
\]
Solving for z, one finds that the upper surface is z = c(1 − x/a − y/b); the lower surface for
this prism is of course z = 0. Next we must find the domain in the x, y-plane that this prism
sits above; in this case, this domain is the triangular base of the prism (see Figure 5.11). The
diagonal line that bounds this triangular base in the first quadrant is defined by the intersection
of the upper and lower surfaces: y = b(1 − x/a). Setting up the double integral over this
triangular base is then the same as it was in the previous section; assuming that integration in
the y-direction is carried out first, one must integrate from the lower curve y = 0 to the upper
curve y = b(1 − x/a), then finally integrate across all x-values associated with the domain, from
x = 0 to x = a. So the integration is
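\[
\begin{aligned}
\iiint_P 1 \, dV &= \int_0^a \left( \int_0^{b(1-x/a)} \left( \int_0^{c(1-x/a-y/b)} dz \right) dy \right) dx = \int_0^a \left( \int_0^{b(1-x/a)} c \left( 1 - \frac{x}{a} - \frac{y}{b} \right) dy \right) dx \\
&= c \int_0^a \left[ y - \frac{xy}{a} - \frac{y^2}{2b} \right]_{y=0}^{y=b(1-x/a)} dx = bc \int_0^a \left[ \left( 1 - \frac{x}{a} \right) - \frac{x}{a} \left( 1 - \frac{x}{a} \right) - \frac{1}{2} \left( 1 - \frac{x}{a} \right)^2 \right] dx = \frac{abc}{6} .
\end{aligned}
\]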
Domains can become even more complicated than this first prism—indeed they can become so
complicated that setting up an explicit integral for them may be difficult. The next example
deals with a somewhat more complicated prism; the next subsection deals with domains which
would be complicated to describe in rectangular coordinates, but which are much easier to
describe in either cylindrical or spherical coordinates.
Example 5.13: Please find the volume V of the prism P in the first octant bounded above
by the plane z = 5 − 2x − y and below by the plane z = 3x + 4y (see Figure 5.12).
Figure 5.12: (Left) Prism in the first octant bounded above by the plane z = 5 − 2x − y
and below by the plane z = 3x + 4y. The blue rays indicate the z-integration from the lower
bounding plane to the upper bounding plane. (Right) The triangular domain in the x, y-plane
that the prism lies above. Here the gold rays indicate y-integration from y = 0 to y = 1 − x.
Answer: Perhaps the most important thing in setting up and evaluating integrals over more com-
plicated domains is an accurate drawing of the domain(s) involved. Sometimes such drawings
are given, but if not, then they must be sketched. Here the domains are shown in Figure 5.12.
In this case, from both the diagram on the left in Figure 5.12 and the problem statement, it
seems clear that the inner integration can go vertically from the lower surface z = 3x + 4y to
the upper surface z = 5 − 2x − y. The more complicated issue, perhaps, is what should be
the integration limits for the other two dimensions, that is, the x and y limits. The question here is
again “Which portion of the x,y-plane does this prism lie above?” Notice that in this case,
the outer edge of this prism is defined as the intersection of the upper and lower planes. This
intersection is found, of course, by setting equal the z values for both planes. The result in
terms of x and y is that x + y = 1. This is shown on the right side of Figure 5.12: it is the
triangular region bounded by the x and y axes and the line y = 1 − x. Here the integration is
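\[
\begin{aligned}
V = \iiint_P 1 \, dV &= \int_0^1 \int_0^{1-x} \int_{3x+4y}^{5-2x-y} dz \, dy \, dx = \int_0^1 \int_0^{1-x} \left( 5 - 5x - 5y \right) dy \, dx \\
&= 5 \int_0^1 \left[ y - xy - \frac{y^2}{2} \right]_{y=0}^{y=1-x} dx = \frac{5}{2} \int_0^1 (1 - x)^2 \, dx = \frac{5}{6} .
\end{aligned}
\]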
What should one do if a drawing of the integration domain is not given? Often the answer
is that it must be drawn, perhaps by hand, perhaps using a computer. But sometimes it is
possible to find the integration limits without an actual drawing of Ω.
Example 5.14: Please find the volume of the object Ω bounded by the surfaces z = x2 + y 2 ,
y = x2 , y = 9, z = sin πx and x = 1.
Answer: At first glance, this may seem like a mess of equations—and a mess of surfaces. But
notice that three of these equations involve only x and y, not z, so they bound a domain D in
the x, y-plane. Considering these three equations, and perhaps making a quick sketch of them
in the x, y-plane, one finds that integration limits in y are from y = x2 to y = 9, and in x from
x = 1 to x = 3. The two remaining equations both involve the third variable z; notice that
based on the domain D in the x, y-plane, the surface z = x² + y² ≥ 2 on D, while z = sin πx ≤ 1.
This tells us which is the upper surface and which is the lower. So the integration is
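\[
\begin{aligned}
V = \iiint_\Omega 1 \, dV &= \int_1^3 \left( \int_{x^2}^{9} \left( \int_{\sin \pi x}^{x^2+y^2} dz \right) dy \right) dx \\
&= \int_1^3 \left( \frac{x^4}{3} + 4x^2 + 27 - \sin \pi x \right) \left( 9 - x^2 \right) dx = \frac{43208}{105} + \frac{8}{\pi} \simeq 414.05 .
\end{aligned}
\]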
Notice that several integrals involving sin πx must be zero because of its symmetry on [1, 3].
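Because the arithmetic in an example like this one is easy to get wrong, a numerical cross-check is worthwhile. The sketch below assumes the limits described above (1 ≤ x ≤ 3, x² ≤ y ≤ 9, sin πx ≤ z ≤ x² + y²); the innermost z-integral of 1 is just the height between the two surfaces, and the remaining two integrals are done with a composite Simpson rule (the helper simpson is introduced only for this illustration). The closed form in the code is the result of integrating the polynomial and sine parts separately by hand.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the single-variable integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def height(x, y):
    # Innermost integral of 1 dz: upper surface minus lower surface.
    return (x**2 + y**2) - math.sin(math.pi * x)

# Middle integral in y from the parabola y = x^2 to the line y = 9, then x from 1 to 3.
volume = simpson(lambda x: simpson(lambda y: height(x, y), x**2, 9.0), 1.0, 3.0)

closed_form = 43208 / 105 + 8 / math.pi
print(volume, closed_form)
```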
When the domain of integration and the integrand have rectangular symmetry (or no symmetry
at all), using rectangular coordinates in setting up and evaluating integrals makes perfect sense.
But what happens when a problem involves cylindrical or spherical symmetry? Can one use
those coordinate systems to simplify integration? The answer of course is yes, but to take
advantage of either of these systems, one must be able to express dV in one or both of these
coordinate systems.
\[
\iint_D \left( \int_{c(x,y)}^{d(x,y)} f(x, y, z) \, dz \right) r \, dr \, d\theta = \iiint_\Omega f(r \cos\theta, r \sin\theta, z) \, dz \; r \, dr \, d\theta
\]
where the integration domain Ω lies between the surfaces z = c(x, y) (below) and z = d(x, y)
(above) and projects onto D in the x, y-plane. In the above relation, each equality is based
on a version of the Fubini theorem. A heuristic drawing which suggests why dV in a triple
integral corresponds to dz r dr dθ for an iterated integral in cylindrical coordinates is shown in
Figure 5.13.
Figure 5.13: Schematic showing ∆V in cylindrical coordinates. Here the polar area ∆A shown
in Figure 5.6 is projected in the vertical z direction to a thickness ∆z to form the cylindrical
∆V shown in blue. ∆V = (∆z)(∆r)(r∆θ) ⇒ dV → dz r dr dθ.
where Ω is the cylinder bounded by the circle x2 + y 2 = 16 and the planes z = 3 and z = 7.
Answer: It seems clear that nothing has more cylindrical symmetry than a cylinder, so the
integration domain begs for the use of cylindrical coordinates. The integrand 1 + x does not
have cylindrical symmetry, but this would seem to be a small price to pay for the benefit of
having constant integration limits with z running from 3 to 7, r from 0 to 4, and θ from 0 to
2π. Carrying out the integration, one finds that
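\[
\begin{aligned}
\iiint_\Omega 1 + x \, dV &= \int_0^{2\pi} \int_0^4 \int_3^7 \left( 1 + r \cos\theta \right) dz \; r \, dr \, d\theta \\
&= 4 \int_0^{2\pi} \int_0^4 \left( 1 + r \cos\theta \right) r \, dr \, d\theta = 4 \int_0^{2\pi} \left[ \frac{r^2}{2} + \frac{r^3}{3} \cos\theta \right]_{r=0}^{r=4} d\theta \\
&= 64 \int_0^{2\pi} \left( \frac{1}{2} + \frac{4}{3} \cos\theta \right) d\theta = 64 (\pi + 0) = 64\pi .
\end{aligned}
\]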
Notice that the second part of the integral being zero could have been anticipated since x has
odd symmetry relative to this cylinder (it equally takes on positive and negative values on
halves of the cylinder).
Example 5.16: Please find the volume of the parabolic dome P bounded above by the
paraboloid z = 8 − x2 − y 2 and below by the x, y-plane (see Figure 5.14).
Figure 5.14: The parabolic dome bounded above by the paraboloid z = 8 − x² − y² and below
by the x, y-plane (z = 0). The base of the dome is the disk of radius r = 2√2.
Answer: Even though this dome is not a cylinder, it does have a large degree of cylindrical
symmetry, suggesting that cylindrical coordinates are appropriate. As indicated in Figure 5.14,
the inner-most z-integration goes from z = 0 up to the paraboloid z = 8 − x2 − y 2 = 8 − r2 .
Integration in the x, y-plane is then over the base of this parabolic dome; the outer edge of this
base is the circle defined by the paraboloid z = 8 − r2 intersecting the x, y-plane where z = 0:
\[
z = 0 = 8 - r^2 \quad \Rightarrow \quad r = 2\sqrt{2} .
\]
Thus the radial integration goes from r = 0 to r = 2√2, and the angular integration goes once
around the plane from θ = 0 to θ = 2π. The integration is then
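\[
\begin{aligned}
\iiint_P dV &= \int_0^{2\pi} \left( \int_0^{2\sqrt{2}} \left( \int_0^{8-r^2} dz \right) r \, dr \right) d\theta \\
&= 2\pi \int_0^{2\sqrt{2}} \left( 8 - r^2 \right) r \, dr = 2\pi \left[ 4r^2 - \frac{r^4}{4} \right]_0^{2\sqrt{2}} = 2\pi (32 - 16) = 32\pi .
\end{aligned}
\]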
Notice that since the integration limits in r and z and the integrand itself do not depend on
θ, the outer θ integral may be evaluated first. Indeed this is frequently the case for cylindrical
integration—one often can immediately find a factor of 2π in front of the integration.
where the Jacobian determinant in this case is
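\[
\frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} =
\begin{vmatrix}
\dfrac{\partial x}{\partial \rho} & \dfrac{\partial x}{\partial \phi} & \dfrac{\partial x}{\partial \theta} \\[2ex]
\dfrac{\partial y}{\partial \rho} & \dfrac{\partial y}{\partial \phi} & \dfrac{\partial y}{\partial \theta} \\[2ex]
\dfrac{\partial z}{\partial \rho} & \dfrac{\partial z}{\partial \phi} & \dfrac{\partial z}{\partial \theta}
\end{vmatrix}
= \rho^2 \sin\phi .
\]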
Intuitively one may arrive at this change of variable identity as is shown in Figure 5.15.
Figure 5.15: Schematic showing ∆V in spherical coordinates, with r = ρ sin φ. Note that unlike
in previous cases, none of the edges of ∆V (shown in blue) are parallel here.
∆V = (ρ∆φ)(∆ρ)(ρ sin φ∆θ) ⇒ dV → ρ² sin φ dρ dφ dθ.
The above formula may make spherical integration seem complicated; in fact, when the
integration domain and integrand have spherical symmetry, the use of spherical coordinates
greatly simplifies our work, as the next example makes clear.
Answer: For simplicity, let us call the sphere S, its volume, V , and assume that it is centred
at the origin (0, 0, 0). Putting the centre at the origin is a matter of convenience; the volume
would be the same no matter where it is centred. Then
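\[
V = \iiint_S dV = \iiint_S \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta
= \int_0^{2\pi} \int_0^{\pi} \int_0^{R} \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta
= 2\pi \, \frac{R^3}{3} \int_0^{\pi} \sin\phi \, d\phi = \frac{4}{3} \pi R^3 .
\]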
This is of course the famous formula for the volume of a sphere. It can be found in a number
of other ways, including integrating in rectangular coordinates, but this would be much more
complicated. It is best to take as much advantage of symmetry as possible.
The next example introduces another interpretation of a triple integral: mass. To obtain the
mass M of an object Ω, one must integrate the mass density function δ over the entire volume (or
object) Ω. In physical terms, mass density has units of mass per unit volume, so this integration
in effect multiplies mass per unit volume by volume to obtain mass:
\[
M = \iiint_\Omega \delta(x, y, z) \, dV .
\]
The next example also makes clear that seeing symmetry may be difficult. Still there is often
a payoff in terms of computational simplicity to taking advantage of the symmetry whenever
possible.
Example 5.18: The Ice Cream Cone. Please find the mass M of an ice cream cone, C,
bounded below by the cone $z^2 = 3(x^2 + y^2)$, bounded above by the sphere $x^2 + y^2 + z^2 = 1$, if
density is proportional to z, and the entire ice cream cone is above z = 0 (see Figure 5.16).
Figure 5.16: The ice cream cone bounded below by $z^2 = 3(x^2 + y^2)$ and above by $x^2 + y^2 + z^2 = 1$.
The blue rays indicate integration from the lower surface to the upper surface, while the gold
disk in the x, y-plane has radius r = 1/2 and is the domain for polar integration.
Answer 1: Since the density is proportional to z, we have that $\delta(x, y, z) = \delta_o z$ for some constant
$\delta_o$. Because the density is proportional to z, and because of the geometry of the cone itself,
cylindrical coordinates are the obvious choice for this integral. So the innermost z-integral goes
from the lower surface to the upper surface, and the outer double integral in polar coordinates
must cover the disk in the x, y-plane that the ice cream cone stands above. The radius of this
disk is determined by the intersection of the sphere above and the cone below: $z^2/3 = r^2 = 1 - z^2$,
implying that $z = \sqrt{3}/2$ and $r = 1/2$. Hence
\[
M = \int_0^{2\pi} \int_0^{1/2} \left( \int_{\sqrt{3}\,r}^{\sqrt{1 - r^2}} \delta_o z \, dz \right) r \, dr \, d\theta
= \pi \delta_o \int_0^{1/2} \left( r - 4r^3 \right) dr
= \pi \delta_o \left[ \frac{1}{2} r^2 - r^4 \right]_0^{1/2} = \pi \delta_o / 16 .
\]
The θ-integral may be evaluated immediately because neither the limits of the other two
integrals nor the integrand depends on θ.
Figure 5.17: The ice cream cone integration using spherical coordinates. ρ-integration is
indicated by blue rays from the origin to the sphere ρ = 1; ϕ-integration is indicated by red arc
rays from the z-axis to the cone ϕ = π/6; and θ-integration is indicated by a gold loop starting
and ending at the positive x-axis.
Answer 2: Although it is less obvious, this cone and integrand also have spherical symmetry.
Indeed all of the integration limits for this cone are constant in spherical coordinates. The
ρ-integral must go from the origin ρ = 0 to the sphere ρ = 1; the ϕ-integral must go from the
positive z-axis to the cone ϕ = π/6; finally the θ-integral must go from 0 to 2π (see Figure 5.17).
So in spherical coordinates the integration is
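\[
M = \iiint_C \delta_o z \, dV = \delta_o \int_0^{2\pi} \int_0^{\pi/6} \int_0^1 \rho \cos\varphi \; \rho^2 \sin\varphi \, d\rho \, d\varphi \, d\theta
= 2\pi \delta_o \left( \int_0^1 \rho^3 \, d\rho \right) \left( \int_0^{\pi/6} \cos\varphi \sin\varphi \, d\varphi \right)
= 2\pi \delta_o \left( \frac{1}{4} \right) \left( \frac{1}{8} \right) = \pi \delta_o / 16
\]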
In this case, the triple iterated integral can be factored and written as the product of three
separate integrals because all the limits of integration are constant, and the integrand factors
into the product of three expressions, the first depending only on ρ, the second, only on ϕ, and
the third, only on θ.
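The agreement between the cylindrical and spherical computations in Example 5.18 can also be checked numerically. The sketch below takes $\delta_o = 1$ and uses a simple midpoint rule; the function names and grid sizes are illustrative assumptions, not part of the text.

```python
import math

def mass_cylindrical(n=400):
    """Mass of the ice cream cone (delta_o = 1) using cylindrical coordinates."""
    total = 0.0
    dr = 0.5 / n
    for i in range(n):
        r = (i + 0.5) * dr
        z_lo = math.sqrt(3.0) * r           # the cone z = sqrt(3) r
        z_hi = math.sqrt(1.0 - r * r)       # the sphere z = sqrt(1 - r^2)
        dz = (z_hi - z_lo) / n
        for j in range(n):
            z = z_lo + (j + 0.5) * dz
            total += z * r * dz * dr        # density z times dV = r dz dr dtheta
    return 2.0 * math.pi * total

def mass_spherical(n=400):
    """Mass of the ice cream cone (delta_o = 1) using spherical coordinates."""
    total = 0.0
    drho = 1.0 / n
    dphi = (math.pi / 6.0) / n
    for i in range(n):
        rho = (i + 0.5) * drho
        for j in range(n):
            phi = (j + 0.5) * dphi
            z = rho * math.cos(phi)
            total += z * rho**2 * math.sin(phi) * drho * dphi
    return 2.0 * math.pi * total

print(mass_cylindrical(), mass_spherical(), math.pi / 16.0)
```

Both estimates should be close to π/16, which is the value found in Answer 1 and Answer 2 when $\delta_o = 1$.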
For three-dimensional objects, there is a distinction between the centre, centroid or geometric
centre‡ on the one hand, and the centre of mass or centre of gravity on the other.
Definition: For an object (domain) $\Omega \subset \mathbb{R}^3$, suppose that its density at any point $(x, y, z) \in \Omega$
is given by $\delta(x, y, z) \ge 0$. Recall that the volume of Ω is
\[
|\Omega| \equiv \mathrm{Volume}(\Omega) = \iiint_\Omega dV ,
\]
and
\[
z_m := \frac{1}{M} \iiint_\Omega z \, \delta(x, y, z) \, dV
\]
provided that the appropriate integrals above exist and are finite and there is no division by
zero.
Remarks:
1. Notice that if density is constant, say $\delta(x, y, z) \equiv \delta_o > 0$ for all $(x, y, z) \in \Omega$, then the centre of
mass and the centroid are the same point.
Example 5.19: Please find the centre of mass of the ice cream cone, C, in Example 5.18: again
C is bounded below by the cone $z^2 = 3(x^2 + y^2)$, bounded above by the sphere $x^2 + y^2 + z^2 = 1$,
density is proportional to z, and the entire ice cream cone is above z = 0.
Answer: Notice that because of the symmetry of the cone and its density, $(x_m, y_m) = (0, 0)$ and
only $z_m$ needs to be computed. From Example 5.18, the mass of this ice cream cone is $\pi \delta_o / 16$,
so
\[
z_m = \frac{16}{\pi \delta_o} \iiint_C z^2 \delta_o \, dV .
\]
As before, one has the choice of using either cylindrical or spherical coordinates. In spherical
coordinates, this integral becomes
\[
z_m = \frac{16}{\pi} (2\pi) \left( \int_0^1 \rho^4 \, d\rho \right) \left( \int_0^{\pi/6} \cos^2\varphi \sin\varphi \, d\varphi \right)
= 32 \left( \frac{1}{5} \right) \left( \frac{8 - 3\sqrt{3}}{24} \right) = \frac{32 - 12\sqrt{3}}{15} .
\]
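The value of $z_m$ can be checked by computing the mass and the integral of $z^2$ numerically and taking their ratio (the constant $\delta_o$ cancels). This is a sketch under that assumption; the helper `cone_integral` is a hypothetical name introduced here for illustration.

```python
import math

def cone_integral(g, n=300):
    """Integrate g(rho, phi) over the ice cream cone 0 <= rho <= 1, 0 <= phi <= pi/6.

    The integrand is assumed not to depend on theta, so the theta-integration
    contributes a factor of 2*pi; rho^2 sin(phi) is the spherical Jacobian.
    """
    drho = 1.0 / n
    dphi = (math.pi / 6.0) / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        for j in range(n):
            phi = (j + 0.5) * dphi
            total += g(rho, phi) * rho**2 * math.sin(phi) * drho * dphi
    return 2.0 * math.pi * total

mass = cone_integral(lambda rho, phi: rho * math.cos(phi))            # density z
moment = cone_integral(lambda rho, phi: (rho * math.cos(phi)) ** 2)   # z times density z
z_m = moment / mass
print(z_m, (32.0 - 12.0 * math.sqrt(3.0)) / 15.0)
```

Since $z_m \approx 0.75$ while the cone only reaches up to z = 1, the centre of mass sits well toward the top, as one would expect for a density that grows with z.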
The integrals in the expressions for the centroid and the centre of mass have names in their
own right:
Notice that z is the signed distance between the point (x, y, z) and the xy-plane, and this
explains its presence in the definition of Mxy . The analogous statement is true for Mxz and
Myz . So x̄ = Myz /|Ω|, ȳ = Mxz /|Ω| and z̄ = Mxy /|Ω|.
Remarks:
1. These definitions for moments are given based on the centroid rather than the centre
of mass; some authors would include the density in each definition, and thus base their
definitions on the centre of mass. Again, one must simply know the choices a given author
has made.
2. The name “first moment” suggests that there is a second moment, and indeed this is the
case. The second moment is the moment of inertia and is defined relative to an axis, rather
than a plane. The moment of inertia for an object Ω is important in discussing angular
momentum, and about the z-axis is defined as
\[
I_z := \iiint_\Omega \left( x^2 + y^2 \right) \delta(x, y, z) \, dV .
\]
Notice that for the moment of inertia one expects to see the density included. Also notice
that $x^2 + y^2$ is the square of the distance from the point (x, y, z) to the z-axis.
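To illustrate the definition of $I_z$, the sketch below evaluates it numerically for a solid cylinder of radius a and height h whose axis is the z-axis, with constant density. The cylinder, the function name and the sample values are assumptions chosen for illustration; they do not appear in the text. In cylindrical coordinates $x^2 + y^2 = r^2$, so the integrand becomes $r^2 \, \delta \, r$.

```python
import math

def inertia_z_cylinder(a, h, delta, n=1000):
    """Midpoint-rule estimate of I_z for a solid cylinder about its own axis."""
    dr = a / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += (r * r) * delta * r * dr   # (x^2 + y^2) * density * Jacobian r
    return 2.0 * math.pi * h * total        # theta- and z-integrations are trivial here

a, h, delta = 2.0, 3.0, 1.5
mass = delta * math.pi * a**2 * h
print(inertia_z_cylinder(a, h, delta), 0.5 * mass * a**2)
```

The two printed values should agree, matching the closed form $I_z = \tfrac{1}{2} M a^2$ obtained by evaluating the same integral by hand.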
Exercises 5
1. Following Example 5.1, please use a uniform partition with the left endpoint of each
subinterval as the sampling point to compute
\[
\int_0^4 x^3 \, dx
\]
as the limit of Riemann sums. Use standard integration to confirm your answer.
Hint: You should confirm that here $\Delta x = 4/n$ and $\xi_i = 4i/n$. You may need to look up
the formula for the sum of the first n integers cubed.
3. Please evaluate the following integral and sketch the integration domain.
\[
\int_0^2 \left( \int_{3x - 1}^{5x^2 + 2} x \, dy \right) dx
\]
for each of the following integration domains. You may choose the most convenient
integration order and variables.
Hint: A sketch may help.
(a) D is the triangle bounded by the y-axis, the line y = 3, and the line y = 3x.
(b) D is the disk sector in the first quadrant bounded by the x-axis, the line y = x, and
the circle $x^2 + y^2 = 5$.
(c) D is the domain bounded by the line y = 2x, and the parabola $x = y^2$.
(d) D is the triangle bounded by the x-axis, the line y = 2x, and the line y = 4x − 2.
(e) D is the semicircular disk in the first quadrant bounded by the y-axis, and the circle
$x^2 + y^2 = 4y$.
5. For the following integral, sketch the integration domain, and reverse the order of
integration (i.e., put the x-integration on the inside and the y-integration on the outside).
\[
\int_0^1 \left( \int_x^{\sqrt{x}} f(x, y) \, dy \right) dx
\]
6. What is the area of the portion of the first quadrant bounded by the lemniscate r = sin 2θ?
Hint: A sketch may help.
Answer: π/8
(a) D is the triangle bounded by the y-axis, the line y = 3, and the line y = 3x.
(b) D is the semicircular disk bounded above by the unit circle and below by the x-axis.
(Notice that by symmetry, x̄ = 0, so only ȳ must be computed.)
(c) D is bounded by the parabola $y = x^2 - 2x$ and the line y = x.
8. For the function $f(x, y) = \sqrt{x^2 + y^2}$,
10. What is the volume of the spherical cap above the plane z = 1 and below the sphere
$x^2 + y^2 + z^2 = 4$?
11. Please compute the volume of the solid paraboloid bounded below by $z = x^2 + y^2$ and
above by the plane z = 5.
Answer: 25π/2
12. Compute
\[
\iiint_\Omega \left( x + y^2 \right) dV .
\]
13. What is the volume of the prism in the first octant below the plane z = 6 − x − 2y and
above the plane z = 2x + y?
Answer: 4
14. Please compute the Jacobian determinant for the integration change of variables from
rectangular to spherical coordinates discussed in Section 3.2.b above.
Hint: Expand by the first row, and take out front the common $\rho^2 \sin\phi$ factor.
15. Using spherical coordinates, please confirm that the volume of a sphere of radius R is
$\frac{4}{3} \pi R^3$.
16. Find the volume of the discus bounded above by $x^2 + y^2 + z^2 = 1$ and below by $x^2 + y^2 + (z - 3)^2 = 8$.
17. Please find the mass M of the bounded paraboloid where $x^2 + y^2 \le z \le 1$ if the density
is proportional to 1 − z.
Answer: $\pi \delta_o / 3$ where $\delta_o$ is the proportionality constant for density.
18. (a) Find the centre of mass of the ice cream cone in Example 5.18.
(b) Find the centroid of the ice cream cone in Example 5.18.
19. Please explain why, if density is a positive constant, then the centroid and the centre of
mass are the same point.
20. Compute the moment of inertia about the z-axis, $I_z$, for a solid sphere centred at the
origin of radius R assuming that the density is constant.
Answer: $8\pi R^5 / 15$
Chapter VI
The first thing that everyone should know about line integrals is that they only occasionally
involve integrating over lines. Rather these integrals are generally over curves in $\mathbb{R}^2$ or $\mathbb{R}^3$, but
somehow the term “curve integral” has never taken root. They are sometimes called contour
integrals, but this term is more common for integration in the complex plane rather than $\mathbb{R}^n$.
provided that the integrals on the right of := exist and are finite.
Remarks:
1. In what follows, n = 2 or n = 3, though the definition above is the same for larger integer
values of n ∈ Z+ .
2. The line integral above represents or measures the tendency of the vector field F to flow
along the curve C from the beginning point α to the ending point ω. Later we will discuss
a separate type of line integral that represents or measures the tendency of the vector
field F to flow across the curve C.
Figure 6.1: The Line Integral of the vector field F from α to ω along a curve parameterized by
the vector function x. The derivative $\dot{x}$ is tangent to the curve, pointing in the direction from
α to ω. At $t_2$ the component of F in the direction of $\dot{x}$ is positive, while at $t_1$, this component
is negative.
3. In this definition, “smooth” can be replaced by “piecewise smooth” and then the definition
can be applied separately to each piece. See Proposition 7 and Exercise 6.2 below.
There are a couple of basic results about line integrals that will be used frequently in this
chapter. From the definition above, one might wonder if the value of a line integral can depend
on how the curve C is parameterized. As one would hope, this is not the case, as the following
proposition states:
provided that the integrals on the right both exist and are finite. In other words, the value
of a line integral depends on the vector field F and the curve C, but not how that curve is
parameterized.
The next proposition allows us to break curves into several pieces, or to reverse the direction of
integration:
any smooth curve in $\mathbb{R}^n$, then −C is the same curve, but traversed in the opposite direction.
Then
\[
\int_{C_1 + C_2} F(x) \cdot dx = \int_{C_1} F(x) \cdot dx + \int_{C_2} F(x) \cdot dx
\]
and
\[
\int_{-C} F(x) \cdot dx = - \int_{C} F(x) \cdot dx
\]
The proof of the first proposition above is just a change of variables; the proof of the second is
left as an exercise (see Exercise 6.2 below).
We begin by directly evaluating several line integrals using the above definition. Later we will
see that in some cases, the evaluation of line integrals may be simplified.
Example 6.1: Given a vector field $F(x, y, z) = \langle z, x, y \rangle$ and a smooth curve C parameterized
by
\[
x = t^2 , \qquad y = t , \qquad z = t^3
\]
It might seem that Proposition 6 implies that it does not matter how a curve is parameterized
when a line integral is computed—this is not quite true. While the value of the line integral
will be the same for any smooth parameterization, computing the integral may be easier or
harder depending on the choice of parameterization; one should always try to choose an easier
approach.
where C is the semicircle $x^2 + y^2 = 1$, $y \ge 0$, starting at α = (1, 0) and ending at ω = (−1, 0)
(see Figure 6.2), and $f(x, y) = \langle -y, x \rangle$.
Figure 6.2: The curves C (semicircle in dark blue) and $C_1$ (line segment in light blue) from α
(1, 0) to ω (−1, 0). The arrows indicate the direction of integration.
Answer: At first glance, one might think that setting $x = -t$, $y = \sqrt{1 - t^2}$ for $t \in [-1, 1]$
would be an effective parameterization. But this approach requires that one carefully simplify
and integrate an expression involving a square root. A much easier overall approach is to use
trig functions: let x = cos t and y = sin t. Then
\[
\int_C f(x) \cdot dx = \int_0^{\pi} \langle -\sin t, \cos t \rangle \cdot \langle -\sin t, \cos t \rangle \, dt = \int_0^{\pi} \left( \sin^2 t + \cos^2 t \right) dt = \pi .
\]
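The claim that the value of a line integral does not depend on the parameterization can be illustrated numerically for this semicircle. The sketch below approximates the integral of $f \cdot dx$ by summing $f$ at the midpoint of each small step times the displacement over that step. The helper `line_integral` and the second parameterization (angle $\pi s^2$ for $0 \le s \le 1$) are assumptions introduced for illustration.

```python
import math

def line_integral(field, curve, t0, t1, n=20000):
    """Approximate the integral of field . dx along curve(t) for t0 <= t <= t1."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        ta = t0 + i * dt
        tb = ta + dt
        xa, ya = curve(ta)
        xb, yb = curve(tb)
        fx, fy = field(*curve(0.5 * (ta + tb)))     # field at the middle of the step
        total += fx * (xb - xa) + fy * (yb - ya)    # f . (small displacement)
    return total

def f(x, y):
    return (-y, x)

def trig(t):
    # the parameterization used in the answer above, 0 <= t <= pi
    return (math.cos(t), math.sin(t))

def trig_squared(s):
    # same semicircle, traversed at a non-uniform rate, 0 <= s <= 1
    return (math.cos(math.pi * s * s), math.sin(math.pi * s * s))

value_trig = line_integral(f, trig, 0.0, math.pi)
value_squared = line_integral(f, trig_squared, 0.0, 1.0)
print(value_trig, value_squared, math.pi)
```

Both parameterizations trace the same curve in the same direction, so both estimates should be close to π.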
Proposition 6 also does not say that the value of a line integral in general depends only on the
endpoints; in general if one changes the integration path (the curve), one changes the integral
value.
Answer: Notice that these are the same endpoints as in the previous example, but not the
same curve. In this case, one can simply take y = 0, x = −t and integrate with respect to t
from −1 to 1:
\[
\int_{C_1} f(x) \cdot dx = \int_{-1}^{1} \langle 0, (-t) \rangle \cdot \langle -1, 0 \rangle \, dt = - \int_{-1}^{1} 0 \, dt = 0 .
\]
6.1.2 Path Dependence; Path Independence
In the previous section, line integrals were computed along curves that start at a point α and
end at a point ω. One might wonder, given a certain α and ω, does it matter which
curve (path) is used to connect them? In general, as we saw in Example 6.2 and Example 6.3,
the answer is “Yes”: different integration paths lead to different values for the integral. But
there is an important class of vector fields whose line integrals depend only on the starting and
ending points α and ω, and not on the integration path or curve C that connects them.
Definition: A vector field F is conservative on its open domain if and only if, given any two
points α and ω in the domain, the value of the line integral from α to ω is the same, independent
of which path or curve C through the domain is chosen to move from α to ω:
\[
\int_C F(x) \cdot dx = \int_{\alpha}^{\omega} F(x) \cdot dx
\]
Remark: There is a technical point here that is worth mentioning: Saying that a vector field
“is conservative on its open domain” implicitly requires that this domain is pathwise connected
since the definition of conservative requires that there is at least one curve between any two
given points in the domain.
Conservative vector fields are among the most important in nature. Electrical fields are
conservative, and gravitational fields are conservative when friction forces can be neglected. The
applications of these mathematical results are not discussed here, but they should be the central
part of any good treatment of basic physics.
One immediate consequence of a vector field being conservative is that a line integral around
any closed curve (a curve that begins and ends at the same point, so α = ω) is zero.
Theorem 18 A vector field F is conservative on its open domain $D \subset \mathbb{R}^n$ if and only if the
line integral over any closed curve $O \subset D$ is zero:
\[
\oint_O F(x) \cdot dx = 0
\]
Proof: First assume that the line integral over any closed curve O is zero, and suppose that
$C_1$ and $C_2$ are any two curves lying in the domain $D \subset \mathbb{R}^n$ that each begin at the point α and
end at the point ω. Then $C_1 - C_2 = C_1 + (-C_2)$ is a closed curve, so by assumption
\[
\int_{C_1 - C_2} F(x) \cdot dx = \int_{C_1} F(x) \cdot dx - \int_{C_2} F(x) \cdot dx = 0 ,
\]
implying of course that
\[
\int_{C_1} F(x) \cdot dx = \int_{C_2} F(x) \cdot dx .
\]
Since $C_1$ and $C_2$ were arbitrary, F is conservative.
The proof for the opposite direction is perhaps most easily done by considering the
contrapositive: assume that there is a closed curve O over which the line integral is not zero. Pick any
two distinct points on O as α and ω; let $C_1$ be one portion of O moving from α to ω, and let
$C_2$ be the other portion. Then
\[
\oint_O F(x) \cdot dx = \int_{C_1 - C_2} F(x) \cdot dx = \int_{C_1} F(x) \cdot dx - \int_{C_2} F(x) \cdot dx \ne 0 ,
\]
implying that
\[
\int_{C_1} F(x) \cdot dx \ne \int_{C_2} F(x) \cdot dx ,
\]
so F is not conservative.
It would seem clear that if a vector field is conservative, then computing line integrals of this
vector field should be simpler, but the obvious questions are (1) How does one recognize that
a vector field is conservative? and (2) How does one take advantage of the fact that the vector
field is conservative? Happily, these questions both have at their root the same answer: find a
potential function.
Of course, the really important thing is the connection between the previous two definitions:
Theorem 19 A vector field F is conservative on its open domain $D \subset \mathbb{R}^n$ if and only if there
is a potential function ϕ defined on the same domain such that $F(x) = \nabla\varphi(x)$ for all $x \in D$.
Proof: Only the backward direction of this proof is discussed here; the forward version is
discussed in Exercise 6.7. Suppose that a potential function ϕ exists with F = ∇ϕ. Suppose
that the vector function x smoothly traces out the curve C beginning at $t = t_o$ and ending at
$t = t_1$. Then using the chain rule and a change of variables, one can reduce the line integral to
a single-variable integral:
\[
\int_C F(x) \cdot dx = \int_C \nabla\varphi(x) \cdot \frac{dx}{dt} \, dt
= \int_{t_o}^{t_1} \left( \frac{\partial\varphi}{\partial x_1} \frac{dx_1}{dt} + \dots + \frac{\partial\varphi}{\partial x_n} \frac{dx_n}{dt} \right) dt
= \int_{t_o}^{t_1} \frac{d}{dt} \varphi(x(t)) \, dt = \varphi(x(t)) \Big|_{t_o}^{t_1} = \varphi(\omega) - \varphi(\alpha)
\]
where $t_o$ corresponds to α (the beginning of the curve) and $t_1$, to ω (the end of the curve).
∗
In physics, particularly in electromagnetism, there may be a negative sign: E = −∇V where E is the
electric (vector) field and V is the electrical potential (function). This sign is just a matter of convenience or
inconvenience, depending on one's point of view, but the general definition of potential function as given here
does not include a negative sign.
where C is any curve starting at α = (1, 1), ending at ω = (3, 5), and $f(x, y) = \langle 2x + y, x + 2y \rangle$.
Answer: In this case, if there is going to be a single value for the integral, it must be the
same no matter which curve from α to ω is chosen. This suggests that we should look
for a potential function. The method for finding this potential function can be called partial
integration because it is essentially the reverse of partial differentiation. If the potential function
ϕ exists, then from its definition, we know that the components of f must equal the partial
derivatives of ϕ:
\[
f_1 = \frac{\partial\varphi}{\partial x} = 2x + y , \qquad f_2 = \frac{\partial\varphi}{\partial y} = x + 2y .
\]
Thus we can find how ϕ depends on x by integrating the first of these two equations with
respect to x (and treating y as a constant), and then we can differentiate with respect to y:
\[
\varphi(x, y) = x^2 + xy + c(y) \implies \frac{\partial\varphi}{\partial y} = x + c'(y)
\]
where c is the integration constant which here may depend on y. The question now is whether the
two expressions for ∂ϕ/∂y can be reconciled. In the present example, the answer is yes, provided
that $c'(y) = 2y$, and thus $c(y) = y^2$. One may add any constant, but it is usually convenient to
take this constant to be zero. Hence in this example, $\varphi(x, y) = x^2 + xy + y^2$.
Finally, to compute the actual line integral, one needs only to compute the difference between
the potential function evaluated at the two points:
\[
\int_C f(x) \cdot dx = \int_{(1,1)}^{(3,5)} f(x) \cdot dx = \varphi(x, y) \Big|_{(1,1)}^{(3,5)} = \left( x^2 + xy + y^2 \right) \Big|_{(1,1)}^{(3,5)} = 49 - 3 = 46
\]
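Path independence for this field can be checked numerically by integrating along two different curves from (1, 1) to (3, 5) and comparing with the potential difference. In the sketch below, the helper `line_integral` and the two sample curves are assumptions chosen for illustration; any smooth curves with the same endpoints would serve.

```python
def line_integral(field, curve, n=20000):
    """Approximate the integral of field . dx along curve(t) for 0 <= t <= 1."""
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        xa, ya = curve(i * dt)
        xb, yb = curve((i + 1) * dt)
        fx, fy = field(*curve((i + 0.5) * dt))      # field at the middle of the step
        total += fx * (xb - xa) + fy * (yb - ya)
    return total

def f(x, y):
    return (2.0 * x + y, x + 2.0 * y)

def phi(x, y):
    # the potential function found above
    return x * x + x * y + y * y

def straight(t):
    # line segment from (1, 1) to (3, 5)
    return (1.0 + 2.0 * t, 1.0 + 4.0 * t)

def curved(t):
    # a different smooth curve with the same endpoints
    return (1.0 + 2.0 * t * t, 1.0 + 4.0 * t**3)

print(line_integral(f, straight), line_integral(f, curved), phi(3, 5) - phi(1, 1))
```

All three printed numbers should be close to 46, in agreement with the computation above.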
What happens if one attempts the partial integration process discussed above on a vector field
that does not have any potential function? The next example addresses this situation.
Example 6.5: Show that no potential function exists for the vector field $f(x, y) = \langle 2xy, 2xy \rangle$.
Answer: As in the previous example, let us try to find a potential function ϕ. Again we know
that the components of f must equal the partial derivatives of ϕ:
\[
f_1 = \frac{\partial\varphi}{\partial x} = 2xy , \qquad f_2 = \frac{\partial\varphi}{\partial y} = 2xy .
\]
Integrating the first of these equations with respect to x, one finds that
\[
\varphi(x, y) = x^2 y + c(y) \implies \frac{\partial\varphi}{\partial y} = x^2 + c'(y) .
\]
But from the second component of f, one then has
\[
x^2 + c'(y) = 2xy \implies c'(y) = 2xy - x^2 .
\]
This final equation is a contradiction because c(y) must be a function of y alone; it must be
constant with respect to x. Here, this requirement cannot be satisfied, hence no potential
function ϕ is possible.
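The same conclusion can be reached directly from the definition of a conservative field, by exhibiting two paths with the same endpoints that give different values. The two paths from (0, 0) to (1, 1) below are chosen here only for illustration. Along the straight segment $x = y = t$, and along the path that first follows the x-axis to (1, 0) and then rises vertically to (1, 1), one finds:

```latex
\int_{C_a} f \cdot dx
  = \int_0^1 \langle 2t^2, 2t^2 \rangle \cdot \langle 1, 1 \rangle \, dt
  = \int_0^1 4t^2 \, dt = \frac{4}{3} ,
\qquad
\int_{C_b} f \cdot dx
  = \int_0^1 \langle 0, 0 \rangle \cdot \langle 1, 0 \rangle \, dt
  + \int_0^1 \langle 2t, 2t \rangle \cdot \langle 0, 1 \rangle \, dt
  = 0 + 1 = 1 .
```

Since $4/3 \ne 1$, the field is not conservative, and so by Theorem 19 it cannot have a potential function.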
Up to this point, this section has dealt only with integration of the component of a vector
field along a curve, and hence flow along a curve; now we consider integration that deals with
flow across a curve. Line integrals representing flow along a curve are the primary form of line
integrals, but there are cases where this second form is important. In particular, this second
type of line integral is used to describe flow into and out of regions. This type of integral is a
line integral only in R2 (the x, y-plane).
where s is arclength along the curve C starting from α provided that the integral on the right
exists and is finite.
Remark: If C is a simple closed curve, then n is normally taken to be the outward unit
normal vector. For other curves, one must simply be careful to notice which unit normal vector
is being used.
where C is the circle $x^2 + y^2 = 4$, n is the outward unit normal vector, and $f(x, y) = \langle y, x \rangle$.
is a circle, the outward unit normal vector is the radial vector: $n = x(t)/|x(t)|$. In this case
$n = \langle \cos t, \sin t \rangle$. Thus
\[
\int_C f(x) \cdot n \, ds = \int_0^{2\pi} \langle 2\sin t, 2\cos t \rangle \cdot \langle \cos t, \sin t \rangle \, (2) \, dt = 8 \int_0^{2\pi} \sin t \cos t \, dt = 0 .
\]
Notice that in this example the flow across the circle is generally not zero at any given point
on the circle, but the integral above shows that the net flow is zero.
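The flow across a circle can be approximated numerically by sampling $f \cdot n$ around the circle. This sketch assumes the parameterization $x = R\cos t$, $y = R\sin t$, for which $n = \langle \cos t, \sin t \rangle$ and $ds = R\,dt$; the function name and the second test field $\langle x, y \rangle$ are illustrative additions, not taken from the text.

```python
import math

def flux_across_circle(field, radius, n=20000):
    """Approximate the integral of field . n ds over the circle of the given radius."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        nx, ny = math.cos(t), math.sin(t)            # outward unit normal
        fx, fy = field(radius * nx, radius * ny)     # field at the point on the circle
        total += (fx * nx + fy * ny) * radius * dt   # ds = radius * dt
    return total

swirl = flux_across_circle(lambda x, y: (y, x), 2.0)     # the field of this example
radial = flux_across_circle(lambda x, y: (x, y), 2.0)    # a purely outward field
print(swirl, radial, 8.0 * math.pi)
```

For the field ⟨y, x⟩ the outward and inward contributions cancel, giving a net flow of essentially zero, while the purely radial field ⟨x, y⟩ gives a strictly positive net outward flow of $2\pi R^2 = 8\pi$.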
It is, of course, possible to define still other types of line integrals, but those would be unusual;
the two forms defined here (along the curve, and across the curve) are by far the most common
types to appear in mathematics, science and engineering.
6.2 Surface Integrals: Integration over Surfaces in R3
As it is possible to integrate along curves that are not intervals on the x-axis, it is also possible
to integrate over surfaces that are not simply portions of the x, y-plane. Unlike the line integral
case, however, the component of the integrand that matters most for surface integrals is
the component normal to the surface.
Figure 6.3: The Integral of the vector field F over the surface z = s(x, y) (in blue) above an
integration domain D (in gold).
Remarks:
1. The square-root factor in the definition of the surface integral comes from the tilt of the
surface S relative to the x, y-plane. It is 1 when S is horizontal, greater than 1 when S is
tilted, and will vary with x and/or y unless S is a plane.
2. In this integral, dA will be dx dy = dy dx in rectangular coordinates, or r dr dθ in polar
coordinates.
†
This is not the most general definition for surface integrals, since it requires that the surface lie above some
domain in the x, y-plane; basic closed surfaces like spheres cannot be fully described in this way. Still this
definition can be used to compute and discuss all of the surface integrals that we are interested in here, including
integrals over spheres.
The next two examples show the direct evaluation of two surface integrals using this definition.
where $F(x, y, z) = \langle z, x, y \rangle$, Π is the portion of the plane 2x + 3y + z = 1 lying in the first
octant, and n is the upper unit normal vector for this plane.
Answer: Given this surface Π, the corresponding integration domain D is the triangular region
bounded by the x-axis, the y-axis and the line y = (1 − 2x)/3. Also since Π is a plane, one
can read off the normal vector $N = \langle 2, 3, 1 \rangle$, implying that the upper unit normal vector is
$n = \langle 2, 3, 1 \rangle / \sqrt{14}$. The integral is then
\[
\iint_\Pi F \cdot n \, dS = \int_0^{1/2} \int_0^{(1 - 2x)/3} \big\langle 1 - 2x - 3y, \, x, \, y \big\rangle \cdot \frac{\langle 2, 3, 1 \rangle}{\sqrt{14}} \, \sqrt{(-2)^2 + (-3)^2 + 1} \; dy \, dx
\]
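The iterated integral above can be approximated directly with a midpoint rule over the triangular domain D. The sketch below is an illustrative check only; the function name `plane_flux` and the grid size are assumptions. Note that the factor $\sqrt{14}$ from the tilt of the plane cancels the $\sqrt{14}$ in the unit normal, so the integrand reduces to $2 - x - 5y$.

```python
import math

def plane_flux(n=400):
    """Midpoint-rule estimate of the surface integral of F . n over the plane 2x + 3y + z = 1."""
    normal = (2.0, 3.0, 1.0)
    length = math.sqrt(14.0)                              # |N|, used to normalize N
    tilt = math.sqrt((-2.0) ** 2 + (-3.0) ** 2 + 1.0)     # square-root factor in dS
    dx = 0.5 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        y_max = (1.0 - 2.0 * x) / 3.0                     # upper edge of the triangle D
        dy = y_max / n
        for j in range(n):
            y = (j + 0.5) * dy
            z = 1.0 - 2.0 * x - 3.0 * y                   # height of the plane above (x, y)
            field = (z, x, y)
            f_dot_n = sum(fc * nc for fc, nc in zip(field, normal)) / length
            total += f_dot_n * tilt * dy * dx
    return total

print(plane_flux(), 23.0 / 216.0)
```

Evaluating the reduced integrand $2 - x - 5y$ over the triangle by hand gives 23/216, and the numerical estimate should agree with this value to several digits.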
where $F(\rho, \phi, \theta) = \boldsymbol{\rho}$, $\boldsymbol{\rho}$ being the vector from the origin to a point in three-space, S is the
sphere of radius 2 centred at the origin, and n is the outward unit normal vector for this sphere.
Answer: Technically this surface does not fit the definition for a surface integral given above
because the entire surface cannot be expressed as z = s(x, y), but this issue can be dealt with
by dividing the sphere into two parts: the upper hemisphere and the lower hemisphere. First
consider the upper hemisphere; the corresponding integration domain D is the circular disk in
the x, y-plane centred at the origin with radius 2. The upper hemisphere H is
$z = s(x, y) = \sqrt{4 - x^2 - y^2}$, and thus
\[
\frac{\partial s}{\partial x} = \frac{-x}{\sqrt{4 - x^2 - y^2}} , \qquad \frac{\partial s}{\partial y} = \frac{-y}{\sqrt{4 - x^2 - y^2}} .
\]
For any point on the sphere, $\boldsymbol{\rho}$ is the vector from the origin to that point; because of the
symmetry of the sphere, the outward unit normal vector is $n = \boldsymbol{\rho}/\rho$. The integral for the upper
hemisphere is
\[
\iint_H F \cdot n \, dS = \int_0^{2\pi} \int_0^2 \boldsymbol{\rho} \cdot \frac{\boldsymbol{\rho}}{\rho} \,
\sqrt{ \left( \frac{-x}{\sqrt{4 - x^2 - y^2}} \right)^2 + \left( \frac{-y}{\sqrt{4 - x^2 - y^2}} \right)^2 + 1 } \; r \, dr \, d\theta
= \int_0^{2\pi} \int_0^2 \rho \, \frac{2}{\sqrt{4 - r^2}} \, r \, dr \, d\theta
= 8\pi \int_0^2 \frac{r \, dr}{\sqrt{4 - r^2}} = 16\pi
\]
since ρ = 2 on the surface. The integral over the lower hemisphere is the same because of
symmetry; for the entire sphere S,
\[
\iint_S F \cdot n \, dS = 32\pi .
\]
6.3 Differential Operators
6.3.1 Definitions
There are several “collections” of partial derivatives that arise naturally in mathematics and
science; a couple of these have already come up in our discussion: the gradient vector and the
Jacobian matrix. We now introduce several more: the divergence, the curl and the Laplacian.‡
As the names suggest, the divergence measures or describes the tendency of a vector field to
diverge from a point, while curl measures or describes the tendency of a vector field to circulate
around a point (see below for details).
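Definition: Suppose again that $u : D \subset \mathbb{R}^3 \to \mathbb{R}^3$ is a vector field. The curl of this vector
field u is defined as
\[
\operatorname{curl} u \equiv \nabla \times u :=
\left\langle \frac{\partial u_3}{\partial x_2} - \frac{\partial u_2}{\partial x_3} , \;
\frac{\partial u_1}{\partial x_3} - \frac{\partial u_3}{\partial x_1} , \;
\frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} \right\rangle
\equiv
\begin{vmatrix}
i & j & k \\[0.5ex]
\dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3} \\[1ex]
u_1 & u_2 & u_3
\end{vmatrix}
\]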
Remarks:
1. For any $n \in \mathbb{Z}^+$, the definition of the divergence can be given for a vector field on $\mathbb{R}^n$, and
the Laplacian can be defined for a scalar function on $\mathbb{R}^n$, but as with the cross product,
the curl requires that n = 3, i.e., three-dimensional space.
2. As has been the case throughout this text, the numbering of the variables $(x_1, x_2, x_3)$ is
interchanged with the use of traditional letters (x, y, z).
‡
Named in honor of Pierre-Simon Laplace (1749–1827), a French mathematician, scientist and engineer from
roughly two generations after Leibniz and Newton.
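The component formula for the curl can be explored numerically by replacing each partial derivative with a central difference. The sketch below is an approximation for illustration only; the function name `curl`, the step size h and the sample fields are assumptions introduced here.

```python
def curl(u, point, h=1e-5):
    """Approximate curl u at a point using central differences for each partial derivative."""
    def partial(component, axis):
        # d u_component / d x_axis, with components and axes indexed 0, 1, 2
        forward = list(point)
        backward = list(point)
        forward[axis] += h
        backward[axis] -= h
        return (u(*forward)[component] - u(*backward)[component]) / (2.0 * h)

    return (
        partial(2, 1) - partial(1, 2),   # du3/dx2 - du2/dx3
        partial(0, 2) - partial(2, 0),   # du1/dx3 - du3/dx1
        partial(1, 0) - partial(0, 1),   # du2/dx1 - du1/dx2
    )

rotation = curl(lambda x, y, z: (-y, x, 0.0), (1.0, 2.0, 3.0))
shifted = curl(lambda x, y, z: (z, x, y), (1.0, 2.0, 3.0))
print(rotation, shifted)
```

The field ⟨−y, x, 0⟩ circulates around the z-axis, and its curl is ⟨0, 0, 2⟩ at every point; the field ⟨z, x, y⟩ has curl ⟨1, 1, 1⟩. The printed values should match these up to rounding.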
The definition for divergence given above is simple and standard, but at first glance, it may not
be at all clear why div(u) should represent divergence. What follows is both a justification for
this definition, and in essence, the basis for a proof for the divergence theorem presented in the
following section. This presentation is for n = 2 (two-dimensional space), but the same sort
of argument works for n-dimensional space. Also a similar argument justifies the definition for
curl (see Exercise 6.13).
where A is a square centred at (x, y) having edge length $\varepsilon > 0$, the boundary of A is ∂A, and
|A| is the area of A (see Figure 6.4).
Figure 6.4: Square centred at (x, y) with edge length $\varepsilon$. The vertical edges are $s_1$ and $s_2$; the
horizontal edges are $s_3$ and $s_4$. The outward unit normal vectors on each edge are n.
Remark: Recall that the line integral in this theorem is the second form of line integral
discussed above. Notice that the centre (x, y) is the only point in every square no matter how
small $\varepsilon$ is. Theorem 20 states that the divergence of a vector field u from (x, y) is the average
value of the outward flow of u crossing the boundary of smaller and smaller squares as $\varepsilon \to 0$.
Thus this quantity is the tendency for the vector field u to diverge from the point (x, y).
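Proof: The line integral (VI.1) can be computed explicitly by breaking it into four edge
segments as shown in Figure 6.4. Consider first the two vertical segments, $s_1$ and $s_2$. For $s_1$,
\[
\int_{s_1} u \cdot n \, ds = \int_{y - \varepsilon/2}^{y + \varepsilon/2} u(x + \varepsilon/2, \eta) \cdot \langle 1, 0 \rangle \, d\eta = \int_{y - \varepsilon/2}^{y + \varepsilon/2} u_1(x + \varepsilon/2, \eta) \, d\eta
\]
where $u_1$ is the first component of u. Using a Taylor expansion, one can write
\[
u_1(x + \varepsilon/2, \eta) = u_1(x + \varepsilon/2, y) + \frac{\partial u_1}{\partial y}(x + \varepsilon/2, y)(\eta - y) + O(\varepsilon^2)
\]
where $O(\varepsilon^2)$ represents the remaining terms in the expansion that are of the order of $\varepsilon^2$ or higher.
Hence
\[
\frac{1}{|A|} \int_{s_1} u \cdot n \, ds = \frac{u_1(x + \varepsilon/2, y)}{\varepsilon} + \frac{\partial u_1}{\partial y}(x + \varepsilon/2, y) + O(\varepsilon) \qquad \text{(VI.2)}
\]
since $|A| = \varepsilon^2$, and $u_1(x + \varepsilon/2, y)$ and $(\partial u_1/\partial y)(x + \varepsilon/2, y)$ are constants with respect to this
integration. The same sort of computation holds for $s_2$, except that now, the integral is from
$y + \varepsilon/2$ to $y - \varepsilon/2$ (because one moves counterclockwise around ∂A, and $n = \langle -1, 0 \rangle$). As a
result,
\[
\int_{s_2} u \cdot n \, ds = \int_{y + \varepsilon/2}^{y - \varepsilon/2} u(x - \varepsilon/2, \eta) \cdot \langle -1, 0 \rangle \, (-d\eta) = - \int_{y - \varepsilon/2}^{y + \varepsilon/2} u_1(x - \varepsilon/2, \eta) \, d\eta
\]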
because the continuity of the partial derivatives implies that the ∂u1 /∂y terms in (VI.2) and
(VI.3) cancel.
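Now consider the two horizontal edge segments: $s_3$ and $s_4$. The same argument as before leads
to
\[
\lim_{\varepsilon \to 0} \frac{1}{|A|} \int_{s_3 + s_4} u \cdot n \, ds = \lim_{\varepsilon \to 0} \frac{u_2(x, y + \varepsilon/2) - u_2(x, y - \varepsilon/2)}{\varepsilon} = \frac{\partial u_2}{\partial y}(x, y) ,
\]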
Remark:
The argument above uses rectangular coordinates; one might ask whether or not this result
really requires rectangular coordinates. The answer is that it does not—see Exercise 6.14.
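The limit described in Theorem 20 can be observed numerically: compute the outward flow across the boundary of a small square, divide by its area, and compare with the divergence at the centre. The field, the centre point and the function names below are illustrative assumptions, not taken from the text.

```python
import math

def flux_over_area(u, x, y, eps, n=200):
    """Outward flux of u across the square of edge eps centred at (x, y), divided by its area."""
    h = eps / n
    flux = 0.0
    for k in range(n):
        s = -eps / 2.0 + (k + 0.5) * h               # offset along an edge
        flux += u(x + eps / 2.0, y + s)[0] * h       # right edge,  n = <1, 0>
        flux -= u(x - eps / 2.0, y + s)[0] * h       # left edge,   n = <-1, 0>
        flux += u(x + s, y + eps / 2.0)[1] * h       # top edge,    n = <0, 1>
        flux -= u(x + s, y - eps / 2.0)[1] * h       # bottom edge, n = <0, -1>
    return flux / (eps * eps)

def u(x, y):
    return (y * math.sin(x), x * math.exp(y))

def div_u(x, y):
    # du1/dx + du2/dy, computed by hand for the sample field above
    return y * math.cos(x) + x * math.exp(y)

for eps in (0.1, 0.01):
    print(eps, flux_over_area(u, 1.0, 2.0, eps), div_u(1.0, 2.0))
```

As the edge length shrinks, the flux per unit area should approach the divergence at the centre of the square.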
6.4 The Theorems of Gauss, Green and Stokes
In many ways, this is the main section, not only of this chapter, but of this entire text. Here
we shall discuss the work from the early 19th century of a number of mathematicians, including
Gauss (or Gauß), Green and Stokes. Of particular note is the contribution of George Green who
proved most of these results while working as a miller. Indeed in many ways, calculus began
with the work of Leibniz and Newton and was complete with the work of Green. Work in the
second half of the 19th century and later is better thought of as analysis rather than calculus.
Figure 6.5: Domain for integration Ω for the divergence theorem. The partition is in blue, and
each partition square $A_{ij}$ has area $\varepsilon^2$; partition elements near the boundary have smaller area.
with areas $|\Delta A_{ij}| = \varepsilon^2$ where i is the index for the x coordinate (the column in Figure 6.5) and
j is the index for the y coordinate (the row in Figure 6.5). Near the boundary, the partition
elements are not necessarily square, but their areas are still no greater than $\varepsilon^2$. For each of the
$\Delta A_{ij}$, let $(x_i, y_j)$ be the centre of $\Delta A_{ij}$.
§
This theorem is widely attributed to Gauss (1777–1855) in both mathematics and physics. A version was
known earlier to Lagrange (1736–1813), and independent proofs were given by Ostrogradsky (1801–1862) and
Green (1793–1841).
From Theorem 20, at each centre $(x_i, y_j)$, we can pick a sufficiently small $\varepsilon$ so that
\[
\operatorname{div}(u)(x_i, y_j) = \lim_{|\Delta A_{ij}| \to 0} \frac{1}{|\Delta A_{ij}|} \oint_{\partial \Delta A_{ij}} u \cdot n \, ds = \frac{1}{\varepsilon^2} \oint_{\partial \Delta A_{ij}} u \cdot n \, ds + O(\varepsilon)
\]
Now, multiplying by $|\Delta A_{ij}| = \varepsilon^2$ and summing over all of the partition elements, one finds that
\[
\sum_{i,j} \operatorname{div}(u)(x_i, y_j) |\Delta A_{ij}| = \sum_{i,j} \left( \oint_{\partial \Delta A_{ij}} u \cdot n \, ds + O(\varepsilon^3) \right) = \oint_{\partial \Omega} u \cdot n \, ds + O(\varepsilon)
\]
where the second equality is due to the cancellation of line integrals on the boundaries of
adjacent partition elements (see Figure 6.5). For example, the flow rightward out of $\Delta A_{ij}$ is
equal and opposite the flow leftward out of $\Delta A_{i+1,j}$ because $n_{ij} = -n_{i+1,j}$ along their common
boundary, and the flow upward out of $\Delta A_{ij}$ is equal and opposite the flow downward out of
$\Delta A_{i,j+1}$. The error term goes from $O(\varepsilon^3)$ to $O(\varepsilon)$ because there are $O(1/\varepsilon^2)$ partition squares.
Remarks:
1. The integral on the left in the divergence theorem is a multiple integral with as many
integrations as the dimension of the space in which Ω is embedded (here, $\mathbb{R}^n$). The
integral on the right has one fewer integration since it is over the boundary. So if $n = 3$,
\[
\iiint_\Omega \operatorname{div} u \, dV = \iint_{\partial\Omega} u \cdot n \, dS ,
\]
whereas if $n = 2$,
\[
\iint_\Omega \operatorname{div} u \, dA = \oint_{\partial\Omega} u \cdot n \, ds .
\]
2. Physically and mathematically, this theorem says that the total amount by which a vector
field flows out of, or diverges from, the inside of a domain Ω is equal to the amount by which
this vector field crosses outward through the domain boundary ∂Ω. Thus this theorem is a
statement of conservation of flow for the vector field. If the divergence is negative, then
the vector field flows into the domain Ω, and this inflow must equal the amount by which the
vector field crosses inward through the domain boundary ∂Ω. Suppose that $f > 0$ is a given function
defined on a domain Ω; if $\operatorname{div} u = f > 0$ throughout Ω, then $f$ is the source function for
the vector field $u$, and the total flow crossing the boundary out of Ω is
\[
\int_\Omega f \, dV = \oint_{\partial\Omega} u \cdot n \, dS .
\]
3. This is not the most general statement of the divergence theorem; for example, the domain
of integration need not be connected. Still, this version demonstrates all of the essential
mathematics of the theorem.
In many cases, the divergence theorem can be used to turn a complicated integral into a much
simpler integral. Frequently it is much easier to compute the integral over Ω than to compute
the integral over its boundary, as the following example demonstrates.
Example 6.9: Given a vector field $F(x, y, z) = \langle 3x, 2y, z \rangle$, please find the value of the surface
integral over the surface of the unit cube $C := \{(x, y, z) \mid 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1\}$:
\[
\iint_{\partial C} F \cdot n \, dS
\]
Answer: Computing this integral directly would require that one compute six separate surface
integrals. But since $\operatorname{div} F = 3 + 2 + 1 = 6$, the divergence theorem implies that
\[
\iint_{\partial C} F \cdot n \, dS = \iiint_C \operatorname{div} F \, dV = \iiint_C 6 \, dV = 6
\]
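As a cross-check on Example 6.9, the six face integrals can also be added up numerically. This is a minimal sketch, assuming a midpoint rule on an m-by-m grid per face; the helper names `face_flux` and `cube_flux` are illustrative and not from the text.

```python
def F(x, y, z):
    # The vector field of Example 6.9.
    return (3.0 * x, 2.0 * y, 1.0 * z)


def face_flux(axis, side, m=20):
    """Midpoint-rule flux of F through the face of the unit cube on which
    coordinate number `axis` equals `side` (0 or 1).
    The outward unit normal there is +e_axis if side == 1, else -e_axis."""
    h = 1.0 / m
    sign = 1.0 if side == 1 else -1.0
    total = 0.0
    for a in range(m):
        for b in range(m):
            point = [(a + 0.5) * h, (b + 0.5) * h]
            point.insert(axis, float(side))  # the coordinate held fixed on this face
            total += sign * F(*point)[axis] * h * h
    return total


def cube_flux(m=20):
    # Sum over the six faces: three axes, two sides each.
    return sum(face_flux(axis, side, m) for axis in range(3) for side in (0, 1))


print(cube_flux())
```

Only the three faces at x = 1, y = 1 and z = 1 contribute, since the relevant component of F vanishes on the opposite faces; the total should agree with the value 6 obtained from the divergence theorem.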
It is also at times possible to use the divergence theorem to compute surface integrals for surfaces
that do not by themselves bound domains, as the following example shows:
Example 6.10: Given a vector field $u(x, y, z) = \langle -y^2 + z,\ x^2 + z,\ -xy^2 + z^2 \rangle$, please find the
value of the surface integral over the surface of the hemisphere $H := \{(x, y, z) \mid x^2 + y^2 + z^2 = 3,\ z > 0\}$:
\[
\iint_H u \cdot n \, dS
\]
Answer: Computing this integral directly would again be at least somewhat tedious, but by taking
into account the circular disk $D := \{(x, y, 0) \mid x^2 + y^2 \le 3\}$ in the $x,y$-plane, one can use
the divergence theorem to compute the given integral:
\[
\iint_H u \cdot n \, dS + \iint_D u \cdot n \, dS = \iiint_\Omega \operatorname{div} u \, dV
\]
where Ω is the domain inside the hemisphere, so that $\partial\Omega = H \cup D$, and on $D$ the outward unit
normal is $n = \langle 0, 0, -1 \rangle$. Hence
\[
\iint_H u \cdot n \, dS = \iiint_\Omega \operatorname{div} u \, dV - \iint_D u \cdot n \, dS .
\]
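Now, using spherical coordinates and then the substitution $w = \sin\phi$,
\[
\iiint_\Omega \operatorname{div} u \, dV = \iiint_\Omega 2z \, dV
= 2 \int_0^{2\pi} \int_0^{\pi/2} \int_0^{\sqrt{3}} (\rho \cos\phi)\, \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta
\]
\[
= 2\,(2\pi) \left( \int_0^1 w \, dw \right) \left( \int_0^{\sqrt{3}} \rho^3 \, d\rho \right)
= (2\pi)(1)\,\frac{9}{4} = \frac{9\pi}{2}
\]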
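while
\[
\iint_D u \cdot n \, dS = \int_0^{2\pi} \int_0^{\sqrt{3}} \left( r^3 \cos\theta \sin^2\theta - 0 \right) r \, dr \, d\theta
= \left( \int_0^{2\pi} \cos\theta \sin^2\theta \, d\theta \right) \left( \int_0^{\sqrt{3}} r^4 \, dr \right)
= (0)\left( 9\sqrt{3}/5 \right) = 0 .
\]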
Thus
\[
\iint_H u \cdot n \, dS = \frac{9\pi}{2} .
\]
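The value found in Example 6.10 can be compared with a direct numerical evaluation of the surface integral. The sketch below assumes the hemisphere is the upper half (z > 0) of the sphere of radius √3, parameterized by spherical angles, with outward unit normal (x, y, z)/R and surface element R² sin φ dφ dθ; the function name `hemisphere_flux` and the grid size are illustrative choices.

```python
import math


def u(x, y, z):
    # The vector field of Example 6.10.
    return (-y * y + z, x * x + z, -x * y * y + z * z)


def hemisphere_flux(m=200):
    """Midpoint-rule approximation of the outward flux of u through the
    upper hemisphere of radius sqrt(3) centred at the origin."""
    R = math.sqrt(3.0)
    dphi = (math.pi / 2.0) / m
    dtheta = (2.0 * math.pi) / (2 * m)
    total = 0.0
    for a in range(m):
        phi = (a + 0.5) * dphi
        for b in range(2 * m):
            theta = (b + 0.5) * dtheta
            x = R * math.sin(phi) * math.cos(theta)
            y = R * math.sin(phi) * math.sin(theta)
            z = R * math.cos(phi)
            ux, uy, uz = u(x, y, z)
            # Outward unit normal on the sphere is (x, y, z) / R.
            u_dot_n = (ux * x + uy * y + uz * z) / R
            # Surface element: dS = R^2 sin(phi) dphi dtheta.
            total += u_dot_n * R * R * math.sin(phi) * dphi * dtheta
    return total


print(hemisphere_flux(), 9.0 * math.pi / 2.0)
```

The two printed numbers should agree to several digits, which is consistent with the disk D contributing nothing to the flux.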
The divergence theorem is not only used to compute various integrals, but it is also used to
establish a number of other famous named results. Two of these have come to be known
as Green’s first and second identities. Green’s first identity is a kind of higher-dimensional
integration by parts and basically involves applying the divergence theorem to uF . The classical
version of this identity has F = ∇v so that the divergence theorem is applied to u∇v.
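Theorem 22 (Green’s Identities) Suppose that $u$ and $v$ are both $C^2(\Omega)$ for some common
domain Ω. Then Green’s first identity is
\[
\int_\Omega \left( \nabla u \cdot \nabla v + u \, \Delta v \right) dV = \oint_{\partial\Omega} u \, \nabla v \cdot n \, dS
\]
where $n$ is the outer unit normal vector to ∂Ω. Also, Green’s second identity is
\[
\int_\Omega \left( u \, \Delta v - v \, \Delta u \right) dV = \oint_{\partial\Omega} \left( u \, \nabla v - v \, \nabla u \right) \cdot n \, dS .
\]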
Proof: The proof of Green’s identities is discussed in Exercise 6.12. The key to proving
Green’s first identity is to apply the divergence theorem to the vector field u∇v.
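Green’s first identity can be checked numerically on a simple domain. The following is a sketch under stated assumptions: the domain is the unit square, and the functions u = x² + y and v = xy² are hypothetical test functions (their gradients and the Laplacian of v are entered by hand); none of these choices come from the text. In two dimensions the volume integral is an area integral and the boundary integral is a line integral over the four edges.

```python
def u(x, y):
    return x * x + y          # hypothetical test function


def grad_u(x, y):
    return (2.0 * x, 1.0)     # gradient of u, by hand


def grad_v(x, y):
    return (y * y, 2.0 * x * y)   # gradient of v = x*y^2, by hand


def lap_v(x, y):
    return 2.0 * x            # Laplacian of v = x*y^2, by hand


def area_side(m=200):
    """Midpoint-rule value of the integral of grad u . grad v + u * lap v over [0,1]^2."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            gu, gv = grad_u(x, y), grad_v(x, y)
            total += (gu[0] * gv[0] + gu[1] * gv[1] + u(x, y) * lap_v(x, y)) * h * h
    return total


def boundary_side(m=200):
    """Midpoint-rule value of the integral of u * grad v . n over the boundary of [0,1]^2."""
    h = 1.0 / m
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * h
        total += u(1.0, t) * grad_v(1.0, t)[0] * h   # right edge, n = <1, 0>
        total -= u(0.0, t) * grad_v(0.0, t)[0] * h   # left edge,  n = <-1, 0>
        total += u(t, 1.0) * grad_v(t, 1.0)[1] * h   # top edge,   n = <0, 1>
        total -= u(t, 0.0) * grad_v(t, 0.0)[1] * h   # bottom edge, n = <0, -1>
    return total


print(area_side(), boundary_side())
```

For these test functions both sides work out by hand to 11/6, and the two numerical values should agree with that to within the quadrature error.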
where ∂S is traversed in the positive direction relative to n, a unit normal vector for S.
Definition: A surface is orientable if and only if an ant standing on a point on the surface
cannot walk to the same point but on the opposite side of the surface without crossing the
boundary. For an orientable surface, its boundary ∂S is traversed in the positive direction (or
orientation) if and only if the direction of motion along the boundary and the unit normal
vector $n$ satisfy the right-hand rule: if one curls the fingers of one's right hand in the direction
of motion along the boundary, the right thumb points in the direction of $n$.
At first glance, one might think that all smooth surfaces are orientable, but this is not true.
The classic example of a non-orientable surface is a Möbius¶ strip.
¶ Named for August Möbius (1790–1868), a German mathematician and astronomer.

Remarks:
2. Physically and mathematically, what this theorem says is that the total amount of circulation
on the surface S is equal to the amount that the vector field flows along the boundary
∂S. Thus this theorem is again a statement of conservation of flow for the vector field,
but now in the tangential or circulation sense. If the curl is positive, then on average
the vector field flows along the boundary ∂S in the positive direction (according to the
right-hand rule). If the curl is negative, then on average the vector field flows along the
boundary ∂S in the opposite (negative) direction.
3. As was the case when surface integrals were defined, the version of Stokes’ theorem given
here does not directly cover closed surfaces (for example, spheres) because our definition
of a smooth surface requires that the surface be explicitly representable as z = f (x, y)
where f is a differentiable function, and of course this is impossible for closed surfaces.
But as before, for a sphere, the solution to this problem is simply to divide it into an
upper hemisphere and a lower hemisphere. Also the line integrals around the boundaries
(the equator in this case) are equal in magnitude and opposite in sign, hence they cancel.
So for a sphere $S$,
\[
\iint_S \operatorname{curl} u \cdot n \, dS = 0
\]
This result generalizes to closed surfaces that can be divided into smooth upper and lower
portions:
where n is the outward unit normal vector for S and u is any continuously differentiable
vector field.
As with the divergence theorem, Stokes’ theorem can sometimes be used to turn a complicated
integral into a much simpler integral.
Example 6.11: For the vector field $u(x, y, z) = \langle -y, x, 1 \rangle$ and for a conical silo surface $S$ with
a circular base whose radius is $R$ (see Figure 6.6), please compute
\[
\iint_S \operatorname{curl} u \cdot n \, dS .
\]
Answer: By direct computation, one finds that $\operatorname{curl} u = \langle 0, 0, 2 \rangle$, but it would be perhaps more
difficult to carefully parameterize the surface of the silo. By Stokes’ theorem, however, the
requested surface integral can be found by computing the line integral
\[
\oint_{\partial S} u \cdot dx = \int_0^{2\pi} u \cdot \frac{dx}{d\theta} \, d\theta
= \int_0^{2\pi} \langle -R \sin\theta,\ R \cos\theta,\ 1 \rangle \cdot \langle -R \sin\theta,\ R \cos\theta,\ 0 \rangle \, d\theta
= \int_0^{2\pi} R^2 \left( \sin^2\theta + \cos^2\theta \right) d\theta = 2\pi R^2 .
\]
So Stokes’ theorem implies that the exact shape of the silo is not important as long as the shape
of the base is fixed.
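The claim that the shape of the silo does not matter can be illustrated numerically. The sketch below assumes, purely for illustration, that the surface is a plain cone of height h over the disk of radius R, parameterized as X(r, θ) = (r cos θ, r sin θ, h(1 − r/R)); the actual shape in Figure 6.6 may differ. The function names are hypothetical. The surface side uses the upward normal X_r × X_θ, and the line side repeats the computation of Example 6.11.

```python
import math


def u(x, y, z):
    # The vector field of Example 6.11.
    return (-y, x, 1.0)


def curl_u(x, y, z):
    # curl of u = <-y, x, 1>, worked out by hand.
    return (0.0, 0.0, 2.0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def surface_side(R, h, m=200):
    """Midpoint-rule integral of curl u . n dS over the cone z = h * (1 - r / R)."""
    dr, dth = R / m, 2.0 * math.pi / m
    total = 0.0
    for a in range(m):
        r = (a + 0.5) * dr
        for b in range(m):
            th = (b + 0.5) * dth
            c, s = math.cos(th), math.sin(th)
            x_r = (c, s, -h / R)           # partial derivative of X in r
            x_th = (-r * s, r * c, 0.0)    # partial derivative of X in theta
            n_dS = cross(x_r, x_th)        # upward normal times area factor
            cu = curl_u(r * c, r * s, h * (1.0 - r / R))
            total += (cu[0] * n_dS[0] + cu[1] * n_dS[1] + cu[2] * n_dS[2]) * dr * dth
    return total


def line_side(R, m=200):
    """Midpoint-rule line integral of u around the base circle of radius R in the plane z = 0."""
    dth = 2.0 * math.pi / m
    total = 0.0
    for b in range(m):
        th = (b + 0.5) * dth
        x, y = R * math.cos(th), R * math.sin(th)
        ux, uy, _ = u(x, y, 0.0)
        total += (ux * (-R * math.sin(th)) + uy * (R * math.cos(th))) * dth
    return total


print(surface_side(2.0, 5.0), surface_side(2.0, 1.0), line_side(2.0))
```

With R = 2, all three printed values should be close to 2πR² = 8π, whatever height is chosen for the cone.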
Example 6.12: Suppose that $F : \mathbb{R}^3 \to \mathbb{R}^3$ is a vector field, and suppose that there is a
continuously differentiable function $\varphi$ such that $F = \nabla\varphi$. How does Stokes’ theorem relate to
this vector field?
Answer: Of course, as was discussed in §6.1.2, this vector field F is conservative, and so the line
integral over any closed curve is zero. So for any surface S satisfying the conditions of Stokes’
theorem, the integral around the closed boundary ∂S must be zero, and hence Stokes’ theorem
implies that
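\[
\iint_S \operatorname{curl} F \cdot n \, dS = \iint_S \operatorname{curl} \nabla\varphi \cdot n \, dS = 0 .
\]
In fact,
\[
\operatorname{curl} \nabla \equiv 0
\]
regardless of which potential function this operator is applied to. So both the surface integral
and the line integral from Stokes’ theorem are zero.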
Up to this point, the theorems of Gauss, Green and Stokes have been presented mostly as
separate though clearly related results. In this concluding section, however, they are combined
along with the fundamental theorem of calculus and given as a single result:
Theorem 24 (Green, Stokes, Cartan) All of the above integral theorems of vector calculus
along with the fundamental theorem of calculus can be expressed in a single form:
\[
\int_\Omega dF = \oint_{\partial\Omega} F
\]
Remarks:
1. The proof of this theorem is beyond the scope of this text and requires the concepts of
differential forms and manifolds (see e.g., Marsden & Tromba [2], §8.5). For our purposes,
this result can be thought of as a notational summary of the theorems of Gauss, Stokes
and Green, and the fundamental theorem of calculus.
2. Notice that in this version, the theorem seems to say that one can commute the differential
operator off the integrand and onto the domain of integration; this is essentially correct
(again in the realm of differential forms) and justifies the use of the partial derivative
symbol to denote the boundary of the open domain Ω.
3. Whose name should be on this theorem is difficult to say. Frequently Stokes is credited
with the theorem, but several of the vector-calculus forms of this result were due to Green,
and the differential forms result is due to Cartan‖ (1945).
4. Again, this version is not the most general version of the theorem.
Example 6.13: What are Ω, ∂Ω, $F$ and $dF$ if Theorem 24 is to represent the fundamental
theorem of calculus?
Answer: For the fundamental theorem of calculus, $\Omega = [a, b]$, an interval in the real line with
$a < b$, the boundary $\partial\Omega = \{a, b\}$ is just the two end points, and $dF = f'(x)\,dx$ is the derivative
of a continuously differentiable function $F = f$. In this case, the integral over the boundary
is just evaluation at the endpoints, with the negative sign coming from the outward unit normal
“vector” at $a$ being $-1$:
\[
\int_a^b f'(x) \, dx = \int_{[a,b]} f'(x) \, dx = \int_\Omega dF = \oint_{\partial\Omega} F = f(b) - f(a) .
\]
‖ Élie Cartan (1869–1951) was a French mathematician who worked extensively on Lie groups and differential
geometry, mainly in the first half of the 20th century.
Exercises 6
(a) The line integral of $F(x_1, x_2) = \langle x_1, x_2 \rangle$ along the curve traced out by the vector
function $x(t) = \langle t, t \rangle$ from (0, 0) to (2, 2).
(b) The line integral of $F(x, y, z) = \langle 2x, 3y^2, z \rangle$ along the curve traced out by the vector
function $x(t) = \langle t, 1/t, t^2 \rangle$ from (1, 1, 1) to (2, 1/2, 4).
(c) The line integral of $F(x, y, z) = \langle y, -z, x \rangle$ along the three-dimensional alpha curve
traced out by the vector function $x(t) = \langle t^2 - 1,\ t(t^2 - 1),\ t \rangle$ from $t = -2$ to $t = 2$.
(d) The integral of $F(x, y) = \langle x, y \rangle$ over the unit circle from (0, 0) to a point on the
circle corresponding to the polar angle θ.
2. Use the definition of the line integral to prove Proposition 7 based on the similar result
from single variable calculus:
\[
\int_b^a f(x) \, dx = -\int_a^b f(x) \, dx
\]
and
\[
\int_a^c f(x) \, dx = \int_a^b f(x) \, dx + \int_b^c f(x) \, dx
\]
provided that all of these integrals make sense.
3. Using the results of Examples 6.2 and 6.3, please compute the line integral
\[
\int_S f(x) \cdot dx
\]
where $S = C - C_1$ is the closed semicircular loop starting at (1, 0), moving along the
upper unit circle to (−1, 0), then moving along the x-axis to (1, 0).
4. For each of the following vector fields, please either find a potential function ϕ or determine
that no such potential function exists. When a potential function exists, please compute
the value I of any line integral from the origin (either (0, 0) or (0, 0, 0)) to either (1, 1) or
(1, 1, 1).
(d) $F(x_1, x_2, x_3) = \langle \sin x_2 \cos x_3 + \cos x_1 \cos x_2,\ x_1 \cos x_2 \cos x_3 - \sin x_1 \sin x_2,\ -x_1 \sin x_2 \sin x_3 \rangle$
Answer: (a) $\varphi(x, y, z) = xyz + x^2 e^z$, $I = 1 + e$; (b) $\varphi$ DNE; (c) $\varphi(x_1, x_2) = x_1 x_2 e^{x_1 x_2} + x_1 x_2$,
$I = 1 + e$; (d) $\varphi(x_1, x_2, x_3) = x_1 \sin x_2 \cos x_3 + \sin x_1 \cos x_2$, $I = 2 \sin 1 \cos 1$
\[
\frac{\partial f_1}{\partial y} = \frac{\partial f_2}{\partial x}
\]
where $f = \langle f_1, f_2 \rangle$
6. Its definition, Theorem 18 and Theorem 19 give three characterizations for a vector field
being conservative; there is a fourth:
(a) Assuming that all derivatives exist, please show that curl(∇u) ≡ 0 for any scalar
function (scalar field) u.
(b) Please explain why a vector field $F$ is conservative on its open domain if and only if
$\operatorname{curl} F \equiv 0$ on the domain of $F$.
where $E$ is the ellipse $4x^2 + y^2 = 4$, $n$ is the outward unit normal vector, and $f(x, y) =
\langle x, y \rangle$. Hint: Notice that because this exercise is posed in the $x,y$-plane, it is possible to
determine the unit normal vector $n$ directly from the unit vector $T$.
Answer: $\displaystyle 2 \int_0^{2\pi} \frac{dt}{\sqrt{1 + 3\cos^2 t}}$
where $C$ is the curve $x = y^2$ in the $x,y$-plane from (0, 0) to (4, 2) and $n$ is the downward
and rightward unit normal vector.
Answer: −8/3
10. Please evaluate the surface integral
\[
\iint_S F \cdot n \, dS
\]
where $F(x, y, z) = \langle z, y, x \rangle$, $S$ is the sphere of radius 3 centred at the origin, and $n$ is the
outward unit normal vector for this sphere.
12. For the vector field $v(x, y, z) = \langle xyz, xyz, xyz \rangle$ and the scalar function $u(x, y, z) = xyz$,
please compute
Answer: (a) $yz + xz + xy$; (b) $\langle x(z - y),\ y(x - z),\ z(y - x) \rangle$; (c) 0; (d) $\langle yz, xz, yz \rangle$;
(e) $\langle z + y,\ x + z,\ x + y \rangle$; (f) 0
13. Please follow the general outline of the argument for Theorem 21 of a vector field to show
that
\[
\operatorname{curl}(u)(x, y) = \lim_{\varepsilon \to 0} \frac{1}{|A_\varepsilon|} \oint_{\partial A_\varepsilon} u \cdot dx
\]
15. Consider the rectangular prism (Box) $B = [0, 2] \times [0, 3] \times [0, 5]$ and the vector field
$F(x, y, z) = \langle x^2 y,\ y^2 z,\ z^2 x \rangle$. Suppose that $n$ is the outward unit normal vector. Compute
\[
\oint_{\partial B} F(x, y, z) \cdot n \, dS
\]
Answer: 465
16. Let $P$ be the prism bounded in the first octant by the plane $\Pi : x + y + z = 1$ (so $P$ is
bounded by Π and the three coordinate planes). For $F(x, y, z) = \langle x, y, z \rangle$, please directly
compute
\[
\iint_\Pi F \cdot n \, dS
\]
where $n$ is again the outward unit normal vector to $P$, then use the divergence theorem
to compute this integral.
Answer: 1/2
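17. Suppose that $u \in C^2(\Omega)$ for some open, bounded, connected domain $\Omega \subset \mathbb{R}^n$, and suppose
that $\Delta u = f$ for a given $f \in C(\Omega)$, that is, $f$ is continuous on Ω. Please compute
\[
\oint_{\partial\Omega} \nabla u \cdot n \, dS
\]
Answer: $\displaystyle \iint_\Omega f(x, y) \, dA$. Hint: $\Delta \equiv \operatorname{div} \nabla$.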
18. Consider the solid hemisphere $H$ bounded above by the surface $S$: $z = \sqrt{1 - x^2 - y^2}$ and
below by the circular disk $D$ centred at the origin with radius 1. For $F(x, y, z) = \langle x, y, z \rangle$,
please compute the surface integral
\[
\iint_S F(x, y, z) \cdot n \, dS
\]
19. If $u(x, y) = \sin(xy)$ and $v(x, y) = x^2 y$, please verify Green’s first identity.
20. (a) Please prove Green’s first identity by applying the divergence theorem to $u \nabla v$ and
proving a multivariable product rule:
\[
\operatorname{div}(u \nabla v) = \nabla u \cdot \nabla v + u \, \Delta v
\]
22. For $u(x, y, z) = \langle 3z, 2x, y \rangle$, if $S$ is the unit disk $x^2 + y^2 \le 1$ lying in the plane $z = 3$, please
compute
\[
\oint_{\partial S} u \cdot dx
\]
23. Suppose that a vector field $F$ is conservative everywhere in $\mathbb{R}^3$. Explain why for any
bounded, connected, smooth, orientable surface $S \subset \mathbb{R}^3$, with a piecewise smooth boundary ∂S,
\[
\iint_S \operatorname{curl} F \cdot n \, dS = 0 .
\]
24. Determine what Ω, dF , ∂Ω and F from Theorem 24 are for both the divergence theorem
and Stokes’ theorem, as was done for the fundamental theorem of calculus in Example
6.13.
Bibliography
[1] W.A.J. Kosmala, A Friendly Introduction to Analysis, 2 ed., Pearson, Prentice Hall, Upper
Saddle River, NJ, 2004.
[2] J.E. Marsden and A. Tromba, Vector Calculus, 6 ed., W.H. Freeman, New York, 2012.
[5] W. Rudin, Principles of Mathematical Analysis, 3 ed., McGraw-Hill, Inc., New York, 1976.