National Open University of Nigeria: School of Science and Technology
MTH 311 CALCULUS OF SEVERAL VARIABLES
COURSE MATERIAL
TABLE OF CONTENTS
MODULE 2 PARTIAL DERIVATIVES OF FUNCTIONS OF SEVERAL VARIABLES
• Unit 1: Derivative
• Unit 2: Partial derivative.
• Unit 3: Application of Partial derivative.
MODULE 7 MAXIMISATION AND MINIMISATION OF FUNCTIONS OF
SEVERAL VARIABLES
• Unit 1: Jacobians
• Unit 2: Jacobian determinants
• Unit 3: Applications of Jacobian
MODULE 1 Limit and Continuity of Functions of Several Variables
Unit 1: Real Functions
Unit 2: Limit of Function of Several Variables.
Unit 3: Continuity of Function of Several Variables.
UNIT 1: REAL FUNCTIONS
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 domain
3.2 real function
3.3 value of functions
3.4 types of graph
3.5 types of function
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
INTRODUCTION
A real-valued function, f, of x, y, z, ... is a rule for manufacturing a new number, written f(x,
y, z, ...), from the values of a sequence of independent variables (x, y, z, ...).
The function f is called a real-valued function of two variables if there are two independent
variables, a real-valued function of three variables if there are three independent variables,
and so on.
As with functions of one variable, functions of several variables can be represented
numerically (using a table of values), algebraically (using a formula), and sometimes
graphically (using a graph).
Examples
For example, let f(x, y) = x + y. Then
f(2, 1) = 2 + (1) = 3    Substitute 2 for x and 1 for y
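As an informal illustration, a function of two variables is simply a rule that produces one number from two inputs. The short Python sketch below (the function name and sample values are only illustrative) evaluates such a rule:

def f(x, y):
    # the rule f(x, y) = x + y
    return x + y

print(f(2, 1))   # prints 3: substitute 2 for x and 1 for y
print(f(1, -1))  # prints 0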
OBJECTIVES
At the end of this unit, you should be able to explain:
• domain
• real function
• value of functions
• types of graph
• types of function
MAIN CONTENT
f is a function from set A to a set B if each element x in A can be associated with a unique
element in B.
In the above definition of the function, the set A is called the domain and the set B is called the co-domain.
Real Functions
A real valued function f : A → B, or simply a real function f, is a rule which associates to each real number x ∈ A a unique real number f(x) ∈ B, where A and B are subsets of R, the set of real numbers.
In other words, functions whose domain and co-domain are subsets of R, the set of real
numbers, are called real valued functions.
Value of a Function
Constant Function
A function f : A → R defined by f(x) = k for every x ∈ A, where k is a fixed real number, is called a constant function.
Domain = A
Range = {k}
The graph of this function is a line or line segment parallel to the x-axis. Note that if k > 0, the graph is above the x-axis; if k < 0, the graph is below the x-axis; if k = 0, the graph is the x-axis itself.
Identity Function
The function f : R → R defined by f(x) = x is called the identity function.
Domain = R
Range = R
Polynomial Function
For example, f(x) = x³ + x² + x is a polynomial function.
Modulus Function
f(x) = |x|
Domain = R
Since the square root of a negative number is not real, we define a function f : R⁺ → R such that f(x) = √x.
Greatest Integer Function or Step Function (floor Function)
For a real number x, we denote by [x] the greatest integer less than or equal to x. For example, [5.2] = 5, [−5.2] = −6, etc. The function f : R → R defined by
f(x) = [x], x ∈ R
Domain: R
Range : Z
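As a quick check, Python's math.floor behaves exactly as the greatest integer (floor) function described above; this snippet is only an illustration:

import math

print(math.floor(5.2))    # 5: the greatest integer less than or equal to 5.2
print(math.floor(-5.2))   # -6
print(math.floor(3))      # 3: integers are left unchanged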
Exponential Function
f(x) = aˣ, where a > 0 and a ≠ 1; Domain = R, Range = (0, ∞).
Logarithmic Function
f(x) = logₐ x, where a > 0 and a ≠ 1; Domain = (0, ∞), Range = R.
Trigonometric Functions
Trigonometric functions are sin x, cos x, tan x, etc. The graphs of these functions are assumed to be familiar from earlier courses.
Inverse Trigonometric Functions
Inverse trigonometric functions are sin⁻¹x, cos⁻¹x, tan⁻¹x, etc. Their graphs are likewise assumed familiar from earlier courses.
Signum Functions
The signum function is defined by f(x) = 1 for x > 0, f(x) = 0 for x = 0, and f(x) = −1 for x < 0. Domain = R, Range = {−1, 0, 1}.
Odd Function
A function f is odd if f(−x) = −f(x) for every x in its domain.
Even Function
A function f is even if f(−x) = f(x) for every x in its domain. A polynomial with only even powers of x is an even function.
Reciprocal Function
f(x) = 1/x; Domain = R \ {0}, Range = R \ {0}.
CONCLUSION
In this unit, you have defined the domain of a function, studied real functions, learnt how to find the value of a function, and met the main types of graphs and the standard types of functions.
SUMMARY
In this unit, you have studied :
• domain
• real function
• value of functions
• types of graph
• types of function
TUTOR-MARKED ASSIGNMENT
4. Functions f and g are defined by f(x) = 1/x + 3x and g(x) = −1/x + 6x − 4. Find (f + g)(x) and its domain.
REFERENCES
Boas, Ralph P., Jr.: "A primer of real functions", The Carus Mathematical Monographs, No.
13; Published by The Mathematical Association of America, and distributed by John Wiley
and Sons, Inc.; New York 1960 189 pp. MR22#9550
Smith, Kennan T.: "Primer of modern analysis", Second edition. Undergraduate Texts in
Mathematics. Springer-Verlag, New York-Berlin, 1983. 446 pp. ISBN 0-387-90797-1
MR84m:26002
Krantz, Steven G.; Parks, Harold R.: "A primer of real analytic functions", Basler Lehrbücher
[Basel Textbooks], 4; Birkhäuser Verlag, Basel, 1992. 184 pp. ISBN 3-7643-2768-5
MR93j:26013
UNIT 2: LIMIT OF FUNCTION OF SEVERAL VARIABLES
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1: Definition
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0: INTRODUCTION
In this unit we extend the idea of a limit to functions of two (and more) variables. We make precise what it means to say that the values f(x,y) approach a number L as the point (x,y) approaches a point (a,b), and we will see that, unlike the one-variable case, the point (x,y) may approach (a,b) along infinitely many different paths.
2.0: OBJECTIVES
At the end of this unit, you should be able to:
• state the precise (ε, δ) definition of the limit of a function of two variables;
• show that a limit does not exist by finding two paths of approach along which f(x,y) has different limits;
• verify simple limits directly from the definition.
3.0: MAIN CONTENT
Consider the function f(x,y) = √(9 − x² − y²), whose domain is the closed disk D = {(x,y) : x² + y² ≤ 9} shown in Figure 12.14(a) and whose graph is the hemisphere shown in Figure 12.14(b).
If the point (x,y) is close to the origin, then x and y are both close to 0, and so f(x,y) is close to 3. In fact, if (x,y) lies in a small open disk x² + y² < δ², then
f(x,y) = √(9 − (x² + y²)) > √(9 − δ²)
Figure 12.14
Thus we can make the values of f(x,y) as close to 3 as we like by taking (x,y) in a small enough disk with centre (0,0). We describe this situation by using the notation
lim_{(x,y)→(0,0)} √(9 − (x² + y²)) = 3
In general,
lim_{(x,y)→(a,b)} f(x,y) = L
means that the values of f(x,y) can be made as close as we wish to the number L by taking the point (x,y) close enough to the point (a,b). A more precise definition follows.
12.5 Definition
Let f be a function of two variables defined on a disk with centre (a,b), except possibly at (a,b). Then we say that the limit of f(x,y) as (x,y) approaches (a,b) is L, and we write
lim_{(x,y)→(a,b)} f(x,y) = L
if for every number ε > 0 there is a corresponding number δ > 0 such that
|f(x,y) − L| < ε whenever 0 < √((x − a)² + (y − b)²) < δ
Since |f(x,y) − L| is the distance between the numbers f(x,y) and L, and √((x − a)² + (y − b)²) is the distance between the point (x,y) and the point (a,b), Definition 12.5 says that the distance between f(x,y) and L can be made arbitrarily small by making the distance from (x,y) to (a,b) sufficiently small (but not 0). Figure 12.15 illustrates Definition 12.5 by means of an arrow diagram. If any small interval (L − ε, L + ε) is given around L, then we can find a disk Dδ with centre (a,b) and radius δ > 0 such that f maps all the points in Dδ [except possibly (a,b)] into the interval (L − ε, L + ε).
Another illustration of Definition 12.5 is given in Figure 12.16 where the surface S is the
graph of f. If ε > 0 is given, we can find δ > 0 such that if (x,y) is restricted to lie in the disk
Dδ and (x,y) ≠ (a,b), then the corresponding part of S lies between the horizontal planes z = L
− ε and z = L + ε. For functions of a single variable, when we let x approach a, there are only two possible directions of approach, from the left or from the right. Recall from Chapter 2 that if lim_{x→a⁻} f(x) ≠ lim_{x→a⁺} f(x), then lim_{x→a} f(x) does not exist.
For functions of two variables the situation is not as simple, because we can let (x,y) approach (a,b) from an infinite number of directions in any manner whatsoever (see Figure 12.17). Definition 12.5 refers only to the distance between (x,y) and (a,b). It does not refer to the direction of approach. Therefore, if the limit exists, then f(x,y) must approach the same limit no matter how (x,y) approaches (a,b). Thus if we can find two different paths of approach along which f(x,y) has different limits, then it follows that lim_{(x,y)→(a,b)} f(x,y) does not exist.
Figure 12.15
Figure 12.16
Figure 12.17
If f(x,y) → L₁ as (x,y) → (a,b) along a path C₁, and f(x,y) → L₂ as (x,y) → (a,b) along a path C₂, where L₁ ≠ L₂, then lim_{(x,y)→(a,b)} f(x,y) does not exist.
Example 1
Find lim_{(x,y)→(0,0)} (x² − y²)/(x² + y²) if it exists.
Solution
Let f(x,y) = (x² − y²)/(x² + y²). First let us approach (0,0) along the x-axis. Then y = 0 gives f(x,0) = x²/x² = 1 for all x ≠ 0, so
f(x,y) → 1 as (x,y) → (0,0) along the x-axis.
We now approach along the y-axis by putting x = 0. Then f(0,y) = −y²/y² = −1 for all y ≠ 0, so
f(x,y) → −1 as (x,y) → (0,0) along the y-axis.
Since f has two different limits along two different lines, the given limit does not exist.
Figure 12.18
Figure 12.19
Example 2
If f(x,y) = xy/(x² + y²), does lim_{(x,y)→(0,0)} f(x,y) exist?
Solution
If y = 0, then f(x,0) = 0/x² = 0. Therefore f(x,y) → 0 as (x,y) → (0,0) along the x-axis. If x = 0, then f(0,y) = 0/y² = 0, so f(x,y) → 0 as (x,y) → (0,0) along the y-axis. Although we have obtained identical limits along the axes, this does not show that the given limit is 0. Let us now approach (0,0) along another line, say y = x. For all x ≠ 0,
f(x,x) = x²/(x² + x²) = 1/2
Therefore f(x,y) → 1/2 as (x,y) → (0,0) along y = x.
(See Figure 12.19.) Since we obtained different limits along different paths, the given limit does not exist.
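A numerical way to see this path dependence (a small Python sketch using SymPy; purely illustrative) is to restrict f to a path and take a one-variable limit:

from sympy import symbols, limit

x, y, m = symbols('x y m')
f = x*y/(x**2 + y**2)

print(limit(f.subs(y, 0), x, 0))    # along the x-axis: 0
print(limit(f.subs(y, x), x, 0))    # along y = x: 1/2
print(limit(f.subs(y, m*x), x, 0))  # along y = m*x: m/(m**2 + 1), which depends on the slope m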
Example 3
If f(x,y) = xy²/(x² + y⁴), does lim_{(x,y)→(0,0)} f(x,y) exist?
Solution
With the solution of Example 2 in mind, let us try to save time by letting (x,y) → (0,0) along any line through the origin. Then y = mx, where m is the slope, and if m ≠ 0,
f(x,y) = f(x, mx) = x(mx)²/(x² + (mx)⁴) = m²x³/(x² + m⁴x⁴) = m²x/(1 + m⁴x²)
so f(x,y) → 0 as (x,y) → (0,0) along y = mx, and the same limit 0 is obtained along the axes. But this does not show that the given limit is 0, for if we now let (x,y) → (0,0) along the parabola x = y², we have
f(x,y) = f(y², y) = y²·y²/((y²)² + y⁴) = y⁴/(2y⁴) = 1/2
so f(x,y) → 1/2 as (x,y) → (0,0) along x = y².
Since different paths lead to different limiting values, the given limit does not exist.
Example 4
Find lim_{(x,y)→(0,0)} 3x²y/(x² + y²) if it exists.
Solution
As in Example 3, one can show that the limit along any line through the origin is 0. This does not prove that the given limit is 0, but the limits along the parabolas y = x² and x = y² also turn out to be 0, so we begin to suspect that the limit does exist and equals 0.
Let ε > 0. We want to find δ > 0 such that
|3x²y/(x² + y²) − 0| < ε whenever 0 < √(x² + y²) < δ
that is, 3x²|y|/(x² + y²) < ε whenever 0 < √(x² + y²) < δ
But x² ≤ x² + y² since y² ≥ 0, so
3x²|y|/(x² + y²) ≤ 3|y| = 3√(y²) ≤ 3√(x² + y²)
Thus if we choose δ = ε/3 and let 0 < √(x² + y²) < δ, then
|3x²y/(x² + y²) − 0| ≤ 3√(x² + y²) < 3δ = 3(ε/3) = ε
Hence, by Definition 12.5,
lim_{(x,y)→(0,0)} 3x²y/(x² + y²) = 0
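The inequality used above can also be checked numerically. The Python sketch below (an illustration only) samples random points near the origin and confirms that |3x²y/(x² + y²)| ≤ 3√(x² + y²), so the values shrink to 0 with the distance from the origin:

import random, math

for _ in range(5):
    r = 10 ** random.uniform(-6, -1)              # distance from the origin
    t = random.uniform(0, 2 * math.pi)
    x, y = r * math.cos(t), r * math.sin(t)
    value = abs(3 * x**2 * y / (x**2 + y**2))
    print(value <= 3 * math.sqrt(x**2 + y**2) + 1e-15)  # True every time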
4.0: CONCLUSION
In this unit, you have met the precise definition of the limit of a function of two variables and worked through several examples.
5.0: SUMMARY
In this unit, you have studied the definition of the limit of a function of several variables and have solved various examples, including examples in which the limit fails to exist because different paths of approach give different limiting values.
6.0: TUTOR-MARKED ASSIGNMENT
2. Find the limit
REFERENCES
Bartle, R. G. and Sherbert, D. Introduction to Real Analysis. New York: Wiley, p. 141, 1991.
Kaplan, W. "Limits and Continuity." §2.4 in Advanced Calculus, 4th ed. Reading, MA: Addison-Wesley, pp. 82-86, 1992.
Unit 3: Continuity of Function of Several Variables.
CONTENT
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definitions and examples
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0: INTRODUCTION
Just as for functions of one variable, the calculation of limits can be greatly simplified by the
use of properties of limits and by the use of continuity.
The properties of limits listed in Tables 2.14 and 2.15 can be extended to functions of two
variables. The limit of a sum is the sum of the limits, and so on.
Recall that evaluating limits of continuous functions of a single variable is easy. It can be
accomplished by direct substitution because the defining property of a continuous function is
lim_{x→a} f(x) = f(a). Continuous functions of two variables are also defined by the direct
substitution property.
12.6 Definition
Let f be a function of two variables defined on a disk with center (a,b). Then f is called continuous at (a,b) if
lim_{(x,y)→(a,b)} f(x,y) = f(a,b)
2.0: OBJECTIVES
At the end of this unit, you should be able to define continuity for functions of several variables and determine where a given function is continuous.
3.0: MAIN CONTENT
If the domain of f is a set D ⊂ R², then Definition 12.6 defines the continuity of f at an interior point (a,b) of D, that is, a point that is contained in a disk Dδ ⊂ D [see Figure 12.20(a)]. But D may also contain a boundary point, that is, a point (a,b) such that every disk with center (a,b) contains points in D and also points not in D [see Figure 12.20(b)].
If (a,b) is a boundary point of D, then Definition 12.5 is modified so that the last line reads: |f(x,y) − L| < ε whenever (x,y) is in D and 0 < √((x − a)² + (y − b)²) < δ.
With this convention, Definition 12.6 also applies when f is defined at a boundary point (a,b) of D.
Finally, we say f is continuous on D if f is continuous at every point (a,b) in D.
The intuitive meaning of continuity is that if the point (x,y) changes by a small amount, then
the value of f(x,y) changes by a small amount. This means that a surface that is the graph of
a continuous function has no holes or breaks.
Using the properties of limits, you can see that sums, differences, products, and quotients of continuous functions are continuous on their domains. Let us use this fact to give examples of continuous functions.
A polynomial function of two variables (or polynomial, for short) is a sum of terms of the form cxᵐyⁿ, where c is a constant and m and n are non-negative integers. A rational function is a ratio of polynomials. For instance,
f(x,y) = x⁴ + 5x³y² + 6xy⁴ − 7y + 6
is a polynomial, whereas
g(x,y) = (2xy + 1)/(x² + y²)
is a rational function.
Figure 12.20
From Definition 12.5 it can be shown that
lim_{(x,y)→(a,b)} x = a,   lim_{(x,y)→(a,b)} y = b,   lim_{(x,y)→(a,b)} c = c
These limits show that the functions f(x,y) = x, g(x,y) = y, and h(x,y) = c are continuous.
Since any polynomial can be built up out of the simple functions f, g and h by multiplication
and addition, it follows that all polynomials are continuous on R2. Likewise, any rational
function is continuous on its domain since it is a quotient of continuous functions.
Example 5
Evaluate lim_{(x,y)→(a,b)} (x²y³ − x³y² + 3x + 2y).
Solution
Since f(x,y) = x²y³ − x³y² + 3x + 2y is a polynomial, it is continuous everywhere, so the limit can be found by direct substitution:
lim_{(x,y)→(a,b)} (x²y³ − x³y² + 3x + 2y) = a²b³ − a³b² + 3a + 2b
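Direct substitution is easy to verify symbolically. In the SymPy sketch below, the evaluation point (1, 2) is just an illustrative choice, not part of the original example:

from sympy import symbols

x, y = symbols('x y')
f = x**2*y**3 - x**3*y**2 + 3*x + 2*y
print(f.subs({x: 1, y: 2}))   # 1*8 - 1*4 + 3 + 4 = 11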
Example 6
Where is the function
f(x,y) = (x² − y²)/(x² + y²)
continuous?
Solution
The function f is discontinuous at (0,0) because it is not defined there. Since f is a rational function, it is continuous on its domain D = {(x,y) : (x,y) ≠ (0,0)}.
Example 7
Let
g(x,y) = (x² − y²)/(x² + y²) if (x,y) ≠ (0,0),  and g(0,0) = 0
Here g is defined at (0,0), but it is still discontinuous there because lim_{(x,y)→(0,0)} g(x,y) does not exist (see Example 1).
Example 8
Let
f(x,y) = 3x²y/(x² + y²) if (x,y) ≠ (0,0),  and f(0,0) = 0
We know f is continuous for (x,y) ≠ (0,0) since it is equal to a rational function there. Also, from Example 4, we have
lim_{(x,y)→(0,0)} f(x,y) = lim_{(x,y)→(0,0)} 3x²y/(x² + y²) = 0 = f(0,0)
Therefore f is continuous at (0,0), and so it is continuous on R².
Example 9
Let
h(x,y) = 3x²y/(x² + y²) if (x,y) ≠ (0,0),  and h(0,0) = 17
Then h is discontinuous at (0,0) because
lim_{(x,y)→(0,0)} h(x,y) = lim_{(x,y)→(0,0)} 3x²y/(x² + y²) = 0 ≠ 17 = h(0,0)
Theorem
If f is continuous at (a,b) and g is a function of a single variable that is continuous at f(a,b), then the composite function h = g ∘ f defined by h(x,y) = g(f(x,y)) is continuous at (a,b).
Example 10
On what set is the function h(x,y) = ln(x2 + y2 – 1) continuous?
Solution
Let f(x,y) = x2 + y2 – 1 and g(t) = ln t. Then
g(f(x,y)) = ln(x2 + y2 – 1) = h(x,y)
So h = g ∘ f. Now f is continuous everywhere since it is a polynomial, and g is continuous on its domain {t : t > 0}. Thus, by Theorem 12.7, h is continuous on its domain D = {(x,y) : x² + y² > 1}, which consists of all points outside the circle x² + y² = 1.
12.8 Definition
Let f : D ⊂ R³ → R. Then lim_{(x,y,z)→(a,b,c)} f(x,y,z) = L means that for every number ε > 0 there is a corresponding number δ > 0 such that
|f(x,y,z) − L| < ε whenever 0 < √((x − a)² + (y − b)² + (z − c)²) < δ
The function f is continuous at (a,b,c) if
lim_{(x,y,z)→(a,b,c)} f(x,y,z) = f(a,b,c)
If we use the vector notation introduced at the end of Section 12.1, then the definitions of a
limit for functions of two or three variables can be written in a single compact form as
follows.
If f : D ⊂ Rⁿ → R, then lim_{x→a} f(x) = L means that for every number ε > 0 there is a corresponding number δ > 0 such that
(12.9)   |f(x) − L| < ε whenever 0 < |x − a| < δ
Notice that if n = 1, then x = x and a = a, and (12.9) is just the definition of a limit for functions of a single variable. If n = 2, then x = (x,y), a = (a,b), and |x − a| = √((x − a)² + (y − b)²), so (12.9) becomes Definition 12.5. If n = 3, then x = (x,y,z), a = (a,b,c), and (12.9) becomes part (a) of Definition 12.8. In each case the definition of continuity can be written as
lim_{x→a} f(x) = f(a)
4.0: CONCLUSION
In this unit, you have studied the definition of continuity for a function of several variables and used it to decide where given functions are continuous.
5.0: SUMMARY
In this unit, you have studied the definition of continuity and solved various examples. The limits
lim_{(x,y)→(a,b)} x = a,   lim_{(x,y)→(a,b)} y = b,   lim_{(x,y)→(a,b)} c = c
show that the functions f(x,y) = x, g(x,y) = y, and h(x,y) = c are continuous. Since any polynomial can be built up out of the simple functions f, g and h by multiplication and addition, it follows that all polynomials are continuous on R². Likewise, any rational function is continuous on its domain since it is a quotient of continuous functions.
6.0: TUTOR-MARKED ASSIGNMENT
In Exercises 1 – 3 determine the largest set on which the given function is continuous.
1. F(x,y) = (x² + y² + 1)/(x² + y² − 1)
2. F(x,y) = (x⁶ + x³y³ + y⁶)/(x³ + y³)
3. G(x,y) = √(x + y) − √(x − y)
4. For what values of the number r is the function
f(x,y,z) = (x + y + z)ʳ/(x² + y² + z²) if (x,y,z) ≠ (0,0,0), and f(x,y,z) = 0 if (x,y,z) = (0,0,0),
continuous on R³?
5. If c ∈ Vn, show that the function f: Rn → R given by f(x) = c.x is continuous on Rn.
f(x) = 1 / (x + 2)
f(x) = 1 / ( x 4 + 6)
f(x) = | x - 5 |
f(x) = (x - 2) / [ (2 x 2 + 2x - 4)(x 4 + 5) ]
5. Evaluate the limit lim_{x→a} sin(2x + 5).
6. Show that any function of the form e^(ax + b) is continuous everywhere, where a and b are real numbers.
7.0: REFERENCES/FURTHER READINGS
1. Bartle, R. G. and Sherbert, D. Introduction to Real Analysis. New York: Wiley, p. 141, 1991.
2. Kaplan, W. "Limits and Continuity." §2.4 in Advanced Calculus, 4th ed. Reading, MA: Addison-Wesley, pp. 82-86, 1992.
3. Richard Gill, Associate Professor of Mathematics, Tidewater Community College.
MODULE 2 PARTIAL DERIVATIVES OF FUNCTION OF SEVERAL
VARIABLES
-Unit 1: Derivative
-Unit 2: Partial derivative.
-Unit 3: Application of Partial derivative.
UNIT 1: DERIVATIVE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 The derivative of a function
3.2 Higher derivative
3.3 Computing derivative
3.4 Derivative of higher dimension
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
In calculus, a derivative is a measure of how a function changes as its input changes. Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity. For example, the derivative of the position of a moving object with respect to time is the object's instantaneous velocity.
The derivative of a function at a chosen input value describes the best linear approximation of the function near that input value. For a real-valued function of a single real variable, the derivative at a point equals the slope of the tangent line to the graph of the function at that point. In higher dimensions, the derivative of a function at a point is a linear transformation called the linearization. A closely related notion is the differential of a function. The process of finding a derivative is called differentiation; the reverse process is integration.
The derivative of a function represents an infinitesimal change in the function with respect to one of its variables, written
df/dx
2.0 OBJECTIVES
At the end of this unit, you should be able to:
• define the derivative of a function;
• compute higher derivatives;
• compute derivatives using the standard rules;
• describe derivatives in higher dimensions.
3.0 MAIN CONTENT
3.1 The Derivative of a Function
Let ƒ be a function that has a derivative at every point a in the domain of ƒ. Because every
point a has a derivative, there is a function that sends the point a to the derivative of ƒ at a.
This function is written f′(x) and is called the derivative function or the derivative of ƒ. The
derivative of ƒ collects all the derivatives of ƒ at all the points in the domain of ƒ.
Sometimes ƒ has a derivative at most, but not all, points of its domain. The function whose value at a equals f′(a) whenever f′(a) is defined, and is undefined elsewhere, is also called the derivative of ƒ. It is still a function, but its domain is strictly smaller than the domain of ƒ.
Using this idea, differentiation becomes a function of functions: the derivative is an operator whose domain is the set of all functions that have derivatives at every point of their domain and whose range is a set of functions. If we denote this operator by D, then D(ƒ) is the function f′(x). Since D(ƒ) is a function, it can be evaluated at a point a. By the definition of the derivative function, D(ƒ)(a) = f′(a).
The operator D, however, is not defined on individual numbers. It is only defined on functions. Because the output of D is a function, the output of D can be evaluated at a point. For instance, when D is applied to the squaring function x ↦ x², D outputs the doubling function x ↦ 2x, which is the derivative function f′(x) = 2x.
Let ƒ be a differentiable function, and let f′(x) be its derivative. The derivative of f′(x) (if it has one) is written f′′(x) and is called the second derivative of ƒ. Similarly, the derivative of the second derivative, if it exists, is written f′′′(x) and is called the third derivative of ƒ. These repeated derivatives are called higher-order derivatives.
A function ƒ need not have a derivative, for example, if it is not continuous. Similarly, even if ƒ does have a derivative, it may not have a second derivative. For example, let
f(x) = x|x|, that is, f(x) = x² for x ≥ 0 and f(x) = −x² for x < 0.
Then f′(x) = 2|x| is twice the absolute value function, and it does not have a derivative at zero. Similar examples show that a function can have k derivatives for any non-negative integer k but no (k + 1)-th derivative. A function that has k successive derivatives is called k times differentiable. If in addition the k-th derivative is continuous, then the function is said to be of differentiability class Cᵏ. (This is a stronger condition than having k derivatives.) A function that has infinitely many derivatives is called infinitely differentiable.
If ƒ is infinitely differentiable, then this is the beginning of the Taylor series for ƒ.
Inflection Point
A point where the second derivative of a function changes sign is called an inflection point. At an inflection point, the second derivative may be zero, as in the case of the inflection point x = 0 of the function y = x³, or it may fail to exist, as in the case of the inflection point x = 0 of the function y = x^(1/3). At an inflection point, a function switches from being a convex function to being a concave function or vice versa.
3.3 Computing Derivatives
The derivative of a function can, in principle, be computed from the definition by considering the difference quotient and computing its limit. In practice, once the derivatives of a few simple functions are known, the derivatives of other functions are more easily computed using rules for obtaining derivatives of more complicated functions from simpler ones.
Most derivative computations eventually require taking the derivative of some common functions. The following incomplete list gives some of the most frequently used functions of a single real variable and their derivatives.
• Derivatives of powers: if f(x) = xʳ, where r is any real number, then f′(x) = r xʳ⁻¹ wherever the right-hand side is defined; for fractional r the derivative function is defined only for positive x, not for x = 0. When r = 0, this rule implies that f′(x) is zero for x ≠ 0, which is almost the constant rule (stated below).
• Trigonometric functions: d/dx (sin x) = cos x, d/dx (cos x) = −sin x, d/dx (tan x) = sec² x.
In many cases, complicated limit calculations by direct application of Newton's difference quotient can be avoided by using differentiation rules. Some of the most basic rules are the following.
Constant rule: if f(x) is constant, then f′(x) = 0.
Sum rule: (af + bg)′ = af′ + bg′ for all functions ƒ and g and all real numbers a and b.
Product rule: (fg)′ = f′g + fg′ for all functions ƒ and g.
Chain rule: if f(x) = h(g(x)), then f′(x) = h′(g(x)) · g′(x).
Example computation
The derivative of
f(x) = x⁴ + sin(x²) − ln(x)·eˣ + 7
is
f′(x) = 4x³ + 2x·cos(x²) − (1/x)·eˣ − ln(x)·eˣ.
Here the second term was computed using the chain rule and the third using the product rule. The known derivatives of the elementary functions x², x⁴, sin(x), ln(x) and exp(x) = eˣ, as well as the constant 7, were also used.
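A symbolic check of this computation (a SymPy sketch, purely illustrative):

from sympy import symbols, sin, cos, ln, exp, diff, simplify

x = symbols('x', positive=True)
f = x**4 + sin(x**2) - ln(x)*exp(x) + 7
expected = 4*x**3 + 2*x*cos(x**2) - exp(x)/x - ln(x)*exp(x)
print(simplify(diff(f, x) - expected))   # 0, so the derivative above is correct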
3.4 Derivatives in Higher Dimensions
A vector-valued function y(t) of a real variable sends real numbers to vectors in some vector space Rⁿ. A vector-valued function can be split up into its coordinate functions y₁(t), y₂(t), …, yₙ(t), meaning that y(t) = (y₁(t), ..., yₙ(t)). This includes, for example, parametric curves in R² or R³. The coordinate functions are real-valued functions, so the above definition of derivative applies to them. The derivative of y(t) is defined to be the vector, called the tangent vector, whose coordinates are the derivatives of the coordinate functions. That is,
y′(t) = (y₁′(t), ..., yₙ′(t)).
Equivalently,
y′(t) = lim_{h→0} (y(t + h) − y(t))/h,
if the limit exists. The subtraction in the numerator is subtraction of vectors, not scalars. If the derivative of y exists for every value of t, then y′ is another vector-valued function.
If e₁, …, eₙ is the standard basis for Rⁿ, then y(t) can also be written as y₁(t)e₁ + … + yₙ(t)eₙ. If we assume that the derivative of a vector-valued function retains the linearity property, then the derivative of y(t) must be
y₁′(t)e₁ + … + yₙ′(t)eₙ.
This generalization is useful, for example, if y(t) is the position vector of a particle at time t; then the derivative y′(t) is the velocity vector of the particle at time t.
Partial derivative
Suppose that ƒ is a function that depends on more than one variable, for instance
f(x, y) = x² + xy + y².
ƒ can be reinterpreted as a family of functions of one variable indexed by the other variables: every value of x chooses a function, denoted f_x, which is a function of one real number y. That is,
f_x(y) = x² + xy + y².
Once a value of x is chosen, say a, then f(x,y) determines a function f_a that sends y to a² + ay + y²:
f_a(y) = a² + ay + y².
In this expression, a is a constant, not a variable, so f_a is a function of only one real variable. Consequently the definition of the derivative for a function of one variable applies:
f_a′(y) = a + 2y.
The above procedure can be performed for any choice of a. Assembling the derivatives together into a function gives a function that describes the variation of ƒ in the y direction:
∂f/∂y (x, y) = x + 2y.
This is the partial derivative of ƒ with respect to y. Here ∂ is a rounded d called the partial derivative symbol. To distinguish it from the letter d, ∂ is sometimes pronounced "der", "del", or "partial" instead of "dee".
In general, the partial derivative of a function ƒ(x₁, …, xₙ) in the direction xᵢ at the point (a₁, …, aₙ) is defined to be
∂f/∂xᵢ (a₁, …, aₙ) = lim_{h→0} [f(a₁, …, aᵢ + h, …, aₙ) − f(a₁, …, aᵢ, …, aₙ)] / h.
In the above difference quotient, all the variables except xᵢ are held fixed. That choice of fixed values determines a function of one variable, and, by definition, its ordinary derivative at aᵢ is the partial derivative of f with respect to xᵢ at (a₁, …, aₙ). In other words, the different choices of a index a family of one-variable functions, just as in the example above. This expression also shows that the computation of partial derivatives reduces to the computation of one-variable derivatives.
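The reduction to one-variable derivatives can be illustrated with SymPy (a sketch only; the function is the x² + xy + y² example used above):

from sympy import symbols, diff, limit

x, y, h = symbols('x y h')
f = x**2 + x*y + y**2

print(diff(f, y))                               # x + 2*y, the partial derivative with respect to y
print(limit((f.subs(y, y + h) - f) / h, h, 0))  # same result, obtained from the difference quotient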
Generalizations
The concept of a derivative can be extended to many other settings. The common thread is
that the derivative of a function at a point serves as a linear approximation of the function at
that point.
4.0 CONCLUSION
In this unit, you have studied the derivative of a function. Through it you have met higher derivatives, computed derivatives using the standard rules, and seen how the derivative generalizes to higher dimensions.
5.0 SUMMARY
In this unit, you have studied the derivative of a function, higher derivatives, the rules for computing derivatives, and derivatives in higher dimensions (vector-valued functions and partial derivatives).
6.0 TUTOR-MARKED ASSIGNMENT
1. Evaluate the derivative of F(x,y) = x² + 3xy − 2 tan(y).
2. Find the derivative of F(x,y) = (y sin x)/e^(cos x).
REFERENCES
Anton, Howard; Bivens, Irl; Davis, Stephen (February 2, 2005), Calculus: Early
Transcendentals Single and Multivariable (8th ed.), New York: Wiley, ISBN 978-0-471-
47244-5
Apostol, Tom M. (June 1967), Calculus, Vol. 1: One-Variable Calculus with an Introduction
to Linear Algebra, 1 (2nd ed.), Wiley, ISBN 978-0-471-00005-1
• Apostol, Tom M. (June 1969), Calculus, Vol. 2: Multi-Variable Calculus and Linear
Algebra with Applications, 1 (2nd ed.), Wiley, ISBN 978-0-471-00007-5
• Courant, Richard; John, Fritz (December 22, 1998), Introduction to Calculus and
Analysis, Vol. 1, Springer-Verlag, ISBN 978-3-540-65058-4
• Eves, Howard (January 2, 1990), An Introduction to the History of Mathematics (6th ed.),
Brooks Cole, ISBN 978-0-03-029558-4
• Larson, Ron; Hostetler, Robert P.; Edwards, Bruce H. (February 28, 2006), Calculus:
Early Transcendental Functions (4th ed.), Houghton Mifflin Company, ISBN 978-0-618-
60624-5
• Spivak, Michael (September 1994), Calculus (3rd ed.), Publish or Perish, ISBN 978-0-
914098-89-8
• Stewart, James (December 24, 2002), Calculus (5th ed.), Brooks Cole, ISBN 978-0-534-39339-7
• Thompson, Silvanus P. (September 8, 1998), Calculus Made Easy (Revised, Updated, Expanded ed.), New York: St. Martin's Press, ISBN 978-0-312-18548-0
UNIT 2: PARTIAL DERIVATIVE
1.0 INTRODUCTION
A graph of z = x² + xy + y². For the partial derivative at (1, 1, 3) that leaves y constant, the corresponding tangent line is parallel to the xz-plane.
The graph of this function defines a surface in Euclidean space. To every point on this surface, there are an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope. Usually, the lines of most interest are those that are parallel to the xz-plane and those that are parallel to the yz-plane.
To find the slope of the line tangent to the function at (1, 1, 3) that is parallel to the xz-plane, the y variable is treated as constant. The graph and this plane are shown on the right. On the graph below it, we see the way the function looks on the plane y = 1. By finding the derivative of the equation while assuming that y is a constant, the slope of ƒ at the point (x, y, z) is found to be
∂z/∂x = 2x + y,
which at the point (1, 1, 3) equals 3. That is, the partial derivative of z with respect to x at (1, 1, 3) is 3.
2.0: OBJECTIVES
After studying this unit, you should be able to:
• define the partial derivative
• know the geometric interpretation
• identify the antiderivative analogue
• solve problems on partial derivatives of functions of several variables
• identify higher-order derivatives
3.0 MAIN CONTENT
Let us consider a function
1) u = f(x, y, z, p, q, ... )
of several variables. Such a function can be studied by holding all variables except one
constant and observing its variation with respect to one single selected variable. If we
consider all the variables except x to be constant, then
∂u/∂x = ∂f(x, ŷ, ẑ, p̂, q̂, ...)/∂x
represents the partial derivative of f(x, y, z, p, q, ... ) with respect to x (the hats indicating variables held fixed). The variables held fixed are viewed as parameters.
Example 1. The partial derivative of 3x2y + 2y2 with respect to x is 6xy. Its partial derivative
with respect to y is 3x2 + 4y.
The partial derivative of a function z = f(x, y, ...) with respect to the variable x is commonly written in any of the following ways:
∂z/∂x,  ∂f/∂x,  f_x,  z_x.
Its derivative with respect to any other variable is written in a similar fashion.
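Example 1 above is easy to check with SymPy (an illustrative sketch):

from sympy import symbols, diff

x, y = symbols('x y')
f = 3*x**2*y + 2*y**2

print(diff(f, x))   # 6*x*y
print(diff(f, y))   # 3*x**2 + 4*y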
Geometric interpretation. The geometric interpretation of a partial derivative is the same as
that for an ordinary derivative. It represents the slope of the tangent to that curve represented
by the function at a particular point P. In the case of a function of two variables z = f(x, y), the partial derivative ∂z/∂x at a point gives the slope of the tangent to the curve in which the plane y = constant cuts the surface, and ∂z/∂y gives the slope of the tangent in the plane x = constant.
Example 2
The volume V of a cone depends on the cone's height h and its radius r according to the formula
V = πr²h/3.
The partial derivative of V with respect to r is
∂V/∂r = 2πrh/3,
which represents the rate with which a cone's volume changes if its radius is varied and its height is kept constant. The partial derivative with respect to h is
∂V/∂h = πr²/3,
which represents the rate with which the volume changes if its height is varied and its radius is kept constant.
The difference between the total and partial derivative is the elimination of indirect dependencies between variables in partial derivatives.
If (for some arbitrary reason) the cone's proportions have to stay the same, with the height and radius in a fixed ratio k, then h = kr and the volume becomes a function of r alone, whose total derivative with respect to r takes this dependency into account.
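A SymPy sketch of the cone example (the symbol names are only illustrative), showing the two partial derivatives and the total derivative when h = k·r:

from sympy import symbols, pi, diff

r, h, k = symbols('r h k', positive=True)
V = pi*r**2*h/3

print(diff(V, r))               # 2*pi*r*h/3, height held constant
print(diff(V, h))               # pi*r**2/3, radius held constant
print(diff(V.subs(h, k*r), r))  # pi*k*r**2, total derivative when the proportions are fixed by h = k*r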
Equations involving an unknown function's partial derivatives are called partial differential
equations and are common in physics, engineering, and other sciences and applied
disciplines.
Notation
First-order partial derivatives: ∂f/∂x, ∂f/∂y, …
Second-order partial derivatives: ∂²f/∂x², ∂²f/∂y², …
Higher-order partial and mixed derivatives: ∂²f/∂x∂y, ∂³f/∂x²∂y, …
When dealing with functions of multiple variables, some of these variables may be related to
each other, and it may be necessary to specify explicitly which variables are being held
constant. In fields such as statistical mechanics, the partial derivative of f with respect to x, holding y and z constant, is often expressed as
(∂f/∂x)_{y,z}.
There is a concept for partial derivatives that is analogous to anti derivatives for regular
derivatives. Given a partial derivative, it allows for the partial recovery of the original
function.
Consider the example of ∂z/∂x = 2x + y. The "partial" integral can be taken with respect to x (treating y as constant, in a similar manner to partial differentiation):
z = ∫ (2x + y) dx = x² + xy + g(y).
Here, the "constant" of integration is no longer a constant, but instead a function of all the
variables of the original function except x.. The reason for this is that all the other variables
are treated as constant when taking the partial derivative, so any function
functio which does not
involve x will disappear when taking the partial derivative, and we have to account for this
when we take the antiderivative. The most general way to represent this is to have the
"constant" represent an unknown function of all the other variables. Thus the set of functions
x² + xy + g(y), where g is any one-argument function, represents the entire set of functions in the variables x, y that could have produced the x-partial derivative 2x + y.
If all the partial derivatives of a function are known (for example, with the gradient), then the
antiderivatives can be matched via the above process to reconstruct the original function up to
a constant
Example 3
Given a function f(x,y), find the partial derivatives of f with respect to x and y and compute the rates of change of the function in the x and y directions at the point (−1, 2).
Initially we will not specify the values of x and y when we take the derivatives; we will just remember which one we are going to hold constant while taking the derivative. First, hold y fixed and find the partial derivative of f with respect to x. Second, hold x fixed and find the partial derivative of f with respect to y.
Now, plug in the values x = −1 and y = 2 into the equations. We obtain f_x(−1, 2) = 10 and f_y(−1, 2) = 28.
We can of course take partial derivatives of functions of more than two variables. If f is a
function of n variables x_1, x_2, ..., x_n, then to take the partial derivative of f with respect to
x_i we hold all variables besides x_i constant and take the derivative.
Example 4
we hold x, y, and z constant and take the derivative with respect to the remaining variable t.
The result is
Interpretation
∂f/∂x is the rate at which f changes as x changes, for a fixed (constant) y.
∂f/∂y is the rate at which f changes as y changes, for a fixed (constant) x.
Higher Order Partial Derivatives
∂²f/∂x² is defined to be ∂/∂x (∂f/∂x).
Similarly, ∂²f/∂y² is defined to be ∂/∂y (∂f/∂y),
∂²f/∂y∂x is defined to be ∂/∂y (∂f/∂x), and
∂²f/∂x∂y is defined to be ∂/∂x (∂f/∂y).
The above second order partial derivatives can also be denoted by fxx, fyy, fxy, and fyx
respectively.
The last two are called mixed derivatives and will always be equal to each other when all
the first order partial derivatives are continuous.
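The equality of the mixed derivatives can be checked on any smooth example; the particular function in this SymPy sketch is an arbitrary illustration:

from sympy import symbols, diff, sin, exp, simplify

x, y = symbols('x y')
f = x**3*y**2 + sin(x*y) + exp(x)*y

fxy = diff(diff(f, x), y)
fyx = diff(diff(f, y), x)
print(simplify(fxy - fyx))   # 0: the mixed partial derivatives agree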
Some examples of partial derivatives of functions of several variables are shown below. In each case we differentiate with respect to one variable at a time, treating the remaining variables as constants, just as we did in Calculus I.
Example 1 Find all of the first order partial derivatives for the following functions.
(a)
(b)
(c)
(d)
Solution
(a)
Let’s first take the derivative with respect to x and remember that as we do so all the y’s will
be treated as constants. The partial derivative with respect to x is,
Notice that the second and the third term differentiate to zero in this case. It should be clear
why the third term differentiated to zero. It’s a constant and we know that constants always
differentiate to zero. This is also the reason that the second term differentiated to zero.
Remember that since we are differentiating with respect to x here we are going to treat all y's as constants. That means that terms that only involve y's will be treated as constants and hence will differentiate to zero.
Now, let's take the derivative with respect to y. In this case we treat all x's as constants and so the first term involves only x's and so will differentiate to zero, just as the third term will. Here is the partial derivative with respect to y.
(b)
With this function we've got three first order derivatives to compute. Let's do the partial derivative with respect to x first. Since we are differentiating with respect to x we will treat all y's and all z's as constants. This means that the second and fourth terms will differentiate to zero since they only involve y's and z's.
This first term contains both x's and y's and so when we differentiate with respect to x the y will be thought of as a multiplicative constant and so the first term will be differentiated just as the third term will be differentiated.
Let's now differentiate with respect to y. In this case all x's and z's will be treated as constants. This means the third term will differentiate to zero since it contains only x's, while the x's in the first term and the z's in the second term will be treated as multiplicative constants. Here is the derivative with respect to y.
Finally, let's get the derivative with respect to z. Since only one of the terms involves z's this will be the only non-zero term in the derivative. Also, the y's in that term will be treated as multiplicative constants. Here is the derivative with respect to z.
(c)
With this one we’ll not put in the detail of the first two. Before taking the derivative let’s
rewrite the function a little to help us with the differentiation process.
Now, the fact that we're using s and t here instead of the "standard" x and y shouldn't be a problem. It will work the same way. Here are the two derivatives for this function.
Remember how to differentiate natural logarithms.
(d)
Now, we can’t forget the product rule with derivatives. The product rule will work the same
way here as it does with functions of one variable. We will just need to be careful to
remember which variable we are differentiating with respect to.
Let's start out by differentiating with respect to x. In this case both the cosine and the exponential contain x's and so we've really got a product of two functions involving x's and so we'll need to use the product rule. Here is the derivative with respect to x.
Do not forget the chain rule for functions of one variable. We will be looking at the chain rule for some more complicated expressions for multivariable functions in a later section. However, at this point we're treating all the y's as constants and so the chain rule will continue to work as it did back in Calculus I.
Now, let's differentiate with respect to y. In this case we don't have a product rule to worry about since the only place that the y shows up is in the exponential. Therefore, since x's are considered to be constants for this derivative, the cosine in the front will also be thought of as a multiplicative constant. Here is the derivative with respect to y.
Example 2 Find all of the first order partial derivatives for the following functions.
(a)
(b)
(c)
Solution
(a)
We also can’t forget about the quotient rule. Since there isn’t too much to this one, we will
simply give the derivatives.
(b)
Now, we do need to be careful however to not use the quotient rule when it doesn't need to be used. In this case we do have a quotient; however, since the x's and y's only appear in the numerator and the z's only appear in the denominator, this really isn't a quotient rule problem.
Let's do the derivatives with respect to x and y first. In both these cases the z's are constants and so the denominator in this is a constant and so we don't really need to worry too much about it. Here are the derivatives for these two cases.
Now, in the case of differentiation with respect to z we can avoid the quotient rule with a
quick rewrite of the function. Here is the rewrite as well as the derivative with respect to z.
We went ahead and put the derivative back into the “original” form just so we could say that
we did. In practice you probably don’t really need to do that.
(c)
In this last part we are just going to do a somewhat messy chain rule problem. However, if
you had a good background in Calculus I chain rule this shouldn’t be all that difficult of a
problem. Here are the two derivatives,
So, there are some examples of partial derivatives. Hopefully you will agree that as long as
we can remember to treat the other variables as constants these work in exactly the same
manner that derivatives of functions of one variable do. So, if you can do Calculus I
derivative you shouldn’t have too much difficulty in doing basic partial derivatives.
There is one final topic that we need to take a quick look at in this section, implicit
differentiation. Before getting into implicit differentiation for multiple variable functions
let’s first remember how implicit differentiation works for functions of one variable.
Solution
Remember that the key to this is to always think of y as a function of x, or y = y(x), and so whenever we differentiate a term involving y's with respect to x we will really need to use the chain rule, which will mean that we will add on a dy/dx to that term.
The final step is to solve for dy/dx.
Now, we did this problem because implicit differentiation works in exactly the same manner with functions of multiple variables. If we have a function in terms of three variables x, y, and z, we will assume that z is in fact a function of x and y. In other words, z = z(x, y). Then, whenever we differentiate z's with respect to x or y, we will add on a ∂z/∂x or ∂z/∂y respectively.
(a)
(b)
Solution
(a)
Let's start with finding ∂z/∂x. We first differentiate both sides with respect to x, remembering to add on a ∂z/∂x whenever we differentiate a z, and then solve for ∂z/∂x.
Now we'll do the same thing for ∂z/∂y, except this time we'll need to remember to add on a ∂z/∂y whenever we differentiate a z.
(b)
We'll do the same thing for this function as we did in the previous part. First let's find ∂z/∂x.
Don't forget to do the chain rule on each of the trig functions, and when we are differentiating the inside function on the cosine we will need to also use the product rule. Now let's solve for ∂z/∂x.
Now let's take care of ∂z/∂y. This one will be slightly easier than the first one.
4.0 CONCLUSION
In this unit, you have defined the partial derivative of a function of several variables. You have used partial derivatives to give a geometric interpretation of such functions, and the antiderivative analogue has been identified. You have solved problems on partial derivatives of functions of several variables and identified higher-order derivatives.
5.0 SUMMARY
In this unit, you have studied the following:
• the definition of the partial derivative of functions of several variables
• the geometric interpretation of partial derivatives of functions of several variables
• the antiderivative analogue of partial derivatives of functions of several variables
• solving problems on partial derivatives of functions of several variables
• higher-order derivatives of functions of several variables
TUTOR MARKED ASSIGNMENT
f(x , y) = x2 y + 2x + y
f(x , y) = x ex y
f(x , y) = ln ( x2 + 2 y)
f(x , y) = y x2 + 2 y
A. f(x , y) = x ex + y
B. f(x , y) = ln ( 2 x + y x)
C. f(x , y) = x sin(x - y)
7.0 REFERENCE
Jeff Miller (2009-06-14). "Earliest Uses of Symbols of Calculus". Earliest Uses of Various
Mathematical Symbols. http://jeff560.tripod.com/calculus.html. Retrieved 2010-02-20.
Abramowitz, M. and Stegun, I. A. (Eds.). Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover, pp. 883-885,
1972.
Fischer, G. (Ed.). Plate 121 in Mathematische Modelle aus den Sammlungen von
Universitäten und Museen, Bildband. Braunschweig, Germany: Vieweg, p. 118, 1986.
Thomas, G. B. and Finney, R. L. §16.8 in Calculus and Analytic Geometry, 9th ed. Reading,
MA: Addison-Wesley, 1996.
Wagon, S. Mathematica in Action. New York: W. H. Freeman, pp. 83-85, 1991.
UNIT 3: APPLICATION OF PARTIAL DERIVATIVES
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
The partial derivative of f with respect to x is the derivative of f with respect to x, treating
all other variables as constant.
Similarly, the partial derivative of f with respect to y is the derivative of f with respect to y,
treating all other variables as constant, and so on for other variables. The partial derivatives
are written as ∂f/∂x, ∂f/∂y, and so on. The symbol "∂" is used (instead of "d") to remind us that there is more than one variable, and that we are holding the other variables fixed.
OBJECTIVES
In this unit, you should be able to:
• apply partial derivatives of functions of several variables in the chain rule
• apply partial derivatives of functions of several variables in the curl (mathematics)
• apply partial derivatives of functions of several variables in directional derivatives
• apply partial derivatives of functions of several variables in the d'Alembert operator
• apply partial derivatives of functions of several variables in double integrals
• apply partial derivatives of functions of several variables in the exterior derivative
• apply partial derivatives of functions of several variables in the Jacobian matrix and determinant
MAIN CONTENT
Chain rule
The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of f, g, and h (in that order) is the composite of f with g ∘ h. The chain rule says that to compute the derivative of f ∘ g ∘ h, it is sufficient to compute the derivative of f and the derivative of g ∘ h. The derivative of f can be calculated directly, and the derivative of g ∘ h can be calculated by applying the chain rule again.
The chain rule says that the derivative of their composite at the point x = a is
(f ∘ g ∘ h)′(a) = f′(g(h(a))) · g′(h(a)) · h′(a),
or, for short, (f ∘ g ∘ h)′ = (f′ ∘ g ∘ h) · (g′ ∘ h) · h′.
Another way of computing this derivative is to view the composite function f ∘ g ∘ h as the composite of f ∘ g and h. Applying the chain rule to this situation gives the same result. This should be expected because (f ∘ g) ∘ h = f ∘ (g ∘ h).
To compute the derivative of 1/g(x), notice that it is the composite of g with the reciprocal function, that is, the function that sends x to 1/x. The derivative of the reciprocal function is −1/x². By applying the chain rule, the derivative of 1/g(x) is
−g′(x)/g(x)².
Suppose that y = g(x) has an inverse function. Call its inverse function f, so that we have x = f(y). There is a formula for the derivative of f in terms of the derivative of g. To see this, note that f and g satisfy the formula
f(g(x)) = x.
Differentiating both sides with the chain rule gives
f′(g(x)) g′(x) = 1.
For example, consider the function g(x) = eˣ. It has an inverse which is denoted f(y) = ln y. Because g′(x) = eˣ, the above formula says that
f′(eˣ) · eˣ = 1, so f′(eˣ) = e⁻ˣ, that is, f′(y) = 1/y.
This formula is true whenever g is differentiable and its inverse f is also differentiable. This
formula can fail when one of these conditions is not true. For example, consider g(x) = x3. Its
inverse is f(y) = y^(1/3), which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of f at zero, then we must evaluate 1/g′(f(0)). Since f(0) = 0 and g′(0) = 0, we must evaluate 1/0, which is undefined. Therefore the formula fails in this case. This is not surprising because f is not differentiable at zero.
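A small numerical check of the inverse-function formula f′(g(x)) g′(x) = 1 for g(x) = eˣ, f(y) = ln y (an illustrative Python sketch):

import math

def g(x):
    return math.exp(x)

def f(y):
    return math.log(y)

x = 1.3
y = g(x)
h = 1e-6
f_prime = (f(y + h) - f(y - h)) / (2 * h)   # numerical derivative of f at y
print(f_prime, 1 / y)                       # both are approximately 1/y = e**(-1.3)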
Higher derivatives
Faà di Bruno's formula generalizes the chain rule to higher derivatives, expressing the n-th derivative of a composite f(g(x)) in terms of the derivatives of f and g.
Curl (mathematics)
In vector calculus, the curl (or rotor) is a vector operator that describes the infinitesimal rotation of a 3-dimensional vector field. At every point in the field, the curl is represented by a vector. The attributes of this vector (length and direction) characterize the rotation at that point.
The curl of a vector field F, denoted curl F or ∇×F, at a point is defined in terms of its projection onto various lines through the point. If n̂ is any unit vector, the projection of the curl of F onto n̂ is defined to be the limiting value of a closed line integral in a plane orthogonal to n̂, as the path used in the integral becomes infinitesimally close to the point, divided by the area enclosed.
As such, the curl operator maps C1 functions from R3 to R3 to C0 functions from R3 to R3.
The above formula means that the curl of a vector field is defined as the infinitesimal area density of the circulation of that field. To this definition fit naturally (i) the Kelvin–Stokes theorem, as a global formula corresponding to the definition, and (ii) the following "easy to memorize" definition of the curl in orthogonal curvilinear coordinates, e.g. in Cartesian, spherical, cylindrical, or even elliptic or parabolic coordinates:
If (x1,x2,x3) are the Cartesian coordinates and (u1,u2,u3) are the curvilinear coordinates, then
Usage
In practice, the above definition is rarely used because in virtually all cases, the curl operator
can be applied using some set of curvilinear coordinates,, for which simpler representations
have been derived.
The notation ∇×F has its origins in the similarities to the 3-dimensional cross product, and it is useful as a mnemonic in Cartesian coordinates if we take ∇ as a vector differential operator del. Such notation involving operators is common in physics and algebra. If certain coordinate systems are used, for instance polar-toroidal coordinates (common in plasma physics), using the notation ∇×F will yield an incorrect result.
Expanded in Cartesian coordinates (see Del in cylindrical and spherical coordinates for spherical and cylindrical coordinate representations), ∇×F is, for F composed of [Fx, Fy, Fz]:
∇×F = (∂Fz/∂y − ∂Fy/∂z) i + (∂Fx/∂z − ∂Fz/∂x) j + (∂Fy/∂x − ∂Fx/∂y) k,
where i, j, and k are the unit vectors for the x-, y-, and z-axes, respectively.
Although expressed in terms of coordinates, the result is invariant under proper rotations of
the coordinate axes but the result inverts under reflection.
where eₖ are the coordinate vector fields. Equivalently, using the exterior derivative, the curl can be expressed in terms of the exterior derivative of the 1-form associated with F.
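A short SymPy sketch of the curl in Cartesian coordinates (assuming the sympy.vector module is available; the field below is just an illustrative rigid rotation about the z-axis):

from sympy.vector import CoordSys3D, curl

N = CoordSys3D('N')
F = -N.y*N.i + N.x*N.j   # F = (-y, x, 0), a rotation field
print(curl(F))           # 2*N.k, twice the angular velocity vector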
Directional derivative
The directional derivative of a function f along a vector v at a point x is the function defined by the limit
∇_v f(x) = lim_{h→0} [f(x + h v) − f(x)] / h.
(See other notations below.) If the function f is differentiable at x, then the directional derivative exists along any unit vector v and one has
∇_v f(x) = ∇f(x) · v,
where the ∇f on the right denotes the gradient and · is the Euclidean inner product. At any point x, the directional derivative of f intuitively represents the rate of change in f along v at the point x.
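A SymPy sketch of the formula ∇_v f = ∇f · v for a unit vector v (the function and the direction are illustrative choices):

from sympy import symbols, diff, Rational

x, y = symbols('x y')
f = x**2*y

v = (Rational(3, 5), Rational(4, 5))       # a unit vector, since (3/5)**2 + (4/5)**2 = 1
grad = (diff(f, x), diff(f, y))            # (2*x*y, x**2)
directional = grad[0]*v[0] + grad[1]*v[1]
print(directional)                         # 6*x*y/5 + 4*x**2/5
print(directional.subs({x: 1, y: 2}))      # rate of change at (1, 2): 16/5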
Notation
Several important results in continuum mechanics require the derivatives of vectors with respect to vectors and of tensors with respect to vectors and tensors.[1] The directional derivative provides a systematic way of finding these derivatives.
The definitions of directional derivatives for various situations are given below. It is assumed that the functions are sufficiently smooth that derivatives can be taken.
Derivatives of scalar valued functions of vectors
Let f(v) be a real valued function of the vector v. Then the derivative of f(v) with respect to v (or at v) in the direction u is the vector defined as
(∂f/∂v) · u = lim_{α→0} [f(v + α u) − f(v)] / α.
Properties:
1) If then
2) If then
3) If then
Derivatives of vector valued functions of vectors
Let f(v) be a vector valued function of the vector v. Then the derivative of f(v) with respect to v (or at v) in the direction u is the second order tensor defined as
(∂f/∂v) · u = lim_{α→0} [f(v + α u) − f(v)] / α.
Properties:
1) If then
2) If then
3) If then
Derivatives of scalar valued functions of second-order tensors
Let f(S) be a real valued function of the second order tensor S. Then the derivative of f(S) with respect to S (or at S) in the direction T is the second order tensor defined as
(∂f/∂S) : T = lim_{α→0} [f(S + α T) − f(S)] / α.
Properties:
1) If then
2) If then
3) If then
Derivatives of tensor valued functions of second-order tensors
Let F(S) be a second order tensor valued function of the second order tensor S. Then the derivative of F(S) with respect to S (or at S) in the direction T is the fourth order tensor defined as
(∂F/∂S) : T = lim_{α→0} [F(S + α T) − F(S)] / α.
Properties:
1) If then
2) If then
3) If then
4) If then
Exterior derivative
If ƒ is a smooth function, then the exterior derivative of ƒ is the differential of ƒ. That is, dƒ is the unique one-form such that for every smooth vector field X, dƒ(X) = Xƒ, where Xƒ is the directional derivative of ƒ in the direction of X. Thus the exterior derivative of a function (or 0-form) is a one-form.
Alternatively, one can work entirely in a local coordinate system (x₁, ..., xₙ). First, the coordinate differentials dx₁, ..., dxₙ form a basic set of one-forms within the coordinate chart. Given a multi-index I = (i₁, ..., iₖ) with 1 ≤ iₚ ≤ n for 1 ≤ p ≤ k, the exterior derivative of a k-form
ω = f_I dx_{i₁} ∧ ... ∧ dx_{iₖ} = f_I dx_I
over Rⁿ is defined as
dω = Σⱼ (∂f_I/∂xⱼ) dxⱼ ∧ dx_I.
For general k-forms ω = Σ_I f_I dx_I (where the components of the multi-index I run over all the values in {1, ..., n}), the definition of the exterior derivative is extended linearly. Note that whenever i is one of the components of the multi-index I, then dxᵢ ∧ dx_I = 0 (see wedge product).
The definition of the exterior derivative in local coordinates follows from the preceding definition. Indeed, if ω = f_I dx_{i₁} ∧ ... ∧ dx_{iₖ}, then the coordinate formula for dω given above agrees with the invariant definition.
Invariant formula
Alternatively, an explicit formula can be given for the exterior derivative of a k-form ω when paired with k + 1 arbitrary smooth vector fields V₁, V₂, ..., V_{k+1}:
dω(V₁, ..., V_{k+1}) = Σᵢ (−1)^(i+1) Vᵢ(ω(V₁, ..., V̂ᵢ, ..., V_{k+1})) + Σ_{i<j} (−1)^(i+j) ω([Vᵢ, Vⱼ], V₁, ..., V̂ᵢ, ..., V̂ⱼ, ..., V_{k+1}),
where [Vᵢ, Vⱼ] denotes the Lie bracket and the hat denotes the omission of that element.
Examples
The formulas above follow easily from the properties of the wedge product.
D'Alembert operator
The d'Alembert operator □ is the wave operator of Minkowski space, built from second partial derivatives with respect to time and the space coordinates.
Applications
In electromagnetism the wave equation for the potential takes the form □Aµ = 0, where Aµ is the electromagnetic four-potential.
Green's function
The Green's function G(x − x′) for the d'Alembertian satisfies □G(x − x′) = δ(x − x′), where δ is the Dirac delta function and x and x′ are two points in Minkowski space.
Double Integral
The double integral of f(x, y) over the region R in the xy-plane is written
∬_R f(x, y) dx dy.
• When f(x, y) ≥ 0, the double integral gives the volume under the graph of f above the region R. The following figure illustrates this volume (in the case that the graph of f is above the region R).
• If R is the rectangle a ≤ x ≤ b, c ≤ y ≤ d, then
∬_R f(x, y) dx dy = ∫_{y=c}^{d} ∫_{x=a}^{b} f(x, y) dx dy = ∫_{x=a}^{b} ∫_{y=c}^{d} f(x, y) dy dx.
• If R is the region a ≤ x ≤ b and c(x) ≤ y ≤ d(x) (see figure below) then we integrate over R according to the following equation:
∬_R f(x, y) dx dy = ∫_{x=a}^{b} ∫_{y=c(x)}^{d(x)} f(x, y) dy dx.
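An iterated double integral of this kind can be evaluated with SymPy; the region and integrand below are illustrative choices:

from sympy import symbols, integrate

x, y = symbols('x y')
f = x*y

# R is the region 0 <= x <= 1, 0 <= y <= x, so c(x) = 0 and d(x) = x
print(integrate(f, (y, 0, x), (x, 0, 1)))   # 1/8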
JACOBIAN MATRIX
The Jacobian of a function describes the orientation of a tangent plane to the function at a given point. In this way, the Jacobian generalizes the gradient of a scalar-valued function of multiple variables, which itself generalizes the derivative of a scalar-valued function of a scalar. Likewise, the Jacobian can also be thought of as describing the amount of "stretching" that a transformation imposes. For example, if (x₂,y₂) = f(x₁,y₁) is used to transform an image, the Jacobian of f, J(x₁,y₁), describes how much the image in the neighborhood of (x₁,y₁) is stretched in the x and y directions.
If a function is differentiable at a point, its derivative is given in coordinates by the Jacobian, but a function doesn't
need to be differentiable for the Jacobian to be defined, since only the partial derivatives are required to exist.
The importance of the Jacobian lies in the fact that it represents the best linear approximation to a differentiable
function near a given point. In this sense, the Jacobian is the derivative of a multivariate function.
If p is a point in Rⁿ and F is differentiable at p, then its derivative is given by J_F(p). In this case, the linear map described by J_F(p) is the best linear approximation of F near the point p, in the sense that F(x) ≈ F(p) + J_F(p)(x − p) for x close to p.
The Jacobian of the gradient has a special name: the Hessian matrix, which in a sense is the "second derivative" of the scalar function of several variables in question.
Inverse
It follows that the (scalar) inverse of the Jacobian determinant of a transformation is the Jacobian determinant of the
inverse transformation.
Uses
Dynamical systems
Consider a dynamical system of the form x′ = F(x), where x′ is the (component-wise) time derivative of x, and F : Rⁿ → Rⁿ is continuous and differentiable. If F(x₀) = 0, then x₀ is a stationary point (also called a fixed point). The behavior of the system near a stationary point is related to the eigenvalues of J_F(x₀), the Jacobian of F at the stationary point.[1] Specifically, if the eigenvalues all have a negative real part, then the system is stable near the operating point; if any eigenvalue has a positive real part, then the point is unstable.
Newton's method
A system of coupled nonlinear equations can be solved iteratively by Newton's method. This method uses the
Jacobian matrix of the system of equations.
function s = newtons_method(f, x, tol)
% Solve f(x) = 0 by Newton's method, starting from the row vector x.
% (The function name is illustrative; f returns a row vector, s is the approximate root.)
if nargin == 2
    tol = 10^(-5);
end
while 1
    % if x and f(x) are row vectors, we need transpose operations here
    y = x' - jacob(f, x)\f(x)';   % get the next point
    if norm(f(y')) < tol          % check error tolerance
        s = y';
        return;
    end
    x = y';
end
end

function j = jacob(f, x)
% Approximate the Jacobian of f at x by forward differences with step 0.001.
k = length(x);
j = zeros(k, k);
for m = 1:k
    x2 = x;
    x2(m) = x(m) + 0.001;
    j(:, m) = 1000*(f(x2) - f(x))';   % partial derivatives with respect to x(m) in the m-th column
end
end
Jacobian determinant
If m = n, then F is a function from n-space to n-space and the Jacobian matrix is a square matrix. We can then form
its determinant, known as the Jacobian determinant. The Jacobian determinant is sometimes simply called "the
Jacobian."
The Jacobian determinant at a given point gives important information about the behavior of F near that point. For
instance, the continuously differentiable function F is invertible near a point p ∈ Rn if the Jacobian determinant at p
is non-zero. This is the inverse function theorem. Furthermore, if the Jacobian determinant at p is positive, then F
preserves orientation near p; if it is negative, F reverses orientation. The absolute value of the Jacobian determinant
at p gives us the factor by which the function F expands or shrinks volumes near p; this is why it occurs in the
general substitution rule.
Uses
The Jacobian determinant is used when making a change of variables when evaluating a multiple integral of a function over a region within its domain. To accommodate the change of coordinates, the magnitude of the Jacobian determinant arises as a multiplicative factor within the integral. Normally it is required that the change of coordinates be done in a manner which maintains injectivity between the coordinates that determine the domain. The Jacobian determinant, as a result, is usually well defined.
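For reference, in two dimensions a change of variables x = x(u, v), y = y(u, v) gives
∬ f(x, y) dx dy = ∬ f(x(u, v), y(u, v)) |∂(x, y)/∂(u, v)| du dv,
where |∂(x, y)/∂(u, v)| is the absolute value of the Jacobian determinant of the transformation.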
Examples
Example 1. The transformation from spherical coordinates (r, θ, φ) to Cartesian coordinates (x1, x2, x3) is given by the function F : R+ × [0, π] × [0, 2π) → R3 with components:
x1 = r sin θ cos φ,  x2 = r sin θ sin φ,  x3 = r cos θ.
The Jacobian determinant is r² sin θ. As an example, since dV = dx1 dx2 dx3, this determinant implies that the differential volume element is dV = r² sin θ dr dθ dφ. Nevertheless, this determinant varies with the coordinates; to avoid any variation, new coordinates can be defined for which the determinant equals 1, so that the volume element becomes the product of the new coordinate differentials.[2]
A map from Rn to Rm with m ≠ n shows that the Jacobian need not be a square matrix.
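For a concrete illustration (not the worked example from the original figures), the map F(x, y) = (x², xy, sin y) from R² to R³ has the 3 × 2 Jacobian
J_F(x, y) = [ 2x  0 ; y  x ; 0  cos y ].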
Example 3. The transformation from polar coordinates (r, θ) to Cartesian coordinates (x, y) is given by x = r cos θ, y = r sin θ. The Jacobian determinant is equal to r. This shows how an integral in the Cartesian coordinate system is transformed into an integral in the polar coordinate system:
∬_{F(A)} f(x, y) dx dy = ∬_A f(r cos θ, r sin θ) r dr dθ.
Example 4. For a map F : R3 → R3 whose Jacobian determinant equals −40 x1 x2, we see that F reverses orientation near those points where x1 and x2 have the same sign; the function is locally invertible everywhere except near points where x1 = 0 or x2 = 0. Intuitively, if you start with a tiny object around the point (1, 1, 1) and apply F to that object, you will get an object with approximately 40 times the volume of the original one.
CONCLUSION
In this unit you have applied partial derivatives of functions of several variables to solve problems on the chain rule and curl (mathematics). You have also applied partial derivatives of functions of several variables to solve derivatives and the d'Alembert operator. You have applied partial derivatives of functions of several variables to the double integral and the exterior derivative. You have also used partial derivatives of functions of several variables in the Jacobian matrix and determinant.
SUMMARY
In this unit, you have studied the:
Application of partial derivatives of functions of several variables in the chain rule
Application of partial derivatives of functions of several variables in curl (mathematics)
Application of partial derivatives of functions of several variables in derivatives
Application of partial derivatives of functions of several variables in the d'Alembert operator
Application of partial derivatives of functions of several variables in the double integral
Application of partial derivatives of functions of several variables in the exterior derivative
Application of partial derivatives of functions of several variables in the Jacobian matrix and determinant
TUTOR-MARKED ASSIGNMENT
b. F(x, y) = e^(xy)
References
J. E. Marsden and T. J. R. Hughes, 2000, Mathematical Foundations of Elasticity, Dover.
Hernandez Rodriguez and Lopez Fernandez, A Semiotic Reflection on the Didactics of the
Chain Rule, The Montana Mathematics Enthusiast, ISSN 1551-3440, Vol. 7, nos.2&3,
pp.321–332.
Apostol, Tom (1974). Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
Flanders, Harley (1989). Differential forms with applications to the physical sciences. New
York: Dover Publications. pp. 20. ISBN 0-486-66169-5.
Conlon, Lawrence (2001). Differentiable manifolds. Basel, Switzerland: Birkhäuser. pp. 239.
ISBN 0-8176-4134-3.
Retrieved from
"http://en.wikipedia.org/w/index.php?title=Exterior_derivative&oldid=472668495"
D.K. Arrowsmith and C.M. Place, Dynamical Systems, Section 3.3, Chapman & Hall,
London, 1992. ISBN 0-412-39080-9.
Taken from http://www.sjcrothers.plasmaresources.com/schwarzschild.pdf - On the
Gravitational Field of a Mass Point according to Einstein’s Theory by K. Schwarzschild -
arXiv:physics/9905030 v1 (text of the original paper, in Wikisource
MODULE 3 TOTAL DERIVATIVES OF FUNCTION OF SEVERAL VARIABLES
• Unit 1: Derivative
• Unit 2: Total derivative.
• Unit 3: Application of Total derivative.
UNIT 1 : DERIVATIVE
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
Solve directional derivatives
Use derivative to solve Total derivative, total differential and Jacobian matrix
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
Introduction
The graph of a function, drawn in black, and a tangent line to that function, drawn in red. The
slope of the tangent line is equal to the derivative of the function at the marked point.
The derivative of a function at a chosen input value describes the best linear approximation of the function near that input value. For a real-valued function of a single real variable, the derivative at a point equals the slope of the tangent line to the graph of the function at that point. In higher dimensions, the derivative of a function at a point is a linear transformation called the linearization.[1] A closely related notion is the differential of a function.
The process of finding a derivative is called differentiation. The reverse process is called antidifferentiation. The fundamental theorem of calculus states that antidifferentiation is the same as integration. Differentiation and integration constitute the two fundamental operations in single-variable calculus.
OBJECTIVES
Solve directional derivatives
Use derivative to solve Total derivative, total differential and Jacobian matrix
Main content
Directional derivatives
If ƒ is a real-valued function on Rn, then the partial derivatives of ƒ measure its variation in the direction of the coordinate axes. For example, if ƒ is a function of x and y, then its partial derivatives measure the variation in ƒ in the x direction and the y direction. They do not, however, directly measure the variation of ƒ in any other direction, such as along the diagonal line y = x. These are measured using directional derivatives. Choose a vector v. The directional derivative of ƒ in the direction of v at the point x is the limit
Dvƒ(x) = lim_{h→0} [ƒ(x + hv) − ƒ(x)] / h.
In some cases it may be easier to compute or estimate the directional derivative after changing the length of the vector. Often this is done to turn the problem into the computation of a directional derivative in the direction of a unit vector. To see how this works, suppose that v = λu. Substitute h = k/λ into the difference quotient. The difference quotient becomes:
[ƒ(x + ku) − ƒ(x)] / (k/λ) = λ · [ƒ(x + ku) − ƒ(x)] / k.
This is λ times the difference quotient for the directional derivative of ƒ with respect to u. Furthermore, taking the limit as h tends to zero is the same as taking the limit as k tends to zero because h and k are multiples of each other. Therefore Dv(ƒ) = λDu(ƒ). Because of this rescaling property, directional derivatives are frequently considered only for unit vectors.
If all the partial derivatives of ƒ exist and are continuous at x, then they determine the directional derivative of ƒ in the direction v by the formula:
Dvƒ(x) = Σ_{j=1}^{n} vj ∂ƒ/∂xj.
This is a consequence of the definition of the total derivative. It follows that the directional derivative is linear in v, meaning that Dv+w(ƒ) = Dv(ƒ) + Dw(ƒ).
The same definition also works when ƒ is a function with values in Rm. The above definition
is applied to each component of the vectors. In this case, the directional derivative is a vector
in Rm.
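For instance, with the illustrative function ƒ(x, y) = x²y and the vector v = (1, 2),
Dvƒ = (∂ƒ/∂x)·1 + (∂ƒ/∂y)·2 = 2xy + 2x²,
which equals 4 at the point (1, 1).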
Total derivative, total differential and Jacobian matrix
When ƒ is a function from an open subset of Rn to Rm, then the directional derivative of ƒ in a chosen direction is the best linear approximation to ƒ at that point and in that direction. But when n > 1, no single directional derivative can give a complete picture of the behavior of ƒ. The total derivative, also called the (total) differential, gives a complete picture by considering all directions at once. That is, for any vector v starting at a, the linear approximation formula holds:
ƒ(a + v) ≈ ƒ(a) + ƒ′(a)v.
If n and m are both one, then the derivative ƒ′(a) is a number and the expression ƒ′(a)v is the product of two numbers. But in higher dimensions, it is impossible for ƒ′(a) to be a number. If it were a number, then ƒ′(a)v would be a vector in Rn while the other terms would be vectors in Rm, and therefore the formula would not make sense. For the linear approximation formula to make sense, ƒ′(a) must be a function that sends vectors in Rn to vectors in Rm, and ƒ′(a)v must denote this function evaluated at v.
Notice that if we choose another vector w, then this approximate equation determines another approximate equation by substituting w for v. It determines a third approximate equation by substituting both w for v and a + v for a. By subtracting these two new equations, we get
ƒ(a + v + w) − ƒ(a + v) − ƒ(a + w) + ƒ(a) ≈ ƒ′(a + v)w − ƒ′(a)w.
If we assume that v is small and that the derivative varies continuously in a, then ƒ′(a + v) is approximately equal to ƒ′(a), and therefore the right-hand side is approximately zero. The left-hand side can be rewritten in a different way using the linear approximation formula with v + w substituted for v. The linear approximation formula implies:
ƒ′(a)v + ƒ′(a)w ≈ ƒ′(a)(v + w).
This suggests that ƒ′(a) is a linear transformation from the vector space Rn to the vector space Rm. In fact, it is possible to make this a precise derivation by measuring the error in the approximations. Assume that the error in these linear approximation formulae is bounded by a constant times ||v||, where the constant is independent of v but depends continuously on a. Then, after adding an appropriate error term, all of the above approximate equalities can be rephrased as inequalities. In particular, ƒ′(a) is a linear transformation up to a small error term. In the limit as v and w tend to zero, it must therefore be a linear transformation. Since we define the total derivative by taking a limit as v goes to zero, ƒ′(a) must be a linear transformation.
In one variable, the fact that the derivative is the best linear approximation is expressed by the fact that it is the limit of difference quotients. However, the usual difference quotient does not make sense in higher dimensions because it is not usually possible to divide vectors. In particular, the numerator and denominator of the difference quotient are not even in the same vector space: the numerator lies in the codomain Rm while the denominator lies in the domain Rn. Furthermore, the derivative is a linear transformation, a different type of object from both the numerator and denominator. To make precise the idea that ƒ′(a) is the best linear approximation, it is necessary to adapt a different formula for the one-variable derivative in which these problems disappear. If ƒ : R → R, then the usual definition of the derivative may be manipulated to show that the derivative of ƒ at a is the unique number ƒ′(a) such that
lim_{h→0} [ƒ(a + h) − ƒ(a) − ƒ′(a)h] / h = 0.
This is equivalent to
lim_{h→0} |ƒ(a + h) − ƒ(a) − ƒ′(a)h| / |h| = 0,
because the limit of a function tends to zero if and only if the limit of the absolute value of the function tends to zero. This last formula can be adapted to the many-variable situation by replacing the absolute values with norms.
The definition of the total derivative of ƒ at a, therefore, is that it is the unique linear transformation ƒ′(a) : Rn → Rm such that
lim_{h→0} ||ƒ(a + h) − ƒ(a) − ƒ′(a)h|| / ||h|| = 0.
Here h is a vector in Rn, so the norm in the denominator is the standard length on Rn. However, ƒ′(a)h is a vector in Rm, and the norm in the numerator is the standard length on Rm. If v is a vector starting at a, then ƒ′(a)v is called the pushforward of v by ƒ and is sometimes written ƒ*v.
If the total derivative exists at a, then all the partial derivatives and directional derivatives of ƒ exist at a, and for all v, ƒ′(a)v is the directional derivative of ƒ in the direction v. If we write ƒ using coordinate functions, so that ƒ = (ƒ1, ƒ2, ..., ƒm), then the total derivative can be expressed using the partial derivatives as a matrix. This matrix is called the Jacobian matrix of ƒ at a: it is the m × n matrix whose (i, j) entry is ∂ƒi/∂xj evaluated at a.
The existence of the total derivative ƒ′(a) is strictly stronger than the existence of all the partial derivatives, but if the partial derivatives exist and are continuous, then the total derivative exists, is given by the Jacobian, and depends continuously on a.
The definition of the total derivative subsumes the definition of the derivative in one variable. That is, if ƒ is a real-valued function of a real variable, then the total derivative exists if and only if the usual derivative exists. The Jacobian matrix reduces to a 1×1 matrix whose only entry is the derivative ƒ′(x). This 1×1 matrix satisfies the property that ƒ(a + h) − ƒ(a) − ƒ′(a)h is approximately zero, in other words that
ƒ(a + h) ≈ ƒ(a) + ƒ′(a)h.
The total derivative of a function does not give another function in the same way as the one-variable case. This is because the total derivative of a multivariable function has to record much more information than the derivative of a single-variable function. Instead, the total derivative gives a function from the tangent bundle of the source to the tangent bundle of the target.
Conclusion
In this unit, you have used the derivative to solve problems on directional derivatives and have also solved problems on the total derivative, total differential and Jacobian matrix.
Summary
In this unit you have studied how to:
solve directional derivatives
use the derivative to solve problems on the total derivative, total differential and Jacobian matrix.
Tutor-Marked Assignment
1. Evaluate the derivative of F(x, y, z) = 3(x² + y²) sin(z).
2. Find the derivative of F(x, y, z) = xy³ + z⁴.
3. Let F(x, y, z) = (x⁵ + y³z² + sin z) / (−xy² + z⁴); find the derivative.
4. Evaluate the derivatives of F(x, y, z) = x(sin x + cos x)².
UNIT 2: TOTAL DERIVATIVE
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
The total derivative (full derivative) of a function f of several variables, e.g., t, x, y, etc., with respect to one of its input variables, e.g., t, is different from the partial derivative ∂f/∂t. Calculation of the total derivative of f with respect to t does not assume that the other arguments are constant while t varies; instead, it allows the other arguments to depend on t. The total derivative adds in these indirect dependencies to find the overall dependency of f on t. For example, the total derivative of f(t, x, y) with respect to t is
df/dt = ∂f/∂t + (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).
Consider multiplying both sides of the equation by the differential dt. The result will be the differential change df in the function f. Because f depends on t, some of that change will be due to the partial derivative of f with respect to t. However, some of that change will also be due to the partial derivatives of f with respect to the variables x and y. So, the differential dt is applied to the total derivatives of x and y to find the differentials dx and dy, which can then be used to find the contribution to df.
• The term "total derivative" may also refer to the total derivative operator, which computes the total derivative of a function (with respect to x in this case).
• It is another name for the derivative as a linear map, i.e., if f is a differentiable function from Rn to Rm, then the (total) derivative (or differential) of f at x ∈ Rn is the linear map from Rn to Rm whose matrix is the Jacobian matrix of f at x.
2.0 OBJECTIVES
After studying this unit, you should be able to:
know the total derivative and the total differential.
Suppose that f is a function of two variables, x and y. Normally these variables are assumed to be independent. However, in some situations they may be dependent on each other. For example, y could be a function of x, constraining the domain of f to a curve in R2. In this case the partial derivative of f with respect to x does not give the true rate of change of f with respect to changing x, because changing x necessarily changes y. The total derivative takes such dependencies into account. For example, suppose
f(x, y) = xy.
The rate of change of f with respect to x is usually the partial derivative of f with respect to x; in this case,
∂f/∂x = y.
However, if y depends on x, the partial derivative does not give the true rate of change of f as x changes because it holds y fixed. Suppose, for example, that
y = x,
then
f(x, y) = x·x = x², so the true (total) rate of change is df/dx = 2x, whereas the partial derivative gives only ∂f/∂x = y = x.
While one can often perform substitutions to eliminate indirect dependencies, the chain rule provides for a more efficient and general technique. Suppose M(t, p1, ..., pn) is a function of time t and n variables pi which themselves depend on time. Then, the total time derivative of M is
dM/dt = ∂M/∂t + Σi (∂M/∂pi)(dpi/dt).
This expression is often used in physics for a gauge transformation of the Lagrangian, as two Lagrangians that differ only by the total time derivative of a function of time and the generalized coordinates lead to the same equations of motion. The operator in brackets (in the final expression, when it is written as d/dt = ∂/∂t + Σi (dpi/dt) ∂/∂pi) is also called the total derivative operator (with respect to t).
Here there is no ∂f/∂t term since f itself does not depend on the independent variable t directly.
Differentials provide a simple way to understand the total derivative. For instance, suppose M(t, p1, ..., pn) is a function of time t and n variables pi as in the previous section. Then, the differential of M is
dM = (∂M/∂t) dt + Σi (∂M/∂pi) dpi.
This expression is often interpreted heuristically as a relation between infinitesimals. However, if the variables t and pj are interpreted as functions, and M(t, p1(t), ..., pn(t)) is interpreted to mean the composite of M with these functions, then the above expression makes perfect sense as an equality of differential 1-forms, and is immediate from the chain rule for the exterior derivative. The advantage of this point of view is that it takes into account arbitrary dependencies between the variables. In particular, if the variables pj are all functions of t, as in the previous section, then
dM = [∂M/∂t + Σi (∂M/∂pi)(dpi/dt)] dt.
Let U ⊆ Rn be an open subset. Then a function f : U → Rm is said to be (totally) differentiable at a point p ∈ U if there exists a linear map L : Rn → Rm (also denoted Dpf or Df(p)) such that
lim_{x→p} ||f(x) − f(p) − L(x − p)|| / ||x − p|| = 0.
CONCLUSION
In this unit, you have learned how to differentiate when variables have indirect dependencies. You have used the total derivative via differentials and have seen the total derivative as a linear map.
SUMMARY
In this unit, you have studied the total derivative, the total differential, and the total derivative as a linear map.
TUTOR-MARKED ASSIGNMENT
1. Find the second-order total derivative of the function F(x, y, z) = x⁴ + y³ − z³.
2. Find the total derivative of the function F(x, y, z) = (x³y² + z³) / (y⁴ + x³y).
3. Find the total derivative of the function F(x, y, z) = x³y²z⁴.
REFERENCES
Anton, Howard; Bivens, Irl; Davis, Stephen (February 2, 2005), Calculus: Early
Transcendentals Single and Multivariable (8th ed.), New York: Wiley, ISBN 978-0-471-
47244-5
Apostol, Tom M. (June 1967), Calculus, Vol. 1: One-Variable Calculus with an Introduction
to Linear Algebra, 1 (2nd ed.), Wiley, ISBN 978-0-471-00005-1
Apostol, Tom M. (June 1969), Calculus, Vol. 2: Multi-Variable Calculus and Linear Algebra
with Applications, 1 (2nd ed.), Wiley, ISBN 978-0-471-00007-5
Courant, Richard; John, Fritz (December 22, 1998), Introduction to Calculus and Analysis,
Vol. 1, Springer-Verlag, ISBN 978-3-540-65058-4
Eves, Howard (January 2, 1990), An Introduction to the History of Mathematics (6th ed.),
Brooks Cole, ISBN 978-0-03-029558-4
Larson, Ron; Hostetler, Robert P.; Edwards, Bruce H. (February 28, 2006), Calculus: Early
Transcendental Functions (4th ed.), Houghton Mifflin Company, ISBN 978-0-618-60624-5
Spivak, Michael (September 1994), Calculus (3rd ed.), Publish or Perish, ISBN 978-0-
914098-89-8
Stewart, James (December 24, 2002), Calculus (5th ed.), Brooks Cole, ISBN 978-0-534-
39339-7
UNIT 3: APPLICATION OF TOTAL DERIVATIVE OF A
FUNCTION.
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 chain rule
3.2 directional derivative
3.3 differentiation under integral sign
3.4 Leibnitz rule
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
Let us consider a function
1) u = f(x, y, z, p, q, ... )
of several variables. Such a function can be studied by holding all variables except one
constant and observing its variation with respect to one single selected variable. If we
consider all the variables except x to be constant, then
∂u/∂x = fx(x, ŷ, ẑ, p̂, q̂, ...)
represents the partial derivative of f(x, y, z, p, q, ... ) with respect to x (the hats indicating
variables held fixed). The variables held fixed are viewed as parameters.
OBJECTIVES
At the end of this unit, you should be able to:
apply the total derivative to find directional derivatives
apply the total derivative to solve differentiation under the integral sign
apply the total derivative to the Leibnitz rule
2)
This rule is called the chain rule for the partial derivatives of functions of functions.
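For reference, the chain rule referred to here takes the following standard form (assuming u = f(x, y, z, ...) and that x, y, z, ... are themselves functions of the variables s and t):
∂u/∂s = (∂u/∂x)(∂x/∂s) + (∂u/∂y)(∂y/∂s) + (∂u/∂z)(∂z/∂s) + ···,
and similarly for ∂u/∂t.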
Note the similarity between total differentials and total derivatives. The total derivative above
can be obtained by dividing the total differential
by dt.
As a special application of the chain rule let us consider the relation defined by the two
equations
Definition of scalar point function. A scalar point function is a function that assigns a real number (i.e. a scalar) to each point of some region of space. If to each point (x, y, z) of a region R in space there is assigned a real number u = Φ(x, y, z), then Φ is called a scalar point function.
Examples. 1. The temperature distribution within some body at a particular point in time.
2. The density distribution within some fluid at a particular point in time.
Directional derivatives. Let Φ(x, y, z) be a scalar point function defined over some region R
of space. The function Φ(x, y, z) could, for example, represent the temperature distribution
within some body. At some specified point P(x, y, z) of R we wish to know the rate of change
of Φ in a particular direction. The rate of change of a function Φ at a particular point P, in a
specified direction, is called the directional derivative of Φ at P in that direction. We specify
the direction by supplying the direction angles or direction cosines of a unit vector e pointing
in the desired direction.
Theorem. The rate of change of a function Φ(x, y, z) in the direction of a vector with
direction angles (α, β, γ) is given by
3) dΦ/ds = (∂Φ/∂x) cos α + (∂Φ/∂y) cos β + (∂Φ/∂z) cos γ,
where s corresponds to distance in the metric of the coordinate system. The direction in which this rate of change of Φ at point P is greatest is the direction of the gradient of Φ at P.
We shall prove the theorem shortly. First let us consider the same problem for two
dimensional space.
Let Φ(x, y) be a scalar point function defined over some region R of the plane. At some
specified point P(x, y) of R we wish to know the rate of change of Φ in a particular direction.
We specify the direction by supplying the angle α that a unit vector e pointing in the desired
direction makes with the positive x direction. See Fig. 4. The rate of change of function Φ at
point P in the direction of e corresponding to angle α is given by
4) dΦ/ds = (∂Φ/∂x) cos α + (∂Φ/∂y) sin α,
where s corresponds to distance in the metric of the coordinate system. We show this as
follows:
Let
T = f(x, y)
where T is the temperature at any point of the plate shown in Fig. 5. We wish to derive
expression 4) above. In other words, we wish to derive the expression for the rate of change
of T with respect to the distance moved in any selected direction. Suppose we move from
point P to point P'. This represents a displacement ∆x in the x-direction and ∆y in the y-
direction. The distance moved along the plate is
The direction is given by the angle α that PP' makes with the positive x-direction. The change in the value of T corresponding to the displacement from P to P' is approximately
∆T ≈ (∂T/∂x)∆x + (∂T/∂y)∆y.
From Fig. 5 we observe that ∆x/∆s = cos α and ∆y/∆s = sin α. Making these substitutions and letting P' approach P along line PP', we have
dT/ds = (∂T/∂x) cos α + (∂T/∂y) sin α.
This is the directional derivative of T in the direction α.
Def. Directional derivative. The directional derivative of a scalar point function Φ(x, y, z) is
the rate of change of the function Φ(x, y, z) at a particular point P(x, y, z) as measured in a
specified direction.
Tech. Let Φ(x, y, z) be a scalar point function possessing first partial derivatives throughout
some region R of space. Let P(x0, y0, z0) be some point in R at which we wish to compute the
directional derivative and let P'(x1, y1, z1) be a neighboring point. Let the distance from P to
P' be ∆s. Then the directional derivative of Φ in the direction PP' is given by
5) dΦ/ds = lim_{∆s→0} [Φ(x1, y1, z1) − Φ(x0, y0, z0)] / ∆s.
Using this definition, let us now derive 3) above. In moving from P to P' the function Φ will
change by an amount
∆Φ = (∂Φ/∂x)∆x + (∂Φ/∂y)∆y + (∂Φ/∂z)∆z + ε1∆x + ε2∆y + ε3∆z,
where ε1, ε2, ε3 are higher order infinitesimals which approach zero as P' approaches P i.e. as
∆x, ∆y and ∆z approach zero. If we divide the change ∆Φ by the distance ∆s we obtain a
measure of the rate at which Φ changes as we move from P to P':
6) ∆Φ/∆s = (∂Φ/∂x)(∆x/∆s) + (∂Φ/∂y)(∆y/∆s) + (∂Φ/∂z)(∆z/∆s) + ε1(∆x/∆s) + ε2(∆y/∆s) + ε3(∆z/∆s)
We now observe that ∆x/∆s, ∆y/∆s, ∆z/∆s are the direction cosines of the line segment PP'.
They are also the direction cosines of a unit vector e located at P pointing in the direction of '.
If the direction angles of e are α, β, γ, then ∆x/∆s, ∆y/∆s, ∆z/∆s are equal to cos α, cos β, and
cos γ, respectively. Thus 6) becomes
∆Φ/∆s = (∂Φ/∂x) cos α + (∂Φ/∂y) cos β + (∂Φ/∂z) cos γ + ε1 cos α + ε2 cos β + ε3 cos γ,
and, letting ∆s → 0,
7) dΦ/ds = (∂Φ/∂x) cos α + (∂Φ/∂y) cos β + (∂Φ/∂z) cos γ.
Let us note that 7) can be written in vector form as the following dot product:
8) dΦ/ds = (∂Φ/∂x i + ∂Φ/∂y j + ∂Φ/∂z k) · (cos α i + cos β j + cos γ k).
The vector
∇Φ = (∂Φ/∂x) i + (∂Φ/∂y) j + (∂Φ/∂z) k
is called the gradient of Φ. Thus the directional derivative of Φ is equal to the dot product of the gradient of Φ and the vector e. In other words,
dΦ/ds = ∇Φ · e,
where e = cos α i + cos β j + cos γ k.
If the vector e points in the same direction as the gradient of Φ, then the directional derivative of Φ is equal to the magnitude of the gradient of Φ.
Differentiation under the integral sign. Leibnitz’s rule. We now consider differentiation
with respect to a parameter that occurs under an integral sign, or in the limits of integration,
or in both places.
Theorem 1. Let
F(x) = ∫_a^x f(t) dt,
where a ≤ x ≤ b and f is assumed to be integrable on [a, b]. Then the function F(x) is continuous and F′(x) = f(x) at each point where f(x) is continuous.
Theorem 3. Leibnitz’s rule. Let
G(α) = ∫ from u1 to u2 of f(x, α) dx, where
u1 = u1(α)
u2 = u2(α).
Let f(x, α) and ∂f/∂α be continuous in both x and α in a region R of the x-α plane that includes the region u1 ≤ x ≤ u2, c ≤ α ≤ d. Let u1 and u2 be continuous and have continuous derivatives for c ≤ α ≤ d. Then
12) dG/dα = ∫ from u1 to u2 of (∂f/∂α) dx + f(u2, α) du2/dα − f(u1, α) du1/dα,
where f(u1, α) is the expression obtained by substituting the expression u1(α) for x in f(x, α). Similarly for f(u2, α). The quantities f(u1, α) and f(u2, α) correspond to ∂G/∂u1 and ∂G/∂u2 respectively, and 12) represents the chain rule.
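As a brief illustration with an assumed integrand, let G(α) = ∫ from 0 to α of sin(αx) dx, so u1 = 0 and u2 = α. Then Leibnitz's rule gives
dG/dα = ∫ from 0 to α of x cos(αx) dx + sin(α²)·1 − 0.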
In most familiar examples the mixed partial derivatives fxy and fyx are equal; however, in some cases this is not true. Under what circumstances is it true? It is true if both functions fyx and fxy are continuous at the point where the partials are being taken.
Theorem. Let the function f(x, y) be defined in some neighborhood of the point (a, b). Let
the partial derivatives fx, fy, fxy, and fyx also be defined in this neighborhood. Then if fxy and
fyx are both continuous at (a, b), fxy(a, b) = fyx(a, b).
EXAMPLE
and
4.0 CONCLUSION
In this unit, you have applied the total derivative to the chain rule. You have solved problems on directional derivatives using the total derivative. You have used the total derivative to solve differentiation under the integral sign and the Leibnitz rule.
5.0 SUMMARY
In this unit, you have studied the applications of the total derivative to the chain rule, directional derivatives, differentiation under the integral sign, and the Leibnitz rule.
6.0 TUTOR-MARKED ASSIGNMENT
1. Find the total derivative of the function F(x, y) = (3x⁴ + y⁴)^(1/2).
2. Find the total derivative of the function F(x, y, z) = 3x² + 2xyz at the point (0, 1).
3. Find the total derivative of the function F(x, y) = 3xy² + 4y.
REFERENCES
Dover, 2003.
MODULE 4
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Partial derivatives
3.2 Second partial derivatives
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
Differentiation is a method to compute the rate at which a dependent output y changes with
respect to the change in the independent input x. This rate of change is called the derivative
of y with respect to x. In more precise language, the dependence of y upon x means that y is a
function of x. This functional relationship is often denoted y = ƒ(x), where ƒ denotes the
function. If x and y are real numbers, and if the graph of y is plotted against x, the derivative
measures the slope of this graph at each point.
The simplest case is when y is a linear function of x, meaning that the graph of y against x is a
straight line. In this case, y = ƒ(x) = mx + b, for real numbers m and b, and the slope m is given by
m = ∆y/∆x,
where the symbol ∆ (the uppercase form of the Greek letter Delta) is an abbreviation for "change in." This formula is true because
y + ∆y = ƒ(x + ∆x) = m(x + ∆x) + b = mx + m∆x + b = y + m∆x, so that ∆y = m∆x.
This gives an exact value for the slope of a straight line. If the function ƒ is not linear (i.e. its
graph is not a straight line), however, then the change in y divided by the change in x varies:
differentiation is a method to find an exact value for this rate of change at any given value of
x.
Figure 2. The secant to curve y = ƒ(x) determined by points (x, ƒ(x)) and (x+h, ƒ(x+h))
The idea, illustrated by Figures 1-3, is to compute the rate of change as the limiting value of the ratio of the differences ∆y / ∆x as ∆x becomes infinitely small.
In Leibniz's notation, such an infinitesimal change in x is denoted by dx, and the derivative of y with respect to x is written
dy/dx,
suggesting the ratio of two infinitesimal quantities. (The above expression is read as "the
derivative of y with respect to x", "d y by d x", or "d y over d x". The oral form "d y d x" is
often used conversationally, although it may lead to confusion.)
The most common approach[2] to turn this intuitive idea into a precise definition uses limits, but there are other methods, such as non-standard analysis.[3]
Derivatives
All of this helps us to get to our main topic, that is, partial differentiation. We know how to take the derivative of a single-variable function. What about the derivative of a multi-variable function? What does that even mean? Partial derivatives are the beginning of an answer to that question.
OBJECTIVES
In this unit, you should be able to:
find partial derivatives of functions of several variables
find second partial derivatives
MAIN CONTENT
If z = f(x, y), the partial derivative of z with respect to x may be written
∂z/∂x (referred to as "partial z, partial x")
or
∂f/∂x (referred to as "partial f, partial x").
The next set of notations for partial derivatives is much more compact and especially used when you are writing down something that uses lots of partial derivatives, especially if they are all different kinds:
zx (referred to as "partial z, partial x")
fx (referred to as "partial f, partial x")
To get an intuitive grasp of partial derivatives, suppose you were an ant crawling over some rugged terrain (a two-variable function) where the x-axis is north-south with positive x to the north, the y-axis is east-west and the z-axis is up-down. You stop at a point P = (x0, y0, z0) on a hill and wonder what sort of slope you will encounter if you walk in a straight line north. Since our longitude won't be changing as we go north, the y in our function is constant. The slope to the north is the value of fx(x0, y0).
The actual calculation of partial derivatives for most functions is very easy! Treat every independent variable except the one we are interested in as if it were a constant and apply the familiar rules!
Example:
Let's find fx and fy of the function z = f = x² − 3x²y + y³. To find fx, we will treat y as a constant and differentiate. So, fx = 2x − 6xy. By treating x as a constant, we find fy = −3x² + 3y².
Observe carefully that the expression fxy implies that the function f is differentiated first with
respect to x and then with respect to y, which is a natural inference since fxy is really (fx)y.
By contrast, when the notation ∂²f/∂x∂y is used, it is implied that we differentiate first with respect to y and then with respect to x.
Example: let f(x, y) = e^(xy) + y sin x. Then
• fx = y e^(xy) + y cos x
• fxy = (1 + xy) e^(xy) + cos x
• fy = x e^(xy) + sin x
• fyx = (1 + xy) e^(xy) + cos x
In this example fxy=fyx. Is this true in general? Most of the time and in most examples that you
will probably ever see, yes. More precisely, if fxy and fyx are both continuous, then fxy = fyx.
Partial Derivatives of higher order are defined in the obvious way. And as long as suitable
continuity exists, it is immaterial in what order a sequence of partial differentiation is carried
out.
Theorem. Suppose that S is a ball in R3, the function f : S → R is continuous and has partial derivatives fx, fy, fz in S, and the partial derivatives are continuous at a point (x, y, z) of S. Then the increment
∆f := f(x + ∆x, y + ∆y, z + ∆z) − f(x, y, z),
which f gets when one moves from (x, y, z) to another point (x + ∆x, y + ∆y, z + ∆z) of S, can be split into two parts as follows:
(1)
(2)
We now assume conversely that the increment of a function f in R3 can be split into two parts as follows:
(3)
CONCLUSION
In this unit, you have identified and solved problems on partial derivatives of functions of several variables. You have also used partial derivatives of functions of several variables to solve problems on second partial derivatives.
SUMMARY
In this unit, you have studied partial derivatives and second partial derivatives of functions of several variables.
TUTOR-MARKED ASSIGNMENT
1. Find the partial derivatives of F(x, y, z) = x²y⁴z.
2. Find fxx, fyy, fzz, given that F(x, y, z) = sin(xyz).
3. Find fxx, fyy, fzz, given that F(x, y, z) = x³y⁴ + 2xy + z⁴.
4. Evaluate the second-order derivatives of F(x, y, z) = x² + y³z³.
REFERENCES
Jacques, I. 1999. Mathematics for Economics and Business. 3rd Edition. Prentice Hall.
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
INTRODUCTION
In the case of a function of a single variable the differential of the function y = f(x) is the
quantity
dy = f '(x) ∆x .
This quantity is used to compute the approximate change in the value of f(x) due to a change ∆x in x. As is shown in Fig. 2, the actual change in the function is ∆y, while dy = CT = f '(x)∆x.
When ∆x is small the approximation is close. Line AT represents the tangent to the curve at
point A.
OBJECTIVES
At the end of this unit, you should be able to identify and solve problems on total differentials
of functions of several variables
MAIN CONTENT
In the case of a function of two variables the situation is analogous. Let us start at point A(x1,
y1, z1) on the surface
z = f(x, y)
shown in Fig. 3 and let x and y change by small amounts ∆x and ∆y, respectively. The
change produced in the value of the function z is
∆z = f(x1 + ∆x, y1 + ∆y) − f(x1, y1).
An approximation to ∆z is given by
∆z ≈ (∂z/∂x)∆x + (∂z/∂y)∆y.
When ∆x and ∆y are small the approximation is close. Point T lies in that plane tangent to the
surface at point A.
The quantity
dz = (∂z/∂x)∆x + (∂z/∂y)∆y
is called the total differential of the function z = f(x, y). Because it is customary to denote the increments ∆x and ∆y by dx and dy, the total differential of a function z = f(x, y) is defined as
dz = (∂z/∂x)dx + (∂z/∂y)dy.
The total differential of a function of three or more variables is defined similarly. For a function z = f(x, y, ..., u) the total differential is defined as
dz = (∂z/∂x)dx + (∂z/∂y)dy + ··· + (∂z/∂u)du.
Each of the terms represents a partial differential. For example, the term (∂z/∂x)dx is the partial differential of z with respect to x. The total differential is the sum of the partial differentials.
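For instance, with the illustrative function z = x²y, the total differential is dz = 2xy dx + x² dy. Moving from (1, 2) to (1.01, 1.98), so that dx = 0.01 and dy = −0.02, gives dz = 0.04 − 0.02 = 0.02, which is close to the exact change ∆z ≈ 0.0198.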
4.0 CONCLUSION
In this unit, you have identified and solved problems on total differentials of functions of
several variables
5.0 SUMMARY
In this unit, you have studied total differentials of functions of several variables.
6.0 TUTOR-MARKED ASSIGNMENT
Find the total differential of each of the following functions.
a. F(x, y) = x² + 2xy + y
b. F(x, y, z) = x³ + 2y⁴ + z²
c. F(x, y, z) = x³y²z³
d. F(x, y, z) = 4x²y² + z³
e. F(x, y, z) = x³ + y² − 2xyz
7.0 REFERENCES
MODULE 5 COMPOSITE DIFFERENTIATION, EULER’S THEOREM, IMPLICIT
DIFFERENTIATION.
Unit 1: Composite differentiation
Unit 2: Euler’s Theorem
Unit 3: Implicit differentiation.
UNIT 1: COMPOSITE DIFFERENTIATION
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
In calculus, the chain rule is a formula for computing the derivative of the composition of two
or more functions. That is, if f is a function and g is a function, then the chain rule expresses
the derivative of the composite function f ∘ g in terms of the derivatives of f and g.
Calculate the derivatives of each function. Write in fraction form, if needed, so that all
exponents are positive in your final answer. Use the "modified power rule" for each.
2.0 OBJECTIVES
At the end of this unit, you should be able to
use the chain rule to solve mathematical problems
solve composites of more than two functions
use the quotient rule to solve composite functions
identify problems in composite functions which could be solved by the use of higher derivatives
prove the chain rule
know the rule in higher dimensions
In its simplest form, the chain rule states that (f ∘ g)′ = (f′ ∘ g) · g′. The points where the derivatives are evaluated may also be stated explicitly:
(f ∘ g)′(x) = f′(g(x)) · g′(x).
Further examples
The chain rule in the absence of formulas
It may be possible to apply the chain rule even when there are no formulas for the functions which are being differentiated. This can happen when the derivatives are measured directly. Suppose that a car is driving up a tall mountain. The car's speedometer measures its speed directly. If the grade is known, then the rate of ascent can be calculated using trigonometry. Suppose that the car is ascending at 2.5 km/h. Standard models for the Earth's atmosphere imply that the temperature drops about 6.5 °C per kilometer ascended (see lapse rate). To find the temperature drop per hour, we apply the chain rule. Let the function g(t) be the altitude of the car at time t, and let the function f(h) be the temperature h kilometers above sea level. f and g are not known exactly: for example, the altitude where the car starts is not known and the temperature on the mountain is not known. However, their derivatives are known: f′ is −6.5 °C/km, and g′ is 2.5 km/h. The chain rule says that the derivative of the composite function is the product of the derivative of f and the derivative of g. This is −6.5 °C/km · 2.5 km/h = −16.25 °C/h.
One of the reasons why this computation is possible is that f′ is a constant function. This is because the above model is very simple. A more accurate description of how the temperature near the car varies over time would require an accurate model of how the temperature varies at different altitudes. This model may not have a constant derivative. To compute the temperature change in such a model, it would be necessary to know g and not just g′, because without knowing g it is not possible to know where to evaluate f′.
Composites of more than two functions
The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of f, g, and h (in that order) is the composite of f with g ∘ h. The chain rule says that to compute the derivative of f ∘ g ∘ h, it is sufficient to compute the derivative of f and the derivative of g ∘ h. The derivative of f can be calculated directly, and the derivative of g ∘ h can be calculated by applying the chain rule again.
For concreteness, consider the function
y = e^(sin(x²)).
This can be decomposed as the composite of three functions: f(u) = e^u, g(v) = sin v, and h(x) = x², so that y = f(g(h(x))). The chain rule says that the derivative of their composite at the point x = a is:
(f ∘ g ∘ h)′(a) = f′(g(h(a))) · g′(h(a)) · h′(a),
or for short,
(f ∘ g ∘ h)′ = (f′ ∘ g ∘ h) · (g′ ∘ h) · h′.
Another way of computing this derivative is to view the composite function f ∘ g ∘ h as the composite of f ∘ g and h. Applying the chain rule to this situation gives:
(f ∘ g ∘ h)′(a) = (f ∘ g)′(h(a)) · h′(a) = f′(g(h(a))) · g′(h(a)) · h′(a).
This is the same as what was computed above. This should be expected because (f ∘ g) ∘ h = f ∘ (g ∘ h).
The quotient rule
The chain rule can also be used to derive other differentiation rules. For example, writing f(x)/g(x) = f(x) · (g(x))⁻¹ and differentiating with the product rule and the chain rule gives the quotient rule:
(f/g)′(x) = [f′(x)g(x) − f(x)g′(x)] / g(x)².
Derivatives of inverse functions
Suppose that y = g(x)) has an inverse function. Call its inverse function f so that we have x
= f(y). There is a formula for the derivative of f in terms of the derivative of g. To see this,
note that f and g satisfy the formula
f(g(x)) = x.
Because the functions f(g(x)) and x are equal, their derivatives must be equal. The derivative of x is the constant function with value 1, and the derivative of f(g(x)) is
determined by the chain rule. Therefore we have:
f'(g(x))g'(x) = 1.
To express f′ as a function of an independent variable y, we substitute f(y) for x wherever it appears. Then we can solve for f′:
f′(y) = 1 / g′(f(y)).
For example, consider the function g(x) = e^x. It has an inverse which is denoted f(y) = ln y. Because g′(x) = e^x, the above formula says that
f′(y) = 1 / e^(ln y) = 1/y.
This formula is true whenever g is differentiable and its inverse f is also differentiable.
This formula can fail when one of these conditions is not true. For example, consider g(x) = x³. Its inverse is f(y) = y^(1/3), which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of f at zero, then we must evaluate 1/g′(f(0)). Since f(0) = 0 and g′(0) = 0, we must evaluate 1/0, which is undefined. Therefore the formula fails in this case. This is not surprising because f is not differentiable at zero.
Higher derivatives
Faà di Bruno's formula generalizes the chain rule to higher derivatives. The first few derivatives are
d(f ∘ g)/dx = f′(g(x)) g′(x),
d²(f ∘ g)/dx² = f″(g(x)) (g′(x))² + f′(g(x)) g″(x),
d³(f ∘ g)/dx³ = f‴(g(x)) (g′(x))³ + 3 f″(g(x)) g′(x) g″(x) + f′(g(x)) g‴(x).
First proof
One proof of the chain rule begins with the definition of the derivative:
(f ∘ g)′(a) = lim_{x→a} [f(g(x)) − f(g(a))] / (x − a).
Assume for the moment that g(x) does not equal g(a) for any x near a. Then the previous expression is equal to the product of two factors:
lim_{x→a} [f(g(x)) − f(g(a))] / [g(x) − g(a)] · [g(x) − g(a)] / (x − a).
When g oscillates near a, then it might happen that no matter how close one gets to a, there is always an even closer x such that g(x) equals g(a). For example, this happens for g(x) = x² sin(1/x) near the point a = 0. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function Q as follows:
Q(y) = [f(y) − f(g(a))] / [y − g(a)] if y ≠ g(a),   Q(g(a)) = f′(g(a)).
We will show that the difference quotient for f ∘ g is always equal to:
Q(g(x)) · [g(x) − g(a)] / (x − a).
Whenever g(x) is not equal to g(a), this is clear because the factors of g(x) − g(a) cancel. When g(x) equals g(a), then the difference quotient for f ∘ g is zero because f(g(x)) equals f(g(a)), and the above product is zero because it equals f′(g(a)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative of f ∘ g at a exists and to determine its value, we need only show that the limit as x goes to a of the above product exists and determine its value.
To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are Q(g(x)) and (g(x) − g(a)) / (x − a). The latter is the difference quotient for g at a, and because g is differentiable at a by assumption, its limit as x tends to a exists and equals g′(a).
It remains to study Q(g(x)). Q is defined wherever f is. Furthermore, because f is differentiable at g(a) by assumption, Q is continuous at g(a). g is continuous at a because it is differentiable at a, and therefore Q ∘ g is continuous at a. So its limit as x goes to a exists and equals Q(g(a)), which is f′(g(a)).
This shows that the limits of both factors exist and that they equal f′(g(a)) and g′(a), respectively. Therefore the derivative of f ∘ g at a exists and equals f′(g(a))g′(a).
Second proof
Another way of proving the chain rule is to measure the error in the linear approximation
determined by the derivative. This proof has the advantage that it generalizes to several
variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g′(a) and a function ε(h) that tends to zero as h tends to zero, and furthermore
g(a + h) − g(a) = g′(a)h + ε(h)h.
Similarly, differentiability of f at g(a) means that there exists a real number f′(g(a)) and a function η(k) that tends to zero as k tends to zero, such that
f(g(a) + k) − f(g(a)) = f′(g(a))k + η(k)k.
The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set η(0) = 0, then η is continuous at 0.
Proving the theorem requires studying the difference f(g(a + h)) − f(g(a)) as h tends to zero. The first step is to substitute for g(a + h) using the definition of differentiability of g at a:
f(g(a + h)) − f(g(a)) = f(g(a) + g′(a)h + ε(h)h) − f(g(a)).
The next step is to use the definition of differentiability of f at g(a). This requires a term of the form f(g(a) + k) for some k. In the above equation, the correct k varies with h. Set kh = g′(a)h + ε(h)h and the right hand side becomes f(g(a) + kh) − f(g(a)). Applying the definition of the derivative gives:
f(g(a) + kh) − f(g(a)) = f′(g(a))kh + η(kh)kh.
To study the behavior of this expression as h tends to zero, expand kh. After regrouping the terms, the right-hand side becomes:
f′(g(a))g′(a)h + [f′(g(a))ε(h) + η(kh)g′(a) + η(kh)ε(h)]h.
Because ε(h) and η(kh) tend to zero as h tends to zero, the bracketed term tends to zero as h tends to zero. Because the above expression is equal to the difference f(g(a + h)) − f(g(a)), by the definition of the derivative f ∘ g is differentiable at a and its derivative is f′(g(a))g′(a).
The role of Q in the first proof is played by η in this proof. They are related by the equation:
Q(y) = f′(g(a)) + η(y − g(a)).
The need to define Q at g(a) is analogous to the need to define η at zero. However, the proofs are not exactly equivalent. The first proof relies on a theorem about products of limits to show that the derivative exists. The second proof does not need this because showing that the error term vanishes proves the existence of the limit directly.
The chain rule in higher dimensions
The simplest generalization of the chain rule to higher dimensions uses the total derivative.
The total derivative is a linear transformation that captures how the function changes in all
directions. Let f : Rm → Rk and g : Rn → Rm be differentiable functions, and let D be the
total derivative operator. If a is a point in Rn, then the higher dimensional chain rule says
that:
D(f ∘ g)(a) = Df(g(a)) ∘ Dg(a),
or for short,
D(f ∘ g) = (Df ∘ g) · Dg.
That is, the Jacobian of the composite function is the product of the Jacobians of the composed functions. The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.
The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If k, m, and n are 1, so that f : R → R and g : R → R, then the Jacobian matrices of f and g are 1 × 1: specifically, they are (f′(g(a))) and (g′(a)), and their product recovers the one-dimensional chain rule.
The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the i-th coordinate direction is found by multiplying the Jacobian matrix by the i-th basis vector. By doing this to the formula above, we find:
∂(f ∘ g)(a)/∂xi = Df(g(a)) · ∂g(a)/∂xi.
Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:
∂(f ∘ g)i/∂xj (a) = Σk [∂fi/∂yk (g(a))] [∂gk/∂xj (a)].
More conceptually, this rule expresses the fact that a change in the xi direction may change all of g1 through gk, and any of these changes may affect f.
In the special case where k = 1, so that f is a real-valued function, this formula simplifies even further:
∂(f ∘ g)/∂xj (a) = Σk [∂f/∂yk (g(a))] [∂gk/∂xj (a)].
Example
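As a simple illustration with assumed functions, let z = u² + v, where u = xy and v = sin x. Then
∂z/∂x = 2u·(∂u/∂x) + 1·(∂v/∂x) = 2xy·y + cos x = 2xy² + cos x,  and  ∂z/∂y = 2u·x = 2x²y.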
Higher derivatives of multivariable functions
Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If y = f(u) is a function of u = g(x) as above, then the second derivative of f ∘ g is:
∂²y/∂xi∂xj = Σk (∂y/∂uk)(∂²uk/∂xi∂xj) + Σk,l (∂²y/∂uk∂ul)(∂uk/∂xi)(∂ul/∂xj).
The composite function chain rule notation can also be adjusted for the multivariate case:
Then the partial derivatives of z with respect to its two independent variables are defined as:
Let's do the same example as above, this time using the composite function notation where
functions within the z function are renamed. Note that either rule could be used for this
problem, so when is it necessary to go to the trouble of presenting the more formal composite
function notation? As problems become more complicated, renaming parts of a composite
function is a better way to keep track of all parts of the problem. It is slightly more time-consuming, but mistakes within the problem are less likely.
Multivariate function
The rule for differentiating multivariate natural logarithmic functions, with appropriate
notation changes is as follows:
Then the partial derivatives of z with respect to its independent variables are defined as:
Let's do an example. Find the partial derivatives of the following function:
The rule for taking partials of exponential functions can be written as:
Then the partial derivatives of z with respect to its independent variables are defined as:
One last time, we look for partial derivatives of the following function using the exponential
rule:
These second derivatives can be interpreted as the rates of change of the two slopes of the
function z.
Now the story gets a little more complicated. The cross-partials, fxy and fyx are defined in the
following way. First, take the partial derivative of z with respect to x. Then take the
derivative again, but this time, take it with respect to y, and hold the x constant. Spatially,
think of the cross partial as a measure of how the slope (change in z with respect to x)
changes when the y variable changes. The following are examples of notation for cross-partials:
fxy = (fx)y = ∂²z/∂y∂x.
We'll discuss economic meaning further in the next section, but for now, we'll just show an example, and note that in a function where the cross-partials are continuous, they will be identical. For the following function:
Now, starting with the first partials, find the cross partial derivatives:
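As an illustration with an assumed function, take f(x, y) = x³y² + 2xy. Then fx = 3x²y² + 2y and fy = 2x³y + 2x, so
fxy = 6x²y + 2 = fyx,
as expected, since both cross-partials are continuous.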
4.0 CONCLUSION
In this unit, you have been introduced to composite differentiation, also called the chain rule. You have learned about composites of more than two functions. You have also learned the quotient rule. You have solved problems on higher derivatives with the use of composite differentiation. You have proved the chain rule and learned the rule in higher dimensions.
5.0 SUMMARY
In this unit, you have studied :
The chain rule
Composites of more than two functions
The quotient rule
Higher derivative
Proof of the chain rule
The rule in higher dimension
6.0 TUTOR-MARKED ASSIGNMENT
1.0 What are the second-order derivatives of the function F(x, y) = xy² + x⁵/y³?
2.0 Express the x- and y-derivatives of W(x³y³) in terms of x and y.
3.0 What are the second-order derivatives of the function F(x, y) = x⁴y⁶?
4.0 What are the second-order derivatives of the function K(x, y) = ln(2x − 3y)?
5.0 What are the second-order derivatives of the function R(x, y) = x^(1/2) y^(1/3)?
6.0 What are the second-order derivatives of the function N(x, y) = tan⁻¹(x/y)?
REFERENCES
Hernandez Rodriguez and Lopez Fernandez, A Semiotic Reflection on the Didactics of the
Chain Rule, The Montana Mathematics Enthusiast, ISSN 1551-3440, Vol. 7, nos.2&3,
pp.321–332.
Apostol, Tom (1974). Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
UNIT 2: EULER’S THEOREM
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
Statement and proof of Euler's theorem
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
In number theory, Euler's theorem (also known as the Fermat–Euler theorem or Euler's totient theorem) states that if n and a are coprime positive integers, then
a^φ(n) ≡ 1 (mod n),
where φ(n) is Euler's totient function and "... ≡ ... (mod n)" denotes congruence modulo n.
2.0 OBJECTIVES
In this unit, you should be able to state and prove Euler's theorem.
3.0 MAIN CONTENT
The converse of Euler's theorem is also true: if the above congruence holds for positive
integers a and n, then a and n are coprime.
The theorem is a generalization of Fermat's little theorem,, and is further generalized by
Carmichael's theorem.
The theorem may be used to easily reduce large powers modulo n. For example, consider finding the ones-place decimal digit of 7^222, i.e. 7^222 (mod 10). Note that 7 and 10 are coprime, and φ(10) = 4. So Euler's theorem yields 7^4 ≡ 1 (mod 10), and we get 7^222 ≡ 7^(4×55 + 2) ≡ (7^4)^55 × 7^2 ≡ 1^55 × 7^2 ≡ 49 ≡ 9 (mod 10).
In general, when reducing a power of a modulo n (where a and n are coprime), one needs to work modulo φ(n) in the exponent of a:
if x ≡ y (mod φ(n)), then a^x ≡ a^y (mod n).
Euler's theorem also forms the basis of the RSA encryption system: encryption and
decryption in this system together amount to exponentiating the original text by kφ(n)+1 for
some positive integer k, so Euler's theorem shows that the decrypted result is the same as the
original.
Proofs
1. Leonhard Euler published a proof in 1789. Using modern terminology, one may prove the
theorem as follows: the numbers b which are relatively prime to n form a group under
multiplication mod n, the group G of (multiplicative) units of the ring Z/nZ. This group has
φ(n) elements. The element a := a (mod n) is a member of the group G, and the order o(a) of a (the least k > 0 such that a^k = 1) must have a multiple equal to the size of G. (The order of a is the size of the subgroup of G generated by a, and Lagrange's theorem states that the size of any subgroup of G divides the size of G.)
Thus for some integer M > 0, M·o(a) = φ(n). Therefore a^φ(n) = a^(o(a)·M) = (a^o(a))^M = 1^M = 1. This means that a^φ(n) ≡ 1 (mod n).
2. Another direct proof: if a is coprime to n, then multiplication by a permutes the residue
classes mod n that are coprime to n; in other words (writing R for the set consisting of the
φ(n) different such classes) the sets { x : x in R } and { ax : x in R } are equal; therefore, the
two products over all of the elements in each set are equal. Hence, P ≡ a^φ(n)·P (mod n), where P is the product over all of the elements in the first set. Since P is coprime to n, it follows that a^φ(n) ≡ 1 (mod n).
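As a quick numerical illustration (a small script of my own, using only built-in MATLAB functions), one can check the congruence for a = 7, n = 10:

n = 10; a = 7;
phi = sum(gcd(1:n, n) == 1);   % Euler's totient: count of 1..n coprime to n
r = 1;
for i = 1:phi
    r = mod(r*a, n);           % build up a^phi (mod n) step by step
end
disp(r)                        % prints 1, as Euler's theorem predicts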
4.0 CONCLUSION
In this unit, you have stated and proved Euler's theorem.
5.0 SUMMARY
In this unit, you have learned the statement of Euler's theorem and proved it.
6.0 TUTOR-MARKED ASSIGNMENT
State and prove Euler's theorem.
References
Hernandez Rodriguez and Lopez Fernandez, A Semiotic Reflection on the Didactics of the
Chain Rule, The Montana Mathematics Enthusiast, ISSN 1551-3440, Vol. 7, nos.2&3,
pp.321–332.
Apostol, Tom (1974). Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
UNIT 3 :IMPLICIT DIFFERENTIATION
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Know the derivatives of Inverse Trigonometric Functions
3.2 Define and identify Implicit differentiation
3.3 Know formula for two variables
3.4 Know applications in economics
3.5 Solve Implicit differentiation problems
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
Most of our math work thus far has always allowed us to solve an equation for y in terms of x. When an equation can be solved for y we call it an explicit function. But not all equations can be solved for y. An example is:
x³ + y³ = 6xy.
This equation cannot be solved for y. When an equation cannot be solved for y, we call it an implicit function. The good news is that we can still differentiate such a function. The technique is called implicit differentiation.
When we implicitly differentiate, we must treat y as a composite function and therefore we must use the chain rule with y terms. The reason for this can be seen in Leibnitz notation: dy/dx. This notation tells us that we are differentiating with respect to x. Because y is not native to what we are differentiating with respect to, we need to regard it as a composite function. As you know, when we differentiate a composite function we must use the chain rule.
This is a "folium
" of Descartes"" curve. This would be very
difficulty to solve for y, so we will want to use implicit
differentiation.
We can see in a plot of the implicit function that the slope of the
tangent line at the point (3,3) does appear to be -1.
-
Another example. Consider differentiating an implicit equation that involves a constant c. Doing implicit differentiation on the function (note the use of the product rule on the second term), we then do the algebra to solve for y′. In a plot of the implicit equation with c set equal to 5, when does it appear that the slope of the tangent line will be zero? It appears to be at about (2.2, 2.2). We take our derivative, set it equal to zero, and solve. Putting x = y back into the original implicit equation, we still must use a computer algebra system to solve the resulting cubic equation. The one real answer is
x = y = 2.116343299,
which does seem consistent with our visual estimate. This can be done in Maple.
2.0 OBJECTIVES
At the end of this unit, you should be able to :
Know the derivatives of Inverse Trigonometric Functions
Define and identify Implicit differentiation
Know formula for two variables
Know applications in economics
Solve Implicit differentiation problems
3.0 MAIN CONTENT
Derivatives of Inverse Trigonometric Functions
Thanks to implicit differentiation, we can develop important derivatives that we could not
have developed otherwise. The inverse trigonometric functions fall under this category. We
will develop and remember the derivatives of the inverse sine and inverse tangent.
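As a brief worked sketch of how implicit differentiation produces these derivatives: write y = sin-1 x as sin y = x and differentiate both sides implicitly, so cos y · y' = 1 and y' = 1/cos y = 1/√(1 - sin2 y) = 1/√(1 - x2) for -1 < x < 1. Similarly, writing y = tan-1 x as tan y = x gives sec2 y · y' = 1, so y' = 1/sec2 y = 1/(1 + tan2 y) = 1/(1 + x2).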
Implicit differentiation
In calculus, a method called implicit differentiation makes use of the chain rule to differentiate
implicitly defined functions.
As explained in the introduction, y can be given as a function of x implicitly rather than
explicitly. When we have an equation R(x, y) = 0, we may be able to solve it for y and then
differentiate. However, sometimes it is simpler to differentiate R(x, y) with respect to x and y
and then solve for dy/dx.
Examples
1. Consider for example
This function normally can be manipulated by using algebra to change this equation to one
expressing y in terms of an explicit function:
where the right side is the explicit function whose output value is y. Differentiation then gives
Solving for dy/dx gives:
In order to differentiate this explicitly with respect to x, one would have to obtain (via
algebra)
and then differentiate this function. This creates two derivatives: one for y > 0 and another
for y < 0.
One might find it substantially easier to implicitly differentiate the original function:
giving,
3. Sometimes standard explicit differentiation cannot be used and, in order to obtain the
derivative, implicit differentiation must be employed. An example of such a case is the
equation y5 − y = x. It is impossible to express y explicitly as a function of x and therefore
dy/dx cannot be found by explicit differentiation. Using the implicit method, dy/dx can be
expressed:
which yields the final answer
and hence
It can be shown that if R(x, y) is given by a smooth submanifold M in R2, and (a, b) is a point
of this submanifold such that the tangent space there is not vertical (that is, ∂R/∂y ≠ 0), then
M in some small enough neighbourhood of (a, b) is given by a parametrization (x, f(x)) where f
is a smooth function. In less technical language, implicit functions exist and can be
differentiated, unless the tangent to the supposed graph would be vertical. In the standard
case where we are given an equation
R(x, y) = 0
the condition on R can be checked by means of partial derivatives.
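This observation can be summarised by a formula worth remembering (a standard consequence of the chain rule): if R(x, y(x)) = 0, then ∂R/∂x + (∂R/∂y)·dy/dx = 0, so that dy/dx = -(∂R/∂x)/(∂R/∂y) wherever ∂R/∂y ≠ 0. For the circle R(x, y) = x2 + y2 - 25, for example, this gives dy/dx = -2x/(2y) = -x/y, the same result obtained below by differentiating term by term.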
Applications in economics
Marginal rate of substitution
In economics, when the level set R(x, y) = 0 is an indifference curve for the quantities x and y
consumed of two goods, the absolute value of the implicit derivative is interpreted as the
marginal rate of substitution of the two goods: how much more of y one must receive in order
to be indifferent to a loss of 1 unit of x.
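As an illustration (not taken from the text), a short SymPy sketch with an assumed Cobb-Douglas indifference curve R(x, y) = x^a y^b - c computes the marginal rate of substitution as (a/b)(y/x):

```python
import sympy as sp

x, y, a, b, c = sp.symbols('x y a b c', positive=True)

# Assumed indifference curve: Cobb-Douglas utility held at a constant level c.
R = x**a * y**b - c

# Implicit derivative dy/dx = -R_x / R_y; the MRS is its absolute value.
dydx = -sp.diff(R, x) / sp.diff(R, y)
MRS = sp.simplify(-dydx)

print(MRS)   # a*y/(b*x)
```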
IMPLICIT DIFFERENTIATION PROBLEMS
The following problems require the use of implicit differentiation. Implicit differentiation is
nothing more than a special case of the well-known chain rule for derivatives. The majority
of differentiation problems in first-year calculus involve functions y written EXPLICITLY as
functions of x . For example, if
,
then the derivative of y is
.
However, some functions y are written IMPLICITLY as functions of x . A familiar example
of this is the equation
x2 + y2 = 25 ,
which represents a circle of radius five centered at the origin. Suppose that we wish to find
the slope of the line tangent to the graph of this equation at the point (3, -4) .
How could we find the derivative of y in this instance ? One way is to first write y explicitly
as a function of x . Thus,
x2 + y2 = 25 ,
y2 = 25 - x2 ,
and
,
where the positive square root represents the top semi-circle and the negative square root
represents the bottom semi-circle. Since the point (3, -4) lies on the bottom semi-circle given
by
,
the derivative of y is
,
i.e.,
.
Thus, the slope of the line tangent to the graph at the point (3, -4) is
.
Unfortunately, not every equation involving x and y can be solved explicitly for y . For the
sake of illustration we will find the derivative of y WITHOUT writing y explicitly as a
function of x . Recall that the derivative (D) of a function of x squared, (f(x))2 , can be found
using the chain rule :
.
Since y symbolically represents a function of x, the derivative of y2 can be found in the same
fashion :
.
Now begin with
x2 + y2 = 25 .
Differentiate both sides of the equation, getting
D ( x2 + y2 ) = D ( 25 ) ,
D ( x2 ) + D ( y2 ) = D ( 25 ) ,
and
2x + 2 y y' = 0 ,
so that
2 y y' = - 2x ,
and
,
i.e.,
.
Thus, the slope of the line tangent to the graph at the point (3, -4) is
.
This second method illustrates the process of implicit differentiation. It is important to note
that the derivative expression for explicit differentiation involves x only, while the derivative
expression for implicit differentiation may involve BOTH x AND y .
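A short SymPy check of this circle example (a sketch, using the same equation x2 + y2 = 25):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)

# The circle x^2 + y^2 = 25, with y treated implicitly as a function of x.
eq = sp.Eq(x**2 + y**2, 25)
deq = sp.Eq(sp.diff(eq.lhs, x), sp.diff(eq.rhs, x))   # 2x + 2y y' = 0
yprime = sp.solve(deq, sp.diff(y, x))[0]              # y' = -x/y

# Slope of the tangent line at the point (3, -4).
print(yprime.subs(y, -4).subs(x, 3))   # 3/4
```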
The following problems range in difficulty from average to challenging.
SOLUTION 2 : Begin with (x-y)2 = x + y - 1 . Differentiate both sides of the equation, getting
D (x-y)2 = D ( x + y - 1 ) ,
D (x-y)2 = D ( x ) + D ( y ) - D ( 1 ) ,
(Remember to use the chain rule on D (x-y)2 .)
,
2 (x-y) (1- y') = 1 + y' ,
so that (Now solve for y' .)
2 (x-y) - 2 (x-y) y' = 1 + y' ,
- 2 (x-y) y' - y' = 1 - 2 (x-y) ,
(Factor out y' .)
y' [ - 2 (x-y) - 1 ] = 1 - 2 (x-y) ,
and
,
so that (Now solve for y' .)
,
,
(Factor out y' .)
,
and
.
SOLUTION 4 : Begin with y = x2 y3 + x3 y2 . Differentiate both sides of the equation, getting
D(y) = D ( x2 y3 + x3 y2 ) ,
D(y) = D ( x2 y3 ) + D ( x3 y2 ) ,
(Use the product rule twice.)
,
(Remember to use the chain rule on D ( y3 ) and D ( y2 ) .)
,
y' = 3x2 y2 y' + 2x y3 + 2x3 y y' + 3x2 y2 ,
so that (Now solve for y' .)
y' - 3x2 y2 y' - 2x3 y y' = 2x y3 + 3x2 y2 ,
(Factor out y' .)
y' [ 1 - 3x2 y2 - 2x3 y ] = 2x y3 + 3x2 y2 ,
and
getting
(Factor out .)
,
and
,
so that (Now solve for y' .)
,
(Factor out y' .)
,
and
,
1 = (1/2)( x2 + y2 )-1/2 D ( x2 + y2 ) ,
1 = (1/2)( x2 + y2 )-1/2 ( 2x + 2y y' ) ,
so that (Now solve for y' .)
,
,
and
,
or
x - y3 = xy + 2y + x3 + 2x2 .
Now differentiate both sides of the equation, getting
D ( x - y3 ) = D ( xy + 2y + x3 + 2x2 ) ,
D ( x ) - D (y3 ) = D ( xy ) + D ( 2y ) + D ( x3 ) + D ( 2x2 ) ,
(Remember to use the chain rule on D (y3 ) .)
1 - 3 y2 y' = ( xy' + (1)y ) + 2 y' + 3x2 + 4x ,
so that (Now solve for y' .)
1 - y - 3x2 - 4x = 3 y2 y' + xy' + 2 y' ,
(Factor out y' .)
1 - y - 3x2 - 4x = (3y2 + x + 2) y' ,
and
SOLUTION 9 : Begin with . Clear the fractions by multiplying both sides
of the equation by x3 y3 , getting
,
y4 + x4 = x5 y7 .
Now differentiate both sides of the equation, getting
D ( y4 + x4 ) = D ( x5 y7 ) ,
D ( y4 ) + D ( x4 ) = x5 D (y7 ) + D ( x5 ) y7 ,
(Remember to use the chain rule on D (y4 ) and D (y7 ) .)
4 y3 y' + 4 x3 = x5 (7 y6 y' ) + ( 5 x4 ) y7 ,
so that (Now solve for y' .)
4 y3 y' - 7 x5 y6 y' = 5 x4 y7 - 4 x3 ,
(Factor out y' .)
y' [ 4 y3 - 7 x5 y6 ] = 5 x4 y7 - 4 x3 ,
and
SOLUTION 10 : Begin with (x2+y2)3 = 8x2y2 . Now differentiate both sides of the equation,
getting
D (x2+y2)3 = D ( 8x2y2 ) ,
3 (x2+y2)2 D (x2+y2) = 8x2 D ( y2 ) + D ( 8x2 ) y2 ,
(Remember to use the chain rule on D (y2 ) .)
3 (x2+y2)2 ( 2x + 2 y y' ) = 8x2 (2 y y' ) + ( 16 x ) y2 ,
so that (Now solve for y' .)
6x (x2+y2)2 + 6 y (x2+y2)2 y' = 16 x2 y y' + 16 x y2 ,
6 y (x2+y2)2 y' - 16 x2 y y' = 16 x y2 - 6x (x2+y2)2 ,
(Factor out y' .)
y' [ 6 y (x2+y2)2 - 16 x2 y ] = 16 x y2 - 6x (x2+y2)2 ,
and
.
Thus, the slope of the line tangent to the graph at the point (-1, 1) is
,
and the equation of the tangent line is
y - ( 1 ) = (1) ( x - ( -1 ) )
or
y=x+2
2x + 3 (y-x)2 y'- 3 (y-x)2 = 0 ,
3 (y-x)2 y' = 3 (y-x)2 - 2x ,
and
.
Thus, the slope of the line tangent to the graph at (1, 3) is
,
and the equation of the tangent line is
y - ( 3 ) = (5/6) ( x - ( 1 ) ) ,
or
y = (5/6) x + (13/6) .
SOLUTION 12 : Begin with x2y + y4 = 4 + 2x . Now differentiate both sides of the original
equation, getting
D ( x2 y + y4 ) = D ( 4 + 2x ) ,
D ( x2 y ) + D (y4 ) = D ( 4 ) + D ( 2x ) ,
( x2 y' + (2x) y ) + 4 y3 y' = 0 + 2 ,
so that (Now solve for y' .)
x2 y' + 4 y3 y' = 2 - 2x y ,
(Factor out y' .)
y' [ x2 + 4 y3 ] = 2 - 2x y ,
and
(Equation 1)
.
Thus, the slope of the graph (the slope of the line tangent to the graph) at (-1, 1) is
.
Since y' = 4/5 , the slope of the graph is 4/5 and the graph is increasing at the point (-1, 1) .
Now determine the concavity of the graph at (-1, 1) . Differentiate Equation 1, getting
.
Now let x = -1 , y = 1 , and y' = 4/5 so that the second derivative is
.
Since y'' < 0 , the graph is concave down at the point (-1, 1) .
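A SymPy sketch of this last check (the same curve x2 y + y4 = 4 + 2x), verifying both the slope and the concavity at (-1, 1):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)

# The curve x^2 y + y^4 = 4 + 2x from Solution 12.
F = x**2*y + y**4 - 4 - 2*x

# First implicit derivative y'.
d1 = sp.solve(sp.diff(F, x), sp.diff(y, x))[0]
# Second implicit derivative y'': differentiate y' and substitute y' back in.
d2 = sp.diff(d1, x).subs(sp.diff(y, x), d1)

slope = d1.subs(y, 1).subs(x, -1)
concavity = sp.simplify(d2.subs(y, 1).subs(x, -1))
print(slope, concavity)   # slope 4/5; concavity negative, so concave down at (-1, 1)
```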
4.0 CONCLUSION
In this unit you have studied the derivatives of the inverse trigonometric functions. You have
learnt the definition of implicit differentiation and have identified problems on implicit
differentiation. You have also studied the formula for two variables and the applications of
implicit differentiation in economics. You have solved various examples on implicit
differentiation.
5.0 SUMMARY
In this course you have studied
The derivatives of Inverse Trigonometric Functions
Definition and identification of Implicit differentiation
The formula for two variables
The applications in economics
Implicit differentiation problems
6.0 TUTOR-MARKED ASSIGNMENT
1. Find y' if
2. Show that if a normal line to each point on an ellipse passes through the center of the
ellipse, then the ellipse is a circle.
7.0 REFERENCES
Stewart, James (1998). Calculus Concepts and Contexts. Brooks/Cole Publishing Company.
ISBN 0-534-34330-9.
Rudin, Walter (1976). Principles of Mathematical Analysis. McGraw-Hill. ISBN 0-07-
054235-X.
Spivak, Michael (1965). Calculus on Manifolds. HarperCollins. ISBN 0-8053-9021-9.
Warner, Frank (1983). Foundations of Differentiable Manifolds and Lie Groups. Springer.
ISBN 0-387-90894
MODULE 6 TAYLOR’S SERIES EXPANSION
• Unit 1: Function of two variables
• Unit 2: Taylor's series expansion for functions of two variables.
• Unit 3: Application of Taylor's series.
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
Unless otherwise stated, we will assume that the variables x and y and the output value f(x, y)
are real numbers.
2.0 OBJECTIVE
At the end of this unit, you should be able to:
Let f(x, y) be a function with two variables. If we keep y constant and differentiate f
(assuming f is differentiable) with respect to the variable x, we obtain what is called the
partial derivative of f with respect to x, which is denoted by ∂f/∂x or fx. It is defined by
∂f/∂x = lim (h→0) [ f(x + h , y) - f(x , y) ] / h .
Similarly, keeping x constant and differentiating with respect to y gives the partial derivative
of f with respect to y, denoted by ∂f/∂y or fy:
∂f/∂y = lim (k→0) [ f(x , y + k) - f(x , y) ] / k .
We now present several examples with detailed solution on how to calculate partial
derivatives.
f(x , y) = x2 y + 2x + y
Solution to Example 1:
fx = ∂f/∂x = ∂/∂x [ x2 y + 2x + y ]
   = ∂/∂x [ x2 y ] + ∂/∂x [ 2x ] + ∂/∂x [ y ] = 2x y + 2 + 0 = 2x y + 2
fy = ∂f/∂y = ∂/∂y [ x2 y + 2x + y ]
   = ∂/∂y [ x2 y ] + ∂/∂y [ 2x ] + ∂/∂y [ y ] = x2 + 0 + 1 = x2 + 1
Solution to Example 2:
fx = ∂f/∂x = ∂/∂x [ sin(x y) + cos x ] = y cos(x y) - sin x
fy = ∂f/∂y = ∂/∂y [ sin(x y) + cos x ] = x cos(x y)
f(x , y) = x ex y
Solution to Example 3:
fx = ∂f/∂x = ∂/∂x [ x ex y ] = ex y + x y ex y = (x y + 1) ex y
Differentiate with respect to y:
fy = ∂f/∂y = ∂/∂y [ x ex y ] = x (x ex y) = x2 ex y
Example 4: Find fx and fy if f(x , y) is given by
f(x , y) = ln ( x2 + 2 y)
Solution to Example 4:
fx = ∂f/∂x = ∂/∂x [ ln ( x2 + 2 y) ] = 2x / ( x2 + 2 y)
fy = ∂f/∂y = ∂/∂y [ ln ( x2 + 2 y) ] = 2 / ( x2 + 2 y)
f(x , y) = y x2 + 2 y
Solution to Example 5:
fx(x,y) = 2x y
fy(x,y) = x2 + 2
We now calculate fx(2 , 3) and fy(2 , 3) by substituting x and y by their given values
fx(2,3) = 2 (2)(3) = 12
fy(2,3) = 22 + 2 = 6
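These computations are easy to check with a computer algebra system; a short SymPy sketch for Example 5:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = y*x**2 + 2*y            # Example 5: f(x, y) = y x^2 + 2y

fx = sp.diff(f, x)          # partial derivative with respect to x
fy = sp.diff(f, y)          # partial derivative with respect to y

print(fx, fy)                                          # 2*x*y and x**2 + 2
print(fx.subs({x: 2, y: 3}), fy.subs({x: 2, y: 3}))    # 12 and 6
```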
Exercise: Find the first partial derivatives fx and fy of the following functions.
1. f(x , y) = x ex + y
2. f(x , y) = ln ( 2 x + y x)
3. f(x , y) = x sin(x - y)
Answers to the first two:
1. fx = (x + 1) ex + y , fy = x ex + y
2. fx = 1 / x , fy = 1 / (y + 2)
Just as we had higher order derivatives with functions of one variable we will also have
higher order derivatives of functions of more than one variable. However, this time we will
have more options since we do have more than one variable. Consider the case of a function
of two variables: since both of the first order partial derivatives are also functions of x and y,
we could in turn differentiate each with respect to x or y. This means that for the case of a
function of two variables there will be a total of four possible second order derivatives. Here
they are and the notations that we'll use to denote them.
The second and third second order partial derivatives are often called mixed partial
derivatives since we are taking derivatives with respect to more than one variable. Note as
well that the order that we take the derivatives in is given by the notation for each of these. If
we are using the subscripting notation, e.g. , then we will differentiate from left to right. In
other words, in this case, we will differentiate first with respect to x and then with respect to
y. With the fractional notation, e.g. , it is the opposite. In these cases we differentiate moving
along the denominator from right to left. So, again, in this case we differentiate with respect
to x first and then with respect to y.
Let’s take a quick look at an example.
Example 1 Find all the second order derivatives for
Solution
We’ll first need the first order derivatives so here they are.
Notice that we dropped the (x, y) from the derivatives. This is fairly standard and we will be
doing it most of the time from this point on. We will also be dropping it for the first order
derivatives in most cases.
Now let's also notice that, in this case, . This is not by coincidence. If the function
is "nice enough" this will always be the case. So, what's "nice enough"? The following
theorem tells us.
Clairaut’s Theorem
Suppose that f is defined on a disk D that contains the point . If the functions
and are continuous on this disk then,
Now, do not get too excited about the disk business and the fact that we gave the theorem for
a specific point. In pretty much every example in this class, if the two mixed second order
partial derivatives are continuous then they will be equal.
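A quick SymPy illustration of Clairaut's theorem on an assumed smooth example, f(x, y) = x3 y2 + sin(x y):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3*y**2 + sp.sin(x*y)   # an assumed "nice enough" (smooth) function

fxy = sp.diff(f, x, y)        # differentiate with respect to x, then y
fyx = sp.diff(f, y, x)        # differentiate with respect to y, then x

print(sp.simplify(fxy - fyx))  # 0: the mixed partials agree, as Clairaut's theorem predicts
```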
Solution
We’ll first need the two first order derivatives.
Now, compute the two mixed second order partial derivatives.
So far we have only looked at second order derivatives. There are, of course, higher order
derivatives as well. Here are a couple of the third order partial derivatives of function of two
variables.
Notice as well that for both of these we differentiate once with respect to y and twice with
respect to x. There is also another third order partial derivative in which we can do this,
. There is an extension to Clairaut’s Theorem that says if all three of these are
continuous then they should all be equal,
To this point we’ve only looked at functions of two variables, but everything that we’ve done
to this point will work regardless of the number of variables that we’ve got in the function
and there are natural extensions to Clairaut’s theorem to all of these cases as well. For
instance,
In general, we can extend Clairaut’s theorem to any function and mixed partial derivatives.
The only requirement is that in each derivative we differentiate with respect to each variable
the same number of times. In other words, provided we meet the continuity condition, the
following will be equal
because in each case we differentiate with respect to t once, s three times and r three times.
Let’s do a couple of examples with higher (well higher order than two anyway) order
derivatives and functions of more than two variables.
Example 3 Find the indicated derivative for each of the following functions.
Solution
(a)Find for
In this case remember that we differentiate from left to right. Here are the derivatives for this
part.
Here we differentiate from right to left. Here are the derivatives for this function.
Maxima and minima
In mathematics, the maximum and minimum (plural: maxima and minima) of a function,
known collectively as extrema (singular: extremum), are the largest and smallest values that
the function takes at a point either within a given neighborhood (local or relative extremum)
or on the function domain in its entirety (global or absolute extremum). More generally, the
maximum and minimum of a set (as defined in set theory) are the greatest and least elements
in the set. Unbounded infinite sets such as the set of real numbers have no minimum and
maximum.
A real-valued function f defined on the real line is said to have a local (or relative) maximum
point at the point x∗ if there exists some ε > 0 such that f(x∗) ≥ f(x) when |x − x∗| < ε. The
value of the function at this point is called the maximum of the function. Similarly, a function
has a local minimum point at x∗ if f(x∗) ≤ f(x) when |x − x∗| < ε. The value of the function at
this point is called the minimum of the function. A function has a global (or absolute)
maximum point at x∗ if f(x∗) ≥ f(x) for all x. Similarly, a function has a global (or absolute)
minimum point at x∗ if f(x∗) ≤ f(x) for all x. The global maximum and global minimum
points are also known as the arg max and arg min: the argument (input) at which the
maximum (respectively, minimum) occurs.
Restricted domains: There may be maxima and minima for a function whose domain does not
include all real numbers. A real-valued function whose domain is any set can have a global
maximum and minimum. There may also be local maxima and local minima points, but only
at points of the domain set where the concept of neighborhood is defined. A neighborhood
plays the role of the set of x such that |x − x∗| < ε.
A continuous (real-valued) function on a compact set always takes maximum and minimum
values on that set. An important example is a function whose domain is a closed (and
bounded) interval of real numbers (see the graph above). The neighborhood requirement
precludes a local maximum or minimum at an endpoint of an interval. However, an endpoint
may still be a global maximum or minimum. Thus it is not always true, for finite domains,
that a global maximum (minimum) must also be a local maximum (minimum).
Finding functional maxima and minima
Local extrema can be found by Fermat's theorem, which states that they must occur at critical
points. One can distinguish whether a critical point is a local maximum or local minimum by
using the first derivative test or second derivative test.
For any function that is defined piecewise, one finds a maximum (or minimum) by finding the
maximum (or minimum) of each piece separately, and then seeing which one is biggest (or
smallest).
Examples
• The function |x| has a global minimum at x = 0 that cannot be found by taking derivatives,
because the derivative does not exist at x = 0.
• The function cos(x) has infinitely many global maxima at 0, ±2π, ±4π, …, and infinitely
many global minima at ±π, ±3π, ….
• The function 2 cos(x) − x has infinitely many local maxima and minima, but no global
maximum or minimum.
• The function cos(3πx)/x with 0.1 ≤ x ≤ 1.1 has a global maximum at x = 0.1 (a boundary),
a global minimum near x = 0.3, a local maximum near x = 0.6, and a local minimum near
x = 1.0. (See figure at top of page.)
• The function x3 + 3x2 − 2x + 1 defined over the closed interval (segment) [−4, 2] has two
local extrema: one local maximum at x = −1 − √15⁄3 and one local minimum at x = −1 + √15⁄3;
it has a global maximum at x = 2 and a global minimum at x = −4.
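A short SymPy check of the last example (critical points and a comparison of values over [−4, 2]):

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 + 3*x**2 - 2*x + 1

critical = sp.solve(sp.diff(f, x), x)   # x = -1 - sqrt(15)/3 and x = -1 + sqrt(15)/3
print(critical)

# Compare f at the critical points and at the endpoints of the interval [-4, 2].
candidates = critical + [sp.Integer(-4), sp.Integer(2)]
values = [(c, f.subs(x, c).evalf()) for c in candidates]
print(max(values, key=lambda t: t[1]))   # the global maximum occurs at x = 2
print(min(values, key=lambda t: t[1]))   # the global minimum occurs at x = -4
```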
For functions of more than one variable, similar conditions apply. For example, in the
(enlargeable) figure at the right, the necessary conditions for a local maximum are similar to
those of a function with only one variable. The first partial derivatives as to z (the variable to
be maximized) are zero at the maximum (the glowing dot on top in the figure). The second
partial derivatives are negative. These are only necessary, not sufficient, conditions for a local
maximum because of the possibility of a saddle point. For use of these conditions to solve for
a maximum, the function z must also be differentiable throughout. The second partial
derivative test can help classify the point as a relative maximum or relative minimum.
In contrast, there are substantial differences between functions of one variable and functions
of more than one variable in the identification of global extrema. For example, if a bounded
differentiable function f defined on a closed interval in the real line has a single critical point,
which is a local minimum, then it is also a global minimum (use the intermediate value
theorem and Rolle's theorem to prove this by reductio ad absurdum). In two and more
dimensions, this argument fails, as the function
shows. Its only critical point is at (0,0), which is a local minimum with ƒ(0,0) = 0. However,
it cannot be a global one, because ƒ(4,1) = −11.
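The formula of the counterexample does not survive in this copy of the text; the standard example with these properties is f(x, y) = x2 + y2(1 − x)3, which is consistent with ƒ(4,1) = −11. A SymPy sketch under that assumption:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
# Assumed counterexample (consistent with f(4,1) = -11): f(x,y) = x^2 + y^2 (1-x)^3
f = x**2 + y**2*(1 - x)**3

# Its only critical point is (0, 0) ...
print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True))   # [{x: 0, y: 0}]
# ... which is a local minimum, yet not a global one:
print(f.subs({x: 0, y: 0}), f.subs({x: 4, y: 1}))                    # 0 and -11
```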
In relation to sets
Maxima and minima are more generally defined for sets. In general, if an ordered set S
has a greatest element m, then m is a maximal element. Furthermore, if S is a subset of an
ordered set T and m is the greatest element of S with respect to the order induced by T, then m
is a least upper bound of S in T. The similar result holds for least element, minimal element
and greatest lower bound.
TAYLOR SERIES
The Maclaurin series for (1 − x)−1 for |x| < 1 is the geometric series
By integrating the above Maclaurin series we find the Maclaurin series for log(1 − x), where
log denotes the natural logarithm:
The Taylor series for the exponential function ex at a = 0 is
The above expansion holds because the derivative of ex with respect to x is also ex and e0
equals 1. This leaves the terms (x − 0)n in the numerator and n! in the denominator for each
term in the infinite sum.
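The missing expansions are easy to reproduce with a computer algebra system; a SymPy sketch (series about 0, shown to a few terms):

```python
import sympy as sp

x = sp.symbols('x')

print(sp.series(1/(1 - x), x, 0, 5))      # 1 + x + x**2 + x**3 + x**4 + O(x**5)
print(sp.series(sp.log(1 - x), x, 0, 5))  # -x - x**2/2 - x**3/3 - x**4/4 + O(x**5)
print(sp.series(sp.exp(x), x, 0, 5))      # 1 + x + x**2/2 + x**3/6 + x**4/24 + O(x**5)
```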
History
The Greek philosopher Zeno considered the problem of summing an infinite series to achieve
a finite result, but rejected it as an impossibility: the result was Zeno's paradox. Later,
Aristotle proposed a philosophical resolution of the paradox, but the mathematical content
was apparently unresolved until taken up by Democritus and then Archimedes. It was through
Archimedes's method of exhaustion that an infinite number of progressive subdivisions could
be performed to achieve a finite result. Liu Hui independently employed a similar method a
few centuries later.
In the 14th century, the earliest examples of the use of Taylor series and closely related
methods were given by Madhava of Sangamagrama. Though no record of his work survives,
writings of later Indian mathematicians suggest that he found a number of special cases of the
Taylor series, including those for the trigonometric functions of sine, cosine, tangent, and
arctangent. The Kerala school of astronomy and mathematics further expanded his works
with various series expansions and rational approximations until the 16th century.
In the 17th century, James Gregory also worked in this area and published several Maclaurin
series. It was not until 1715, however, that a general method for constructing these series for
all functions for which they exist was finally provided by Brook Taylor, after whom the
series are now named.
The Maclaurin series was named after Colin Maclaurin, a professor in Edinburgh, who
published the special case of the Taylor result in the 18th century.
Analytic functions
The function e−1/x² is not analytic at x = 0: the Taylor series is identically 0, although the
function is not.
If f(x) is given by a convergent power series in an open disc (or interval in the real line)
centered at b, it is said to be analytic in this disc. Thus for x in this disc, f is given by a
convergent power series
and so the power series expansion agrees with the Taylor series. Thus a function is analytic in
an open disc centered at b if and only if its Taylor series converges to the value of the
function at each point of the disc.
If f(x) is equal to its Taylor series everywhere it is called entire. The polynomials and the
exponential function ex and the trigonometric functions sine and cosine are examples of entire
functions. Examples of functions that are not entire include the logarithm, the trigonometric
function tangent, and its inverse arctan. For these functions the Taylor series do not converge
if x is far from a. Taylor series can be used to calculate the value of an entire function at
every point, if the value of the function, and of all of its derivatives, are known at a single
point.
4.0 CONCLUSION
In this unit, you have been introduced to partial derivatives in calculus and some higher order
partial derivatives. Clairaut's theorem was stated and applied. You have been introduced to
maxima and minima, functions of more than one variable, and the relation of maxima and
minima to sets.
5.0 SUMMARY
Clairaut's theorem
Analytic functions
7.0 REFERENCES
Thomas, George B.; Weir, Maurice D.; Hass, Joel (2010). Thomas' Calculus: Early
Transcendentals (12th ed.). Addison-Wesley. ISBN 0-321-58876-2.
• Apostol, Tom (1967), Calculus, John Wiley & Sons, Inc., ISBN 0-471-00005-1.
• Bartle; Sherbert (2000), Introduction to Real Analysis (3rd ed.), John Wiley & Sons, Inc.,
ISBN 0-471-32148-6.
• Hörmander, L. (1976), Linear Partial Differential Operators, Volume 1, Springer-Verlag,
ISBN 978-3540006626.
• Kline, Morris (1998), Calculus: An Intuitive and Physical Approach, Dover, ISBN 0-
486-40453-6.
• Pedrick, George (1994), A First Course in Analysis, Springer-Verlag, ISBN 0-387-
94108-8.
• Stromberg, Karl (1981), Introduction to classical real analysis, Wadsworth, Inc.,
ISBN 978-0534980122.
• Rudin, Walter (1987), Real and complex analysis, 3rd ed., McGraw-Hill Book Company,
ISBN 0-07-054234
UNIT 2: TAYLOR SERIES EXPANSION FOR FUNCTIONS OF TWO VARIABLES
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Definition of Taylor series expansion
3.2 Analytic functions
3.3 Uses of Taylor series for analytic functions
3.4 Approximation and convergence
3.5 List of Maclaurin series of some common functions
3.6 Calculation of Taylor series
3.7 Taylor series in several variables
3.8 Fractional Taylor series
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
Introduction
As the degree of the Taylor polynomial rises, it approaches the correct function. (Figure:
sin x, in black, and its Taylor approximations by polynomials of increasing degree 1, 3, 5, 7, 9, 11, ….)
(Figure: the exponential function, in blue, and the sum of the first n+1 terms of its Taylor
series at 0, in red.)
The concept of a Taylor series was formally introduced by the English mathematician Brook
Taylor in 1715. If the Taylor series is centered at zero, then that series is also called a
Maclaurin series, named after the Scottish mathematician Colin Maclaurin, who made
extensive use of this special case of Taylor series in the 18th century.
OBJECTIVE
Taylor series in several variables
Fractional Taylor series
Definition
The Taylor series of a real or complex function ƒ(x) that is infinitely differentiable in a
neighborhood of a real or complex number a is the power series
ƒ(a) + ƒ′(a)(x − a) + ƒ″(a)(x − a)2/2! + ƒ‴(a)(x − a)3/3! + … = Σ (n ≥ 0) ƒ(n)(a)(x − a)n/n!
where n! denotes the factorial of n and ƒ(n)(a) denotes the nth derivative of ƒ evaluated at the
point a. The zeroth derivative of ƒ is defined to be ƒ itself and (x − a)0 and 0! are both defined
to be 1. In the case that a = 0, the series is also called a Maclaurin series.
Examples
The Maclaurin series for (1 − x)−1 for |x| < 1 is the geometric series
By integrating the above Maclaurin series we find the Maclaurin series for log(1 − x), where
log denotes the natural logarithm:
The Taylor series for the exponential function ex at a = 0 is
Analytic functions
The function e−1/x² is not analytic at x = 0: the Taylor series is identically 0, although the
function is not.
If f(x) is given by a convergent power series in an open disc (or interval in the real line)
centered at b, it is said to be analytic in this disc. Thus for x in this disc, f is given by a
convergent power series
and so the power series expansion agrees with the Taylor series. Thus a function is analytic in
an open disc centered at b if and only if its Taylor series converges to the value of the
function at each point of the disc.
Taylor series can be used to calculate the value of an entire function at every point, if the
value of the function, and of all of its derivatives, are known at a single point.
Differentiation and integration of power series can be performed term by term and is hence
particularly easy.
The (truncated) series can be used to compute function values numerically (often by
recasting the polynomial into the Chebyshev form and evaluating it with the Clenshaw
algorithm).
Algebraic operations can be done readily on the power series representation; for instance,
Euler's formula follows from Taylor series expansions for trigonometric and exponential
functions. This result is of fundamental importance in such fields as harmonic analysis.
The sine function (blue) is closely approximated by its Taylor polynomial of degree 7 (pink)
for a full period centered at the origin.
The Taylor polynomials for log(1 + x) only provide accurate approximations in the range −1 <
x ≤ 1. Note that, for x > 1, the Taylor polynomials of higher degree are worse approximations.
In general, Taylor series need not be convergent at all. In fact, the set of functions with a
convergent Taylor series is a meager set in the Fréchet space of smooth functions. Even if the
Taylor series of a function f does converge, its limit need not in general be equal to the value
of the function f(x). For example, the function
is infinitely differentiable at x = 0, and has all derivatives zero there. Consequently, the
Taylor series of f(x) about x = 0 is identically zero. However, f(x) is not equal to the zero
function, and so it is not equal to its Taylor series around the origin.
In real analysis, this example shows that there are infinitely differentiable functions f(x)
whose Taylor series are not equal to f(x) even if they converge. By contrast, in complex
analysis there are no holomorphic functions f(z) whose Taylor series converges to a value
different from f(z). The complex function e−z−2 does not approach 0 as z approaches 0 along
the imaginary axis, and its Taylor series is thus not defined there.
More generally, every sequence of real or complex numbers can appear as coefficients in the
Taylor series of an infinitely differentiable function defined on the real line, a consequence of
Borel's lemma (see also Non-analytic smooth function#Application to Taylor series). As a
result, the radius of convergence of a Taylor series can be zero. There are even infinitely
differentiable functions defined on the real line whose Taylor series have a radius of
convergence 0 everywhere.[5]
Some functions cannot be written as Taylor series because they have a singularity; in these
cases, one can often still achieve a series expansion if one allows also negative powers of the
variable x; see Laurent series. For example, f(x) = e−x−2 can be written as a Laurent series.
There is, however, a generalization[6][7] of the Taylor series that does converge to the value of
the function itself for any bounded continuous function on (0,∞), using the calculus of finite
differences. Specifically, one has the following theorem, due to Einar Hille, that for any t > 0,
Here ∆nh is the n-th finite difference operator with step size h. The series is precisely the Taylor
series, except that divided differences appear in place of differentiation: the series is
formally similar to the Newton series. When the function f is analytic at a, the terms in
the series converge to the terms of the Taylor series, and in this sense it generalizes the
usual Taylor series.
In general, for any infinite sequence ai, the following power series identity holds:
So in particular,
The series on the right is the expectation value of f(a + X), where X is a Poisson-distributed
random variable that takes the value jh with probability e−t/h(t/h)j/j!. Hence,
The law of large numbers implies that the identity holds.
Several important Maclaurin series expansions follow. All these expansions are valid
for complex arguments x.
Exponential function:
Natural logarithm:
Square root:
Binomial series (includes the square root for α = 1/2 and the infinite geometric series for α =
−1):
Trigonometric functions:
Hyperbolic functions:
Lambert's W function:
Several methods exist for the calculation of Taylor series of a large number of functions. One
can attempt to use the Taylor series as is and generalize the form of the coefficients, or one
can use manipulations such as substitution, multiplication or division, addition or subtraction
of standard Taylor series to construct the Taylor series of a function, by virtue of Taylor
series being power series. In some cases, one can also derive the Taylor series by repeatedly
applying integration by parts. Particularly convenient is the use of computer algebra systems
to calculate Taylor series.
First example
We have for the natural logarithm (by using the big O notation)
The latter series expansion has a zero constant term, which enables us to substitute the second
series into the first one and to easily omit terms of higher order than the 7th degree by using
the big O notation.
Since the cosine is an even function, the coefficients for all the odd powers x, x3, x5, x7, ...
have to be zero.
Second example
Collecting the terms up to fourth order yields
Comparing coefficients with the above series of the exponential function yields the desired
Taylor series
Third example
Here we use a method called "indirect expansion" to expand the given function. This method
uses the known Taylor expansion of the exponential function. In order to expand (1 + x)ex as
a Taylor series in x, we use the known Taylor series of ex.
Thus,
Taylor series in several variables
The Taylor series may also be generalized to functions of more than one variable with
For example, for a function that depends on two variables, x and y, the Taylor series to
second order about the point (a, b) is:
A second-order Taylor series expansion of a scalar-valued function of more than one variable
can be written compactly as
Example
(Figure: second-order Taylor series approximation, in gray, of the function f(x, y) = ex log(1 + y)
around the origin.)
Compute a second-order Taylor series expansion around the point (a, b) = (0, 0) of the
function f(x, y) = ex log(1 + y).
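A SymPy sketch of this computation (expanding f(x, y) = ex log(1 + y) to second order about (0, 0)):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x)*sp.log(1 + y)

# Build the second-order Taylor polynomial about (0, 0) from the partial derivatives.
terms = []
for i in range(3):
    for j in range(3 - i):
        d = f
        if i:
            d = sp.diff(d, x, i)
        if j:
            d = sp.diff(d, y, j)
        coeff = d.subs({x: 0, y: 0}) / (sp.factorial(i) * sp.factorial(j))
        terms.append(coeff * x**i * y**j)

print(sp.expand(sum(terms)))   # y + x*y - y**2/2
```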
With the Caputo fractional derivative, and indicating the limit as we approach from the right,
the fractional Taylor series can be written as
4.0 CONCLUSION
In this unit, you have defined the Taylor series of a function of two variables. You have studied
analytic functions and have used Taylor series to solve problems that involve analytic
functions. You have studied approximation and convergence. You have also studied the list
of Maclaurin series of some common functions and have done some calculations of Taylor
series. You have also studied Taylor series in several variables and the fractional Taylor series.
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
1. Use the Taylor series to expand F(z) = 1/(z + 1) about the point z = 1, and find the values
of z for which the expansion is valid.
2. Use the Taylor series to expand F(x) = 1/(x + 2) about the point x = 1, and find the values
of x for which the expansion is valid.
3. Use the Taylor series to expand F(x) = 1/(x − 2)2 about the point x = 2, and find the values
of x for which the expansion is valid.
4. Use the Taylor series to expand F(x) = 1/(x + 4)2 about the point x = 2, and find the values
of x for which the expansion is valid.
5. Use the Taylor series to expand F(b) = 2/(b + 2)3 about the point b = 1, and find the values
of b for which the expansion is valid.
REFERENCES
Abramowitz, Milton; Stegun, Irene A. (1970), Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables, New York: Dover Publications, Ninth printing
Thomas, George B. Jr.; Finney, Ross L. (1996), Calculus and Analytic Geometry (9th ed.),
Addison Wesley, ISBN 0-201-53174-7
Greenberg, Michael (1998), Advanced Engineering Mathematics (2nd ed.), Prentice Hall,
ISBN 0-13-321431
Arfken, G. "Taylor's Expansion." §5.6 in Mathematical Methods for Physicists, 3rd ed.
Orlando, FL: Academic Press, pp. 303-313, 1985.
Askey, R. and Haimo, D. T. "Similarities between Fourier and Power Series." Amer. Math.
Monthly 103, 297-304, 1996.
Comtet, L. "Calcul pratique des coefficients de Taylor d'une fonction algébrique." Enseign.
Math. 10, 267-270, 1964.
UNIT 3 : APPLICATIONS OF TAYLOR SERIES
CONTENT
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
We started studying Taylor Series because we said that polynomial functions are easy and
that if we could find a way of representing complicated functions as series ("infinite
polynomials") then maybe some properties of functions would be easy to study too. In this
section, we'll show you a few ways in which Taylor series can make life easy.
2.0 OBJECTIVES
At the end of this unit, you should be able to :
Remember that we've said that some functions have no antiderivative which can be expressed
in terms of familiar functions. This makes evaluating definite integrals of these functions
difficult because the Fundamental Theorem of Calculus cannot be used. However, if we have
a series representation of a function, we can often times use that to evaluate a definite
integral.
The integrand has no antiderivative expressible in terms of familiar functions. However, we
know how to find its Taylor series: we know that
In spite of the fact that we cannot antidifferentiate the function, we can antidifferentiate the
Taylor series:
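The specific integrand is missing from this copy; as an illustration, the SymPy sketch below uses exp(-x2), a standard example of a function with no elementary antiderivative, and antidifferentiates its Taylor polynomial term by term.

```python
import sympy as sp

x = sp.symbols('x')
# Assumed illustrative integrand (the original is missing here): exp(-x**2).
f = sp.exp(-x**2)

# Antidifferentiate the Taylor polynomial term by term and evaluate on [0, 1].
taylor_poly = sp.series(f, x, 0, 10).removeO()
approx = sp.integrate(taylor_poly, (x, 0, 1))

exact = sp.integrate(f, (x, 0, 1))
print(approx.evalf())   # about 0.7475 using terms up to x**8
print(exact.evalf())    # about 0.7468 (SymPy expresses the exact value via erf)
```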
Sometimes, a Taylor series can tell us useful information about how a function behaves in an
important part of its domain. Here is an example which will demonstrate.
A famous fact from electricity and magnetism says that a charge q generates an electric field
whose strength is inversely proportional to the square of the distance from the charge. That is,
at a distance r away from the charge, the electric field is
Often times an electric charge is accompanied by an equal and opposite charge nearby. Such
an object is called an electric dipole. To describe this, we will put a charge q at the point
and a charge -q at .
Along the x axis, the strength of the electric field is the sum of the electric fields from each
of the two charges. In particular,
If we are interested in the electric field far away from the dipole, we can consider what
happens for values of x much larger than d. We will use a Taylor series to study the
behaviour in this region.
In other words, far away from the dipole where x is very large, we see that the electric field
strength is proportional to the inverse cube of the distance. The two charges partially cancel
one another out to produce a weaker electric field at a distance.
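A SymPy sketch of this argument; the charge positions are missing in this copy, so we assume charges +q at x = d and −q at x = −d on the axis, with x much larger than d:

```python
import sympy as sp

x, d, q, k = sp.symbols('x d q k', positive=True)

# Assumed setup (positions not stated in this copy): +q at x = d, -q at x = -d.
E = k*q/(x - d)**2 - k*q/(x + d)**2

# Expand for x much larger than d (equivalently, expand in powers of d).
leading = sp.series(E, d, 0, 2).removeO()
print(sp.simplify(leading))   # 4*d*k*q/x**3 : the field falls off like 1/x**3
```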
This example is similar in spirit to the previous one. Several times in this course, we have
used the fact that exponentials grow much more rapidly than polynomials. We recorded this
by saying that
for any exponent n . Let's think about this for a minute because it is an important property of
exponentials. The ratio is measuring how large the exponential is compared to the
polynomial. If this ratio was very small, we would conclude that the polynomial is larger than
the exponential. But if the ratio is large, we would conclude that the exponential is much
larger than the polynomial. The fact that this ratio becomes arbitrarily large means that the
exponential becomes larger than the polynomial by a factor which is as large as we would
like. This is what we mean when we say "an exponential grows faster than a polynomial."
To see why this relationship holds, we can write down the Taylor series for .
Notice that this last term becomes arbitrarily large as . That implies that the ratio we
are interested in does as well:
Basically, the exponential grows faster than any polynomial because it behaves like an
infinite polynomial whose coefficients are all positive.
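A quick SymPy confirmation of the limit, taking n = 5 as a representative exponent:

```python
import sympy as sp

x = sp.symbols('x')
n = 5   # any fixed polynomial degree behaves the same way

print(sp.limit(sp.exp(x)/x**n, x, sp.oo))   # oo: the exponential dominates x**n
```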
Some differential equations cannot be solved in terms of familiar functions (just as some
functions do not have antiderivatives which can be expressed in terms of familiar functions).
However, Taylor series can come to the rescue again. Here we will present two examples to
give you the idea.
Of course, we know that the solution is , but we will see how to discover this in a
different way. First, we will write out the solution in terms of its Taylor series:
We also have
Since the differential equation says that , we can equate these two Taylor series:
Of course, this is an initial value problem we know how to solve. The real value of this
method is in studying initial value problems that we do not know how to solve.
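A SymPy sketch of the series method; it assumes the equation is y' = y with y(0) = 1, consistent with the known solution mentioned above.

```python
import sympy as sp

x = sp.symbols('x')
a = list(sp.symbols('a0:6'))        # unknown Taylor coefficients a0, ..., a5

# Assumed initial value problem (consistent with the text): y' = y with y(0) = 1.
y = sum(a[n]*x**n for n in range(6))
residual = sp.expand(sp.diff(y, x) - y)

# Match coefficients of x^0, ..., x^4 (the truncation order), plus the initial condition.
eqs = [residual.coeff(x, n) for n in range(5)] + [y.subs(x, 0) - 1]
sol = sp.solve(eqs, a)

print(sp.expand(y.subs(sol)))   # 1 + x + x**2/2 + x**3/6 + x**4/24 + x**5/120
```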
This equation is important in optics. In fact, it explains why a rainbow appears the way in
which it does! As before, we will write the solution as a series:
If we continue in this way, we can write down many terms of the series (perhaps you see the
pattern already?) and then draw a graph of the solution. This looks like this:
Notice that the solution oscillates to the left of the origin and grows like an exponential to the
right of the origin. Can you explain this by looking at the differential equation?
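The differential equation itself is missing from this copy; the description (important in optics, explains the appearance of a rainbow, oscillating on one side of the origin and growing on the other) matches Airy's equation y'' = x y, so the sketch below assumes that equation with illustrative initial conditions y(0) = 1, y'(0) = 0.

```python
import sympy as sp

x = sp.symbols('x')
a = list(sp.symbols('a0:8'))

# Assumed equation (consistent with the description): Airy's equation y'' = x*y,
# with illustrative initial conditions y(0) = 1, y'(0) = 0.
y = sum(a[n]*x**n for n in range(8))
residual = sp.expand(sp.diff(y, x, 2) - x*y)

eqs = [residual.coeff(x, n) for n in range(6)] + [a[0] - 1, a[1]]
sol = sp.solve(eqs, a)
print(sp.expand(y.subs(sol)))   # 1 + x**3/6 + x**6/180 : the first terms of the series solution
```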
4.0 CONCLUSION
In this unit, you have been introduced to the applications of Taylor series and
some basic ways of using Taylor series, such as evaluating definite integrals,
understanding asymptotic behaviour, understanding the growth of functions, and solving
differential equations. Some examples were used to illustrate the applications.
5.0 SUMMARY
Having gone through this unit, you now know that;
In this section, we showed you ways in which Taylor series can make life easy:
(i) In evaluating definite integrals, we used the series representation of a function to
evaluate some functions that have no antiderivative. In spite of the fact that we cannot
antidifferentiate the function, we can antidifferentiate the Taylor series.
(ii) We used Taylor series to understand the asymptotic behaviour of functions in an
important part of the domain, and some examples were shown to demonstrate this.
(iii) Taylor series are used to understand the growth of functions, because we know the fact
that exponentials grow much more rapidly than polynomials. We recorded this by saying that
6.0 TUTOR-MARKED ASSIGNMENT
1. Compute a second-order Taylor series expansion around the point (a, b) = (0, 0) of the
function F(x, y) = ex log(2 + y).
2. Find the Taylor series expansion of f(x, y) = exy about the point (2, 3).
7.0 REFERENCE
Arfken and Weber, Mathematical Methods for Physicists, 6th Edition, 352-354, Academic
press 200
[2] http://www.ugrad.math.ubc.ca/coursedoc/math101/notes/series/appsTaylor.html,
September 29, 2008
[3] Broadhead MK (Broadhead, Michael K.) , geophysical prospecting 56, 5, 729-735 SEP
2008
[4] Guyenne P (Guyenne, Philippe), Nicholls DP (Nicholls, David P.), siam journal on
scientific computing 30, 1, 81-101, 2007
[5] Popa C (Popa, Cosmin), IEEE transactions on very large scale integration (VLSI)
systems,16, 3,318-321, MAR
2008
[6] Janssen AJEM (Janssen, A. J. E. M.), van Leeuwaarden JSH (van Leeuwaarden, J. S. H.)
stochastic processes and their applications 117 ,12, 1928-1959, DEC 2007
[7] Sahu S (Sahu, Sunil), Baker AJ (Baker, A. J.) Source: international journal for numerical
methods in fluids 55, 8, 737-783, NOV 20, 2007
MODULE 7 MAXIMA AND MINIMA OF FUNCTIONS OF SEVERAL
VARIABLES, STATIONARY POINT, LAGRANGE’S METHOD OF MULTIPLIERS
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Recognise problems on maximum and minimum functions of several variables
3.2 Necessary condition for a maxima or minima function of several variable
3.3 Sufficient condition for a maxima or minima function of several variable
3.4 Maxima and minima of functions subject to constraints
3.5 Method of finding maxima and minima of functions subject to constraints
3.6 Identify the different types of examples of maxima and minima functions of several
variables
3.7 Solve problems on maxima and minima functions of several variables
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
Def. Stationary (or critical) point. For a function y = f(x) of a single variable, a stationary
(or critical) point is a point at which dy/dx = 0; for a function u = f(x1, x2, ... , xn) of n
variables it is a point at which
In the case of a function y = f(x) of a single variable, a stationary point corresponds to a point
on the curve at which the tangent to the curve is horizontal. In the case of a function z = f(x, y)
of two variables a stationary point corresponds to a point on the surface at which the
tangent plane to the surface is horizontal.
In the case of a function y = f(x) of a single variable, a stationary point can be any of the
following three: a maximum point, a minimum point or an inflection point. For a function
z = f(x, y) of two variables, a stationary point can be a maximum point, a minimum point or a
saddle point. For a function of n variables it can be a maximum point, a minimum point or a
point that is analogous to an inflection or saddle point.
2.0 OBJECTIVE
At the end of this unit, you should be able to :
- recognise problems on maximum and minimum functions of several variables
- know the necessary condition for a maxima or minima function of several variable
- know the Sufficient condition for a maxima or minima function of several variable
- identify the maxima and minima of functions subject to constraints
- know the method of finding maxima and minima of functions subject to constraints
- identify the different types of examples of maxima and minima functions of several
variables
- solve problems on maxima and minima functions of several variables
A function f(x1, x2, ... , xn) of n independent variables has a maximum at a point (x1', x2', ... ,
xn') if f(x1', x2', ... , xn') ≥ f(x1, x2, ... , xn) at all points in the neighborhood of (x1', x2', ... , xn').
Such a function has a minimum at a point (x1', x2', ... , xn') if f(x1', x2', ... , xn') ≤ f(x1, x2, ... ,
xn) at all points in the neighborhood of (x1', x2', ... , xn').
Necessary condition for a maxima or minima. A necessary condition for a function f(x, y)
of two variables to have a maxima or minima at point (x0, y0) is that
In the case of a function f(x1, x2, ... , xn) of n variables, the condition for the function to have
a maximum or minimum at point (x1', x2', ... , xn') is that
To find the maximum or minimum points of a function we first locate the stationary points
using 1) above. After locating the stationary points we then examine each stationary point to
determine if it is a maximum or minimum. To determine if a point is a maximum or
minimum we may consider values of the function in the neighborhood of the point as well as
the values of its first and second partial derivatives. We also may be able to establish what it
is by arguments of one kind or other. The following theorem may be useful in establishing
maximums and minimums for the case of functions of two variables.
Sufficient condition for a maximum or minimum of a function z = f(x, y). Let z = f(x, y)
have continuous first and second partial derivatives in the neighborhood of point (x0, y0). If at
the point (x0, y0)
and
and a minimum if
If ∆ > 0 , point (x0, y0) is a saddle point (neither maximum nor minimum). If ∆ = 0 , the
nature of point (x0, y0) is undecided. More investigation is necessary.
Example. Find the maxima and minima of the function z = x2 + xy + y2 - y.
Solution.
2x + y = 0
x + 2y = 1
x = -1/3 , y = 2/3
and the point is a minimum. The minimum value of the function is - 1/3.
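A SymPy check of this example (critical point and second-derivative test for z = x2 + xy + y2 − y):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x**2 + x*y + y**2 - y

crit = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y])   # {x: -1/3, y: 2/3}

# Second-derivative test: with zxx > 0 and zxx*zyy - zxy**2 > 0 the point is a minimum.
zxx, zyy, zxy = sp.diff(z, x, 2), sp.diff(z, y, 2), sp.diff(z, x, y)
print(crit, zxx, zxx*zyy - zxy**2)   # {x: -1/3, y: 2/3}, 2, 3
print(z.subs(crit))                  # -1/3, the minimum value
```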
Maxima and minima of functions subject to constraints. Let us set ourselves the following
problem: Let F(x, y) and G(x, y) be functions defined over some region R of the x-y plane.
Find the points at which the function F(x, y) has maximums subject to the side condition G(x,
y) = 0. Basically we are asking the question: At what points on the solution set of G(x, y) = 0
does F(x, y) have maximums? The solution set of G(x, y) = 0 corresponds to some curve in
the plane. See Figure 1. The solution set (i.e. locus) of G(x, y) = 0 is shown in red. Figure 2
shows the situation in three dimensions where function z = F(x, y) is shown rising up above
the x-y plane along the curve G(x, y) = 0. The problem is to find the maximums of z = F(x, y)
along the curve G(x, y) = 0.
Let us now consider the same problem in three variables. Let F(x, y, z) and G(x, y, z) be
functions defined over some region R of space. Find the points at which the function F(x, y,
z) has maximums subject to the side condition G(x, y, z) = 0. Basically we are asking the
question: At what points on the solution set of G(x, y, z) = 0 does F(x, y, z) have maximums?
G(x, y, z) = 0 represents some surface in space. In Figure 3, G(x, y, z) = 0 is depicted as a
spheroid in space. The problem then is to find the maximums of the function F(x, y, z) as
evaluated on this spheroidal surface.
Let us now consider another problem. Suppose instead of one side condition we have two.
Let F(x, y, z), G(x, y, z) and H(x, y, z) be functions defined over some region R of space.
Find the points at which the function F(x, y, z) has maximums subject to the side conditions
2) G(x, y, z) = 0
3) H(x, y, z) = 0.
Here we wish to find the maximum values of F(x, y, z) on that set of points that satisfy both
equations 2) and 3). Thus if D represents the solution set of G(x, y, z) = 0 and E represents
the solution set of H(x, y, z) = 0 we wish to find the maximum points of F(x, y, z) as
evaluated on set F = D ∩ E (i.e. the intersection of sets D and E). In Fig. 4 G(x, y, z) = 0 is
depicted as an ellipsoid and H(x, y, z) = 0 as a plane. The intersection of the ellipsoid and the
plane is the set F on which F(x, y, z) is to be evaluated.
The above can be generalized to functions of n variables F(x1, x2, ... , xn), G(x1, x2, ... , xn),
etc. and m side conditions.
6) F1Φ3 - F3Φ1 = 0
From the pair of equations consisting of the second equation in 4) and 5) we can eliminate
giving
7) F2Φ3 - F3Φ2 = 0
Equations 8) combined with the equation Φ(x, y, z) = 0 give us three equations which we
can solve simultaneously for x, y, z to obtain the stationary points of function F(x, y, z).
The maxima and minima will be among the stationary points.
This same method can be used for functions of an arbitrary number of variables and an
arbitrary number of side conditions (smaller than the number of variables).
Extrema for a function of four variables with two auxiliary equations. Suppose we wish
to find the maxima or minima of a function
u = F(x, y, z, t)
9) Φ(x, y, z, t) = 0 ψ(x, y, z, t) = 0.
Equations 14) combined with the auxiliary equations Φ(x, y, z, t) = 0 and ψ(x, y, z, t) = 0
give us four equations which we can solve simultaneously for x, y, z, t to obtain the
stationary points of function F(x, y, z, t). The maxima and minima will be among the
stationary points.
are
can be solved simultaneously for the n variables x1, x2, ... .xn to obtain the stationary
points of F(x1, x2, ... .xn). The maxima and minima will be among the stationary points.
*********************************
Geometrical interpretation for extrema of function F(x, y, z) with a constraint. We shall
now present a theorem that gives a geometrical interpretation for the case of extremal values
of functions of type F(x, y, z) with a constraint.
Theorem 1. Suppose the functions F(x, y, z) and Φ(x, y, z) have continuous first partial
derivatives throughout a certain region R of space. Let the equation Φ(x, y, z) = 0 define a
surface S, every point of which is in the interior of R, and suppose that the three partial
derivatives Φ1, Φ2, Φ3 are never simultaneously zero at a point of S. Then a necessary
condition for the values of F(x, y, z) on S to attain an extreme value (either relative or
absolute) at a point of S is that F1, F2, F3 be proportional to Φ1, Φ2, Φ3 at that point. If C is the
value of F at the point, and if the constant of proportionality is not zero, the geometric
meaning of the proportionality is that the surface S and the surface F(x, y, z) = C are tangent
at the point in question.
Rationale behind theorem. From 8) above, a necessary condition for F(x, y, z) to attain a
maxima or minima (i.e. a condition for a stationary point) at a point P is that
or
Thus at a stationary point the partial derivatives F1, F2, F3 and Φ1, Φ2, Φ3 are proportional.
Now the partial derivatives F1, F2, F3 and Φ1, Φ2, Φ3 represent the gradients of the functions F
and Φ; and the gradient, at any point P, of a scalar point function ψ(x, y, z) is a vector that is
normal to that level surface of ψ(x, y, z) that passes through point P. If C is the value of F at
the stationary point P, then the vector (F1, F2, F3) at point P is normal to the surface F(x, y, z)
= C at P. Similarly, the vector (Φ1, Φ2, Φ3) at point P is normal to the surface Φ(x, y, z) = 0 at
P. Since the partial derivatives F1, F2, F3 and Φ1, Φ2, Φ3 are proportional, the normals to the
two surfaces point in the same direction at P and the surfaces must be tangent at point P.
Since F(x, y, z) is the square of the distance from (x, y, z) to the origin, it is clear that we are
looking for the points at maximum and minimum distances from the center of the ellipsoid.
The maximum occurs at the ends of the longest principal axis, namely at (±8, 0, 0). The
minimum occurs at the ends of the shortest principal axis, namely at (0, 0, ±5). Consider the
maximum point (8, 0, 0). The value of F at this point is 64, and the surface F(x, y, z) = 64 is a
sphere. The sphere and the ellipsoid are tangent at (8, 0, 0) as asserted by the theorem. In this
case the ratios G1:G2:G3 and F1:F2:F3 at (8, 0, 0) are 1/4 : 0 : 0 and 16 : 0 : 0 respectively.
This example brings out the fact that the tangency of the surfaces (or the proportionality of
the two sets of ratios) is a necessary but not a sufficient condition for a maximum or
minimum value of F, for we note that the condition of proportionality exists at the points
(0, ±6, 0), which are the ends of the principal axis of intermediate length. But the value of F is
neither a maximum nor a minimum at these points.
where λ is a constant (i.e. a parameter) to which we will later assign a value, and then finding
the maxima and minima of the function G(x, y, z). A reader might quickly ask, “Of what
interest are the maxima and minima of the function G(x, y, z)? How does this help us solve
the problem of finding the maxima and minima of F(x, y, z)?” The answer is that examination
of 17) shows that for those points corresponding to the solution set of Φ(x, y, z) = 0 the
function G(x, y, z) is equal to the function F(x, y, z) since at those points equation 17)
becomes
We then solve these three equations along with the equation of constraint Φ(x, y, z) = 0 to
find the values of the four quantities x, y, z, λ. More than one point can be found in this way
and this will give us the locations of the stationary points. The maxima and minima will be
among the stationary points thus found.
Let us now observe something. If equations 18) are to hold simultaneously, then it follows
from the third of them that λ must have the value
If we substitute this value of λ into the first two equations of 18) we obtain
F1Φ3 - F3Φ1 = 0 F2Φ3 - F3Φ2 = 0
or
We note that the two equations of 19) are identically the same conditions as 8) above for the
previous method. Thus using equations 19) along with the equation of constraint Φ(x, y, z) =
0 is exactly the same procedure as the previous method in which we used equations 8) and
the same constraint.
One of the great advantages of Lagrange’s method over the method of implicit functions or
the method of direct elimination is that it enables us to avoid making a choice of independent
variables. This is sometimes very important; it permits the retention of symmetry in a
problem where the variables enter symmetrically at the outset.
Lagrange’s method can be used with functions of any number of variables and any number of
constraints (fewer than the number of variables). In general, given a function F(x1, x2, ..., xn)
of n variables and h side conditions Φ1 = 0, Φ2 = 0, ..., Φh = 0, for which this function
may have a maximum or minimum, equate to zero the partial derivatives of the auxiliary
function F + λ1Φ1 + λ2Φ2 + ... + λhΦh with respect to x1, x2, ..., xn, regarding λ1, λ2, ..., λh
as constants, and solve these n equations simultaneously with the given h side conditions,
treating the λ’s as unknowns to be eliminated.
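As a concrete sketch of this recipe (an illustrative example, not taken from the text: the objective F, the constraint Φ and the symbol names are assumptions chosen only for demonstration), the auxiliary function can be formed and the resulting system solved with sympy:

# Illustrative sketch of Lagrange's auxiliary-function recipe:
# extremize F(x, y) = x**2 + y**2 subject to Phi(x, y) = x + y - 1 = 0 (assumed example).
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

F = x**2 + y**2          # assumed function to be extremized
Phi = x + y - 1          # assumed constraint Phi = 0
G = F + lam * Phi        # auxiliary function F + lambda*Phi

# Equate the partial derivatives of G to zero and append the constraint equation.
equations = [sp.diff(G, v) for v in (x, y)] + [Phi]
print(sp.solve(equations, [x, y, lam], dict=True))
# [{x: 1/2, y: 1/2, lam: -1}] -> the stationary point is (1/2, 1/2)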
Example 1.
Consider the function f(x, y) = exp(x − x³/3 − y²). Its partial derivatives are
f_x = (1 − x²) f(x, y) and f_y = −2y f(x, y).
f_x = 0 if 1 − x² = 0 or if the exponential term is 0; f_y = 0 if −2y = 0 or if the exponential
term is 0. The exponential term is never 0, so we require 1 − x² = 0 and −2y = 0, implying
x = 1 or x = −1, and y = 0. There are two critical points: (−1, 0) and (1, 0).
How can we determine if the critical points found above are relative maxima or minima? We
apply a second derivative test for functions of two variables. Let D denote the discriminant
D(x_c, y_c) = f_xx(x_c, y_c) f_yy(x_c, y_c) − [f_xy(x_c, y_c)]².
• If D > 0 and f_xx(x_c, y_c) < 0, then f(x, y) has a relative maximum at (x_c, y_c).
• If D > 0 and f_xx(x_c, y_c) > 0, then f(x, y) has a relative minimum at (x_c, y_c).
• If D < 0, then f(x, y) has a saddle point at (x_c, y_c).
• If D = 0, the test is inconclusive.
Example: Continued
For x=1 and y=0, we have D(1,0)=4exp(4/3)>0 with f_xx(1,0)=-2exp(2/3)<0. Hence, (1,0) is
a relative maximum. For x=-1 and y=0, we have D(-1,0)=-4exp(-4/3)<0. Hence, (-1,0) is a
saddle point.
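This classification can be checked with sympy. The sketch below uses f(x, y) = exp(x − x³/3 − y²), the function written out above (its second derivatives reproduce the values −2exp(2/3), 4exp(4/3) and −4exp(−4/3) quoted in this example); the helper names are ours:

# Second-derivative test at the critical points (1, 0) and (-1, 0)
# for f(x, y) = exp(x - x**3/3 - y**2).
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = sp.exp(x - sp.Rational(1, 3)*x**3 - y**2)

fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)

for point in [(1, 0), (-1, 0)]:              # critical points found above
    subs = {x: point[0], y: point[1]}
    D = sp.simplify((fxx*fyy - fxy**2).subs(subs))   # discriminant of the test
    a, d = float(fxx.subs(subs)), float(D)
    if d > 0 and a < 0:
        kind = "relative maximum"
    elif d > 0 and a > 0:
        kind = "relative minimum"
    elif d < 0:
        kind = "saddle point"
    else:
        kind = "test inconclusive"
    print(point, D, kind)
# (1, 0):  D = 4*exp(4/3)   -> relative maximum
# (-1, 0): D = -4*exp(-4/3) -> saddle point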
Another example of a bounded region is the disk of radius 2 centered at the origin. We
proceed as in the previous example, examining the three classes of candidate points described
above. The critical points (1, 0) and (-1, 0) lie in the interior of the disk.
The boundary of the disk is the circle x² + y² = 4. To find candidate extreme points on the
boundary we parameterize the circle. A natural parameterization is x = 2cos(t) and
y = 2sin(t) for 0 ≤ t ≤ 2π. We substitute these expressions into z = f(x, y) and obtain
On the circle, the original function of two variables is reduced to a function of one variable.
We can determine the extrema on the circle using techniques from the calculus of one variable.
In this problem there are no corners. Hence, we determine the global max and min by
considering points in the interior of the disk and on the circle. An alternative method for
finding the maximum and minimum on the circle is the method of Lagrange multipliers.
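A numerical sketch of this interior-plus-boundary procedure for the same function f(x, y) = exp(x − x³/3 − y²) (the interior critical points are those found above; the boundary is simply sampled along the parameterization, so the printed extrema are approximate):

# Extrema of f(x, y) = exp(x - x**3/3 - y**2) on the closed disk x**2 + y**2 <= 4.
import numpy as np

def f(x, y):
    return np.exp(x - x**3/3 - y**2)

# 1) Interior critical points found above: (1, 0) and (-1, 0).
interior = [f(1.0, 0.0), f(-1.0, 0.0)]

# 2) Boundary circle: x = 2cos(t), y = 2sin(t), so f becomes a function of t alone.
t = np.linspace(0.0, 2*np.pi, 100001)
boundary = f(2*np.cos(t), 2*np.sin(t))

candidates = interior + [boundary.max(), boundary.min()]
print("global max ~", max(candidates), "  global min ~", min(candidates))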
4.0 CONCLUSION
You have been introduced to maxima and minima of functions of several variables, the
necessary conditions for a maximum or minimum of a function of several variables, and
problems on maxima and minima of functions of several variables.
5.0 SUMMARY
A summary of maxima and minima of functions of several variables is as follows:
A function f(x, y) of two independent variables has a maximum at a point (x0, y0) if f(x0, y0)
≥ f(x, y) for all points (x, y) in the neighborhood of (x0, y0). Such a function has a minimum
at a point (x0, y0) if f(x0, y0) ≤ f(x, y) for all points (x, y) in the neighborhood of (x0, y0).
Solve the following problem: find the maxima and minima of the function z = x² + xy + y² − y.
Solution:
Setting the first partial derivatives to zero gives 2x + y = 0 and x + 2y − 1 = 0, so
x = −1/3, y = 2/3.
Since z_xx = 2 > 0 and D = z_xx z_yy − z_xy² = (2)(2) − 1² = 3 > 0, the point is a minimum.
The minimum value of the function is −1/3.
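A quick check of this result with sympy (a verification sketch, not part of the original solution):

# Verify: critical point and minimum value of z = x**2 + x*y + y**2 - y.
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x**2 + x*y + y**2 - y

point = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y])   # {x: -1/3, y: 2/3}
D = sp.hessian(z, (x, y)).det().subs(point)                # discriminant = 3 > 0
print(point, D, z.subs(point))                             # minimum value -1/3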
6.0 TUTOR-MARKED ASSIGNMENT
3. Determine the critical points and locate any relative minima, maxima and saddle points of
the function f defined by
F(x, y) = x⁴ − y⁴ + 4xy.
Determine the critical points of the functions below and find out whether each point
corresponds to a relative minimum, maximum, or saddle point, or whether no conclusion can
be made.
4. F(x, y) = x² + 3y² − 2xy − 8x
5. F(x, y) = x³ + 12x + y³ + 3y² − 9y
7.0 REFERENCES/FURTHER READINGS
Taylor. Advanced Calculus
Osgood. Advanced Calculus.
James and James. Mathematics Dictionary.
Mathematics, Its Content, Methods and Meaning. Vol.
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Handling multiple constraints
3.2 Interpretation of the Lagrange multiplier
3.3 Example
3.4 Applications
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
Figure 1: Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y) = c.
Figure 2: Contour map of Figure 1. The red line shows the constraint g(x,y) = c. The blue
lines are contours of f(x,y). The point where the red line tangentially touches a blue contour is
our solution.
In mathematical optimization, the method of Lagrange multipliers (named after Joseph
Louis Lagrange) provides a strategy for finding the maxima and minima of a function subject
to constraints.
For instance, consider the optimization problem:
maximize f(x, y)
subject to g(x, y) = c.
We introduce a new variable (λ), called a Lagrange multiplier, and study the Lagrange
function defined by
Λ(x, y, λ) = f(x, y) + λ · (g(x, y) − c),
where the λ term may be either added or subtracted. If f(x, y) is a maximum for the original
constrained problem, then there exists λ such that (x, y, λ) is a stationary point for the Lagrange
function (stationary points are those points where the partial derivatives of Λ are zero).
However, not all stationary points yield a solution of the original problem. Thus, the method
of Lagrange multipliers yields a necessary condition for optimality in constrained problems.
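A sketch of this idea in code (the objective f, constraint g and constant c below are assumptions chosen only for demonstration): the stationary points of the Lagrange function are found by setting all of its partial derivatives to zero.

# Stationary points of Lambda(x, y, lambda) = f(x, y) + lambda*(g(x, y) - c)
# for an assumed example: f = x*y with constraint x + y = 10.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x*y                       # assumed objective
g, c = x + y, 10              # assumed constraint g(x, y) = c

Lagr = f + lam*(g - c)
stationary = sp.solve([sp.diff(Lagr, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)             # [{x: 5, y: 5, lambda: -5}] -> candidate for the constrained extremum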
2.0 OBJECTIVES
One of the most common problems in calculus is that of finding maxima or minima (in
general, "extrema") of a function, but it is often difficult to find a closed form for the function
being extremized. Such difficulties often arise when one wishes to maximize or minimize a
function subject to fixed outside conditions or constraints. The method of Lagrange
multipliers is a powerful tool for solving this class of problems without the need to explicitly
solve the conditions and use them to eliminate extra variables.
Consider again the problem:
maximize f(x, y)
subject to g(x, y) = c.
Suppose we walk along the contour line with g = c. In general the contour lines of f and g
may be distinct, so following the contour line for g = c one could intersect with or cross the
contour lines of f. This is equivalent to saying that while moving along the contour line for g
= c the value of f can vary. Only when the contour line for g = c meets contour lines of f
tangentially do we not increase or decrease the value of f; that is, when the contour lines
touch but do not cross.
The contour lines of f and g touch when the tangent vectors of the contour lines are parallel.
Since the gradient of a function is perpendicular to the contour lines, this is the same as
saying that the gradients of f and g are parallel. Thus we want points (x, y) where g(x, y) = c
and
∇x,y f = −λ ∇x,y g,
where
∇x,y f = (∂f/∂x, ∂f/∂y) and ∇x,y g = (∂g/∂x, ∂g/∂y)
are the respective gradients. The constant λ is required because although the two gradient
vectors are parallel, the magnitudes of the gradient vectors are generally not equal.
To incorporate these conditions into one equation, we introduce the auxiliary function
Λ(x, y, λ) = f(x, y) + λ · (g(x, y) − c)
and solve
∇x,y,λ Λ(x, y, λ) = 0.
The constrained extrema of f are critical points of the Lagrangian Λ, but they are not
necessarily local extrema of Λ (see Example 2 below).
One may reformulate the Lagrangian as a Hamiltonian, in which case the solutions are local
minima for the Hamiltonian. This is done in optimal control theory, in the form of
Pontryagin's minimum principle.
The fact that solutions of the Lagrangian are not necessarily extrema also poses difficulties
for numerical optimization. This can be addressed by computing the magnitude of the
gradient, as the zeros of the magnitude are necessarily local minima, as illustrated in the
numerical optimization example below.
A paraboloid, some of its level sets (aka contour lines) and 2 line constraints.
Zooming in on the level sets and constraints, we see that the two constraint lines intersect to
form a "joint" constraint that is a point. Since there is only one point to analyze, the
corresponding point on the paraboloid is automatically a minimum and maximum. Yet the
simplified reasoning presented in the sections above seems to fail because the level set
definitely appears to "cross" the point and at the same time its gradient is not parallel to the
gradients of either constraint. This shows we must refine our explanation of the method to
handle the kinds of constraints that are formed when we have more than one constraint acting
at once.
The method of Lagrange multipliers can also accommodate multiple constraints. To see how
this is done, we need to re-examine the problem in a slightly different manner, because the
concept of “crossing” discussed above becomes rapidly unclear when we consider the types
of constraints that are created when we have more than one constraint acting together.
As an example, consider a paraboloid with a constraint that is a single point (as might be
created if we had two line constraints that intersect). The level set (i.e., contour line) clearly
appears to “cross” that point and its gradient is clearly not parallel to the gradients of either of
the two line constraints. Yet, it is obviously a maximum and a minimum because there is only
one point on the paraboloid that meets the constraint.
While this example seems a bit odd, it is easy to understand and is representative of the sort
of “effective” constraint that appears quite often when we deal with multiple constraints
intersecting. Thus, we take a slightly different approach below to explain and derive the
Lagrange multipliers method with any number of constraints.
The function to be analyzed will be denoted by f, and the constraints will be represented by
the equations g1 = 0, g2 = 0, ..., gM = 0.
The basic idea remains essentially the same: if we consider only the points that satisfy the
constraints (i.e., points that lie on the constraint surfaces), then a point is a stationary point
(i.e. a point in a “flat” region) of f if and only if the constraints at that point do not allow
movement in a direction where f changes value.
Once we have located the stationary points, we need to do further tests to see if we have
found a minimum, a maximum or just a stationary point that is neither.
where the notation above means the xK-component of the vector v. The equation above
can be rewritten in a more compact geometric form that helps our intuition:
Now let us consider the effect of the constraints. Each constraint limits the directions that we
can move from a particular point and still satisfy the constraint. We can use the same
procedure, to look for the set of vectors containing the directions in which we can
move and still satisfy the constraint. As above, for every vector v in this set, the following
relation must hold:
From this, we see that at point p, all directions from this point that will still satisfy this
constraint must be perpendicular to ∇g(p), the gradient of that constraint at p.
Now we are ready to refine our idea further and complete the method: a point on f is a
constrained stationary point if and only if the direction that changes f violates at least one of
the constraints. (We can see that this is true because if a direction that changes f did not
violate any constraints, then there would be a “legal” point nearby with a higher or lower
value for f, and the current point would then not be a stationary point.)
If we now add another simultaneous equation to guarantee that we only perform this test
when we are at a point that satisfies the constraint, we end up with 2 simultaneous equations
that, when solved, identify all constrained stationary points:
Note that the above is a succinct way of writing the equations. Fully expanded, there are
N + 1 simultaneous equations that need to be solved for the N + 1 variables, which are λ and
x1, x2, ..., xN:
Multiple constraints
For more than one constraint, the same reasoning applies. If there is more than one constraint
active together, each constraint contributes a direction that will violate it. Together, these
“violation directions” form a “violation space”, where infinitesimal movement in any
direction within the space will violate one or more constraints. Thus, to satisfy multiple
constraints we can state (using this new terminology) that at the stationary points, the
direction that changes f is in the “violation space” created by the constraints acting jointly.
The violation space created by the constraints consists of all points that can be reached by
adding any combination of scaled and/or flipped versions of the individual violation direction
vectors. In other words, all the points that are “reachable” when we use the individual
violation directions as the basis of the space. Thus, we can succinctly state that v is in the
space defined by these violation directions if and only if there exists a set of “multipliers”
λ1, λ2, ..., λM such that:
As before, we now add simultaneous equations to guarantee that we only perform this test
when we are at a point that satisfies every constraint, and we end up with simultaneous
equations that, when solved, identify all constrained stationary points:
Solving the equation above for its unconstrained stationary points generates exactly the same
stationary points as solving for the constrained stationary points of f under the original
constraints.
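A sketch of the multiple-constraint case, with one multiplier per constraint (the objective and the two constraints below are an assumed illustration, not an example from the text):

# Lagrange multipliers with two constraints (assumed example):
# extremize f = x**2 + y**2 + z**2 subject to g1 = x + y + z - 1 = 0 and g2 = x - y = 0.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)
f = x**2 + y**2 + z**2
g1 = x + y + z - 1
g2 = x - y

Lagr = f + l1*g1 + l2*g2
eqs = [sp.diff(Lagr, v) for v in (x, y, z)] + [g1, g2]
print(sp.solve(eqs, [x, y, z, l1, l2], dict=True))
# [{x: 1/3, y: 1/3, z: 1/3, lambda1: -2/3, lambda2: 0}]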
Often the Lagrange multipliers have an interpretation as some quantity of interest. To see
why this might be the case, observe that:
Examples
Example 1
Combining the first two equations yields x = y (explicitly, λ ≠ 0, since otherwise (i) yields
1 = 0; so one has x = −1/(2λ) = y).
Substituting into (iii) yields 2x² = 1, so x = ±√2/2, showing the stationary points are
(√2/2, √2/2) and (−√2/2, −√2/2). Evaluating the objective function f on these points yields
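The objective and constraint of this example are not written out in the text; the algebra above (x = −1/(2λ) = y and 2x² = 1) is consistent with extremizing f(x, y) = x + y on the circle x² + y² = 1, so the following sketch assumes that problem:

# Assumed reconstruction of Example 1: extremize f = x + y subject to x**2 + y**2 = 1.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y
g = x**2 + y**2 - 1

Lagr = f + lam*g
sols = sp.solve([sp.diff(Lagr, x), sp.diff(Lagr, y), g], [x, y, lam], dict=True)
for s in sols:
    print((s[x], s[y]), "f =", sp.simplify(f.subs(s)))
# (sqrt(2)/2, sqrt(2)/2)  with f =  sqrt(2)  (maximum)
# (-sqrt(2)/2, -sqrt(2)/2) with f = -sqrt(2) (minimum)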
Example 2
Suppose we want to find the maximum and minimum values of
f(x, y) = x²y
with the condition that the x and y coordinates lie on the circle around the origin with radius
√3, that is, subject to the constraint
g(x, y) = x² + y² = 3.
The constraint g(x, y) − 3 is identically zero on the circle of radius √3. So any multiple of
g(x, y) − 3 may be added to f(x, y) leaving f(x, y) unchanged in the region of interest (on the
circle where our original constraint is satisfied). Let
The critical values of Λ occur where its gradient is zero. The partial derivatives are
Equation (iii) is just the original constraint. Equation (i) implies x = 0 or λ = −y. In the first
case, if x = 0 then we must have y = ±√3 by (iii), and then by (ii) λ = 0. In the second
case, if λ = −y, then substituting into equation (ii) we have that
x² = 2y².
Substituting this into equation (iii) and solving for y gives
y = ±1.
Evaluating the objective at these points, we find
f(±√2, 1) = 2,   f(±√2, −1) = −2,   f(0, ±√3) = 0.
Therefore, the objective function attains the global maximum (subject to the constraints) at
(±√2, 1) and the global minimum at (±√2, −1). The point (0, √3) is a local minimum
and (0, −√3) is a local maximum, as may be determined by consideration of the Hessian
matrix of Λ.
Note that while (√2, 1, −1) is a critical point of Λ, it is not a local extremum. We have
Λ(√2, 1, −1) = 2. Given any neighborhood of this point, we can choose a small positive ε
and a small δ of either sign to get Λ values both greater and less than 2.
Example: entropy
For this to be a probability distribution, the sum of the probabilities pi at each point xi must
equal 1, so our constraint is ∑i pi = 1.
This shows that all the pi are equal (because they depend on λ only). By using the constraint
∑j pj = 1, we find pi = 1/n.
Hence, the uniform distribution is the distribution with the greatest entropy, among
distributions on n points.
The magnitude of the gradient can be used to force the critical points to occur at local
minima.
With Lagrange multipliers, the critical points occur at saddle points, rather than at local
maxima (or minima). Unfortunately, many numerical optimization techniques, such as hill
climbing, gradient descent, some of the quasi-Newton methods, among others, are designed
to find local maxima (or minima) and not saddle points. For this reason, one must either
modify the formulation to ensure that it's a minimization problem (for example, by
extremizing the square of the gradient of the Lagrangian as below), or else use an
optimization technique that finds stationary points (such as Newton's method without an
extremum-seeking line search) and not necessarily extrema.
As a simple example, consider the problem of finding the value of x that minimizes f(x) = x²,
constrained such that x² = 1. (This problem is somewhat pathological because there are only
two values that satisfy this constraint, but it is useful for illustration purposes because the
corresponding unconstrained function can be visualized in three dimensions.) The
corresponding Lagrange function is
Λ(x,λ) = x2 + λ(x2 − 1)
In order to solve this problem with a numerical optimization technique, we must first
transform it so that the critical points occur at local minima. This is done by computing the
magnitude of the gradient of the unconstrained optimization problem.
First, we compute the partial derivative of the unconstrained problem with respect to each
variable:
If the target function is not easily differentiable, the differential with respect to each variable
can be approximated as
Next, we compute the magnitude of the gradient, which is the square root of the sum of the
squares of the partial derivatives:
The critical points of h occur at x = 1 and x = −1, just as in Λ. Unlike the critical points in Λ,
however, the critical points in h occur at local minima, so numerical optimization techniques
can be used to find them.
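A sketch of this trick in code, for the Lagrangian Λ(x, λ) = x² + λ(x² − 1) given above. Following the remark earlier about extremizing the square of the gradient, the sketch minimizes the squared gradient magnitude with a generic optimizer; the starting points are assumptions chosen near the two solutions.

# Find the stationary points of Lambda(x, lam) = x**2 + lam*(x**2 - 1) by
# minimizing the squared magnitude of its gradient (zero exactly at those points).
import numpy as np
from scipy.optimize import minimize

def grad_sq(v):
    x, lam = v
    dLdx = 2*x + 2*lam*x        # dLambda/dx
    dLdlam = x**2 - 1           # dLambda/dlambda
    return dLdx**2 + dLdlam**2  # squared gradient magnitude

for start in ([1.5, 0.0], [-1.5, 0.0]):
    res = minimize(grad_sq, x0=np.array(start))
    print(np.round(res.x, 3))   # approximately [ 1., -1.] and [-1., -1.]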
CONCLUSION
In this unit, you have studied how to identify problems which can be solved by Lagrange
multipliers. You have studied single and multiple constraints, and the interpretation of the
Lagrange multiplier. You can now solve problems with the use of Lagrange multipliers.
Summary
Problems
Problem 1. Let f be our objective function. (Note that the coefficients are decimals 0.3 and
0.4 and not 3 and 4.) Let the ellipse be our constraint. Find the maximum and the minimum
values of f subject to this constraint.
(a) Plot the curve on the graph corresponding to the values of f along the ellipse in one
coordinate system. Use a parametric representation of the ellipse that you should know
from last semester. How many solutions do you expect the Lagrangian system of equations
to have? Explain your reasoning.
(b) Define the Lagrangian function for the optimization problem and set up the corresponding
system of equations.
(c) Find solutions to the system using the solve command. Check that you didn't obtain any
extraneous solutions. Is the number of solutions what you expected?
(d) Using the results of (c), find the minimum and the maximum values of f subject to
the constraint.
TUTOR-MARKED ASSIGNMENT
REFERENCES/FURTHER READINGS
Lasdon, Leon S. (1970). Optimization theory for large systems. Macmillan series in
operations research. New York: The Macmillan Company. pp. xi+523. MR337317.
Lasdon, Leon S. (2002). Optimization theory for large systems (reprint of the 1970
Macmillan ed.). Mineola, New York: Dover Publications, Inc.. pp. xiii+523. MR1888251.
Lemaréchal, Claude (2001). "Lagrangian relaxation". In Michael Jünger and Denis Naddef (eds.),
Computational Combinatorial Optimization: Papers from the Spring School held in Schloß
Dagstuhl, May 15–19, 2000. Lecture Notes in Computer Science 2241. Berlin: Springer-
Verlag. pp. 112–156. doi:10.1007/3-540-45586-8_4. ISBN 3-540-42877-1. MR 1900016.
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 DEFINITION
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 Introduction
Optimization problems, which seek to minimize or maximize a real function, play an
important role in the real world. They can be classified into unconstrained optimization
problems and constrained optimization problems. Many practical problems in science,
engineering, economics, or even in our everyday life can be formulated as constrained
optimization problems, such as the minimization of the energy of a particle in physics[1] or
how to maximize the profit of investments in economics.[2] In unconstrained problems, the
stationary-point theory gives the necessary condition for finding the extreme points of the
objective function f(x1, ..., xn). The stationary points are the points where the gradient ∇f
is zero, that is, where each of the partial derivatives is zero. All the variables in f(x1, ..., xn)
are independent, so they can be set arbitrarily to seek the extrema of f. However, when it
comes to constrained optimization problems, this arbitrariness of the variables no longer
exists. Constrained optimization problems can be formulated in the standard form.
2.0 Objectives
There are many cool applications for the Lagrange multiplier method. For example, we
will show you how to find the extrema on the world famous Pringle surface. The Pringle
surface can be given by the equation
Let us bound this surface by the unit circle, giving us a very happy pringle. :) In this
case, the boundary would be the unit circle x² + y² = 1.
Economics
Constrained optimization plays a central role in economics. For example, the choice problem
for a consumer is represented as one of maximizing a utility function subject to a budget
constraint. The Lagrange multiplier has an economic interpretation as the shadow price
associated with the constraint, in this example the marginal utility of income.
Control theory
In optimal control theory, the Lagrange multipliers are interpreted as costate variables, and
Lagrange multipliers are reformulated as the minimization of the Hamiltonian, in Pontryagin's
minimum principle.
Example 1 Find the dimensions of the box with largest volume if the total surface area is 64
cm2.
We first need to identify the function that we’re going to optimize as well as the constraint.
Let’s set the length of the box to be x, the width of the box to be y and the height of the box to
be z. Let’s also note that because we’re dealing with the dimensions of a box it is safe to
assume that x, y, and z are all positive quantities.
We want to find the largest volume and so the function that we want to optimize is given by
f(x, y, z) = xyz.
Next we know that the surface area of the box must be the constant 64, so this is the
constraint. The surface area of a box is simply the sum of the areas of each of the sides, so
the constraint is given by
2xy + 2xz + 2yz = 64, or equivalently xy + xz + yz = 32.
Note that we divided the constraint by 2 to simplify the equation a little. Also, we get the
constraint function g(x, y, z) = xy + xz + yz from this.
Setting the Lagrange multiplier equations up for this problem gives the system
yz = λ(y + z)     (1)
xz = λ(x + z)     (2)
xy = λ(x + y)     (3)
xy + xz + yz = 32     (4)
There are many ways to solve this system. We'll solve it in the following way. Let's
multiply equation (1) by x, equation (2) by y and equation (3) by z. This gives,
xyz = λ(xy + xz)     (5)
xyz = λ(xy + yz)     (6)
xyz = λ(xz + yz)     (7)
Now notice that we can set equations (5) and (6) equal. Doing this gives,
λ(xy + xz) = λ(xy + yz), so λ = 0 or xz = yz.
This gave two possibilities. The first, λ = 0, is not possible since if this was the case
equation (1) would reduce to
yz = 0, that is, y = 0 or z = 0.
Since we are talking about the dimensions of a box neither of these is possible, so we can
discount λ = 0. This leaves the second possibility, xz = yz.
Since we know that z ≠ 0 (again since we are talking about the dimensions of a box)
we can cancel the z from both sides. This gives,
x = y.     (8)
Next, let's set equations (6) and (7) equal. Doing this gives,
λ(xy + yz) = λ(xz + yz), so λ = 0 or xy = xz.
We can also say that x ≠ 0 since we are dealing with the dimensions of a box, so we
must have,
y = z.     (9)
Plugging equations (8) and (9) into equation (4) we get
3y² = 32, so y = ±√(32/3).
However, we know that y must be positive since we are talking about the dimensions of a
box. Therefore the only solution that makes physical sense here is
x = y = z = √(32/3) ≈ 3.266.
We should be a little careful here. Since we’ve only got one solution we might be tempted to
assume that these are the dimensions that will give the largest volume. The method of
Lagrange Multipliers will give a set of points that will either maximize or minimize a given
function subject to the constraint, provided there actually are minimums or maximums.
The function will not have a maximum if all the variables are allowed to increase without
bound. That however, can’t happen because of the constraint,
Here we’ve got the sum of three positive numbers (because x, y, and z are positive) and the
sum must equal 32. So, if one of the variables gets very large, say x, then because each of the
products must be less than 32 both y and z must be very small to make sure the first two terms
are less than 32. So, there is no way for all the variables to increase without bound and so it
should make some sense that the function f(x, y, z) = xyz will have a maximum.
This isn't a rigorous proof that the function will have a maximum, but it should help to
visualize that in fact it should have a maximum, and so we can say that we will get a
maximum volume if the dimensions are x = y = z = √(32/3) ≈ 3.266 cm.
Notice that we never actually found the value of λ in the above example. This is fairly
standard for these kinds of problems. The value of λ isn't really important to determining
if the point is a maximum or a minimum, so often we will not bother with finding a value for
it. On occasion we will need its value to help solve the system.
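A sympy check of this example, using the objective xyz and the constraint xy + xz + yz = 32 set up above (the variable and symbol names are ours):

# Verify the box example: maximize V = x*y*z subject to x*y + x*z + y*z = 32.
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
lam = sp.symbols('lambda', real=True)
V = x*y*z
g = x*y + x*z + y*z - 32

Lagr = V - lam*g                      # grad V = lambda * grad g form, as in (1)-(3)
eqs = [sp.diff(Lagr, v) for v in (x, y, z)] + [g]
sols = sp.solve(eqs, [x, y, z, lam], dict=True)

boxes = [s for s in sols if s[x] > 0 and s[y] > 0 and s[z] > 0]   # physical solutions only
for s in boxes:
    print(s, "volume =", sp.N(V.subs(s)))   # x = y = z = sqrt(32/3), volume ~ 34.84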
Example 2
Solution
This one is going to be a little easier than the previous one since it only has two variables.
Also, note that it's clear from the constraint that the region of possible solutions lies on a
closed disk, which is a closed and bounded region, and hence by the Extreme Value
Theorem we know that a minimum and maximum value must exist.
Notice that, as with the last example, we can't have λ = 0 since that would not satisfy the
first two equations. So, since we know that λ ≠ 0, we can solve the first two equations
for x and y respectively. This gives,
Now, that we know we can find the points that will be potential maximums and/or
minimums.
If we get,
and if
To determine if we have maximums or minimums we just need to plug these into the
function. Also recall from the discussion at the start of this solution that we know these will
be the minimum and maximum values because the Extreme Value Theorem tells us that
minimums and maximums will exist for this problem.
Example 3
• Set up equations for the volume and the cost of building the silo.
• Using the Lagrange multiplier method, find the cheapest way to build the silo.
• Do these dimensions seem reasonable? Why?
Next, we will look at the cost of building a silo of volume 1000 cubic meters. The curved
surface on top of the silo costs $3 per square meter to build, while the walls cost $1 per
square meter.
Of course, if all situations were this simple, there would be no need for the Lagrange
multiplier method, since there are other methods for solving 2 variable functions that are
much nicer. However, with a greater number of variables, the Lagrange multiplier method is
much more fun.
For the next example, imagine you are working at the State Fair (since you're so desperate for
money that you can't even buy a bagel anymore). You find yourself at the snowcone booth,
and your boss, upon hearing that you are good at math, offers you a bonus if you can design
the most efficient snowcone. You assume the snowcone will be modelled by a half-ellipsoid
perched upon a cone.
Your boss only wants to use 84 square centimeters of paper per cone, and wants to have it
hold the maximum amount of snow. This can be represented in 3 variables: r (the radius of
the cone), h (the height of the cone), and s (the height of the half-ellipsoid). In order to keep
the snow from tumbling off the cone, s cannot be greater than 1.5*r. We have provided hints
for the equations if you need them.
CONCLUSION:
In this unit, you should be able to apply the Lagrange multiplier method to a Pringle surface,
apply Lagrange multipliers in economics, apply Lagrange multipliers in control theory, and
solve problems with the application of Lagrange multipliers.
SUMMARY
The Lagrange multipliers method is a very efficient tool for nonlinear optimization
problems, which is capable of dealing with both equality-constrained and inequality-
constrained nonlinear optimization problems. Many computational programming methods,
such as the barrier and interior-point methods and the penalty and augmented Lagrangian
methods, build on it. The Lagrange multipliers method and its extended methods are widely
applied in science, engineering, economics and our everyday life.
TUTOR-MARKED ASSIGNMENT
1. Find the dimensions of the box with largest volume if the total surface area is 64 cm2.
2. Consider two curves in the xy-plane: y = eˣ and y = −(x − 2)². Find two points (x, y) and
(X, Y), one on each of the two curves respectively, whose distance apart is as small as possible.
Use the method of Lagrange multipliers. Make a graph that illustrates your solution.
REFERENCE
MODULE 8 THE JACOBIANS
UNIT 1: JACOBIANS
UNIT 2: JACOBIAN DETERMINANTS
UNIT 3: APPLICATIONS OF JACOBIAN
UNIT 1 JACOBIAN
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Recognise the Jacobian rule
3.2 How to use the Jacobian
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
1.0 INTRODUCTION
Jacobian
The Jacobian of functions ƒi(x1, x2, …, xn), i = 1, 2, …, n, of real variables xi is the determinant
of the matrix whose ith row lists all the first-order partial derivatives of the function
ƒi(x1, x2, …, xn). It is also known as the Jacobian determinant.
(or functional determinant), a determinant with elements aik = ∂yi/∂xk, where yi = fi(x1, . .
., xn), 1 ≤ i ≤ n, are functions that have continuous partial derivatives in some region ∆. It is
denoted by
The Jacobian was introduced by K. Jacobi in 1833 and 1841. If, for example, n = 2, then the
system of functions
y1 = f1(x1, x2),  y2 = f2(x1, x2)     (1)
defines a mapping of a region ∆, which lies in the plane x1x2, onto a region of the plane y1y2.
The role of the Jacobian for the mapping is largely analogous to that of the derivative for a
function of a single variable. Thus, the absolute value of the Jacobian at some point M is
equal to the local factor by which areas at the point are altered by the mapping; that is, it is
equal to the limit of the ratio of the area of the image of the neighborhood of M to the area of
the neighborhood as the dimensions of the neighborhood approach zero. The Jacobian at M is
positive if mapping (1) does not change the orientation in the neighborhood of M, and
negative otherwise.
OBJECTIVE
At the end of this unit, you should be able to:
• recognise the Jacobian rule
• use the Jacobian
MAIN CONTENT
If the Jacobian does not vanish in the region ∆ and if φ(y1, y2) is a function defined in the
region ∆1 (the image of ∆), then
∫∫∆1 φ(y1, y2) dy1 dy2 = ∫∫∆ φ(f1(x1, x2), f2(x1, x2)) |J| dx1 dx2
(the formula for change of variables in a double integral). An analogous formula obtains for
multiple integrals. If the Jacobian of mapping (1) does not vanish in region ∆, then there
exists the inverse mapping
x1 = ψ1(y1, y2),  x2 = ψ2(y1, y2),
and the Jacobian of the inverse mapping is the reciprocal of the Jacobian of the original
mapping (an analogue of the formula for differentiation of an inverse function). This assertion
finds numerous applications in the theory of implicit functions.
For functions implicitly defined by a system of equations (2), Fk = 0, to be expressible
explicitly in a neighborhood of a point M, it is sufficient that the coordinates of M satisfy
equations (2), that the functions Fk have continuous partial derivatives, and that the Jacobian
of the Fk be nonzero at M.
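As an illustration of the area-scaling role of the Jacobian and of the change-of-variables formula, here is a sketch using the familiar polar-coordinate map (an example of ours, worked later in this module as well):

# Jacobian determinant of (r, theta) -> (x, y) = (r*cos(theta), r*sin(theta))
# and a change of variables in a double integral.
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r*sp.cos(theta)
y = r*sp.sin(theta)

J = sp.Matrix([x, y]).jacobian([r, theta])
detJ = sp.simplify(J.det())
print(detJ)                     # r, so the area element is dx dy = r dr dtheta

# Example: integral of exp(-(x**2 + y**2)) over the whole plane, in polar coordinates.
integral = sp.integrate(sp.exp(-r**2) * detJ, (r, 0, sp.oo), (theta, 0, 2*sp.pi))
print(integral)                 # pi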
Examples
1. Let F: R² → R² be the mapping defined by
F(x, y) = (x² + y², e^(xy)),
with component functions f(x, y) = x² + y² and g(x, y) = e^(xy).
The Jacobian matrix at an arbitrary point (x, y) is
J_F(x, y) = [ ∂f/∂x  ∂f/∂y ; ∂g/∂x  ∂g/∂y ] = [ 2x  2y ; y e^(xy)  x e^(xy) ].
Hence when x = 1, y = 1, we find
J_F(1, 1) = [ 2  2 ; e  e ].
2. Let F : R² → R³ be the mapping defined by
F(x, y) = (xy, sin x, x²y).
Find J_F(P) at the point P = (π, π/2).
J_F(x, y) = [ y  x ; cos x  0 ; 2xy  x² ].
Hence,
J_F(π, π/2) = [ π/2  π ; −1  0 ; π²  π² ].
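These two computations can be checked quickly with sympy (a verification sketch):

# Verify the Jacobian matrices of Examples 1 and 2.
import sympy as sp

x, y = sp.symbols('x y')

F1 = sp.Matrix([x**2 + y**2, sp.exp(x*y)])       # Example 1: R^2 -> R^2
print(F1.jacobian([x, y]).subs({x: 1, y: 1}))    # Matrix([[2, 2], [E, E]])

F2 = sp.Matrix([x*y, sp.sin(x), x**2 * y])       # Example 2: R^2 -> R^3
print(F2.jacobian([x, y]).subs({x: sp.pi, y: sp.pi/2}))
# Matrix([[pi/2, pi], [-1, 0], [pi**2, pi**2]])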
CONCLUSION
In this unit, you have been able to recognise the Jacobian rule and how to use the formula.
SUMMARY
In this unit, you have studied the basic concept of the Jacobian and the identification of its
defining formula.
TUTOR-MARKED ASSIGNMENT
Compute the Jacobian matrix of each of the following mappings:
a. F(x, y) = (x + y, x²y)
b. F(x, y) = (sin x, cos xy)
c. F(x, y, z) = (xyz, x²z)
REFERENCE
D.K. Arrowsmith and C.M. Place, Dynamical Systems, Section 3.3, Chapman & Hall,
London, 1992. ISBN 0-412-39080-9.
UNIT 2 : JACOBIAN DETERMINANT
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
The Jacobian of functions ƒi(x1, x2, …, xn), i = 1, 2, …, n, of real variables xi is the
determinant of the matrix whose ith row lists all the first-order partial derivatives of the
function ƒi(x1, x2, …, xn). It is also known as the Jacobian determinant.
In vector calculus, the Jacobian matrix is the matrix of all first-order partial derivatives of
a vector- or scalar-valued function with respect to another vector. Suppose F : Rn → Rm is a
function from Euclidean n-space to Euclidean m-space. Such a function is given by m real-
valued component functions, y1(x1,...,xn), ..., ym(x1,...,xn). The partial derivatives of all these
functions (if they exist) can be organized in an m-by-n matrix, the Jacobian matrix J of F, as
follows:
If (x1, ..., xn) are the usual orthogonal Cartesian coordinates, the ith row (i = 1, ..., m) of this
matrix corresponds to the gradient of the ith component function yi. Note that some books
define the Jacobian as the transpose of the matrix given above.
The Jacobian determinant (often simply called the Jacobian) is the determinant of the
Jacobian matrix (if m = n).
These concepts are named after the mathematician Carl Gustav Jacob Jacobi.
OBJECTIVE
MAIN CONTENT
Jacobian matrix
The Jacobian of a function describes the orientation of a tangent plane to the function at a
given point. In this way, the Jacobian generalizes the gradient of a scalar-valued function of
multiple variables, which itself generalizes the derivative of a scalar-valued function of a
scalar. Likewise, the Jacobian can also be thought of as describing the amount of "stretching"
that a transformation imposes. For example, if (x2, y2) = f(x1, y1) is used to transform an image,
the Jacobian of f, J(x1, y1), describes how much the image in the neighborhood of (x1, y1) is
stretched in the x and y directions.
The importance of the Jacobian lies in the fact that it represents the best linear approximation
to a differentiable function near a given point. In this sense, the Jacobian is the derivative of a
multivariate function.
If p is a point in Rn and F is differentiable at p, then its derivative is given by JF(p). In this
case, the linear map described by JF(p) is the best linear approximation of F near the point p,
in the sense that
F(x) = F(p) + JF(p)(x − p) + o(||x − p||)
for x close to p, where o is the little-o notation (for x → p) and ||x − p|| is the distance
between x and p.
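A small numerical sketch of this statement, reusing the mapping of Example 1 above, F(x, y) = (x² + y², e^(xy)) (the displacement h is an arbitrary choice of ours):

# The Jacobian gives the best linear approximation of F near a point p.
import numpy as np

def F(v):
    x, y = v
    return np.array([x**2 + y**2, np.exp(x*y)])

def J_F(v):
    x, y = v
    return np.array([[2*x, 2*y],
                     [y*np.exp(x*y), x*np.exp(x*y)]])

p = np.array([1.0, 1.0])
h = np.array([0.01, -0.02])              # a small displacement

exact  = F(p + h)
linear = F(p) + J_F(p) @ h               # first-order (Jacobian) approximation
print(np.linalg.norm(exact - linear))    # the error is of second order in ||h||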
In a sense, both the gradient and the Jacobian are "first derivatives": the former is the first
derivative of a scalar function of several variables, the latter the first derivative of a vector
function of several variables. In general, the gradient can be regarded as a special version of
the Jacobian: it is the Jacobian of a scalar function of several variables.
The Jacobian of the gradient has a special name: the Hessian matrix, which in a sense is the
"second derivative" of the scalar function of several variables in question.
Inverse
It follows that the (scalar) inverse of the Jacobian determinant of a transformation is the
Jacobian determinant of the inverse transformation.
Examples
Example 1. The transformation from spherical coordinates (r, θ, ϕ) to Cartesian coordinates
(x1, x2, x3) is given by
x1 = r sin θ cos ϕ,  x2 = r sin θ sin ϕ,  x3 = r cos θ.
The Jacobian determinant is r² sin θ. As an example, since dV = dx1 dx2 dx3, this determinant
implies that the differential volume element dV = r² sin θ dr dθ dϕ. Nevertheless this
determinant varies with coordinates. To avoid any variation the new coordinates can be
defined as [2]
Now the determinant equals 1 and the volume element becomes
This example shows that the Jacobian need not be a square matrix.
Example 3.
The transformation from polar coordinates (r, θ) to Cartesian coordinates (x, y) is given by
x = r cos θ, y = r sin θ. The Jacobian determinant is equal to r. This shows how an integral
in the Cartesian coordinate system is transformed into an integral in the polar coordinate
system:
∫∫_F(A) f(x, y) dx dy = ∫∫_A f(r cos θ, r sin θ) r dr dθ.
From this we see that F reverses orientation near those points where x1 and x2 have the same
sign; the function is locally invertible everywhere except near points where x1 = 0 or x2 = 0.
Intuitively, if you start with a tiny object around the point (1, 1, 1) and apply F to that object,
you will get an object with approximately 40 times the volume of the original one.
CONCLUSION
In this unit, you have studied the Jacobian concept. You have learned about the Jacobian
matrix and the Jacobian determinant of an inverse transformation, and you have solved
problems on the Jacobian determinant.
SUMMARY
In this unit, you have studied the Jacobian matrix and the Jacobian determinant, and you have
seen from the examples that the Jacobian matrix need not be square.
Tutor-Marked Assignment
1. In each of the following cases, compute the Jacobian matrix of F, and evaluate it at the
following points:
F(x, y) = (sin x, cos xy) at the point (1, 2).
2. r1 = r tan θ cos θ,  r2 = r sin θ tan θ,  r3 = r sin θ
3. y1 = x2,  y2 = 4x1,  y3 = 5x2² − 4x3,  y4 = x1 sin x3
4. The Jacobian matrix of the function F: R³ → R³ with components
y1 = 4x1² − 3 sin(x2 x3),  y2 = 3x2²,  y3 = x2³ x3
5. x = r tan φ,  y = r cos φ
REFERENCES
Kudriavtsev, L. D. Matematicheskii analiz, 2nd ed., vol. 2. Moscow, 1973.
Il’in, V. A., and E. G. Pozniak. Osnovy matematicheskogo analiza, 3rd ed., part I. Moscow,
1971.
The Great Soviet Encyclopedia, 3rd Edition (1970-1979). © 2010 The Gale Group, Inc. All
rights reserved.
D.K. Arrowsmith and C.M. Place, Dynamical Systems, Section 3.3, Chapman & Hall,
London, 1992.
UNIT 3: APPLICATIONS OF JACOBIAN
CONTENTS
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 Apply the Jacobian concept
3.2 Know the Jacobian matrix
3.3 Apply the inverse transformation
3.4 Solve problems on the Jacobian determinant
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR-MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READINGS
INTRODUCTION
If m = n, then F is a function from m-space to n-space and the Jacobian matrix is a square
matrix. We can then form its determinant, known as the Jacobian determinant. The
Jacobian determinant is sometimes simply called "the Jacobian."
OBJECTIVE
MAIN CONTENT
Dynamical systems
Consider a dynamical system of the form x' = F(x), where x' is the (component-wise) time
derivative of x, and F : Rn → Rn is continuous and differentiable. If F(x0) = 0, then x0 is a
stationary point (also called a fixed point). The behavior of the system near a stationary point
is related to the eigenvalues of JF(x0), the Jacobian of F at the stationary point. Specifically,
if the eigenvalues all have negative real part, then the system is stable near the stationary
point; if any eigenvalue has a positive real part, then the point is unstable.
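A sketch of this stability test in code; the system chosen here (a damped pendulum) is an assumed illustration, not an example from the text:

# Stability of a fixed point from the eigenvalues of the Jacobian.
# Assumed system: x1' = x2, x2' = -sin(x1) - 0.5*x2, with fixed point x0 = (0, 0).
import numpy as np

def jacobian_at(x1, x2):
    # Partial derivatives of (x2, -sin(x1) - 0.5*x2) with respect to (x1, x2).
    return np.array([[0.0, 1.0],
                     [-np.cos(x1), -0.5]])

eigvals = np.linalg.eigvals(jacobian_at(0.0, 0.0))
print(eigvals)                                           # both real parts are negative
print("stable" if np.all(eigvals.real < 0) else "unstable")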
The Jacobian determinant at a given point gives important information about the behavior of
F near that point. For instance, the continuously differentiable function F is invertible near a
point p ∈ Rn if the Jacobian determinant at p is non-zero. This is the inverse function
theorem. Furthermore, if the Jacobian determinant at p is positive, then F preserves orientation
near p; if it is negative, F reverses orientation.
Uses
The Jacobian determinant is used when making a change of variables when evaluating a
multiple integral of a function over a region within its domain. To accommodate the
change of coordinates, the magnitude of the Jacobian determinant arises as a multiplicative
factor within the integral. Normally it is required that the change of coordinates be done in a
manner which maintains an injectivity between the coordinates that determine the domain.
The Jacobian determinant, as a result, is usually well defined.
Newton's method
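As a brief sketch of how the Jacobian is used here (standard material given for illustration, with an assumed example system): Newton's method for a system F(x) = 0 repeats the step x_{k+1} = x_k − J_F(x_k)⁻¹ F(x_k), which in practice means solving the linear system J_F(x_k) Δx = −F(x_k) at each iteration.

# Newton's method for a system F(x) = 0 using the Jacobian at each step.
# Assumed example system: x**2 + y**2 - 4 = 0 and x*y - 1 = 0.
import numpy as np

def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 4, x*y - 1])

def J(v):
    x, y = v
    return np.array([[2*x, 2*y],
                     [y,   x]])

v = np.array([2.0, 1.0])                 # starting guess
for _ in range(10):
    v = v - np.linalg.solve(J(v), F(v))  # Newton step
print(np.round(v, 6), F(v))              # converges to a root near (1.932, 0.518)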
CONCLUSION
In this unit, you have learned the application of the Jacobian concept and of the Jacobian
matrix. You have used the Jacobian in the application of the inverse transformation and have
also solved problems on the Jacobian determinant.
SUMMARY
Application of the Jacobian concept.
TUTOR-MARKED ASSIGNMENT
1. Compute the Jacobian matrix of each of the following mappings:
a. F(x, y, z) = (xy, y, xz)
b. F(x, y) = (e^(xy), x)
c. F(x, y) = (xy, x²)
2. The transformation from spherical coordinates (r, θ, φ) to Cartesian coordinates (x1, x2, x3)
is given by the function F : R+ × [0, π] × [0, 2π) → R3 with components:
4. The Jacobian determinant of the function F : R3 → R4 with components
REFERENCES
Simon, C. P. and Blume, L. E. Mathematics for Economists. New York: W. W. Norton, 1994.
D.K. Arrowsmith and C.M. Place, Dynamical Systems, Section 3.3, Chapman & Hall,
London, 1992. ISBN 0-412-39080-9.